Description
I'm using zoekt to index 9k repositories of Drupal contributed modules at https://search.tresbien.tech, and it's been a great fit so far: very nice to set up and surprisingly fast. For now I need to make an additional Ajax query to the JSON API to get everything I need to display the results. I'd like to avoid that, and also make it possible to build queries based on some of the extra metadata I'm indexing. I wanted to know what to expect from zoekt and the built-in webserver/templates. I don't have a separate frontend stack where I could massage the data. Let me know if that should be the way to go based on what I'm after :)
## Context
We need to search for code across all our repositories when we change an API to have an idea of the impact it'll have on the ecosystem. It informs things like the level of backwards compatibility to implement, how much publicity we should do, or if we should reach out directly to impacted projects before/after the change.
To do this efficiently we need some extra information about each project and its releases. So the plan is to display extra metadata about the repository on every result, as shown below. The second row of the file metadata header shows the release version with the associated core compatibility (as a PHP Composer version constraint), then the usage, and whether the project is covered by our security team or not:

or when multiple branches match:

## Current implementation
Currently I'm adding that information to the repository during indexing (having #432 would help, I'd be able to drop a good chunk of code) and I can correctly take it out when querying the JSON endpoint:

```
"RawConfig": {
  // This is the "supported releases" from the project, with their drupal core compatibility
  "drupal-core": "3.5.x:^9.5 || ^10 || ^11;3.6.x:^9.5 || ^10 || ^11",
  // Simple yes/no value.
  "drupal-security": "covered",
  // The number of installs for every release of the module
  "drupal-usage": "3.0.x:5013;3.1.x:6058;3.2.x:3138;3.3.x:8302;3.4.x:32757;3.5.x:42092;3.6.x:167949;3.x:104;8.x-1.x:9374;8.x-2.x:19031",
  "name": "admin_toolbar",
  // Sum of all the usage data
  "priority": "293818",
  "web-url": "https://git.drupalcode.org/project/admin_toolbar",
  "web-url-type": "gitlab"
},
```

We have some branch-specific data going on. What I can index today is the priority mapped to the sum of installs, but I'd like the priority to depend on the branches that are matched in the response, for example:
branch 3.5.x:
- core compatibility: ^9.5 || ^10 || ^11
- usage: 42092
branch 3.6.x:
- core compatibility: ^9.5 || ^10 || ^11
- usage: 167949
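
To make the per-branch view concrete, here is a minimal Go sketch of how the flat `drupal-core` and `drupal-usage` strings could be split back into per-branch values. It only assumes the `branch:value;branch:value` encoding shown above. The `BranchMeta` type and the `splitPairs` and `branchMeta` helpers are hypothetical names for illustration, not part of zoekt.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// BranchMeta holds the per-branch values recovered from the flat RawConfig
// strings. Hypothetical type, not a zoekt API.
type BranchMeta struct {
	CoreConstraint string // Composer constraint, e.g. "^9.5 || ^10 || ^11"
	Usage          int    // number of installs reported for the branch
}

// splitPairs parses "branch:value;branch:value" into a map.
// Entries without a ":" separator are skipped.
func splitPairs(s string) map[string]string {
	out := make(map[string]string)
	for _, pair := range strings.Split(s, ";") {
		branch, value, ok := strings.Cut(pair, ":")
		if !ok {
			continue
		}
		out[strings.TrimSpace(branch)] = strings.TrimSpace(value)
	}
	return out
}

// branchMeta merges the two RawConfig strings into one entry per branch.
// Usage values that are not integers are ignored.
func branchMeta(raw map[string]string) map[string]BranchMeta {
	out := make(map[string]BranchMeta)
	for branch, constraint := range splitPairs(raw["drupal-core"]) {
		m := out[branch]
		m.CoreConstraint = constraint
		out[branch] = m
	}
	for branch, usage := range splitPairs(raw["drupal-usage"]) {
		n, err := strconv.Atoi(usage)
		if err != nil {
			continue
		}
		m := out[branch]
		m.Usage = n
		out[branch] = m
	}
	return out
}

func main() {
	raw := map[string]string{
		"drupal-core":  "3.5.x:^9.5 || ^10 || ^11;3.6.x:^9.5 || ^10 || ^11",
		"drupal-usage": "3.5.x:42092;3.6.x:167949",
	}
	for branch, m := range branchMeta(raw) {
		fmt.Printf("%s: core %q, usage %d\n", branch, m.CoreConstraint, m.Usage)
	}
}
```

Branches that only appear in `drupal-usage` (older, unsupported releases) end up with an empty constraint, which matches the data above where only supported releases carry a core compatibility.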
## Questions
- Would it make sense to expose the repository metadata to `results.html.tpl` so I can refer to it from the templates instead of having to do an Ajax request to get the result from the repo list endpoint? Making the repository RawConfig accessible from the `results.html.tpl` file.
- Language detection: I see that detecting the language happens in https://github.com/sourcegraph/zoekt/blob/main/languages/languages.go. We have PHP files that are named with the extensions `.module`, `.theme`, `.install` and a few others. Ideally they'd show up when using `lang:php`. Could that live in a config file somewhere for zoekt to pick up, or would it be a case of improving upstream?
- Ideally I'd like to have a custom `core: 11` query that uses the version constraint to filter results and a `security: yes` to further filter things. I had a look at "Feature request: is it possible to add a query filter on "topics:"" #783 and "add topic:XXX support to find data by github topic" #939, but that seems as static as the archived/fork/public special keywords. Trying to make it more generic might make it too generic? I don't know, I could live with a config file to declare the additional filters.
- Could a different sorting method be used? I'd be interested to have a sort based on usage/priority exclusively.
- Branch-specific data: linked to the question above, for example if a search matched only the 3.5.x branch, the priority used would be `42092`; if the search matched both branches, the priority would be `210041` (sum of the two branches).
- In the same spirit, having branch-specific RawConfig, so when I'm in the template I can get the metadata associated with the branches matched for that result.
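
As a rough illustration of the sorting and branch-specific priority questions, the sketch below sums the usage of only the branches a result matched and then orders results by that value. It is a standalone example under my own assumptions: the `result` struct and the `matchedPriority` and `sortByPriority` functions are hypothetical and do not reflect how zoekt ranks results internally.

```go
package main

import (
	"fmt"
	"sort"
)

// result is a hypothetical stand-in for one repository hit.
type result struct {
	Repo     string
	Branches []string       // branches matched by the query
	Usage    map[string]int // installs per branch, parsed from drupal-usage
}

// matchedPriority sums the usage of the matched branches only.
// Duplicate or unknown branch names do not add to the total.
func matchedPriority(usage map[string]int, matched []string) int {
	total := 0
	seen := make(map[string]bool)
	for _, b := range matched {
		if seen[b] {
			continue
		}
		seen[b] = true
		total += usage[b]
	}
	return total
}

// sortByPriority orders results by descending branch-aware priority,
// keeping the original order for ties.
func sortByPriority(rs []result) {
	sort.SliceStable(rs, func(i, j int) bool {
		return matchedPriority(rs[i].Usage, rs[i].Branches) >
			matchedPriority(rs[j].Usage, rs[j].Branches)
	})
}

func main() {
	usage := map[string]int{"3.5.x": 42092, "3.6.x": 167949}
	fmt.Println(matchedPriority(usage, []string{"3.5.x"}))          // 42092
	fmt.Println(matchedPriority(usage, []string{"3.5.x", "3.6.x"})) // 210041
}
```

With this approach the static `priority` value in RawConfig would only be a fallback, since the effective priority is computed per query from the matched branches.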
I used LLM-based tooling to do the setup, and it's very happy to patch zoekt code and compile a custom version to implement some of the things above. I don't want the headache of maintaining a fork, so I wanted to know what level of customization to expect, or if patching/maintaining a fork would be the preferred solution here.
I'm not a Go developer, but lately a few Go tools have started being useful for me, so I'd be happy to take on a few things and get to know Go a bit better :)
In any case, it's already very useful as-is. Thanks!