Add batch Foldseek structure search via /ticket/foldseek/batch #119
Open
igmorv-genesis wants to merge 12 commits into soedinglab:master
Enable searching multiple structures with different motifs in a single request. The API accepts queries[] and motifs[] arrays (multipart or URL-encoded), writes individual structure files and a tab-separated batch file, and invokes folddisco with -q <batch_file> instead of single -p/-q flags. Also removes the motif flag requirement from database validation so any registered database can be used for Folddisco queries. Made-with: Cursor
Reuse the same first-line check in both ismmCIFFile and WriteBatchFiles instead of duplicating the heuristic inline. Also adds TrimSpace to the batch path for consistency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
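For reviewers unfamiliar with the format: a first-line check of this kind could look roughly like the sketch below. The function name `looksLikeMMCIF` and the exact rule (first non-blank line starts with `data_`, the CIF data block header) are assumptions for illustration; the heuristic actually used in this PR is not shown here and may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeMMCIF is a hypothetical helper, not the PR's implementation.
// It trims surrounding whitespace, takes the first line, and reports
// whether that line starts with the CIF data block header "data_".
func looksLikeMMCIF(content string) bool {
	trimmed := strings.TrimSpace(content)
	firstLine, _, _ := strings.Cut(trimmed, "\n")
	return strings.HasPrefix(firstLine, "data_")
}

func main() {
	fmt.Println(looksLikeMMCIF("data_EXAMPLE\n#\n_entry.id EXAMPLE\n"))
	fmt.Println(looksLikeMMCIF("HEADER    EXAMPLE PDB RECORD\n"))
}
```

Keeping the check in one helper means the single-query and batch paths cannot drift apart, which is the point of the commit.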
Only databases with the Motif flag (i.e. a valid folddisco index) should be accepted for Folddisco queries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allows clients to control how many neighbors are returned via a "top" form parameter, defaulting to 1000 for backwards compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
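A minimal sketch of how such a parameter might be parsed, assuming the raw form value arrives as a string. The helper name `parseTop` and the choice to fall back to the default on invalid input (rather than rejecting the request) are assumptions, not taken from the PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultTop mirrors the default of 1000 named in the commit message.
const defaultTop = 1000

// parseTop is a hypothetical helper: it converts the raw "top" form value
// to an int and falls back to defaultTop when the value is missing,
// not a number, or not positive.
func parseTop(raw string) int {
	if raw == "" {
		return defaultTop
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return defaultTop
	}
	return n
}

func main() {
	fmt.Println(parseTop(""))   // 1000
	fmt.Println(parseTop("50")) // 50
}
```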
Each motifs[] form value can now contain multiple motifs separated by ";", so a single structure file can be searched with several motif specifications without re-uploading. WriteBatchFiles emits one batch line per (file, motif) pair. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
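The expansion into (file, motif) pairs can be sketched as follows. `batchLines` is a hypothetical stand-in for the relevant part of WriteBatchFiles; the file names and motif strings are placeholders, and skipping empty segments is an assumption about how a trailing ";" would be handled.

```go
package main

import (
	"fmt"
	"strings"
)

// batchLines pairs each structure path with the motifs[] value at the same
// index, splits that value on ";", and returns one tab-separated batch line
// per (file, motif) pair. Hypothetical sketch, not the PR's code.
func batchLines(paths, motifs []string) ([]string, error) {
	if len(paths) != len(motifs) {
		return nil, fmt.Errorf("got %d structure files but %d motif values", len(paths), len(motifs))
	}
	var lines []string
	for i, path := range paths {
		for _, motif := range strings.Split(motifs[i], ";") {
			motif = strings.TrimSpace(motif)
			if motif == "" {
				continue // skip empty segments such as a trailing ";"
			}
			lines = append(lines, path+"\t"+motif)
		}
	}
	return lines, nil
}

func main() {
	lines, err := batchLines(
		[]string{"query_0.pdb", "query_1.pdb"},
		[]string{"A10,A25;A42,A77", "B5,B9"},
	)
	if err != nil {
		panic(err)
	}
	// Three lines: two for query_0.pdb, one for query_1.pdb.
	fmt.Println(strings.Join(lines, "\n"))
}
```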
Add batch mode support for Folddisco queries
Folddisco batch mode (-q batchfile) writes results to stdout instead of the -o file. Wrap the batch command in sh -c with stdout redirection to create the expected output file for post-processing. Made-with: Cursor
folddisco writes batch results to stdout, and concurrent threads interleave their output mid-line, corrupting the TSV. Force -t 1 for batch queries to produce clean output. Made-with: Cursor
…ding Instead of using folddisco's built-in batch mode (which writes all results to stdout, causing interleaved output with multiple threads), run each (structure, motif) pair as a separate `folddisco query` invocation with its own `-o` output file. Each query gets full multi-threading via `-t`, and the per-query results are concatenated into the final output file. This replaces the previous `-t 1` workaround which sacrificed parallelism within each query. Made-with: Cursor
folddisco batch format supports an optional 3rd column for per-query output file paths. Use this to write each query's results to its own file, then concatenate. This runs a single folddisco process with full threading instead of N sequential invocations. Made-with: Cursor
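The "then concatenate" step could be sketched like this, assuming each query's results already sit in their own file. `concatFiles` and `demo` are hypothetical names, and the TSV rows are made-up placeholders rather than real folddisco output.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// concatFiles writes the contents of parts, in order, into dst.
func concatFiles(dst string, parts []string) (err error) {
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer func() {
		// Report a failed close only if nothing went wrong earlier.
		if cerr := out.Close(); err == nil {
			err = cerr
		}
	}()
	for _, p := range parts {
		in, openErr := os.Open(p)
		if openErr != nil {
			return openErr
		}
		_, copyErr := io.Copy(out, in)
		in.Close()
		if copyErr != nil {
			return copyErr
		}
	}
	return nil
}

// demo writes two small per-query files to a temp dir and merges them.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "batch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	parts := []string{filepath.Join(dir, "q0.tsv"), filepath.Join(dir, "q1.tsv")}
	rows := []string{"q0\thitA\n", "q1\thitB\n"}
	for i, p := range parts {
		if err := os.WriteFile(p, []byte(rows[i]), 0o644); err != nil {
			return "", err
		}
	}
	merged := filepath.Join(dir, "result.tsv")
	if err := concatFiles(merged, parts); err != nil {
		return "", err
	}
	data, err := os.ReadFile(merged)
	return string(data), err
}

func main() {
	out, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Because every query writes to its own file, no two threads share an output stream, so the interleaving problem from the stdout-based approach does not arise.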
Run batch folddisco queries individually for proper --top N and threading
New endpoint POST /ticket/foldseek/batch accepts queries[] (multiple
structure files), database[], mode, and taxfilter parameters. Query
structures are written to a queries/ directory and passed to foldseek
easy-search, which natively handles multi-query directory input.
Per-query results are returned via GET /result/foldseek/{ticket} as
a JSON object with per-query alignment arrays, reusing the existing
ReadAlignments infrastructure.
Worker detects batch mode by checking for the queries/ directory on
disk (unexported struct fields are not preserved across JSON
serialization).
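The serialization point can be reproduced in isolation. In the sketch below, the `job` type, its fields, and the helper names are hypothetical; only the behaviour of encoding/json (unexported fields are skipped) and the idea of checking for a queries/ directory come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// job is an illustrative job description, not the PR's actual type.
type job struct {
	Database []string `json:"database"`
	Mode     string   `json:"mode"`
	batch    bool     // unexported: encoding/json skips it
}

// roundTrip marshals j to JSON and decodes it again, as happens when a job
// is stored by the API server and later loaded by a worker.
func roundTrip(j job) (job, error) {
	data, err := json.Marshal(j)
	if err != nil {
		return job{}, err
	}
	var out job
	err = json.Unmarshal(data, &out)
	return out, err
}

// hasQueriesDir reports whether jobDir contains a "queries" directory,
// which survives on disk regardless of what the JSON carries.
func hasQueriesDir(jobDir string) bool {
	info, err := os.Stat(filepath.Join(jobDir, "queries"))
	return err == nil && info.IsDir()
}

func main() {
	in := job{Database: []string{"exampledb"}, Mode: "examplemode", batch: true}
	out, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(in.batch, out.batch) // true false: the flag is lost in transit

	dir, err := os.MkdirTemp("", "job")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	fmt.Println(hasQueriesDir(dir)) // false
	if err := os.Mkdir(filepath.Join(dir, "queries"), 0o755); err != nil {
		panic(err)
	}
	fmt.Println(hasQueriesDir(dir)) // true
}
```

Exporting the field would also work; checking the directory avoids changing the serialized job format.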
Route uses /ticket/foldseek/batch to avoid gorilla/mux 405 conflict
with the /ticket/{ticket} variable route.
Made-with: Cursor
Summary
- `POST /ticket/foldseek/batch` accepts `queries[]` (multiple uploaded structure files), `database[]`, `mode`, `email`, and `taxfilter`
- Query structures are written to a `queries/` directory and passed to `foldseek easy-search`, which natively handles multi-query directory input and produces per-query entries in the output alignment DB
- The `GET /result/foldseek/{ticket}` endpoint returns all per-query results by reading all entries from the alignment DB, reusing the existing `ReadAlignments` infrastructure
- Worker detects batch mode by checking for the `queries/` directory on disk (unexported struct fields aren't preserved across JSON serialization)
- Route uses `/ticket/foldseek/batch` to avoid a gorilla/mux v1.8.0 405 conflict with the existing `/ticket/{ticket}` variable route
Example usage
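A client request to the batch endpoint might be built roughly as below. This is a sketch under assumptions: the `structure` type, the `newBatchRequest` helper, the base URL, and all field values are hypothetical, and sending each structure as a multipart file part (rather than a plain form value) is a guess at what the handler expects. Only the route and the field names `queries[]`, `database[]`, and `mode` come from the summary.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// structure is an in-memory query file; name and contents are placeholders.
type structure struct {
	Name string
	Data []byte
}

// newBatchRequest builds (but does not send) a multipart POST with one
// queries[] part per structure and one database[] field per database.
func newBatchRequest(baseURL string, queries []structure, databases []string, mode string) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	for _, q := range queries {
		part, err := w.CreateFormFile("queries[]", q.Name)
		if err != nil {
			return nil, err
		}
		if _, err := part.Write(q.Data); err != nil {
			return nil, err
		}
	}
	for _, db := range databases {
		if err := w.WriteField("database[]", db); err != nil {
			return nil, err
		}
	}
	if err := w.WriteField("mode", mode); err != nil {
		return nil, err
	}
	// Close writes the trailing boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/ticket/foldseek/batch", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	req, err := newBatchRequest(
		"http://localhost:8081", // hypothetical server address
		[]structure{
			{Name: "first.pdb", Data: []byte("placeholder structure 1")},
			{Name: "second.pdb", Data: []byte("placeholder structure 2")},
		},
		[]string{"exampledb"},
		"examplemode",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path) // POST /ticket/foldseek/batch
}
```

Passing the request to `http.DefaultClient.Do` would submit it; the ticket in the response could then be used with `GET /result/foldseek/{ticket}`.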
Test plan
- `/ticket` endpoint unaffected