Add batch Foldseek structure search via /ticket/foldseek/batch #119

Open
igmorv-genesis wants to merge 12 commits into soedinglab:master from genesistherapeutics:foldseek-batch

igmorv-genesis commented Apr 1, 2026

Summary

  • New endpoint POST /ticket/foldseek/batch accepts queries[] (multiple uploaded structure files), database[], mode, email, and taxfilter
  • Query structures are written to a queries/ directory and passed to foldseek easy-search, which natively handles multi-query directory input and produces per-query entries in the output alignment DB
  • New GET /result/foldseek/{ticket} endpoint returns all per-query results by reading all entries from the alignment DB, reusing the existing ReadAlignments infrastructure
  • Worker detects batch mode by checking for the queries/ directory on disk (unexported struct fields aren't preserved across JSON serialization)
  • Route uses /ticket/foldseek/batch to avoid a 405 Method Not Allowed conflict in gorilla/mux v1.8.0 with the existing /ticket/{ticket} variable route

Example usage

curl -X POST http://host/api/ticket/foldseek/batch \
  -F "queries[]=@structure1.pdb" \
  -F "queries[]=@structure2.cif" \
  -F "database[]=PDB" \
  -F "mode=3di"

# Check results
curl http://host/api/result/foldseek/{ticket_id}

Test plan

  • Batch query with 2 structures against PDB (tmalign mode) — returned 41 alignments per query
  • Folddisco batch mode still works correctly
  • Single-query foldseek via existing /ticket endpoint unaffected

Igor Morozov and others added 12 commits March 24, 2026 09:43
Enable searching multiple structures with different motifs in a single
request. The API accepts queries[] and motifs[] arrays (multipart or
URL-encoded), writes individual structure files and a tab-separated
batch file, and invokes folddisco with -q <batch_file> instead of
single -p/-q flags. Also removes the motif flag requirement from
database validation so any registered database can be used for
Folddisco queries.

Made-with: Cursor
Reuse the same first-line check in both ismmCIFFile and
WriteBatchFiles instead of duplicating the heuristic inline.
Also adds TrimSpace to the batch path for consistency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Only databases with the Motif flag (i.e. a valid folddisco index)
should be accepted for Folddisco queries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allows clients to control how many neighbors are returned via a "top"
form parameter, defaulting to 1000 for backwards compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each motifs[] form value can now contain multiple motifs separated by
";", so a single structure file can be searched with several motif
specifications without re-uploading. WriteBatchFiles emits one batch
line per (file, motif) pair.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
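
The expansion of ";"-separated motifs into one batch line per (file, motif) pair can be sketched as below. `batchLines` is a hypothetical helper and not the PR's WriteBatchFiles; the motif strings are illustrative values, and skipping empty motifs is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// batchLines returns one tab-separated "path<TAB>motif" line per
// (file, motif) pair. motifs[i] belongs to paths[i] and may hold
// several motifs separated by ";".
func batchLines(paths, motifs []string) ([]string, error) {
	if len(paths) != len(motifs) {
		return nil, fmt.Errorf("got %d files but %d motif entries", len(paths), len(motifs))
	}
	var lines []string
	for i, path := range paths {
		for _, motif := range strings.Split(motifs[i], ";") {
			motif = strings.TrimSpace(motif)
			if motif == "" {
				continue // tolerate trailing or doubled separators
			}
			lines = append(lines, path+"\t"+motif)
		}
	}
	return lines, nil
}

func main() {
	lines, err := batchLines(
		[]string{"query_0.pdb", "query_1.cif"},
		[]string{"A10,A20,A30; B5,B6", "C1,C2"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(lines, "\n"))
}
```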
Add batch mode support for Folddisco queries
Folddisco batch mode (-q batchfile) writes results to stdout instead
of the -o file. Wrap the batch command in sh -c with stdout redirection
to create the expected output file for post-processing.

Made-with: Cursor
folddisco writes batch results to stdout, and concurrent threads
interleave their output mid-line, corrupting the TSV. Force -t 1
for batch queries to produce clean output.

Made-with: Cursor
…ding

Instead of using folddisco's built-in batch mode (which writes all
results to stdout, causing interleaved output with multiple threads),
run each (structure, motif) pair as a separate `folddisco query`
invocation with its own `-o` output file. Each query gets full
multi-threading via `-t`, and the per-query results are concatenated
into the final output file.

This replaces the previous `-t 1` workaround which sacrificed
parallelism within each query.

Made-with: Cursor
folddisco batch format supports an optional 3rd column for per-query
output file paths. Use this to write each query's results to its own
file, then concatenate. This runs a single folddisco process with full
threading instead of N sequential invocations.

Made-with: Cursor
Run batch folddisco queries individually for proper --top N and threading
New endpoint POST /ticket/foldseek/batch accepts queries[] (multiple
structure files), database[], mode, and taxfilter parameters. Query
structures are written to a queries/ directory and passed to foldseek
easy-search, which natively handles multi-query directory input.

Per-query results are returned via GET /result/foldseek/{ticket} as
a JSON object with per-query alignment arrays, reusing the existing
ReadAlignments infrastructure.

Worker detects batch mode by checking for the queries/ directory on
disk (unexported struct fields are not preserved across JSON
serialization).

Route uses /ticket/foldseek/batch to avoid gorilla/mux 405 conflict
with the /ticket/{ticket} variable route.

Made-with: Cursor