Summary
easy-multimersearch (and the web server multimer search) returns empty results when searching with a heteromeric complex against targets that contain multiple copies of the same heterodimer. The results are lost in scoremultimer's chain assignment when the query chain count != target chain count, even though excellent monomer-level alignments exist at every earlier pipeline stage.
Reproducer
Using PDB 3A5Z (EpmA/EF-P complex, 4 copies of the heterodimer = 8 chains):
# Download and prepare files
curl -sL "https://files.rcsb.org/download/3A5Z.pdb" -o 3a5z.pdb
# Extract chain subsets
python3 -c "
lines = open('3a5z.pdb').readlines()
for chains, name in [('AB','3a5z_AB'), ('ABCD','3a5z_ABCD')]:
    with open(f'{name}.pdb','w') as f:
        for l in lines:
            # Column 22 (index 21) of ATOM/TER records holds the chain ID
            if l.startswith(('ATOM','TER')) and l[21] in chains: f.write(l)
        f.write('END\n')
"
# These work (equal chain counts):
foldseek easy-multimersearch 3a5z_AB.pdb 3a5z_AB.pdb result_2v2.m8 tmp # 2 hits
foldseek easy-multimersearch 3a5z_ABCD.pdb 3a5z_ABCD.pdb result_4v4.m8 tmp # 4 hits
foldseek easy-multimersearch 3a5z.pdb 3a5z.pdb result_8v8.m8 tmp # 8 hits
# These fail (unequal chain counts):
foldseek easy-multimersearch 3a5z_AB.pdb 3a5z_ABCD.pdb result_2v4.m8 tmp # 0 hits
foldseek easy-multimersearch 3a5z_AB.pdb 3a5z.pdb result_2v8.m8 tmp # 0 hits
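To compare the runs above without opening each file, a small helper can tally the rows per output. This is only a sketch: `count_hits` is a hypothetical name, and it assumes that each non-empty line of a result file corresponds to one reported hit row, which matches how the counts in the comments above were read.

```python
from pathlib import Path


def count_hits(path):
    """Count non-empty lines in a result file.

    Assumption: one non-empty line per reported hit row.
    A missing file is treated the same as an empty one (0 hits).
    """
    p = Path(path)
    if not p.exists():
        return 0
    with p.open() as handle:
        return sum(1 for line in handle if line.strip())


if __name__ == "__main__":
    # File names are the ones used in the reproducer commands.
    for name in ["result_2v2.m8", "result_4v4.m8", "result_8v8.m8",
                 "result_2v4.m8", "result_2v8.m8"]:
        print(name, count_hits(name))
```

With the behavior reported here, the last two files would show 0 while the first three would not.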
Analysis
We traced the issue through the full pipeline:
- Monomer search (search): finds excellent hits (TM ~ 1.0) -- working correctly
- expandmultimer: correctly expands hits to all chains in target complexes -- working correctly
- structurealign on expanded hits: produces high-quality alignments (TM ~ 1.0 for correct chain pairs, TM < 0.2 for cross-type pairs) -- working correctly
- scoremultimer: outputs empty results despite valid chain-level alignments -- this is where results are lost
For context, the expand_aligned output for a 2-chain query (EpmA + EF-P) vs a 4-chain target (2x EpmA + 2x EF-P) contains:
- query_A (EpmA) -> target_A (EpmA): TM = 1.0
- query_B (EF-P) -> target_B (EF-P): TM = 1.0
- Cross-type alignments: TM < 0.2 (correctly poor)
Even with fully relaxed parameters (--chain-tm-threshold 0 --min-aligned-chains 1 --min-assigned-chains-ratio 0), scoremultimer still returns empty results for the unequal chain count case.
Practical impact
This affects searching against PDB100, which stores biological assemblies. For example, 3A5Z is stored as 4-chain assemblies (ABCD and EFGH). A user searching with a single heterodimer (2 chains) will get 0 multimer hits, even against its own structure in the database.
The web server (https://search.foldseek.com/multimer) shows the same behavior -- 0 results for 3A5Z against PDB100.
Hemoglobin (4HHB) with a 2-chain query vs 4-chain targets does work, likely because alpha and beta globins have similar folds, so every query chain can plausibly map to every target chain. The issue appears specifically when the query's chain types are structurally distinct and the target contains several copies of each type (the chain assignment then becomes ambiguous, as in the homo-oligomeric case).
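To check whether a given query/target pair falls into the unequal-chain-count case, the chain IDs of each file can be listed first. The sketch below reads the chain ID from column 22 (index 21) of ATOM records, the same position the extraction script in the reproducer uses; `chain_ids` is a hypothetical helper, not part of foldseek.

```python
def chain_ids(pdb_lines):
    """Return chain IDs in order of first appearance among ATOM records."""
    seen = []
    for line in pdb_lines:
        # Column 22 (index 21) of an ATOM record is the chain identifier.
        if line.startswith("ATOM") and len(line) > 21:
            chain = line[21]
            if chain not in seen:
                seen.append(chain)
    return seen
```

For example, `chain_ids(open('3a5z_AB.pdb'))` and `chain_ids(open('3a5z_ABCD.pdb'))` should return lists of length 2 and 4 for the files produced by the reproducer, which is the combination that returns 0 hits.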
Expected behavior
A 2-chain query heterodimer should match against a 4-chain target containing 2 copies of that heterodimer. The optimal 1-to-1 chain assignment exists and has excellent TM-scores.
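The claim that a 1-to-1 assignment exists can be illustrated with a brute-force search over chain-level scores. This is not scoremultimer's algorithm: it only maximizes the summed TM-score and ignores any superposition consistency a real multimer scorer would enforce. The chain labels and scores are toy values (1.0 for same-type pairs, 0.15 for cross-type pairs, in line with the TM < 0.2 reported above), and which target chains belong to which protein is assumed for illustration.

```python
from itertools import permutations


def best_assignment(scores, query_chains, target_chains):
    """Pick the 1-to-1 query->target mapping with the highest summed score.

    scores maps (query_chain, target_chain) to a TM-score; missing pairs
    count as 0.0. If there are more query chains than target chains, no
    mapping exists and (-inf, {}) is returned.
    """
    best_total, best_map = float("-inf"), {}
    for picked in permutations(target_chains, len(query_chains)):
        total = sum(scores.get((q, t), 0.0)
                    for q, t in zip(query_chains, picked))
        if total > best_total:
            best_total, best_map = total, dict(zip(query_chains, picked))
    return best_total, best_map


# Toy input: 2-chain query vs 4-chain target (labels are hypothetical).
query = ["qA", "qB"]                       # qA = EpmA, qB = EF-P
kinds = {"qA": "EpmA", "qB": "EF-P",
         "tA": "EpmA", "tB": "EF-P", "tC": "EpmA", "tD": "EF-P"}
target = ["tA", "tB", "tC", "tD"]
toy_scores = {(q, t): (1.0 if kinds[q] == kinds[t] else 0.15)
              for q in query for t in target}

total, mapping = best_assignment(toy_scores, query, target)
print(total, mapping)
```

Several mappings tie at the maximum in this toy case (one per copy of the heterodimer); the function keeps the first one it encounters. A tie between copies does not prevent a valid assignment from being reported.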
Environment
foldseek version: b103be26ed66d02e185f49ea84fbd4cdf6b57cd2 (Release 10)
OS: macOS (darwin, ARM64)
Database: PDB (foldseek databases PDB)
Related issues: #304, #414