easy-multimersearch returns empty results for homo-oligomeric complexes with chain count mismatch #568

@mf-rug

Summary

easy-multimersearch (and the web server multimer search) returns empty results when searching with a heteromeric complex against targets that contain multiple copies of the same heterodimer. The results are lost in scoremultimer's chain assignment when the query chain count != the target chain count, even though excellent chain-level alignments exist at every earlier pipeline stage.

Reproducer

Using PDB 3A5Z (EpmA/EF-P complex, 4 copies of the heterodimer = 8 chains):

# Download and prepare files
curl -sL "https://files.rcsb.org/download/3A5Z.pdb" -o 3a5z.pdb

# Extract chain subsets
python3 -c "
lines = open('3a5z.pdb').readlines()
for chains, name in [('AB','3a5z_AB'), ('ABCD','3a5z_ABCD')]:
    with open(f'{name}.pdb','w') as f:
        for l in lines:
            if l.startswith(('ATOM','TER')) and l[21] in chains: f.write(l)
        f.write('END\n')
"

# These work (equal chain counts):
foldseek easy-multimersearch 3a5z_AB.pdb 3a5z_AB.pdb result_2v2.m8 tmp    # 2 hits
foldseek easy-multimersearch 3a5z_ABCD.pdb 3a5z_ABCD.pdb result_4v4.m8 tmp # 4 hits
foldseek easy-multimersearch 3a5z.pdb 3a5z.pdb result_8v8.m8 tmp           # 8 hits

# These fail (unequal chain counts):
foldseek easy-multimersearch 3a5z_AB.pdb 3a5z_ABCD.pdb result_2v4.m8 tmp   # 0 hits
foldseek easy-multimersearch 3a5z_AB.pdb 3a5z.pdb result_2v8.m8 tmp        # 0 hits
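
To compare the five runs above at a glance, the result files can be tallied with a short script. This is only a convenience sketch: `count_hits` is a helper name introduced here, and it assumes each non-empty line of an .m8 file is one reported alignment.

```python
from pathlib import Path


def count_hits(path):
    """Count non-empty lines in a result file; return None if the file is missing."""
    p = Path(path)
    if not p.exists():
        return None
    with p.open() as fh:
        return sum(1 for line in fh if line.strip())


# File names match the reproducer commands above.
RESULT_FILES = [
    "result_2v2.m8",
    "result_4v4.m8",
    "result_8v8.m8",
    "result_2v4.m8",
    "result_2v8.m8",
]

for name in RESULT_FILES:
    hits = count_hits(name)
    label = "missing" if hits is None else f"{hits} hits"
    print(f"{name}: {label}")
```

Run it from the directory that contains the result files; files that were never written are reported as missing rather than as 0 hits.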

Analysis

We traced the issue through the full pipeline:

  1. Monomer search (search): finds excellent hits (TM ~ 1.0) -- working correctly
  2. expandmultimer: correctly expands hits to all chains in target complexes -- working correctly
  3. structurealign on expanded hits: produces high-quality alignments (TM ~ 1.0 for correct chain pairs, TM < 0.2 for cross-type pairs) -- working correctly
  4. scoremultimer: outputs empty results despite valid chain-level alignments -- this is where results are lost

For context, the expand_aligned output for a 2-chain query (EpmA + EF-P) vs a 4-chain target (2x EpmA + 2x EF-P) contains:

  • query_A (EpmA) -> target_A (EpmA): TM = 1.0
  • query_B (EF-P) -> target_B (EF-P): TM = 1.0
  • Cross-type alignments: TM < 0.2 (correctly poor)

Even with fully relaxed parameters (--chain-tm-threshold 0 --min-aligned-chains 1 --min-assigned-chains-ratio 0), scoremultimer still returns empty results for the unequal chain count case.

Practical impact

This affects searching against PDB100, which stores biological assemblies. For example, 3A5Z is stored as 4-chain assemblies (ABCD and EFGH). A user searching with a single heterodimer (2 chains) will get 0 multimer hits, even against its own structure in the database.

The web server (https://search.foldseek.com/multimer) shows the same behavior -- 0 results for 3A5Z against PDB100.

Hemoglobin (4HHB) with a 2-chain query vs 4-chain targets does work, likely because alpha and beta globins have similar folds, so every query chain can plausibly map to every target chain. The issue manifests specifically when the chain types within the complex are structurally distinct from each other (with repeated copies of the heterodimer in the target, the chain assignment becomes ambiguous).

Expected behavior

A 2-chain query heterodimer should match against a 4-chain target containing 2 copies of that heterodimer. The optimal 1-to-1 chain assignment exists and has excellent TM-scores.
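
To illustrate what "optimal 1-to-1 chain assignment" means here, the sketch below brute-forces the best injective mapping of query chains onto target chains from a matrix of pairwise TM-scores. It is not scoremultimer's algorithm, which is not shown in this report and presumably also checks that the assigned chains share a consistent superposition. The function name `best_chain_assignment` and the matrix values are hypothetical, chosen only to mirror the pattern described above (TM ~ 1.0 for same-type pairs, TM < 0.2 for cross-type pairs).

```python
from itertools import permutations


def best_chain_assignment(tm):
    """Return (total_score, mapping) for the best injective query->target mapping.

    tm[q][t] is the TM-score of query chain q aligned to target chain t.
    mapping[q] is the index of the target chain assigned to query chain q.
    Brute force, so only suitable for small chain counts.
    """
    n_query, n_target = len(tm), len(tm[0])
    if n_query > n_target:
        raise ValueError("need at least as many target chains as query chains")
    best_score, best_map = float("-inf"), None
    # Each permutation picks n_query distinct target chains, in query order.
    for targets in permutations(range(n_target), n_query):
        score = sum(tm[q][t] for q, t in enumerate(targets))
        if score > best_score:
            best_score, best_map = score, targets
    return best_score, best_map


# Hypothetical scores: rows = query chains (EpmA, EF-P),
# columns = target chains (EpmA, EF-P, EpmA, EF-P).
tm_scores = [
    [1.00, 0.15, 0.98, 0.12],
    [0.10, 1.00, 0.14, 0.97],
]

score, mapping = best_chain_assignment(tm_scores)
print(score, mapping)
```

With these illustrative values a valid assignment exists even though the target has twice as many chains as the query, which is the situation in the 2v4 case.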

Environment

foldseek version: b103be26ed66d02e185f49ea84fbd4cdf6b57cd2 (Release 10)
OS: macOS (darwin, ARM64)
Database: PDB (foldseek databases PDB)

Related issues: #304, #414
