Summary
source/odp_functions.py:reciprocal_best_hits_blastp_or_diamond_blastp raises KeyError: 'qseqid' on pandas 3.x. The CI smoke test against tests/test_odp_basic/test_odp_basic.sh reproduces this; locally with pandas 2.x it works.
Root cause
The idiom
fdf = (fdf.groupby("qseqid")
          .apply(lambda group: group.loc[group["evalue"] == group['evalue'].min()])
          .reset_index(drop=True))
worked on older pandas because each group passed to apply still contained the grouping column, so it survived in apply's result. On pandas 3.x, groupby().apply() no longer includes the grouping column by default; the key survives only in the result's index, which reset_index(drop=True) discards. The qseqid column is therefore gone, and the next line that reads fdf["qseqid"] (line 390 in odp_functions.py) raises KeyError.
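The mechanism can be sketched on a toy table. This is a minimal illustration, not the code in odp_functions.py: the sseqid column and all values are made up, and the explicit column selection after groupby() is only a stand-in that hands each group to the lambda without the key, which is the behavior described above for pandas 3.x. It should behave the same on older pandas, so the failure can be observed without upgrading.

```python
import pandas as pd

# Toy BLAST-like table; "sseqid" and all values are invented for illustration.
fdf = pd.DataFrame({
    "qseqid": ["q1", "q1", "q2", "q2"],
    "sseqid": ["s1", "s2", "s3", "s4"],
    "evalue": [1e-50, 1e-10, 1e-5, 1e-30],
})

# Selecting only the non-key columns means each group reaches the lambda
# without "qseqid" (a stand-in for the pandas 3.x default).
best = (fdf.groupby("qseqid")[["sseqid", "evalue"]]
           .apply(lambda group: group.loc[group["evalue"] == group["evalue"].min()]))

# The group key now lives only in the index, so drop=True throws it away.
dropped = best.reset_index(drop=True)
print(list(dropped.columns))

# Reading the column afterwards fails the same way the report describes.
try:
    dropped["qseqid"]
    key_error_raised = False
except KeyError:
    key_error_raised = True
print(key_error_raised)
```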
Workaround in place
CI pins pandas<3 in requirements-dev.txt so the smoke test passes against the API the codebase was written for. Users running odp from source on a fresh pandas 3.x install will hit the same error.
Fix scope
grep -n "groupby" source/*.py scripts/odp scripts/odp_nway_rbh scripts/odp_filechecker shows ~85 groupby sites. The problematic pattern is specifically groupby(col).apply(...).reset_index(drop=True). Hot spots:
source/odp_functions.py:312, 321, 379, 384
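Migration: either pass include_groups=False and then call reset_index() without drop=True, so that qseqid is restored from the index (this also turns the original row index into an extra column, which may need dropping), or rewrite as a vectorized form, e.g.: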
fdf = fdf.loc[fdf.groupby("qseqid")["evalue"].transform("min") == fdf["evalue"]]
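A small check of the vectorized form, under the same toy assumptions (the sseqid column and the values are invented). It keeps every row that ties for the per-qseqid minimum evalue, as the apply-based idiom does, and qseqid stays an ordinary column because the frame is only filtered. This form also sidesteps a version constraint: as far as I know, include_groups was only added in pandas 2.2, so passing it on earlier 2.x releases would fail.

```python
import pandas as pd

# Toy table with a tie in group q1; values are invented for illustration.
fdf = pd.DataFrame({
    "qseqid": ["q1", "q1", "q1", "q2", "q2"],
    "sseqid": ["s1", "s2", "s3", "s4", "s5"],
    "evalue": [1e-50, 1e-50, 1e-10, 1e-5, 1e-30],
})

# For each row, the minimum evalue within that row's qseqid group,
# aligned to fdf's own index.
group_min = fdf.groupby("qseqid")["evalue"].transform("min")

# Keep the rows that equal their group minimum (ties included).
best = fdf.loc[group_min == fdf["evalue"]].reset_index(drop=True)
print(best)
```

One difference to keep in mind when migrating: the filter preserves the input row order, whereas groupby().apply() returns groups sorted by key, so any downstream code that relies on that ordering may need an explicit sort.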
A follow-up PR should audit all ~85 sites and migrate the affected ones to a form that runs cleanly on pandas 2.x and 3.x.
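To narrow the grep output to the problematic chain, something like the following could help. It is a rough heuristic sketch, not an existing script in the repo: find_suspect_sites and the regular expression are hypothetical, the match is purely textual, and it will miss chains split across intermediate variables as well as occasionally flag unrelated code, so hits still need a manual look.

```python
import re

# Heuristic: a .groupby(...) call immediately followed by .apply(...) and then
# .reset_index(drop=True), possibly spread over several lines.
# The bounded lazy quantifiers keep a match from spanning unrelated statements.
PATTERN = re.compile(
    r"\.groupby\(.{0,200}?\)\s*\.apply\(.{0,400}?\)\s*"
    r"\.reset_index\(\s*drop\s*=\s*True\s*\)",
    re.DOTALL,
)


def find_suspect_sites(source_text):
    """Return 1-based line numbers where a suspect groupby/apply chain starts."""
    return [
        source_text.count("\n", 0, match.start()) + 1
        for match in PATTERN.finditer(source_text)
    ]
```

Calling find_suspect_sites on the contents of each file from the grep command above would give a shortlist of line numbers to compare against the hot spots already identified.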