pandas 3.x compatibility: groupby().apply() drops grouping column #113

## Summary

`source/odp_functions.py:reciprocal_best_hits_blastp_or_diamond_blastp` raises `KeyError: 'qseqid'` on pandas 3.x. The CI smoke test (`tests/test_odp_basic/test_odp_basic.sh`) reproduces the error; the same test passes locally with pandas 2.x.

## Root cause

The idiom

```python
fdf = (fdf.groupby("qseqid")
         .apply(lambda group: group.loc[group["evalue"] == group['evalue'].min()])
         .reset_index(drop=True))
```

worked on older pandas because the grouping column was passed to the applied function and therefore kept in `apply`'s result. On pandas 3.x, `groupby().apply()` no longer includes the grouping column by default. The key survives only in the result's index, which `reset_index(drop=True)` then discards. The `qseqid` column is therefore gone, and the next line that reads `fdf["qseqid"]` (line 390 in odp_functions.py) raises `KeyError`.
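The behavior can be checked in isolation with a minimal sketch like the one below. The table is made up for illustration (only the column names `qseqid` and `evalue` come from the code above; `sseqid` and all values are assumptions), and the printed flag depends on the installed pandas version.

```python
import warnings

import pandas as pd

# Hypothetical BLAST-like hits table, invented for this reproduction.
fdf = pd.DataFrame({
    "qseqid": ["q1", "q1", "q2"],
    "sseqid": ["s1", "s2", "s3"],
    "evalue": [1e-50, 1e-10, 1e-5],
})

with warnings.catch_warnings():
    # Some pandas 2.x releases warn about apply() operating on grouping columns.
    warnings.simplefilter("ignore")
    out = (fdf.groupby("qseqid")
              .apply(lambda group: group.loc[group["evalue"] == group["evalue"].min()])
              .reset_index(drop=True))

# Report whether the grouping column survived on this pandas version.
print(pd.__version__, "qseqid" in out.columns)
```

If the flag prints `False`, any later `out["qseqid"]` access fails in the way described in this issue.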

## Workaround in place

CI pins `pandas<3` in `requirements-dev.txt` so the smoke test passes against the API the codebase was written for. Users running odp from source on a fresh pandas 3.x install will hit the same error.

## Fix scope

`grep -n "groupby" source/*.py scripts/odp scripts/odp_nway_rbh scripts/odp_filechecker` shows ~85 groupby sites. The problematic pattern is specifically `groupby(col).apply(...).reset_index(drop=True)`. Hot spots:

- `source/odp_functions.py:312, 321, 379, 384`
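Because `grep -n` works line by line, it cannot tell a plain `groupby` call from a `groupby(col).apply(...)` chain that is split across lines, as the idiom above is. A rough audit helper could look like the sketch below. The function names and the regular expression are assumptions made for illustration, and the match is a heuristic: it ignores nested parentheses inside `groupby(...)` and does not check for the trailing `reset_index(drop=True)`.

```python
import re
from pathlib import Path

# .groupby(<args without nested parentheses>) directly followed by .apply(,
# allowing whitespace and line breaks between the two calls.
# A column selection such as groupby("a")["b"].apply(...) is deliberately not matched.
GROUPBY_APPLY = re.compile(r"\.groupby\([^()]*\)\s*\.apply\(")


def find_groupby_apply(source):
    """Return 1-based line numbers where a groupby(...).apply( chain starts."""
    return [source.count("\n", 0, m.start()) + 1
            for m in GROUPBY_APPLY.finditer(source)]


def scan(paths):
    """Print path:line for every candidate site in the given files."""
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for lineno in find_groupby_apply(text):
            print(f"{path}:{lineno}")


sample = (
    'fdf = (fdf.groupby("qseqid")\n'
    '         .apply(lambda g: g)\n'
    '         .reset_index(drop=True))\n'
)
print(find_groupby_apply(sample))  # [1]
```

Calling `scan` with the same file list passed to `grep` above would give a shorter candidate list to review by hand.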

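Migration: either pass `include_groups=False` and then call `reset_index()` without `drop=True`, so the group key comes back from the index as a column, or rewrite as a vectorized form. Two caveats apply to the first option: `reset_index()` also turns the original row labels into an extra column (typically `level_1`) that has to be dropped, and the `include_groups` keyword only exists from pandas 2.2 onward, so it does not cover earlier 2.x releases. The vectorized form avoids both, e.g.: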

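```python
# qseqid stays a regular column; rows keep input order rather than being sorted by qseqid
fdf = fdf.loc[fdf.groupby("qseqid")["evalue"].transform("min") == fdf["evalue"]].reset_index(drop=True)
```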
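Wrapped in a function, the vectorized form is easy to exercise on a small table. The sketch below is only an illustration: `best_hits_per_query` is a hypothetical name that does not come from the odp codebase, and the sample data (including the `sseqid` column) is invented. Ties for the minimum are all kept, matching the `==` comparison in the original idiom.

```python
import pandas as pd


def best_hits_per_query(fdf):
    """Keep, for each qseqid, every row tied for the smallest evalue."""
    # transform("min") broadcasts each group's minimum back onto its rows,
    # so the comparison yields a boolean mask aligned with fdf.
    is_best = fdf["evalue"] == fdf.groupby("qseqid")["evalue"].transform("min")
    return fdf.loc[is_best].reset_index(drop=True)


# Hypothetical hits table, invented for illustration.
hits = pd.DataFrame({
    "qseqid": ["q1", "q1", "q2", "q2", "q2"],
    "sseqid": ["s1", "s2", "s3", "s4", "s5"],
    "evalue": [1e-50, 1e-10, 1e-5, 1e-30, 1e-30],
})

best = best_hits_per_query(hits)
print(best)
```

Since no `apply` is involved, the grouping column is never removed, so the result should not depend on which side of the pandas 3.x change the installed version falls.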

A follow-up PR should audit all ~85 sites and migrate those that match the problematic pattern to a form that runs cleanly on both pandas 2.x and 3.x.
