Fix pairwise inference task index selection#505
Conversation
The variable j in _generate_pairwise_df was the leaked loop variable from the list comprehension [smiles[j] for i, j in pairs], not the intended task index. This caused predictions to always use the wrong column. Add an explicit task_idx parameter and pass it from the caller where j is the proper enumerate index over tasknames. Signed-off-by: Nikolenko.Sergei <Nikolenko.Sergei@icloud.com>
for more information, see https://pre-commit.ci
|
Closing per author request during fork cleanup/reset. |
|
@SergeiNikolenko does this have a specific task that is not working or issue that this is associated with? I wasn't aware that anything was wrong. |
|
Hi, yes, this is fixing a real bug in the pairwise inference path. Before this change, In practice, that could lead to two problems:
I added tests covering both the task index selection and the pairwise inference path. |
There was a problem hiding this comment.
The fix to inference.py is correct and the core test (test_generate_pairwise_df_uses_task_idx_column) is solid. A few changes are required before merge.
1. Test style: replace monkeypatch + hand-rolled dummies with pytest-mock
test_predict_single_task_pairwise_uses_column_zero and test_predict_pairwise_multitask_passes_task_idx both use monkeypatch.setattr with custom nested dummy classes (DummyPairwiseFeaturizer, DummyModel, fake_loader, fake_generate_pairwise_df). We're trying to use pytest-mock with mocker.patch(..., autospec=True) for this sort of thing, and will be updating the contribution guidelines soon.
Please rewrite both tests using the mocker fixture, e.g.:
def test_predict_single_task_pairwise_uses_column_zero(mocker, tmp_path):
mock_load = mocker.patch(
"openadmet.models.inference.inference.load_anvil_model_and_metadata",
autospec=True,
)
mock_load.return_value = (...)
...test_generate_pairwise_df_uses_task_idx_column is fine as-is — keep it.
2. DummyPairwiseFeaturizer is copy-pasted verbatim between the twomonkeypatch tests. Once rewritten with mocker, this duplication goes away naturally.
dwwest
left a comment
There was a problem hiding this comment.
This looks good, thank you for catching this bug. Just a quick cosmetic change, plus Sean's requests re: testing, and it should be good to go.
| predictions = model.predict(X_feat, accelerator=accelerator) | ||
| std = np.full(predictions.shape, np.nan) | ||
|
|
||
| for j, taskname in enumerate(tasknames): |
There was a problem hiding this comment.
Let's change j to task_idx for consistency.
Description
Fix pairwise inference dataframe generation to use explicit task index selection (
task_idx) and add regression tests for single-task and multitask pairwise paths.Status
Developers certificate of origin