Question about Dataset Split Logic: Val Set Directly Assigned as Test Set in kfold/handout Split #4

@qinians-cn

Dear authors,
First of all, I would like to express my admiration for your excellent work on CLAT, published in IEEE TMI. This concept-based interpretable framework for retinal disease diagnosis is truly inspiring for the medical imaging community!
While reproducing your work from the open-source code, I noticed a key detail in the dataset split logic in clat/utils.py:
In the kfold_split function, the validation set is directly assigned to the test set (test_disease_labels = val_disease_labels), with no independent test set partitioned.
In the handout_split function, if val_size is not specified, the validation set is also set to the test set by default.
Keeping the test set independent is a fundamental principle in machine learning experiments. Reusing the validation set as the test set can cause test set leakage: the test data then takes part in model selection steps such as early stopping and hyperparameter tuning, so the reported test metrics tend to be optimistically biased relative to the model's real generalization ability.
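To make the concern concrete, here is a minimal sketch of a k-fold split in which the validation set is carved out of the non-test portion of each fold, so the data used for early stopping never overlaps with the data used for the final score. The function name `kfold_with_val` and the `val_fraction` parameter are my own assumptions for illustration; this is not the code in clat/utils.py.

```python
import random


def kfold_with_val(n_samples, n_folds=5, val_fraction=0.2, seed=0):
    """Yield (train_idx, val_idx, test_idx) index lists, disjoint within each fold.

    Hypothetical helper for illustration only, not the CLAT implementation.
    """
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Fold k takes every n_folds-th shuffled index, starting at position k.
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test_idx = folds[k]
        # Everything outside the test fold is available for training and validation.
        rest = [i for j, fold in enumerate(folds) if j != k for i in fold]
        rng.shuffle(rest)
        # Validation samples come from the non-test part only.
        n_val = max(1, int(len(rest) * val_fraction))
        val_idx, train_idx = rest[:n_val], rest[n_val:]
        yield train_idx, val_idx, test_idx


for train_idx, val_idx, test_idx in kfold_with_val(100, n_folds=5):
    # Early stopping would use val_idx; the reported metric would use test_idx.
    assert set(val_idx).isdisjoint(test_idx)
```

With a split like this, each sample still appears in exactly one test fold, at the cost of a smaller training set per fold.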
Given that your work is a high-quality TMI publication with rigorous experimental design, I am curious about the design rationale for this split strategy, and would like to ask:
Was this setting adopted to fully utilize the scarce labeled medical image data (e.g., the small size of DDR subset/FGADR)?
Is there any special experimental consideration for unifying the val/test set in cross-validation for all compared models?
Have you conducted supplementary experiments with an independent test set to verify the actual generalization performance of CLAT?
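For the hold-out case, the kind of supplementary experiment I have in mind could use a three-way split along the lines of the sketch below. The name `holdout_three_way` and the default fractions are assumptions of mine, chosen only for illustration; the key point is that the validation set always gets its own samples instead of falling back to the test set.

```python
import random


def holdout_three_way(n_samples, test_size=0.2, val_size=0.1, seed=0):
    """Return disjoint (train_idx, val_idx, test_idx) index lists.

    Hypothetical helper for illustration only, not the CLAT implementation.
    """
    if not (0 < test_size < 1 and 0 < val_size < 1 and test_size + val_size < 1):
        raise ValueError("test_size and val_size must be in (0, 1) and sum to less than 1")
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = max(1, round(n_samples * test_size))
    n_val = max(1, round(n_samples * val_size))
    # Slice the shuffled indices into three non-overlapping segments.
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = holdout_three_way(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

For small or imbalanced datasets, a stratified variant that preserves the disease label proportions in each part would probably be preferable to this purely random split.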
I would be very grateful if you could spare some time to answer these questions. It would be of great help for my reproduction and further study of your work.
Thank you very much for your contributions to the field and your open-source spirit!
