-
Notifications
You must be signed in to change notification settings - Fork 13
Fix spectronaut reader mapping #396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ctronaut_report' reader configuration; refactor 'spectronaut' and 'spectronaut_report' reader configurations and remove separate key 'precursor_id_columns' in favor of 'precursor_id' inside 'column_mapping'
…ctronaut_report' reader configuration; refactor 'spectronaut' and 'spectronaut_report' reader configurations and remove separate key 'precursor_id_columns' in favor of 'precursor_id' inside 'column_mapping'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes issues reading Spectronaut report files from versions 20.0.250515.50606 where certain columns were missing or named differently, causing failures in the spectronaut_report reader configuration.
Changes:
- Extended the
spectronaut_reportconfiguration to support additional column names for charge, genes, and other fields - Added
EG.PrecursorIdas a fallback mod_seq_column for extracting charge information - Modified
_pre_process()to conditionally parse charge from precursor_id only when the charge column is missing - Changed msfragger_psm_tsv modification_mapping_type from 'msfragger' to 'maxquant'
- Removed unused msfragger-specific modification mappings (SATA, SATP, mTRAQ variants, TMTpro)
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| alphabase/psm_reader/dia_psm_reader.py | Added conditional check in _pre_process() to only parse charge from precursor_id when charge column is missing |
| alphabase/constants/const_files/psm_reader.yaml | Extended spectronaut_report column mappings, added EG.PrecursorId to mod_seq_columns, removed unused msfragger modification mappings, changed msfragger_psm_tsv to use 'maxquant' mapping type |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| pfind: |
Copilot
AI
Jan 14, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deletion of the msfragger-specific modification mappings section (lines 165-201 in the original file) also removes mappings that are still referenced by msfragger_psm_tsv's mass_mapped_mods list (lines 211-212), specifically 'TMTpro@K' and 'TMTpro@Any_N-term'. While changing modification_mapping_type to 'maxquant' may provide these mappings, this should be verified. The maxquant modification_mappings section does not appear to contain TMTpro mappings based on the visible code (lines 100-157), which could cause translation failures for TMTpro-labeled peptides in MSFragger data.
| 'genes': 'PG.Genes' | ||
| 'charge': ['charge', 'FG.Charge'] | ||
| 'mobility': 'FG.ApexIonMobility' | ||
| 'proteins': ['PG.ProteinNames','PG.ProteinGroups', 'PG.ProteinAccessions'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please enumerate lists items with - to be consistent with the rest of this
| ].str.split(".", expand=True, n=2) | ||
| # In case charge state column is missing, we splice it out of the precursor id column | ||
| if PsmDfCols.CHARGE not in df.columns: | ||
| df[[self.mod_seq_column, PsmDfCols.CHARGE]] = df[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now df[self.mod_seq_column] is not set when PsmDfCols.CHARGE is present .. is this intended?
Reasons for the PR:
spectronaut_reportreader config due to missing columns (intensity, precursor_intensity)._pre_process()tried to parse charge out of non-existent columns (ModifiedSequence)The issues
charge, PG.ProteinNames forgenes)._preprocess()method tried to parse charge from the mod_seq_column "ModifiedSequence", which does not exist in these reports.The fixes
spectronaut_reportconfiguration to parse additional columns correctlyEG.PrecursorIdtomod_seq_columnsto get charge information in caseFG.PrecMzis missing_pre_process()to first check whether the charge column is present and only then parse it from theprecursor_idcolumn