Skip to content

Conversation

@vbrennsteiner
Copy link
Contributor

Reasons for the PR:

  • Reading Spectronaut reports from versions 20.0.250515.50606 and 20.0.250515.50606 failed with the spectronaut_report reader config due to missing columns (intensity, precursor_intensity).
  • _pre_process() tried to parse charge out of non-existent columns (ModifiedSequence)

The issues

  • Some columns that are present in these report files were not covered by the reader config (FG.PrecMz for charge, PG.ProteinNames for genes).
  • The _preprocess() method tried to parse charge from the mod_seq_column "ModifiedSequence", which does not exist in these reports.

The fixes

  • Extend spectronaut_report configuration to parse additional columns correctly
  • Add EG.PrecursorId to mod_seq_columns to get charge information in case FG.PrecMz is missing
  • Modify _pre_process() to first check whether the charge column is present and only then parse it from the precursor_id column

…ctronaut_report' reader configuration; refactor 'spectronaut' and 'spectronaut_report' reader configurations and remove separate key 'precursor_id_columns' in favor of 'precursor_id' inside 'column_mapping'
…ctronaut_report' reader configuration; refactor 'spectronaut' and 'spectronaut_report' reader configurations and remove separate key 'precursor_id_columns' in favor of 'precursor_id' inside 'column_mapping'
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes issues reading Spectronaut report files from versions 20.0.250515.50606 where certain columns were missing or named differently, causing failures in the spectronaut_report reader configuration.

Changes:

  • Extended the spectronaut_report configuration to support additional column names for charge, genes, and other fields
  • Added EG.PrecursorId as a fallback mod_seq_column for extracting charge information
  • Modified _pre_process() to conditionally parse charge from precursor_id only when the charge column is missing
  • Changed msfragger_psm_tsv modification_mapping_type from 'msfragger' to 'maxquant'
  • Removed unused msfragger-specific modification mappings (SATA, SATP, mTRAQ variants, TMTpro)

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
alphabase/psm_reader/dia_psm_reader.py Added conditional check in _pre_process() to only parse charge from precursor_id when charge column is missing
alphabase/constants/const_files/psm_reader.yaml Extended spectronaut_report column mappings, added EG.PrecursorId to mod_seq_columns, removed unused msfragger modification mappings, changed msfragger_psm_tsv to use 'maxquant' mapping type

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 165 to 166

pfind:
Copy link

Copilot AI Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deletion of the msfragger-specific modification mappings section (lines 165-201 in the original file) also removes mappings that are still referenced by msfragger_psm_tsv's mass_mapped_mods list (lines 211-212), specifically 'TMTpro@K' and 'TMTpro@Any_N-term'. While changing modification_mapping_type to 'maxquant' may provide these mappings, this should be verified. The maxquant modification_mappings section does not appear to contain TMTpro mappings based on the visible code (lines 100-157), which could cause translation failures for TMTpro-labeled peptides in MSFragger data.

Copilot uses AI. Check for mistakes.
@vbrennsteiner vbrennsteiner marked this pull request as draft January 15, 2026 09:18
@vbrennsteiner vbrennsteiner marked this pull request as ready for review January 15, 2026 09:30
'genes': 'PG.Genes'
'charge': ['charge', 'FG.Charge']
'mobility': 'FG.ApexIonMobility'
'proteins': ['PG.ProteinNames','PG.ProteinGroups', 'PG.ProteinAccessions']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please enumerate lists items with - to be consistent with the rest of this

].str.split(".", expand=True, n=2)
# In case charge state column is missing, we splice it out of the precursor id column
if PsmDfCols.CHARGE not in df.columns:
df[[self.mod_seq_column, PsmDfCols.CHARGE]] = df[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now df[self.mod_seq_column] is not set when PsmDfCols.CHARGE is present .. is this intended?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants