Add import_phoenix_ast() for BD Phoenix instrument AST data #68

Merged
katholt merged 6 commits into AMRverse:main from efosternyarko:feature/import-phoenix-ast on Mar 6, 2026

@efosternyarko
Collaborator

Summary

This PR adds import_phoenix_ast(), a new import function for antimicrobial susceptibility testing (AST) data exported from BD Phoenix instruments, and registers it in the import_ast() dispatcher as format = "phoenix".

Three BD Phoenix export formats are supported, with automatic detection from file extension and content:

  • long_german — XLS export (no header, 7 fixed columns, German decimal locale). Handles site-specific drug name suffixes ((f), (u)), testing additives (mit G6P), synergy screens (-Syn), high-concentration tests (Hohe X Konzentration), X SIR values (no breakpoint → NA), and DD.MM.YYYY date parsing.
  • long_clsi — Per-isolate CLSI instrument report (TXT/TSV, named headers: Antimicrobial, MIC or Concentration, Interp, Expert (SIR), Final (SIR)). Trailing Resistance Markers and Expert Triggered Rules sections are automatically stripped. Sample ID is derived from the filename.
  • wide — Wide-format XLSX with alternating [Drug] call / [Drug] MIC column pairs (e.g. as in Mills et al. 2022, Genome Medicine). Embedded whitespace (tabs) in column names is normalised; Unicode ≤ signs are converted to <=.
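
To make the long_german clean-up steps concrete, here is a minimal base-R sketch of three of them: suffix stripping, the X SIR value, and DD.MM.YYYY dates. The helper names (`strip_phoenix_drug_suffix`, `sir_or_na`, `parse_german_date`) are hypothetical and not taken from this PR, and the drug name used in the usage note is illustrative only. The -Syn and Hohe X Konzentration cases are not covered.

```r
# Sketch only: hypothetical helpers, not the PR's implementation.

# Strip a trailing site-specific suffix "(f)" / "(u)" and the "mit G6P" additive.
strip_phoenix_drug_suffix <- function(x) {
  x <- gsub("\\s*\\((f|u)\\)\\s*$", "", x)  # e.g. "Ampicillin (u)" -> "Ampicillin"
  x <- gsub("\\s+mit\\s+G6P\\s*$", "", x)   # e.g. "Fosfomycin mit G6P" -> "Fosfomycin"
  trimws(x)
}

# Treat an "X" SIR value (no breakpoint defined) as missing.
sir_or_na <- function(x) {
  x <- toupper(trimws(x))
  x[!is.na(x) & x == "X"] <- NA_character_
  x
}

# Parse German-style DD.MM.YYYY dates.
parse_german_date <- function(x) as.Date(x, format = "%d.%m.%Y")
```

For example, `strip_phoenix_drug_suffix("Fosfomycin mit G6P (f)")` would yield `"Fosfomycin"` under these assumptions; a suffix appearing before the additive would need an extra rule.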

A shared internal .clean_mic() helper normalises MIC strings across all three formats (Unicode ≤ → <=, German decimal comma → period, combination drug denominator stripping e.g. >32/2 → >32).
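
The three normalisation rules can be sketched in base R as below. This is an illustration of the rules as described above, not the actual body of .clean_mic(); the function name `clean_mic_sketch` is an assumption made for this example.

```r
# Sketch only: mirrors the three rules listed above, not the PR's .clean_mic().
clean_mic_sketch <- function(x) {
  x <- trimws(as.character(x))
  x <- gsub("\u2264", "<=", x, fixed = TRUE)  # Unicode "less than or equal" sign -> "<="
  x <- gsub(",", ".", x, fixed = TRUE)        # German decimal comma -> period
  x <- sub("/.*$", "", x)                     # drop combination denominator: ">32/2" -> ">32"
  x
}
```

Applying the decimal fix before stripping the denominator means a value such as `0,5/4` ends up as `0.5`.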

New function parameters: format, sample_col, instrument_guideline

DESCRIPTION: adds readxl (>= 1.4.0) to Imports.

Test plan

  • import_phoenix_ast("Phoenix-Antibiogramm-Daten.xls") — auto-detects long_german, parses German MIC locale, strips drug name suffixes, returns correct pheno_provided from expertized column
  • import_phoenix_ast("TF-BDP_CLSI2018.txt", species = "Escherichia coli", instrument_guideline = "CLSI 2018") — auto-detects long_clsi, strips trailing sections, uses Final (SIR) as authoritative call
  • import_phoenix_ast("AST_MIC_Mills.xlsx", sample_col = "Sample", species = "Escherichia coli") — auto-detects wide, pivots call/MIC pairs, normalises ≤, strips embedded tabs from column names
  • import_ast(input, format = "phoenix") dispatches correctly to import_phoenix_ast()
  • interpret_eucast = TRUE and interpret_ecoff = TRUE work correctly for all three sub-formats

@katholt
Contributor

katholt commented Mar 4, 2026

Need to generalise this further, please.
Also, I don't think the supp table from Mills 2022 is actually in any format exported from BD Phoenix. It was presumably processed to generate a wide format, so it is not representative of instrument export.

Adds a new import function supporting three BD Phoenix export formats,
auto-detected from file extension and content:

- long_german: XLS with no header row, 7 fixed columns, German decimal
  locale (comma separator). Handles site-specific suffixes (e.g. "(f)"),
  testing additives ("mit G6P"), synergy screens ("-Syn"), high-
  concentration tests ("Hohe X Konzentration"), and DD.MM.YYYY dates.
  "X" SIR values (no breakpoint defined) are treated as NA.

- long_clsi: Per-isolate CLSI report TXT/TSV with named column headers
  (Antimicrobial, MIC or Concentration, Interp, Expert (SIR), Final (SIR)).
  Trailing Resistance Markers and Expert Triggered Rules sections are
  automatically stripped. Sample ID is derived from the filename.

- wide: Wide-format XLSX with alternating [Drug] call / [Drug] MIC
  column pairs (e.g. Mills et al. 2022). Embedded whitespace in column
  names (tabs) is normalised. Unicode ≤ signs are converted to <=.

A shared .clean_mic() helper normalises MIC strings across all formats:
Unicode ≤ → <=, German decimal comma → period, combination denominator
stripping (>32/2 → >32).

New parameters: format ("auto"/"long_german"/"long_clsi"/"wide"),
sample_col, instrument_guideline.

Also adds format = "phoenix" dispatch to import_ast(), and adds
readxl (>= 1.4.0) to DESCRIPTION Imports.

Replace specific private data filenames in @examples and internal
comments with generic placeholder names to avoid exposing private data.

Remove format-specific modes (long_german, long_clsi, wide) and the
wide format entirely (not a Phoenix export format). Replace with generic
column detection: auto-detects drug/MIC/SIR/sample/species columns by
common Phoenix header name patterns, with positional fallback for
headerless XLS exports (col 1=sample, 2=species, 3=drug, 4=MIC,
5=instrument SIR, 6=expert SIR). Any column can be overridden by name
or index via drug_col, mic_col, sir_col, sample_col, species_col.
Drug name normalisation is delegated to as.ab() throughout.
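
A rough sketch of how such column resolution could work: explicit override first, then header pattern matching, then the positional fallback. The header patterns and the names `phoenix_patterns`, `phoenix_positions` and `resolve_column` are assumptions for illustration; only the fallback positions come from the commit message above (the expert SIR column at position 6 is left out for brevity).

```r
# Sketch only: hypothetical patterns and helper, not the PR's implementation.
phoenix_patterns <- c(
  sample  = "sample|isolate|accession",
  species = "organism|species",
  drug    = "antimicrobial|antibiotic|drug",
  mic     = "\\bmic\\b|concentration",  # word boundary so "Antimicrobial" is not a hit
  sir     = "interp|sir"
)

# Positional fallback for headerless exports, as listed in the commit message.
phoenix_positions <- c(sample = 1L, species = 2L, drug = 3L, mic = 4L, sir = 5L)

resolve_column <- function(df, role, override = NULL, has_header = TRUE) {
  # 1. An explicit override (column name or index) always wins.
  if (!is.null(override)) {
    if (is.numeric(override)) return(as.integer(override))
    idx <- match(override, names(df))
    if (is.na(idx)) stop("Column not found: ", override)
    return(idx)
  }
  # 2. Otherwise try the header name patterns (first match wins).
  if (has_header) {
    hits <- grep(phoenix_patterns[[role]], names(df), ignore.case = TRUE, perl = TRUE)
    if (length(hits) > 0) return(hits[1])
  }
  # 3. Fall back to the fixed position.
  phoenix_positions[[role]]
}
```

With headers such as `Antimicrobial` and `MIC or Concentration`, `resolve_column(df, "mic")` would pick the second column; passing `override = "Final (SIR)"` for the `sir` role would select that column instead of the first pattern match.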

WHONET column values can contain raw measurements (MIC strings or zone
sizes) as well as SIR letters. Parse sir_value as as.mic() for broth
dilution/Etest columns and as as.disk() for disk diffusion columns,
and include mic/disk in the relocate output order. All other import
functions already applied the full set of AMR classes (as.ab, as.mic,
as.disk, as.sir, as.mo).
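
The routing of a raw WHONET cell value can be sketched in base R as follows. The function name `split_whonet_value` and its `method` argument are hypothetical. The sketch only separates SIR letters from measurements, and the character columns it returns are where as.sir(), as.mic() and as.disk() would then be applied. Only S, I and R are recognised here; other interpretation codes are treated as measurements.

```r
# Sketch only: hypothetical helper, not the code changed in this PR.
split_whonet_value <- function(value, method = c("mic", "disk")) {
  method <- match.arg(method)
  value <- trimws(toupper(as.character(value)))
  is_sir <- !is.na(value) & value %in% c("S", "I", "R")
  # Anything that is not a SIR letter is kept as a raw measurement.
  measurement <- ifelse(is_sir, NA_character_, value)
  data.frame(
    sir  = ifelse(is_sir, value, NA_character_),
    mic  = if (method == "mic") measurement else NA_character_,   # broth dilution / Etest
    disk = if (method == "disk") measurement else NA_character_,  # disk diffusion zone size
    stringsAsFactors = FALSE
  )
}
```

Keeping the measurement and the SIR call in separate columns lets each be converted to its own class without losing rows that carry only one of the two.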
@efosternyarko force-pushed the feature/import-phoenix-ast branch from 20b5796 to d5e9bfb on March 4, 2026 09:53
@katholt merged commit c7249ae into AMRverse:main on Mar 6, 2026
10 checks passed