Add import_phoenix_ast() for BD Phoenix instrument AST data #68
Merged
katholt merged 6 commits into AMRverse:main on Mar 6, 2026
Need to generalise this further, please.
Adds a new import function supporting three BD Phoenix export formats,
auto-detected from file extension and content:
- long_german: XLS with no header row, 7 fixed columns, German decimal
locale (comma separator). Handles site-specific suffixes (e.g. "(f)"),
testing additives ("mit G6P"), synergy screens ("-Syn"), high-concentration
tests ("Hohe X Konzentration"), and DD.MM.YYYY dates.
"X" SIR values (no breakpoint defined) are treated as NA.
- long_clsi: Per-isolate CLSI report TXT/TSV with named column headers
(Antimicrobial, MIC or Concentration, Interp, Expert (SIR), Final (SIR)).
Trailing Resistance Markers and Expert Triggered Rules sections are
automatically stripped. Sample ID is derived from the filename.
- wide: Wide-format XLSX with alternating [Drug] call / [Drug] MIC
column pairs (e.g. Mills et al. 2022). Embedded whitespace in column
names (tabs) is normalised. Unicode ≤ signs are converted to <=.
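The long_german clean-up steps listed above can be sketched in base R. This is a minimal illustration and not the code from this PR: the helper names clean_drug_name(), clean_sir() and parse_german_date() are hypothetical, the example drug names are made up, and the sketch simply drops the additive, synergy and high-concentration decorations, which may differ from how the PR records them.

```r
# Hypothetical helper: strip site-specific decorations from Phoenix drug names.
clean_drug_name <- function(x) {
  x <- trimws(x)
  x <- sub("\\s*\\([a-z]\\)$", "", x, perl = TRUE)   # trailing "(f)", "(u)"
  x <- sub("\\s+mit\\s+G6P$", "", x, perl = TRUE)    # testing additive
  x <- sub("-Syn$", "", x, perl = TRUE)              # synergy screen
  # "Hohe X Konzentration" -> "X"
  x <- sub("^Hohe\\s+(.+)\\s+Konzentration$", "\\1", x, perl = TRUE)
  trimws(x)
}

# Hypothetical helper: "X" means no breakpoint is defined, so treat it as NA.
clean_sir <- function(x) {
  x <- trimws(as.character(x))
  x[x %in% "X"] <- NA_character_
  x
}

# Hypothetical helper: parse DD.MM.YYYY dates.
parse_german_date <- function(x) {
  as.Date(x, format = "%d.%m.%Y")
}

clean_drug_name(c("Ampicillin (f)", "Fosfomycin mit G6P",
                  "Gentamicin-Syn", "Hohe Gentamicin Konzentration"))
# "Ampicillin" "Fosfomycin" "Gentamicin" "Gentamicin"
```

In the package itself, the cleaned names would presumably then be passed to as.ab() for normalisation.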
A shared .clean_mic() helper normalises MIC strings across all formats:
Unicode ≤ → <=, German decimal comma → period, combination denominator
stripping (>32/2 → >32).
New parameters: format ("auto"/"long_german"/"long_clsi"/"wide"),
sample_col, instrument_guideline.
Also adds format = "phoenix" dispatch to import_ast(), and adds
readxl (>= 1.4.0) to DESCRIPTION Imports.
Replace specific private data filenames in @examples and internal comments with generic placeholder names to avoid exposing private data.
Remove format-specific modes (long_german, long_clsi, wide) and the wide format entirely (not a Phoenix export format). Replace with generic column detection: auto-detects drug/MIC/SIR/sample/species columns by common Phoenix header name patterns, with positional fallback for headerless XLS exports (col 1=sample, 2=species, 3=drug, 4=MIC, 5=instrument SIR, 6=expert SIR). Any column can be overridden by name or index via drug_col, mic_col, sir_col, sample_col, species_col. Drug name normalisation is delegated to as.ab() throughout.
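The detection order described in this commit (explicit override, then header name patterns, then positional fallback) could look roughly like the sketch below. The regular expressions are assumptions, since the commit message does not list the patterns it uses, and resolve_column() is a hypothetical name. Only the positions come from the commit message.

```r
# Assumed header patterns (not taken from the PR).
phoenix_patterns <- c(
  sample  = "sample|isolate|accession",
  species = "organism|species",
  drug    = "antimicrobial|antibiotic|drug",
  mic     = "^mic|concentration",   # anchored so "Antimicrobial" does not match
  sir     = "interp|sir"
)

# Positional fallback for headerless XLS exports, as stated in the commit message.
phoenix_positions <- c(sample = 1L, species = 2L, drug = 3L,
                       mic = 4L, sir = 5L, expert_sir = 6L)

# Hypothetical resolver: returns the column index for a given role.
resolve_column <- function(df, role, override = NULL, has_header = TRUE) {
  if (!is.null(override)) {
    # An override by name or index always wins
    if (is.numeric(override)) return(as.integer(override))
    idx <- match(override, names(df))
    if (is.na(idx)) stop("Column not found: ", override)
    return(idx)
  }
  if (has_header && role %in% names(phoenix_patterns)) {
    hit <- grep(phoenix_patterns[[role]], names(df), ignore.case = TRUE)
    if (length(hit) > 0) return(hit[1])
  }
  pos <- phoenix_positions[[role]]
  if (pos <= ncol(df)) pos else NA_integer_
}

report <- setNames(
  data.frame("Ampicillin", "<=2", "S"),
  c("Antimicrobial", "MIC or Concentration", "Interp")
)
resolve_column(report, "drug")  # 1
resolve_column(report, "mic")   # 2
resolve_column(report, "sir")   # 3
```

Taking the first pattern hit keeps the sketch simple. A real implementation would need a rule for files where several columns match, such as Interp, Expert (SIR) and Final (SIR).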
WHONET column values can contain raw measurements (MIC strings or zone sizes) as well as SIR letters. Parse sir_value with as.mic() for broth dilution/Etest columns and with as.disk() for disk diffusion columns, and include mic/disk in the relocate output order. All other import functions already applied the full set of AMR classes (as.ab, as.mic, as.disk, as.sir, as.mo).
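The commit message does not say how the test method of a WHONET column is identified. One plausible approach, sketched below, relies on the common WHONET column naming convention of a drug code, an underscore, a guideline letter and a method letter (D for disk, M for MIC, E for Etest), for example AMP_ND10 or AMP_NM. That convention, and the helper name whonet_method(), are assumptions rather than details from this PR.

```r
# Hypothetical helper: classify WHONET antibiotic columns by test method,
# assuming names like "AMP_ND10" (disk), "AMP_NM" (MIC), "CIP_EE" (Etest).
whonet_method <- function(col_names) {
  m <- regmatches(
    col_names,
    regexec("^[A-Za-z0-9]{3}_[A-Za-z]([DMEdme])[0-9.]*$", col_names)
  )
  # Captured method letter, or NA for columns that are not antibiotic columns
  letter <- vapply(
    m,
    function(g) if (length(g) == 2) toupper(g[2]) else NA_character_,
    character(1)
  )
  unname(c(D = "disk", M = "mic", E = "mic")[letter])
}

whonet_method(c("AMP_ND10", "AMP_NM", "CIP_EE", "ORGANISM"))
# "disk" "mic" "mic" NA
```

Columns classified as "mic" would then be parsed with as.mic() and columns classified as "disk" with as.disk().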
20b5796 to d5e9bfb
Summary
This PR adds import_phoenix_ast(), a new import function for antimicrobial susceptibility testing (AST) data exported from BD Phoenix instruments, and registers it in the import_ast() dispatcher as format = "phoenix".
Three BD Phoenix export formats are supported, with automatic detection from file extension and content:
- long_german: XLS export (no header, 7 fixed columns, German decimal locale). Handles site-specific drug name suffixes ("(f)", "(u)"), testing additives ("mit G6P"), synergy screens ("-Syn"), high-concentration tests ("Hohe X Konzentration"), "X" SIR values (no breakpoint → NA), and DD.MM.YYYY date parsing.
- long_clsi: Per-isolate CLSI instrument report (TXT/TSV, named headers: Antimicrobial, MIC or Concentration, Interp, Expert (SIR), Final (SIR)). Trailing Resistance Markers and Expert Triggered Rules sections are automatically stripped. Sample ID is derived from the filename.
- wide: Wide-format XLSX with alternating [Drug] call / [Drug] MIC column pairs (e.g. as in Mills et al. 2022, Genome Medicine). Embedded whitespace (tabs) in column names is normalised; Unicode ≤ signs are converted to <=.
A shared internal .clean_mic() helper normalises MIC strings across all three formats (Unicode ≤ → <=, German decimal comma → period, combination drug denominator stripping, e.g. >32/2 → >32).
New function parameters: format, sample_col, instrument_guideline
DESCRIPTION: adds readxl (>= 1.4.0) to Imports.
Test plan
- import_phoenix_ast("Phoenix-Antibiogramm-Daten.xls"): auto-detects long_german, parses German MIC locale, strips drug name suffixes, returns correct pheno_provided from expertized column
- import_phoenix_ast("TF-BDP_CLSI2018.txt", species = "Escherichia coli", instrument_guideline = "CLSI 2018"): auto-detects long_clsi, strips trailing sections, uses Final (SIR) as authoritative call
- import_phoenix_ast("AST_MIC_Mills.xlsx", sample_col = "Sample", species = "Escherichia coli"): auto-detects wide, pivots call/MIC pairs, normalises ≤, strips embedded tabs from column names
- import_ast(input, format = "phoenix") dispatches correctly to import_phoenix_ast()
- interpret_eucast = TRUE and interpret_ecoff = TRUE work correctly for all three sub-formats
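The long_clsi handling described in the summary (dropping the trailing report sections and deriving the sample ID from the filename) can be sketched as follows. The helper names, the file name isolate_001.txt and the report lines are hypothetical and only illustrate the idea; the PR's own implementation may differ.

```r
# Hypothetical helper: keep only the lines before the first trailing section.
strip_trailing_sections <- function(lines,
                                    markers = c("Resistance Markers",
                                                "Expert Triggered Rules")) {
  pattern <- paste0("^\\s*(", paste(markers, collapse = "|"), ")")
  cut <- grep(pattern, lines, perl = TRUE)
  if (length(cut) > 0) lines <- lines[seq_len(cut[1] - 1)]
  lines
}

# Hypothetical helper: use the file name without its extension as the sample ID.
sample_id_from_file <- function(path) {
  tools::file_path_sans_ext(basename(path))
}

# Made-up report content for illustration
report_lines <- c(
  "Antimicrobial\tMIC or Concentration\tInterp",
  "Ampicillin\t<=2\tS",
  "Resistance Markers",
  "ESBL",
  "Expert Triggered Rules",
  "1505"
)

kept <- strip_trailing_sections(report_lines)
ast <- read.delim(text = paste(kept, collapse = "\n"), check.names = FALSE)
ast$sample_id <- sample_id_from_file("exports/isolate_001.txt")
```

Cutting at the first marker assumes that the trailing sections always follow the AST table, as the summary describes.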