Fix ICD codes mapping to multiple comorbidities #38
Merged
Co-authored-by: vvcb <8311806+vvcb@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix ICD10 codes mapping to multiple comorbidities" to "Fix ICD codes mapping to multiple comorbidities" on Jan 24, 2026
Pull request overview
This PR fixes a critical bug in which an ICD code that should map to multiple comorbidities mapped to only one. The reverse mapping was stored in a dictionary keyed by code, so each code could hold a single comorbidity and only the last one encountered was kept. The fix replaces the dictionary with a list of tuples and uses DataFrame joins to handle one-to-many relationships correctly.
Changes:
- Replaced dictionary reverse mapping with list-based approach allowing codes to map to multiple comorbidities
- Added DataFrame join operation to handle one-to-many code→comorbidity mappings
- Implemented deduplication at the (patient ID, comorbidity) level to prevent double-counting
- Added comprehensive test suite covering all affected codes across ICD-9 and ICD-10
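
The difference between the two reverse-mapping approaches can be sketched in a few lines of plain Python. This is an illustration only: the `forward` table is a tiny hypothetical subset rather than the package's real mapping, the variable names are invented for the example, and codes are matched exactly for simplicity.

```python
# Toy forward mapping (comorbidity -> codes). A small illustrative subset,
# NOT the package's real tables.
forward = {
    "chf": ["I426", "I50"],
    "alcohol": ["F10", "I426"],
}

# Dictionary reverse mapping: keys are unique, so the second assignment
# for I426 silently overwrites the first.
reverse_dict = {code: name for name, codes in forward.items() for code in codes}

# List of (code, comorbidity) pairs: every mapping is kept.
reverse_pairs = [(code, name) for name, codes in forward.items() for code in codes]

# Comorbidities each structure reports for I426.
i426_from_dict = reverse_dict["I426"]
i426_from_pairs = sorted(name for code, name in reverse_pairs if code == "I426")

print(i426_from_dict)   # alcohol
print(i426_from_pairs)  # ['alcohol', 'chf']
```

With the dictionary, which comorbidity survives depends only on iteration order, which is why the bug is easy to miss when most codes belong to a single category.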
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/comorbidipy/calculators/comorbidity.py | Core fix replacing dictionary mapping with list-based approach and DataFrame join to enable one-to-many code→comorbidity relationships |
| tests/test_multiple_comorbidity_mapping.py | Comprehensive test suite covering all affected codes (I426, F315, I2782, 40403, 4255) with edge cases for duplicate codes and multiple patients |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
vvcb added a commit that referenced this pull request on Jan 24, 2026
- **Fix ICD codes mapping to multiple comorbidities** - Correctly handles ICD codes that map to more than one comorbidity category (#38)
- **Fix negative score handling for non-SHMI weightings** - Scores are now correctly calculated when using Charlson or Quan weightings (#37)
- **Increased test coverage from 91% to 99%** - Added extended test suites for CLI and comorbidity calculations
- **Use WeightingVariant enum constants** - Replaced string literals with proper enum constants for type safety
- **Removed deprecated main.py** - Cleaned up unused module
- **Removed pandas-specific ruff lint rule** - Updated linting configuration
- **Updated API documentation** - Improved examples and parameter descriptions
- **Updated getting-started guide** - Enhanced onboarding documentation
- **Updated calculator documentation** - Refreshed Charlson, Elixhauser, HFRS, and Disability guides
- **Updated CLI documentation** - Improved command-line usage examples
- **Updated badges in index.md** - Refreshed project status badges

See full details in [HISTORY.md](HISTORY.md)
Dictionary-based reverse mapping prevented ICD codes from mapping to multiple comorbidities. Code `I426` (alcoholic cardiomyopathy) should map to both CHF and alcohol abuse but only mapped to the last one encountered.

Changes
`src/comorbidipy/calculators/comorbidity.py`
- Replaced the dictionary reverse mapping with a list of `(code, comorbidity)` pairs
- Deduplicated at the `(id, comorbidity)` level to prevent double-counting

Before:
After:

Affected Codes
All codes now correctly map to multiple comorbidities:
- `I426`, `F315`, `I2782` (ICD-10 Elixhauser)
- `40403`, `40413`, `40493`, `4255` (ICD-9 Charlson/Elixhauser)

Example
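
As a hedged illustration of the behaviour described above (not the PR's own example), the join-then-deduplicate idea can be sketched without pandas. The PR itself uses DataFrame joins; here `flag_comorbidities`, `PAIRS`, and the patient records are hypothetical names and data, and codes are matched exactly, which may differ from the package's matching rules.

```python
from collections import defaultdict

# Hypothetical (code, comorbidity) pairs; a small illustrative subset.
PAIRS = [
    ("I426", "chf"),
    ("I426", "alcohol"),
    ("I50", "chf"),
    ("F10", "alcohol"),
]


def flag_comorbidities(records, pairs):
    """Join (id, code) records to (code, comorbidity) pairs.

    Returns a set of (id, comorbidity) tuples, so a comorbidity is counted
    at most once per patient however many codes point to it.
    """
    by_code = defaultdict(list)
    for code, name in pairs:
        by_code[code].append(name)
    return {
        (patient_id, name)
        for patient_id, code in records
        for name in by_code.get(code, [])
    }


records = [
    ("p1", "I426"),
    ("p1", "I426"),  # duplicate code for the same patient
    ("p1", "I50"),   # a second code that also maps to chf
    ("p2", "F10"),
]

flags = flag_comorbidities(records, PAIRS)
print(sorted(flags))
# [('p1', 'alcohol'), ('p1', 'chf'), ('p2', 'alcohol')]
```

Patient `p1` is flagged for both categories from the single code `I426`, and the repeated code and the extra CHF code do not produce a second `chf` entry.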
Original prompt
This section details on the original issue you should resolve
<issue_title>ICD10 codes map to only one comorbidity when they should map to two</issue_title>
<issue_description>### General info
Description
When a patient has a list of ICD codes, each ICD code is supposed to be mapped to its corresponding comorbidities. This works as expected for most codes. However, an issue arises when a code corresponds to multiple different comorbidities. For instance, ICD10 code I42.6, alcoholic cardiomyopathy, is supposed to map to both alcohol abuse and congestive heart failure (per the Quan ICD10 mapping). When the comorbidity function does its mapping/calculation, the ICD code is only mapped once (to alcohol) instead of twice, to both alcohol & chf.
Ultimately, a code that should essentially count for 10 points (swiss: -3 alcohol, 13 chf) counts as -3 points if a patient does not have other codes recorded for chf.
It is also worth noting that this method deviates from the way that the R comorbidity package, which this repo is modeled after, calculates and maps comorbidities. When using that package, a patient with code I42.6 is mapped to both the alcohol & the chf comorbidities.
What I Did
Example using 3 different icd10 codes where this problem can be seen:
df_out output:
</issue_description>
<agent_instructions>This issue was raised against an older version of this package (v0.5.0) but is certainly still relevant. Implement the correct approach to address what @kyliewillis and @rpomponio have reported. </agent_instructions>
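
The scoring impact reported in the issue can be checked with a few lines of arithmetic. This is a minimal sketch: the two weights (-3 for alcohol, 13 for CHF) are the ones quoted in the issue, while the `score` helper and the `SWISS_WEIGHTS` name are hypothetical and not part of comorbidipy.

```python
# Weights as quoted in the issue for the Swiss weighting. Only these two
# categories are included, so this is not the full weight table.
SWISS_WEIGHTS = {"alcohol": -3, "chf": 13}


def score(flagged):
    """Sum the weights of the comorbidities flagged for one patient (hypothetical helper)."""
    return sum(SWISS_WEIGHTS[name] for name in flagged)


buggy = score({"alcohol"})         # I42.6 mapped to alcohol only
fixed = score({"alcohol", "chf"})  # I42.6 mapped to both categories

print(buggy)  # -3
print(fixed)  # 10
```

For a patient whose only relevant code is I42.6, the single-mapping behaviour therefore shifts the score by 13 points.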
Comments on the Issue (you are @copilot in this section)
<comment_new>@vvcb
@kyliewillis - Thank you for reporting this! 🙏 I didn't think anyone else was using this library. So it was a pleasant surprise to find this issue raised, albeit an embarrassing one as I missed it for an entire month.

@rpomponio - thanks for the fantastic work on the tests comparing the parent R package and this one 🚀. Are you please able to share anonymised data for the cases where the two packages differ?
I will find some time to dig into this and fix it. (And will document it better as well - especially if people are using it!)</comment_new>
<comment_new>@vvcb
I suspect the reason for this bug is this code section here - https://github.com/vvcb/comorbidipy/blob/main/comorbidipy/calculator.py#L111-L116
It will be easy enough to find all the codes that map to more than one category. I will have to think about how this section can be modified. Should be straightforward (:coldsweat:)!</comment_new>
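
Finding the codes that map to more than one category, as suggested in the comment above, could look roughly like the sketch below. It assumes a forward mapping shaped as `{comorbidity: [codes]}`; that shape, the sample table, and the helper name `codes_with_multiple_categories` are assumptions for illustration and may not match the package's internal structures.

```python
from collections import defaultdict


def codes_with_multiple_categories(forward):
    """Return {code: sorted categories} for codes listed under more than one category."""
    categories = defaultdict(set)
    for name, codes in forward.items():
        for code in codes:
            categories[code].add(name)
    return {
        code: sorted(names)
        for code, names in categories.items()
        if len(names) > 1
    }


# Hypothetical subset of a mapping table, for illustration only.
forward = {
    "chf": ["I426", "I50"],
    "alcohol": ["F10", "I426"],
}

overlaps = codes_with_multiple_categories(forward)
print(overlaps)  # {'I426': ['alcohol', 'chf']}
```

Running a check like this over each mapping table would give the list of codes a targeted workaround or a general fix needs to cover.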
<comment_new>@vvcb
Having reviewed all the codes across all the comorbidity risk scores, there are a very small number of codes that cause this issue.
A workaround specific to these codes may be the most pragmatic and simple solution.