Fix ICD codes mapping to multiple comorbidities #38

Merged
vvcb merged 4 commits into main from copilot/fix-icd10-comorbidity-mapping
Jan 24, 2026
Copilot AI commented Jan 24, 2026

Dictionary-based reverse mapping prevented ICD codes from mapping to multiple comorbidities. Code I426 (alcoholic cardiomyopathy) should map to both CHF and alcohol abuse but only mapped to the last one encountered.

Changes

src/comorbidipy/calculators/comorbidity.py

  • Replace dictionary reverse mapping with list of (code, comorbidity) pairs
  • Use DataFrame join to handle one-to-many code→comorbidity relationships
  • Deduplicate at (id, comorbidity) level to prevent double-counting

Before:

reverse_mapping = {
    i: k
    for i in codes
    for k, v in mapping[score_icd_variant].items()
    if i.startswith(tuple(v))
}
working_df = working_df.with_columns(
    pl.col(code).replace_strict(reverse_mapping, default=None).alias("mapped_code")
)

After:

code_to_comorbidities = []
for icd_code in codes:
    for comorbidity_name, icd_patterns in mapping[score_icd_variant].items():
        if icd_code.startswith(tuple(icd_patterns)):
            code_to_comorbidities.append((icd_code, comorbidity_name))

mapping_df = pl.DataFrame(code_to_comorbidities, schema=[code, "mapped_code"], orient="row")
working_df = working_df.join(mapping_df, on=code, how="inner")
working_df = working_df.unique(subset=[id, "mapped_code"])
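
To make the difference concrete, here is a minimal standalone sketch of the two approaches in plain Python, without Polars. The `toy_mapping` table and the `codes` list are illustrative assumptions, not the library's actual Quan Elixhauser table; only the overlap on I426 is taken from this PR.

```python
# Toy prefix table (assumption for illustration, not the library's full mapping).
toy_mapping = {
    "chf": ["I425", "I426", "I50"],
    "alcohol": ["F10", "I426"],
}
codes = ["I426", "I500", "F101"]

# Old approach: a dict holds one value per key, so when a code matches
# several comorbidities the later match silently overwrites the earlier one.
reverse_mapping = {
    code: name
    for code in codes
    for name, prefixes in toy_mapping.items()
    if code.startswith(tuple(prefixes))
}

# New approach: keep every (code, comorbidity) pair.
pairs = [
    (code, name)
    for code in codes
    for name, prefixes in toy_mapping.items()
    if code.startswith(tuple(prefixes))
]

print(reverse_mapping["I426"])                    # only one comorbidity survives
print([name for code, name in pairs if code == "I426"])  # both are kept
```

In the dictionary version the surviving comorbidity depends on the iteration order of the mapping, which is why the bug showed up as "the last one encountered".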

Affected Codes

Each of the following codes now maps to every comorbidity whose pattern list matches it:

  • I426, F315, I2782 (ICD-10 Elixhauser)
  • 40403, 40413, 40493, 4255 (ICD-9 Charlson/Elixhauser)

Example

import polars as pl
from comorbidipy import comorbidity

df = pl.DataFrame({'id': [1, 1, 1], 'code': ['I2782', 'I426', 'F315']})
result = comorbidity(df, score='elixhauser', icd='icd10', variant='quan')

# Before: alcohol=1, chf=0, cpd=1, depre=1, pcd=0, psycho=0
# After:  alcohol=1, chf=1, cpd=1, depre=1, pcd=1, psycho=1
Original prompt

This section details the original issue you should resolve

<issue_title>ICD10 codes map to only one comorbidity when they should map to two</issue_title>
<issue_description>### General info

  • comorbidipy version: 0.4.4 (latest)
  • Python version: 3.7.12
  • Operating System: linux

Description

When a patient has a list of icd codes, each icd code is supposed to be mapped to its corresponding comorbidities. This works as expected for most codes. However, an issue arises when a code corresponds to multiple different comorbidities. For instance, ICD10 code I42.6, alcoholic cardiomyopathy, is supposed to map to both alcohol abuse as well as congestive heart failure (per quan ICD10 mapping). When the comorbidity function does its mapping/calculation, the icd code is only mapped once (to alcohol) instead of twice, to both alcohol & chf.

Ultimately, a code that should essentially count for 10 points (swiss: -3 alcohol, 13 chf) counts as -3 points if a patient does not have other codes recorded for chf.

It is also worth noting that this method deviates from the way that the R comorbidity package, which this repo is modeled after, calculates and maps comorbidities. When using that package, a patient with code I42.6 is mapped to both the alcohol & the chf comorbidities.

What I Did

Example using 3 different icd10 codes where this problem can be seen:


import pandas as pd
import comorbidipy

id = [1,1,1]
age = [50,50,50]
code = ['I2782','I426','F315']
df_example = pd.DataFrame({'id': id, 'age': age, 'code': code})

## These 3 codes should return 1s for pcd, cpd, psycho, depre, alcohol, chf
## Instead, each code is only mapped to one comorbidity

df_out = comorbidipy.comorbidity(df_example,  
                                 age="age",
                                 score="elixhauser",
                                 icd="icd10",
                                 variant="quan",
                                 weighting="swiss")

df_out[['alcohol','chf','cpd', 'depre', 'pcd', 'psycho']]

df_out output:

   id  alcohol  chf  cpd  depre  pcd  psycho
0   1        1    0    1      1    0       0

</issue_description>
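
The scoring consequence described in the issue can be checked by hand. The sketch below is a hedged illustration that uses only the two Swiss weights quoted in the issue (-3 for alcohol, 13 for chf); the `score` helper is hypothetical and is not part of comorbidipy.

```python
# Only the two weights quoted in the issue; the full Swiss table is larger.
swiss_weights = {"alcohol": -3, "chf": 13}


def score(flagged):
    """Sum the weights of the comorbidities flagged for one patient."""
    return sum(swiss_weights[name] for name in flagged)


# Patient whose only code is I42.6 (alcoholic cardiomyopathy).
buggy = score({"alcohol"})          # code mapped to a single comorbidity
fixed = score({"alcohol", "chf"})   # code mapped to both comorbidities

print(buggy, fixed)
```

The 13-point gap between the two results is the error introduced for any patient who has I42.6 and no other chf code.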

<agent_instructions>This issue was raised against an older version of this package (v0.5.0) but is certainly still relevant. Implement the correct approach to address what @kyliewillis and @rpomponio have reported. </agent_instructions>

Comments on the Issue (you are @copilot in this section)

@vvcb @kyliewillis - Thank you for reporting this! 🙏 I didn't think anyone else was using this library. So it was a pleasant surprise to find this issue raised, albeit an embarrassing one as I missed it for an entire month.

@rpomponio - thanks for the fantastic work on the tests comparing the parent R package and this one 🚀. Are you please able to share anonymised data for the cases where the two packages differ?

I will find some time to dig into this and fix it. (And will document it better as well - especially if people are using it!)</comment_new>
<comment_new>@vvcb
I suspect the reason for this bug is this code section here - https://github.com/vvcb/comorbidipy/blob/main/comorbidipy/calculator.py#L111-L116

It will be easy enough to find all the codes that map to more than one category. I will have to think about how this section can be modified. Should be straightforward (:coldsweat:)!</comment_new>
<comment_new>@vvcb
Having reviewed all the codes across all the comorbidity risk scores, there are a very small number of codes that cause this issue.

A workaround specific to these codes may be the most pragmatic and simple solution.

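| code | comorbidity 1 | comorbidity 2 |
|---|---|---|
| charlson_icd9_quan | | |
| 40403 | chf | rend |
| 40413 | chf | rend |
| 40493 | chf | rend |
| charlson_icd10_se | | |
| K703 | mld | mld |
| charlson_icd10_am | | |
| C80 | canc | metacanc |
| elixhauser_icd9_quan | | |
| 40403 | chf | rf |
| 40413 | chf | rf |
| 40493 | chf | rf |
| 4255 | chf | alcohol |
| elixhauser_icd10_quan | | |
| I426 | chf | alcohol |
| F315 | psycho | ... |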
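
A list of overlapping codes like this one could be produced by scanning each mapping for prefixes that match more than one category. The following is a rough sketch under stated assumptions: `toy_mappings` holds a few illustrative prefixes per category rather than the real tables, and `overlapping_codes` is a hypothetical helper, not a function in comorbidipy.

```python
from collections import defaultdict

# Illustrative subsets only (assumption); the overlaps on 40403, 4255 and I426
# are the ones reported in this thread.
toy_mappings = {
    "elixhauser_icd9_quan": {
        "chf": ["40403", "4255", "428"],
        "rf": ["40403", "585"],
        "alcohol": ["4255", "291"],
    },
    "elixhauser_icd10_quan": {
        "chf": ["I426", "I50"],
        "alcohol": ["F10", "I426"],
    },
}


def overlapping_codes(mapping):
    """Return {prefix: sorted category names} for prefixes hit by 2+ categories."""
    hits = defaultdict(set)
    all_prefixes = {p for prefixes in mapping.values() for p in prefixes}
    for prefix in all_prefixes:
        for name, prefixes in mapping.items():
            # Same prefix test the calculator uses for real codes.
            if prefix.startswith(tuple(prefixes)):
                hits[prefix].add(name)
    return {p: sorted(names) for p, names in hits.items() if len(names) > 1}


for variant, mapping in toy_mappings.items():
    print(variant, overlapping_codes(mapping))
```

Because the check uses `startswith`, it also catches the case where one category lists a broad prefix and another lists a longer code beneath it.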

Copilot AI and others added 2 commits January 24, 2026 18:48
Co-authored-by: vvcb <8311806+vvcb@users.noreply.github.com>
Co-authored-by: vvcb <8311806+vvcb@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix ICD10 codes mapping to multiple comorbidities" to "Fix ICD codes mapping to multiple comorbidities" Jan 24, 2026
Copilot AI requested a review from vvcb January 24, 2026 18:52
@vvcb vvcb marked this pull request as ready for review January 24, 2026 19:14
@vvcb vvcb requested a review from Copilot January 24, 2026 19:14

Copilot AI left a comment

Pull request overview

This PR fixes a critical bug where ICD codes that should map to multiple comorbidities only mapped to one due to dictionary-based reverse mapping. The fix replaces the dictionary with a list of tuples and uses DataFrame joins to handle one-to-many relationships correctly.

Changes:

  • Replaced dictionary reverse mapping with list-based approach allowing codes to map to multiple comorbidities
  • Added DataFrame join operation to handle one-to-many code→comorbidity mappings
  • Implemented deduplication at the (patient ID, comorbidity) level to prevent double-counting
  • Added comprehensive test suite covering all affected codes across ICD-9 and ICD-10
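
The join and deduplication steps can be sketched in plain Python. This is an illustration only: list comprehensions and a set stand in for the Polars `join` and `unique` calls used in the PR, and the sample `records` and `pairs` are made up for the example.

```python
# (patient id, ICD code) rows; patient 1 has I426 recorded twice.
records = [(1, "I426"), (1, "I500"), (1, "I426"), (2, "F101")]

# (code, comorbidity) pairs, one row per match (toy data, not the real table).
pairs = [("I426", "chf"), ("I426", "alcohol"), ("I500", "chf"), ("F101", "alcohol")]

# Inner join on code: a record yields one row per comorbidity its code maps to.
joined = [
    (patient_id, name)
    for patient_id, code in records
    for mapped_code, name in pairs
    if mapped_code == code
]

# Deduplicate at the (patient id, comorbidity) level so each comorbidity
# counts at most once per patient, however many codes point to it.
flags = sorted(set(joined))

print(len(joined), flags)
```

Without the deduplication step, patient 1 would carry three chf rows (two from I426, one from I500) and the weight would be applied more than once.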

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| src/comorbidipy/calculators/comorbidity.py | Core fix replacing dictionary mapping with list-based approach and DataFrame join to enable one-to-many code→comorbidity relationships |
| tests/test_multiple_comorbidity_mapping.py | Comprehensive test suite covering all affected codes (I426, F315, I2782, 40403, 4255) with edge cases for duplicate codes and multiple patients |

Comment thread tests/test_multiple_comorbidity_mapping.py Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@vvcb vvcb merged commit 460a96f into main Jan 24, 2026
1 check passed
@vvcb vvcb deleted the copilot/fix-icd10-comorbidity-mapping branch January 24, 2026 20:05
vvcb added a commit that referenced this pull request Jan 24, 2026
- **Fix ICD codes mapping to multiple comorbidities** - Correctly handles ICD codes that map to more than one comorbidity category (#38)
- **Fix negative score handling for non-SHMI weightings** - Scores are now correctly calculated when using Charlson or Quan weightings (#37)

- **Increased test coverage from 91% to 99%** - Added extended test suites for CLI and comorbidity calculations
- **Use WeightingVariant enum constants** - Replaced string literals with proper enum constants for type safety
- **Removed deprecated main.py** - Cleaned up unused module
- **Removed pandas-specific ruff lint rule** - Updated linting configuration

- **Updated API documentation** - Improved examples and parameter descriptions
- **Updated getting-started guide** - Enhanced onboarding documentation
- **Updated calculator documentation** - Refreshed Charlson, Elixhauser, HFRS, and Disability guides
- **Updated CLI documentation** - Improved command-line usage examples
- **Updated badges in index.md** - Refreshed project status badges

See full details in [HISTORY.md](HISTORY.md)
Development

Successfully merging this pull request may close these issues.

ICD10 codes map to only one comorbidity when they should map to two

3 participants