@spopelka-dsac
Contributor

module-name: Add Jupyter Notebook to Perform Mapping from Halloween CSVs to API DB

Jira Ticket #NDH-417

Problem

The Halloween CSV to API DB mapping in the npd_Puffin repo had several issues that prevented us from fully connecting the API to the data, including:

  • Loading the data from the generated insert statements was very slow
  • Missing tables and relationships left the API only partially populated
  • Some inner joins masked data quality issues by silently dropping records with invalid values (e.g. an inner join on NPI excludes any record whose NPI does not exist in the NPI table, which makes it harder to notice that such a value is present)
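
To illustrate the last point, here is a minimal sketch using an in-memory SQLite database. The table names (`npi`, `provider_csv`), column names, and NPI values are hypothetical and are not taken from the npd_Puffin schema or the notebook. The sketch only shows how an inner join hides an orphaned NPI while a left join with a null check surfaces it.

```python
import sqlite3

# Hypothetical tables and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE npi (npi INTEGER PRIMARY KEY);
    CREATE TABLE provider_csv (row_id INTEGER PRIMARY KEY, npi INTEGER);
    INSERT INTO npi (npi) VALUES (1111111111), (2222222222);
    INSERT INTO provider_csv (row_id, npi) VALUES
        (1, 1111111111),
        (2, 2222222222),
        (3, 9999999999);  -- NPI that is absent from the npi table
    """
)

# Inner join: row 3 silently disappears, so the bad value goes unnoticed.
inner_rows = conn.execute(
    """
    SELECT p.row_id
    FROM provider_csv AS p
    JOIN npi AS n ON n.npi = p.npi
    ORDER BY p.row_id
    """
).fetchall()

# Left join plus a NULL check: lists the rows whose NPI has no match.
orphan_rows = conn.execute(
    """
    SELECT p.row_id, p.npi
    FROM provider_csv AS p
    LEFT JOIN npi AS n ON n.npi = p.npi
    WHERE n.npi IS NULL
    ORDER BY p.row_id
    """
).fetchall()

print("rows kept by inner join:", inner_rows)
print("rows with unknown NPI:", orphan_rows)
```

Running a check like the second query before loading is one way to make such records visible rather than dropping them without notice.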

Solution

This PR introduces a Jupyter Notebook (Python) that, among other things:

  • Provides a faster way to load data from the Halloween CSVs into any database that has had the Flyway migrations applied
  • Corrects a number of issues noted in the issues spreadsheet
  • Documents data quality issues in the code

Result

I would not recommend deploying this in Dagster, as our load processes will change once we land the core data model. However, in the event that we need to load Halloween CSV data before we finalize the core data model, this Jupyter Notebook represents a more robust way to do so.

Test Plan

@github-actions
Contributor

github-actions bot commented Dec 3, 2025

Backend Django Test Results

59 tests (-39): 56 ✅ passed (-42), 0 💤 skipped (±0), 0 ❌ failed (±0), 3 🔥 errors (+3)
11 suites (-2)
11 files (-2)
Duration: 0s ⏱️ (-1s)

For more details on these errors, see this check.

Results for commit 8012223. ± Comparison against base commit 784f8dc.

This pull request removes 43 and adds 4 tests. Note that renamed tests count towards both.
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_default
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_filter_by_address
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_filter_by_address_city
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_filter_by_address_postalcode
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_filter_by_address_state
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_filter_by_address_use
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_filter_by_name
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_in_default_order
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_in_descending_order
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_in_order_by_address
…
npdfhir.tests.test_organization.OrganizationViewSetTestCase ‑ test_parent_id
setUpClass (npdfhir.tests.test_location.LocationViewSetTestCase)
setUpClass (npdfhir.tests.test_practitioner.PractitionerViewSetTestCase)
setUpClass (npdfhir.tests.test_practitioner_role.PractitionerRoleViewSetTestCase)

♻️ This comment has been updated with latest results.

Member

I think I would prefer to see how we could replicate this structure on top of the #237 branch

Contributor Author

Roger

3 participants