Skip to content

Conversation

@abachman-dsac
Copy link
Member

@abachman-dsac abachman-dsac commented Dec 18, 2025

Warning

This PR is a work in progress and intended for demonstration purposes. Some functionality is broken by this change.

Jira Ticket NDH-640

Problem

The /fhir/Organization/ API demonstrates a SQL query pattern with extremely high operational costs in terms of database latency.

In our testing environment with a closer-to-full-size dataset, requesting the 1001st page with page_size of 100 takes almost 20 seconds.

In simple load testing (10 URLs requested three at a time) I was able to crash the nonproduction service ~60% of the time.

Solution

Introduce a postgres materialized view over the appropriate sorting field, attach a Django model to it, and use that model to replace the model used by the FHIROrganizationViewSet view list method to select organizations ordered by name.

Result

Original base query to support the list view in backend/npdfhir/views.py:363-388:

organizations = (
    Organization.objects.all()
    .prefetch_related(
        "authorized_official",
        "ein",
        "organizationtoname_set",
        "organizationtoaddress_set",
        "organizationtoaddress_set__address",
        "organizationtoaddress_set__address__address_us",
        "organizationtoaddress_set__address__address_us__state_code",
        "organizationtoaddress_set__address_use",
        "authorized_official__individualtophone_set",
        "authorized_official__individualtoname_set",
        "authorized_official__individualtoemail_set",
        "authorized_official__individualtoaddress_set",
        "authorized_official__individualtoaddress_set__address__address_us",
        "authorized_official__individualtoaddress_set__address__address_us__state_code",
        "clinicalorganization",
        "clinicalorganization__npi",
        "clinicalorganization__organizationtootherid_set",
        "clinicalorganization__organizationtootherid_set__other_id_type",
        "clinicalorganization__organizationtotaxonomy_set",
        "clinicalorganization__organizationtotaxonomy_set__nucc_code",
    )
    .order_by("organizationtoname__name")
)

generates the raw SQL:

SELECT
  "organization"."id",
  "organization"."authorized_official_id",
  "organization"."ein_id",
  "organization"."parent_id"
FROM
  "organization"
  LEFT OUTER JOIN "organization_to_name" ON (
    "organization"."id" = "organization_to_name"."organization_id"
  )
ORDER BY
  "organization_to_name"."name" ASC
LIMIT
  10;

whose explanation shows no indexes and two full tables scans:

Limit  (cost=140.12..140.15 rows=10 width=93)
  ->  Sort  (cost=140.10..144.51 rows=1767 width=93)
        Sort Key: organization_to_name.name
        ->  Hash Right Join  (cost=52.76..93.08 rows=1767 width=93)
              Hash Cond: (organization_to_name.organization_id = organization.id)
              ->  Seq Scan on organization_to_name  (cost=0.00..35.67 rows=1767 width=45)
              ->  Hash  (cost=30.67..30.67 rows=1767 width=64)
                    ->  Seq Scan on organization  (cost=0.00..30.67 rows=1767 width=64)

The updated materialized view query in this PR backend/npdfhir/views.py:360-386:

organizations = (
    OrganizationByName.objects.all()
    .prefetch_related(
        "organization",
        "organization__authorized_official",
        "organization__ein",
        "organization__organizationtoname_set",
        "organization__organizationtoaddress_set",
        "organization__organizationtoaddress_set__address",
        "organization__organizationtoaddress_set__address__address_us",
        "organization__organizationtoaddress_set__address__address_us__state_code",
        "organization__organizationtoaddress_set__address_use",
        "organization__authorized_official__individualtophone_set",
        "organization__authorized_official__individualtoname_set",
        "organization__authorized_official__individualtoemail_set",
        "organization__authorized_official__individualtoaddress_set",
        "organization__authorized_official__individualtoaddress_set__address__address_us",
        "organization__authorized_official__individualtoaddress_set__address__address_us__state_code",
        "organization__clinicalorganization",
        "organization__clinicalorganization__npi",
        "organization__clinicalorganization__organizationtootherid_set",
        "organization__clinicalorganization__organizationtootherid_set__other_id_type",
        "organization__clinicalorganization__organizationtotaxonomy_set",
        "organization__clinicalorganization__organizationtotaxonomy_set__nucc_code",
    )
    .order_by("name")
)

shows a much cleaner underlying query:

SELECT
  "organization_by_name"."id",
  "organization_by_name"."name"
FROM
  "organization_by_name"
ORDER BY
  "organization_by_name"."name" ASC
LIMIT
  10;

which exclusively relies on an index scan:

Limit  (cost=0.86..1.44 rows=10 width=45)
  ->  Index Scan using idx_organization_by_name_on_name on organization_by_name  (cost=0.28..102.75 rows=1765 width=45)

Risks

This approach will require maintaining the materialized view, for which there is a Postgres command:

REFRESH MATERIALIZED VIEW CONCURRENTLY "npd"."organization_by_name";

This command will have to be run after every update to organization or organization_to_name, after every migration from ETL to NPD, or on a schedule within NPD itself.

@abachman-dsac abachman-dsac force-pushed the abachman-dsac/organization-api-indexes branch from 52f85c3 to 26f040f Compare December 19, 2025 16:44
@abachman-dsac abachman-dsac marked this pull request as ready for review December 19, 2025 17:39
@abachman-dsac abachman-dsac changed the title [WIP] Example replacing a base table with a materialized view [DEMO] Replacing a base table with a materialized view Dec 19, 2025
@abachman-dsac abachman-dsac force-pushed the abachman-dsac/organization-api-indexes branch from dcbd009 to 8585685 Compare December 19, 2025 19:35
@abachman-dsac abachman-dsac force-pushed the abachman-dsac/organization-api-indexes branch from b77f3b6 to 7f5c377 Compare December 19, 2025 20:21
@abachman-dsac abachman-dsac changed the title [DEMO] Replacing a base table with a materialized view [NDH-640][SPIKE] Replacing a base table with a materialized view Dec 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants