Skip to content

Source database should reconcile ~160k orphan records #438

@kbighorse

Description

@kbighorse

Summary

During the FK enforcement implementation (PR #416), transfer scripts discovered ~160,180 orphan records in the legacy SQL Server database that have no valid parent records. These records were correctly filtered out during transfer, but the root cause should be investigated in the source database.

Orphan Record Counts

Table Valid Records Orphan Records
HydraulicsData 0 288
ChemistrySampleInfo 1,622 9,820
MajorChemistry 16,532 62,046
Radionuclides 12 351
MinorTraceChemistry 31,721 76,640
FieldParameters 2,899 11,035
  Summary                                                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                                                        
  The branch implements FK enforcement by:                                                                                                                                                                                                                                              
  1. Adding Integer PKs to NMA legacy tables (nma_ prefix)                                                                                                                                                                                                                              
  2. Creating FK relationships from chemistry tables → ChemistrySampleInfo → Thing                                                                                                                                                                                                      
  3. Filtering orphan records during transfer (preventing FK violations)                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                        
  The transfer scripts correctly identify and filter ~71k orphan records that would violate FK constraints.                                                                                                                                                                             
                                                                                                                                                                                                                                                                                        
  ---                                                                                                                                                                                                                                                                                   
  Orphan Record Analysis                                                                                                                                                                                                                                                                
  ┌─────────────────────┬─────────┬────────┬─────────┐                                                                                                                                                                                                                                  
  │        Table        │  Valid  │ Orphan │  Total  │                                                                                                                                                                                                                                  
  ├─────────────────────┼─────────┼────────┼─────────┤                                                                                                                                                                                                                                  
  │ HydraulicsData      │     288 │      0 │     288 │                                                                                                                                                                                                                                  
  ├─────────────────────┼─────────┼────────┼─────────┤                                                                                                                                                                                                                                  
  │ ChemistrySampleInfo │   7,629 │  3,813 │  11,442 │                                                                                                                                                                                                                                  
  ├─────────────────────┼─────────┼────────┼─────────┤                                                                                                                                                                                                                                  
  │ MajorChemistry      │  78,578 │      0 │  78,578 │                                                                                                                                                                                                                                  
  ├─────────────────────┼─────────┼────────┼─────────┤                                                                                                                                                                                                                                  
  │ Radionuclides       │     358 │      5 │     363 │                                                                                                                                                                                                                                  
  ├─────────────────────┼─────────┼────────┼─────────┤                                                                                                                                                                                                                                  
  │ MinorTraceChemistry │  50,931 │ 57,430 │ 108,361 │                                                                                                                                                                                                                                  
  ├─────────────────────┼─────────┼────────┼─────────┤                                                                                                                                                                                                                                  
  │ FieldParameters     │   4,192 │  9,742 │  13,934 │                                                                                                                                                                                                                                  
  ├─────────────────────┼─────────┼────────┼─────────┤                                                                                                                                                                                                                                  
  │ TOTAL               │ 141,976 │ 70,990 │ 212,966 │                                                                                                                                                                                                                                  
  └─────────────────────┴─────────┴────────┴─────────┘                                                                                                                                                                                                                                  
  ---                                                                                                                                                                                                                                                                                   
  Missing Parent Identifiers (SamplePointIDs)                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                                        
  1,993 unique SamplePointIDs in Chemistry_SampleInfo have no matching record in WellData (Things).                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                        
  By Project/Basin Prefix:                                                                                                                                                                                                                                                              
  ┌────────┬───────┬──────────────────────────────────────────────────┐                                                                                                                                                                                                                 
  │ Prefix │ Count │                   Top Examples                   │                                                                                                                                                                                                                 
  ├────────┼───────┼──────────────────────────────────────────────────┤                                                                                                                                                                                                                 
  │ SB     │   480 │ SB-0003, SB-0010, SB-0014, SB-0015, SB-0019      │                                                                                                                                                                                                                 
  ├────────┼───────┼──────────────────────────────────────────────────┤                                                                                                                                                                                                                 
  │ SM     │   199 │ SM-0007, SM-0011, SM-0012, SM-0016, SM-0018      │                                                                                                                                                                                                                 
  ├────────┼───────┼──────────────────────────────────────────────────┤                                                                                                                                                                                                                 
  │ NM     │   145 │ NM-00032, NM-00035, NM-00036, NM-00059, NM-00091 │                                                                                                                                                                                                                 
  ├────────┼───────┼──────────────────────────────────────────────────┤                                                                                                                                                                                                                 
  │ AR     │   118 │ AR-0003, AR-0004, AR-0005, AR-0006, AR-0008      │                                                                                                                                                                                                                 
  ├────────┼───────┼──────────────────────────────────────────────────┤                                                                                                                                                                                                                 
  │ TB     │   107 │ TB-0008, TB-0020, TB-0021, TB-0024, TB-0070      │                                                                                                                                                                                                                 
  ├────────┼───────┼──────────────────────────────────────────────────┤                                                                                                                                                                                                                 
  │ QU     │    83 │ QU-001, QU-002, QU-007, QU-009, QU-010           │                                                                                                                                                                                                                 
  ├────────┼───────┼──────────────────────────────────────────────────┤                                                                                                                                                                                                                 
  │ BW     │    71 │ BW-0616, BW-0617, BW-0618, BW-0619, BW-0620      │                                                                                                                                                                                                                 
  ├────────┼───────┼──────────────────────────────────────────────────┤                                                                                                                                                                                                                 
  │ SA     │    70 │ SA-0010, SA-0012, SA-0043, SA-0046, SA-0053      │                                                                                                                                                                                                                 
  ├────────┼───────┼──────────────────────────────────────────────────┤                                                                                                                                                                                                                 
  │ RA     │    70 │ RA-001, RA-002, RA-003, RA-004, RA-006           │                                                                                                                                                                                                                 
  ├────────┼───────┼──────────────────────────────────────────────────┤                                                                                                                                                                                                                 
  │ TV     │    67 │ TV-100, TV-102, TV-115, TV-118, TV-125           │                                                                                                                                                                                                                 
  ├────────┼───────┼──────────────────────────────────────────────────┤                                                                                                                                                                                                                 
  │ ...    │   624 │ (remaining 31 prefixes)                          │                                                                                                                                                                                                                 
  └────────┴───────┴──────────────────────────────────────────────────┘                                                                                                                                                                                                                 
  Top 10 Highest-Impact Orphan IDs:                                                                                                                                                                                                                                                     
  ┌───────────────┬────────────┬─────────────┬─────────────┬──────────────────┐                                                                                                                                                                                                         
  │ SamplePointID │ ChemSample │ Minor/Trace │ FieldParams │ Total Downstream │                                                                                                                                                                                                         
  ├───────────────┼────────────┼─────────────┼─────────────┼──────────────────┤                                                                                                                                                                                                         
  │ AR-0059       │         13 │         757 │          45 │              802 │                                                                                                                                                                                                         
  ├───────────────┼────────────┼─────────────┼─────────────┼──────────────────┤                                                                                                                                                                                                         
  │ AR-0112       │         13 │         715 │          45 │              760 │                                                                                                                                                                                                         
  ├───────────────┼────────────┼─────────────┼─────────────┼──────────────────┤                                                                                                                                                                                                         
  │ AR-0075       │         11 │         654 │          45 │              699 │                                                                                                                                                                                                         
  ├───────────────┼────────────┼─────────────┼─────────────┼──────────────────┤                                                                                                                                                                                                         
  │ AR-0207       │         10 │         627 │          50 │              677 │                                                                                                                                                                                                         
  ├───────────────┼────────────┼─────────────┼─────────────┼──────────────────┤                                                                                                                                                                                                         
  │ AR-0106       │         10 │         625 │          45 │              670 │                                                                                                                                                                                                         
  ├───────────────┼────────────┼─────────────┼─────────────┼──────────────────┤                                                                                                                                                                                                         
  │ AR-0058       │         11 │         607 │          40 │              647 │                                                                                                                                                                                                         
  ├───────────────┼────────────┼─────────────┼─────────────┼──────────────────┤                                                                                                                                                                                                         
  │ AR-0017       │         10 │         589 │          40 │              629 │                                                                                                                                                                                                         
  ├───────────────┼────────────┼─────────────┼─────────────┼──────────────────┤                                                                                                                                                                                                         
  │ AR-0008       │         10 │         569 │          45 │              614 │                                                                                                                                                                                                         
  ├───────────────┼────────────┼─────────────┼─────────────┼──────────────────┤                                                                                                                                                                                                         
  │ AR-0010       │          9 │         565 │          45 │              610 │                                                                                                                                                                                                         
  ├───────────────┼────────────┼─────────────┼─────────────┼──────────────────┤                                                                                                                                                                                                         
  │ AR-0006       │          9 │         565 │          45 │              610 │                                                                                                                                                                                                         
  └───────────────┴────────────┴─────────────┴─────────────┴──────────────────┘                                                                                                                                                                                                         
  ---                                                                                                                                                                                                                                                                                   
  Investigation Files Generated                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        
  1. transfers/data/orphan_sample_point_ids.csv - All 1,993 unique orphan SamplePointIDs                                                                                                                                                                                                
  2. transfers/data/orphan_investigation_report.csv - Full report with downstream record counts per orphan ID                                                                                                                                                                           
                                                                                                                                                                                                                                                                                        
  ---                                                                                                                                                                                                                                                                                   
  Next Steps for Issue #438                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                        
  The investigation tasks from the issue should focus on:                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                                                        
  1. AR- prefix wells (Artesian? region) - Most downstream data impact (~18k records)                                                                                                                                                                                                   
  2. SB- prefix wells - Most orphan IDs (480 unique)                                                                                                                                                                                                                                    
  3. SM- prefix wells - Second most orphan IDs (199 unique)                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                        
  Key questions to answer:                                                                                                                                                                                                                                                              
  - Were these wells deleted from WellData after chemistry data was entered?                                                                                                                                                                                                            
  - Is there a naming mismatch (e.g., different PointID format)?                                                                                                                                                                                                                        
  - Can orphan records be linked via alternate identifiers (WellID, OSEWellID, LocationId)?

Example Orphan Subtree

┌─────────────────────────────────────────────────────────────────────┐
│ LEVEL 0: Thing (MISSING)                                            │
├─────────────────────────────────────────────────────────────────────┤
│ Expected name: ar-1001                                              │
│ Status: NOT FOUND in WellData                                       │
└─────────────────────────────────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────────────────────────────────────────────┐
│ LEVEL 1: ChemistrySampleInfo (ORPHAN)                               │
├─────────────────────────────────────────────────────────────────────┤
│ SamplePointID: AR-1001B                                             │
│ SamplePtID: FA79D688-8BCB-B64F-8E2D-409324D0DD8C                    │
└─────────────────────────────────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────────────────────────────────────────────┐
│ LEVEL 2: MinorTraceChemistry (62 ORPHAN records)                    │
├─────────────────────────────────────────────────────────────────────┤
│ 62 chemistry results with no valid parent chain                     │
└─────────────────────────────────────────────────────────────────────┘

Investigation Tasks

  • Query SQL Server to identify all unique orphan SamplePointIDs
  • Determine why these wells are missing from WellData table:
    • Were they deleted?
    • Were they never created?
    • Is there a naming mismatch (e.g., different PointID format)?
  • Check if orphan records can be linked to existing wells via alternate identifiers (WellID, OSEWellID, LocationId)
  • Determine if missing wells should be added to WellData
  • Document which orphan records are truly unrecoverable vs. fixable

Potential Root Causes

  1. Wells never registered: Chemistry samples collected but well never added to WellData
  2. Wells deleted: Well records removed but child records left behind
  3. Naming mismatches: SamplePointID doesn't match PointID format (e.g., suffix handling)
  4. Data entry errors: Typos in SamplePointID values

Acceptance Criteria

  • Root cause identified for majority of orphan records
  • Decision made on whether to fix source data or document as known data quality issue
  • If fixable, source database updated and transfer re-run to verify
  • Final orphan count documented

Related

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions