feat: Add Nested Document Flattening Support #110

@ReginaldErzoah

Summary

Add nested document flattening support for NoSQL and JSON-like datasets.


Motivation

MongoDB documents often contain nested objects and arrays.

Dift’s comparison engine works best with tabular data, so nested structures need to be normalized into comparable columns.

This feature enables better support for:

  • MongoDB documents
  • nested JSON records
  • semi-structured datasets
  • document schema drift detection

Proposed Improvements

  • Flatten nested objects into dotted column names
  • Preserve nested field paths
  • Add configurable flattening behavior
  • Handle arrays safely
  • Support flattened schema comparison
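A minimal sketch of what such a flattening utility could look like. The function name `flatten_document`, the `sep` parameter, and the choice to serialize arrays as JSON strings are assumptions made for illustration; they are not part of Dift's existing API, and the array strategy is only one possible reading of "safe" handling.

```python
import json
from typing import Any


def flatten_document(
    doc: dict[str, Any], sep: str = ".", parent: str = ""
) -> dict[str, Any]:
    """Flatten nested dicts into a single-level dict keyed by dotted paths."""
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        # Build the full path from the root, e.g. "profile.location.country".
        path = f"{parent}{sep}{key}" if parent else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty nested objects, extending the path.
            flat.update(flatten_document(value, sep=sep, parent=path))
        elif isinstance(value, list):
            # Assumption: keep each array as one column, serialized to a
            # JSON string so it can be compared like a scalar value.
            flat[path] = json.dumps(value, sort_keys=True)
        else:
            # Scalars (and empty dicts) are stored unchanged.
            flat[path] = value
    return flat


record = {"id": 1, "profile": {"name": "Ama", "tags": ["a", "b"]}}
print(flatten_document(record))
# {'id': 1, 'profile.name': 'Ama', 'profile.tags': '["a", "b"]'}
```

Passing a different `sep` (for example `"__"`) is one way the flattening behavior could be made configurable when field names already contain dots.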

Suggested Files

Potential implementation areas:

dift/io/mongodb_reader.py
dift/io/readers.py
dift/utils/
dift/core/schema_diff.py
tests/test_nested_flattening.py
docs/connectors/mongodb.md
docs/statistical-analysis.md

Suggested Tasks

  • Add nested document flattening utility
  • Support dotted field paths
  • Add safe handling for arrays
  • Add tests for nested dictionaries
  • Add tests for mixed document shapes
  • Update documentation
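As an illustration of the mixed-document-shapes task, the sketch below aligns already-flattened records to a shared column list and fills gaps with `None`. The helper `align_columns` and the use of `None` as the missing-value marker are hypothetical; the real test file may look quite different.

```python
from typing import Any


def align_columns(
    flat_docs: list[dict[str, Any]],
) -> tuple[list[str], list[list[Any]]]:
    """Return the union of columns (first-seen order) and one row per document."""
    columns: list[str] = []
    seen: set[str] = set()
    for doc in flat_docs:
        for name in doc:
            if name not in seen:
                seen.add(name)
                columns.append(name)
    # Fields missing from a document become None in its row.
    rows = [[doc.get(name) for name in columns] for doc in flat_docs]
    return columns, rows


def test_mixed_document_shapes() -> None:
    docs = [
        {"id": 1, "profile.name": "Ama"},
        {"id": 2, "profile.location.country": "Ghana"},
    ]
    columns, rows = align_columns(docs)
    assert columns == ["id", "profile.name", "profile.location.country"]
    assert rows == [[1, "Ama", None], [2, None, "Ghana"]]
```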

How to Test

Run:

pytest
ruff check .

Run targeted tests:

pytest tests/test_nested_flattening.py

Manual validation example:

dift nested_old.json nested_new.json --key id

Example nested document:

{
  "id": 1,
  "profile": {
    "name": "Ama",
    "location": {
      "country": "Ghana"
    }
  }
}

Expected flattened fields (the top-level id field is left unchanged):

profile.name
profile.location.country
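The dotted paths above can be derived by walking the document recursively. This is a rough sketch only; `iter_field_paths` is a hypothetical helper name and arrays are treated as leaf values here.

```python
from typing import Any, Iterator


def iter_field_paths(doc: dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of every leaf field in a nested document."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested objects, carrying the path so far.
            yield from iter_field_paths(value, path)
        else:
            yield path


example = {
    "id": 1,
    "profile": {"name": "Ama", "location": {"country": "Ghana"}},
}
print(list(iter_field_paths(example)))
# ['id', 'profile.name', 'profile.location.country']
```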

Verify:

  • nested fields are flattened correctly
  • schema comparison detects nested field changes
  • existing JSON workflows remain stable
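One way to check that nested field changes are detected is to compare the dotted-path schemas of two documents. The sketch below is an assumption about how this could work, not a description of `dift/core/schema_diff.py`; `schema_of`, `diff_schema`, and the added `city` field are made up for the example.

```python
from typing import Any


def schema_of(doc: dict[str, Any], prefix: str = "") -> dict[str, str]:
    """Map each dotted leaf path to the name of its Python type."""
    schema: dict[str, str] = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            schema.update(schema_of(value, path))
        else:
            schema[path] = type(value).__name__
    return schema


def diff_schema(old: dict[str, Any], new: dict[str, Any]) -> dict[str, list[str]]:
    """Report paths that were added, removed, or changed type."""
    old_s, new_s = schema_of(old), schema_of(new)
    shared = old_s.keys() & new_s.keys()
    return {
        "added": sorted(new_s.keys() - old_s.keys()),
        "removed": sorted(old_s.keys() - new_s.keys()),
        "retyped": sorted(p for p in shared if old_s[p] != new_s[p]),
    }


old = {"id": 1, "profile": {"name": "Ama", "location": {"country": "Ghana"}}}
new = {
    "id": 1,
    "profile": {"name": "Ama", "location": {"country": "Ghana", "city": "Accra"}},
}
print(diff_schema(old, new))
# {'added': ['profile.location.city'], 'removed': [], 'retyped': []}
```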

Documentation Impact

Update or create:

docs/connectors/mongodb.md
docs/examples.md
docs/statistical-analysis.md

Documentation should include:

  • flattening behavior
  • dotted field naming
  • array handling limitations
  • examples with nested documents

Acceptance Criteria

  • Nested objects are flattened into dotted column names
  • Flattened fields are comparable
  • Schema diff detects nested field changes
  • Tests pass
  • Documentation updated
