Summary
Add nested document flattening support for NoSQL and JSON-like datasets.
Motivation
MongoDB documents often contain nested objects and arrays.
Dift’s comparison engine works best with tabular data, so nested structures need to be normalized into comparable columns.
This feature enables better support for:
- MongoDB documents
- nested JSON records
- semi-structured datasets
- document schema drift detection
Proposed Improvements
- Flatten nested objects into dotted column names
- Preserve nested field paths
- Add configurable flattening behavior
- Handle arrays safely
- Support flattened schema comparison
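The improvements above could be sketched roughly as follows. This is a minimal illustration and not Dift's actual implementation: the function name `flatten_document`, the `sep` parameter, and the `arrays` option with its two modes are all hypothetical. Serializing arrays to a JSON string is only one possible reading of "handle arrays safely".

```python
from __future__ import annotations

import json
from typing import Any


def flatten_document(
    doc: dict[str, Any], sep: str = ".", arrays: str = "json"
) -> dict[str, Any]:
    """Flatten nested dicts into a single-level dict keyed by dotted paths.

    Hypothetical array modes (assumptions, not an existing Dift option):
      arrays="json"  -> keep each list as one column, serialized to a JSON string
      arrays="index" -> expand list items into indexed paths such as tags.0
    """
    if arrays not in ("json", "index"):
        raise ValueError(f"unknown arrays mode: {arrays!r}")

    flat: dict[str, Any] = {}

    def walk(value: Any, path: str) -> None:
        if isinstance(value, dict) and value:
            # Non-empty nested object: recurse, extending the dotted path.
            for key, child in value.items():
                walk(child, f"{path}{sep}{key}")
        elif isinstance(value, list) and value and arrays == "index":
            # Expand each element under its position in the list.
            for index, child in enumerate(value):
                walk(child, f"{path}{sep}{index}")
        elif isinstance(value, list):
            # Store the list as one comparable string; default=str covers
            # values json cannot encode natively (e.g. datetimes).
            flat[path] = json.dumps(value, sort_keys=True, default=str)
        else:
            # Scalars, None, and empty dicts are kept as leaf values.
            flat[path] = value

    for key, value in doc.items():
        walk(value, str(key))
    return flat


doc = {"id": 1, "profile": {"name": "Ama", "location": {"country": "Ghana"}}}
print(flatten_document(doc))
```

Keeping arrays as a single serialized column avoids an unbounded number of columns when list lengths differ between documents, at the cost of reporting any change inside the list as a change to the whole value. The indexed mode trades that the other way.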
Suggested Files
Potential implementation areas:
dift/io/mongodb_reader.py
dift/io/readers.py
dift/utils/
dift/core/schema_diff.py
tests/test_nested_flattening.py
docs/connectors/mongodb.md
docs/statistical-analysis.md
Suggested Tasks
- Add nested document flattening utility
- Support dotted field paths
- Add safe handling for arrays
- Add tests for nested dictionaries
- Add tests for mixed document shapes
- Update documentation
How to Test
Run the full test suite and linter:
pytest
ruff check .
Run targeted tests:
pytest tests/test_nested_flattening.py
Manual validation example:
dift nested_old.json nested_new.json --key id
Example nested document:
{
  "id": 1,
  "profile": {
    "name": "Ama",
    "location": {
      "country": "Ghana"
    }
  }
}
Expected flattened fields:
profile.name
profile.location.country
Verify:
- nested fields are flattened correctly
- schema comparison detects nested field changes
- existing JSON workflows remain stable
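One way to check that schema comparison picks up nested field changes is to compare the sets of dotted paths on each side. The sketch below assumes the records have already been flattened into dotted keys. The helper names `collect_schema` and `diff_schema`, the result layout, and the sample records are hypothetical and do not reflect the existing API of dift/core/schema_diff.py.

```python
from __future__ import annotations

from typing import Any, Iterable


def collect_schema(records: Iterable[dict[str, Any]]) -> dict[str, set[str]]:
    """Map each dotted field path to the set of Python type names seen for it."""
    schema: dict[str, set[str]] = {}
    for record in records:
        for path, value in record.items():
            schema.setdefault(path, set()).add(type(value).__name__)
    return schema


def diff_schema(
    old: Iterable[dict[str, Any]], new: Iterable[dict[str, Any]]
) -> dict[str, list[str]]:
    """Report dotted paths that were added, removed, or changed observed types."""
    old_schema = collect_schema(old)
    new_schema = collect_schema(new)
    shared = old_schema.keys() & new_schema.keys()
    return {
        "added": sorted(new_schema.keys() - old_schema.keys()),
        "removed": sorted(old_schema.keys() - new_schema.keys()),
        "type_changed": sorted(p for p in shared if old_schema[p] != new_schema[p]),
    }


# Made-up flattened records with mixed shapes, for illustration only.
old_records = [{"id": 1, "profile.name": "Ama", "profile.location.country": "Ghana"}]
new_records = [
    {"id": 1, "profile.name": "Ama", "profile.location.city": "Accra"},
    {"id": "2", "profile.name": "Kofi"},
]
print(diff_schema(old_records, new_records))
```

Because the schema is built from the union of paths across all records, documents with different shapes in the same dataset do not raise errors. A path missing from some documents simply appears with the types seen where it is present.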
Documentation Impact
Update or create:
docs/connectors/mongodb.md
docs/examples.md
docs/statistical-analysis.md
Documentation should include:
- flattening behavior
- dotted field naming
- array handling limitations
- examples with nested documents
Acceptance Criteria
- Nested objects are flattened into dotted column names
- Flattened fields can be compared like regular columns
- Schema diff detects nested field changes
- Tests pass
- Documentation updated