Describe the task
Replace the current simulated test dataset in conftest.py with a "golden dataset" derived from actual pipeline output to improve test reliability and coverage. The existing test data is artificially small and only simulates real data distributions, which limits the unit tests that need realistic data patterns and sufficient sample sizes. This task involves generating a representative ~1,000 row dataset from full pipeline output, adopting it as the new testing standard, updating all existing tests to use it, and documenting the process for future maintenance and updates.
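One possible way to generate the golden dataset is sketched below, using only the standard library. It is an illustration under stated assumptions, not the project's actual tooling: the function names `sample_golden_rows` and `write_golden_csv`, the `strata_key` parameter, and the choice of CSV as the storage format are all hypothetical. The sketch draws a seeded sample so the dataset can be regenerated, and optionally samples each category in proportion to its share of the full output so that the distribution is roughly preserved. Because of rounding, the stratified result may contain slightly more or fewer than `n` rows.

```python
import csv
import random
from collections import defaultdict


def sample_golden_rows(rows, n=1000, strata_key=None, seed=0):
    """Return roughly n rows sampled reproducibly from pipeline output.

    rows is an iterable of dicts (one per pipeline output row).
    strata_key is a hypothetical column name; if given, each group is
    sampled in proportion to its share of the full output.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be regenerated
    rows = list(rows)
    if len(rows) <= n:
        return rows
    if strata_key is None:
        return rng.sample(rows, n)

    # Group rows by the value of the stratification column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[strata_key]].append(row)

    sampled = []
    for key in sorted(groups, key=str):  # stable order across runs
        members = groups[key]
        # Proportional allocation, keeping at least one row per group.
        quota = max(1, round(n * len(members) / len(rows)))
        sampled.extend(rng.sample(members, min(quota, len(members))))
    return sampled


def write_golden_csv(rows, path):
    """Write sampled rows to a CSV file (assumes rows is non-empty)."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

A fixture in conftest.py could then load the resulting file once per test session and hand it to the tests that currently rely on the simulated data. Recording the seed and the pipeline run used to produce the file would make later regeneration easier to document.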
Acceptance Criteria
conftest.py with the new golden dataset