OCR accuracy regression harness #473
Closed
Aisha000-0 wants to merge 2 commits into
…ction and data extraction
Cedarich
requested changes
May 26, 2026
Thanks for opening this PR! I can see you've started work on issue #464 (OCR Accuracy Regression Harness). However, this PR is not yet ready to merge and doesn't fully implement the requirements. Here's what needs to be addressed:
🚨 Critical Issues
1. Virtual Environment Directories Should Not Be Committed
- Remove .venv/ and .venv-1/ directories from this PR
- Add them to .gitignore if not already present
- These should only exist locally on your machine
2. Incomplete Implementation
The PR currently has only a Jupyter notebook (tmp_ocr_inspect.ipynb) for inspecting OCR output. To fully satisfy issue #464, you need:
- ✅ Golden Dataset Definition: Create test fixtures with expected OCR outputs (fields, bounding boxes, confidence scores)
  - Suggested path: app/ai-service/tests/fixtures/ocr_golden_inputs.py
- ✅ Automated Test Harness: Implement actual regression tests
  - Suggested path: app/ai-service/tests/test_ocr_regression.py
  - Should load test images, run OCR, compare against golden data, and report pass/fail
- ✅ Summary Report Logic: Generate reports with pass/fail status and error categorization
- ✅ CI Integration: Add or update a GitHub Actions workflow to run the harness on relevant changes
  - Suggested path: .github/workflows/ocr-regression.yml
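To make the golden dataset and comparison step more concrete, here is a minimal sketch of one possible shape. Everything in it is an assumption for illustration: the `GoldenField` structure, the `compare_to_golden` helper, the error category names, the `(x, y, width, height)` box format, and the 0.8 overlap threshold are hypothetical and not taken from the existing service code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenField:
    """Expected OCR result for one field (hypothetical structure)."""
    name: str
    text: str
    bbox: tuple  # assumed format: (x, y, width, height) in pixels
    min_confidence: float


def iou(a, b):
    """Intersection-over-union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def compare_to_golden(golden, actual, min_iou=0.8):
    """Compare OCR output against golden fields.

    `actual` maps a field name to a dict with "text", "bbox", "confidence".
    Returns a list of (field_name, error_category); an empty list means pass.
    """
    errors = []
    for field in golden:
        got = actual.get(field.name)
        if got is None:
            errors.append((field.name, "missing_field"))
            continue
        if got["text"].strip() != field.text:
            errors.append((field.name, "text_mismatch"))
        if iou(got["bbox"], field.bbox) < min_iou:
            errors.append((field.name, "bbox_drift"))
        if got["confidence"] < field.min_confidence:
            errors.append((field.name, "low_confidence"))
    return errors
```

A test in test_ocr_regression.py could then run OCR on each fixture image, adapt the output to the `actual` shape, and assert that `compare_to_golden` returns an empty list. Exact string matching is the strictest choice; a character error rate threshold may suit noisy scans better.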
📋 Recommended Next Steps
- Delete/reset this branch and create a fresh one:
    git rm -r .venv .venv-1
    git commit -m "remove venv directories"
- Implement the test harness:
  - Define 2–3 golden test images with known OCR outputs
  - Create Python test file with comparison logic
  - Add report generation (JSON or markdown format)
- Add CI job to .github/workflows/:
  - Run on changes to app/ai-service/ or app/ai-service/services/ocr.py
  - Fail the build if regressions are detected
- Update the PR description with:
  - How to run the harness locally
  - Example test report output
  - Links to any related documentation
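For the report generation and "fail the build" steps, a rough sketch in Python might look like the following. The `build_report` and `exit_code` names, the report keys, and the input shape (image name mapped to a list of `(field, error_category)` tuples) are illustrative assumptions, not an existing interface in this repository.

```python
import json
from collections import Counter


def build_report(results):
    """Summarize per-image results.

    `results` maps an image name to a list of (field, error_category)
    tuples; an empty list means that image passed.
    """
    categories = Counter(cat for errs in results.values() for _, cat in errs)
    failed = sorted(name for name, errs in results.items() if errs)
    return {
        "total": len(results),
        "passed": len(results) - len(failed),
        "failed": failed,
        "errors_by_category": dict(categories),
    }


def exit_code(report):
    """Non-zero when any image regressed, so a CI step can fail the build."""
    return 1 if report["failed"] else 0


# Hypothetical sample data, only to show the report shape.
sample = {
    "id_card.png": [],
    "receipt.png": [("total", "text_mismatch"), ("date", "bbox_drift")],
}
report = build_report(sample)
print(json.dumps(report, indent=2))
```

In a CI job, the script would end with `sys.exit(exit_code(report))` so that any regression marks the workflow run as failed, and the JSON output could be pasted into the PR description as the example report.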
Would you like guidance on any of these steps? Happy to help! 🚀
Closes #464
Soter Documentation
Overview
Soter is an open-source, privacy-first platform for humanitarian aid distribution built on Stellar. It combines AI-powered verification, direct recipient claiming, and on-chain transparency to connect donors, NGOs, and recipients without middlemen.
Repository Structure
- app/
  - ai-service/ - Python-based AI/OCR service and verification logic.
  - backend/ - NestJS API, business logic, Prisma schema, and backend tests.
  - frontend/ - Next.js web application UI and consumer-facing experience.
  - mobile/ - Expo React Native app for field operations and mobile access.
  - onchain/ - Soroban smart contract source, deployment scripts, and Rust tooling.
Key Components
- app/ai-service/
  - main.py - entry point for the AI service.
  - services/ - contains OCR, fraud detection, preprocessing, and humanitarian verification.
  - schemas/ - Pydantic schemas for request/response validation.
  - tests/ - unit tests for AI service functionality.
- app/backend/
  - src/ - NestJS controllers, services, and application modules.
  - prisma/ - database schema and migration management.
  - test/ - backend test suites.
- app/frontend/
  - src/ - React components, pages, API hooks, and UI modules.
  - public/ - static assets.
- app/mobile/
  - src/ - mobile app screens and components.
  - App.tsx - Expo entry point.
- app/onchain/
  - contracts/ - Rust contract source and tests.
  - scripts/ - deployment and interaction scripts.
Getting Started
Prerequisites
- pnpm
- app/ai-service
Setup
Install dependencies:
Activate Python environment for AI service (if used):
    python -m venv .venv
    source .venv/bin/activate
    pip install -r app/ai-service/requirements.txt
Configure environment variables in the relevant package folders.
Development
Run backend tests:
    pnpm --filter backend test
Run AI service tests:
Run frontend locally:
Run mobile app locally:
Contributing
Notes
README.md file includes additional project-specific overview and setup guidance.