Ocr accuracy regression harnes by Aisha000-0 · Pull Request #473 · Pulsefy/Soter

Aisha000-0 · 2026-05-26T13:36:04Z

Closes #464

Soter Documentation

Overview

Soter is an open-source, privacy-first platform for humanitarian aid distribution built on Stellar. It combines AI-powered verification, direct recipient claiming, and on-chain transparency to connect donors, NGOs, and recipients without middlemen.

Repository Structure

app/
- ai-service/ - Python-based AI/OCR service and verification logic.
- backend/ - NestJS API, business logic, Prisma schema, and backend tests.
- frontend/ - Next.js web application UI and consumer-facing experience.
- mobile/ - Expo React Native app for field operations and mobile access.
- onchain/ - Soroban smart contract source, deployment scripts, and Rust tooling.

Key Components

app/ai-service/
- main.py - entry point for the AI service.
- services/ - contains OCR, fraud detection, preprocessing, and humanitarian verification.
- schemas/ - Pydantic schemas for request/response validation.
- tests/ - unit tests for AI service functionality.
app/backend/
- src/ - NestJS controllers, services, and application modules.
- prisma/ - database schema and migration management.
- test/ - backend test suites.
app/frontend/
- src/ - React components, pages, API hooks, and UI modules.
- public/ - static assets.
app/mobile/
- src/ - mobile app screens and components.
- App.tsx - Expo entry point.
app/onchain/
- contracts/ - Rust contract source and tests.
- scripts/ - deployment and interaction scripts.

Getting Started

Prerequisites

Node.js 18+ and pnpm
Python 3.11+ for app/ai-service
Rust toolchain for Soroban contracts
PostgreSQL or another database supported by Prisma
Stellar testnet account and wallet

Setup

Install dependencies:
```
pnpm install
```

Create and activate a Python virtual environment for the AI service (if used):
```
python -m venv .venv
source .venv/bin/activate
pip install -r app/ai-service/requirements.txt
```

Configure environment variables in the relevant package folders.

Development

Run backend tests:
```
pnpm --filter backend test
```
Run AI service tests:
```
python -m pytest app/ai-service/tests
```
Run frontend locally:
```
pnpm --filter frontend dev
```
Run mobile app locally:
```
pnpm --filter mobile start
```

Contributing

Follow existing repository conventions for linting and formatting.
Add tests for new features and bug fixes.
Open a pull request with a clear summary of changes and testing details.

Notes

This repository contains multiple application layers; ensure the appropriate package is targeted when running commands.
The README.md file includes additional project-specific overview and setup guidance.

Cedarich

@Aisha000-0

Thanks for opening this PR! I can see you've started work on issue #464 (OCR Accuracy Regression Harness). However, this PR is not yet ready to merge and doesn't fully implement the requirements. Here's what needs to be addressed:

🚨 Critical Issues

1. Virtual Environment Directories Should Not Be Committed

Remove .venv/ and .venv-1/ directories from this PR
Add them to .gitignore if not already present
These should only exist locally on your machine

2. Incomplete Implementation
The PR currently has only a Jupyter notebook (tmp_ocr_inspect.ipynb) for inspecting OCR output. To fully satisfy issue #464, you need:

✅ Golden Dataset Definition — Create test fixtures with expected OCR outputs (fields, bounding boxes, confidence scores)
- Suggested path: app/ai-service/tests/fixtures/ocr_golden_inputs.py
✅ Automated Test Harness — Implement actual regression tests
- Suggested path: app/ai-service/tests/test_ocr_regression.py
- Should load test images, run OCR, compare against golden data, and report pass/fail
✅ Summary Report Logic — Generate reports with pass/fail status and error categorization
✅ CI Integration — Add or update a GitHub Actions workflow to run the harness on relevant changes
- Suggested path: .github/workflows/ocr-regression.yml
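
As a rough illustration of the golden dataset and comparison items above, a fixture entry and its check might look like the sketch below. Everything named here is an assumption made for illustration: the `GoldenField` schema, the shape of the OCR output (a dict keyed by field name with `text`, `bbox`, and `confidence`), the error category names, and the 0.8 IoU threshold. None of it is taken from the Soter codebase.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenField:
    """Expected OCR result for one field (hypothetical schema)."""
    name: str
    text: str
    bbox: tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
    min_confidence: float


def iou(a, b) -> float:
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def compare_field(golden: GoldenField, actual, iou_threshold: float = 0.8) -> list[str]:
    """Return the error categories for one field; an empty list means pass."""
    if actual is None:
        return ["missing_field"]
    errors = []
    if actual["text"].strip() != golden.text:
        errors.append("text_mismatch")
    if iou(golden.bbox, tuple(actual["bbox"])) < iou_threshold:
        errors.append("bbox_drift")
    if actual["confidence"] < golden.min_confidence:
        errors.append("low_confidence")
    return errors


def compare_document(golden_fields, ocr_output, iou_threshold: float = 0.8) -> dict[str, list[str]]:
    """Compare every golden field against the OCR output for one image."""
    return {
        g.name: compare_field(g, ocr_output.get(g.name), iou_threshold)
        for g in golden_fields
    }


# Illustrative golden data for a single test image (values are made up).
GOLDEN = [
    GoldenField("full_name", "Jane Doe", (10, 10, 110, 40), 0.90),
    GoldenField("id_number", "A1234567", (10, 50, 110, 80), 0.90),
]
```

A pytest case in the suggested `test_ocr_regression.py` could then run the real OCR service on each fixture image and assert that every list returned by `compare_document` is empty.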

📋 Recommended Next Steps

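Remove the committed virtual environment directories from this branch, or reset it and start from a fresh branch. The `--cached` flag untracks the directories without deleting them from your machine:
```
git rm -r --cached .venv .venv-1
git commit -m "remove venv directories"
```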

Implement the test harness:
- Define 2–3 golden test images with known OCR outputs
- Create Python test file with comparison logic
- Add report generation (JSON or markdown format)
Add CI job to .github/workflows/:
- Run on changes to app/ai-service/ or app/ai-service/services/ocr.py
- Fail the build if regressions are detected
Update the PR description with:
- How to run the harness locally
- Example test report output
- Links to any related documentation
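
The report generation and "fail the build" steps could be sketched roughly as follows. This is a minimal example under stated assumptions: the input shape (image name mapped to field name mapped to a list of error categories), the report keys, and the category names are all hypothetical and would need to match whatever the real harness produces.

```python
import json
from collections import Counter


def build_report(results: dict) -> dict:
    """Summarise harness results.

    `results` maps image name -> field name -> list of error categories
    (an empty list means the field passed). This shape is an assumption.
    """
    failures = {}
    for image, fields in results.items():
        failed_fields = {name: errs for name, errs in fields.items() if errs}
        if failed_fields:
            failures[image] = failed_fields

    # Count how often each error category occurs across all images.
    categories = Counter(
        err
        for fields in results.values()
        for errs in fields.values()
        for err in errs
    )
    return {
        "status": "fail" if failures else "pass",
        "total_images": len(results),
        "failed_images": len(failures),
        "errors_by_category": dict(sorted(categories.items())),
        "failures": failures,
    }


def exit_code(report: dict) -> int:
    """Non-zero when any regression was detected, so a CI step fails."""
    return 1 if report["status"] == "fail" else 0


def render_json(report: dict) -> str:
    """Serialise the report for upload as a CI artifact or PR comment."""
    return json.dumps(report, indent=2)
```

A CI step could write `render_json(report)` to a file and finish with `sys.exit(exit_code(report))`, so that a detected regression fails the job.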

Would you like guidance on any of these steps? Happy to help! 🚀

Aisha000-0 added 2 commits May 26, 2026 13:24

feat(ocr-inspection): add Jupyter notebook for OCR golden input inspection and data extraction (efbc5a1)

feat(venv): add Python virtual environment configuration and necessary binaries (81a6aa9)

Cedarich requested changes May 26, 2026

Cedarich closed this May 29, 2026

Aisha000-0 wants to merge 2 commits into Pulsefy:main from Aisha000-0:OCR-Accuracy-Regression-Harnes
