OCR accuracy regression harness #473

Closed
Aisha000-0 wants to merge 2 commits into Pulsefy:main from Aisha000-0:OCR-Accuracy-Regression-Harnes

Conversation

@Aisha000-0

Closes #464

Soter Documentation

Overview

Soter is an open-source, privacy-first platform for humanitarian aid distribution built on Stellar. It combines AI-powered verification, direct recipient claiming, and on-chain transparency to connect donors, NGOs, and recipients without middlemen.

Repository Structure

  • app/
    • ai-service/ - Python-based AI/OCR service and verification logic.
    • backend/ - NestJS API, business logic, Prisma schema, and backend tests.
    • frontend/ - Next.js web application UI and consumer-facing experience.
    • mobile/ - Expo React Native app for field operations and mobile access.
    • onchain/ - Soroban smart contract source, deployment scripts, and Rust tooling.

Key Components

  • app/ai-service/

    • main.py - entry point for the AI service.
    • services/ - contains OCR, fraud detection, preprocessing, and humanitarian verification.
    • schemas/ - Pydantic schemas for request/response validation.
    • tests/ - unit tests for AI service functionality.
  • app/backend/

    • src/ - NestJS controllers, services, and application modules.
    • prisma/ - database schema and migration management.
    • test/ - backend test suites.
  • app/frontend/

    • src/ - React components, pages, API hooks, and UI modules.
    • public/ - static assets.
  • app/mobile/

    • src/ - mobile app screens and components.
    • App.tsx - Expo entry point.
  • app/onchain/

    • contracts/ - Rust contract source and tests.
    • scripts/ - deployment and interaction scripts.

Getting Started

Prerequisites

  • Node.js 18+ and pnpm
  • Python 3.11+ for app/ai-service
  • Rust toolchain for Soroban contracts
  • PostgreSQL or another database supported by Prisma
  • Stellar testnet account and wallet

Setup

  1. Install dependencies:

    pnpm install
  2. Create and activate a Python virtual environment for the AI service (if used):

    python -m venv .venv
    source .venv/bin/activate
    pip install -r app/ai-service/requirements.txt
  3. Configure environment variables in the relevant package folders.

Development

  • Run backend tests:

    pnpm --filter backend test
  • Run AI service tests:

    python -m pytest app/ai-service/tests
  • Run frontend locally:

    pnpm --filter frontend dev
  • Run mobile app locally:

    pnpm --filter mobile start

Contributing

  • Follow existing repository conventions for linting and formatting.
  • Add tests for new features and bug fixes.
  • Open a pull request with a clear summary of changes and testing details.

Notes

  • This repository contains multiple application layers; ensure the appropriate package is targeted when running commands.
  • The README.md file includes additional project-specific overview and setup guidance.

Contributor

@Cedarich Cedarich left a comment

@Aisha000-0

Thanks for opening this PR! I can see you've started work on issue #464 (OCR Accuracy Regression Harness). However, this PR is not yet ready to merge and doesn't fully implement the requirements. Here's what needs to be addressed:

🚨 Critical Issues

1. Virtual Environment Directories Should Not Be Committed

  • Remove .venv/ and .venv-1/ directories from this PR
  • Add them to .gitignore if not already present
  • These should only exist locally on your machine

2. Incomplete Implementation
The PR currently has only a Jupyter notebook (tmp_ocr_inspect.ipynb) for inspecting OCR output. To fully satisfy issue #464, you need:

  • Golden Dataset Definition — Create test fixtures with expected OCR outputs (fields, bounding boxes, confidence scores)
    • Suggested path: app/ai-service/tests/fixtures/ocr_golden_inputs.py
  • Automated Test Harness — Implement actual regression tests
    • Suggested path: app/ai-service/tests/test_ocr_regression.py
    • Should load test images, run OCR, compare against golden data, and report pass/fail
  • Summary Report Logic — Generate reports with pass/fail status and error categorization
  • CI Integration — Add or update a GitHub Actions workflow to run the harness on relevant changes
    • Suggested path: .github/workflows/ocr-regression.yml
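
To make the first two items concrete, here is a minimal sketch of a golden fixture and the comparison step. Everything in it is an assumption for illustration: the names `GoldenField`, `GoldenCase`, and `compare_case`, the sample image path, the field values, and the tolerances are hypothetical and not taken from the existing codebase. The shape of the real OCR output may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenField:
    """Expected OCR output for one field (hypothetical structure)."""
    name: str
    text: str
    bbox: tuple[int, int, int, int]  # (x, y, width, height) in pixels
    min_confidence: float


@dataclass(frozen=True)
class GoldenCase:
    """One golden image and the fields expected from it."""
    image_path: str
    fields: tuple[GoldenField, ...]


# Hypothetical golden case; real values would come from hand-verified images.
GOLDEN_CASES = (
    GoldenCase(
        image_path="tests/fixtures/images/sample_id_card.png",
        fields=(
            GoldenField("full_name", "JANE DOE", (40, 60, 200, 30), 0.90),
            GoldenField("id_number", "A1234567", (40, 110, 160, 30), 0.90),
        ),
    ),
)


def compare_case(case, actual, bbox_tolerance=5):
    """Return a list of mismatch messages; an empty list means the case passes.

    `actual` maps field name -> {"text": str, "bbox": tuple, "confidence": float}.
    """
    errors = []
    for expected in case.fields:
        got = actual.get(expected.name)
        if got is None:
            errors.append(f"{expected.name}: missing field")
            continue
        if got["text"] != expected.text:
            errors.append(f"{expected.name}: text {got['text']!r} != {expected.text!r}")
        # Allow a few pixels of drift per coordinate before flagging the box.
        if any(abs(a - e) > bbox_tolerance for a, e in zip(got["bbox"], expected.bbox)):
            errors.append(f"{expected.name}: bounding box drifted")
        if got["confidence"] < expected.min_confidence:
            errors.append(f"{expected.name}: confidence below threshold")
    return errors
```

A pytest test could then loop over `GOLDEN_CASES`, run the OCR service on each image, and assert that `compare_case` returns an empty list. Comparing bounding boxes with a tolerance rather than exact equality keeps the harness from failing on harmless pixel-level jitter.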

📋 Recommended Next Steps

  1. Remove the committed virtual environment directories from the branch (or reset the branch and start from a fresh one):

    git rm -r --cached .venv .venv-1
    git commit -m "remove venv directories"
  2. Implement the test harness:

    • Define 2–3 golden test images with known OCR outputs
    • Create Python test file with comparison logic
    • Add report generation (JSON or markdown format)
  3. Add CI job to .github/workflows/:

    • Run on changes to app/ai-service/, or more narrowly to app/ai-service/services/ocr.py
    • Fail the build if regressions are detected
  4. Update the PR description with:

    • How to run the harness locally
    • Example test report output
    • Links to any related documentation
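
For the report generation and the "fail the build" behavior in steps 2 and 3, one possible shape is sketched below. The function names `build_report` and `exit_code`, the category labels, and the report keys are hypothetical choices for illustration, not an existing format in this repository.

```python
import json
from collections import Counter


def build_report(results):
    """Summarize regression results as a JSON-serializable dict.

    `results` maps image name -> list of (field, category) mismatches,
    where an empty list means the image passed.
    """
    categories = Counter(
        category
        for mismatches in results.values()
        for _field, category in mismatches
    )
    failed = sorted(name for name, mismatches in results.items() if mismatches)
    return {
        "total": len(results),
        "passed": len(results) - len(failed),
        "failed": failed,
        "errors_by_category": dict(categories),
    }


def exit_code(report):
    """Return a non-zero code when any image regressed, so a CI step can fail."""
    return 1 if report["failed"] else 0


def render(report):
    """Render the report as indented JSON for logs or a build artifact."""
    return json.dumps(report, indent=2, sort_keys=True)
```

A CI step could print `render(report)` and then call `sys.exit(exit_code(report))`, so that any regression fails the job while the JSON stays available as example output for the PR description.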

Would you like guidance on any of these steps? Happy to help! 🚀

@Cedarich Cedarich closed this May 29, 2026

Development

Successfully merging this pull request may close these issues.

OCR Accuracy Regression Harness (Golden Inputs)

2 participants