Caution: Under active development. This project is being piloted at one institution and is not yet ready for general production use at other sites. APIs, schemas, and UI may change without notice.
An AI-assisted web application that helps researchers draft three IRB protocol forms from uploaded study documents:
- HRP-503 — full Human Research Protocol application
- HRP-503c — Human Research Engagement Determination
- HRP-398 — AI Considerations Worksheet (guidance only, not submitted to the IRB)
A single Study automatically creates three submissions — one per form. Documents you upload to the Study are shared across all submissions and drive the AI analysis on each form.
Live deployment: https://ignet.org/irb-assistant/ — pilot tenancy at the University of North Dakota with Sanford Health, funded by NIH/NIGMS through the TRANSCEND RDCDC (P20GM155890).
The screenshots below walk through the full workflow using a fictional Pediatric Asthma Outcomes Study as the sample. No real research data is shown.
Edit study details after creation — correct the nickname, title, PI, summary, or sponsor at any time
Edits propagate to all three child submissions inside a single DB transaction so the per-form headers stay consistent with the Study identity.
The Analyze button is automatically disabled while a run is queued or running, preventing accidental double-submission.
Each top-level question renders as one of 24 typed renderers (textarea, radio, checkbox group, date, file, etc.). AI-suggested answers from a completed run appear inline with their source-document quote. The sidebar's active indicator follows the user's scroll position via IntersectionObserver, and the last-viewed section is persisted to localStorage (keyed by study UUID, PII-free).
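Here is a minimal TypeScript sketch of the two sidebar behaviors described above. All of the following are hypothetical: the `irb:last-section:` key prefix, the `KeyValueStore` interface, and the "topmost visible section wins" policy. In the browser, `window.localStorage` has the same `getItem`/`setItem` shape, and `pickActiveSection()` would be fed from an IntersectionObserver callback.

```typescript
// Hypothetical sketch: persist the last-viewed section per study, and pick the
// active sidebar entry from visibility data. Not the app's actual front-end code.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const KEY_PREFIX = "irb:last-section:"; // assumed key format

function lastSectionKey(studyUuid: string): string {
  // Only the opaque study UUID goes into the key: no titles, names, or other PII.
  return KEY_PREFIX + studyUuid;
}

function saveLastSection(store: KeyValueStore, studyUuid: string, sectionId: string): void {
  store.setItem(lastSectionKey(studyUuid), sectionId);
}

function restoreLastSection(store: KeyValueStore, studyUuid: string, knownSections: string[]): string | null {
  const saved = store.getItem(lastSectionKey(studyUuid));
  // Ignore stale values, e.g. a section that no longer exists in the form definition.
  return saved !== null && knownSections.includes(saved) ? saved : null;
}

// One simple policy for the aria-current indicator: the visible section nearest the top.
function pickActiveSection(entries: { id: string; top: number; visible: boolean }[]): string | null {
  const visible = entries.filter((e) => e.visible).sort((a, b) => a.top - b.top);
  return visible[0]?.id ?? null;
}

// In-memory stand-in for localStorage, handy for tests and non-browser runs.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}
```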
Each guidance entry includes a plain-language description, "Why IRB cares", a worked example, common pitfalls, and source attribution. Content is reviewed against an anti-fabrication deny-list at seed time (no real PI names, grant numbers, or IRB protocols).
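The following hedged sketch makes the idea of a seed-time anti-fabrication deny-list concrete. The two regular expressions are examples picked for illustration: one matches an NIH-style grant number and the other an "IRB #" protocol reference. The name list is a parameter. The project's actual deny-list entries are not published in this README.

```typescript
// Illustrative seed-time check; the patterns are assumptions, not the app's list.
interface DenyRule {
  label: string;
  pattern: RegExp;
}

const DENY_RULES: DenyRule[] = [
  // NIH-style number such as P20GM155890: activity code, institute code, 6-digit serial
  { label: "grant number", pattern: /\b[A-Z]\d{2}[A-Z]{2}\d{6}\b/ },
  // "IRB #1234" or "IRB No. 1234" style protocol references
  { label: "IRB protocol number", pattern: /\bIRB\s*(?:#|No\.?)\s*\d+/i },
];

function findViolations(text: string, deniedNames: string[]): string[] {
  const hits = DENY_RULES.filter((r) => r.pattern.test(text)).map((r) => r.label);
  const lower = text.toLowerCase();
  for (const name of deniedNames) {
    if (lower.includes(name.toLowerCase())) hits.push(`name: ${name}`);
  }
  return hits;
}

// A seeder would refuse to insert any guidance entry that has violations.
function assertSeedable(entryId: string, text: string, deniedNames: string[]): void {
  const hits = findViolations(text, deniedNames);
  if (hits.length > 0) {
    throw new Error(`Guidance ${entryId} failed deny-list: ${hits.join(", ")}`);
  }
}
```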
```
+-------------------------------------------+
| Study (one umbrella per research project) |
+----------+--------+---------------+-------+
           |        |               |
           v        v               v
        HRP-503 HRP-503c         HRP-398
     Submission Submission     Submission
     (full app) (engagement) (AI worksheet)
           ^        ^               ^
           |        |               |
           +--------+---------------+
                    |
                    | Shared documents (PDF / DOCX / TXT)
                    | Encrypted at rest, malware-scanned
                    v
  +----------------------------------+
  | 1. Upload study documents        |
  | 2. Run AI Analyze on a form      |
  |    - Evidence extraction         |
  |    - Optional Assistant drafts   |
  | 3. Review + edit suggestions     |
  | 4. Export filled DOCX            |
  +----------------------------------+
```
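The umbrella relationship in the diagram can be sketched as a small TypeScript data model. This is a hypothetical illustration, not the app's Eloquent models. The `Study`/`Submission` shapes and the `createStudy()` and `uploadDocument()` helpers are invented for the example. Only the three form codes (`hrp503`, `hrp503c`, `hrp398`) come from the `FormCode` enum listed in the repository layout.

```typescript
// Hypothetical model: one Study fans out into one Submission per form code,
// and documents are attached to the Study so every Submission sees them.
type FormCode = "hrp503" | "hrp503c" | "hrp398";

const FORM_CODES: readonly FormCode[] = ["hrp503", "hrp503c", "hrp398"];

interface StudyDocument {
  id: string;
  filename: string;
}

interface Submission {
  formCode: FormCode;
  answers: Map<string, string>; // per-form answers, independent of sibling forms
}

interface Study {
  nickname: string;
  documents: StudyDocument[]; // shared by all three submissions
  submissions: Submission[];
}

function createStudy(nickname: string): Study {
  return {
    nickname,
    documents: [],
    submissions: FORM_CODES.map((formCode) => ({ formCode, answers: new Map<string, string>() })),
  };
}

// One upload is visible to every form, because it lives on the Study.
function uploadDocument(study: Study, doc: StudyDocument): void {
  study.documents.push(doc);
}

const study = createStudy("Pediatric Asthma Outcomes Study");
uploadDocument(study, { id: "d1", filename: "protocol.pdf" });
```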
The AI analysis runs in two modes per submission:
- Strict mode (default for audit-grade workflows). The LLM only proposes answers it can ground in a verbatim quote from your uploaded documents. Every suggestion includes a chunk-level evidence pointer. Fields with no supporting evidence stay blank for you to fill manually.
- Assistant mode. On top of evidence-grounded answers, the LLM generates plain-language draft answers for fields with insufficient evidence, with explicit `[SPECIFY: ...]` placeholders for anything it would otherwise have to invent. Drafts are clearly distinguished from evidence-grounded answers in the UI (amber border, explicit "Accept draft" button).
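The strict-mode rule keeps a suggestion only when its quote can be found in the chunk it cites. That rule might look like the following sketch. Three parts are assumptions: the `Suggestion` and `EvidenceChunk` shapes, and the whitespace normalization. The real server-side check may be stricter or looser than this.

```typescript
// Sketch of a quote-in-chunk grounding filter for strict mode (assumed shapes).
interface EvidenceChunk {
  id: string;
  text: string;
}

interface Suggestion {
  questionId: string;
  answer: string;
  quote: string;
  chunkId: string;
}

function normalize(s: string): string {
  // Document extraction often breaks lines mid-sentence, so collapse whitespace runs.
  return s.replace(/\s+/g, " ").trim();
}

function isGrounded(s: Suggestion, chunks: Map<string, EvidenceChunk>): boolean {
  const chunk = chunks.get(s.chunkId);
  if (!chunk) return false; // cited chunk does not exist
  const quote = normalize(s.quote);
  if (quote.length === 0) return false; // an empty quote proves nothing
  return normalize(chunk.text).includes(quote);
}

// Strict mode: drop ungrounded suggestions so those fields stay blank.
function filterStrict(suggestions: Suggestion[], chunks: EvidenceChunk[]): Suggestion[] {
  const byId = new Map(chunks.map((c): [string, EvidenceChunk] => [c.id, c]));
  return suggestions.filter((s) => isGrounded(s, byId));
}
```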
The Analyze button kicks off a queued background job. A real-time progress modal opens automatically and polls the job state every 2 seconds. Pressing Esc dismisses the modal but keeps polling, Cancel stops the job at the next checkpoint, and partial results that were already saved are kept.
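The client-side rules in that paragraph reduce to a few pure functions. Several parts of this sketch are hypothetical: the status names other than `queued` and `running`, the `UiState` shape, and the helper names. The 2-second interval is the one stated above. The key design point is that polling depends only on job state, so dismissing the modal with Esc does not stop it.

```typescript
// Hypothetical client-side state rules for the analysis modal and Analyze button.
type JobStatus = "queued" | "running" | "completed" | "failed" | "cancelled";

const POLL_INTERVAL_MS = 2000; // poll cadence stated in the README

interface UiState {
  status: JobStatus;
  modalOpen: boolean;
  cancelRequested: boolean;
}

function isActive(status: JobStatus): boolean {
  return status === "queued" || status === "running";
}

// Prevents double-submission while a run is queued or running.
function analyzeButtonDisabled(state: UiState): boolean {
  return isActive(state.status);
}

// Independent of modalOpen: a dismissed modal keeps polling in the background.
function shouldKeepPolling(state: UiState): boolean {
  return isActive(state.status);
}

function onEscape(state: UiState): UiState {
  return { ...state, modalOpen: false };
}

function onCancel(state: UiState): UiState {
  // The server stops at its next checkpoint; results already saved are kept.
  return { ...state, cancelRequested: true };
}
```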
- Three-form multi-submission model — HRP-503, HRP-503c, HRP-398 share Study-level documents and run independently.
- Editable Study details — correct nickname, application title, PI, summary, or sponsor at any time after creation; edits propagate to all three child submissions inside one DB transaction.
- Evidence-backed suggestions — every LLM proposal links to a verbatim chunk from your source PDFs; quote-in-chunk match is enforced server-side.
- Real-time analysis modal with Esc-dismiss, cancel, and step-by-step progress (Prepare → Extract evidence → AI drafts → Save). The Analyze button is automatically disabled while a run is queued or running.
- Collapsible section accordion + sticky sidebar nav across all three form codes: per-section completion badges (N/M answered), an `aria-current` active indicator driven by IntersectionObserver, and the last-viewed section persisted to localStorage (PII-free, keyed by study UUID).
- "About this field" disclosure panel for guidance-authored questions: plain-language description, "Why IRB cares", worked example, common pitfalls, source attribution. Anti-fabrication contract enforced at seed time.
- Section-level navigation for HRP-503's 248 questions across 43 sections, with cross-section trigger gating (sections lock or unlock based on earlier answers).
- Encryption at rest — uploaded documents and LLM payloads are encrypted with XChaCha20-Poly1305; keyring supports rotation.
- Malware scanning via ClamAV with graceful fallback for hosts without ClamAV installed.
- Self-registration → admin approval workflow — new users are blocked from login until an admin reviews and approves their account.
- Multi-provider LLM — OpenAI, OpenAI-compatible (e.g. LM Studio, Ollama, GLM 4.7), with per-tenant SSRF allow-list and audit-redacted base URLs.
- Audit log — every significant action (auth, upload, analysis, export, admin, study edit) is recorded with request context; configurable retention prune keeps the table bounded.
- Template-driven DOCX export that fills Word content controls (SDTs) in the official HRP-503 / HRP-503c templates.
- WCAG 2.1 AA-conformant UI: skip-to-content links, ARIA semantics, `role="alert"` on validation errors, dark-mode parity (including JS-driven state), keyboard-accessible modals with focus management.
- Retention management: automated daily cleanup of expired uploads, exports, and audit events.
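Cross-section trigger gating and the N/M completion badge can be sketched as below. The one-question/one-value unlock rule is a simplifying assumption for illustration. The app's `ConditionalVisibilityEvaluator` is not documented here and may support richer conditions.

```typescript
// Hypothetical section gating and completion counting; rule shape is assumed.
interface Question {
  id: string;
}

interface Section {
  id: string;
  questions: Question[];
  unlockWhen?: { questionId: string; equals: string }; // absent = always unlocked
}

type Answers = Record<string, string | undefined>;

function isUnlocked(section: Section, answers: Answers): boolean {
  if (!section.unlockWhen) return true;
  return answers[section.unlockWhen.questionId] === section.unlockWhen.equals;
}

// Whitespace-only answers do not count toward completion.
function completionBadge(section: Section, answers: Answers): string {
  const answered = section.questions.filter((q) => (answers[q.id] ?? "").trim() !== "").length;
  return `${answered}/${section.questions.length} answered`;
}
```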
| Layer | Technology |
|---|---|
| Backend | Laravel 12 · PHP 8.3 (dev) / PHP 8.2 (prod via Remi) |
| Database | MariaDB 10.x · MySQL 8 |
| Queue | Laravel Queue (Redis driver) with systemd worker |
| Frontend | Blade · Tailwind CSS · Alpine.js |
| Build | Vite (self-hosted Inter via @fontsource/inter) |
| Tests | PHPUnit (527 tests / 1,994 assertions) · Playwright E2E |
| LLM (pilot) | LM Studio on DGX Spark via Tailscale · gemma-4-e4b at 16K context |
- PHP 8.2+ (8.3 is used in development)
- Node.js 18+
- MariaDB 10.x or MySQL 8
- Redis (for the queue)
- An LLM endpoint — OpenAI key, or any OpenAI-compatible local server (LM Studio, Ollama)
```bash
git clone https://github.com/windysky/irb-assistant.git
cd irb-assistant

# Start a user-space MariaDB instance (no sudo required)
./ops/db/start.sh

# Install dependencies first: artisan cannot run until vendor/ exists
composer install
npm ci

# Configure environment
cp .env.example .env
php artisan key:generate

# Initialize the database (migrations + seed admin + bundled HRP templates)
php artisan migrate --seed

# Build frontend assets
npm run build

# Start the dev server
php artisan serve --host=127.0.0.1 --port=8000

# In a second terminal, start the queue worker
php artisan queue:work --tries=1 --timeout=1800
```

Open http://localhost:8000.
| Field | Value |
|---|---|
| Email | `admin@local` |
| Password | `change-me` |

Change this password immediately after first login. To re-seed the admin account:

```bash
php artisan db:seed --class=Database\\Seeders\\AdminUserSeeder
```

Sign in as admin, go to Admin → LLM Providers, click Add provider, and enter:
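- Provider type: OpenAI, OpenAI-compatible, LM Studio, Ollama, or GLM 4.7
- Base URL: e.g. `http://127.0.0.1:1234/v1` for a local LM Studio (loopback URLs are rejected unless `IRB_ALLOW_LLM_LOOPBACK=true`, which is meant for local development only)
- Model: e.g. `google/gemma-4-e4b`
- API key: if required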
For LM Studio specifically, load the model with a 16K+ context window:

```bash
lms unload google/gemma-4-e4b
lms load google/gemma-4-e4b --context-length 16384 --gpu max -y
```

The default prompt config sends up to 40 evidence chunks of 1,200 characters each, plus 20 questions per batch, so a 4K context is too small for non-trivial HRP forms.
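The budget arithmetic behind that recommendation can be worked through in a few lines. The 40-chunk and 1,200-character figures come from the text. The ~4 characters-per-token ratio is a rough rule of thumb for English text, an assumption rather than a property of gemma's tokenizer. The real prompt also carries system instructions, 20 questions, and room for the reply.

```typescript
// Back-of-envelope evidence budget per batch (chars-per-token is an assumption).
const CHUNKS_PER_BATCH = 40;
const CHARS_PER_CHUNK = 1200;
const CHARS_PER_TOKEN = 4; // rough heuristic, not tokenizer-specific

function evidenceTokens(chunks: number, charsPerChunk: number, charsPerToken: number): number {
  return Math.ceil((chunks * charsPerChunk) / charsPerToken);
}

const tokens = evidenceTokens(CHUNKS_PER_BATCH, CHARS_PER_CHUNK, CHARS_PER_TOKEN);
// 40 * 1200 = 48,000 characters, about 12,000 tokens of evidence alone.
console.log(tokens, tokens <= 4096, tokens <= 16384); // 12000 false true
```

At roughly 12,000 tokens, the evidence alone is about three times a 4,096-token window. A 16,384-token window leaves about 4,400 tokens for the prompt scaffolding, the questions, and the response.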
Environment variables (see .env.example for the full list):
| Variable | Default | Description |
|---|---|---|
| `IRB_PDF_PARSER_MEMORY_MB` | `256` | Memory limit (MB) for the PDF fallback parser |
| `IRB_FILE_ENCRYPTION_KEYS` | (none) | Pipe-separated keyring for file at-rest encryption |
| `IRB_FILE_ENCRYPTION_ACTIVE_KEY_ID` | (none) | Active key ID for new encryptions |
| `IRB_DRAFTING_MAX_PER_RUN` | `20` | Max AI-drafted answers per Assistant-mode run |
| `IRB_ALLOW_LLM_LOOPBACK` | `false` | Local dev only: permits `127.0.0.1` base URLs for LLM providers |
| `IRB_LLM_HTTP_TIMEOUT` | `600` | LLM HTTP request timeout (seconds) |
| `SESSION_LIFETIME` | `60` | Idle session timeout (minutes) |
| `SESSION_EXPIRE_ON_CLOSE` | `true` | Sessions are invalidated when the browser closes |
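A common pattern for keyring rotation has three parts. Every key stays in the ring so old files remain readable. New files are encrypted under the active key. Each ciphertext records the ID of the key that produced it. The sketch below shows only this key-selection logic. The `<keyId>:<secret>` entry format is an assumption, since the table says only that entries are pipe-separated. The XChaCha20-Poly1305 encryption itself is out of scope.

```typescript
// Hypothetical keyring parsing and key selection; entry format is assumed.
interface Keyring {
  activeKeyId: string;
  keys: Map<string, string>;
}

function parseKeyring(raw: string, activeKeyId: string): Keyring {
  const keys = new Map<string, string>();
  for (const entry of raw.split("|").map((e) => e.trim()).filter((e) => e !== "")) {
    const sep = entry.indexOf(":");
    if (sep <= 0 || sep === entry.length - 1) {
      throw new Error("Malformed keyring entry (expected id:secret)"); // never echo secrets
    }
    const id = entry.slice(0, sep);
    if (keys.has(id)) throw new Error(`Duplicate key id ${id}`);
    keys.set(id, entry.slice(sep + 1));
  }
  if (!keys.has(activeKeyId)) {
    throw new Error(`Active key id ${activeKeyId} is not in the keyring`);
  }
  return { activeKeyId, keys };
}

// New files: always the active key, and the key id is stored with the ciphertext.
function keyForEncrypt(ring: Keyring): { id: string; secret: string } {
  return { id: ring.activeKeyId, secret: ring.keys.get(ring.activeKeyId)! };
}

// Existing files: whichever key id they were tagged with at encryption time.
function keyForDecrypt(ring: Keyring, storedKeyId: string): string {
  const secret = ring.keys.get(storedKeyId);
  if (secret === undefined) throw new Error(`Unknown key id ${storedKeyId}`);
  return secret;
}
```

To rotate under this scheme, add the new key to the ring, switch the active ID, and keep the old entry until no stored file references it.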
The app supports deployment under a URL prefix (e.g. /irb-assistant on a multi-tenant host). Set:
```ini
APP_URL=https://example.org/irb-assistant
```

Then build assets with the matching prefix so the CSS `url()` font references resolve correctly:

```bash
VITE_APP_BASE=/irb-assistant/build/ npm run build
```

- Registration is public by default, but new accounts are created with `is_approved=false` and cannot log in until an admin clicks Approve on the Admin → Users tab.
- Sessions idle-expire after 60 minutes and are invalidated when the browser closes; no "Remember me" option is offered. This is intentional, because the app handles study PII.
- Password reset is rate-limited (5 attempts per minute), as are login, registration, document upload, and analysis dispatch.
- Admins have a non-deletable "Demote" action that prevents accidental lockout.
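The "5 attempts per minute" policy can be illustrated with a fixed-window counter. This sketch models the policy only. It is not Laravel's `RateLimiter`, and the `route|ip` key format is an assumption. A production limiter would keep its counters in a shared store, such as the Redis instance the queue already uses, rather than in process memory.

```typescript
// Fixed-window rate limiter sketch (policy illustration, in-memory only).
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the attempt is allowed, false if it should be rejected (HTTP 429).
  attempt(key: string, nowMs: number): boolean {
    const w = this.windows.get(key);
    if (!w || nowMs - w.start >= this.windowMs) {
      this.windows.set(key, { start: nowMs, count: 1 }); // start a fresh window
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}

const loginLimiter = new FixedWindowLimiter(5, 60_000);
const key = "login|203.0.113.7"; // hypothetical route|ip key
const results = [0, 1, 2, 3, 4, 5].map((i) => loginLimiter.attempt(key, i * 1000));
console.log(results.join(",")); // true,true,true,true,true,false
console.log(loginLimiter.attempt(key, 60_000)); // true (a new window has started)
```

A fixed window is the simplest choice to reason about. Its known trade-off is that a client can make up to twice the limit across a window boundary.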
```bash
# Full PHPUnit suite (456 tests, 1,405 assertions)
php artisan test

# E2E (Playwright)
npx playwright test

# Smoke test against a live LM Studio instance over Tailscale (opt-in)
IRB_RUN_LIVE_LLM=1 npx playwright test lm-studio-smoke.spec.ts --workers=1
```

CI runs the full suite + `vendor/bin/pint --test` + `npm run build` on every push.
- Documents stored outside the web root with optional XChaCha20-Poly1305 encryption.
- Malware scanning via ClamAV; gracefully falls back to a quarantine-only mode when ClamAV is not installed.
- LLM provider base URL is DNS-resolved server-side and rejected if it points at private IP space, IPv6 loopback / link-local / ULA, 6to4 / Teredo translation prefixes, non-decimal IPv4 literals, or IPv4-mapped IPv6. Tailscale's 100.64.0.0/10 range is allowed by design. See `SECURITY_CHECKLIST.md` for the full SSRF posture.
- Rate limiting on all auth + sensitive routes (5 requests per minute).
- LLM request and response payloads are stored with sensitive parts redacted in the JSON column; the full payload is kept encrypted for audit.
- Audit log covers auth events, document uploads, analysis runs, exports, and all admin actions.
- Project deletion redacts audit payloads while preserving event records (regulatory-friendly).
- @MX code-level annotations document invariants, danger zones, and incomplete work for downstream AI agents.
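The IPv4 side of the SSRF rule above can be sketched as follows. The sketch assumes the check runs on every address the provider hostname resolves to. It omits DNS resolution and the IPv6 cases. The blocked ranges are the standard "this network", private, loopback, and link-local blocks. Treat the list as an assumption rather than the app's exact rule set. 100.64.0.0/10 is simply absent, matching the documented Tailscale exception. The `allowLoopback` parameter stands in for `IRB_ALLOW_LLM_LOOPBACK`.

```typescript
// Sketch of an IPv4 SSRF check; range list and structure are assumptions.

// Accept only canonical dotted-decimal (4 parts, 0-255, no leading zeros).
// Rejects forms like "0x7f.0.0.1", "0177.0.0.1", and "2130706433".
function parseDottedDecimal(address: string): number | null {
  const parts = address.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const p of parts) {
    if (!/^(0|[1-9]\d{0,2})$/.test(p)) return null;
    const n = Number(p);
    if (n > 255) return null;
    value = value * 256 + n;
  }
  return value;
}

// [base, prefix length]; 100.64.0.0/10 (Tailscale) is deliberately not listed.
const BLOCKED_V4: [string, number][] = [
  ["0.0.0.0", 8], // "this network"
  ["10.0.0.0", 8], // private
  ["127.0.0.0", 8], // loopback
  ["169.254.0.0", 16], // link-local, incl. cloud metadata endpoints
  ["172.16.0.0", 12], // private
  ["192.168.0.0", 16], // private
];

function inCidr(ip: number, base: string, prefix: number): boolean {
  const baseInt = parseDottedDecimal(base);
  if (baseInt === null) throw new Error(`Bad CIDR base ${base}`);
  const blockSize = 2 ** (32 - prefix); // addresses per block
  return Math.floor(ip / blockSize) === Math.floor(baseInt / blockSize);
}

function checkIPv4(address: string, allowLoopback = false): "allow" | "reject" {
  const ip = parseDottedDecimal(address);
  if (ip === null) return "reject"; // malformed or non-decimal literal
  if (allowLoopback && inCidr(ip, "127.0.0.0", 8)) return "allow";
  return BLOCKED_V4.some(([base, prefix]) => inCidr(ip, base, prefix)) ? "reject" : "allow";
}
```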
For the full security posture see SECURITY_CHECKLIST.md and the deploy playbook at ops/DEPLOYMENT_CHECKLIST.md.
The reference production deployment is on RHEL 9 + Apache 2.4 + PHP 8.2 (Remi side-by-side) + MariaDB. The public templates in ops/apache/ and ops/db/ cover the vhost, FPM pool, and database setup. Site-specific deploy automation (rsync wrapper, vhost installer, secrets handling) is maintained in a separate internal repository.
The queue worker is supervised by systemd:
```ini
# /etc/systemd/system/irb-queue.service
[Service]
ExecStart=/opt/remi/php82/root/usr/bin/php artisan queue:work --tries=1 --timeout=1800 --sleep=3 --max-time=3600
WorkingDirectory=/data/var/www/html/irb-assistant
```

Repository layout:

```
.
+-- app/
|   +-- Console/Commands/     # Retention prune, template control dump
|   +-- Enums/                # FormCode backed enum (hrp503 | hrp503c | hrp398)
|   +-- Http/Controllers/     # Auth, Study, Submission, Admin, Export
|   +-- Http/Middleware/      # EnsureUserIsAdmin, EnsureUserIsActive
|   +-- Jobs/                 # AnalyzeSubmissionJob (queued LLM pipeline)
|   +-- Models/               # Study, Submission, SubmissionAnswer, FormDefinition, ...
|   +-- Services/             # SubmissionAnalysisService, LlmChatService,
|                             # AnswerValidator, ConditionalVisibilityEvaluator,
|                             # SubmissionDraftingService, SubmissionDocxExportService,
|                             # FileEncryptionService, MalwareScanService, AuditService
+-- database/
|   +-- migrations/           # Phase 3 canonical schema (studies + submissions)
|   +-- seeders/              # Admin, FormDefinition, Hrp398FieldDefinitions, Templates
+-- resources/
|   +-- templates/            # HRP-503.docx + HRP-503c.docx + HRP-398.docx
|   +-- views/                # Blade (studies, submissions, admin, auth, layouts)
|       +-- submissions/types/  # 24 question-type partials (radio, checkbox, textarea, ...)
+-- routes/                   # web.php (thin) + auth.php
+-- tests/
|   +-- Feature/FormsV2/      # Phase 3/4/5/6/8 integration tests
|   +-- Unit/                 # Service + helper tests
+-- ops/                      # Public deploy templates (Apache vhost, FPM pool, MariaDB user-space scripts)
    +-- db/                   # User-space MariaDB start/stop scripts
    +-- apache/               # Apache vhost samples
```
Pilot phase is closed-loop with the UND clinical research group + Sanford collaborators. After the pilot we plan:
- Per-field version history (every edit preserved, not just current value).
- Staging environment alongside production.
- Sentry error tracking + UptimeRobot HTTP monitoring.
- Curated `field_guidance` table: plain-English explainer + redacted example + common pitfalls per HRP field.
- Optional SSO via institutional Shibboleth/SAML.
Open work is tracked internally.
Developed by Dr. Junguk Hur (@hurlab) at the University of North Dakota School of Medicine and Health Sciences in collaboration with Sanford Health.
This work is supported by NIH/NIGMS through the TRANSCEND RDCDC (P20GM155890).
This project is currently distributed under a research-pilot license. Contact the maintainer for evaluation use. See LICENSE once it is added.