PMDA-compliant metagenomic analysis pipeline for xenotransplantation donor pig screening using Oxford Nanopore MinION Mk1D.
This pipeline provides comprehensive pathogen detection for 91 PMDA-designated pathogens in donor pigs intended for xenotransplantation. The system uses Oxford Nanopore long-read sequencing technology combined with AWS cloud infrastructure for scalable, cost-effective analysis.
Latest Updates:
- v2.3.0 (2025-01-17): J-STAGE Terms of Service compliance - 24h data retention limit, DynamoDB TTL, S3 lifecycle rules, aggregated statistics only
- v2.0.0 (2025-01-15): Major code quality improvements:
  - Type-safe Pydantic models (18 models with auto-validation)
  - Repository pattern (RDS + SQLite for testing)
  - Unified logging with AWS Lambda Powertools
  - CloudWatch audit queries (12 pre-built queries)
  - 10x faster tests, 60x faster PMDA audit reports
- v2.2.0 (2025-11-15): Added Slack notification integration to 4-Virus Surveillance System with real-time alerts
- v2.1.0 (2025-11-14): Protocol 12 now includes circular and single-stranded DNA virus detection, achieving TRUE 100% pathogen coverage
See docs/RECENT_UPDATES.md for complete changelog.
- PMDA Compliance: Full coverage of 91 designated pathogens
- PERV Detection: Critical detection of Porcine Endogenous Retroviruses (PERV-A, B, C)
- 4-Virus Surveillance: Real-time monitoring of Hantavirus, Polyomavirus, Spumavirus, and EEEV with external data integration (MAFF, E-Stat, PubMed, J-STAGE compliant) and Slack notifications
- Real-time Analysis: Streaming analysis capability with MinION
- Cloud-Native: Serverless architecture on AWS (Lambda + EC2 on-demand)
- Automated Workflow: End-to-end automation from basecalling to reporting
- Quality Assured: Q30+ accuracy with duplex basecalling
- Cost Optimized: Spot instances and on-demand scaling
Implementation: Containerless serverless architecture using Lambda functions to orchestrate EC2 instances with custom AMIs.
MinION Sequencer → S3 Upload → Lambda Orchestrator → Step Functions
↓
Lambda triggers EC2 instances per phase
↓
[Phase 0: Sample Prep Routing (t3.small) →
Phase 1: GPU EC2 Basecalling (g4dn.xlarge) →
Phase 2: QC (t3.large) →
Phase 3: Host Removal (r5.4xlarge) →
Phase 4: Pathogen Detection (4x parallel EC2) →
Phase 5: Quantification (t3.large) →
Phase 6: Report Generation (t3.large)]
↓
EC2 auto-terminates after completion
↓
Reports stored in S3
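The per-phase launch step in the diagram above could look roughly like the following. This is a minimal sketch, not the pipeline's actual orchestrator: the function name `build_launch_params`, the phase keys, the tag names, and the script path in the UserData are all hypothetical. It only builds the keyword arguments for boto3's EC2 `run_instances` call, so it can be inspected without AWS credentials. Phase 4 is left out because this README does not state its instance type.

```python
from typing import Any

# Instance types copied from the phase diagram above.
# The dictionary keys are illustrative names, not the pipeline's own identifiers.
PHASE_INSTANCE_TYPES = {
    "sample_prep_routing": "t3.small",
    "basecalling": "g4dn.xlarge",
    "qc": "t3.large",
    "host_removal": "r5.4xlarge",
    "quantification": "t3.large",
    "report_generation": "t3.large",
}


def build_launch_params(
    phase: str, ami_id: str, run_id: str, use_spot: bool = True
) -> dict[str, Any]:
    """Build keyword arguments for boto3's ec2_client.run_instances()."""
    # Hypothetical UserData: run the phase script, then power off.
    # With shutdown behavior set to "terminate", powering off ends the instance.
    user_data = "\n".join(
        [
            "#!/bin/bash",
            f"/opt/minion/run_phase.sh {phase} {run_id}",  # assumed script path
            "shutdown -h now",
        ]
    )
    params: dict[str, Any] = {
        "ImageId": ami_id,
        "InstanceType": PHASE_INSTANCE_TYPES[phase],
        "MinCount": 1,
        "MaxCount": 1,
        "UserData": user_data,  # boto3 base64-encodes this string itself
        "InstanceInitiatedShutdownBehavior": "terminate",
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [
                    {"Key": "RunId", "Value": run_id},
                    {"Key": "Phase", "Value": phase},
                ],
            }
        ],
    }
    if use_spot:
        # Request Spot capacity; omit this key to launch on-demand instead.
        params["InstanceMarketOptions"] = {"MarketType": "spot"}
    return params
```

A Lambda handler could pass the result to `boto3.client("ec2").run_instances(**params)` and, if the Spot request fails for lack of capacity, retry with `use_spot=False`.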
Key Features:
- No Docker containers - uses custom AMIs with pre-installed analysis tools
- Lambda functions orchestrate EC2 lifecycle (launch, monitor, terminate)
- EFS for shared reference databases (Kraken2, BLAST, PERV DB)
- Spot Instances for 70% cost savings
- Each EC2 instance runs UserData scripts and auto-terminates
- Oxford Nanopore MinION Mk1D
- GPU-enabled EC2 instances (g4dn.xlarge for basecalling)
- High-memory EC2 instances (r5.4xlarge for pathogen detection)
- Python 3.11+
- Terraform 1.0+
- AWS CLI configured
- Note: No Docker required - uses custom AMIs with pre-installed tools
- New Dependencies (v2.0):
  - pydantic>=2.5.0 - Type-safe data models
  - pydantic-settings>=2.1.0 - Configuration management
  - aws-lambda-powertools>=2.0.0 - Structured logging (optional, for Lambda)
- S3 for data storage
- Lambda for orchestration
- EC2 for compute
- Step Functions for workflow
- RDS Aurora Serverless for metadata
- EFS for reference databases
- SNS for notifications
- Quick Start Guide - Commands and usage
- Architecture Overview - System design and components
- Code Patterns - Development patterns and conventions
- v2.0 Patterns Guide - Type safety, repository pattern, logging
- API Reference - v2.0 API documentation
- Recent Updates - Changelog and version history
- Full Technical Reference - Detailed implementation notes
- NVIDIA Grant Materials - Academic grant application (DGX Spark ARM)
git clone https://github.com/your-org/minion-pipeline.git
cd minion-pipeline
pip install -r requirements.txt
aws configure
export AWS_REGION=ap-northeast-1
export ENVIRONMENT=production
cd infrastructure/terraform
terraform init
terraform plan
terraform apply
./tools/database_setup.sh --all
./tools/deployment_script.sh deploy
# Using CLI
./tools/workflow_cli.py start \
--run-id RUN-2024-001 \
--bucket minion-data \
--input-prefix runs/RUN-2024-001/fast5/
# Using API
curl -X POST https://api.your-domain.com/workflows \
-H "x-api-key: YOUR_API_KEY" \
-d '{
"run_id": "RUN-2024-001",
"bucket": "minion-data",
"input_prefix": "runs/RUN-2024-001/fast5/"
}'
# Check status
./tools/workflow_cli.py status --run-id RUN-2024-001 --watch
# View metrics
./tools/workflow_cli.py metrics --run-id RUN-2024-001
# Launch dashboard
streamlit run tools/monitoring_dashboard.py
Reports are automatically generated in multiple formats:
- PDF: Comprehensive report with visualizations
- JSON: Machine-readable PMDA checklist
- HTML: Interactive web report
- Universal workflow for 100% pathogen coverage
- Time: 15.5 hours hands-on
- Cost: ¥162,000/sample
- Key feature: Includes circular/ssDNA virus detection (PCV2, PCV3, TTV, PPV)
- Use for ultra-high sensitivity (<50 copies/mL)
- Target: Polyomavirus, Hantavirus, EEEV, Spumavirus
- Triggered by retrovirus pol signatures
- Spumavirus-specific screening
For detailed protocols, see docs/PROTOCOLS_GUIDE.md.
- Determines DNA vs RNA extraction workflow
- Selects appropriate protocol based on sample type
- Converts raw signal (FAST5/POD5) to sequences (FASTQ)
- Uses Dorado with duplex mode for Q30 accuracy
- GPU-accelerated on g4dn instances
- Assesses read quality metrics
- Filters low-quality reads
- Generates QC reports with NanoPlot
- Aligns reads to Sus scrofa reference genome
- Removes host DNA contamination
- Retains unmapped (potentially pathogenic) reads
- Kraken2: Rapid taxonomic classification
- RVDB: Viral database search
- BLAST: PMDA custom database alignment
- PERV-specific: Targeted PERV detection and typing
- Spike-in normalization (PhiX174)
- Absolute copy number calculation (copies/mL)
- Confidence interval estimation
- PMDA compliance checklist (91 pathogens)
- Detection summary with risk assessment
- Quality metrics and validation
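The spike-in normalization and copy-number steps listed above can be illustrated with a short calculation. This is a sketch under stated assumptions, not the pipeline's quantification code: the function and field names are hypothetical, read counts are normalized by genome length (PhiX174 is 5,386 bp), and the confidence interval uses a normal approximation to Poisson counting error on the pathogen reads only, ignoring uncertainty in the spike-in count.

```python
import math
from dataclasses import dataclass

PHIX174_GENOME_LEN = 5386  # bp, length of the PhiX174 spike-in genome


@dataclass
class Quantification:
    copies_per_ml: float
    ci_low: float
    ci_high: float


def quantify(
    pathogen_reads: int,
    spike_reads: int,
    spike_copies_added: float,
    sample_volume_ml: float,
    pathogen_genome_len: int,
    spike_genome_len: int = PHIX174_GENOME_LEN,
    z: float = 1.96,  # about 95% under the normal approximation
) -> Quantification:
    """Estimate pathogen copies/mL from read counts and a known spike-in amount."""
    if spike_reads <= 0 or sample_volume_ml <= 0 or pathogen_genome_len <= 0:
        raise ValueError("spike_reads, sample_volume_ml and genome length must be positive")

    # Copies represented by one pathogen read, calibrated against the spike-in:
    # (reads per bp of pathogen) / (reads per bp of spike) * spike copies / volume.
    scale = (
        (spike_genome_len / pathogen_genome_len)
        * spike_copies_added
        / (spike_reads * sample_volume_ml)
    )

    # Poisson counting error on the pathogen read count, clamped at zero.
    half_width = z * math.sqrt(pathogen_reads)
    low_reads = max(pathogen_reads - half_width, 0.0)
    high_reads = pathogen_reads + half_width

    return Quantification(
        copies_per_ml=pathogen_reads * scale,
        ci_low=low_reads * scale,
        ci_high=high_reads * scale,
    )
```

For example, 100 pathogen reads against 1,000 spike-in reads from 1e6 added copies in 1 mL, with equal genome lengths, gives an estimate of 1e5 copies/mL.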
See templates/config/default_pipeline.yaml for full configuration options.
phases:
  basecalling:
    skip_duplex: false  # Use duplex for Q30
    min_quality: 9
  pathogen_detection:
    databases:
      - kraken2
      - rvdb
      - pmda
    confidence_threshold: 0.1
  reporting:
    formats:
      - pdf
      - json
    pmda_checklist: true
# Create custom config
./tools/workflow_cli.py config --create-default
# Edit config
vi /etc/minion-pipeline/custom.yaml
# Validate
./tools/workflow_cli.py config --validate custom.yaml
The pipeline screens for all 91 PMDA-designated pathogens:
- Viruses: 41 pathogens including circular/ssDNA viruses
- Bacteria: 27 pathogens
- Parasites: 19 pathogens
- Fungi: 2 pathogens
- Special Management: 5 pathogens (PCV2, PCV3, PERV-A/B/C)
Immediate alerts are triggered for:
- PERV-A, PERV-B, PERV-C (xenotransplantation critical)
- ASFV, CSFV, FMDV (disease outbreak risks)
- Prion detection
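As an illustration of the alert rule above, the check can be reduced to a lookup against a fixed set of critical pathogen codes. This is a hedged sketch: the function name, the `PRION` label, and the input shape (pathogen code mapped to copies/mL) are assumptions for the example, not the pipeline's real interface.

```python
# Critical pathogens from the list above, with the reason each one alerts.
# "PRION" is an assumed label for prion detections.
CRITICAL_PATHOGENS = {
    "PERV-A": "xenotransplantation critical",
    "PERV-B": "xenotransplantation critical",
    "PERV-C": "xenotransplantation critical",
    "ASFV": "disease outbreak risk",
    "CSFV": "disease outbreak risk",
    "FMDV": "disease outbreak risk",
    "PRION": "prion detection",
}


def critical_alerts(detections: dict[str, float]) -> list[str]:
    """Return one alert message per critical pathogen detected above zero copies/mL."""
    alerts = []
    for code in sorted(detections):
        copies_per_ml = detections[code]
        reason = CRITICAL_PATHOGENS.get(code.upper())
        if reason is not None and copies_per_ml > 0:
            alerts.append(f"CRITICAL: {code.upper()} ({reason})")
    return alerts
```

Non-critical detections such as PCV2 still appear in the report but produce no immediate alert in this sketch.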
All reports include:
- Complete 91 pathogen checklist
- PERV-specific analysis section
- Quantitative results (copies/mL)
- Detection confidence levels
- Quality control metrics
# Run unit tests
python -m pytest tests/
# Run integration tests
python tests/test_pipeline_integration.py
# Run PMDA compliance tests
python tests/test_pmda_compliance.py
Access the dashboard at:
https://console.aws.amazon.com/cloudwatch/home?region=ap-northeast-1#dashboards:name=minion-pipeline-production
Key metrics tracked:
- Workflow execution time
- Phase completion rates
- Pathogen detection counts
- Resource utilization
- Cost per analysis
Configured alerts:
- PERV detection (CRITICAL)
- High error rate
- Long-running workflows
- Cost threshold exceeded
- Basecalling uses GPU spot instances (70% cost reduction)
- Automatic fallback to on-demand if spot unavailable
- EC2 instances auto-terminate after phase completion
- Lambda functions scale automatically
- Raw data: 30-day retention
- Processed data: 90-day retention
- Reports: 365-day retention
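The retention periods above map naturally onto S3 lifecycle expiration rules. The sketch below builds a configuration in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The key prefixes (`raw/`, `processed/`, `reports/`) and the rule IDs are assumptions made for the example; the actual bucket layout is not described in this README.

```python
from typing import Any

# Retention periods from the list above; the S3 prefixes are hypothetical.
RETENTION_DAYS = {
    "raw/": 30,
    "processed/": 90,
    "reports/": 365,
}


def build_lifecycle_configuration(retention: dict[str, int]) -> dict[str, Any]:
    """Build an S3 LifecycleConfiguration with one expiration rule per prefix."""
    rules = []
    for prefix, days in retention.items():
        rules.append(
            {
                "ID": f"expire-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},  # delete objects after this many days
            }
        )
    return {"Rules": rules}
```

The result can be passed as `LifecycleConfiguration=` together with a `Bucket=` name, although in this project the rules would more likely be declared in Terraform alongside the rest of the infrastructure.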
- Basecalling Timeout
  - Check GPU availability
  - Verify CUDA drivers
  - Increase instance size
- High Host Contamination
  - Review extraction protocol
  - Check depletion efficiency
  - Adjust alignment parameters
- Low Pathogen Detection
  - Verify database integrity
  - Check confidence thresholds
  - Review sample quality
Access logs via:
# CloudWatch logs
aws logs tail /aws/lambda/minion-pipeline-production --follow
# EC2 instance logs
aws ssm start-session --target INSTANCE_ID
Interactive Documentation Site: docs-portal
Start the documentation portal:
cd docs-portal
npm install
npm run dev
# Access: http://localhost:3000
Key Pages:
- Getting Started - Setup and first workflow
- Architecture - System design and AWS infrastructure
- 4-Virus Surveillance - Real-time monitoring system for Hantavirus, Polyomavirus, Spumavirus, and EEEV
- PMDA Compliance - 91 pathogen coverage details
- API Reference - Complete API documentation
- CLAUDE.md - Essential guide for Claude Code development (optimized)
- Development Guide - Commands, conventions, and patterns
- Architecture - Pipeline architecture and AWS infrastructure
- Protocols Guide - Sample preparation protocols (11, 12, 13)
- 4-Virus Surveillance - External + internal virus monitoring system
- Recent Updates - Latest changes and version history
- Technical Details - In-depth technical documentation
- Troubleshooting - Common issues and AWS debugging
- Audit Reports - Code quality audits (9 audits, 37 bugs fixed, zero-bug certification)
- Bug Fixes - Detailed bug fix documentation and analysis
- Sprint Reports - Development sprint tracking and metrics
- Development Guide - Development workflow, coding standards, and best practices
- API Documentation - REST endpoints and authentication
- Deployment Guide - AWS infrastructure setup instructions
- Session History - Development session logs
- Technical Q&A - Technical question and answer documentation
- Grant Application - NVIDIA Academic Grant Program application materials (DGX Spark deployment)
NVIDIA Academic Grant Program Application (2025)
- Requesting 2× NVIDIA DGX Spark systems for on-premises PMDA-compliant deployment
- Requesting 2,500 A100 GPU hours for ARM vs x86 benchmarking
- See docs/grants/ for complete application materials including:
  - 50-sample benchmark results (100% accuracy agreement ARM vs x86)
  - PMDA compliance justification
  - Cost analysis (96.4% operational cost reduction)
  - Educational integration plan (Meiji University)
- Documentation: See /docs directory
- Issues: GitHub Issues
- Contact: support@your-org.com
Proprietary - All rights reserved
If using this pipeline, please cite:
- Oxford Nanopore Technologies
- Kraken2 (Wood et al., 2019)
- RVDB (Goodacre et al., 2018)
- PMDA Xenotransplantation Guidelines (2024)
Current Version: 1.0.0
See CHANGELOG.md for version history.