PubMatcher is an automated genomic research tool that integrates biological databases and APIs to facilitate genetic interpretation. It enables batch analysis of gene lists, combining automated PubMed searches with curated databases to help geneticists identify relevant disease genes, especially those not yet fully documented in OMIM.
- 🔬 Batch Gene Analysis - Analyze multiple genes simultaneously
- 📚 Automated PubMed Search - Real-time literature queries with phenotype matching
- 🗃️ Multi-Database Aggregation - ClinVar, gnomAD, PanelApp, IMPC, UniProt in one view
- 📊 Constraint Metrics - gnomAD v2.1 and v4 with automatic comparison
- 🐭 Mouse Phenotypes - IMPC knockout data visualization
- 📄 PDF Export - Generate comprehensive reports
- 👤 User Accounts - Save search history
| Component | Technology |
|---|---|
| Backend | Node.js 18.x, Express.js 4.17 |
| Frontend | Vue.js 3.5, Tailwind CSS |
| Database | PostgreSQL 14+ |
| Containerization | Docker, Docker Compose |
- Node.js 18.x or higher
- PostgreSQL 14 or higher
- npm 9.x or higher
Manual installation:

1. Clone the repository

       git clone https://github.com/victormar1/PubMatcher.git
       cd PubMatcher

2. Install dependencies

       npm install

3. Configure environment

       cp .env.example .env

   Edit `.env` with your configuration.

4. Initialize database

       psql -U postgres -f database/schema.sql

5. Build frontend

       npm run build

6. Start the server

       node app.js

The application will be available at http://localhost:3000
Installation with Docker:

1. Clone the repository

       git clone https://github.com/victormar1/PubMatcher.git
       cd PubMatcher

2. Configure environment

       cp .env.example .env

   Edit `.env` with your configuration (at minimum, change `DB_PASSWORD` and `SECRET_KEY`).

3. Start with Docker Compose (includes PostgreSQL)

       docker-compose -f docker/docker-compose.yml up -d

4. Check logs

       docker-compose -f docker/docker-compose.yml logs -f app

The application will be available at http://localhost:3000
Copy .env.example to .env and configure the following variables:
| Variable | Description | Default |
|---|---|---|
| `PORT` | Server port | 3000 |
| `NODE_ENV` | Environment (development/production) | development |
| `DB_HOST` | PostgreSQL host | localhost |
| `DB_PORT` | PostgreSQL port | 5432 |
| `DB_NAME` | Database name | pubmatcher |
| `DB_USER` | Database user | pubmatcher |
| `DB_PASSWORD` | Database password | - |
| `SECRET_KEY` | Secret for JWT tokens | - |
| `SMTP_*` | Email configuration for password reset | - |
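
As a rough illustration of how these variables and their defaults could be read at startup, here is a minimal sketch. The function name `loadConfig` and the shape of the returned object are assumptions made for this example and are not taken from PubMatcher's code; only the variable names and defaults come from the table above.

```javascript
// Hypothetical helper: reads the variables from the table and applies the documented defaults.
function loadConfig(env = process.env) {
  // DB_PASSWORD and SECRET_KEY have no default, so fail early if either is missing.
  for (const name of ["DB_PASSWORD", "SECRET_KEY"]) {
    if (!env[name]) {
      throw new Error(`Missing required environment variable: ${name}`);
    }
  }
  return {
    port: Number(env.PORT ?? 3000),
    nodeEnv: env.NODE_ENV ?? "development",
    db: {
      host: env.DB_HOST ?? "localhost",
      port: Number(env.DB_PORT ?? 5432),
      name: env.DB_NAME ?? "pubmatcher",
      user: env.DB_USER ?? "pubmatcher",
      password: env.DB_PASSWORD,
    },
    secretKey: env.SECRET_KEY,
  };
}
```

Failing fast on the two variables without defaults is a design choice of this sketch: it surfaces a misconfigured `.env` at startup rather than at the first database or login request.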
PubMatcher integrates data from multiple sources:
| Source | Access Method | Update | Description |
|---|---|---|---|
| PubMed | Web scraping | Real-time | Scientific literature |
| HGNC | REST API (XML) | Real-time | Gene nomenclature validation |
| IMPC | SOLR API | Real-time | Mouse knockout phenotypes |
| PanelApp UK | REST API (JSON) | Real-time | Clinical panels (Genomics England) |
| PanelApp AUS | REST API (JSON) | Real-time | Clinical panels (Australian Genomics) |
| UniProt | REST API | Real-time | Protein function |
| OMIM | Via Ensembl API | Real-time | Disease associations |
| ClinVar | Local JSON | Periodic | Variant counts per gene |
| gnomAD v2.1 | Local CSV | Static | Constraint metrics |
| gnomAD v4 | Local CSV | Static | Constraint metrics |
| ClinGen | Local CSV | Periodic | Gene validity classifications |
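
Because the two gnomAD releases are kept as separate local files, a gene's constraint metrics can differ between them. The following is a minimal sketch of such a comparison, assuming each release yields an object with a numeric `pLI` field (the field name appears in the API response example in this README). The function name `comparePli` and the returned shape are hypothetical and may not match how PubMatcher compares the releases.

```javascript
// Hypothetical helper: compares the pLI score of one gene between gnomAD v2.1 and v4.
// Both arguments are assumed to look like { pLI: number }.
function comparePli(constraintsV2, constraintsV4) {
  const v2 = constraintsV2?.pLI;
  const v4 = constraintsV4?.pLI;
  if (typeof v2 !== "number" || typeof v4 !== "number") {
    return null; // the score is unavailable in at least one release
  }
  // A negative delta means the score is lower in v4 than in v2.1.
  return { v2, v4, delta: v4 - v2 };
}
```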
Gene recognition uses dictionary-based lookup with the HGNC (HUGO Gene Nomenclature Committee) nomenclature as the reference.
- User input is compared against a local dictionary file (`genes.json`) containing approximately 43,000 approved gene symbols and their official aliases
- Matching uses exact string comparison on both symbols and aliases
- Each matched gene is validated via the HGNC REST API (`rest.genenames.org/fetch/symbol/{gene}`)
- The API returns cross-references to external databases (UniProt, OMIM, Ensembl, MGI)
Aliases are resolved using HGNC official data. For example, FANCS is recognized as an alias for BRCA1 and the system returns data for the official symbol.
Limitations:

- No fuzzy matching or typo tolerance
- No NLP-based recognition
- Exact match only
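
The lookup described above can be sketched as follows. This is an illustration under stated assumptions, not PubMatcher's actual code: the layout of `genes.json` is not documented here, so the `{ symbol, aliases }` entry shape and the names `buildIndex`, `resolveGene`, and `validateWithHgnc` are hypothetical. The HGNC endpoint is the one named above; requesting JSON through the `Accept` header and reading `response.numFound` reflect the public HGNC REST interface as I understand it.

```javascript
// Hypothetical dictionary shape; the real layout of genes.json may differ.
const dictionary = [
  { symbol: "BRCA1", aliases: ["FANCS"] },
  { symbol: "TP53", aliases: [] },
];

// Build a Map from every approved symbol and alias to the approved symbol.
function buildIndex(entries) {
  const index = new Map();
  for (const { symbol, aliases = [] } of entries) {
    index.set(symbol, symbol); // approved symbols always map to themselves
    for (const alias of aliases) {
      // Never let an alias overwrite an entry that is already present.
      if (!index.has(alias)) index.set(alias, symbol);
    }
  }
  return index;
}

// Exact string comparison: no trimming, case folding, or fuzzy matching.
function resolveGene(index, input) {
  return index.get(input) ?? null;
}

// Validation step against the HGNC REST API (needs network access and Node.js 18+ for fetch).
async function validateWithHgnc(symbol) {
  const url = `https://rest.genenames.org/fetch/symbol/${encodeURIComponent(symbol)}`;
  const res = await fetch(url, { headers: { Accept: "application/json" } });
  if (!res.ok) throw new Error(`HGNC request failed: ${res.status}`);
  const body = await res.json();
  return body.response.numFound > 0;
}
```

With this index, `resolveGene(buildIndex(dictionary), "FANCS")` resolves to the official symbol `BRCA1`, while a lowercase or misspelled input such as `brca1` is not recognized, which matches the exact-match behavior listed above.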
Example request:

    POST /api/search
    Content-Type: application/json

    {
      "genes": ["BRCA1", "TP53", "EGFR"],
      "phenotypes": ["cancer", "tumor"]
    }

Example response:

    {
      "cached": false,
      "results": [
        {
          "gene": "BRCA1",
          "url": "https://pubmed.ncbi.nlm.nih.gov/?term=...",
          "count": 12345,
          "firstArticleTitle": "...",
          "constraints_v2": { "pLI": 0.99, ... },
          "constraints_v4": { "pLI": 0.98, ... },
          "panelAppEnglandCount": 42,
          "mousePhenotypes": { ... }
        }
      ]
    }

Project structure:

    pubmatcher/
    ├── app.js                 # Main application entry point
    ├── config/                # Server configuration
    ├── controllers/           # Route controllers
    ├── database/              # Database schema
    ├── docker/                # Docker configuration
    │   ├── Dockerfile                      # Image build instructions
    │   ├── docker-compose.yml              # Standalone deployment
    │   └── docker-compose.production.yml   # Production (Traefik)
    ├── models/                # Data models
    ├── routes/                # API routes
    ├── services/              # Business logic
    ├── utils/                 # Utility functions (data fetchers)
    ├── src/                   # Vue.js frontend source
    │   ├── components/        # Vue components
    │   ├── stores/            # Pinia stores
    │   └── router.js          # Frontend routing
    ├── public/                # Static files
    └── BDD/                   # Local data files (CSV, JSON)
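
To show how the `POST /api/search` endpoint from the request/response example might be called from a script, here is a minimal client sketch for Node.js 18 or higher, which provides a global `fetch`. The helper names `buildSearchRequest` and `searchGenes` are hypothetical, the base URL is the local address given in the installation steps, and the sketch assumes the endpoint needs no authentication header, which this README does not confirm.

```javascript
// Builds the fetch() arguments for POST /api/search.
// baseUrl defaults to the local address given in the installation steps.
function buildSearchRequest(genes, phenotypes, baseUrl = "http://localhost:3000") {
  return {
    url: new URL("/api/search", baseUrl).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ genes, phenotypes }),
    },
  };
}

// Sends the request and returns the parsed JSON body.
// Assumption: no authentication header is needed; add one if your deployment requires it.
async function searchGenes(genes, phenotypes, baseUrl) {
  const { url, options } = buildSearchRequest(genes, phenotypes, baseUrl);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Search failed with HTTP ${res.status}`);
  return res.json();
}
```

Given a running server, `searchGenes(["BRCA1", "TP53"], ["cancer"]).then((data) => console.log(data.results.map((r) => r.gene)))` would print the gene symbols found in the `results` array. Keeping request construction in a separate pure function makes it possible to check the payload without a server.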
Start development server with hot reload:

    npm run dev

Build for production:

    npm run build

Format code:

    npx prettier --write .

Standalone deployment with Docker Compose is the easiest way to deploy PubMatcher. It includes everything needed, PostgreSQL included.
Configure environment:

    cp .env.example .env

Edit `.env` with your settings.

Start the application:

    docker-compose -f docker/docker-compose.yml up -d

View logs:

    docker-compose -f docker/docker-compose.yml logs -f

Stop the application:

    docker-compose -f docker/docker-compose.yml down

For production with an existing Traefik reverse proxy:

    docker-compose -f docker/docker-compose.production.yml up -d

Note: `docker-compose.production.yml` assumes you have:

- An external Docker network named `main_network`
- Traefik configured with a `websecure` entrypoint
- An external PostgreSQL database

Modify this file to match your infrastructure.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use PubMatcher in your research, please cite:
Marin V, et al. (2025). PubMatcher: a web app to support genomic data interpretation through simplified bibliographic research. European Journal of Human Genetics. [DOI pending]
- Victor Marin - Geneticist, Project Lead
- Hugo Lannes - Developer
- Victor Dumont - Developer
- Louis Lebreton - Contributor
- HGNC for gene nomenclature data
- IMPC for mouse phenotype data
- Genomics England and Australian Genomics for PanelApp data
- gnomAD team for constraint metrics
- ClinVar and ClinGen for variant and gene validity data