PubMatcher

A web application to support genomic data interpretation through simplified bibliographic research

Table of Contents

About
Technical Stack
Installation
Configuration
Data Sources
Gene Recognition
API
Project Structure
Development
Deployment
License
Citation
Authors
Acknowledgments

📖 About

PubMatcher is an automated genomic research tool that integrates biological databases and APIs to facilitate genetic interpretation. It enables batch analysis of gene lists, combining automated PubMed searches with curated databases to help geneticists identify relevant disease genes, especially those not yet fully documented in OMIM.

Key Features

🔬 Batch Gene Analysis - Analyze multiple genes simultaneously
📚 Automated PubMed Search - Real-time literature queries with phenotype matching
🗃️ Multi-Database Aggregation - ClinVar, gnomAD, PanelApp, IMPC, UniProt in one view
📊 Constraint Metrics - gnomAD v2.1 and v4 with automatic comparison
🐭 Mouse Phenotypes - IMPC knockout data visualization
📄 PDF Export - Generate comprehensive reports
👤 User Accounts - Save search history

🛠️ Technical Stack

Component	Technology
Backend	Node.js 18.x, Express.js 4.17
Frontend	Vue.js 3.5, Tailwind CSS
Database	PostgreSQL 14+
Containerization	Docker, Docker Compose

📦 Installation

Prerequisites

Node.js 18.x or higher
PostgreSQL 14 or higher
npm 9.x or higher

Option 1: Local Development

1. Clone the repository

git clone https://github.com/victormar1/PubMatcher.git

cd PubMatcher

2. Install dependencies

npm install

3. Configure environment

cp .env.example .env

Edit .env with your configuration

4. Initialize database

psql -U postgres -f database/schema.sql

5. Build frontend

npm run build

6. Start the server

node app.js

The application will be available at http://localhost:3000

Option 2: Docker Deployment (Recommended)

1. Clone the repository

git clone https://github.com/victormar1/PubMatcher.git

cd PubMatcher

2. Configure environment

cp .env.example .env

Edit .env with your configuration (at minimum, change DB_PASSWORD and SECRET_KEY)

3. Start with Docker Compose (includes PostgreSQL)

docker-compose -f docker/docker-compose.yml up -d

4. Check logs

docker-compose -f docker/docker-compose.yml logs -f app

The application will be available at http://localhost:3000

⚙️ Configuration

Copy .env.example to .env and configure the following variables:

Variable	Description	Default
`PORT`	Server port	`3000`
`NODE_ENV`	Environment (development/production)	`development`
`DB_HOST`	PostgreSQL host	`localhost`
`DB_PORT`	PostgreSQL port	`5432`
`DB_NAME`	Database name	`pubmatcher`
`DB_USER`	Database user	`pubmatcher`
`DB_PASSWORD`	Database password	-
`SECRET_KEY`	Secret for JWT tokens	-
`SMTP_*`	Email configuration for password reset	-
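
As an illustration of how the variables above could be consumed, here is a minimal sketch of a configuration loader. The function name `loadConfig` and the shape of the returned object are hypothetical and not taken from the PubMatcher codebase. Treating `DB_PASSWORD` and `SECRET_KEY` as required is also an assumption, based only on the fact that the table lists no default for them.

```javascript
// Hypothetical helper, not part of PubMatcher: builds a config object
// from an environment map and applies the defaults documented above.
function loadConfig(env = process.env) {
  const config = {
    port: Number(env.PORT ?? 3000),
    nodeEnv: env.NODE_ENV ?? 'development',
    db: {
      host: env.DB_HOST ?? 'localhost',
      port: Number(env.DB_PORT ?? 5432),
      name: env.DB_NAME ?? 'pubmatcher',
      user: env.DB_USER ?? 'pubmatcher',
      password: env.DB_PASSWORD,
    },
    secretKey: env.SECRET_KEY,
  };

  // Assumption: variables without a documented default are mandatory,
  // so fail fast with a message naming every missing one.
  const missing = [];
  if (!config.db.password) missing.push('DB_PASSWORD');
  if (!config.secretKey) missing.push('SECRET_KEY');
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return config;
}
```

Passing the environment as a parameter keeps the function easy to test; in the application it would simply be called as `loadConfig()` after the `.env` file has been loaded.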

🗄️ Data Sources

PubMatcher integrates data from multiple sources:

Source	Access Method	Update frequency	Description
PubMed	Web scraping	Real-time	Scientific literature
HGNC	REST API (XML)	Real-time	Gene nomenclature validation
IMPC	SOLR API	Real-time	Mouse knockout phenotypes
PanelApp UK	REST API (JSON)	Real-time	Clinical panels (Genomics England)
PanelApp AUS	REST API (JSON)	Real-time	Clinical panels (Australian Genomics)
UniProt	REST API	Real-time	Protein function
OMIM	Via Ensembl API	Real-time	Disease associations
ClinVar	Local JSON	Periodic	Variant counts per gene
gnomAD v2.1	Local CSV	Static	Constraint metrics
gnomAD v4	Local CSV	Static	Constraint metrics
ClinGen	Local CSV	Periodic	Gene validity classifications

🧬 Gene Recognition

Gene recognition uses dictionary-based lookup with the HGNC (HUGO Gene Nomenclature Committee) nomenclature as the reference.

Method

User input is compared against a local dictionary file (genes.json) containing approximately 43,000 approved gene symbols and their official aliases
Matching uses exact string comparison on both symbols and aliases
Each matched gene is validated via the HGNC REST API (rest.genenames.org/fetch/symbol/{gene})
The API returns cross-references to external databases (UniProt, OMIM, Ensembl, MGI)
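
The validation step (the last two points above) could look roughly like the sketch below. The function names are hypothetical, and the JSON field names (`hgnc_id`, `uniprot_ids`, `omim_id`, `ensembl_gene_id`, `mgd_id`) are assumptions about the HGNC response format that should be checked against the HGNC REST documentation. The sketch requests JSON through the `Accept` header rather than XML, and relies on the global `fetch` available in Node.js 18.

```javascript
// Endpoint named in the Method list above.
const HGNC_FETCH_URL = 'https://rest.genenames.org/fetch/symbol/';

// Build the request for one symbol; JSON is requested via the Accept header.
function buildHgncRequest(symbol) {
  return {
    url: HGNC_FETCH_URL + encodeURIComponent(symbol),
    headers: { Accept: 'application/json' },
  };
}

// Pull the cross-references out of a response payload.
// Field names are assumed, see the note above.
function extractCrossReferences(payload) {
  const doc = payload?.response?.docs?.[0];
  if (!doc) return null; // no record for this symbol
  return {
    symbol: doc.symbol,
    hgncId: doc.hgnc_id,
    uniprot: doc.uniprot_ids ?? [],
    omim: doc.omim_id ?? [],
    ensembl: doc.ensembl_gene_id ?? null,
    mgi: doc.mgd_id ?? [],
  };
}

// Validate one symbol against HGNC; resolves to null when it is unknown.
async function validateGene(symbol) {
  const { url, headers } = buildHgncRequest(symbol);
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`HGNC request failed with status ${res.status}`);
  return extractCrossReferences(await res.json());
}
```

Separating request construction and response parsing from the network call makes both halves testable without contacting the HGNC service.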

Alias Resolution

Aliases are resolved using HGNC official data. For example, FANCS is recognized as an alias for BRCA1 and the system returns data for the official symbol.
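
The dictionary lookup and alias resolution described above can be sketched as follows. This is an illustration only: the layout of genes.json is not documented here, so the entry shape (`symbol` plus an `aliases` array) and the function names are assumptions. Trimming whitespace around the input is also an assumption; the comparison itself stays exact and case-sensitive.

```javascript
// Assumed dictionary shape: [{ symbol: 'BRCA1', aliases: ['FANCS'] }, ...]
function buildGeneIndex(entries) {
  const index = new Map();
  // Register aliases first so that an approved symbol always wins
  // if it collides with another gene's alias.
  for (const { symbol, aliases = [] } of entries) {
    for (const alias of aliases) index.set(alias, symbol);
  }
  for (const { symbol } of entries) index.set(symbol, symbol);
  return index;
}

// Map each user-supplied term to its approved symbol, or report it as unmatched.
function recognizeGenes(inputs, index) {
  const matched = [];
  const unmatched = [];
  for (const raw of inputs) {
    const term = raw.trim();
    const symbol = index.get(term); // exact string comparison
    if (symbol) matched.push({ input: term, symbol });
    else unmatched.push(term);
  }
  return { matched, unmatched };
}

// Toy dictionary with two entries (alias lists shortened for the example).
const exampleIndex = buildGeneIndex([
  { symbol: 'BRCA1', aliases: ['FANCS'] },
  { symbol: 'TP53', aliases: [] },
]);
const exampleResult = recognizeGenes(['FANCS', 'TP53', 'BRAC1'], exampleIndex);
// exampleResult.matched   -> FANCS resolved to BRCA1, TP53 resolved to TP53
// exampleResult.unmatched -> ['BRAC1'] (a typo is not corrected)
```

A `Map` gives constant-time lookups, which matters little for a few genes but keeps batch analysis of long lists cheap.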

Limitations

Exact string matching only: no fuzzy matching or typo tolerance
No NLP-based recognition

🔌 API

Search Genes

POST /api/search
Content-Type: application/json

{
  "genes": ["BRCA1", "TP53", "EGFR"],
  "phenotypes": ["cancer", "tumor"]
}

Response

{
  "cached": false,
  "results": [
    {
      "gene": "BRCA1",
      "url": "https://pubmed.ncbi.nlm.nih.gov/?term=...",
      "count": 12345,
      "firstArticleTitle": "...",
      "constraints_v2": { "pLI": 0.99, ... },
      "constraints_v4": { "pLI": 0.98, ... },
      "panelAppEnglandCount": 42,
      "mousePhenotypes": { ... }
    }
  ]
}
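
A client for this endpoint might be sketched as below. The helper names are hypothetical, the base URL assumes the default port from the installation steps, and whether the endpoint requires authentication is not covered here. The `pliDelta` field is only an illustrative way to compare the two gnomAD releases; it is not a field returned by the API.

```javascript
// Build the POST request for /api/search (default base URL is an assumption).
function buildSearchRequest(genes, phenotypes, baseUrl = 'http://localhost:3000') {
  return {
    url: `${baseUrl}/api/search`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ genes, phenotypes }),
    },
  };
}

// Reduce each result to a few fields and compare pLI between gnomAD v2.1 and v4.
function summarizeResults(payload) {
  return payload.results.map((r) => {
    const pliV2 = r.constraints_v2?.pLI ?? null;
    const pliV4 = r.constraints_v4?.pLI ?? null;
    return {
      gene: r.gene,
      articles: r.count,
      pliV2,
      pliV4,
      // Illustrative comparison only; null when either value is missing.
      pliDelta: pliV2 !== null && pliV4 !== null ? pliV4 - pliV2 : null,
    };
  });
}

// Send the search and return the summarized results (uses Node.js 18 global fetch).
async function searchGenes(genes, phenotypes) {
  const { url, options } = buildSearchRequest(genes, phenotypes);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Search failed with status ${res.status}`);
  return summarizeResults(await res.json());
}
```

With a running instance, `searchGenes(['BRCA1', 'TP53'], ['cancer'])` would resolve to one summary object per gene.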

📁 Project Structure

pubmatcher/
├── app.js                      # Main application entry point
├── config/                     # Server configuration
├── controllers/                # Route controllers
├── database/                   # Database schema
├── docker/                     # Docker configuration
│   ├── Dockerfile              # Image build instructions
│   ├── docker-compose.yml      # Standalone deployment
│   └── docker-compose.production.yml  # Production (Traefik)
├── models/                     # Data models
├── routes/                     # API routes
├── services/                   # Business logic
├── utils/                      # Utility functions (data fetchers)
├── src/                        # Vue.js frontend source
│   ├── components/             # Vue components
│   ├── stores/                 # Pinia stores
│   └── router.js               # Frontend routing
├── public/                     # Static files
└── BDD/                        # Local data files (CSV, JSON)

🧪 Development

Start development server with hot reload:

npm run dev

Build for production:

npm run build

Format code:

npx prettier --write .

🚀 Deployment

Quick Start (Standalone with PostgreSQL)

This is the easiest way to deploy PubMatcher: the Compose file starts the application together with a PostgreSQL container, so no separate database setup is needed.

Configure environment:

cp .env.example .env

Edit .env with your settings

Start the application:

docker-compose -f docker/docker-compose.yml up -d

View logs:

docker-compose -f docker/docker-compose.yml logs -f

Stop the application:

docker-compose -f docker/docker-compose.yml down

Production Deployment (with Traefik)

For production with an existing Traefik reverse proxy:

docker-compose -f docker/docker-compose.production.yml up -d

Note: docker-compose.production.yml assumes you have:

An external Docker network named main_network
Traefik configured with a websecure entrypoint
An external PostgreSQL database

Modify this file to match your infrastructure.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 Citation

If you use PubMatcher in your research, please cite:

Marin V, et al. (2025). PubMatcher: a web app to support genomic data interpretation through simplified bibliographic research. European Journal of Human Genetics. [DOI pending]

👥 Authors

Victor Marin - Geneticist, Project Lead
Hugo Lannes - Developer
Victor Dumont - Developer
Louis Lebreton - Contributor

🙏 Acknowledgments

HGNC for gene nomenclature data
IMPC for mouse phenotype data
Genomics England and Australian Genomics for PanelApp data
gnomAD team for constraint metrics
ClinVar and ClinGen for variant and gene validity data

