A lightweight GPT-based language model framework for training custom question-answering models on any domain. This package provides a transformer-based GPT architecture that you can train on your own Q&A datasets - whether for casual conversation, technical support, education, or any other domain.
If you use this model in your research, please cite:
```bibtex
@software{gptmed_2026,
  author = {Sanjog Sigdel},
  title = {GptMed: A custom causal question answering general purpose GPT Transformer Architecture Model},
  year = {2026},
  url = {https://github.com/sigdelsanjog/gptmed}
}
```

## Table of Contents

- Installation
- Quick Start
- Model Architecture
- Configuration
- Observability
- Project Structure
- Requirements
- Documentation
- Performance
- Examples
- Contributing
- Citation
- License
- Support
## Installation

```bash
pip install gptmed
```

Or install from source:

```bash
git clone https://github.com/sigdelsanjog/gptmed.git
cd gptmed
pip install -e .
```

Optional extras (quoted so the brackets survive shells like zsh):

```bash
# For development
pip install "gptmed[dev]"

# For training with logging integrations
pip install "gptmed[training]"

# For visualization (loss curves, metrics plots)
pip install "gptmed[visualization]"

# For Explainable AI features
pip install "gptmed[xai]"

# All dependencies
pip install "gptmed[dev,training,visualization,xai]"
```

## Quick Start

The easiest way to use GptMed is through the high-level API:
```python
import gptmed

# 1. Create a training configuration
gptmed.create_config('my_config.yaml')

# 2. Edit my_config.yaml with your settings (data paths, model size, etc.)

# 3. Train the model
gptmed.train_from_config('my_config.yaml')

# 4. Generate answers
answer = gptmed.generate(
    checkpoint='model/checkpoints/best_model.pt',
    tokenizer='tokenizer/my_tokenizer.model',
    prompt='What is machine learning?',
    max_length=150,
    temperature=0.7
)
print(answer)
```

For a complete API testing workflow, see the gptmed-api folder with its ready-to-run examples.
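Step 4 above references a SentencePiece tokenizer file. If you have not created one yet, here is a minimal sketch using the sentencepiece library directly (gptmed also ships a `tokenizer/train_tokenizer.py` helper for this); the corpus path and vocabulary size below are illustrative, not fixed by gptmed:

```python
import sentencepiece as spm

# Train a tokenizer on your Q&A corpus (one example per line is assumed here).
# This writes tokenizer/my_tokenizer.model and tokenizer/my_tokenizer.vocab.
spm.SentencePieceTrainer.train(
    input='data/qa_corpus.txt',             # illustrative path to your raw text
    model_prefix='tokenizer/my_tokenizer',  # matches the path used in step 4
    vocab_size=8000,                        # illustrative; match your model config
    model_type='bpe',
)
```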
### Python API

```python
import torch  # needed only if you load a trained checkpoint

from gptmed.inference.generator import TextGenerator
from gptmed.model.architecture import GPTTransformer
from gptmed.model.configs.model_config import get_small_config

# Load model
config = get_small_config()
model = GPTTransformer(config)

# Load your trained checkpoint
# model.load_state_dict(torch.load('path/to/checkpoint.pt'))

# Create generator
generator = TextGenerator(
    model=model,
    tokenizer_path='path/to/tokenizer.model'
)

# Generate answer
question = "What's your favorite programming language?"
answer = generator.generate(
    prompt=question,
    max_length=100,
    temperature=0.7
)
print(f"Q: {question}")
print(f"A: {answer}")
```
### Command Line Interface

```bash
# Generate answers
gptmed-generate --prompt "How do I train a custom model?" --max-length 100

# Train model
gptmed-train --model-size small --num-epochs 10 --batch-size 16
```

### Advanced Training

```python
from gptmed.training.train import main
from gptmed.configs.train_config import get_default_config
from gptmed.model.configs.model_config import get_small_config

# Configure training
train_config = get_default_config()
train_config.batch_size = 16
train_config.num_epochs = 10
train_config.learning_rate = 3e-4

# Start training
main()
```

## Model Architecture

The model uses a custom GPT-based transformer architecture (a quick sanity-check sketch follows the list below):
- Embedding: Token + positional embeddings
- Transformer Blocks: Multi-head self-attention + feed-forward networks
- Parameters: ~10M (small), ~50M (medium)
- Context Length: 512 tokens
- Vocabulary: Custom SentencePiece tokenizer trained on your data
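As a quick sanity check of these sizes, this minimal sketch instantiates the small preset and counts its parameters, using only the classes shown elsewhere in this README:

```python
from gptmed.model.architecture import GPTTransformer
from gptmed.model.configs.model_config import get_small_config

# Instantiate the small preset and count trainable parameters.
config = get_small_config()
model = GPTTransformer(config)

num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Parameters: {num_params / 1e6:.1f}M")  # expected ~10M for the small preset
```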
Three preset sizes are available:

```python
from gptmed.model.configs.model_config import (
    get_tiny_config,    # ~2M parameters - for testing
    get_small_config,   # ~10M parameters - recommended
    get_medium_config   # ~50M parameters - higher quality
)
```

## Configuration

```python
from gptmed.configs.train_config import TrainingConfig

config = TrainingConfig(
    batch_size=16,
    learning_rate=3e-4,
    num_epochs=10,
    warmup_steps=100,
    grad_clip=1.0
)
```

## Observability

New in v0.4.0: Built-in training monitoring with an Observer Pattern architecture.
- **Loss Curves**: Track training/validation loss over time
- **Metrics Tracking**: Perplexity, gradient norms, learning rates
- **Callbacks**: Console output, JSON logging, early stopping
- **Export**: CSV export, matplotlib visualizations
- **Extensible**: Add custom observers for integrations (W&B, TensorBoard)
```python
from gptmed.observability import MetricsTracker, ConsoleCallback, EarlyStoppingCallback
from gptmed.training.trainer import Trainer  # path per the project structure below

# Create observers
tracker = MetricsTracker(output_dir='./metrics')
console = ConsoleCallback(print_every=50)
early_stop = EarlyStoppingCallback(patience=3)

# Use with TrainingService (automatic)
from gptmed.services import TrainingService
service = TrainingService(config_path='config.yaml')
service.train()  # Automatically creates MetricsTracker

# Or use with Trainer directly
trainer = Trainer(model, train_loader, config, observers=[tracker, console])
trainer.train()
```

Available observers:

| Observer | Description |
|---|---|
| `MetricsTracker` | Comprehensive metrics collection with export capabilities |
| `ConsoleCallback` | Real-time console output with progress bars |
| `JSONLoggerCallback` | Structured JSON logging for analysis |
| `EarlyStoppingCallback` | Stop training when validation loss plateaus |
| `LRSchedulerCallback` | Learning rate scheduling integration |
See XAI.md for the roadmap of planned Explainable AI features.
## Project Structure

```
gptmed/
├── model/
│   ├── architecture/         # GPT transformer implementation
│   └── configs/              # Model configurations
├── inference/
│   ├── generator.py          # Text generation
│   └── sampling.py           # Sampling strategies
├── training/
│   ├── train.py              # Training script
│   ├── trainer.py            # Training loop
│   └── dataset.py            # Data loading
├── observability/            # Training monitoring & XAI (v0.4.0+)
│   ├── base.py               # Observer pattern interfaces
│   ├── metrics_tracker.py    # Loss curves & metrics
│   └── callbacks.py          # Console, JSON, early stopping
├── tokenizer/
│   └── train_tokenizer.py    # SentencePiece tokenizer
├── configs/
│   └── train_config.py       # Training configurations
├── services/
│   └── training_service.py   # High-level training orchestration
└── utils/
    ├── checkpoints.py        # Model checkpointing
    └── logging.py            # Training logging
```
## Requirements

- Python >= 3.8
- PyTorch >= 2.0.0
- sentencepiece >= 0.1.99
- numpy >= 1.24.0
- tqdm >= 4.65.0
## Documentation

Complete User Manual - step-by-step guide for training your own model:

- User Manual - Start here! Complete training pipeline guide
- Architecture Guide - Understanding the model architecture
- XAI Roadmap - Explainable AI features & implementation guide
- Deployment Guide - Publishing to PyPI
- Changelog - Version history
## Performance

| Model Size | Parameters | Training Time | Inference Speed |
|---|---|---|---|
| Tiny | ~2M | 2 hours | ~100 tokens/sec |
| Small | ~10M | 8 hours | ~80 tokens/sec |
| Medium | ~50M | 24 hours | ~50 tokens/sec |

*Tested on a GTX 1080 (8 GB).*
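These figures depend heavily on hardware. Here is a rough sketch for timing generation on your own machine, reusing the `generator` from the Python API example above; whitespace-split word counts are only a crude proxy for tokenizer tokens:

```python
import time

# Reuses `generator` from the Python API example above.
start = time.time()
answer = generator.generate(
    prompt="What is machine learning?",
    max_length=100,
    temperature=0.7
)
elapsed = time.time() - start

# Approximate throughput (whitespace words, not tokenizer tokens).
print(f"~{len(answer.split()) / elapsed:.0f} words/sec")
```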
## Examples

GptMed works with any domain - just train on your own Q&A data:

```python
# Technical Support Bot
question = "How do I reset my WiFi router?"
answer = generator.generate(question, temperature=0.7)

# Educational Assistant
question = "Explain the water cycle in simple terms"
answer = generator.generate(question, temperature=0.6)

# Customer Service
question = "What is your return policy?"
answer = generator.generate(question, temperature=0.5)

# Medical Q&A (example domain)
question = "What are the symptoms of flu?"
answer = generator.generate(question, temperature=0.7)
```

Monitor your training with built-in observability:
```python
import gptmed
from gptmed.observability import MetricsTracker, ConsoleCallback

# Create observers
tracker = MetricsTracker(output_dir='./metrics')
console = ConsoleCallback(print_every=10)

# Train with observability
gptmed.train_from_config(
    'my_config.yaml',
    observers=[tracker, console]
)

# After training - get the report
report = tracker.get_report()
print(f"Final Loss: {report['final_loss']:.4f}")
print(f"Total Steps: {report['total_steps']}")

# Export metrics
tracker.export_to_csv('training_metrics.csv')
tracker.plot_loss_curves('loss_curves.png')  # Requires matplotlib
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- MedQuAD dataset creators
- PyTorch team
## Support

- **User Manual** - Complete step-by-step training guide
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: sigdelsanjog@gmail.com | sanjog.sigdel@ku.edu.np

See CHANGELOG.md for version history.