Machine learning service for training and inference using scikit-learn models.
```bash
# From the project root directory
docker compose -f 'compose/compose.dev.yaml' up -d --build

# Access Swagger UI (development mode)
open http://localhost:8001/docs
```

No setup needed! The service starts with pre-trained models ready for inference.
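Because the containers start in the background (`-d`), a script may need to wait until the API answers before calling it. The sketch below polls `GET /health` using only Python's standard library. It assumes the endpoint returns HTTP 200 once the service is ready; `wait_for_health` and `http_status` are hypothetical helper names, not part of the service.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status of a GET request to `url`, or 0 if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # Non-2xx answers raise HTTPError: the server is up but not reporting healthy.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure or timeout: the server is not up yet.
        return 0


def wait_for_health(
    url: str = "http://localhost:8001/health",
    timeout: float = 60.0,
    interval: float = 1.0,
    fetch: Callable[[str], int] = http_status,
) -> bool:
    """Poll `url` until it returns 200 (assumed to mean "ready") or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if fetch(url) == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_health()` after `docker compose up -d` returns `True` as soon as the health check succeeds. The `fetch` parameter exists so the polling loop can be tested without a running server.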
- 5 ML Algorithms: Random Forest, SVM, Decision Tree, KNN, Logistic Regression
- Model Training: Train models with configurable feature selection
- Model Persistence: Automatic saving and loading of trained models
- RESTful API: Full Swagger/OpenAPI documentation
Access the Swagger UI at: http://localhost:8001/docs (development mode)
- `GET /health` - Service health check
- `GET /models` - List all trained models with metadata
- `POST /models/train` - Train a new model
- `POST /models/{id}/predict` - Get a prediction from a specific model
- `DELETE /models/{id}` - Delete a trained model
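The endpoints above can be called from Python without extra dependencies. This minimal sketch only builds `urllib.request.Request` objects for each route; the helper names are hypothetical, and it assumes JSON request bodies and the development base URL `http://localhost:8001`.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8001"  # development-mode address used in this README


def build_request(method: str, path: str, body: Optional[dict[str, Any]] = None) -> urllib.request.Request:
    """Build, but do not send, a request; `body` is encoded as JSON when given."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def _model_path(model_id: str) -> str:
    # Percent-encode the ID so unusual characters cannot break the URL path.
    return "/models/" + urllib.parse.quote(model_id, safe="")


def health() -> urllib.request.Request:
    return build_request("GET", "/health")


def list_models() -> urllib.request.Request:
    return build_request("GET", "/models")


def train(payload: dict[str, Any]) -> urllib.request.Request:
    return build_request("POST", "/models/train", payload)


def predict(model_id: str, passenger: dict[str, Any]) -> urllib.request.Request:
    return build_request("POST", _model_path(model_id) + "/predict", passenger)


def delete_model(model_id: str) -> urllib.request.Request:
    return build_request("DELETE", _model_path(model_id))
```

To send one of these while the service is running, pass it to `urllib.request.urlopen`; for example, `json.load(urllib.request.urlopen(list_models()))` should return the parsed response body.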
| Algorithm | ID | Configurable Parameters |
|---|---|---|
| Random Forest | `rf` | `n_estimators` |
| Support Vector Machine | `svm` | - |
| Decision Tree | `dt` | - |
| K-Nearest Neighbors | `knn` | `n_neighbors` |
| Logistic Regression | `lr` | - |
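Before sending a training request, it can help to check the `algo` object against the table above. The sketch below encodes the table as a dictionary. Rejecting any other keys and requiring positive integers are assumptions based on how scikit-learn treats `n_estimators` and `n_neighbors`, not documented service behavior, and `validate_algo` is a hypothetical name.

```python
from typing import Any

# Algorithm IDs and their configurable parameters, taken from the table above.
ALLOWED_PARAMS: dict[str, set[str]] = {
    "rf": {"n_estimators"},
    "svm": set(),
    "dt": set(),
    "knn": {"n_neighbors"},
    "lr": set(),
}


def validate_algo(algo: dict[str, Any]) -> list[str]:
    """Return a list of problems with an `algo` spec; an empty list means it looks valid."""
    name = algo.get("name")
    if name not in ALLOWED_PARAMS:
        return [f"unknown algorithm {name!r}; expected one of {sorted(ALLOWED_PARAMS)}"]
    errors = []
    for key, value in algo.items():
        if key == "name":
            continue
        if key not in ALLOWED_PARAMS[name]:
            errors.append(f"{key!r} is not configurable for {name!r}")
        # bool is a subclass of int, so exclude it explicitly.
        elif not isinstance(value, int) or isinstance(value, bool) or value < 1:
            errors.append(f"{key!r} must be a positive integer, got {value!r}")
    return errors
```

For example, `validate_algo({"name": "rf", "n_estimators": 150})` returns an empty list, while passing `n_estimators` to `svm` yields one error message.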
Features from the Titanic dataset (select which ones to use during training):
- `pclass` - Passenger class (1, 2, 3)
- `sex` - Gender (male/female)
- `age` - Age in years
- `fare` - Ticket fare
- `embarked` - Port of embarkation
- `title` - Extracted from name (Mr, Mrs, etc.)
- `is_alone` - Traveling alone flag
- `age_class` - Age × Class interaction
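The derived features can be illustrated on a raw row of `train.csv`. This is a hypothetical sketch, not the code in `utils/data.py`. It assumes `is_alone` means `SibSp + Parch == 0`, `age_class` is the product of age and passenger class, the title is the text between the comma and the first period in `Name`, and the port codes `C`/`Q`/`S` map to the lowercase names used in the prediction payload (`queenstown` is an assumption; only `cherbourg` and `southampton` appear in this README).

```python
import csv
import io
import re
from typing import Any, Optional

# Assumed mapping from the dataset's port codes to the payload's port names.
PORTS = {"C": "cherbourg", "Q": "queenstown", "S": "southampton"}
# "Braund, Mr. Owen Harris": the title sits between ", " and the next ".".
TITLE_RE = re.compile(r",\s*([^.]+)\.")


def extract_title(name: str) -> Optional[str]:
    match = TITLE_RE.search(name)
    return match.group(1).strip().lower() if match else None


def derive_features(row: dict[str, str]) -> dict[str, Any]:
    """Turn one raw CSV row into the feature names listed above (hypothetical rules)."""
    age = float(row["Age"]) if row["Age"] else None  # Age is blank for some passengers
    pclass = int(row["Pclass"])
    return {
        "pclass": pclass,
        "sex": row["Sex"],
        "age": age,
        "fare": float(row["Fare"]) if row["Fare"] else None,
        "embarked": PORTS.get(row["Embarked"]),
        "title": extract_title(row["Name"]),
        "is_alone": int(row["SibSp"]) + int(row["Parch"]) == 0,
        "age_class": age * pclass if age is not None else None,
    }


# First row of the Titanic training data.
SAMPLE = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
"""
rows = [derive_features(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

For this passenger the sketch produces `title="mr"`, `embarked="southampton"`, `is_alone=False` (one sibling or spouse aboard) and `age_class=66.0`.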
- Go to http://localhost:8001/docs
- Try `GET /models` to see the pre-loaded models
- Test a prediction with `POST /models/{model_id}/predict`:
  ```json
  {
    "pclass": 1,
    "sex": "female",
    "age": 30,
    "fare": 100,
    "travelled_alone": false,
    "embarked": "cherbourg",
    "title": "mrs"
  }
  ```
- Train a custom model with `POST /models/train`
```bash
cd model

# Install dependencies (if not already done)
uv sync --extra dev

# Run tests
uv run pytest

# Linting and formatting check
uv run ruff check
uv run ruff format --check

# Auto-fix formatting
uv run ruff format
```

```text
model/
├── main.py               # FastAPI application
├── models_router.py      # Model management endpoints
├── schemas.py            # Pydantic data models
├── train.py              # Training script
├── utils/
│   ├── data.py           # Data preprocessing
│   ├── models.py         # Model loading/saving
│   └── model_factory.py  # Algorithm factory
├── data/                 # Included dataset
│   ├── train.csv
│   ├── test.csv
│   └── gender_submission.csv
└── tests/                # Test suite
```
- Go to http://localhost:8001/docs
- Expand `POST /models/train`
- Click "Try it out"
- Use this example request:
  ```json
  {
    "algo": { "name": "rf", "n_estimators": 150 },
    "features": ["pclass", "sex", "age", "fare"],
    "random_state": 42
  }
  ```
- Click "Execute"
Example response:

```json
{
  "id": "trained-abc123",
  "params": { ... },
  "info": {
    "accuracy": 0.85
  }
}
```

- Get the model ID from the `GET /models` endpoint
- Use `POST /models/{model_id}/predict`
- Provide the passenger data
- Receive a survival prediction with its probability
Example request body:

```json
{
  "pclass": 3,
  "sex": "male",
  "age": 25,
  "fare": 15.5,
  "travelled_alone": true,
  "embarked": "southampton",
  "title": "mr"
}
```

Example response:

```json
{
  "survived": false,
  "probability": 0.78
}
```

- Models are automatically saved to `/data/models/`
- Saved models persist across container restarts
- Each model includes:
  - `model.pkl` - Serialized scikit-learn model
  - `params.json` - Training parameters
  - `info.json` - Model metadata and accuracy
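The three-file layout can be sketched with `pickle` and `json` from the standard library. This assumes one sub-directory per model ID under the models root; the service's `utils/models.py` may organize files differently or use another serializer, and `save_model`/`load_model` are hypothetical names. Only unpickle files you trust, because loading a pickle can execute arbitrary code.

```python
import json
import pickle
from pathlib import Path
from typing import Any


def save_model(root: Path, model_id: str, model: Any, params: dict, info: dict) -> Path:
    """Write model.pkl, params.json and info.json into root/<model_id>/ (assumed layout)."""
    model_dir = root / model_id
    model_dir.mkdir(parents=True, exist_ok=True)
    with open(model_dir / "model.pkl", "wb") as fh:
        pickle.dump(model, fh)
    (model_dir / "params.json").write_text(json.dumps(params, indent=2), encoding="utf-8")
    (model_dir / "info.json").write_text(json.dumps(info, indent=2), encoding="utf-8")
    return model_dir


def load_model(root: Path, model_id: str) -> tuple[Any, dict, dict]:
    """Read back the estimator, its training parameters and its metadata."""
    model_dir = root / model_id
    with open(model_dir / "model.pkl", "rb") as fh:
        model = pickle.load(fh)  # unsafe for untrusted files
    params = json.loads((model_dir / "params.json").read_text(encoding="utf-8"))
    info = json.loads((model_dir / "info.json").read_text(encoding="utf-8"))
    return model, params, info
```

Keeping parameters and metadata in JSON next to the pickle means a listing endpoint can report accuracy and settings without unpickling every model.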
On startup, the service loads:
- `rf` - Random Forest
- `svm` - Support Vector Machine
- `knn` - K-Nearest Neighbors
- `lr` - Logistic Regression
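A startup loader first has to find which saved models are complete. This hypothetical sketch, assuming one directory per model containing the three files listed earlier, returns the IDs whose directories have all of them. It can also serve as a quick check of the volume contents when a model ID is not found.

```python
from pathlib import Path

# The files each saved model is expected to contain.
REQUIRED_FILES = ("model.pkl", "params.json", "info.json")


def discover_models(root: Path) -> list[str]:
    """Return the sorted IDs of sub-directories that contain every required file."""
    if not root.is_dir():
        return []
    return sorted(
        d.name
        for d in root.iterdir()
        if d.is_dir() and all((d / f).is_file() for f in REQUIRED_FILES)
    )
```

For example, `discover_models(Path("/data/models"))` inside the container would list only directories with a complete set of files and silently skip partial ones.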
To run the service with the production configuration locally, use:

```bash
docker compose -f compose/compose.prod-local.yaml up
```

- Check the model ID with `GET /models`
- Verify the model files in the container:
  ```bash
  docker compose exec model ls /data/models
  ```
- Models are loaded at container startup
- Consider model complexity and input data size
- Check that feature names match the schema
- Verify that algorithm parameters are valid
- Review the logs:
  ```bash
  docker compose logs model
  ```
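To catch feature-name mismatches before training, a request's `features` list can be compared with the documented feature names. This is a minimal sketch with a hypothetical `check_features` helper; the Pydantic models in `schemas.py` remain the authoritative definition of what the service accepts.

```python
# Feature names documented in this README for training requests.
DOCUMENTED_FEATURES = frozenset(
    {"pclass", "sex", "age", "fare", "embarked", "title", "is_alone", "age_class"}
)


def check_features(features: list[str]) -> list[str]:
    """Return human-readable problems with a `features` list; empty means none found."""
    problems = []
    if not features:
        problems.append("feature list is empty")
    unknown = sorted(set(features) - DOCUMENTED_FEATURES)
    if unknown:
        problems.append(f"unknown feature(s): {', '.join(unknown)}")
    dupes = sorted({f for f in features if features.count(f) > 1})
    if dupes:
        problems.append(f"duplicate feature(s): {', '.join(dupes)}")
    return problems
```

For instance, `check_features(["pclass", "travelled_alone"])` flags `travelled_alone`, a field name from the prediction payload that is not in the training feature list.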
- The `random_state` parameter only affects model training, for reproducibility
- It does not affect predictions
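The effect of a seed on training can be illustrated with a shuffled train/test split. The sketch uses `random.Random` as a stand-in; whether the service applies `random_state` to a data split, to the estimator, or to both is not stated in this README. scikit-learn's `train_test_split` and estimators such as `RandomForestClassifier` accept a `random_state` argument for the same purpose. `seeded_split` is a hypothetical name.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def seeded_split(rows: Sequence[T], test_fraction: float, random_state: int) -> tuple[list[T], list[T]]:
    """Shuffle a copy of `rows` with a seeded RNG and split it into train and test parts."""
    rng = random.Random(random_state)  # private RNG; the global random state is untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]
```

Two calls with the same `random_state` produce identical splits, so the trained model and its reported accuracy can be reproduced. Predicting with an already trained model draws no random numbers, which is consistent with the note that the seed does not affect predictions.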