High-performance gRPC API service for HPC job runtime predictions using ensemble ML models.
This service provides real-time runtime predictions for HPC jobs via a gRPC interface. It supports:
- Single predictions with <10ms latency
- Batch predictions for multiple jobs
- Streaming predictions for continuous workloads
- Confidence intervals for all predictions
- Prometheus metrics for monitoring
- Health checks and model info endpoints
- High Performance: Optimized for M2 Max and multi-core systems
- Ensemble Models: Combines XGBoost, LightGBM, and CatBoost
- Feature Engineering: 44+ engineered features for accurate predictions
- Production Ready: Health checks, metrics, logging, error handling
- Scalable: Supports batch and streaming predictions
- Observable: Prometheus metrics and structured logging
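For intuition, an ensemble prediction with a confidence band can be formed by combining the per-model outputs, for example by averaging them and deriving a band from their spread. The snippet below is purely an illustrative sketch, not the service's actual aggregation logic:

```python
import numpy as np

def ensemble_predict(models, features):
    """Illustrative sketch: average per-model predictions and derive a simple
    confidence band from their spread. The service's real aggregation and
    confidence-interval method may differ."""
    preds = np.array([m.predict(features) for m in models])  # shape: (n_models, n_jobs)
    mean = preds.mean(axis=0)
    spread = preds.std(axis=0)
    return mean, mean - 1.96 * spread, mean + 1.96 * spread  # prediction, lower, upper
```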
```
├── src/
│   ├── proto/       # Protocol buffer definitions
│   ├── service/     # gRPC server and predictor logic
│   ├── features/    # Feature engineering (shared with training)
│   ├── model/       # Model loading utilities
│   └── utils/       # Configuration and logging
├── scripts/         # Helper scripts
├── configs/         # Configuration files
├── models/          # Trained model artifacts
└── tests/           # Test suite
```
- Python 3.11+
- Trained model artifacts in `models/production/`
- Docker (optional)
- Install dependencies: `pip install -r requirements.txt`
- Generate the proto files: `./scripts/generate_proto.sh`
- Place the trained model in `models/production/`
- Start the server: `python src/service/server.py`
- Test the service: `python scripts/test_client.py`
- Build and run with Docker Compose: `docker-compose up -d`
- With the monitoring stack: `docker-compose --profile monitoring up -d`
- M2 Max optimized deployment:
```bash
# Uses docker-compose.m2max.yml with resource limits:
#   - 4 CPU cores
#   - 8GB RAM
make start-m2max
```
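The Python examples below assume a connected client stub. A minimal setup could look like the following sketch; the generated module names (`rt_predictor_pb2`, `rt_predictor_pb2_grpc`) and the stub class name (`RTPredictorStub`) are assumptions based on `src/proto/rt_predictor.proto`, so adjust them to match your generated code.

```python
import grpc

# Modules generated by ./scripts/generate_proto.sh (names are assumptions)
from rt_predictor_pb2 import PredictRequest
from rt_predictor_pb2_grpc import RTPredictorStub

# Connect to a locally running server (port 50051 per configs/config.toml)
channel = grpc.insecure_channel("localhost:50051")
stub = RTPredictorStub(channel)
```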
Single prediction:

```python
request = PredictRequest(
    processors_req=32,
    nodes_req=4,
    mem_req=128000,
    time_req=3600,
    partition="normal",
    qos="normal",
)
response = stub.Predict(request)
# Returns: predicted_runtime, confidence_lower, confidence_upper
```

Batch prediction (using the standardized `PredictBatchRequest` message and `jobs` field; see the proto notes below):

```python
batch_request = PredictBatchRequest(jobs=[req1, req2, req3])
batch_response = stub.PredictBatch(batch_request)
# Returns: a list of predictions
```

Streaming prediction (the RPC is `PredictStream`):

```python
def request_generator():
    for job in job_queue:
        yield create_predict_request(job)

for response in stub.PredictStream(request_generator()):
    process_prediction(response)
```

Edit `configs/config.toml`:

```toml
[server]
port = 50051
max_workers = 10
metrics_port = 8181
[model]
path = "models/production"
[features.optimization]
chunk_size = 100000
enable_caching = true
n_jobs = -1
```
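If you need to read these settings from your own helper scripts, a minimal sketch using Python 3.11's built-in `tomllib` could look like this (it simply mirrors the keys shown above):

```python
import tomllib  # standard library in Python 3.11+

with open("configs/config.toml", "rb") as f:
    config = tomllib.load(f)

print(config["server"]["port"])                           # 50051
print(config["model"]["path"])                            # "models/production"
print(config["features"]["optimization"]["chunk_size"])   # 100000
```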
The following Prometheus metrics are available at http://localhost:8181/metrics:

- `rt_predictor_requests_total`: Total requests by method
- `rt_predictor_request_duration_seconds`: Request latency
- `rt_predictor_prediction_duration_seconds`: Model inference time
- `rt_predictor_active_connections`: Current active connections
Dashboards are available at http://localhost:3000 (admin/admin) when running with the monitoring profile.
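A quick way to confirm the metrics endpoint is responding (assumes the default `metrics_port` of 8181):

```python
import urllib.request

# Fetch the Prometheus exposition text and print the service's own series
with urllib.request.urlopen("http://localhost:8181/metrics") as resp:
    text = resp.read().decode("utf-8")

for line in text.splitlines():
    if line.startswith("rt_predictor_"):
        print(line)
```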
- Single prediction latency: <10ms (P99)
- Batch prediction throughput: 10,000+ predictions/second
- Streaming rate: 1,000+ predictions/second
- Model accuracy: MAE ~5,878 seconds (~1.6 hours)
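To sanity-check the latency figure against your own deployment, a rough measurement loop might look like this (illustrative only; it reuses the `stub` and `request` from the usage examples above):

```python
import time

# Time repeated single-prediction calls and report the observed P99
latencies = []
for _ in range(1000):
    start = time.perf_counter()
    stub.Predict(request)
    latencies.append(time.perf_counter() - start)

latencies.sort()
p99 = latencies[int(len(latencies) * 0.99)]
print(f"P99 latency: {p99 * 1000:.2f} ms")
```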
- Check that model files exist in `models/production/`
- Verify that the proto files have been generated
- Check that port 50051 is available
- Ensure `ensemble_config.json` has a `models` key (not `model_names`)
- Ensure feature engineering matches training
- Check model compatibility
- Review logs for missing features
- Enable model caching
- Increase worker threads
- Use batch predictions for multiple jobs
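A quick check that the server is reachable and a model is loaded is to call the model-info endpoint. The sketch below assumes the RPC is named `GetModelInfo`; the request message `GetModelInfoRequest` is the standardized name noted under the proto issues below.

```python
from rt_predictor_pb2 import GetModelInfoRequest  # module name is an assumption

# Ask the running server which model artifacts it has loaded
info = stub.GetModelInfo(GetModelInfoRequest())
print(info)
```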
- `KeyError: 'model_names'`
  - The ensemble config expects a `models` key, not `model_names`
  - Ensure your trained models use the correct config format
- Proto compilation errors
  - Message names have been standardized:
    - Use `PredictBatchRequest` (not `BatchPredictRequest`)
    - Use `request.jobs` (not `request.requests`)
    - Use `PredictStream` (not `StreamPredict`)
    - Use `GetModelInfoRequest` (not `ModelInfoRequest`)
Run the tests:

```bash
pytest tests/
```

To update the API, regenerate the bindings after editing the proto:

```bash
# Edit src/proto/rt_predictor.proto
./scripts/generate_proto.sh
```

To add new features:
- Update feature engineering in the training service
- Retrain model with new features
- Deploy new model to API service
- No code changes needed in API!
See LICENSE file in root directory.