RT Predictor API Service

High-performance gRPC API service for HPC job runtime predictions using ensemble ML models.

Overview

This service provides real-time runtime predictions for HPC jobs via a gRPC interface. It supports:

Single predictions with <10ms P99 latency
Batch predictions for multiple jobs
Streaming predictions for continuous workloads
Confidence intervals for all predictions
Prometheus metrics for monitoring
Health checks and model info endpoints

Features

High Performance: Optimized for M2 Max and multi-core systems
Ensemble Models: Combines XGBoost, LightGBM, and CatBoost
Feature Engineering: 44+ engineered features for accurate predictions
Production Ready: Health checks, metrics, logging, error handling
Scalable: Supports batch and streaming predictions
Observable: Prometheus metrics and structured logging

Architecture

├── src/
│   ├── proto/          # Protocol buffer definitions
│   ├── service/        # gRPC server and predictor logic
│   ├── features/       # Feature engineering (shared with training)
│   ├── model/          # Model loading utilities
│   └── utils/          # Configuration and logging
├── scripts/            # Helper scripts
├── configs/            # Configuration files
├── models/             # Trained model artifacts
└── tests/              # Test suite

Quick Start

Prerequisites

Python 3.11+
Trained model artifacts in models/production/
Docker (optional)

Local Development

Install dependencies:

pip install -r requirements.txt

Generate proto files:

./scripts/generate_proto.sh

Place the trained model artifacts in models/production/

Start the server:

python src/service/server.py

Test the service:

python scripts/test_client.py

Docker Deployment

Build and run with Docker Compose:

docker-compose up -d

With monitoring stack:

docker-compose --profile monitoring up -d

M2 Max optimized deployment:

# Uses docker-compose.m2max.yml with resource limits:
# - 4 CPU cores
# - 8GB RAM
make start-m2max

API Reference

Single Prediction

request = PredictRequest(
    processors_req=32,
    nodes_req=4,
    mem_req=128000,
    time_req=3600,
    partition="normal",
    qos="normal"
)
response = stub.Predict(request)
# Returns: predicted_runtime, confidence_lower, confidence_upper

Batch Prediction

batch_request = PredictBatchRequest(jobs=[req1, req2, req3])
# RPC name assumed to match the standardized message name; confirm in rt_predictor.proto
batch_response = stub.PredictBatch(batch_request)
# Returns: list of predictions

Streaming Prediction

def request_generator():
    for job in job_queue:
        yield create_predict_request(job)

for response in stub.PredictStream(request_generator()):
    process_prediction(response)
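
The streaming example calls create_predict_request without defining it. A minimal sketch of such a helper follows, assuming each job is a mapping whose keys mirror the PredictRequest fields from the Single Prediction example. The request_cls parameter and the choice to treat all six fields as required are illustrative assumptions, not part of the service.

```python
from typing import Any, Callable, Mapping

# Field names taken from the Single Prediction example.
REQUEST_FIELDS = (
    "processors_req", "nodes_req", "mem_req", "time_req", "partition", "qos",
)


def create_predict_request(job: Mapping[str, Any], request_cls: Callable[..., Any]) -> Any:
    """Build a request from a job mapping, failing early on missing fields."""
    missing = [name for name in REQUEST_FIELDS if name not in job]
    if missing:
        raise ValueError(f"job is missing required fields: {missing}")
    # Pass only known fields so stray keys in the job record are ignored.
    return request_cls(**{name: job[name] for name in REQUEST_FIELDS})
```

With the generated code available, pass PredictRequest as request_cls. To match the one-argument call in the streaming example, bind it once with functools.partial(create_predict_request, request_cls=PredictRequest).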

Configuration

Edit configs/config.toml:

[server]
port = 50051
max_workers = 10
metrics_port = 8181

[model]
path = "models/production"

[features.optimization]
chunk_size = 100000
enable_caching = true
n_jobs = -1

Monitoring

Prometheus Metrics

Available at http://localhost:8181/metrics:

rt_predictor_requests_total: Total requests by method
rt_predictor_request_duration_seconds: Request latency
rt_predictor_prediction_duration_seconds: Model inference time
rt_predictor_active_connections: Current active connections
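
For a quick check without a Prometheus server, the text returned by the metrics endpoint can be summarized with a few lines of Python. This is a minimal sketch: the sample text, its label name (method), and its values are made up for illustration, and the parser ignores the optional timestamp field that the exposition format allows.

```python
from collections import defaultdict

# Illustrative sample only; real label names and values may differ.
SAMPLE = """\
# HELP rt_predictor_requests_total Total requests by method
# TYPE rt_predictor_requests_total counter
rt_predictor_requests_total{method="Predict"} 120
rt_predictor_requests_total{method="GetModelInfo"} 30
rt_predictor_active_connections 4
"""


def sum_by_metric(text: str) -> dict[str, float]:
    """Sum sample values per metric name, collapsing all label sets."""
    totals: dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        series, value = line.rsplit(" ", 1)
        name = series.split("{", 1)[0]  # drop the label block, if any
        totals[name] += float(value)
    return dict(totals)


totals = sum_by_metric(SAMPLE)
```

Against a running service, the input text could come from urllib.request.urlopen("http://localhost:8181/metrics").read().decode().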

Grafana Dashboard

Access at http://localhost:3000 (default credentials admin/admin) when using the monitoring profile.

Performance

Single prediction latency: <10ms (P99)
Batch prediction throughput: 10,000+ predictions/second
Streaming rate: 1,000+ predictions/second
Model accuracy: MAE ~5,878 seconds (~1.6 hours)

Troubleshooting

Service won't start

Check model files exist in models/production/
Verify proto files are generated
Check port 50051 is available
Ensure ensemble_config.json has models key (not model_names)
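
Most of the checks above can be scripted. The sketch below is a hypothetical preflight helper, not part of the repository. It assumes ensemble_config.json lives inside the model directory, and it skips the proto check because the generated file names are not listed here.

```python
import json
import socket
from pathlib import Path


def preflight(model_dir: str = "models/production", port: int = 50051) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    model_path = Path(model_dir)
    if not model_path.is_dir() or not any(model_path.iterdir()):
        problems.append(f"no model files found in {model_dir}")

    # Assumed location of the ensemble config: inside the model directory.
    config_file = model_path / "ensemble_config.json"
    if config_file.is_file():
        config = json.loads(config_file.read_text())
        if "models" not in config:
            problems.append("ensemble_config.json has no 'models' key")
    else:
        problems.append("ensemble_config.json not found")

    # Binding succeeds only if nothing else holds the port on this address.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind(("127.0.0.1", port))
        except OSError:
            problems.append(f"port {port} is already in use")
    return problems
```

The port check only binds to 127.0.0.1, so a process bound to a different interface would not be detected.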

Predictions failing

Ensure feature engineering matches training
Check model compatibility
Review logs for missing features

High latency

Enable model caching
Increase worker threads
Use batch predictions for multiple jobs
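
When many jobs arrive at once, grouping them into fixed-size chunks lets each chunk go out as one batch request instead of many single calls. A minimal sketch of the grouping step follows. The chunked helper is illustrative, and a suitable batch size is something to tune rather than a documented value.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls up to `size` items; an empty list ends the loop.
    while chunk := list(islice(iterator, size)):
        yield chunk
```

Each yielded list can then be wrapped in a single batch request, so the per-call overhead is paid once per chunk.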

Common Errors and Fixes

KeyError: 'model_names'
- The ensemble config expects models key, not model_names
- Ensure your trained models use the correct config format

Proto compilation errors
- Message names have been standardized:
  - Use PredictBatchRequest (not BatchPredictRequest)
  - Use request.jobs (not request.requests)
  - Use PredictStream (not StreamPredict)
  - Use GetModelInfoRequest (not ModelInfoRequest)

Development

Running Tests

pytest tests/

Updating Proto Files

# Edit src/proto/rt_predictor.proto
./scripts/generate_proto.sh

Adding New Features

Update feature engineering in training service
Retrain model with new features
Deploy new model to API service
No code changes needed in API!

License

See LICENSE file in root directory.
