A high-performance LLM gateway built in Rust, with a React dashboard, that exposes a single OpenAI-compatible API for multiple LLM providers.
- ✅ Anthropic (Claude Opus 4.5, Sonnet 4.5, Haiku 4.5, 4.1, 4.0, 3.5, 3)
- ✅ Google Gemini (Gemini 3 Pro, 2.5 Pro, 2.5 Flash, 2.0 Flash)
- ✅ OpenAI (GPT-5, GPT-4.1, GPT-4 Turbo)
- ✅ Azure OpenAI (All Azure-deployed OpenAI models)
- 🚧 AWS Bedrock
- 🚧 Google VertexAI
- 🚧 Cohere
- 🚧 HuggingFace
- 🚧 Replicate
- 🚧 Groq
- 🚀 OpenAI-Compatible API: Drop-in replacement for OpenAI API
- ⚡ High Performance: Built with Rust and Tokio for maximum throughput
- 🔄 Streaming Support: Server-Sent Events for real-time responses
- 🎯 Smart Routing: Automatic model-to-provider mapping
- 📊 Analytics Dashboard: Real-time metrics and monitoring
- 🔒 Security: API key management, rate limiting, IP whitelisting
- 💾 Caching: Response caching with Redis (optional)
- 🔁 Retry Logic: Automatic retries with exponential backoff
- 📝 Comprehensive Logging: Structured logging with tracing
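The "Smart Routing" feature above maps a model name to the provider that serves it. The gateway's actual routing table is not shown in this README, so the following is only a minimal sketch of the idea, assuming routing by model-name prefix. `PREFIX_TO_PROVIDER` and `resolve_provider` are hypothetical names, not part of the project.

```python
# Hypothetical prefix table; the gateway's real mapping may differ.
PREFIX_TO_PROVIDER = {
    "claude-": "anthropic",
    "gemini-": "gemini",
    "gpt-": "openai",
}


def resolve_provider(model: str) -> str:
    """Return the provider name for a model ID, based on its prefix."""
    for prefix, provider in PREFIX_TO_PROVIDER.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"no provider registered for model {model!r}")
```

With a table like this, a client only changes the `model` field to switch providers; the base URL stays the same.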
The fastest way to get started is with the pre-built Docker images:
# Clone the repository
git clone https://github.com/jasmedia/InferXgate.git
cd InferXgate
# Set up environment variables
cp .env.example .env
# Edit .env with your API keys (at minimum, set one LLM provider key)
# Run the quickstart script and select option 4
./quickstart.sh
# Choose: 4) Run with Docker (production - uses pre-built images)
Or manually:
docker-compose -f docker-compose.prod.yml up -d
Services will be available at:
- Frontend: http://localhost
- Backend API: http://localhost:3000
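Containers can take a few seconds to become ready after `docker-compose up -d`. The sketch below polls the backend's `/health` endpoint until it answers. It assumes the endpoint returns HTTP 200 when healthy, which this README does not state explicitly; `http_status` and `wait_until_healthy` are illustrative helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with an error status
    except OSError:
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_healthy(
    url: str = "http://localhost:3000/health",
    attempts: int = 30,
    delay: float = 1.0,
    probe: Callable[[str], int] = http_status,
) -> bool:
    """Poll `url` until it returns 200 (assumed healthy) or attempts run out."""
    for _ in range(attempts):
        if probe(url) == 200:
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` with the defaults waits up to roughly 30 seconds for the backend on port 3000.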
For development, you'll need:
- Rust 1.75+ (install from rustup.rs)
- Bun 1.0+ (install from bun.sh)
- Docker (for PostgreSQL and Redis)
- API keys for providers you want to use
- Clone the repository:
git clone https://github.com/jasmedia/InferXgate.git
cd InferXgate
- Start PostgreSQL and Redis with Docker:
docker run -d --name inferxgate-postgres \
-e POSTGRES_USER=postgres \
-e POSTGRES_PASSWORD=postgres \
-e POSTGRES_DB=inferxgate \
-p 5432:5432 \
postgres:18-alpine
docker run -d --name inferxgate-redis \
-p 6379:6379 \
redis:7-alpine
- Set up environment variables:
cd backend
cp .env.example .env
# Edit .env with your API keys
# Ensure DATABASE_URL=postgresql://postgres:postgres@localhost/inferxgate
# Ensure REDIS_URL=redis://localhost:6379
- Build and run the Rust backend:
cargo build --release
cargo run --release
The backend will start on http://localhost:3000
- Install dependencies:
cd frontend
bun install
- Start the development server:
bun run dev
The frontend will start on http://localhost:5173
Create a .env file in the backend directory:
# Server Configuration
HOST=0.0.0.0
PORT=3000
LOG_LEVEL=info
# Provider API Keys
ANTHROPIC_API_KEY=your-anthropic-api-key
GEMINI_API_KEY=your-gemini-api-key
OPENAI_API_KEY=your-openai-api-key
# Azure OpenAI Configuration
AZURE_OPENAI_API_KEY=your-azure-openai-api-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_RESOURCE_NAME=your-resource-name
COHERE_API_KEY=your-cohere-api-key
# AWS Configuration (for Bedrock)
AWS_ACCESS_KEY_ID=your-aws-access-key
AWS_SECRET_ACCESS_KEY=your-aws-secret-key
AWS_REGION=us-east-1
# Optional: Database for metadata
DATABASE_URL=postgresql://user:password@localhost/inferxgate
# Optional: Redis for caching
REDIS_URL=redis://localhost:6379

The gateway provides an OpenAI-compatible API. You can use it with any OpenAI client library.
from openai import OpenAI
# Point to your gateway instead of OpenAI
client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="your-gateway-api-key"  # If auth is enabled
)
# Use any supported model
response = client.chat.completions.create(
    model="claude-opus-4-5-20251101",  # or "gemini-3-pro-preview"
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
    temperature=0.7,
    max_tokens=1000
)
print(response.choices[0].message.content)

curl http://localhost:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-opus-4-5-20251101",
"messages": [
{"role": "user", "content": "Hello!"}
],
"temperature": 0.7,
"max_tokens": 1000
}'

stream = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | Create chat completion |
| /v1/models | GET | List available models |
| /health | GET | Health check |
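If the `openai` package is not available, the endpoints in the table can be called with the Python standard library alone. The sketch below builds a `POST /v1/chat/completions` request. It assumes that, when auth is enabled, the gateway accepts the key as an `Authorization: Bearer` header the way OpenAI clients send it; `build_chat_request` and `send` are illustrative names.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:3000"  # backend address used throughout this README


def build_chat_request(
    model: str, prompt: str, api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: bearer-token auth, as sent by OpenAI client libraries.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request to a running gateway and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)
```

With a gateway running, `send(build_chat_request("gemini-2.5-pro", "Hello!"))` should return a dictionary shaped like the response example in this section.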
{
"model": "claude-opus-4-5-20251101",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello!"
}
],
"temperature": 0.7,
"max_tokens": 1000,
"top_p": 0.9,
"frequency_penalty": 0,
"presence_penalty": 0,
"stop": ["\\n\\n"],
"stream": false,
"n": 1,
"user": "user-123"
}

{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1699000000,
"model": "claude-opus-4-5-20251101",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm doing well, thank you. How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 15,
"total_tokens": 25
}
}

inferxgate/
├── backend/
│   ├── src/
│   │   ├── main.rs              # Main server entry point
│   │   ├── config.rs            # Configuration management
│   │   ├── error.rs             # Error handling
│   │   └── providers/           # Provider implementations
│   │       ├── mod.rs           # Provider trait
│   │       ├── anthropic.rs     # Anthropic provider
│   │       ├── gemini.rs        # Gemini provider
│   │       ├── openai.rs        # OpenAI provider
│   │       └── azure.rs         # Azure OpenAI provider
│   └── Cargo.toml
├── frontend/
│   ├── src/
│   │   ├── App.tsx              # Main React app
│   │   ├── components/          # React components
│   │   └── pages/               # Page components
│   ├── package.json
│   └── vite.config.ts
└── docker-compose.yml
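Each module under `providers/` presumably translates the OpenAI-style request into the provider's native format. The project's Rust code is not shown here, so the following Python sketch only illustrates the kind of translation involved for Anthropic, whose Messages API takes the system prompt as a top-level `system` field, requires `max_tokens`, and names stop strings `stop_sequences`. The function name and the default of 1024 tokens are assumptions.

```python
def to_anthropic_request(openai_request: dict) -> dict:
    """Convert an OpenAI-style chat request into an Anthropic Messages body."""
    system_parts = []
    messages = []
    for msg in openai_request["messages"]:
        if msg["role"] == "system":
            # Anthropic expects system text outside the messages array.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})

    out = {
        "model": openai_request["model"],
        "messages": messages,
        # max_tokens is required by Anthropic; 1024 is an arbitrary fallback.
        "max_tokens": openai_request.get("max_tokens", 1024),
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)
    for key in ("temperature", "top_p", "stream"):
        if key in openai_request:
            out[key] = openai_request[key]
    if "stop" in openai_request:
        stop = openai_request["stop"]
        out["stop_sequences"] = [stop] if isinstance(stop, str) else list(stop)
    return out
```

Fields with no direct equivalent in this sketch, such as `frequency_penalty`, `presence_penalty`, `n`, and `user`, are simply dropped; a real adapter would need an explicit policy for them.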
Use pre-built images from Docker Hub:
# Set up environment variables
cp .env.example .env
# Edit .env with your API keys and secrets
# Start with production compose file
docker-compose -f docker-compose.prod.yml up -d
This uses:
- inferxgate/backend:latest
- inferxgate/frontend:latest
Build images locally:
# Copy the example env file
cp .env.example .env
# Edit .env with your API keys
# At minimum, set one LLM provider API key (e.g., ANTHROPIC_API_KEY)
# Start all services
docker-compose up -d
Backend:
cd backend
docker build -t inferxgate-backend .
docker run -p 3000:3000 --env-file .env inferxgate-backend
Frontend:
cd frontend
docker build -t inferxgate-frontend .
docker run -p 80:80 inferxgate-frontend

The project includes a comprehensive Makefile for common tasks (all frontend commands use Bun):
make help # Show all available commands
make setup # Initial setup (copy .env.example, install deps)
make install # Install all dependencies (Rust + Bun)
make dev # Start development servers (backend + frontend)
make build # Build both backend and frontend for production
make test # Run all tests (backend + frontend)
make fmt # Format code (cargo fmt + bun run format)
make lint # Lint code (cargo clippy + bun run lint)
make docker-build # Build Docker images
make docker-up # Start all services with Docker Compose
make docker-down # Stop Docker services

- Connection Pooling: The gateway uses connection pooling by default
- Response Caching: Enable Redis for caching frequent requests
- Rate Limiting: Configure appropriate rate limits per provider
- Timeouts: Adjust timeouts based on model response times
- Latency overhead: < 5ms
- Throughput: 10,000+ requests/second
- Memory usage: ~50MB base + cache
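Response caching only helps when identical requests map to the same cache entry. How the gateway derives its Redis keys is not documented here; the sketch below shows one common approach, hashing a canonical form of the fields that affect the output. The field list, the `inferxgate:cache:` prefix, and the rule of skipping streamed requests are all assumptions for illustration.

```python
import hashlib
import json

# Assumed set of request fields that influence the completion.
CACHE_FIELDS = ("model", "messages", "temperature", "max_tokens", "top_p", "stop")


def cache_key(request: dict, prefix: str = "inferxgate:cache:") -> str:
    """Derive a deterministic cache key from the output-relevant fields."""
    relevant = {k: request[k] for k in CACHE_FIELDS if k in request}
    # sort_keys makes the serialization independent of dict ordering.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return prefix + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_cacheable(request: dict) -> bool:
    """Treat only non-streaming requests as cacheable in this sketch."""
    return not request.get("stream", False)
```

Because `user` is not in the field list, two users sending the same prompt would share a cache entry; whether that is desirable depends on the deployment.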
The gateway provides several monitoring endpoints:
- /metrics - Prometheus metrics
- /health - Health check
- Dashboard at http://localhost:5173 - Real-time analytics
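For a quick look at `/metrics` without a Prometheus server, the text exposition format can be parsed by hand. This is a simplified sketch: it assumes samples carry no trailing timestamp, and the metric name in the usage note is made up, since this README does not list the metrics the gateway exports.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}.

    Assumes no trailing timestamps, so the value is the last
    space-separated token on each sample line.
    """
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        samples[series.strip()] = float(value)
    return samples
```

For example, a line such as `gateway_requests_total{provider="anthropic"} 42` (hypothetical metric name) becomes an entry keyed by the full series string, labels included.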
- Rate limit errors: Check provider rate limits and adjust retry settings
- Timeout errors: Increase timeout values for larger models
- Authentication errors: Verify API keys in .env file
- CORS errors: Ensure frontend proxy is configured correctly
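When rate limit errors persist, it can also help to retry on the client side. The gateway's own retry settings are not documented in this README, so the sketch below is a generic client-side exponential backoff with jitter, not the gateway's implementation. `backoff_delays` and `call_with_retries` are illustrative names, and the base delay, factor, and cap are arbitrary defaults.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(
    retries: int, base: float = 0.5, factor: float = 2.0, cap: float = 30.0
) -> List[float]:
    """Return the maximum delay before each retry: base * factor**i, capped."""
    return [min(cap, base * factor**i) for i in range(retries)]


def call_with_retries(
    fn: Callable[[], T],
    retries: int = 4,
    base: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> T:
    """Call fn, retrying on any exception with jittered exponential backoff."""
    delays = backoff_delays(retries, base)
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Sleep a random fraction of the delay so clients do not retry in lockstep.
            sleep(delays[attempt] * jitter())
    raise RuntimeError("unreachable")
```

In practice `fn` would wrap a call such as `client.chat.completions.create(...)`, and the `except` clause would be narrowed to rate limit and timeout errors rather than every exception.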
Enable debug logging:
LOG_LEVEL=debug cargo run

- Use HTTPS in production: Deploy behind a reverse proxy with SSL
- Rotate API keys regularly: Set up key rotation policies
- Enable authentication: Use the built-in auth middleware
- IP whitelisting: Restrict access to known IPs
- Rate limiting: Prevent abuse with appropriate limits
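The IP whitelisting item above comes down to checking the client address against a list of allowed addresses and CIDR ranges. The gateway's configuration format for this is not shown in this README; the following sketch only illustrates the check itself using Python's standard `ipaddress` module, with hypothetical function names.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_allowlist(entries: Iterable[str]) -> List[Network]:
    """Parse single addresses and CIDR ranges into network objects."""
    # strict=False accepts entries like "10.1.2.3/8" with host bits set.
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_allowed(client_ip: str, allowlist: List[Network]) -> bool:
    """Return True if client_ip falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in allowlist)
```

When the gateway sits behind a reverse proxy, the address to check is the one the proxy forwards, not the proxy's own, so the proxy must be trusted to set it correctly.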
Contributions are welcome! Please read our Contributing Guide for details.
This project follows the Contributor Covenant Code of Conduct.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Core gateway functionality
- Anthropic provider
- Google Gemini provider
- OpenAI provider
- Azure OpenAI provider
- React dashboard
- Streaming support
- AWS Bedrock provider
- Redis caching
- PostgreSQL for metadata
- Prometheus metrics
- Advanced routing strategies
- Load balancing
- A/B testing support
- Cost tracking
- Semantic caching
- Request queuing
- WebSocket support
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0) - see the LICENSE file for details.
- Built with Axum web framework
- UI components from Tailwind CSS
For issues and questions:
- Email: support@inferxgate.com
- Open an issue on GitHub
Note: This is an active development project. APIs may change between versions.
