InferXgate - Blazing fast LLM Gateway for 100+ LLMs

A high-performance LLM gateway built in Rust, with a React dashboard. It exposes a single OpenAI-compatible API for multiple LLM providers.

Features

Currently Supported Providers

✅ Anthropic (Claude Opus 4.5, Sonnet 4.5, Haiku 4.5, 4.1, 4.0, 3.5, 3)
✅ Google Gemini (Gemini 3 Pro, 2.5 Pro, 2.5 Flash, 2.0 Flash)
✅ OpenAI (GPT-5, GPT-4.1, GPT-4 Turbo)
✅ Azure OpenAI (All Azure-deployed OpenAI models)

Coming Soon

🚧 AWS Bedrock
🚧 Google VertexAI
🚧 Cohere
🚧 HuggingFace
🚧 Replicate
🚧 Groq

Core Features

🚀 OpenAI-Compatible API: Drop-in replacement for OpenAI API
⚡ High Performance: Built with Rust and Tokio for maximum throughput
🔄 Streaming Support: Server-Sent Events for real-time responses
🎯 Smart Routing: Automatic model-to-provider mapping
📊 Analytics Dashboard: Real-time metrics and monitoring
🔒 Security: API key management, rate limiting, IP whitelisting
💾 Caching: Response caching with Redis (optional)
🔁 Retry Logic: Automatic retries with exponential backoff
📝 Comprehensive Logging: Structured logging with tracing

Quick Start

Quick Production Setup (Recommended)

The fastest way to get started is to use the pre-built Docker images:

# Clone the repository
git clone https://github.com/jasmedia/InferXgate.git
cd InferXgate

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys (at minimum, set one LLM provider key)

# Run the quickstart script and select option 4
./quickstart.sh
# Choose: 4) Run with Docker (production - uses pre-built images)

Or manually:

docker-compose -f docker-compose.prod.yml up -d

Services will be available at:

Frontend: http://localhost
Backend API: http://localhost:3000

Development Setup

For development, you'll need:

Rust 1.75+ (install from rustup.rs)
Bun 1.0+ (install from bun.sh)
Docker (for PostgreSQL and Redis)
API keys for providers you want to use

Backend Setup (Development)

Clone the repository:

git clone https://github.com/jasmedia/InferXgate.git
cd InferXgate

Start PostgreSQL and Redis with Docker:

docker run -d --name inferxgate-postgres \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=inferxgate \
  -p 5432:5432 \
  postgres:18-alpine

docker run -d --name inferxgate-redis \
  -p 6379:6379 \
  redis:7-alpine

Set up environment variables:

cd backend
cp .env.example .env
# Edit .env with your API keys
# Ensure DATABASE_URL=postgresql://postgres:postgres@localhost/inferxgate
# Ensure REDIS_URL=redis://localhost:6379

Build and run the Rust backend:

cargo build --release
cargo run --release

The backend will start on http://localhost:3000

Frontend Setup

Install dependencies:

cd frontend
bun install

Start the development server:

bun run dev

The frontend will start on http://localhost:5173

Configuration

Environment Variables

Create a .env file in the backend directory:

# Server Configuration
HOST=0.0.0.0
PORT=3000
LOG_LEVEL=info

# Provider API Keys
ANTHROPIC_API_KEY=your-anthropic-api-key
GEMINI_API_KEY=your-gemini-api-key
OPENAI_API_KEY=your-openai-api-key

# Azure OpenAI Configuration
AZURE_OPENAI_API_KEY=your-azure-openai-api-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_RESOURCE_NAME=your-resource-name

# Cohere (provider support coming soon)
COHERE_API_KEY=your-cohere-api-key

# AWS Configuration (for Bedrock, provider support coming soon)
AWS_ACCESS_KEY_ID=your-aws-access-key
AWS_SECRET_ACCESS_KEY=your-aws-secret-key
AWS_REGION=us-east-1

# Optional: Database for metadata
DATABASE_URL=postgresql://user:password@localhost/inferxgate

# Optional: Redis for caching
REDIS_URL=redis://localhost:6379
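
Because at least one provider key must be set, it can help to check which providers are actually configured before starting the backend. The following is a minimal sketch, not part of InferXgate: the helper `configured_providers` and the `PROVIDER_KEYS` mapping are illustrative, the variable names come from the example above, and it inspects a dictionary of environment values rather than parsing the .env file itself.

```python
import os

# Variable names are taken from the example above; the mapping is illustrative.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "azure": "AZURE_OPENAI_API_KEY",
}

def configured_providers(env=None):
    """Return providers whose key is set to a non-placeholder value."""
    env = os.environ if env is None else env
    found = []
    for provider, var in PROVIDER_KEYS.items():
        value = env.get(var, "").strip()
        # Values starting with "your-" are the placeholders shown above.
        if value and not value.startswith("your-"):
            found.append(provider)
    return found

sample = {
    "ANTHROPIC_API_KEY": "sk-ant-example",
    "OPENAI_API_KEY": "your-openai-api-key",  # still a placeholder
}
print(configured_providers(sample))
```

Called without arguments, the helper reads the current process environment, so it can run in the same shell session that will start the backend.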

API Usage

The gateway provides an OpenAI-compatible API. You can use it with any OpenAI client library.

Using with OpenAI Python SDK

from openai import OpenAI

# Point to your gateway instead of OpenAI
client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="your-gateway-api-key"  # If auth is enabled
)

# Use any supported model
response = client.chat.completions.create(
    model="claude-opus-4-5-20251101",  # or "gemini-3-pro-preview"
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
    temperature=0.7,
    max_tokens=1000
)

print(response.choices[0].message.content)

Using with curl

curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-5-20251101",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "max_tokens": 1000
  }'

Streaming Responses

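# Reuses the `client` created in the SDK example above
stream = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in stream:
    # Some chunks may carry no choices or an empty delta; skip those
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)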
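
For clients that do not use the OpenAI SDK, the stream can be read as raw Server-Sent Events. The sketch below assumes the gateway emits the same wire format as OpenAI's streaming API (`data: {json}` lines terminated by `data: [DONE]`); that is an assumption based on the OpenAI-compatibility claim, and `iter_sse_content` is a hypothetical helper name.

```python
import json

def iter_sse_content(lines):
    """Yield text deltas from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and keep-alive/comment lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text

# Hand-written sample lines in the assumed format.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Once"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": " upon"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))
```

The function accepts any iterable of decoded lines, so it can be fed from an HTTP response body read line by line.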

API Endpoints

Core Endpoints

Endpoint	Method	Description
`/v1/chat/completions`	POST	Create chat completion
`/v1/models`	GET	List available models
`/health`	GET	Health check

Request Format

{
  "model": "claude-opus-4-5-20251101",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "top_p": 0.9,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "stop": ["\n\n"],
  "stream": false,
  "n": 1,
  "user": "user-123"
}
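
A request in this format can be assembled without any SDK. The sketch below builds, but does not send, the HTTP request using only the standard library. The helper `build_chat_request` is hypothetical, and the `Authorization: Bearer` header is what OpenAI clients send; whether the gateway expects it depends on its auth configuration.

```python
import json
import urllib.request

def build_chat_request(base_url, model, messages, api_key=None, **params):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    body = {"model": model, "messages": messages}
    # Only include optional fields that were actually provided.
    body.update({k: v for k, v in params.items() if v is not None})
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumed auth scheme: the Bearer token used by OpenAI clients.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )

req = build_chat_request(
    "http://localhost:3000",
    "claude-opus-4-5-20251101",
    [{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=1000,
    top_p=None,  # omitted from the body because it is None
)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
# To send it against a running gateway: urllib.request.urlopen(req, timeout=60)
```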

Response Format

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699000000,
  "model": "claude-opus-4-5-20251101",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 15,
    "total_tokens": 25
  }
}
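
To show how the fields above are typically consumed, here is a small sketch that pulls the reply text, finish reason, and token total out of a response body. The helper name `summarize_completion` is illustrative, and the sample is a shortened version of the response shown above.

```python
import json

def summarize_completion(payload):
    """Return (text, finish_reason, total_tokens) from a chat completion body."""
    choices = payload.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    first = choices[0]
    usage = payload.get("usage") or {}
    return (
        first["message"]["content"],
        first.get("finish_reason"),
        usage.get("total_tokens"),
    )

raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "claude-opus-4-5-20251101",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello!"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 15, "total_tokens": 25}
}
"""
text, reason, total = summarize_completion(json.loads(raw))
print(text, reason, total)
```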

Development

Project Structure

inferxgate/
├── backend/
│   ├── src/
│   │   ├── main.rs           # Main server entry point
│   │   ├── config.rs         # Configuration management
│   │   ├── error.rs          # Error handling
│   │   └── providers/        # Provider implementations
│   │       ├── mod.rs        # Provider trait
│   │       ├── anthropic.rs  # Anthropic provider
│   │       ├── gemini.rs     # Gemini provider
│   │       ├── openai.rs     # OpenAI provider
│   │       └── azure.rs      # Azure OpenAI provider
│   └── Cargo.toml
├── frontend/
│   ├── src/
│   │   ├── App.tsx           # Main React app
│   │   ├── components/       # React components
│   │   └── pages/            # Page components
│   ├── package.json
│   └── vite.config.ts
└── docker-compose.yml

Docker Deployment

Production Deployment (Recommended)

Use pre-built images from Docker Hub:

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys and secrets

# Start with production compose file
docker-compose -f docker-compose.prod.yml up -d

This uses:

inferxgate/backend:latest
inferxgate/frontend:latest

Development with Docker Compose

Build images locally:

# Copy the example env file
cp .env.example .env

# Edit .env with your API keys
# At minimum, set one LLM provider API key (e.g., ANTHROPIC_API_KEY)

# Start all services
docker-compose up -d

Building Images Separately

Backend:

cd backend
docker build -t inferxgate-backend .
docker run -p 3000:3000 --env-file .env inferxgate-backend

Frontend:

cd frontend
docker build -t inferxgate-frontend .
docker run -p 80:80 inferxgate-frontend

Available Make Commands

The project includes a comprehensive Makefile for common tasks (all frontend commands use Bun):

make help           # Show all available commands
make setup          # Initial setup (copy .env.example, install deps)
make install        # Install all dependencies (Rust + Bun)
make dev            # Start development servers (backend + frontend)
make build          # Build both backend and frontend for production
make test           # Run all tests (backend + frontend)
make fmt            # Format code (cargo fmt + bun run format)
make lint           # Lint code (cargo clippy + bun run lint)
make docker-build   # Build Docker images
make docker-up      # Start all services with Docker Compose
make docker-down    # Stop Docker services

Performance Optimization

Recommended Settings

Connection Pooling: The gateway uses connection pooling by default
Response Caching: Enable Redis for caching frequent requests
Rate Limiting: Configure appropriate rate limits per provider
Timeouts: Adjust timeouts based on model response times

Benchmarks

Latency overhead: < 5ms
Throughput: 10,000+ requests/second
Memory usage: ~50MB base + cache

Monitoring

The gateway provides several monitoring endpoints:

/metrics - Prometheus metrics
/health - Health check
Dashboard at http://localhost:5173 in development (http://localhost with the Docker setup) - Real-time analytics
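
The `/metrics` output can also be inspected from a script. The sketch below parses simple samples in the Prometheus text exposition format. It is deliberately minimal: it assumes no trailing timestamps and no spaces inside label values, and the metric names in the sample are hypothetical, not InferXgate's actual metrics.

```python
def parse_metrics(text):
    """Parse simple Prometheus text-format samples into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        if not series:
            continue  # no "name value" pair on this line
        try:
            samples[series] = float(value)
        except ValueError:
            continue  # ignore lines this simple parser cannot read
    return samples

# Hypothetical metric names, for illustration only.
sample = """\
# HELP requests_total Total requests handled.
# TYPE requests_total counter
requests_total{provider="anthropic"} 42
requests_total{provider="openai"} 7
"""
metrics = parse_metrics(sample)
print(metrics['requests_total{provider="anthropic"}'])
```

For anything beyond ad hoc checks, a Prometheus server or an official client library is the more robust way to consume this endpoint.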

Troubleshooting

Common Issues

Rate limit errors: Check provider rate limits and adjust retry settings
Timeout errors: Increase timeout values for larger models
Authentication errors: Verify API keys in .env file
CORS errors: Ensure frontend proxy is configured correctly
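
The gateway retries upstream calls itself, but a client may still see rate limit or timeout errors once those retries are exhausted. The following is a client-side sketch of exponential backoff, not the gateway's own implementation; `backoff_delays` and `call_with_retries` are hypothetical helpers and the default values are arbitrary.

```python
import time

def backoff_delays(retries, base=0.5, cap=30.0):
    """Delays in seconds that double on each attempt, capped at `cap`."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]

def call_with_retries(fn, retries=3, sleep=time.sleep, base=0.5, cap=30.0):
    """Call fn(); on failure wait with exponential backoff and try again."""
    delays = backoff_delays(retries, base=base, cap=cap)
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            # For brevity every exception is retried; in practice, retry
            # only rate-limit and transient network errors.
            if attempt == retries:
                raise
            sleep(delays[attempt])

print(backoff_delays(4))
```

Passing a custom `sleep` makes the helper easy to test, and adding random jitter to each delay is a common refinement when many clients retry at once.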

Debug Mode

Enable debug logging:

LOG_LEVEL=debug cargo run

Security Best Practices

Use HTTPS in production: Deploy behind a reverse proxy that terminates TLS
Rotate API keys regularly: Set up key rotation policies
Enable authentication: Use the built-in auth middleware
IP whitelisting: Restrict access to known IPs
Rate limiting: Prevent abuse with appropriate limits

Contributing

Contributions are welcome! Please read our Contributing Guide (CONTRIBUTING.md) for details.

This project follows the Contributor Covenant Code of Conduct (CODE_OF_CONDUCT.md).

Development Workflow

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Roadmap

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0) - see the LICENSE file for details.

Acknowledgments

Built with Axum web framework
UI styled with Tailwind CSS

Support

For issues and questions:

Email: support@inferxgate.com
Open an issue on GitHub

Note: This is an active development project. APIs may change between versions.
