I'm a Senior AI Engineer & Backend Architect specializing in building high-scale distributed systems and production-grade GenAI platforms. My expertise lies in bridging the gap between cutting-edge AI research and robust, scalable backend engineering.
- 🏗️ Designing cloud-native, event-driven backends for large-scale LLM and generative AI workloads.
- ⚡ Building low-latency, fault-tolerant distributed systems with smart batching, async I/O, and backpressure-aware routing.
- 📦 Orchestrating GPU/CPU workloads on Kubernetes with autoscaling, bin-packing, and workload-aware scheduling.
- 🤖 Developing agentic AI backends for multi-agent orchestration, tool use, and long-running workflows with reliable state.
- 🧠 Implementing LLM serving infrastructure (streaming APIs, KV cache reuse, vLLM/TensorRT-LLM, quantization) for high throughput.
- 🎯 Applying classic system design patterns — caching, load shedding, circuit breaking — to production AI inference stacks.
- 🔍 Building end-to-end observability for latency, error budgets, drift, and GPU utilization (metrics, tracing, structured logs, SLOs).
- 💰 Engineering cost-efficient GPU infrastructure with autoscaling, right-sizing, spot capacity, and usage-based metering.
- 🔐 Hardening AI systems against abuse (authn/z, rate limits, prompt injection defenses, secure data paths).
- 🚀 Automating CI/CD and infrastructure-as-code for AI services using containers, GitOps, and Terraform-style workflows.
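The smart-batching and backpressure ideas above can be sketched in a few lines of asyncio. This is a minimal illustration, not production code — `Batcher`, `MAX_QUEUE`, `MAX_BATCH`, and `run_model` are all hypothetical names standing in for a real serving stack:

```python
import asyncio

MAX_QUEUE = 64   # reject new work beyond this depth (backpressure)
MAX_BATCH = 8    # upper bound on requests fused into one model call

class Batcher:
    """Queue incoming prompts and serve them in micro-batches."""

    def __init__(self):
        self.queue = asyncio.Queue(maxsize=MAX_QUEUE)

    async def submit(self, prompt: str) -> str:
        # Each caller gets a future resolved when its batch completes.
        fut = asyncio.get_running_loop().create_future()
        try:
            self.queue.put_nowait((prompt, fut))
        except asyncio.QueueFull:
            # Shed load explicitly instead of letting latency explode.
            raise RuntimeError("overloaded: shed load upstream")
        return await fut

    async def worker(self, run_model):
        while True:
            # Block for the first request, then opportunistically drain
            # whatever else is already queued, up to MAX_BATCH.
            batch = [await self.queue.get()]
            while len(batch) < MAX_BATCH and not self.queue.empty():
                batch.append(self.queue.get_nowait())
            outputs = await run_model([p for p, _ in batch])
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)
```

The bounded queue is what makes this backpressure-aware: when the service is saturated, callers fail fast and upstream routers can retry elsewhere, rather than queuing unboundedly in front of the GPU.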
I believe in building systems that scale, sharing knowledge openly, and treating infrastructure as code. Every system I design is built for resilience, observability, and performance.
📫 Want to collaborate?
Open to: System Design discussions • Backend architecture • Open source collaboration • Speaking opportunities • Code reviews
- **Frameworks:** Transformers • LlamaIndex • LangGraph
- **Inference:** vLLM • TGI • ONNX • Triton • TensorRT
- **Training:** PEFT • DeepSpeed • FSDP • bitsandbytes
- **Models:** GPT-4 • Claude 3 • Llama 3 • Mistral
- **Core:** Microservices • Event-Driven • CQRS • DDD
- **Messaging:** NATS • Redis Streams • RabbitMQ
- **Protocols:** gRPC • GraphQL • REST • WebSockets
- **Observability:** OpenTelemetry • Prometheus • Jaeger
- **Serving:** KServe • Ray • vLLM Operator • TorchServe
- **GPU Ops:** NVIDIA Operator • DCGM • MIG • MPS
- **GitOps:** ArgoCD • Flux • Helm • GitHub Actions
- **Clouds:** AWS • GCP • Azure • Lambda Labs
- **Vector DBs:** Pinecone • Milvus • Chroma • pgvector
- **Databases:** MongoDB • Elasticsearch • DynamoDB
- **Processing:** Spark • Airflow • dbt • Kafka Streams
- **Storage:** S3 • MinIO • Delta Lake • Iceberg
- Production-grade RAG platform with advanced chunking, hybrid search, and multi-LLM support.
  🛠️ Tech: Python • FastAPI • LangChain • Weaviate • vLLM
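The hybrid-search idea behind a platform like this can be sketched with reciprocal rank fusion (RRF), which merges a keyword ranking and a vector ranking without having to normalize their incompatible score scales. The document IDs below are made up, and `k=60` is the conventional default from the original RRF paper:

```python
def rrf(keyword_ranked, vector_ranked, k=60):
    """Fuse two ranked lists of document IDs (e.g. BM25 hits and
    vector-similarity hits) into one ranking by summing 1/(k + rank).
    Documents ranked well in both lists float to the top."""
    scores = {}
    for ranking in (keyword_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Because only ranks matter, RRF sidesteps the question of how a BM25 score relates numerically to a cosine similarity — one reason it is a popular default fusion strategy in hybrid-search engines.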
- Multi-agent system built on agentic AI patterns: tool use, planning, and orchestration.
  🛠️ Tech: Python • LangGraph • GPT-4 • Claude • Weaviate
- End-to-end platform for fine-tuning LLMs with experiment tracking and deployment.
  🛠️ Tech: PyTorch • Transformers • PEFT • MLflow • vLLM
- High-performance Go backend for LLM routing, caching, and observability.
  🛠️ Tech: Go • gRPC • Redis • PostgreSQL • OpenTelemetry
- Production Kubernetes infrastructure optimized for GPU workloads and model serving.
  🛠️ Tech: Terraform • Kubernetes • Helm • ArgoCD • KServe
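The GPU bin-packing mentioned among the skills above can be illustrated with a first-fit-decreasing sketch. This is only the heuristic, not a real scheduler — node capacities and request sizes are made-up numbers, and actual placement would go through the Kubernetes scheduler and device plugins:

```python
def pack(requests, node_capacity):
    """First-fit decreasing bin-packing: place GPU requests onto nodes,
    largest first, so big jobs are not starved by fragmentation.
    Assumes every request fits on a single node."""
    nodes = []       # remaining GPU capacity per provisioned node
    placement = {}   # job name -> node index
    for job, need in sorted(requests.items(), key=lambda kv: -kv[1]):
        for i, free in enumerate(nodes):
            if free >= need:
                nodes[i] -= need
                placement[job] = i
                break
        else:
            # No existing node has room: provision a new one.
            nodes.append(node_capacity - need)
            placement[job] = len(nodes) - 1
    return placement, len(nodes)
```

Placing the largest requests first is what keeps fragmentation down: small jobs fill the leftover capacity instead of scattering across nodes and blocking later multi-GPU jobs.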
- 🔬 Experimenting with cutting-edge techniques (Graph RAG, Corrective RAG, Constitutional AI)
- 📝 Writing about lessons learned building GenAI systems at scale
- 🛠️ Contributing to open-source AI projects (vLLM, LangChain, Transformers)
- 🎓 Implementing recent AI research papers (RAPTOR, HyDE, Reflexion)
- 💬 Sharing insights on prompt engineering, RAG optimization, and LLMOps
⭐ Star repositories you find useful • 📢 Share projects with your network • 🤝 Contribute to open-source • 💬 Connect for collaborations