
🚀 Designing scalable distributed systems & production-grade AI platforms

Twitter LinkedIn Portfolio



Profile Views GitHub Followers GitHub Stars


🚀 About Me

I'm a Senior AI Engineer & Backend Architect specializing in building high-scale distributed systems and production-grade GenAI platforms. My expertise lies in bridging the gap between cutting-edge AI research and robust, scalable backend engineering.

🔭 What I'm Currently Focused On

  • 🏗️ Designing cloud-native, event-driven backends for large-scale LLM and generative AI workloads.
  • ⚡ Building low-latency, fault-tolerant distributed systems with smart batching, async I/O, and backpressure-aware routing (a minimal sketch follows this list).
  • 📦 Orchestrating GPU/CPU workloads on Kubernetes with autoscaling, bin-packing, and workload-aware scheduling.
  • 🤖 Developing agentic AI backends for multi-agent orchestration, tool use, and long-running workflows with reliable state.
  • 🧠 Implementing LLM serving infrastructure (streaming APIs, KV cache reuse, vLLM/TensorRT-LLM, quantization) for high throughput.
  • 🎯 Applying system design patterns for AI to production inference stacks.
  • 🔍 Building end-to-end observability for latency, error budgets, drift, and GPU utilization (metrics, tracing, structured logs, SLOs).
  • 💰 Engineering cost-efficient GPU infrastructure with autoscaling, right-sizing, spot capacity, and usage-based metering.
  • 🔐 Hardening AI systems for security and abuse (authn/z, rate limits, prompt injection defenses, secure data paths).
  • 🚀 Automating CI/CD and infrastructure-as-code for AI services using containers, GitOps, and Terraform-style workflows.
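
To make the batching and backpressure ideas above concrete, here is a minimal sketch in pure asyncio. It is illustrative only: `run_model_batch` is a hypothetical stand-in for a real model endpoint, and the limits are arbitrary.

```python
import asyncio

MAX_BATCH = 8        # flush once this many requests are queued
MAX_WAIT_MS = 20     # ...or after this much time, whichever comes first
QUEUE_LIMIT = 256    # a bounded queue gives natural backpressure

async def run_model_batch(prompts: list[str]) -> list[str]:
    # Placeholder for a real batched model call (e.g. a vLLM endpoint).
    await asyncio.sleep(0.01)
    return [f"echo: {p}" for p in prompts]

async def submit(queue: asyncio.Queue, prompt: str) -> str:
    # put() blocks when the queue is full, pushing backpressure onto callers.
    fut = asyncio.get_running_loop().create_future()
    await queue.put((prompt, fut))
    return await fut

async def batcher(queue: asyncio.Queue) -> None:
    loop = asyncio.get_running_loop()
    while True:
        batch = [await queue.get()]
        deadline = loop.time() + MAX_WAIT_MS / 1000
        while len(batch) < MAX_BATCH and (timeout := deadline - loop.time()) > 0:
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        outputs = await run_model_batch([p for p, _ in batch])
        for (_, fut), out in zip(batch, outputs):
            fut.set_result(out)

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue(maxsize=QUEUE_LIMIT)
    worker = asyncio.create_task(batcher(queue))
    results = await asyncio.gather(*(submit(queue, f"req-{i}") for i in range(20)))
    print(results[:3])
    worker.cancel()

if __name__ == "__main__":
    asyncio.run(main())
```

The bounded queue is the backpressure mechanism: when it fills, producers wait instead of overloading the model, and the batch-size/time thresholds trade per-request latency for GPU utilization.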

💡 My Philosophy

I believe in building systems that scale, sharing knowledge openly, and treating infrastructure as code. Every system I design is built for resilience, observability, and performance.

📫 Want to collaborate?

Open to: System Design discussions • Backend architecture • Open source collaboration • Speaking opportunities • Code reviews


💡 Core Competencies

🏗️ Distributed Systems & Backend Architecture

System Design Scalability High Perf Resilience

🤖 LLMs & GenAI Expertise

RAG Fine-tuning Agents VectorDB


🛠️ Technical Expertise

🧠 AI & GenAI Stack


Frameworks: Transformers • LlamaIndex • LangGraph
Inference: vLLM • TGI • ONNX • Triton • TensorRT
Training: PEFT • DeepSpeed • FSDP • bitsandbytes
Models: GPT-4 • Claude 3 • Llama 3 • Mistral

⚙️ Backend & Distributed Systems


Core: Microservices • Event-Driven • CQRS • DDD
Messaging: NATS • Redis Streams • RabbitMQ
Protocol: gRPC • GraphQL • REST • WebSockets
Observability: OpenTelemetry • Prometheus • Jaeger

☁️ Cloud & MLOps


Serving: KServe • Ray • vLLM Operator • TorchServe
GPU Ops: NVIDIA Operator • DCGM • MIG • MPS
GitOps: ArgoCD • Flux • Helm • GitHub Actions
Clouds: AWS • GCP • Azure • Lambda Labs

🗄️ Data Engineering


Vector DBs: Pinecone • Milvus • Chroma • pgvector
Databases: MongoDB • Elasticsearch • DynamoDB
Processing: Spark • Airflow • dbt • Kafka Streams
Storage: S3 • MinIO • Delta Lake • Iceberg

📊 GitHub Statistics & Activity

Activity Graph

🎯 Featured Projects


Production-grade RAG platform with advanced chunking, hybrid search, and multi-LLM support.

🎯 Key Features:

  • ⚡ Hybrid retrieval with cross-encoder reranking
  • 🔀 Multi-LLM router (OpenAI, Anthropic, local)
  • 📊 Comprehensive evaluation (RAGAS)
  • 💰 Cost optimization & semantic caching

🛠️ Tech: Python • FastAPI • LangChain • Weaviate • vLLM
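
For a flavour of the hybrid-retrieval idea above, here is a minimal, self-contained sketch: keyword and vector result lists fused with reciprocal rank fusion, then reranked. The reranker is a toy term-overlap stand-in for a real cross-encoder, and the documents are made up.

```python
from collections import defaultdict

def rrf_fuse(keyword_ranked: list[str], vector_ranked: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked lists of document ids with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in (keyword_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def rerank(query: str, docs: list[str], top_n: int = 2) -> list[str]:
    # Toy scorer: term overlap. A real system would score (query, doc) pairs
    # with a cross-encoder model instead.
    q_terms = set(query.lower().split())
    def score(doc: str) -> float:
        return len(q_terms & set(doc.lower().split())) / (len(q_terms) or 1)
    return sorted(docs, key=score, reverse=True)[:top_n]

if __name__ == "__main__":
    corpus = {
        "doc1": "hybrid search combines keyword and vector retrieval",
        "doc3": "cross-encoder models rerank candidate passages",
        "doc5": "semantic caching reduces repeated LLM calls",
        "doc7": "chunking strategy affects retrieval recall",
    }
    keyword_hits = ["doc3", "doc1", "doc7"]   # e.g. from BM25
    vector_hits = ["doc1", "doc5", "doc3"]    # e.g. from a vector database
    fused = rrf_fuse(keyword_hits, vector_hits)
    print(rerank("how does hybrid search rerank passages", [corpus[d] for d in fused]))
```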


Multi-agent system with agentic AI patterns, tool use, planning & orchestration.

🎯 Key Features:

  • 🧠 ReAct pattern with self-reflection
  • 🔧 Dynamic tool registry
  • 💾 Multi-tier memory system
  • 🛡️ Sandboxed execution

🛠️ Tech: Python • LangGraph • GPT-4 • Claude • Weaviate
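
The ReAct loop underneath a system like this can be sketched in a few lines. This is a toy version under obvious simplifications: `fake_llm` is a scripted stand-in for a real model, and the registry holds a single calculator tool.

```python
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {}

def tool(name: str):
    # Register a function in the dynamic tool registry under `name`.
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return register

@tool("calculator")
def calculator(expr: str) -> str:
    return str(eval(expr, {"__builtins__": {}}))  # toy only; sandbox for real use

def fake_llm(history: list[str]) -> str:
    # Scripted stand-in: a real LLM would generate the Thought/Action/Final text.
    if not any(line.startswith("Observation:") for line in history):
        return "Action: calculator[(17 + 25) * 2]"
    return "Final: the answer is 84"

def react_loop(question: str, max_steps: int = 5) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = fake_llm(history)
        history.append(step)
        if step.startswith("Final:"):
            return step.removeprefix("Final:").strip()
        if step.startswith("Action:"):
            name, _, arg = step.removeprefix("Action:").strip().partition("[")
            history.append(f"Observation: {TOOLS[name.strip()](arg.rstrip(']'))}")
    return "stopped: step budget exhausted"

if __name__ == "__main__":
    print(react_loop("What is (17 + 25) * 2?"))
```

The loop alternates model turns with tool observations until the model emits a final answer or the step budget runs out; reflection and multi-tier memory layer on top of this skeleton.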


End-to-end platform for fine-tuning LLMs with experiment tracking & deployment.

🎯 Key Features:

  • 🔧 LoRA/QLoRA fine-tuning
  • 📊 Comprehensive evaluation
  • 📈 MLflow + W&B tracking
  • 🚀 Auto-deployment to vLLM

🛠️ Tech: PyTorch • Transformers • PEFT • MLflow • vLLM
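
At its core, the LoRA/QLoRA path wraps a base model with an adapter config. Here is a minimal sketch with Hugging Face PEFT; the model name, rank, and target modules are illustrative placeholders, not the platform's actual defaults (assumes `transformers` and `peft` are installed).

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = "sshleifer/tiny-gpt2"   # tiny GPT-2 so the example stays cheap to run
model = AutoModelForCausalLM.from_pretrained(base_model)

lora_config = LoraConfig(
    r=16,                        # rank of the low-rank adapter matrices
    lora_alpha=32,               # scaling factor applied to the adapter output
    lora_dropout=0.05,
    target_modules=["c_attn"],   # attention projection in GPT-2-style models
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # only the adapter weights are trainable
# From here, a standard transformers Trainer (or a custom loop) trains just the adapters.
```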


High-performance Go backend for LLM routing, caching & observability.

🎯 Key Features:

  • 🔀 Multi-provider routing
  • ⚡ Semantic caching
  • 🎫 Token-based rate limiting
  • 📊 Sub-10ms p99 latency

🛠️ Tech: Go • gRPC • Redis • PostgreSQL • OpenTelemetry
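
The gateway itself is Go, but the semantic-caching idea is easy to illustrate in Python: embed the incoming prompt, compare it against cached prompts by cosine similarity, and serve the cached response above a threshold. The character-frequency "embedding" below is a deliberately crude stand-in for a real embedding model.

```python
import math

def embed(text: str) -> list[float]:
    # Toy embedding: normalized character-frequency vector. A real gateway
    # would call an embedding model and keep vectors in Redis or a vector index.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

class SemanticCache:
    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []

    def get(self, prompt: str) -> str | None:
        query = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]          # cache hit: skip the upstream LLM call
        return None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

if __name__ == "__main__":
    cache = SemanticCache()
    cache.put("What is the capital of France?", "Paris")
    print(cache.get("what is the capital of france"))  # near-duplicate -> hit
    print(cache.get("Explain vector databases"))       # unrelated -> miss (None)
```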


Production K8s infrastructure optimized for GPU workloads & model serving.

🎯 Key Features:

  • 🎮 GPU node pools (T4, A10G, A100)
  • 🚀 KServe + vLLM operator
  • 📊 DCGM monitoring
  • 🔄 GitOps with ArgoCD

🛠️ Tech: Terraform • Kubernetes • Helm • ArgoCD • KServe

📚 More Projects

🔗 Explore all my repositories:

GitHub Repos

⭐ Additional Projects:

  • Vector DB Benchmarks
  • LLM Evaluation Framework
  • Prompt Engineering Library
  • AI Cost Optimizer

📝 Latest Blog Posts


⚡ Recent GitHub Activity


📈 Learning Journey & Current Focus

🎯 Current Focus Areas

Building Optimizing Exploring Researching


🔬 Research Implementation

Graph RAG HyDE Reflexion Constitutional AI

📚 Continuous Learning

  • 🔬 Experimenting with cutting-edge techniques (Graph RAG, Corrective RAG, Constitutional AI)
  • 📝 Writing about lessons learned building GenAI systems at scale
  • 🛠️ Contributing to open-source AI projects (vLLM, LangChain, Transformers)
  • 🎓 Implementing recent AI research papers (RAPTOR, HyDE, Reflexion)
  • 💬 Sharing insights on prompt engineering, RAG optimization, and LLMOps

📫 Let's Connect!

💬 I'm always open to interesting conversations and collaborations!

Collaborations Discussions Community


🌐 Find Me On

Twitter LinkedIn Portfolio



💝 Support My Work

Star repositories you find useful • 📢 Share projects with your network • 🤝 Contribute to open-source • 💬 Connect for collaborations


"Building the future of AI, one production system at a time"

Made with Love Built for Production

⭐️ From sanketny8 with 💜

📌 Pinned

  • llm-engineering-fundamentals (Python): Production-ready transformer implementation from scratch: 10 complete projects covering tokenization, positional embeddings, attention, transformers, and more. 148 passing tests, modern techniques …