vivekgangasani/README.md

# Vivek Gangasani

**Principal GenAI Architect | Technical Leader | Builder**

Santa Clara, CA | LinkedIn | vivekrg13@gmail.com


## About

Technical leader with 10+ years of experience building and deploying AI/ML solutions at scale. Currently leading a team of AI/ML architects at AWS, serving as an embedded technical advisor to strategic customers building production GenAI systems.

152,000+ total readers across 27 technical publications on the official AWS Machine Learning Blog.


## Technical Publications

### 🚀 LLM Serving, Inference Optimization & New Product Launches

| Title | Date |
| --- | --- |
| Managed Tiered KV Cache and Intelligent Routing for Amazon SageMaker HyperPod | 2025 |
| Introducing Bidirectional Streaming for Real-Time Inference on Amazon SageMaker AI | 2025 |
| Enhance Deployment Guardrails with Inference Component Rolling Updates for Amazon SageMaker AI Inference | Mar 25, 2025 |
| Amazon SageMaker HyperPod Launches Model Deployments to Accelerate the Generative AI Model Development Lifecycle | 2025 |
| Supercharge Your LLM Performance with Amazon SageMaker Large Model Inference Container v15 | Apr 22, 2025 |
| Introducing Fast Model Loader in SageMaker Inference: Accelerate Autoscaling for Your LLMs (Part 2) | Dec 2, 2024 |
| Amazon SageMaker Inference Now Supports G6e Instances | Nov 22, 2024 |
| Boost Inference Performance for LLMs with New Amazon SageMaker Containers | Nov 27, 2023 |

### 🤖 Agentic AI & GenAI Architectures

| Title | Date |
| --- | --- |
| Build Agentic Workflows with OpenAI GPT & OSS on Amazon SageMaker AI and Amazon Bedrock AgentCore | 2025 |
| Use Amazon Bedrock Tooling with Amazon SageMaker JumpStart Models | Dec 4, 2024 |

### 📚 RAG & Retrieval Architectures

| Title | Date |
| --- | --- |
| Optimize RAG in Production Environments Using Amazon SageMaker JumpStart and Amazon OpenSearch Service | 2025 |
| Deploy RAG Applications on Amazon SageMaker JumpStart Using FAISS | Dec 5, 2024 |
| RAG Architecture with Voyage AI Embedding Models on Amazon SageMaker JumpStart and Anthropic Claude 3 Models | May 14, 2024 |

### 🧠 Model Deployment, Fine-Tuning & Benchmarking

| Title | Date |
| --- | --- |
| Impel Enhances Automotive Dealership Customer Experience with Fine-Tuned LLMs on Amazon SageMaker | Jun 4, 2025 |
| Deploy DeepSeek R1 Distilled Models on Amazon SageMaker Using a Large Model Inference Container | Mar 11, 2025 |
| DeepSeek R1 Model Now Available in Amazon Bedrock Marketplace and Amazon SageMaker JumpStart | Jan 30, 2025 |
| Efficient and Cost-Effective Multi-Tenant LoRA Serving with Amazon SageMaker | May 21, 2024 |
| Deploy Foundation Models with Amazon SageMaker, Iterate and Monitor with TruEra | Dec 22, 2023 |
| Fine-Tune and Deploy Mistral 7B with Amazon SageMaker JumpStart | Nov 14, 2023 |
| Zero-Shot Prompting for the Flan-T5 Foundation Model in Amazon SageMaker JumpStart | Apr 3, 2023 |
| Identify Key Insights from Text Documents Through Fine-Tuning and HPO with Amazon SageMaker JumpStart | Nov 21, 2022 |

### ⚡ Hardware Optimization & Cost-Efficient Inference

| Title | Date |
| --- | --- |
| Maximize Stable Diffusion Performance and Lower Inference Costs with AWS Inferentia2 | Jul 26, 2023 |
| Host ML Models on Amazon SageMaker Using Triton: CV Model with PyTorch Backend | May 31, 2023 |
| Achieve High Performance with Lowest Cost for Generative AI Inference Using AWS Inferentia2 and AWS Trainium on Amazon SageMaker | May 4, 2023 |

### 🔧 MLOps & Platform Engineering

| Title | Date |
| --- | --- |
| Governing the ML Lifecycle at Scale: Centralized Observability with Amazon SageMaker and Amazon CloudWatch | Oct 29, 2024 |
| Implementing MLOps Practices with Amazon SageMaker JumpStart Pre-Trained Models | Feb 15, 2023 |
| Isima.io Optimizes Price-Performance for OLAP Workloads Using Amazon EBS | Feb 8, 2023 |

## Conference Speaking (19+ Presentations)

| Event | Topic Area | Year |
| --- | --- | --- |
| AWS re:Invent | SageMaker Inference & GenAI | 2023, 2024 |
| NVIDIA GTC | LLM Serving Optimization | 2024 |
| Intel MLCon | ML Inference Performance | 2024 |
| Arize Observe | Model Monitoring & Deployment | 2024 |
| Retrivex | RAG Architectures | 2024 |

## Areas of Expertise

- **LLM Serving & Optimization** — vLLM, KV caching, speculative decoding, disaggregated inference, intelligent routing
- **Agentic AI** — LangChain, multi-agent orchestration, tool use, Bedrock AgentCore
- **RAG Architectures** — OpenSearch, FAISS, Voyage AI embeddings, production retrieval pipelines
- **Infrastructure** — Kubernetes, GPU clusters (H100/H200/A100), Inferentia/Trainium, distributed systems
- **MLOps** — CI/CD for ML, model monitoring, deployment guardrails, auto-scaling
- **Product & Strategy** — Zero-to-one product launches, customer-driven roadmap development, technical GTM

## Contact

📧 vivekrg13@gmail.com

🔗 LinkedIn
