vivekgangasani/README.md

# Vivek Gangasani

**Principal GenAI Architect | Technical Leader | Builder**

Santa Clara, CA | LinkedIn | vivekrg13@gmail.com


## About

Technical leader with 10+ years of experience building and deploying AI/ML solutions at scale. Currently leading a team of AI/ML architects at AWS, serving as an embedded technical advisor to strategic customers building production GenAI systems.

152,000+ total readers across 27 technical publications on the official AWS Machine Learning Blog.


## Technical Publications

### 🚀 LLM Serving, Inference Optimization & New Product Launches

| Title | Date |
| --- | --- |
| Managed Tiered KV Cache and Intelligent Routing for Amazon SageMaker HyperPod | 2025 |
| Introducing Bidirectional Streaming for Real-Time Inference on Amazon SageMaker AI | 2025 |
| Enhance Deployment Guardrails with Inference Component Rolling Updates for Amazon SageMaker AI Inference | Mar 25, 2025 |
| Amazon SageMaker HyperPod Launches Model Deployments to Accelerate the Generative AI Model Development Lifecycle | 2025 |
| Supercharge Your LLM Performance with Amazon SageMaker Large Model Inference Container v15 | Apr 22, 2025 |
| Introducing Fast Model Loader in SageMaker Inference: Accelerate Autoscaling for Your LLMs (Part 2) | Dec 2, 2024 |
| Amazon SageMaker Inference Now Supports G6e Instances | Nov 22, 2024 |
| Boost Inference Performance for LLMs with New Amazon SageMaker Containers | Nov 27, 2023 |

### 🤖 Agentic AI & GenAI Architectures

| Title | Date |
| --- | --- |
| Build Agentic Workflows with OpenAI GPT & OSS on Amazon SageMaker AI and Amazon Bedrock AgentCore | 2025 |
| Use Amazon Bedrock Tooling with Amazon SageMaker JumpStart Models | Dec 4, 2024 |

### 📚 RAG & Retrieval Architectures

| Title | Date |
| --- | --- |
| Optimize RAG in Production Environments Using Amazon SageMaker JumpStart and Amazon OpenSearch Service | 2025 |
| Deploy RAG Applications on Amazon SageMaker JumpStart Using FAISS | Dec 5, 2024 |
| RAG Architecture with Voyage AI Embedding Models on Amazon SageMaker JumpStart and Anthropic Claude 3 Models | May 14, 2024 |

### 🧠 Model Deployment, Fine-Tuning & Benchmarking

| Title | Date |
| --- | --- |
| Impel Enhances Automotive Dealership Customer Experience with Fine-Tuned LLMs on Amazon SageMaker | Jun 4, 2025 |
| Deploy DeepSeek R1 Distilled Models on Amazon SageMaker Using a Large Model Inference Container | Mar 11, 2025 |
| DeepSeek R1 Model Now Available in Amazon Bedrock Marketplace and Amazon SageMaker JumpStart | Jan 30, 2025 |
| Efficient and Cost-Effective Multi-Tenant LoRA Serving with Amazon SageMaker | May 21, 2024 |
| Deploy Foundation Models with Amazon SageMaker, Iterate and Monitor with TruEra | Dec 22, 2023 |
| Fine-Tune and Deploy Mistral 7B with Amazon SageMaker JumpStart | Nov 14, 2023 |
| Zero-Shot Prompting for the Flan-T5 Foundation Model in Amazon SageMaker JumpStart | Apr 3, 2023 |
| Identify Key Insights from Text Documents Through Fine-Tuning and HPO with Amazon SageMaker JumpStart | Nov 21, 2022 |

### ⚡ Hardware Optimization & Cost-Efficient Inference

| Title | Date |
| --- | --- |
| Maximize Stable Diffusion Performance and Lower Inference Costs with AWS Inferentia2 | Jul 26, 2023 |
| Host ML Models on Amazon SageMaker Using Triton: CV Model with PyTorch Backend | May 31, 2023 |
| Achieve High Performance with Lowest Cost for Generative AI Inference Using AWS Inferentia2 and AWS Trainium on Amazon SageMaker | May 4, 2023 |

### 🔧 MLOps & Platform Engineering

| Title | Date |
| --- | --- |
| Governing the ML Lifecycle at Scale: Centralized Observability with Amazon SageMaker and Amazon CloudWatch | Oct 29, 2024 |
| Implementing MLOps Practices with Amazon SageMaker JumpStart Pre-Trained Models | Feb 15, 2023 |
| Isima.io Optimizes Price-Performance for OLAP Workloads Using Amazon EBS | Feb 8, 2023 |

## Conference Speaking (19+ Presentations)

| Event | Topic Area | Year |
| --- | --- | --- |
| AWS re:Invent | SageMaker Inference & GenAI | 2023, 2024 |
| NVIDIA GTC | LLM Serving Optimization | 2024 |
| Intel MLCon | ML Inference Performance | 2024 |
| Arize Observe | Model Monitoring & Deployment | 2024 |
| Retrivex | RAG Architectures | 2024 |

## Areas of Expertise

- **LLM Serving & Optimization** — vLLM, KV caching, speculative decoding, disaggregated inference, intelligent routing
- **Agentic AI** — LangChain, multi-agent orchestration, tool use, Bedrock AgentCore
- **RAG Architectures** — OpenSearch, FAISS, Voyage AI embeddings, production retrieval pipelines
- **Infrastructure** — Kubernetes, GPU clusters (H100/H200/A100), Inferentia/Trainium, distributed systems
- **MLOps** — CI/CD for ML, model monitoring, deployment guardrails, auto-scaling
- **Product & Strategy** — Zero-to-one product launches, customer-driven roadmap development, technical GTM

## Contact

📧 vivekrg13@gmail.com

🔗 LinkedIn
