shyeh25

2 followers · 0 following

Popular repositories

TensorRT-LLM Public

Forked from NVIDIA/TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++
vllm Public

Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 1
flashinfer Public

Forked from flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda
sglang Public

Forked from sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.

Python
srt-slurm Public

Forked from NVIDIA/srt-slurm

NVIDIA Inference Benchmarks provide recipes in ready-to-use templates for evaluating platform speed. Validate your platform across specific AI use cases across hardware and software combinations.

Python