LLMs, while useful for solving a variety of NLP use cases, are costly and cumbersome to productionise. Organisations deploy LLMs on expensive A100/H100 GPUs, yet extract underwhelming performance in terms of latency and throughput.
Design and develop a novel, generalised model optimisation and inference script for MistralForCausalLM-based LLMs that (see the sketch after this list):
- Accepts a Huggingface model path as input (example: mistralai/Mistral-7B-v0.1)
- Optimises the model for faster inference (including warmups etc.)
- Waits for the user to input a prompt
- Runs the model on the prompt
- Outputs the model response and performance metrics
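As a starting point, here is a minimal sketch of that flow in plain `transformers`. It is not the required optimised solution; the fp16 dtype, warmup count, and generation settings are illustrative assumptions:

```python
# Minimal sketch: load, warm up, prompt, generate, report metrics.
# Assumes `torch`, `transformers`, and `accelerate` are installed and a CUDA GPU is present.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = input("Huggingface model path (e.g. mistralai/Mistral-7B-v0.1): ").strip()
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # T4 lacks bfloat16 support, so fp16 is the usual choice
    device_map="auto",
)
model.eval()

# Warmup: a few short generations so CUDA kernels and caches are initialised
# before any timing is taken.
warm = tokenizer("warmup", return_tensors="pt").to(model.device)
with torch.inference_mode():
    for _ in range(3):
        model.generate(**warm, max_new_tokens=8)

prompt = input("Prompt: ")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=128)
if torch.cuda.is_available():
    torch.cuda.synchronize()
elapsed = time.perf_counter() - start

n_in = inputs["input_ids"].shape[1]
n_out = out.shape[1] - n_in
print(tokenizer.decode(out[0][n_in:], skip_special_tokens=True))
print(f"in={n_in} out={n_out} tokens, {elapsed:.2f}s, "
      f"total throughput={(n_in + n_out) / elapsed:.1f} tok/s")
```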
Develop the script to beat the following benchmark under the set constraints:
Total throughput (input + output tokens per second) = 200 tokens/sec
Here are the other details (a quick feasibility check follows this list):
- Input tokens = 128
- Output tokens = 128
- Concurrency = 32
- GPU = 1 × NVIDIA Tesla T4 (16GB VRAM)
- Model dtype = any dtype supported by said GPU
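To make the target concrete, here is a back-of-the-envelope check. It assumes total throughput is measured as all processed tokens divided by wall-clock time, which the brief does not spell out:

```python
# Feasibility check of the 200 tok/s target under the stated constraints.
# The numbers come straight from the brief; the deadline derivation is an
# assumption about how total throughput is measured.
IN_TOKENS, OUT_TOKENS = 128, 128
CONCURRENCY = 32
TARGET_TPS = 200

tokens_per_wave = CONCURRENCY * (IN_TOKENS + OUT_TOKENS)  # 32 * 256 = 8192 tokens
max_wave_seconds = tokens_per_wave / TARGET_TPS           # 8192 / 200 = 40.96 s
per_request_tps = TARGET_TPS / CONCURRENCY                # 6.25 tok/s per request

print(f"A wave of {CONCURRENCY} requests ({tokens_per_wave} tokens) "
      f"must finish within {max_wave_seconds:.1f}s "
      f"(~{per_request_tps:.2f} tok/s per request).")
```

In other words, with continuous or static batching across 32 concurrent requests, each batch of 8192 tokens must complete in under ~41 seconds to beat the benchmark.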
Make the script compatible with LoRA models (a LoRA-loading sketch follows the platform list). Ensure the script runs on the following platforms:
- Google Colab
- Kaggle
- Amazon SageMaker Studio Lab
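One possible way to add LoRA support, continuing from the earlier sketch's `model` and assuming the `peft` package. The adapter prompt and the choice to merge the weights are illustrative, not prescribed by the brief:

```python
# Optional LoRA support: if the user supplies an adapter path, attach it
# to the already-loaded base model. Assumes `peft` is installed.
from peft import PeftModel

adapter_path = input("LoRA adapter path (blank to skip): ").strip()
if adapter_path:
    model = PeftModel.from_pretrained(model, adapter_path)
    # Merge the LoRA weights into the base weights so generation pays
    # no per-step adapter overhead on the T4.
    model = model.merge_and_unload()
```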
For any clarifications, contact devansh@simplismart.ai or daksh.goel@simplismart.tech