
Model templates

A format for defining AI models and their suggested engine options, particularly for vLLM and SGLang.

These templates are also used by the Verda command-line client (TODO: add link when available), which makes it easy to run a template.

Supported Engines

  • sglang - SGLang inference engine
  • vllm - vLLM inference engine
  • custom - Any custom inference engine

Configuration Format

Each template is structured JSON, with common fields at the top level and engine-specific options nested beneath them. The configuration format is described in more detail in a separate spec file.

The JSON-to-CLI parameter mappings are defined in the mappings/ directory.
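As a rough illustration of how a template and a mapping might fit together, the sketch below turns the nested engine options into command-line arguments. The field names (`engine`, `engine_options`), the template contents, and the mapping table are assumptions made for this example; they are not taken from the spec file or the mappings/ directory, so check those for the real schema.

```python
import json
import shlex

# Hypothetical template: the field names are illustrative, not the project's schema.
TEMPLATE_JSON = """
{
  "name": "example-model",
  "model": "example-org/example-model",
  "engine": "vllm",
  "engine_options": {
    "vllm": {
      "tensor_parallel_size": 2,
      "max_model_len": 8192,
      "enable_prefix_caching": true
    }
  }
}
"""

# Hypothetical JSON-to-CLI mapping for one engine (assumed layout).
VLLM_MAPPING = {
    "tensor_parallel_size": "--tensor-parallel-size",
    "max_model_len": "--max-model-len",
    "enable_prefix_caching": "--enable-prefix-caching",
}


def build_args(template, mappings):
    """Translate the selected engine's options into a list of CLI arguments."""
    engine = template["engine"]
    options = template.get("engine_options", {}).get(engine, {})
    mapping = mappings[engine]
    args = []
    for key, value in options.items():
        flag = mapping[key]  # raises KeyError if an option has no mapping
        if isinstance(value, bool):
            # Booleans become bare switches and are omitted when false.
            if value:
                args.append(flag)
        else:
            args.extend([flag, str(value)])
    return args


template = json.loads(TEMPLATE_JSON)
args = build_args(template, {"vllm": VLLM_MAPPING})
print(shlex.join(["vllm", "serve", template["model"], *args]))
```

Failing loudly on an unmapped option is a deliberate choice in this sketch: a silently dropped engine option is harder to debug than an early error.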

⚠️ Disclaimer

The templates in the templates/ directory have not been fully verified. While they are based on recommended configurations from various sources, they may require adjustments for your specific hardware, software versions, or use cases. Please test and debug serverless deployments thoroughly.

Take a look at DEV_WORKFLOW.md for a way to test templates on a GPU instance before deploying to a Serverless Container.

Acknowledgements

This project was inspired by and partially uses model data from:

Resources

LLM

Verda