Add vLLM for fun and profit by zten · Pull Request #1 · g4rg/tgi-kai-bridge

zten · 2023-10-04T02:50:24Z

I could use some feedback on what I did.

I should probably clean up the configuration because that's a bit wacky.

I did adapt the text-generation-inference stuff to use aiohttp instead so I could keep the async interfaces as similar as possible. Yes, I did at least test it, because I needed something to experiment with while getting the vLLM piece working.

* preserved text-generation-inference support by adding an interface
* switched api methods to async
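As a rough illustration of the two bullets above, here is a minimal sketch of what a shared async backend interface could look like. The names `InferenceBackend`, `EchoBackend`, and `handle_request` are hypothetical and are not taken from this PR's diff; `EchoBackend` is a stand-in so the example runs without a TGI or vLLM server.

```python
import asyncio
from abc import ABC, abstractmethod


class InferenceBackend(ABC):
    """Hypothetical common interface; the PR's real class names may differ."""

    @abstractmethod
    async def generate(self, prompt: str, max_new_tokens: int) -> str:
        """Return generated text for the prompt."""


class EchoBackend(InferenceBackend):
    """Stand-in backend (an assumption for this sketch, not a real engine)."""

    async def generate(self, prompt: str, max_new_tokens: int) -> str:
        # Yield to the event loop, as a real network or engine call would.
        await asyncio.sleep(0)
        # Pretend each whitespace-separated word is one token.
        return " ".join(prompt.split()[:max_new_tokens])


async def handle_request(backend: InferenceBackend, prompt: str) -> str:
    # The API layer only depends on the interface, not on a concrete backend.
    return await backend.generate(prompt, max_new_tokens=3)


if __name__ == "__main__":
    print(asyncio.run(handle_request(EchoBackend(), "one two three four")))
```

With this shape, a TGI client built on aiohttp and a vLLM engine wrapper would each implement `generate`, and the API methods would stay the same for both.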

g4rg · 2023-10-04T08:54:27Z

Thank you for your contribution (and for hosting!)

The refactor to allow more backends is great.
My concern is that this implementation forces users who just want a lightweight TGI bridge to fully install vLLM and all of its dependencies, as well as load (import) it.

That said, I think in general it's preferable to do it your way (using the engine directly) instead of making bridges for bridges.
This project is really just a result of me being too lazy to fork TGI and learn how to integrate a proper KAI endpoint in their Rust server.

I would be happy to integrate this if the vLLM requirement becomes optional.
That would likely mean:

* a separate requirements-vllm.txt
* refactoring to only trigger the vLLM imports when it is selected as the backend

This is what AI-Horde-Worker is designed to do as well. It has a separate requirements-scribe.txt that doesn't include all the Stable Diffusion stuff.
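One possible way to do the "only import when selected" part, as a sketch rather than a prescription: resolve the backend through a small registry and import its module lazily. `load_backend_class` and `DEMO_REGISTRY` are hypothetical names, and the registry entries are stand-ins (a stdlib module for the lightweight backend, a deliberately missing module for the optional one) so the example runs without vLLM installed.

```python
import importlib


def load_backend_class(name, registry):
    """Import a backend's module only when that backend is selected."""
    try:
        module_path, class_name = registry[name]
    except KeyError:
        raise ValueError(f"unknown backend {name!r}") from None
    try:
        # Nothing heavy is imported until this line runs for the chosen backend.
        module = importlib.import_module(module_path)
    except ImportError as exc:
        raise RuntimeError(
            f"backend {name!r} needs extra dependencies; "
            f"install requirements-{name}.txt"
        ) from exc
    return getattr(module, class_name)


# Stand-in registry (assumption): real entries would point at the bridge's
# own backend modules instead of these placeholders.
DEMO_REGISTRY = {
    "tgi": ("collections", "OrderedDict"),
    "vllm": ("vllm_stand_in_not_installed", "Engine"),
}

if __name__ == "__main__":
    print(load_backend_class("tgi", DEMO_REGISTRY))
    try:
        load_backend_class("vllm", DEMO_REGISTRY)
    except RuntimeError as err:
        print(err)
```

Users who only want the TGI bridge would never hit the vLLM import, and anyone selecting vLLM without the extra requirements gets a pointer to the right file instead of a bare `ImportError`.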

g4rg · 2023-10-04T13:39:58Z

Tagging @Pyroserenus @official-elinas @HarmonyTechLabs @Yardanico, as I know they've been using the bridge to make a batching KAI server, to make them aware that TGI is no longer the only option.
It's also worth checking out PygmalionAI/aphrodite-engine, which now has built-in KAI API support and more sampler options than TGI and vLLM.

zten · 2023-10-06T03:50:19Z

Hey, thanks for the feedback. I did this mostly to see if it could be done (of course it could be done, I guess...), so it scratched the itch I had. However, I'm realizing I made a mistake -- implementing an OpenAI proxy in the same vein as your text-generation-inference proxy would have made it possible to integrate Aphrodite or vLLM's API server.
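For what the proxy idea might involve, here is a minimal sketch of the payload translation only, with no HTTP layer. It assumes the KAI generate request carries `prompt`, `max_length`, `temperature`, and `top_p`, that the KAI response has the shape `{"results": [{"text": ...}]}`, and that the upstream server speaks the OpenAI completions format. The function names, the model name, and the fallback of 80 tokens are assumptions for illustration, not something from this PR.

```python
def kai_to_openai(kai_request: dict, model: str) -> dict:
    """Translate a KAI-style generate payload into an OpenAI-style completions body."""
    body = {
        "model": model,
        "prompt": kai_request["prompt"],
        # Assumed fallback when the client does not send max_length.
        "max_tokens": kai_request.get("max_length", 80),
    }
    # Copy over only the sampler settings that share a name in both formats.
    for key in ("temperature", "top_p"):
        if key in kai_request:
            body[key] = kai_request[key]
    return body


def openai_to_kai(openai_response: dict) -> dict:
    """Translate an OpenAI-style completions response back into the KAI shape."""
    return {"results": [{"text": choice["text"]} for choice in openai_response["choices"]]}


if __name__ == "__main__":
    request = kai_to_openai({"prompt": "Hello", "max_length": 16, "top_p": 0.9}, "my-model")
    print(request)
    print(openai_to_kai({"choices": [{"text": " world"}]}))
```

Samplers that exist on only one side (for example KAI's repetition penalty settings) would need backend-specific handling, which this sketch leaves out.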

zten added 4 commits October 1, 2023 19:19

Add vLLM as an inference backend option

9475e36

* preserved text-generation-inference support by adding an interface
* switched api methods to async

Documentation

fc2f813

Fix some gaffes with model naming and selection

4af8a77

docs and requirements fix

bd92792

zten wants to merge 4 commits into g4rg:main from zten:vllm-experiment
