This repository was archived by the owner on Aug 25, 2024. It is now read-only.

Add vLLM for fun and profit#1

Draft
zten wants to merge 4 commits into g4rg:main from zten:vllm-experiment

Conversation

zten commented Oct 4, 2023

I could use some feedback on what I did.

I should probably clean up the configuration because that's a bit wacky.

I did adapt the text-generation-inference stuff to use aiohttp instead so I could keep the async interfaces as similar as possible. Yes, I did at least test it, because I needed something to experiment with while getting the vLLM piece working.
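
To illustrate the idea of keeping the async interfaces similar across backends, here is a minimal sketch of a shared interface. The names `Backend`, `EchoBackend`, and `handle_request` are hypothetical and are not taken from this PR; the echo backend is a stand-in for a real TGI (aiohttp) or vLLM implementation.

```python
import asyncio
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical common interface that every inference backend implements."""

    @abstractmethod
    async def generate(self, prompt: str, max_new_tokens: int) -> str:
        """Return generated text for the prompt."""


class EchoBackend(Backend):
    """Stand-in backend for illustration; a real one would call TGI or vLLM."""

    async def generate(self, prompt: str, max_new_tokens: int) -> str:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return prompt[:max_new_tokens]  # pretend characters are tokens


async def handle_request(backend: Backend, prompt: str) -> str:
    # The caller depends only on the interface, not on a concrete backend.
    return await backend.generate(prompt, max_new_tokens=5)


if __name__ == "__main__":
    print(asyncio.run(handle_request(EchoBackend(), "hello world")))
```

Because the caller only awaits `generate`, swapping one backend for another should not require changes to the API handlers.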

zten added 4 commits October 1, 2023 19:19
* preserved text-generation-inference support by adding
  an interface
* switched api methods to async

g4rg (Owner) commented Oct 4, 2023

Thank you for your contribution (and for hosting!)

The refactor to allow more backends is great.
My concern is that this implementation forces users who just want a lightweight TGI bridge to fully install vLLM and all of its dependencies, as well as load (import) it.

That said, I think in general it's preferable to do it your way (using the engine directly) instead of making bridges for bridges.
This project is really just a result of me being too lazy to fork TGI and learn how to integrate a proper KAI endpoint in their Rust server.

I would be happy to integrate this if the vLLM requirement becomes optional.
That would likely mean:

  • a separate requirements-vllm.txt
  • refactoring to only trigger the vllm imports when it is selected as the backend

This is what AI-Horde-Worker is designed to do as well. It has a separate requirements-scribe.txt that doesn't include all the Stable Diffusion stuff.
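
A minimal sketch of the second point, assuming a registry that maps each backend name to the module it needs. The names `BACKEND_MODULES` and `load_backend` are hypothetical, and the standard-library `json` module stands in for the lightweight TGI client code so that the example runs without vLLM installed.

```python
import importlib

# Hypothetical registry: backend name -> (module to import, requirements file).
BACKEND_MODULES = {
    "tgi": ("json", "requirements.txt"),        # stand-in for light client code
    "vllm": ("vllm", "requirements-vllm.txt"),  # heavy, optional dependency
}


def load_backend(name: str):
    """Import the module for the selected backend, and only that one."""
    try:
        module_name, requirements = BACKEND_MODULES[name]
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}") from None
    try:
        # The import happens here, at selection time, not at program start.
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"backend {name!r} is not installed; "
            f"try: pip install -r {requirements}"
        ) from exc
```

With this shape, a user who selects `tgi` never triggers the vLLM import, and a user who selects `vllm` without installing it gets a pointer to the extra requirements file.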

g4rg (Owner) commented Oct 4, 2023

Tagging @Pyroserenus @official-elinas @HarmonyTechLabs @Yardanico, as I know they've been using the bridge to make a batching KAI server, to make them aware that TGI is no longer the only option.
It's also worth checking out PygmalionAI/aphrodite-engine, which now has built-in KAI API support and more sampler options than TGI and vLLM.

zten (Author) commented Oct 6, 2023

Hey, thanks for the feedback. I did this mostly to see if it could be done (of course it could be done, I guess...), so it scratched the itch I had. However, I'm realizing I may have made a mistake: implementing an OpenAI proxy in the same vein as your text-generation-inference proxy would have made it possible to integrate with either Aphrodite's or vLLM's API server.
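
A rough sketch of what the core of such a proxy could look like: translating a KAI-style generate request into an OpenAI-style completions request and mapping the response back. The helper names `kai_to_openai` and `openai_to_kai` are hypothetical, only a few sampler fields are handled, and the fallback of 80 tokens is an illustrative assumption, not something taken from this repository.

```python
def kai_to_openai(kai_request: dict, model: str) -> dict:
    """Build an OpenAI-style completions payload from a KAI-style request."""
    payload = {
        "model": model,
        "prompt": kai_request["prompt"],
        # KAI calls the generation length max_length; 80 is an assumed default.
        "max_tokens": kai_request.get("max_length", 80),
    }
    # Copy over the sampler settings that share a name in both formats.
    for key in ("temperature", "top_p"):
        if key in kai_request:
            payload[key] = kai_request[key]
    return payload


def openai_to_kai(openai_response: dict) -> dict:
    """Wrap OpenAI-style completion choices in a KAI-style results list."""
    return {
        "results": [{"text": choice["text"]} for choice in openai_response["choices"]]
    }
```

Sampler options that exist on only one side would need explicit handling, so a real proxy would likely validate or drop unsupported fields rather than pass them through.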


2 participants