Add vLLM for fun and profit #1
* Preserved text-generation-inference support by adding an interface
* Switched API methods to async
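A minimal sketch of what a shared async backend interface like the one described above could look like, assuming Python's `abc` and `asyncio`. The names `GenerationBackend`, `EchoBackend`, and `handle_request` are hypothetical and not taken from the PR. The response shape is assumed to mirror KoboldAI's `{"results": [{"text": ...}]}` format.

```python
import abc
import asyncio


class GenerationBackend(abc.ABC):
    """Hypothetical common interface that every inference backend implements."""

    @abc.abstractmethod
    async def generate(self, prompt: str, max_new_tokens: int) -> str:
        """Return the completion text for `prompt`."""


class EchoBackend(GenerationBackend):
    """Toy backend used only to exercise the interface; it echoes the prompt."""

    async def generate(self, prompt: str, max_new_tokens: int) -> str:
        await asyncio.sleep(0)  # stand-in for a real network or engine call
        return prompt[:max_new_tokens]


async def handle_request(backend: GenerationBackend, prompt: str) -> dict:
    # The API layer depends only on the interface, so a TGI, vLLM, or other
    # backend can be swapped in without changing this function.
    text = await backend.generate(prompt, max_new_tokens=16)
    return {"results": [{"text": text}]}


if __name__ == "__main__":
    print(asyncio.run(handle_request(EchoBackend(), "Once upon a time")))
```

Because the abstract method is `async`, every backend is awaited the same way, which is what keeps the API handlers identical regardless of which engine sits behind them.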
Thank you for your contribution (and for hosting!). The refactor to allow more backends is great. That said, I think it's generally preferable to do it your way (using the engine directly) rather than building bridges on top of bridges. I would be happy to integrate this if the vLLM requirement becomes optional.
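One common way to make a heavy dependency like vLLM optional is to check whether it is installed at runtime and import it only on the code path that uses it. A minimal sketch, assuming a hypothetical `--backend` setting that accepts `tgi` or `vllm`; the function names are illustrative, not from the PR.

```python
import importlib.util


def vllm_available() -> bool:
    """True if the optional `vllm` package is installed, without importing it."""
    return importlib.util.find_spec("vllm") is not None


def pick_backend(requested: str) -> str:
    """Resolve a hypothetical `--backend` setting to a backend name.

    vLLM is treated as an optional extra: requesting it without the package
    installed fails with a clear message instead of an ImportError at startup.
    The actual `import vllm` would live only inside the vLLM backend module.
    """
    if requested == "vllm":
        if not vllm_available():
            raise RuntimeError(
                "backend 'vllm' requested but the vllm package is not installed"
            )
        return "vllm"
    if requested == "tgi":
        return "tgi"
    raise ValueError(f"unknown backend: {requested!r}")
```

Paired with a packaging extra (for example an optional `vllm` dependency group installed via `pip install .[vllm]`), users who only need TGI never have to install vLLM.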
This is what
Tagging @Pyroserenus @official-elinas @HarmonyTechLabs @Yardanico so they're aware that TGI is no longer the only option; I know they've been using the bridge to build a batching KAI server.
Hey, thanks for the feedback. I did this mostly to see if it could be done (of course it could be done, I guess...), so it scratched the itch I had. However, I'm realizing I think I made a mistake: implementing an OpenAI proxy in the same vein as your text-generation-inference proxy would have made it possible to integrate Aphrodite or vLLM's API server.
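A minimal sketch of the translation layer such an OpenAI proxy would need, assuming KoboldAI-style request fields (`prompt`, `max_length`, `temperature`, `top_p`, `stop_sequence`) on one side and the OpenAI `/v1/completions` request/response format, as served by OpenAI-compatible servers such as vLLM's, on the other. The function names and the default values are hypothetical; check the field names against the bridge's real request schema.

```python
def kobold_to_openai(body: dict, model: str) -> dict:
    """Translate a KoboldAI-style generate request into a /v1/completions body.

    The KoboldAI field names and the defaults below are assumptions.
    """
    payload = {
        "model": model,
        "prompt": body["prompt"],
        "max_tokens": body.get("max_length", 80),  # hypothetical default
        "temperature": body.get("temperature", 1.0),
        "top_p": body.get("top_p", 1.0),
    }
    stop = body.get("stop_sequence")
    if stop:
        payload["stop"] = stop  # OpenAI accepts a string or a list of strings
    return payload


def openai_to_kobold(response: dict) -> dict:
    """Map an OpenAI completions response back to KoboldAI's results shape."""
    return {"results": [{"text": choice["text"]} for choice in response["choices"]]}
```

With a proxy like this, any server that speaks the OpenAI completions format becomes a backend without the bridge importing its engine.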
I could use some feedback on what I did.
I should probably clean up the configuration because that's a bit wacky.
I did adapt the text-generation-inference stuff to use aiohttp instead, so I could keep the async interfaces as similar as possible. Yes, I did at least test it, because I needed something to experiment with while getting the vLLM piece working.
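A minimal sketch of an aiohttp-based client for text-generation-inference's `POST /generate` endpoint (request body `{"inputs": ..., "parameters": {...}}`, response `{"generated_text": ...}`). The `TGIClient` name and the server URL are hypothetical. The session is injected so one `aiohttp.ClientSession` and its connection pool can be reused across requests, and `main` uses `asyncio.gather` to show why async matters for a batching server.

```python
import asyncio
from typing import Any


class TGIClient:
    """Async client for text-generation-inference's POST /generate endpoint.

    `session` is expected to be an aiohttp.ClientSession (or anything with a
    compatible `post(...)` async context manager).
    """

    def __init__(self, base_url: str, session: Any) -> None:
        self.base_url = base_url.rstrip("/")
        self.session = session

    async def generate(self, prompt: str, max_new_tokens: int) -> str:
        payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
        # aiohttp's session.post(...) is an async context manager yielding the response.
        async with self.session.post(f"{self.base_url}/generate", json=payload) as resp:
            resp.raise_for_status()
            data = await resp.json()
        return data["generated_text"]


async def main() -> None:
    import aiohttp  # third-party; only needed when talking to a real server

    async with aiohttp.ClientSession() as session:
        client = TGIClient("http://localhost:8080", session)  # hypothetical URL
        # Concurrent requests give the server something to batch together.
        texts = await asyncio.gather(
            client.generate("Hello", max_new_tokens=20),
            client.generate("Once upon a time", max_new_tokens=20),
        )
        print(texts)

# To try it against a live TGI server: asyncio.run(main())
```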