This repository is a fork of llama.cpp customized to facilitate Llama2 inference within Codesphere.
llama.cpp is a tool for running Llama2 inference; this fork adds the setup needed to run it in Codesphere environments.
- Pre-Configured CI Pipeline: The CI pipeline automatically fetches a pre-converted, quantized Code Llama Instruct model from TheBloke on Hugging Face.
- HTTP Server Example: The repository includes an HTTP server example, allowing for easy deployment and testing. Configuration options can be found in the /examples/server directory.
- Clone this repository in a new workspace (at least Pro/GPU).
- Start the Prepare stage in the CI pipeline.
- After the Prepare stage is done, start the run stage.
- Click Open deployment in the top right corner.
For detailed configuration options and usage instructions, refer to the README file located in the /examples/server directory.
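Once the deployment is running, the HTTP server can be queried from any client. The following is a minimal sketch in Python, assuming the server exposes the upstream llama.cpp `/completion` endpoint, which accepts a JSON body with `prompt` and `n_predict` and returns the generated text in a `content` field. The helper names `build_completion_request` and `complete` are hypothetical and not part of this repository, and the base URL must be replaced with the address of your own deployment.

```python
import json
import urllib.request


def build_completion_request(base_url, prompt, n_predict=128):
    """Build a POST request for the server's /completion endpoint.

    base_url is the address of the running deployment (an assumption:
    use whatever URL "Open deployment" opens for your workspace).
    """
    payload = {"prompt": prompt, "n_predict": n_predict}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, prompt, n_predict=128, timeout=120):
    """Send the prompt to the server and return the generated text."""
    request = build_completion_request(base_url, prompt, n_predict)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # The generated text is expected in the "content" field of the response.
    return body["content"]
```

For example, `complete("https://<your-deployment-url>", "def fibonacci(n):")` would send the prompt and return the model's continuation. Separating request construction from sending keeps the payload easy to inspect before any network call is made.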
Please note that while this repository provides a convenient setup for running Llama2 inference in Codesphere, further customization may be required to suit specific use cases or preferences.