The dgx-spark-inference-stack provides a straightforward way to serve AI models on your own hardware. It is designed specifically for the NVIDIA DGX Spark, also known as the Grace Blackwell AI supercomputer for your desk, and is built around vLLM so you can get started with AI inference quickly.
- Simple Setup: Get up and running quickly with user-friendly installation instructions.
- Local Model Serving: Run your AI models directly on your machine.
- Docker Support: Utilize Docker to simplify application management.
- ML Ops Ready: Ideal for machine learning operations and workflows.
- Focused on Generative AI: Run cutting-edge open models such as Llama for generative tasks.
Before you begin, make sure your system meets the following requirements:
- Operating System: A Linux distribution (the DGX Spark ships with NVIDIA's Ubuntu-based DGX OS); on Windows, CUDA inside Docker requires WSL 2. macOS cannot use NVIDIA GPUs.
- Memory: At least 8 GB of RAM recommended; larger models need considerably more.
- GPU: An NVIDIA GPU with CUDA support is required.
- Docker: A recent version of Docker, with the Docker Compose plugin, must be installed.
- Ensure your system meets the requirements above.
- If Docker is not installed, install it from Docker's official page.
- Review this guide and prepare for the download.
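The preparation steps above can be sanity-checked from a terminal. The snippet below is a minimal sketch: `check_cmd` is a hypothetical helper (not part of the stack), and which tools you probe for — here `docker` and `nvidia-smi` — depends on your setup.

```shell
# Minimal preflight check: report whether the required CLIs are on PATH.
# check_cmd is a hypothetical helper, not part of the stack itself.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: MISSING"
  fi
}

check_cmd docker       # required to run the stack
check_cmd nvidia-smi   # confirms the NVIDIA driver is installed
```

On a correctly prepared machine both lines report `found`; a `MISSING` line tells you which prerequisite to install first.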
To get the latest release:
- Go to the repository's Releases page.
- Locate the latest version and download the appropriate file for your operating system.
- Follow the instructions in the download section of the release for specific installation steps.
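If the release publishes checksums, verifying the download before installing is cheap insurance. The archive filename in the comment below is only an example; substitute the file you actually downloaded and compare against the hash listed on the Releases page.

```shell
# Verify a downloaded archive against a published checksum, e.g.:
#   sha256sum dgx-spark-inference-stack.tar.gz
# (the filename above is an example, not the real release name)

# Demonstration on a fixed input so the output format is visible:
printf 'hello' | sha256sum
# prints: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824  -
```

The first field is the hash to compare; the trailing `-` means the input came from stdin rather than a file.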
Once you have downloaded and installed the application:
- Open a terminal or command prompt.
- Navigate to the directory where the application is installed.
- Run the following command to start the inference server:
  ```shell
  docker-compose up
  ```
  (On newer Docker installations, `docker compose up` is the equivalent command.)
- Once the server is running, follow the instructions printed in the terminal to access the application through your web browser.
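Once the server is up, you can also exercise it over HTTP. vLLM exposes an OpenAI-compatible API, by default on port 8000; the model name below is an assumption — use whatever model your compose file actually serves (you can list served models via `GET /v1/models`).

```shell
# Build a completion request for vLLM's OpenAI-compatible endpoint.
# MODEL is an assumption; substitute the model your server loads.
MODEL="meta-llama/Llama-3.1-8B-Instruct"
BODY="{\"model\": \"$MODEL\", \"prompt\": \"Hello\", \"max_tokens\": 16}"
echo "$BODY"

# Send it against the running server (requires curl):
# curl -s http://localhost:8000/v1/completions \
#   -H 'Content-Type: application/json' -d "$BODY"
```

A successful response is a JSON object whose `choices` array contains the generated text.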
If you encounter any issues:
- Check System Requirements: Ensure all requirements are met.
- Review Docker Logs: If the application does not start, check the Docker logs for any error messages.
- Search the Error Message: Solutions for common issues are often available online.
- Seek Help in the Community: Visit related forums or GitHub discussions for support.
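For the "Review Docker Logs" step, a small filter helps surface failures in noisy output. `filter_errors` is a hypothetical helper, and the service name `vllm` in the commented command is an assumption — use the service name from your docker-compose.yml.

```shell
# Hypothetical helper: keep only log lines that look like failures.
filter_errors() {
  grep -iE 'error|exception|traceback'
}

# Typical use against a compose service (service name is an assumption):
#   docker compose logs --tail=200 vllm | filter_errors

# Demonstration on a fixed log fragment:
printf 'INFO server ready\nERROR CUDA out of memory\n' | filter_errors
# prints: ERROR CUDA out of memory
```

Note that `grep` exits non-zero when nothing matches, which is convenient for scripting: no output and a non-zero status means the logs contain no obvious errors.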
For detailed documentation on how to use the application, refer to the Wiki section of the repository, which covers advanced features, tuning parameters, and FAQs.
We welcome contributions from the community. If you want to contribute, please follow our guidelines in the repository. Check the issues section for any enhancement requests or bugs that need fixing.
If you have further questions, you can open an issue in the GitHub repository. The community is active and ready to assist you.
- CUDA: NVIDIA's parallel computing platform and programming model for GPUs.
- Generative AI: The use of models to generate new content such as text, code, or images.
- MLOps: Practices for deploying, monitoring, and maintaining machine learning models in production.
This README provides everything you need to download and run the dgx-spark-inference-stack on your machine. Enjoy your journey into AI model serving!