Commit e251dcc

Merge pull request #61 from SunbirdAI/phionah-branch
Phionah branch
2 parents dfa3fcb + 078d76a commit e251dcc

2 files changed

Lines changed: 45 additions & 0 deletions

File tree

docs/sunflower/overview.md

Lines changed: 44 additions & 0 deletions
# 🌻 Sunflower Quantized Inference

## Overview

The **Sunflower** models are available in **14B** and **32B** sizes and support **8-bit** and **4-bit quantized inference** for efficient performance on GPUs with limited memory.
Quantization reduces memory requirements while keeping inference quality high, enabling large models to run on consumer-grade GPUs or hardware with limited VRAM.

| Feature         | 8-bit                   | 4-bit                  |
|-----------------|-------------------------|------------------------|
| Memory Usage    | Higher (~16 GB for 14B) | Lower (~10 GB for 14B) |
| Speed           | ⚡ Fast                 | ⚡⚡ Faster             |
| Accuracy        | Very good               | Slightly lower         |
| VRAM Efficiency | Moderate                | High                   |
> ⚠️ **Important:** Do not enable both 8-bit and 4-bit modes at the same time.

---
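The table's memory figures follow from a simple rule of thumb: weight memory scales linearly with bit width. A minimal sketch of that arithmetic (the gap between the weight-only estimate and the table's totals is attributed to runtime overhead, which is an assumption, not a measured value):

```python
def weight_memory_gb(params_billions: float, bits: int) -> float:
    """Approximate memory for the model weights alone: params x (bits / 8) bytes."""
    return params_billions * bits / 8

# 14B at 8-bit: ~14 GB of weights; the table's ~16 GB also covers
# activations, KV cache, and framework overhead.
print(weight_memory_gb(14, 8))  # 14.0
print(weight_memory_gb(14, 4))  # 7.0
print(weight_memory_gb(32, 4))  # 16.0
```

This is why 32B at 4-bit lands near 14B at 8-bit in weight memory, making the larger model viable on the same class of GPU.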
## 🌻 Sunflower 14B Models

- **14B 8-bit:** Balanced memory and accuracy, suitable for most GPUs.
- **14B 4-bit:** Optimized for memory-limited GPUs and faster inference, with minimal accuracy trade-off.

---
## 🌻 Sunflower 32B Models

- **32B 8-bit:** High accuracy, requires more GPU memory.
- **32B 4-bit:** Reduced memory usage, faster inference, slightly lower accuracy.

> The usage process is identical for 14B and 32B; only the model size and quantization type differ.
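The note above can be sketched with Hugging Face `transformers` and `bitsandbytes`. The model ids below are placeholders, not confirmed names (check Sunbird AI's model hub for the real ones), and the small helper enforces the either-8-bit-or-4-bit rule from the warning:

```python
def quant_kwargs(bits: int) -> dict:
    """Keyword arguments for BitsAndBytesConfig. Exactly one of
    load_in_8bit / load_in_4bit is ever set -- never both."""
    if bits == 8:
        return {"load_in_8bit": True}
    if bits == 4:
        return {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}
    raise ValueError("bits must be 8 or 4")


def load_sunflower(model_id: str, bits: int = 4):
    """Load a Sunflower checkpoint with the chosen quantization.
    Requires a CUDA GPU plus transformers, accelerate, and bitsandbytes."""
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    config = BitsAndBytesConfig(**quant_kwargs(bits))
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=config, device_map="auto"
    )
    return tokenizer, model


# Same call for either size -- only the id and bit width change:
#   load_sunflower("Sunbird/Sunflower-14B", bits=4)  # hypothetical id
#   load_sunflower("Sunbird/Sunflower-32B", bits=8)  # hypothetical id
```

`device_map="auto"` lets `accelerate` place layers across available GPUs (and spill to CPU if needed), which pairs well with the quantized configurations above.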
## Tips & Best Practices

- Use 4-bit models when **GPU memory is limited** or faster inference is needed.
- 8-bit models offer a **good balance of memory usage and accuracy**.
- Always choose **either 8-bit or 4-bit** for a model, never both.
- For large inputs or batch processing, monitor GPU memory to avoid out-of-memory errors.
- Adjust inference parameters (such as sequence length or the number of generated tokens) for the best performance on your hardware.

---
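The memory-monitoring tip can be made concrete: during long-context or batched inference, most of the extra memory growth is the KV cache, and its size can be estimated before running. The architecture numbers in the example are purely illustrative, not Sunflower's actual configuration:

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 seq_len: int, batch: int, bytes_per_value: int = 2) -> float:
    """Estimated KV-cache size in GiB: two tensors (K and V) per layer,
    each of shape (batch, n_kv_heads, seq_len, head_dim)."""
    total = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_value
    return total / 2**30

# Illustrative 48-layer model with grouped-query attention, fp16 cache:
print(kv_cache_gib(48, 8, 128, seq_len=4096, batch=1))  # 0.75
print(kv_cache_gib(48, 8, 128, seq_len=4096, batch=8))  # 6.0
```

Because the cache grows linearly with both batch size and sequence length, a batch that fits at short context can still exhaust VRAM at long context; at runtime, `torch.cuda.memory_allocated()` and `torch.cuda.max_memory_allocated()` report the actual usage.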

mkdocs.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -180,6 +180,7 @@ nav:
   - API:
     - "Getting started": sunflower/sunflower.md
     - Quantization:
+        - "Overview": sunflower/overview.md
         - "LoRA Fine-tuned Models to GGUF": sunflower/quantization.md
   - Tutorials:
```
