Common issues and solutions for Qwen3-Coder-480B-A35B-Instruct installation and usage.
Error: CUDA out of memory
```text
RuntimeError: CUDA out of memory. Tried to allocate X GB (GPU 0; Y GB total capacity)
```
Solutions:
- Check GPU memory usage:

  ```bash
  nvidia-smi
  ```

- Kill existing processes:

  ```bash
  sudo pkill -f python
  ```

- Reset the GPU (only works when no processes are using it):

  ```bash
  sudo nvidia-smi --gpu-reset
  ```

- Reduce batch size or use model sharding
- Enable CPU offloading:

  ```python
  model = AutoModelForCausalLM.from_pretrained(
      model_path,
      device_map="auto",
      offload_folder="./offload",
      offload_state_dict=True,
  )
  ```
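Before loading, it can help to confirm how much GPU memory is actually free; a minimal sketch using `torch.cuda.mem_get_info` (available in recent PyTorch releases):

```python
import torch

# Print free/total memory for each visible GPU before attempting a load.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)  # values in bytes
    print(f"GPU {i}: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
```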
Error: numpy.dtype size changed, may indicate binary incompatibility
Solutions:
- Clean installation:

  ```bash
  rm -rf ~/qwen480b_env
  pip cache purge
  ./install.sh
  ```

- Manual NumPy fix:

  ```bash
  source ~/qwen480b_env/bin/activate
  pip uninstall numpy -y
  pip install numpy==1.24.4
  ```
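After reinstalling, a quick sanity check that the pinned NumPy imports cleanly alongside the rest of the stack (the 1.24.4 pin comes from the fix above):

```python
# Importing transformers exercises the compiled extensions that raise
# the binary-incompatibility error when NumPy versions clash.
import numpy
import transformers

print(numpy.__version__)        # expected: 1.24.4
print(transformers.__version__)
```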
Error: NVCC not found or CUDA version mismatch
Solutions:
- Install CUDA 12.1:

  ```bash
  wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
  sudo dpkg -i cuda-keyring_1.0-1_all.deb
  sudo apt update
  sudo apt install cuda-toolkit-12-1
  ```

- Update PATH:

  ```bash
  echo 'export PATH=/usr/local/cuda-12.1/bin:$PATH' >> ~/.bashrc
  echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
  source ~/.bashrc
  ```

- Verify installation:

  ```bash
  nvcc --version
  nvidia-smi
  ```
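It is also worth confirming that the installed PyTorch build matches the toolkit; a minimal check:

```python
import torch

print(torch.__version__)          # PyTorch build
print(torch.version.cuda)         # CUDA version PyTorch was compiled against
print(torch.cuda.is_available())  # True only if the driver and GPU are usable
```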
Error: Connection timeout or HTTP 403/404 errors
Solutions:
- Check internet connection:

  ```bash
  ping huggingface.co
  ```

- Use a Hugging Face token (if the model requires authentication):

  ```bash
  export HF_TOKEN="your_token_here"
  ```

- Manual download with resume:

  ```bash
  git clone https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct
  ```

- Use mirror servers:

  ```bash
  export HF_ENDPOINT="https://hf-mirror.com"
  ```
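Alternatively, a minimal download sketch via the `huggingface_hub` Python client, which resumes interrupted downloads and honors both `HF_TOKEN` and `HF_ENDPOINT` (the target directory is the one used throughout this guide):

```python
import os
from huggingface_hub import snapshot_download

# Downloads all repo files, resuming any partial files from a previous attempt.
snapshot_download(
    repo_id="Qwen/Qwen3-Coder-480B-A35B-Instruct",
    local_dir=os.path.expanduser("~/qwen480b_env/models/qwen3-coder-480b"),
)
```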
Error: Can't load model or Config file not found
Solutions:
- Verify model directory:

  ```bash
  ls -la ~/qwen480b_env/models/qwen3-coder-480b/
  ```

- Check file permissions:

  ```bash
  chmod -R 755 ~/qwen480b_env/models/
  ```

- Re-download corrupted files:

  ```bash
  cd ~/qwen480b_env/models/qwen3-coder-480b/
  git lfs pull
  ```
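A cheap way to confirm the directory is loadable before committing to a full weight load is to parse just the config:

```python
import os
from transformers import AutoConfig

# Reads config.json only; fails fast if the download is incomplete or corrupted.
model_path = os.path.expanduser("~/qwen480b_env/models/qwen3-coder-480b")
config = AutoConfig.from_pretrained(model_path)
print(config.model_type)
```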
Issue: Model takes too long to generate responses
Solutions:
- Enable optimizations:

  ```python
  # Use torch.compile (PyTorch 2.0+)
  model = torch.compile(model, mode="reduce-overhead")

  # Use lower-precision data types
  model = model.half()  # FP16
  ```

- Adjust generation parameters:

  ```python
  outputs = model.generate(
      **inputs,
      max_new_tokens=100,  # Reduce if too slow
      do_sample=False,     # Use greedy decoding
      num_beams=1,         # Disable beam search
  )
  ```

- Use vLLM (if installed):

  ```python
  from vllm import LLM, SamplingParams

  llm = LLM(model="path/to/model")
  outputs = llm.generate(prompts, SamplingParams(temperature=0.7))
  ```
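A model of this size will not fit on a single GPU, so vLLM's tensor parallelism is usually required; a minimal sketch (the GPU count of 8 is an assumption, set it to what your node actually has):

```python
from vllm import LLM, SamplingParams

# tensor_parallel_size shards the weights across GPUs on one node.
# 8 is an assumed GPU count; match it to the GPUs actually available.
llm = LLM(model="path/to/model", tensor_parallel_size=8)
outputs = llm.generate(
    ["Write a quicksort in Python."],
    SamplingParams(temperature=0.7, max_tokens=256),
)
```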
Issue: Memory usage keeps increasing
Solutions:
- Clear GPU cache regularly:

  ```python
  import torch
  torch.cuda.empty_cache()
  ```

- Use context managers:

  ```python
  with torch.no_grad():
      outputs = model.generate(**inputs)
  ```

- Delete unused variables:

  ```python
  del outputs, inputs
  import gc
  gc.collect()
  ```
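These steps can be bundled into a small helper so cleanup is never skipped; a sketch (the helper name is illustrative, not part of this project):

```python
import gc
import torch

def generate_and_free(model, tokenizer, prompt, **gen_kwargs):
    """Generate a completion, then release the tensors it allocated."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(**inputs, **gen_kwargs)
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    del inputs, outputs
    gc.collect()
    torch.cuda.empty_cache()
    return text
```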
Enable debug logging:

```python
import logging
logging.basicConfig(level=logging.DEBUG)

# For transformers
import transformers
transformers.logging.set_verbosity_debug()
```

Monitor system resources:

```bash
# GPU usage
nvidia-smi -l 1

# Memory usage
htop

# Disk usage
df -h

# Process monitoring
ps aux | grep python
```

Inspect the model:

```python
# Check model config
from transformers import AutoConfig
config = AutoConfig.from_pretrained(model_path)
print(config)

# Check model size
model_size = sum(p.numel() for p in model.parameters())
print(f"Model parameters: {model_size:,}")
```

Speed and memory optimizations:
- Use FP16 precision:

  ```python
  model = model.half()
  ```

- Enable torch.compile:

  ```python
  model = torch.compile(model)
  ```

- Optimize the tokenizer:

  ```python
  tokenizer.pad_token = tokenizer.eos_token
  tokenizer.padding_side = "left"  # decoder-only models need left padding for batched generation
  ```

- Use model sharding:

  ```python
  model = AutoModelForCausalLM.from_pretrained(
      model_path,
      device_map="auto",
      torch_dtype=torch.float16,
      low_cpu_mem_usage=True,
  )
  ```

- Enable gradient checkpointing (training/fine-tuning only; it trades compute for lower memory):

  ```python
  model.gradient_checkpointing_enable()
  ```

- Use CPU offloading:

  ```python
  model = AutoModelForCausalLM.from_pretrained(
      model_path,
      device_map="auto",
      offload_folder="./offload",
  )
  ```
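To verify what these options actually save, `transformers` models expose a footprint helper:

```python
# Reports the in-memory size of the loaded weights and buffers.
footprint_gb = model.get_memory_footprint() / 1e9
print(f"Model memory footprint: {footprint_gb:.1f} GB")
```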
Before reporting issues, collect this information:
```bash
# System info
./scripts/system_check.sh > debug_info.txt

# Python environment
pip list >> debug_info.txt

# GPU info
nvidia-smi >> debug_info.txt

# Error logs
tail -100 ~/qwen480b_install.log >> debug_info.txt
```

- Check existing issues: GitHub Issues
- Create a new issue with:
  - Clear description of the problem
  - Steps to reproduce
  - Error messages and logs
  - System information from debug_info.txt
- Discussions: GitHub Discussions
- Documentation: Project Wiki
Profile memory usage:

```python
import torch.profiler

with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CUDA],
    profile_memory=True,  # required for the memory columns in the table below
    record_shapes=True,
) as prof:
    outputs = model.generate(**inputs)

print(prof.key_averages().table(sort_by="cuda_memory_usage"))
```

Benchmark generation speed:

```python
import time
import torch

# Warm up
for _ in range(3):
    _ = model.generate(**inputs, max_new_tokens=10)

# Benchmark
times = []
for _ in range(10):
    start = time.time()
    outputs = model.generate(**inputs, max_new_tokens=50)
    times.append(time.time() - start)

print(f"Average time: {sum(times)/len(times):.2f}s")
```

Test network connectivity:
```bash
# Test Hugging Face connectivity
curl -v https://huggingface.co

# Test download speed
wget --progress=bar --timeout=30 https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct/resolve/main/README.md

# Check DNS resolution
nslookup huggingface.co
```
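From Python, a minimal reachability check via `huggingface_hub` (this also respects `HF_ENDPOINT` if you switched to a mirror):

```python
from huggingface_hub import HfApi

# Fetches repo metadata only; a lightweight way to confirm the Hub is
# reachable and the repo is visible with the current token/endpoint settings.
info = HfApi().model_info("Qwen/Qwen3-Coder-480B-A35B-Instruct")
print(info.sha)
```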