ibm-granite/granite.debug-tools

Granite.Debug Tools

Granite.Debug is a suite of self-service debugging tools for Large Language Models (LLMs). The tools help identify, evaluate, and resolve issues across fine-tuning workflows, benchmark analysis, and agent-based LLM interactions.

Available Tools

Selecting the Right Tool

| If I need to... | Then I should use... |
| --- | --- |
| Design scaffolded tasks to diagnose which skill-level capability is missing | STaD |
| Benchmark LLM serving endpoints and local inference with an MCP-based tool | perfbench |
| Validate model behavior across inference engines (vLLM, llama.cpp, Ollama) | runtimes-validator |

STaD - Scaffolded Task Design

STaD is a framework for generating scaffolded variations of multi-step reasoning tasks to enable systematic LLM debugging, evaluation, and training.

Use STaD when you need to design scaffolded tasks to diagnose which skill-level capability is missing in your model.
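
To make the idea of scaffolding concrete, here is a minimal Python sketch. It is not STaD's API: the `Step` and `Task` classes, the function names, and the toy task are all hypothetical and exist only for illustration. Each variant reveals the intermediate results of the first k steps, and the lowest scaffold level at which a model succeeds hints at the skill it could not supply on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    skill: str  # hypothetical skill label, e.g. "multiplication"
    hint: str   # intermediate result revealed when this step is scaffolded


@dataclass
class Task:
    question: str
    steps: List[Step]
    answer: str


def scaffolded_variants(task: Task) -> List[str]:
    """Return one prompt per scaffold level k, revealing the first k step results."""
    prompts = []
    for k in range(len(task.steps) + 1):
        hints = "".join(f"\nGiven: {step.hint}" for step in task.steps[:k])
        prompts.append(task.question + hints)
    return prompts


def first_missing_skill(task: Task, solve: Callable[[str], str]) -> Optional[str]:
    """Find the lowest scaffold level the solver passes and name the skill it needed help with."""
    for k, prompt in enumerate(scaffolded_variants(task)):
        if solve(prompt) == task.answer:
            # Level 0 means no help was needed; otherwise step k-1 was the blocker.
            return None if k == 0 else task.steps[k - 1].skill
    return "unresolved"  # fails even with every listed step revealed


# Toy task and a stand-in "model" that cannot multiply on its own (assumptions).
task = Task(
    question="Pens cost 3 dollars each. Ana buys 4 pens and pays with 20 dollars. What is her change?",
    steps=[Step(skill="multiplication", hint="4 pens cost 12 dollars in total.")],
    answer="8",
)


def toy_model(prompt: str) -> str:
    return "8" if "12 dollars" in prompt else "17"


print(first_missing_skill(task, toy_model))
```

In practice `solve` would wrap a call to the model under test, and a task would carry several steps so that the scaffold level separates one skill from another.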

perfbench - MCP server for Granite benchmarking

perfbench is an MCP server that manages LLM benchmark runs as asynchronous subprocesses, wrapping five benchmark runners (vLLM, AIPerf, GuideLLM, llama-bench, Ollama) behind a unified tool interface.

Use perfbench when you need to benchmark LLM serving endpoints or local inference and want an agent-driven workflow via the Model Context Protocol.
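
The pattern described above, several runners launched as asynchronous subprocesses behind one interface, can be sketched with Python's standard `asyncio` module. This is an illustration under stated assumptions, not perfbench's implementation: the `RunManager` class, the runner names, and the placeholder commands are hypothetical, and no MCP SDK is involved.

```python
import asyncio
import sys
import uuid
from typing import Dict, List

# Hypothetical registry: runner name -> command line.
# Real runners would invoke their own benchmark CLIs; these placeholders only print.
RUNNERS: Dict[str, List[str]] = {
    "runner-a": [sys.executable, "-c", "print('runner-a done')"],
    "runner-b": [sys.executable, "-c", "print('runner-b done')"],
}


class RunManager:
    """Starts benchmark commands without blocking and tracks them by run id."""

    def __init__(self) -> None:
        self._procs: Dict[str, asyncio.subprocess.Process] = {}

    async def start(self, runner: str) -> str:
        proc = await asyncio.create_subprocess_exec(
            *RUNNERS[runner],
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,
        )
        run_id = uuid.uuid4().hex[:8]
        self._procs[run_id] = proc
        return run_id  # returned immediately; the process keeps running

    def status(self, run_id: str) -> str:
        code = self._procs[run_id].returncode
        if code is None:
            return "running"
        return "succeeded" if code == 0 else "failed"

    async def wait(self, run_id: str) -> str:
        """Wait for the run to finish and return its combined output."""
        out, _ = await self._procs[run_id].communicate()
        return out.decode().strip()


async def demo() -> Dict[str, str]:
    manager = RunManager()
    # Launch every runner first, then collect results, so the runs overlap.
    ids = {name: await manager.start(name) for name in RUNNERS}
    return {name: await manager.wait(run_id) for name, run_id in ids.items()}


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Returning a run id from `start` instead of waiting for completion is what lets a caller, such as an agent, poll `status` and fetch results later.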

runtimes-validator

runtimes-validator is a unified validation framework for running model checks across inference engines (vLLM, llama.cpp, Ollama). It provides a CLI (runtimes-validator) that runs automated validation tests against Granite models deployed on different backends. Two execution modes are supported: managed, where the framework starts and stops the engine, and external, where it connects to an engine that is already running.

Use runtimes-validator when you need to validate that a Granite model behaves correctly across different inference engines.
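
The difference between managed and external execution can be sketched as a small context manager. This is an assumption-laden illustration, not the runtimes-validator API: the `EngineSession` class, its parameters, the URL, and the placeholder launch command are hypothetical.

```python
import subprocess
import sys
from typing import List, Optional


class EngineSession:
    """Owns the engine process in managed mode; only records the URL in external mode."""

    def __init__(self, base_url: str, launch_cmd: Optional[List[str]] = None) -> None:
        self.base_url = base_url
        self.launch_cmd = launch_cmd
        self.proc: Optional[subprocess.Popen] = None

    @property
    def mode(self) -> str:
        return "managed" if self.launch_cmd else "external"

    def __enter__(self) -> "EngineSession":
        if self.launch_cmd:
            # Managed: start the engine. A real tool would also wait until
            # the engine reports ready before running any checks.
            self.proc = subprocess.Popen(self.launch_cmd)
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if self.proc is not None:
            # Managed: stop what we started. External engines are left untouched.
            self.proc.terminate()
            self.proc.wait(timeout=10)
        return False


# Placeholder command standing in for an engine launch (assumption).
fake_engine = [sys.executable, "-c", "import time; time.sleep(30)"]

with EngineSession("http://localhost:8000", fake_engine) as managed:
    print(managed.mode, "engine running:", managed.proc.poll() is None)

with EngineSession("http://localhost:8000") as external:
    print(external.mode, "process owned:", external.proc is not None)
```

Keeping both modes behind one session object means the validation tests themselves do not need to know who started the engine.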

Coming Soon

Additional debugging tools are being prepared for open-source release. Stay tuned!

Contributing

We welcome contributions! If you'd like to contribute to any of the tools in this repository, please open an issue or submit a pull request.

🚧 Work in Progress

This repository is actively evolving. We are continuously adding new debugging tools, expanding coverage, and refining existing functionality based on community feedback and ongoing research. Check back regularly for updates, and feel free to open an issue or discussion if you have suggestions or requests.

Notice

IBM Public Repository Disclosure: All content in this repository including code has been provided by IBM under the associated open source software license and IBM is under no obligation to provide enhancements, updates, or support. IBM developers produced this code as an open source project (not as an IBM product), and IBM makes no assertions as to the level of quality nor security, and will not be maintaining this code going forward.
