Granite.Debug Tools

Granite.Debug is a suite of self-service debugging tools for Large Language Models (LLMs). The tools help identify, evaluate, and resolve issues across fine-tuning workflows, benchmark analysis, and agent-based LLM interactions.

Available Tools

Selecting the Right Tool

If I need to...	Then I should use...
Design scaffolded tasks to diagnose which skill-level capability is missing	STaD
Benchmark LLM serving endpoints and local inference with an MCP-based tool	perfbench
Validate model behavior across inference engines (vLLM, llama.cpp, Ollama)	runtimes-validator

STaD - Scaffolded Task Design

STaD is a framework for generating scaffolded variations of multi-step reasoning tasks to enable systematic LLM debugging, evaluation, and training.

Use STaD when you need to design scaffolded tasks to diagnose which skill-level capability is missing in your model.
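
To make the idea of scaffolding concrete, here is a minimal sketch and not STaD's actual API. It assumes that a scaffolded variation is the original task with the results of the first k steps supplied as hints, so each variant tests only the remaining skills. The names Step and scaffolded_variants, the skill labels, and the sample task are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    skill: str   # hypothetical skill label for this step
    prompt: str  # short description of the intermediate step
    answer: str  # reference result of the intermediate step


def scaffolded_variants(question: str, steps: list[Step]) -> list[dict]:
    """Build one variant per scaffolding level k (0..len(steps)).

    At level k the results of the first k steps are appended to the
    question as hints, so only the skills of the remaining steps are tested.
    """
    variants = []
    for k in range(len(steps) + 1):
        hints = [f"{s.prompt} -> {s.answer}" for s in steps[:k]]
        variants.append(
            {
                "level": k,
                "prompt": "\n".join([question, *hints]),
                "given_skills": [s.skill for s in steps[:k]],
                "tested_skills": [s.skill for s in steps[k:]],
            }
        )
    return variants


# Illustrative task only; not taken from any STaD dataset.
steps = [
    Step("extract", "Items bought", "3 apples at 2 each"),
    Step("multiply", "Total cost", "6"),
    Step("subtract", "Change from 10", "4"),
]
variants = scaffolded_variants(
    "Ann buys 3 apples at 2 each and pays with 10. What is her change?", steps
)
```

Under this assumption, a model that fails at level k but succeeds at level k + 1 most likely lacks the skill of the step whose result was added between the two levels.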

perfbench - MCP server for Granite benchmarking

perfbench is an MCP server that manages LLM benchmark runs as asynchronous subprocesses, wrapping five benchmark runners (vLLM, AIPerf, GuideLLM, llama-bench, Ollama) behind a unified tool interface.

Use perfbench when you need to benchmark LLM serving endpoints or local inference and want an agent-driven workflow via the Model Context Protocol.
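
The pattern of running benchmarks as asynchronous subprocesses can be sketched with the Python standard library alone. This is an illustration of the pattern, not perfbench's implementation: JobManager and its methods are hypothetical names, and a trivial Python one-liner stands in for a real benchmark runner command.

```python
import asyncio
import sys
import uuid


class JobManager:
    """Tracks commands launched as asynchronous subprocesses (hypothetical helper)."""

    def __init__(self) -> None:
        self._jobs: dict[str, asyncio.subprocess.Process] = {}

    async def start(self, argv: list[str]) -> str:
        # Launch the command without waiting for it to finish.
        proc = await asyncio.create_subprocess_exec(
            *argv,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,
        )
        job_id = uuid.uuid4().hex[:8]
        self._jobs[job_id] = proc
        return job_id

    def status(self, job_id: str) -> str:
        # returncode stays None until the process has exited and been waited on.
        code = self._jobs[job_id].returncode
        if code is None:
            return "running"
        return "succeeded" if code == 0 else "failed"

    async def result(self, job_id: str) -> str:
        # Wait for the process to exit and return its combined output.
        out, _ = await self._jobs[job_id].communicate()
        return out.decode()


async def demo() -> tuple[str, str]:
    manager = JobManager()
    # Stand-in command; a real runner invocation would go here.
    job_id = await manager.start(
        [sys.executable, "-c", "print('benchmark finished')"]
    )
    output = await manager.result(job_id)
    return manager.status(job_id), output.strip()


status, output = asyncio.run(demo())
```

Because start returns a job identifier immediately, a caller such as an agent can check status or fetch the result later instead of blocking for the whole run.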

runtimes-validator

runtimes-validator is a unified validation framework for running model checks across inference engines (vLLM, llama.cpp, Ollama). Its CLI (runtimes-validator) runs automated validation tests against Granite models deployed on different backends. Two execution modes are supported: managed, where the framework starts and stops the engine, and external, where it connects to an engine that is already running.

Use runtimes-validator when you need to validate that a Granite model behaves correctly across different inference engines.
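
The difference between the managed and external modes can be sketched as a context manager. This is a hedged illustration and not the runtimes-validator code: engine, EngineHandle, the placeholder URL, and the sleeping Python process that stands in for an inference engine are all assumptions.

```python
import subprocess
import sys
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class EngineHandle:
    base_url: str
    process: Optional[subprocess.Popen] = None  # set only in managed mode


@contextmanager
def engine(
    mode: str, base_url: str, start_cmd: Optional[list[str]] = None
) -> Iterator[EngineHandle]:
    """Yield a handle to an engine, owning its lifecycle only in managed mode."""
    if mode == "external":
        # Someone else runs the engine; just point at it and never stop it.
        yield EngineHandle(base_url)
        return
    if mode != "managed":
        raise ValueError(f"unknown mode: {mode!r}")
    if not start_cmd:
        raise ValueError("managed mode needs a start command")
    proc = subprocess.Popen(start_cmd)
    try:
        yield EngineHandle(base_url, proc)
    finally:
        # Managed mode stops the engine even if the checks raise.
        proc.terminate()
        proc.wait(timeout=10)


# Stand-in for an engine start command; the URL is a placeholder.
fake_engine = [sys.executable, "-c", "import time; time.sleep(30)"]

with engine("managed", "http://localhost:8000", fake_engine) as handle:
    running_inside = handle.process.poll() is None
stopped_after = handle.process.poll() is not None

with engine("external", "http://localhost:8000") as external_handle:
    owns_process = external_handle.process is not None
```

Keeping the start and stop logic in one place means the validation checks themselves only need the base URL, regardless of which mode is in use.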

Coming Soon

Additional debugging tools are being prepared for open-source release. Stay tuned!

Contributing

We welcome contributions! If you'd like to contribute to any of the tools in this repository, please open an issue or submit a pull request.

🚧 Work in Progress

This repository is actively evolving. We are continuously adding new debugging tools, expanding coverage, and refining existing functionality based on community feedback and ongoing research. Check back regularly for updates, and feel free to open an issue or discussion if you have suggestions or requests.

Notice

IBM Public Repository Disclosure: All content in this repository including code has been provided by IBM under the associated open source software license and IBM is under no obligation to provide enhancements, updates, or support. IBM developers produced this code as an open source project (not as an IBM product), and IBM makes no assertions as to the level of quality nor security, and will not be maintaining this code going forward.
