
Tribot JavaDoc Scraper

A Python-based web scraper that extracts JavaDoc content from the Tribot API documentation and converts it to well-formatted markdown for LLM consumption.

🎯 Overview

This project scrapes all 224 Tribot API classes and interfaces from the official JavaDoc documentation and provides them in two formats:

  • Individual files: One .md file per class/interface
  • Consolidated file: ALL_TRIBOT_DOCS.md with everything in one organized document

✨ Features

  • 🕷️ Complete Coverage: Scrapes all 224 Tribot API classes and interfaces
  • 📝 Clean Markdown: Converts HTML to well-formatted, LLM-friendly markdown
  • 🎯 Dual Output: Creates both individual files and a consolidated document
  • Rate Limiting: Respectful 1-second delay between requests
  • 📊 Progress Tracking: Visual progress bars and detailed error reporting
  • 🔍 Metadata Extraction: Class names, packages, descriptions, and URLs
  • 📋 Table of Contents: Clickable navigation in the consolidated file

📁 Repository Contents

Core Files

  • scraper.py - Main scraper class with all functionality
  • main.py - Command-line interface script
  • urls.py - Complete list of 224 Tribot API URLs
  • requirements.txt - Python dependencies

Documentation Output

  • scraped_docs/ - Directory containing all scraped documentation
    • ALL_TRIBOT_DOCS.md - Consolidated file with all 224 pages
    • Individual .md files for each class/interface
    • scraping_summary.json - Statistics and metadata

🚀 Quick Start

Installation

git clone https://github.com/Gimpy666/tribot-docs-scraper.git
cd tribot-docs-scraper
pip install -r requirements.txt

Usage

Scrape all Tribot API pages (default):

python main.py

Test with a single page:

python main.py --test

Custom options:

python main.py --output my_docs --delay 2.0

📊 Scraping Results

The scraper successfully processed 224/224 URLs with 0 failures:

  • 224 successful scrapes
  • 0 failed requests
  • 📄 224 individual markdown files
  • 📚 1 consolidated file (ALL_TRIBOT_DOCS.md)

📖 Output Format

Individual Files

Each class gets its own markdown file with:

  • Class/interface name as main heading
  • Source URL and package information
  • Description (if available)
  • Complete JavaDoc content in markdown format
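
The per-class page layout described above can be sketched as a small template function; a minimal stdlib sketch, where `build_markdown_page` and its arguments are illustrative names, not the actual functions in `scraper.py`:

```python
def build_markdown_page(name: str, url: str, package: str,
                        description: str, body_md: str) -> str:
    """Assemble one class/interface page: heading, metadata, body."""
    parts = [
        f"# {name}",
        "",
        f"**Source:** {url}",
        f"**Package:** `{package}`",
        "",
    ]
    if description:
        parts += [description, ""]
    parts.append(body_md)
    return "\n".join(parts)

# Hypothetical example values, not a real Tribot class URL:
page = build_markdown_page(
    name="Inventory",
    url="https://example.com/javadoc/Inventory.html",
    package="org.tribot.api2007",
    description="Utilities for interacting with the inventory.",
    body_md="## Methods\n\n...",
)
```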

Consolidated File

The ALL_TRIBOT_DOCS.md file includes:

  • Header with generation timestamp and page count
  • Table of Contents with clickable links to each class
  • Individual sections for each of the 224 classes
  • Clear separators and navigation between pages
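
Clickable TOC links rely on markdown heading anchors; one way they could be derived is GitHub's slug rule (lowercase, strip punctuation, spaces to hyphens). A sketch assuming that rule, with `github_anchor` as a hypothetical helper rather than code from the scraper:

```python
import re

def github_anchor(heading: str) -> str:
    """Approximate GitHub's heading-to-anchor slug rule."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)   # drop punctuation
    return re.sub(r"\s+", "-", slug)       # spaces become hyphens

toc_line = f"- [GameObject](#{github_anchor('GameObject')})"
```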

🛠️ Technical Details

Dependencies

  • requests - HTTP requests
  • beautifulsoup4 - HTML parsing
  • lxml - XML/HTML parser
  • markdownify - HTML to markdown conversion
  • tqdm - Progress bars

Rate Limiting

  • 1-second delay between requests (configurable via --delay)
  • Respectful to the server
  • Avoids triggering server-side throttling or bans
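
The pacing above amounts to a fixed sleep between fetches; a minimal stdlib sketch (`fetch_all` and its parameters are illustrative, not names from `scraper.py`):

```python
import time

def fetch_all(urls, fetch, delay=1.0):
    """Fetch each URL in order, pausing `delay` seconds between requests.

    `fetch` is any callable that takes a URL; the 1.0 default mirrors
    the scraper's documented --delay setting.
    """
    results = []
    for i, url in enumerate(urls):
        results.append(fetch(url))
        if i < len(urls) - 1:  # no need to sleep after the last request
            time.sleep(delay)
    return results
```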

Error Handling

  • Comprehensive error reporting
  • Continues processing even if some pages fail
  • Detailed summary of successes and failures
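
Continue-on-failure with a final summary can be sketched as a try/except loop; a stdlib sketch in the spirit of the behavior described above (`scrape_with_report` is a hypothetical helper):

```python
def scrape_with_report(urls, scrape_one):
    """Scrape every URL, collecting failures instead of aborting."""
    successes, failures = [], []
    for url in urls:
        try:
            successes.append((url, scrape_one(url)))
        except Exception as exc:
            failures.append((url, str(exc)))  # record and keep going
    print(f"{len(successes)} succeeded, {len(failures)} failed")
    return successes, failures
```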

📋 API Coverage

The scraper covers all major Tribot API categories:

  • Core Interfaces: Actionable, Clickable, Interactable, etc.
  • Query Classes: ActionableQuery, BankQuery, InventoryQuery, etc.
  • Game Objects: GameObject, Npc, Player, GroundItem, etc.
  • Input/Output: Mouse, Keyboard, Camera, Screenshot
  • Banking: Bank, BankQuery, BankSettings
  • Combat: Combat, Prayer, Magic
  • Walking: GlobalWalking, LocalWalking, DaxWalkerAdapter
  • UI Elements: Widget, WidgetQuery, GameTab
  • Utilities: Log, Waiting, Notifications, ScriptSettings

🤖 LLM Usage

The generated markdown files are optimized for LLM consumption:

  • Clean formatting with proper headings and code blocks
  • Structured content with consistent metadata
  • Complete coverage of all Tribot API functionality
  • Easy navigation with table of contents and clear sections

📈 Statistics

  • Total URLs: 224
  • Success Rate: 100%
  • Total Content: over 75,000 lines of markdown
  • Consolidated File Size: ~2.5MB
  • Scraping Time: ~4 minutes (the 1-second delay between 224 requests dominates)

🔧 Configuration

Command Line Options

  • --test: Run a test against a single URL only
  • --all: Scrape all Tribot API URLs (default)
  • --output DIR: Output directory (default: scraped_docs)
  • --delay SECONDS: Delay between requests (default: 1.0)
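
The options above map naturally onto `argparse`; a minimal sketch of how `main.py` could define them (an assumption about the implementation, not the actual source):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Parser matching the documented command-line options."""
    parser = argparse.ArgumentParser(description="Tribot JavaDoc scraper")
    parser.add_argument("--test", action="store_true",
                        help="run a test against a single URL only")
    parser.add_argument("--all", action="store_true", default=True,
                        help="scrape all Tribot API URLs (default)")
    parser.add_argument("--output", default="scraped_docs", metavar="DIR",
                        help="output directory")
    parser.add_argument("--delay", type=float, default=1.0, metavar="SECONDS",
                        help="delay between requests")
    return parser

args = build_parser().parse_args(["--output", "my_docs", "--delay", "2.0"])
```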

Customization

You can modify urls.py to add or remove URLs, or adjust the scraper behavior in scraper.py.
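
Since `urls.py` holds a plain list, customization is just list edits or comprehensions; a sketch assuming a `TRIBOT_API_URLS` list name and an illustrative URL, neither taken from the actual file:

```python
# urls.py holds a plain list of JavaDoc page URLs.
# The entry below is illustrative, not one of the real 224 URLs.
TRIBOT_API_URLS = [
    "https://example.com/javadoc/org/tribot/api2007/Inventory.html",
    # ...add or remove entries here...
]

# e.g. keep only Query classes for a focused run:
query_urls = [u for u in TRIBOT_API_URLS if "Query" in u]
```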

📝 License

This project is open source and available under the MIT License.

🤝 Contributing

Contributions are welcome! Feel free to:

  • Report issues
  • Suggest improvements
  • Submit pull requests
  • Add new features

📞 Support

If you encounter any issues or have questions, please open an issue on GitHub.


Generated on: 2025-09-11
Tribot API Version: 1.0.70
Total Classes Scraped: 224
