mmarinovic/bonkmap-scraper


BonkMap Scraper

AI-powered scraper that extracts race data from trail running websites using Claude and Playwright.

What it does

  • Navigates race websites autonomously
  • Extracts race info: name, distance, elevation, location
  • Downloads GPX files and parses them into segments
  • Captures aid station data with cutoff times
  • Takes screenshots of course profiles
  • Outputs structured JSON ready for import
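
As an illustration of the GPX step, here is a minimal sketch of how distance and elevation gain could be derived from parsed track points. The `TrackPoint` shape and the `trackStats` helper are assumptions made for this example, not the scraper's actual code, and how the scraper defines a "segment" is not specified here.

```typescript
// Hypothetical shape for a parsed GPX track point; the scraper's real types may differ.
interface TrackPoint {
  lat: number; // degrees
  lon: number; // degrees
  ele: number; // metres
}

interface TrackStats {
  distanceKm: number;
  elevationGainM: number;
}

const EARTH_RADIUS_KM = 6371;

// Great-circle distance between two points using the haversine formula.
function haversineKm(a: TrackPoint, b: TrackPoint): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Sum horizontal distance and positive elevation change over consecutive points.
function trackStats(points: TrackPoint[]): TrackStats {
  let distanceKm = 0;
  let elevationGainM = 0;
  for (let i = 1; i < points.length; i++) {
    distanceKm += haversineKm(points[i - 1], points[i]);
    const climb = points[i].ele - points[i - 1].ele;
    if (climb > 0) elevationGainM += climb;
  }
  return { distanceKm, elevationGainM };
}
```

Raw GPX elevation is often noisy, so a real implementation would likely smooth the elevation values before summing the gain.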

Setup

cd bonkmap-scraper
bun install
npx playwright install chromium

Usage

export ANTHROPIC_API_KEY=your_key
bun run start <race-url>

Examples

# UTMB Mont-Blanc races
bun run start https://montblanc.utmb.world/races/ccc
bun run start https://montblanc.utmb.world/races/utmb
bun run start https://montblanc.utmb.world/races/tds

# Other UTMB World Series
bun run start https://oman.utmb.world/races/100km
bun run start https://thailand.utmb.world/races/100

Output

Each run creates an output/YYYY-MM-DD/ folder:

output/2024-08-15/
├── ccc-utmb-2024.json    # Race data
├── course.gpx            # GPX file
├── main_page.png         # Screenshots
├── profile.png
└── map.png
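
The dated folder name can be built along these lines. This is a sketch only: `outputDirFor` is a hypothetical helper, and it assumes the date is taken in UTC, whereas the scraper may use local time.

```typescript
// Hypothetical helper: builds the dated output folder name for a run.
function outputDirFor(date: Date, root: string = "output"): string {
  // toISOString() is always in UTC, e.g. "2024-08-15T07:00:00.000Z",
  // so the first 10 characters are the YYYY-MM-DD date.
  const day = date.toISOString().slice(0, 10);
  return `${root}/${day}`;
}
```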

JSON Structure

{
  "success": true,
  "course": {
    "slug": "ccc-utmb-2024",
    "name": "CCC",
    "distanceKm": 101,
    "elevationGainM": 6100,
    "location": "Courmayeur to Chamonix",
    "country": "France/Italy/Switzerland",
    "raceStartTime": "09:00",
    "cutoffTime": "26:30",
    "hasGpx": true,
    "gpxFilename": "course.gpx",
    "aidStations": [
      {
        "km": 13.7,
        "name": "Refuge Bertone",
        "elevation": 2100,
        "type": "full-aid",
        "cutoffTime": "13:45"
      }
    ],
    "segments": [...],
    "tags": ["utmb", "100k", "alpine"]
  }
}
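
For importing, the structure above can be described with TypeScript types and checked with a small loader. The type names and the `parseScrapeResult` function are assumptions made for this sketch. The field names come from the example, but which fields are optional is a guess, and `segments` is left untyped because its contents are not shown.

```typescript
interface AidStation {
  km: number;
  name: string;
  elevation: number;
  type: string;
  cutoffTime?: string; // assumed optional
}

interface Course {
  slug: string;
  name: string;
  distanceKm: number;
  elevationGainM: number;
  location: string;
  country: string;
  raceStartTime: string;
  cutoffTime: string;
  hasGpx: boolean;
  gpxFilename?: string; // assumed absent when hasGpx is false
  aidStations: AidStation[];
  segments: unknown[]; // shape not shown in the example
  tags: string[];
}

interface ScrapeResult {
  success: boolean;
  course: Course;
}

// Parse a result file's contents and check a few key fields before importing.
function parseScrapeResult(json: string): ScrapeResult {
  const data = JSON.parse(json) as Partial<ScrapeResult> | null;
  const course = data?.course;
  if (typeof data?.success !== "boolean" || typeof course !== "object" || course === null) {
    throw new Error("Expected { success, course } at the top level");
  }
  if (typeof course.slug !== "string" || typeof course.distanceKm !== "number") {
    throw new Error("course.slug or course.distanceKm is missing or has the wrong type");
  }
  if (!Array.isArray(course.aidStations)) {
    throw new Error("course.aidStations must be an array");
  }
  return data as ScrapeResult;
}
```

This checks only a handful of fields. A schema validator would be a more thorough choice for a real import pipeline.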

How it works

  1. Launches a Chromium browser via Playwright in headed mode (the window stays visible for debugging)
  2. Claude agent uses tools to navigate and extract data:
    • navigate, click, scroll - Browser control
    • get_page_content, find_elements - Read page
    • download_gpx - Download and parse GPX files
    • screenshot - Capture page screenshots
    • extract_race_info, add_aid_station - Store data
    • dismiss_overlays - Remove popups blocking clicks
    • finish - Save results to JSON
  3. Blocks tracking scripts (HubSpot, OneTrust) that interfere with page interaction
  4. Outputs JSON + GPX + screenshots
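
The tool-calling step can be pictured roughly as follows. This is a simplified sketch, not the scraper's actual code: the `tool_use` and `tool_result` block shapes follow the Anthropic Messages API, while `makeHandlers`, `runTool`, and the in-memory `AgentState` are hypothetical stand-ins for handlers that would really drive Playwright.

```typescript
// A tool_use content block, as returned by the model.
interface ToolUse {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
}

// A tool_result content block, sent back to the model in the next message.
interface ToolResult {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

type ToolHandler = (input: Record<string, unknown>) => Promise<string>;

// Hypothetical in-memory state standing in for the browser and collected data.
interface AgentState {
  visited: string[];
  finished: boolean;
}

// Stand-in handlers for two of the tools listed above.
function makeHandlers(state: AgentState): Record<string, ToolHandler> {
  return {
    navigate: async (input) => {
      if (typeof input.url !== "string") throw new Error("navigate requires a url");
      state.visited.push(input.url); // a real handler would call page.goto(url)
      return `Navigated to ${input.url}`;
    },
    finish: async () => {
      state.finished = true; // a real handler would write the JSON file
      return "Results saved";
    },
  };
}

// Run one tool call and wrap the outcome as a tool_result block.
async function runTool(
  handlers: Record<string, ToolHandler>,
  call: ToolUse,
): Promise<ToolResult> {
  const handler = handlers[call.name];
  if (!handler) {
    return { type: "tool_result", tool_use_id: call.id, content: `Unknown tool: ${call.name}`, is_error: true };
  }
  try {
    return { type: "tool_result", tool_use_id: call.id, content: await handler(call.input) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { type: "tool_result", tool_use_id: call.id, content: message, is_error: true };
  }
}
```

Returning failures as `tool_result` blocks with `is_error` set, rather than throwing, lets the model see what went wrong and try a different action.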

Requirements

  • Node.js 18+ or Bun
  • Anthropic API key
  • Chrome/Chromium

License

MIT
