AI-powered scraper that extracts race data from trail running websites using Claude and Playwright.
- Navigates race websites autonomously
- Extracts race info: name, distance, elevation, location
- Downloads GPX files and parses them into segments
- Captures aid station data with cutoff times
- Takes screenshots of course profiles
- Outputs structured JSON ready for import
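The GPX step can be sketched as follows. This is a minimal, dependency-free illustration and not the project's parser. It assumes trackpoints of the form `<trkpt lat=".." lon=".."><ele>..</ele></trkpt>` with `lat` before `lon`, measures distance with the haversine formula, and splits the track into fixed-length segments. The names `parseTrackPoints` and `splitIntoSegments` and the 5 km default are hypothetical.

```typescript
// Hypothetical sketch: parse GPX trackpoints and split the course into segments.
interface TrackPoint {
  lat: number;
  lon: number;
  ele: number; // metres; 0 if the point has no <ele>
}

interface Segment {
  startKm: number;
  endKm: number;
  gainM: number; // total positive elevation change in the segment
  lossM: number; // total negative elevation change, as a positive number
}

// Extract <trkpt> elements with a regex (assumes lat appears before lon).
function parseTrackPoints(gpx: string): TrackPoint[] {
  const points: TrackPoint[] = [];
  const re = /<trkpt\s+lat="([-\d.]+)"\s+lon="([-\d.]+)"\s*>([\s\S]*?)<\/trkpt>/g;
  for (const m of gpx.matchAll(re)) {
    const ele = /<ele>([-\d.]+)<\/ele>/.exec(m[3]);
    points.push({ lat: Number(m[1]), lon: Number(m[2]), ele: ele ? Number(ele[1]) : 0 });
  }
  return points;
}

const EARTH_RADIUS_KM = 6371;

// Great-circle distance between two points (haversine formula).
function haversineKm(a: TrackPoint, b: TrackPoint): number {
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Walk the track, closing a segment each time it reaches `segmentKm`.
function splitIntoSegments(points: TrackPoint[], segmentKm = 5): Segment[] {
  const segments: Segment[] = [];
  if (points.length < 2) return segments;
  let current: Segment = { startKm: 0, endKm: 0, gainM: 0, lossM: 0 };
  let distKm = 0;
  for (let i = 1; i < points.length; i++) {
    distKm += haversineKm(points[i - 1], points[i]);
    const dEle = points[i].ele - points[i - 1].ele;
    if (dEle > 0) current.gainM += dEle;
    else current.lossM -= dEle;
    current.endKm = distKm;
    if (distKm - current.startKm >= segmentKm) {
      segments.push(current);
      current = { startKm: distKm, endKm: distKm, gainM: 0, lossM: 0 };
    }
  }
  // Keep a trailing partial segment if it covers any distance.
  if (current.endKm > current.startKm) segments.push(current);
  return segments;
}
```

A real GPX file may contain several `<trkseg>` elements, self-closing points, or attributes in a different order; an XML parser would handle those cases more robustly than the regex used here.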
```bash
cd bonkmap-scraper
bun install
npx playwright install chromium
```

```bash
export ANTHROPIC_API_KEY=your_key
bun run start <race-url>
```

```bash
# UTMB Mont-Blanc races
bun run start https://montblanc.utmb.world/races/ccc
bun run start https://montblanc.utmb.world/races/utmb
bun run start https://montblanc.utmb.world/races/tds

# Other UTMB World Series
bun run start https://oman.utmb.world/races/100km
bun run start https://thailand.utmb.world/races/100
```

Creates an `output/YYYY-MM-DD/` folder:
```
output/2024-08-15/
├── ccc-utmb-2024.json  # Race data
├── course.gpx          # GPX file
├── main_page.png       # Screenshots
├── profile.png
└── map.png
```
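The folder and file names in that layout could be derived along these lines. This is a hypothetical sketch: the helper names, the use of the local (not UTC) date, and `slugify` are assumptions. The only parts taken from the example are the `output/YYYY-MM-DD/` pattern, `course.gpx`, and the race JSON appearing to be named after the course `slug`.

```typescript
// Hypothetical helpers for the output/YYYY-MM-DD/ layout.

// Format a date as YYYY-MM-DD using local time (assumption: local, not UTC).
function dateFolder(date: Date): string {
  const y = date.getFullYear();
  const m = String(date.getMonth() + 1).padStart(2, "0");
  const d = String(date.getDate()).padStart(2, "0");
  return `${y}-${m}-${d}`;
}

// Turn a race name into a file-safe slug, e.g. "CCC UTMB 2024" -> "ccc-utmb-2024".
function slugify(text: string): string {
  return text
    .toLowerCase()
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // strip accents left as combining marks
    .replace(/[^a-z0-9]+/g, "-") // collapse everything else into single dashes
    .replace(/^-+|-+$/g, ""); // trim leading/trailing dashes
}

// Paths for the files written by one run.
function outputPaths(date: Date, slug: string) {
  const dir = `output/${dateFolder(date)}`;
  return {
    dir,
    raceJson: `${dir}/${slug}.json`,
    gpx: `${dir}/course.gpx`,
  };
}
```

Before writing, the directory could be created with Node's `mkdirSync(paths.dir, { recursive: true })` from `node:fs`, which also works under Bun.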
Example race data (`ccc-utmb-2024.json`):

```json
{
  "success": true,
  "course": {
    "slug": "ccc-utmb-2024",
    "name": "CCC",
    "distanceKm": 101,
    "elevationGainM": 6100,
    "location": "Courmayeur to Chamonix",
    "country": "France/Italy/Switzerland",
    "raceStartTime": "09:00",
    "cutoffTime": "26:30",
    "hasGpx": true,
    "gpxFilename": "course.gpx",
    "aidStations": [
      {
        "km": 13.7,
        "name": "Refuge Bertone",
        "elevation": 2100,
        "type": "full-aid",
        "cutoffTime": "13:45"
      }
    ],
    "segments": [...],
    "tags": ["utmb", "100k", "alpine"]
  }
}
```

How it works:

- Launches Chrome via Playwright (visible for debugging)
- Claude agent uses tools to navigate and extract data:
  - `navigate`, `click`, `scroll`: browser control
  - `get_page_content`, `find_elements`: read the page
  - `download_gpx`: download and parse GPX files
  - `screenshot`: capture page screenshots
  - `extract_race_info`, `add_aid_station`: store data
  - `dismiss_overlays`: remove popups that block clicks
  - `finish`: save results to JSON
- Blocks tracking scripts (HubSpot, OneTrust) that interfere
- Outputs JSON + GPX + screenshots
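The agent's tool handling could look roughly like the sketch below. It is hedged and not the project's code: the tool definitions follow the `name` / `description` / `input_schema` shape of the Anthropic Messages API tool-use format, but the handler bodies, the `RaceState` type, and the `dispatch` helper are hypothetical. Only `add_aid_station` and `finish` are shown; real browser tools would call Playwright.

```typescript
// Hypothetical sketch of tool definitions plus a dispatcher for tool_use blocks.

interface ToolDefinition {
  name: string;
  description: string;
  input_schema: { type: "object"; properties: Record<string, unknown>; required?: string[] };
}

interface AidStation {
  km: number;
  name: string;
  elevation?: number;
  type?: string;
  cutoffTime?: string; // "HH:MM", as in the sample JSON
}

interface RaceState {
  aidStations: AidStation[];
  finished: boolean;
}

// Definitions that would be sent to Claude in the request's `tools` array.
const tools: ToolDefinition[] = [
  {
    name: "add_aid_station",
    description: "Record one aid station found on the page.",
    input_schema: {
      type: "object",
      properties: {
        km: { type: "number" },
        name: { type: "string" },
        elevation: { type: "number" },
        type: { type: "string" },
        cutoffTime: { type: "string", description: "HH:MM" },
      },
      required: ["km", "name"],
    },
  },
  {
    name: "finish",
    description: "Stop browsing and save the collected data.",
    input_schema: { type: "object", properties: {} },
  },
];

type Handler = (input: Record<string, unknown>, state: RaceState) => string;

// Each handler returns a short text result to send back to Claude.
const handlers: Record<string, Handler> = {
  add_aid_station: (input, state) => {
    const km = Number(input.km);
    const name = typeof input.name === "string" ? input.name : "";
    if (!Number.isFinite(km) || name === "") return "Error: km and name are required";
    state.aidStations.push({
      km,
      name,
      elevation: typeof input.elevation === "number" ? input.elevation : undefined,
      type: typeof input.type === "string" ? input.type : undefined,
      cutoffTime: typeof input.cutoffTime === "string" ? input.cutoffTime : undefined,
    });
    return `Stored ${name} at km ${km}`;
  },
  finish: (_input, state) => {
    state.finished = true;
    // Keep stations in course order before saving.
    state.aidStations.sort((a, b) => a.km - b.km);
    return `Finished with ${state.aidStations.length} aid stations`;
  },
};

// Route one tool_use block ({ name, input }) to its handler.
function dispatch(name: string, input: Record<string, unknown>, state: RaceState): string {
  const handler = handlers[name];
  if (!handler) return `Unknown tool: ${name}`;
  return handler(input, state);
}
```

In a full loop, each `tool_use` block in Claude's response would be passed to `dispatch`, and the returned string sent back as a `tool_result` block in the next user message, repeating until `finish` is called.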
Requirements:

- Node.js 18+ or Bun
- Anthropic API key
- Chrome/Chromium
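A startup check for these requirements might look like the following hypothetical helper; it is not part of the documented CLI. It assumes the key is read from `ANTHROPIC_API_KEY` (as in the setup commands) and treats a Bun runtime as sufficient on its own, otherwise requiring Node.js major version 18 or later. Chromium is not checked here, since `npx playwright install chromium` covers it.

```typescript
// Hypothetical preflight check for the requirements above.
// Returns a list of problems; an empty list means the environment looks usable.
function checkRequirements(
  env: Record<string, string | undefined>,
  versions: { node?: string; bun?: string },
): string[] {
  const problems: string[] = [];
  if (!env.ANTHROPIC_API_KEY) {
    problems.push("ANTHROPIC_API_KEY is not set");
  }
  // Bun satisfies the runtime requirement; otherwise require Node.js 18+.
  if (!versions.bun) {
    const major = Number((versions.node ?? "0").split(".")[0]);
    if (!(major >= 18)) {
      problems.push(`Node.js 18+ required, found ${versions.node ?? "none"}`);
    }
  }
  return problems;
}
```

At startup it could be called as `checkRequirements(process.env, process.versions)`, printing any problems and exiting before the browser is launched.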
License: MIT