These Python scripts take a YAML file of show links from CAGEMATCH.net, scrape each page, parse the information into a standard set of objects (see models/) and output each show as a JSON file in the specified directory.
```sh
uv run main.py shows.yaml out
```

```
$ uv run main.py --help
Usage: main.py [OPTIONS] FILENAME DESTINATION

  Parse path FILENAME, outputting YAML representations of the shows in folder
  DESTINATION, which will be created if it does not exist. Existing files will
  be overwritten if they clash.

Options:
  --loglevel [CRITICAL|FATAL|ERROR|WARN|WARNING|INFO|DEBUG|NOTSET]
                                  Set the Python logging level.  [default:
                                  WARNING]
  --help                          Show this message and exit.
```

The list of attended shows is sourced from a YAML file. My personal copy is at shows.yaml.
At its most basic, the file is just a top-level YAML list of show URLs:

```yaml
- https://www.cagematch.net/?id=1&nr=244194
- https://www.cagematch.net/?id=1&nr=241265
- https://www.cagematch.net/?id=1&nr=244175
```

To add a show to the list, add a new line starting with `-` followed by the URL of the show's CAGEMATCH page. The `-` syntax creates a list item in YAML.
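Every show URL in this README has the form `https://www.cagematch.net/?id=1&nr=<number>`. As a rough sketch, assuming the show's number is the `nr` query parameter (as it is in every example here), the number can be pulled out with the standard library. The `show_number` helper is hypothetical and not part of these scripts.

```python
from urllib.parse import parse_qs, urlparse


def show_number(url: str) -> int:
    """Return the show number from a CAGEMATCH show URL.

    Hypothetical helper: assumes the number is the `nr` query parameter,
    as in https://www.cagematch.net/?id=1&nr=244194.
    """
    query = parse_qs(urlparse(url).query)
    if "nr" not in query:
        raise ValueError(f"no show number in {url!r}")
    return int(query["nr"][0])


urls = [
    "https://www.cagematch.net/?id=1&nr=244194",
    "https://www.cagematch.net/?id=1&nr=241265",
]
print([show_number(u) for u in urls])  # [244194, 241265]
```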
You can also use comments to label links however you like. A comment starts with `#`, and everything after it on that line is ignored when the file is loaded. This makes it easy to keep track of which URL is which. For instance, you could name the show in a comment at the end of the line:
```yaml
- https://www.cagematch.net/?id=1&nr=425337 # NOAH Sunny Voyage 2025
- https://www.cagematch.net/?id=1&nr=417742 # AEW Dynasty 2025
#- https://www.cagematch.net/?id=1&nr=419740 This link will be ignored, for example
```

Alternatively, you could use whole-line comments:

```yaml
# NOAH Sunny Voyage 2025
- https://www.cagematch.net/?id=1&nr=242190
# AEW Dynasty 2025
- https://www.cagematch.net/?id=1&nr=417742
# This link will be ignored, for example:
#- https://www.cagematch.net/?id=1&nr=419740
```

Any other scheme works too. In my personal file, I've used whole-line comments to separate months and then added show names at the end of each line.
Not every entry has to be a plain URL, however. Special types of show can be created using keywords. The following keywords are supported: `taping`, `partial` and `squashmatch`.
Note: indentation matters in YAML. Follow the indentation patterns in the examples below, indenting each new level by 4 spaces.
For TV, CAGEMATCH tends to create a separate page for each show, even when several were taped in one go. To stick these back together, use the `taping` keyword: make `taping:` your top-level list item, then add an indented list of the show pages you want to combine:
```yaml
- taping:
    - https://www.cagematch.net/?id=1&nr=94859 # WWE Friday Night SmackDown #714
    - https://www.cagematch.net/?id=1&nr=94858 # WWE Main Event #30
```

You can combine as many shows as you like:

```yaml
- taping:
    - https://www.cagematch.net/?id=1&nr=129265
    - https://www.cagematch.net/?id=1&nr=129881
    - https://www.cagematch.net/?id=1&nr=130380
    - https://www.cagematch.net/?id=1&nr=127390
```

By default, the combined show takes the name of the first show listed with "Taping" added to the end, for example "WWE Friday Night SmackDown #714 Taping" in the first example. The other details of the first show listed (such as the promotion and date) are also used. The ID of the combined show is a list of the IDs of all the shows that were combined.
If you want to specify a custom name, you can do this using the following syntax:
```yaml
- taping:
    name: "AEW All In London 2024"
    urls:
        - https://www.cagematch.net/?id=1&nr=401410 # AEW All In London 2024 - Zero Hour
        - https://www.cagematch.net/?id=1&nr=374482 # AEW All In London 2024
```

The `urls` key holds the same list of show URLs you would otherwise have entered (now indented one more level), and the `name` key gives the name to use for the combined show instead of the default.
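The default naming rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: shows are plain dicts with hypothetical `name`, `promotion`, `date`, `id` and `matches` fields rather than the real classes in models/, and keeping the matches in the order the shows are listed is an assumption, not documented behaviour.

```python
from typing import Any, Optional

Show = dict[str, Any]  # hypothetical stand-in for the real models/ classes


def combine_taping(shows: list[Show], name: Optional[str] = None) -> Show:
    """Merge several show pages into one taping (illustrative sketch)."""
    if not shows:
        raise ValueError("a taping needs at least one show")
    first = shows[0]
    return {
        # Default name: the first show's name with " Taping" appended.
        "name": name if name is not None else f"{first['name']} Taping",
        # Other details (promotion, date) come from the first show listed.
        "promotion": first["promotion"],
        "date": first["date"],
        # The ID becomes a list of every combined show's ID.
        "id": [show["id"] for show in shows],
        # Assumption: matches are kept in the order the shows are listed.
        "matches": [m for show in shows for m in show["matches"]],
    }


taping = combine_taping([
    {"name": "WWE Friday Night SmackDown #714", "promotion": "WWE",
     "date": None, "id": 94859, "matches": ["m1", "m2"]},  # placeholder data
    {"name": "WWE Main Event #30", "promotion": "WWE",
     "date": None, "id": 94858, "matches": ["m3"]},
])
print(taping["name"])  # WWE Friday Night SmackDown #714 Taping
print(taping["id"])    # [94859, 94858]
```

Passing `name=` mirrors the `name` key: it simply replaces the generated default.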
Sometimes you won't have seen an entire show. In this case, you can stop wrestlers you didn't see from getting an appearance credit by listing the matches you missed with the `partial` keyword. For instance, to exclude the last two matches of this show:
```yaml
- partial:
    url: https://www.cagematch.net/?id=1&nr=239089
    exclude: [4,5]
```

The `url` key holds the show URL as you would have entered it, and the `exclude` key lists the matches you missed. This is a comma-separated list in which the first match is match 1, regardless of whether it's a dark match. The numbers don't need to be in order or contiguous.
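The numbering rule for `exclude` can be illustrated with a small filter. This is a sketch with a hypothetical `drop_excluded` helper, not the scripts' own code: it counts matches from 1 in page order and treats the exclusion list as a set, so order and gaps don't matter.

```python
def drop_excluded(matches: list[str], exclude: list[int]) -> list[str]:
    """Keep matches whose 1-based position is not in `exclude`.

    Hypothetical helper: position 1 is the first match on the page,
    dark match or not. `exclude` may be unsorted or have gaps.
    """
    skip = set(exclude)
    return [m for pos, m in enumerate(matches, start=1) if pos not in skip]


card = ["match 1", "match 2", "match 3", "match 4", "match 5"]
print(drop_excluded(card, [5, 4]))  # ['match 1', 'match 2', 'match 3']
```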
If you missed almost all of the show, you may not want to count it in your overall totals. In this case, use the `exclude_from_count` key:

```yaml
- partial:
    url: https://www.cagematch.net/?id=1&nr=236880
    exclude: [2,3,5]
    exclude_from_count: True
```

CAGEMATCH will sometimes list what you might consider a single match as several separate matches. One example I found was a gauntlet match: including this show as normal would add 50 matches to Manami Toyota's count. If you would rather have this treated as one match, you can do the following:
```yaml
- squashmatch:
    url: https://www.cagematch.net/?id=1&nr=186905 # Manami Toyota ~ Retirement To The Universe
    squash: [2-51]
```

The `url` key holds the show URL as you would have entered it, and the `squash` key gives the range of matches to combine, written as `start-end`. Multiple ranges can be provided as a comma-separated list. No checking is done that these ranges are valid or make sense.
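In YAML, `[2-51]` loads as a list containing the string `"2-51"`. Here is a minimal sketch of how such ranges could be applied, assuming inclusive, 1-based bounds. The `parse_range` and `squash_matches` helpers are hypothetical, not the scripts' implementation, and like the tool they do no sanity checking beyond parsing the numbers.

```python
def parse_range(spec: str) -> range:
    """Turn a 'start-end' string into an inclusive range of positions."""
    start, end = (int(part) for part in spec.split("-"))
    return range(start, end + 1)


def squash_matches(matches: list[str], squash: list[str]) -> list[list[str]]:
    """Group matches so that each squash range becomes one combined entry."""
    group_of: dict[int, int] = {}  # 1-based position -> start of its range
    for spec in squash:
        positions = parse_range(spec)
        for pos in positions:
            group_of[pos] = positions.start

    result: list[list[str]] = []
    groups: dict[int, list[str]] = {}  # range start -> its combined entry
    for pos, match in enumerate(matches, start=1):
        start = group_of.get(pos)
        if start is None:
            result.append([match])  # untouched match stays on its own
        elif start in groups:
            groups[start].append(match)  # fold into the existing entry
        else:
            groups[start] = [match]  # first match of a range opens the entry
            result.append(groups[start])
    return result


card = ["opener", "gauntlet 1", "gauntlet 2", "gauntlet 3", "main event"]
print(len(squash_matches(card, ["2-4"])))  # 3
```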
Keywords can be used anywhere a show URL would otherwise go, so you can combine them. For example, you could combine part of one show with the whole of another:
```yaml
- taping:
    - partial:
        url: https://www.cagematch.net/?id=1&nr=242611
        exclude: [1]
    - https://www.cagematch.net/?id=1&nr=242612
```

```yaml
- taping:
    name: "WWE Friday Night SmackDown #556 Taping"
    urls:
        - https://www.cagematch.net/?id=1&nr=50217 # WWE NXT #1.08
        - partial:
            url: https://www.cagematch.net/?id=1&nr=50185 # WWE Superstars #53
            exclude: [2,3] # only one match
        - https://www.cagematch.net/?id=1&nr=50226 # WWE Friday Night SmackDown #556
```

The system is flexible enough that you could deeply nest keywords if you wanted:
```yaml
- taping:
    name: "WrestleMania Weekend"
    urls:
        - squashmatch:
            url:
                taping:
                    name: "WWE World at WrestleMania 41"
                    urls:
                        - https://www.cagematch.net/?id=1&nr=424399
                        - https://www.cagematch.net/?id=1&nr=424400
                        - https://www.cagematch.net/?id=1&nr=424385
                        - partial:
                            url: https://www.cagematch.net/?id=1&nr=424401
                            exclude: [2]
            squash: [1-15]
        - https://www.cagematch.net/?id=1&nr=394375
        - taping:
            - https://www.cagematch.net/?id=1&nr=418372
            - https://www.cagematch.net/?id=1&nr=423692
        - https://www.cagematch.net/?id=1&nr=394376
```

For more examples of these keywords, see my personal shows.yaml in the repo.
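Because any keyword entry can stand wherever a URL would, a show list is effectively a tree. As an illustrative sketch (not the scripts' actual parser), here is a recursive walk over entries as a YAML loader would return them, as strings, lists and dicts with comments already stripped, that collects every show URL an entry refers to. The `show_urls` name is hypothetical.

```python
from typing import Any


def show_urls(entry: Any) -> list[str]:
    """Collect every show URL referenced by one list entry, recursively."""
    if isinstance(entry, str):  # a plain show URL
        return [entry]
    if isinstance(entry, list):  # e.g. the short form of `taping`
        return [url for item in entry for url in show_urls(item)]
    if isinstance(entry, dict):
        if "taping" in entry:
            body = entry["taping"]
            # Long form has `name` and `urls`; short form is a bare list.
            return show_urls(body["urls"] if isinstance(body, dict) else body)
        if "partial" in entry:
            return show_urls(entry["partial"]["url"])
        if "squashmatch" in entry:
            # `url` may itself be another keyword entry, as in the nested example.
            return show_urls(entry["squashmatch"]["url"])
    raise ValueError(f"unrecognised entry: {entry!r}")


# The first taping example, as a YAML loader would return it.
entry = {"taping": [
    "https://www.cagematch.net/?id=1&nr=94859",
    "https://www.cagematch.net/?id=1&nr=94858",
]}
print(len(show_urls(entry)))  # 2
```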
To avoid making too many requests, CAGEMATCH pages are cached locally when retrieved. As it loads each page, the tool prints whether it was downloaded fresh or retrieved from the cache. If the information on a page has changed since it was first retrieved, deleting the cache database (cagematch_cache.sqlite) will cause all pages to be reloaded on the next run.
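The caching idea can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not how the tool is actually implemented: the `pages` table layout, the `cached_get` and `fake_fetch` names and the printed messages are all hypothetical, and `fetch` stands in for whatever really downloads a page.

```python
import sqlite3
from typing import Callable


def cached_get(conn: sqlite3.Connection, url: str,
               fetch: Callable[[str], str]) -> str:
    """Return the page for `url`, downloading it only on a cache miss."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)"
    )
    row = conn.execute("SELECT body FROM pages WHERE url = ?", (url,)).fetchone()
    if row is not None:
        print(f"cache: {url}")  # message wording is illustrative
        return row[0]
    body = fetch(url)
    conn.execute("INSERT INTO pages (url, body) VALUES (?, ?)", (url, body))
    conn.commit()
    print(f"downloaded: {url}")
    return body


# In-memory database for the demo; the tool's file is cagematch_cache.sqlite.
conn = sqlite3.connect(":memory:")
calls: list[str] = []


def fake_fetch(url: str) -> str:
    """Stands in for a real HTTP request and records each call."""
    calls.append(url)
    return "<html>...</html>"


url = "https://www.cagematch.net/?id=1&nr=244194"
cached_get(conn, url, fake_fetch)  # cache miss: fetches and stores the page
cached_get(conn, url, fake_fetch)  # cache hit: no second fetch
print(len(calls))  # 1
```

In this sketch, deleting the database file empties the table, so every page is fetched again on the next run.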