
dfrancour/mongodb-explain-plan-generator


MongoDB Explain Plan Generator

Generates MongoDB explain plan JSON fixtures using disposable Podman containers. Two generators serve different purposes:

| Generator | Data Source | Queries | Fixtures | Use Case |
|---|---|---|---|---|
| Example Plans | 101 synthetic Airbnb docs | 16 curated | 48 | Example fixtures for mongodb-paste-the-plan |
| Validation Plans | Official MongoDB sample datasets (~400k docs) | 72 across 6 databases | ~216 per version | Parser validation with realistic execution stats |
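The fixture counts in the table work out to three fixtures per query (16 × 3 = 48, 72 × 3 = 216). One plausible reading, which this README does not state, is that each query is explained once per MongoDB explain verbosity mode. A minimal sketch of expanding queries into explain jobs under that assumption (the query names are placeholders):

```python
from itertools import product

# MongoDB's three explain verbosity modes. Assumption: each query is explained
# once per mode, which would account for three fixtures per query.
VERBOSITY_MODES = ("queryPlanner", "executionStats", "allPlansExecution")


def explain_jobs(query_names):
    """Return (query_name, verbosity) pairs, one per fixture to generate."""
    return list(product(query_names, VERBOSITY_MODES))


# Placeholder query names, only used to check the arithmetic.
example_jobs = explain_jobs([f"example_{i}" for i in range(16)])
validation_jobs = explain_jobs([f"validation_{i}" for i in range(72)])
print(len(example_jobs), len(validation_jobs))
```

The validation count in the table is marked approximate, so this only illustrates the per-query multiplier, not an exact per-version total.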

Example Plans

A curated set of 16 queries (48 fixtures) across basic, performance, and complex categories. It uses a small synthetic Airbnb dataset, so it is self-contained and needs no external downloads.

```bash
./src/example-plans/run.sh
```

Output: src/example-plans/output/ with basic/, performance/, and complex/ subdirectories.
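Writing a fixture into its category subdirectory is mostly path handling plus a JSON fallback for types `json` cannot encode, which is the job output_writer.py is described as doing for BSON types. The sketch below is a hypothetical stand-in: the `<name>.json` naming is assumed, and it handles stdlib types (datetime, Decimal, bytes) instead of real BSON types such as ObjectId or Decimal128, which would need the `bson` package.

```python
import json
import tempfile
from datetime import datetime, timezone
from decimal import Decimal
from pathlib import Path

CATEGORIES = {"basic", "performance", "complex"}


def to_jsonable(value):
    """json.dumps fallback for values it cannot encode natively.

    Stand-in for real BSON handling: datetimes become ISO 8601 strings,
    Decimals become strings, bytes become hex.
    """
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, Decimal):
        return str(value)
    if isinstance(value, bytes):
        return value.hex()
    raise TypeError(f"cannot serialize {type(value).__name__}")


def write_fixture(output_dir, category, name, plan):
    """Write one explain plan to <output_dir>/<category>/<name>.json (hypothetical layout)."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    target = Path(output_dir) / category / f"{name}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(plan, indent=2, default=to_jsonable))
    return target


# Example: write a tiny plan into a throwaway directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    plan = {
        "queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}},
        "capturedAt": datetime(2024, 1, 1, tzinfo=timezone.utc),
    }
    written = write_fixture(tmp, "basic", "find_all_listings", plan)
    relative = written.relative_to(tmp).as_posix()
    loaded = json.loads(written.read_text())
```

Raising `TypeError` for unknown types keeps unexpected values from being silently stringified into a fixture.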

Validation Plans

Comprehensive test fixture generator targeting maximum parser coverage. Downloads the official MongoDB sample datasets (~350 MB, cached after first download) and runs queries against real data across multiple databases.

Data Sources

| Database | Key Collections | Queries |
|---|---|---|
| sample_mflix | movies (~23k), comments (~50k), theaters, users | Primary dataset: find, $lookup, text search, geo, $setWindowFields |
| sample_training | companies, routes (~67k), zips (~30k), grades | $graphLookup, $elemMatch, geo, $or with SUBPLAN |
| sample_analytics | accounts, customers, transactions | $unwind (transactions array), $lookup |
| sample_supplies | sales (~5k) | $bucket, $facet |
| sample_geospatial | shipwrecks | GeoJSON queries, sharded geo |
| sample_weatherdata | data | $fill (linear interpolation) |
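The format of the files in queries/ is not documented here. As a hypothetical illustration, a validation query could be declared as a small record naming its database, collection, and command; the `QuerySpec` class, query names, and field names (`src_airport`, `purchaseMethod`, and so on) are illustrative and should be checked against the sample datasets.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class QuerySpec:
    """Hypothetical shape of one validation query definition."""
    name: str
    category: str    # find, aggregate, geo, text, write or edge-cases
    database: str
    collection: str
    command: dict    # the database command to wrap in explain


def targets_own_collection(spec):
    """A command's first key names the command; its value is the target collection."""
    return next(iter(spec.command.values())) == spec.collection


QUERIES = [
    QuerySpec("movies_recent_dramas", "find", "sample_mflix", "movies", {
        "find": "movies",
        "filter": {"year": {"$gte": 2000}, "genres": "Drama"},
        "sort": {"year": -1},
    }),
    QuerySpec("routes_connections", "aggregate", "sample_training", "routes", {
        "aggregate": "routes",
        "pipeline": [
            {"$match": {"src_airport": "JFK"}},
            {"$graphLookup": {
                "from": "routes",
                "startWith": "$dst_airport",
                "connectFromField": "dst_airport",
                "connectToField": "src_airport",
                "as": "connections",
                "maxDepth": 0,
            }},
        ],
        "cursor": {},
    }),
    QuerySpec("sales_facets", "aggregate", "sample_supplies", "sales", {
        "aggregate": "sales",
        "pipeline": [{"$facet": {
            "byMethod": [{"$sortByCount": "$purchaseMethod"}],
            "byStore": [{"$sortByCount": "$storeLocation"}],
        }}],
        "cursor": {},
    }),
]

# Per-database tally, the kind of summary that backs "72 across 6 databases".
per_database = Counter(spec.database for spec in QUERIES)
```

Keeping the full command document in each record means the explain step only has to wrap it, without per-query special cases.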

A supplemental logs collection (clustered index) is created in sample_training for CLUSTERED_IXSCAN coverage.
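A clustered collection is created with the `create` command and a `clusteredIndex` option, which MongoDB (5.3 and later) requires to be keyed on `{_id: 1}` with `unique: true`. The sketch below builds that command, some hypothetical timestamp-keyed log documents, and a bounded `_id` range query, the shape of filter a clustered index can serve. The document fields are assumptions, not the repository's actual `logs` data.

```python
from datetime import datetime, timedelta, timezone


def clustered_create_command(name):
    """create command for a collection clustered on _id ({_id: 1}, unique: true)."""
    return {"create": name, "clusteredIndex": {"key": {"_id": 1}, "unique": True}}


def synthetic_log_docs(count, start):
    """Hypothetical log documents whose _id is a timestamp, so the cluster key is time-ordered."""
    return [
        {
            "_id": start + timedelta(minutes=i),
            "level": "ERROR" if i % 10 == 0 else "INFO",
            "message": f"event {i}",
        }
        for i in range(count)
    ]


def cluster_key_range(name, low, high):
    """find command with a bounded _id range on the cluster key."""
    return {"find": name, "filter": {"_id": {"$gte": low, "$lt": high}}}


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
create_cmd = clustered_create_command("logs")
docs = synthetic_log_docs(100, start)
range_cmd = cluster_key_range("logs", docs[10]["_id"], docs[20]["_id"])
```

With pymongo, `create_cmd` could be sent through `Database.command` before inserting the documents; whether a given version reports CLUSTERED_IXSCAN for the range query is something the generated explain output itself would confirm.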

Usage

```bash
# Full run: all versions + sharded cluster
./src/validation-plans/run.sh

# Quick iteration: single version, no sharding
./src/validation-plans/run.sh --versions 8.0 --no-sharded

# See all options
./src/validation-plans/run.sh --help
```
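The two flags shown above map naturally onto a list of (version, topology) runs. A minimal argparse sketch, assuming `--versions` takes space-separated values and that omitting it means "all configured versions"; the real `--help` may list more options, and the version list passed to `plan_runs` is a placeholder since this README does not enumerate the versions.

```python
import argparse


def parse_args(argv=None):
    """Sketch of the two flags shown above; the real --help may list more."""
    parser = argparse.ArgumentParser(description="Generate validation explain plan fixtures")
    parser.add_argument(
        "--versions", nargs="+", metavar="VERSION",
        help="MongoDB versions to run (assumed space-separated); default: all configured versions",
    )
    parser.add_argument(
        "--no-sharded", dest="sharded", action="store_false",
        help="skip the sharded cluster run",
    )
    return parser.parse_args(argv)


def plan_runs(args, available_versions):
    """Expand parsed flags into (version, topology) pairs matching the output directory names."""
    versions = args.versions or list(available_versions)
    topologies = ["single-node"] + (["sharded"] if args.sharded else [])
    return [(version, topology) for version in versions for topology in topologies]
```

For example, `--versions 8.0 --no-sharded` collapses to a single `("8.0", "single-node")` run, which is what makes it the quick-iteration setting.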

Output: src/validation-plans/output/ with {version}/single-node/ and {version}/sharded/ subdirectories, plus a manifest.json coverage report.
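A stage coverage report needs the set of stage names each plan exercises. The sketch below is not the logic in coverage.py (which is not shown here); it assumes stages appear as `stage` fields anywhere in the plan tree (covering `inputStage`/`inputStages` chains, `rejectedPlans`, `executionStages`, and per-shard plans) and, for aggregations, as `$`-prefixed entries in a `stages` array. The manifest shape is hypothetical.

```python
from collections import Counter


def collect_stages(node, found=None):
    """Recursively gather stage names (COLLSCAN, IXSCAN, $group, ...) from an explain document."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        stage = node.get("stage")
        if isinstance(stage, str):
            found.add(stage)
        for key, value in node.items():
            # Aggregate explains list pipeline stages as {"$name": {...}} entries.
            if key == "stages" and isinstance(value, list):
                for item in value:
                    if isinstance(item, dict):
                        found.update(k for k in item if k.startswith("$"))
            collect_stages(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_stages(item, found)
    return found


def coverage_manifest(plans):
    """Count how many fixtures exercise each stage (hypothetical manifest shape)."""
    counts = Counter()
    for plan in plans:
        counts.update(collect_stages(plan))
    return {"stages": dict(sorted(counts.items()))}


find_plan = {"queryPlanner": {
    "winningPlan": {"stage": "FETCH", "inputStage": {"stage": "IXSCAN", "indexName": "year_1"}},
    "rejectedPlans": [{"stage": "COLLSCAN"}],
}}
agg_plan = {"stages": [
    {"$cursor": {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}},
    {"$group": {"_id": "$year"}},
]}
manifest = coverage_manifest([find_plan, agg_plan])
```

Walking the whole tree instead of following only `winningPlan` trades precision for robustness: plan layouts differ across versions and topologies, and a stage seen anywhere still counts as covered.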

Query Categories

  • find — IXSCAN, COLLSCAN, IDHACK, projections (covered/simple/default), sorts, $or/SUBPLAN, multi-plan, arrays, regex, $exists
  • aggregate — $group, $lookup, $graphLookup, $unwind, $facet, $bucket, $setWindowFields, $densify, $fill, $merge, $documents, $unionWith
  • geo — $near (2dsphere), $geoWithin, $geoIntersects, $geoNear pipeline
  • text — basic search, phrase, negation, with filter, with score sort
  • write — update (single/multi), delete (single/multi, BATCHED_DELETE)
  • edge-cases — count, distinct (DISTINCT_SCAN), hints, sparse/partial indexes, clustered collections
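All of these categories reduce to a handful of database commands wrapped in an `explain` command. The sketch below uses MongoDB's documented command shapes (`find`, `aggregate`, `count`, `distinct`, `update`, `delete`); the function itself and its `spec` keys are hypothetical and not the interface of explain_runner.py.

```python
VERBOSITY_MODES = ("queryPlanner", "executionStats", "allPlansExecution")


def explain_command(kind, collection, spec, verbosity="executionStats"):
    """Wrap one operation in an explain command document.

    Geo and text queries are ordinary find filters or aggregate stages.
    """
    if verbosity not in VERBOSITY_MODES:
        raise ValueError(f"unknown verbosity: {verbosity}")
    if kind == "find":
        inner = {"find": collection, **spec}  # spec: filter, projection, sort, hint, ...
    elif kind == "aggregate":
        inner = {"aggregate": collection, "pipeline": spec["pipeline"], "cursor": {}}
    elif kind == "count":
        inner = {"count": collection, "query": spec.get("query", {})}
    elif kind == "distinct":
        inner = {"distinct": collection, "key": spec["key"], "query": spec.get("query", {})}
    elif kind == "update":
        inner = {"update": collection, "updates": [
            {"q": spec["filter"], "u": spec["update"], "multi": spec.get("multi", False)}]}
    elif kind == "delete":
        # limit 0 deletes every match; limit 1 deletes a single document.
        inner = {"delete": collection, "deletes": [
            {"q": spec["filter"], "limit": 0 if spec.get("multi") else 1}]}
    else:
        raise ValueError(f"unsupported kind: {kind}")
    # The command name must be the first key, so "explain" comes first.
    return {"explain": inner, "verbosity": verbosity}
```

With pymongo, such a document could be sent with `client[database].command(cmd)`. Explaining update and delete commands reports the plan without modifying any documents, which keeps the write category safe to rerun.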

How Data Loading Works

  1. The official sample archive is downloaded once from atlas-education.s3.amazonaws.com and cached in src/.cache/
  2. mongorestore loads the archive into the container (it tries the host binary first and falls back to running mongorestore inside the container)
  3. Indexes are created across all databases for the queries
  4. The logs clustered collection is created synthetically (no official dataset uses clustered indexes)
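Steps 1 and 2 can be sketched as a download-once cache plus a choice between the host and container mongorestore. The helper names are hypothetical, the URL in the example is a placeholder (the archive's full path is not given here), and the fallback relies on `mongorestore --archive` with no value reading the archive from stdin. The example uses a fake fetcher so nothing touches the network.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def ensure_cached(url, cache_dir, fetch):
    """Download url into cache_dir once; later calls reuse the file.

    Returns (path, downloaded_now). The fetch writes to a .part file that is
    renamed only after it finishes, so an interrupted download is not reused.
    """
    target = Path(cache_dir) / url.rsplit("/", 1)[-1]
    if target.is_file() and target.stat().st_size > 0:
        return target, False
    target.parent.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    fetch(url, partial)
    partial.replace(target)
    return target, True


def curl_fetch(url, dest):
    """Fetch with curl; --fail turns HTTP errors into a non-zero exit."""
    subprocess.run(
        ["curl", "--fail", "--location", "--silent", "--show-error", "--output", str(dest), url],
        check=True,
    )


def restore_command(archive, container, which=shutil.which):
    """Return (argv, stdin_path): host mongorestore if present, else the container's copy."""
    if which("mongorestore"):
        return ["mongorestore", "--uri=mongodb://localhost:27017", f"--archive={archive}"], None
    # Fallback: the container's mongorestore reads the archive from stdin.
    return ["podman", "exec", "-i", container, "mongorestore", "--archive"], Path(archive)


# Example with a fake fetcher, so nothing touches the network.
fetched = []


def fake_fetch(url, dest):
    fetched.append(url)
    Path(dest).write_bytes(b"archive bytes")


with tempfile.TemporaryDirectory() as cache:
    url = "https://example.invalid/sample.archive"  # placeholder, not the real archive URL
    _, first = ensure_cached(url, cache, fake_fetch)
    _, second = ensure_cached(url, cache, fake_fetch)
```

When `restore_command` returns a stdin path, the caller would open that file in binary mode and pass it as `stdin` to `subprocess.run`.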

Requirements

  • Podman
  • Python 3.7+
  • curl (validation-plans only — for downloading sample data)
  • mongorestore (validation-plans only — from MongoDB Database Tools; falls back to container if not on host)

Port 27017 must be free — the script checks and fails fast with a clear message if something else is already listening.
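One way to implement a fail-fast port check is to attempt a TCP connection and treat success as "something is listening". This is a sketch of the idea, not the script's actual check; the function names are hypothetical.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """True if something on host:port accepts a TCP connection."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def require_free_port(port=27017):
    """Fail fast, before any container starts, if the port is taken."""
    if port_in_use(port):
        raise SystemExit(f"Port {port} is already in use; stop whatever is listening on it and retry.")


# Example: a throwaway listener on an OS-assigned port is detected, then released.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
demo_port = listener.getsockname()[1]
busy_while_listening = port_in_use(demo_port)
listener.close()
free_after_close = not port_in_use(demo_port)
```

Checking before starting the container gives a clearer error than a failed `podman run` port publish.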

Project Structure

```
src/
  common/               # Shared utilities
    podman.py           #   Container lifecycle (start, wait, connect, stop)
    data_loader.py      #   Synthetic data + official sample data download/restore
    index_manager.py    #   Index creation (example + validation)
    explain_runner.py   #   Explain command builder for all query types
    output_writer.py    #   JSON serialization (handles BSON types) + validation
  example-plans/        # Curated example fixture generator
    queries.py          #   16 queries, synthetic Airbnb data
    generate.py         #   Generator entry point
    run.sh              #   Shell wrapper
  validation-plans/     # Comprehensive validation fixture generator
    queries/            #   72 queries across 7 files, official sample data
    infrastructure/     #   SingleNodeContainer + ShardedCluster managers
    coverage.py         #   Stage coverage tracking + manifest
    generate.py         #   Generator entry point
    run.sh              #   Shell wrapper
```
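The start, wait, and stop steps listed for podman.py can be sketched as command builders plus a polling loop. The image name `docker.io/library/mongo:<version>` and the `mongosh` readiness probe are assumptions (the repository may use a different image or probe); the polling loop takes injectable `clock`/`sleep` functions so it can be exercised without a container.

```python
import subprocess
import time


def podman_run_command(name, version, host_port=27017):
    """Argument list for a disposable MongoDB container; --rm deletes it once stopped."""
    return [
        "podman", "run", "--detach", "--rm",
        "--name", name,
        "--publish", f"{host_port}:27017",
        f"docker.io/library/mongo:{version}",  # assumed image; the repo may use another
    ]


def mongosh_ping(name):
    """Readiness probe via mongosh inside the container (assumes the image ships mongosh)."""
    result = subprocess.run(
        ["podman", "exec", name, "mongosh", "--quiet", "--eval", "db.runCommand({ping: 1}).ok"],
        capture_output=True, text=True,
    )
    return result.returncode == 0 and result.stdout.strip() == "1"


def wait_until(check, timeout=60.0, interval=0.5, clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True; give up after timeout seconds."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def stop_command(name):
    return ["podman", "stop", name]


# Example: a probe that succeeds on its third call, with sleeping disabled.
attempts = []


def flaky_probe():
    attempts.append(1)
    return len(attempts) >= 3


ready = wait_until(flaky_probe, timeout=5, sleep=lambda _: None)
```

A real run would pass `lambda: mongosh_ping(name)` as the check after `subprocess.run(podman_run_command(...), check=True)`. Using `--rm` means stopping the container also removes it, which keeps the containers disposable.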

About

Utility for bulk generating sample MongoDB query explain plans
