Automatically evaluates movie scripts against the Bechdel Test using DocETL + LLM reasoning.
The Bechdel Test is a rule for assessing the agency of women in movies.
A movie passes the test if:
- Two named women speak to each other
- Their conversation is about something other than a man
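The two criteria above can be expressed as a simple predicate. The following is a minimal sketch, not the code in this repository: the `Conversation` structure, the `passes_bechdel` function, and the character names are hypothetical and exist only to illustrate the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conversation:
    """One exchange between named characters (hypothetical structure)."""
    speakers: frozenset   # names of the characters who speak
    about_a_man: bool     # True if the topic of the exchange is a man


def passes_bechdel(conversations, women):
    """Return True if any conversation satisfies both criteria."""
    for conv in conversations:
        women_speaking = conv.speakers & women
        # Criterion 1: at least two named women take part.
        # Criterion 2: the conversation is not about a man.
        if len(women_speaking) >= 2 and not conv.about_a_man:
            return True
    return False


# Illustrative data only; these are placeholder names, not a real script.
conversations = [
    Conversation(frozenset({"ALICE", "CARL"}), about_a_man=False),
    Conversation(frozenset({"ALICE", "BETH"}), about_a_man=True),
    Conversation(frozenset({"ALICE", "BETH"}), about_a_man=False),
]
women = frozenset({"ALICE", "BETH"})
result = passes_bechdel(conversations, women)
```

Only the third conversation satisfies both criteria, so dropping it from the list changes the outcome.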
The DocETL pipeline this repository uses is as follows:
- Extract potential dialogue and character names
- Infer character genders using an LLM (🤖 Femputer)
- Filter conversations involving women
- Evaluate whether the script passes the Bechdel Test using LLM reasoning (🤖 Femputer)
graph TD
A[Movie Script] --> B[Dialogue Extraction]
B --> C[Gender Inference]
C --> D[Conversation Filtering]
D --> E[Bechdel Evaluation]
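The first three stages of the graph can be sketched in plain Python. This sketch does not use DocETL and is not the repository's implementation: the cue-line regular expression assumes a conventional screenplay layout (an upper-case character name on its own line, followed by the spoken lines), and the gender lookup table is a hypothetical stand-in for the LLM call. The final evaluation stage is omitted because it relies on LLM reasoning.

```python
import re

# Assumed format: a character cue is a line made up only of upper-case
# letters, spaces, periods, apostrophes, or hyphens.
CUE = re.compile(r"^\s*([A-Z][A-Z .'-]+)\s*$")


def extract_dialogue(script_text):
    """Stage 1: return (speaker, line) pairs from cue/dialogue blocks."""
    pairs, speaker = [], None
    for raw in script_text.splitlines():
        line = raw.strip()
        if not line:
            speaker = None  # a blank line ends the current speech
            continue
        match = CUE.match(line)
        if match:
            speaker = match.group(1).strip()
        elif speaker:
            pairs.append((speaker, line))
    return pairs


def infer_genders(speakers, lookup):
    """Stage 2: stand-in for the LLM; unknown names map to 'unknown'."""
    return {name: lookup.get(name, "unknown") for name in speakers}


def filter_women(pairs, genders):
    """Stage 3: keep lines spoken by characters inferred to be women."""
    return [(s, text) for s, text in pairs if genders.get(s) == "female"]


# Placeholder script text for illustration only.
script = """
ALICE
Did you finish the report?

BETH
Almost. The data was messy.

CARL
I can help.
"""

pairs = extract_dialogue(script)
genders = infer_genders(
    {s for s, _ in pairs},
    {"ALICE": "female", "BETH": "female", "CARL": "male"},
)
women_lines = filter_women(pairs, genders)
```

A scene heading or an all-caps line of dialogue would also match the cue pattern, which is one way a format-based extractor can misattribute or miss dialogue.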
Pixi is used to manage dependencies.
Pixi installation instructions can be found at https://pixi.prefix.dev/latest/installation/.
This project requires an LLM API key.
A sample .env file is provided.
DocETL uses LiteLLM under the hood, so many providers are supported.
The default model, gemini-2.5-flash, has a free tier with a limited number of queries per minute and per day.
To change models:
- Add the appropriate API key to .env
- Update default_model in run_bechdel.py
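Before switching models, it can help to confirm that the matching API key is actually set. The sketch below is an assumption-laden illustration, not part of this repository: it assumes the model is written in LiteLLM's `provider/model` form (for example `gemini/gemini-2.5-flash`), and the mapping from provider prefix to environment variable name should be checked against the LiteLLM documentation for your provider. The helper names are hypothetical.

```python
import os

# Assumed mapping from provider prefix to the environment variable that
# usually holds its key. Verify against the LiteLLM provider docs.
PROVIDER_KEY_VARS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def required_key_var(model):
    """Return the env var name expected for a 'provider/model' string."""
    provider, sep, _ = model.partition("/")
    if not sep or provider not in PROVIDER_KEY_VARS:
        raise ValueError(f"unknown provider in model string: {model!r}")
    return PROVIDER_KEY_VARS[provider]


def key_is_configured(model, environ=os.environ):
    """Return True if the key for this model's provider is set and non-empty."""
    return bool(environ.get(required_key_var(model)))
```

For example, `key_is_configured("gemini/gemini-2.5-flash")` reports whether `GEMINI_API_KEY` is present in the current environment. Passing a dictionary as `environ` makes the check easy to test without touching real credentials.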
Several sample movie scripts have been provided in the data/raw subfolder.
Many more scripts can be found on https://imsdb.com/.
├── LICENSE
├── README.md
├── data
│   └── raw
│       ├── 10_things_..._you.txt
│       ├── 500_days_of_summer.txt
│       ├── barbie.txt
│       ├── futurama.txt
│       └── joker.txt
├── pixi.lock
├── pixi.toml
└── src
    ├── create_json.py
    ├── prompts.py
    ├── run_bechdel.py
    └── utils.py
The pixi environment is activated with
pixi shell
The run_bechdel.py script takes one argument, the location of the script .txt file.
The following command runs the Bechdel Test on the barbie movie.
python src/run_bechdel.py data/raw/barbie.txt
- The pipeline may miss dialogue in scripts with non-standard formatting
- Character gender inference is highly imperfect
- The Bechdel test alone is an incomplete measure of representation. Adding a parallel map operation with varied degrees of Bechdel Test strictness could allow for a more nuanced scoring system.
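One possible shape for the more nuanced scoring mentioned in the last point is a graded score instead of a pass/fail result. The sketch below is a plain-Python illustration and does not use a DocETL parallel map operation; the 0 to 3 scale, the `bechdel_score` function, and the input format are assumptions made for the example.

```python
def bechdel_score(conversations, women):
    """Return a graded score from 0 to 3 (assumed scale).

    0: fewer than two named women
    1: two named women exist but never speak to each other
    2: they speak to each other, but only about a man
    3: they speak to each other about something other than a man

    `conversations` is an iterable of (speakers, about_a_man) pairs.
    """
    if len(women) < 2:
        return 0
    score = 1
    for speakers, about_a_man in conversations:
        if len(set(speakers) & set(women)) >= 2:
            score = max(score, 2 if about_a_man else 3)
    return score


# Placeholder names for illustration only.
women = {"ALICE", "BETH"}
partial = bechdel_score([({"ALICE", "BETH"}, True)], women)
full = bechdel_score(
    [({"ALICE", "BETH"}, True), ({"ALICE", "BETH"}, False)], women
)
```

In this example the first call yields a partial score because the only qualifying conversation is about a man, while the second reaches the top of the scale.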