A lightweight command-line utility for OpenAI-hosted fine-tuning workflows (tested with openai==2.17.0).
OpenAI’s web UI for fine-tuning is intentionally simple. In practice, it can feel limiting when you need to:
- quickly list your fine-tuned models
- enumerate fine-tuning jobs and correlate jobs → resulting models
- inspect job events (progress, warnings, status transitions)
- pull checkpoints and their metrics
- download and parse result files (often where train/eval loss live)
- estimate cost from billable training tokens
There isn’t an obvious, maintained “basic admin CLI” that covers these everyday tasks, so this script provides a pragmatic baseline.
This single-file CLI (`openai_cli.py`) provides:

- Models
  - list your fine-tuned models (filters `ft:` IDs)
  - delete a fine-tuned model by ID
- Fine-tuning jobs
  - list jobs (status, base model, resulting fine-tuned model)
  - list job events
  - list job checkpoints (including checkpoint metrics)
- Training statistics export
  - dump all of a job's available training artifacts to a single LLM-friendly JSON blob:
    - job metadata
    - checkpoints (+ metrics)
    - optionally events
    - result files (downloaded and CSV-parsed when applicable)
- Cost estimate
  - read `trained_tokens` from the job and estimate training cost in USD using a configurable `$ / 1M tokens` rate table
  - defaults to `gpt-4.1-mini`
- Python 3.9+ recommended
- `openai==2.17.0` (tested); other 2.x versions may work, but this script targets the 2.17.0 resource layout
Install dependency:

```bash
pip install --upgrade openai==2.17.0
```

The script reads the API key from an environment variable.
Required:

- `OPENAI_API_KEY`: your OpenAI API key

Optional (if you use org/project scoping):

- `OPENAI_ORG_ID`: sent as the `OpenAI-Organization` header
- `OPENAI_PROJECT`: sent as the `OpenAI-Project` header
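For illustration, here is one way these environment variables can be mapped onto OpenAI client constructor arguments. This is a sketch, not the script's actual wiring; the `client_kwargs` helper is hypothetical:

```python
import os

def client_kwargs():
    # Build constructor kwargs for the OpenAI client from the environment.
    # OPENAI_API_KEY is required; organization/project scoping is optional.
    kwargs = {"api_key": os.environ["OPENAI_API_KEY"]}
    if os.getenv("OPENAI_ORG_ID"):
        kwargs["organization"] = os.environ["OPENAI_ORG_ID"]  # OpenAI-Organization header
    if os.getenv("OPENAI_PROJECT"):
        kwargs["project"] = os.environ["OPENAI_PROJECT"]  # OpenAI-Project header
    return kwargs
```

With those in hand, `OpenAI(**client_kwargs())` scopes every request accordingly.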
Examples:

```bash
export OPENAI_API_KEY="sk-..."
# optional:
export OPENAI_ORG_ID="org_..."
export OPENAI_PROJECT="proj_..."
```

Make the script executable (optional):
```bash
chmod +x openai_cli.py
```

Run:

```bash
python openai_cli.py --help
```

Lists fine-tuned models visible to your key (IDs starting with `ft:`).
```bash
python openai_cli.py models list
python openai_cli.py models list --verbose
```

Options:

- `--verbose`: outputs all fields returned by the Models API as JSONL (one JSON object per line), suitable for piping to `jq`
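JSONL is also easy to consume from other tools; a minimal sketch for parsing `--verbose` output captured from the CLI (the sample IDs below are made up):

```python
import json

def read_jsonl(text):
    # `--verbose` emits one JSON object per line; parse each non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]

sample = (
    '{"id": "ft:gpt-4.1-mini:acme::abc", "object": "model"}\n'
    '{"id": "ft:gpt-4.1:acme::def", "object": "model"}\n'
)
print([m["id"] for m in read_jsonl(sample)])
# -> ['ft:gpt-4.1-mini:acme::abc', 'ft:gpt-4.1:acme::def']
```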
Deletes a fine-tuned model by ID (must look like `ft:...`).

```bash
python openai_cli.py models delete "ft:..."
```

Notes:

- Deleting models may require appropriate org permissions/role.
Lists fine-tuning jobs.

```bash
python openai_cli.py jobs list
python openai_cli.py jobs list --limit 50
python openai_cli.py jobs list --after ftjob-...   # pagination cursor
python openai_cli.py jobs list --verbose
```

Options:

- `--limit N` (default 20)
- `--after ID` (pagination cursor)
- `--verbose`: outputs JSONL with all fields
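The `--limit`/`--after` pair is plain cursor pagination. A generic sketch of the pattern, where the `fetch` callback and its signature are illustrative rather than part of the script:

```python
def paginate(fetch, limit=20):
    # Cursor pagination in the --limit/--after style: `fetch(limit, after)`
    # is a hypothetical callback returning a list of dicts, each carrying
    # an "id" that serves as the next cursor.
    after = None
    while True:
        page = fetch(limit, after)
        if not page:
            return
        yield from page
        after = page[-1]["id"]
        if len(page) < limit:
            return  # short page means we reached the end
```

Each command's `--after` flag plays the role of `after` here: pass the last ID of the previous page to get the next one.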
Lists events for a fine-tuning job (progress/status messages).

```bash
python openai_cli.py events ftjob-...
python openai_cli.py events ftjob-... --limit 200
python openai_cli.py events ftjob-... --verbose | jq .
```

Options:

- `--limit N` (default 50)
- `--after ID` (pagination cursor)
- `--verbose`: outputs JSONL
Lists checkpoints for a fine-tuning job, including checkpoint metrics (when available).

```bash
python openai_cli.py checkpoints ftjob-...
python openai_cli.py checkpoints ftjob-... --verbose | jq .
```

Options:

- `--limit N` (default 20)
- `--after ID` (pagination cursor)
- `--verbose`: outputs JSONL
Dumps everything available for a job as one JSON object (ideal for analysis, regression tracking, or feeding into another LLM).
It pulls:
- job metadata
- checkpoints (+ metrics)
- optional events
- result files listed on the job (downloads them and parses CSV when applicable)
```bash
python openai_cli.py stats ftjob-... --pretty > ftjob_stats.json
```

Options:

- `--pretty`: pretty JSON output
- `--no-events`: skip events in output (reduces size)
- `--events-limit N` (default 200): how many events to include if enabled
- `--max-rows N` (default 5000; `-1` = unlimited): max CSV rows parsed per result file
- `--include-raw-csv`: include full raw result file text (can be large)
Typical usage patterns:

```bash
# full dump (pretty)
python openai_cli.py stats ftjob-... --pretty > stats.json

# smaller dump (skip events)
python openai_cli.py stats ftjob-... --no-events > stats_no_events.json

# include raw CSV (largest)
python openai_cli.py stats ftjob-... --include-raw-csv > stats_with_raw.json
```

Output notes:

- Many fine-tune runs provide a results file (often CSV) containing step-level metrics.
- Column names vary; the script also emits `hints.result_file_loss_like_columns` and `hints.checkpoint_metric_keys` to help locate loss/accuracy fields quickly.
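To illustrate what a loss-like column scan can look like, here is a heuristic sketch; the actual hint logic inside `openai_cli.py` may differ, and the sample CSV is made up:

```python
import csv
import io

def loss_like_columns(csv_text):
    # Flag columns whose names mention loss or accuracy, similar in spirit
    # to the script's hints.result_file_loss_like_columns output.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    cols = rows[0].keys() if rows else []
    return [c for c in cols if any(k in c.lower() for k in ("loss", "accuracy"))]

sample = "step,train_loss,train_accuracy,valid_loss\n1,2.1,0.41,2.3\n2,1.8,0.52,2.0\n"
print(loss_like_columns(sample))
# -> ['train_loss', 'train_accuracy', 'valid_loss']
```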
Estimates training cost from `trained_tokens`:

```bash
python openai_cli.py cost-estimate ftjob-... --pretty
```

Options:

- `--model MODEL` (default `gpt-4.1-mini`): selects which model's training rate is used. The script includes a small editable table `TRAINING_USD_PER_1M` near the top.
- `--pretty`: pretty JSON output

Examples:

```bash
# default gpt-4.1-mini
python openai_cli.py cost-estimate ftjob-... --pretty

# estimate using a different model's pricing
python openai_cli.py cost-estimate ftjob-... --model gpt-4.1 --pretty
```

Notes:

- `trained_tokens` is often `null` while the job is still running. Re-run after completion.
- This is an estimate based on the script's local rate table. Update `TRAINING_USD_PER_1M` if pricing changes.
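The arithmetic itself is simple; a sketch with an illustrative rate table (the numbers below are placeholders, not authoritative pricing, and the real `TRAINING_USD_PER_1M` values in `openai_cli.py` may differ):

```python
# Illustrative rates in USD per 1M training tokens; placeholders only.
# Check current OpenAI pricing before relying on any of these numbers.
TRAINING_USD_PER_1M = {
    "gpt-4.1-mini": 5.00,
    "gpt-4.1": 25.00,
}

def estimate_training_usd(trained_tokens, model="gpt-4.1-mini"):
    # `trained_tokens` is often None (null) until the job finishes.
    if trained_tokens is None:
        return None
    return trained_tokens / 1_000_000 * TRAINING_USD_PER_1M[model]

print(estimate_training_usd(2_500_000))  # 12.5
```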
Quick reference:

```bash
python openai_cli.py jobs list --limit 20
python openai_cli.py checkpoints ftjob-...
python openai_cli.py stats ftjob-... --no-events --pretty > stats.json

python openai_cli.py models list
python openai_cli.py models list --verbose | jq -r '.id'
python openai_cli.py models delete "ft:..."
```
Troubleshooting:

- No models/jobs returned
  - Confirm you set `OPENAI_API_KEY`.
  - If you use Projects, set `OPENAI_PROJECT` so you're querying the expected scope.
- `trained_tokens` is null
  - The job may still be running (or not finished producing accounting fields). Re-run after status is `succeeded`/`failed`.
- Result files missing loss/eval loss
  - Metrics exposure depends on model/run configuration and what the platform emits for your job.
  - Check both result files (CSV) and checkpoint metrics.
License: MIT