PubChem in your terminal.
pubchem is a JSON-first, agent-friendly CLI for compounds, substances, assays, references, PUG View browsing, raw PubChem/NCBI endpoints, resolution workflows, exports, and batch automation.
The npm package ships prebuilt binaries for Linux, macOS, and Windows on amd64 and arm64, so npx pubchem-cli works directly on common platforms.
NPM package: pubchem-cli on npm
Run without installing:
npx pubchem-cli --help
Global install:
npm install -g pubchem-cli
pubchem --help
If you are hacking on this repository locally:
go build -o ./pubchem ./cmd/pubchem
./pubchem --help
Use the CLI in this order:
- Resolve an identifier to a record when you start from a name, CAS RN, InChIKey, or source ID.
- Use exact identity commands when you already have the CID, SID, or AID.
- Use search commands when you need candidate matches.
- Use `view` when you need headings and sections from PUG View.
- Use `raw` only when there is no higher-level verb yet.
- Use `export` when you need files.
- Use `batch` when you need many records.
This CLI does not guess silently. That is intentional.
pubchem <command> [flags]
Root flags:
- `--json`: Output JSON to stdout.
- `--plain`: Output stable plain text to stdout.
- `--results-only`: In JSON mode, emit only the primary result.
- `--select`: In JSON mode, select comma-separated fields with exact dot-path support.
- `--force`: Skip confirmations for future destructive commands and file overwrites.
- `--no-input`: Never prompt; fail instead.
- `--enable-commands`: Comma-separated list of enabled top-level commands.
- `--version`: Print version and exit.
- `--help`: Show context-sensitive help.
Top-level command families:
- `compound`
- `substance`
- `assay`
- `entity`
- `batch`
- `refs`
- `view`
- `raw`
- `resolve`
- `identity`
- `export`
- `schema`
- `agent`
- `completion`
| Task | Command |
|---|---|
| Resolve a drug name, CAS RN, InChI, or InChIKey | pubchem resolve compound ... |
| Exact compound lookup by CID | pubchem identity compound ... or pubchem compound get ... |
| Similarity search around a known structure | pubchem compound similar <cid> --query-type cid |
| Substructure or superstructure search | pubchem compound substructure <cid> --query-type cid or pubchem compound superstructure <cid> --query-type cid |
| Resolve a substance by SID or depositor source ID | pubchem resolve substance ... or pubchem substance sourceid ... |
| Search assays by gene, protein, name, source, or keyword | pubchem assay search ... |
| PubMed, patents, and xrefs | pubchem refs ... |
| Browse nested PUG View sections | pubchem view ... |
| Fetch a raw endpoint that has no higher-level verb | pubchem raw fetch ... |
| Export files or tabular data | pubchem export ... |
| Repeat a workflow across many IDs | pubchem batch ... |
| Discover exact flags and JSON shapes | pubchem schema ... |
| Generate shell completion | pubchem completion ... |
| Inspect stable exit codes | pubchem agent exit-codes |
If you are an agent, use this sequence:
- Start with `pubchem schema <command>` when you need the exact flag surface.
- Use `pubchem agent exit-codes` when you need a stable automation contract.
- Use `--json` by default.
- Add `--results-only` when the downstream consumer wants the primary payload only.
- Add `--select` only with exact field paths that already exist in the JSON output.
- Use `--enable-commands` when you need a restricted command allowlist.
- Prefer CID-based structure workflows over name-based structure workflows.
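As a rough sketch of the sequence above, an agent could wrap the CLI in a small shell function that always applies the JSON defaults. The name `pubchem_agent` is hypothetical and not part of the CLI. The sketch also assumes that root flags are accepted after the subcommand arguments, as in the examples in this README.

```shell
# Hypothetical helper (not part of the CLI): run any pubchem command with
# agent-friendly defaults. --json, --results-only, and --no-input are root
# flags listed in this README; appending them after the subcommand arguments
# is an assumption based on the README examples.
pubchem_agent() {
  pubchem "$@" --json --results-only --no-input
}

# Usage: pubchem_agent compound properties 2244
```

Keeping the defaults in one place means every call an agent makes emits JSON only and fails instead of prompting.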
Use resolve when the input is a name, registry number, InChIKey, or other identifier that may need normalization.
Use identity when you already have the exact record ID and want the explicit no-guess form.
Examples:
pubchem identity compound 2244
pubchem identity substance 92297672
pubchem identity assay 541
Resolve a drug name to a CID:
pubchem resolve compound imatinib --json --results-only --max 1
Sample output:
[
{
"cid": 5291,
"properties": {
"InChIKey": "KTUFNOKKBVMGRW-UHFFFAOYSA-N",
"MolecularFormula": "C29H31N7O",
"MolecularWeight": "493.6"
}
}
]
When you already know the CID, use exact identity:
pubchem identity compound 2244 --json
Get compound details and common pharma fields:
pubchem compound get 2244 --synonyms --description --classification --drug-likeness --json
Pull a compact property panel:
pubchem compound properties 2244 --json --results-only --select=cid,properties.MolecularFormula
Sample output:
[
{
"cid": 2244,
"properties.MolecularFormula": "C9H8O4"
}
]
Search compounds by name:
pubchem compound search aspirin --max 1
Search compounds by formula:
pubchem compound search C9H8O4 --mode formula --allow-other-elements --max 10
Run SAR-style similarity search from a CID:
pubchem compound similar 5291 --query-type cid --threshold 90 --max 20
Run structure searches from a CID:
pubchem compound substructure 5291 --query-type cid --max 20
pubchem compound superstructure 5291 --query-type cid --max 20
Important rule:
- Do not feed a plain drug name directly into `compound similar`, `compound substructure`, or `compound superstructure` and expect the CLI to guess.
- If you start from a name, run `pubchem resolve compound <name>` first, then feed the resulting CID into the structure search.
- This is the deterministic, agent-safe path.
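The rule above can be scripted as a two-step pipeline. The following is a minimal sketch, assuming the resolve output contains a `"cid": <number>` field as in the sample output shown earlier. The helper names `resolve_cid` and `similar_by_name` are hypothetical, and the `sed` extraction stands in for a real JSON parser.

```shell
# Hypothetical helper: print the first CID returned by `pubchem resolve compound`.
# Assumes the JSON contains a `"cid": <number>` field, as in the sample output.
resolve_cid() {
  pubchem resolve compound "$1" --json --results-only --select=cid --max 1 |
    sed -n 's/.*"cid":[[:space:]]*\([0-9][0-9]*\).*/\1/p' |
    head -n 1
}

# Hypothetical helper: resolve a name first, then search from the resulting CID.
similar_by_name() {
  cid="$(resolve_cid "$1")"
  if [ -z "$cid" ]; then
    echo "could not resolve '$1' to a CID" >&2
    return 3  # assumption: reuse the documented "not found" code for this wrapper
  fi
  pubchem compound similar "$cid" --query-type cid --max 20 --json
}

# Usage: similar_by_name imatinib
```

The wrapper stops when no CID comes back, so a structure search never runs on an unresolved name.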
Name-to-SAR recipe:
pubchem resolve compound imatinib --json --results-only --select=cid --max 1
pubchem compound similar <cid> --query-type cid --max 20
Structure files and metadata:
pubchem compound structure 2244 --record-type 3d --format json
pubchem compound structure 2244 --record-type 3d --format sdf --out aspirin-3d.sdf
pubchem compound structure 2244 --record-type 2d --format mol --out aspirin-2d.mol
Compound images:
pubchem compound image 2244 --inline --json
pubchem compound image 2244 --out aspirin.png --force
Compound xrefs and safety:
pubchem compound xref 2244
pubchem compound safety 2244
Resolve a substance by SID:
pubchem resolve substance 92297672 --json
Search substances by name:
pubchem substance search aspirin --max 20
Search substances by depositor source and source ID:
pubchem substance sourceid ChemIDplus 0002153982 --max 20
Search substances by registry xref:
pubchem substance search D41527A7-A9EB-472D-A7FC-312821130549 --mode xref --xref-type RegistryID
Map substance records to compounds:
pubchem substance cids 92297672
Fetch substance xrefs for registry normalization:
pubchem substance xref 92297672 --types RegistryID,DBURL,SBURL
Batch substance workflows:
pubchem batch substance cids 92297672 135052148 --progress
pubchem batch substance xref 92297672 135052148 --types RegistryID,DBURL,SBURL --progress
Resolve an assay by AID or by target/name:
pubchem resolve assay 541 --json
pubchem resolve assay EGFR --mode target --target-type genesymbol --json
Search assays:
pubchem assay search EGFR --mode target --target-type genesymbol --max 20
pubchem assay search viability --mode name --max 20
pubchem assay search ncgc --mode source --max 20
pubchem assay search kinase --mode keyword --max 20
Fetch assay details and concise result tables:
pubchem assay get 541
pubchem assay results 541 --outcome active --max 20
Batch assay workflows:
pubchem batch assay get 541 542 --progress
pubchem batch assay results 541 542 --outcome active --progress
Search PubMed:
pubchem refs search aspirin --max 20
Fetch PubMed metadata by PMID:
pubchem refs get 22385 --json --results-only --select=url
Sample output:
[
{
"url": "https://pubmed.ncbi.nlm.nih.gov/22385/"
}
]
Pull compound-linked literature:
pubchem refs literature compound 2244 --max 20
Pull compound-linked patents:
pubchem refs patents compound 2244 --max 20
Pull compound and substance xrefs through the reference surface:
pubchem refs external compound 2244
pubchem refs external substance 92297672
When you need a bibliography or patent landscape, prefer `refs` over raw PubMed scraping. It already normalizes the citation records.
Use view when you want headings and sections, not just raw IDs.
Get or browse a compound record:
pubchem view get compound 2244
pubchem view browse compound 2244
Search inside a compound record:
pubchem view search compound 2244 Safety --json --max 1
Sample output:
{
"entityType": "compound",
"identifier": "2244",
"query": "Safety",
"totalFound": 20,
"pageSize": 1,
"truncated": true,
"results": [
{
"path": "Chemical and Physical Properties",
"tocHeading": "Chemical and Physical Properties",
"description": "Various chemical and physical properties that are experimentally determined for this compound."
}
]
}
Browse specific sections:
pubchem view section compound 2244 --heading "Safety and Hazards"
pubchem view section compound 2244 --heading "Names and Identifiers"
The same patterns work for substances and assays.
Use raw fetch only when the CLI does not yet have a dedicated verb.
pubchem raw fetch pug /compound/cid/2244/JSON
pubchem raw fetch view /data/compound/2244/JSON
pubchem raw fetch pubmed '/esummary.fcgi?db=pubmed&id=22385&retmode=json'
Quote any path that contains `?` or `&` so the shell passes it to the CLI unchanged. You can also pass an allowed full URL, but only for PubChem and NCBI hosts.
Export compound structures:
pubchem export compound structure 2244 --format smiles
pubchem export compound structure 2244 --record-type 3d --format sdf --out aspirin-3d.sdf
pubchem export compound structure 2244 --record-type 2d --format mol --out aspirin-2d.mol
Export compound properties:
pubchem export compound properties 2244 5291 --properties MolecularFormula,MolecularWeight --format tsv --out compounds.tsv
Export assay results:
pubchem export assay results 541 --outcome active --format csv --out assay-results.csv
Use `batch` when you have multiple IDs and want one command to do the same work repeatedly.
pubchem batch compound get 2244 5291 --progress
pubchem batch compound properties 2244 5291 --properties MolecularFormula,MolecularWeight --progress
pubchem batch compound bioactivity 2244 5291 --outcome active --progress
pubchem batch compound xref 2244 5291 --progress
Progress goes to stderr so stdout stays machine-readable.
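Because progress and results travel on different streams, a script can capture them separately. The following is a small sketch; the helper name `batch_to_files` and the file names in the usage line are hypothetical, and passing the root flag `--json` to `batch` is an assumption.

```shell
# Hypothetical helper: run a batch command, writing results (stdout) and
# progress (stderr) to separate files.
# Usage: batch_to_files <out_file> <log_file> <batch args...>
batch_to_files() {
  out_file="$1"
  log_file="$2"
  shift 2
  pubchem batch "$@" --json --progress >"$out_file" 2>"$log_file"
}

# Usage: batch_to_files props.json progress.log compound properties 2244 5291
```

The results file then holds only what the CLI wrote to stdout and can go straight to a JSON parser.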
--select is exact-path only. There is no fuzzy matching.
Good examples:
pubchem compound search aspirin --json --results-only --select=cid,properties.MolecularFormula --max 1
pubchem refs get 22385 --json --results-only --select=url
pubchem substance search aspirin --json --results-only --select=sid --max 1
If you need a nested field, use the exact path that appears in the JSON output.
Use pubchem agent exit-codes for automation.
The current stable codes are:
- `0`: success
- `1`: generic error
- `2`: usage error
- `3`: not found
- `4`: timeout
- `5`: command disabled by `--enable-commands`
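A wrapper script can branch on these codes directly. The sketch below maps each code to the label listed above. `describe_exit` is a hypothetical helper name, and the authoritative list is whatever `pubchem agent exit-codes` prints.

```shell
# Hypothetical helper: translate a pubchem exit code into the label from the
# list above. Codes outside 0-5 are reported as unknown.
describe_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "generic error" ;;
    2) echo "usage error" ;;
    3) echo "not found" ;;
    4) echo "timeout" ;;
    5) echo "command disabled" ;;
    *) echo "unknown exit code $1" ;;
  esac
}

# Usage:
#   status=0
#   pubchem compound get 2244 --json || status=$?
#   describe_exit "$status"
```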
pubchem completion bash
pubchem completion zsh
pubchem completion fish
pubchem completion powershell
- Prefer CID-based structure workflows whenever you can.
- If you start from a name, resolve first and reuse the CID.
- Broad SMILES similarity and substructure searches can still fail or time out upstream at PubChem.
- The CLI now reports those failures cleanly instead of hiding them behind parser noise.
- `raw fetch` is intentionally allowlisted; it is an escape hatch, not a free-for-all.
- Use `schema` instead of guessing flags.
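Since upstream timeouts surface as a stable exit code, a script can retry only that case and pass every other failure through. The following is a minimal sketch, assuming exit code 4 means timeout as listed in the exit-code section. `retry_on_timeout` and the linear backoff are illustrative choices, not CLI features.

```shell
# Hypothetical helper: retry a pubchem command while it exits with 4 (timeout).
# Usage: retry_on_timeout <max_attempts> <pubchem args...>
retry_on_timeout() {
  max_attempts="$1"
  shift
  attempt=1
  while :; do
    status=0
    pubchem "$@" || status=$?
    # Stop on success, on any non-timeout failure, or when attempts run out.
    if [ "$status" -ne 4 ] || [ "$attempt" -ge "$max_attempts" ]; then
      return "$status"
    fi
    sleep "$attempt"  # simple linear backoff; tune for real workloads
    attempt=$((attempt + 1))
  done
}

# Usage: retry_on_timeout 3 compound similar 5291 --query-type cid --max 20
```

Other failures, such as a usage error or a not-found result, return immediately because retrying them would not change the outcome.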
If you want a small, practical agent allowlist:
pubchem --enable-commands compound,substance,assay,refs,view,raw,resolve,identity,batch,export,schema,agent,completion compound search aspirin --max 1
That style keeps the command surface predictable while still covering most PubChem work.
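To avoid repeating the allowlist on every call, it can be pinned in a wrapper function. In this sketch the function name `pubchem_restricted`, the variable `PUBCHEM_ALLOW`, and the narrower command subset are all hypothetical choices for illustration, not a recommendation from the project.

```shell
# Illustrative allowlist: an arbitrary subset of the top-level command
# families listed in this README.
PUBCHEM_ALLOW="resolve,identity,compound,view,schema,agent"

# Hypothetical helper: every call goes through the same allowlist.
pubchem_restricted() {
  pubchem --enable-commands "$PUBCHEM_ALLOW" "$@"
}

# Usage: pubchem_restricted compound search aspirin --max 1
```

Per the exit-code list, a command outside the allowlist should exit with code 5.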
Find a structure, then inspect it, then export a file:
pubchem resolve compound imatinib --json --results-only --max 1
pubchem compound get 5291 --synonyms --classification --json
pubchem export compound structure 5291 --record-type 3d --format sdf --out imatinib-3d.sdf
Trace a compound into the literature:
pubchem refs literature compound 5291 --max 20
pubchem refs patents compound 5291 --max 20
Move from an assay target to a hit list:
pubchem assay search EGFR --mode target --target-type genesymbol --max 20
pubchem assay results 3364 --outcome active --max 20
Inspect a PubChem record tree and then pull only one section:
pubchem view search compound 2244 Hazard --max 5
pubchem view section compound 2244 --heading "Safety and Hazards"
If you only remember three commands, remember these:
pubchem resolve compound aspirin
pubchem compound get 2244
pubchem view search compound 2244 Safety
Everything else in the CLI builds from those patterns.