BrainGnosis/pubchem-cli

PubChem CLI

PubChem in your terminal.

pubchem is a JSON-first, agent-friendly CLI for compounds, substances, assays, references, PUG View browsing, raw PubChem/NCBI endpoints, resolution workflows, exports, and batch automation.

The npm package ships prebuilt binaries for Linux, macOS, and Windows on amd64 and arm64, so npx pubchem-cli works directly on common platforms.

NPM package: pubchem-cli on npm

Install

Run without installing:

npx pubchem-cli --help

Global install:

npm install -g pubchem-cli
pubchem --help

If you are hacking on this repository locally:

go build -o ./pubchem ./cmd/pubchem
./pubchem --help

Mental Model

Use the CLI in this order:

  1. Resolve an identifier to a record when you start from a name, CAS RN, InChIKey, or source ID.
  2. Use exact identity commands when you already have the CID, SID, or AID.
  3. Use search commands when you need candidate matches.
  4. Use view when you need headings and sections from PUG View.
  5. Use raw only when there is no higher-level verb yet.
  6. Use export when you need files.
  7. Use batch when you need many records.

This CLI does not guess silently. That is intentional.

Usage

pubchem <command> [flags]

Root flags:

  • --json: Output JSON to stdout.
  • --plain: Output stable plain text to stdout.
  • --results-only: In JSON mode, emit only the primary result.
  • --select: In JSON mode, keep only the listed comma-separated fields; nested fields need their exact dot path.
  • --force: Skip confirmations for future destructive commands and file overwrites.
  • --no-input: Never prompt; fail instead.
  • --enable-commands: Comma-separated list of enabled top-level commands.
  • --version: Print version and exit.
  • --help: Show context-sensitive help.

Top-level command families:

  • compound
  • substance
  • assay
  • entity
  • batch
  • refs
  • view
  • raw
  • resolve
  • identity
  • export
  • schema
  • agent
  • completion

What To Use

  • Resolve a drug name, CAS RN, InChI, or InChIKey: pubchem resolve compound ...
  • Exact compound lookup by CID: pubchem identity compound ... or pubchem compound get ...
  • Similarity search around a known structure: pubchem compound similar <cid> --query-type cid
  • Substructure or superstructure search: pubchem compound substructure <cid> --query-type cid or pubchem compound superstructure <cid> --query-type cid
  • Resolve a substance by SID or depositor source ID: pubchem resolve substance ... or pubchem substance sourceid ...
  • Search assays by gene, protein, name, source, or keyword: pubchem assay search ...
  • PubMed, patents, and xrefs: pubchem refs ...
  • Browse nested PUG View sections: pubchem view ...
  • Fetch a raw endpoint that has no higher-level verb: pubchem raw fetch ...
  • Export files or tabular data: pubchem export ...
  • Repeat a workflow across many IDs: pubchem batch ...
  • Discover exact flags and JSON shapes: pubchem schema ...
  • Generate shell completion: pubchem completion ...
  • Inspect stable exit codes: pubchem agent exit-codes

Agent Playbook

If you are an agent, use this sequence:

  1. Start with pubchem schema <command> when you need the exact flag surface.
  2. Use pubchem agent exit-codes when you need a stable automation contract.
  3. Use --json by default.
  4. Add --results-only when the downstream consumer wants the primary payload only.
  5. Add --select only with exact field paths that already exist in the JSON output.
  6. Use --enable-commands when you need a restricted command allowlist.
  7. Prefer CID-based structure workflows over name-based structure workflows.

Resolve vs Identity

Use resolve when the input is a name, registry number, InChIKey, or other identifier that may need normalization.

Use identity when you already have the exact record ID and want the explicit no-guess form.

Examples:

pubchem identity compound 2244
pubchem identity substance 92297672
pubchem identity assay 541

Compound Workflows

Resolve a drug name to a CID:

pubchem resolve compound imatinib --json --results-only --max 1

Sample output:

[
  {
    "cid": 5291,
    "properties": {
      "InChIKey": "KTUFNOKKBVMGRW-UHFFFAOYSA-N",
      "MolecularFormula": "C29H31N7O",
      "MolecularWeight": "493.6"
    }
  }
]

When you already know the CID, use exact identity:

pubchem identity compound 2244 --json

Get compound details and common pharma fields:

pubchem compound get 2244 --synonyms --description --classification --drug-likeness --json

Pull a compact property panel:

pubchem compound properties 2244 --json --results-only --select=cid,properties.MolecularFormula

Sample output:

[
  {
    "cid": 2244,
    "properties.MolecularFormula": "C9H8O4"
  }
]

Search compounds by name:

pubchem compound search aspirin --max 1

Search compounds by formula:

pubchem compound search C9H8O4 --mode formula --allow-other-elements --max 10

Run SAR-style similarity search from a CID:

pubchem compound similar 5291 --query-type cid --threshold 90 --max 20

Run structure searches from a CID:

pubchem compound substructure 5291 --query-type cid --max 20
pubchem compound superstructure 5291 --query-type cid --max 20

Important rule:

  • Do not feed a plain drug name directly into compound similar, compound substructure, or compound superstructure and expect the CLI to guess.
  • If you start from a name, run pubchem resolve compound <name> first, then feed the resulting CID into the structure search.
  • This is the deterministic, agent-safe path.

Name-to-SAR recipe:

pubchem resolve compound imatinib --json --results-only --select=cid --max 1
pubchem compound similar <cid> --query-type cid --max 20
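
The two-step recipe can be chained in a small shell function. This is only a sketch: cid_from_json and similar_by_name are hypothetical helper names, not part of the CLI, and the grep-based extraction assumes the --select=cid output keeps the "cid": <number> shape shown in the sample outputs. A JSON tool such as jq would be more robust where it is installed.

```shell
# Hypothetical helper: read JSON on stdin and print the first "cid" value.
# Assumes the `"cid": <number>` shape from the README's sample outputs.
cid_from_json() {
  grep -o '"cid"[[:space:]]*:[[:space:]]*[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Hypothetical helper: resolve a name to a CID, then run the similarity
# search on that CID. Fails loudly instead of guessing when nothing resolves.
similar_by_name() {
  name=$1
  cid=$(pubchem resolve compound "$name" --json --results-only --select=cid --max 1 | cid_from_json) || cid=""
  if [ -z "$cid" ]; then
    echo "could not resolve '$name' to a CID" >&2
    return 1
  fi
  pubchem compound similar "$cid" --query-type cid --max 20
}
```

For example, similar_by_name imatinib runs the resolve step first and only then calls compound similar with the numeric CID.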

Structure files and metadata:

pubchem compound structure 2244 --record-type 3d --format json
pubchem compound structure 2244 --record-type 3d --format sdf --out aspirin-3d.sdf
pubchem compound structure 2244 --record-type 2d --format mol --out aspirin-2d.mol

Compound images:

pubchem compound image 2244 --inline --json
pubchem compound image 2244 --out aspirin.png --force

Compound xrefs and safety:

pubchem compound xref 2244
pubchem compound safety 2244

Substance Workflows

Resolve a substance by SID:

pubchem resolve substance 92297672 --json

Search substances by name:

pubchem substance search aspirin --max 20

Search substances by depositor source and source ID:

pubchem substance sourceid ChemIDplus 0002153982 --max 20

Search substances by registry xref:

pubchem substance search D41527A7-A9EB-472D-A7FC-312821130549 --mode xref --xref-type RegistryID

Map substance records to compounds:

pubchem substance cids 92297672

Fetch substance xrefs for registry normalization:

pubchem substance xref 92297672 --types RegistryID,DBURL,SBURL

Batch substance workflows:

pubchem batch substance cids 92297672 135052148 --progress
pubchem batch substance xref 92297672 135052148 --types RegistryID,DBURL,SBURL --progress

Assay Workflows

Resolve an assay by AID or by target/name:

pubchem resolve assay 541 --json
pubchem resolve assay EGFR --mode target --target-type genesymbol --json

Search assays:

pubchem assay search EGFR --mode target --target-type genesymbol --max 20
pubchem assay search viability --mode name --max 20
pubchem assay search ncgc --mode source --max 20
pubchem assay search kinase --mode keyword --max 20

Fetch assay details and concise result tables:

pubchem assay get 541
pubchem assay results 541 --outcome active --max 20

Batch assay workflows:

pubchem batch assay get 541 542 --progress
pubchem batch assay results 541 542 --outcome active --progress

References, Literature, and Patents

Search PubMed:

pubchem refs search aspirin --max 20

Fetch PubMed metadata by PMID:

pubchem refs get 22385 --json --results-only --select=url

Sample output:

[
  {
    "url": "https://pubmed.ncbi.nlm.nih.gov/22385/"
  }
]

Pull compound-linked literature:

pubchem refs literature compound 2244 --max 20

Pull compound-linked patents:

pubchem refs patents compound 2244 --max 20

Pull compound and substance xrefs through the reference surface:

pubchem refs external compound 2244
pubchem refs external substance 92297672

When you need a bibliography or patent landscape, prefer refs over raw PubMed scraping. It already normalizes the citation records.

PUG View Browsing

Use view when you want headings and sections, not just raw IDs.

Get or browse a compound record:

pubchem view get compound 2244
pubchem view browse compound 2244

Search inside a compound record:

pubchem view search compound 2244 Safety --json --max 1

Sample output:

{
  "entityType": "compound",
  "identifier": "2244",
  "query": "Safety",
  "totalFound": 20,
  "pageSize": 1,
  "truncated": true,
  "results": [
    {
      "path": "Chemical and Physical Properties",
      "tocHeading": "Chemical and Physical Properties",
      "description": "Various chemical and physical properties that are experimentally determined for this compound."
    }
  ]
}
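
The sample above reports totalFound 20 with pageSize 1 and truncated true. A script can watch for that flag before trusting a short result list. The sketch below assumes truncated means more matches exist than were returned, which the sample suggests but this README does not spell out; is_truncated and view_search_checked are hypothetical helper names.

```shell
# Hypothetical helper: exit 0 when JSON on stdin contains "truncated": true.
is_truncated() {
  grep -q '"truncated"[[:space:]]*:[[:space:]]*true'
}

# Hypothetical wrapper: run a compound view search, print the JSON unchanged,
# and warn on stderr when the result set was cut short.
view_search_checked() {
  result=$(pubchem view search compound "$@" --json) || return $?
  printf '%s\n' "$result"
  if printf '%s\n' "$result" | is_truncated; then
    echo "warning: results truncated; consider a larger --max" >&2
  fi
}
```

Called as view_search_checked 2244 Safety --max 1, stdout still carries only the JSON, so it can be piped onward.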

Browse specific sections:

pubchem view section compound 2244 --heading "Safety and Hazards"
pubchem view section compound 2244 --heading "Names and Identifiers"

The same patterns work for substances and assays.

Raw Escape Hatch

Use raw fetch only when the CLI does not yet have a dedicated verb.

pubchem raw fetch pug /compound/cid/2244/JSON
pubchem raw fetch view /data/compound/2244/JSON
pubchem raw fetch pubmed '/esummary.fcgi?db=pubmed&id=22385&retmode=json'

You can also pass an allowed full URL, but only for PubChem and NCBI hosts.
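
Paths that contain ? or & need quoting in most shells, because an unquoted & sends the command to the background and ? can be treated as a glob. One way to avoid quoting mistakes is to build the path in a helper. This is a sketch: pubmed_summary_path and raw_pubmed_summary are hypothetical names, and the path is the esummary example from this section.

```shell
# Hypothetical helper: build the esummary path for one PMID.
# Rejects anything that is not a plain number.
pubmed_summary_path() {
  case "$1" in
    ''|*[!0-9]*) echo "PMID must be numeric: '$1'" >&2; return 1 ;;
  esac
  printf '/esummary.fcgi?db=pubmed&id=%s&retmode=json\n' "$1"
}

# Hypothetical wrapper: fetch the raw PubMed summary for a PMID.
# The path travels inside a quoted variable, so the shell never sees a bare &.
raw_pubmed_summary() {
  path=$(pubmed_summary_path "$1") || return 1
  pubchem raw fetch pubmed "$path"
}
```

raw_pubmed_summary 22385 then issues the same request as the quoted one-liner.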

Exports

Export compound structures:

pubchem export compound structure 2244 --format smiles
pubchem export compound structure 2244 --record-type 3d --format sdf --out aspirin-3d.sdf
pubchem export compound structure 2244 --record-type 2d --format mol --out aspirin-2d.mol

Export compound properties:

pubchem export compound properties 2244 5291 --properties MolecularFormula,MolecularWeight --format tsv --out compounds.tsv

Export assay results:

pubchem export assay results 541 --outcome active --format csv --out assay-results.csv

Batch Automation

Use batch when you have multiple IDs and want one command to do the same work repeatedly.

pubchem batch compound get 2244 5291 --progress
pubchem batch compound properties 2244 5291 --properties MolecularFormula,MolecularWeight --progress
pubchem batch compound bioactivity 2244 5291 --outcome active --progress
pubchem batch compound xref 2244 5291 --progress

Progress goes to stderr so stdout stays machine-readable.
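
Because the two streams are separate, results and progress can be captured independently. A minimal sketch, assuming the root --json flag combines with batch the same way it does with other commands; batch_props_to_file is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: write batch results to one file and progress to another.
# Usage: batch_props_to_file <results-file> <progress-log> <cid>...
batch_props_to_file() {
  out=$1
  log=$2
  shift 2
  # stdout (JSON results) goes to $out; stderr (progress) goes to $log.
  pubchem batch compound properties "$@" \
    --properties MolecularFormula,MolecularWeight \
    --progress --json >"$out" 2>"$log"
}
```

For example, batch_props_to_file props.json progress.log 2244 5291 leaves props.json free of progress lines.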

Selection and JSON Shape

--select is exact-path only. There is no fuzzy matching.

Good examples:

pubchem compound search aspirin --json --results-only --select=cid,properties.MolecularFormula --max 1
pubchem refs get 22385 --json --results-only --select=url
pubchem substance search aspirin --json --results-only --select=sid --max 1

If you need a nested field, use the exact path that appears in the JSON output.

Exit Codes

Use pubchem agent exit-codes for automation.

The current stable codes are:

  • 0 success
  • 1 generic error
  • 2 usage error
  • 3 not found
  • 4 timeout
  • 5 command disabled by --enable-commands
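
A wrapper script can branch on these codes. The sketch below copies the labels from the list above and retries only on timeout; describe_exit and run_with_retry are hypothetical helper names, and the limit of three attempts and the RETRY_DELAY variable are arbitrary choices for illustration. Treat pubchem agent exit-codes as the authoritative list.

```shell
# Hypothetical helper: translate an exit status into the label listed above.
describe_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "generic error" ;;
    2) echo "usage error" ;;
    3) echo "not found" ;;
    4) echo "timeout" ;;
    5) echo "command disabled" ;;
    *) echo "unknown ($1)" ;;
  esac
}

# Hypothetical helper: run a pubchem command, retrying only on timeout (4).
# Gives up after 3 attempts and returns the last exit status.
run_with_retry() {
  attempt=1
  while :; do
    status=0
    pubchem "$@" || status=$?
    if [ "$status" -ne 4 ] || [ "$attempt" -ge 3 ]; then
      return "$status"
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-1}"
  done
}
```

Retrying only on code 4 keeps "not found" and usage errors from being repeated pointlessly.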

Shell Completion

pubchem completion bash
pubchem completion zsh
pubchem completion fish
pubchem completion powershell

Common Limits and Good Practice

  • Prefer CID-based structure workflows whenever you can.
  • If you start from a name, resolve first and reuse the CID.
  • Broad SMILES similarity and substructure searches can still fail or time out upstream at PubChem.
  • The CLI now reports those failures cleanly instead of hiding them behind parser noise.
  • raw fetch is intentionally allowlisted; it is an escape hatch, not a free-for-all.
  • Use schema instead of guessing flags.

Sample Command Set For Agents

If you want a small, practical agent allowlist:

pubchem --enable-commands compound,substance,assay,refs,view,raw,resolve,identity,batch,export,schema,agent,completion compound search aspirin --max 1

That style keeps the command surface predictable while still covering most PubChem work.
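
To avoid repeating the allowlist on every call, it can live in one wrapper function. This is a sketch: pubchem_restricted and PUBCHEM_ALLOW are hypothetical names, and the narrower list here is just an example of a lookup-only surface.

```shell
# Hypothetical allowlist: only lookup-style command families.
PUBCHEM_ALLOW="resolve,identity,compound,schema,agent"

# Hypothetical wrapper: every call passes through the same allowlist.
# --enable-commands goes before the subcommand, as in the example above.
pubchem_restricted() {
  pubchem --enable-commands "$PUBCHEM_ALLOW" "$@"
}
```

With this in place, pubchem_restricted compound search aspirin --max 1 runs normally, while a call to a family outside the list should fail with the "command disabled" exit code.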

More Examples

Find a structure, then inspect it, then export a file:

pubchem resolve compound imatinib --json --results-only --max 1
pubchem compound get 5291 --synonyms --classification --json
pubchem export compound structure 5291 --record-type 3d --format sdf --out imatinib-3d.sdf

Trace a compound into the literature:

pubchem refs literature compound 5291 --max 20
pubchem refs patents compound 5291 --max 20

Move from an assay target to a hit list (replace 3364 with an AID returned by the search):

pubchem assay search EGFR --mode target --target-type genesymbol --max 20
pubchem assay results 3364 --outcome active --max 20

Inspect a PubChem record tree and then pull only one section:

pubchem view search compound 2244 Hazard --max 5
pubchem view section compound 2244 --heading "Safety and Hazards"

For New Users

If you only remember three commands, remember these:

pubchem resolve compound aspirin
pubchem compound get 2244
pubchem view search compound 2244 Safety

Everything else in the CLI builds from those patterns.