This directory contains small shell and Python utilities that support a LaTeX manuscript workflow: merging PDFs, extracting figures/tables as EPS, listing or freezing LaTeX references, counting words, converting TeX to DOCX, uploading drafts, and augmenting BibTeX entries with DOIs.
All commands below assume you are running from the directory that contains these scripts.
Make scripts executable:
chmod +x *.sh
Common dependencies (install names vary by OS):
- TeX Live utilities: pdfjam, pdfjoin (often provided by the TeX Live package texlive-extra-utils or similar)
- Ghostscript: gs
- Pandoc: pandoc
- TeXcount: texcount
- Python 3 for DOI scripts, plus packages (see below)
Example install (Ubuntu-like):
sudo apt-get update
sudo apt-get install -y texlive-extra-utils ghostscript pandoc texcount python3-pip
Python packages for DOI scripts:
python3 -m pip install --user bibtexparser requests unidecode tqdm habanero
Quick dependency sanity checks:
command -v pdfjam
command -v pdfjoin
command -v gs
command -v pandoc
command -v texcount
python3 -c "import bibtexparser, requests, unidecode, tqdm; import habanero; print('python deps ok')"
Option 1: getauthorpdf.sh (pdfjam)
./getauthorpdf.sh main.pdf SI.pdf
# or, if your SI file is literally named SI.pdf
./getauthorpdf.sh main.pdf
Expected output:
authorpdf.pdf
Option 2: merge.sh (pdfjoin)
./merge.sh main.pdf SI.pdf
Expected output:
authorpdf.pdf
Notes:
- Both scripts produce authorpdf.pdf. The practical difference is pdfjam vs pdfjoin and local TeX Live behavior.
./geteps.sh figure.pdf
Expected output:
figure.eps
Example: your extended data PDF has multiple pages and you want pages 1, 3, and 5:
./getextfigaseps.sh extended_data.pdf 1 3 5
Expected outputs:
- ED_Table_1.eps (from page 1)
- ED_Table_2.eps (from page 3)
- ED_Table_3.eps (from page 5)
./getfigtaborder.sh main.aux
# or default:
./getfigtaborder.sh
Output:
- Prints lines mapping label keys to resolved numbers (and possibly a page field depending on your AUX format).
./getrefseq.sh main.tex
# or default:
./getrefseq.sh
Output:
- One reference key per line, unique, in first-seen order.
Prereq: compile your TeX file to generate an up-to-date .aux file (often compile twice):
pdflatex main.tex
pdflatex main.tex
Then create the static file:
./getstatictex.sh main.tex
Expected output:
main_static.tex
./wordcount.sh
Expected output:
nature med word count: intro+result+discussion: <N>
Prereq: ensure these files exist in the working directory:
- manuscript.tex
- vancouver-superscript.csl
- custom.docx
Then:
./todocx.sh
Expected output:
manuscript.docx
Default behavior uploads main.pdf as IPF.pdf:
./upload.sh
Upload a specific file:
./upload.sh paper.pdf
Expected:
- Local file IPF.pdf is created/overwritten
- Remote upload occurs to the configured destination
Prereq:
- Working sftp key-based auth is strongly recommended because the script runs in batch mode.
Option 1: API-based (Crossref REST via habanero)
python3 getdoi.py refs.bib
Expected output file naming quirk:
- The script uses .replace('.bib','new.bib')
- Example: refs.bib becomes refsnew.bib
Option 2: More aggressive, HTML scraping of Crossref guest query
python3 pydoi.py refs.bib
Expected output:
refs.bib_doi.bib (the script appends "_doi.bib" to the full input filename)
Goal: Combine main PDF and SI PDF into authorpdf.pdf using pdfjam (letter paper formatting).
Run:
./getauthorpdf.sh main.pdf
# uses SI.pdf as the SI default
Or:
./getauthorpdf.sh main.pdf Supplement.pdf
Output:
authorpdf.pdf
Failure modes:
- pdfjam: command not found
  Install TeX Live utilities (example above) and ensure TeX binaries are on PATH.
- Missing SI file
  Provide the second argument or name the SI file SI.pdf.
Goal: Merge two PDFs into authorpdf.pdf using pdfjoin.
Run:
./merge.sh main.pdf SI.pdf
Output:
authorpdf.pdf
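Failure modes:
- pdfjoin: command not found
  Install TeX Live extra utils. Note that recent pdfjam releases no longer include the pdfjoin wrapper, so it may be missing even with TeX Live installed; in that case use getauthorpdf.sh (pdfjam) instead.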
Goal: Convert one PDF into an EPS using Ghostscript (eps2write).
Run:
./geteps.sh figure.pdf
Output:
figure.eps
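Notes and caveats:
- Output name is computed by Bash substitution ${1/pdf/eps}, which replaces the first occurrence of "pdf" anywhere in the argument. Prefer filenames that end with .pdf and contain no other "pdf" substring.
- If the file is named my.figure.PDF (uppercase), it will not match the substitution, so the computed output name is identical to the input name. Rename the extension to lowercase .pdf.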
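The naming caveat above can be illustrated outside Bash. The following is a minimal Python sketch, not part of geteps.sh, that mimics the two parameter expansions ${1/pdf/eps} and ${1%.pdf}.eps; the function names are hypothetical.

```python
def eps_name_first_match(path: str) -> str:
    """Mimic Bash ${1/pdf/eps}: replace only the first 'pdf' substring."""
    return path.replace("pdf", "eps", 1)


def eps_name_suffix(path: str) -> str:
    """Mimic Bash ${1%.pdf}.eps: drop a trailing '.pdf', then append '.eps'."""
    stem = path[: -len(".pdf")] if path.endswith(".pdf") else path
    return stem + ".eps"


# Compare both strategies on a clean name, a path containing "pdf",
# and an uppercase extension.
for name in ("figure.pdf", "pdfs/figure.pdf", "my.figure.PDF"):
    print(name, "->", eps_name_first_match(name), "|", eps_name_suffix(name))
```

With the first-match strategy, pdfs/figure.pdf maps to epss/figure.pdf, which is why the suffix-based form is the safer choice.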
Troubleshooting:
gs: command not found
Install Ghostscript.
Goal: Extract specific pages from a multi-page PDF and convert each extracted page to EPS.
Run (example extracting pages 2 and 4):
./getextfigaseps.sh extended_data.pdf 2 4
What happens:
- Creates page2.pdf, converts to page2.eps, then renames to ED_Table_1.eps
- Creates page4.pdf, converts to page4.eps, then renames to ED_Table_2.eps
Outputs:
ED_Table_1.eps, ED_Table_2.eps, ... (numbered by argument order, not by page number)
Common gotchas:
- Page numbers are interpreted the way pdfjam expects them. In most TeX Live setups, page numbers start at 1.
- If you want output names that preserve the page number, you can adjust the script, but as written it uses sequential k.
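The numbering rule described above (sequential k by argument position) can be sketched in Python. This only models the naming, not the PDF extraction; both function names and the ED_Table_p<page>.eps pattern are hypothetical.

```python
def ed_output_names(pages):
    """Pair each requested page with its output name; k counts arguments."""
    return [(page, f"ED_Table_{k}.eps") for k, page in enumerate(pages, start=1)]


def ed_output_names_by_page(pages):
    """Hypothetical alternative: keep the page number in the file name."""
    return [(page, f"ED_Table_p{page}.eps") for page in pages]


print(ed_output_names([2, 4]))
print(ed_output_names_by_page([2, 4]))
```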
Goal: Inspect label numbering recorded in a LaTeX AUX file.
Default run:
./getfigtaborder.sh
# reads main.aux
Specify AUX explicitly:
./getfigtaborder.sh paper.aux
Output:
- Lines derived from \newlabel{...}{...} entries
- Intended to help you confirm the order and numbers of figure/table labels
Notes:
- AUX formats differ depending on packages and engines, so the extracted fields can vary.
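To show why the extracted fields can vary, here is a minimal Python sketch that parses \newlabel lines in both the plain form and the longer form written when hyperref is loaded. It is an illustration only, not what getfigtaborder.sh does; the regular expression and function name are assumptions, and labels whose number itself contains braces are not handled.

```python
import re

# Capture key, number, and page; any extra trailing fields are ignored.
NEWLABEL = re.compile(r"\\newlabel\{([^}]*)\}\{\{([^}]*)\}\{([^}]*)\}")


def parse_newlabels(aux_text):
    """Return (key, number, page) tuples in the order they appear."""
    return [m.groups() for m in NEWLABEL.finditer(aux_text)]


sample = r"""\relax
\newlabel{fig:overview}{{1}{3}}
\newlabel{tab:cohort}{{2}{5}{Cohort summary}{table.2}{}}
"""

for key, number, page in parse_newlabels(sample):
    print(key, number, page)
```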
Goal: Print unique \ref{...} keys in the order they appear in a TeX file.
Default run:
./getrefseq.sh
# reads main.tex
Specify TeX:
./getrefseq.sh paper.tex
Output:
- One label key per line, duplicates removed, order preserved.
Uses:
- Sanity check: confirm expected reference labels exist
- Quick inventory of referenced labels (figures, tables, sections)
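The "unique, first-seen order" behaviour can be sketched in Python as follows. This is an illustration under the assumption that only literal \ref{...} is matched; whether getrefseq.sh also picks up variants such as \eqref is not stated here. The function name is hypothetical.

```python
import re


def ref_keys_in_order(tex):
    """Return unique \\ref keys, preserving the order of first appearance."""
    keys = re.findall(r"\\ref\{([^}]*)\}", tex)
    # dict.fromkeys keeps insertion order and drops later duplicates.
    return list(dict.fromkeys(keys))


sample = r"See Fig.~\ref{fig:a} and Table~\ref{tab:b}; again \ref{fig:a}, Eq.~\eqref{eq:c}."
print("\n".join(ref_keys_in_order(sample)))
```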
Goal: Produce a static TeX file by replacing \ref{label} with its resolved number from the AUX file.
Run:
./getstatictex.sh main.tex
Prereq:
- main.aux must exist and be up to date. Compile the LaTeX file first:
pdflatex main.tex
pdflatex main.tex
Output:
main_static.tex
Important caveats:
- Replacement is done with plain sed and can over-replace if your label name appears as plain text elsewhere.
- The AUX filename is computed by ${FILE/tex/aux}, which can misbehave if tex appears earlier in the path. If you hit issues, hardcode AUX derivation or adjust the substitution.
- Always diff the result:
diff -u main.tex main_static.tex | head
Goal: Print a word count intended to approximate Nature Medicine counting for intro + results + discussion.
Run:
./wordcount.sh
Assumptions:
- The script runs texcount main_new.tex
- It then slices a fixed region of the texcount output and sums a field
If you rename your TeX file, update the script or create a symlink:
ln -sf yourfile.tex main_new.tex
./wordcount.sh
Goal: Convert a LaTeX manuscript to a Word document using Pandoc and a specific CSL citation style.
Inputs expected:
- manuscript.tex
- vancouver-superscript.csl
- custom.docx
Run:
./todocx.sh
Output:
manuscript.docx
Troubleshooting:
- If citations do not render, confirm --citeproc support in your Pandoc version (the flag was introduced in Pandoc 2.11; older versions use --filter pandoc-citeproc).
- If math renders poorly, check the target journal requirements and consider --mathml alternatives.
Goal: Upload a PDF to a fixed server path using sftp batch mode.
Default run:
./upload.sh
# uploads main.pdf as IPF.pdf
Specify input:
./upload.sh authorpdf.pdf
Behavior:
- Copies the selected PDF to IPF.pdf locally
- Uploads IPF.pdf via sftp to a hardcoded destination
Prereq:
- Key-based auth is typically required for non-interactive sftp workflows
Both scripts read BibTeX, fill missing DOI fields, then write a new BibTeX file.
Run:
python3 getdoi.py refs.bib
Behavior:
- For each entry missing doi, queries Crossref using title and takes the top hit (limit 1)
- Writes a new bib file
Output file naming:
refs.bib -> refsnew.bib (quirk of .replace('.bib','new.bib'))
Tip:
- If you want refs_new.bib, edit that output filename line.
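The naming quirk, and one possible replacement, can be seen in a few lines of Python. This is a sketch only: quirky_name reproduces the .replace('.bib','new.bib') expression quoted above, while suffix_name and its tag parameter are hypothetical and not taken from getdoi.py.

```python
from pathlib import Path


def quirky_name(path):
    """Reproduce the current behaviour: every '.bib' substring is replaced."""
    return path.replace(".bib", "new.bib")


def suffix_name(path, tag="_new"):
    """Hypothetical alternative: insert a tag before the final extension only."""
    p = Path(path)
    return str(p.with_name(f"{p.stem}{tag}{p.suffix}"))


for name in ("refs.bib", "my.bibliography.bib"):
    print(name, "->", quirky_name(name), "|", suffix_name(name))
```

Note that str.replace touches every occurrence, so a name such as my.bibliography.bib is rewritten in two places.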
Potential failure modes:
- If Crossref returns no items, the script can error (depending on current code). If it crashes on some entries, use pydoi.py or harden the empty-result handling.
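One way to harden the empty-result case is to check the item list before indexing it. The sketch below assumes the response is a dict shaped like the Crossref REST works payload (message, then items, each item carrying a DOI field); first_doi is a hypothetical helper, not a function in getdoi.py.

```python
def first_doi(response):
    """Return the DOI of the top hit, or None when there are no items."""
    items = response.get("message", {}).get("items") or []
    if not items:
        return None
    return items[0].get("DOI")


# Assumed usage with habanero (requires network access, so not run here):
#   from habanero import Crossref
#   response = Crossref().works(query=title, limit=1)
#   doi = first_doi(response)

print(first_doi({"message": {"items": [{"DOI": "10.1000/xyz123"}]}}))
print(first_doi({"message": {"items": []}}))
```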
Run:
python3 pydoi.py refs.bib
Behavior:
- Normalizes the title (ASCII, strips LaTeX-ish constructs)
- Tries multiple author last names to increase hit probability
- Extracts DOI by regex matching of doi.org/<...> in the HTML response
Output:
refs.bib_doi.bib
Tradeoffs:
- More aggressive on messy entries
- More brittle because it depends on Crossref HTML layout
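The regex-extraction step can be sketched as below. The pattern is an assumption chosen for illustration (a doi.org/ prefix followed by a 10.xxxx/suffix identifier), not the expression used in pydoi.py, and the sample HTML is invented.

```python
import re

# Capture the DOI after "doi.org/"; stop at whitespace, quotes, or angle brackets.
DOI_LINK = re.compile(r"""doi\.org/(10\.\d{4,9}/[^\s"'<>]+)""")


def extract_doi(html):
    """Return the first DOI found in the HTML text, or None."""
    match = DOI_LINK.search(html)
    return match.group(1) if match else None


sample_html = '<td><a href="https://doi.org/10.1000/xyz123">https://doi.org/10.1000/xyz123</a></td>'
print(extract_doi(sample_html))
```

Because the match depends on how links appear in the page, any change to the HTML layout can silently break extraction, which is the brittleness noted above.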
Shell scripts:
- getauthorpdf.sh
- merge.sh
- geteps.sh
- getextfigaseps.sh
- getfigtaborder.sh
- getrefseq.sh
- getstatictex.sh
- wordcount.sh
- todocx.sh
- upload.sh
Python scripts:
- getdoi.py
- pydoi.py
If you want to make these tools more robust, the highest-value changes are:
- geteps.sh: output naming via ${1%.pdf}.eps instead of ${1/pdf/eps}
- getstatictex.sh: replace only \ref{label} tokens (not raw label substrings) and use safer parsing of AUX entries
- getdoi.py: handle Crossref empty results without indexing [0]
- Harmonize DOI script output naming to *_doi.bib and *_new.bib
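For the getstatictex.sh suggestion, token-only replacement could look like the following Python sketch. It assumes the label-to-number mapping has already been read from the AUX file; freeze_refs and the numbers argument are hypothetical names, and unknown labels are deliberately left untouched.

```python
import re


def freeze_refs(tex, numbers):
    """Replace each \\ref{key} with its number; leave everything else alone."""

    def substitute(match):
        key = match.group(1)
        # Fall back to the original token when the key is not in the mapping.
        return numbers.get(key, match.group(0))

    return re.sub(r"\\ref\{([^}]*)\}", substitute, tex)


tex = r"Figure~\ref{fig:a} shows fig:a; see \ref{missing}."
print(freeze_refs(tex, {"fig:a": "1"}))
```

The bare text fig:a survives unchanged because only complete \ref{...} tokens are matched.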
This utility merges all BibTeX files in a directory into a single consolidated .bib file while removing duplicates using bibtool.
- Combine multiple .bib sources (Zotero exports, PubMed, manual entries, etc.)
- Normalize entries
- Remove duplicates based on:
  - BibTeX key (first pass)
  - Content similarity (title + author + year) (second pass)
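The two-pass idea can be modelled in Python as follows. This is an illustration of the concept only, not bibtool's algorithm: entries are assumed to be dicts with an "ID" key (as bibtexparser 1.x produces) plus title, author, and year fields, and norm, drop_duplicates, and dedupe are hypothetical names.

```python
import re


def norm(text):
    """Lowercase and reduce braces, punctuation, and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def drop_duplicates(entries, key_func):
    """Keep the first entry for each distinct key_func value."""
    seen, kept = set(), []
    for entry in entries:
        key = key_func(entry)
        if key not in seen:
            seen.add(key)
            kept.append(entry)
    return kept


def dedupe(entries):
    # Pass 1: identical BibTeX keys.
    by_key = drop_duplicates(entries, lambda e: e["ID"])
    # Pass 2: same normalized title + author + year under different keys.
    return drop_duplicates(
        by_key,
        lambda e: (norm(e.get("title", "")), norm(e.get("author", "")), e.get("year", "").strip()),
    )


entries = [
    {"ID": "smith2020", "title": "Deep Learning for Imaging", "author": "Smith, J.", "year": "2020"},
    {"ID": "smith2020", "title": "Deep Learning for Imaging", "author": "Smith, J.", "year": "2020"},
    {"ID": "Smith_2020_DL", "title": "{Deep} learning for imaging", "author": "Smith, J.", "year": "2020"},
    {"ID": "lee2019", "title": "Graph Methods", "author": "Lee, K.", "year": "2019"},
]
print([e["ID"] for e in dedupe(entries)])
```

Entries that lack all three content fields would collapse into one in the second pass, so a real implementation would need to guard against empty keys.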
Install bibtool:
sudo apt-get install bibtool
Check installation:
bibtool --version
File: merge_bibs.sh
Make executable:
chmod +x merge_bibs.sh
./merge_bibs.sh
Output:
merged.bib
./merge_bibs.sh ./refs final.bib
- Finds all .bib files in the specified directory
- Concatenates them into a temporary file
- Runs bibtool in two passes:
--duplicates=key
Removes entries with identical BibTeX keys.
--duplicates=field \
--duplicate.field=title \
--duplicate.field=author \
--duplicate.field=year
Removes entries that are the same paper but have different keys.
# Step 1: Merge raw bibliographies
./merge_bibs.sh ./bib_sources merged_raw.bib
# Step 2: Add DOIs
python3 getdoi.py merged_raw.bib
# OR
python3 pydoi.py merged_raw.bib
# Step 3: Re-deduplicate using DOI
./merge_bibs.sh . merged_clean.bib
- --duplicate.field=doi
- --normalize.fields=title
- --prefer.field=doi
- bibtool deduplication is heuristic for content-based matches
- Title normalization is not perfect across LaTeX vs Unicode sources
- Always inspect final output before submission:
less merged.bib
- Before todocx.sh
- After DOI enrichment (getdoi.py, pydoi.py)
- Before journal submission formatting