syllabus

Local CLI study index over the AI-security PDF corpus.

Features • Installation • Corpus • Use • NICE 541 • Recon mode • Scope

syllabus indexes 981 PDFs into a single SQLite database: NICE 541 career-pathway PDFs (the DoD cyber workforce framework), 5 University of Illinois AI-secure courses (cs307, cs442, cs562, cs598, cs598-fall2020), and 260 USENIX and NDSS papers. The CLI runs BM25 search, lists papers by topic across a hand-curated 21-topic taxonomy, and maps 36 NICE 541 KSAs onto the AI-security papers that operationalize each KSA in an ML context.

The same corpus drives recon: signals.py, exfil.py, expand.py, scope.py, and probe.py chain together so the literature directly produces a target list. The first end-to-end exposed-LLM finding from that chain (case-study-syllabus-vllm-sweep.md) surfaced four verified unauthenticated (UNAUTH) OpenAI-compatible serving endpoints, one of which was already being mass-scanned by attackers.

Features

BM25 (k1=1.5, b=0.75) full-text search over the indexed corpus
21 hand-curated topics. A paper inherits a topic when its keyword count crosses 2
36 NICE 541 KSA-to-paper maps, including four AI-extension KSAs (K_AI_ADV, K_AI_POI, K_AI_PRIV, K_AI_FED)
First 15 pages indexed per PDF (abstract, intro, problem setup, enough body for topic tagging without index bloat)
Idempotent ingest: cache hits skip re-extraction; --reindex forces a clean re-pass
Recon-mode scripts read the corpus to extract IP literals, GitHub and HuggingFace pivots, ports, endpoints, and defaults
Single SQLite DB at ~/syllabus/syllabus.db. No service to run. No network calls inside search, topics, or ksa
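The Features list fixes only the BM25 parameters. For orientation, the textbook Okapi BM25 score with k1=1.5 and b=0.75 is sketched below. This is not the scoring code in syllabus.py: the tokenizer, the smoothed IDF variant, and counting each distinct query term once are all assumptions.

```python
import math
import re
from collections import Counter

K1 = 1.5   # term-frequency saturation (value from the Features list)
B = 0.75   # document-length normalization (value from the Features list)


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; an assumed, deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = K1, b: float = B) -> list[float]:
    """Return one Okapi BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: number of docs containing each term at least once.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in set(tokenize(query)):  # each distinct query term counted once (assumption)
            if term not in tf:
                continue
            # Smoothed IDF (Lucene-style), always positive (assumed variant).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * dl / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Sorting document indices by these scores gives a ranked list analogous to `syllabus search` output; shorter documents get a boost through the `b` term when they contain the same query terms.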

Installation

Requires pdftotext (poppler-utils) and Python 3.10+.

git clone https://github.com/nuclide-research/syllabus.git ~/syllabus
ln -sf ~/syllabus/syllabus.py ~/.local/bin/syllabus
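Before the first `syllabus ingest`, a short standard-library check can confirm the prerequisites above. This is a convenience sketch, not part of the repository; the `~/.local/bin` PATH test is an assumption about your shell setup, since the symlink only helps if that directory is on PATH.

```python
import os
import shutil
import sys


def check_prereqs() -> list[str]:
    """Return human-readable problems; an empty list means the prerequisites look met."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    if shutil.which("pdftotext") is None:
        problems.append("pdftotext not found on PATH (install poppler-utils)")
    # Simple string check: the symlinked `syllabus` resolves only if ~/.local/bin is on PATH.
    local_bin = os.path.expanduser("~/.local/bin")
    if local_bin not in os.environ.get("PATH", "").split(os.pathsep):
        problems.append(f"{local_bin} is not on PATH, so `syllabus` may not resolve")
    return problems


if __name__ == "__main__":
    for problem in check_prereqs():
        print("warning:", problem)
```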

Corpus layout

Hard-coded in CORPORA at the top of syllabus.py. Defaults:

~/Documents/dod-cyber-pathways         (NICE 541 work-role PDFs)
~/Documents/cs307-aisecure
~/Documents/cs442-aisecure
~/Documents/cs562-aisecure
~/Documents/cs598-aisecure
~/Documents/cs598-fall2020-aisecure

Point CORPORA at whatever folders you have. Each value is a filesystem path; the key is the corpus label shown in search output.
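A `CORPORA` mapping of that shape, plus a walk that collects the PDFs under each folder, might look like the sketch below. The labels are hypothetical, modeled on the `nice-541` and `cs562` labels that appear in `syllabus ksa` output, and the recursive walk is an assumption; check the top of syllabus.py for the real keys and behavior.

```python
from pathlib import Path

# Hypothetical shape: corpus label -> folder. Labels are illustrative, not the real keys.
CORPORA = {
    "nice-541": Path("~/Documents/dod-cyber-pathways").expanduser(),
    "cs307": Path("~/Documents/cs307-aisecure").expanduser(),
    "cs442": Path("~/Documents/cs442-aisecure").expanduser(),
    "cs562": Path("~/Documents/cs562-aisecure").expanduser(),
    "cs598": Path("~/Documents/cs598-aisecure").expanduser(),
    "cs598-fall2020": Path("~/Documents/cs598-fall2020-aisecure").expanduser(),
}


def iter_pdfs(corpora: dict[str, Path]):
    """Yield (label, pdf_path) for every PDF under each corpus folder.

    Missing folders are skipped rather than treated as errors; subfolders are
    walked recursively (an assumption).
    """
    for label, root in corpora.items():
        if not root.is_dir():
            continue
        for pdf in sorted(root.rglob("*")):
            if pdf.is_file() and pdf.suffix.lower() == ".pdf":
                yield label, pdf
```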

Use

syllabus ingest                                          # extract + index every PDF (idempotent)
syllabus ingest --reindex                                # re-extract from scratch

syllabus search "certified robustness randomized smoothing" -n 10
syllabus topics                                          # all topics + counts
syllabus topics backdoor -n 8                            # papers tagged backdoor
syllabus ksa K0342                                       # NICE 541 KSA -> corpus papers
syllabus ksa                                             # every KSA, all at once
syllabus brief certified-robustness -n 5
syllabus stats
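`syllabus topics` relies on the tagging rule from the Features list: a paper inherits a topic once its keyword count crosses 2. A minimal reading of that rule is sketched below. The keyword lists are made up (the real 21-topic taxonomy is not reproduced here), and treating "crosses 2" as "at least 2 whole-word hits" is an assumption.

```python
import re

# Hypothetical topic -> keyword lists, for illustration only.
TOPICS = {
    "backdoor": ["backdoor", "trojan", "trigger"],
    "certified-robustness": ["certified", "randomized smoothing", "provable"],
    "poisoning": ["poisoning", "clean-label", "poison"],
}


def keyword_hits(text: str, keywords: list[str]) -> int:
    """Count whole-word (or whole-phrase) keyword occurrences, case-insensitively."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(kw.lower()) + r"\b", lowered))
        for kw in keywords
    )


def assign_topics(text: str, topics: dict[str, list[str]] = TOPICS, min_hits: int = 2) -> list[str]:
    """Return every topic whose hit count reaches min_hits (assumed meaning of 'crosses 2')."""
    return sorted(t for t, kws in topics.items() if keyword_hits(text, kws) >= min_hits)
```

With this rule, a paper that mentions a backdoor, its trigger, and a trojan once each is tagged `backdoor`, while a single passing mention of poisoning is not enough for `poisoning`.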

Storage

Path	Contents
`~/syllabus/syllabus.db`	SQLite index
`~/syllabus/extracted/*.txt`	per-PDF text cache, sha1-named

Reindex is safe. Deletes happen by doc_id, so the prior cache is reused.
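The ingest path described above (first 15 pages, sha1-named text cache, cache hits skipping extraction) can be approximated with poppler's `pdftotext -f 1 -l 15`. This is a sketch, not syllabus.py's code: hashing the PDF's bytes for the cache name is an assumption (the README does not say what is hashed), the `force` flag is a hypothetical stand-in for a forced re-extraction, and the `doc_id` bookkeeping in SQLite is omitted.

```python
import hashlib
import subprocess
from pathlib import Path

CACHE_DIR = Path("~/syllabus/extracted").expanduser()
MAX_PAGES = 15  # only the first 15 pages of each PDF are indexed


def cache_path(pdf: Path, cache_dir: Path = CACHE_DIR) -> Path:
    """Cache file named by the sha1 of the PDF's bytes (the hashed input is an assumption)."""
    digest = hashlib.sha1(pdf.read_bytes()).hexdigest()
    return cache_dir / f"{digest}.txt"


def pdftotext_cmd(pdf: Path, out: Path, max_pages: int = MAX_PAGES) -> list[str]:
    """poppler-utils invocation: -f and -l select the first and last page to convert."""
    return ["pdftotext", "-f", "1", "-l", str(max_pages), str(pdf), str(out)]


def extract(pdf: Path, force: bool = False, cache_dir: Path = CACHE_DIR) -> str:
    """Return page-limited text, running pdftotext only on a cache miss or when forced."""
    out = cache_path(pdf, cache_dir)
    if out.exists() and not force:
        return out.read_text(errors="replace")  # cache hit: skip re-extraction
    cache_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdftotext_cmd(pdf, out), check=True)
    return out.read_text(errors="replace")
```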

NICE 541 mapping

The KSA bridge is the load-bearing part. Each NICE 541 KSA is mapped to a keyword set drawn from the official career-pathway PDF. syllabus ksa <id> returns BM25-ranked corpus matches for that KSA, scoring the NICE pathway docs and the AI-security paper corpus together. The result is a side-by-side reading list: the work-role doc that defines the KSA next to the AI-security papers that operationalize it.

$ syllabus ksa K0177
=== K0177 - cyber attack stages (recon/scanning/etc.) ===
   27.83  [nice-541  ] 531 Cyber Defense Incident Responder
   27.13  [nice-541  ] 541 Vulnerability Assessment Analyst Career Pathway
   24.52  [nice-541  ] 511 Cyber Defense Analyst Career Pathway
   20.79  [nice-541  ] 212 Cyber Defense Forensics Analyst Career Pathway
   10.01  [cs562     ] Poison Frogs! Targeted Clean-Label Poisoning

The work-role doc names the kill-chain framing the workforce uses. The Poison Frogs paper is what that kill chain looks like inside an ML pipeline. Same KSA, two operating substrates.
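Under that mapping, a KSA lookup reduces to one BM25 query built from the KSA's keyword set, with results printed in the `score [label] title` columns of the K0177 output. The keyword list and the `search` callable below are hypothetical stand-ins for whatever syllabus.py actually stores and calls; only the output layout is taken from the example.

```python
from typing import Callable, Iterable

# Hypothetical keyword set for one KSA; the real sets are drawn from the NICE pathway PDFs.
KSA_KEYWORDS = {
    "K0177": ["reconnaissance", "scanning", "enumeration", "exploitation", "attack stages"],
}

# Hypothetical search backend: (query, n) -> (score, corpus_label, title), best first.
SearchFn = Callable[[str, int], Iterable[tuple[float, str, str]]]


def ksa_report(ksa_id: str, description: str, search: SearchFn, n: int = 5) -> str:
    """Render one KSA's reading list in the column layout of `syllabus ksa` output."""
    query = " ".join(KSA_KEYWORDS[ksa_id])  # one BM25 query over the whole keyword set
    lines = [f"=== {ksa_id} - {description} ==="]
    for score, label, title in search(query, n):
        # 8-wide score, label padded to 10 inside brackets, then the document title.
        lines.append(f"{score:8.2f}  [{label:<10}] {title}")
    return "\n".join(lines)
```

Because NICE pathway docs and papers share one ranked list, the work-role docs and the ML papers interleave purely by score, which is what produces the side-by-side reading list.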

Recon mode (corpus as the brain)

syllabus is more than a study index. The corpus describes the threat model the field is actively researching, and that intelligence drives recon in the wild. The scripts at the repo root chain together:

signals.py   -> rank AI/ML platforms by paper-mention count;
                extract corpus-described ports / endpoints / defaults

exfil.py     -> pull every cited IP literal + non-standard host:port
exfil2.py    -> pull every cited GitHub / HuggingFace / Replicate / etc.
                second-hop pivot surface

expand.py    -> pull a recent AI-security paper corpus from the arXiv API
                to keep the brain current

scope.py     -> turn extracted citations into an operator-authorizable
                scope sheet (checkbox per target)

probe.py     -> read scope.md, run passive recon (rDNS, whois, crt.sh,
                HTTP HEAD) on every [x] row; --active adds TCP banners
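The first hops of that chain (pull IP literals and non-standard host:port pairs out of extracted text, write them to a checkbox scope sheet, then act only on rows the operator has ticked) might look roughly like the sketch below. The regexes, the `- [ ] target` row format, the "standard ports" set, and the filter that drops private and loopback addresses are all assumptions, not what exfil.py, scope.py, or probe.py actually do.

```python
import ipaddress
import re

IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# host:port with an explicit port; the hostname pattern is deliberately loose.
HOSTPORT_RE = re.compile(
    r"\b((?:[a-z0-9-]+\.)+[a-z]{2,}|(?:\d{1,3}\.){3}\d{1,3}):(\d{2,5})\b", re.IGNORECASE
)
STANDARD_PORTS = {80, 443}  # assumption: only these count as "standard"


def extract_targets(text: str) -> list[str]:
    """Return public IP literals and non-standard host:port pairs, deduplicated in order."""
    found: list[str] = []
    for m in IP_RE.finditer(text):
        try:
            ip = ipaddress.ip_address(m.group())
        except ValueError:
            continue  # not a valid address, e.g. 999.1.1.1
        if ip.is_global:  # assumption: private/loopback literals are not exposed hosts
            found.append(str(ip))
    for host, port in HOSTPORT_RE.findall(text):
        if 0 < int(port) <= 65535 and int(port) not in STANDARD_PORTS:
            found.append(f"{host.lower()}:{port}")
    return list(dict.fromkeys(found))


def scope_sheet(targets: list[str]) -> str:
    """One unticked checkbox row per target; the operator ticks what is authorized."""
    return "\n".join(f"- [ ] {t}" for t in targets)


def authorized_rows(scope_md: str) -> list[str]:
    """Return targets on rows marked [x]; every other row is ignored."""
    rows = []
    for line in scope_md.splitlines():
        m = re.match(r"^\s*-\s*\[[xX]\]\s+(\S+)", line)
        if m:
            rows.append(m.group(1))
    return rows
```

Nothing is contacted at this stage; only rows an operator explicitly ticks ever reach the probing step.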

shodan/sweep.py applies the same pattern at scale. It reads a corpus-derived Shodan dork (vllm, sglang, "Triton Inference Server"), pulls all hits, and runs a single fingerprint GET per host (/v1/models for OpenAI-compatible engines, /v2 for Triton). Each host that answers the fingerprint without authentication is a real exposed inference endpoint.

Requires SHODAN_API_KEY to run.
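The per-host fingerprint step reduces to one GET and a classification of the reply. The sketch below omits the Shodan query entirely and is not sweep.py's code. It assumes `/v1/models` returns the OpenAI-style `{"object": "list", "data": [...]}` body and treats a JSON object with a `name` field from Triton's `/v2` server-metadata route as a hit; the plain-HTTP URL scheme is also an assumption. Point it only at hosts you own or have explicit written authorization to assess.

```python
import http.client
import json
import urllib.error
import urllib.request

FINGERPRINT_PATHS = {"openai": "/v1/models", "triton": "/v2"}


def classify(engine: str, status: int, body: bytes) -> str:
    """Map one fingerprint response to 'unauth', 'auth-required', or 'no-match'."""
    if status in (401, 403):
        return "auth-required"
    if status != 200:
        return "no-match"
    try:
        doc = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return "no-match"
    if engine == "openai":
        ok = isinstance(doc, dict) and isinstance(doc.get("data"), list)
    else:  # triton: server metadata carries a "name" field (assumed check)
        ok = isinstance(doc, dict) and "name" in doc
    return "unauth" if ok else "no-match"


def fingerprint(host: str, port: int, engine: str, timeout: float = 5.0) -> str:
    """Send exactly one GET to the engine's metadata route and classify the reply."""
    url = f"http://{host}:{port}{FINGERPRINT_PATHS[engine]}"
    req = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return classify(engine, resp.status, resp.read(65536))
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry a status code
        return classify(engine, err.code, b"")
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return "unreachable"
```

Reading only the model list or server metadata keeps the probe at metadata enumeration: no inference request is ever sent.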

Case studies

case-study-syllabus-vllm-sweep.md. First end-to-end exposed-AI finding produced by literature-derived asset discovery. Four UNAUTH OpenAI-compatible LLM serving endpoints verified, one already under active mass scanning with attacker-injected model entries.

Scope

syllabus sends a single fingerprint GET per target. No inference requests. No model uploads. No federation joins. An operator policy gate is welcome to block anything deeper. Enumerate metadata, do not exfiltrate: the names are the finding. Only run the recon scripts against hosts you own or have explicit written authorization to assess.

Our other projects

wardrobe — NICE Cybersecurity Workforce Framework as a wardrobe of atoms
tome — Technical OSINT Mining Engine, canonical platform corpus
aimap — AI/ML infrastructure fingerprint scanner
scanner — active-banner stage between passive discovery and deep enumeration
BARE — semantic exploit-module ranking over scanner findings

License

MIT. Part of the NuClide toolchain. Contact: nuclide-research.com
