diff --git a/README.md b/README.md
index 35c3491..f7c26cf 100644
--- a/README.md
+++ b/README.md
@@ -1,59 +1,46 @@
 # Revenant
 
-> Find the secrets they thought they deleted.
+Recover deleted and force-pushed GitHub commits and scan them for live secrets.
 
 [![CI](https://github.com/smakarim/revenant/actions/workflows/ci.yml/badge.svg)](https://github.com/smakarim/revenant/actions/workflows/ci.yml)
 [![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](LICENSE)
 [![Go Report Card](https://goreportcard.com/badge/github.com/smakarim/revenant)](https://goreportcard.com/report/github.com/smakarim/revenant)
 
-Revenant recovers **deleted, force-pushed, and dangling commits** from a targeted
-GitHub org, user, or repo list — commits the developer believes are gone — scans
-them with [TruffleHog](https://github.com/trufflesecurity/trufflehog), confirms which
-secrets are still **live**, deduplicates them, correlates each back to the leaking
-developer, and ranks findings by a transparent blast-radius score.
+Revenant finds secrets in commits that are no longer reachable from any branch: deleted
+commits, force-pushed history, and dangling objects. In the same run it also scans a repo's
+reachable history, public gists, and, optionally, the whole of GitHub through code search.
+Every candidate is verified with [TruffleHog](https://github.com/trufflesecurity/trufflehog),
+deduplicated, attributed to the author who committed it, and ranked by a blast-radius score.
 
-Revoking-and-deleting is **not** remediation: GitHub keeps the commit objects, and the
-credential stays reachable until it is actually rotated. Revenant finds exactly those.
+Revoking a leaked credential without rotating it is not enough. GitHub keeps the commit
+objects, so the secret stays reachable by its SHA. Revenant is built to find those.
 
 ## Why
 
-The mainstream secret scanners (TruffleHog, Gitleaks, GitHound) are excellent at scanning
-the *current* state of repos you point them at. The deleted/force-pushed surface is proven
-high-value (force-push leaks have paid out tens of thousands in bounties) yet remains
-fragmented across slow, fiddly, credential-heavy tooling. Revenant makes that surface a
-single zero-config command.
+TruffleHog, Gitleaks, and GitHound scan the current state of repositories you already know
+about. The deleted and force-pushed surface, where credentials get "removed" and forgotten,
+is not covered by those tools. Revenant targets that surface and folds the ordinary history
+scan into the same run, so you do not have to stitch several tools together.
 
 ## Features
 
-- **Scans everything in one pass** — each repo's **reachable history** *and* its
-  **deleted/force-pushed commits**, plus **gists**, merged into one deduped report with a
-  `SOURCE` tag per finding (`history` / `deleted` / `gist` / `dork`). `--no-history` /
-  `--no-deleted` / `--no-gists` narrow it; `--verified-only` shows only live secrets.
-- **Footprint discovery** — a user's gists are scanned by default; `--members` also enumerates
-  an org's members and scans their personal repos and gists.
-- **GitHub-wide dorking** — `--dork` searches *all* of GitHub (Code Search) for the target's
-  leaks using a parameterized dork corpus, scanning each matched file (`SOURCE=dork`). Add
-  `--domain acme.com`, supply `--dork-file`, or cap with `--dork-max`. Requires `--tokens`.
-- **Live-key intelligence** — `--analyze` enumerates what a *verified* GitHub token can actually
-  do (identity + scopes, e.g. `user:bob; scopes: repo, workflow, admin:org`), printed in a
-  "Key intelligence" section. Turns "it's live" into blast radius.
-- **Keyless by default** — `revenant --org acme` runs with no signup, no API keys, no
-  subscription (waybackurls/gau ergonomics). A GitHub PAT is optional and only adds speed.
-- **Tiered discovery** — the repo **activity log** (`force_push`/`branch_deletion`
-  entries carry the overwritten `before` SHA — precise and immediate) → **Events API**
-  (keyless fallback) → optional CFOR short-SHA brute-force (`--bruteforce`).
-- **Live verification** — wraps TruffleHog's verified detectors; each unique secret is
-  verified exactly once (content-addressed cache).
-- **Developer correlation** — every finding is tied back to `org → repo → commit → author`,
-  with first/last-seen and distinct-author spread.
-- **Blast-radius ranking** — transparent, tunable scoring (verified status × secret-type
-  severity × spread) so you triage the org-admin PAT before the read-only token.
-- **Rate-limit resilient** — the shared HTTP client treats `403`/secondary-limit responses
-  as *throttle, not failure*, with adaptive backoff and round-robin across multiple PATs.
+- Scans reachable history, deleted and force-pushed commits, and gists in one run. Every
+  finding carries a SOURCE tag (history, deleted, gist, or dork).
+- Discovers a wider footprint: a user's gists by default, and with `--members` an org's
+  members along with their personal repos and gists.
+- GitHub-wide dorking (`--dork`) searches all of GitHub with a code-search dork corpus and
+  scans each matching file. A custom dork file and a domain term are supported.
+- Live-key intelligence (`--analyze`) reports a verified GitHub token's identity and scopes,
+  so you know what the key can actually do.
+- Verifies each secret with TruffleHog and caches the result, so a repeated secret is checked
+  once.
+- Attributes every finding to org, repo, commit, and author, with first and last seen.
+- Ranks findings by a transparent score based on verified status, secret type, and spread.
+- Handles GitHub rate limits with adaptive backoff and rotation across multiple tokens.
 
 ## Install
 
-Requires **Go 1.22+** and [TruffleHog](https://github.com/trufflesecurity/trufflehog) on your `PATH`.
+Requires Go 1.22 or newer and TruffleHog on your PATH.
 
 ```bash
 go install github.com/smakarim/revenant/cmd/revenant@latest
@@ -70,86 +57,94 @@ go build -o revenant ./cmd/revenant
 ## Usage
 
 ```bash
-revenant --org acme                      # keyless, zero config
-revenant --org acme --tokens t1,t2       # faster: parallelize across PATs
-revenant --repos acme/web,acme/api       # explicit repos (brute-force discovery)
-revenant --user bob --min-score 50       # only high-blast-radius findings
-revenant --org acme -o findings.json     # machine-readable JSON output
+revenant --org acme --tokens TOKEN
+revenant --user bob --tokens TOKEN --min-score 50
+revenant --repos acme/web,acme/api --tokens TOKEN
+revenant --org acme --tokens TOKEN -o findings.json
+revenant --org acme --tokens TOKEN --dork --analyze --verified-only
 ```
 
+A token is recommended. The activity log used to find deleted commits needs an authenticated
+request, and code search requires one. Any free personal access token works. Without a token,
+discovery falls back to the slower public Events API.
+
 ### Example output
 
 ```
-SCORE  TYPE    STATUS      AUTHORS  FIRST_COMMIT
-172    AWS     VERIFIED    2        deadbeef
-60     GitHub  VERIFIED    1        0a1b2c3d
-22     Stripe  unverified  1        feedface
+SCORE  TYPE    STATUS      SOURCE   AUTHORS  FIRST_COMMIT
+172    AWS     VERIFIED    deleted  2        03e0f0f5d25c
+60     Github  VERIFIED    history  1        32eaf5a4af9e
+22     Stripe  unverified  gist     1        a1b2c3d4e5f6
+```
+
+With `--analyze`, verified GitHub tokens get a Key intelligence section:
+
+```
+Key intelligence:
+  [Github] user:bob; scopes: repo, workflow, admin:org
 ```
 
 ### Flags
 
 | Flag | Description |
 |------|-------------|
-| `--org` | Target organization (all its repos) |
-| `--user` | Target user (all their repos) |
+| `--org` | Target organization (all of its repos) |
+| `--user` | Target user (all of their repos) |
 | `--repos` | Explicit `owner/name` repos, comma-separated |
 | `--repos-file` | File of newline-delimited `owner/name` repos |
-| `--tokens` | Optional GitHub PAT(s), comma-separated — speed only |
+| `--tokens` | GitHub personal access tokens, comma-separated |
 | `--min-score` | Hide findings below this blast-radius score |
-| `--bruteforce` | Also brute-force short SHAs (slow keyless fallback; off by default) |
 | `--no-history` | Skip scanning each repo's reachable history |
-| `--no-deleted` | Skip mining deleted/force-pushed commits |
-| `--verified-only` | Report only secrets confirmed live |
-| `--members` | Also enumerate org members and scan their personal repos + gists |
+| `--no-deleted` | Skip mining deleted and force-pushed commits |
 | `--no-gists` | Skip scanning gists (on by default) |
+| `--members` | Enumerate org members and scan their personal repos and gists |
+| `--bruteforce` | Brute-force short SHAs (slow keyless fallback; off by default) |
 | `--dork` | Search all of GitHub for the target's leaks (requires `--tokens`) |
-| `--domain` | Extra dork search term (e.g. a company domain) |
-| `--dork-file` | Custom dork templates (`{term}`/`{domain}`); overrides built-in corpus |
+| `--domain` | Extra dork search term, such as a company domain |
+| `--dork-file` | Custom dork templates using `{term}` and `{domain}` |
 | `--dork-max` | Cap on total dork hits scanned (default 200) |
-| `--analyze` | Enumerate capabilities of verified keys (GitHub tokens) |
+| `--analyze` | Report capabilities of verified keys (GitHub tokens) |
+| `--verified-only` | Report only secrets confirmed live |
 | `-o, --output` | Write JSON findings to a file |
 
-> **Tokens:** the activity-log tier (the precise, immediate one) needs a token with repo
-> read access — pass any free PAT via `--tokens`. Without a token, discovery falls back to
-> the slower keyless Events API and `--bruteforce`.
-
 ## How it works
 
 ```
-target → discover → fetch → detect → validate → correlate → report
+target -> discover -> fetch -> detect -> validate -> correlate -> report
 ```
 
-1. **Discover** deleted commits: read the repo activity log for `force_push` /
-   `branch_deletion` entries and take their `before` SHA (the now-unreachable commit);
-   fall back to the Events API, or `--bruteforce` short-SHA probing.
-2. **Fetch** each dangling commit via the GitHub commits API (resolves commits by SHA
-   even when they're unreachable from any branch).
-3. **Detect** secrets with TruffleHog over the recovered diff.
-4. **Validate** each unique secret once — is it still live?
-5. **Correlate** + score, then **report** as a ranked table or JSON.
-
-## Limitations (v1)
-
-- **Recent-history window.** The activity log and Events API cover roughly the last ~90 days
-  / a few hundred events per repo. Older force-pushes need a GH Archive / BigQuery backfill,
-  which is out of scope for the keyless default (the `archive-files` parser exists behind an
-  interface for a future downloader).
-- **Serial execution.** Repos and commits are processed one at a time. Large targets will be
-  slow; concurrency is planned.
-- **Brute-force is a slow opt-in fallback.** `--bruteforce` short-SHA probing is rate-limited
-  by GitHub and capped (4096 probes/repo). Prefer a token + the activity tier for real coverage.
+1. Discover deleted commits from the repo activity log. `force_push` and `branch_deletion`
+   entries carry the overwritten `before` SHA. Discovery falls back to the Events API, or to
+   short-SHA probing with `--bruteforce`.
+2. Fetch each commit through the GitHub commits API, which resolves commits by SHA even when
+   they are unreachable from any branch.
+3. Detect secrets with TruffleHog over the recovered diff, as well as over reachable history,
+   gists, and dork hits.
+4. Validate each unique secret once to check whether it is still live.
+5. Correlate, score, and report the results as a ranked table or as JSON.
+
+## Limitations
+
+- Recent window. The activity log and Events API cover roughly the last 90 days and a few
+  hundred events per repo. Older force-pushes would need a GH Archive backfill, which is not
+  built yet (the parser exists behind an interface for it).
+- Serial execution. Repos, commits, gists, and dork queries are processed one at a time, so
+  large targets are slow. Concurrency is planned.
+- Brute-force is a slow, opt-in fallback. It is rate-limited by GitHub and capped per repo.
+  Prefer a token and the activity tier.
+- Live-key intelligence covers GitHub tokens. Other key types are not analyzed yet.
 
 ## Contributing
 
-Issues and PRs welcome. Please keep changes test-covered (`go test ./...`) and gofmt-clean;
-CI enforces both. Use only synthetic or revoked credentials in fixtures — never commit a live
-secret.
+Issues and pull requests are welcome. Keep changes covered by tests (`go test ./...`) and
+gofmt-clean; CI enforces both. Use only synthetic or revoked credentials in test fixtures.
 
 ## Legal
 
-For authorized security testing and your own assets only. You are responsible for complying
-with the terms of service of any platform you target and with applicable law.
+For authorized security testing and assets you own or are permitted to test. You are
+responsible for complying with the terms of service of any platform you target and with
+applicable law.
 
 ## License
 
-[GNU GPL v3](LICENSE) © Revenant contributors.
+GNU GPL v3. See [LICENSE](LICENSE).
diff --git a/internal/detect/detect.go b/internal/detect/detect.go
index 26c059c..d19b709 100644
--- a/internal/detect/detect.go
+++ b/internal/detect/detect.go
@@ -14,7 +14,7 @@ import (
 )
 
 // Scanner runs TruffleHog over a directory and returns its raw JSON output lines.
-// verify toggles credential verification (detect calls with verify=false — that's
+// verify toggles credential verification (detect calls with verify=false, that's
 // validate's job, deduped). Injected for tests.
 type Scanner interface {
 	Scan(ctx context.Context, dir string, verify bool) ([][]byte, error)
diff --git a/internal/discover/activity.go b/internal/discover/activity.go
index f267a90..be3f3da 100644
--- a/internal/discover/activity.go
+++ b/internal/discover/activity.go
@@ -11,7 +11,7 @@ import (
 // ActivityAPI discovers dangling commits from a repo's activity log
 // (GET /repos/{owner}/{repo}/activity). Unlike the events firehose this is
 // immediate and precise: "force_push" and "branch_deletion" entries carry the
-// overwritten `before` SHA — exactly the now-unreachable commit where deleted
+// overwritten `before` SHA, exactly the now-unreachable commit where deleted
 // secrets live. Requires a token (the activity endpoint needs repo read access);
 // EventsAPI is the keyless fallback.
 type ActivityAPI struct {
diff --git a/internal/discover/events.go b/internal/discover/events.go
index a6aaa38..317c1f7 100644
--- a/internal/discover/events.go
+++ b/internal/discover/events.go
@@ -14,7 +14,7 @@ type Getter interface {
 }
 
 // EventsAPI discovers candidate dangling commits from a repo's recent public
-// events. Each PushEvent payload carries a `before` SHA — the prior ref tip. For a
+// events. Each PushEvent payload carries a `before` SHA, the prior ref tip. For a
 // force-push or branch delete that commit is no longer reachable from any branch
 // but is still fetchable by SHA, which is exactly where deleted secrets hide.
 //
diff --git a/internal/githubclient/ratelimit.go b/internal/githubclient/ratelimit.go
index 57754e3..4fbdab7 100644
--- a/internal/githubclient/ratelimit.go
+++ b/internal/githubclient/ratelimit.go
@@ -9,8 +9,8 @@ import (
 
 // IsThrottle reports whether resp is a rate-limit/throttle response that should
 // be retried rather than treated as a hard failure. This is the #3658 fix: a 403
-// with remaining==0, or any Retry-After, or a 429, is throttling — NOT "forbidden"
-// and NOT "secret unverified".
+// with remaining==0, or any Retry-After, or a 429, is throttling, not "forbidden"
+// and not "secret unverified".
 func IsThrottle(resp *http.Response) bool {
 	if resp == nil {
 		return false
diff --git a/internal/scan/history.go b/internal/scan/history.go
index 2d422e9..cb3b61c 100644
--- a/internal/scan/history.go
+++ b/internal/scan/history.go
@@ -44,7 +44,7 @@ type thGitResult struct {
 
 // cloneURL builds the HTTPS clone URL, embedding the token when present. GitHub
 // PAT auth over HTTPS needs a username, so the token goes in the password position
-// behind the conventional "x-access-token" user — a bare "@host" makes git
+// behind the conventional "x-access-token" user, since a bare "@host" makes git
 // treat the token as a username with no password and the clone fails.
 func cloneURL(repo model.RepoRef, token string) string {
 	if token != "" {
diff --git a/internal/validate/adapter.go b/internal/validate/adapter.go
index 8998108..f99be77 100644
--- a/internal/validate/adapter.go
+++ b/internal/validate/adapter.go
@@ -12,7 +12,7 @@ import (
 
 // NewTruffleHogVerifier verifies one secret by writing it to a temp file and
 // running the verifying scanner. It returns true only when TruffleHog reports the
-// credential as actually live (Verified=true) — NOT merely detected. (A secret
+// credential as actually live (Verified=true), not merely detected. (A secret
 // that is detected but dead, e.g. a revoked or fake key, yields false.)
 func NewTruffleHogVerifier(s detect.Scanner) Verifier {
 	return func(ctx context.Context, c model.Candidate) (bool, error) {