Merged
171 changes: 83 additions & 88 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,59 +1,46 @@
# Revenant

> Find the secrets they thought they deleted.
Recover deleted and force-pushed GitHub commits and scan them for live secrets.

[![CI](https://github.com/smakarim/revenant/actions/workflows/ci.yml/badge.svg)](https://github.com/smakarim/revenant/actions/workflows/ci.yml)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](LICENSE)
[![Go Report Card](https://goreportcard.com/badge/github.com/smakarim/revenant)](https://goreportcard.com/report/github.com/smakarim/revenant)

Revenant recovers **deleted, force-pushed, and dangling commits** from a targeted
GitHub org, user, or repo list — commits the developer believes are gone — scans
them with [TruffleHog](https://github.com/trufflesecurity/trufflehog), confirms which
secrets are still **live**, deduplicates them, correlates each back to the leaking
developer, and ranks findings by a transparent blast-radius score.
Revenant finds secrets in commits that are no longer reachable from any branch: deleted
commits, force-pushed history, and dangling objects. In the same run it also scans a repo's
reachable history, public gists, and, optionally, the whole of GitHub through code search.
Every candidate is verified with [TruffleHog](https://github.com/trufflesecurity/trufflehog),
deduplicated, attributed to the author who committed it, and ranked by a blast-radius score.

Revoking-and-deleting is **not** remediation: GitHub keeps the commit objects, and the
credential stays reachable until it is actually rotated. Revenant finds exactly those.
Deleting a leaked credential from a repository without rotating it is not enough. GitHub
keeps the commit objects, so the secret stays reachable by its SHA. Revenant is built to find those.

## Why

The mainstream secret scanners (TruffleHog, Gitleaks, GitHound) are excellent at scanning
the *current* state of repos you point them at. The deleted/force-pushed surface is proven
high-value (force-push leaks have paid out tens of thousands in bounties) yet remains
fragmented across slow, fiddly, credential-heavy tooling. Revenant makes that surface a
single zero-config command.
TruffleHog, Gitleaks, and GitHound scan the current state of repositories you already know
about. The deleted and force-pushed surface, where credentials get "removed" and forgotten,
is not covered by those tools. Revenant targets that surface and folds the ordinary history
scan into the same run, so you do not have to stitch several tools together.

## Features

- **Scans everything in one pass** — each repo's **reachable history** *and* its
**deleted/force-pushed commits**, plus **gists**, merged into one deduped report with a
`SOURCE` tag per finding (`history` / `deleted` / `gist` / `dork`). `--no-history` /
`--no-deleted` / `--no-gists` narrow it; `--verified-only` shows only live secrets.
- **Footprint discovery** — a user's gists are scanned by default; `--members` also enumerates
an org's members and scans their personal repos and gists.
- **GitHub-wide dorking** — `--dork` searches *all* of GitHub (Code Search) for the target's
leaks using a parameterized dork corpus, scanning each matched file (`SOURCE=dork`). Add
`--domain acme.com`, supply `--dork-file`, or cap with `--dork-max`. Requires `--tokens`.
- **Live-key intelligence** — `--analyze` enumerates what a *verified* GitHub token can actually
do (identity + scopes, e.g. `user:bob; scopes: repo, workflow, admin:org`), printed in a
"Key intelligence" section. Turns "it's live" into blast radius.
- **Keyless by default** — `revenant --org acme` runs with no signup, no API keys, no
subscription (waybackurls/gau ergonomics). A GitHub PAT is optional and only adds speed.
- **Tiered discovery** — the repo **activity log** (`force_push`/`branch_deletion`
entries carry the overwritten `before` SHA — precise and immediate) → **Events API**
(keyless fallback) → optional CFOR short-SHA brute-force (`--bruteforce`).
- **Live verification** — wraps TruffleHog's verified detectors; each unique secret is
verified exactly once (content-addressed cache).
- **Developer correlation** — every finding is tied back to `org → repo → commit → author`,
with first/last-seen and distinct-author spread.
- **Blast-radius ranking** — transparent, tunable scoring (verified status × secret-type
severity × spread) so you triage the org-admin PAT before the read-only token.
- **Rate-limit resilient** — the shared HTTP client treats `403`/secondary-limit responses
as *throttle, not failure*, with adaptive backoff and round-robin across multiple PATs.
- Scans reachable history, deleted and force-pushed commits, and gists in one run. Every
finding carries a SOURCE tag (history, deleted, gist, or dork).
- Discovers a wider footprint: a user's gists by default, and with `--members` an org's
members along with their personal repos and gists.
- GitHub-wide dorking (`--dork`) searches all of GitHub with a code-search dork corpus and
scans each matching file. A custom dork file and a domain term are supported.
- Live-key intelligence (`--analyze`) reports a verified GitHub token's identity and scopes,
so you know what the key can actually do.
- Verifies each secret with TruffleHog and caches the result, so a repeated secret is checked
once.
- Attributes every finding to org, repo, commit, and author, with first and last seen.
- Ranks findings by a transparent score based on verified status, secret type, and spread.
- Handles GitHub rate limits with adaptive backoff and rotation across multiple tokens.

## Install

Requires **Go 1.22+** and [TruffleHog](https://github.com/trufflesecurity/trufflehog) on your `PATH`.
Requires Go 1.22 or newer and TruffleHog on your PATH.

```bash
go install github.com/smakarim/revenant/cmd/revenant@latest
@@ -70,86 +57,94 @@ go build -o revenant ./cmd/revenant
## Usage

```bash
revenant --org acme # keyless, zero config
revenant --org acme --tokens t1,t2 # faster: parallelize across PATs
revenant --repos acme/web,acme/api # explicit repos (brute-force discovery)
revenant --user bob --min-score 50 # only high-blast-radius findings
revenant --org acme -o findings.json # machine-readable JSON output
revenant --org acme --tokens TOKEN
revenant --user bob --tokens TOKEN --min-score 50
revenant --repos acme/web,acme/api --tokens TOKEN
revenant --org acme --tokens TOKEN -o findings.json
revenant --org acme --tokens TOKEN --dork --analyze --verified-only
```

A token is recommended. The activity log used to find deleted commits needs an authenticated
request, and code search requires one. Any free personal access token works. Without a token,
discovery falls back to the slower public Events API.

### Example output

```
SCORE TYPE STATUS AUTHORS FIRST_COMMIT
172 AWS VERIFIED 2 deadbeef
60 GitHub VERIFIED 1 0a1b2c3d
22 Stripe unverified 1 feedface
SCORE TYPE STATUS SOURCE AUTHORS FIRST_COMMIT
172 AWS VERIFIED deleted 2 03e0f0f5d25c
60 Github VERIFIED history 1 32eaf5a4af9e
22 Stripe unverified gist 1 a1b2c3d4e5f6
```

With `--analyze`, verified GitHub tokens get a Key intelligence section:

```
Key intelligence:
[Github] user:bob; scopes: repo, workflow, admin:org
```
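
One way to obtain that identity and scope line is sketched below. It is an assumption about the approach, not a description of what `--analyze` does internally: it calls GitHub's `GET /user` endpoint with the token, reads `login` from the JSON body, and reads the `X-OAuth-Scopes` response header, which GitHub returns for classic personal access tokens (fine-grained tokens may not report scopes this way). The names `describeToken` and `formatIntel`, and the `(none reported)` placeholder, are hypothetical.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strings"
	"time"
)

// formatIntel renders a login and a raw X-OAuth-Scopes header value in the
// "user:<login>; scopes: a, b" shape shown in the example output.
func formatIntel(login, scopesHeader string) string {
	var scopes []string
	for _, s := range strings.Split(scopesHeader, ",") {
		if s = strings.TrimSpace(s); s != "" {
			scopes = append(scopes, s)
		}
	}
	if len(scopes) == 0 {
		// Placeholder wording is an assumption for this sketch.
		return fmt.Sprintf("user:%s; scopes: (none reported)", login)
	}
	return fmt.Sprintf("user:%s; scopes: %s", login, strings.Join(scopes, ", "))
}

// describeToken asks the GitHub API who the token belongs to and which
// scopes it reports. Only use it on tokens you are authorized to inspect.
func describeToken(ctx context.Context, client *http.Client, token string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://api.github.com/user", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/vnd.github+json")

	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET /user: unexpected status %s", resp.Status)
	}

	var user struct {
		Login string `json:"login"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&user); err != nil {
		return "", err
	}
	return formatIntel(user.Login, resp.Header.Get("X-OAuth-Scopes")), nil
}

func main() {
	token := os.Getenv("GITHUB_TOKEN")
	if token == "" {
		fmt.Println("set GITHUB_TOKEN to a token you are authorized to inspect")
		return
	}
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	out, err := describeToken(ctx, http.DefaultClient, token)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(out)
}
```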

### Flags

| Flag | Description |
|------|-------------|
| `--org` | Target organization (all its repos) |
| `--user` | Target user (all their repos) |
| `--org` | Target organization (all of its repos) |
| `--user` | Target user (all of their repos) |
| `--repos` | Explicit `owner/name` repos, comma-separated |
| `--repos-file` | File of newline-delimited `owner/name` repos |
| `--tokens` | Optional GitHub PAT(s), comma-separated — speed only |
| `--tokens` | GitHub personal access tokens, comma-separated |
| `--min-score` | Hide findings below this blast-radius score |
| `--bruteforce` | Also brute-force short SHAs (slow keyless fallback; off by default) |
| `--no-history` | Skip scanning each repo's reachable history |
| `--no-deleted` | Skip mining deleted/force-pushed commits |
| `--verified-only` | Report only secrets confirmed live |
| `--members` | Also enumerate org members and scan their personal repos + gists |
| `--no-deleted` | Skip mining deleted and force-pushed commits |
| `--no-gists` | Skip scanning gists (on by default) |
| `--members` | Enumerate org members and scan their personal repos and gists |
| `--bruteforce` | Brute-force short SHAs (slow keyless fallback; off by default) |
| `--dork` | Search all of GitHub for the target's leaks (requires `--tokens`) |
| `--domain` | Extra dork search term (e.g. a company domain) |
| `--dork-file` | Custom dork templates (`{term}`/`{domain}`); overrides built-in corpus |
| `--domain` | Extra dork search term, such as a company domain |
| `--dork-file` | Custom dork templates using `{term}` and `{domain}` |
| `--dork-max` | Cap on total dork hits scanned (default 200) |
| `--analyze` | Enumerate capabilities of verified keys (GitHub tokens) |
| `--analyze` | Report capabilities of verified keys (GitHub tokens) |
| `--verified-only` | Report only secrets confirmed live |
| `-o, --output` | Write JSON findings to a file |

> **Tokens:** the activity-log tier (the precise, immediate one) needs a token with repo
> read access — pass any free PAT via `--tokens`. Without a token, discovery falls back to
> the slower keyless Events API and `--bruteforce`.

## How it works

```
target discover fetch detect validate correlate report
target -> discover -> fetch -> detect -> validate -> correlate -> report
```

1. **Discover** deleted commits: read the repo activity log for `force_push` /
`branch_deletion` entries and take their `before` SHA (the now-unreachable commit);
fall back to the Events API, or `--bruteforce` short-SHA probing.
2. **Fetch** each dangling commit via the GitHub commits API (resolves commits by SHA
even when they're unreachable from any branch).
3. **Detect** secrets with TruffleHog over the recovered diff.
4. **Validate** each unique secret once — is it still live?
5. **Correlate** + score, then **report** as a ranked table or JSON.

## Limitations (v1)

- **Recent-history window.** The activity log and Events API cover roughly the last ~90 days
/ a few hundred events per repo. Older force-pushes need a GH Archive / BigQuery backfill,
which is out of scope for the keyless default (the `archive-files` parser exists behind an
interface for a future downloader).
- **Serial execution.** Repos and commits are processed one at a time. Large targets will be
slow; concurrency is planned.
- **Brute-force is a slow opt-in fallback.** `--bruteforce` short-SHA probing is rate-limited
by GitHub and capped (4096 probes/repo). Prefer a token + the activity tier for real coverage.
1. Discover deleted commits from the repo activity log. `force_push` and `branch_deletion`
entries carry the overwritten `before` SHA. Discovery falls back to the Events API, or to
short-SHA probing with `--bruteforce`.
2. Fetch each commit through the GitHub commits API, which resolves commits by SHA even when
they are unreachable from any branch.
3. Detect secrets with TruffleHog over the recovered diff, as well as over reachable history,
gists, and dork hits.
4. Validate each unique secret once to check whether it is still live.
5. Correlate, score, and report the results as a ranked table or as JSON.

## Limitations

- Recent window. The activity log and Events API cover roughly the last 90 days and a few
hundred events per repo. Older force-pushes would need a GH Archive backfill, which is not
built yet (the parser exists behind an interface for it).
- Serial execution. Repos, commits, gists, and dork queries are processed one at a time, so
large targets are slow. Concurrency is planned.
- Brute-force is a slow, opt-in fallback. It is rate-limited by GitHub and capped per repo.
Prefer a token and the activity tier.
- Live-key intelligence covers GitHub tokens. Other key types are not analyzed yet.

## Contributing

Issues and PRs welcome. Please keep changes test-covered (`go test ./...`) and gofmt-clean;
CI enforces both. Use only synthetic or revoked credentials in fixtures — never commit a live
secret.
Issues and pull requests are welcome. Keep changes covered by tests (`go test ./...`) and
gofmt-clean; CI enforces both. Use only synthetic or revoked credentials in test fixtures.

## Legal

For authorized security testing and your own assets only. You are responsible for complying
with the terms of service of any platform you target and with applicable law.
For authorized security testing only, on assets you own or are permitted to test. You are
responsible for complying with the terms of service of any platform you target and with
applicable law.

## License

[GNU GPL v3](LICENSE) © Revenant contributors.
GNU GPL v3. See [LICENSE](LICENSE).
2 changes: 1 addition & 1 deletion internal/detect/detect.go
@@ -14,7 +14,7 @@ import (
)

// Scanner runs TruffleHog over a directory and returns its raw JSON output lines.
// verify toggles credential verification (detect calls with verify=false that's
// verify toggles credential verification (detect calls with verify=false, since that's
// validate's job, deduped). Injected for tests.
type Scanner interface {
Scan(ctx context.Context, dir string, verify bool) ([][]byte, error)
2 changes: 1 addition & 1 deletion internal/discover/activity.go
@@ -11,7 +11,7 @@ import (
// ActivityAPI discovers dangling commits from a repo's activity log
// (GET /repos/{owner}/{repo}/activity). Unlike the events firehose this is
// immediate and precise: "force_push" and "branch_deletion" entries carry the
// overwritten `before` SHA exactly the now-unreachable commit where deleted
// overwritten `before` SHA, exactly the now-unreachable commit where deleted
// secrets live. Requires a token (the activity endpoint needs repo read access);
// EventsAPI is the keyless fallback.
type ActivityAPI struct {
2 changes: 1 addition & 1 deletion internal/discover/events.go
@@ -14,7 +14,7 @@ type Getter interface {
}

// EventsAPI discovers candidate dangling commits from a repo's recent public
// events. Each PushEvent payload carries a `before` SHA the prior ref tip. For a
// events. Each PushEvent payload carries a `before` SHA, the prior ref tip. For a
// force-push or branch delete that commit is no longer reachable from any branch
// but is still fetchable by SHA, which is exactly where deleted secrets hide.
//
4 changes: 2 additions & 2 deletions internal/githubclient/ratelimit.go
@@ -9,8 +9,8 @@ import (

// IsThrottle reports whether resp is a rate-limit/throttle response that should
// be retried rather than treated as a hard failure. This is the #3658 fix: a 403
// with remaining==0, or any Retry-After, or a 429, is throttling — NOT "forbidden"
// and NOT "secret unverified".
// with remaining==0, or any Retry-After, or a 429, is throttling, not "forbidden"
// and not "secret unverified".
func IsThrottle(resp *http.Response) bool {
if resp == nil {
return false
2 changes: 1 addition & 1 deletion internal/scan/history.go
@@ -44,7 +44,7 @@ type thGitResult struct {

// cloneURL builds the HTTPS clone URL, embedding the token when present. GitHub
// PAT auth over HTTPS needs a username, so the token goes in the password position
// behind the conventional "x-access-token" user a bare "<token>@host" makes git
// behind the conventional "x-access-token" user, since a bare "<token>@host" makes git
// treat the token as a username with no password and the clone fails.
func cloneURL(repo model.RepoRef, token string) string {
if token != "" {
2 changes: 1 addition & 1 deletion internal/validate/adapter.go
@@ -12,7 +12,7 @@ import (

// NewTruffleHogVerifier verifies one secret by writing it to a temp file and
// running the verifying scanner. It returns true only when TruffleHog reports the
// credential as actually live (Verified=true) — NOT merely detected. (A secret
// credential as actually live (Verified=true), not merely detected. (A secret
// that is detected but dead, e.g. a revoked or fake key, yields false.)
func NewTruffleHogVerifier(s detect.Scanner) Verifier {
return func(ctx context.Context, c model.Candidate) (bool, error) {