Developer documentation

Technical reference for BingeWatcher: architecture, env-var matrix, where every setting actually lives, the privacy-stack phases, and the optional OS / Tor extensions. For day-to-day usage, see the user-facing README.md.


Repository layout

One_Piece_Watcher/
├── README.md                        ← user-facing guide
├── .env.example                     ← copy to .env for power-user knobs
├── docs/
│   ├── developer-docs.md            ← you are here
│   ├── privacy.md                   ← privacy stack reference
│   ├── windows-hardening.md         ← Windows OS hardening guide
│   └── tor-bridges.md               ← obfs4 setup walkthrough
├── start_watching.bat               ← Windows launcher
├── scripts/os-hardening/
│   ├── README.md                    ← script reference
│   ├── Get-OsHardeningStatus.ps1    ← read-only audit (anybody)
│   ├── Apply-OsHardening.ps1        ← interactive wizard (admin)
│   └── modules/                     ← one .ps1 per section
├── runtime/
│   ├── README.md                    ← embedded-Python audit + rebuild
│   ├── python/                      ← embedded Python 3.12.10 + site-packages (selenium etc.)
│   ├── ffmpeg/                      ← auto-downloaded FFmpeg for screen-stream (gitignored, on-demand)
│   └── firefox/                     ← auto-downloaded portable Firefox (gitignored, first-launch)
├── vendor/wheels/                   ← bundled wheels for offline rebuild
└── SerienJunkie/                    ← bot code
    ├── bingewatcher.py                   ← main entry
    ├── bw/                          ← privacy + integrity + streaming modules
    │   ├── firefox_detect.py        ← 6-tier Firefox path detection
    │   ├── firefox_download.py      ← portable-Firefox auto-installer (bsdtar)
    │   ├── ffmpeg_download.py       ← portable-FFmpeg auto-installer (zipfile, gyan.dev)
    │   ├── screen_stream.py         ← FFmpeg lifecycle + health monitor
    │   ├── screen_broadcaster.py    ← MPEG-TS fan-out (1 producer → N HTTP subs)
    │   ├── cast_proxy.py            ← HTTP proxy + /screen/live.ts endpoint
    │   ├── audio_loopback.py        ← WASAPI loopback capture (via TCP)
    │   ├── process_guardian.py      ← Windows Job Object (kernel-level child kill)
    │   └── integrity.py             ← TOFU SHA-256 manifest for bundled bins
    ├── providers/                   ← per-site selectors (s.to, aniworld)
    ├── tor/                         ← Tor daemon + GeoIP + Pluggable Transports
    │   ├── tor.exe
    │   ├── PluggableTransports/     ← lyrebird (obfs4), conjure-client
    │   └── data/                    ← geoip, geoip6, runtime state
    ├── extensions/                  ← bundled .xpi files (NoScript)
    ├── geckodriver.exe              ← Selenium ↔ Firefox bridge
    └── torrc-bingewatcher.template  ← Tor config template

State files (progress.json, settings.json, bw.log, etc.) and the per-user Firefox profile (user.BingeWatcher/) are created at runtime and are covered by .gitignore, so the repository never ships prior-run data.


What this is, technically

A Selenium-driven Firefox session that:

  • Streams via Tor — SOCKS5 proxy + DNS-over-Tor + SafeSocks rule-set. Tor is mandatory; the bot refuses to start without it. Per-host bypass for s.to only (Cloudflare Turnstile rejects Tor exits on s.to; aniworld + hosters keep full Tor).
  • Hides automation tells: navigator.webdriver cloak; Canvas / WebGL / Audio jitter; Worker-context cloak; BiDi pre-load script installed via driver.script.add_preload_script.
  • Looks human — Bezier-curve mouse moves with Gaussian tremor, varied click latencies, inter-episode pauses, probabilistic NEWNYM rotation (~70% of episode boundaries, jittered).
  • Doesn't phone home — ~70 Firefox prefs disabling telemetry, SafeBrowsing endpoints, Mozilla Sync, Quicksuggest, Crash Reporter, GMP-Manager, Captive-Portal probe, Region / Geo network calls.
  • Leaves a small footprint — single .bak per state file (the ones that matter for resuming a session), zero backups for trivially regenerable files, no daily snapshots, optional RAM-only mode that wipes everything on quit.
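
The "looks human" item can be illustrated with a minimal sketch of a cubic Bezier mouse path with Gaussian tremor. This is not the bot's actual implementation: the function name, the control-point spread, and the tremor sigma are assumptions chosen for illustration.

```python
import random


def bezier_mouse_path(start, end, steps=30, tremor_sigma=0.8, rng=None):
    """Return a list of (x, y) points along a jittered cubic Bezier curve."""
    rng = rng or random.Random()
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0
    spread = 40.0  # hypothetical: how far control points may stray from the straight line
    c1 = (x0 + dx / 3 + rng.uniform(-spread, spread),
          y0 + dy / 3 + rng.uniform(-spread, spread))
    c2 = (x0 + 2 * dx / 3 + rng.uniform(-spread, spread),
          y0 + 2 * dy / 3 + rng.uniform(-spread, spread))
    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1.0 - t
        # Standard cubic Bezier polynomial in Bernstein form.
        x = u**3 * x0 + 3 * u * u * t * c1[0] + 3 * u * t * t * c2[0] + t**3 * x3
        y = u**3 * y0 + 3 * u * u * t * c1[1] + 3 * u * t * t * c2[1] + t**3 * y3
        if 0 < i < steps:
            # Gaussian tremor on interior points only, so the endpoints stay exact.
            x += rng.gauss(0.0, tremor_sigma)
            y += rng.gauss(0.0, tremor_sigma)
        points.append((x, y))
    return points


path = bezier_mouse_path((10, 20), (400, 300), steps=20, rng=random.Random(7))
```

Keeping the first and last point exact means the click still lands on the intended element while every intermediate sample differs from run to run.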

Where each setting actually lives

| Where | What's in it | Touchable from dashboard? | Touchable from .env? |
| --- | --- | --- | --- |
| SerienJunkie/settings.json | Dashboard sections Wiedergabe (playback) + Player-Defaults + Komfort (comfort) + acknowledgedStoTorWarning | yes | no |
| SerienJunkie/privacy.json | Dashboard section Privatsphäre & Forensik (privacy and forensics) | yes | yes (env wins) |
| SerienJunkie/progress.json | Per-episode timestamps, provider, season_episodes, season_episode_titles, display_title | no (written by the player + probes) | no |
| SerienJunkie/watch_stats.json | Watch-time aggregates | no (read-only via Stats button) | no |
| SerienJunkie/notes.json | Per-series free-text notes | yes (per-series row) | no |
| SerienJunkie/intro_marks.json | Per-series intro start/end + end-skip + learning samples | yes (per-series cog) | no |
| SerienJunkie/watchlist.json | Series shown in the watchlist view | yes (auto-grows as you watch) | no |
| SerienJunkie/bookmarks.json | Per-episode bookmarks | yes (toolbar) | no |
| SerienJunkie/descriptions.json | Cached series synopses | no (regenerated by description probe) | no |
| .env (project root) | Power-user environment | no | yes |
| SerienJunkie/user.BingeWatcher/ | Persistent Firefox profile (cookies, DDoS-Guard token) | manually via about:config if you must | no |
| SerienJunkie/.integrity.json | TOFU SHA-256 hashes for bundled binaries | no | no |

Where your state lives + how to wipe

| File / Folder | Holds | How to wipe |
| --- | --- | --- |
| SerienJunkie/progress.json | Watched-episode timeline per series | Delete file. |
| SerienJunkie/settings.json | Dashboard preferences | Delete file. |
| SerienJunkie/privacy.json | Privacy mode toggles | Delete file. |
| SerienJunkie/notes.json | Per-series notes | Delete file. |
| SerienJunkie/watchlist.json + bookmarks.json + intro_marks.json | UX state | Delete files. |
| SerienJunkie/watch_stats.json | Time-tracking aggregates | Delete file. |
| SerienJunkie/descriptions.json | Cached series synopses | Delete file (regenerated). |
| SerienJunkie/bw.log* + bw.events.jsonl | Application logs | Delete files. |
| SerienJunkie/user.BingeWatcher/ | Persistent Firefox profile | Delete directory (re-solve CAPTCHA next launch). |
| SerienJunkie/.integrity.json | TOFU hashes for bundled binaries | Delete file (regenerated on next launch). |

Wipe everything user-specific at once:

cd SerienJunkie
Remove-Item -Recurse -Force user.BingeWatcher, _fresh_profiles, _dbg_bridge, _debug_out -ErrorAction SilentlyContinue
Remove-Item *.json, *.json.bak, *.json.corrupt.*, bw.log*, bw.events.jsonl, .integrity.json -ErrorAction SilentlyContinue

For non-persistent runs, set BW_FORENSIK_RAM=1 (RAM mode) in .env. State files then go into a temporary directory under %TEMP% that is wiped on exit, instead of into SerienJunkie/.


Power-user knobs (the .env file)

For settings that don't fit in the dashboard, copy .env.example to .env in the repo root and uncomment the lines you want. The launcher reads .env before starting Python, so each line becomes an environment variable for the bot.

Syntax:

KEY=VALUE                    one per line, no surrounding quotes
# this is a comment          any line starting with # is skipped
                             blank lines are ignored

.env is .gitignored. You can also export the same variables manually from your shell.
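
The syntax rules above are small enough to sketch in a few lines. The real parsing happens in the launcher before Python starts, so the function below is only a Python illustration of the same rules; its name and its handling of malformed lines are assumptions.

```python
def parse_env_lines(lines):
    """Parse KEY=VALUE lines; skip blank lines and lines starting with '#'."""
    env = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            continue  # assumption: lines without KEY= are silently ignored
        # No quote stripping: the documented syntax has no surrounding quotes.
        env[key.strip()] = value.strip()
    return env


sample = ["# enable verbose console", "", "BW_DEBUG=1", "BW_TOR_PORT=9050"]
print(parse_env_lines(sample))  # {'BW_DEBUG': '1', 'BW_TOR_PORT': '9050'}
```

Only the first "=" splits the line, so a value may itself contain "=" characters.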

Debug mode

By default the launcher console is quiet: boot banner, Tor status, OS-audit result, then only user-relevant events. Errors and warnings always come through.

For full verbose output (every event=... line, debug-bridge IPC chatter, HTTP-probe diagnostics, all internal bw.* module info):

BW_DEBUG=1

The full record always lands in SerienJunkie/bw.log (rotating, 1 MB × 3) and SerienJunkie/bw.events.jsonl regardless of the flag.
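
The split between a quiet console and a complete log file can be sketched with the standard logging module. This is a sketch under assumptions, not the bot's logging setup: the logger name, the console threshold in quiet mode, and the reading of "1 MB × 3" as three rotated backups are all guesses.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def build_logger(log_path, debug=False):
    """File handler always records DEBUG; the console only does when debug is set."""
    logger = logging.getLogger("bw.sketch")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    logger.propagate = False

    # Assumption: "1 MB x 3" means 1 MB per file with three rotated backups.
    file_handler = RotatingFileHandler(
        log_path, maxBytes=1_000_000, backupCount=3, encoding="utf-8"
    )
    file_handler.setLevel(logging.DEBUG)

    console = logging.StreamHandler()
    # Assumption: quiet mode shows INFO and above; BW_DEBUG=1 lowers it to DEBUG.
    console.setLevel(logging.DEBUG if debug else logging.INFO)

    logger.addHandler(file_handler)
    logger.addHandler(console)
    return logger


log_path = os.path.join(tempfile.mkdtemp(), "bw.log")
logger = build_logger(log_path, debug=os.environ.get("BW_DEBUG") == "1")
logger.debug("event=probe status=ok")  # always written to the file
```

Because the threshold sits on the handlers rather than on the logger, the flag changes only what reaches the console.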

Full variable reference

Grouped by purpose. Every variable is also documented inline in .env.example.

Runtime selection

| Variable | Default | What it does |
| --- | --- | --- |
| BW_USE_SYSTEM_PYTHON | unset | Skip the embedded runtime/python/python.exe, use python from PATH instead. |
| BW_PYTHON_EXE | unset | Explicit path to any python.exe (wins over BW_USE_SYSTEM_PYTHON). |

Logging / diagnostics

| Variable | Default | What it does |
| --- | --- | --- |
| BW_DEBUG | unset | Verbose console + DEBUG-level records. Default mode hides internal bw.* chatter. |
| BW_DEBUG_BRIDGE | unset | Start the file-IPC bridge for debug_client.py (DOM dumps, eval, screenshots from outside). |
| BW_SCREEN_DEBUG | unset | FFmpeg with -loglevel info + verbose bw.audio_loopback + broadcaster pump traces. |
| BW_GECKODRIVER_VERBOSE | unset | Geckodriver writes its full trace log to SerienJunkie/geckodriver.log. |
| BW_DEV_RELOAD_TOOLBAR_JS | unset | Re-read assets/toolbar.js from disk on every inject (no cache); handy when iterating on toolbar code. |
| BW_SKIP_OS_AUDIT | unset | Never run the OS-hardening status check, never show the banner. Overrides both first-run and every-start. Does NOT touch the .os_audit_seen marker, so removing this var later still triggers the first-run flow. |
| BW_OS_AUDIT_EVERY_START | unset | Run the OS-hardening status check on every launch (default is first-run-only via SerienJunkie/.os_audit_seen marker). Useful for devs iterating on hardening modules, or paranoid users wanting a daily privacy heartbeat. Skips the first-run intro since the user opted in. |
| BW_INTEGRITY_STRICT | unset | TOFU SHA-256 mismatch on python.exe / geckodriver.exe / Tor binaries → abort. Default is warn + continue. |
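
The trust-on-first-use check behind BW_INTEGRITY_STRICT can be sketched as follows. This is an illustration of the technique, not the code in bw/integrity.py: the manifest layout (file name to hex digest), the function names, and the exception type are assumptions.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large binaries do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_tofu(manifest_path, files, strict=False):
    """Record unknown files on first use; report files whose hash changed."""
    manifest_path = Path(manifest_path)
    known = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    mismatches = []
    for f in files:
        name, digest = Path(f).name, sha256_of(f)
        if name not in known:
            known[name] = digest          # first use: trust and remember
        elif known[name] != digest:
            mismatches.append(name)       # keep the old hash, report the change
    manifest_path.write_text(json.dumps(known, indent=2))
    if mismatches and strict:
        raise RuntimeError(f"integrity mismatch: {mismatches}")  # strict: abort
    return mismatches                     # default: caller warns and continues


workdir = Path(tempfile.mkdtemp())
binary = workdir / "tor.exe"              # stand-in file, not a real binary
binary.write_bytes(b"v1")
manifest = workdir / ".integrity.json"

first = check_tofu(manifest, [binary])    # first use: recorded, no mismatch
binary.write_bytes(b"v2")
second = check_tofu(manifest, [binary])   # content changed: reported
```

Deleting the manifest resets the trust anchor, which matches the documented way to bless new hashes after an update.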

Firefox detection

| Variable | Default | What it does |
| --- | --- | --- |
| BW_FIREFOX_BIN | unset | Explicit Firefox path, highest detection priority. Useful for non-standard installs (custom drive, portable). |
| BW_SIMULATE_NO_FIREFOX | unset | Force firefox_detect to return None regardless of what's on disk. Lets you exercise the auto-download UX on a machine that already has Firefox. |

FFmpeg detection (same shape as Firefox — auto-downloaded on demand)

| Variable | Default | What it does |
| --- | --- | --- |
| BW_FFMPEG_BIN | unset | Explicit ffmpeg.exe path, highest detection priority. Useful when you have a custom build (with extra codecs / hardware accel) you'd rather use than the gyan.dev essentials build. |
| BW_FFMPEG_AUDIO_DEVICE | unset | Force a specific WASAPI device name for loopback capture. The default auto-picks the loudest device; set this only when auto-pick lands on the wrong card. |

Privacy / forensik (same toggles as the dashboard, but env wins)

| Variable | Default | What it does |
| --- | --- | --- |
| BW_FRESH_PROFILE | unset | New Firefox profile every launch, wiped on exit. |
| BW_FORENSIK_RAM | unset | All state files into a tempdir under %TEMP%, wiped on exit. |
| BW_FORENSIK_SYNC_BACK | unset | In RAM mode, copy state back to disk on clean exit so progress survives. |
| BW_FORENSIK_STASH | unset | Path to a custom directory for state files. Validated against path traversal. |
| BW_ENABLE_DECOY | unset | Background Tor cover-traffic worker. Costs bandwidth. |
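
The "env wins" precedence for these toggles could look like the sketch below. The helper name and the set of accepted truthy strings are assumptions; how privacy.json actually keys its values is not documented here, so the file value is passed in directly.

```python
import os

_TRUTHY = {"1", "true", "yes", "on"}  # assumption: accepted spellings of "enabled"


def resolve_toggle(env_name, file_value, environ=None):
    """Return the effective toggle: a set environment variable beats the file value."""
    environ = os.environ if environ is None else environ
    raw = environ.get(env_name, "").strip().lower()
    if raw:                       # variable is set and non-empty: env wins
        return raw in _TRUTHY
    return bool(file_value)       # otherwise fall back to the dashboard/file value


# Dashboard says off, .env says on: the environment decides.
print(resolve_toggle("BW_FRESH_PROFILE", False, {"BW_FRESH_PROFILE": "1"}))  # True
```

In this sketch an explicit `BW_FRESH_PROFILE=0` also overrides a dashboard value of "on", which is one reasonable reading of "env wins".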

Operational

| Variable | Default | What it does |
| --- | --- | --- |
| BW_HEADLESS | false | Run Firefox without a visible window (CI mode; dashboard doesn't make sense without a window). |
| BW_START_URL | https://aniworld.to/ | Override the initial page. |
| BW_INTRO_SKIP | 80 | Default intro skip in seconds. |
| BW_TOR_PORT | 9050 | Tor SOCKS port. |
| BW_TIER0_PROBE | 0 | Re-enable the Tier-0 HEAD-probe hoster-cycle (disabled by default after it caused a Selenium thread race). |
| BW_ROTATE_BETWEEN_EPISODES | 0 | Per-episode NEWNYM rotation (disabled by default because it disrupts in-flight HLS streams). |
| BW_POPOUT_IFRAME | false | Fullscreen fallback that navigates the whole window to the hoster's iframe src. Off by default because it loses the aniworld context. |

s.to-specific (Cloudflare Turnstile workaround)

| Variable | Default | What it does |
| --- | --- | --- |
| BW_STO_USE_TOR | unset | Force s.to through Tor + DoH. Cloudflare Turnstile then typically refuses to verify (white screen / error 600010). The bypass is on by default. |
| BW_STO_DISABLE_TOR | unset | Legacy alias for the explicit opt-IN. No-op now (the bypass is default). |
| BW_STO_DISABLE_RFP | unset | Exempt s.to from privacy.resistFingerprinting. Diagnostic only: A/B testing showed RFP wasn't the Turnstile blocker, Tor was. |
| BW_STO_DISABLE_CLOAK | unset | Skip the navigator/canvas/webgl/audio preload cloak on s.to. Diagnostic only. |
| BW_STO_DISABLE_POPUNDER_NUKE | unset | Skip the MutationObserver popunder nuker on s.to. Diagnostic only. |
| BW_STO_NO_HARDENING | unset | Umbrella: enables all three BW_STO_DISABLE_* (RFP, cloak, popunder). |

Global hardening overrides (affect both providers)

| Variable | Default | What it does |
| --- | --- | --- |
| BW_TEST_RELAX_COOKIES | unset | Cookie isolation (FPI + dFPI) OFF globally. Only for diagnostic test runs. |
| BW_TEST_ENABLE_WEBRTC | unset | WebRTC + navigator.mediaDevices re-enabled globally. Can leak local IP via STUN. |
| BW_TEST_ENABLE_SW | unset | Service workers re-enabled globally. Allows background tracking workers. |

"Auf Handy schauen" — screen-streaming pipeline

When the user clicks the Auf Handy schauen ("watch on phone") toolbar button, the bot spawns a self-contained MPEG-TS streaming stack so they can watch the PC's Firefox playback on a phone over the LAN. The pipeline is deliberately pure-Python + one FFmpeg subprocess; no external media server (MediaMTX, RTSP, WebRTC, WHEP are all gone).

ddagrab (or gdigrab fallback) + WASAPI loopback (TCP)
    │
    ▼
FFmpeg (h264_nvenc + aac → mpegts on stdout)
    │ stdout
    ▼
bw.screen_broadcaster.MPEGTSBroadcaster
    │ fan-out (bounded queue per subscriber, drop-oldest on slow client)
    ▼
bw.cast_proxy /screen/live.ts (HTTP chunked, close-delimited)
    │
    ▼
N × mpegts.js (bundled at assets/mpegts.js) → <video> via MSE
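
The fan-out stage in the diagram (one producer, N subscribers, bounded queue per subscriber, drop-oldest on a slow client) can be sketched like this. It is a simplified stand-in for bw.screen_broadcaster.MPEGTSBroadcaster, not its actual code; the class name, method names, and queue size are assumptions.

```python
import queue
import threading


class FanOut:
    """One producer publishes chunks; each subscriber reads from its own bounded queue."""

    def __init__(self, max_chunks=64):
        self._max = max_chunks
        self._subs = []
        self._lock = threading.Lock()

    def subscribe(self):
        q = queue.Queue(maxsize=self._max)
        with self._lock:
            self._subs.append(q)
        return q

    def unsubscribe(self, q):
        with self._lock:
            if q in self._subs:
                self._subs.remove(q)

    def publish(self, chunk):
        with self._lock:
            subs = list(self._subs)
        for q in subs:
            while True:
                try:
                    q.put_nowait(chunk)
                    break
                except queue.Full:
                    # Slow client: discard its oldest chunk instead of blocking the producer.
                    try:
                        q.get_nowait()
                    except queue.Empty:
                        pass


hub = FanOut(max_chunks=2)
slow = hub.subscribe()
for chunk in (b"ts-1", b"ts-2", b"ts-3"):
    hub.publish(chunk)
received = [slow.get_nowait(), slow.get_nowait()]  # the oldest chunk was dropped
```

Dropping the oldest data keeps every client near the live edge and means one stalled phone cannot stall the FFmpeg reader.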

End-to-end LAN latency on NVENC + 1280x720@24:

| Stage | Latency |
| --- | --- |
| ddagrab capture | 15–50 ms |
| NVENC + MPEG-TS mux | 15–30 ms |
| Python pump + chunked HTTP | 5–15 ms |
| mpegts.js MSE buffer | 150–300 ms |
| Browser decode + paint | 30–50 ms |
| Total | 250–450 ms |

Stability features:

  • Health monitor in bw.screen_stream restarts FFmpeg on death, on byte-stall while subscribers are connected, or on DXGI desktop-duplication loss (with a 3 s settling delay to let the display re-stabilise).
  • Per-restart backoff: _HEALTH_MAX_RESTARTS (5) within 60 s ⇒ give up.
  • Idle-shutdown: 60 s with no /screen/heartbeat ping ⇒ stream stops.
  • Kernel-level child kill via bw.process_guardian's Windows Job Object (KILL_ON_JOB_CLOSE) so a hard bot exit (X-button, BSoD, power cut) can't leave FFmpeg capturing the desktop in the background.
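
The restart limit from the list above (5 restarts within 60 s, then give up) amounts to a sliding-window budget. The sketch below shows one way to express it; the class name and the exact boundary behaviour (the sixth restart inside the window is the one refused) are assumptions.

```python
import time
from collections import deque


class RestartBudget:
    """Allow at most max_restarts restarts inside any window_s-second window."""

    def __init__(self, max_restarts=5, window_s=60.0, clock=time.monotonic):
        self._max = max_restarts
        self._window = window_s
        self._clock = clock
        self._stamps = deque()

    def allow(self):
        now = self._clock()
        # Forget restarts that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if len(self._stamps) >= self._max:
            return False          # budget exhausted: the health monitor should give up
        self._stamps.append(now)
        return True


fake_now = [0.0]                  # injectable clock keeps the example deterministic
budget = RestartBudget(clock=lambda: fake_now[0])
burst = [budget.allow() for _ in range(6)]
fake_now[0] = 61.0
after_window = budget.allow()
```

A monotonic clock is used by default so that wall-clock adjustments cannot shrink or stretch the window.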

Player UX:

  • No scrubbing (CSS-hides the WebKit scrubber + JS seek-trap for Firefox).
  • Pause works; resuming snaps currentTime to the live edge.
  • "LIVE" pulsing pill + live-latency pill (green < 1.5 s, amber < 3 s, red > 3 s); latency pill shows while paused so the counter doesn't drift visibly.
  • Single "Ton an" ("sound on") tap target (autoplay-policy compliant: starts muted, user taps to unmute).

Selftests:

  • python -m bw.selftest_cast_proxy — endpoints + HLS rewrite
  • python -m bw.selftest_screen_pipeline — end-to-end MPEG-TS smoke (FFmpeg → broadcaster → live.ts → bytes verified)
  • python -m bw.selftest_process_guardian — Job Object kill semantics
  • python bw/screen_broadcaster.py — broadcaster fan-out + drop-oldest

The s.to story (and the Cloudflare Turnstile diagnostic)

Long version is in commit history; short version for new developers:

  1. s.to is fronted by Cloudflare. Their Turnstile widget refuses to verify connections from known Tor exit IPs (error code 600010).
  2. We assumed initially this was caused by our heavy hardening stack (RFP, cloak scripts, popunder MutationObserver, etc.) being detected as bot-shaped.
  3. A/B testing with one-at-a-time toggle re-enables proved the opposite: RFP, cloak, popunder, cookie isolation, WebRTC blocking, service worker blocking were all innocent. The only variable that broke s.to was Tor itself.
  4. Solution: default-on bypass of Tor + DoH for s.to only via Firefox's network.proxy.no_proxies_on and network.trr.excluded-domains prefs. AniWorld + every other site still goes through Tor + DoH. The single gate is hardened_profile.sto_bypasses_tor(), called by hardened_profile, bingewatcher, privacy_report, and the sidebar pill.
  5. A one-time popup explains this to the user the first time they switch to s.to. The popup is JS-only (bw/ui_sidebar.py → SIDEBAR_POLISH_JS → window.__bwStoNavGate); it injects before any byte hits s.to. Acknowledgement is persisted to settings.json["acknowledgedStoTorWarning"].
  6. Opt-out: BW_STO_USE_TOR=1 forces Tor for s.to too. Documented for completeness; expect Turnstile to refuse verification and s.to to stay unreachable.
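
Steps 4 and 6 can be sketched as a small function that decides which Firefox prefs to set. The real gate is hardened_profile.sto_bypasses_tor(); the helper names below, the host tuple, and the way BW_STO_USE_TOR is parsed are assumptions made for this illustration.

```python
import os

STO_HOSTS = ("s.to",)  # assumption: the exact host list used by the bot may differ


def sto_bypass_enabled(environ=None):
    """The bypass is on by default; BW_STO_USE_TOR=1 opts out of it."""
    environ = os.environ if environ is None else environ
    return environ.get("BW_STO_USE_TOR", "").strip() != "1"


def bypass_prefs(environ=None):
    """Return the Firefox prefs that exempt s.to from the proxy and from DoH."""
    if not sto_bypass_enabled(environ):
        return {}
    hosts = ", ".join(STO_HOSTS)
    return {
        "network.proxy.no_proxies_on": hosts,    # skip the SOCKS proxy for these hosts
        "network.trr.excluded-domains": hosts,   # skip DoH resolution for these hosts
    }


print(bypass_prefs({}))                       # bypass active: both prefs set
print(bypass_prefs({"BW_STO_USE_TOR": "1"}))  # opt-out: no exceptions added
```

With Selenium, each entry would typically be applied through `options.set_preference(name, value)` on a Firefox options object before the driver starts.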

If you're picking this codebase up: don't reintroduce BW_STO_DISABLE_RFP=1 etc. as defaults. They weren't the cause.


Privacy stack

Full design in docs/privacy.md. Summary:

  • Phase 0 — Tor always-on, gated at process start.
  • Phases 1 + 1b — network hardening (proxy + DNS + DoH guards), stream-persistence (HLS-friendly Tor circuit timing).
  • Phase 1.5 — 38 audit-driven Firefox prefs (permissions deny-by-default, web-API kills, anti-tracking).
  • Phase 1.7 — obfs4 bridges for restrictive networks (see docs/tor-bridges.md).
  • Phase 2 — fingerprint cloaks (RFP + Canvas / WebGL / Audio jitter, with toString reflection-defence).
  • Phase 3 — bot-detection cloak (navigator.webdriver, cdc_*, Worker-context bootstrap, navigator.connection and navigator.userAgentData neutered).
  • Phase 4 — human-likeness (cubic Bezier + Gauss tremor for mouse paths, varied delays, inter-episode pauses).
  • Phase 5 — fresh-profile mode, cookie hygiene, MAC randomizer CLI.
  • Phase 6 — circuit watchdog (NEWNYM every ~45 min ± 15% jitter, deferred while a stream is active) and traffic decoy.
  • Phase 7 — forensik hygiene (RAM / Stash modes route state files, profile-telemetry purge at boot, integrity TOFU on bundled binaries).
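
The Phase 6 timing rule (NEWNYM roughly every 45 minutes with ±15% jitter, deferred while a stream is active) can be written down in a few lines. This is a sketch of the rule as described, with hypothetical function names; the actual watchdog and the Tor control-port call are not shown.

```python
import random

BASE_INTERVAL_S = 45 * 60   # ~45 minutes
JITTER = 0.15               # +/- 15 %


def next_rotation_delay(rng=None):
    """Pick the delay until the next NEWNYM, uniformly within the jitter band."""
    rng = rng or random.Random()
    return BASE_INTERVAL_S * (1.0 + rng.uniform(-JITTER, JITTER))


def should_rotate(elapsed_s, delay_s, stream_active):
    """Rotate only once the delay has passed and no stream would be interrupted."""
    return elapsed_s >= delay_s and not stream_active


delay = next_rotation_delay(random.Random(3))
print(should_rotate(3200, delay, stream_active=True))   # False: deferred during playback
print(should_rotate(3200, delay, stream_active=False))  # True: delay is at most 3105 s
```

The jitter band works out to 2295 s through 3105 s, so rotations do not land on a fixed, fingerprintable schedule.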

Plus the host-side counterpart for items the bot can't change itself (Windows telemetry, Wi-Fi probe surface, Bluetooth discovery, hibernation / pagefile encryption): see docs/windows-hardening.md + scripts/os-hardening/.


OS-level hardening (optional, recommended)

The bot's privacy stack runs inside Firefox. The host OS itself leaks data Firefox can't hide. The repo ships PowerShell tooling:

# Read-only audit — shows where you stand. No changes.
.\scripts\os-hardening\Get-OsHardeningStatus.ps1

# Interactive wizard. Creates a System Restore Point first,
# aborts if it can't. Each step prompts before acting.
.\scripts\os-hardening\Apply-OsHardening.ps1

See scripts/os-hardening/README.md for what each section does and how to roll back.


Tor in restrictive networks (obfs4 bridges)

If your ISP blocks Tor directly:

  1. Get a bridge line from https://bridges.torproject.org or by emailing bridges@torproject.org from a Gmail / Riseup account.
  2. Save it into SerienJunkie/bridges.txt (one bridge per line).
  3. Launch normally. The bot detects the file, renders the torrc with UseBridges 1, and routes through obfs4.
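
Step 3 can be pictured as a small rendering function that turns bridges.txt into torrc lines. `UseBridges`, `ClientTransportPlugin`, and `Bridge` are standard torrc options; the function name, the transport binary path, and the placeholder bridge line are assumptions, and the bot's real template logic may differ.

```python
def render_bridge_lines(bridge_text, pt_path):
    """Turn the contents of bridges.txt into torrc lines; empty input yields nothing."""
    bridges = [ln.strip() for ln in bridge_text.splitlines() if ln.strip()]
    if not bridges:
        return []  # no bridges configured: torrc stays on direct Tor
    lines = ["UseBridges 1", f"ClientTransportPlugin obfs4 exec {pt_path}"]
    for bridge in bridges:
        # Copied bridge lines may or may not already start with the "Bridge" keyword.
        if bridge.lower().startswith("bridge "):
            lines.append(bridge)
        else:
            lines.append(f"Bridge {bridge}")
    return lines


# Placeholder bridge line (documentation IP, dummy fingerprint and cert), not a real bridge.
sample = "obfs4 192.0.2.1:443 FINGERPRINT cert=CERT iat-mode=0\n\n"
for line in render_bridge_lines(sample, "PluggableTransports/lyrebird.exe"):
    print(line)
```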

Full walkthrough: docs/tor-bridges.md.


MAC randomizer (optional, manual)

runtime\python\python.exe -m bw.mac_randomizer list
runtime\python\python.exe -m bw.mac_randomizer randomize "Wi-Fi"
runtime\python\python.exe -m bw.mac_randomizer reset "Wi-Fi"

Sets a locally-administered MAC (LAA bit) — needs Administrator. Combine with the per-network "Random hardware addresses" toggle that Windows ships. See docs/windows-hardening.md.
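
For reference, a locally-administered unicast MAC is any address whose first octet has bit 1 set and bit 0 cleared. The sketch below only generates such an address; it is not the bw.mac_randomizer code, and the function name and the dash-separated output format are assumptions.

```python
import secrets


def random_laa_mac():
    """Generate a random unicast MAC with the locally-administered bit set."""
    octets = bytearray(secrets.token_bytes(6))
    # Bit 1 (0x02) = locally administered; bit 0 (0x01) must be clear for unicast.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return "-".join(f"{o:02X}" for o in octets)


mac = random_laa_mac()
print(mac)
```

Applying the address to an adapter is the part that needs Administrator rights and is deliberately left out here.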


Updating

git pull

Most updates are bot-code changes and take effect on next launch.

For larger updates:

  • Embedded Python bump: runtime/python/ gets regenerated and committed with new SHA256SUMS. Your local .integrity.json will warn on first launch; delete it to bless the new hashes.
  • Backup policy changes — the launcher purges legacy .bak2..bak9 and .daily-*.bak files automatically; no action needed.

Selftest + diagnostic bundle

cd SerienJunkie
..\runtime\python\python.exe bingewatcher.py --self-test

Runs the pure-helper tests (URL parsing, settings clamping, atomic JSON I/O, etc.) without spawning Firefox. Useful in CI / pre-commit.
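
As an illustration of what the atomic JSON I/O helpers cover, the sketch below writes a state file atomically and keeps a single .bak of the previous version. The function name and file layout are assumptions, not the bot's actual helper.

```python
import json
import os
import shutil
import tempfile


def save_json_atomic(path, data):
    """Write JSON via a temp file + rename, keeping exactly one .bak of the old file."""
    directory = os.path.dirname(os.path.abspath(path))
    if os.path.exists(path):
        shutil.copy2(path, path + ".bak")      # overwrite the single backup
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())              # make sure the bytes are on disk first
        os.replace(tmp, path)                  # atomic swap on the same volume
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


state_path = os.path.join(tempfile.mkdtemp(), "progress.json")
save_json_atomic(state_path, {"episode": 1})
save_json_atomic(state_path, {"episode": 2})   # previous version moves to .bak
```

Because the temp file lives in the same directory as the target, `os.replace` never has to cross volumes, so a crash leaves either the old file or the new one in place.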

For full state capture when reporting a bug, use Settings → Werkzeuge (Tools) → "Diagnose-Bundle exportieren" ("export diagnostic bundle"), which produces a sanitised zip under SerienJunkie/diagnostics/.