chore(upstream): bump yt-dlp pin + rescan strings#3

Merged
antonio-orionus merged 5 commits into main from chore/upstream-scan
Jun 1, 2026

@github-actions github-actions Bot commented Jun 1, 2026

Automated weekly scan picked up changes from the current data/yt-dlp-version.json pin.

Review:

  1. Walk the data/known-yt-dlp-strings.json diff. New entries land with kind: null; assign a kind, or leave it null intentionally and document why in notes.
  2. If a regex needs to be extended for the new strings, edit src/patterns.ts and add a fixture under tests/fixtures/yt-dlp-stderr/<kind>/.
  3. Tests must pass before merge: upstream-coverage.test.ts enforces that non-null kinds round-trip correctly.

Changes

  • Workflow (.github/workflows/upstream-scan.yml)

    • More robust "bump pin to latest stable" tag resolution: reads git/refs/tags/, inspects .object.type, dereferences annotated tag objects via git/tags/ when necessary, validates the resolved object type is commit, emits explicit failures for missing/unresolvable SHAs or unsupported object types, and writes tag/commit into data/yt-dlp-version.json.
    • After rescanning against the new pin, the workflow now runs npm run typecheck, npm run test, and npm run build before creating the PR.
    • Continues to run the rescan and open a PR when changes are detected.
  • scripts/scan-yt-dlp-source.mjs

    • Treats data/known-yt-dlp-strings.json as a curated snapshot:
      • Loads the prior snapshot and reuses prior.strings when present; only generates/dedupes strings on initial (empty) snapshot.
      • Snapshot JSON now includes schemaVersion, ytDlpVersion, ytDlpCommit, and a human triage note (DEFAULT_NOTE).
      • Removed the previous behavior that re-derived and preserved kind assignments by matching raw scan IDs; new upstream strings therefore appear with kind: null and existing snapshot notes are retained where present.
    • Emits pending-count warnings when entries have kind: null.
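
The curated-snapshot behavior listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in scripts/scan-yt-dlp-source.mjs: the `StringEntry` and `Snapshot` shapes, the `buildSnapshot` and `countPending` helpers, the `schemaVersion` value of 1, and the note text are all hypothetical. Only the field names (schemaVersion, ytDlpVersion, ytDlpCommit, note, strings, kind) come from the description.

```typescript
// Hypothetical shapes: field names follow the PR description, types are assumed.
interface StringEntry {
  id: string;
  fragment: string;
  kind: string | null; // null means "pending human triage"
  notes?: string;
}

interface Snapshot {
  schemaVersion: number;
  ytDlpVersion: string;
  ytDlpCommit: string;
  note: string;
  strings: StringEntry[];
}

// Placeholder wording: the real DEFAULT_NOTE text is not shown in the PR.
const DEFAULT_NOTE = "Curated snapshot: assign a kind or document why it stays null.";

function buildSnapshot(
  prior: Partial<Snapshot> | null,
  freshEntries: StringEntry[],
  pin: { version: string; commit: string },
): Snapshot {
  const priorStrings = prior?.strings ?? [];
  // Reuse curated entries when they exist; seed from the scan only on an empty snapshot.
  const strings = priorStrings.length > 0 ? priorStrings : freshEntries;
  return {
    schemaVersion: 1, // assumed value
    ytDlpVersion: pin.version,
    ytDlpCommit: pin.commit,
    note: prior?.note ?? DEFAULT_NOTE,
    strings,
  };
}

// Number of entries still waiting for a kind, used for a pending-count warning.
function countPending(snapshot: Snapshot): number {
  return snapshot.strings.filter((entry) => entry.kind === null).length;
}
```

A caller could log a warning whenever `countPending(snapshot)` is greater than zero, which matches the pending-count warning described in the last bullet.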

Public API / package behavior

  • No changes to exported APIs or package interfaces.

Internal classifier / regex / snapshot / metadata changes

  • Snapshot format extended (schemaVersion, note, ytDlpVersion, ytDlpCommit).
  • Snapshot ownership model changed: curated snapshot now controls string IDs, kinds, and notes; scans do not overwrite those values.
  • New upstream strings will appear with kind: null and require human triage.
  • This PR does not modify classifier/regex code (src/patterns.ts) itself.

Risk areas

  • Upstream snapshot churn & detection:
    • Newly introduced upstream messages will remain pending (kind: null) until manual triage, which can delay automated classification and may increase test failures if many new entries appear.
  • CI/workflow failures:
    • The improved tag resolution is stricter and may fail if GitHub API responses differ; the added pre-PR typecheck/test/build increases CI coverage but can block automatic PR creation.
  • Schema/consumers:
    • Snapshot shape was extended; consumers that assumed the older shape should be reviewed. schemaVersion is included to help future migrations.
  • Permissions/automation:
    • No permission scope changes were made in this PR, but automation now performs stronger validation steps before PR creation.

Tests / checks to run before merge

  • Run full test suite (npm run test / vitest), including tests/upstream-coverage.test.ts (enforces non-null-kind round-trip expectations).
  • Run typecheck (npm run typecheck) and build (npm run build) locally or via CI.
  • Manual review:
    • Inspect data/known-yt-dlp-strings.json diff; assign kinds or document intentional null kinds in notes.
    • If new strings require regex/classifier updates, edit src/patterns.ts and add fixtures under tests/fixtures/yt-dlp-stderr/<kind>/, then re-run tests.

@github-actions github-actions Bot force-pushed the chore/upstream-scan branch from 880bbb6 to 4ae7d38 June 1, 2026 13:50
@antonio-orionus

Owner

@coderabbitai review

coderabbitai Bot commented Jun 1, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai Bot commented Jun 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 6bb4816e-811e-4141-95ef-27a318579601

📥 Commits

Reviewing files that changed from the base of the PR and between 77fc575 and 8dcf542.

📒 Files selected for processing (1)
  • .github/workflows/upstream-scan.yml
📜 Recent review details
🧰 Additional context used
📓 Path-based instructions (1)
.github/workflows/**

⚙️ CodeRabbit configuration file

Review GitHub Actions for least-privilege permissions, unsafe secret exposure, untrusted PR execution, shell quoting, release/tag correctness, npm trusted publishing assumptions, artifact handling, and commands that can fail silently.

Files:

  • .github/workflows/upstream-scan.yml
🔇 Additional comments (1)
.github/workflows/upstream-scan.yml (1)

59-63: LGTM!


📝 Walkthrough

Workflow now resolves the latest yt-dlp tag to its underlying commit SHA; the scan script treats the known-strings file as a curated snapshot and writes expanded metadata (schemaVersion, ytDlpVersion, ytDlpCommit, note) while reusing prior strings when present.

Changes

yt-dlp versioning and string snapshot management

| Layer / File(s) | Summary |
| --- | --- |
| Workflow tag resolution (.github/workflows/upstream-scan.yml) | Reworks the bump-to-latest step to fetch git/refs/tags/<latest>, inspect the referenced object type, and dereference annotated tag objects via git/tags/<sha> when needed to obtain the underlying commit SHA; adds error checks and writes the resolved SHA to data/yt-dlp-version.json. |
| Workflow validation step (.github/workflows/upstream-scan.yml) | After rescanning against the new pin, runs npm run typecheck, npm run test, and npm run build before attempting to open a PR. |
| Script curated snapshot output (scripts/scan-yt-dlp-source.mjs) | Adds DEFAULT_NOTE constant and refactors output: load existing known-yt-dlp-strings.json, reuse prior.strings when non-empty, only generate deduped entries on first run, and emit expanded JSON including schemaVersion, ytDlpVersion, ytDlpCommit, note, and strings. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

enhancement, github_actions

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main changes: bumping the yt-dlp pin version and rescanning strings, which matches the primary objectives and file modifications. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot added the enhancement and github_actions labels Jun 1, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/upstream-scan.yml:
- Around line 34-38: The script currently assigns commit from tag_ref using jq
-r which yields the literal "null" with exit code 0 if .object.sha/.object.type
is missing; update the dereference logic around the tag_ref and commit
assignments (the tag_ref and commit variables and the two curl/jq calls) to fail
fast by: use jq -e (or test the jq output) so missing fields cause a non-zero
status, validate that the extracted commit is non-empty and not the string
"null" after each jq call, and if validation fails log an error and exit
non‑zero instead of continuing to write a bad commit into
data/yt-dlp-version.json; apply the same check after the tag object dereference
branch that fetches the annotated tag object.
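
The guard this comment asks for can be illustrated outside of shell. The workflow itself uses curl and jq, so the following TypeScript is only a sketch of the validation logic; the `assertResolvedSha` helper and its error wording are hypothetical.

```typescript
// Validate a raw string as it would come out of `jq -r '.object.sha'`.
// `jq -r` prints the literal text "null" for a missing field and still exits 0,
// so both the empty string and "null" have to be rejected explicitly.
function assertResolvedSha(raw: string, step: string): string {
  const value = raw.trim();
  if (value === "" || value === "null") {
    throw new Error(`${step}: SHA missing or unresolvable (got "${raw}")`);
  }
  return value;
}
```

Running the same check after each extraction step (the ref lookup and the annotated tag lookup) keeps a bad value from ever reaching data/yt-dlp-version.json.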

In `@scripts/scan-yt-dlp-source.mjs`:
- Around line 151-163: prior.strings is currently returned wholesale which
prevents merging new rawHits and removing deleted fragments; change the logic
that builds strings so it merges rawHits with prior metadata by computing the
same id = sha1(`${hit.fragment}::${hit.call}`) for each raw hit, build a map of
raw hits, then for each raw id produce an entry that takes curated fields (e.g.,
notes, kind) from prior.strings if present while updating source/call/fragment
from the raw hit (new hits should get kind: null), and drop any prior entries
whose id is not present in rawHits; keep the final strings array sorted by id
(preserving the existing [...byHash.values()].sort(...)) and replace the current
early-return that just uses prior.strings.
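
For reference, the merge proposed in this comment could look roughly like the sketch below. It is an illustration of the suggestion only, not code from the repo: the `RawHit` and `CuratedEntry` shapes, the `hitId` and `mergeHits` names, and the plain string comparison used for sorting are assumptions. The id formula is the one quoted in the comment.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for scan output and curated entries.
interface RawHit {
  fragment: string;
  call: string;
  source: string;
}

interface CuratedEntry extends RawHit {
  id: string;
  kind: string | null;
  notes?: string;
}

// id = sha1(`${hit.fragment}::${hit.call}`), as quoted in the review comment.
function hitId(hit: RawHit): string {
  return createHash("sha1").update(`${hit.fragment}::${hit.call}`).digest("hex");
}

function mergeHits(rawHits: RawHit[], prior: CuratedEntry[]): CuratedEntry[] {
  const priorById = new Map(prior.map((entry) => [entry.id, entry] as const));
  const byHash = new Map<string, CuratedEntry>();
  for (const hit of rawHits) {
    const id = hitId(hit);
    const curated = priorById.get(id);
    byHash.set(id, {
      id,
      // Scan-derived fields always come from the fresh hit.
      fragment: hit.fragment,
      call: hit.call,
      source: hit.source,
      // Curated fields survive when the id is already known; new hits start as null.
      kind: curated?.kind ?? null,
      ...(curated?.notes !== undefined ? { notes: curated.notes } : {}),
    });
  }
  // Prior entries whose id no longer appears in rawHits are dropped implicitly.
  return [...byHash.values()].sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
}
```

Note that this approach keys everything on the raw scan hash, so any upstream rewording of a fragment produces a new id with kind: null.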

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 60d3237f-e671-4f4c-92ff-adf5a6f48acb

📥 Commits

Reviewing files that changed from the base of the PR and between 3ab7323 and 54c0288.

📒 Files selected for processing (2)
  • .github/workflows/upstream-scan.yml
  • scripts/scan-yt-dlp-source.mjs
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
.github/workflows/**

Files:

  • .github/workflows/upstream-scan.yml
scripts/**

⚙️ CodeRabbit configuration file

Review Node scripts for unsafe command construction, stale temporary directories, path traversal, network/release API assumptions, missing error handling, platform-specific behavior, and reproducibility in CI.

Files:

  • scripts/scan-yt-dlp-source.mjs
🔇 Additional comments (1)
scripts/scan-yt-dlp-source.mjs (1)

44-45: LGTM!

Also applies to: 168-174

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/upstream-scan.yml:
- Around line 40-46: The tag dereference logic currently only checks for
empty/null SHAs (variables tag_type and commit) but must also verify the Git
object type is a commit; update the logic that processes the tag ref target (the
response used to set tag_type/commit) to extract and assert the returned
object.type is "commit" (error out if not), and likewise after fetching
tag_object for annotated tags ensure jq checks '.object.type' == "commit" before
accepting '.object.sha'; reference the variables/objects tag_type, tag_object,
and commit and use jq to validate object.type explicitly so non-commit targets
are rejected with a clear error.
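
The type check requested here can be sketched as a small pure function. This is a hedged illustration in TypeScript rather than the workflow's jq: `readObject`, `resolveCommit`, and the `tagObjectFor` callback (which stands in for the second API call) are hypothetical names, and only the `object.sha` / `object.type` fields mentioned in the comment are assumed to exist in the responses.

```typescript
interface GitObject {
  sha: string;
  type: string;
}

// Read and validate the `object` field shared by ref and tag-object responses.
function readObject(response: unknown, label: string): GitObject {
  const object = (response as { object?: Partial<GitObject> } | null)?.object;
  if (!object || typeof object.sha !== "string" || object.sha === "" || typeof object.type !== "string") {
    throw new Error(`${label}: response has no usable object.sha/object.type`);
  }
  return { sha: object.sha, type: object.type };
}

// Resolve a tag ref response to a commit SHA.
// `tagObjectFor` is a hypothetical callback returning the annotated tag object for a SHA.
function resolveCommit(refResponse: unknown, tagObjectFor: (sha: string) => unknown): string {
  let target = readObject(refResponse, "tag ref");
  if (target.type === "tag") {
    // Annotated tag: dereference once to reach what the tag points at.
    target = readObject(tagObjectFor(target.sha), "annotated tag");
  }
  if (target.type !== "commit") {
    // Trees, blobs, and nested tags are rejected instead of being pinned.
    throw new Error(`unsupported object type: ${target.type}`);
  }
  return target.sha;
}
```

Keeping the type assertion as the last step means both the lightweight-tag path and the annotated-tag path go through the same check.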

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: dd7d843e-903d-4b80-8b2e-107361762b54

📥 Commits

Reviewing files that changed from the base of the PR and between 54c0288 and 19e2368.

📒 Files selected for processing (1)
  • .github/workflows/upstream-scan.yml
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
.github/workflows/**

Files:

  • .github/workflows/upstream-scan.yml
🔇 Additional comments (1)
.github/workflows/upstream-scan.yml (1)

35-39: LGTM!

@antonio-orionus

Owner

@coderabbitai review

coderabbitai Bot commented Jun 1, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/upstream-scan.yml (1)

59-62: ⚠️ Potential issue | 🟠 Major

Ensure pull_request CI runs automatically for the PR created by upstream-scan.yml

upstream-scan.yml opens the PR with peter-evans/create-pull-request@v8 using token: ${{ secrets.GITHUB_TOKEN }} (lines 59-62). PRs created/updated with the default GITHUB_TOKEN put pull_request workflow runs into an approval-required/manual-approval state rather than running automatically—so test.yml’s pull_request checks may not execute until someone approves, undermining “Tests must pass before merge” unless you rely on additional branch protection/manual gating.

Use a GitHub App token or PAT (instead of GITHUB_TOKEN) for create-pull-request, or run the tests inside upstream-scan.yml before opening the PR.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/upstream-scan.yml around lines 59 - 62, The PR created by
peter-evans/create-pull-request@v8 in upstream-scan.yml currently uses token:
${{ secrets.GITHUB_TOKEN }}, which causes pull_request workflows (test.yml) to
require manual approval; replace the token with a non-GITHUB_TOKEN credential
(e.g., token: ${{ secrets.PERSONAL_ACCESS_TOKEN }} or a GitHub App token stored
in secrets) so the created PR triggers pull_request checks automatically, or
alternatively modify upstream-scan.yml to run the required tests inline before
calling the create-pull-request step; update the token input for the
create-pull-request action (and ensure the new secret exists) or add the test
steps prior to invoking create-pull-request.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 1f6da5c8-7773-4283-b8a8-b842d7d8ab91

📥 Commits

Reviewing files that changed from the base of the PR and between 19e2368 and 77fc575.

📒 Files selected for processing (1)
  • .github/workflows/upstream-scan.yml
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
.github/workflows/**

Files:

  • .github/workflows/upstream-scan.yml
🔇 Additional comments (2)
.github/workflows/upstream-scan.yml (2)

40-50: LGTM!


34-36: ⚡ Quick win

Clarify the REST API “array response” concern: it’s not /git/ref/*, it’s /git/matching-refs/*

GitHub’s REST docs specify that GET /git/ref/{ref} returns a single ref only when {ref} is an exact match (otherwise it returns 404); prefix/starts-with “array of matches” behavior is for GET /git/matching-refs/{ref}, not for git/ref. So changing to git/ref/tags/${latest} doesn’t directly address the claimed “prefix tag → array → jq fails” scenario—either ${latest} isn’t the exact ref name you think it is, or the failure is due to a different unexpected response/404 path.

Also, the operational note about create-pull-request + secrets.GITHUB_TOKEN not triggering downstream workflow runs should be validated against the repo’s actual trigger configuration (e.g., pull_request vs workflow_run, and any token/permission conditions).
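
One way to make a caller indifferent to the single-object versus array distinction discussed above is to normalize the response and insist on an exact ref name. The sketch below is illustrative only: `pickExactRef` and `RefEntry` are hypothetical, and the code assumes each ref entry carries a `ref` field of the form refs/tags/<name>, as GitHub's ref responses do.

```typescript
interface RefEntry {
  ref: string;
  object: { sha: string; type: string };
}

// Accept either a single ref object or an array of matches and return the exact match.
function pickExactRef(response: unknown, tag: string): RefEntry {
  const wanted = `refs/tags/${tag}`;
  const candidates: unknown[] = Array.isArray(response) ? response : [response];
  // Loose guard: only the `ref` name is checked here, not the nested `object`.
  const match = candidates.find(
    (entry): entry is RefEntry =>
      typeof entry === "object" && entry !== null && (entry as { ref?: unknown }).ref === wanted,
  );
  if (!match) {
    throw new Error(`no exact ref match for ${wanted}`);
  }
  return match;
}
```

With this in place a prefix match such as a hypothetical v1.1 returned alongside v1 is ignored, and an error payload without a `ref` field fails loudly instead of yielding a null SHA.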

@antonio-orionus

Copy link
Copy Markdown
Owner

CodeRabbit follow-up:

  • Applied the valid upstream tag hardening: annotated tags are dereferenced and the final object type must be commit before persisting the pin.
  • Applied the valid generated-PR gating concern by running typecheck, test, and build inside upstream-scan.yml before create-pull-request opens or updates the PR.
  • Reviewed the scanner snapshot merge suggestion and intentionally did not apply it. This repo treats data/known-yt-dlp-strings.json as a curated snapshot with human-stable IDs and metadata; replacing entries from raw scan hashes would reintroduce the kind: null explosion that the tests caught on this PR.

@antonio-orionus antonio-orionus merged commit 312c72c into main Jun 1, 2026
5 checks passed
@antonio-orionus antonio-orionus deleted the chore/upstream-scan branch June 1, 2026 14:24