build(deps): bump actions/checkout from 5 to 6 by dependabot[bot] · Pull Request #4 · Jsewill/xchplot2

dependabot · 2026-04-27T20:58:55Z

Release notes

Sourced from actions/checkout's releases.

v6.0.0

What's Changed

Update README to include Node.js 24 support details and requirements by @salmanmkc in actions/checkout#2248

Persist creds to a separate file by @ericsciple in actions/checkout#2286

v6-beta by @ericsciple in actions/checkout#2298

update readme/changelog for v6 by @ericsciple in actions/checkout#2311

Full Changelog: actions/checkout@v5.0.0...v6.0.0

v6-beta

What's Changed

Updated persist-credentials to store the credentials under $RUNNER_TEMP instead of directly in the local git config.

This requires a minimum Actions Runner version of v2.329.0 to access the persisted credentials for Docker container action scenarios.

v5.0.1

What's Changed

Port v6 cleanup to v5 by @ericsciple in actions/checkout#2301

Full Changelog: actions/checkout@v5...v5.0.1

Changelog

Sourced from actions/checkout's changelog.

v6.0.2

Fix tag handling: preserve annotations and explicit fetch-tags by @ericsciple in actions/checkout#2356

v6.0.1

Add worktree support for persist-credentials includeIf by @ericsciple in actions/checkout#2327

v6.0.0

Persist creds to a separate file by @ericsciple in actions/checkout#2286

Update README to include Node.js 24 support details and requirements by @salmanmkc in actions/checkout#2248

v5.0.1

Port v6 cleanup to v5 by @ericsciple in actions/checkout#2301

v5.0.0

Update actions checkout to use node 24 by @salmanmkc in actions/checkout#2226

v4.3.1

Port v6 cleanup to v4 by @ericsciple in actions/checkout#2305

v4.3.0

docs: update README.md by @motss in actions/checkout#1971

Add internal repos for checking out multiple repositories by @mouismail in actions/checkout#1977

Documentation update - add recommended permissions to Readme by @benwells in actions/checkout#2043

Adjust positioning of user email note and permissions heading by @joshmgross in actions/checkout#2044

Update README.md by @nebuk89 in actions/checkout#2194

Update CODEOWNERS for actions by @TingluoHuang in actions/checkout#2224

Update package dependencies by @salmanmkc in actions/checkout#2236

v4.2.2

url-helper.ts now leverages well-known environment variables by @jww3 in actions/checkout#1941

Expand unit test coverage for isGhes by @jww3 in actions/checkout#1946

v4.2.1

Check out other refs/* by commit if provided, fall back to ref by @orhantoy in actions/checkout#1924

v4.2.0

Add Ref and Commit outputs by @lucacome in actions/checkout#1180

Dependency updates by @dependabot in actions/checkout#1777, actions/checkout#1872

v4.1.7

Bump the minor-npm-dependencies group across 1 directory with 4 updates by @dependabot in actions/checkout#1739

Bump actions/checkout from 3 to 4 by @dependabot in actions/checkout#1697

Check out other refs/* by commit by @orhantoy in actions/checkout#1774

Pin actions/checkout's own workflows to a known, good, stable version. by @jww3 in actions/checkout#1776

v4.1.6

Check platform to set archive extension appropriately by @cory-miller in actions/checkout#1732

... (truncated)

Commits

de0fac2 Fix tag handling: preserve annotations and explicit fetch-tags (#2356)
064fe7f Add orchestration_id to git user-agent when ACTIONS_ORCHESTRATION_ID is set (...
8e8c483 Clarify v6 README (#2328)
033fa0d Add worktree support for persist-credentials includeIf (#2327)
c2d88d3 Update all references from v5 and v4 to v6 (#2314)
1af3b93 update readme/changelog for v6 (#2311)
71cf226 v6-beta (#2298)
069c695 Persist creds to a separate file (#2286)
ff7abcd Update README to include Node.js 24 support details and requirements (#2248)
See full diff in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v5...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

…knobs

Kernel-side scaffolding for the six-cut minimal-tier port from main's df21286. No host-side wiring yet: existing call sites continue to go through the thin launch_t*_match wrappers and see no behavior change. The next commit wires cuts #1, #2, #3 in the GpuPipeline.cu streaming impl + BatchPlotter dispatch.

src/gpu/T1Kernel.{cu,cuh}: split launch_t1_match into launch_t1_match_prepare (computes bucket + fine-bucket offset arrays once per plot, resets d_out_count) and launch_t1_match_range (runs match_all_buckets over a [b_begin, b_end) bucket sub-range, accumulating into d_out_meta + d_out_mi + d_out_count via atomicAdd). The original launch_t1_match becomes a thin prepare+range wrapper for the pool path and parity tests. match_all_buckets gains a uint32_t bucket_begin parameter; bucket_id is now bucket_begin + blockIdx.y so range launches resolve to the correct (section_l, match_key_r) tuples, mirroring the existing T2 / T3 prepare-range plumbing (d4f54ae and b86939f). Used by the upcoming cut #4 (T1 match sliced per section_l).

src/gpu/T3Kernel.{cu,cuh}: T3 match_all_buckets gains two int64_t biases (meta_l_index_bias, meta_r_index_bias) that shift the kernel-internal global l/r indices into a sliced-meta buffer position. Full-cap callers pass biases = 0 so indexing is unchanged. The existing launch_t3_match_range wrapper passes 0/0; behavior preserved. Add launch_t3_match_section_pair_range, which accepts a sliced d_sorted_meta buffer (section_l + section_r rows packed) plus the two biases. Used by the upcoming cut #3 (T3 match section-pair input slicing): d_t2_meta_sorted parked on pinned host across T3 match, the two row slices H2D'd per pass, d_t2_xbits_sorted + d_t2_keys_merged stay full-cap on device for binary-search / target reads. Drops T3 match peak from 5200 → ~3700 MB at k=28.

Expose matching_section_host(section_l, num_section_bits) so the streaming caller can compute section_r on the host side from section_l (the kernel still does this internally; this helper avoids duplicating the rotation math at the wiring site).

src/host/GpuPipeline.hpp: StreamingPinnedScratch gains two knobs:
- gather_tile_count (default 1): T1 / T2 sort gather tile count. When >= 2, the merged-key + permuted-meta gather output is D2H'd per tile to host pinned (h_meta / h_keys_merged) so the cap-sized sorted_meta never has to be alive on device in full. Drops T1-sort and T2-sort phase peaks from 5200 → ~3640 MB at k=28.
- t3_input_slice_count (default 1): T3 match input-slice count. When >= 2, d_t2_meta_sorted is parked on h_meta across T3 match and each pass H2Ds the section_l + section_r row slices onto cap/N device buffers. Must equal num_sections (= 4 at k=28 strength=2) when active.

Defaults preserve old compact-tier behavior. The minimal tier will set both in the upcoming BatchPlotter wiring.

All TUs nvcc-clean at sm_89. Existing parity tests + pool path unaffected: they call launch_t1_match / launch_t3_match (thin wrappers) which preserve the original API.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
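The index-bias idea in the T3 kernel change can be illustrated with a small host-side sketch. It assumes the slice buffer packs the section_l rows first and the section_r rows immediately after them, and that a slice position is computed as global index plus bias. The sign convention, the helper names `slice_biases` and `pack_slice`, and the row spans used here are assumptions for illustration only; the commit message does not spell them out.

```python
def slice_biases(l_begin, l_count, r_begin):
    """Return (meta_l_index_bias, meta_r_index_bias) for a packed slice.

    Assumed layout: section_l rows at [0, l_count), section_r rows at
    [l_count, l_count + r_count). Assumed convention: slice = global + bias.
    """
    meta_l_index_bias = -l_begin
    meta_r_index_bias = l_count - r_begin
    return meta_l_index_bias, meta_r_index_bias


def pack_slice(full_meta, l_begin, l_count, r_begin, r_count):
    """Copy the two row spans into one tightly packed buffer."""
    return (full_meta[l_begin:l_begin + l_count]
            + full_meta[r_begin:r_begin + r_count])


full_meta = list(range(100, 116))  # 16 rows; values stand in for meta words
l_begin, l_count = 4, 4            # hypothetical section_l row span
r_begin, r_count = 12, 4           # hypothetical section_r row span

bias_l, bias_r = slice_biases(l_begin, l_count, r_begin)
sliced = pack_slice(full_meta, l_begin, l_count, r_begin, r_count)

# Every global index in either span resolves to the same row via the slice.
for g in range(l_begin, l_begin + l_count):
    assert sliced[g + bias_l] == full_meta[g]
for g in range(r_begin, r_begin + r_count):
    assert sliced[g + bias_r] == full_meta[g]
```

With both biases set to 0 and the full buffer passed in, the lookup reduces to plain global indexing, which is consistent with the note that full-cap callers see no change.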

Closes the gap d4f54ae's caveat flagged: cuda-only minimal was aspirational (kMinimalFloorBytes = 3828 MiB advertised, but real peak still 5200 MB at T1 sort / T2 sort / T3 match). The three remaining SYCL-branch cuts now land on this branch and bring all three phases below the 4 GiB cliff.

Cut #1: T1 sort gather tiled. src/host/GpuPipeline.cu T1 sort phase: when scratch.gather_tile_count >= 2, the per-tile sort-merge feeds a tiled gather instead of a single-shot one. Per tile: gather_u64 to a cap/N device tile, D2H to h_meta on host (whose unsorted-meta park lifetime ended at the JIT H2D into d_t1_meta a few lines earlier, so h_meta is dead and free for reuse as the sorted-meta accumulator). After the loop, free d_t1_meta + merged_vals + tile, allocate d_t1_meta_sorted full-cap, H2D from h_meta. Live-set during gather drops from 8 + 8 + 4 = 20 cap (5200 MB) to 8 + 8/N + 4 = 12 + 8/N cap. At N=4: 14 cap = 3640 MB.

Cut #2: T2 sort meta + xbits gathers tiled, deferred re-hydrate. Mirror of cut #1 at the T2 sort gather sites, plus a deferred re-hydrate so d_t2_meta_sorted (8 cap) and d_t2_xbits_sorted (4 cap) don't co-reside with d_merged_vals (4 cap). Both accumulators land on host first (h_meta + h_t2_xbits), then d_merged_vals is freed, then both sorted streams are re-hydrated full-cap on device for T3 match. Gather peak: 5200 → ~3640 MB. Re-hydrate peak: ~3120 MB.

Cut #3: T3 match section-pair input slicing. src/host/GpuPipeline.cu T3 match phase: a new t3_input_slice_path branch precedes the existing t3_stage_path. When scratch.t3_input_slice_count >= 2, cut #2's deferred re-hydrate skips the d_t2_meta_sorted H2D entirely, so T2 meta stays parked on h_meta. The T3 match phase then:

1. launch_t3_match_prepare to populate d_offsets in the temp storage region.
2. D2H d_offsets so the host loop can compute section_l / section_r row spans. Tiny (17 × 8 = 136 bytes at k=28 strength=2).
3. For each section_l ∈ [0, num_sections): compute section_r via matching_section_host, look up the row spans, H2D the section_l + section_r meta rows from h_meta into a cap/2 device slice buffer (tightly packed at indices [0, l_count) and [l_count, l_count + r_count)), set the kernel biases to map global l/r → slice indices, run launch_t3_match_section_pair_range over the section_l × num_match_keys bucket sub-range, D2H d_t3_stage to a per-plot pinned h_t3_acc accumulator at offset t3_count, increment t3_count.
4. After all section_l: free d_t2_meta_slice + d_t3_stage + d_t3_match_temp + d_t2_xbits_sorted + d_t2_keys_merged, allocate d_t3 full-cap, H2D from h_t3_acc, free h_t3_acc.

Per-plot pinned h_t3_acc (cap × T3PairingGpu = cap × u64) is necessary because h_meta is in active read-use across the section_l loop and can't double as the existing t3_stage_path's accumulator. T3 match peak: 5200 → ~3700 MB (cap/2 meta slice 1040 + cap xbits 1040 + cap keys_merged 1040 + cap/4 t3 stage 520 + offsets ~80 = ~3720 MB at k=28).

src/host/BatchPlotter.cpp: minimal tier sets gather_tile_count = 4 (= num_sections at k=28 strength=2) and t3_input_slice_count = num_sections. Dispatch message updated to advertise the layered cuts. kMinimalFloorBytes stays 3828 MiB, which already matches expected peak (~3700 MB) + 128 MB margin.

README.md: minimal-tier description rewritten to describe the three layered cuts, the new bottleneck (T3 match at ~3700 MB), and the wider 4-GiB-card target. The b86939f-era "N=8 T2 staging only" wording was stale after d4f54ae shifted the bottleneck.

Verification on hardware (RTX 4090 was main's verification host):
- k=22 batch across plain / compact / minimal must produce byte-identical .plot2 output (cuts re-shape memory only).
- k=28 minimal forced under POS2GPU_MAX_VRAM_MB=4096 should dispatch minimal and complete; POS2GPU_STREAMING_STATS=1 should confirm peak ≤ ~3700 MB.
- k=28 minimal vs k=28 compact must be byte-identical.

Cuts #4 (T1 match sliced) and #6 (Xs gen+sort+pack tiled) deferred: they're additive savings on phases that are no longer the bottleneck after the above three. Cut #4's kernel-side split landed in the previous commit so the wiring is straightforward when needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
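The peak figures quoted for cuts #1 and #3 can be re-derived from the per-entry byte widths in the message. The sketch below is only an arithmetic check: it takes the 260 MB-per-byte scale from "20 cap (5200 MB)", and the function names and the 80 MB offsets default are illustrative choices, not identifiers from the codebase.

```python
# Scale implied by the message: a live set of 20 bytes per entry is 5200 MB.
MB_PER_CAP_BYTE = 5200 / 20  # 260 MB per byte of per-entry width at k=28


def gather_live_set_bytes(n_tiles):
    """Bytes per entry live during the sort gather.

    8 (u64 meta input) + 8/N (tiled gather output) + 4 (u32 merged vals).
    n_tiles = 1 reproduces the untiled case.
    """
    return 8 + 8 / n_tiles + 4


def t3_match_peak_mb(offsets_mb=80):
    """Sum the T3 match buffers listed for cut #3."""
    meta_slice = (8 / 2) * MB_PER_CAP_BYTE  # cap/2 x u64
    xbits = 4 * MB_PER_CAP_BYTE             # cap x u32
    keys_merged = 4 * MB_PER_CAP_BYTE       # cap x u32
    stage = (8 / 4) * MB_PER_CAP_BYTE       # cap/4 x u64
    return meta_slice + xbits + keys_merged + stage + offsets_mb


untiled_mb = gather_live_set_bytes(1) * MB_PER_CAP_BYTE
tiled_mb = gather_live_set_bytes(4) * MB_PER_CAP_BYTE
print(untiled_mb, tiled_mb, t3_match_peak_mb())
```

Under these assumptions the untiled gather comes to 5200 MB, the N=4 gather to 3640 MB, and the T3 match sum to 3720 MB, matching the numbers stated above.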

Cuts #1+#2+#3 brought T1 sort, T2 sort, and T3 match below 4 GiB, but T1 match was unaffected and stayed at ~5280 MB at k=28 (d_xs 2080 + d_t1_meta 2080 + d_t1_mi 1040 + temp ~80), making it the new overall pipeline peak. Cut #4 closes that gap.

src/host/GpuPipeline.cu T1 match phase: when scratch.gather_tile_count >= 2, gate a tiled_t1_match branch that uses the existing launch_t1_match_prepare + launch_t1_match_range plumbing landed in commit bca9bf1. Each section_l pass writes to cap/N device staging buffers (cap/N × u64 meta + cap/N × u32 mi), D2H'd per pass to scratch.h_meta + a per-plot pinned h_t1_mi accumulator at offset t1_count. After all passes, free stage + d_xs and re-hydrate d_t1_mi full-cap from h_t1_mi for the upcoming T1 sort. d_t1_meta is never allocated: h_meta already holds the unsorted meta when entering T1 sort, so the existing park step becomes a no-op (now gated on d_t1_meta != nullptr).

Peak: d_xs (2080) + cap/N × 12 (stage) + temp ≈ 2940 MB at N=4 (= num_sections at k=28 strength=2). Plain / compact paths unchanged.

src/host/BatchPlotter.cpp: dispatch message updated to advertise "N=4 T1-match" alongside the existing "T1/T2 sort gather" and "T3 input slicing" cuts.

README.md: minimal-tier description rewritten as four layered cuts (was three): adds cut #4 and re-orders the "compact's tied 5200 MB" summary to include T1 match.

After this commit the cuda-only minimal-tier peak budget at k=28 strength=2 should be:

  Xs phase : ~3072 MB (unchanged, no cut #6 yet)
  T1 match : ~2940 MB (cut #4, was 5280)
  T1 sort  : ~3640 MB (cut #1, was 5200)
  T2 match : ~3640 MB (existing N=8 staging)
  T2 sort  : ~3640 MB (cut #2, was 5200)
  T3 match : ~3700 MB (cut #3, was 5200)
  T3 sort  : ~3155 MB (no change needed)

Overall peak: ~3700 MB at T3 match, which fits kMinimalFloorBytes (3828 MiB = ~3700 MB + 128 MB margin) and the 4 GiB-card floor.

Cut #6 (Xs gen+sort+pack tiled) deferred: Xs already under 4 GiB and not the bottleneck.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
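As a quick cross-check of the budget in this commit message, the per-phase peaks can be tabulated and the bottleneck picked out programmatically. This is a sketch only: the dictionary and function names are hypothetical, the 260 MB-per-byte scale is derived from d_t1_meta (cap × u64 = 2080 MB), and it follows the message's own loose mixing of MB and MiB.

```python
# Per-phase peaks in MB, copied from the budget table in the message.
PHASE_PEAK_MB = {
    "Xs phase": 3072,
    "T1 match": 2940,
    "T1 sort": 3640,
    "T2 match": 3640,
    "T2 sort": 3640,
    "T3 match": 3700,
    "T3 sort": 3155,
}
MARGIN_MB = 128
MINIMAL_FLOOR = 3828  # value quoted for kMinimalFloorBytes


def bottleneck(peaks):
    """Return (phase name, peak) for the phase with the largest peak."""
    name = max(peaks, key=peaks.get)
    return name, peaks[name]


def t1_match_peak_mb(n_tiles, d_xs_mb=2080, temp_mb=80):
    """d_xs + cap/N x 12 bytes of staging (u64 meta + u32 mi) + temp."""
    mb_per_cap_byte = 2080 / 8  # from d_t1_meta: cap x u64 = 2080 MB
    return d_xs_mb + 12 * mb_per_cap_byte / n_tiles + temp_mb


name, peak = bottleneck(PHASE_PEAK_MB)
print(name, peak, peak + MARGIN_MB <= MINIMAL_FLOOR)
print(t1_match_peak_mb(4))
```

With the table as given, the largest phase is T3 match at 3700 MB, and adding the 128 MB margin lands exactly on the quoted 3828 floor; the tiled T1 match estimate at N=4 comes to 2940 MB.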

Closes the last cap × 4 (uint32) hot-spot. The non-tiled Xs phase peaks at four cap-sized uint32 buffers + CUB DoubleBuffer scratch (~4136 MB at k=28); cut #6 generates once into 2 cap × u32, then sorts in N tiles using cap/N alternate buffers, accumulates into host pinned, and packs into d_xs without ever holding 4 cap on device.

src/host/GpuPipeline.cu Xs phase: when scratch.gather_tile_count >= 2 + scratch.h_meta != nullptr, take a tiled_xs branch:

1. Allocate d_xs_keys_full + d_xs_vals_full (2 cap × u32).
2. launch_xs_gen → fill them.
3. Allocate one shared cap/N alternate pair (keys + vals) + CUB scratch sized for tile_cap_xs.
4. For each tile in [0, N): CUB DoubleBuffer SortPairs over the slice, D2H sorted (key, val) pair to scratch.h_meta reinterpreted as a 2-cap u32 buffer (h_xs_keys at the first cap entries, h_xs_vals at the next cap; h_meta is cap × u64 = 2 cap × u32 of storage, with total_xs <= cap so both halves fit). h_meta gets overwritten by T1 match's cut #4 D2H later, so reusing it through Xs is safe.
5. Free per-tile alt + scratch + d_xs_keys_full + d_xs_vals_full (peak drops to 0 device-side).
6. Host paired stable merge (cut #5 shape) over h_xs_keys + h_xs_vals so the host buffers end up globally sorted by match_info with vals tiebreak.
7. Allocate d_xs (cap × XsCandidateGpu = 2 cap) and pack via two cudaMemcpy2DAsync H2D copies: match_info field gets h_xs_keys at struct stride 8, x field gets h_xs_vals at the same stride. No separate d_xs_keys_b / d_xs_vals_b on-device pack pair needed.

Per-phase peak: 2 cap (full keys+vals) + 2 cap/N (sort alt) + scratch ≈ 2.5 cap = 2570 MB at N=4. Final d_xs alloc is the post-merge peak at ~2 cap = 2080 MB. Plain / compact paths unchanged (gated on the same tier flags as the other cuts).

src/host/BatchPlotter.cpp: kMinimalFloorBytes 4356 → 3768 MiB (= 3640 measured peak + 128 MiB margin). Dispatch message "3.68 GiB floor".

README.md: minimal-tier description rewritten as six layered cuts with measured per-phase peaks (Xs 2570, T1/T2 sort 3640, T3 match/sort 3640) and the new ~31 s/plot wall (vs ~12 s compact) reflecting the host-CPU merge overhead. Top-of-file streaming-floor summary 4.25 → 3.7 GiB. 4 GiB cards now targeted (with the standard "real 4 GiB hardware reports ~3.5 GiB free post-CUDA-context, please report actual fit" caveat).

tools/xchplot2/cli.cpp: --tier help "minimal = ~3.7 GiB floor, fits 4 GiB".

Verification on RTX 4090 (XCHPLOT2_STREAMING=1 + --tier minimal, POS2GPU_STREAMING_STATS=1):
- k=22 plain / compact / minimal byte-identical (sha256 17dbf594…).
- k=28 minimal byte-identical with k=28 compact (sha256 f42e62ad…).
- k=28 minimal peak 4228 → 3640 MB; the bottleneck is now T1 sort / T2 sort / T3 match / T3 sort all tied at 3640 MB (T2 match was already at this level via the existing N=8 staging).
- k=28 minimal wall: ~31 s/plot (vs ~12 s compact). The 2.6× slowdown matches the SYCL-branch's measured ~34 vs ~13 s for the same six-cut configuration on sm_89.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
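The tile-sort, host-merge, and strided-pack sequence from the Xs steps can be mimicked on the host in a few lines. The following is a plain-Python stand-in, not the CUDA path: `sorted` replaces the per-tile CUB SortPairs (and orders by the full pair rather than by key alone), `heapq.merge` replaces the host paired merge, and the record layout (u32 match_info first, then u32 x, little-endian) is an assumption about XsCandidateGpu. All function names are hypothetical.

```python
import heapq
import struct


def tiled_sort_merge(keys, vals, n_tiles):
    """Sort (key, val) pairs tile by tile, then merge the sorted tiles.

    Result is ordered by key with val as the tiebreak.
    """
    total = len(keys)
    tile_cap = -(-total // n_tiles)  # ceiling division
    sorted_tiles = []
    for t in range(n_tiles):
        lo = t * tile_cap
        hi = min(lo + tile_cap, total)
        sorted_tiles.append(sorted(zip(keys[lo:hi], vals[lo:hi])))
    merged = list(heapq.merge(*sorted_tiles))
    return [k for k, _ in merged], [v for _, v in merged]


def pack_candidates(keys, vals):
    """Interleave keys and vals into 8-byte records (assumed field order)."""
    buf = bytearray(8 * len(keys))
    for i, (k, v) in enumerate(zip(keys, vals)):
        struct.pack_into("<II", buf, 8 * i, k, v)
    return bytes(buf)


keys = [5, 1, 5, 3, 2, 1, 4, 0]
vals = list(range(8))
sorted_keys, sorted_vals = tiled_sort_merge(keys, vals, n_tiles=4)
packed = pack_candidates(sorted_keys, sorted_vals)

# The tiled result must equal a single global sort of the same pairs.
assert list(zip(sorted_keys, sorted_vals)) == sorted(zip(keys, vals))
```

The final assertion is the property the verification bullets rely on: tiling changes only where the data lives during the sort, not the resulting order, so the packed output should be identical to the untiled path.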

dependabot[bot] added the dependencies (Pull requests that update a dependency file) and github_actions (Pull requests that update GitHub Actions code) labels on Apr 27, 2026

Jsewill merged commit 752a39b into main Apr 27, 2026
11 checks passed

dependabot Bot deleted the dependabot/github_actions/actions/checkout-6 branch April 27, 2026 21:43

Jsewill merged 1 commit into main from dependabot/github_actions/actions/checkout-6
