feat(tools): add Copilot backend, per-commit cache, and async refactor to release notes generator#7264

Merged: romange merged 2 commits into main from feat/release-notes-copilot-backend on May 5, 2026

Conversation

Collaborator

@romange romange commented May 5, 2026

Summary

  • Adds a GitHub Copilot backend (--backend copilot) as an alternative to the Anthropic backend; routes requests through the Copilot CLI — no ANTHROPIC_API_KEY required
  • Adds a per-commit analysis disk cache (default <repo>/.release_notes_cache/); the cache key covers backend name, model, token budget, system prompt, and commit SHA, so any prompt or model change automatically invalidates stale entries
  • Migrates the analysis pipeline from concurrent.futures.ThreadPoolExecutor to native asyncio; compose_release_notes streams output with a 15 s heartbeat and live token counters
  • Tightens commit categorisation: category descriptions and CRITICAL RULE block now strongly enforce that any fix (crash, data corruption, race, wrong result) lands in bugfix regardless of subsystem; domain categories are reserved for new features
  • Output filename gains a backend tag for Copilot runs (release_notes_copilot_<range>.md) to distinguish from Anthropic runs
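
To illustrate the per-commit cache described above, here is a minimal sketch of a read-through disk cache. It is not the code in tools/release_notes_generator.py: the helper names (`simple_key`, `load_cached`, `store_cached`), the one-JSON-file-per-key layout, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def simple_key(backend: str, model: str, max_tokens: int,
               system_prompt: str, sha: str) -> str:
    """Hypothetical cache key: hash every input that affects the analysis."""
    # json.dumps of a list is an unambiguous encoding of the five fields.
    payload = json.dumps([backend, model, max_tokens, system_prompt, sha])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def load_cached(cache_dir: Path, key: str) -> Optional[dict]:
    """Return the cached analysis, or None on a miss or a corrupt entry."""
    try:
        return json.loads((cache_dir / f"{key}.json").read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def store_cached(cache_dir: Path, key: str, analysis: dict) -> None:
    """Write one JSON file per key, creating the cache directory if needed."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"{key}.json").write_text(json.dumps(analysis), encoding="utf-8")
```

Because the system prompt and model are part of the key, editing either one yields a different file name, so old entries are simply never read again rather than being deleted.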

Refactoring (no behaviour change to existing Anthropic path):

  • LLMBackend interface extended with analyze_commit, compose_progress_status, post_process_notes, notes_filename — all backend-specific decisions live in the backend, not in callers
  • AnthropicBackend._sync_stream replaces the 75-line nested _run closure; _ComposeStats dataclass replaces five loose instance variables
  • _model renamed to public model; _make_cache_key(backend, sha) replaces the old 5-argument function
  • main() decomposed: _build_arg_parser, _check_prerequisites, _build_backend, _resolve_cache_dir, _run_async (single asyncio.run call)
  • _is_fatal_error, FAIL_FAST_THRESHOLD, and all token/heartbeat/threshold constants promoted to module level
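
As a rough picture of the extended interface, the sketch below shows one way the four hooks could sit on an abstract base class. The method names come from this PR; the class names (`BackendSketch`, `FakeCopilotBackend`), every signature, and the default implementations are assumptions for illustration only.

```python
import asyncio
from abc import ABC, abstractmethod


class BackendSketch(ABC):
    """Hypothetical stand-in for LLMBackend; signatures are assumed."""

    name = "anthropic"
    model = "example-model"  # public attribute, mirroring the _model -> model rename

    @abstractmethod
    async def analyze_commit(self, sha: str, diff_text: str) -> dict:
        """Return a structured analysis for one commit."""

    def compose_progress_status(self) -> str:
        """Text shown on each heartbeat; backends may report token counters."""
        return ""

    def post_process_notes(self, notes: str) -> str:
        """Backend-specific cleanup of the composed notes (default: none)."""
        return notes

    def notes_filename(self, range_tag: str) -> str:
        return f"release_notes_{range_tag}.md"


class FakeCopilotBackend(BackendSketch):
    """Toy backend that only overrides what differs from the default."""

    name = "copilot"

    async def analyze_commit(self, sha: str, diff_text: str) -> dict:
        return {"sha": sha, "category": "bugfix"}

    def notes_filename(self, range_tag: str) -> str:
        return f"release_notes_copilot_{range_tag}.md"
```

With this shape the caller never branches on the backend name: it asks the backend for the file name and for any post-processing.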

Release notes comparison (v1.37.0..v1.38.0)

Three outputs were generated from the same commit range to compare quality:

| | Gold (pre-PR, Anthropic) | New Anthropic | Copilot |
|---|---|---|---|
| File | release_notes_v1.37.0_to_v1.38.0.gold.md | release_notes_v1.37.0_to_v1.38.0.md | release_notes_copilot_v1.37.0_to_v1.38.0.md |
| Lines | 309 | 307 | 307 |
| H1 title | | | |
| Preamble artefact | | | ⚠️ model emitted "Now I have full context…" before the markdown |
| Highlights | ZSTD compression, TTL embed, TOPK+CMS | TTL embed, Top-K+CMS, HNSW range search | TTL embed, ZSTD compression, TOPK+CMS |
| Structured sections | 8 sections | 8 sections | 9 sections (adds Docker) |

Anthropic new vs gold — structurally identical; new output groups HNSW improvements more tightly in Highlights and surfaces the 30+ bug-fix count explicitly in the opening.

Copilot vs gold — equivalent quality and coverage; the Copilot model occasionally violates the "no preamble" system-prompt constraint (the spurious leading line above), which post_process_notes does not yet strip.

release_notes_v1.37.0_to_v1.38.0.gold.md (pre-PR baseline)
# Dragonfly v1.38.0 Release Notes

This release delivers major memory efficiency improvements, expanded probabilistic data structure support, and significant search engine enhancements. The standout change is a ~26% reduction in memory for workloads with expiring keys by embedding TTL directly into keys and eliminating the separate expire table — a meaningful win for any deployment using key expiry at scale. New probabilistic data structures (TOPK and CMS command families) join the existing HyperLogLog and Bloom filter support, and vector search gains HNSW range queries, improved hybrid search accuracy, and correct replication across shard count mismatches. Operators will also find improved observability through new Prometheus metrics for TLS, pipelines, streams, and defragmentation.

## Highlights

- [~26% memory reduction for workloads with expiring keys] TTL is now embedded directly in each key, eliminating the separate per-shard expire table and saving roughly 26–35 bytes per key with expiry (#6923, #6933).
- Adds the TOPK command family (TOPK.RESERVE, TOPK.ADD, TOPK.INCRBY, TOPK.QUERY, TOPK.COUNT, TOPK.LIST, TOPK.INFO) and the CMS (Count-Min Sketch) command family (CMS.INITBYDIM, CMS.INITBYPROB, CMS.INCRBY, CMS.QUERY, CMS.INFO, CMS.MERGE), both with RDB persistence (#6950, #6896).
- [Up to 75x less memory for repetitive list data; 3–10x for real-world workloads] Introduces ZSTD dictionary-based compression for Redis lists via the new `list_experimental_zstd_dict_threshold` flag (#6967).
release_notes_v1.37.0_to_v1.38.0.md (new Anthropic backend)
# Dragonfly v1.38.0 Release Notes

This release delivers major advances across memory efficiency, vector search, probabilistic data structures, and operational observability. The standout improvement is a **~26% reduction in memory for TTL-heavy workloads** by eliminating the per-shard expire table and embedding TTL directly in keys. Vector search gains HNSW range search, improved hybrid search accuracy, deferred write operations during serialization, and better FT.AGGREGATE integration. Two new probabilistic data structure families — **Top-K (HeavyKeeper)** and **Count-Min Sketch** — are now fully implemented with RDB persistence. Operators gain richer Prometheus metrics for TLS, pipelines, defragmentation, stream access patterns, and pipeline latency. The release also includes over 30 bug fixes spanning search crashes, replication correctness, tiering races, and stream handling.

## Highlights

- **Eliminated the per-shard expire table**, embedding TTL directly in keys for [~26% memory reduction (900 MB → 665 MB)] on workloads with many expiring keys (#6923, #6933).
- **Full Top-K and Count-Min Sketch command families** added with RDB persistence, ACL integration, and Redis Stack compatibility (#6950, #6896, #6932).
- **HNSW vector search significantly expanded**: range search in FT.SEARCH, KNN + APPLY in FT.AGGREGATE, filtered brute-force for small candidate sets, and deferred writes during serialization (#6898, #7066, #6730, #6746).
release_notes_copilot_v1.37.0_to_v1.38.0.md (new Copilot backend)
# Dragonfly v1.38.0 Release Notes

Dragonfly v1.38.0 delivers substantial memory efficiency gains, expanded probabilistic data structure support, and richer vector search capabilities. The headline change is elimination of the per-shard ExpireTable — embedding TTL directly into keys saves ~26% memory for expiry-heavy workloads (900 MB → 665 MB for 10M keys with TTL). New TOPK and Count-Min Sketch command families join the existing probabilistic primitives with full RDB persistence and RedisBloom compatibility, while ZSTD dictionary-based list compression achieves up to 75× memory reduction in benchmarks. Vector search gains HNSW range queries in both FT.SEARCH and FT.AGGREGATE, improved hybrid search accuracy, and correct index restoration across shard-count-mismatched replicas. Operators also get a distroless Docker image, early TLS/TCP validation, new Prometheus metrics for TLS handshakes and pipeline latency, and 4× higher tiered storage write depth.

## Highlights

- [~26% memory reduction for workloads with expiring keys (900 MB → 665 MB for 10M keys with TTL)] Eliminated the per-shard ExpireTable by embedding TTL directly into each key's CompactKey encoding, removing 26–35 bytes of overhead per expiring key (#6923, #6933).
- [Up to 75× memory reduction in synthetic benchmarks (2.56 GiB → 32.84 MiB); 3–10× expected in real-world workloads] Introduced ZSTD dictionary-based list compression via a shared thread-local dictionary across all QList instances, controlled by the new `list_experimental_zstd_dict_threshold` flag (#6967).
- Added TOPK and Count-Min Sketch (CMS) command families with full RDB persistence, compatible with RedisBloom (#6950, #6896).

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 5, 2026 12:17
… cache, async

Add a GitHub Copilot backend (--backend copilot) that authenticates via the
Copilot CLI instead of ANTHROPIC_API_KEY, making the tool usable without an
Anthropic account.

Add a per-commit analysis disk cache (.release_notes_cache/ by default). The
cache key hashes backend name, model, token budget, system prompt, and commit
SHA, so any change to prompts or model automatically invalidates stale entries.
Re-running over a range that was already analyzed skips all API calls.

Migrate the analysis pipeline from concurrent.futures.ThreadPoolExecutor to
native asyncio, enabling a single asyncio.run() entry point and a live
heartbeat loop during the long compose call.

Tighten commit categorisation prompts: add a CRITICAL RULE block and annotate
every domain category as "(features only)", so crash fixes / data-corruption
fixes / race conditions always land in bugfix rather than leaking into domain
sections.

Refactor for separation of concerns:
- LLMBackend interface gains analyze_commit(), compose_progress_status(),
  post_process_notes(), notes_filename() — backend-specific logic stays in
  the backend, not in callers
- AnthropicBackend._sync_stream() replaces a 75-line nested closure;
  _ComposeStats dataclass replaces five loose instance variables
- _model renamed to public model; _make_cache_key(backend, sha) replaces
  the old 5-argument function
- main() decomposed into _build_arg_parser, _check_prerequisites,
  _build_backend, _resolve_cache_dir, _run_async
- FAIL_FAST_THRESHOLD, _is_fatal_error(), and all numeric constants
  (ANALYZE_MAX_TOKENS_*, COMPOSE_MAX_TOKENS, COMPOSE_HEARTBEAT_S, …)
  promoted to module level

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@romange romange force-pushed the feat/release-notes-copilot-backend branch from 2241a32 to 8d2e3e5 Compare May 5, 2026 12:18
augmentcode Bot commented May 5, 2026

🤖 Augment PR Summary

Summary: This PR significantly expands tools/release_notes_generator.py by adding a GitHub Copilot-backed LLM option, introducing a per-commit on-disk analysis cache, and refactoring the pipeline to native asyncio for better concurrency control and streaming composition.

Changes:

  • Adds --backend copilot (Copilot CLI auth) alongside the existing Anthropic backend.
  • Introduces a per-commit JSON cache under .release_notes_cache/ (also added to .gitignore), with cache keys derived from backend/model/prompt/token budget + commit SHA.
  • Refactors commit analysis to asyncio with bounded parallelism, retries, and a fail-fast path for clearly non-retriable errors.
  • Adds streaming composition heartbeat/status reporting and backend-specific post-processing hooks.
  • Tightens commit categorization guidance (especially forcing fixes into bugfix) and theme promotion rules for release notes structure.
  • Updates output naming to distinguish Copilot runs (e.g. release_notes_copilot_<range>.md).
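
The bounded-parallelism, retry, and fail-fast behaviour summarised above can be sketched with an `asyncio.Semaphore` and an `asyncio.Event`. This is an illustrative outline, not the PR's `_analyze_round`: the function names, the `is_fatal` predicate, and the round limit are assumptions, and skipped commits are recorded inline here rather than after `asyncio.gather` returns.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

Analyze = Callable[[str], Awaitable[dict]]


async def analyze_round(shas: List[str], analyze: Analyze, max_parallel: int,
                        is_fatal: Callable[[Exception], bool]
                        ) -> Tuple[Dict[str, dict], List[str], bool]:
    """Run one round with at most max_parallel analyses in flight."""
    sem = asyncio.Semaphore(max_parallel)
    abort = asyncio.Event()
    results: Dict[str, dict] = {}
    failed: List[str] = []

    async def one(sha: str) -> None:
        async with sem:
            if abort.is_set():
                failed.append(sha)  # skipped after abort, but still reported
                return
            try:
                results[sha] = await analyze(sha)
            except Exception as exc:
                failed.append(sha)
                if is_fatal(exc):
                    abort.set()  # non-retriable: stop starting new work

    await asyncio.gather(*(one(sha) for sha in shas))
    return results, failed, abort.is_set()


async def analyze_with_retries(shas: List[str], analyze: Analyze,
                               max_parallel: int = 4, max_rounds: int = 3,
                               is_fatal: Callable[[Exception], bool] = lambda exc: False
                               ) -> Tuple[Dict[str, dict], List[str]]:
    """Retry failed commits in later rounds; give up early on a fatal error."""
    done: Dict[str, dict] = {}
    pending = list(shas)
    for _ in range(max_rounds):
        if not pending:
            break
        results, pending, aborted = await analyze_round(
            pending, analyze, max_parallel, is_fatal)
        done.update(results)
        if aborted:
            break
    return done, pending
```

Keeping skipped commits in the returned `pending` list means the caller always sees every commit that still lacks an analysis.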

Technical Notes: Backends are now unified behind an LLMBackend interface, with backend-specific decisions (composition, progress, post-processing, filename) kept out of the orchestration code.

@augmentcode augmentcode Bot left a comment

Review completed. 3 suggestions posted.

Comment thread tools/release_notes_generator.py Outdated
Comment thread tools/release_notes_generator.py Outdated
text = text.strip()
text = re.sub(r"^```(?:json)?\s*", "", text)
text = re.sub(r"\s*```$", "", text)
match = re.search(r"\{.*\}", text, re.DOTALL)

@augmentcode augmentcode Bot May 5, 2026

re.search(r"\{.*\}", ...) is greedy and can capture more than the intended JSON object if the backend returns extra braces (or multiple JSON objects), leading to intermittent json.loads failures. This makes the Copilot JSON path more brittle than necessary.

Severity: medium
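
One way to avoid the greedy match is the standard library's `json.JSONDecoder.raw_decode`, which parses a single JSON value starting at a given index and reports where it ended. The sketch below is an illustration under that assumption; `first_json_object` is a hypothetical helper name, not a function from the tool.

```python
import json


def first_json_object(text: str) -> dict:
    """Return the first complete JSON object embedded in text.

    Tries each "{" in turn, so stray braces before or after the object
    (or a second object later in the reply) do not confuse the parse.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode returns (value, end_index) and ignores trailing text.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")
```

For a reply such as `Here: {"category": "bugfix", "meta": {"a": 1}} and also {"x": 2}`, the greedy regex would capture everything from the first `{` to the last `}`, which is not valid JSON, while this helper stops after the first object.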

Comment thread tools/release_notes_generator.py
Contributor

Copilot AI left a comment

Pull request overview

This PR upgrades the tools/release_notes_generator.py release-notes tool by adding a GitHub Copilot-backed LLM option, introducing a per-commit disk cache for analysis results, and refactoring the execution pipeline to use asyncio (including streaming/heartbeat progress during composition).

Changes:

  • Add a new --backend copilot option alongside the existing Anthropic backend, with backend-specific behavior encapsulated behind an LLMBackend interface.
  • Add a per-commit analysis disk cache (default .release_notes_cache/) keyed by backend/model/prompt/token budget + commit SHA.
  • Migrate commit analysis + composition from ThreadPoolExecutor to asyncio, adding compose heartbeats and progress counters.
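
A compose heartbeat of the kind mentioned above can be built by waiting on the long-running task with a timeout and emitting a status line each time the timeout expires. The following is a minimal sketch under that assumption; `with_heartbeat` and `on_beat` are hypothetical names, and the real tool's heartbeat (15 s per the PR description) may be structured differently.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def with_heartbeat(work: Awaitable[Any], interval_s: float,
                         on_beat: Callable[[], None]) -> Any:
    """Await work, calling on_beat() every interval_s seconds until it finishes."""
    task = asyncio.ensure_future(work)
    try:
        while True:
            # asyncio.wait returns (done, pending); done is empty on timeout.
            done, _pending = await asyncio.wait({task}, timeout=interval_s)
            if done:
                return task.result()  # re-raises if the work failed
            on_beat()
    finally:
        if not task.done():
            task.cancel()  # do not leak the task if we are cancelled ourselves
```

A caller could pass something like `lambda: print(backend.compose_progress_status())` as `on_beat`, which keeps the status text a backend decision.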

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| tools/requirements.txt | Notes optional Copilot SDK install for the new backend. |
| tools/release_notes_generator.py | Implements Copilot backend, async pipeline refactor, streaming compose progress, and per-commit analysis caching. |
| .gitignore | Ignores the new default on-disk cache directory. |

Comment thread tools/release_notes_generator.py
Collaborator Author

romange commented May 5, 2026

Gold

Before the PR.
release_notes_v1.37.0_to_v1.38.0.gold.md

Copilot and Anthropic - after this change

The most notable change is that more issues are now placed under the bug fixes category.

release_notes_copilot_v1.37.0_to_v1.38.0.md
release_notes_v1.37.0_to_v1.38.0.md

@romange romange requested a review from vyavdoshenko May 5, 2026 12:27
- Separate anthropic and pydantic imports so pydantic (needed by
  CommitAnalysis) is still available for --backend copilot when the
  anthropic package is not installed

- Fix _make_cache_key to length-prefix each field before hashing,
  preventing hash collisions between distinct (name, model, ...) tuples
  that share the same raw byte concatenation

- Fix _parse_commit_analysis_json to use json.JSONDecoder.raw_decode
  instead of a greedy r"\{.*\}" regex, which correctly handles nested
  braces and stops at the first complete JSON object

- Fix _analyze_round to add commits that were silently skipped due to
  abort_event back into the failed list after asyncio.gather completes,
  so they are visible to the caller and included in the next retry round

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
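
The length-prefix fix in the commit above addresses a general hashing pitfall: concatenating fields without delimiters lets different tuples produce the same bytes. A minimal sketch of the idea follows; the function names, the 8-byte big-endian prefix, and SHA-256 are assumptions for illustration and may differ from what `_make_cache_key` does.

```python
import hashlib


def naive_key(*fields: str) -> str:
    """Collision-prone: ("ab", "c") and ("a", "bc") hash the same bytes."""
    return hashlib.sha256("".join(fields).encode("utf-8")).hexdigest()


def length_prefixed_key(*fields: str) -> str:
    """Hash each field preceded by its byte length, so boundaries are unambiguous."""
    digest = hashlib.sha256()
    for field in fields:
        data = field.encode("utf-8")
        digest.update(len(data).to_bytes(8, "big"))  # fixed-width length prefix
        digest.update(data)
    return digest.hexdigest()
```

With the prefix, moving a character across a field boundary changes the hashed byte stream, so `("ab", "c")` and `("a", "bc")` no longer share a key.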
Collaborator Author

romange commented May 5, 2026

augment review

@augmentcode augmentcode Bot left a comment

Review completed. 1 suggestion posted.

return parser


def _check_prerequisites(backend_name: str) -> Optional[str]:

@augmentcode augmentcode Bot May 5, 2026

_check_prerequisites doesn’t validate that Pydantic is installed/compatible, but later code unconditionally uses Pydantic v2 APIs like model_dump()/model_dump_json() (including for the Copilot backend + cache). This can crash at runtime if users follow the Copilot install path (pip install github-copilot-sdk) or have Pydantic v1 (Anthropic allows <3).

Severity: medium

Other Locations
  • tools/release_notes_generator.py:799
  • tools/release_notes_generator.py:943
  • tools/release_notes_generator.py:959
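
A prerequisite check along the lines suggested here could look up the installed Pydantic version and return an error message when it is missing or older than v2. This is only a sketch: `installed_pydantic_version` and `pydantic_problem` are hypothetical helpers, and how such a check would be wired into `_check_prerequisites` is left open.

```python
from importlib import metadata
from typing import Optional


def installed_pydantic_version() -> Optional[str]:
    """Return the installed pydantic version string, or None if absent."""
    try:
        return metadata.version("pydantic")
    except metadata.PackageNotFoundError:
        return None


def pydantic_problem(installed: Optional[str]) -> Optional[str]:
    """Return an error message if pydantic is missing or pre-v2, else None."""
    if installed is None:
        return "pydantic is not installed; run: pip install -r tools/requirements.txt"
    major = int(installed.split(".")[0])
    if major < 2:
        return (f"pydantic {installed} found, but v2 APIs such as "
                "model_dump() are required")
    return None
```

Returning `Optional[str]` mirrors the signature of `_check_prerequisites` shown in the diff context, so a caller could surface `pydantic_problem(installed_pydantic_version())` before any backend is built.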

Contributor

@vyavdoshenko vyavdoshenko left a comment

lgtm

python tools/release_notes_generator.py HEAD~50..HEAD --max-parallel 4

# GitHub Copilot backend (uses Copilot CLI auth, no API key needed):
pip install github-copilot-sdk
Contributor

pip install -r tools/requirements.txt

@romange romange merged commit cbdb4fa into main May 5, 2026
12 checks passed
@romange romange deleted the feat/release-notes-copilot-backend branch May 5, 2026 14:28