
release: v0.4.0 - corpus pipeline + constructor migration #7

Merged: rkrug merged 4 commits into main from dev on Apr 10, 2026


rkrug (owner) commented on Apr 9, 2026

v0.4.0 Release PR

This PR consolidates the kagiPro v0.4.0 release scope and is ready for merge.

Summary

kagiPro v0.4.0 grows from an endpoint-constructor migration into a full corpus-oriented workflow release:

  • query constructor rename finalized (query_<endpoint>)
  • high-level project workflow via kagi_fetch()
  • hard-break replacement of the legacy abstract augmentation with a modular content pipeline
  • updated docs, vignettes, and skills, plus alignment of the pkgdown article index

Key Changes

API + workflow

  • constructor migration completed:
    • query_search(), query_enrich_web(), query_enrich_news(), query_summarize(), query_fastgpt()
  • added/updated project workflow primitives:
    • kagi_fetch()
    • kagi_update_query()
    • clean_request()
  • kagi_request_parquet() is now JSON -> parquet only (no inline abstract augmentation)

Corpus pipeline (new canonical path)

  • download_content()
  • content_markdown()
  • markdown_abstract()
  • provider functions:
    • summarize_with_openai()
    • summarize_with_kagi()
  • read_corpus(..., abstracts = TRUE) for linking abstracts to records by id + query
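The id + query linking can be pictured as a left join on both key columns. The following is a minimal base-R sketch under stated assumptions, not the read_corpus() implementation: the column names id, query, and abstract are taken from the read_corpus() check NOTE in the commit notes, while link_abstracts() and the toy data are hypothetical. Joining on both keys keeps each abstract attached to the query it was produced for, even when the same id appears under several queries.

```r
# Hypothetical sketch, not kagiPro's read_corpus(): attach abstracts to
# corpus records by the composite key (id, query).
link_abstracts <- function(records, abstracts) {
  # all.x = TRUE keeps records without an abstract; their abstract is NA.
  merge(
    records,
    abstracts[, c("id", "query", "abstract")],
    by = c("id", "query"),
    all.x = TRUE
  )
}

# Toy data: the same id appears under two queries, but only one has an abstract.
records <- data.frame(
  id    = c("SEARCH_a1", "SEARCH_a1", "SEARCH_b2"),
  query = c("q1", "q2", "q1"),
  title = c("A", "A", "B")
)
abstracts <- data.frame(
  id       = c("SEARCH_a1", "SEARCH_b2"),
  query    = c("q1", "q1"),
  abstract = c("summary of A for q1", "summary of B for q1")
)

linked <- link_abstracts(records, abstracts)
print(linked)
```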

Documentation + site + skills

  • added vignette: vignettes/corpus-workflow.qmd
  • updated quickstart/endpoint docs and README pipeline sections
  • synced NEWS.md and PROJECT_DESIGN.md to current architecture
  • expanded skills:
    • user-corpus-workflow
    • maintainer-corpus-pipeline
    • maintainer-release-sync
  • fixed pkgdown article index to include corpus-workflow

Quality fixes after check feedback

  • resolved R CMD check notes for namespace/global bindings:
    • stats::setNames in kagi_update_query()
    • .data pronoun usage in read_corpus()
    • rlang import alignment in DESCRIPTION/NAMESPACE
  • resolved Rd issues:
    • documented retry_max_tries in both summarizer providers
    • shortened the default OpenAI prompt text to avoid an Rd line-width NOTE
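Both NOTE fixes follow standard patterns for package code that uses dplyr: refer to data-frame columns through the .data pronoun rather than bare names, and namespace stats functions explicitly. The sketch below assumes dplyr is installed; the corpus data frame and abstracts_for_query() are hypothetical, and in package code .data would be imported with a roxygen tag such as `@importFrom rlang .data` rather than by attaching dplyr.

```r
# Hypothetical sketch of the two check-NOTE fix patterns; not kagiPro's code.
# Attaching dplyr makes the .data pronoun available in a script; a package
# would instead declare `#' @importFrom rlang .data` in its roxygen comments.
suppressPackageStartupMessages(library(dplyr))

corpus <- data.frame(
  id       = c("SEARCH_a1", "SEARCH_b2", "SEARCH_c3"),
  query    = c("q1", "q1", "q2"),
  abstract = c("A", NA, "C")
)

# Pattern 1: .data$col tells R CMD check that `query` and `abstract` are
# columns, avoiding "no visible binding for global variable" NOTEs.
abstracts_for_query <- function(df, q) {
  kept <- dplyr::filter(df, .data$query == q, !is.na(.data$abstract))
  # all_of() selects by a character vector, so no bare column names are needed.
  dplyr::select(kept, dplyr::all_of(c("id", "abstract")))
}

# Pattern 2: stats::setNames() instead of setNames() avoids
# "no visible global function definition for 'setNames'".
abstract_by_id <- stats::setNames(corpus$abstract, corpus$id)

print(abstracts_for_query(corpus, "q1"))
print(abstract_by_id[["SEARCH_c3"]])
```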

Breaking Changes (explicit)

  • removed add_sbstract_to_parquet()
  • removed abstract-augmentation behavior from kagi_request_parquet()

Validation

  • devtools::document() regenerated docs/NAMESPACE
  • devtools::check(document = FALSE, run_dont_test = FALSE, error_on = "never")
    • no errors/warnings
    • the only remaining NOTE was environment-related: "unable to verify current time"

Recent commits included

  • 22dd663 release: v0.4.0 constructor rename + docs/testing alignment
  • e811159 refactor corpus pipeline + remove legacy abstract API
  • 70a6355 fix check notes (namespace/Rd)
  • 3d2ebc2 fix pkgdown index + strengthen maintainer skill gates

rkrug added 4 commits April 9, 2026 09:26
This release finalizes a breaking constructor naming migration and synchronizes code, tests, and documentation around the new query-first API shape.

API and naming changes:
- Rename endpoint constructors to query_<endpoint> for improved discoverability:
  - search_query() -> query_search()
  - enrich_web_query() -> query_enrich_web()
  - enrich_news_query() -> query_enrich_news()
  - summarize_query() -> query_summarize()
  - fastgpt_query() -> query_fastgpt()
- Remove legacy constructor exports (hard break; no compatibility wrappers).
- Keep non-constructor public APIs stable:
  - kagi_connection()
  - kagi_request()
  - kagi_request_parquet()
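Because the rename ships without compatibility wrappers, downstream scripts that still call an old constructor stop with a "could not find function" error. The mapping above can be encoded as a lookup table to locate such calls before upgrading. This is a hypothetical base-R helper, not part of kagiPro; its regular expression only catches direct calls written on a single line.

```r
# Hypothetical migration helper (not part of kagiPro): the pre-0.4.0
# constructor names mapped to their v0.4.0 replacements.
legacy_constructors <- c(
  search_query      = "query_search",
  enrich_web_query  = "query_enrich_web",
  enrich_news_query = "query_enrich_news",
  summarize_query   = "query_summarize",
  fastgpt_query     = "query_fastgpt"
)

# Return one row per source line that calls a legacy constructor.
find_legacy_constructors <- function(code_lines) {
  hits <- lapply(names(legacy_constructors), function(old) {
    # Require a non-identifier character (or line start) before the name and
    # "(" after it, so "my_search_query(" and "search_query_x(" do not match.
    pattern <- paste0("(^|[^A-Za-z0-9._])", old, "[[:space:]]*\\(")
    idx <- grep(pattern, code_lines)
    if (length(idx) == 0) return(NULL)
    data.frame(line = idx, old = old, new = legacy_constructors[[old]])
  })
  hits <- Filter(Negate(is.null), hits)
  if (length(hits) == 0) {
    return(data.frame(line = integer(), old = character(), new = character()))
  }
  do.call(rbind, hits)
}

script <- c(
  "q <- search_query(...)",
  "x <- my_search_query(1)",
  "n <- enrich_news_query (...)"
)
print(find_legacy_constructors(script))
```

Feeding it the output of readLines() for each R file in a project would list the lines that need the new names.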

Request/dispatch and classes:
- Migrate request validation/dispatch to the kagi_query_* class family.
- Keep endpoint routing behavior unchanged for search/enrich/summarize/fastgpt execution.

Parquet and abstract augmentation:
- Add endpoint-prefixed deterministic IDs in parquet conversion:
  - SEARCH_, ENRICH_WEB_, ENRICH_NEWS_, SUMMARIZE_, FASTGPT_
- Add add_abstract argument to kagi_request_parquet() to trigger abstract augmentation for supported endpoints.
- Keep warning behavior for unsupported endpoints when add_abstract = TRUE.
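The prefixes above identify the endpoint; how the rest of each ID is derived is not described here. One way to get endpoint-prefixed, reproducible IDs is to prepend the prefix to a hash of a stable record key. The sketch below is hypothetical on every count: make_record_id(), the choice of key (for example a result URL), and the use of rlang::hash() are assumptions, not kagiPro's documented scheme.

```r
# Hypothetical sketch of endpoint-prefixed deterministic IDs; not kagiPro's
# actual scheme. Assumes the rlang package (rlang::hash()) is installed.
id_prefixes <- c(
  search      = "SEARCH_",
  enrich_web  = "ENRICH_WEB_",
  enrich_news = "ENRICH_NEWS_",
  summarize   = "SUMMARIZE_",
  fastgpt     = "FASTGPT_"
)

make_record_id <- function(endpoint, key) {
  # Fail early on unknown endpoints instead of producing an unprefixed ID.
  endpoint <- match.arg(endpoint, names(id_prefixes))
  # The hash depends only on each key's value, so repeated conversions of the
  # same record yield the same ID.
  hashes <- vapply(key, rlang::hash, character(1), USE.NAMES = FALSE)
  paste0(id_prefixes[[endpoint]], hashes)
}

ids <- make_record_id("search", c("https://example.org/a", "https://example.org/b"))
print(ids)
```

Hashing a stable key rather than using row positions would keep IDs unchanged when the same results are re-fetched in a different order.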

Bridge/helper and docs generation:
- Add add_sbstract_to_parquet() helper and corresponding Rd.
- Regenerate docs/NAMESPACE/man pages for renamed constructors and new helper.

Tests and fixtures:
- Update test suite to new constructor names/classes.
- Extend tests for endpoint-prefixed IDs and add_abstract behavior.
- Refresh vcr cassettes to align with current request paths and names.

Documentation and project guidance:
- Update README, all endpoint vignettes, quickstart, pkgdown config, and skills content to new constructor naming.
- Update PROJECT_DESIGN.md and NEWS.md for v0.4.0 release narrative and explicit breaking-change mapping.
- Bump package version to 0.4.0 in DESCRIPTION.
- Add zenodo.csl for citation workflow support.

…nc docs/skills

Core refactor (hard break):
- Remove legacy add_sbstract_to_parquet() API and delete R/add_sbstract_to_parquet.R
- Make kagi_request_parquet() JSON->parquet only; remove abstract-augmentation arguments/branches
- Add modular corpus pipeline functions:
  - download_content()
  - content_markdown()
  - markdown_abstract()
  - summarize_with_openai() / summarize_with_kagi()
  - read_corpus(..., abstracts = TRUE) with id+query linking
- Add supporting internals and IO utilities:
  - content_pipeline_jobs
  - parquet_io_utils
  - summarize_text_records
  - clean_request() and kagi_update_query() integration updates
- Keep per-query metadata model and query-partition refresh behavior

Workflow/documentation updates:
- Add new end-to-end vignette: vignettes/corpus-workflow.qmd
- Update quickstart and summarize endpoint vignette for current workflow
- Update README for project-folder workflow + modular corpus enrichment
- Update NEWS.md and PROJECT_DESIGN.md to reflect current architecture and breaking changes

Skills expansion and alignment:
- Add user-corpus-workflow skill with references/examples
- Add maintainer-corpus-pipeline skill (contracts + testing)
- Add maintainer-release-sync skill (release consistency checklist)
- Update inst/skills/README.md and maintainer workflow routing

Generated artifacts/tests:
- Regenerate NAMESPACE/man pages for new/removed APIs
- Remove obsolete man/add_sbstract_to_parquet.Rd
- Update tests for removed legacy flow and new corpus pipeline behavior

Quality and documentation cleanup:
- Fix NOTE 'no visible global function definition for setNames' by using stats::setNames in kagi_update_query().
- Fix NOTE 'no visible binding for global variable id/query/abstract' in read_corpus() by switching to the rlang::.data pronoun in dplyr verbs.
- Add roxygen import for rlang::.data and regenerate NAMESPACE accordingly.
- Add missing retry_max_tries documentation for summarize_with_openai() and summarize_with_kagi().
- Shorten the summarize_with_openai() default system_prompt string to eliminate the Rd usage line-width NOTE.
- Regenerate Rd pages for summarizer providers and refresh generated docs.

Dependency/metadata alignment:
- Add rlang to DESCRIPTION Imports to satisfy new namespace import requirements.
- Keep runtime behavior unchanged except for documentation/default prompt text normalization.

Validation performed:
- Ran devtools::document().
- Ran devtools::check(document = FALSE, run_dont_test = FALSE, error_on = "never").
- Confirmed the targeted NOTEs are resolved:
  - R code possible problems: OK
  - Rd line widths: OK
- Remaining NOTE is environment-related only ('unable to verify current time').

Behavior/config updates:
- Add corpus-workflow vignette to the _pkgdown.yml articles index to resolve the pkgdown build error (vignette missing from index).

Skills/process updates:
- Update maintainer-workflow skill to require pre-commit review/sync of NEWS.md, PROJECT_DESIGN.md, README.md, and relevant vignettes for behavior changes.
- Update maintainer-release-sync skill with explicit pre-commit/merge sync checks for the same documentation set.
- Add an explicit requirement for detailed commit messages in maintainer skills (behavior changes, docs/skills sync, validation outcomes).

Validation context:
- pkgdown articles build now includes corpus-workflow and completes without the missing-index vignette error.

rkrug changed the title from "release: v0.4.0 - query_<endpoint> constructor migration" to "release: v0.4.0 - corpus pipeline + constructor migration" on Apr 10, 2026
rkrug merged commit 6f446f1 into main on Apr 10, 2026 (3 checks passed)
rkrug deleted the dev branch on April 10, 2026 06:33