kagiPro 0.5.0: retire v0, rebuild test suite, harden CI #11

Open
rkrug wants to merge 7 commits into main from dev

rkrug (Owner) commented May 20, 2026

Summary

  • Retire the Kagi v0 beta API entirely; v1 is now the only supported surface. Rename v1 constructors to drop the _v1 suffix.
  • Add pages (body-paginated search), combine (single parquet output), and file_type / domain / where query helpers.
  • Rebuild the test suite per-function with vcr cassettes; tighten retry + error-reporting; modernise the GitHub Actions config.

Headline changes (DESCRIPTION → 0.5.0, breaking)

API surface

  • Remove v0 constructors (query_search, query_enrich_web, query_enrich_news, query_summarize, query_fastgpt) and summarize_with_kagi().
  • Rename query_search_v1() → kagi_query_search() (class kagi_query_search); query_extract() → kagi_query_extract().
  • kagi_connection() keeps the api_version argument for forward compat but accepts only "v1"; auth is always Bearer.
  • kagi_fetch() writes search output to <project>/search/ (was search_v1/). Existing projects must be rerun to populate the new folder.
  • kagi_request() gains pages (1–10) and threads body$page correctly per iteration.
  • kagi_request_parquet() / kagi_fetch() gain combine — collapse Hive partitions into a single combined.parquet.
  • kagi_query_search() gains file_type (validated against Kagi's "Format" filter whitelist), domain (site: operator), and where (intitle: / inurl: / anywhere).
  • Rename open_search_query() → kagi_open_search_query(); it accepts a single object or a list of kagi_query_search objects and URL-encodes every shaping field into q= + parameters.

Robustness

  • kagi_connection() retries on 408/429/500/502/503/504 with a custom backoff capped at 10 s/attempt.
  • kagi_request() reads e$resp (httr2's actual condition field) so HTTP error envelopes propagate full HTTP status + Kagi error[].code / error[].msg + raw body.
  • kagi_request_parquet() pre-detects which typed result arrays are present before UNNEST, eliminating DuckDB "Could not find key" Binder Errors.
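
A minimal sketch of the retry policy described above, in base R. The status codes and the 10 s cap come from this PR; the exponential growth curve and the names `is_retryable()` and `capped_backoff()` are assumptions for illustration, since the PR only says the backoff is "custom". Functions of this shape could plausibly be passed to httr2's `req_retry()` via its `is_transient` and `backoff` arguments, but the package's actual wiring is not shown here.

```r
# Status codes this PR lists as retryable.
retryable_status <- c(408L, 429L, 500L, 502L, 503L, 504L)

# TRUE when a status code should trigger another attempt.
is_retryable <- function(status) {
  as.integer(status) %in% retryable_status
}

# Assumption: exponential growth (1 s, 2 s, 4 s, ...) with a hard ceiling.
# The PR only states the 10 s cap, not the growth curve.
capped_backoff <- function(attempt, cap = 10) {
  min(2^(attempt - 1), cap)
}

# Wait times for the first six attempts.
waits <- vapply(1:6, capped_backoff, numeric(1))
print(waits)
# 1 2 4 8 10 10
```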

Tests

  • Split monolithic test-v1.R into one file per exported function: 10 files, 186 tests.
  • kagi_open_search_query covered end-to-end with a mocked browseURL.
  • Cassette-backed coverage (vcr) for kagi_request and kagi_fetch: 7 cassettes recorded against live API, Authorization header filtered out (leak-checked).
  • Cassette policy via helper_kagi.R::cassette_record_mode(): the default is "once" (replay if a cassette exists, record only when it is missing); KAGIPRO_RECORD_CASSETTES=true re-records all; KAGIPRO_RECORD_CASSETTES=false is strict replay. VCR_RECORD_MODE takes precedence.

CI

  • R-CMD-check.yaml: push trigger → [main, dev]; explicit PR trigger → [main, dev]; env KAGIPRO_RECORD_CASSETTES=false so missing/mismatched cassettes fail cleanly; workflow_dispatch added.
  • pkgdown.yaml: push (deploy) → [main] only; PR (build, no deploy) → [main, dev]; same strict-replay env.
  • Dependency caching was already in place via r-lib/actions/setup-r-dependencies@v2.

Docs

  • v0 endpoint vignettes deleted (search/enrich/summarize/fastgpt-endpoint.qmd).
  • Rewrote quickstart, corpus-workflow, v1-api-and-corpus, agent-quick-index.
  • CLAUDE.md, PROJECT_DESIGN.md, README.md, NEWS.md, llms.txt / llms-full.txt (+ pkgdown mirrors), _pkgdown.yml, inst/skills/ all aligned with the v1-only surface.

Test plan

  • devtools::test() — 186 / 186 PASS locally
  • devtools::check(args = '--no-manual') — 0 errors / 0 warnings / 0 notes
  • bash scripts/check-ai-docs.sh — passes
  • Cassettes verified to contain no real API key
  • Confirm CI is green on the PR
  • (Reviewer) sanity-check on a fresh clone: Rscript -e 'devtools::test()' works without any KAGI_API_KEY set

🤖 Generated with Claude Code

rkrug and others added 7 commits May 18, 2026 20:15
…r reporting

v0 is gone — v1 is now the only supported Kagi API.

API surface:
- Remove v0 constructors (query_search, query_enrich_web, query_enrich_news,
  query_summarize, query_fastgpt) and the v0-only summarize_with_kagi() helper.
- Rename v1 constructors so the _v1 suffix is no longer needed:
  query_search_v1() -> kagi_query_search() (class kagi_query_search_v1
  -> kagi_query_search); query_extract() -> kagi_query_extract().
- kagi_connection() keeps the api_version argument for forward compatibility
  but accepts only "v1"; auth is always Bearer.
- kagi_fetch() writes search output to <project>/search/ (was search_v1/).
- kagi_request() gains a `pages` argument (1-10) and threads body$page
  correctly per iteration so each page is a distinct request.
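
To illustrate what "threads body$page per iteration" could look like, here is a small base-R sketch. The helper name `paged_bodies()` and the `query` body field are hypothetical; only the `pages` range (1-10) and the `page` field are taken from this commit message.

```r
# Hypothetical helper: build one request body per page so that every
# iteration sends a distinct `page` value.
paged_bodies <- function(body, pages = 1L) {
  stopifnot(is.list(body), length(pages) == 1L, pages >= 1, pages <= 10)
  lapply(seq_len(pages), function(p) {
    body$page <- p  # modifies a local copy; the caller's body is untouched
    body
  })
}

bodies <- paged_bodies(list(query = "open science", limit = 10L), pages = 3L)
page_numbers <- vapply(bodies, function(b) b$page, integer(1))
print(page_numbers)
# 1 2 3
```

Building the bodies up front keeps the pagination logic separate from the HTTP call, which makes the "each page is a distinct request" property easy to unit-test.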

Robustness:
- kagi_connection() retries on 408/429/500/502/503/504 (was only 429/503),
  with a custom backoff capped at 10s/attempt.
- perform_request() reads e$resp (httr2's actual field) so HTTP error
  envelopes propagate full HTTP status + API code + API msg + body.
- write_search_parquet() pre-detects which typed result arrays are present
  in the JSON before UNNEST, eliminating DuckDB Binder Error noise.
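
The pre-detection step can be sketched in base R as follows. The response shape (a `data` object keyed by result type) and the type names are assumptions for illustration only, not the documented Kagi v1 contract; the point is simply to restrict later UNNEST calls to keys that actually exist and are non-empty.

```r
# Hypothetical parsed response: `data` maps result types to arrays.
resp <- list(data = list(
  search = list(list(url = "https://example.org", title = "Example")),
  video  = list()  # key present, but the array is empty
))

# Return only the candidate keys that are present and non-empty.
present_result_types <- function(resp, candidates) {
  data <- resp$data
  keep <- vapply(candidates, function(k) length(data[[k]]) > 0L, logical(1))
  candidates[keep]
}

found <- present_result_types(resp, c("search", "video", "news"))
print(found)
# "search"
```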

Docs, tests, AI artifacts:
- DESCRIPTION bumped to 0.5.0; NEWS.md, PROJECT_DESIGN.md updated.
- Delete v0 endpoint vignettes and skill packs (user-enrich, user-summarize,
  user-fastgpt); rewrite quickstart, corpus-workflow, v1-api-and-corpus,
  agent-quick-index for the v1-only surface.
- Resync llms.txt / llms-full.txt + pkgdown/extra mirrors; trim _pkgdown.yml
  navbar and article list; update README and CLAUDE.md.
- Replace v0 cassette-driven tests with v1 unit-test suite (test-v1.R).
- Add inst/api_specs/openapi.yaml (Kagi v1 contract) and
  scripts/diff-against-generated.R.

devtools::check() clean (0 errors / 0 warnings / 0 notes); devtools::test()
29/29 PASS; scripts/check-ai-docs.sh passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
After per-query/per-type Hive partitions are written, `combine = TRUE`
union-merges them by column name into a single `<output>/combined.parquet`
file via DuckDB (NULL-fills absent columns across search result types) and
removes the partition directories. Plumbed through `kagi_fetch()` with
default `combine = TRUE`; pass `combine = FALSE` to retain the partitioned
layout.
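
The union-by-name semantics described above can be illustrated on plain data frames. This is a base-R sketch of the idea, not the package's DuckDB implementation; DuckDB offers a `union_by_name` option when reading several Parquet files, but whether `combine` relies on it is not stated here. The helper name and the sample columns are hypothetical.

```r
# Union data frames by column name, filling columns that a frame lacks with NA.
union_by_name <- function(frames) {
  all_cols <- unique(unlist(lapply(frames, names)))
  filled <- lapply(frames, function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- rep(NA, nrow(df))
    }
    df[all_cols]  # same column order in every frame
  })
  do.call(rbind, filled)
}

# Two "partitions" with different result types and different columns.
search_part <- data.frame(query = "q1", type = "search",
                          url = "https://example.org/a")
video_part  <- data.frame(query = "q1", type = "video",
                          url = "https://example.org/v", duration = 120)

combined <- union_by_name(list(search_part, video_part))
print(combined)
```

The `query` and `type` columns survive the merge, and the `duration` column is NA for the search row.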

Smoke-tested against a real 19-partition / 1211-row search dataset:
collapses to one parquet file, preserving `query` and `type` as columns.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
R/ — dead code:
- Remove unused api_version_for_query_class() from utils.R (zero callers
  after v0 retirement).
- Drop the unused `connection` argument (and its validation) from
  markdown_abstract(); it was scaffolded for a Kagi-side summarizer that
  no longer exists.
- Drop the unused @importFrom utils tail and @importFrom httr2 req_url_query
  from kagi_request.R; NAMESPACE regenerated.

R/ — stale roxygen:
- kagi_request(): @param limit no longer mentions "enrich"; @details
  pagination text now describes the v1 body-driven `pages` mechanism
  instead of the dead `meta$next_cursor` cursor.
- kagi_request(): expand @param pages to explain body-paginated semantics.
- kagi_fetch(): @param limit drops "/enrich".

Prose docs:
- CLAUDE.md: rewrite Project + Architecture sections for the v1-only
  surface — drop v0 wording, list current constructors and their classes,
  surface `pages` and `combine`, describe the retry/error model accurately.
  Update class-check note (kagi_query_search, not kagi_query_search_v1).
  Note vcr plumbing is retained but tests no longer use cassettes.
- PROJECT_DESIGN.md: prune Skills Layer/Skill Mapping to the surviving
  skills (user-search, user-corpus-workflow + maintainer-*); replace the
  "(toward 0.4.1)" historical block with a one-line pointer to NEWS.md.
- llms-full.txt: surface the `pages` arg on kagi_request(), and the
  `combine` arg on kagi_fetch() / kagi_request_parquet(). pkgdown/extra
  mirror resynced.

Verification:
- devtools::document(): NAMESPACE drops the two stale @importFrom lines.
- devtools::test(): 29/29 pass.
- devtools::check(--no-manual): 0 errors / 0 warnings / 0 notes.
- scripts/check-ai-docs.sh: passes (llms mirrors byte-identical).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
open_search_query():
- Rename to kagi_open_search_query() for namespace consistency.
- Accept a single kagi_query_search object OR a list of them (open all).
- Build the URL from the query object: every non-null shaping field
  (workflow, lens, lens_id, filters, safe_search, page, limit, timeout,
  format, personalizations, extract) is URL-encoded into the query string
  alongside `q`. Scalars are URL-encoded; nested lists are JSON-encoded
  then URL-encoded so the full search intent is visible in the address bar.
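
A rough sketch of the URL construction for scalar fields, using only base R. The helper name `build_search_url()` and the base URL `https://kagi.com/search` are assumptions; nested lists (which the commit says are JSON-encoded first) are deliberately left out to keep the example dependency-free.

```r
# Hypothetical helper: encode `q` plus every non-NULL scalar field
# into a query string.
build_search_url <- function(q, params = list(),
                             base = "https://kagi.com/search") {
  params <- Filter(Negate(is.null), params)  # drop NULL shaping fields
  fields <- c(list(q = q), params)
  pairs <- vapply(names(fields), function(n) {
    value <- utils::URLencode(as.character(fields[[n]]), reserved = TRUE)
    paste0(n, "=", value)
  }, character(1))
  paste0(base, "?", paste(pairs, collapse = "&"))
}

url <- build_search_url("open science filetype:pdf",
                        list(limit = 10L, lens = NULL))
print(url)
```

With `reserved = TRUE`, spaces and operator characters such as `:` are percent-encoded, so the shaped query survives intact in the address bar.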

kagi_query_search():
- Add `file_type` arg validated against Kagi's "Format" filter whitelist
  (pdf, ps, csv, epub, kml, kmz, gpx, hwp, htm, html, xls, xlsx, ppt,
  pptx, doc, docx, odp, ods, odt, rtf, svg, tex, txt, xml). Each value is
  appended to the query string as `filetype:<ext>`.
- Add `domain` arg: each value is appended as `site:<domain>`.
- Add `where` arg: `"anywhere"` (default), `"title"`, or `"url"`. Wraps
  the query term in `intitle:"..."` or `inurl:"..."` when not anywhere.
- The shaping transforms apply after `expand`, so per-term operators
  inherit the operator suffix.
- `open_in_browser = TRUE` now delegates to kagi_open_search_query(result)
  on the whole list.
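
As an illustration of the shaping rules listed above, the following base-R sketch wraps a term according to `where` and appends `filetype:` and `site:` operators. The helper name `shape_query()` and the order in which operators are appended are assumptions, and whitelist validation of `file_type` is omitted for brevity.

```r
# Hypothetical helper mirroring the described shaping of one query term.
shape_query <- function(term, file_type = NULL, domain = NULL,
                        where = c("anywhere", "title", "url")) {
  where <- match.arg(where)
  core <- switch(where,
    anywhere = term,
    title    = sprintf('intitle:"%s"', term),
    url      = sprintf('inurl:"%s"', term)
  )
  ops <- c(
    if (length(file_type)) paste0("filetype:", file_type),
    if (length(domain)) paste0("site:", domain)
  )
  paste(c(core, ops), collapse = " ")
}

shaped <- shape_query("climate adaptation", file_type = "pdf",
                      domain = "ipcc.ch", where = "title")
print(shaped)
# intitle:"climate adaptation" filetype:pdf site:ipcc.ch
```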

Touched: NAMESPACE, R/, man/, vignettes/quickstart.qmd, vignettes/
agent-quick-index.qmd, inst/skills/user-search/SKILL.md.

Verification: devtools::document/test/check all clean (0/0/0);
scripts/check-ai-docs.sh passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Restructure tests/testthat:

- One file per exported function:
    test-kagi_connection.R          (11 tests)
    test-kagi_query_search.R        (55 tests — covers file_type / domain /
                                     where, every validation path, body
                                     shaping, and print method)
    test-kagi_query_extract.R       (23 tests — chunking, HTTPS-only,
                                     validation)
    test-kagi_open_search_query.R   (21 tests — single + list dispatch,
                                     URL encoding, session_token, error
                                     modes; browseURL is mocked)
    test-utils.R                    (24 tests — dispatch helpers and
                                     key resolution)
    test-kagi_request_parquet.R     (18 tests — search + extract fixtures,
                                     combine = TRUE/FALSE, error paths)
    test-as_corpus_parquet.R        (7 tests — happy path, endpoint dir
                                     input, missing columns, id_prefix,
                                     no-overwrite)
    test-clean_request.R            (5 tests — dry_run, real delete,
                                     empty project)
    test-kagi_request.R             (14 tests; 4 cassettes — search,
                                     extract, list dispatch, write_dummy
                                     fallback)
    test-kagi_fetch.R               (8 tests; 3 cassettes — combined,
                                     partitioned, extract)

  Old aggregate test-v1.R removed.

- Fixtures under tests/testthat/fixtures/:
    json_search/query_1/{search_1.json,_query_meta.json}  (real Kagi
                                     response copied from output/)
    json_extract/query_1/{extract_1.json,_query_meta.json} (synthetic)
    cassettes/*.yml                  (7 vcr cassettes; Authorization
                                     header filtered, no real key leaks)

- setup-vcr.R now feeds the keyring-resolved key into vcr's
  filter_sensitive_data list, so cassettes recorded with a keyring-only
  setup also get scrubbed.

- DESCRIPTION: add `withr` to Suggests (used by test-kagi_connection.R
  for envvar isolation).

Default behaviour: if a cassette exists, tests replay it with a
placeholder key (no network, no credentials needed). If a cassette is
missing and a key is available via KAGI_API_KEY or keyring `API_kagi`,
vcr records live. Otherwise the cassette-backed test skips with a clear
message.
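
A leak check of the kind mentioned for the cassettes could be sketched like this in base R. The helper name `find_key_leaks()`, the placeholder text, and the demo key are all hypothetical; the sketch only scans recorded `.yml` files for a literal secret and does not reproduce the repository's own check.

```r
# Return the names of cassette files that contain the literal secret.
find_key_leaks <- function(cassette_dir, secret) {
  files <- list.files(cassette_dir, pattern = "\\.yml$", full.names = TRUE)
  leaked <- vapply(files, function(f) {
    any(grepl(secret, readLines(f, warn = FALSE), fixed = TRUE))
  }, logical(1))
  basename(files[leaked])
}

# Demo with two throwaway cassette files in a temporary directory.
demo_dir <- tempfile("cassettes")
dir.create(demo_dir)
writeLines("Authorization: Bearer <<placeholder>>",
           file.path(demo_dir, "clean.yml"))
writeLines("Authorization: Bearer demo-secret-123",
           file.path(demo_dir, "leaky.yml"))

leaks <- find_key_leaks(demo_dir, "demo-secret-123")
print(leaks)
# "leaky.yml"
```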

Verification: devtools::test 186/186 pass; devtools::check
0/0/1 (only the environmental "future file timestamps" note);
scripts/check-ai-docs.sh passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three documented modes, resolved in helper_kagi.R::cassette_record_mode():

  default                        -> "once" — replay if cassette exists,
                                    record only when missing. Routine
                                    local re-runs do NOT re-record.
  KAGIPRO_RECORD_CASSETTES=true  -> "all"  — force re-record every run
                                    (requires KAGI_API_KEY or keyring
                                    "API_kagi").
  KAGIPRO_RECORD_CASSETTES=false -> "none" — strict replay; cassette
                                    miss errors out (CI default).
  VCR_RECORD_MODE=<mode>         -> passthrough (highest precedence).
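
The precedence rules in the table above can be sketched as a small pure function. This is not the code in helper_kagi.R; the name `resolve_record_mode()` is hypothetical, and treating unrecognised KAGIPRO_RECORD_CASSETTES values as the default is an assumption.

```r
# Resolve the effective vcr record mode from the two environment variables.
resolve_record_mode <- function(
    kagipro = Sys.getenv("KAGIPRO_RECORD_CASSETTES"),
    vcr = Sys.getenv("VCR_RECORD_MODE")) {
  # VCR_RECORD_MODE is a passthrough with the highest precedence.
  if (nzchar(vcr)) return(vcr)
  flag <- tolower(kagipro)
  if (identical(flag, "true")) return("all")    # force re-record
  if (identical(flag, "false")) return("none")  # strict replay (CI)
  "once"  # default: replay if present, record only when missing
}

print(resolve_record_mode(kagipro = "", vcr = ""))
# "once"
print(resolve_record_mode(kagipro = "false", vcr = "new_episodes"))
# "new_episodes"
```

Taking the variables as arguments with environment-backed defaults keeps the function testable without mutating the process environment.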

Helper consolidation in tests/testthat/helper_kagi.R:
- cassette_record_mode()         resolve effective vcr mode
- cassette_will_record(name)     true iff the run hits the network
- make_kagi_test_conn(name)      real key when recording, placeholder
                                 when replaying
- skip_if_cannot_serve_cassette(name)
                                 skip with a clear message when a
                                 recording is needed but no key is set
                                 (or vcr is missing)

Removed duplicated copies of these helpers from test-kagi_request.R and
test-kagi_fetch.R.

setup-vcr.R now derives `record` from cassette_record_mode() so the
config and the per-test logic always agree.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
R-CMD-check.yaml:
- Push trigger updated from [main, master] to [main, dev] to match this
  repo's long-lived branches.
- pull_request trigger made explicit ([main, dev]) so PRs into either
  long-lived branch are tested; PRs into feature branches are not.
- Add workflow_dispatch for manual runs.
- Set KAGIPRO_RECORD_CASSETTES=false in env. CI has no Kagi API key, so
  cassettes must be replayed strictly; a missing or mismatched cassette
  now fails the test cleanly instead of trying to record live.

pkgdown.yaml:
- Push (deploy) trigger narrowed to [main]; PRs into [main, dev] still
  build the site (preview) but do not deploy.
- Add the same KAGIPRO_RECORD_CASSETTES=false guard so vignette
  rebuilds never touch the API.

Dependency caching was already in place via r-lib/actions/setup-r-
dependencies@v2; no additional actions/cache steps needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>