
Adopt w3c_api 0.3.0: delegate rate-limiting, drop RDF stack#48

Merged
andrew2net merged 7 commits into lutaml-integration from rt-adopt-w3c-api-0.3
Jun 3, 2026

Conversation

@andrew2net
Contributor

Summary

Move relaton-w3c onto the published lutaml-hal/w3c_api stack and shed the obsolete RDF/scraping dependencies.

  • deps: require w3c_api ~> 0.3.0 — adopt the released gem (was ~> 0.1.3). w3c_api 0.3.0 builds its HAL client with faraday-retry.
  • refactor: delegate rate-limit retries to w3c_api. RateLimitHandler no longer retries. Retries now live upstream: w3c_api handles HTTP 403 (the W3C rate-limit signal) and connection/timeout errors, and lutaml-hal handles 429/5xx. The handler only memoizes realized objects and, on a terminal error, skips the resource (caches nil) so one bad link doesn't abort the crawl; network errors are left uncached so a later reference can retry. Specs updated to the no-retry contract.
  • deps: drop unused RDF/Linked-Data/scraping gems — remove linkeddata, mechanize, rdf, rdf-normalize, shex, csv, sparql (referenced nowhere now that fetching goes through w3c_api). rubyzip moves to a test-only Gemfile dep (runtime index zip is handled by relaton-index). Drops ~57 transitive gems.
  • docs: update CLAUDE.md — reflect the new rate-limiting layering and dependency set.
  • test: refresh VCR cassettes against the current stack.
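
The no-retry contract in the refactor bullet can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `TerminalError`, `NetworkError`, `FakeLink`, `NoRetryHandler` and `Crawler` are hypothetical names, and only `href` and `realize` come from the description above.

```ruby
# Hypothetical error classes standing in for whatever the real stack raises.
class TerminalError < StandardError; end # e.g. upstream retries exhausted
class NetworkError < StandardError; end  # e.g. connection reset or timeout

# Hypothetical stand-in for a HAL link. `outcome` is either the realized
# value or an exception instance to raise.
FakeLink = Struct.new(:href, :outcome) do
  def realize
    raise outcome if outcome.is_a?(Exception)

    outcome
  end
end

# Sketch of a handler that never retries: it memoizes results by href,
# caches nil for terminal failures, and leaves network failures uncached.
module NoRetryHandler
  def realize(link)
    @realized ||= {}
    # A cached nil means "failed terminally before, skip it".
    return @realized[link.href] if @realized.key?(link.href)

    @realized[link.href] = link.realize
  rescue TerminalError
    @realized[link.href] = nil
  rescue NetworkError
    nil # not cached, so a later reference to the same href tries again
  end
end

class Crawler
  include NoRetryHandler
end

crawler = Crawler.new
crawler.realize(FakeLink.new("/specifications/ok", "spec A"))
crawler.realize(FakeLink.new("/specifications/gone", TerminalError.new("gone")))
```

The key design point is the asymmetry between the two rescue branches: a terminal failure is remembered so the crawl moves on, while a network failure is forgotten so the resource still has a chance on its next reference.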

Verification

Full suite green: 71 examples, 0 failures (98.91% coverage), resolving w3c_api 0.3.0 and lutaml-hal 0.2.0 from RubyGems.

🤖 Generated with Claude Code

andrew2net and others added 7 commits June 3, 2026 13:38
w3c_api 0.3.0 builds the HAL client with a retry layer for the W3C
rate-limit (HTTP 403) and connection/timeout errors. relaton-w3c now
relies on the client for those retries, so require the released version.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
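
For readers unfamiliar with the pessimistic constraint used in the commit above, the following sketch checks which versions `~> 0.3.0` admits. It uses only RubyGems' own `Gem::Requirement` and `Gem::Version`; the list of candidate versions is chosen for illustration.

```ruby
require "rubygems"

# "~> 0.3.0" means ">= 0.3.0" and "< 0.4.0": patch releases are accepted,
# the next minor release is not.
requirement = Gem::Requirement.new("~> 0.3.0")

%w[0.1.3 0.3.0 0.3.2 0.4.0].each do |version|
  accepted = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{accepted ? 'accepted' : 'rejected'}"
end
```

So the old 0.1.x line is excluded, while any 0.3.x patch release resolves without a further gemspec change.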
Retries now live upstream: w3c_api retries the W3C rate-limit (HTTP 403)
and connection/timeout errors, and lutaml-hal retries 429/5xx. So
RateLimitHandler no longer retries — it only memoizes realized objects
and, on a terminal error, skips the resource (caches nil) so one bad
link doesn't abort the crawl. Network errors are left uncached so a
later reference can try again. Specs updated to the no-retry contract.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The W3C data is now fetched through w3c_api (the REST client), so the old
RDF/SPARQL/scraping stack is dead weight. Remove linkeddata, mechanize,
rdf, rdf-normalize, shex, csv and sparql — none are referenced anywhere
in lib/ or spec/. This drops ~57 transitive gems from the install.

rubyzip is no longer a runtime dependency either: the runtime index zip
is unpacked by relaton-index, and the only direct use is a test helper
reading a fixture zip. Move it to the Gemfile as a test dependency.

Full suite green (71 examples, 0 failures).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
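
As a rough picture of the dependency split described in the commit above, the runtime side might look like the fragment below. This is an assumed shape, not the actual relaton-w3c gemspec: the version, summary and author values are placeholders, and other runtime dependencies are omitted.

```ruby
require "rubygems"

# Illustrative gemspec fragment. Only the w3c_api constraint is taken from
# this PR; every other value is a placeholder.
spec = Gem::Specification.new do |s|
  s.name    = "relaton-w3c"
  s.version = "0.0.0"             # placeholder
  s.summary = "illustrative only" # placeholder
  s.authors = ["placeholder"]

  # Fetching goes through the REST client, so it is the runtime dependency
  # shown here. linkeddata, mechanize, rdf, rdf-normalize, shex, csv and
  # sparql are simply absent.
  s.add_dependency "w3c_api", "~> 0.3.0"
end

# rubyzip would then be declared in the Gemfile rather than the gemspec,
# for example (assumed layout):
#   gem "rubyzip" # used only by a test helper that reads a fixture zip

puts spec.runtime_dependencies.map(&:name).inspect
```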
Reflect the current architecture: RateLimitHandler no longer retries
(retries live in w3c_api for 403/connection/timeout and lutaml-hal for
429/5xx); add a Rate limiting & retries section; correct dependency
versions and drop the removed RDF/SPARQL/scraping stack.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Re-record the W3C API cassettes against the current w3c_api 0.3.0 stack.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
w3c_api now caches realized objects (thread-safely as of lutaml-hal
0.2.1, required via w3c_api ~> 0.3.2), so RateLimitHandler's own
{ href => object } map was redundant. Drop it: realize now just calls
obj.realize (served by w3c_api's cache) and the handler only remembers
hrefs that failed terminally (renamed `skipped`) to skip a broken
resource. Network errors aren't remembered, so a later reference retries.

Full suite green (71 examples, 0 failures) on lutaml-hal 0.2.1 + w3c_api 0.3.2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The module no longer rate-limits or retries (that lives in w3c_api /
lutaml-hal) — it just makes `realize` fault-tolerant: skip a resource
that fails terminally so one bad link doesn't abort the crawl. Rename to
match (RateLimitHandler -> SafeRealize, rate_limit_handler.rb ->
safe_realize.rb), updating the includes, specs and docs.

Also initialize the shared `skipped` map eagerly instead of via a lazy
`||=`, so the parallel fetcher's first concurrent access can't race two
maps into existence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
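
The end state described in the last two commits can be sketched as below: `realize` just delegates, only terminally failed hrefs are remembered, and the shared map is created eagerly. The class name `SafeFetcher`, the error classes, `BrokenLink` and the `Mutex` are assumptions made for this sketch; the PR does not say how the real `skipped` map is guarded.

```ruby
# Hypothetical error classes; the real exception names are not shown here.
class TerminalError < StandardError; end
class NetworkError < StandardError; end

class SafeFetcher
  attr_reader :skipped

  def initialize
    # Eager initialization: the map exists before any worker thread runs.
    # With a lazy `@skipped ||= {}` inside #realize, two threads could each
    # build their own Hash and one thread's entries would be lost.
    @skipped = {}
    @lock = Mutex.new # assumption: added for the sketch
  end

  def realize(link)
    return nil if @lock.synchronize { @skipped.key?(link.href) }

    link.realize # caching of successful results is left to the client
  rescue TerminalError
    @lock.synchronize { @skipped[link.href] = true }
    nil
  rescue NetworkError
    nil # not remembered, so a later reference can try again
  end
end

# Hypothetical link that always fails terminally.
BrokenLink = Struct.new(:href) do
  def realize
    raise TerminalError, "cannot realize #{href}"
  end
end

fetcher = SafeFetcher.new
link = BrokenLink.new("/specifications/broken")
results = Array.new(8) { Thread.new { fetcher.realize(link) } }.map(&:value)
puts results.all?(&:nil?)
puts fetcher.skipped.keys.inspect
```

Several threads may still hit the broken link once each before the first failure is recorded; the sketch only guarantees that they all write into the same map and that later references are skipped.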
@andrew2net andrew2net merged commit 2e5409b into lutaml-integration Jun 3, 2026
11 checks passed
@andrew2net andrew2net deleted the rt-adopt-w3c-api-0.3 branch June 3, 2026 21:33