release: v0.3.3 add SPECTER2 + TEI support #9
Merged
Add a documented path for using SPECTER2's proximity-adapter model with the existing TEI backend, plus a thin R helper for the boilerplate config.

- backend_specter2_tei(): convenience wrapper around backend_config() for a local TEI server serving the merged SPECTER2 proximity model.
- inst/scripts/prepare_specter2_merged.py: one-time Python script that fuses the proximity adapter into specter2_base and saves a HF-format model dir that TEI can serve directly. Default output goes under the R_user_dir() cache, overridable via OVC_SPECTER2_PATH.
- inst/scripts/start_tei_specter2.sh: launcher that resolves the same path convention and runs text-embeddings-router.
- vignettes/specter2-setup.qmd: end-to-end setup + serve + smoke test.
- NOTES.md (Rbuildignored): captures the option to publish the merged model to HuggingFace Hub so users can skip the merge step entirely.
- CLAUDE.md: repo guide for future Claude Code sessions.

Model preparation deliberately stays out of the R API surface. TEI cannot load adapter-transformers adapters, so a one-time merge is required, but forcing a Python dependency on this R-first package would conflict with the design principle that the package does not manage external services.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
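The path convention the two scripts share (explicit override, else a per-package cache directory) can be sketched in Python, the language of the merge script. This is a minimal sketch under stated assumptions, not the actual contents of prepare_specter2_merged.py or start_tei_specter2.sh: the package name `ovc`, the subdirectory `specter2_merged`, and port 8080 are hypothetical, and the cache fallback only approximates what `tools::R_user_dir(package, which = "cache")` returns on Linux (R_USER_CACHE_DIR, then XDG_CACHE_HOME, then ~/.cache, with `R/<package>` appended). `--model-id` and `--port` are standard text-embeddings-router options.

```python
import os
from pathlib import Path

PKG = "ovc"                       # hypothetical R package name (guessed from the OVC_ prefix)
MODEL_SUBDIR = "specter2_merged"  # hypothetical name of the merged-model directory


def r_user_cache_dir(package, env=None, home=None):
    """Approximate tools::R_user_dir(package, which = "cache") on Linux."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    # Empty strings fall through, mirroring R's nzchar() checks.
    base = (
        env.get("R_USER_CACHE_DIR")
        or env.get("XDG_CACHE_HOME")
        or str(home / ".cache")
    )
    return Path(base) / "R" / package


def specter2_model_path(env=None, home=None):
    """Resolve the merged-model dir: OVC_SPECTER2_PATH wins, else the R cache dir."""
    env = os.environ if env is None else env
    override = env.get("OVC_SPECTER2_PATH")
    if override:
        return Path(override).expanduser()
    return r_user_cache_dir(PKG, env, home) / MODEL_SUBDIR


def tei_command(model_path, port=8080):
    """Build the argv a launcher could pass to text-embeddings-router."""
    return [
        "text-embeddings-router",
        "--model-id", str(model_path),  # a local directory is accepted as the model id
        "--port", str(port),
    ]
```

Because the merge step writes to and the launcher reads from the same resolved path, setting OVC_SPECTER2_PATH once is enough to relocate the model for both.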
Summary
- `backend_specter2_tei()`: thin wrapper around `backend_config()` for a local TEI server serving the merged SPECTER2 proximity model.
- `inst/scripts/`:
  - `prepare_specter2_merged.py`: Python merge of the proximity adapter into `specter2_base`, producing a HF-format model dir that TEI can serve directly. Default output goes under the `R_user_dir()` cache, overridable via `OVC_SPECTER2_PATH`.
  - `start_tei_specter2.sh`: launcher that follows the same path convention.
- `vignettes/specter2-setup.qmd`: end-to-end walkthrough (Python env, merge, TEI start, R smoke test).
- `NOTES.md` (Rbuildignored) records the follow-up option to publish the merged model to HuggingFace Hub so users can skip the merge step.
- `CLAUDE.md` is a Claude Code guide for the repo (init).

Rationale
For embedding 4M+ academic papers, SPECTER2 is a good fit: it is domain-tuned (trained on citation-linked papers), produces 768-dim vectors, and runs locally on TEI at zero per-token cost. The proximity adapter is required for the topic-similarity objective, but TEI does not load adapter-transformers adapters, hence the one-time merge step.
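A quick check that a local TEI server is actually returning 768-dim SPECTER2 vectors might look like the following. It is a hedged sketch, not the vignette's R smoke test: the base URL `http://localhost:8080` and all helper names are hypothetical, the request shape follows TEI's `POST /embed` route (`{"inputs": [...]}` in, one vector per input out), and joining title and abstract with `[SEP]` follows the input format on the SPECTER2 model card, assuming the SciBERT-style `[SEP]` token.

```python
import json
import urllib.request

SPECTER2_DIM = 768  # expected embedding width for SPECTER2


def paper_text(title, abstract=None, sep="[SEP]"):
    # SPECTER2 input convention: title + sep token + abstract (abstract may be absent).
    return title + sep + (abstract or "")


def build_embed_request(texts, base_url="http://localhost:8080"):
    # Hypothetical base URL; TEI's /embed route takes a JSON body {"inputs": [...]}.
    body = json.dumps({"inputs": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_embeddings(vectors, n_expected, dim=SPECTER2_DIM):
    # Fail loudly if the server returned the wrong count or width.
    if len(vectors) != n_expected:
        raise ValueError(f"expected {n_expected} vectors, got {len(vectors)}")
    for i, vec in enumerate(vectors):
        if len(vec) != dim:
            raise ValueError(f"vector {i} has {len(vec)} dims, expected {dim}")
    return True


def smoke_test(base_url="http://localhost:8080", timeout=30):
    # Sends one placeholder paper and validates the response shape.
    texts = [paper_text("A sample paper title", "A short placeholder abstract.")]
    req = build_embed_request(texts, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        vectors = json.load(resp)
    return check_embeddings(vectors, len(texts))
```

With the server running, `smoke_test()` returns True or raises ValueError describing the mismatch. Checking the width as well as the count catches a server that was accidentally started against a different model.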
Model preparation deliberately stays out of the R API surface to avoid a Python dependency on this R-first package (see the decision-log entry in DEVELOPMENT_CONTINUITY.md).
Test plan
🤖 Generated with Claude Code