Merged
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -1,6 +1,7 @@
^vignettes/\.quarto$
^vignettes/.*_files$
^IMPLEMENTATION_NOTES\.md$
+^DEVELOPMENT_CONTINUITY\.md$
^doc$
^Meta$
^inst/ASR$
3 changes: 2 additions & 1 deletion .github/workflows/pr-checks.yml
@@ -45,4 +45,5 @@ jobs:
- name: Run R CMD check
uses: r-lib/actions/check-r-package@v2
with:
-args: 'c("--no-manual", "--no-build-vignettes", "--no-multiarch")'
+build_args: 'c("--no-manual", "--no-build-vignettes")'
+args: 'c("--no-manual", "--no-multiarch", "--no-examples", "--ignore-vignettes")'
13 changes: 7 additions & 6 deletions DESCRIPTION
@@ -1,18 +1,19 @@
Package: openalexVectorComp
Type: Package
-Title: Auto-tagging via TEI Embeddings and Qdrant (Prototype-Margin + Ridge Logistic)
-Version: 0.2.0
+Title: Embedding Vectorization and Distance-Based Scoring Workflows
+Version: 0.3.0
Authors@R: c(
person(given = "Rainer", family = "Krug", role = c("aut", "cre"), email = "you@example.org"),
person(given = "ChatGPT", family = "Assistant", role = "ctb")
)
Author: Rainer Krug [aut, cre],
ChatGPT Assistant [ctb]
Maintainer: Rainer Krug <you@example.org>
-Description: R-first orchestration for auto-tagging based on text embeddings served by
-a TEI (Text Embeddings Inference) server and vector search in Qdrant.
-Provides prototype-margin scoring, ridge logistic classification, simple ensembling,
-calibration/threshold selection, and utilities to ingest/query Qdrant.
+Description: R-first orchestration for text vectorization (embeddings),
+embedding distance computation, and distance-based scoring workflows.
+Supports backend-neutral embedding providers (HF, OpenAI, TEI),
+prototype cosine-distance scoring, reference-area distance scoring,
+and threshold calibration utilities.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
14 changes: 7 additions & 7 deletions inst/DEVELOPMENT_CONTINUITY.md → DEVELOPMENT_CONTINUITY.md
@@ -43,20 +43,20 @@ Core flow:
5. Optional threshold calibration (`calibrate_threshold()`).

OpenAI batch flow:
-1. Submit (`embed_corpus_submit_openai_batch()`).
-2. Refresh status (`embed_corpus_status_openai_batch()`).
-3. Collect completed jobs (`embed_corpus_collect_openai_batch()`).
+1. Submit (`batch_submit_openai()`).
+2. Refresh status (`batch_status_openai()`).
+3. Collect completed jobs (`batch_collect_openai()`).
4. Demo convenience wrapper:
-- `finalize_demo_openai_batch()` = status + collect + direct-vs-batch compare.
+- `demo_finalize_openai_batch()` = status + collect + direct-vs-batch compare.

-## 3. Current Demo Conventions (0.1.4)
+## 3. Current Demo Conventions (0.3.0)

Default demo locations:
- `demos/openalex`
- `demos/openai`

OpenAI demo behavior:
-- `run_demo_openai_quarto(..., render = TRUE)` may complete before batch does.
+- `run_demo_openai(..., render = TRUE)` may complete before batch does.
- User is given explicit follow-up commands for status/finalize.
- Batch comparison outputs are written to:
`project/openai_batch_comparison/label=corpus_batch/`.
@@ -79,7 +79,7 @@ Template:
- Date: 2026-04-01
- Scope: OpenAI demo and batch comparison robustness
- Decision: Implement two-phase OpenAI batch demo flow with
-`finalize_demo_openai_batch()`.
+`demo_finalize_openai_batch()`.
- Why: Batch completion is asynchronous; render should not fail on pending jobs.
- Alternatives considered: long blocking poll in render; hard-fail on timeout.
- Impact: Clearer async semantics; stable demo render; persisted comparison