- Add sefaria_llm_interface/commentary_scoring package with input/output dataclasses
- Add commentary_scoring app with OpenAI-powered scoring functionality
- Implement CommentaryScorer class for evaluating how well commentaries explain cited texts
- Add Celery task integration for async commentary processing
- Include text processing utilities for HTML stripping and content flattening
- Update Celery autodiscovery to include commentary_scoring tasks
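The HTML-stripping and content-flattening utilities mentioned above can be sketched roughly as follows; the function names `strip_html` and `flatten_content` are illustrative, not necessarily the names used in `text_utils.py`:

```python
import re
from html import unescape

def strip_html(text: str) -> str:
    """Remove HTML tags, decode entities, and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", unescape(no_tags)).strip()

def flatten_content(content) -> str:
    """Flatten nested lists of text segments into a single clean string."""
    if isinstance(content, str):
        return strip_html(content)
    return " ".join(flatten_content(part) for part in content)
```

For example, `flatten_content(["<i>a</i>", ["b", "c"]])` yields `"a b c"`.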
…classes instead of commit_scoring
- Added debugging fields request_status and request_status_message to CommentaryScoringOutput
- Updated CommentaryScorer so it returns a CommentaryScoringOutput instead of a dictionary; this change also affects commentary_scoring.py

NOTE: for now, imports from sefaria-llm-interface are local rather than package-style, since the version containing the necessary files has not yet been released.
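A minimal sketch of the updated output dataclass: the `request_status` and `request_status_message` fields come from this commit, while the `scores` field and its defaults are hypothetical, shown only to make the example self-contained:

```python
from dataclasses import dataclass, field

@dataclass
class CommentaryScoringOutput:
    # per-citation binary scores keyed by cited ref (illustrative field)
    scores: dict = field(default_factory=dict)
    # debugging fields added in this commit
    request_status: str = "ok"
    request_status_message: str = ""
```

Returning a dataclass instead of a plain dictionary lets callers (including the Celery task) rely on a fixed set of attributes rather than string keys.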
…ling (0/1)
- Replace 0–4 ExplanationLevel with binary ExplainsFlag {0: NOT_EXPLAINED, 1: EXPLAINED}
- Clamp/validate scores to 0/1 in _validate_level
- Update function-calling JSON schema to minimum: 0, maximum: 1 per cited key
- Rewrite prompt to policy:
  - Return 1 if the commentary provides any substantive interpretation of any part of the citation (incl. methodological/kabbalistic reads)
  - Return 0 if the citation is decorative/prooftext/only paraphrased
  - If A is cited only via B and C adds no new interpretation of A beyond B → 0
  - Partial coverage still counts as 1
- Explanations: ask the model to begin each rationale with Explained spans: '<phrase1>'; ... followed by a 1–2 sentence justification (no schema change)
- Logging: report explained X/Y (Z%) instead of an average on the 0–4 scale

BREAKING BEHAVIOR: numeric scale semantics changed from graded (0–4) to binary (0/1).
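The binary scheme above can be sketched as follows. In the actual code `_validate_level` is a method on the scorer; `cited_keys_schema` and `explained_summary` are hypothetical helper names used here only to illustrate the per-key JSON schema and the new X/Y (Z%) logging:

```python
from enum import IntEnum

class ExplainsFlag(IntEnum):
    NOT_EXPLAINED = 0
    EXPLAINED = 1

def validate_level(raw) -> ExplainsFlag:
    """Clamp/validate a model-returned value to the binary 0/1 scale."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return ExplainsFlag.NOT_EXPLAINED  # unparseable output counts as 0
    return ExplainsFlag(max(0, min(1, value)))

def cited_keys_schema(cited_refs):
    """Function-calling JSON schema: one integer in [0, 1] per cited key."""
    return {
        "type": "object",
        "properties": {
            ref: {"type": "integer", "minimum": 0, "maximum": 1}
            for ref in cited_refs
        },
        "required": list(cited_refs),
    }

def explained_summary(scores: dict) -> str:
    """Log 'explained X/Y (Z%)' instead of an average on the old 0-4 scale."""
    total = len(scores)
    explained = sum(1 for v in scores.values() if v == ExplainsFlag.EXPLAINED)
    pct = 100 * explained / total if total else 0.0
    return f"explained {explained}/{total} ({pct:.0f}%)"
```

Clamping (rather than rejecting) out-of-range values means a model that still emits the old 0–4 grades degrades gracefully: any grade ≥ 1 maps to EXPLAINED.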
…o package importing
- Added a README explaining the code
- Removed unnecessary imports from commentary_scoring_input.py and commentary_scoring_output.py
- In openai_commentary_scorer.py, changed sefaria-llm-interface imports from local-folder imports to package imports; added comments to some functions; removed unnecessary spaces in function definitions and added spaces after commas. Same for text_utils.py
- Added textwrap.dedent to the prompt definition
- In tasks.py, changed sefaria-llm-interface imports from local-folder imports to package imports
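The textwrap.dedent change lets the prompt stay indented alongside the surrounding code while the shared leading whitespace is stripped before it is sent to the model; a minimal illustration (the prompt wording is a placeholder, not the actual prompt):

```python
import textwrap

PROMPT_TEMPLATE = textwrap.dedent("""\
    You are scoring whether a commentary explains the texts it cites.
    Return 1 if the commentary substantively interprets the citation,
    and 0 if the citation is decorative or merely paraphrased.
    """)
```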
This PR is the first working version of a commentary scoring system that uses an OpenAI model to automatically evaluate how well a Jewish commentary explains the base texts it cites.
🔍 Core Functionality
CommentaryScorer class:
Prompt Engineering:
⚙️ Additional Components
✅ Output Format
Standardized output includes:
Does not yet support chunking for long commentaries; empirical testing so far suggests chunking may not be strictly necessary.