LCORE-1094: Rag chunks are not parsed #893
base: main
Conversation
Important: Review skipped. Draft detected. You can check the settings in the CodeRabbit UI, or disable this status message in the CodeRabbit configuration.
Walkthrough: A new function is added to parse RAG chunks from the Responses API output.
Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/app/endpoints/query_v2.py (1)
422-429: Fix the parser function to return RAGChunk objects. The integration of RAG chunk parsing is correct, but the `parse_rag_chunks_from_responses_api` function returns a list of dicts instead of `list[RAGChunk]` objects, causing the TurnSummary validation to fail. See the detailed fix in the review comment for lines 455-482.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/app/endpoints/query_v2.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use `async def` for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and module logger pattern with standard log levels: debug, info, warning, error
Files:
src/app/endpoints/query_v2.py
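To make these conventions concrete, here is a minimal illustrative module sketch; the names (`ServiceConfiguration`, `fetch_status`) are invented for the example and do not exist in the repository:

```python
"""Illustrative module showing the LCS coding conventions listed above."""

import logging

logger = logging.getLogger(__name__)


class ServiceConfiguration:
    """Hold connection settings for a hypothetical external service."""

    base_url: str
    timeout_seconds: int

    def __init__(self, base_url: str, timeout_seconds: int = 30) -> None:
        """Store the configuration values without mutating any inputs."""
        self.base_url = base_url
        self.timeout_seconds = timeout_seconds


async def fetch_status(config: ServiceConfiguration) -> dict[str, str] | None:
    """Fetch a status payload from the configured service.

    Args:
        config: Connection settings for the hypothetical service.

    Returns:
        The parsed status payload, or None when nothing is available.
    """
    logger.debug("Fetching status from %s", config.base_url)
    # Real I/O would happen here; async def is used because this is an external call.
    return {"status": "ok"}
```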
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/query_v2.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle `APIConnectionError` from Llama Stack in integration code
Files:
src/app/endpoints/query_v2.py
🧬 Code graph analysis (1)
src/app/endpoints/query_v2.py (2)
src/utils/suid.py (2)
`normalize_conversation_id` (101-122), `to_llama_stack_conversation_id` (125-145)
src/utils/types.py (1)
`TurnSummary` (135-220)
🪛 GitHub Actions: Integration tests
src/app/endpoints/query_v2.py
[error] 425-425: TurnSummary validation failed: rag_chunks[0].content must be a string. Received MagicMock during test_query_v2_endpoint_with_tool_calls. Traceback points to query_v2.py:425.
🪛 GitHub Actions: Unit tests
src/app/endpoints/query_v2.py
[error] 476-476: AttributeError: 'dict' object has no attribute 'text' while parsing rag chunks from responses API in parse_rag_chunks_from_responses_api. Expected items to have 'text' and 'score' attributes (triggered during tests/unit/app/endpoints/test_query_v2.py::test_retrieve_response_parses_referenced_documents).
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: server mode / azure
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E: library mode / ci
- GitHub Check: E2E: library mode / azure
🔇 Additional comments (1)
src/app/endpoints/query_v2.py (1)
46-46: LGTM! Import consolidation improves organization. The import reorganization properly consolidates related utility functions into a single import statement.
src/app/endpoints/query_v2.py
Outdated
```python
def parse_rag_chunks_from_responses_api(response_obj: Any) -> list:
    """
    Extract rag_chunks from the llama-stack OpenAI response.

    Args:
        response_obj: The ResponseObject from OpenAI compatible response API in llama-stack.

    Returns:
        List of rag chunk dicts with content, source, score
    """
    rag_chunks = []

    for output_item in response_obj.output:
        if (
            hasattr(output_item, "type")
            and output_item.type == "file_search_call"
            and hasattr(output_item, "results")
        ):
            for result in output_item.results:
                rag_chunk = {
                    "content": result.text,
                    "source": "file_search",
                    "score": result.score,
                }
                rag_chunks.append(rag_chunk)

    return rag_chunks
```
Fix AttributeError and type mismatch issues in RAG chunk parsing.
The function has critical issues confirmed by pipeline failures:
- Line 476: `result.text` causes `AttributeError: 'dict' object has no attribute 'text'` when result is a dict.
- Return type: Returns `list[dict]` but `TurnSummary.rag_chunks` expects `list[RAGChunk]` objects, causing validation failures.
- Type annotation: Incomplete return type `-> list` violates coding guidelines requiring complete type annotations.

The function must handle both dict and object access patterns like other functions in this file (e.g., `_build_tool_call_summary` and `parse_referenced_documents_from_responses_api`).
Apply this diff to fix all issues:
```diff
+from utils.types import RAGChunk
+
 # ... (in the imports section)

-def parse_rag_chunks_from_responses_api(response_obj: Any) -> list:
+def parse_rag_chunks_from_responses_api(response_obj: Any) -> list[RAGChunk]:
     """
     Extract rag_chunks from the llama-stack OpenAI response.

     Args:
         response_obj: The ResponseObject from OpenAI compatible response API in llama-stack.

     Returns:
-        List of rag chunk dicts with content, source, score
+        List of RAGChunk objects with content, source, and score
     """
-    rag_chunks = []
+    rag_chunks: list[RAGChunk] = []

     for output_item in response_obj.output:
         if (
             hasattr(output_item, "type")
             and output_item.type == "file_search_call"
             and hasattr(output_item, "results")
         ):
-
             for result in output_item.results:
-                rag_chunk = {
-                    "content": result.text,
-                    "source": "file_search",
-                    "score": result.score,
-                }
-                rag_chunks.append(rag_chunk)
+                # Handle both dict and object access patterns
+                if isinstance(result, dict):
+                    content = result.get("text", "")
+                    score = result.get("score")
+                else:
+                    content = getattr(result, "text", "")
+                    score = getattr(result, "score", None)
+
+                if content:  # Only add if content exists
+                    rag_chunks.append(
+                        RAGChunk(
+                            content=content,
+                            source="file_search",
+                            score=score,
+                        )
+                    )

     return rag_chunks
```
🧰 Tools
🪛 GitHub Actions: Unit tests
[error] 476-476: AttributeError: 'dict' object has no attribute 'text' while parsing rag chunks from responses API in parse_rag_chunks_from_responses_api. Expected items to have 'text' and 'score' attributes (triggered during tests/unit/app/endpoints/test_query_v2.py::test_retrieve_response_parses_referenced_documents).
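As a follow-up illustration, here is a small self-contained sketch of the dual access pattern the suggested fix relies on. The `RAGChunk` dataclass below is a stand-in for `utils.types.RAGChunk` (fields assumed from the diff above), and the fake results are hypothetical test doubles:

```python
"""Standalone sketch: file_search results may arrive as dicts or attribute-style objects."""

from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass
class RAGChunk:
    """Stand-in for utils.types.RAGChunk; fields assumed from the suggested diff."""

    content: str
    source: str
    score: float | None = None


def parse_results(results: list[Any]) -> list[RAGChunk]:
    """Convert mixed dict/object file_search results into RAGChunk objects."""
    chunks: list[RAGChunk] = []
    for result in results:
        if isinstance(result, dict):
            content = result.get("text", "")
            score = result.get("score")
        else:
            content = getattr(result, "text", "")
            score = getattr(result, "score", None)
        if content:  # Skip entries without text, mirroring the suggested fix.
            chunks.append(RAGChunk(content=content, source="file_search", score=score))
    return chunks


# Both shapes seen in the failing tests: a plain dict and an attribute-style object.
mixed_results = [
    {"text": "chunk from a dict result", "score": 0.42},
    SimpleNamespace(text="chunk from an object result", score=0.87),
]
print(parse_results(mixed_results))
```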
Description
Added parsing of rag_chunks and added them to TurnSummary so that the RAG chunks are stored in the transcript.
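A minimal sketch of that flow, under stated assumptions: `RAGChunk` and `TurnSummary` below are simplified stand-ins for the classes in `utils.types` (the real `TurnSummary` has more fields), and the hard-coded chunk replaces the output of `parse_rag_chunks_from_responses_api` so the example runs on its own:

```python
"""Illustrative flow for this PR: parsed RAG chunks are stored on the turn summary."""

from dataclasses import dataclass, field


@dataclass
class RAGChunk:
    """Simplified stand-in for utils.types.RAGChunk."""

    content: str
    source: str
    score: float | None = None


@dataclass
class TurnSummary:
    """Simplified stand-in; the real utils.types.TurnSummary carries more fields."""

    llm_response: str
    rag_chunks: list[RAGChunk] = field(default_factory=list)


# In query_v2.py the chunks would come from parse_rag_chunks_from_responses_api(response);
# they are hard-coded here so the sketch is self-contained.
chunks = [RAGChunk(content="retrieved passage", source="file_search", score=0.9)]
summary = TurnSummary(llm_response="answer text", rag_chunks=chunks)

# The summary, including rag_chunks, is what ends up persisted in the transcript.
print(summary)
```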
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit