[LCORE-873] Support Solr Vector I/O Provider, Add Streaming Query Support, and Introduce Solr Keyword Filtering #868
base: main
Conversation
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Skipping CI for Draft Pull Request.
Walkthrough
Adds Solr-based RAG vector-store integration and persistence, injects top RAG chunks into query/streaming flows before LLM calls, relaxes ReferencedDocument.doc_url typing to accept plain strings, and exposes RAGChunk/ReferencedDocument through streaming payloads and public APIs. run.yaml now references external providers via env and configures solr-vector and sqlite persistence.
Sequence Diagram

```mermaid
sequenceDiagram
    autonumber
    actor Client
    participant QueryHandler as Query Endpoint
    participant VectorDB as Vector DB (Solr)
    participant LLM as LLM
    participant Transcript as Transcript/Storage
    Client->>QueryHandler: POST query request
    QueryHandler->>VectorDB: query top-N vectors (filters via solr)
    VectorDB-->>QueryHandler: rag_chunks + chunk metadata
    QueryHandler->>QueryHandler: build rag_context (top 5)
    QueryHandler->>LLM: send user message + rag_context
    LLM-->>QueryHandler: response + token usage
    QueryHandler->>Transcript: store conversation + rag_chunks/referenced_documents
    QueryHandler-->>Client: response payload (includes rag_chunks & referenced_documents)
```
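The flow above can be read as a few lines of Python. This is only a sketch of the pattern the diagram describes: the `vector_io.query` call and `query_response.chunks` attribute are referenced elsewhere in this review, while the function name, prompt layout, and empty `params` dict are illustrative assumptions.

```python
async def answer_with_rag(client, vector_db_id: str, question: str) -> str:
    """Sketch of the retrieve-then-inject flow from the sequence diagram."""
    # 1. Query the Solr-backed vector store (solr filters would go in params).
    query_response = await client.vector_io.query(
        vector_db_id=vector_db_id,
        query=question,
        params={},
    )

    # 2. Keep only the top 5 chunks and build the rag_context block.
    top_chunks = list(query_response.chunks or [])[:5]
    rag_context = "\n\n".join(str(chunk.content) for chunk in top_chunks)

    # 3. Prepend the context to the user message before the LLM call;
    #    the real handlers also persist rag_chunks/referenced_documents.
    return f"Use the following context:\n{rag_context}\n\nQuestion: {question}"
```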
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~35 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/utils/checks.py (1)
60-78: Update docstring to reflect new parent directory validation.
The docstring doesn't describe the new behavior introduced in lines 82-91. When `must_exists=False` and `must_be_writable=True`, the function now validates parent directory existence and writability. Apply this diff to update the docstring:
```diff
 def directory_check(
     path: FilePath, must_exists: bool, must_be_writable: bool, desc: str
 ) -> None:
     """
-    Ensure the given path is an existing directory.
+    Ensure the given path is a valid directory.

-    If the path is not a directory, raises InvalidConfigurationError.
+    If must_exists is True, verifies the directory exists. If must_be_writable
+    is True and the directory doesn't exist, validates that its parent directory
+    exists and is writable.

     Parameters:
         path (FilePath): Filesystem path to validate.
-        must_exists (bool): Should the directory exists?
-        must_be_writable (bool): Should the check test if directory is writable?
+        must_exists (bool): If True, the directory must exist.
+        must_be_writable (bool): If True, the directory (or its parent if the
+            directory doesn't exist) must be writable.
         desc (str): Short description of the value being checked; used in error messages.

     Raises:
         InvalidConfigurationError: If `path` does not point to a directory or
-            is not writable when required.
+            is not writable when required, or if parent directory validation fails.
     """
```
src/app/endpoints/query.py (1)
746-769: Avoid relying on vector_db_ids being undefined when no_tools=TrueIn
retrieve_response,vector_db_idsis only populated in theelsebranch (whenno_toolsis False), but it’s later read inside the RAG block:try: if vector_db_ids: ...When
no_toolsis True, this condition currently triggers anUnboundLocalErrorthat is caught by the blanketexcept, producing misleading “Failed to query vector database for chunks” warnings even though RAG is intentionally disabled.Define
vector_db_idsup front and use it consistently so the RAG path is skipped cleanly when no_tools is set:@@ - agent, conversation_id, session_id = await get_agent( + agent, conversation_id, session_id = await get_agent( client, model_id, system_prompt, available_input_shields, available_output_shields, query_request.conversation_id, query_request.no_tools or False, ) @@ - logger.debug("Conversation ID: %s, session ID: %s", conversation_id, session_id) - # bypass tools and MCP servers if no_tools is True - if query_request.no_tools: + logger.debug("Conversation ID: %s, session ID: %s", conversation_id, session_id) + + # Track available vector DBs (used for toolgroups and RAG); default to empty + vector_db_ids: list[str] = [] + + # bypass tools and MCP servers if no_tools is True + if query_request.no_tools: mcp_headers = {} agent.extra_headers = {} toolgroups = None else: @@ - # Include RAG toolgroups when vector DBs are available - vector_dbs = await client.vector_dbs.list() - vector_db_ids = [vdb.identifier for vdb in vector_dbs] + # Include RAG toolgroups when vector DBs are available + vector_dbs = await client.vector_dbs.list() + vector_db_ids = [vdb.identifier for vdb in vector_dbs]With this change, the later
if vector_db_ids:guard in the RAG block works as intended without spurious exceptions/logs whenno_tools=True.
🧹 Nitpick comments (11)
tests/unit/cache/test_sqlite_cache.py (1)
394-415: Stronger field-level checks look good; final assertions are slightly redundantThe new field-by-field assertions and the
str(...)comparison fordoc_urlcorrectly exercise the serialization/deserialization path and the wideneddoc_urltype. Functionally this test now does a much better job of pinning down the expected shape ofCacheEntry.You could trim a bit of redundancy by either:
- Dropping the later
assert retrieved_entries[0].referenced_documents is not None/ title check, since they’re implied by the new assertions, or- Reusing
retrieved_entryeverywhere for consistency.This is purely a readability/maintenance tweak; behavior is fine as-is.
src/models/requests.py (1)
3-168: Clarify solr field shape and avoid set literal in examples
The new `solr: Optional[dict[str, Any]]` field is a reasonable flexible hook for Solr filters and matches how it's consumed in `vector_io.query`. Two small improvements:

- The example currently uses a Python set: `examples=[{"fq": {"product:*openshift*", "product_version:*4.16*"}}]`, which is unusual in a JSON-ish example. A list is clearer and serializes more predictably:

```diff
-        examples=[
-            {"fq": {"product:*openshift*", "product_version:*4.16*"}},
-        ],
+        examples=[
+            {"fq": ["product:*openshift*", "product_version:*4.16*"]},
+        ],
```

- Consider briefly documenting in the docstring or description that `solr` is an opaque bag of Solr parameters (e.g., `fq`, `q.op`, etc.) so clients know they're passing them through verbatim; a request sketch follows below.
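To make the pass-through behaviour concrete for reviewers, here is a sketch of a client request that uses the new field. The `solr`/`fq` keys come from this PR; the endpoint URL, port, and response handling are placeholders, not the project's documented API.

```python
import requests

# Placeholder endpoint; the real service URL/port depend on deployment.
QUERY_URL = "http://localhost:8080/v1/query"

payload = {
    "query": "How do I configure networking in OpenShift 4.16?",
    # Forwarded verbatim to Solr as filter queries via the new `solr` field.
    "solr": {"fq": ["product:*openshift*", "product_version:*4.16*"]},
}

response = requests.post(QUERY_URL, json=payload, timeout=30)
response.raise_for_status()
print(response.json().get("rag_chunks", []))
```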
src/models/context.py (1)
8-51: New RAG fields on ResponseGeneratorContext look good; update docstring
`vector_io_rag_chunks: list[RAGChunk] | None` and `vector_io_referenced_docs: list[ReferencedDocument] | None` are well-typed and align with the response models. To keep documentation in sync, consider extending the class docstring's Attributes section to include brief descriptions of these two new fields.
src/app/endpoints/query.py (1)
771-897: Refine RAG integration: config for OFFLINE source URL and reduce verbose payload loggingThe new vector-DB/RAG flow is a good step, but two aspects are worth tightening up:
Hard‑coded internal base URL and OFFLINE flag
OFFLINE = Trueand the inline"https://mimir.corp.redhat.com"base URL (both fordoc_urlandsource) effectively bake an environment-specific detail into the endpoint.- This makes it hard to run in other environments and leaks internal domain knowledge in code.
Consider:
-# TODO: move this setting to a higher level configuration
-OFFLINE = True
+OFFLINE = True # TODO: drive from configuration instead of a module constantand pushing the base URL into configuration (or deriving it from chunk metadata) so you don’t need `"https://mimir.corp.redhat.com"` literals here. 2. **Logging entire vector_io query_response at info level** - `logger.info(f"The query response total payload: {query_response}")` will dump all retrieved chunks and metadata on every query. - This can be very noisy and may expose large or sensitive content in logs. Recommend downgrading and summarizing: ```diff - logger.info(f"The query response total payload: {query_response}") + logger.debug("vector_io.query response: %s", query_response) + logger.info( + "vector_io.query returned %d chunks for vector_db_id=%s", + len(getattr(query_response, "chunks", []) or []), + vector_db_id, + )This keeps observability while avoiding heavy, content-rich logs at info level.
run.yaml (1)
117-130: Consider parameterizing the Solr URL for production deployments.The hardcoded
localhost:8983URL will not work in containerized or distributed environments. Consider using an environment variable reference similar to line 19.vector_io: - provider_id: solr-vector provider_type: remote::solr_vector_io config: - solr_url: "http://localhost:8983/solr" + solr_url: ${env.SOLR_URL:-http://localhost:8983/solr} collection_name: "portal-rag"src/app/endpoints/streaming_query.py (6)
81-84: Hardcoded `OFFLINE` flag should be moved to configuration.
The TODO acknowledges this needs to be addressed. This module-level constant affects source URL resolution behavior and should be configurable, especially for toggling between environments.
Would you like me to help implement this as part of the configuration system, or open an issue to track this task?
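For orientation, one possible shape for that change, as a sketch only: the class and field names below (RagSourceConfiguration, offline_mode, source_base_url) are assumptions, not the project's existing configuration models.

```python
from pydantic import BaseModel


class RagSourceConfiguration(BaseModel):
    """Illustrative config section controlling how chunk sources resolve to URLs."""

    offline_mode: bool = True
    source_base_url: str = "https://mimir.corp.redhat.com"


# In streaming_query.py / query.py the module-level constant would then become
# something like:
#   offline = configuration.rag_source.offline_mode
# instead of:
#   OFFLINE = True
```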
165-171: Update docstring to document the newvector_io_referenced_docsparameter.The function signature now includes
vector_io_referenced_docs, but the docstring's Parameters section doesn't document it.Parameters: metadata_map (dict): A mapping containing metadata about referenced documents. summary (TurnSummary): Summary of the conversation turn. token_usage (TokenCounter): Token usage information. media_type (str): The media type for the response format. + vector_io_referenced_docs (list[ReferencedDocument] | None): Referenced + documents from vector_io query. Returns:
487-515: Reconsider emitting "No Violation" token for successful shield validations.Yielding "No Violation" as a
LLM_TOKEN_EVENTfor every successful shield validation may create noise in the client stream. Clients typically only need to know about violations, not successful passes. Consider yielding nothing or using a different event type (e.g., debug/info level).else: - # Yield "No Violation" message for successful shield validations - yield stream_event( - data={ - "id": chunk_id, - "token": "No Violation", - }, - event_type=LLM_TOKEN_EVENT, - media_type=media_type, - ) + # Successful shield validations are silently passed + pass
1230-1231: Use more Pythonic empty list check.Comparing with
== []is less idiomatic than using truthiness.# Convert empty list to None for consistency with existing behavior - if toolgroups == []: + if not toolgroups: toolgroups = None
1296-1298: Missingpylint: disable=broad-exceptcomment for consistency.Line 1086 has the pylint disable comment for the broad except, but line 1296 does not, which may trigger linting warnings.
- except Exception as e: + except Exception as e: # pylint: disable=broad-except logger.warning("Failed to query vector database for chunks: %s", e) logger.debug("Vector DB query error details: %s", traceback.format_exc())
1304-1307: RAG context formatting could benefit from a helper function.
The RAG context formatting logic could be extracted for reusability and testability; a sketch of such a helper follows below.
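For illustration, a self-contained helper of the kind suggested here might look like the following; the function name and chunk shape are assumptions, not the PR's code.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Minimal stand-in for a retrieved RAG chunk (illustrative only)."""

    content: str
    source: str | None = None
    score: float | None = None


def format_rag_context(chunks: list[Chunk], limit: int = 5) -> str:
    """Render the top-N chunks into a single context block for the prompt.

    Pulling this out of the endpoint makes it unit-testable and reusable
    by both the query and streaming_query handlers.
    """
    lines: list[str] = []
    for chunk in chunks[:limit]:
        header = f"[source: {chunk.source}]" if chunk.source else ""
        lines.append(f"{header}\n{chunk.content}".strip())
    return "\n\n".join(lines)
```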
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
run.yaml(3 hunks)src/app/endpoints/query.py(6 hunks)src/app/endpoints/streaming_query.py(14 hunks)src/models/context.py(2 hunks)src/models/requests.py(2 hunks)src/models/responses.py(1 hunks)src/utils/checks.py(1 hunks)tests/unit/app/endpoints/test_conversations_v2.py(1 hunks)tests/unit/app/endpoints/test_query.py(4 hunks)tests/unit/cache/test_postgres_cache.py(1 hunks)tests/unit/cache/test_sqlite_cache.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
tests/unit/app/endpoints/test_conversations_v2.py, tests/unit/cache/test_postgres_cache.py, tests/unit/app/endpoints/test_query.py, tests/unit/cache/test_sqlite_cache.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use
pytest-mockwith AsyncMock objects for mocking in tests
Files:
tests/unit/app/endpoints/test_conversations_v2.py, tests/unit/cache/test_postgres_cache.py, tests/unit/app/endpoints/test_query.py, tests/unit/cache/test_sqlite_cache.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g.,from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Uselogger = logging.getLogger(__name__)pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) andOptional[Type]orType | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Useasync deffor I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes:Configurationfor config classes,Error/Exceptionfor exceptions,Resolverfor strategy patterns,Interfacefor abstract base classes
Abstract classes must use ABC with@abstractmethoddecorators
Include complete type annotations for all class attributes in Python classes
Useimport loggingand module logger pattern with standard log levels: debug, info, warning, error
Files:
src/models/context.py, src/utils/checks.py, src/models/responses.py, src/models/requests.py, src/app/endpoints/query.py, src/app/endpoints/streaming_query.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Use@field_validatorand@model_validatorfor custom validation in Pydantic models
Pydantic configuration classes must extendConfigurationBase; data models must extendBaseModel
Files:
src/models/context.py, src/models/responses.py, src/models/requests.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI
HTTPExceptionwith appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/query.py, src/app/endpoints/streaming_query.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle
APIConnectionErrorfrom Llama Stack in integration code
Files:
src/app/endpoints/query.py, src/app/endpoints/streaming_query.py
🧠 Learnings (3)
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/**/*.py : Use `field_validator` and `model_validator` for custom validation in Pydantic models
Applied to files:
src/models/requests.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code
Applied to files:
src/app/endpoints/query.py
📚 Learning: 2025-10-29T13:05:22.438Z
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).
Applied to files:
src/app/endpoints/streaming_query.py
🧬 Code graph analysis (3)
src/models/context.py (1)
src/models/responses.py (2)
RAGChunk (25-30), ReferencedDocument (326-336)
src/app/endpoints/query.py (1)
src/models/responses.py (2)
RAGChunk (25-30), ReferencedDocument (326-336)
src/app/endpoints/streaming_query.py (3)
src/models/responses.py (2)
RAGChunk (25-30), ReferencedDocument (326-336)
src/utils/types.py (1)
TurnSummary (89-163)
src/app/endpoints/query.py (1)
get_rag_toolgroups (950-977)
🪛 GitHub Actions: E2E Tests
src/utils/checks.py
[error] 87-87: InvalidConfigurationError: Check directory to store feedback '/tmp/data/feedback' cannot be created - parent directory does not exist
🔇 Additional comments (10)
src/utils/checks.py (1)
82-91: Unable to verify review comment claims due to repository access constraints.The review comment makes specific claims about a pipeline failure and broken E2E tests that require access to the full codebase, test files, and pipeline logs to verify. Without repository access, I cannot:
- Confirm the reported pipeline failure with
/tmp/data/feedback- Verify all usages of
directory_checkand assess impact- Review test code to understand breaking changes
- Determine if the parent directory validation is the root cause
Manual verification is required by a developer with repository access.
src/models/responses.py (1)
326-336: doc_url union type change looks correct
Allowing `doc_url: AnyUrl | str | None` aligns with the new tests and preserves backwards compatibility for existing AnyUrl usage; no issues seen here.
tests/unit/app/endpoints/test_conversations_v2.py (1)
93-116: Updated expectation for doc_url string is appropriateAsserting
str(ref_docs[0]["doc_url"]) == "http://example.com"makes the test compatible withdoc_urlbeing either anAnyUrlor a plain string, matching the updated model typing.tests/unit/cache/test_postgres_cache.py (1)
516-537: More granular CacheEntry assertions improve robustnessThe revised test that checks each CacheEntry field and stringifies
doc_urlbefore comparison is clearer and more resilient to representation differences than relying on whole-object equality.tests/unit/app/endpoints/test_query.py (1)
1007-1102: Tests correctly capture new doc_url semantics (including malformed URLs)The updated expectations:
- Using
str(doc.doc_url)/str(docs[i].doc_url)for valid URLs, and- Asserting that malformed URLs yield a non-
NoneReferencedDocumentwithdoc.doc_url == "not a valid url"are consistent with
ReferencedDocument.doc_urlnow allowing raw strings as well asAnyUrl. This gives better resilience to imperfect metadata without dropping referenced documents.src/app/endpoints/query.py (1)
289-423: RAG chunk propagation into transcripts and QueryResponse is wired correctlyUnpacking
summary.rag_chunks, converting them once torag_chunks_dictforstore_transcript, and exposingrag_chunks=summary.rag_chunks or []viaQueryResponsekeeps the same source of truth for RAG data across storage and API response. The additional referenced_documents handling remains consistent with that contract.run.yaml (2)
19-19: LGTM!Using environment variable reference for
external_providers_dirimproves configurability across different deployment environments.
144-160: LGTM!The vector_dbs and embedding model configurations are consistent with the new Solr provider setup. The 384 embedding dimension correctly matches the granite-embedding-30m-english model specification.
src/app/endpoints/streaming_query.py (2)
78-78: LGTM!The logger correctly uses the shared logger name "app.endpoints.handlers" as per the project convention for unified logging across endpoint handlers. Based on learnings.
1218-1227: PotentialTypeErrorwhen concatenating toolgroups.If
get_rag_toolgroupsreturnsNone(whenvector_db_idsis empty), andmcp_toolgroupsis a list, line 1225 could attemptNone + listcausing aTypeError. However, the condition on line 1224 guards this case.The logic is correct but could be clearer. The current flow works because line 1224 checks
if vector_db_idsbefore concatenation.
Code context (src/app/endpoints/streaming_query.py):

```python
    vector_db_id=vector_db_id, query=query_request.query, params=params
)

logger.info("The query response total payload: %s", query_response)
```
Avoid logging entire query response payloads.
Logging the full query_response payload at INFO level may expose sensitive document content in logs. Consider logging only metadata (chunk count, scores) or moving to DEBUG level.
- logger.info("The query response total payload: %s", query_response)
+ logger.debug("Query response chunk count: %d", len(query_response.chunks) if query_response.chunks else 0)Also applies to: 1264-1264
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 1022 and 1264, the code
currently logs the entire query_response payload which can expose sensitive
document content; instead, do not log full payloads at INFO level—either change
the log calls to logger.debug or replace the message with metadata-only
information (e.g., number of chunks, top scores, document IDs, and any
non-sensitive timing/size metrics). Extract and log only those safe fields
(chunk count, aggregated scores, document ids/labels) and remove or redact the
full query_response from INFO-level logs; ensure any debug-level log that
contains more detail is gated behind DEBUG so sensitive data is not emitted in
production.
Code context (src/app/endpoints/streaming_query.py):

```python
    ReferencedDocument(
        doc_title=metadata.get("title", None),
        doc_url="https://mimir.corp.redhat.com"
        + reference_doc,
    )
)
```
🛠️ Refactor suggestion | 🟠 Major
Hardcoded URL should be extracted to configuration.
The URL https://mimir.corp.redhat.com is hardcoded in multiple places. This should be moved to configuration for flexibility across environments.
Consider adding a configuration value like mimir_base_url and referencing it here:
# In configuration
mimir_base_url = configuration.mimir_base_url or "https://mimir.corp.redhat.com"
# Then use:
source = urljoin(mimir_base_url, parent_id)Also applies to: 1067-1069, 1275-1277
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 1046-1051 (also apply same
change at 1067-1069 and 1275-1277): the base URL "https://mimir.corp.redhat.com"
is hardcoded; extract it into configuration (e.g., add mimir_base_url in your
settings with a sensible default), read that config value where this module
runs, and replace the string concatenation with a safe join (use urljoin or
equivalent) to build the full URL from mimir_base_url and the relative path
(reference_doc / parent_id). Ensure the code falls back to the default when
config is missing and update all other occurrences at the noted lines to use the
same config value.
src/utils/checks.py
Outdated
Code context:

```python
if not os.access(parent_path, os.W_OK):
    raise InvalidConfigurationError(f"{desc} '{path}' is not writable")
```
Fix misleading error message.
The error message references path but the check is for parent_path writability. This makes debugging more difficult.
Apply this diff to fix the error message:
if not os.access(parent_path, os.W_OK):
- raise InvalidConfigurationError(f"{desc} '{path}' is not writable")
+ raise InvalidConfigurationError(
+ f"{desc} '{path}' cannot be created - parent directory is not writable"
+ )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
if not os.access(parent_path, os.W_OK):
    raise InvalidConfigurationError(
        f"{desc} '{path}' cannot be created - parent directory is not writable"
    )
```
🤖 Prompt for AI Agents
In src/utils/checks.py around lines 90 to 91, the raised
InvalidConfigurationError currently mentions the child `path` while the code is
checking writability of `parent_path`; update the error message to reference the
parent directory (parent_path) — ideally include both parent_path and the
original path in the message (e.g. "parent directory '{parent_path}' for
'{path}' is not writable") so the error accurately reflects what failed.
asimurka left a comment
After rebase, you will need to change vector_db to vector_stores, messages from UserMessage format to pure string and copy everything to query v2 implementations that use responses api. Overall LGTM, great work!
eaf66dd to 60c3c4f
Actionable comments posted: 19
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/database.py (1)
116-136: Add a default case to handle unsupported database types.The
matchstatement lacks a default case. Ifdb_config.db_typeis neither "sqlite" nor "postgres",engineremainsNoneand Line 136 creates asessionmakerwithbind=None. This could lead to confusing runtime errors later.case "postgres": logger.info("Initialize PostgreSQL database") postgres_config = db_config.config logger.debug("Configuration: %s", postgres_config) if not isinstance(postgres_config, PostgreSQLDatabaseConfiguration): raise TypeError( f"Expected PostgreSQLDatabaseConfiguration, got {type(postgres_config)}" ) engine = _create_postgres_engine(postgres_config, **create_engine_kwargs) + case _: + raise ValueError(f"Unsupported database type: {db_config.db_type}") session_local = sessionmaker(autocommit=False, autoflush=False, bind=engine)
♻️ Duplicate comments (4)
src/app/endpoints/streaming_query.py (4)
1046-1051: Hardcoded URL should be extracted to configuration - still present.The URL
https://mimir.corp.redhat.comis hardcoded in multiple places. This was flagged in a previous review.Also applies to: 1067-1069, 1275-1277
905-908: RAG chunks queried twice per streaming request - still present.
The vector DB is queried at lines 906-908 via `query_vector_io_for_chunks`, and again within `retrieve_response` at lines 1243-1298. This doubles latency and load. Consider passing the pre-fetched chunks to `retrieve_response` or consolidating the query logic (a sketch follows below).
Also applies to: 1243-1298
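A sketch of the consolidation, with assumed names: the `prefetched_rag_chunks` keyword and the trimmed signature are illustrative, not the actual `retrieve_response` API.

```python
async def retrieve_response(
    client,
    model_id: str,
    query_request,
    *,
    prefetched_rag_chunks: list | None = None,
    vector_db_id: str | None = None,
):
    """Reuse chunks already fetched by the streaming handler when available."""
    rag_chunks = prefetched_rag_chunks
    if rag_chunks is None and vector_db_id:
        # Fallback: only hit vector_io if the caller did not pass chunks in.
        query_response = await client.vector_io.query(
            vector_db_id=vector_db_id,
            query=query_request.query,
            params={},
        )
        rag_chunks = list(query_response.chunks or [])[:5]
    rag_chunks = rag_chunks or []
    # ... continue building the prompt / toolgroups using rag_chunks ...
```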
1022-1022: Avoid logging entire query response payloads - still present.This was flagged in a previous review. Logging the full
query_responsepayload at INFO level may expose sensitive document content.- logger.info("The query response total payload: %s", query_response) + logger.debug("Query response chunk count: %d", len(query_response.chunks) if query_response.chunks else 0)
1025-1033: Redundant rag_chunks creation - still present.The
rag_chunkslist created on lines 1025-1033 is never used; it's overwritten byfinal_rag_chunkson lines 1059-1082. Remove the unused code.if query_response.chunks: - rag_chunks = [ - RAGChunk( - content=str(chunk.content), # Convert to string if needed - source=getattr(chunk, "doc_id", None) - or getattr(chunk, "source", None), - score=getattr(chunk, "score", None), - ) - for chunk in query_response.chunks[:5] # Limit to top 5 chunks - ] - logger.info("Retrieved %d chunks from vector DB", len(rag_chunks)) - # Extract doc_ids from chunks for referenced_documentsAlso applies to: 1059-1084
🧹 Nitpick comments (41)
tests/e2e/features/steps/common_http.py (1)
123-132: Richer status-code assertion is good; consider a small style tweakIncluding the parsed JSON (or text fallback) in the failure message is very helpful for debugging and keeps behavior correct. The
if ...: assert False, ...pattern is slightly unconventional, though; consider raisingAssertionErrordirectly for clarity and to avoid the dummyFalse:- if context.response.status_code != status: - # Include response body in error message for debugging - try: - error_body = context.response.json() - except Exception: - error_body = context.response.text - assert False, ( - f"Status code is {context.response.status_code}, expected {status}. " - f"Response: {error_body}" - ) + if context.response.status_code != status: + # Include response body in error message for debugging + try: + error_body = context.response.json() + except Exception: + error_body = context.response.text + raise AssertionError( + f"Status code is {context.response.status_code}, expected {status}. " + f"Response: {error_body}" + )Optionally, you could also narrow the exception to
ValueError/json.JSONDecodeErrorinstead of catchingException, but that’s not critical here.tests/e2e/features/conversation_cache_v2.feature (1)
13-29: TODO: Track and address the skipped test for empty vector DB bug.This scenario is currently skipped due to a known bug where an empty vector database causes a 500 error. The test should be re-enabled once the underlying issue is resolved.
Do you want me to create an issue to track this bug and the test re-enablement, or is this already tracked under one of the referenced stories (LCORE-1036, LCORE-935, LCORE-882)?
scripts/gen_doc.py (2)
10-10: Adding tests directories toDIRECTORIESmay overwrite existing READMEsIncluding
"tests/unit","tests/integration", and"tests/e2e"means this script will now (re)generateREADME.mdin every one of those directories and their subdirectories, overwriting any manually maintained READMEs there. If that’s intentional, this is fine; if some test folders have curated documentation, you may want to either narrow the list or skip directories that already contain a non‑generated README.
53-63: Harden main loop against missing directories and reduce brittleness of path handlingAs written,
generate_documentation_on_path(f"{directory}/")will raise if any directory inDIRECTORIESdoesn’t exist (e.g.,tests/e2ein an older or partial checkout). You can make this more robust by checking for existence once and reusing aPathobject:def main(): """Entry point to this script, regenerates documentation in all directories.""" - for directory in DIRECTORIES: - generate_documentation_on_path(f"{directory}/") - for path in Path(directory).rglob("*"): - if path.is_dir(): - if ( - path.name == "lightspeed_stack.egg-info" - or path.name == "__pycache__" - or ".ruff_cache" in str(path) - ): - continue - generate_documentation_on_path(path) + for directory in DIRECTORIES: + base = Path(directory) + if not base.exists(): + print(f"[gendoc] Skipping missing directory: {base}") + continue + + generate_documentation_on_path(base) + for path in base.rglob("*"): + if not path.is_dir(): + continue + if ( + path.name == "lightspeed_stack.egg-info" + or path.name == "__pycache__" + or ".ruff_cache" in str(path) + ): + continue + generate_documentation_on_path(path)This keeps behavior the same when directories are present, but avoids failures (and makes the assumption about directory existence explicit) when they’re not.
docs/demos/lcore/lcore.md (3)
34-37: Improve language clarity.The phrase "It's a real framework independent on programming language" is awkward. Consider revising to: "It's a language-independent framework with providers for RAG, quota control, guardrails, metrics, etc."
23-104: Consider the scope and audience for the extensive Llama Stack content.The PR introduces significant Llama Stack framework documentation (lines 23-104) comprising roughly 40% of the file's content. While this provides valuable context, verify that this level of detail is appropriate for a Lightspeed Core demo presentation. The new Solr Vector I/O Provider, streaming query support, and keyword filtering features are indirectly covered under RAG and LLM Inference sections, but not explicitly highlighted.
If this is intended as an architectural overview that positions Lightspeed Core within the broader Llama Stack ecosystem, the current approach works well. However, if the primary audience expects practical Solr-specific details or streaming implementation guidance, consider adding a dedicated technical section.
75-104: Consider emoji accessibility for documentation portability.The section headers use emoji (🤖, 🛡️, 🔧, 📚, 🎯) for visual appeal. While this works well for browser/HTML presentation, verify that emoji render correctly across all planned distribution formats (PDF, markdown viewers, screen readers, etc.). If the documentation will be widely shared or translated, consider adding text fallback descriptions or using a more neutral header style.
src/app/endpoints/tools.py (4)
40-59: Docstring now slightly under-describes builtin toolgroupsThe implementation clearly returns tools from both builtin toolgroups and MCP server toolgroups (distinguished via
server_source), but the docstring still frames this primarily as “from all configured MCP servers.” Consider updating the description (and possibly the Raises section) to reflect that builtin toolgroups are also included and that only upstream connectivity issues currently surface as HTTP errors while missing toolgroups are skipped.
68-76: Good upstream connection handling; consider redacting low-level error detailsMapping connection failures to a 503 using
ServiceUnavailableResponseand raisingHTTPException(**response.model_dump())gives a clean, centralized error path and aligns with the “handle Llama Stack connection errors” guidance. The only concern is thatcause=str(e)may expose low‑level connection details to clients (hosts, ports, internal messages). If you want a stricter separation between internal diagnostics and public API, consider logging the full exception but returning a more genericcause(or omitting it) in the response body.If you want to double‑check, please confirm in the Llama Stack client docs that the connection error you’re catching here is the one you intend to surface and that its message doesn’t contain sensitive data.
85-97: BadRequestError is silently converted into partial successFor each toolgroup, a
BadRequestErroris logged and then skipped, resulting in a 200 response with that toolgroup’s tools omitted. This may be exactly what you want (best‑effort aggregation), but it means misconfigured/removed toolgroups never surface to clients as errors, only as missing tools and log noise.If misconfiguration is something operators or clients should notice, consider one of:
- Downgrading log level to
warningif this is expected and transient, or keepingerrorbut adding metrics; and/or- Surfacing partial‑failure information in
ToolsResponse(e.g., a “skipped_toolgroups” field) instead of only logs.This keeps the endpoint resilient while making failures more visible and debuggable.
You may want to confirm in the Llama Stack client docs whether
BadRequestErrorhere strictly means “toolgroup not found” vs broader 4xx classes, to decide how user‑visible this should be.
78-83: Aggregation and server_source derivation are correct; possible micro-refactors
The aggregation over `toolgroups_response`, derivation of `server_source` (builtin vs MCP URL), and the final summary logging all look logically sound. A couple of optional refinements if this endpoint becomes hot or `mcp_servers` grows:

- Instead of recomputing `next(s for s in configuration.mcp_servers ...)` for each tool, prebuild a `{name: server}` dict once so lookup is O(1) (a sketch follows after this comment).
- The builtin vs MCP counts in the info log require two full passes over `consolidated_tools`; if needed, you could maintain these counters while appending tools.

These are minor and only worth doing if this endpoint turns into a high‑traffic or large‑dataset path.
Also applies to: 99-127, 135-140
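As a sketch of the first bullet above: the stand-in dataclasses and field names are illustrative, only the prebuilt-dict lookup and single-pass counting are the point.

```python
from dataclasses import dataclass


@dataclass
class MCPServer:
    """Stand-in for a configured MCP server entry (illustrative)."""

    name: str
    url: str


@dataclass
class Tool:
    """Stand-in for an entry in consolidated_tools (illustrative)."""

    identifier: str
    toolgroup_id: str


def summarize_tools(tools: list[Tool], mcp_servers: list[MCPServer]) -> tuple[int, int]:
    """Count builtin vs MCP tools in one pass, using a prebuilt name lookup."""
    servers_by_name = {server.name: server for server in mcp_servers}

    builtin_count = mcp_count = 0
    for tool in tools:
        if tool.toolgroup_id in servers_by_name:
            mcp_count += 1
        else:
            builtin_count += 1
    return builtin_count, mcp_count
```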
scripts/llama_stack_tutorial.sh (1)
1-24: Consider adding error handling.The script lacks error handling directives like
set -ewhich would cause it to exit on errors. While this might be intentional for a tutorial script (to continue even if some commands fail), consider addingset -eor at least documenting this behavior.Add error handling at the top of the script:
#!/bin/bash +set -e # Exit on error # Llama Stack Tutorial - Interactive GuideOr if you want the tutorial to continue even on errors, add a comment explaining this:
#!/bin/bash +# Note: Error handling is intentionally omitted to allow the tutorial to continue +# even if some endpoints are unavailable # Llama Stack Tutorial - Interactive Guidesrc/authentication/k8s.py (1)
261-284: Variable shadowing:responseis reused with different types.The variable
responseis first assigned the SAR result (line 272), then conditionally reassigned to aForbiddenResponse(line 283). This shadowing makes the code harder to follow and could lead to bugs if the logic changes.try: authorization_api = K8sClientSingleton.get_authz_api() sar = kubernetes.client.V1SubjectAccessReview( spec=kubernetes.client.V1SubjectAccessReviewSpec( user=user_info.user.username, groups=user_info.user.groups, non_resource_attributes=kubernetes.client.V1NonResourceAttributes( path=self.virtual_path, verb="get" ), ) ) - response = authorization_api.create_subject_access_review(sar) + sar_response = authorization_api.create_subject_access_review(sar) except Exception as e: logger.error("API exception during SubjectAccessReview: %s", e) response = ServiceUnavailableResponse( backend_name="Kubernetes API", cause="Unable to perform authorization check", ) raise HTTPException(**response.model_dump()) from e - if not response.status.allowed: + if not sar_response.status.allowed: response = ForbiddenResponse.endpoint(user_id=user_info.user.uid) raise HTTPException(**response.model_dump())src/authentication/jwk_token.py (1)
37-46: TODO: Missing error handling for connection issues.The function has a TODO comment (line 42) about handling connection errors and timeouts. While
aiohttp.ClientErroris caught in the caller (lines 134-139), theresp.raise_for_status()on line 44 will raiseaiohttp.ClientResponseErrorwhich is a subclass ofClientErrorand is handled.However, consider adding an explicit timeout to prevent hanging on slow/unresponsive servers:
async def get_jwk_set(url: str) -> KeySet: """Fetch the JWK set from the cache, or fetch it from the URL if not cached.""" async with _jwk_cache_lock: if url not in _jwk_cache: - async with aiohttp.ClientSession() as session: - # TODO(omertuc): handle connection errors, timeouts, etc. - async with session.get(url) as resp: + timeout = aiohttp.ClientTimeout(total=10) + async with aiohttp.ClientSession(timeout=timeout) as session: + async with session.get(url) as resp: resp.raise_for_status() _jwk_cache[url] = JsonWebKey.import_key_set(await resp.json()) return _jwk_cache[url]src/configuration.py (1)
22-22: Type rename from ConversationCacheConfiguration to ConversationHistoryConfiguration.The import and return type annotation have been updated to use
ConversationHistoryConfiguration, aligning with the broader rename across the codebase.Consider renaming the method to match the new type.
The property method name
conversation_cache_configurationstill contains "cache" while the return type is nowConversationHistoryConfiguration. For consistency, consider renaming toconversation_history_configurationin a future refactor.Also applies to: 141-141
docs/streaming_query_endpoint.puml (2)
27-28: Consider clarifying shield event behavior.Based on the AI summary, shield events are only emitted on violations (silent on success). The diagram shows unconditional "emit shield validation event" which may be misleading. Consider adding a note or updating the alt text to clarify this conditional behavior.
else Chunk Type: shield - EventHandler->>SSE: emit shield validation event + EventHandler->>SSE: emit shield violation event (if violation detected)
13-14: Consider documenting RAG retrieval in the flow.The AI summary indicates the streaming flow incorporates RAG data via
query_vector_io_for_chunks, producingrag_chunksandvector_io_referenced_docsthat are included in end-stream payloads. Consider adding this to the diagram for completeness.src/constants.py (1)
102-104: New authentication module constants are properly defined.The constants follow the existing naming pattern and use descriptive string values. Consider adding brief comments describing each module's purpose, consistent with other constants in this file (e.g., lines 93-98 have comments).
As per coding guidelines, constants should have descriptive comments:
AUTH_MOD_NOOP_WITH_TOKEN = "noop-with-token" +# API key based authentication module AUTH_MOD_APIKEY_TOKEN = "api-key-token" AUTH_MOD_JWK_TOKEN = "jwk-token" +# Red Hat Identity header authentication module AUTH_MOD_RH_IDENTITY = "rh-identity"src/cache/cache_factory.py (1)
4-4: Config type rename wiring is correct; optional docstring polishSwitching
conversation_cacheto acceptConversationHistoryConfigurationand extending the invalid type error message to includeCACHE_TYPE_NOOPare consistent with the new configuration model; behavior of the factory remains the same. You may optionally update the function docstring to mentionNoopCacheas a possible return type for completeness, but it's not functionally required.Also applies to: 20-20, 45-49
tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml (1)
1-32: Verifyauth_enabledfor noop-with-token e2e configThis config is named for the noop-with-token auth flow and sets
authentication.module: "noop-with-token", butservice.auth_enabledisfalse. If the intent is to exercise token-based auth in the e2e suite (Authorized-tag scenarios), you may wantauth_enabled: true; otherwise this config might not actually enforce the auth module. Please double-check against how other noop-with-token configs are used in the tests.
Based on learnings, noop-with-token tests typically rely on a dedicated config when the Authorized tag is present.src/authorization/middleware.py (1)
35-45: Confirm APIKEY_TOKEN → Noop resolvers mapping and structured error usageThe switch to
InternalServerErrorResponse/ForbiddenResponsefor raising HTTP exceptions viaresponse.model_dump()looks consistent with the new response model pattern and keeps error shapes centralized.For
AUTH_MOD_APIKEY_TOKEN,get_authorization_resolversnow returnsNoopRolesResolverandNoopAccessResolver, which effectively disables role-based checks—any request that passes API key authentication will satisfycheck_access(action, user_roles). If API keys are meant to grant blanket access, this is fine; if you intend to keep per-Actionauthorization semantics even for API keys, you may want a dedicated resolver instead of the noop pair. Please confirm the intended behavior.Also applies to: 68-69, 85-86, 94-95
src/app/endpoints/rags.py (2)
53-99: Consider log level for configuration details.Line 82 logs the entire
llama_stack_configurationobject atinfolevel. This could potentially expose sensitive configuration details in production logs. Consider usingdebuglevel instead, or logging only non-sensitive attributes.- logger.info("Llama stack config: %s", llama_stack_configuration) + logger.debug("Llama stack config: %s", llama_stack_configuration)
101-154: Same log level concern applies here.Line 130 has the same pattern—consider using
debuglevel to avoid leaking configuration details in production logs.- logger.info("Llama stack config: %s", llama_stack_configuration) + logger.debug("Llama stack config: %s", llama_stack_configuration)tests/e2e/features/environment.py (1)
84-102: Consider usingeliffor mutually exclusive config scenarios.Lines 92-102 use separate
ifstatements forInvalidFeedbackStorageConfigandNoCacheConfig. If a scenario were tagged with both,NoCacheConfigwould overwrite the config and trigger an immediate restart. If these are mutually exclusive, usingelifwould make this explicit and prevent unexpected behavior.if "InvalidFeedbackStorageConfig" in scenario.effective_tags: context.scenario_config = f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-invalid-feedback-storage.yaml" - if "NoCacheConfig" in scenario.effective_tags: + elif "NoCacheConfig" in scenario.effective_tags: context.scenario_config = ( f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-no-cache.yaml" )src/app/endpoints/root.py (2)
779-784: Consider adding 200 response for OpenAPI consistency.Unlike other endpoints in this PR (e.g.,
shields_responses,models_responses),root_responsesdoesn't include a 200 response entry. While theresponse_class=HTMLResponsehandles the success case, adding an explicit 200 entry would maintain consistency across endpoints.
787-808: Parameter order differs from other endpoints.The signature has
authbeforerequest, while other endpoints in this PR (e.g.,rags.py,shields.py,models.py) userequestbeforeauth. Consider reordering for consistency.async def root_endpoint_handler( - auth: Annotated[AuthTuple, Depends(get_auth_dependency())], request: Request, + auth: Annotated[AuthTuple, Depends(get_auth_dependency())], ) -> HTMLResponse:src/app/endpoints/metrics.py (1)
64-64: Unnecessarystr()cast onCONTENT_TYPE_LATEST.
CONTENT_TYPE_LATESTfromprometheus_clientis already a string constant. The explicitstr()conversion is redundant.- return PlainTextResponse(generate_latest(), media_type=str(CONTENT_TYPE_LATEST)) + return PlainTextResponse(generate_latest(), media_type=CONTENT_TYPE_LATEST).github/workflows/e2e_tests.yaml (1)
188-191: Consider using health check polling instead of fixed sleep.A fixed 20-second sleep may be insufficient under load or excessive when services start quickly. Consider polling the health endpoint with a timeout.
- name: Wait for services run: | echo "Waiting for services to be healthy..." - sleep 20 + for i in {1..30}; do + if curl -sf http://localhost:8080/health > /dev/null 2>&1; then + echo "Services healthy after $i seconds" + break + fi + sleep 1 + donedocs/config.json (1)
156-162: Minor: Consider minimum value of 1 for embedding_dimension.An embedding dimension of 0 is technically allowed by the schema but would be semantically invalid.
"embedding_dimension": { "default": 768, "description": "Dimensionality of embedding vectors.", - "minimum": 0, + "minimum": 1, "title": "Embedding dimension", "type": "integer" },src/utils/endpoints.py (2)
637-659: Consider early return for disabled transcripts.The nested if-else structure can be simplified with an early return pattern for better readability.
# Store transcript if enabled - if not is_transcripts_enabled_func(): - logger.debug("Transcript collection is disabled in the configuration") - else: - # Prepare attachments - attachments = query_request.attachments or [] - - # Determine rag_chunks: use provided value or empty list - transcript_rag_chunks = rag_chunks if rag_chunks is not None else [] - - store_transcript_func( - user_id=user_id, - conversation_id=conversation_id, - model_id=model_id, - provider_id=provider_id, - query_is_valid=True, - query=query_request.query, - query_request=query_request, - summary=summary, - rag_chunks=transcript_rag_chunks, - truncated=False, - attachments=attachments, - ) + if is_transcripts_enabled_func(): + attachments = query_request.attachments or [] + transcript_rag_chunks = rag_chunks if rag_chunks is not None else [] + store_transcript_func( + user_id=user_id, + conversation_id=conversation_id, + model_id=model_id, + provider_id=provider_id, + query_is_valid=True, + query=query_request.query, + query_request=query_request, + summary=summary, + rag_chunks=transcript_rag_chunks, + truncated=False, + attachments=attachments, + ) + else: + logger.debug("Transcript collection is disabled in the configuration")
591-610: Consider refactoringcleanup_after_streamingto reduce parameter count.This function has 19 parameters (triggering pylint R0913/R0917). Consider grouping related parameters into a dataclass or TypedDict to improve maintainability and reduce the chance of argument ordering errors.
Example approach:
@dataclass class StreamingCleanupContext: user_id: str conversation_id: str model_id: str provider_id: str llama_stack_model_id: str query_request: QueryRequest summary: TurnSummary metadata_map: dict[str, Any] started_at: str skip_userid_check: bool rag_chunks: list[dict[str, Any]] | None = None async def cleanup_after_streaming( context: StreamingCleanupContext, client: AsyncLlamaStackClient, config: AppConfig, callbacks: StreamingCleanupCallbacks, # another dataclass for the 4 funcs ) -> None:src/app/endpoints/conversations_v2.py (1)
229-241: Redundant cache type check at line 232.The
check_conversation_existencefunction checks ifconfiguration.conversation_cache_configuration.type is Nonebut this is already validated in the calling handlers before this function is invoked. Consider removing the redundant check or documenting why it's needed for safety.def check_conversation_existence(user_id: str, conversation_id: str) -> None: """Check if conversation exists.""" - # checked already, but we need to make pyright happy - if configuration.conversation_cache_configuration.type is None: - return + # Callers must validate cache availability before calling this function + assert configuration.conversation_cache_configuration.type is not None conversations = configuration.conversation_cache.list(user_id, False)Alternatively, if keeping for type narrowing, the comment already explains the intent which is acceptable.
src/app/endpoints/query.py (4)
79-82: Configuration TODO noted - consider addressing before merge.The
OFFLINEflag controls chunk source resolution but is currently hardcoded. The TODO comment acknowledges this should be moved to configuration. Consider prioritizing this to avoid environment-specific issues.
783-790: Use lazy % formatting for logger calls instead of f-strings.f-strings are evaluated even when the log level is disabled, causing unnecessary overhead. Use
%splaceholders with logger for deferred formatting.- logger.info(f"Initial params: {params}") - logger.info(f"query_request.solr: {query_request.solr}") + logger.info("Initial params: %s", params) + logger.info("query_request.solr: %s", query_request.solr) if query_request.solr: # Pass the entire solr dict under the 'solr' key params["solr"] = query_request.solr - logger.info(f"Final params with solr filters: {params}") + logger.info("Final params with solr filters: %s", params) else: logger.info("No solr filters provided") - logger.info(f"Final params being sent to vector_io.query: {params}") + logger.info("Final params being sent to vector_io.query: %s", params)
816-821: Hardcoded URL should be extracted to configuration.The URL
https://mimir.corp.redhat.comis hardcoded. This should be configurable for different environments.Consider adding a configuration value like
mimir_base_url:# In configuration mimir_base_url = configuration.mimir_base_url or "https://mimir.corp.redhat.com" # Then use: doc_url=urljoin(mimir_base_url, reference_doc),
827-830: Consider narrowing the exception type or logging the exception class.Catching bare
Exceptioncan mask unexpected errors. Consider catching more specific exceptions or at minimum logging the exception type for debugging.- except Exception as e: - logger.warning(f"Failed to query vector database for chunks: {e}") - logger.debug(f"Vector DB query error details: {traceback.format_exc()}") + except Exception as e: + logger.warning("Failed to query vector database for chunks: %s (%s)", e, type(e).__name__) + logger.debug("Vector DB query error details: %s", traceback.format_exc())src/app/endpoints/streaming_query.py (2)
505-515: Inconsistent behavior: "No Violation" token emitted for successful shield validations.For successful shield validations (no violations), the code now emits a "No Violation" token to the stream. This may clutter the response with unnecessary tokens. Consider whether this should be silently skipped instead.
else: - # Yield "No Violation" message for successful shield validations - yield stream_event( - data={ - "id": chunk_id, - "token": "No Violation", - }, - event_type=LLM_TOKEN_EVENT, - media_type=media_type, - ) + # Successful shield validations are silently ignored + pass
1296-1298: Bare Exception catch should log exception type.Consider logging the exception type for better debugging.
except Exception as e: - logger.warning("Failed to query vector database for chunks: %s", e) + logger.warning("Failed to query vector database for chunks: %s (%s)", e, type(e).__name__) logger.debug("Vector DB query error details: %s", traceback.format_exc())src/app/endpoints/streaming_query_v2.py (2)
56-56: Use__name__for module logger pattern.Per coding guidelines, use
logger = logging.getLogger(__name__)instead of a hardcoded string. This ensures consistent logger naming across the codebase.-logger = logging.getLogger("app.endpoints.handlers") +logger = logging.getLogger(__name__)
101-103: Consider specifying a more precise return type.The return type
Anyis vague. Consider using aCallabletype hint for better documentation and IDE support.-def create_responses_response_generator( # pylint: disable=too-many-locals,too-many-statements - context: ResponseGeneratorContext, -) -> Any: +from collections.abc import Callable + +def create_responses_response_generator( # pylint: disable=too-many-locals,too-many-statements + context: ResponseGeneratorContext, +) -> Callable[[AsyncIterator[OpenAIResponseObjectStream]], AsyncIterator[str]]:src/models/config.py (1)
841-850: Redundant validation check.The check
if self.api_key_config.api_key.get_secret_value() is Noneon line 847 is redundant. TheAPIKeyTokenConfigurationclass already hasmin_length=1on theapi_keyfield, which would fail validation before this check is reached. Additionally,get_secret_value()returns a string, notNone, for a validSecretStr.Consider simplifying or removing this check:
if self.module == constants.AUTH_MOD_APIKEY_TOKEN: if self.api_key_config is None: raise ValueError( "API Key configuration section must be specified " "when using API Key token authentication" ) - if self.api_key_config.api_key.get_secret_value() is None: - raise ValueError( - "api_key parameter must be specified when using API_KEY token authentication" - )
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (12)
docs/architecture.pngis excluded by!**/*.pngdocs/architecture.svgis excluded by!**/*.svgdocs/config.pngis excluded by!**/*.pngdocs/config.svgis excluded by!**/*.svgdocs/convesation_history.svgis excluded by!**/*.svgdocs/demos/lcore/images/journey.pngis excluded by!**/*.pngdocs/demos/lcore/images/llama_stack_as_library.svgis excluded by!**/*.svgdocs/demos/lcore/images/llama_stack_as_service.svgis excluded by!**/*.svgdocs/query_endpoint.svgis excluded by!**/*.svgdocs/rest_api.svgis excluded by!**/*.svgdocs/streaming_query_endpoint.svgis excluded by!**/*.svguv.lockis excluded by!**/*.lock
📒 Files selected for processing (107)
- .coderabbit.yaml (1 hunks)
- .github/PULL_REQUEST_TEMPLATE.md (1 hunks)
- .github/workflows/e2e_tests.yaml (7 hunks)
- .tekton/lightspeed-stack-pull-request.yaml (3 hunks)
- .tekton/lightspeed-stack-push.yaml (3 hunks)
- CONTRIBUTING.md (6 hunks)
- Containerfile (2 hunks)
- Makefile (3 hunks)
- README.md (12 hunks)
- docker-compose-library.yaml (1 hunks)
- docker-compose.yaml (1 hunks)
- docs/auth.md (1 hunks)
- docs/config.html (1 hunks)
- docs/config.json (1 hunks)
- docs/config.md (1 hunks)
- docs/config.puml (9 hunks)
- docs/demos/lcore/lcore.md (5 hunks)
- docs/query_endpoint.puml (1 hunks)
- docs/rag_guide.md (2 hunks)
- docs/streaming_query_endpoint.puml (1 hunks)
- examples/lightspeed-stack-api-key-auth.yaml (1 hunks)
- examples/lightspeed-stack-rh-identity.yaml (1 hunks)
- pyproject.toml (2 hunks)
- requirements.hermetic.txt (1 hunks)
- requirements.torch.txt (1 hunks)
- rpms.in.yaml (1 hunks)
- rpms.lock.yaml (1 hunks)
- scripts/gen_doc.py (2 hunks)
- scripts/llama_stack_tutorial.sh (1 hunks)
- scripts/remove_torch_deps.sh (1 hunks)
- src/app/database.py (2 hunks)
- src/app/endpoints/README.md (2 hunks)
- src/app/endpoints/authorized.py (1 hunks)
- src/app/endpoints/config.py (3 hunks)
- src/app/endpoints/conversations.py (11 hunks)
- src/app/endpoints/conversations_v2.py (8 hunks)
- src/app/endpoints/feedback.py (6 hunks)
- src/app/endpoints/health.py (2 hunks)
- src/app/endpoints/info.py (4 hunks)
- src/app/endpoints/metrics.py (3 hunks)
- src/app/endpoints/models.py (3 hunks)
- src/app/endpoints/providers.py (3 hunks)
- src/app/endpoints/query.py (12 hunks)
- src/app/endpoints/query_v2.py (9 hunks)
- src/app/endpoints/rags.py (1 hunks)
- src/app/endpoints/root.py (3 hunks)
- src/app/endpoints/shields.py (3 hunks)
- src/app/endpoints/streaming_query.py (13 hunks)
- src/app/endpoints/streaming_query_v2.py (1 hunks)
- src/app/endpoints/tools.py (3 hunks)
- src/app/main.py (3 hunks)
- src/app/routers.py (3 hunks)
- src/authentication/README.md (2 hunks)
- src/authentication/__init__.py (2 hunks)
- src/authentication/api_key_token.py (1 hunks)
- src/authentication/jwk_token.py (2 hunks)
- src/authentication/k8s.py (4 hunks)
- src/authentication/rh_identity.py (1 hunks)
- src/authentication/utils.py (1 hunks)
- src/authorization/middleware.py (5 hunks)
- src/cache/cache_factory.py (3 hunks)
- src/configuration.py (2 hunks)
- src/constants.py (2 hunks)
- src/metrics/utils.py (2 hunks)
- src/models/README.md (1 hunks)
- src/models/config.py (21 hunks)
- src/models/context.py (1 hunks)
- src/models/requests.py (4 hunks)
- src/models/responses.py (17 hunks)
- src/quota/__init__.py (1 hunks)
- src/quota/quota_limiter.py (2 hunks)
- src/quota/user_quota_limiter.py (1 hunks)
- src/utils/README.md (1 hunks)
- src/utils/checks.py (1 hunks)
- src/utils/common.py (3 hunks)
- src/utils/endpoints.py (6 hunks)
- src/utils/quota.py (2 hunks)
- src/utils/responses.py (1 hunks)
- src/utils/shields.py (1 hunks)
- test.containerfile (1 hunks)
- tests/configuration/rh-identity-config.yaml (1 hunks)
- tests/e2e/README.md (1 hunks)
- tests/e2e/__init__.py (1 hunks)
- tests/e2e/configs/README.md (1 hunks)
- tests/e2e/configs/run-azure.yaml (1 hunks)
- tests/e2e/configs/run-ci.yaml (3 hunks)
- tests/e2e/configs/run-rhelai.yaml (1 hunks)
- tests/e2e/configuration/README.md (1 hunks)
- tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1 hunks)
- tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml (1 hunks)
- tests/e2e/configuration/lightspeed-stack-library-mode.yaml (1 hunks)
- tests/e2e/configuration/lightspeed-stack-no-cache.yaml (1 hunks)
- tests/e2e/configuration/lightspeed-stack-server-mode.yaml (1 hunks)
- tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml (1 hunks)
- tests/e2e/features/README.md (1 hunks)
- tests/e2e/features/authorized_noop_token.feature (2 hunks)
- tests/e2e/features/conversation_cache_v2.feature (1 hunks)
- tests/e2e/features/conversations.feature (5 hunks)
- tests/e2e/features/environment.py (4 hunks)
- tests/e2e/features/feedback.feature (4 hunks)
- tests/e2e/features/health.feature (2 hunks)
- tests/e2e/features/info.feature (4 hunks)
- tests/e2e/features/query.feature (3 hunks)
- tests/e2e/features/steps/README.md (1 hunks)
- tests/e2e/features/steps/__init__.py (1 hunks)
- tests/e2e/features/steps/common_http.py (2 hunks)
- tests/e2e/features/steps/conversation.py (5 hunks)
⛔ Files not processed due to max files limit (25)
- tests/e2e/features/steps/llm_query_response.py
- tests/e2e/features/streaming_query.feature
- tests/e2e/test_api.py
- tests/e2e/utils/README.md
- tests/integration/README.md
- tests/integration/endpoints/README.md
- tests/integration/endpoints/test_config_integration.py
- tests/integration/endpoints/test_health_integration.py
- tests/integration/endpoints/test_info_integration.py
- tests/integration/endpoints/test_query_v2_integration.py
- tests/integration/test_middleware_integration.py
- tests/integration/test_openapi_json.py
- tests/integration/test_rh_identity_integration.py
- tests/unit/README.md
- tests/unit/app/README.md
- tests/unit/app/endpoints/README.md
- tests/unit/app/endpoints/test_authorized.py
- tests/unit/app/endpoints/test_config.py
- tests/unit/app/endpoints/test_conversations.py
- tests/unit/app/endpoints/test_conversations_v2.py
- tests/unit/app/endpoints/test_feedback.py
- tests/unit/app/endpoints/test_health.py
- tests/unit/app/endpoints/test_info.py
- tests/unit/app/endpoints/test_metrics.py
- tests/unit/app/endpoints/test_models.py
✅ Files skipped from review due to trivial changes (16)
- src/models/README.md
- rpms.in.yaml
- tests/e2e/features/steps/README.md
- .github/PULL_REQUEST_TEMPLATE.md
- src/quota/__init__.py
- examples/lightspeed-stack-api-key-auth.yaml
- tests/e2e/README.md
- tests/e2e/configuration/README.md
- docs/config.md
- tests/e2e/configs/README.md
- tests/e2e/__init__.py
- Containerfile
- src/quota/user_quota_limiter.py
- tests/e2e/features/README.md
- tests/configuration/rh-identity-config.yaml
- requirements.hermetic.txt
🚧 Files skipped from review as they are similar to previous changes (1)
- src/utils/checks.py
🧰 Additional context used
📓 Path-based instructions (10)
tests/e2e/features/**/*.feature
📄 CodeRabbit inference engine (CLAUDE.md)
Use behave (BDD) framework with Gherkin feature files for end-to-end tests
Files:
tests/e2e/features/info.feature, tests/e2e/features/feedback.feature, tests/e2e/features/health.feature, tests/e2e/features/conversation_cache_v2.feature, tests/e2e/features/query.feature, tests/e2e/features/authorized_noop_token.feature, tests/e2e/features/conversations.feature
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `pytest-mock` with AsyncMock objects for mocking in tests
Files:
tests/e2e/features/steps/__init__.py, tests/e2e/features/steps/common_http.py, tests/e2e/features/steps/conversation.py, tests/e2e/features/environment.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use `async def` for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and module logger pattern with standard log levels: debug, info, warning, error
Files:
src/app/main.py, src/metrics/utils.py, src/authentication/utils.py, src/authorization/middleware.py, src/app/endpoints/models.py, src/utils/endpoints.py, src/cache/cache_factory.py, src/models/context.py, src/app/endpoints/tools.py, src/authentication/jwk_token.py, src/configuration.py, src/app/routers.py, src/app/endpoints/feedback.py, src/app/endpoints/rags.py, src/authentication/rh_identity.py, src/utils/shields.py, src/app/endpoints/health.py, src/models/requests.py, src/app/endpoints/query.py, src/utils/common.py, src/authentication/k8s.py, src/app/endpoints/providers.py, src/utils/responses.py, src/utils/quota.py, src/app/endpoints/info.py, src/app/database.py, src/constants.py, src/authentication/api_key_token.py, src/app/endpoints/metrics.py, src/app/endpoints/streaming_query.py, src/app/endpoints/streaming_query_v2.py, src/app/endpoints/config.py, src/authentication/__init__.py, src/app/endpoints/conversations.py, src/app/endpoints/conversations_v2.py, src/app/endpoints/shields.py, src/app/endpoints/authorized.py, src/quota/quota_limiter.py, src/app/endpoints/root.py, src/models/responses.py, src/app/endpoints/query_v2.py, src/models/config.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/models.py, src/app/endpoints/tools.py, src/app/endpoints/feedback.py, src/app/endpoints/rags.py, src/app/endpoints/health.py, src/app/endpoints/query.py, src/app/endpoints/providers.py, src/app/endpoints/info.py, src/app/endpoints/metrics.py, src/app/endpoints/streaming_query.py, src/app/endpoints/streaming_query_v2.py, src/app/endpoints/config.py, src/app/endpoints/conversations.py, src/app/endpoints/conversations_v2.py, src/app/endpoints/shields.py, src/app/endpoints/authorized.py, src/app/endpoints/root.py, src/app/endpoints/query_v2.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle `APIConnectionError` from Llama Stack in integration code
Files:
src/app/endpoints/models.py, src/app/endpoints/tools.py, src/app/endpoints/feedback.py, src/app/endpoints/rags.py, src/app/endpoints/health.py, src/app/endpoints/query.py, src/app/endpoints/providers.py, src/app/endpoints/info.py, src/app/endpoints/metrics.py, src/app/endpoints/streaming_query.py, src/app/endpoints/streaming_query_v2.py, src/app/endpoints/config.py, src/app/endpoints/conversations.py, src/app/endpoints/conversations_v2.py, src/app/endpoints/shields.py, src/app/endpoints/authorized.py, src/app/endpoints/root.py, src/app/endpoints/query_v2.py
pyproject.toml
📄 CodeRabbit inference engine (CLAUDE.md)
pyproject.toml: Configure pylint with `source-roots = "src"`
Exclude `src/auth/k8s.py` from pyright type checking
Files:
pyproject.toml
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Pydantic configuration classes must extend `ConfigurationBase`; data models must extend `BaseModel`
Files:
src/models/context.py, src/models/requests.py, src/models/responses.py, src/models/config.py
src/**/constants.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define shared constants in central `constants.py` file with descriptive comments
Files:
src/constants.py
src/**/__init__.py
📄 CodeRabbit inference engine (CLAUDE.md)
Package `__init__.py` files must contain brief package descriptions
Files:
src/authentication/__init__.py
src/models/config.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/config.py: All configuration must use Pydantic models extending `ConfigurationBase` with `extra="forbid"` to reject unknown fields
Use type hints `Optional[FilePath]`, `PositiveInt`, `SecretStr` for Pydantic configuration models
Files:
src/models/config.py
🧠 Learnings (19)
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code
Applied to files:
tests/e2e/features/info.feature, src/app/main.py, src/metrics/utils.py, src/app/endpoints/models.py, src/app/endpoints/tools.py, src/app/endpoints/health.py, src/app/endpoints/query.py, src/app/endpoints/info.py, src/app/endpoints/streaming_query.py, src/app/endpoints/conversations.py, src/app/endpoints/shields.py
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml, examples/lightspeed-stack-rh-identity.yaml, tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml, tests/e2e/configuration/lightspeed-stack-library-mode.yaml, src/authentication/README.md, tests/e2e/features/authorized_noop_token.feature, tests/e2e/configuration/lightspeed-stack-server-mode.yaml, docs/auth.md, tests/e2e/configuration/lightspeed-stack-no-cache.yaml
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml, tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml, tests/e2e/features/authorized_noop_token.feature, tests/e2e/configuration/lightspeed-stack-no-cache.yaml
📚 Learning: 2025-08-19T08:57:27.714Z
Learnt from: onmete
Repo: lightspeed-core/lightspeed-stack PR: 417
File: src/lightspeed_stack.py:60-63
Timestamp: 2025-08-19T08:57:27.714Z
Learning: In the lightspeed-stack project, file permission hardening (chmod 0o600) for stored configuration JSON files is not required as it's not considered a security concern in their deployment environment.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Applied to files:
src/app/main.py, src/authentication/utils.py, src/authorization/middleware.py, src/app/endpoints/models.py, src/utils/endpoints.py, src/app/endpoints/feedback.py, src/utils/quota.py, src/app/endpoints/info.py, src/app/endpoints/streaming_query.py, src/app/endpoints/conversations.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Run `uv run make verify` to run all linters (black, pylint, pyright, ruff, docstyle, check-types) before completion
Applied to files:
Makefile, CONTRIBUTING.md
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/e2e/test_list.txt : Maintain test list in `tests/e2e/test_list.txt` for end-to-end tests
Applied to files:
Makefile, .github/workflows/e2e_tests.yaml, tests/e2e/configuration/lightspeed-stack-library-mode.yaml, tests/e2e/features/health.feature
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/e2e/features/**/*.feature : Use behave (BDD) framework with Gherkin feature files for end-to-end tests
Applied to files:
Makefile, .github/workflows/e2e_tests.yaml, tests/e2e/features/environment.py
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.
Applied to files:
docker-compose-library.yaml, README.md, docs/demos/lcore/lcore.md
📚 Learning: 2025-07-23T14:26:40.340Z
Learnt from: onmete
Repo: lightspeed-core/lightspeed-stack PR: 278
File: src/app/endpoints/feedback.py:113-118
Timestamp: 2025-07-23T14:26:40.340Z
Learning: In the lightspeed-stack project after PR #278, the `UserDataCollection.feedback_storage` property dynamically constructs paths from `user_data_dir` and will always return a valid string, making fallback logic like `or Path("")` redundant.
Applied to files:
tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Always check `pyproject.toml` for existing dependencies and versions before adding new ones
Applied to files:
pyproject.toml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Use `uv sync --group dev --group llslibdev` to install development dependencies
Applied to files:
pyproject.toml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/{unit,integration}/**/*.py : Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Applied to files:
CONTRIBUTING.md
📚 Learning: 2025-08-18T11:45:59.961Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:77-77
Timestamp: 2025-08-18T11:45:59.961Z
Learning: torch==2.7.1 is available on PyPI and is a valid version that can be used in dependency specifications.
Applied to files:
requirements.torch.txt
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/**/*.py : Use `field_validator` and `model_validator` for custom validation in Pydantic models
Applied to files:
src/models/requests.py
📚 Learning: 2025-08-06T06:02:21.060Z
Learnt from: eranco74
Repo: lightspeed-core/lightspeed-stack PR: 348
File: src/utils/endpoints.py:91-94
Timestamp: 2025-08-06T06:02:21.060Z
Learning: The direct assignment to `agent._agent_id` in `src/utils/endpoints.py` is a necessary workaround for the missing agent rehydration feature in the LLS client SDK. This allows preserving conversation IDs when handling existing agents.
Applied to files:
src/app/endpoints/conversations.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/config.py : All configuration must use Pydantic models extending `ConfigurationBase` with `extra="forbid"` to reject unknown fields
Applied to files:
docs/config.puml, src/models/config.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/config.py : Use type hints `Optional[FilePath]`, `PositiveInt`, `SecretStr` for Pydantic configuration models
Applied to files:
docs/config.puml, src/models/config.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/**/*.py : Pydantic configuration classes must extend `ConfigurationBase`; data models must extend `BaseModel`
Applied to files:
docs/config.puml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
### API Key Token (`api-key-token`)

Authentication that checks a given API Key token is present as a Bearer token

**Configuration:**
```yaml
module: "api-key-token"
api_key_config:
  api_key: "some-api-key"
```

**Behavior:**
- Extracts bearer token from the `Authorization` header
- Same user ID and username handling as `noop`
- Token is passed through and validated against the API Key given from configuration, for downstream use
Fix API key token YAML example to include the authentication root
The new API Key Token section’s config example omits the authentication: root key, unlike the other module examples. As written, the snippet isn’t a complete, copy-pasteable config fragment.
Consider updating it to:
authentication:
  module: "api-key-token"
  api_key_config:
    api_key: "some-api-key"

This keeps it consistent with the other documented modules and with the actual configuration shape.
🤖 Prompt for AI Agents
In docs/auth.md around lines 145 to 160, the YAML example for the API Key Token
module is missing the top-level authentication: key so it isn’t a complete,
copy-pasteable config fragment; update the snippet to include the
authentication: root and nest module and api_key_config under it so the example
matches the configuration shape used elsewhere in the docs.
* It's own REST API (stable, standard)
* Llama Stack as backed
Fix grammatical error.
Line 334 uses "It's own" which should be "Its own" (possessive form).
-* It's own REST API (stable, standard)
+* Its own REST API (stable, standard)
🤖 Prompt for AI Agents
In docs/demos/lcore/lcore.md around lines 333 to 334, fix the grammatical error
by changing "It's own REST API (stable, standard)" to the correct possessive
form "Its own REST API (stable, standard)"; update that line only so it uses
"Its" (no apostrophe).

---

### Lightspeed Evaluation Framework

<font size="10">[https://github.com/lightspeed-core/lightspeed-evaluation/](https://github.com/lightspeed-core/lightspeed-evaluation)</font>
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if the GitHub repository exists and is accessible
url="https://github.com/lightspeed-core/lightspeed-evaluation/"
echo "=== GitHub Repository Verification ==="
echo "URL: $url"
echo
# Try to access the repository
echo "Attempting to access repository..."
status=$(curl -s -o /dev/null -w "%{http_code}" --head "$url" 2>&1)
echo "HTTP Status: $status"
# Also try with -L to follow redirects
status_redirect=$(curl -s -o /dev/null -w "%{http_code}" -L --head "$url" 2>&1)
echo "HTTP Status (with redirects): $status_redirect"
# Validate URL format
if [[ "$url" =~ ^https://github\.com/[^/]+/[^/]+/?$ ]]; then
echo "URL Format: ✓ Valid GitHub URL format"
else
echo "URL Format: ✗ Invalid GitHub URL format"
fi
Repository: lightspeed-core/lightspeed-stack
Length of output: 311
🌐 Web query:
GitHub repository lightspeed-core/lightspeed-evaluation
💡 Result:
I couldn't find a GitHub repository named lightspeed-core/lightspeed-evaluation. Closely related projects I found:
- lightspeed-core/lightspeed-stack (LLM tooling stack). [1]
- openshift/lightspeed-service (OpenShift Lightspeed assistant). [2]
- An OpenHub entry for a lightspeed-evaluation project (summary/extraction, not the GitHub repo). [3]
If you want, tell me whether you mean a different org/name or I can search for a fork/alternate name and look inside those repos.
Sources:
[1] https://github.com/lightspeed-core/lightspeed-stack
[2] https://github.com/openshift/lightspeed-service
[3] https://openhub.net/p/lightspeed-evaluation
The GitHub link references a non-existent repository.
The repository at https://github.com/lightspeed-core/lightspeed-evaluation/ does not exist. Did you mean https://github.com/lightspeed-core/lightspeed-stack? Update the link to point to the correct, existing repository or remove it if it's no longer relevant.
🤖 Prompt for AI Agents
In docs/demos/lcore/lcore.md around line 401, the GitHub URL points to a
non-existent repo (https://github.com/lightspeed-core/lightspeed-evaluation/);
update the link to the correct repository (e.g.,
https://github.com/lightspeed-core/lightspeed-stack) or remove the link if it’s
no longer relevant — replace the href and visible text accordingly and verify
the new URL resolves.
* [Sequence diagrams](#sequence-diagrams)
* [Query endpoint REST API handler](#query-endpoint-rest-api-handler)
* [Streaming query endpoint REST API handler](#streaming-query-endpoint-rest-api-handler)
* [Versioning](#versioning)
Table of contents heading level inconsistency.
"Streaming query endpoint REST API handler" (line 90) appears at the same level as "Query endpoint REST API handler" (line 89), but in the actual content (lines 967-969), it's rendered as a level-2 heading (##) rather than level-3 (###), breaking the hierarchy.
In the document body, change the streaming query section heading to be consistent:
-## Streaming query endpoint REST API handler
+### Streaming query endpoint REST API handler
🤖 Prompt for AI Agents
In README.md around lines 88 to 91, the table of contents lists "Streaming query
endpoint REST API handler" at the same toc level as the nested "Query endpoint
REST API handler", but the actual section (lines ~967-969) is written as a
level-2 heading (##) instead of a level-3 (###), breaking hierarchy; update the
streaming query section heading in the document body from a level-2 heading to a
level-3 heading so it matches the TOC nesting and restores consistent heading
levels.
solr: Optional[dict[str, Any]] = Field(
    None,
    description="Solr-specific query parameters including filter queries",
    examples=[
        {"fq": {"product:*openshift*", "product_version:*4.16*"}},
    ],
)
Fix the example: JSON does not support Python sets.
The example uses a Python set literal {"product:*openshift*", "product_version:*4.16*"} which is not valid JSON. JSON only supports arrays for collections. This will cause confusion in the OpenAPI documentation.
solr: Optional[dict[str, Any]] = Field(
None,
description="Solr-specific query parameters including filter queries",
examples=[
- {"fq": {"product:*openshift*", "product_version:*4.16*"}},
+ {"fq": ["product:*openshift*", "product_version:*4.16*"]},
],
)
🤖 Prompt for AI Agents
In src/models/requests.py around lines 162-168, the solr example uses a Python
set literal ({"product:*openshift*", "product_version:*4.16*"}) which is invalid
JSON; replace the set with a JSON array or other valid JSON structure (for
example use "fq": ["product:*openshift*", "product_version:*4.16*"]) so the
OpenAPI examples are valid JSON.
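For reference, a minimal sketch of the corrected example, using a hypothetical stand-in model (not the project's actual request class) that mirrors the `solr` field shape from the snippet above. It shows the `fq` filters expressed as a JSON array of strings so the documented payload is valid JSON and round-trips cleanly:

```python
# Hypothetical stand-in model mirroring the quoted `solr` field shape.
from typing import Any, Optional

from pydantic import BaseModel, Field


class SolrQueryOptions(BaseModel):
    solr: Optional[dict[str, Any]] = Field(
        None,
        description="Solr-specific query parameters including filter queries",
        # The example is a plain JSON object with an array of filter-query strings.
        examples=[{"fq": ["product:*openshift*", "product_version:*4.16*"]}],
    )


# A list of strings serializes to a JSON array, unlike a Python set literal.
opts = SolrQueryOptions(solr={"fq": ["product:*openshift*", "product_version:*4.16*"]})
print(opts.model_dump_json())
```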
model_config = {
    "json_schema_extra": {
        "examples": [
            {
                "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
                "success": True,
                "response": "Conversation deleted successfully",
            }
                "label": "deleted",
                "value": {
                    "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
                    "success": True,
                    "response": "Conversation deleted successfully",
                },
            },
            {
                "label": "not found",
                "value": {
                    "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
                    "success": True,
                    "response": "Conversation can not be deleted",
                },
            },
        ]
    }
}
Example inconsistency and typo in "not found" case.
The "not found" example (lines 776-782) shows success: True which would be incorrect after fixing the success field. Also, the message uses "can not be deleted" (line 780) while the code uses "cannot be deleted" (line 756).
After fixing the success field, update the example:
{
"label": "not found",
"value": {
"conversation_id": "123e4567-e89b-12d3-a456-426614174000",
- "success": True,
- "response": "Conversation can not be deleted",
+ "success": False,
+ "response": "Conversation cannot be deleted",
},
},
🤖 Prompt for AI Agents
In src/models/responses.py around lines 764-785, the "not found" example is
inconsistent: it sets "success": True and uses "can not be deleted" while the
code expects failure and the phrasing "cannot be deleted". Update that example
to set "success": False and change the response string to "Conversation cannot
be deleted" (matching the code), keeping the rest of the example structure
intact.
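As a follow-up illustration, a simplified sketch (a standalone Pydantic model, not the project's actual response hierarchy) of deriving both `success` and the message from `deleted`, which makes the "not found" example naturally carry `success: False`:

```python
# Simplified sketch: success and response are both derived from the delete outcome.
from pydantic import BaseModel


class ConversationDeleteSketch(BaseModel):
    conversation_id: str
    success: bool
    response: str

    @classmethod
    def from_result(cls, *, deleted: bool, conversation_id: str) -> "ConversationDeleteSketch":
        return cls(
            conversation_id=conversation_id,
            success=deleted,  # mirrors the delete outcome instead of hardcoding True
            response=(
                "Conversation deleted successfully"
                if deleted
                else "Conversation cannot be deleted"
            ),
        )


print(ConversationDeleteSketch.from_result(deleted=False, conversation_id="conv-123"))
```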
from log import get_logger
from models.responses import InternalServerErrorResponse, QuotaExceededResponse
from quota.quota_exceed_error import QuotaExceedError
from quota.quota_limiter import QuotaLimiter

logger = get_logger(__name__)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for get_logger definition
echo "=== Definition of get_logger ==="
rg -n "def get_logger" --type py -A 5
echo ""
echo "=== Import location of get_logger ==="
rg -n "^from log import get_logger" --type py
echo ""
echo "=== Location of log module ==="
fd -t f "log\.py$" | head -20
echo ""
echo "=== get_logger implementation in log module ==="
if [ -f "src/log.py" ]; then
cat -n "src/log.py"
fi
Repository: lightspeed-core/lightspeed-stack
Length of output: 311
🌐 Web query:
get_logger python logging pattern
💡 Result:
Recommended pattern — concise:
- In application modules: create a module-level logger with logging.getLogger(name) and configure handlers/formatters in the application entry point only.
- In libraries: create module-level loggers the same way but do NOT configure handlers; install a NullHandler so library logging won’t emit by default.
Minimal examples:
- Application (configure once, e.g. main):
import logging
def main():
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s:%(name)s: %(message)s")
logger = logging.getLogger(name)
logger.info("started")
if name == "main":
main()
- Library module (no configuration; add NullHandler):
import logging
logger = logging.getLogger(name)
logger.addHandler(logging.NullHandler())
def do_work():
logger.debug("doing work")
Why:
- logging.getLogger(name) gives hierarchical, module-level loggers that inherit configuration from the application/root logger. Multiple calls with the same name return the same Logger. Configure logging centrally (root logger or specific named loggers). [1]
- The Logging Cookbook documents patterns for multi-module apps and shows using NullHandler in libraries to avoid forcing handlers on application users. [2]
Sources:
[1] Python logging module docs — usage and getLogger recommendations.
[2] Python Logging Cookbook — using logging across modules and NullHandler.
Use standard logging pattern for consistency with coding guidelines.
Line 11 uses logger = get_logger(__name__) instead of the required logger = logging.getLogger(__name__) pattern. Replace the import with import logging and use logging.getLogger(__name__) to align with coding guidelines and standard Python logging practices.
🤖 Prompt for AI Agents
In src/utils/quota.py around lines 6 to 11, the module uses a custom get_logger
import and assigns logger = get_logger(__name__); replace the custom import with
a standard import logging and change the logger assignment to use
logging.getLogger(__name__) so the file follows the project’s standard logging
pattern and Python logging best practices.
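For illustration, a minimal sketch of the recommended replacement, assuming only the logger setup changes; the `consume_quota` helper below is hypothetical and included just to show the standard log levels in use:

```python
# Module-level logger obtained from the standard library; handlers and levels
# are configured centrally by the application, not here.
import logging

logger = logging.getLogger(__name__)


def consume_quota(user_id: str, to_be_consumed: int) -> None:
    """Hypothetical helper demonstrating module-logger usage."""
    logger.debug("Consuming %d quota units for user %s", to_be_consumed, user_id)
```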
Then The status code of the response is 200
When I use REST API conversation endpoint with conversation_id from above using HTTP GET method
Then The status code of the response is 200
And The conversation history contains 1 messages
Fix grammatical error in assertion message.
The assertion uses "contains 1 messages" which is grammatically incorrect. It should be "contains 1 message" (singular).
Apply this diff to correct the grammar:
- And The conversation history contains 1 messages
+ And The conversation history contains 1 message
🤖 Prompt for AI Agents
In tests/e2e/features/conversation_cache_v2.feature around line 207, the
assertion text reads "And The conversation history contains 1 messages" which is
grammatically incorrect; update the step text to "And The conversation history
contains 1 message" (singular) so the assertion reads correctly.
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (16)
tests/integration/endpoints/test_query_v2_integration.py (2)
357-375: Resolve merge conflict in mock_tool_output.results structure.
Unresolved merge conflict markers prevent the file from being parsed. The conflict shows divergent mock structures for tool search results:
- HEAD: Inline list of mocks with file_id, filename, and score attributes only
- upstream/main: Separate mock_result object with additional attributes dict containing doc_url and link fields
Choose the structure that matches the actual Llama Stack API response format. The upstream/main version appears more complete with URL metadata.
-<<<<<<< HEAD - mock_tool_output.results = [ - mocker.MagicMock( - file_id="doc-1", - filename="ansible-docs.txt", - score=0.95, - ) - ] -======= - mock_result = mocker.MagicMock() - mock_result.file_id = "doc-1" - mock_result.filename = "ansible-docs.txt" - mock_result.score = 0.95 - mock_result.attributes = { - "doc_url": "https://example.com/ansible-docs.txt", - "link": "https://example.com/ansible-docs.txt", - } - mock_tool_output.results = [mock_result] ->>>>>>> upstream/main + mock_result = mocker.MagicMock() + mock_result.file_id = "doc-1" + mock_result.filename = "ansible-docs.txt" + mock_result.score = 0.95 + mock_result.attributes = { + "doc_url": "https://example.com/ansible-docs.txt", + "link": "https://example.com/ansible-docs.txt", + } + mock_tool_output.results = [mock_result]
1244-1275: Resolve merge conflict in transcript behavior test.
Unresolved merge conflict markers prevent the file from being parsed. The conflict affects the function signature and transcript mocking:
- HEAD: No mocker parameter, no mocking of store_transcript (may create actual files during tests)
- upstream/main: Includes mocker parameter and mocks store_transcript to prevent file creation
The upstream/main approach is preferred for integration tests, as it prevents unwanted side effects (actual file I/O) while still verifying the integration logic.
async def test_query_v2_endpoint_transcript_behavior( test_config: AppConfig, mock_llama_stack_client: AsyncMockType, test_request: Request, test_auth: AuthTuple, patch_db_session: Session, -<<<<<<< HEAD -======= + mocker: MockerFixture, ->>>>>>> upstream/main ) -> None: """Test transcript storage behavior based on configuration. This integration test verifies: - Endpoint succeeds with transcripts enabled - Endpoint succeeds with transcripts disabled - Conversation is persisted regardless of transcript setting - Integration between query handler and transcript configuration Args: test_config: Test configuration mock_llama_stack_client: Mocked Llama Stack client test_request: FastAPI request test_auth: noop authentication tuple patch_db_session: Test database session -<<<<<<< HEAD - """ - _ = mock_llama_stack_client - -======= + mocker: pytest-mock fixture + """ + _ = mock_llama_stack_client + + # Mock store_transcript to prevent file creation + mocker.patch("app.endpoints.query.store_transcript") + ->>>>>>> upstream/main test_config.user_data_collection_configuration.transcripts_enabled = Truetests/e2e/features/environment.py (1)
92-112: Harden NoCacheConfig config swapping and restore logicThe new immediate switch/restart for
NoCacheConfiginbefore_scenariois fine, but the restore path inafter_scenariohas a couple of pitfalls:
after_scenarioalways restores usingcontext.feature_configforNoCacheConfig, butcontext.feature_configis only set inbefore_featureforAuthorizedfeatures. If aNoCacheConfigscenario ever appears in a non‑Authorized feature, this will raise anAttributeErrorand leavelightspeed-stack.yamlin the no‑cache state.- Unlike the feature‑level
Authorizedpath, there is no dedicated backup oflightspeed-stack.yamlwhen you overwrite it forNoCacheConfig, so the restore logic depends on knowing the original config path instead of having a real snapshot of the active config.You can make this path robust and self‑contained by introducing a per‑scenario backup for
NoCacheConfigand cleaning it up afterwards, e.g.:@@ def before_scenario(context: Context, scenario: Scenario) -> None: - if "InvalidFeedbackStorageConfig" in scenario.effective_tags: - context.scenario_config = f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-invalid-feedback-storage.yaml" - if "NoCacheConfig" in scenario.effective_tags: - context.scenario_config = ( - f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-no-cache.yaml" - ) - # Switch config and restart immediately - switch_config( - context.scenario_config - ) # Copies to default lightspeed-stack.yaml - restart_container("lightspeed-stack") + if "InvalidFeedbackStorageConfig" in scenario.effective_tags: + context.scenario_config = ( + f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-invalid-feedback-storage.yaml" + ) + if "NoCacheConfig" in scenario.effective_tags: + context.scenario_config = ( + f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-no-cache.yaml" + ) + # Backup current config, switch to no-cache, and restart immediately + context.scenario_config_backup = create_config_backup("lightspeed-stack.yaml") + switch_config(context.scenario_config) # Copies to default lightspeed-stack.yaml + restart_container("lightspeed-stack") @@ def after_scenario(context: Context, scenario: Scenario) -> None: - if "InvalidFeedbackStorageConfig" in scenario.effective_tags: - switch_config(context.feature_config) - restart_container("lightspeed-stack") - if "NoCacheConfig" in scenario.effective_tags: - switch_config(context.feature_config) - restart_container("lightspeed-stack") + if "InvalidFeedbackStorageConfig" in scenario.effective_tags: + switch_config(context.feature_config) + restart_container("lightspeed-stack") + if "NoCacheConfig" in scenario.effective_tags: + # Restore the backed-up config for this scenario if available + if hasattr(context, "scenario_config_backup"): + switch_config(context.scenario_config_backup) + restart_container("lightspeed-stack") + remove_config_backup(context.scenario_config_backup) + delattr(context, "scenario_config_backup")This removes the dependency on
context.feature_configforNoCacheConfigrestores, avoidsAttributeErrorwhen the tag is used outsideAuthorizedfeatures, and ensures per‑scenario state is properly cleaned up so behavior doesn’t leak across scenarios. Based on learnings, Behave’sContextpersists across scenarios, so this cleanup is important for test isolation.tests/unit/app/endpoints/test_rags.py (1)
192-206: Resolve merge conflict in RagInfo mock class.
The RagInfo.__init__ method contains unresolved merge conflict markers, causing multiple pipeline failures:
- IndentationError (syntax error)
- Ruff, Black, and Pylint parsing failures
Remove the conflict markers and decide whether to include the verbose docstring from upstream/main or keep the minimal version from HEAD. Ensure the resulting code is syntactically valid.
requirements.aarch64.txt (4)
1-4197: Unresolved merge conflicts in requirements.aarch64.txt—regenerate via uv from pyproject.toml.This file contains unresolved conflict markers (<<<<<<< HEAD, =======, >>>>>>>). Any marker will break pip/uv and CI. Do not hand-edit. Fix all conflicts in pyproject.toml, then regenerate all requirements-*.txt files using uv to ensure consistency across architectures.
The file includes conflicting pins for faiss-cpu, fastapi, greenlet, kubernetes, litellm, llama-stack, llama-stack-client, mcp, openai, opentelemetry packages (api, sdk, instrumentation, exporter variants, proto), protobuf, and urllib3. Additionally, python-jose (and ecdsa) appear only in the HEAD section and are absent from upstream/main—since authlib is present in both branches, python-jose is likely redundant and should be removed from pyproject.toml before regeneration.
2115-2213: Resolve the OpenTelemetry version conflict by selecting one coherent version set.The requirements file contains an unresolved merge conflict with split OpenTelemetry versions: HEAD uses 1.38.0 family + instrumentation 0.59b0, while upstream/main uses 1.39.0 family + instrumentation 0.60b0. Select one coherent set (e.g., 1.39.0 family + 0.60b0 instrumentation) and regenerate the requirements file to ensure all opentelemetry* packages maintain version coherence throughout requirements.aarch64.txt and related requirement files.
654-696: Resolve the merge conflict in requirements.aarch64.txt and regenerate with uv.The file contains unresolved conflict markers for faiss-cpu and fastapi. Accept the upstream/main versions (faiss-cpu==1.13.1 and fastapi==0.124.0), which match pyproject.toml. Then regenerate:
uv pip compile pyproject.toml -o requirements.aarch64.txt --generate-hashes --group llslibdev --python-platform aarch64-unknown-linux-gnu --torch-backend cpu --python-version 3.12faiss-cpu is pinned at top-level to maintain consistent control across dependency graphs; keep it uniform across architecture files.
1000-1118: Resolve merge conflict: greenlet requires a single pinned version.Remove the unresolved merge markers and choose one greenlet version (3.2.4 or 3.3.0). Both are compatible with SQLAlchemy on Python 3.12; 3.3.0 is newer with improved wheel packaging. After resolving the conflict, regenerate the requirements.aarch64.txt with uv to ensure consistency across all requirements files.
requirements.x86_64.txt (4)
654-696: Pick one: faiss-cpu 1.13.0 vs 1.13.1; keep policy pin.Choose a single version (likely 1.13.1) and ensure it’s pinned in pyproject before compiling, then propagate into all platform requirement files. Based on learnings, keep faiss-cpu pinned at the top-level.
Based on learnings, ...
2115-2212: Resolve the OpenTelemetry package version conflict in requirements.x86_64.txt.Git merge conflict markers show HEAD pinned to 1.38.x/0.59b0 while upstream/main uses 1.39.x/0.60b0. All opentelemetry packages (api, exporter-otlp, exporter-otlp-proto-*, instrumentation, proto, sdk, semantic-conventions) must be aligned to the same minor version. Select one version set, remove the conflict markers, update all affected packages consistently, and recompile to avoid compatibility issues.
1508-1531: litellm / llama-stack[‑client] bumps: verify API compatibility.Upgrading to llama-stack 0.3.0 introduces breaking changes: tools parameter schema changed from "parameters" to { "input_schema", "output_schema" }, API routes reorganized (v1/, v1alpha/, v1beta/), and some endpoints removed/unpublished. If these versions are merged, code must be updated for tool definitions and API endpoint usage; otherwise, keep existing pins (1.80.7, 0.2.22) and regenerate.
673-696: FastAPI and Starlette versions are incompatible.FastAPI 0.124.0 requires Starlette >=0.40.0 and <0.49.0. Starlette 0.50.0 exceeds the upper bound and will not work with FastAPI 0.124.0. Either downgrade Starlette to <0.49.0 or select a different FastAPI version. Resolve the merge conflict and pin compatible versions in pyproject.toml.
Also applies to: 3607-3613
tests/unit/models/responses/test_error_responses.py (1)
9-36: Resolve merge conflicts and adopt PromptTooLongResponse tests
This test module currently contains unresolved merge conflict markers (<<<<<<< HEAD, =======, >>>>>>> upstream/main), which breaks parsing and unit tests.
- Keep the upstream/main side that introduces
PROMPT_TOO_LONG_DESCRIPTIONandPromptTooLongResponsein the imports.- Drop the conflict markers and ensure the new
TestPromptTooLongResponseclass is present once.For example:
-from models.responses import ( - BAD_REQUEST_DESCRIPTION, - FORBIDDEN_DESCRIPTION, - INTERNAL_SERVER_ERROR_DESCRIPTION, - NOT_FOUND_DESCRIPTION, -<<<<<<< HEAD -======= - PROMPT_TOO_LONG_DESCRIPTION, ->>>>>>> upstream/main - QUOTA_EXCEEDED_DESCRIPTION, +from models.responses import ( + BAD_REQUEST_DESCRIPTION, + FORBIDDEN_DESCRIPTION, + INTERNAL_SERVER_ERROR_DESCRIPTION, + NOT_FOUND_DESCRIPTION, + PROMPT_TOO_LONG_DESCRIPTION, + QUOTA_EXCEEDED_DESCRIPTION, @@ - NotFoundResponse, -<<<<<<< HEAD -======= - PromptTooLongResponse, ->>>>>>> upstream/main + NotFoundResponse, + PromptTooLongResponse, @@ -<<<<<<< HEAD -======= -class TestPromptTooLongResponse: - """Test cases for PromptTooLongResponse.""" - - def test_constructor_with_default_response(self) -> None: - """Test PromptTooLongResponse with default response.""" - response = PromptTooLongResponse( - cause="The prompt exceeds the maximum allowed length." - ) - assert isinstance(response, AbstractErrorResponse) - assert response.status_code == status.HTTP_413_REQUEST_ENTITY_TOO_LARGE - assert isinstance(response.detail, DetailModel) - assert response.detail.response == "Prompt is too long" - assert response.detail.cause == "The prompt exceeds the maximum allowed length." - - def test_openapi_response(self) -> None: - """Test PromptTooLongResponse.openapi_response() method.""" - schema = PromptTooLongResponse.model_json_schema() - model_examples = schema.get("examples", []) - expected_count = len(model_examples) - - result = PromptTooLongResponse.openapi_response() - assert result["description"] == PROMPT_TOO_LONG_DESCRIPTION - assert result["model"] == PromptTooLongResponse - assert "examples" in result["content"]["application/json"] - examples = result["content"]["application/json"]["examples"] - - # Verify example count matches schema examples count - assert len(examples) == expected_count - assert expected_count == 1 - - # Verify example structure - assert "prompt too long" in examples - prompt_example = examples["prompt too long"] - assert "value" in prompt_example - assert "detail" in prompt_example["value"] - assert prompt_example["value"]["detail"]["response"] == "Prompt is too long" - assert ( - prompt_example["value"]["detail"]["cause"] - == "The prompt exceeds the maximum allowed length." - ) - - def test_openapi_response_with_explicit_examples(self) -> None: - """Test PromptTooLongResponse.openapi_response() with explicit examples.""" - result = PromptTooLongResponse.openapi_response(examples=["prompt too long"]) - examples = result["content"]["application/json"]["examples"] - - # Verify only 1 example is returned when explicitly specified - assert len(examples) == 1 - assert "prompt too long" in examples - - ->>>>>>> upstream/main +class TestPromptTooLongResponse: + """Test cases for PromptTooLongResponse.""" + + def test_constructor_with_default_response(self) -> None: + """Test PromptTooLongResponse with default response.""" + response = PromptTooLongResponse( + cause="The prompt exceeds the maximum allowed length." + ) + assert isinstance(response, AbstractErrorResponse) + assert response.status_code == status.HTTP_413_REQUEST_ENTITY_TOO_LARGE + assert isinstance(response.detail, DetailModel) + assert response.detail.response == "Prompt is too long" + assert response.detail.cause == "The prompt exceeds the maximum allowed length." 
+ + def test_openapi_response(self) -> None: + """Test PromptTooLongResponse.openapi_response() method.""" + schema = PromptTooLongResponse.model_json_schema() + model_examples = schema.get("examples", []) + expected_count = len(model_examples) + + result = PromptTooLongResponse.openapi_response() + assert result["description"] == PROMPT_TOO_LONG_DESCRIPTION + assert result["model"] == PromptTooLongResponse + assert "examples" in result["content"]["application/json"] + examples = result["content"]["application/json"]["examples"] + + # Verify example count matches schema examples count + assert len(examples) == expected_count + assert expected_count == 1 + + # Verify example structure + assert "prompt too long" in examples + prompt_example = examples["prompt too long"] + assert "value" in prompt_example + assert "detail" in prompt_example["value"] + assert prompt_example["value"]["detail"]["response"] == "Prompt is too long" + assert ( + prompt_example["value"]["detail"]["cause"] + == "The prompt exceeds the maximum allowed length." + ) + + def test_openapi_response_with_explicit_examples(self) -> None: + """Test PromptTooLongResponse.openapi_response() with explicit examples.""" + result = PromptTooLongResponse.openapi_response(examples=["prompt too long"]) + examples = result["content"]["application/json"]["examples"] + + # Verify only 1 example is returned when explicitly specified + assert len(examples) == 1 + assert "prompt too long" in examplesAfter this, the SyntaxError and Ruff “merge conflict markers detected” should go away.
Also applies to: 666-719
docs/config.html (1)
1-21: Resolve merge conflicts in generated config HTML (and regenerate from Markdown)
docs/config.htmlstill contains merge conflict markers across multiple sections (<<<<<<< HEAD,=======,>>>>>>> upstream/main). This makes the file invalid HTML and breaks tooling (Ruff/Black are already complaining).Given this file is clearly generated from
docs/config.md, the safest approach is:
- First, resolve the conflicts in
docs/config.md(see separate comment) so that it contains a single, correct version of the configuration schema, including the newAPIKeyTokenConfigurationsection andapi_key_configfield inAuthenticationConfiguration.- Then, regenerate
docs/config.htmlfrom the Markdown (e.g., via your existing pandoc or docs build pipeline), rather than hand‑editing the HTML.- Ensure the regenerated HTML no longer includes any conflict markers and that the tables (e.g., AuthenticationConfiguration) mention
api_key_configalongsidejwk_configandrh_identity_config.Until these conflicts are resolved and the HTML is regenerated, this doc will remain broken.
Also applies to: 191-268, 443-721, 1009-1266, 1570-1815, 2275-2452
docs/config.md (1)
10-23: Resolve Markdown merge conflicts and keep the API key schema updates
docs/config.md also has unresolved merge conflict markers, which breaks the Markdown and the HTML generation:
- At the top around APIKeyTokenConfiguration (Lines ~10-23).
- In AuthenticationConfiguration where api_key_config is conditionally present (Lines ~59-63).
- In narrative text for ModelContextProtocolServer, PostgreSQLDatabaseConfiguration, and ServiceConfiguration.
- Remove all
<<<<<<< HEAD/=======/>>>>>>> upstream/mainmarkers.- Keep the upstream/main additions that introduce:
- The
## APIKeyTokenConfigurationsection with itsapi_keyfield table.- The
api_key_configrow in theAuthenticationConfigurationtable.- The reflowed one‑line paragraphs (they are equivalent content, just wrapped differently).
Conceptually, the cleaned sections should look like:
## APIKeyTokenConfiguration API Key Token configuration. | Field | Type | Description | |--------|--------|-------------| | api_key | string | | ... | Field | Type | Description | |----------------|--------|-------------| | module | string | | | skip_tls_verification | boolean | | | k8s_cluster_api | string | | | k8s_ca_cert_path | string | | | jwk_config | | | | api_key_config | | | | rh_identity_config | | |Once these conflicts are resolved, re‑run your docs build to regenerate
docs/config.htmland clear the markdownlint errors.Also applies to: 59-63, 318-331, 351-359, 473-482
tests/unit/app/endpoints/test_streaming_query_v2.py (1)
1-637: Critical: Resolve all merge conflicts before proceeding.This file contains unresolved merge conflicts throughout (lines 1-5, 12-19, 50-56, 89-95, 148-152, 184-198, 220-224, 240-244, 263-289, 309-315, 348-354, 405-409, 501-505, 557-637), causing all pipeline checks to fail:
- Unit tests:
SyntaxError: invalid syntax- Ruff/Black/Pylint: Cannot parse due to conflict markers
Key decisions to make when resolving:
Imports (lines 12-19): Choose between
HTTPException(HEAD) orRateLimitError(upstream) — likely need both if keeping the quota test.
stream_end_eventsignature (lines 148-152, 405-409, 501-505): Upstream useslambda _m, _t, _aq, _rd, _mediawhich aligns with new RAG chunk fields (rag_chunks,referenced_documents). HEAD useslambda _m, _s, _t, _media. Choose based on the production signature instreaming_query_v2.py.Conversation mocking (lines 50-56, 89-95, 309-315, 348-354): Upstream adds
mock_client.conversations.createmocks — include these ifretrieve_responsenow creates conversations.Error handling pattern (lines 263-289): HEAD raises
HTTPException; upstream returnsStreamingResponsewith error status code. Align with the production implementation.Conversation ID expectations (lines 220-224, 240-244): HEAD expects
conv-xyz; upstream expectsabc123def456. Align with howretrieve_responsereturns the conversation ID.Quota test (lines 557-637): Upstream adds a new test for
RateLimitError— keep if this error handling exists in production.
♻️ Duplicate comments (5)
src/utils/checks.py (1)
90-91: Fix misleading error message.The error message references
pathbut the actual check is forparent_pathwritability, making debugging more difficult.Apply this diff to clarify the error message:
if not os.access(parent_path, os.W_OK): - raise InvalidConfigurationError(f"{desc} '{path}' is not writable") + raise InvalidConfigurationError( + f"{desc} '{path}' cannot be created - parent directory '{parent_path}' is not writable" + )Based on past review comments, this issue was previously flagged but remains unresolved.
src/models/responses.py (1)
749-788:ConversationDeleteResponse.successshould reflectdeletedand examples must be updated
__init__still hardcodessuccess=True(Line 763), so the response claims success even whendeleted=False. The “not found” example also showssuccess: Truewith a failure message and uses the inconsistent phrase “can not be deleted”.This is the same issue previously flagged; please wire
successtodeletedand fix the example payload.Apply something like:
def __init__(self, *, deleted: bool, conversation_id: str) -> None: @@ response_msg = ( "Conversation deleted successfully" if deleted else "Conversation cannot be deleted" ) super().__init__( conversation_id=conversation_id, # type: ignore[call-arg] - success=True, # type: ignore[call-arg] + success=deleted, # type: ignore[call-arg] response=response_msg, # type: ignore[call-arg] ) @@ { "label": "not found", "value": { "conversation_id": "123e4567-e89b-12d3-a456-426614174000", - "success": True, - "response": "Conversation can not be deleted", + "success": False, + "response": "Conversation cannot be deleted", }, },src/app/endpoints/query.py (2)
978-1005: Fix “attatchment” typos in validation errorsBoth error messages use “attatchment” instead of “attachment”, and a previous review called this out. These strings surface in HTTP 422 responses and should be correct and consistent.
- message = ( - f"Invalid attatchment type {attachment.attachment_type}: " + message = ( + f"Invalid attachment type {attachment.attachment_type}: " @@ - message = ( - f"Invalid attatchment content type {attachment.content_type}: " + message = ( + f"Invalid attachment content type {attachment.content_type}: "You may also want to align these messages with the example texts in
UnprocessableEntityResponse(“Invalid attribute value”) if you care about strict consistency.
792-797: Avoid logging full vector DB payloads at INFOBoth vector DB queries log the entire
query_responseat INFO, which can contain sensitive content and is also noisy:logger.info(f"The query response total payload: {query_response}")Prefer logging lightweight metadata (e.g., chunk count) at DEBUG instead, as in the diff above. This aligns with the earlier review feedback on the same issue.
Also applies to: 890-897
tests/unit/models/responses/test_successful_responses.py (1)
673-737: AlignConversationDeleteResponsetests with correctedsuccesssemanticsOnce
ConversationDeleteResponseis fixed to setsuccess=deleted, the tests here must be updated accordingly:
test_constructor_not_deletedshould expectsuccess is False.- The “not found” OpenAPI example should assert
"success": Falseand the corrected message"Conversation cannot be deleted".Proposed changes:
def test_constructor_not_deleted(self) -> None: """Test ConversationDeleteResponse when conversation cannot be deleted.""" response = ConversationDeleteResponse(deleted=False, conversation_id="conv-123") - assert response.success is True - assert response.response == "Conversation cannot be deleted" + assert response.success is False + assert response.response == "Conversation cannot be deleted" @@ not_found_example = examples["not found"] @@ - assert not_found_example["value"]["success"] is True - assert ( - not_found_example["value"]["response"] == "Conversation can not be deleted" - ) + assert not_found_example["value"]["success"] is False + assert ( + not_found_example["value"]["response"] == "Conversation cannot be deleted" + )This keeps the tests consistent with the corrected model and with the updated examples in
ConversationDeleteResponse.model_config.
🧹 Nitpick comments (2)
src/models/responses.py (1)
25-30: Avoid duplicating RAGChunk model across modules
RAGChunk is now defined here and also in src/utils/types.py with the same fields. Maintaining two nearly identical models will drift over time and complicates typing (e.g., TurnSummary.rag_chunks vs QueryResponse.rag_chunks). Consider making a single canonical definition (e.g., in utils.types) and importing/re-exporting it here, or at least aliasing one to the other to keep them in sync.
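A possible shape for the consolidation, assuming utils/types.py stays the canonical home; the field names below are illustrative, not a statement of the actual model:

```python
# utils/types.py -- single canonical definition (illustrative fields).
from pydantic import BaseModel


class RAGChunk(BaseModel):
    """Canonical RAG chunk model shared by responses and internal summaries."""

    content: str
    source: str | None = None
    score: float | None = None


# models/responses.py would then re-export the canonical class instead of redefining it:
# from utils.types import RAGChunk  # noqa: F401  (public re-export)
```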
879-936: Second vector DB query duplicates work and repeats issuesThe post‑agent block re‑lists vector DBs and re‑queries
vector_io.querywith almost identical parameters, again logging the full payload and re‑importingRAGChunk/ReferencedDocumentlocally. It also buildsRAGChunkinstances withcontent=str(chunk.content)andsourcefrom loosely typed attributes.Even if you decide to keep the second query (e.g., for different params vs. Solr filtering), it should:
- Reuse the module‑level
RAGChunk/ReferencedDocumentimports.- Avoid logging the full payload at INFO.
- Normalize
contentandsourcetypes the same way as in the pre‑agent block (e.g.,interleaved_content_as_str+str | Noneguarded from metadata).- Consider reusing the initial
vector_db_idsinstead of callingclient.vector_dbs.list()again.You may also want to reconsider whether two separate
vector_io.querycalls per request are necessary; a single call whose results are reused for both RAG context injection and finalrag_chunks/referenced_documentswould simplify the flow and reduce load on the vector backend.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
docs/config.html (12 hunks)
docs/config.md (6 hunks)
requirements.aarch64.txt (21 hunks)
requirements.hermetic.txt (1 hunks)
requirements.x86_64.txt (21 hunks)
src/app/endpoints/query.py (10 hunks)
src/models/requests.py (2 hunks)
src/models/responses.py (19 hunks)
src/utils/checks.py (1 hunks)
tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1 hunks)
tests/e2e/features/conversation_cache_v2.feature (2 hunks)
tests/e2e/features/environment.py (1 hunks)
tests/integration/endpoints/test_query_v2_integration.py (10 hunks)
tests/unit/app/endpoints/test_rags.py (2 hunks)
tests/unit/app/endpoints/test_streaming_query_v2.py (15 hunks)
tests/unit/authentication/test_api_key_token.py (3 hunks)
tests/unit/models/responses/test_error_responses.py (4 hunks)
tests/unit/models/responses/test_successful_responses.py (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/models/requests.py
🧰 Additional context used
📓 Path-based instructions (7)
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use
pytest-mockwith AsyncMock objects for mocking in tests
Files:
tests/e2e/features/environment.pytests/integration/endpoints/test_query_v2_integration.pytests/unit/models/responses/test_successful_responses.pytests/unit/authentication/test_api_key_token.pytests/unit/models/responses/test_error_responses.pytests/unit/app/endpoints/test_streaming_query_v2.pytests/unit/app/endpoints/test_rags.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
tests/integration/endpoints/test_query_v2_integration.pytests/unit/models/responses/test_successful_responses.pytests/unit/authentication/test_api_key_token.pytests/unit/models/responses/test_error_responses.pytests/unit/app/endpoints/test_streaming_query_v2.pytests/unit/app/endpoints/test_rags.py
tests/e2e/features/**/*.feature
📄 CodeRabbit inference engine (CLAUDE.md)
Use behave (BDD) framework with Gherkin feature files for end-to-end tests
Files:
tests/e2e/features/conversation_cache_v2.feature
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g.,from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Uselogger = logging.getLogger(__name__)pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) andOptional[Type]orType | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Useasync deffor I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes:Configurationfor config classes,Error/Exceptionfor exceptions,Resolverfor strategy patterns,Interfacefor abstract base classes
Abstract classes must use ABC with@abstractmethoddecorators
Include complete type annotations for all class attributes in Python classes
Useimport loggingand module logger pattern with standard log levels: debug, info, warning, error
Files:
src/utils/checks.py
src/app/endpoints/query.py
src/models/responses.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI
HTTPExceptionwith appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle
APIConnectionErrorfrom Llama Stack in integration code
Files:
src/app/endpoints/query.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Use@field_validatorand@model_validatorfor custom validation in Pydantic models
Pydantic configuration classes must extendConfigurationBase; data models must extendBaseModel
Files:
src/models/responses.py
🧠 Learnings (9)
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.
Applied to files:
tests/e2e/features/environment.pytests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yamltests/unit/authentication/test_api_key_token.py
📚 Learning: 2025-08-25T09:11:38.701Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 445
File: tests/e2e/features/environment.py:0-0
Timestamp: 2025-08-25T09:11:38.701Z
Learning: In the Behave Python testing framework, the Context object is created once for the entire test run and reused across all features and scenarios. Custom attributes added to the context persist until explicitly cleared, so per-scenario state should be reset in hooks to maintain test isolation.
Applied to files:
tests/e2e/features/environment.py
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.
Applied to files:
tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2025-08-18T10:57:39.266Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:59-59
Timestamp: 2025-08-18T10:57:39.266Z
Learning: In the lightspeed-stack project, transitive dependencies like faiss-cpu are intentionally pinned as top-level dependencies to maintain better control over the dependency graph and avoid version conflicts when bundling ML/LLM tooling packages.
Applied to files:
requirements.aarch64.txtrequirements.x86_64.txt
📚 Learning: 2025-08-18T10:58:14.951Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:47-47
Timestamp: 2025-08-18T10:58:14.951Z
Learning: psycopg2-binary is required by some llama-stack providers in the lightspeed-stack project, so it cannot be replaced with psycopg v3 or moved to optional dependencies without breaking llama-stack functionality.
Applied to files:
requirements.aarch64.txtrequirements.x86_64.txt
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.
Applied to files:
requirements.aarch64.txtrequirements.x86_64.txt
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code
Applied to files:
src/app/endpoints/query.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Applied to files:
src/app/endpoints/query.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Use Python package manager `uv` with `uv run` prefix for all development commands
Applied to files:
requirements.hermetic.txt
🧬 Code graph analysis (5)
tests/e2e/features/environment.py (1)
tests/e2e/utils/utils.py (2)
switch_config(98-106)restart_container(130-144)
tests/unit/models/responses/test_successful_responses.py (1)
src/models/responses.py (3)
RAGChunk(25-30)ToolCall(33-38)ToolsResponse(90-125)
tests/unit/app/endpoints/test_streaming_query_v2.py (2)
src/app/endpoints/streaming_query_v2.py (1)
streaming_query_endpoint_handler_v2(318-345)src/models/requests.py (1)
QueryRequest(73-275)
src/app/endpoints/query.py (2)
src/models/responses.py (4)
ToolCall(33-38)model(1525-1529)ReferencedDocument(326-339)RAGChunk(25-30)src/utils/types.py (2)
TurnSummary(135-234)RAGChunk(127-132)
src/models/responses.py (2)
src/utils/types.py (1)
RAGChunk(127-132)src/quota/quota_exceed_error.py (1)
QuotaExceedError(6-38)
🪛 GitHub Actions: Black
tests/integration/endpoints/test_query_v2_integration.py
[error] 84-84: Cannot parse for target version Python 3.13: 84:0: <<<<<<< HEAD
tests/unit/models/responses/test_successful_responses.py
[error] 1-1: Cannot parse for target version Python 3.13: 1:0: <<<<<<< HEAD
[warning] Black would reformat this file. Run 'black --write' to fix code style issues.
tests/unit/authentication/test_api_key_token.py
[error] 5-5: Cannot parse for target version Python 3.13: 5:0: <<<<<<< HEAD
tests/unit/models/responses/test_error_responses.py
[error] 14-14: Cannot parse for target version Python 3.13: 14:0: <<<<<<< HEAD
[warning] Black would reformat this file. Run 'black --write' to fix code style issues.
tests/unit/app/endpoints/test_streaming_query_v2.py
[error] 1-1: Cannot parse for target version Python 3.13: 1:0: <<<<<<< HEAD
tests/unit/app/endpoints/test_rags.py
[error] 193-193: Cannot parse for target version Python 3.13: 193:0: <<<<<<< HEAD
src/app/endpoints/query.py
[error] Black would reformat this file. Run 'black --write' to fix code style issues.
src/models/responses.py
[error] 1-1: Black would reformat this file. Run 'black --write' to fix code style issues.
🪛 GitHub Actions: Integration tests
tests/integration/endpoints/test_query_v2_integration.py
[error] 84-84: SyntaxError: invalid syntax due to merge conflict markers (<<<<<<< HEAD)
🪛 GitHub Actions: Pyright
src/app/endpoints/query.py
[error] 753-753: pyright: Operator '+' not supported for 'None' (reportOptionalOperand).
[error] 778-778: pyright: 'vector_db_ids' is possibly unbound (reportPossiblyUnboundVariable).
[error] 816-816: pyright: Argument missing for parameter 'doc_id' (reportCallIssue).
[error] 840-840: pyright: Argument of type 'float | str | List[object] | object | Literal[True]' cannot be assigned to parameter 'url' of type 'AnyStr@urljoin | None' in function 'urljoin'.
[error] 848-848: pyright: 'RAGChunk' is possibly unbound (reportPossiblyUnboundVariable).
[error] 849-849: pyright: Argument of type 'InterleavedContent' cannot be assigned to parameter 'content' of type 'str' in function 'init'.
[error] 850-850: pyright: Argument of type 'str | bool | float | List[object] | object | None' cannot be assigned to parameter 'source' of type 'str | None' in function 'init'.
[error] 875-875: pyright: Argument of type 'list[str | ToolgroupAgentToolGroupWithArgs] | Unknown | list[str] | None' cannot be assigned to parameter 'toolgroups' of type 'List[Toolgroup] | None'.
[error] 937-937: pyright: Argument missing for parameter 'tool_results' (reportCallIssue).
🪛 GitHub Actions: Python linter
tests/integration/endpoints/test_query_v2_integration.py
[error] 84-84: Pylint parsing failed: 'invalid syntax (tests.integration.endpoints.test_query_v2_integration, line 84)'
tests/unit/models/responses/test_successful_responses.py
[error] 56-56: Pylint parsing failed: 'unmatched ')' (tests.unit.models.responses.test_successful_responses, line 56)'
tests/unit/authentication/test_api_key_token.py
[error] 5-5: Pylint parsing failed: 'invalid syntax (tests.unit.authentication.test_api_key_token, line 5)'
tests/unit/models/responses/test_error_responses.py
[error] 14-14: Pylint parsing failed: 'invalid syntax (tests.unit.models.responses.test_error_responses, line 14)'
tests/unit/app/endpoints/test_streaming_query_v2.py
[error] 1-1: Pylint parsing failed: 'invalid syntax (tests.unit.app.endpoints.test_streaming_query_v2, line 1)'
tests/unit/app/endpoints/test_rags.py
[error] 193-193: Pylint parsing failed: 'expected an indented block after function definition on line 192 (tests.unit.app.endpoints.test_rags, line 193)'
🪛 GitHub Actions: Ruff
tests/integration/endpoints/test_query_v2_integration.py
[error] 84-84: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).
tests/unit/models/responses/test_successful_responses.py
[error] 1-100: Merge conflict markers detected in file (<<<<<<< HEAD / ======= / >>>>>>> upstream/main). Resolve merge conflicts to restore valid syntax.
tests/unit/authentication/test_api_key_token.py
[error] 5-5: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).
tests/unit/models/responses/test_error_responses.py
[error] 14-14: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).
tests/unit/app/endpoints/test_streaming_query_v2.py
[error] 1-100: Merge conflict markers detected in file (<<<<<<< HEAD / ======= / >>>>>>> upstream/main). Resolve merge conflicts to restore valid syntax.
tests/unit/app/endpoints/test_rags.py
[error] 192-192: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).
src/app/endpoints/query.py
[error] 14-14: F401: 'pydantic.ValidationError' imported but unused. Remove unused import.
[error] 1-1: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).
🪛 GitHub Actions: Type checks
src/app/endpoints/query.py
[error] 753-753: mypy: Unsupported left operand type for + ("None").
[error] 793-793: mypy: Argument 'params' to 'query' of 'AsyncVectorIoResource' has incompatible type 'dict[str, float]'; expected 'dict[str, bool | float | str | Iterable[object] | object | None] | NotGiven'.
[error] 840-840: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'.
🪛 GitHub Actions: Unit tests
tests/unit/models/responses/test_error_responses.py
[error] 14-14: SyntaxError: invalid syntax due to merge conflict markers (<<<<< HEAD) in test file
tests/unit/app/endpoints/test_streaming_query_v2.py
[error] 1-1: SyntaxError: invalid syntax due to merge conflict markers (<<<<< HEAD) in test file
tests/unit/app/endpoints/test_rags.py
[error] 193-193: IndentationError / SyntaxError: invalid syntax due to merge conflict markers (e.g., '<<<<<<< HEAD') in test_rags.py
🪛 LanguageTool
docs/config.md
[style] ~323-~323: Consider using “inaccessible” to avoid wordiness.
Context: ...figured in the llama-stack run.yaml are not accessible to lightspeed-core agents. ======= MCP ...
(NOT_ABLE_PREMIUM)
[style] ~328-~328: Consider using “inaccessible” to avoid wordiness.
Context: ...figured in the llama-stack run.yaml are not accessible to lightspeed-core agents. >>>>>>> upst...
(NOT_ABLE_PREMIUM)
🪛 markdownlint-cli2 (0.18.1)
docs/config.md
10-10: Heading style
Expected: atx; Actual: setext
(MD003, heading-style)
59-59: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe
(MD055, table-pipe-style)
59-59: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe
(MD055, table-pipe-style)
59-59: Table column count
Expected: 3; Actual: 1; Too few cells, row will be missing data
(MD056, table-column-count)
60-60: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe
(MD055, table-pipe-style)
60-60: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe
(MD055, table-pipe-style)
60-60: Table column count
Expected: 3; Actual: 1; Too few cells, row will be missing data
(MD056, table-column-count)
61-61: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
318-318: Heading style
Expected: atx; Actual: setext
(MD003, heading-style)
351-351: Heading style
Expected: atx; Actual: setext
(MD003, heading-style)
472-472: Heading style
Expected: atx; Actual: setext
(MD003, heading-style)
🪛 YAMLlint (1.37.1)
tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
[error] 19-19: syntax error: could not find expected ':'
(syntax)
🔇 Additional comments (20)
tests/unit/authentication/test_api_key_token.py (1)
134-139: Verify inconsistentdetailformat across error cases.This test expects
detailto be a plain string ("Invalid API Key"), while other error tests expectdetailto be a dict with a"cause"key. Confirm this asymmetry is intentional in the implementation, or align the assertion style for consistency.tests/integration/endpoints/test_query_v2_integration.py (1)
398-405: Resolve merge conflicts for tool call field name assertions.Unresolved merge conflict markers at three locations prevent the file from being parsed:
- Line 398-405:
test_query_v2_endpoint_with_tool_calls- Line 469-476:
test_query_v2_endpoint_with_mcp_list_tools- Line 543-547:
test_query_v2_endpoint_with_multiple_tool_typesHEAD uses
tool_calls[0].tool_name, while upstream/main usestool_calls[0].name. Determine the correct field name by checking theToolCallSummarymodel definition, then resolve all three conflicts consistently. Also remove the conditionalif response.rag_chunks:guard introduced in HEAD (lines 401-402) if it was not in upstream/main.requirements.aarch64.txt (7)
1234-1236: Clean up conflict markers in requirements.aarch64.txt.Lines 1234-1236 contain stray conflict markers (<<<<<<, =======, >>>>>>>) around a comment. Regenerate the file with
uvafter resolving the conflict between upstream and HEAD.
2587-2610: Resolve protobuf version conflict in requirements.aarch64.txt (6.33.1 vs 6.33.2). Pick one version and ensure it's consistent with googleapis-common-protos and opentelemetry-proto dependencies. Update pyproject.toml if needed.
1508-1531: Versions shown are already in lockstep, not diverged.The snippet shows versions moving together: litellm 1.80.7→1.80.8, llama-stack 0.2.22→0.3.0, and llama-stack-client 0.2.22→0.3.0. All three are synchronized. Verify the actual merged state of requirements.aarch64.txt to confirm this was resolved correctly.
Likely an incorrect or invalid review comment.
3001-3007: Verify python-jose necessity before merging; confirm it's not redundant with existing authlib/pyjwt stack.The review raises a valid concern about duplicate JWT/cryptography dependencies—repo already has authlib and pyjwt. However, direct verification of python-jose usage in the codebase could not be completed due to technical constraints. Before merging:
- Search the codebase for
from jose importorimport josepatterns to confirm if python-jose is actually used- If unused, remove python-jose and regenerate lock files
- If required, document the rationale in code comments or dependency documentation
- Ensure cryptography dependency pins remain consistent across the stack
Also check lines 639-644, 3525-3527, 3374-3379 for consistency.
1494-1501: Resolve kubernetes version conflict between 33.1.0 and 34.1.0.This merge conflict requires a decision on which version to use. Consult the kubernetes-client/python CHANGELOG between v33.1.0 and v34.1.0 to identify any breaking changes, then verify compatibility with your kubernetes client usage patterns and regenerate with uv.
2098-2110: Resolve openai version drift (2.8.1 vs 2.9.0) by pinning a single version.Version 2.9.0 includes streaming fixes (stream closure, response config handling) but maintains API compatibility with 2.8.1; no breaking changes to streaming behavior. However, verify jiter (Rust-based JSON parser dependency) compatibility in your target environments, as it can be problematic in some runtimes. Align the pin in pyproject.toml and recompile with uv.
1830-1837: Resolve the mcp version conflict (1.22.0 vs 1.23.1) in requirements.aarch64.txt.This file contains a merge conflict. Determine which version is specified in pyproject.toml and update the requirements file accordingly, or manually choose the intended version and regenerate the locked requirements if this file is auto-generated.
requirements.x86_64.txt (5)
2639-2706: Retain psycopg2-binary pin (policy).psycopg2-binary is correctly pinned; ensure it remains in pyproject and survives re‑compile since some llama‑stack providers require it. Based on learnings, ...
3001-3007: Verify python-jose dependency requirement before resolving merge conflict.This file shows a merge conflict where HEAD adds python-jose==3.5.0 while upstream removes it. Before resolving, confirm whether python-jose is actually used in the codebase. Search for
joseimports and usage (includingjose.jwtand related auth functions). If the new auth implementation doesn't depend on it, remove the dependency from both the requirements file and pyproject.toml. If it is required, ensure the pinned version and hashes are intentional.Note: This concern also applies to the related dependency occurrences at lines 639-644, 2771-2773, 3525-3527, and 3374-3379.
605-607: The review comment addresses a critical issue (unresolved merge conflicts in lockfiles making them unusable), but verification of the specific technical claims requires direct access to the repository code, which is currently unavailable.
1000-1117: Resolve the greenlet merge conflict by choosing version 3.3.0.Remove the conflicting versions (3.2.4 vs 3.3.0) and standardize on greenlet 3.3.0, which is compatible with SQLAlchemy 2.0.44 and includes wheels and improved support for CPython 3.14. Also applies to lines 3607-3613.
3808-3818: Prefer urllib3==2.6.1; both versions compatible with requests 2.32.5 and kubernetes client.urllib3 2.6.1 and 2.3.0 are both compatible with requests 2.32.5 (which supports urllib3 >=1.21.1 and <3). The kubernetes Python client imposes no specific urllib3 version constraint and uses the environment's installed version. Accept the upstream version (2.6.1) as it is newer and includes recent fixes. Before merging, verify no other transitive dependencies in the environment pin urllib3<2.6 (e.g., older botocore versions can conflict), though this is unlikely given the upstream choice.
src/models/responses.py (3)
332-339:ReferencedDocumentextension looks consistentThe addition of
doc_idand wideningdoc_urltoAnyUrl | str | Nonematch how it’s used inquery.py(vector DB metadata and Solr docs). No issues from a modeling perspective.
370-452:QueryResponseRAG/tool fields wiring is soundAdding
rag_chunks: list[RAGChunk]andtool_calls: list[ToolCall] | Noneplus updating the examples keeps the model aligned with the new query pipeline (TurnSummary + tool execution metadata). Defaults ([]/None) also match the usage inquery.py.
1102-1143: Abstract error response OpenAPI helpers look correctThe new
AbstractErrorResponse.__init__,get_description(), andopenapi_response()implementations are consistent with the subclasses and the tests (labelled examples, description-from-class-variable, JSON examples underdetail). No functional issues here.src/app/endpoints/query.py (3)
427-445: Improved database and rate‑limit error handling looks goodCatching
SQLAlchemyErrorwithlogger.exceptionand mapping it toInternalServerErrorResponse.database_error(), plus translatingRateLimitErrorintoQuotaExceededResponse.model(), is consistent with the error model and the FastAPIHTTPExceptionpattern you’re using elsewhere.
1008-1035:get_rag_toolgroupshelper matchesToolgrouptyping, but consider simplifying call sitesThe helper correctly returns
list[Toolgroup] | None, usingToolgroupAgentToolGroupWithArgsfor the RAG knowledge search tool. This is compatible with the llama‑stack types.Given the call sites now need to deal with
Noneand concatenate with MCP toolgroup names, using this helper together with the “extend and or None” pattern (shown in the earlier diff) will avoidNone + listissues and make the intent clearer.No functional issues in the helper itself.
746-769: Fix vector DB / RAG setup: unboundvector_db_ids, type issues, andRAGChunkconstructionThis block has several correctness and typing issues that CI is flagging:
vector_db_idsis only assigned inside theelsebranch of theno_toolscheck, but it's used here unconditionally (if vector_db_ids:). Whenquery_request.no_toolsisTrue, this will be empty/undefined, leading to the pyright "possibly unbound" and a potential runtime error.toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroupsis problematic becauseget_rag_toolgroupscan returnNone, and you're adding a list ofstrto a list ofToolgroup(type mismatch andNone + listrisk). This is what mypy/pyright are complaining about.params = {"k": 5, "score_threshold": 0.0}does not match the union type expected byAsyncVectorIoResource.query; CI reports an incompatible dict type.urljoin("https://mimir.corp.redhat.com", parent_id)is called withparent_idfrom chunk metadata, which is typed as a very broadobjectunion; pyright rightfully complains aboutAnyStrincompatibility.RAGChunk(content=chunk.content, source=source, score=score)passes anInterleavedContentand an untyped metadata value into fields typed asstr/str | None, which triggers the content/source type errors.Consider applying the suggested fixes to resolve the unbound variable, type mismatches for
params,urljoin, andRAGChunkfields, and remove the INFO‑level logging of the full vector DB payload. Re‑run mypy/pyright after applying.
import traceback
from datetime import UTC, datetime
from typing import Annotated, Any, Optional, cast
from urllib.parse import urljoin

from fastapi import APIRouter, Depends, HTTPException, Request
from litellm.exceptions import RateLimitError
from pydantic import ValidationError
from llama_stack_client import (
Clean up imports and centralize RAGChunk import
- ValidationError (Line 14) is unused; Ruff flags this, so it should be removed.
- RAGChunk is used in retrieve_response but only via function-local imports; import it once at module level alongside ToolCall and ReferencedDocument and drop the in-function imports to avoid duplication and potential “possibly unbound” issues from static analyzers.
- The OFFLINE = True flag (Lines 79‑82) is currently a hard-coded module global with a TODO. For production behavior, it should eventually come from configuration (e.g., configuration flag/env) rather than a constant; OK as a temporary stopgap, but don’t forget to wire it up.
Also applies to: 32-55, 79-82
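A minimal sketch of the suggested wiring, assuming the flag is read from an environment variable (the variable name is illustrative, and the import path mirrors how models.responses is used elsewhere in this file):

import os

# Module-level imports instead of function-local ones inside retrieve_response.
from models.responses import RAGChunk, ReferencedDocument, ToolCall

# TODO: source this from configuration; the env var name here is hypothetical.
OFFLINE = os.getenv("LCS_OFFLINE_MODE", "false").lower() == "true"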
🧰 Tools
🪛 GitHub Actions: Ruff
[error] 14-14: F401: 'pydantic.ValidationError' imported but unused. Remove unused import.
🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 7 to 15 (and apply similar cleanup at
32-55 and 79-82), remove the unused pydantic.ValidationError import, add a
module‑level import of RAGChunk alongside the existing ToolCall and
ReferencedDocument imports so retrieve_response no longer does local imports,
and delete the in‑function imports to avoid duplication and possible “possibly
unbound” analyzer warnings; leave the OFFLINE = True temporary flag but add a
TODO comment (or wire it to config/env) so production behavior is sourced from
configuration instead of a hardcoded constant.
     summary = TurnSummary(
         llm_response=(
-            content_to_str(response.output_message.content)
+            interleaved_content_as_str(response.output_message.content)
             if (
                 getattr(response, "output_message", None) is not None
                 and getattr(response.output_message, "content", None) is not None
             )
             else ""
         ),
         tool_calls=[],
-        tool_results=[],
-        rag_chunks=[],
+        rag_chunks=rag_chunks,
     )
Include tool_results when constructing TurnSummary
TurnSummary (from utils.types) expects both tool_calls and tool_results, but you only pass tool_calls and rag_chunks here. Pyright reports “argument missing for parameter 'tool_results'” and at runtime you’d get a TypeError.
Initialize it explicitly:
- summary = TurnSummary(
+ summary = TurnSummary(
llm_response=(
interleaved_content_as_str(response.output_message.content)
@@
- ),
- tool_calls=[],
- rag_chunks=rag_chunks,
- )
+ ),
+ tool_calls=[],
+ tool_results=[],
+ rag_chunks=rag_chunks,
+        )
🧰 Tools
🪛 GitHub Actions: Pyright
[error] 937-937: pyright: Argument missing for parameter 'tool_results' (reportCallIssue).
🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 937 to 948, the TurnSummary
construction is missing the required tool_results argument; update the
constructor call to include a tool_results keyword argument (pass the collected
tool results if available, otherwise an empty list) so TurnSummary receives both
tool_calls and tool_results explicitly and the call matches the type signature.
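For reference, a rough sketch of the shape TurnSummary appears to have in src/utils/types.py; field names come from this review, while the dataclass form and exact element types are assumptions:

from dataclasses import dataclass, field

@dataclass
class RAGChunk:
    content: str
    source: str | None = None
    score: float | None = None

@dataclass
class TurnSummary:
    llm_response: str
    tool_calls: list        # summaries of tool invocations
    tool_results: list      # also required, hence the missing-argument error
    rag_chunks: list[RAGChunk] = field(default_factory=list)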
| <<<<<<< HEAD | ||
| ======= | ||
| # Mock conversations.create for new conversation creation | ||
| # Returns ID in llama-stack format (conv_ prefix + 48 hex chars) | ||
| mock_conversation = mocker.MagicMock() | ||
| mock_conversation.id = "conv_" + "a" * 48 # conv_aaa...aaa (proper format) | ||
| mock_client.conversations.create = mocker.AsyncMock(return_value=mock_conversation) | ||
|
|
||
| >>>>>>> upstream/main |
Resolve merge conflict in conversations.create mock setup.
Unresolved merge conflict markers prevent the file from being parsed or executed. The conflict shows divergent mock setups:
- HEAD: Missing or empty mock for
conversations.create - upstream/main: Explicitly mocks
conversations.createto return a conversation ID in llama-stack format (conv_prefix + 48 hex characters)
Choose the appropriate mock setup based on the expected behavior of the Llama Stack client. The upstream/main version appears more complete.
-<<<<<<< HEAD
-=======
- # Mock conversations.create for new conversation creation
- # Returns ID in llama-stack format (conv_ prefix + 48 hex chars)
- mock_conversation = mocker.MagicMock()
- mock_conversation.id = "conv_" + "a" * 48 # conv_aaa...aaa (proper format)
- mock_client.conversations.create = mocker.AsyncMock(return_value=mock_conversation)
-
->>>>>>> upstream/main
+ # Mock conversations.create for new conversation creation
+ # Returns ID in llama-stack format (conv_ prefix + 48 hex chars)
+ mock_conversation = mocker.MagicMock()
+ mock_conversation.id = "conv_" + "a" * 48 # conv_aaa...aaa (proper format)
+ mock_client.conversations.create = mocker.AsyncMock(return_value=mock_conversation)
+🧰 Tools
🪛 GitHub Actions: Black
[error] 84-84: Cannot parse for target version Python 3.13: 84:0: <<<<<<< HEAD
🪛 GitHub Actions: Integration tests
[error] 84-84: SyntaxError: invalid syntax due to merge conflict markers (<<<<<<< HEAD)
🪛 GitHub Actions: Python linter
[error] 84-84: Pylint parsing failed: 'invalid syntax (tests.integration.endpoints.test_query_v2_integration, line 84)'
🪛 GitHub Actions: Ruff
[error] 84-84: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).
🤖 Prompt for AI Agents
In tests/integration/endpoints/test_query_v2_integration.py around lines 84 to
92, there are unresolved Git conflict markers and a missing mock for
conversations.create; remove the conflict markers and adopt the upstream/main
mock: create a MagicMock conversation with id set to "conv_" + 48 hex chars
(e.g., "a"*48) and set mock_client.conversations.create to an AsyncMock
returning that mock conversation so the test uses a valid llama-stack
conversation id format.
| <<<<<<< HEAD | ||
| assert response.conversation_id == "response-123" | ||
| ======= | ||
| # Conversation ID is normalized (without conv_ prefix) from conversations.create() | ||
| assert response.conversation_id == "a" * 48 | ||
| >>>>>>> upstream/main |
Resolve merge conflict in conversation_id assertion.
Unresolved merge conflict markers prevent the file from being parsed. The conflict shows divergent expectations for the conversation_id format:
- HEAD: Expects
conversation_id == "response-123"(from mock_response.id) - upstream/main: Expects
conversation_id == "a" * 48(normalized format withoutconv_prefix)
Resolve this conflict consistently with the mock setup at lines 84-92. If upstream/main's conversations.create mock is adopted, the normalized assertion should be used.
-<<<<<<< HEAD
- assert response.conversation_id == "response-123"
-=======
- # Conversation ID is normalized (without conv_ prefix) from conversations.create()
- assert response.conversation_id == "a" * 48
->>>>>>> upstream/main
+ # Conversation ID is normalized (without conv_ prefix) from conversations.create()
+    assert response.conversation_id == "a" * 48
🤖 Prompt for AI Agents
In tests/integration/endpoints/test_query_v2_integration.py around lines 171 to
176, remove the unresolved git conflict markers and replace the two conflicting
assertion variants with the single correct assertion that matches the
conversations.create mock declared at lines 84-92; specifically, use the
normalized conversation id expected from the mock (the 48-character "a" string)
— i.e., delete the HEAD/UPSTREAM markers and the wrong assertion for
"response-123" and leave only assert response.conversation_id == "a" * 48, then
run the test suite to confirm the expectation aligns with the mock.
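A small helper illustrating the normalization the test relies on, i.e. dropping the conv_ prefix returned by llama-stack (the helper name is hypothetical; the endpoint may do this inline):

def normalize_conversation_id(raw_id: str) -> str:
    # "conv_" + 48 hex chars -> bare 48-char id, as asserted in the test.
    return raw_id.removeprefix("conv_")

assert normalize_conversation_id("conv_" + "a" * 48) == "a" * 48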
| <<<<<<< HEAD | ||
| from fastapi import Request, HTTPException | ||
| import pytest | ||
| ======= | ||
| import pytest | ||
| from fastapi import HTTPException, Request | ||
| >>>>>>> upstream/main |
Resolve unresolved merge conflicts blocking CI.
The file contains three unresolved merge conflict markers causing Ruff, Black, and Pylint to fail. These must be resolved before the PR can proceed.
Apply this diff to resolve all conflicts (preferring the more defensive upstream/main assertion style with explicit type checking):
-<<<<<<< HEAD
-from fastapi import Request, HTTPException
-import pytest
-=======
import pytest
from fastapi import HTTPException, Request
->>>>>>> upstream/main assert exc_info.value.status_code == 401
-<<<<<<< HEAD
- assert exc_info.value.detail["cause"] == "No Authorization header found"
-=======
detail = exc_info.value.detail
assert isinstance(detail, dict)
assert detail["cause"] == "No Authorization header found"
->>>>>>> upstream/main assert exc_info.value.status_code == 401
-<<<<<<< HEAD
- assert exc_info.value.detail["cause"] == "No token found in Authorization header"
-=======
detail = exc_info.value.detail
assert isinstance(detail, dict)
assert detail["cause"] == "No token found in Authorization header"
->>>>>>> upstream/mainThe upstream/main style is preferred because it:
- Groups stdlib imports before third-party (PEP 8)
- Includes explicit
isinstance(detail, dict) assertions for better test diagnostics if the detail structure changes unexpectedly
Also applies to: 79-85, 108-114
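A compact illustration of the preferred assertion style; the test body is a placeholder that raises the same error shape the auth dependency produces, rather than the project's actual dependency:

import pytest
from fastapi import HTTPException

def test_missing_authorization_header_detail() -> None:
    # Placeholder: stand-in for calling the real auth dependency.
    with pytest.raises(HTTPException) as exc_info:
        raise HTTPException(
            status_code=401,
            detail={"response": "Unauthorized", "cause": "No Authorization header found"},
        )

    assert exc_info.value.status_code == 401
    detail = exc_info.value.detail
    assert isinstance(detail, dict)
    assert detail["cause"] == "No Authorization header found"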
🧰 Tools
🪛 GitHub Actions: Black
[error] 5-5: Cannot parse for target version Python 3.13: 5:0: <<<<<<< HEAD
🪛 GitHub Actions: Python linter
[error] 5-5: Pylint parsing failed: 'invalid syntax (tests.unit.authentication.test_api_key_token, line 5)'
🪛 GitHub Actions: Ruff
[error] 5-5: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).
60c3c4f to c971439
Actionable comments posted: 18
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (11)
src/utils/types.py (1)
36-50: Alignget_tool_callssignature, docstring, and guard condition.The method is typed as
get_tool_calls(self, output_message: CompletionMessage) -> list[ToolCall], but the docstring still documentsCompletionMessage | Noneand you guard withif output_message and output_message.tool_calls:. Given the non‑optional type, thatif output_messageis redundant and the docstring is misleading.Consider tightening this to keep everything in sync:
- def get_tool_calls(self, output_message: CompletionMessage) -> list[ToolCall]: + def get_tool_calls(self, output_message: CompletionMessage) -> list[ToolCall]: @@ - Parameters: - output_message (CompletionMessage | None): Completion - message potentially containing `tool_calls`. + Parameters: + output_message (CompletionMessage): Completion message potentially + containing `tool_calls`. @@ - if output_message and output_message.tool_calls: - return output_message.tool_calls - return [] + return output_message.tool_calls or [](or, if
Noneis actually possible at call sites, make the parameterCompletionMessage | Noneand keep the guard.)src/utils/endpoints.py (1)
345-375: Inconsistent return order betweenget_agentandget_temp_agent.
get_agentreturns(agent, conversation_id, session_id)(line 342), butget_temp_agentreturns(agent, session_id, conversation_id)(line 375). This inconsistency is error-prone for callers.Additionally, the docstring (lines 359-360) says it returns
tuple[AsyncAgent, str]with "(agent, session_id)" but the actual return type annotation showstuple[AsyncAgent, str, str].- Returns: - tuple[AsyncAgent, str]: A tuple containing the agent and session_id. + Returns: + tuple[AsyncAgent, str, str]: A tuple containing (agent, conversation_id, session_id). """ ... - return agent, session_id, conversation_id + return agent, conversation_id, session_idtests/unit/app/endpoints/test_streaming_query.py (2)
147-172: Assertions insidepytest.raisesblock are never executedIn
test_streaming_query_endpoint_handler_configuration_not_loaded, the status/detail assertions are placed inside thewith pytest.raises(HTTPException) as e:block, after the awaited call. Once the exception is raised, code after the call inside the context is not executed, so those assertions never run.Move the assertions outside the
withblock, e.g.:- with pytest.raises(HTTPException) as e: - await streaming_query_endpoint_handler(request, query_request, auth=MOCK_AUTH) - assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR - assert e.value.detail["response"] == "Configuration is not loaded" # type: ignore + with pytest.raises(HTTPException) as e: + await streaming_query_endpoint_handler(request, query_request, auth=MOCK_AUTH) + + assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR + assert e.value.detail["response"] == "Configuration is not loaded" # type: ignore
175-205: Connection‑error test likely not exercising the real failure path
test_streaming_query_endpoint_on_connection_errorcurrently:
- Patches
mock_client.models.side_effect = APIConnectionError(...), and- Patches
client.AsyncLlamaStackClientHolder.get_clienttwice on the same target, and- Only asserts that a
StreamingResponseis returned (no check that an error event is actually streamed).If the production code calls
await client.models.list(...)(as other tests suggest by settingmock_client.models.list.return_value), the side effect onmodelswill never be triggered, so this test will not verify the connection‑error behavior.Consider:
- mock_client = mocker.AsyncMock() - mock_client.models.side_effect = APIConnectionError(request=query_request) # type: ignore - mock_lsc = mocker.patch("client.AsyncLlamaStackClientHolder.get_client") - mock_lsc.return_value = mock_client - mock_async_lsc = mocker.patch("client.AsyncLlamaStackClientHolder.get_client") - mock_async_lsc.return_value = mock_client + mock_client = mocker.AsyncMock() + mock_client.models.list = mocker.AsyncMock( + side_effect=APIConnectionError(request=query_request) # type: ignore + ) + mocker.patch( + "client.AsyncLlamaStackClientHolder.get_client", + return_value=mock_client, + )and then asserting that the streamed content actually contains an error event (status 503, appropriate message) rather than only checking the response type.
docs/openapi.json (5)
1366-1373: 422 example contradicts schema: QueryRequest only requires 'query'.Examples say missing ['query','model','provider'], but model/provider are optional in the schema. Update examples to only require 'query' (or change the schema if intentional).
- "cause": "Missing required attributes: ['query', 'model', 'provider']", + "cause": "Missing required attributes: ['query']",Applies to v1 query 422, v2 query 422, and v2 streaming_query 422.
Also applies to: 3586-3593, 3868-3875
2608-2613: 'not found' delete example sets success=true; fix success=false and grammar.A failed delete should not return success=true; also prefer “cannot”.
- "response": "Conversation can not be deleted", - "success": true + "response": "Conversation cannot be deleted", + "success": falseReflect the same in components/ConversationDeleteResponse examples.
Also applies to: 5005-5011
1374-1380: Typo: 'attatchment' → 'attachment'.Correct the spelling across all error examples.
- "cause": "Invalid attatchment type: must be one of [...] + "cause": "Invalid attachment type: must be one of [...]Also applies to: 1654-1661, 3594-3601, 3877-3881, 7702-7707
4284-4290: metrics error responses: text/plain uses JSON object schemas.For 401/403/500/503 you declare text/plain but reference JSON object schemas. Either:
- Return JSON only (application/json), or
- Keep text/plain with a simple string schema.
Recommend JSON-only for errors to match the rest of the API.
- "text/plain": { "schema": { "$ref": "#/components/schemas/UnauthorizedResponse" } } + // remove text/plain; keep application/json onlyRepeat for 403/500/503 blocks.
Also applies to: 4306-4311, 4328-4333, 4351-4355
67-71: Root 401 text/html schema references a JSON object; use string.Change text/html schema to a simple string for the error page (or remove if you always emit JSON for errors).
- "text/html": { "schema": { "$ref": "#/components/schemas/UnauthorizedResponse" } } + "text/html": { "schema": { "type": "string" } }src/app/endpoints/query_v2.py (1)
131-136: PotentialAttributeErrorifidattribute is missing.On line 132,
getattr(output_item, "id")is called without a default value, which will raiseAttributeErrorif the attribute doesn't exist. Other similar calls in this function use fallback patterns (e.g., line 90-92 forfunction_call).Apply this diff to add a fallback:
return ToolCallSummary( - id=str(getattr(output_item, "id")), + id=str(getattr(output_item, "id", "") or ""), name=DEFAULT_RAG_TOOL, args=args, response=response_payload, )src/models/responses.py (1)
1-3: Black formatting check failed.The pipeline indicates Black formatting issues in this file. Run
black src/models/responses.pyto fix formatting.#!/bin/bash # Check Black formatting issues uv tool run black --check --diff src/models/responses.py 2>&1 | head -50
♻️ Duplicate comments (18)
tests/unit/authentication/test_api_key_token.py (1)
5-5: Tests read cleanly after conflict resolution; directdetail["cause"]checks are fine here.The FastAPI imports are now normalized, and the updated assertions on
exc_info.value.detail["cause"]keep the tests concise while still validating the exact error causes. Given that you control the exception construction in the auth dependency, this level of defensiveness is sufficient for unit tests.Also applies to: 74-74, 97-97
requirements.hermetic.txt (1)
1-2: Clarify the rationale for pinning older versions of build tools, particularly pip 24.2 which contains known security vulnerabilities.The versions pinned in requirements.hermetic.txt are indeed downgrades: uv 0.8.15 (September 3, 2025) is approximately 3 months older than uv 0.9.16 (December 6, 2025), and pip 24.2 (July 28, 2024) is over a year older than pip 25.3 (October 24-25, 2025).
The pip 24.2 downgrade is particularly concerning. This version contains multiple unpatched security vulnerabilities:
- CVE-2025-8869: Symlink/tar extraction vulnerability allowing file extraction outside target directory (unfixed in pip <25.0)
- Arbitrary code execution via self_outdated_check: pip loads untrusted code after installation (unfixed in pip <25.0)
If these downgrades are necessary for compatibility with specific components, that rationale should be documented in a comment or README. If not, consider upgrading to the latest stable versions.
src/utils/checks.py (1)
79-91: New parent-directory check is breaking tests; also keep error message aligned with what is checkedThe added parent-directory checks change behavior in a way that’s currently failing integration tests:
- When
must_exists=Falseandmust_be_writable=True, a missing parent now raisesInvalidConfigurationError("... cannot be created - parent directory does not exist"). The pipeline failure for'/tmp/data/feedback'shows this is hit in normal test configuration, so either:
- tests/infra need to ensure the parent directory (e.g.
/tmp/data) is created up front, or- this check should be relaxed or adjusted (e.g. test writability of the nearest existing ancestor, or let later code
os.makedirs()the full path and surface that error instead).Additionally, the branch that checks
os.access(parent_path, os.W_OK)still reports"{desc} '{path}' is not writable", even though the actual check is on the parent directory. To keep the error self-explanatory, consider something like:- if not os.access(parent_path, os.W_OK): - raise InvalidConfigurationError(f"{desc} '{path}' is not writable") + if not os.access(parent_path, os.W_OK): + raise InvalidConfigurationError( + f"{desc} '{path}' cannot be created - parent directory '{parent_path}' is not writable" + )This both fixes the message and makes it clear why creation of
pathwould fail.requirements.aarch64.txt (1)
1409-1411: urllib3 downgrade reintroduces known security issues; resolve via authoritative constraints, not this lockfile.You’ve moved to
kubernetes==34.1.0while downgradingurllib3to2.3.0. Earlier analysis on this PR already noted that 2.3.x carries known redirect/info‑disclosure and DoS‑style vulnerabilities that were fixed in later 2.x releases, and recommended landing on a secureurllib3version that also satisfies the kubernetes client’s constraint.Right now this looks like the compatibility problem was “fixed” by picking an insecure
urllib3. Instead:
- Decide the authoritative constraint (in
pyproject.tomlor your constraints file) for bothkubernetesandurllib3so you can use a version ofurllib3that includes the security fixes (per the earlier bot comment, a >=2.5.x/2.6.x line, or whatever the current secure guidance is) while remaining within the kubernetes client’s supported range.- Re‑run
uv pip compileto regenerate this file instead of editingrequirements.aarch64.txtby hand.Until that’s done, this environment is exposed to the urllib3 issues that motivated the original upgrade.
To double‑check the current safe combo for this exact kubernetes client version, please run a web search like:
What urllib3 version range is officially supported by the Python kubernetes client version 34.1.0, and what is the earliest urllib3 2.x release that includes fixes for the 2.3.x redirect/info‑disclosure and decompression/chained-encoding DoS issues?Also applies to: 3601-3603
src/app/endpoints/query.py (4)
796-796: Avoid logging entire query response payloads at INFO level.This was flagged in a past review. Logging the full
query_responsepayload may expose sensitive document content.
929-941: Fix typo: "attatchment" should be "attachment".This was flagged in a past review. The typo appears in error messages at lines 930 and 940.
836-852: Fix type errors in RAGChunk construction.Pipeline reports multiple type errors:
- Line 840:
urljoinreceivesobjectinstead ofstr- Line 848:
RAGChunkis possibly unbound (local import issue)- Line 849:
chunk.contentisInterleavedContent, notstr- Line 850:
sourcetype mismatchThe
RAGChunkimport at lines 799-800 is inside a conditional block and may not execute. Also,chunk.metadata.get()returnsobject, notstr.- if chunk.metadata: - if OFFLINE: - parent_id = chunk.metadata.get("parent_id") - if parent_id: - source = urljoin("https://mimir.corp.redhat.com", parent_id) + if chunk.metadata: + if OFFLINE: + parent_id = chunk.metadata.get("parent_id") + if parent_id and isinstance(parent_id, str): + source = urljoin("https://mimir.corp.redhat.com", parent_id) else: - source = chunk.metadata.get("reference_url") + ref_url = chunk.metadata.get("reference_url") + source = ref_url if isinstance(ref_url, str) else None # ... rag_chunks.append( RAGChunk( - content=chunk.content, + content=str(chunk.content) if chunk.content else "", source=source, score=score, ) )Also, move the
RAGChunkimport to module level (this was flagged in past review).
14-14: Remove unused importValidationError.Pipeline failure confirms this import is unused. Remove it to fix the Ruff F401 error.
-from pydantic import ValidationErrorsrc/app/endpoints/streaming_query.py (5)
81-84: OFFLINE flag duplicated from query.py.This was flagged in a past review. Extract to a shared constants module.
925-930:ResponseGeneratorContextdoes not havevector_io_rag_chunksorvector_io_referenced_docsattributes.Pipeline reports
Cannot assign to attribute "vector_io_rag_chunks"and"vector_io_referenced_docs". Thehasattrcheck will always returnFalsefor a dataclass/Pydantic model without these fields defined.This was flagged in a past review. Add these fields to
ResponseGeneratorContext.
1022-1022: Avoid logging entire query response payloads.This was flagged in a past review. Move to DEBUG level and log metadata only.
Also applies to: 1264-1264
1046-1051: Hardcoded URL should be extracted to configuration.This was flagged in a past review. The URL
https://mimir.corp.redhat.comappears multiple times.Also applies to: 1067-1069
1243-1298: Significant code duplication withquery_vector_io_for_chunks.This was flagged in a past review. The vector DB querying logic is duplicated between
query_vector_io_for_chunks(lines 982-1091) andretrieve_response(lines 1243-1298). Additionally,query_vector_io_for_chunksis called at line 906-908 ANDretrieve_responsequeries vector DB again, resulting in duplicate queries per request.src/app/endpoints/conversations_v2.py (1)
99-101: Typo "Converastion" still present.The typo "Converastion" should be "Conversation" at lines 99, 126, and 161.
docs/config.html (1)
205-247: Clarify auth config references and RHIdentity semantics in schema docIn the
AuthenticationConfigurationtable, thejwk_configandrh_identity_configrows have empty Type/Description cells, even though dedicated sections forJwkConfigurationandRHIdentityConfigurationexist below. Consider:
- Setting the Type to the concrete schema name (
JwkConfiguration,RHIdentityConfiguration), and- Adding a brief description (e.g., “JWT-based auth via JWK” / “Red Hat Identity-based auth”).
In the
RHIdentityConfigurationsection, it would also help to explicitly document what happens whenrequired_entitlementsis omitted or an empty list (e.g., whether entitlement checks are skipped vs. requests fail). This keeps runtime behavior unambiguous for operators.Also applies to: 1086-1103
tests/e2e/configs/run-azure.yaml (1)
93-103: Parameterize Braintrust OpenAI API key via env var instead of literalIn the Azure E2E config, the
braintrustscoring provider usesopenai_api_key: '********'. Similar to the CI config, it’s preferable to reference an environment variable (or remove the provider if unused) so the key is not hard‑coded and can be provided securely at runtime, e.g.:scoring: - provider_id: braintrust provider_type: inline::braintrust config: openai_api_key: ${env.BRAINTRUST_OPENAI_API_KEY}This keeps the config consistent with how other credentials (e.g.,
AZURE_API_KEY) are handled.src/models/responses.py (2)
746-762:successfield still hardcoded toTrue.This issue was flagged in a previous review but remains unfixed. When
deleted=False,successshould also beFalse.super().__init__( conversation_id=conversation_id, # type: ignore[call-arg] - success=True, # type: ignore[call-arg] + success=deleted, # type: ignore[call-arg] response=response_msg, # type: ignore[call-arg] )
775-782: Example inconsistency remains unfixed.The "not found" example still shows
success: Trueand uses "can not be deleted" (with space) while the code uses "cannot be deleted" (no space).{ "label": "not found", "value": { "conversation_id": "123e4567-e89b-12d3-a456-426614174000", - "success": True, - "response": "Conversation can not be deleted", + "success": False, + "response": "Conversation cannot be deleted", }, },
🧹 Nitpick comments (21)
tests/integration/test_openapi_json.py (2)
87-99: Type annotation onspeccould be more precise/consistentUsing bare
dictworks, but other helpers in this file usedict[str, Any]. For consistency and stricter typing, consider annotatingspecasdict[str, Any]orMapping[str, Any]instead of plaindict.
185-187: Fixture argument type hints now use plaindictSwitching
spec_from_file/spec_from_urlannotations fromdict[str, Any]todictis harmless at runtime but slightly reduces type precision and differs from other helpers. Consider standardizing on one style (e.g.,dict[str, Any]everywhere orMapping[str, Any]) for readability.Also applies to: 255-257
docker-compose.yaml (1)
26-26: Debug logging flag is reasonable; consider making it easily tunable.Enabling
LLAMA_STACK_LOGGING=all=debugon the llama‑stack service is helpful while stabilizing the new flows. If this compose file is also used beyond local dev, you might optionally gate this via an env default (e.g.,${LLAMA_STACK_LOGGING:-all=debug}) so you can dial it back without editing the file, and confirm thatall=debugmatches the logging syntax for the llama‑stack version you now support.scripts/query_llm.py (1)
16-16: Docstring is correct but could be slightly more descriptive.The new
main()docstring is accurate but very generic. Optional: expand it to mention that it sends a single/v1/queryrequest to the local LLM endpoint and prints the response plus timing, which would make the script’s intent clearer when read in isolation.scripts/generate_openapi_schema.py (1)
26-27: Consider retaining parameter and return value documentation in docstrings.

The docstrings for `read_version_from_openapi` and `read_version_from_pyproject` have been condensed to single-line descriptions. While concise, they now lack parameter and return type documentation that would help future maintainers understand the function contracts.

Per coding guidelines requiring Google Python docstring conventions, consider including Args and Returns sections:

 def read_version_from_openapi(filename: str) -> str:
-    """Read version from OpenAPI.json file."""
+    """Read version from OpenAPI.json file.
+
+    Args:
+        filename: Path to the OpenAPI.json file.
+
+    Returns:
+        Version string extracted from the 'info.version' field.
+    """

As per coding guidelines, all functions should have docstrings with brief descriptions following Google Python docstring conventions.

Also applies to: 38-39
docs/config.md (1)
300-304: Optional: Consider more concise wording.

The MCP server description has been updated for clarity. The static analysis tool suggests using "inaccessible" instead of "not accessible" on line 304 for more concise wording, though the current phrasing is perfectly clear and correct.

If you prefer more concise wording:

-Only MCP servers defined in the lightspeed-stack.yaml configuration are
-available to the agents. Tools configured in the llama-stack run.yaml
-are not accessible to lightspeed-core agents.
+Only MCP servers defined in the lightspeed-stack.yaml configuration are
+available to the agents. Tools configured in the llama-stack run.yaml
+are inaccessible to lightspeed-core agents.

However, the current wording is clear and acceptable as-is.
src/utils/types.py (2)
92-163: RAG chunk extraction silently drops some valid-but-unexpected JSON; consider a more forgiving fallback.
`_extract_rag_chunks_from_response` handles the expected shapes well (a dict with `"chunks"`, or a list of dicts), and falls back to a single chunk when JSON parsing fails or when a structural exception is raised. But there are two notable edge cases where you currently drop the response entirely:

- `response_content` parses as a dict without a `"chunks"` key (e.g. `{"text": "foo"}`) – no chunks are appended and no exception is raised, so the outer `except` doesn't trigger.
- It parses as a list of non‑dict items (e.g. `["foo", "bar"]`) – they're skipped by the `isinstance(chunk, dict)` guard without a fallback.

If you want to retain as much signal as possible, a small tweak like this would keep behavior robust while preserving the structured cases:
- if isinstance(data, dict) and "chunks" in data: - for chunk in data["chunks"]: - self.rag_chunks.append( - RAGChunk( - content=chunk.get("content", ""), - source=chunk.get("source"), - score=chunk.get("score"), - ) - ) - elif isinstance(data, list): - # Handle list of chunks - for chunk in data: - if isinstance(chunk, dict): - self.rag_chunks.append( - RAGChunk( - content=chunk.get("content", str(chunk)), - source=chunk.get("source"), - score=chunk.get("score"), - ) - ) + if isinstance(data, dict) and "chunks" in data: + for chunk in data["chunks"]: + self.rag_chunks.append( + RAGChunk( + content=chunk.get("content", ""), + source=chunk.get("source"), + score=chunk.get("score"), + ) + ) + elif isinstance(data, list): + # Handle list of chunks (dicts or plain values) + for chunk in data: + if isinstance(chunk, dict): + self.rag_chunks.append( + RAGChunk( + content=chunk.get("content", str(chunk)), + source=chunk.get("source"), + score=chunk.get("score"), + ) + ) + else: + self.rag_chunks.append( + RAGChunk( + content=str(chunk), + source=DEFAULT_RAG_TOOL, + score=None, + ) + ) + else: + # Unknown JSON shape: keep the whole payload as a single chunk + if response_content.strip(): + self.rag_chunks.append( + RAGChunk( + content=response_content, + source=DEFAULT_RAG_TOOL, + score=None, + ) + )That way, any non‑empty response still yields at least one
RAGChunk, even when the tool payload format drifts slightly.
15-29: Minor guideline nits: logging and type hints on `Singleton.__call__`.

Two small style/guideline points:

- The module has no `logger = logging.getLogger(__name__)`, despite the project guideline to standardize logging per module.
- `Singleton.__call__` is currently untyped (`*args, **kwargs` only, `# type: ignore`); guidelines call for complete type annotations on functions, even if the return type is just `Singleton`/`object`.

These don't block functionality, but cleaning them up would keep this module aligned with the rest of the stack.
Also applies to: 89-96
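A minimal sketch of what those two nits could look like once addressed; the metaclass body below is an assumed, generic singleton implementation rather than a copy of the module's code:

"""Illustrative sketch only: module logger plus a typed Singleton metaclass."""

import logging
from typing import Any

logger = logging.getLogger(__name__)


class Singleton(type):
    """Metaclass that returns one shared instance per class."""

    _instances: dict[type, Any] = {}

    def __call__(cls, *args: Any, **kwargs: Any) -> Any:
        """Create the instance on first use, then reuse it."""
        if cls not in cls._instances:
            logger.debug("Creating singleton instance for %s", cls.__name__)
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]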
src/app/endpoints/query.py (1)
818-821: Hardcoded URL should be extracted to configuration.

The URL `https://mimir.corp.redhat.com` is hardcoded. This should be moved to configuration for flexibility across environments.

+# At module level or in configuration
+MIMIR_BASE_URL = configuration.mimir_base_url or "https://mimir.corp.redhat.com"
+
 # In the code:
-            doc_url="https://mimir.corp.redhat.com"
-            + reference_doc,
+            doc_url=urljoin(MIMIR_BASE_URL, reference_doc),

src/app/endpoints/streaming_query.py (1)
1126-1126: Function `retrieve_response` has too many statements (75/50).

Pipeline reports this function exceeds the statement limit. Consider extracting the RAG context injection logic into a helper function.

async def _build_rag_context(
    client: AsyncLlamaStackClient,
    query_request: QueryRequest,
    vector_db_ids: list[str],
) -> tuple[list[RAGChunk], str]:
    """Build RAG context from vector DB query."""
    rag_chunks: list[RAGChunk] = []
    rag_context = ""
    # ... extracted logic from lines 1243-1313
    return rag_chunks, rag_context

run.yaml (1)
121-121: Hardcoded Solr URL may need environment configuration.

The `solr_url: "http://localhost:8983/solr"` is hardcoded. For production deployments across different environments, consider using an environment variable.

 config:
-  solr_url: "http://localhost:8983/solr"
+  solr_url: ${env.SOLR_URL:-http://localhost:8983/solr}

tests/integration/endpoints/test_query_v2_integration.py (1)
307-372: Re‑evaluate LCORE‑1025 skips for tool‑call integration testsBoth
test_query_v2_endpoint_with_tool_callsandtest_query_v2_endpoint_with_multiple_tool_typesare permanently skipped via@pytest.mark.skip(reason="LCORE-1025: ToolCallSummary.response type mismatch"). These tests exercise the new RAG/tool-call surface and are valuable regression coverage.If LCORE‑1025 is now resolved (the ToolCall / ToolCallSummary shapes look stabilized elsewhere in this PR), consider removing the skips so these paths are exercised again. If the issue is still open, keeping the skip is fine but it’s worth confirming the ticket status and planning to re‑enable once fixed.
Also applies to: 439-507
tests/unit/app/endpoints/test_streaming_query.py (1)
372-757: Inconsistent async test marking may cause tests to be skipped or mis‑runSeveral async tests (
test_retrieve_response_vector_db_available,test_retrieve_response_no_available_shields,test_retrieve_response_one_available_shields,test_retrieve_response_two_available_shields,test_retrieve_response_four_available_shields, and the attachment variants) are defined asasync defbut lack@pytest.mark.asyncio, while other async tests in this module are explicitly marked.Depending on your
pytest-asyncioconfiguration, these functions may not run as intended or may be skipped. For consistency and clarity, consider adding@pytest.mark.asyncioto these async tests as well.tests/e2e/configs/run-ci.yaml (1)
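As a quick illustration of the explicit marking style recommended for test_streaming_query.py above (the test body and mocked client are invented for the example, not taken from the suite):

from unittest.mock import AsyncMock

import pytest


@pytest.mark.asyncio
async def test_retrieve_response_example_marked_explicitly():
    """Marked async tests run under pytest-asyncio regardless of its mode setting."""
    client = AsyncMock()
    client.vector_dbs.list.return_value = []

    result = await client.vector_dbs.list()

    assert result == []
    client.vector_dbs.list.assert_awaited_once()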
99-102: Avoid hard‑coding Braintrust OpenAI API key placeholder in CI configThe
braintrustscoring provider is configured with a literalopenai_api_key: '********'. For CI/e2e configs this is better modeled as an environment‑driven value (or omitted entirely if Braintrust isn’t used in tests), e.g.:scoring: - provider_id: braintrust provider_type: inline::braintrust config: openai_api_key: ${env.BRAINTRUST_OPENAI_API_KEY}This avoids committing even placeholder secret‑looking values and makes it clear where the real key should come from at runtime.
src/app/endpoints/streaming_query_v2.py (1)
154-162: Consider catching specific exception types.The broad
except Exceptionclause catches all exceptions, which can mask unexpected errors. Since you're accessing attributes,AttributeErrorwould be more appropriate.# Emit start on response.created if event_type == "response.created": try: conv_id = getattr(chunk, "response").id - except Exception: # pylint: disable=broad-except + except AttributeError: logger.warning("Missing response id!") conv_id = "" yield stream_start_event(conv_id) continuedocs/openapi.json (1)
6631-6645: Constrain media_type to known values.Provide an enum for media_type to guide clients and validation.
"media_type": { - "anyOf": [ { "type": "string" }, { "type": "null" } ], + "anyOf": [ + { "type": "string", "enum": ["application/json", "text/plain"] }, + { "type": "null" } + ], "title": "Media Type", "description": "Media type for the response format", "examples": ["application/json", "text/plain"] }src/app/endpoints/query_v2.py (3)
138-145: Missing fallback foridattribute inweb_search_call.Similar to the
file_search_callcase,getattr(output_item, "id")on line 141 lacks a default value. For consistency with thefunction_callpattern (lines 90-92), consider adding a fallback.if item_type == "web_search_call": args = {"status": getattr(output_item, "status", None)} + call_id = getattr(output_item, "id", None) or getattr(output_item, "call_id", None) return ToolCallSummary( - id=str(getattr(output_item, "id")), + id=str(call_id), name="web_search", args=args, response=None, )
409-415: TODO: Document parsing not yet implemented.The function returns an empty list with a clear TODO explaining the needed work. This is acceptable as a placeholder, but tracking the implementation would be valuable.
Would you like me to open an issue to track the implementation of document parsing from the Responses API?
519-539: Consider makingmax_num_resultsconfigurable.The
max_num_results: 10is hardcoded. Consider making this configurable via theAppConfigif different use cases require different limits.src/models/responses.py (1)
367-376: Inconsistent default patterns forrag_chunksvstool_calls.
rag_chunksdefaults to[](empty list) whiletool_callsdefaults toNone. Consider using consistent defaults for similar collection fields.rag_chunks: list[RAGChunk] = Field( - [], + default_factory=list, description="List of RAG chunks used to generate the response", ) tool_calls: list[ToolCall] | None = Field( - None, + default_factory=list, description="List of tool calls made during response generation", )src/models/config.py (1)
1281-1284: Consider adding error handling for file operations.The
dumpmethod opens and writes to a file without exception handling. Consider wrapping in try-except for better error reporting.def dump(self, filename: str = "configuration.json") -> None: """Dump actual configuration into JSON file.""" - with open(filename, "w", encoding="utf-8") as fout: - fout.write(self.model_dump_json(indent=4)) + try: + with open(filename, "w", encoding="utf-8") as fout: + fout.write(self.model_dump_json(indent=4)) + except OSError as e: + raise ValueError(f"Failed to dump configuration to {filename}: {e}") from e
docs/openapi.json
Outdated
| "application/json": { | ||
| "schema": {} | ||
| }, | ||
| "text/event-stream": { | ||
| "schema": { | ||
| "type": "string", | ||
| "format": "text/event-stream" | ||
| "type": "string" | ||
| }, | ||
| "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"No Violation\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 1, \"token\": \"\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 2, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 3, \"token\": \"!\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 4, \"token\": \" How\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 5, \"token\": \" can\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 6, \"token\": \" I\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 7, \"token\": \" assist\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 8, \"token\": \" you\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 9, \"token\": \" today\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 10, \"token\": \"?\"}}\n\ndata: {\"event\": \"turn_complete\", \"data\": {\"token\": \"Hello! How can I assist you today?\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 11, \"output_tokens\": 19}, \"available_quotas\": {}}\n\n" | ||
| "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 0, \"output_tokens\": 0}, \"available_quotas\": {}}\n\n" | ||
| } |
Fix SSE content types for /v1/streaming_query (remove stray application/json, keep text/event-stream).
200 → content lists application/json with empty schema plus text/event-stream for SSE. OpenAPI generators will model a JSON response; that’s incorrect for SSE. Remove application/json for 200 and keep only text/event-stream, and ensure examples match the declared schema.
Apply this diff minimally:
"responses": {
"200": {
- "description": "Streaming response (Server-Sent Events)",
- "content": {
- "application/json": {
- "schema": {}
- },
- "text/event-stream": {
+ "description": "Streaming response (Server-Sent Events)",
+ "content": {
+ "text/event-stream": {
"schema": {
"type": "string"
},
- "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 0, \"output_tokens\": 0}, \"available_quotas\": {}}\n\n"
+ "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": false, \"input_tokens\": 0, \"output_tokens\": 0, \"available_quotas\": {} }}\n\n"
}
}
},📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "application/json": { | |
| "schema": {} | |
| }, | |
| "text/event-stream": { | |
| "schema": { | |
| "type": "string", | |
| "format": "text/event-stream" | |
| "type": "string" | |
| }, | |
| "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"No Violation\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 1, \"token\": \"\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 2, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 3, \"token\": \"!\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 4, \"token\": \" How\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 5, \"token\": \" can\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 6, \"token\": \" I\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 7, \"token\": \" assist\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 8, \"token\": \" you\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 9, \"token\": \" today\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 10, \"token\": \"?\"}}\n\ndata: {\"event\": \"turn_complete\", \"data\": {\"token\": \"Hello! How can I assist you today?\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 11, \"output_tokens\": 19}, \"available_quotas\": {}}\n\n" | |
| "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 0, \"output_tokens\": 0}, \"available_quotas\": {}}\n\n" | |
| } | |
| "text/event-stream": { | |
| "schema": { | |
| "type": "string" | |
| }, | |
| "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": false, \"input_tokens\": 0, \"output_tokens\": 0, \"available_quotas\": {} }}\n\n" | |
| } |
🤖 Prompt for AI Agents
In docs/openapi.json around lines 1519 to 1527, the 200 response for
/v1/streaming_query incorrectly lists "application/json" with an empty schema
alongside "text/event-stream", which causes OpenAPI generators to treat the
response as JSON; remove the "application/json" content entry so only
"text/event-stream" remains for the 200 response, and ensure the remaining
"text/event-stream" entry's schema/type and example accurately reflect SSE
(string) format.
docs/openapi.json
Outdated
| "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 0, \"output_tokens\": 0}, \"available_quotas\": {}}\n\n" | ||
| } |
SSE example sets truncated=null but schema is boolean.
Align the SSE “end” example with QueryResponse which defines truncated as boolean.
- "truncated": null
+ "truncated": falseAlso move available_quotas inside the same data object consistently (your v1 example already does after previous comment’s fix).
Also applies to: 3740-3742
🤖 Prompt for AI Agents
In docs/openapi.json around lines 1526-1527 (and also apply the same change at
lines 3740-3742), the SSE "end" example currently sets "truncated": null and
places "available_quotas" outside the "data" object which conflicts with the
QueryResponse schema; change "truncated" to a boolean (true or false as
appropriate for the example) and move "available_quotas" inside the same "data"
object so the example matches the schema structure and types.
docs/openapi.json
Outdated
| "200": { | ||
| "description": "Streaming response with Server-Sent Events", | ||
| "content": { | ||
| "application/json": { | ||
| "schema": { | ||
| "type": "string", | ||
| "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 0, \"output_tokens\": 0}, \"available_quotas\": {}}\n\n" | ||
| } | ||
| }, | ||
| "text/plain": { | ||
| "schema": { | ||
| "type": "string", | ||
| "example": "Hello world!\n\n---\n\nReference: https://example.com/doc" | ||
| } | ||
| } | ||
| } |
/v2/streaming_query returns SSE; content-types are wrong.
200 → content declares application/json (string) and text/plain. Use text/event-stream only, like v1, and align the example.
Suggested fix:
- "content": {
- "application/json": {
- "schema": {
- "type": "string",
- "example": "data:{...SSE...}"
- }
- },
- "text/plain": {
- "schema": {
- "type": "string",
- "example": "Hello world!\n\n---\n\nReference: https://example.com/doc"
- }
- }
- }
+ "content": {
+ "text/event-stream": {
+ "schema": { "type": "string" },
+ "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": false, \"input_tokens\": 0, \"output_tokens\": 0, \"available_quotas\": {} }}\n\n"
+ }
+ }🤖 Prompt for AI Agents
In docs/openapi.json around lines 3735 to 3750, the 200 response for
/v2/streaming_query incorrectly lists application/json and text/plain; this
endpoint returns Server-Sent Events and should use only text/event-stream.
Remove the application/json and text/plain content entries, replace them with a
single "text/event-stream" content entry whose schema is a string and whose
example shows SSE frames (data: ...\n\n) matching the v1 format; keep the
descriptive "Streaming response with Server-Sent Events" description intact.
| "api_key": { | ||
| "type": "string", | ||
| "minLength": 1, | ||
| "format": "password", | ||
| "title": "API key", | ||
| "writeOnly": true, | ||
| "examples": [ | ||
| "some-api-key" | ||
| ] | ||
| "title": "Api Key", | ||
| "default": "some-api-key" | ||
| } |
🛠️ Refactor suggestion | 🟠 Major
API keys in docs: remove default and mark writeOnly.
Avoid shipping a default-looking API key and mark it writeOnly for generators.
- "api_key": { "type": "string", "title": "Api Key", "default": "some-api-key" }
+ "api_key": { "type": "string", "title": "API Key", "minLength": 1, "writeOnly": true }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "api_key": { | |
| "type": "string", | |
| "minLength": 1, | |
| "format": "password", | |
| "title": "API key", | |
| "writeOnly": true, | |
| "examples": [ | |
| "some-api-key" | |
| ] | |
| "title": "Api Key", | |
| "default": "some-api-key" | |
| } | |
| "api_key": { | |
| "type": "string", | |
| "title": "API Key", | |
| "minLength": 1, | |
| "writeOnly": true | |
| } |
🤖 Prompt for AI Agents
In docs/openapi.json around lines 4365 to 4369, the schema for "api_key"
currently includes a default value and isn't marked writeOnly; remove the
"default": "some-api-key" entry and add "writeOnly": true to the property so
generators don't treat it as a readable default or expose it in responses.
docs/openapi.md
Outdated
| | Status Code | Description | Component | | ||
| |-------------|-------------|-----------| | ||
| | 200 | Streaming response with Server-Sent Events | string | ||
| string | | ||
| | 401 | Unauthorized | [UnauthorizedResponse](#unauthorizedresponse) |
Fix malformed Markdown table for streaming response.
The responses table has formatting issues: missing trailing pipes and incorrect column structure. This will cause rendering problems in documentation.
### ✅ Responses
| Status Code | Description | Component |
|-------------|-------------|-----------|
-| 200 | Streaming response with Server-Sent Events | string
-string |
+| 200 | Streaming response with Server-Sent Events | string |
| 401 | Unauthorized | [UnauthorizedResponse](#unauthorizedresponse)Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
3273-3273: Table pipe style
Expected: leading_and_trailing; Actual: leading_only; Missing trailing pipe
(MD055, table-pipe-style)
3274-3274: Table pipe style
Expected: leading_and_trailing; Actual: trailing_only; Missing leading pipe
(MD055, table-pipe-style)
3274-3274: Table column count
Expected: 3; Actual: 1; Too few cells, row will be missing data
(MD056, table-column-count)
3275-3275: Table pipe style
Expected: leading_and_trailing; Actual: leading_only; Missing trailing pipe
(MD055, table-pipe-style)
🤖 Prompt for AI Agents
In docs/openapi.md around lines 3271 to 3275, the Markdown table rows are
malformed (missing trailing pipes and an extra line break inside a cell); fix by
ensuring each row has exactly three pipe-separated columns and a trailing pipe —
specifically replace the 200 row with a single line: Status Code column "200",
Description column "Streaming response with Server-Sent Events", Component
column "string" (all separated by pipes and ending with a trailing pipe), and
add the missing trailing pipe to the 401 row as well so the table renders
correctly.
| source = urljoin( | ||
| "https://mimir.corp.redhat.com", parent_id |
Fix urljoin type error with parent_id.
Pipeline reports urljoin receives object type instead of str. Add type guard.
if OFFLINE:
parent_id = chunk.metadata.get("parent_id")
- if parent_id:
+ if parent_id and isinstance(parent_id, str):
source = urljoin(
"https://mimir.corp.redhat.com", parent_id
)Apply this fix at both occurrences (lines 1065-1069 and 1273-1277).
Also applies to: 1275-1276
🧰 Tools
🪛 GitHub Actions: Pyright
[error] 1068-1068: Pyright: Argument of type "float | str | List[object] | object | Literal[True]" cannot be assigned to parameter "url" of type "AnyStr@urljoin | None" in function "urljoin" (reportArgumentType)
🪛 GitHub Actions: Type checks
[error] 1067-1067: mypy: Value of type variable "AnyStr" of "urljoin" cannot be "object"
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 1065-1069 and again around
1273-1277, urljoin is being called with parent_id which the type checker reports
as object; add a type guard that ensures parent_id is a str before calling
urljoin (e.g., if isinstance(parent_id, str) use urljoin, otherwise convert to
str(parent_id) or skip the join if None/unexpected), and apply the same
guard/handling at both occurrences so urljoin always receives a string.
| final_rag_chunks.append( | ||
| RAGChunk( | ||
| content=chunk.content, | ||
| source=source, | ||
| score=score, | ||
| ) | ||
| ) |
Fix RAGChunk constructor type errors.
Pipeline reports:
chunk.contentisInterleavedContent, notstrsourcetype mismatch
final_rag_chunks.append(
RAGChunk(
- content=chunk.content,
- source=source,
+ content=str(chunk.content) if chunk.content else "",
+ source=source if isinstance(source, str) else None,
score=score,
)
)Apply at both locations (lines 1076-1082 and 1284-1290).
Also applies to: 1284-1290
🧰 Tools
🪛 GitHub Actions: Pyright
[error] 1078-1078: Pyright: Argument of type "InterleavedContent" cannot be assigned to parameter "content" of type "str" in function "init" (reportArgumentType)
[error] 1079-1079: Pyright: Argument of type "str | bool | float | List[object] | object | None" cannot be assigned to parameter "source" of type "str | None" in function "init" (reportArgumentType)
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 1076-1082 (and similarly at
1284-1290), the RAGChunk constructor is being passed chunk.content which is an
InterleavedContent object and a source whose type doesn't match RAGChunk's
expected type; convert the content to a plain string (e.g., use
chunk.content.text or str(chunk.content) / chunk.content.to_string()) and ensure
source is normalized to the expected primitive (e.g., source.url or str(source)
or source.get("source") depending on the source shape) before constructing
RAGChunk so the types match the RAGChunk signature.
src/app/endpoints/streaming_query.py
Outdated
| documents=documents, | ||
| stream=True, | ||
| # toolgroups=toolgroups, | ||
| toolgroups=toolgroups, |
Fix toolgroups type for create_turn call.
Pipeline reports type incompatibility. The toolgroups variable may be list[str] when only MCP toolgroups are present, but create_turn expects List[Toolgroup] | None.
Ensure proper typing throughout the toolgroups assignment chain, using cast if necessary:
from typing import cast
from llama_stack_client.types.agents.turn_create_params import Toolgroup
# When assigning mcp_toolgroups:
-mcp_toolgroups = [mcp_server.name for mcp_server in configuration.mcp_servers]
+mcp_toolgroups: list[Toolgroup] = [
+ mcp_server.name for mcp_server in configuration.mcp_servers
+]Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 GitHub Actions: Pyright
[error] 1320-1320: Pyright: Argument of type "list[str | ToolgroupAgentToolGroupWithArgs] | Unknown | list[str] | None" cannot be assigned to parameter "toolgroups" of type "List[Toolgroup] | None" in function "create_turn" (reportArgumentType)
[error] 1320-1320: Pyright: Type parameter invariance: list[str] is not assignable to List[Toolgroup] (reportArgumentType)
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 1318-1320, the variable
`toolgroups` may be a list[str] but `create_turn` requires List[Toolgroup] |
None; ensure `toolgroups` is typed appropriately before the call by converting
any MCP string entries into Toolgroup instances (or wrapping/constructing the
correct Toolgroup objects) or, if you have validated they are already
Toolgroup-like, use typing.cast[List[Toolgroup]](toolgroups) to satisfy the
signature; update upstream assignments and annotate the variable so the type is
consistently List[Toolgroup] | None prior to passing it to create_turn.
| retrieved_entry = retrieved_entries[0] | ||
| # Check all fields match | ||
| assert retrieved_entry.query == entry_with_docs.query | ||
| assert retrieved_entry.response == entry_with_docs.response | ||
| assert retrieved_entry.provider == entry_with_docs.provider | ||
| assert retrieved_entry.model == entry_with_docs.model | ||
| assert retrieved_entry.started_at == entry_with_docs.started_at | ||
| assert retrieved_entry.completed_at == entry_with_docs.completed_at | ||
| # Check referenced documents - URLs may be strings after serialization | ||
| assert retrieved_entry.referenced_documents is not None | ||
| assert entry_with_docs.referenced_documents is not None | ||
| assert len(retrieved_entry.referenced_documents) == len(entry_with_docs.referenced_documents) | ||
| # pylint: disable-next=unsubscriptable-object | ||
| assert ( | ||
| str(retrieved_entry.referenced_documents[0].doc_url) | ||
| == str(entry_with_docs.referenced_documents[0].doc_url) # pylint: disable=unsubscriptable-object | ||
| ) | ||
| # pylint: disable-next=unsubscriptable-object | ||
| assert ( | ||
| retrieved_entry.referenced_documents[0].doc_title | ||
| == entry_with_docs.referenced_documents[0].doc_title # pylint: disable=unsubscriptable-object | ||
| ) | ||
| assert retrieved_entries[0].referenced_documents[0].doc_title == "Test Doc" |
Fix Black formatting issue and consider simplifying pylint workarounds.
The pipeline indicates Black formatting failed for this file. Please run uv tool run black . to fix formatting.
The field-by-field validation approach is appropriate for testing serialization round-trips where object equality may fail due to type coercion. However, the multiple # pylint: disable-next=unsubscriptable-object comments suggest type narrowing issues. Since you've already verified referenced_documents is not None on both objects, consider extracting the documents to local variables to help type inference:
# Check referenced documents - URLs may be strings after serialization
assert retrieved_entry.referenced_documents is not None
assert entry_with_docs.referenced_documents is not None
assert len(retrieved_entry.referenced_documents) == len(entry_with_docs.referenced_documents)
- # pylint: disable-next=unsubscriptable-object
- assert (
- str(retrieved_entry.referenced_documents[0].doc_url)
- == str(entry_with_docs.referenced_documents[0].doc_url) # pylint: disable=unsubscriptable-object
- )
- # pylint: disable-next=unsubscriptable-object
- assert (
- retrieved_entry.referenced_documents[0].doc_title
- == entry_with_docs.referenced_documents[0].doc_title # pylint: disable=unsubscriptable-object
- )
+ retrieved_doc = retrieved_entry.referenced_documents[0]
+ expected_doc = entry_with_docs.referenced_documents[0]
+ assert str(retrieved_doc.doc_url) == str(expected_doc.doc_url)
+ assert retrieved_doc.doc_title == expected_doc.doc_title📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| retrieved_entry = retrieved_entries[0] | |
| # Check all fields match | |
| assert retrieved_entry.query == entry_with_docs.query | |
| assert retrieved_entry.response == entry_with_docs.response | |
| assert retrieved_entry.provider == entry_with_docs.provider | |
| assert retrieved_entry.model == entry_with_docs.model | |
| assert retrieved_entry.started_at == entry_with_docs.started_at | |
| assert retrieved_entry.completed_at == entry_with_docs.completed_at | |
| # Check referenced documents - URLs may be strings after serialization | |
| assert retrieved_entry.referenced_documents is not None | |
| assert entry_with_docs.referenced_documents is not None | |
| assert len(retrieved_entry.referenced_documents) == len(entry_with_docs.referenced_documents) | |
| # pylint: disable-next=unsubscriptable-object | |
| assert ( | |
| str(retrieved_entry.referenced_documents[0].doc_url) | |
| == str(entry_with_docs.referenced_documents[0].doc_url) # pylint: disable=unsubscriptable-object | |
| ) | |
| # pylint: disable-next=unsubscriptable-object | |
| assert ( | |
| retrieved_entry.referenced_documents[0].doc_title | |
| == entry_with_docs.referenced_documents[0].doc_title # pylint: disable=unsubscriptable-object | |
| ) | |
| assert retrieved_entries[0].referenced_documents[0].doc_title == "Test Doc" | |
| retrieved_entry = retrieved_entries[0] | |
| # Check all fields match | |
| assert retrieved_entry.query == entry_with_docs.query | |
| assert retrieved_entry.response == entry_with_docs.response | |
| assert retrieved_entry.provider == entry_with_docs.provider | |
| assert retrieved_entry.model == entry_with_docs.model | |
| assert retrieved_entry.started_at == entry_with_docs.started_at | |
| assert retrieved_entry.completed_at == entry_with_docs.completed_at | |
| # Check referenced documents - URLs may be strings after serialization | |
| assert retrieved_entry.referenced_documents is not None | |
| assert entry_with_docs.referenced_documents is not None | |
| assert len(retrieved_entry.referenced_documents) == len(entry_with_docs.referenced_documents) | |
| retrieved_doc = retrieved_entry.referenced_documents[0] | |
| expected_doc = entry_with_docs.referenced_documents[0] | |
| assert str(retrieved_doc.doc_url) == str(expected_doc.doc_url) | |
| assert retrieved_doc.doc_title == expected_doc.doc_title | |
| assert retrieved_entries[0].referenced_documents[0].doc_title == "Test Doc" |
🤖 Prompt for AI Agents
In tests/unit/cache/test_postgres_cache.py around lines 516 to 538, Black
formatting failed and the test uses repeated pylint disables for
unsubscriptable-object; run `uv tool run black .` to fix formatting, then
simplify the assertions by first assigning
retrieved_entry.referenced_documents[0] and
entry_with_docs.referenced_documents[0] to local variables (after the explicit
None checks) so type checkers can infer their types and you can remove the `#
pylint: disable-next=unsubscriptable-object` comments; keep the same
field-by-field asserts (query, response, provider, model, started_at,
completed_at, length check and doc fields) but reference the local variables for
doc_url/doc_title comparisons and the final doc_title assertion.
Force-pushed the branch from c971439 to 4ca24a7.
Actionable comments posted: 1
♻️ Duplicate comments (7)
run.yaml (1)
140-143: Updateprovider_shield_idto reference a valid Llama-Guard model.The
provider_shield_idis set to"gpt-3.5-turbo", which is an OpenAI model, not a Llama-Guard model. For thellama-guardprovider,provider_shield_idmust be a Llama-Guard model identifier (e.g.,"meta-llama/Llama-Guard-3-8B").src/app/endpoints/query.py (3)
878-889: Add missingtool_resultsparameter toTurnSummary.The
TurnSummaryconstructor is missing the requiredtool_resultsparameter. This will cause a runtimeTypeError.summary = TurnSummary( llm_response=( interleaved_content_as_str(response.output_message.content) if ( getattr(response, "output_message", None) is not None and getattr(response.output_message, "content", None) is not None ) else "" ), tool_calls=[], + tool_results=[], rag_chunks=rag_chunks, )
745-758: Fix type error:get_rag_toolgroupsmay returnNone.Line 752 concatenates the result of
get_rag_toolgroups(vector_db_ids)withmcp_toolgroups, butget_rag_toolgroupscan returnNone. This will cause a TypeError at runtime.toolgroups = None if vector_db_ids: - toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups + rag_toolgroups = get_rag_toolgroups(vector_db_ids) + toolgroups = (rag_toolgroups or []) + mcp_toolgroups elif mcp_toolgroups: toolgroups = mcp_toolgroups
798-799: Move imports to module level.Importing
RAGChunkandReferencedDocumentinside the function is inefficient. Move these to the module-level imports at the top of the file.+from models.responses import ( + ForbiddenResponse, + QueryResponse, + RAGChunk, + ReferencedDocument, + ToolCall, + UnauthorizedResponse, +) ... if query_response.chunks: - from models.responses import RAGChunk, ReferencedDocument - retrieved_chunks = query_response.chunkssrc/app/endpoints/streaming_query.py (3)
77-80: Extract OFFLINE flag to shared constants module.This constant is duplicated from query.py. Extract it to a shared constants module (e.g.,
src/app/constants.py) and import from both files to ensure consistency and follow DRY principles.
1045-1046: Extract hardcoded Mimir base URL to configuration.The URL
https://mimir.corp.redhat.comis hardcoded here (and duplicated at lines 1065, 1241). Extract this to a configuration value for flexibility across environments.
1208-1275: Significant code duplication: vector DB querying logic.Lines 1208-1275 duplicate the vector DB query logic from
query_vector_io_for_chunks(lines 979-1088). Both blocks:
- Query vector_io with identical params setup
- Process chunks with OFFLINE flag logic
- Extract metadata and build RAGChunk objects
Consider consolidating by having
retrieve_responsecallquery_vector_io_for_chunksto get the chunks, then format the RAG context from those results.- # Get RAG chunks before sending to LLM (reuse logic from query_vector_io_for_chunks) - rag_chunks = [] - try: - if vector_db_ids: - # ... 60+ lines of duplicate logic - except Exception as e: - logger.warning("Failed to query vector database for chunks: %s", e) + # Reuse query_vector_io_for_chunks for consistency + rag_chunks, _ = await query_vector_io_for_chunks(client, query_request) # Format RAG context for injection into user message rag_context = ""
🧹 Nitpick comments (1)
lightspeed-stack.yaml (1)
24-28: Consider persistent storage for production deployments.The conversation cache uses
/tmp/data/conversation-cache.dbwhich won't survive container restarts. This is fine for development/testing as indicated by the comment, but production deployments should use a persistent volume mount or external database.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
lightspeed-stack.yaml(1 hunks)run.yaml(3 hunks)src/app/endpoints/query.py(8 hunks)src/app/endpoints/streaming_query.py(14 hunks)src/models/requests.py(2 hunks)src/models/responses.py(1 hunks)tests/unit/app/endpoints/test_conversations_v2.py(1 hunks)tests/unit/app/endpoints/test_query.py(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/models/requests.py
- src/models/responses.py
🧰 Additional context used
📓 Path-based instructions (5)
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
tests/unit/app/endpoints/test_query.pytests/unit/app/endpoints/test_conversations_v2.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use
pytest-mockwith AsyncMock objects for mocking in tests
Files:
tests/unit/app/endpoints/test_query.pytests/unit/app/endpoints/test_conversations_v2.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g.,from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Uselogger = logging.getLogger(__name__)pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) andOptional[Type]orType | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Useasync deffor I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes:Configurationfor config classes,Error/Exceptionfor exceptions,Resolverfor strategy patterns,Interfacefor abstract base classes
Abstract classes must use ABC with@abstractmethoddecorators
Include complete type annotations for all class attributes in Python classes
Useimport loggingand module logger pattern with standard log levels: debug, info, warning, error
Files:
src/app/endpoints/query.pysrc/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI
HTTPExceptionwith appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/query.pysrc/app/endpoints/streaming_query.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle
APIConnectionErrorfrom Llama Stack in integration code
Files:
src/app/endpoints/query.pysrc/app/endpoints/streaming_query.py
🧠 Learnings (1)
📚 Learning: 2025-10-29T13:05:22.438Z
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).
Applied to files:
src/app/endpoints/streaming_query.py
🧬 Code graph analysis (1)
src/app/endpoints/streaming_query.py (4)
src/models/responses.py (2)
RAGChunk(149-154)ReferencedDocument(179-191)src/utils/types.py (1)
TurnSummary(89-163)src/models/requests.py (1)
QueryRequest(73-233)src/app/endpoints/query.py (1)
get_rag_toolgroups(951-978)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (5)
tests/unit/app/endpoints/test_conversations_v2.py (1)
103-103: LGTM! Test updated to match model type changes.The assertion correctly uses
str()to compare doc_url, accommodating the type change in ReferencedDocument where doc_url is nowstr | Optional[AnyUrl].tests/unit/app/endpoints/test_query.py (1)
1014-1014: LGTM! Tests updated to match model type changes.The test assertions correctly use
str()for doc_url comparisons, aligning with the ReferencedDocument type change. The malformed URL test (lines 1041-1043) properly verifies that invalid URLs are stored as raw strings rather than being rejected.Also applies to: 1041-1043, 1066-1066, 1093-1095
run.yaml (1)
155-160: LGTM! Granite embedding model configuration is consistent.The granite-embedding model configuration correctly specifies the embedding dimension (384) which matches the vector_io configuration at line 125. The model identifier
ibm-granite/granite-embedding-30m-englishis properly formatted for the sentence-transformers provider.src/app/endpoints/streaming_query.py (1)
174-246: LGTM! Streaming end event properly includes RAG data.The stream_end_event function correctly includes both
rag_chunksandvector_io_referenced_docsin the final streaming response. The implementation properly converts RAG chunks to dict format and combines metadata_map documents with vector_io documents.src/app/endpoints/query.py (1)
856-867: LGTM! RAG context injection implementation.The RAG context formatting and injection logic properly:
- Limits to top 5 chunks
- Formats each chunk with source attribution
- Prepends relevant documentation to the user query
- Logs the injection for debugging
This approach cleanly augments the user message with retrieved context before the LLM call.
| - provider_id: solr-vector | ||
| provider_type: remote::solr_vector_io | ||
| config: | ||
| solr_url: "http://localhost:8983/solr" | ||
| collection_name: "portal-rag" | ||
| vector_field: "chunk_vector" | ||
| content_field: "chunk" | ||
| embedding_dimension: 384 | ||
| inference_provider_id: sentence-transformers | ||
| persistence: | ||
| type: sqlite | ||
| db_path: .llama/distributions/ollama/portal_rag_kvstore.db | ||
| namespace: portal-rag |
🛠️ Refactor suggestion | 🟠 Major
Hardcoded Solr URL should use environment variable.
The solr_url: "http://localhost:8983/solr" is hardcoded and won't work in containerized or distributed deployments. Consider using an environment variable reference similar to line 19's pattern.
config:
- solr_url: "http://localhost:8983/solr"
+ solr_url: ${env.SOLR_URL:-http://localhost:8983/solr}
collection_name: "portal-rag"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - provider_id: solr-vector | |
| provider_type: remote::solr_vector_io | |
| config: | |
| solr_url: "http://localhost:8983/solr" | |
| collection_name: "portal-rag" | |
| vector_field: "chunk_vector" | |
| content_field: "chunk" | |
| embedding_dimension: 384 | |
| inference_provider_id: sentence-transformers | |
| persistence: | |
| type: sqlite | |
| db_path: .llama/distributions/ollama/portal_rag_kvstore.db | |
| namespace: portal-rag | |
| - provider_id: solr-vector | |
| provider_type: remote::solr_vector_io | |
| config: | |
| solr_url: ${env.SOLR_URL:-http://localhost:8983/solr} | |
| collection_name: "portal-rag" | |
| vector_field: "chunk_vector" | |
| content_field: "chunk" | |
| embedding_dimension: 384 | |
| inference_provider_id: sentence-transformers | |
| persistence: | |
| type: sqlite | |
| db_path: .llama/distributions/ollama/portal_rag_kvstore.db | |
| namespace: portal-rag |
🤖 Prompt for AI Agents
In run.yaml around lines 118 to 130, the solr_url is hardcoded to
"http://localhost:8983/solr"; replace that literal with an environment-variable
reference (e.g. follow the pattern used on line 19) so the URL is configurable
at runtime. Update the solr_url field to read from a SOLR_URL (or similarly
named) env var, preserve quoting/format, and optionally provide a sensible
default/fallback if your template system supports it; ensure documentation and
deployment manifests (docker-compose / k8s) set that env var.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
run.yaml (1)
29-35: Fix duplicate top‑levelmodelsandvector_dbskeys in YAML.You now have two
models:mappings (lines 29–35 and 140–150) and twovector_dbs:mappings (lines 134–139 and 155), which YAMLlint flags as duplicate keys. Most parsers will silently let the last one win, so the earlier definitions are ignored.Consolidate these into single blocks, e.g.:
- Move all model declarations into the bottom
models:list and delete the earliermodels:at lines 29–35 (or vice‑versa).- Keep only one
vector_dbs:mapping (likely the newportal-ragblock) and remove the trailingvector_dbs: [].Until this is fixed, the effective config will be ambiguous and may not match your intent.
Also applies to: 134-150, 154-154
src/app/endpoints/streaming_query.py (3)
54-71: ResolveRAGChunk/ReferencedDocumentimport issues and redefinitions.
models.responsesapparently does not exportRAGChunk(Ruff reports “no name 'RAGChunk' in module 'models.responses'”), and you also importReferencedDocumentagain fromutils.endpoints. This leads to F811/F821 redefinition errors.- You only need a single canonical
ReferencedDocumenttype (and likely only frommodels.responses).Suggestion:
-from models.responses import ( - ForbiddenResponse, - RAGChunk, - ReferencedDocument, - UnauthorizedResponse, - ... -) -from utils.endpoints import ( - ReferencedDocument, - check_configuration_loaded, - ... -) +from models.responses import ( + ForbiddenResponse, + UnauthorizedResponse, + RAGChunk, # if actually defined there; otherwise import from its real module + ReferencedDocument, + ... +) +from utils.endpoints import ( + check_configuration_loaded, + cleanup_after_streaming, + create_rag_chunks_dict, + get_agent, + get_system_prompt, + validate_model_provider_override, +)Adjust the
RAGChunkimport to match where it is really defined. This will clear the redefinition/undefined-name warnings.
902-1064: Broken context wiring and dead nestedresponse_generatorblock.Inside
streaming_query_endpoint_handler_base:
- You define a second async
response_generator(lines 926–1047) that references many undefined names (interleaved_content_as_str,get_session,create_referenced_documents_with_metadata,CacheEntry) and acontextvariable that is never created.- The old
ResponseGeneratorContextinitialization is fully commented out (1049–1060), yet you still callcreate_response_generator_func(context)(line 1063), which will raiseNameErrorat runtime.- The nested
response_generatoris never used; the actual generator comes fromcreate_response_generator_func.This block needs to be either completed or removed. Given you already have
create_agent_response_generator, the simplest fix is to delete the unused nested generator and reinstate a validResponseGeneratorContext:- metadata_map: dict[str, dict[str, Any]] = {} - - async def response_generator(...): - ... - cache_entry = CacheEntry(...) - # Create context object for response generator -# context = ResponseGeneratorContext( -# ... -# ) + metadata_map: dict[str, dict[str, Any]] = {} + + context = ResponseGeneratorContext( + conversation_id=conversation_id, + user_id=user_id, + skip_userid_check=_skip_userid_check, + model_id=model_id, + provider_id=provider_id, + llama_stack_model_id=llama_stack_model_id, + query_request=query_request, + started_at=started_at, + client=client, + metadata_map=metadata_map, + )…and delete the entire nested
async def response_generator(...)block if it is not intended to be used anymore.As written, this function will not work and is the main source of the
context/undefined-name errors from Ruff and mypy.
1357-1455: Normalize vector DB toolgroup handling and RAG chunk typing inretrieve_response.In
retrieve_response(streaming path):
toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroupswill fail ifget_rag_toolgroupsreturnsNone(mypy error and potential TypeError).paramsandurljoinsuffer from the same typing issues asquery_vector_io_for_chunks.RAGChunk(content=chunk.content, source=source, ...)again passes non‑string content and potentially non‑string source.You can mirror the patterns used in the earlier suggestion:
- vector_dbs = await client.vector_dbs.list() - vector_db_ids = [vdb.identifier for vdb in vector_dbs] - mcp_toolgroups = [mcp_server.name for mcp_server in configuration.mcp_servers] - - toolgroups = None - if vector_db_ids: - toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups - elif mcp_toolgroups: - toolgroups = mcp_toolgroups + vector_dbs = await client.vector_dbs.list() + vector_db_ids = [vdb.identifier for vdb in vector_dbs] + mcp_toolgroups = [mcp_server.name for mcp_server in configuration.mcp_servers] + + rag_toolgroups = get_rag_toolgroups(vector_db_ids) if vector_db_ids else None + toolgroups = (rag_toolgroups or []) + mcp_toolgroups if mcp_toolgroups else rag_toolgroups @@ - params = {"k": 5, "score_threshold": 0.0} + params: dict[str, Any] = {"k": 5, "score_threshold": 0.0} @@ - parent_id = chunk.metadata.get("parent_id") - if parent_id: - source = urljoin( - "https://mimir.corp.redhat.com", parent_id - ) + parent_id = chunk.metadata.get("parent_id") + if parent_id and isinstance(parent_id, str): + source = urljoin( + "https://mimir.corp.redhat.com", parent_id + ) @@ - rag_chunks.append( - RAGChunk( - content=chunk.content, - source=source, - score=score, - ) - ) + rag_chunks.append( + RAGChunk( + content=str(chunk.content) if chunk.content else "", + source=source if isinstance(source, str) else None, + score=score, + ) + )This addresses the mypy errors and ensures
RAGChunkinstances have the expected types.src/app/endpoints/query.py (1)
582-613: Fix return type and docstring ofparse_metadata_from_text_item.Two small issues:
- The function is annotated as
Optional[ReferencedDocument]but returns alist[ReferencedDocument]whentext_itemis not aTextContentItem:docs: list[ReferencedDocument] = [] if not isinstance(text_item, TextContentItem): return docs # wrong type
- The docstring says it returns
ReferencedDocumentbut doesn’t mentionNone.Suggested fix:
- docs: list[ReferencedDocument] = [] - if not isinstance(text_item, TextContentItem): - return docs + if not isinstance(text_item, TextContentItem): + return None @@ - if url and title: - return ReferencedDocument(doc_url=url, doc_title=title, doc_id=None) + if url and title: + return ReferencedDocument(doc_url=url, doc_title=title, doc_id=None)Optionally update the docstring “Returns” section to mention that
Noneis returned when no valid metadata is found.
♻️ Duplicate comments (5)
src/app/endpoints/query.py (4)
78-82: Shared OFFLINE flag should be centralized (optional).You now have an
OFFLINEconstant here and instreaming_query.pywith identical semantics. Consider moving this to a shared config/constants module and reading from configuration/env rather than hard‑coding it in two places, to avoid divergence later.
874-885: EnsureTurnSummaryis fully populated and referenced documents include RAG docs.You now construct:
summary = TurnSummary( llm_response=..., tool_calls=[], rag_chunks=rag_chunks, ) referenced_documents = parse_referenced_documents(response) referenced_documents.extend(doc_ids_from_chunks)Two points:
- If
TurnSummaryalso expectstool_results, confirm that its default is optional; otherwise addtool_results=[]explicitly (similar to other code paths in the repo).- Extending
referenced_documentswithdoc_ids_from_chunksis correct, but ensuredoc_ids_from_chunkselements are built with the sameReferencedDocumenttype (which they are) and that the “Mimir” base URL is eventually configurable rather than hard‑coded.If
tool_resultsis required, adjust:summary = TurnSummary( llm_response=..., - tool_calls=[], - rag_chunks=rag_chunks, + tool_calls=[], + tool_results=[], + rag_chunks=rag_chunks, )Also applies to: 887-891
915-942: Fix “attatchment” typo in validation error messages.The error messages in
validate_attachments_metadatastill contain the typo:"Invalid attatchment type ..." "Invalid attatchment content type ..."This was previously flagged. Update to:
- message = ( - f"Invalid attatchment type {attachment.attachment_type}: " + message = ( + f"Invalid attachment type {attachment.attachment_type}: " @@ - message = ( - f"Invalid attatchment content type {attachment.content_type}: " + message = ( + f"Invalid attachment content type {attachment.content_type}: "This improves API error clarity and keeps tests/docs consistent.
766-826: Fix vector IOparamstyping, logging, and exception message formatting inretrieve_response.Issues in this block:
params = {"k": 5, "score_threshold": 0.0}is inferred asdict[str, float], which doesn’t match the more generaldict[str, bool | float | str | Iterable | object | None]expected byvector_io.query(mypy error).- Logging uses f-strings with potentially large
query_responsepayloads at INFO level (may leak content and is noisy).- Exception logging uses f-strings inside logger methods (
logger.info(f"...")) instead of%sformatting (style, not correctness).Suggested cleanup:
- params = {"k": 5, "score_threshold": 0.0} - logger.info(f"Initial params: {params}") - logger.info(f"query_request.solr: {query_request.solr}") + params: dict[str, Any] = {"k": 5, "score_threshold": 0.0} + logger.info("Initial params: %s", params) + logger.info("query_request.solr: %s", query_request.solr) @@ - params["solr"] = query_request.solr - logger.info(f"Final params with solr filters: {params}") + params["solr"] = query_request.solr + logger.info("Final params with solr filters: %s", params) @@ - logger.info(f"Final params being sent to vector_io.query: {params}") + logger.info("Final params being sent to vector_io.query: %s", params) @@ - logger.info(f"The query response total payload: {query_response}") + logger.debug( + "Vector IO query returned %d chunks", + len(query_response.chunks) if query_response.chunks else 0, + ) @@ - except Exception as e: - logger.warning(f"Failed to query vector database for chunks: {e}") - logger.debug(f"Vector DB query error details: {traceback.format_exc()}") + except Exception as e: + logger.warning("Failed to query vector database for chunks: %s", e) + logger.debug("Vector DB query error details: %s", traceback.format_exc())This addresses the mypy error and improves logging hygiene.
src/app/endpoints/streaming_query.py (1)
1153-1262: Type and logging issues in query_vector_io_for_chunks. Several problems here affect type-checking and logging:
params = {"k": 5, "score_threshold": 0.0}is inferred asdict[str, float], which doesn’t match the vector_ioparamstype.parent_id = chunk.metadata.get("parent_id")is typed asobject; callingurljoin(..., parent_id)without a guard causes mypy/pyright errors and could pass non‑strings at runtime.RAGChunk(content=chunk.content, source=source, ...)passes anInterleavedContent(or similar) object into astrfield, andsourcemay not be astr.logger.info("The query response total payload: %s", query_response)logs the entire payload at INFO, which is noisy and may leak content (similar to past comments).Suggested fixes:
- params = {"k": 5, "score_threshold": 0.0} + params: dict[str, Any] = {"k": 5, "score_threshold": 0.0} @@ - logger.info("The query response total payload: %s", query_response) + logger.debug( + "Vector IO query returned %d chunks", + len(query_response.chunks) if query_response.chunks else 0, + ) @@ - if OFFLINE: - parent_id = chunk.metadata.get("parent_id") - if parent_id: - source = urljoin( - "https://mimir.corp.redhat.com", parent_id - ) + if OFFLINE: + parent_id = chunk.metadata.get("parent_id") + if parent_id and isinstance(parent_id, str): + source = urljoin( + "https://mimir.corp.redhat.com", parent_id + ) @@ - final_rag_chunks.append( - RAGChunk( - content=chunk.content, - source=source, - score=score, - ) - ) + final_rag_chunks.append( + RAGChunk( + content=str(chunk.content) if chunk.content else "", + source=source if isinstance(source, str) else None, + score=score, + ) + )Also consider moving the hard‑coded Mimir base URL into configuration (as prior reviews suggested).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
run.yaml (2 hunks)
src/app/endpoints/query.py (8 hunks)
src/app/endpoints/streaming_query.py (10 hunks)
src/models/requests.py (2 hunks)
src/models/responses.py (1 hunks)
tests/unit/app/endpoints/test_conversations_v2.py (1 hunks)
tests/unit/app/endpoints/test_query.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/models/requests.py
- src/models/responses.py
🧰 Additional context used
📓 Path-based instructions (5)
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
tests/unit/app/endpoints/test_conversations_v2.py
tests/unit/app/endpoints/test_query.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use
pytest-mock with AsyncMock objects for mocking in tests
Files:
tests/unit/app/endpoints/test_conversations_v2.py
tests/unit/app/endpoints/test_query.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error
Files:
src/app/endpoints/streaming_query.py
src/app/endpoints/query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI
HTTPException with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/streaming_query.py
src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle
APIConnectionError from Llama Stack in integration code
Files:
src/app/endpoints/streaming_query.py
src/app/endpoints/query.py
🧠 Learnings (1)
📚 Learning: 2025-10-29T13:05:22.438Z
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).
Applied to files:
src/app/endpoints/streaming_query.py
🪛 GitHub Actions: Black
src/app/endpoints/streaming_query.py
[error] 1-1: Black formatting check failed. 1 file would be reformatted.
🪛 GitHub Actions: Integration tests
src/app/endpoints/query.py
[error] 1-1: ModuleNotFoundError: No module named 'llama_stack_client.types.agents' while importing module 'app.endpoints.query'.
🪛 GitHub Actions: Python linter
src/app/endpoints/streaming_query.py
[warning] 54-54: Reimport 'ForbiddenResponse' (imported line 54) (reimported)
[warning] 54-54: Reimport 'UnauthorizedResponse' (imported line 54) (reimported)
[error] 54-54: No name 'RAGChunk' in module 'models.responses' (no-name-in-module)
[warning] 70-70: Reimport 'ReferencedDocument' (imported line 54) (reimported)
[error] 159-159: Too many arguments (7/5) (too-many-arguments)
[error] 159-159: Too many positional arguments (7/5) (too-many-positional-arguments)
[warning] 164-164: Unused argument 'referenced_documents' (unused-argument)
[error] 1016-1016: Undefined variable 'get_session' (undefined-variable)
[error] 1029-1029: Undefined variable 'create_referenced_documents_with_metadata' (undefined-variable)
[error] 1037-1037: Undefined variable 'CacheEntry' (undefined-variable)
[warning] 1015-1015: Unused variable 'topic_summary' (unused-variable)
[warning] 1037-1037: Unused variable 'cache_entry' (unused-variable)
[error] 1063-1063: Undefined variable 'context' (undefined-variable)
[warning] 1435-1435: Catching too general exception Exception (broad-exception-caught)
[error] 1265-1265: Too many statements (74/50) (too-many-statements)
🪛 GitHub Actions: Ruff
src/app/endpoints/streaming_query.py
[error] 55-55: F811: Redefinition of unused 'ForbiddenResponse'.
[error] 67-67: F811: Redefinition of unused 'UnauthorizedResponse'.
[error] 71-71: F821: Undefined or redefined name 'ReferencedDocument'.
[error] 960-960: F821: Undefined name 'interleaved_content_as_str'.
[error] 1016-1016: F821: Undefined name 'get_session'.
[error] 1023-1023: F841: Local variable 'topic_summary' assigned to but never used.
[error] 1029-1029: F821: Undefined name 'create_referenced_documents_with_metadata'.
[error] 1037-1037: F841: Local variable 'cache_entry' assigned to but never used.
[error] 1037-1037: F821: Undefined name 'CacheEntry'.
[error] 1063-1063: F811: Redefinition of unused 'response_generator'.
[error] 1063-1063: F821: Undefined name 'context'.
src/app/endpoints/query.py
[error] 22-22: F401: Remove unused import: 'Document' from llama_stack_client.types.agents.turn_create_params.
🪛 GitHub Actions: Type checks
tests/unit/app/endpoints/test_conversations_v2.py
[error] Process completed with exit code 1.
tests/unit/app/endpoints/test_query.py
[error] Process completed with exit code 1.
src/app/endpoints/streaming_query.py
[error] 809-809: mypy: Argument 4 to 'stream_end_event' has incompatible type 'Any | list[Any]'; expected 'dict[str, int]'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 960-960: mypy: Name 'interleaved_content_as_str' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1016-1016: mypy: Name 'get_session' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1029-1029: mypy: Name 'create_referenced_documents_with_metadata' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1037-1037: mypy: Name 'CacheEntry' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1063-1063: mypy: Name 'context' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1190-1190: mypy: Argument 'params' to 'query' of 'AsyncVectorIoResource' has incompatible type 'dict[str, float]'; expected 'dict[str, bool | float | str | Iterable[object] | object | None] | Omit'. (Note: dict is invariant; consider using Mapping) (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1238-1238: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1400-1400: mypy: Argument 'params' to 'query' of 'AsyncVectorIoResource' has incompatible type 'dict[str, float]'; expected 'dict[str, bool | float | str | Iterable[object] | object | None] | Omit'. (Note: dict is invariant; consider using Mapping) (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1400-1400: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1414-1414: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] Process completed with exit code 1.
src/app/endpoints/query.py
[error] 748-748: mypy: Unsupported left operand type for + ('None'). (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 788-788: mypy: Argument 'params' to 'query' of 'AsyncVectorIoResource' has incompatible type 'dict[str, float]'; expected 'dict[str, bool | float | str | Iterable[object] | object | None] | Omit'. (Note: dict is invariant; consider using Mapping) (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 835-835: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] Process completed with exit code 1.
run.yaml
[error] Process completed with exit code 1.
🪛 GitHub Actions: Unit tests
tests/unit/app/endpoints/test_query.py
[error] 22-22: ImportError while importing test module due to missing dependency ('llama_stack_client.types.agents').
src/app/endpoints/query.py
[error] 20-20: ModuleNotFoundError: No module named 'llama_stack_client.types.agents'. This import error occurs while importing Turn from llama_stack_client.types.agents in app.endpoints.query, causing test collection to fail during the 'uv run pytest tests/unit --cov=src --cov=runner --cov-report term-missing' step.
🪛 YAMLlint (1.37.1)
run.yaml
[error] 140-140: duplication of key "models" in mapping
(key-duplicates)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E: server mode / azure
🔇 Additional comments (4)
tests/unit/app/endpoints/test_conversations_v2.py (1)
93-97: Updated doc_url assertion matches widened type. Using
str(ref_docs[0]["doc_url"]) == "http://example.com" is consistent with ReferencedDocument.doc_url now being either a plain string or AnyUrl; the test is correct and resilient.
Also applies to: 111-116
tests/unit/app/endpoints/test_query.py (1)
1009-1023: Doc URL expectations correctly adapted to string/URL union. The updated assertions (
str(doc.doc_url) == ...) and the "malformed URL" test expecting a non-None ReferencedDocument with raw doc_url align with the new ReferencedDocument.doc_url semantics. The test logic itself looks sound; remaining test failures stem from import errors in app.endpoints.query.
Also applies to: 1041-1051, 1070-1075, 1097-1103
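To make the adapted pattern concrete, here is a minimal sketch of such an assertion; the import path and field values are assumptions based on this review, not copied from the actual test file:

```python
from models.responses import ReferencedDocument


def test_doc_url_keeps_raw_string() -> None:
    # doc_url now accepts a plain string as well as AnyUrl, so tests compare via str()
    doc = ReferencedDocument(doc_url="not a well-formed url", doc_title="Sample doc")
    assert str(doc.doc_url) == "not a well-formed url"
    assert doc.doc_title == "Sample doc"
```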
src/app/endpoints/query.py (2)
299-310: query_endpoint_handler_base now depends on extended retrieve_response contract — ensure all call sites match. You've updated the base handler to expect
retrieve_response_func to return (summary, conversation_id, referenced_documents, token_usage) and to pass provider_id as a keyword argument. That's correct for the new retrieve_response in this module, but any other callers (e.g., alternative backends) must maintain this contract. Double-check any other
query_endpoint_handler_base(..., retrieve_response_func=...)usages to ensure they:
- Accept
provider_idas kwarg in theirretrieve_responseimplementation.- Return the 4‑tuple so
token_usageandreferenced_documentsare well‑defined.Within this file, the wiring looks coherent.
Also applies to: 333-350
7-25: The import paths in the current code are correct for llama-stack-client==0.3.0. According to official documentation, the imports should use llama_stack_client.types.agents (not the alpha namespace): from llama_stack_client.types.agents import Turn, TurnCreateResponse. The current code's import structure aligns with this. The
ModuleNotFoundError cited in the review comment is not due to incorrect import paths — reverting to types.alpha.agents would introduce the actual error. If CI is failing, the issue lies elsewhere (e.g., dependency installation, version mismatch, or a different module). Regarding the unused
Document import: if this is truly unused, it should be removed per code cleanliness standards, but this is a minor concern unrelated to the import path issue.
Likely an incorrect or invalid review comment.
# Include RAG toolgroups when vector DBs are available
vector_dbs = await client.vector_dbs.list()
vector_db_ids = [vdb.identifier for vdb in vector_dbs]
mcp_toolgroups = [mcp_server.name for mcp_server in configuration.mcp_servers]

toolgroups = None
if vector_db_ids:
    toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups
elif mcp_toolgroups:
    toolgroups = mcp_toolgroups

# Convert empty list to None for consistency with existing behavior
Guard get_rag_toolgroups result when building toolgroups.
Here:
vector_dbs = await client.vector_dbs.list()
vector_db_ids = [vdb.identifier for vdb in vector_dbs]
mcp_toolgroups = [mcp_server.name for mcp_server in configuration.mcp_servers]
toolgroups = None
if vector_db_ids:
toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups
elif mcp_toolgroups:
toolgroups = mcp_toolgroups
get_rag_toolgroups can return None, so None + mcp_toolgroups will raise a TypeError and mypy flags this.
Use an intermediate variable and coerce None to []:
- toolgroups = None
- if vector_db_ids:
- toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups
- elif mcp_toolgroups:
- toolgroups = mcp_toolgroups
+ toolgroups = None
+ if vector_db_ids:
+ rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
+ toolgroups = rag_toolgroups + mcp_toolgroups
+ elif mcp_toolgroups:
+ toolgroups = mcp_toolgroups
This will satisfy mypy and avoid runtime failures when no RAG toolgroups are available.
🧰 Tools
🪛 GitHub Actions: Type checks
[error] 748-748: mypy: Unsupported left operand type for + ('None'). (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 741 to 752, the code adds the return
value of get_rag_toolgroups (which can be None) to mcp_toolgroups causing a
TypeError and mypy failure; call get_rag_toolgroups into an intermediate
variable, coerce None to an empty list (e.g., rag_toolgroups =
get_rag_toolgroups(vector_db_ids) or []), then set toolgroups = rag_toolgroups +
mcp_toolgroups (or toolgroups = mcp_toolgroups if no vector_db_ids), and finally
convert an empty toolgroups list to None to preserve existing behavior.
for i, chunk in enumerate(retrieved_chunks):
    # Extract source from chunk metadata based on OFFLINE flag
    source = None
    if chunk.metadata:
        if OFFLINE:
            parent_id = chunk.metadata.get("parent_id")
            if parent_id:
                source = urljoin("https://mimir.corp.redhat.com", parent_id)
        else:
            source = chunk.metadata.get("reference_url")

    # Get score from retrieved_scores list if available
    score = retrieved_scores[i] if i < len(retrieved_scores) else None

    rag_chunks.append(
        RAGChunk(
            content=chunk.content,
            source=source,
            score=score,
        )
    )

logger.info(f"Retrieved {len(rag_chunks)} chunks from vector DB")
Normalize urljoin parent_id type and RAGChunk field types.
Within the loop that builds rag_chunks:
if OFFLINE:
parent_id = chunk.metadata.get("parent_id")
if parent_id:
source = urljoin("https://mimir.corp.redhat.com", parent_id)
...
rag_chunks.append(
RAGChunk(
content=chunk.content,
source=source,
score=score,
)
)
Problems:
- parent_id is typed as object; passing it directly to urljoin triggers mypy/pyright errors and might not be a string at runtime.
- RAGChunk.content is presumably str, but chunk.content is an interleaved content object.
Fix:
- if OFFLINE:
- parent_id = chunk.metadata.get("parent_id")
- if parent_id:
- source = urljoin("https://mimir.corp.redhat.com", parent_id)
+ if OFFLINE:
+ parent_id = chunk.metadata.get("parent_id")
+ if parent_id and isinstance(parent_id, str):
+ source = urljoin("https://mimir.corp.redhat.com", parent_id)
@@
- rag_chunks.append(
- RAGChunk(
- content=chunk.content,
- source=source,
- score=score,
- )
- )
+ rag_chunks.append(
+ RAGChunk(
+ content=str(chunk.content) if chunk.content else "",
+ source=source if isinstance(source, str) else None,
+ score=score,
+ )
+ )This resolves the urljoin type error and ensures RAGChunk fields have the expected primitive types.
Also applies to: 835-835
🧰 Tools
🪛 GitHub Actions: Type checks
[error] 835-835: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 828-851, normalize types before
constructing RAGChunk: ensure parent_id is coerced to a string before calling
urljoin (e.g. parent_id = chunk.metadata.get("parent_id"); if parent_id is not
None: parent_id = str(parent_id); source =
urljoin("https://mimir.corp.redhat.com", parent_id)), ensure content is a
primitive string (extract a .text attribute if present or fallback to
str(chunk.content) so RAGChunk.content is str), and coerce score to a float or
None (e.g. score = float(retrieved_scores[i]) if i < len(retrieved_scores) and
retrieved_scores[i] is not None else None) before creating the RAGChunk.
def stream_end_event(
    metadata_map: dict,
    summary: TurnSummary,
    token_usage: TokenCounter,
    available_quotas: dict[str, int],
    referenced_documents: list[ReferencedDocument],
    media_type: str = MEDIA_TYPE_JSON,
    vector_io_referenced_docs: list[ReferencedDocument] | None = None,
) -> str:
Fix stream_end_event signature usage and argument ordering.
stream_end_event now expects:
def stream_end_event(
metadata_map: dict,
summary: TurnSummary,
token_usage: TokenCounter,
available_quotas: dict[str, int],
referenced_documents: list[ReferencedDocument],
media_type: str = MEDIA_TYPE_JSON,
vector_io_referenced_docs: list[ReferencedDocument] | None = None,
) -> str:But create_agent_response_generator calls it as:
yield stream_end_event(
context.metadata_map,
token_usage,
available_quotas,
referenced_documents,
media_type,
)So summary is never passed and the remaining args are shifted (mypy error plus wrong runtime behavior).
Update the call to match the signature:
- yield stream_end_event(
- context.metadata_map,
- token_usage,
- available_quotas,
- referenced_documents,
- media_type,
- )
+ yield stream_end_event(
+ context.metadata_map,
+ summary,
+ token_usage,
+ available_quotas,
+ referenced_documents,
+ media_type,
+ vector_io_referenced_docs=None,
+ )(or pass through any vector‑IO docs via context if you add them there).
Also applies to: 793-811
🧰 Tools
🪛 GitHub Actions: Python linter
[error] 159-159: Too many arguments (7/5) (too-many-arguments)
[error] 159-159: Too many positional arguments (7/5) (too-many-positional-arguments)
[warning] 164-164: Unused argument 'referenced_documents' (unused-argument)
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 159-167, the
stream_end_event signature expects (metadata_map, summary, token_usage,
available_quotas, referenced_documents, media_type=...,
vector_io_referenced_docs=None) but calls in create_agent_response_generator
(and also at lines ~793-811) pass arguments in the wrong order omitting summary;
update those calls to pass the TurnSummary value as the second positional
argument so the call becomes: stream_end_event(context.metadata_map, summary,
token_usage, available_quotas, referenced_documents, media_type,
vector_io_referenced_docs) — or if vector_io_referenced_docs lives on context,
pass context.vector_io_referenced_docs for that parameter; ensure all calls
match the signature and that mypy/type checks are satisfied.
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Actionable comments posted: 2
♻️ Duplicate comments (8)
src/app/endpoints/query.py (7)
817-818: Remove redundant local import.
RAGChunk and ReferencedDocument are imported locally inside the function, which is inefficient. Import them at module level instead. This was previously flagged.
897-908: Include tool_results in TurnSummary construction. The
TurnSummary construction is missing the required tool_results parameter. This was previously flagged and will cause a TypeError at runtime.
🔎 Previously suggested fix
summary = TurnSummary(
    llm_response=(
        content_to_str(response.output_message.content)
        if (
            getattr(response, "output_message", None) is not None
            and getattr(response.output_message, "content", None) is not None
        )
        else ""
    ),
    tool_calls=[],
+   tool_results=[],
    rag_chunks=rag_chunks,
)
948-949: Fix typo: "attatchment" should be "attachment". There's a typo in the error message at line 948 (and also line 958). This was previously flagged but remains unresolved.
764-777: Fix type error: get_rag_toolgroups can return None. At line 771,
get_rag_toolgroups(vector_db_ids) can return None, causing a TypeError when concatenating with mcp_toolgroups. This issue was previously flagged but remains unresolved.
🔎 Previously suggested fix
- toolgroups = None
- if vector_db_ids:
-     toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups
- elif mcp_toolgroups:
-     toolgroups = mcp_toolgroups
+ toolgroups = None
+ if vector_db_ids:
+     rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
+     toolgroups = rag_toolgroups + mcp_toolgroups
+ elif mcp_toolgroups:
+     toolgroups = mcp_toolgroups
814-814: Avoid logging entire query response payload at INFO level. Logging the full
query_response may expose sensitive document content. This was previously flagged but remains unresolved.
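A minimal sketch of the change being asked for here, reusing names from the diffs above (logger and query_response are assumed to exist in the surrounding scope):

```python
# Log only an aggregate at DEBUG instead of dumping the whole vector-store payload at INFO.
logger.debug(
    "Vector IO query returned %d chunks",
    len(query_response.chunks) if query_response.chunks else 0,
)
```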
850-871: Fix type errors in RAGChunk construction. Pipeline failures indicate:
- Line 858:
parent_id is typed as object, causing a type error with urljoin which expects str.
- chunk.content may not be a primitive string type expected by RAGChunk.content.
if OFFLINE: parent_id = chunk.metadata.get("parent_id") - if parent_id: - source = urljoin("https://mimir.corp.redhat.com", parent_id) + if parent_id and isinstance(parent_id, str): + source = urljoin("https://mimir.corp.redhat.com", parent_id) @@ rag_chunks.append( RAGChunk( - content=chunk.content, - source=source, + content=str(chunk.content) if chunk.content else "", + source=source if isinstance(source, str) else None, score=score, ) )
799-812: Fix type incompatibility forparamsdictionary.Pipeline reports that
params at line 811 has type dict[str, float] but vector_io.query expects dict[str, bool | float | str | Iterable[object] | object | None] | Omit. The issue arises because params["solr"] is assigned a complex object (line 804).
Explicitly type
paramsto satisfy the broader type requirement:- params = {"k": 5, "score_threshold": 0.0} + params: dict[str, Any] = {"k": 5, "score_threshold": 0.0}run.yaml (1)
101-113: Hardcoded Solr URL should use environment variable.The
solr_url: "http://localhost:8983/solr"at line 104 remains hardcoded and won't work in containerized or distributed deployments. This was previously flagged but has not been addressed.
🧹 Nitpick comments (1)
src/app/endpoints/query.py (1)
79-82: Move OFFLINE flag to configuration.The module-level
OFFLINE = True constant should be sourced from configuration or environment variables rather than hardcoded. This will allow runtime control without code changes.
🔎 Suggested approach
Add to your configuration class or read from environment:
# In configuration module OFFLINE_MODE = os.getenv("LIGHTSPEED_OFFLINE_MODE", "true").lower() == "true"Then import and use in this module:
from configuration import configuration # Later in code if configuration.offline_mode: # ...
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
run.yaml
src/app/endpoints/query.py
src/models/requests.py
tests/unit/app/endpoints/test_conversations_v2.py
tests/unit/app/endpoints/test_query.py
🚧 Files skipped from review as they are similar to previous changes (3)
- src/models/requests.py
- tests/unit/app/endpoints/test_conversations_v2.py
- tests/unit/app/endpoints/test_query.py
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g.,from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Uselogger = logging.getLogger(__name__)pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) andOptional[Type]orType | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Useasync deffor I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes:Configurationfor config classes,Error/Exceptionfor exceptions,Resolverfor strategy patterns,Interfacefor abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Useimport loggingand module logger pattern with standard log levels: debug, info, warning, error
Files:
src/app/endpoints/query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI
HTTPExceptionwith appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle
APIConnectionErrorfrom Llama Stack in integration code
Files:
src/app/endpoints/query.py
🧠 Learnings (1)
📚 Learning: 2025-12-18T10:21:03.056Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:03.056Z
Learning: In run.yaml for Llama Stack 0.3.x, do not add a telemetry provider block under providers. To enable telemetry, set telemetry.enabled: true directly (no provider block required). This pattern applies specifically to run.yaml configuration in this repository.
Applied to files:
run.yaml
🧬 Code graph analysis (1)
src/app/endpoints/query.py (2)
src/models/responses.py (1)
ReferencedDocument (328-340)
src/utils/types.py (1)
RAGChunk(128-133)
🪛 GitHub Actions: Integration tests
src/app/endpoints/query.py
[error] 20-20: ImportError: No module named 'llama_stack_client.types.agents'.
🪛 GitHub Actions: Ruff
src/app/endpoints/query.py
[error] 22-22: F401: Unused import Document from llama_stack_client.types.agents.turn_create_params. Remove unused import.
🪛 GitHub Actions: Type checks
src/app/endpoints/query.py
[error] 771-771: mypy: Unsupported left operand type for + ("None").
[error] 811-811: mypy: Argument "params" to "query" of "AsyncVectorIoResource" has incompatible type "dict[str, float]"; expected "dict[str, bool | float | str | Iterable[object] | object | None] | Omit".
[error] 811-811: mypy: Note: "dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance.
[error] 858-858: mypy: Value of type variable "AnyStr" of "urljoin" cannot be "object".
🪛 GitHub Actions: Unit tests
src/app/endpoints/query.py
[error] 20-20: ModuleNotFoundError: No module named 'llama_stack_client.types.agents' during pytest collection. Command: 'uv run pytest tests/unit --cov=src --cov=runner --cov-report term-missing'.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build-pr
- GitHub Check: E2E: server mode / vertexai
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E: server mode / azure
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (1)
src/app/endpoints/query.py (1)
875-887: RAG context injection implementation looks good. The logic correctly formats the top 5 RAG chunks and injects them into the user message. The approach of appending context to the query is appropriate for providing relevant documentation to the LLM.
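For readers skimming the thread, a rough sketch of the injection pattern this comment refers to; the helper below is illustrative, assumes RAGChunk exposes content and source fields, and is not the exact code in query.py:

```python
from utils.types import RAGChunk


def build_rag_augmented_query(query: str, rag_chunks: list[RAGChunk], top_k: int = 5) -> str:
    """Append the top-k retrieved chunks to the user query as extra context."""
    top_chunks = rag_chunks[:top_k]
    if not top_chunks:
        return query
    context_lines = [
        f"[{i + 1}] {chunk.content} (source: {chunk.source})"
        for i, chunk in enumerate(top_chunks)
    ]
    return query + "\n\nRelevant documentation:\n" + "\n".join(context_lines)
```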
from llama_stack_client.types.agents.turn import Turn
from llama_stack_client.types.agents.turn_create_params import (
    Document,
    Toolgroup,
    ToolgroupAgentToolGroupWithArgs,
)
🧩 Analysis chain
🏁 Script executed:
cat -n src/app/endpoints/query.py | head -50
Repository: lightspeed-core/lightspeed-stack
Length of output: 208
🏁 Script executed:
rg "Document" src/app/endpoints/query.py -nRepository: lightspeed-core/lightspeed-stack
Length of output: 916
🏁 Script executed:
fd "llama_stack" --type f | head -20Repository: lightspeed-core/lightspeed-stack
Length of output: 971
🏁 Script executed:
rg "from llama_stack_client" src/ -n | head -20Repository: lightspeed-core/lightspeed-stack
Length of output: 1789
🏁 Script executed:
cat -n src/app/endpoints/query.py | head -80Repository: lightspeed-core/lightspeed-stack
Length of output: 3331
🏁 Script executed:
# Check the correct import paths more thoroughly
rg "Turn|Document" src/app/endpoints/query.py -B2 -A2 -n | head -60Repository: lightspeed-core/lightspeed-stack
Length of output: 2104
🏁 Script executed:
# Verify the actual import path for Turn and Document in llama_stack_client
rg "from llama_stack_client.types" src/ --type py | grep -E "(turn|agents)" | sort -uRepository: lightspeed-core/lightspeed-stack
Length of output: 730
🏁 Script executed:
# Check if Document is actually used anywhere in query.py
rg -n "Document" src/app/endpoints/query.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 916
🏁 Script executed:
# Also check Toolgroup and ToolgroupAgentToolGroupWithArgs usage
rg -n "Toolgroup|ToolgroupAgentToolGroupWithArgs" src/app/endpoints/query.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 289
🏁 Script executed:
# Verify the full file length to see if Document is used elsewhere
wc -l src/app/endpoints/query.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 106
🏁 Script executed:
# Double-check that Document isn't used anywhere else in the file besides import and comments
rg -n "Document[^a-zA-Z]" src/app/endpoints/query.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 838
🏁 Script executed:
# Verify the correct import path by checking what llama_stack_client actually exports
python3 << 'EOF'
# Check if we can find documentation or inspect the actual module structure
import subprocess
result = subprocess.run(
['rg', '-n', 'class Turn|class Document', 'src/', '--type', 'py', '-A', '1'],
capture_output=True,
text=True
)
print("Local class definitions:")
print(result.stdout[:2000] if result.stdout else "No local definitions found")
EOFRepository: lightspeed-core/lightspeed-stack
Length of output: 221
Fix import paths for llama_stack_client types.
The import paths at lines 20-25 use incorrect module paths. They should use llama_stack_client.types.alpha.agents instead of llama_stack_client.types.agents. This is consistent with other files in the codebase (src/models/requests.py, src/utils/token_counter.py, src/metrics/utils.py, etc.).
Additionally, remove the unused Document import on line 22—it is not used anywhere in the file except in a commented line.
Corrected imports should be:
from llama_stack_client.types.alpha.agents.turn import Turn
from llama_stack_client.types.alpha.agents.turn_create_params import Toolgroup, ToolgroupAgentToolGroupWithArgs
🧰 Tools
🪛 GitHub Actions: Integration tests
[error] 20-20: ImportError: No module named 'llama_stack_client.types.agents'.
🪛 GitHub Actions: Ruff
[error] 22-22: F401: Unused import Document from llama_stack_client.types.agents.turn_create_params. Remove unused import.
🪛 GitHub Actions: Unit tests
[error] 20-20: ModuleNotFoundError: No module named 'llama_stack_client.types.agents' during pytest collection. Command: 'uv run pytest tests/unit --cov=src --cov=runner --cov-report term-missing'.
🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 20 to 25, the imports use the wrong
module path and include an unused Document import; update the import paths to
use llama_stack_client.types.alpha.agents and remove Document. Specifically
replace the two imports so they read: import Turn from
llama_stack_client.types.alpha.agents.turn and import Toolgroup and
ToolgroupAgentToolGroupWithArgs from
llama_stack_client.types.alpha.agents.turn_create_params, and remove the unused
Document name from the import list.
title = data.get("title")
if url and title:
-   return ReferencedDocument(doc_url=url, doc_title=title)
+   return ReferencedDocument(doc_url=url, doc_title=title, doc_id=None)
🧩 Analysis chain
🏁 Script executed:
# Find ReferencedDocument class definition
rg "class ReferencedDocument" -A 15 --type pyRepository: lightspeed-core/lightspeed-stack
Length of output: 938
🏁 Script executed:
# Check line 631 in context
head -650 src/app/endpoints/query.py | tail -30Repository: lightspeed-core/lightspeed-stack
Length of output: 1195
🏁 Script executed:
# Search for the typo "attatchment" mentioned in the scratchpad
rg "attatchment" src/app/endpoints/query.py -nRepository: lightspeed-core/lightspeed-stack
Length of output: 238
🏁 Script executed:
# Check if ReferencedDocument has any ConfigDict or extra field configuration
rg "class ReferencedDocument" -B 5 -A 20 src/models/responses.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 807
🏁 Script executed:
# Check if BaseModel is configured to allow extra fields globally
rg "ConfigDict\|config_dict\|extra.*allow" src/models/responses.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Check context around lines 912-913 for doc_ids_from_chunks
sed -n '900,920p' src/app/endpoints/query.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 793
🏁 Script executed:
# Check more context around the ReferencedDocument instantiation and see where doc_ids_from_chunks is used
sed -n '920,960p' src/app/endpoints/query.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 1750
🏁 Script executed:
# Check if doc_id is defined anywhere in ReferencedDocument or if there's a mismatch
rg "doc_id" src/models/responses.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 58
Remove doc_id=None parameter from ReferencedDocument instantiation at line 631.
The ReferencedDocument model in src/models/responses.py defines only doc_url and doc_title fields. Passing doc_id=None at line 631 will cause a Pydantic ValidationError at runtime.
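As a self-contained illustration of the failure mode described above: the stand-in model below is hypothetical and assumes the real ReferencedDocument rejects undeclared fields (for example via extra="forbid"); with Pydantic's default settings an unknown keyword would simply be ignored instead.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class ReferencedDocumentSketch(BaseModel):
    """Stand-in with only the two declared fields."""

    model_config = ConfigDict(extra="forbid")
    doc_url: str
    doc_title: str


try:
    # doc_id is not a declared field, so validation fails when extras are forbidden.
    ReferencedDocumentSketch(doc_url="http://example.com", doc_title="Example", doc_id=None)
except ValidationError as exc:
    print(exc.errors()[0]["type"])  # "extra_forbidden"
```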
🤖 Prompt for AI Agents
In src/app/endpoints/query.py around line 631, the instantiation of
ReferencedDocument passes an extraneous doc_id=None which does not exist on the
model and will cause a Pydantic ValidationError; remove the doc_id=None argument
so you only pass doc_url=url and doc_title=title when constructing
ReferencedDocument.
Description
This PR introduces several changes to the Lightspeed Stack query pipeline with a focus on Solr vector I/O support, streaming query, and Solr-specific filtering support within query requests.
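As an illustration only: the shape of the "solr" block below is an assumption based on the review discussion of query_request.solr being forwarded as params["solr"], not a documented contract.

```python
# Hypothetical /v1/query payload carrying Solr keyword filters alongside the question.
query_payload = {
    "query": "How do I rotate the cluster certificates?",
    "solr": {
        # Assumed filter-query style; the vector IO provider decides how these are applied.
        "fq": ["product:openshift", "version:4.15"],
    },
}
```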
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Chores
Bug Fixes