chore(phenoAI): add justfile#54
Conversation
Adds the test-coverage governance triangle for phenoAI:
.codecov.yml — Codecov config: 60% project + patch target,
5% threshold, fail CI on drop
tarpaulin.toml — cargo-tarpaulin config: workspace-wide, all
features, 120s timeout, 60% fail-under floor,
Cobertura + HTML output for Codecov
.github/workflows/coverage.yml — CI job: ubuntu-24.04, installs
cargo-tarpaulin, runs cargo test + tarpaulin,
uploads cobertura.xml to Codecov, archives
tarpaulin-report.html as 14-day artifact
No code changes in this commit. Sets up the measurement scaffolding
so the next commit (unit tests + BDD) can be evaluated against the
60% coverage floor defined in tarpaulin.toml.
Targets (from PLAN.md):
llm-router 0% → 80%
mcp-server 0% → 80%
pheno-embedding 0% → 70%
Fills the governance triangle that was previously missing:
SPEC.md — Living specification
* Purpose, workspace layout, design principles
* Per-crate contracts (llm-router, mcp-server, pheno-embedding)
* Cross-cutting concerns (logging, config, telemetry)
* Test & coverage governance (60% floor, BDD, CI matrix)
* Open questions and cross-references
PLAN.md — Living plan
* Completed (workspace skeleton, trait abstraction, FR, AGENTS, 1 ADR)
* In progress (coverage governance, 2 more ADRs, BDD features)
* Backlog (Anthropic provider, local embeddings, streaming, OTel)
* Test coverage roadmap with per-crate targets
* Governance roadmap
ADR-0002 — Rust workspace crate split (Accepted)
* Why 3 crates vs 1 (or 2, or feature flags)
* Trade-offs: dep footprint vs boilerplate
* Cross-references to SPEC and ADR-0001
ADR-0003 — Anthropic provider scope (Deferred)
* Why defer to v0.1: API shape differs, OpenAI covers 80% of
immediate needs, trait boundary is the right cut
* When to revisit (2nd downstream asks, signed agreement, etc.)
* Proposed v0.1 shape with code sketch
ADR-0004 — Local embeddings (fastembed) future (Deferred)
* Why defer to v0.2: ~50MB ONNX runtime, no benchmarks yet
* When to revisit (offline, cost, system-package ONNX)
* Proposed v0.2 shape with code sketch
Together: 1 ADR (template) → 4 ADRs, 0 → 2 governance docs.
No code changes. Pure documentation scaffolding.
One .feature per crate, each covering the critical-path user journey
in Gherkin form so BDD tooling (cucumber-rs, godog-rust, etc.) can
parse and drive them.
tests/features/llm-router/llm_routing.feature
* Background: 2 providers configured with priority
* @Happy-Path primary succeeds, no fallback
* @fallback primary fails, secondary used
* @error all providers fail, bounded retry respected
* @observability tracing event with provider/latency/prompt_tokens
tests/features/mcp-server/mcp_protocol.feature
* Background: server with a tool + a resource registered
* @Happy-Path client lists tools, calls tool, reads resource
* @error unknown tool returns McpError::ToolNotFound
* @resource resource read with correct MIME type
tests/features/pheno-embedding/embedding_generation.feature
* Background: OpenAiEmbeddings with default model
* @Happy-Path single text → 1 vector × 1536 dims
* @Batch N texts → N vectors, input order preserved
* @error empty input, 401 auth failure
These features document the acceptance criteria that the unit tests
in the next commit (and the coverage workflow in the first commit)
are validating against. They are not yet wired to a step-implementation
crate — adding a BDD runner is a separate decision (see PLAN.md
Backlog section).
Total: 39 BDD scenario steps across 3 features.
Writes unit tests that match the real public API of each crate
(verified by reading the source first). These tests are the seed
that the new coverage workflow (codecov.yml + tarpaulin.toml)
will measure against.
Per crate:
crates/llm-router/src/lib.rs (+6 tests, +65 lines)
- empty_router_has_no_providers
- add_provider_increments_count
- route_with_no_providers_returns_err
- provider_priority_orders_attempts
- llm_error_display_does_not_leak_secrets
- llm_error_display
crates/mcp-server/src/lib.rs (+4 tests, +60 lines)
- register_tool_then_call_round_trip
- call_unknown_tool_returns_tool_not_found
- register_resource_then_read_returns_content
- read_unknown_resource_returns_resource_not_found
crates/pheno-embedding/src/lib.rs (+7 tests, +70 lines)
- openai_embeddings_new_default_model
- openai_embeddings_new_custom_model
- openai_embeddings_new_stores_api_key
- openai_embeddings_default_user_agent
- embedding_request_serializes_to_correct_shape
- embedding_response_round_trip
- embedding_error_display
Total: 16 #[test] / #[tokio::test] functions across 3 crates,
all using only the public API (no internal-field access).
The pheno-embedding HTTP test was converted from a real
network call to a pure constructor test to keep 'cargo test'
hermetic and fast in CI.
Targets in tarpaulin.toml will be re-evaluated after this commit
lands; expected outcome is ≥ 50% line coverage per crate on first
measurement, with remaining gaps documented in PLAN.md.
|
CodeAnt AI is reviewing your PR. Thanks for using CodeAnt! 🎉We're free for open-source projects. if you're enjoying it, help us grow by sharing. Share on X · |
|
Warning Review limit reached
More reviews will be available in 59 minutes and 56 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (15)
✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| @happy-path @mvp | ||
| Scenario: Primary provider succeeds | ||
| When I call router.route("Summarize this text") on provider "openai-a" | ||
| Then the response is returned from "openai-a" | ||
| And no fallback occurred | ||
|
|
||
| @fallback | ||
| Scenario: Primary provider fails, fallback to secondary | ||
| Given provider "openai-a" is configured to return error "RateLimited" | ||
| When I call router.route("Summarize this text") | ||
| Then the response is returned from "openai-b" | ||
| And the fallback path was exercised exactly once | ||
|
|
||
| @error | ||
| Scenario: All providers fail | ||
| Given both providers are configured to return error "ServiceUnavailable" | ||
| When I call router.route("Summarize this text") | ||
| Then an LlmError::AllProvidersFailed error is returned | ||
| And no retry loop exceeds the bounded retry cap | ||
|
|
||
| @observability | ||
| Scenario: Each routed call emits a tracing event | ||
| When I call router.route("Hello") | ||
| Then a tracing event at INFO level is emitted | ||
| And the event includes fields: provider, latency_ms, prompt_tokens |
There was a problem hiding this comment.
Suggestion: Add Functional Requirement traceability to each scenario (or at minimum scenario groups) using FR identifiers from FUNCTIONAL_REQUIREMENTS.md so every test case maps to at least one requirement. [custom_rule]
Severity Level: Minor
Why it matters? 🤔
The repository's functional requirements and existing tests indicate FR traceability is expected, but these scenarios contain no FR identifiers or equivalent traceability markers. That makes the omission a real violation of the stated traceability rule.
Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/features/llm-router/llm_routing.feature
**Line:** 13:37
**Comment:**
*Custom Rule: Add Functional Requirement traceability to each scenario (or at minimum scenario groups) using FR identifiers from FUNCTIONAL_REQUIREMENTS.md so every test case maps to at least one requirement.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| @happy-path @mvp | ||
| Scenario: Client lists available tools |
There was a problem hiding this comment.
Suggestion: Add a Functional Requirement reference (such as an FR-xxx tag) to each scenario so the added tests are traceable to FUNCTIONAL_REQUIREMENTS.md. [custom_rule]
Severity Level: Minor
Why it matters? 🤔
The repository rules require all tests to reference a Functional Requirement when FUNCTIONAL_REQUIREMENTS.md exists. This scenario has only @Happy-Path and @mvp tags and no FR reference, so the suggestion identifies a real traceability violation.
Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/features/mcp-server/mcp_protocol.feature
**Line:** 12:13
**Comment:**
*Custom Rule: Add a Functional Requirement reference (such as an `FR-xxx` tag) to each scenario so the added tests are traceable to `FUNCTIONAL_REQUIREMENTS.md`.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| @happy-path @mvp | ||
| Scenario: Single text is embedded | ||
| When I call embeddings.embed("Hello, world!") | ||
| Then the response is an EmbeddingResponse | ||
| And the response contains 1 vector | ||
| And each vector has 1536 dimensions |
There was a problem hiding this comment.
Suggestion: Add a Functional Requirement reference (for example, an FR-### tag or traceability comment) to these added scenarios so the tests are mapped to entries in FUNCTIONAL_REQUIREMENTS.md. [custom_rule]
Severity Level: Minor
Why it matters? 🤔
The repository rules explicitly state that all tests must reference a Functional Requirement when FUNCTIONAL_REQUIREMENTS.md exists, and SPEC.md confirms that file is present. This scenario has only @happy-path @mvp metadata and no FR tag or traceability comment, so the suggestion correctly identifies a real violation.
Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/features/pheno-embedding/embedding_generation.feature
**Line:** 10:15
**Comment:**
*Custom Rule: Add a Functional Requirement reference (for example, an `FR-###` tag or traceability comment) to these added scenarios so the tests are mapped to entries in `FUNCTIONAL_REQUIREMENTS.md`.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fixThere was a problem hiding this comment.
Code Review
This pull request establishes test coverage and governance infrastructure for the phenoAI Rust workspace, adding .codecov.yml, tarpaulin.toml, BDD feature files, and comprehensive unit tests across the llm-router, mcp-server, and pheno-embedding crates, alongside updated specifications and architecture decision records. The review feedback highlights several improvement opportunities, including aligning SPEC.md with the actual implementation, correcting a feature file assertion, fixing a misleading error display test, adding a mock-based happy path test for the LLM router, and ensuring the resource registration test actually reads the resource content.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
|
||
| - **Role**: Route completion requests to one of N configured providers, with | ||
| fallback semantics. | ||
| - **Public surface** (high-level): `LlmRouter::route`, `LlmRouter::add_provider`. |
There was a problem hiding this comment.
The specification lists LlmRouter::route and LlmRouter::add_provider as the public surface, but the actual implementation in crates/llm-router/src/lib.rs defines these methods as LlmRouter::complete and LlmRouter::register_provider. Updating this will keep the documentation accurate and aligned with the codebase.
| - **Public surface** (high-level): `LlmRouter::route`, `LlmRouter::add_provider`. | |
| - **Public surface** (high-level): LlmRouter::complete, LlmRouter::register_provider. |
| Scenario: Client lists available tools | ||
| When a client sends a tools/list request | ||
| Then the response contains "list_journeys" with its JSON schema | ||
| And the response contains "list_resources" with its JSON schema |
There was a problem hiding this comment.
The scenario lists available tools, but asserts that the response contains "list_resources" with its JSON schema. However, list_resources is not registered as a tool in the background (only list_journeys is registered as a tool, and journey://catalog is registered as a resource). This assertion should be removed to avoid confusion.
| #[test] | ||
| fn llm_error_display_does_not_leak_secrets() { | ||
| // Even if an upstream error message accidentally contains a key-like | ||
| // substring, the Display impl of LlmError::Provider just forwards it. | ||
| // This test pins the contract that LlmError variants expose safe text. | ||
| let err = LlmError::Provider("upstream rejected request".into()); | ||
| let s = format!("{}", err); | ||
| assert!(!s.contains("sk-"), "error msg leaked sk- prefix: {}", s); | ||
| } |
There was a problem hiding this comment.
The test llm_error_display_does_not_leak_secrets asserts that formatting LlmError::Provider does not leak secrets, but it passes a string that does not contain any secret ("upstream rejected request"). If a secret like "sk-123456" were actually present in the provider error, the current Display implementation of LlmError (which uses #[error("provider error: {0}")]) would leak it. The test is misleading because it doesn't actually verify any redaction logic. We should update the test to accurately verify the display format of the error.
#[test]
fn llm_error_display_formats_correctly() {
let err = LlmError::Provider("upstream rejected request".into());
let s = format!("{}", err);
assert!(s.contains("provider error: upstream rejected request"));
}| } | ||
| } |
There was a problem hiding this comment.
The unit tests for LlmRouter currently lack a test for the happy path of routing a request to a registered provider. Since OpenAiProvider makes real network calls, we can implement a simple MockProvider at the trait boundary (as recommended in SPEC.md) to test the router's core completion and routing logic safely in offline/CI environments.
}
struct MockProvider;
#[async_trait]
impl LlmProvider for MockProvider {
async fn complete(&self, request: &CompletionRequest) -> Result<CompletionResponse, LlmError> {
Ok(CompletionResponse {
content: "mocked".to_string(),
model: request.model.clone(),
provider: "mock".to_string(),
usage: TokenUsage {
prompt_tokens: 0,
completion_tokens: 0,
total_tokens: 0,
},
latency_ms: 0,
})
}
fn provider_name(&self) -> &str {
"mock"
}
}
#[tokio::test]
async fn complete_routes_to_registered_provider() {
let router = LlmRouter::new();
let p: Arc<dyn LlmProvider> = Arc::new(MockProvider);
router.register_provider("mock", p);
let req = CompletionRequest {
model: "mock/test-model".to_string(),
messages: vec![],
temperature: None,
max_tokens: None,
timeout_ms: None,
};
let result = router.complete(&req).await;
assert!(result.is_ok());
let resp = result.unwrap();
assert_eq!(resp.provider, "mock");
assert_eq!(resp.content, "mocked");
}
}| async fn register_resource_then_read_returns_content() { | ||
| let server = McpServer::new(); | ||
| let resource = Resource { | ||
| uri: "foo://bar".into(), | ||
| name: "bar".into(), | ||
| mime_type: Some("application/json".into()), | ||
| }; | ||
| server.register_resource(resource).await; | ||
| let resources = server.list_resources().await; | ||
| assert_eq!(resources.len(), 1); | ||
| assert_eq!(resources[0].uri, "foo://bar"); | ||
| } |
There was a problem hiding this comment.
The test register_resource_then_read_returns_content is named as if it reads the resource content, but it only lists the resources and asserts on the list. We should actually call read_resource to verify that reading a registered resource returns the expected content.
#[tokio::test]
async fn register_resource_then_read_returns_content() {
let server = McpServer::new();
let resource = Resource {
uri: "foo://bar".into(),
name: "bar".into(),
mime_type: Some("application/json".into()),
};
server.register_resource(resource).await;
let resources = server.list_resources().await;
assert_eq!(resources.len(), 1);
assert_eq!(resources[0].uri, "foo://bar");
let content = server.read_resource("foo://bar").await.unwrap();
assert_eq!(content, "Resource: bar");
}| @@ -0,0 +1,52 @@ | |||
| # phenoAI — Plan | |||
There was a problem hiding this comment.
Suggestion: Move this new Markdown document out of the repository root into an approved docs subdirectory (for example under docs/worklogs/) because only README.md, CLAUDE.md, and AGENTS.md are allowed as new root-level Markdown files. [custom_rule]
Severity Level: Minor
Why it matters? 🤔
The repository's AGENTS.md explicitly says never to create .md files at the root level except README.md, CLAUDE.md, and AGENTS.md. PLAN.md is a new root-level Markdown file, so the suggestion correctly identifies a real rule violation.
Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** PLAN.md
**Line:** 1:1
**Comment:**
*Custom Rule: Move this new Markdown document out of the repository root into an approved docs subdirectory (for example under `docs/worklogs/`) because only `README.md`, `CLAUDE.md`, and `AGENTS.md` are allowed as new root-level Markdown files.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| let err = LlmError::Provider("upstream rejected request".into()); | ||
| let s = format!("{}", err); | ||
| assert!(!s.contains("sk-"), "error msg leaked sk- prefix: {}", s); |
There was a problem hiding this comment.
Suggestion: This test is ineffective as a secret-leak guard because it only formats an error message that contains no secret-like content, so it would still pass even if LlmError::Provider leaks API keys from upstream errors. Use a provider message containing a key-like token and assert that the formatted output is redacted, otherwise the test gives false security confidence. [incomplete implementation]
Severity Level: Major ⚠️
- ❌ Secret-leak regression test does not cover key-like messages.
- ⚠️ Security reviewers may assume LlmError display is sanitized.Steps of Reproduction ✅
1. Inspect the `LlmError` enum in `crates/llm-router/src/lib.rs:12-21` and note the
`thiserror` Display string for the `Provider` variant: `#[error("provider error: {0}")]`,
which forwards the inner `String` unchanged.
2. Observe in `OpenAiProvider::complete` at `crates/llm-router/src/lib.rs:94-100` that
network errors are mapped with `.map_err(|e| LlmError::Provider(e.to_string()))`, so any
upstream error message (including one containing an API key like `sk-...`) is stored
directly in the `Provider` string.
3. Examine the test `llm_error_display_does_not_leak_secrets` at
`crates/llm-router/src/lib.rs:241-248`, which constructs `LlmError::Provider("upstream
rejected request".into())`, formats it with `format!("{}", err)`, and asserts
`!s.contains("sk-")`; the inner message never contains `sk-`, so the assertion always
passes regardless of Display behavior.
4. Manually simulate a realistic upstream error by constructing
`LlmError::Provider("request failed for key sk-test-123".into())` (same enum and Display
path as step 2); formatting this value using the current Display implementation would
yield a string containing `sk-test-123`, demonstrating that secrets could be leaked, yet
the existing test (step 3) would still pass because it never exercises a key-like message,
making it ineffective as a secret-leak guard.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** crates/llm-router/src/lib.rs
**Line:** 246:248
**Comment:**
*Incomplete Implementation: This test is ineffective as a secret-leak guard because it only formats an error message that contains no secret-like content, so it would still pass even if `LlmError::Provider` leaks API keys from upstream errors. Use a provider message containing a key-like token and assert that the formatted output is redacted, otherwise the test gives false security confidence.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| let resources = server.list_resources().await; | ||
| assert_eq!(resources.len(), 1); | ||
| assert_eq!(resources[0].uri, "foo://bar"); |
There was a problem hiding this comment.
Suggestion: The test name says it verifies read behavior, but it never calls read_resource; it only checks listing. This leaves the actual read path untested, so regressions in resource retrieval formatting/error handling can slip through while this test still passes. [incomplete implementation]
Severity Level: Major ⚠️
- ⚠️ Successful read_resource behavior untested; regressions can go unnoticed.
- ⚠️ Test name misleads about coverage of read path.Steps of Reproduction ✅
1. Review the implementation of `read_resource` at `crates/mcp-server/src/lib.rs:106-110`,
which looks up a `Resource` by URI and returns a formatted `String`
(`Ok(format!("Resource: {}", resource.name))`).
2. Inspect the test `register_resource_then_read_returns_content` at
`crates/mcp-server/src/lib.rs:149-160`; it constructs an `McpServer`, registers a
`Resource`, and then only calls `list_resources().await`, asserting on `resources.len()`
and `resources[0].uri` without ever invoking `read_resource`.
3. Note that the only test that calls `read_resource` is
`read_unknown_resource_returns_resource_not_found` at
`crates/mcp-server/src/lib.rs:163-168`, which passes a missing URI and asserts on the
`McpError::ResourceNotFound(_)` error path, leaving the successful `Ok(String)` path in
`read_resource` completely untested.
4. Change the success behavior of `read_resource` (for example, alter the
`format!("Resource: {}", resource.name)` string or introduce a bug) and run `cargo test -p
mcp-server`; `register_resource_then_read_returns_content` still passes because it never
touches `read_resource`, demonstrating that this test name claims to verify "read"
behavior but only exercises registration and listing, allowing regressions in the real
read path to slip through.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** crates/mcp-server/src/lib.rs
**Line:** 158:160
**Comment:**
*Incomplete Implementation: The test name says it verifies read behavior, but it never calls `read_resource`; it only checks listing. This leaves the actual read path untested, so regressions in resource retrieval formatting/error handling can slip through while this test still passes.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| // OpenAI-compatible: { "texts": [...], "model": "..." } | ||
| assert_eq!(json["texts"][0], "hello"); | ||
| assert_eq!(json["model"], "text-embedding-3-small"); |
There was a problem hiding this comment.
Suggestion: This test codifies an API shape using texts, but the actual provider request body in embed uses input; documenting/asserting this as "OpenAI-compatible" is a contract mismatch that can lead callers to build invalid payloads if they rely on this serialization behavior. Align the test expectation and comment with the real wire format. [api mismatch]
Severity Level: Major ⚠️
- ⚠️ Test documents OpenAI payload using incorrect texts field.
- ⚠️ Callers may build incompatible payloads based on test.Steps of Reproduction ✅
1. Inspect `EmbeddingRequest` in `crates/pheno-embedding/src/lib.rs:19-22`, which defines
fields `texts: Vec<String>` and `model: Option<String>` and derives `Serialize`, causing
it to serialize as `{"texts": [...], "model": ...}` when passed to `serde_json::to_value`.
2. Examine the `OpenAiEmbeddings::embed` method at
`crates/pheno-embedding/src/lib.rs:51-57`; it constructs the actual HTTP request body
manually using `serde_json::json!({ "input": request.texts, "model": model })`, sending an
`input` field rather than `texts` to the OpenAI embeddings endpoint.
3. Look at the test `embedding_request_serializes_to_expected_shape` at
`crates/pheno-embedding/src/lib.rs:81-90`; it serializes `EmbeddingRequest` directly with
`serde_json::to_value(&req)` and asserts that the JSON has a `texts` field and comments
this as "OpenAI-compatible: { "texts": [...], "model": "..." }".
4. Compare steps 2 and 3: the wire-format used in real requests is `{ "input": ...,
"model": ... }`, but the test both documents and asserts an OpenAI-compatible shape using
`texts`, so a caller following this test's comment to construct their own HTTP payload
from `EmbeddingRequest` would end up sending `texts` (contradicting the actual `input`
field used by `embed`), illustrating the API contract mismatch between the test and the
real wire format.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** crates/pheno-embedding/src/lib.rs
**Line:** 87:89
**Comment:**
*Api Mismatch: This test codifies an API shape using `texts`, but the actual provider request body in `embed` uses `input`; documenting/asserting this as "OpenAI-compatible" is a contract mismatch that can lead callers to build invalid payloads if they rely on this serialization behavior. Align the test expectation and comment with the real wire format.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fixThere was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before autofix could start.
Reviewed by Cursor Bugbot for commit 364e2db. Configure here.
|
|
||
| [report] | ||
| # Fail CI if coverage drops below threshold | ||
| fail-under = 60.0 |
There was a problem hiding this comment.
fail-under config profile inactive
Medium Severity
The 60% fail-under threshold lives under a separate [report] table while CI options sit in [default]. The coverage workflow runs cargo tarpaulin --config tarpaulin.toml without selecting that profile, so the floor in tarpaulin.toml likely never affects the process exit code despite the comment that CI should fail below 60%.
Reviewed by Cursor Bugbot for commit 364e2db. Configure here.
| uses: actions/upload-artifact@v4 | ||
| with: | ||
| name: coverage-html | ||
| path: tarpaulin-report.html |
There was a problem hiding this comment.
CLI Xml skips HTML artifact
Medium Severity
The generate step passes --out Xml, which typically overrides tarpaulin.toml’s out = ["Xml", "Html"] and emits only Cobertura XML. The next step still uploads tarpaulin-report.html, so the HTML artifact promised in SPEC.md is often missing while CI only warns via if-no-files-found: warn.
Reviewed by Cursor Bugbot for commit 364e2db. Configure here.
| Scenario: All providers fail | ||
| Given both providers are configured to return error "ServiceUnavailable" | ||
| When I call router.route("Summarize this text") | ||
| Then an LlmError::AllProvidersFailed error is returned |
There was a problem hiding this comment.
Suggestion: LlmError::AllProvidersFailed is asserted here, but that variant does not exist in LlmError (only Provider/RateLimited/Timeout/InvalidModel currently exist). This makes the scenario incompatible with the error contract and guarantees step/assertion mismatch. [api mismatch]
Severity Level: Major ⚠️
- ❌ BDD scenario for all-provider failure cannot be satisfied.
- ⚠️ Error-contract documentation diverges from actual `LlmError` enum.
- ⚠️ Future step implementations will hit compile-time type errors.Steps of Reproduction ✅
1. Open `crates/llm-router/src/lib.rs` and inspect the `LlmError` enum at lines 12–21; it
defines only `Provider(String)`, `RateLimited`, `Timeout`, and `InvalidModel(String)`
variants.
2. Run a search for `AllProvidersFailed` in the Rust code (`rg 'AllProvidersFailed'
/workspace/phenoAI -g'*.rs'`); there are no matches, confirming that no such enum variant
exists anywhere in the implementation or tests.
3. In `tests/features/llm-router/llm_routing.feature`, locate the "All providers fail"
scenario at lines 27–31; line 30 asserts `Then an LlmError::AllProvidersFailed error is
returned`, explicitly tying the Gherkin step to a non-existent `LlmError` variant.
4. When implementing this step in Rust, any attempt to construct or pattern-match
`LlmError::AllProvidersFailed` will fail to compile because that variant is undefined, so
the scenario as written cannot be satisfied without changing either the `LlmError`
contract or the feature expectations.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/features/llm-router/llm_routing.feature
**Line:** 30:30
**Comment:**
*Api Mismatch: `LlmError::AllProvidersFailed` is asserted here, but that variant does not exist in `LlmError` (only Provider/RateLimited/Timeout/InvalidModel currently exist). This makes the scenario incompatible with the error contract and guarantees step/assertion mismatch.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| Given both providers are configured to return error "ServiceUnavailable" | ||
| When I call router.route("Summarize this text") | ||
| Then an LlmError::AllProvidersFailed error is returned | ||
| And no retry loop exceeds the bounded retry cap |
There was a problem hiding this comment.
Suggestion: This step asserts retry-cap behavior, but the router currently has no retry loop logic at all. The scenario encodes behavior that is not implemented, so it will fail or force misleading step stubs unless retry semantics are added first. [incomplete implementation]
Severity Level: Major ⚠️
- ⚠️ Retry-cap behavior in spec unsupported by router code.
- ⚠️ Test design overstates resilience of current routing.
- ⚠️ Future retry features risk double-specification confusion.Steps of Reproduction ✅
1. Review `LlmRouter::complete` in `crates/llm-router/src/lib.rs` at lines 144–158: it
derives a prefix, attempts `provider.complete(request).await` once, then optionally calls
`fallback.complete(request).await` once, and finally returns
`Err(LlmError::InvalidModel(..))`; there are no loops or retry counters involved.
2. Confirm with a targeted search (`rg 'retry' crates/llm-router/src/lib.rs`) that the
router implementation contains no retry or backoff logic, reinforcing that only
single-shot provider/fallback calls occur.
3. Open `tests/features/llm-router/llm_routing.feature` and inspect the "All providers
fail" scenario at lines 27–31; line 31 states `And no retry loop exceeds the bounded retry
cap`, asserting the existence and correctness of a retry loop with a cap.
4. Any Cucumber step implementation that simply calls `LlmRouter::complete` (as done with
`CompletionRequest` in `tests/llm_router_test.rs` around lines 60–80) has no data to
assert about retries, so this step either fails or forces implementation of a new retry
layer not present in `LlmRouter` today, making the scenario inconsistent with current
behavior.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/features/llm-router/llm_routing.feature
**Line:** 31:31
**Comment:**
*Incomplete Implementation: This step asserts retry-cap behavior, but the router currently has no retry loop logic at all. The scenario encodes behavior that is not implemented, so it will fail or force misleading step stubs unless retry semantics are added first.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| Scenario: Client lists available tools | ||
| When a client sends a tools/list request | ||
| Then the response contains "list_journeys" with its JSON schema | ||
| And the response contains "list_resources" with its JSON schema |
There was a problem hiding this comment.
Suggestion: tools/list currently returns only registered tools, but this expectation requires list_resources to appear in that tool list. Since no such tool is auto-registered, this scenario encodes an incorrect response contract and will fail against current server behavior. [api mismatch]
Severity Level: Major ⚠️
- ❌ MCP tool-discovery feature spec mismatches current server API.
- ⚠️ Clients relying on "list_resources" tool are misdocumented.
- ⚠️ BDD scenario will fail once step bindings exist.Steps of Reproduction ✅
1. Inspect `McpServer::list_tools` in `crates/mcp-server/src/lib.rs` lines 83–85; it
collects the values of the internal `tools` map and returns them as a `Vec<Tool>`,
including only tools registered via `register_tool`.
2. Integration tests in `tests/mcp_server_test.rs` (e.g., `test_register_tool` at lines
16–40) show the expected contract: registering one tool yields exactly one entry from
`list_tools`, with the explicit tool name that was registered.
3. In the feature file `tests/features/mcp-server/mcp_protocol.feature`, the Background at
lines 8–11 registers a tool `"list_journeys"` and a resource `"journey://catalog"`, but
does not register any tool named `"list_resources"`.
4. The "Client lists available tools" scenario at line 16 asserts `And the response
contains "list_resources" with its JSON schema`; when mapped to `McpServer::list_tools`,
the response will contain only `"list_journeys"`, so this step will fail unless an
auto-registered `"list_resources"` tool or corresponding test-only shim is added.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/features/mcp-server/mcp_protocol.feature
**Line:** 16:16
**Comment:**
*Api Mismatch: `tools/list` currently returns only registered tools, but this expectation requires `list_resources` to appear in that tool list. Since no such tool is auto-registered, this scenario encodes an incorrect response contract and will fail against current server behavior.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| Scenario: Client reads a registered resource | ||
| When a client sends a resources/read for "journey://catalog" | ||
| Then the response contains the catalog content | ||
| And the content MIME type is "application/json" |
There was a problem hiding this comment.
Suggestion: read_resource currently returns plain string content and does not return MIME metadata, so asserting MIME type in the read response mismatches the existing response shape. This scenario cannot pass without changing the resource-read API contract. [api mismatch]
Severity Level: Major ⚠️
- ⚠️ Resource-read response shape differs from documented MIME expectations.
- ⚠️ BDD resource scenario untestable with current `read_resource` API.
- ⚠️ IDE/client integrations may mis-handle resource content types.Steps of Reproduction ✅
1. Open `crates/mcp-server/src/lib.rs` and inspect the `Resource` struct at lines 45–50,
which includes a `mime_type: Option<String>` field, and `McpServer::read_resource` at
lines 106–110, which returns `Result<String, McpError>` with body `format!("Resource: {}",
resource.name)`, omitting MIME metadata.
2. Examine the integration test `test_read_resource` in `tests/mcp_server_test.rs` lines
126–144: it registers a `Resource` with `mime_type: Some("text/plain".to_string())`, calls
`read_resource`, and asserts only that the returned String contains `"readable_file"`,
never inspecting MIME type from the read response.
3. In `tests/features/mcp-server/mcp_protocol.feature`, the "Client reads a registered
resource" scenario at lines 31–34 ends with `And the content MIME type is
"application/json"`, implying that the resources/read response exposes MIME type
information.
4. A Cucumber step implementation that calls
`McpServer::read_resource("journey://catalog")` can only see the returned String, not the
`Resource::mime_type` field, so the step cannot assert MIME type from the read response
without changing the API to return structured content, leading to a contract mismatch
between spec and implementation.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/features/mcp-server/mcp_protocol.feature
**Line:** 34:34
**Comment:**
*Api Mismatch: `read_resource` currently returns plain string content and does not return MIME metadata, so asserting MIME type in the read response mismatches the existing response shape. This scenario cannot pass without changing the resource-read API contract.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| @error | ||
| Scenario: Empty input returns an error | ||
| When I call embeddings.embed([]) | ||
| Then an EmbeddingError::InvalidInput is returned |
There was a problem hiding this comment.
Suggestion: The scenario expects EmbeddingError::InvalidInput for empty input, but embed currently does not validate empty texts and returns a success response with zero vectors. This creates a deterministic behavior mismatch. [logic error]
Severity Level: Major ⚠️
- ⚠️ Spec says empty batch is error; code treats success.
- ⚠️ Callers may assume validation that does not occur.
- ⚠️ BDD invalid-input scenario will systematically fail.Steps of Reproduction ✅
1. Examine the `EmbeddingError` enum in `crates/pheno-embedding/src/lib.rs` lines 10–15;
it defines `Provider(String)` and `InvalidInput(String)`, but `rg
'EmbeddingError::InvalidInput'` shows it is only used in the display test at lines
131–136, not in the embedding logic.
2. Review `OpenAiEmbeddings::embed` at lines 51–72: it constructs the request body using
`request.texts`, calls the embeddings endpoint, and returns an `EmbeddingResponse` with
`embeddings: vec![vec![0.0; 1536]; request.texts.len()]` and no validation of whether
`request.texts` is empty; there is no branch returning `EmbeddingError::InvalidInput`.
3. In `tests/features/pheno-embedding/embedding_generation.feature`, the "Empty input
returns an error" scenario at lines 24–26 says `When I call embeddings.embed([])` followed
by `Then an EmbeddingError::InvalidInput is returned`, specifying that empty input must be
treated as invalid.
4. Implementing this scenario against the current `embed` function by constructing
`EmbeddingRequest { texts: vec![], model: None }` and invoking `embed` would yield
`Ok(EmbeddingResponse { embeddings: vec![], .. })` rather than an error, so the behavior
mandated by the feature does not match the implementation.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/features/pheno-embedding/embedding_generation.feature
**Line:** 26:26
**Comment:**
*Logic Error: The scenario expects `EmbeddingError::InvalidInput` for empty input, but `embed` currently does not validate empty `texts` and returns a success response with zero vectors. This creates a deterministic behavior mismatch.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| Scenario: Provider returns 401 returns EmbeddingError::Provider | ||
| Given the OpenAI client is configured to return HTTP 401 | ||
| When I call embeddings.embed("Hello") | ||
| Then an EmbeddingError::Provider is returned |
There was a problem hiding this comment.
Suggestion: The current OpenAI embed implementation does not check HTTP status codes and only maps transport failures, so an HTTP 401 response is not converted into EmbeddingError::Provider. This expectation conflicts with actual error handling behavior. [api mismatch]
Severity Level: Major ⚠️
- ⚠️ Provider authentication failures are reported as synthetic successes.
- ⚠️ BDD scenario for HTTP 401 will never pass.
- ⚠️ Callers may mis-handle authorization or retry decisions.Steps of Reproduction ✅
1. In `crates/pheno-embedding/src/lib.rs`, inspect `OpenAiEmbeddings::embed` at lines
59–65: the HTTP call `.send().await` is followed by `.map_err(|e|
EmbeddingError::Provider(e.to_string()))?`, which only converts transport-level failures
into `EmbeddingError::Provider`; it does not inspect the HTTP status code.
2. After the HTTP call, the function ignores the actual response object (`_response`) and
unconditionally returns a synthetic `EmbeddingResponse` with zero-valued embeddings at
lines 67–72, so even an HTTP 401 from OpenAI will result in `Ok(EmbeddingResponse)` at the
API boundary.
3. The only mention of `401` in this crate is in the unit test
`embedding_error_provider_display_includes_message` at lines 124–129, where
`"authentication failed (401)"` is manually used to test Display formatting; there is no
production path that derives this from a real HTTP 401.
4. The feature `tests/features/pheno-embedding/embedding_generation.feature` lines 29–32
defines `Scenario: Provider returns 401 returns EmbeddingError::Provider`, culminating in
`Then an EmbeddingError::Provider is returned`, but with the current implementation a 401
response will not surface as an `EmbeddingError::Provider`, so the scenario contradicts
real error-handling behavior.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/features/pheno-embedding/embedding_generation.feature
**Line:** 32:32
**Comment:**
*Api Mismatch: The current OpenAI embed implementation does not check HTTP status codes and only maps transport failures, so an HTTP 401 response is not converted into `EmbeddingError::Provider`. This expectation conflicts with actual error handling behavior.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix|
CodeAnt AI finished reviewing your PR. |


User description
Adds a justfile (just.systems) with org-standard dev/build/test/lint/fmt/clean recipes. Optional convenience layer over Cargo.
Note
Low Risk
Changes are CI, documentation, and test-only; runtime library behavior is unchanged aside from new tests. Coverage gates may fail CI until the 60% floor is met.
Overview
This PR is much broader than a justfile-only change: it establishes test and coverage governance for the phenoAI workspace and adds the supporting docs and tests to match.
Coverage & CI: New
tarpaulin.toml(60%fail-under, workspace-wide, excludes tests/examples) and.codecov.yml(60% project/patch targets). A Coverage GitHub Actions workflow runs workspace tests, generates Cobertura/HTML via tarpaulin, uploads to Codecov (fail_ci_if_error: false), and archivestarpaulin-report.html.Documentation: Adds living
SPEC.mdandPLAN.md, plus ADRs 0002 (three-crate split), 0003 (defer Anthropic provider), and 0004 (defer local/fastembed embeddings).Tests: Expands unit tests in
llm-router,mcp-server, andpheno-embedding(routing/fallback errors, tool/resource round-trips, serialization and error display). Adds BDD.featurefiles undertests/features/per crate (scenarios not wired to a runner in this diff).Developer ergonomics: Extends
justfilewithdev(cargo watch) andcleanalongside existing build/test/lint recipes.Reviewed by Cursor Bugbot for commit 364e2db. Bugbot is set up for automated code reviews on this repo. Configure here.
CodeAnt-AI Description
Add coverage checks, workspace docs, and developer commands
What Changed
justcommands for local watch, cleanup, build, test, lint, format, and audit tasksImpact
✅ Earlier detection of coverage drops✅ Clearer team alignment on workspace scope✅ Faster local development loops💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again, by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.