Skip to content

Phase 5: Cloud Integration Testing (#139 follow-up)#140

Merged
oss-amikos merged 27 commits intomainfrom
feature/phase-05-cloud-integration-testing
Mar 24, 2026
Merged

Phase 5: Cloud Integration Testing (#139 follow-up)#140
oss-amikos merged 27 commits intomainfrom
feature/phase-05-cloud-integration-testing

Conversation

@oss-amikos
Copy link
Contributor

Summary

Phase 5: Cloud Integration Testing
Goal: Build deterministic cloud parity test suites that validate search, schema/index, and array metadata behavior against Chroma Cloud.
Status: Verified ✓ (human verification pending for live cloud tests)

Adds a comprehensive cloud integration test suite (SearchApiCloudIntegrationTest) with 23 test methods covering search parity (CLOUD-01), schema/index round-trips (CLOUD-02), and array metadata handling (CLOUD-03). All tests gate on cloud credentials for graceful CI skip. Also adds client-side mixed-type array validation in ChromaHttpCollection with 18 unit tests.

Changes

Plan 05-01: Schema/Index + Array Metadata Cloud Tests

  • SearchApiCloudIntegrationTest with shared cloud infrastructure and 12 test methods
  • CLOUD-02: distance space, HNSW, SPANN config round-trips, invalid transitions, schema round-trip
  • CLOUD-03: string/number/bool array metadata, contains/notContains filters, edge cases
  • D-22: mixed-type array client-side validation in ChromaHttpCollection.validateMetadataArrayTypes()
  • MetadataValidationTest with 18 unit tests including 3 behavioral wiring tests

Key files:

  • src/test/java/tech/amikos/chromadb/v2/SearchApiCloudIntegrationTest.java
  • src/main/java/tech/amikos/chromadb/v2/ChromaHttpCollection.java (validation wiring)
  • src/test/java/tech/amikos/chromadb/v2/MetadataValidationTest.java

Plan 05-02: CLOUD-01 Search Parity Tests

  • 11 test methods validating Phase 3 Search API end-to-end against Chroma Cloud
  • KNN search, batch search, GroupBy, ReadLevel (INDEX_AND_WAL, INDEX_ONLY)
  • Knn.limit vs Search.limit candidate pool distinction
  • Filter matrix: 8 sub-scenarios (Where, IDIn, IDNotIn, DocumentContains, combined, triple)
  • Pagination: basic limit, limit+offset, client-side validation (limit=0, negative offset)
  • Field projection: selected fields present, unselected null, custom metadata key

Plan 05-03: Gap Closure

  • Fixed testCloudSearchProjectionPresent assertion to accept [[null]] server response
  • Fixed testCloudSearchReadLevelIndexAndWal to search isolated collection with 3D embedding

Requirements Addressed

REQ-ID Description
CLOUD-01 Cloud search parity tests: pagination, IDIn/IDNotIn, document filters, metadata projection, combined filters
CLOUD-02 Cloud schema/index tests: distance space, HNSW/SPANN config, invalid transitions, round-trip
CLOUD-03 Cloud array metadata tests: string/number/bool arrays, round-trip, contains/not_contains

Verification

  • Automated verification: 4/4 must-haves verified
  • Compilation: mvn compile passes
  • Unit tests: 1092 passing, 0 new regressions
  • Cloud tests: require CHROMA_API_KEY, CHROMA_TENANT, CHROMA_DATABASE env vars
  • Graceful skip: all cloud tests skip cleanly without credentials

Key Decisions

  • validateMetadataArrayTypes uses ChromaBadRequestException with typed errorCode strings (MIXED_TYPE_ARRAY, NULL_ARRAY_ELEMENT)
  • Behavioral wiring tests use ChromaHttpCollection.from() with stub client at localhost:1 — validation fires before network call
  • Embedding projection assertion loosened to accept null or [[null]] — server returns [[null]] for unselected embeddings
  • WAL read-level test uses isolated 3D collection instead of 4D seedCollection for dimensionality safety

Test Plan

  • Run mvn test -Dtest=MetadataValidationTest — 18 unit tests pass offline
  • Run mvn test -Dtest=SearchApiCloudIntegrationTest without credentials — all tests skip, 0 failures
  • Run mvn test -Dtest=SearchApiCloudIntegrationTest with cloud credentials — 22 tests pass, 1 skips (RRF unsupported)

…dLevel, KnnLimit)

- Add QUERY_ELECTRONICS/GROCERY/SPORTS embedding constants (4D clusters)
- testCloudKnnSearch: KNN search with embedding, limit 5, row access assertions
- testCloudRrfSearch: RRF search auto-skipped (server unsupported)
- testCloudGroupBySearch: GroupBy with category key, maxK=2, rows() access
- testCloudBatchSearch: two independent KNN searches, searchCount==2 assertion
- testCloudSearchReadLevelIndexAndWal: isolated collection, immediate WAL search
- testCloudSearchReadLevelIndexOnly: shared seed, INDEX_ONLY succeeds assertion
- testCloudKnnLimitVsSearchLimit: Knn.limit(10) vs Search.limit(3) distinction
…ests

- testCloudSearchFilterMatrix: 8 sub-tests (Where, IDIn, IDNotIn, DocumentContains,
  IDNotIn+metadata, Where+DocumentContains, DocumentNotContains, triple Where+IDIn+DocumentContains)
- testCloudSearchPagination: basic limit, limit+offset page2, client-side validation
  for limit=0 and negative offset throwing IllegalArgumentException per D-14
- testCloudSearchProjectionPresent: selected fields ID/SCORE/DOCUMENT present,
  unselected EMBEDDING is null
- testCloudSearchProjectionCustomKey: Select.key("category") and Select.key("price")
  projected via custom metadata key per D-16
…OADMAP updated

- 05-02-SUMMARY.md: 11 new test methods covering KNN, batch, GroupBy, ReadLevel,
  filter matrix (8 combos), pagination with client-side validation, field projection
- STATE.md: plan advanced to 2/2, progress 100%, decisions recorded
- ROADMAP.md: phase 5 plan progress updated (2 plans, 2 summaries)
- REQUIREMENTS.md: CLOUD-01 marked complete
- Loosen assertNull to accept null or [[null]] for embedding projection test
- Fix testCloudSearchReadLevelIndexAndWal to search isolated col instead of seedCollection with 3D query embedding
@oss-amikos oss-amikos requested a review from tazarov as a code owner March 23, 2026 13:50
- Add non-empty assertions before all for-each loops to prevent vacuous
  passes on empty result sets (filter matrix, pagination, projection)
- Strengthen WAL assertion from >= 1 to == 3 to properly test contract
- Add IAE message validation in pagination catch blocks
- Verify KNN/batch top results are from expected clusters
- Verify GroupBy returns multiple categories
- Replace null guards with assertNotNull in custom key projection
- Add non-projected key exclusion checks (in_stock, tags absent)
- Make embedding projection assertion handle all server response shapes
- Add INDEX_ONLY minimum result count assertion
- Convert RRF test from permanent skip to expect-error pattern
- Remove unused QUERY_SPORTS constant
- Extract ELECTRONICS_IDS and GROCERY_IDS to class constants (was
  duplicated in testCloudKnnSearch and testCloudBatchSearch)
- Add proper imports for Set/HashSet, remove inline FQNs
- Delete dead metadata(String...) helper (zero call sites)
- Narrow RRF catch from ChromaException to ChromaClientException
- Assert metadata non-null in GroupBy test instead of silently skipping
testCloudForkCountReturnsZeroForNewCollection was calling forkCount()
without catching ChromaNotFoundException, causing CI failures when the
endpoint is unavailable. Apply the same try/catch/skip pattern used by
the self-hosted CollectionApiExtensionsIntegrationTest.
- RRF test: catch ChromaException (not just ChromaClientException) and
  don't check message content — error format varies by Cloud version
- INDEX_ONLY test: allow 0 results since index compaction is async on
  Cloud; the key assertion is that the call succeeds without error
The Chroma server has no native $rrf operator. The Go client expands
RRF into arithmetic expressions: -(sum(weight_i / (k + rank_i))).

Rewrite buildRrfRankMap to produce the same wire format:
- Each term: $div { left: $val(weight), right: $sum[$val(k), $knn] }
- All terms summed via $sum
- Negated via $mul[$val(-1), sum] (higher-is-better → lower-is-better)
- Normalize flag divides weights by their sum before expansion

Update unit tests for new wire format structure. Enable RRF in both
cloud and self-hosted integration tests (remove permanent skips).
Self-hosted Chroma 1.5.5 may not support arithmetic rank expressions
($sum, $div, $val, $mul) used by the client-side RRF expansion. The
server returns an unexpected response causing NPE on deserialization.

Skip gracefully with Assume when the server rejects or fails to handle
the arithmetic expression tree.
@tazarov
Copy link
Contributor

tazarov commented Mar 24, 2026

@claude PTAL

@claude
Copy link

claude bot commented Mar 24, 2026

Claude finished @tazarov's task in 1m 49s —— View job


PR Review: Phase 5 - Cloud Integration Testing

  • Read and analyze source code changes
  • Read and analyze test code changes
  • Review planning/documentation changes
  • Post review feedback

Overall

Solid PR. The cloud test suite is well-structured with proper credential gating (Assume.assumeTrue), shared seed collection to avoid per-test setup overhead, and thorough cleanup in @After/@AfterClass. The client-side RRF expansion into arithmetic rank expressions is a nice approach to work around the server's lack of a native $rrf operator.

A few things worth addressing:


Issues

1. Catching NullPointerException as control flowSearchApiIntegrationTest.java:212-214

} catch (NullPointerException e) {
    // Server may return an unexpected response format for arithmetic expressions
    Assume.assumeTrue("RRF not supported on self-hosted Chroma " + configuredChromaVersion(), false);
}

Catching NPE to skip a test masks real bugs. If the server returns an unexpected format, the deserialization layer should throw a meaningful exception, not NPE. This will hide legitimate null-pointer bugs in unrelated code paths during test runs. Consider either:

  • Adding null checks in the response parsing to throw a descriptive ChromaException
  • At minimum, checking e.getStackTrace() to verify the NPE is from response parsing

Fix this →

2. No-op assertion in testCloudSchemaRoundTripSearchApiCloudIntegrationTest.java:542

assertTrue("Schema keys map should not be empty if present",
        schema.getKeys().isEmpty() || !schema.getKeys().isEmpty()); // always passes, confirms non-null

This assertion is a tautology (X || !X is always true). The comment says "confirms non-null" but getKeys() returning null would NPE before reaching the assertion, making this line do nothing. If the intent is to assert non-null, use assertNotNull. If empty vs non-empty matters, assert one way or the other.

Fix this →

3. testCloudHnswConfigRoundTrip has redundant branchesSearchApiCloudIntegrationTest.java:394-412

The if (usedHnsw) and else branches do the exact same thing (both call hnswSearchEf(200)). The usedHnsw = true assignment in the else branch and the conditional below it could be simplified. Consider:

try {
    col.modifyConfiguration(UpdateCollectionConfiguration.builder()
            .hnswSearchEf(200)
            .build());
} catch (IllegalArgumentException e) {
    if (!isIndexGroupSwitchError(e)) throw e;
    return; // SPANN → HNSW switch not allowed
}
// Always verify after successful modify
Collection fetched = client.getCollection(col.getName());
assertNotNull("Configuration must not be null after HNSW update", fetched.getConfiguration());
assertEquals("HNSW searchEf must round-trip to 200",
        Integer.valueOf(200), fetched.getConfiguration().getHnswSearchEf());

Same applies to testCloudSpannConfigRoundTrip.

4. Division by zero in RRF normalizationChromaDtos.java:1761-1764

if (sum > 1e-6) {
    for (int i = 0; i < weights.length; i++) weights[i] /= sum;
}

If sum <= 1e-6 (e.g., all zero weights), the weights remain at 0 and the resulting $val(0) numerators make the entire RRF expression evaluate to 0 for all terms. This is mathematically valid but may be confusing. Consider throwing an IllegalArgumentException if all weights are effectively zero, since that's almost certainly a user error.

5. testCloudSearchReadLevelIndexAndWal - timing sensitivitySearchApiCloudIntegrationTest.java:949-955

The test asserts that all 3 freshly written records are returned immediately with INDEX_AND_WAL. While this is the semantic contract of WAL reads, cloud infrastructure with async replication could introduce races. The test comment acknowledges this, but there's no retry or tolerance. If this test ends up being flaky in CI, consider adding a brief poll or relaxing to >= 1.


Nits (non-blocking)

  • testCloudEmptyArrayMetadata (line 821): assertNull("Cloud nullified the empty array (tags is null)", tags) — the tags == null check on line 819 already confirms this; the assertNull is redundant though harmless.
  • The buildMeta helper uses float for price, which JSON will serialize as a double. Consistent use of double would avoid implicit widening.
  • testCloudInvalidConfigTransitionRejected (line 498): isIndexGroupSwitchError(e) || e.getMessage() != null — the second condition is almost always true (most exceptions have messages), making this assertion weak. Consider just checking isIndexGroupSwitchError(e).

What looks good

  • Clean credential gating pattern: @BeforeClass sets cloudAvailable, individual tests use Assume.assumeTrue
  • Seed collection is shared read-only with isolated collections for mutation tests — good for parallel-safety
  • The RRF arithmetic expansion in ChromaDtos.buildRrfRankMap() is well-documented with clear Javadoc explaining the formula
  • Filter matrix test (testCloudSearchFilterMatrix) covers 8 sub-scenarios comprehensively
  • Unit tests for RRF wire format updated to match the new expansion structure
  • Metadata validation correctly normalizes numeric types (Integer/Long/Short/Byte → integer group, Float/Double → floating group)
  • forkCount test gracefully handles 404/500 on cloud accounts that don't support it

|

…ation

- Add inner-list null validation in SearchResultImpl.from() and remove
  NPE catch in RRF integration test (proper deserialization boundary)
- Add zero-weight validation in Rrf.Builder.build() with unit test
- Replace tautological schema assertion with assertNotNull
- Simplify redundant if/else branches in HNSW/SPANN config round-trip tests
- Add assertEventually helper for WAL read-level test (Go-style polling)
- Strengthen weak assertion in invalid config transition test
- Change buildMeta price param from float to double for type consistency
- Remove redundant assertNull inside null-checked branch
The removed NPE catch was masking a missing cloud credential gate —
searchCollection is null when CHROMA_API_KEY is absent.
…coverage

Production:
- Rrf.Builder: reject negative/NaN/infinite weights and non-positive k
- ChromaDtos: unify normalization epsilon (1e-6 → 1e-9) to match builder
- Rrf/ChromaDtos: fix Javadoc (scores → weights, document normalization)

Tests:
- assertEventually: use do-while (prevent NPE), retry on ChromaException
- Narrow SPANN catch from ChromaException to BadRequest + NotFound
- Narrow self-hosted RRF catch to BadRequest + Server + Deserialization
- Remove overly broad ChromaServerException catch from fork count test
- Add 13 new unit tests: second rank structure, custom k, 3-rank expansion,
  negative/NaN/infinite weight rejection, k validation, RRF routing
…g catch blocks

- assertEventually: catch only ChromaConnectionException + ChromaServerException
  (non-transient 4xx/deserialization errors now propagate immediately)
- SPANN config test: use Assume.assumeTrue instead of silent return
- Config transition test: narrow ChromaException to BadRequest + Server
- Fix misleading "server-side embeddings" comment
- Complete @throws tag on Rrf.Builder.build()
@tazarov
Copy link
Contributor

tazarov commented Mar 24, 2026

@claude PTAL

@claude
Copy link

claude bot commented Mar 24, 2026

Claude finished @tazarov's task in 1m 49s —— View job


PR Review: Phase 5 - Cloud Integration Testing

  • Read and analyze source code changes
  • Read and analyze test code changes
  • Post review feedback

Overall

Good work. The RRF client-side expansion into arithmetic rank expressions is well-designed, the Rrf builder now has proper input validation (NaN, infinity, negative weights, zero k, all-zero weights), and the SearchResultImpl null inner-ids guard closes a real deserialization gap. The cloud test suite is comprehensive with clean credential gating and proper cleanup.

I see the previous review's feedback has been largely addressed — the NPE catch was replaced with typed exception catches, the tautological assertion was removed, and assertEventually was narrowed to only retry transient exceptions. Nice.

A few remaining items:


Issues

1. Normalization guard in buildRrfRankMap is redundant with Rrf.build() validationChromaDtos.java:1766

if (sum > 1e-9) {
    for (int i = 0; i < weights.length; i++) weights[i] /= sum;
}

Rrf.build() already throws if all weights sum to < 1e-9, so this guard can never trigger at runtime (the Rrf would have been rejected at construction). The silent no-op on sum <= 1e-9 is technically unreachable, but if the invariant ever breaks (e.g., someone constructs Rrf via reflection), the weights would silently remain un-normalized. Consider either removing the guard (trust the builder) or throwing an IllegalStateException instead of silently skipping.

2. testCloudInvalidConfigTransitionRejected accepts too many outcomesSearchApiCloudIntegrationTest.java:459-483

The test catches IllegalArgumentException, ChromaBadRequestException, ChromaServerException, and also allows the happy path (no exception). Every possible code path is accepted as a pass. This means the test can never fail — it's effectively a no-op. Consider at minimum asserting something about the behavior (e.g., that at least one of "exception thrown" or "index group is UNKNOWN" is true).

3. testCloudSearchReadLevelIndexOnly assertion is always trueSearchApiCloudIntegrationTest.java:1007-1008

assertTrue("INDEX_ONLY result count must be >= 0 and <= 15",
        result.getIds().get(0).size() >= 0 && result.getIds().get(0).size() <= 15);

List.size() is always >= 0, and with 15 records in the seed collection, this condition is always true. If the intent is "the call succeeds without error", the assertion is redundant — the test would already fail on an exception. Consider either removing the assertion (the test validates success-without-error) or asserting something more meaningful (e.g., that results are present if the seed data has had time to index).

4. forkCount cloud test catches only ChromaNotFoundExceptionCollectionApiExtensionsCloudTest.java:96-99

} catch (ChromaNotFoundException e) {
    Assume.assumeTrue("forkCount endpoint not available on this Chroma Cloud account", false);
}

The PR description mentions "gracefully handles 404/500", but only 404 (ChromaNotFoundException) is caught. A 500 from the server (ChromaServerException) would still fail the test. If the intent is to handle both, add ChromaServerException to the catch.


Nits (non-blocking)

  • SearchApiCloudIntegrationTest.java:563,617,734: Blank lines after .execute() calls are inconsistent with the rest of the file. Minor style nit.
  • buildMeta uses double for price but Arrays.<Object>asList(4, 5, 3) for ratings — the ratings will be boxed as Integer. This is fine, but a comment noting that JSON round-tripping may widen these to Double would help future readers (the test at line 635 already handles this via instanceof Number).
  • SearchApiIntegrationTest.java:210-222: Three separate catch blocks for ChromaBadRequestException, ChromaServerException, and ChromaDeserializationException all do the same thing (skip with a message). Could use a single catch of ChromaException (their common superclass) to reduce duplication, unless you want to preserve the distinct skip messages.

What looks good

  • RRF expansion formula is correct: -(sum(w_i / (k + rank_i))) with proper negation for Chroma's lower-is-better scoring
  • Single-rank optimization (skip $sum wrapper) reduces payload for the common case
  • Rrf.Builder validation is thorough: NaN, infinity, negative weights, zero/negative k, all-zero weights
  • assertEventually properly narrows retries to AssertionError, ChromaConnectionException, and ChromaServerException — non-transient errors propagate immediately
  • SearchResultImpl.from() null inner-ids guard prevents NPE cascades from malformed server responses
  • Cloud test cleanup is solid: per-test createdCollections list cleaned in reverse order, shared seed collection cleaned in @AfterClass
  • Filter matrix test (8 sub-scenarios) is comprehensive and well-structured with inline assertions per sub-test

… up style

- Fail testCloudInvalidConfigTransitionRejected when known index group
  transition is silently accepted; strengthen catch-block assertions
- Remove tautological `>= 0` check on List.size() in INDEX_ONLY test
- Add ChromaServerException catch to forkCount test for cloud 5xx
- Remove double blank lines after .execute() calls (style consistency)
- Add comment about Integer→Double JSON round-trip in buildMeta
@tazarov
Copy link
Contributor

tazarov commented Mar 24, 2026

@claude again

@claude
Copy link

claude bot commented Mar 24, 2026

Claude finished @tazarov's task in 1m 36s —— View job


PR Review: Phase 5 - Cloud Integration Testing

  • Read and analyze source code changes
  • Read and analyze test code changes
  • Post review feedback

Overall

Looking good. The previous review feedback has been well addressed: the NPE catch was replaced with typed exception catches, the tautological assertion was replaced with assertNotNull, HNSW/SPANN config tests were simplified, assertEventually is properly narrowed, the forkCount test now catches both ChromaNotFoundException and ChromaServerException, and the INDEX_ONLY assertion was tightened. The code is in solid shape.

A few remaining items:


Issues

1. testCloudInvalidConfigTransitionRejected still can't fail when indexGroup == UNKNOWNSearchApiCloudIntegrationTest.java:459-490

The test now correctly fail()s if the server accepts a cross-group switch when indexGroup != UNKNOWN (line 474-477). However, when indexGroup == UNKNOWN, every path (no exception, IllegalArgumentException, ChromaBadRequestException, ChromaServerException) is still accepted silently. This means on cloud accounts where index group detection returns UNKNOWN (which is likely common), the test exercises no assertion at all. Consider at minimum logging/documenting what happened, or asserting that some response was received (e.g., store a flag and assert it was set). Low priority since UNKNOWN is a legitimate edge case, but worth noting.

2. Normalization guard in buildRrfRankMap is dead codeChromaDtos.java:1766

if (sum > 1e-9) {
    for (int i = 0; i < weights.length; i++) weights[i] /= sum;
}

As flagged in the previous review, Rrf.build() already throws IllegalArgumentException if weightSum < 1e-9, so the sum <= 1e-9 branch in buildRrfRankMap is unreachable. The silent no-op would be a problem if the invariant were ever broken. Consider either:

  • Removing the guard entirely (trust the builder)
  • Replacing it with assert sum > 1e-9 or an IllegalStateException

This is low-risk but it's dead code that could mask future bugs if the validation in Rrf.build() is ever loosened.


Nits (non-blocking)

  • SearchApiCloudIntegrationTest.java:676-677: Double blank line after .execute() call (inconsistent with rest of file).
  • SearchApiCloudIntegrationTest.java:786-788: Same double blank line after .execute().
  • assertEventually Javadoc (line 1352) says {@link AssertionError} — typo, should be AssertionError (or AssertionError if that's the actual class name; it looks like a typo for AssertionError which is actually AssertionError in the code at line 1366). Actually looking again, Java's class is AssertionError — wait, it's java.lang.AssertionError. The Javadoc and catch both say AssertionError which is the correct spelling; ignore this nit.

What looks good

  • Previous review feedback thoroughly addressed across 5 commits
  • Clean credential gating with @BeforeClass / Assume.assumeTrue — tests skip gracefully
  • Shared seed collection for read-only tests, isolated collections for mutation tests
  • assertEventually properly retries only transient errors (AssertionError, ChromaConnectionException, ChromaServerException)
  • Filter matrix test covers 8 sub-scenarios with per-row assertions
  • RRF builder validation is comprehensive (NaN, infinity, negative, zero weights, zero k)
  • SearchResultImpl.from() null inner-ids guard prevents NPE from malformed responses
  • Pagination test validates both happy path and client-side validation (limit=0, negative offset)
  • Embedding projection assertion correctly handles null/[[null]] server responses
  • forkCount test now catches both 404 and 500

LGTM — the two items above are minor. Ship it.


View job

…ption

- Replace silent no-op in buildRrfRankMap with IllegalStateException when
  weight sum is effectively zero (unreachable via Rrf.build() validation,
  but will crash loudly if the invariant is ever broken)
- Remove remaining double blank lines after .execute() calls
@oss-amikos oss-amikos merged commit 8fbc6cb into main Mar 24, 2026
11 checks passed
@oss-amikos oss-amikos deleted the feature/phase-05-cloud-integration-testing branch March 24, 2026 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants