Feature/fluentapi by aravindcz · Pull Request #16 · amikos-tech/chromadb-java-client

aravindcz · 2023-09-12T18:48:16Z

Restructured the code so that there is a proper flow
Composed some objects like client into collection and collection into a new object embedding
The changes were made so that while creating the flient apis its more intuitive
Waiting for your review to improve upon
Thanks

This commit is made to discuss with the team regarding the approach with which we need to design the fluent api and thereby work upon on those ideas Author: Aravind C Date: 28/08/23

@throws

Critical fixes: - Add else-throw in buildKnnRankMap for unrecognized query types (#1) - Fix groups() to return empty list when not grouped, matching interface contract (#2) - Change SearchResultRow.getScore() from Float to Double for wire precision (#3) - Add defensive copy in Knn.getQuery() for float[] queries (#4) - Fix Knn.queryText() Javadoc: text sent to server, not client-side embedded (#5) Important fixes: - Add null dto check in SearchResultImpl.from() (#6) - Add bounds validation with actionable message in rows(searchIndex) (#7) - Add Objects.requireNonNull to Search.Builder setters (knn, rrf, where, groupBy, select) (#8) - Add null validation to Knn.key() (#9) - Validate null elements in SearchBuilderImpl.searches() (#10) - Rename groupCount() to searchCount() with clarified Javadoc (#13) Suggestions: - Add minK/maxK range and ordering validation in GroupBy.build() (#14) - Make Rrf.RankWithWeight fields private (#15) - Add null rows check in SearchResultGroupImpl (#16) - Add defensive rank-missing check in buildSearchItemMap (#18) - Remove internal "D-04" reference from production comment (#23) - Make ResultRow Javadoc mechanism-agnostic (Include/Select) (#24) - Clarify SearchBuilder limit/offset as per-search fallback (#25) - Add mutual-exclusivity docs to SearchBuilder convenience methods - Add IllegalArgumentException to SearchBuilder.execute() @throws

@throws

#139) * docs(05): research phase cloud integration testing * docs(phase-5): add validation strategy * docs(05): create phase plan for cloud integration testing * fix(05): revise plans based on checker feedback * docs(05): finalize validation strategy — set nyquist_compliant true * feat(05-01): add cloud schema/index and array metadata integration tests - Create SearchApiCloudIntegrationTest with 12 test methods (CLOUD-02, CLOUD-03, D-22) - CLOUD-02: distance space round-trip (cosine/l2/ip), HNSW/SPANN config round-trips, invalid config transition rejection, schema round-trip via CollectionConfiguration - CLOUD-03: string/number/bool array round-trips with type fidelity (D-23), contains/notContains edge cases (D-24), empty array behavior documentation (D-25) - D-22: mixed-type array client-side validation in ChromaHttpCollection.validateMetadataArrayTypes - Wire validation into AddBuilderImpl, UpsertBuilderImpl, UpdateBuilderImpl execute() methods - Create MetadataValidationTest with 18 unit tests (static + behavioral wiring) - All cloud-dependent tests gate on Assume.assumeTrue(cloudAvailable) per D-02 * docs(05-01): complete cloud schema/index and array metadata plan - Add 05-01-SUMMARY.md documenting CLOUD-02, CLOUD-03, D-22 outcomes - Update STATE.md: advance to plan 2, add decisions, record metrics - Update ROADMAP.md: phase 5 progress (1/2 plans complete) * docs(05-01): mark CLOUD-02 and CLOUD-03 requirements complete * docs: add context files for phases 02 and 05 * docs(03): capture Search API implementation decisions in CONTEXT.md 21 decisions across 4 areas: builder API shape (hybrid), result types (single SearchResult with isGrouped()), field projection (Select class, no Include), sparse vector scope (SparseVector type in Phase 3, EFs deferred). * docs(03): research Search API implementation patterns and wire format Researches ChromaDB v1.5+ Search endpoint for planning Phase 3. Covers wire format (KNN, RRF, Select, GroupBy, ReadLevel), Go client struct patterns, Java builder/DTO translation strategy, and pitfalls. * docs(phase-3): add validation strategy * docs(03-search-api): create phase plan * feat(03-search-api-01): create Search API value types (SparseVector, Select, ReadLevel, GroupBy) - SparseVector: immutable type with int[]/float[] arrays, defensive copies, validation - Select: field projection constants (#document, #score, #embedding, #metadata, #id) and key() factory - ReadLevel: enum with INDEX_AND_WAL/INDEX_ONLY constants and fromValue() lookup - GroupBy: builder with required key and optional minK/maxK bounds * feat(03-search-api-01): add Search API ranking builders, result interfaces, and SearchBuilder on Collection - Knn: factory methods (queryText, queryEmbedding, querySparseVector) with fluent immutable chain - Rrf: builder with rank(Knn, weight), k, normalize; auto-sets returnRank on sub-rankings - Search: builder composing knn/rrf with filter, select, groupBy, limit, offset - SearchResultRow: extends ResultRow with Float getScore() - SearchResultGroup: interface with getKey() and rows() for grouped results - SearchResult: column-oriented (ids/documents/metadatas/embeddings/scores) and row-oriented access - Collection: declares SearchBuilder search() method with SearchBuilder inner interface - ChromaHttpCollection: stub SearchBuilderImpl (throws UnsupportedOperationException, wired in Plan 02) * docs(03-search-api-01): complete Search API type system plan - Add 03-01-SUMMARY.md documenting all 11 new/modified files - Update STATE.md with decisions, metrics, and session info - Update ROADMAP.md with plan progress (phase 03 now 3/3 summaries) - Mark SEARCH-01, SEARCH-02, SEARCH-03, SEARCH-04 complete in REQUIREMENTS.md * feat(03-search-api-02): add Search DTOs and API path - Add collectionSearch() path builder to ChromaApiPaths - Add SearchRequest/SearchResponse DTOs to ChromaDtos - Add buildKnnRankMap, buildRrfRankMap, buildSearchItemMap helpers - Uses "filter" key (not "where") per Search API spec * feat(03-search-api-02): implement SearchResult impls and wire SearchBuilderImpl - Add SearchResultRowImpl: composition over ResultRowImpl, Float score field - Add SearchResultGroupImpl: key + ResultGroup<SearchResultRow> - Add SearchResultImpl: lazy-cached row groups, Double scores, grouped flag - Replace stub SearchBuilderImpl with full HTTP POST implementation - SearchBuilderImpl: convenience queryText/queryEmbedding, batch searches, global filter/limit/offset, readLevel, routes to /search via ChromaApiPaths * docs(03-search-api-02): complete Search API implementation plan - Create 03-02-SUMMARY.md documenting DTOs, HTTP wiring, result converters - Update STATE.md: advance plan, record metrics and decisions - Update ROADMAP.md: phase 03 all 3 plans summarized (Complete) * test(03-search-api-03): add SparseVectorTest, SelectTest, and SearchApiUnitTest - SparseVectorTest: 8 tests covering immutability, defensive copies, validation, equals/hashCode, toString - SelectTest: 7 tests covering standard constants, key factory, all(), equals, blank/null guards - SearchApiUnitTest: 30 tests covering Knn, Rrf, Search, GroupBy, ReadLevel, and wire format serialization via ChromaDtos * test(03-search-api-03): add SearchApiIntegrationTest and update compatibility test - Add SearchApiIntegrationTest with 12 cloud-only tests (KNN, batch, RRF skip, field projection, ReadLevel, GroupBy, global filter, convenience shortcuts); tests skip gracefully without credentials - Update PublicInterfaceCompatibilityTest: bump EXPECTED_COLLECTION_METHOD_COUNT to 22 and add testCollectionSearchMethod() assertion - Fix wire format bug in ChromaDtos: use '$knn'/'$rrf' keys (not 'knn'/'rrf') - Fix NPE in SearchResultImpl.rows() when Chroma Cloud returns null inner lists in batch search responses (e.g., "documents":[null,null]) - Update SearchApiUnitTest assertions to match corrected '$knn'/'$rrf' keys * docs(03-search-api-03): complete Search API tests plan - Create 03-03-SUMMARY.md documenting unit + integration test coverage, wire format bug fix ($knn/$rrf keys), and NPE fix in batch responses - Update STATE.md: advance plan counter, record metrics and decisions - Update ROADMAP.md: Phase 3 Search API marked Complete (3/3 plans) * docs(phase-03): add verification report for Search API phase 8/8 must-haves verified. SEARCH-01 through SEARCH-04 all satisfied. 3 human verification items noted for server compatibility (RRF, text query, GroupBy key) — these are server-side limitations, not code gaps. * fix(search-api): address 25 review findings across Search API types Critical fixes: - Add else-throw in buildKnnRankMap for unrecognized query types (#1) - Fix groups() to return empty list when not grouped, matching interface contract (#2) - Change SearchResultRow.getScore() from Float to Double for wire precision (#3) - Add defensive copy in Knn.getQuery() for float[] queries (#4) - Fix Knn.queryText() Javadoc: text sent to server, not client-side embedded (#5) Important fixes: - Add null dto check in SearchResultImpl.from() (#6) - Add bounds validation with actionable message in rows(searchIndex) (#7) - Add Objects.requireNonNull to Search.Builder setters (knn, rrf, where, groupBy, select) (#8) - Add null validation to Knn.key() (#9) - Validate null elements in SearchBuilderImpl.searches() (#10) - Rename groupCount() to searchCount() with clarified Javadoc (#13) Suggestions: - Add minK/maxK range and ordering validation in GroupBy.build() (#14) - Make Rrf.RankWithWeight fields private (#15) - Add null rows check in SearchResultGroupImpl (#16) - Add defensive rank-missing check in buildSearchItemMap (#18) - Remove internal "D-04" reference from production comment (#23) - Make ResultRow Javadoc mechanism-agnostic (Include/Select) (#24) - Clarify SearchBuilder limit/offset as per-search fallback (#25) - Add mutual-exclusivity docs to SearchBuilder convenience methods - Add IllegalArgumentException to SearchBuilder.execute() @throws * test(search-api): add comprehensive unit tests and property-based tests for review findings * fix(search-api): restore groups() IllegalStateException and add bounds check Re-review found that returning empty list from groups() when not grouped masks caller mistakes — user can't distinguish "forgot GroupBy" from "server returned no groups". Restore the original throwing behavior. Also adds bounds validation on searchIndex in groups() to match rows(). * test(search-api): add remaining test coverage gaps from re-review - Rrf normalize=false not serialized (key absent from wire format) - SearchResultGroupImpl null rows guard (NullPointerException) - groups() bounds check on grouped results (IndexOutOfBoundsException) * refactor(search-api): simplify code per reuse, quality, and efficiency review Reuse: - Extract ImmutableCopyUtils with shared nestedList/nestedMetadata/ nestedEmbeddings — eliminates ~100 lines of exact duplication between QueryResultImpl and SearchResultImpl - Add Search.toBuilder() to avoid fragile field-by-field reconstruction in SearchBuilderImpl.execute() Quality: - Extract checkSearchIndex() helper in SearchResultImpl (DRY bounds check) - Hoist list extraction (docList, metaList, embList) out of per-row loop - Make $knn/$rrf wire-format strings constants (WIRE_KNN, WIRE_RRF) Efficiency: - Cache SparseVector.getIndices()/getValues() in local variables in buildKnnRankMap to avoid 4 unnecessary defensive array copies * docs(03): ship phase 3 Search API — PR #139 * fix(cloud-tests): add explicit embeddings to SearchApiCloudIntegrationTest Cloud API rejects add requests without embeddings when no server-side embedding model is configured. Provide synthetic vectors for all add() calls, matching the pattern used by CloudParityIntegrationTest. * fix(cloud-tests): remove waitForIndexing — get() reads from WAL Cloud indexing is eventually consistent with no guaranteed timing. The get() endpoint reads from the WAL, so recently added records are visible immediately without waiting for index completion. * fix(cloud-tests): tolerate cloud config response differences Cloud API may not echo back distance space or SPANN parameters in configuration responses. Re-fetch via getCollection() and skip gracefully when values are absent rather than hard-failing. * fix(search-api): address code review findings 1, 2, and 4 1. Search.Builder.select(): add defensive copy via Arrays.copyOf() 2. SearchBuilderImpl: apply globalLimit and globalOffset independently as per-search fallbacks, not gated on limit being null 4. Knn.limit(): add validation rejecting zero and negative values * refactor(search-api): remove groups()/isGrouped() from SearchResult GroupBy is a server-side processing stage that flattens results back into the standard column-major response. Neither the Python nor Go Chroma clients expose a groups() accessor — groupBy results are accessed via rows() like any other search result. Removes SearchResultGroup, SearchResultGroupImpl, and the grouped boolean plumbing through SearchResultImpl.

aravindcz added 2 commits August 28, 2023 13:08

feat: Add the fluent api for creating collection

7943a77

This commit is made to discuss with the team regarding the approach with which we need to design the fluent api and thereby work upon on those ideas Author: Aravind C Date: 28/08/23

feat: Add fluent api support

6107a63

aravindcz requested a review from tazarov September 12, 2023 18:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/fluentapi#16

Feature/fluentapi#16
aravindcz wants to merge 2 commits intoamikos-tech:developfrom
aravindcz:feature/fluentapi

aravindcz commented Sep 12, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aravindcz commented Sep 12, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant