⚡ Parallelize embedding generation in RAG indexing #24

google-labs-jules[bot] wants to merge 8 commits into
Conversation
Replaced sequential loop in `indexCodebase` with `Promise.all` and `p-limit` (concurrency 10) to parallelize embedding generation requests.

Benchmark results (simulated 100ms latency, 50 files):

- Sequential: ~5117ms
- Parallel: ~539ms
- Speedup: ~9.5x
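The diff itself isn't reproduced in this thread, but the description maps onto `p-limit`'s standard usage. A minimal sketch of the pattern, assuming hypothetical `RepoFile` and `generateEmbedding` shapes (the real definitions live in the project's `rag.ts`):

```typescript
import pLimit from "p-limit";

// Hypothetical shapes; the actual definitions live in rag.ts.
interface RepoFile {
  path: string;
  content: string;
}
declare function generateEmbedding(text: string): Promise<number[]>;

async function embedFiles(files: RepoFile[], concurrency = 10) {
  // At most `concurrency` embedding requests are in flight at any moment.
  const limit = pLimit(concurrency);

  const results = await Promise.all(
    files.map((file) =>
      limit(async () => {
        try {
          const values = await generateEmbedding(file.content);
          return { id: file.path, values };
        } catch {
          // A failed embedding resolves to null rather than rejecting
          // the whole Promise.all.
          return null;
        }
      })
    )
  );

  // Keep only the vectors that were produced successfully.
  return results.filter((r): r is { id: string; values: number[] } => r !== null);
}
```

The reported ~9.5x speedup is consistent with ten-way concurrency minus queueing overhead: 50 files at 100 ms each is ~5 s sequentially, while ten at a time needs roughly five round trips, ~0.5 s.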
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode; when this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
🤖 **AI Code Review**

Here's a detailed code review of your pull request:

**1. Walkthrough**

This pull request introduces a significant performance improvement to the RAG indexing process by parallelizing the embedding generation.

In essence, the PR transforms the embedding generation from a blocking, sequential process into an asynchronous, controlled-concurrent operation, directly addressing the performance bottleneck.

**2. Sequence Diagram**

```mermaid
sequenceDiagram
participant IndexCodebase as indexCodebase
participant PLimit as p-limit instance
participant GenerateEmbedding as generateEmbedding function
participant AIAPI as Google AI API
participant Pinecone as Pinecone Index
IndexCodebase->>PLimit: Instantiate with concurrency limit (10)
IndexCodebase->>IndexCodebase: Map files to asynchronous tasks
loop For each file in files
IndexCodebase->>PLimit: Enqueue task to call generateEmbedding
PLimit-->>IndexCodebase: Task queued/executed based on limit
activate PLimit
PLimit->>GenerateEmbedding: Call generateEmbedding(truncatedContent)
activate GenerateEmbedding
GenerateEmbedding->>AIAPI: Request embedding for content
activate AIAPI
AIAPI-->>GenerateEmbedding: Return embedding result
deactivate AIAPI
GenerateEmbedding-->>PLimit: Return embedding data or null on error
deactivate GenerateEmbedding
deactivate PLimit
end
IndexCodebase->>IndexCodebase: Wait for all tasks to complete (Promise.all)
IndexCodebase->>IndexCodebase: Filter out null results to get valid vectors
loop For each batch of vectors
IndexCodebase->>Pinecone: Upsert batch of vectors
end
IndexCodebase->>IndexCodebase: Log "Indexing completed"
```

**3. Summary**

This pull request significantly improves the performance of the RAG codebase indexing process by parallelizing the embedding generation.

**4. Strengths**

**5. Issues**

**6. Suggestions**

**7. Poem**

A loop once slow, a journey long and grand,
Ten threads now race, no longer one by one,

Powered By CodeHorse
Thanks for the excellent review! I've addressed your suggestions:
…reporting

- Added `concurrencyLimit` parameter to `indexCodebase` (default 10).
- Defined `MAX_EMBEDDING_CONTENT_LENGTH` constant (8000).
- `indexCodebase` now returns indexing statistics (success/failed counts, failed files list).
- Updated `inngest/functions/index.ts` to utilize the returned success count.
- Improved type safety for vector processing.
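Pieced together from this commit message, the new signature plausibly looks like the following; field names beyond those quoted above are assumptions:

```typescript
const MAX_EMBEDDING_CONTENT_LENGTH = 8000; // characters, per the commit message

// Assumed shape of the returned statistics; exact field names may differ.
interface IndexingStats {
  successCount: number;
  failedCount: number;
  failedFiles: { path: string; error: string }[];
}

declare function indexCodebase(
  repoId: string,
  files: { path: string; content: string }[],
  concurrencyLimit?: number // defaults to 10
): Promise<IndexingStats>;
```

Returning stats instead of void lets the Inngest function report how many files were actually indexed rather than assuming all succeeded.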
🤖 **AI Code Review**

Here's a detailed, constructive code review for your pull request.

**1. Walkthrough**

This pull request focuses on significantly improving the performance of the RAG indexing process.

**2. Sequence Diagram**

```mermaid
sequenceDiagram
participant Inngest as Inngest Function (indexRepo)
participant GitHub as GitHub API
participant RAG as RAG Module (indexCodebase)
participant EmbedService as Google Embedding Service
participant Pinecone as Pinecone DB
Inngest->>GitHub: Fetch repository file contents
GitHub-->>Inngest: File list and content
Inngest->>RAG: Call indexCodebase(repoId, files, concurrencyLimit)
RAG->>RAG: Initialize p-limit(concurrencyLimit)
loop For each file
RAG->>RAG: Add embedding task to p-limit queue
end
loop Process tasks (up to concurrencyLimit concurrently)
RAG->>EmbedService: generateEmbedding(truncatedContent)
EmbedService-->>RAG: Embedding vector (or error)
end
RAG->>RAG: Aggregate successful vectors
RAG->>RAG: Identify failed embeddings
loop For each batch of successful vectors
RAG->>Pinecone: upsert(batch)
Pinecone-->>RAG: Upsert confirmation
end
RAG-->>Inngest: Return { successCount, failedCount, failedFiles }
Inngest-->>Inngest: Update repository status with indexedFiles
```

**3. Summary**

This pull request significantly optimizes the RAG indexing process.

**4. Strengths**

**5. Issues**

**6. Suggestions**

**7. Poem**

A loop once slow, a task so grand,

Powered By CodeHorse
Thank you for the detailed feedback! I have implemented the following improvements:
- Set default `concurrencyLimit` from `EMBEDDING_CONCURRENCY_LIMIT` env var.
- Improved error message extraction for failed files.
- Added JSDoc for return type.
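A sketch of the env-var default, assuming the variable name from the commit and the fallback of 10 from the PR description:

```typescript
// Falls back to "10" when EMBEDDING_CONCURRENCY_LIMIT is unset.
// Note: parseInt still returns NaN for malformed values; a later commit
// in this PR hardens the parsing against that case.
const DEFAULT_CONCURRENCY_LIMIT = parseInt(
  process.env.EMBEDDING_CONCURRENCY_LIMIT ?? "10",
  10
);
```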
🤖 **AI Code Review**

Here's a detailed code review of your pull request:

**1. Walkthrough**

Let's break down the changes file by file:

**2. Sequence Diagram**

```mermaid
sequenceDiagram
participant Inngest as Inngest Function
participant RagModule as AI RAG Module
participant GoogleAI as Google AI SDK
participant Pinecone as Pinecone DB
Inngest->>RagModule: Call indexCodebase(repoId, files, concurrencyLimit)
activate RagModule
RagModule->>RagModule: Initialize p-limit(concurrencyLimit)
loop For each file in files (up to concurrencyLimit concurrently)
RagModule->>GoogleAI: Call generateEmbedding(truncatedContent)
activate GoogleAI
GoogleAI-->>RagModule: Return embedding or error
deactivate GoogleAI
RagModule->>RagModule: Collect embedding result (success/error)
end
RagModule->>RagModule: Aggregate successful vectors and failed files
alt If successful vectors exist
RagModule->>Pinecone: Upsert vectors in batches
activate Pinecone
Pinecone-->>RagModule: Upsert confirmation
deactivate Pinecone
end
RagModule-->>Inngest: Return {successCount, failedCount, failedFiles}
deactivate RagModule
Inngest->>Inngest: Update indexed files count with successCount
```

**3. Summary**

This pull request implements a significant performance optimization for the RAG indexing process.

**4. Strengths**

**5. Issues**

**6. Suggestions**

**7. Poem**

The code, once a stream, so calm and so slow,
But now, a new spirit,

Powered By CodeHorse
Thank you for the review! I've made the requested updates:
Regarding …
- Added explanatory comment for `MAX_EMBEDDING_CONTENT_LENGTH`.
- Documented `EMBEDDING_CONCURRENCY_LIMIT` in README.md.
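Neither the comment wording nor the README entry is quoted in this thread; a plausible form of the constant's documentation, with the rationale assumed:

```typescript
// Maximum number of characters sent to the embedding model per file.
// Content beyond this is truncated so requests stay within the model's
// input limits. (Assumed rationale; the committed comment may differ.)
const MAX_EMBEDDING_CONTENT_LENGTH = 8000;
```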
🤖 **AI Code Review**

Here's a detailed code review of your pull request.

**1. Walkthrough**

This pull request primarily focuses on improving the performance of the RAG indexing process by parallelizing the embedding generation.

**2. Sequence Diagram**

```mermaid
sequenceDiagram
participant InngestFunction as Inngest Function (indexRepo)
participant RagModule as modules/ai/lib/rag.ts
participant P_Limit as p-limit
participant EmbeddingService as AI SDK (Google Embedding)
participant PineconeDB as Pinecone Vector DB
InngestFunction->>RagModule: Call indexCodebase(repoId, files, concurrencyLimit)
RagModule->>P_Limit: Initialize with concurrencyLimit
loop For each file in files
RagModule->>P_Limit: Add embedding task for file
P_Limit-->>RagModule: Allows task execution (up to limit)
RagModule-->>EmbeddingService: generateEmbedding(truncatedContent)
EmbeddingService-->>RagModule: Returns embedding or throws error
RagModule->>RagModule: Store result (success/error)
end
RagModule->>RagModule: Collect all results after Promise.all
RagModule->>RagModule: Filter successful vectors and failed files
RagModule->>PineconeDB: Upsert vectors in batches (sequentially)
PineconeDB-->>RagModule: Confirms upsert
RagModule-->>InngestFunction: Return {successCount, failedCount, failedFiles}
InngestFunction->>InngestFunction: Use successCount for further processing
```

**3. Summary**

This pull request significantly enhances the performance of the RAG codebase indexing process by introducing parallel embedding generation. Previously, embedding each file was a sequential operation, leading to bottlenecks, especially with high-latency AI API calls. By integrating `p-limit`, embedding requests now run concurrently under a configurable limit.

**4. Strengths**

**5. Issues**

**6. Suggestions**

**7. Poem**

A loop once trudged, a task so slow,
No longer waiting, one by one,
From "sequential" to "speedy light,"

Powered By CodeHorse
Thank you for the comprehensive review! I've implemented the suggestion to wrap the Pinecone upsert in a `try/catch` block.
Wrapped the Pinecone upsert call in a try/catch block to prevent a single batch failure from halting the entire indexing process.
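A sketch of that guard, assuming a Pinecone index handle named `index`, a `vectors` array from the embedding step, and a local `BATCH_SIZE` constant:

```typescript
// Assumed context: `index` is a Pinecone index handle and `vectors` holds
// the successfully embedded records from the previous step.
declare const index: {
  upsert(records: { id: string; values: number[] }[]): Promise<void>;
};
declare const vectors: { id: string; values: number[] }[];

const BATCH_SIZE = 100; // assumed batch size

async function upsertInBatches(): Promise<void> {
  for (let i = 0; i < vectors.length; i += BATCH_SIZE) {
    const batch = vectors.slice(i, i + BATCH_SIZE);
    try {
      await index.upsert(batch);
    } catch (error) {
      // One failed batch is logged and skipped instead of aborting the run.
      console.error(`Pinecone upsert failed for batch starting at ${i}:`, error);
    }
  }
}
```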
🤖 **AI Code Review**

Here's a detailed, constructive code review for your pull request:

**1. Walkthrough**

Let's go through the changes file by file.

**2. Sequence Diagram**

```mermaid
sequenceDiagram
participant InngestEvent as Inngest (pr.review.requested)
participant InngestFunc as indexRepo Function
participant GithubAPI as GitHub API
participant DB as Prisma DB
participant RAGModule as RAG Module
participant EmbeddingService as Google AI Embedding Service
participant VectorDB as Pinecone Vector DB
InngestEvent->>InngestFunc: Trigger 'repository.connected' event (owner, repo, userId)
InngestFunc->>DB: Fetch user's GitHub access token
DB-->>InngestFunc: Access Token
alt Access Token Found
InngestFunc->>GithubAPI: getRepoFileContents(token, owner, repo)
GithubAPI-->>InngestFunc: List of files (path, content)
InngestFunc->>RAGModule: indexCodebase(repoId, files, concurrencyLimit)
RAGModule->>EmbeddingService: <Parallel Requests via p-limit>
RAGModule->>EmbeddingService: generateEmbedding(fileContent_1)
RAGModule->>EmbeddingService: generateEmbedding(fileContent_2)
Note over RAGModule,EmbeddingService: ...
RAGModule->>EmbeddingService: generateEmbedding(fileContent_N)
loop For each file (up to concurrency limit)
EmbeddingService-->>RAGModule: Embedding Vector (or Error)
end
RAGModule->>RAGModule: Collect successful vectors and failed file errors
RAGModule->>VectorDB: <Batch Upsert>
VectorDB-->>RAGModule: Upsert Status (or Error)
RAGModule-->>InngestFunc: { successCount, failedCount, failedFiles }
InngestFunc-->>InngestEvent: { success: true, indexedFiles: successCount }
else Access Token Not Found
InngestFunc--xInngestEvent: Error: "No GitHub access token found"
end
```

**3. Summary**

This Pull Request significantly enhances the performance of the RAG indexing process by parallelizing the generation of embeddings for codebase files. It replaces a sequential loop with a concurrent approach using `Promise.all` and `p-limit`.

**4. Strengths**

**5. Issues**

**6. Suggestions**

**7. Poem**

A sequential task, a heavy, slow embrace,
From old …

Powered By CodeHorse
Thank you for the detailed feedback! I have implemented the suggestions to improve robustness and type safety:
- Implemented safer `concurrencyLimit` parsing (handles NaN).
- Added `failedUpsertCount` to return statistics.
- Improved logging for failed upserts (includes batch IDs).
- Used discriminated union `EmbeddingResult` for stricter type safety.
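A sketch of the two type and parsing changes; names beyond `EmbeddingResult` and `EMBEDDING_CONCURRENCY_LIMIT` are assumptions:

```typescript
// Discriminated union: the `status` tag tells TypeScript which fields exist,
// so successes and failures can't be mixed up downstream.
type EmbeddingResult =
  | { status: "success"; path: string; values: number[] }
  | { status: "error"; path: string; error: string };

// NaN-safe parsing: Number("abc") is NaN, which Number.isFinite rejects,
// so malformed env values fall back to the default instead of propagating.
function resolveConcurrencyLimit(fallback = 10): number {
  const parsed = Number(process.env.EMBEDDING_CONCURRENCY_LIMIT);
  return Number.isFinite(parsed) && parsed > 0 ? Math.floor(parsed) : fallback;
}
```

With the union, a plain filter such as `results.filter((r) => r.status === "success")` narrows each element to the success variant before the upsert step.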
🤖 **AI Code Review**

Here's a detailed code review for your pull request:

**1. Walkthrough**

This pull request primarily focuses on improving the performance of the RAG indexing process.

**2. Sequence Diagram**

```mermaid
sequenceDiagram
participant InngestFn as Inngest Function (indexRepo)
participant GithubModule as GitHub Module (getRepoFileContents)
participant AIRagModule as AI Rag Module (indexCodebase)
participant GoogleAI as Google AI API (generateEmbedding)
participant PineconeDB as Pinecone DB (upsert)
InngestFn->>GithubModule: fetch-pr-data (getRepoFileContents)
GithubModule-->>InngestFn: files[]
InngestFn->>AIRagModule: indexCodebase(repoId, files[])
AIRagModule->>AIRagModule: pLimit(concurrency) initialized
loop for each file in files[]
AIRagModule->>GoogleAI: generateEmbedding(truncatedContent) (concurrently, up to limit)
alt Embedding Success
GoogleAI-->>AIRagModule: embedding vector
AIRagModule->>AIRagModule: Collect successful vector
else Embedding Failure
GoogleAI--xAIRagModule: error
AIRagModule->>AIRagModule: Collect failed file
end
end
AIRagModule->>AIRagModule: Filter successful vectors & failed files
loop for each batch of vectors
AIRagModule->>PineconeDB: upsert(batch)
alt Upsert Success
PineconeDB-->>AIRagModule: success
AIRagModule->>AIRagModule: Track success
else Upsert Failure
PineconeDB--xAIRagModule: error
AIRagModule->>AIRagModule: Track failed upsert count
end
end
AIRagModule-->>InngestFn: indexing stats (successCount, failedCount, etc.)
InngestFn-->>InngestFn: log stats
```

**3. Summary**

This pull request significantly enhances the performance of the RAG indexing process by introducing parallelism for embedding generation. It replaces a sequential loop with `Promise.all` and `p-limit`.

**4. Strengths**

**5. Issues**

**6. Suggestions**

**7. Poem**

In codebase vast, where tasks once slowly crept,
With promises parallel, a swift, concurrent dance,

Powered By CodeHorse
Thank you for the detailed feedback! I've updated the PR with the following changes:
- Updated `MAX_EMBEDDING_CONTENT_LENGTH` comment for accuracy.
- Added JSDoc for `indexCodebase` return stats.
- Moved `prisma` to `devDependencies`.
🤖 **AI Code Review**

Here's a detailed, constructive code review for your pull request.

**1. Walkthrough**

This pull request primarily focuses on improving the performance of the RAG (Retrieval-Augmented Generation) indexing process by parallelizing the generation of embeddings for codebase files.

**2. Sequence Diagram**

```mermaid
sequenceDiagram
participant InngestWorker as Inngest Worker
participant GitHubAPI as GitHub API
participant RAGModule as RAG Module (rag.ts)
participant GoogleAIModel as Google AI Model
participant PineconeDB as Pinecone DB
InngestWorker->>InngestWorker: Event: pr.review.requested (trigger indexRepo)
InngestWorker->>GitHubAPI: Fetch Repository Files
GitHubAPI-->>InngestWorker: Returns file list
InngestWorker->>RAGModule: Call indexCodebase(repoId, files)
RAGModule->>RAGModule: Initialize p-limit with concurrency
loop For each file in files
RAGModule->>RAGModule: Truncate file content
RAGModule->>GoogleAIModel: Request generateEmbedding(truncatedContent)
Note right of RAGModule: Executed in parallel, respecting concurrency limit
GoogleAIModel-->>RAGModule: Returns embedding vector or error
end
RAGModule->>RAGModule: Aggregate embedding results (success/error)
RAGModule->>RAGModule: Filter successful embeddings into batches
loop For each batch of vectors
RAGModule->>PineconeDB: Upsert batch of vectors
PineconeDB-->>RAGModule: Acknowledge upsert or error
end
RAGModule-->>InngestWorker: Return indexing stats (successCount, failedCount, etc.)
InngestWorker->>InngestWorker: Log indexing results
InngestWorker-->>InngestWorker: Mark step complete
```

**3. Summary**

This pull request significantly enhances the performance of the RAG indexing process by parallelizing the generation of embeddings. Previously, embedding calls were made sequentially, creating a bottleneck due to network latency. By leveraging `p-limit`, embedding requests now run concurrently up to a configurable limit.

**4. Strengths**

**5. Issues**

**6. Suggestions**

**7. Poem**

A sequential march, a slow, steady pace,
From files to vectors, a swift, parallel flight,

Powered By CodeHorse
Thank you for the detailed feedback! I've made the requested updates:
Regarding …
- Defined `DEFAULT_CONCURRENCY_LIMIT` constant.
- Added warning log when content is truncated before embedding.
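A sketch combining both changes; the helper name and log wording are assumptions:

```typescript
const DEFAULT_CONCURRENCY_LIMIT = 10;
const MAX_EMBEDDING_CONTENT_LENGTH = 8000;

// Hypothetical helper: truncates oversized content and logs a warning so
// the truncation is visible in the indexing logs instead of silent.
function truncateForEmbedding(path: string, content: string): string {
  if (content.length <= MAX_EMBEDDING_CONTENT_LENGTH) return content;
  console.warn(
    `[rag] Truncating ${path}: ${content.length} chars exceeds ` +
      `MAX_EMBEDDING_CONTENT_LENGTH (${MAX_EMBEDDING_CONTENT_LENGTH})`
  );
  return content.slice(0, MAX_EMBEDDING_CONTENT_LENGTH);
}
```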
🤖 **AI Code Review**

Powered By CodeHorse
Acknowledged.
PR created automatically by Jules for task 14872559739613709846 started by @amaan-ur-raheman