Pull request overview
This pull request introduces a semantic deduplication step to the Prisma database orchestration pipeline to identify and eliminate duplicate tables across components that serve the same purpose but may have different names.
Changes:
- Adds a new deduplication agent with comprehensive system prompt (DATABASE_DEDUPLICATION.md) defining semantic duplicate criteria, workflow, and output format
- Implements deduplication orchestration logic with naming similarity hints, Union-Find cluster merging, and deterministic resolution (keeping tables from smallest components)
- Integrates deduplication into the Prisma orchestration pipeline between component review and schema generation phases
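The Union-Find cluster merging mentioned in the list above could look roughly like the sketch below. It is not the PR's implementation: `TableRef`, `DuplicateGroup`, and `mergeClusters` are hypothetical names, and the sketch only assumes that groups sharing a table should end up in one cluster, keyed by `namespace::name` as in the reviewed code.

```typescript
// Hypothetical shapes; the real types in the PR may differ.
interface TableRef {
  namespace: string;
  name: string;
}
interface DuplicateGroup {
  tables: TableRef[];
}

// Merge duplicate groups that share at least one table into clusters.
function mergeClusters(groups: DuplicateGroup[]): TableRef[][] {
  const index = new Map<string, number>();
  const refs: TableRef[] = [];
  const parent: number[] = [];
  const rank: number[] = [];

  // Assign each distinct table a stable integer id on first sight.
  const getOrCreateIndex = (t: TableRef): number => {
    const key = `${t.namespace}::${t.name}`;
    const found = index.get(key);
    if (found !== undefined) return found;
    const created = refs.length;
    index.set(key, created);
    refs.push(t);
    parent.push(created);
    rank.push(0);
    return created;
  };

  // Find the root of x, shortening the path as we walk (path halving).
  const find = (x: number): number => {
    while (parent[x] !== x) {
      parent[x] = parent[parent[x]!]!;
      x = parent[x]!;
    }
    return x;
  };

  // Union by rank: attach the shallower tree under the deeper one.
  const union = (a: number, b: number): void => {
    const ra = find(a);
    const rb = find(b);
    if (ra === rb) return;
    if (rank[ra]! < rank[rb]!) parent[ra] = rb;
    else if (rank[ra]! > rank[rb]!) parent[rb] = ra;
    else {
      parent[rb] = ra;
      rank[ra] = rank[ra]! + 1;
    }
  };

  for (const group of groups) {
    const ids = group.tables.map((t) => getOrCreateIndex(t));
    for (let i = 1; i < ids.length; i++) union(ids[0]!, ids[i]!);
  }

  // Collect tables by root; only clusters with 2+ tables are duplicates.
  const clusters = new Map<number, TableRef[]>();
  refs.forEach((ref, i) => {
    const root = find(i);
    const list = clusters.get(root) ?? [];
    list.push(ref);
    clusters.set(root, list);
  });
  return Array.from(clusters.values()).filter((c) => c.length > 1);
}
```

With this shape, two agents reporting `{A, B}` and `{B, C}` separately still yield a single cluster `{A, B, C}`. How one table is then chosen to survive (the "smallest component" rule) is left out here because the overview does not spell it out.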
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| packages/agent/prompts/DATABASE_DEDUPLICATION.md | Comprehensive system prompt defining agent responsibilities, duplicate criteria, execution flow, and examples |
| packages/agent/src/orchestrate/prisma/orchestratePrismaDeduplication.ts | Main orchestration logic for running deduplication agents per component |
| packages/agent/src/orchestrate/prisma/programmers/AutoBeDatabaseDeduplicationProgrammer.ts | Validation and resolution logic including Union-Find cluster merging |
| packages/agent/src/orchestrate/prisma/histories/transformPrismaDeduplicationHistory.ts | Prompt construction with naming similarity hints based on normalized table names |
| packages/agent/src/orchestrate/prisma/structures/IAutoBeDatabaseDeduplicationApplication.ts | TypeScript interface defining agent application structure |
| packages/agent/src/orchestrate/prisma/orchestratePrisma.ts | Integration of deduplication step into main orchestration pipeline with debug logging |
| packages/interface/src/events/AutoBeDatabaseDeduplicationEvent.ts | Event definition for deduplication progress tracking |
| packages/interface/src/histories/contents/AutoBeDatabaseDeduplicationGroup.ts | Data structure for representing duplicate table groups |
| packages/ui/src/components/events/AutoBeProgressEventMovie.tsx | UI support for displaying deduplication progress |
| packages/ui/src/components/events/AutoBeEventMovie.tsx | Event type registration for UI rendering |
| packages/ui/src/structure/AutoBeListener.ts | Event listener integration |
| test/src/archive/utils/ArchiveLogger.ts | Logging support for deduplication events |
| packages/agent/src/AutoBeMockAgent.ts | Mock agent sleep time configuration |
| test/src/agent/internal/validate_interface_complement.ts | Added failures parameter (unrelated fix) |
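The "naming similarity hints based on normalized table names" listed for transformPrismaDeduplicationHistory.ts could be built along these lines. The PR does not show its normalization rule, so the one below (lowercase, strip everything that is not a letter or digit) is purely an assumption, as are the names `ComponentSketch`, `normalize`, and `similarityHints`.

```typescript
// Hypothetical component shape, reduced to what the sketch needs.
interface ComponentSketch {
  namespace: string;
  tables: { name: string }[];
}

// Assumed normalization: lowercase and drop separators such as "_".
const normalize = (name: string): string =>
  name.toLowerCase().replace(/[^a-z0-9]/g, "");

// Bucket tables by normalized name; buckets with 2+ entries become hints.
function similarityHints(components: ComponentSketch[]): Map<string, string[]> {
  const buckets = new Map<string, string[]>();
  for (const component of components) {
    for (const table of component.tables) {
      const key = normalize(table.name);
      const list = buckets.get(key) ?? [];
      list.push(`${component.namespace}::${table.name}`);
      buckets.set(key, list);
    }
  }
  return new Map(
    Array.from(buckets.entries()).filter(([, tables]) => tables.length > 1),
  );
}
```

Such hints only flag candidates by name; per the overview, the agent still decides from the descriptions whether two tables are semantic duplicates.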
| const [namespace, name] = key.split("::"); | ||
| cluster.push({ namespace: namespace!, name: name! }); |
Potential bug: Using split with "::" separator and non-null assertion operators without validation. If a table key doesn't contain "::" or contains multiple instances of it, this could result in incorrect namespace/name extraction. The split could return an array with unexpected length, and the non-null assertions (namespace!, name!) could mask undefined values. Consider adding validation or using a more robust key format.
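One way to add the validation this comment asks for is to parse the key in a small helper that rejects anything other than exactly one non-empty namespace and one non-empty name. `parseTableKey` and `TableRef` are hypothetical names, and throwing on malformed keys is an assumption about the desired behavior.

```typescript
interface TableRef {
  namespace: string;
  name: string;
}

// Parse "namespace::name" and fail loudly instead of masking undefined with "!".
function parseTableKey(key: string): TableRef {
  const parts = key.split("::");
  const [namespace, name] = parts;
  if (parts.length !== 2 || !namespace || !name) {
    throw new Error(`Malformed table key: "${key}"`);
  }
  // Both values are narrowed to string here, so no non-null assertion is needed.
  return { namespace, name };
}
```

Failing fast keeps a corrupted key from silently producing a cluster entry with an empty or wrong namespace.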
packages/agent/src/orchestrate/prisma/histories/transformPrismaDeduplicationHistory.ts
| 2. **Check the Naming Similarity Hints first** — tables with the same normalized name are strong duplicate candidates | ||
| 3. For each target table, compare its name AND description against every table in other components | ||
| 4. If two tables serve the same purpose → group them as duplicates | ||
| 5. Call \`process({ request: { type: "complete", review: "...", duplicateGroups: [...] } })\` |
Discrepancy in user message: The prompt instructs the agent to call process({ request: { type: "complete", review: "...", duplicateGroups: [...] } }), but the IComplete interface does not have a review field. The correct fields are analysis and rationale. This will cause the agent to fail validation when following the prompt instructions. The prompt should be updated to match the actual interface definition.
| 5. Call \`process({ request: { type: "complete", review: "...", duplicateGroups: [...] } })\` | |
| 5. Call \`process({ request: { type: "complete", analysis: "...", rationale: "...", duplicateGroups: [...] } })\` |
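To illustrate why the original prompt text would trip validation, here is a rough mirror of the shape the comment describes. `ICompleteSketch` is a hypothetical stand-in for the real IComplete interface, the `duplicateGroups` element shape is assumed, and the two helper functions are only a hand-rolled approximation of whatever validator the project actually uses.

```typescript
// Hypothetical mirror of the fields named in the review comment.
interface ICompleteSketch {
  type: "complete";
  analysis: string;
  rationale: string;
  duplicateGroups: { tables: { namespace: string; name: string }[] }[];
}

const ALLOWED_FIELDS: ReadonlySet<string> = new Set([
  "type",
  "analysis",
  "rationale",
  "duplicateGroups",
]);

// Fields present in the request but not part of the interface.
function unknownFields(request: Record<string, unknown>): string[] {
  return Object.keys(request).filter((key) => !ALLOWED_FIELDS.has(key));
}

// Fields required by the interface but absent from the request.
function missingFields(request: Record<string, unknown>): string[] {
  return Array.from(ALLOWED_FIELDS).filter((key) => !(key in request));
}

// What the corrected prompt asks the agent to send.
const corrected: ICompleteSketch = {
  type: "complete",
  analysis: "...",
  rationale: "...",
  duplicateGroups: [],
};
```

A request following the old prompt (`type`, `review`, `duplicateGroups`) would report `review` as unknown and `analysis` and `rationale` as missing, while `corrected` passes both checks.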
| ` - process: progress`, | ||
| ` - progress: (${event.completed} of ${event.total})`, | ||
| ` - namespace: ${event.namespace}`, | ||
| ` - duplicated tables: ${event.duplicateGroups.map((g) => g.tables.map((t) => t.name).join(", ")).join(", ")}`, |
The logging format could be misleading or incorrect. This line flattens all duplicate group tables into a single comma-separated list, which makes it unclear which tables belong to which duplicate group. Consider formatting this to show the group structure more clearly, for example: duplicateGroups.map((g) => `[${g.tables.map((t) => t.name).join(", ")}]`).join("; ") to separate groups with semicolons or brackets.
| ` - duplicated tables: ${event.duplicateGroups.map((g) => g.tables.map((t) => t.name).join(", ")).join(", ")}`, | |
| ` - duplicated tables: ${event.duplicateGroups | |
| .map((g) => `[${g.tables.map((t) => t.name).join(", ")}]`) | |
| .join("; ")}`, |
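The difference between the two formats is easiest to see on a small input. In this sketch `GroupSketch`, `formatFlat`, and `formatGrouped` are hypothetical names, and the table names are made up for illustration.

```typescript
// Hypothetical minimal shape of a duplicate group.
interface GroupSketch {
  tables: { name: string }[];
}

// Original format: every table name in one flat comma-separated list.
const formatFlat = (groups: GroupSketch[]): string =>
  groups.map((g) => g.tables.map((t) => t.name).join(", ")).join(", ");

// Suggested format: one bracketed list per group, groups separated by "; ".
const formatGrouped = (groups: GroupSketch[]): string =>
  groups.map((g) => `[${g.tables.map((t) => t.name).join(", ")}]`).join("; ");

const sample: GroupSketch[] = [
  { tables: [{ name: "users" }, { name: "members" }] },
  { tables: [{ name: "orders" }, { name: "purchases" }] },
];

console.log(formatFlat(sample)); // users, members, orders, purchases
console.log(formatGrouped(sample)); // [users, members]; [orders, purchases]
```

Only the second line makes it visible that `users`/`members` and `orders`/`purchases` are two separate groups.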
| console.log(`----------- PRISMA AUTHORIZATION -----------`); | ||
| console.log(JSON.stringify(authorizations, null, 2)); | ||
|
|
||
| const reviewedAuthorizations: AutoBeDatabaseComponent[] = | ||
| await orchestratePrismaAuthorizationReview(ctx, { | ||
| instruction: props.instruction, | ||
| components: authorizations, | ||
| }); | ||
| console.log(`----------- PRISMA AUTHORIZATION REVIEW -----------`); | ||
| console.log(JSON.stringify(reviewedAuthorizations, null, 2)); | ||
|
|
||
| // COMPONENT | ||
| const components: AutoBeDatabaseComponent[] = | ||
| await orchestratePrismaComponent(ctx, { | ||
| instruction: props.instruction, | ||
| groups: reviewedGroups, | ||
| }); | ||
| console.log(`----------- PRISMA COMPONENT -----------`); | ||
| console.log(JSON.stringify(components, null, 2)); | ||
|
|
||
| const reviewedComponents: AutoBeDatabaseComponent[] = | ||
| await orchestratePrismaComponentReview(ctx, { | ||
| instruction: props.instruction, | ||
| components, | ||
| }); | ||
| const reviewedAllComponents: AutoBeDatabaseComponent[] = [ | ||
| ...reviewedAuthorizations, | ||
| ...reviewedComponents, | ||
| ]; | ||
| console.log(`----------- PRISMA COMPONENT REVIEW -----------`); | ||
| console.log(JSON.stringify(reviewedComponents, null, 2)); | ||
|
|
||
| const reviewedAllComponents: AutoBeDatabaseComponent[] = | ||
| AutoBeDatabaseComponentProgrammer.removeDuplicatedTable([ | ||
| ...reviewedAuthorizations, | ||
| ...reviewedComponents, | ||
| ]); | ||
|
|
||
| // DEDUPLICATION (semantic) | ||
| const deduplicatedComponents: AutoBeDatabaseComponent[] = | ||
| await orchestratePrismaDeduplication(ctx, { | ||
| instruction: props.instruction, | ||
| components: reviewedAllComponents, | ||
| }); | ||
| console.log(`----------- PRISMA DEDUPLICATION -----------`); | ||
| console.log(JSON.stringify(deduplicatedComponents, null, 2)); | ||
| console.log( | ||
| `before Tables: ${reviewedAllComponents.flatMap((c) => c.tables).length}`, | ||
| ); | ||
| console.log( | ||
| `after Tables: ${deduplicatedComponents.flatMap((c) => c.tables).length}`, | ||
| ); |
Debug logging statements should be removed before merging to production. These console.log statements with large JSON payloads can impact performance and clutter logs in production environments. Consider using a proper logging framework with configurable log levels, or remove these statements entirely if they were only needed for development.
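If the output is worth keeping for development, a level-gated logger is one option. The sketch below is a minimal hand-rolled version, not a recommendation of a specific framework; `createLogger`, `LogLevel`, and the `sink` parameter are hypothetical names. Messages are passed as thunks so that the large `JSON.stringify` calls only run when the level is enabled.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

const LEVEL_ORDER: Record<LogLevel, number> = {
  debug: 0,
  info: 1,
  warn: 2,
  error: 3,
};

// Build a logger that drops every message below minLevel.
function createLogger(
  minLevel: LogLevel,
  sink: (line: string) => void = console.log,
) {
  return {
    log(level: LogLevel, message: () => string): void {
      // The thunk is evaluated only when the message will actually be emitted.
      if (LEVEL_ORDER[level] >= LEVEL_ORDER[minLevel]) {
        sink(`[${level}] ${message()}`);
      }
    },
  };
}

// Usage: with minLevel "info", the debug payload is never serialized.
const logger = createLogger("info");
logger.log("debug", () => JSON.stringify({ large: "payload" }, null, 2));
logger.log("info", () => "deduplication finished");
```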
…aDeduplicationHistory.ts Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
19bdb6b to 405c500
…cation Agent Prompt
405c500 to d09f23c
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.
| const [namespace, name] = key.split("::"); | ||
| cluster.push({ namespace: namespace!, name: name! }); |
The split operation on line 206 assumes that the key will contain exactly one "::" separator. However, if a table name or namespace contains "::" in it, this could lead to incorrect parsing. While table names and namespaces are validated elsewhere with snake_case patterns that shouldn't allow "::", it would be safer to use a more robust splitting approach (e.g., splitting with a limit of 2, or using a different separator that's guaranteed not to appear in the data).
| const [namespace, name] = key.split("::"); | |
| cluster.push({ namespace: namespace!, name: name! }); | |
| const separator = "::"; | |
| const separatorIndex = key.indexOf(separator); | |
| let namespace: string; | |
| let name: string; | |
| if (separatorIndex === -1) { | |
| // Fallback for malformed keys without the expected separator. | |
| namespace = ""; | |
| name = key; | |
| } else { | |
| namespace = key.slice(0, separatorIndex); | |
| name = key.slice(separatorIndex + separator.length); | |
| } | |
| cluster.push({ namespace, name }); |
| { name: "sale_question_answers", description: "Seller answers to customer questions" }, | ||
| { | ||
| name: "sale_reviews", | ||
| description: "[INPUT] Customer reviews and ratings for purchased sales. Stores review content (rating, title, body, images), customer reference, and verified_purchase flag. Created after customer receives order. Used in product page display and seller rating calculation. Does NOT store review responses - see sale_review_replies for seller responses." |
The description references a table sale_review_replies that doesn't exist in the examples. This appears to be an inconsistency - reviews typically don't have seller responses (unlike questions which have answers). Consider either:
- Removing the reference to sale_review_replies if reviews don't support responses, or
- Adding sale_review_replies to the examples if they do support responses
Based on the Q&A pattern having separate questions/answers tables, if reviews support responses, they should follow the same pattern.
| description: "[INPUT] Customer reviews and ratings for purchased sales. Stores review content (rating, title, body, images), customer reference, and verified_purchase flag. Created after customer receives order. Used in product page display and seller rating calculation. Does NOT store review responses - see sale_review_replies for seller responses." | |
| description: "[INPUT] Customer reviews and ratings for purchased sales. Stores review content (rating, title, body, images), customer reference, and verified_purchase flag. Created after customer receives order. Used in product page display and seller rating calculation. Does NOT store seller responses; if seller responses are required, model them in a separate table following the Q&A pattern." |
| reason: "Requirement 3.5 specifies customer reviews on sales, but no review table exists", | ||
| table: "sale_reviews", | ||
| description: "Customer reviews and ratings for sales with helpful votes" | ||
| description: "[INPUT] Customer reviews and ratings for purchased sales. Stores review content (rating, title, body, images), customer reference, verified_purchase flag, timestamps. Created after customer receives order. Used in product page display and seller rating calculation. Does NOT store review responses - see sale_review_replies for seller responses." |
The description references a table sale_review_replies that doesn't exist in the examples. This appears to be an inconsistency - reviews typically don't have seller responses (unlike questions which have answers). Consider either:
- Removing the reference to sale_review_replies if reviews don't support responses, or
- Adding sale_review_replies to the examples if they do support responses
Based on the Q&A pattern having separate questions/answers tables, if reviews support responses, they should follow the same pattern.
| description: "[INPUT] Customer reviews and ratings for purchased sales. Stores review content (rating, title, body, images), customer reference, verified_purchase flag, timestamps. Created after customer receives order. Used in product page display and seller rating calculation. Does NOT store review responses - see sale_review_replies for seller responses." | |
| description: "[INPUT] Customer reviews and ratings for purchased sales. Stores review content (rating, title, body, images), customer reference, verified_purchase flag, timestamps. Created after customer receives order. Used in product page display and seller rating calculation. Does NOT store seller responses; this table only contains customer-authored feedback.", |
| } | ||
| } | ||
|
|
||
| const parent: number[] = tableKeys.map((_, i) => i); |
| const parent: number[] = tableKeys.map((_, i) => i); | |
| const parent: number[] = Array.from(tableKeys.keys()); |
https://developer.mozilla.org/ko/docs/Web/JavaScript/Reference/Global_Objects/Array/keys
| } | ||
|
|
||
| const parent: number[] = tableKeys.map((_, i) => i); | ||
| const rank: number[] = tableKeys.map(() => 0); |
| const rank: number[] = tableKeys.map(() => 0); | |
| const rank: number[] = new Array(tableKeys.length).fill(0); |
https://developer.mozilla.org/ko/docs/Web/JavaScript/Reference/Global_Objects/Array/Array
https://developer.mozilla.org/ko/docs/Web/JavaScript/Reference/Global_Objects/Array/fill
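For reference, the two initializations discussed in these comments can be written as below. Note that `Array.prototype.keys()` returns an iterator rather than an array, so it has to be materialized (here with `Array.from`) before it can be typed as `number[]`. The `tableKeys` values are made up for illustration.

```typescript
// Illustrative keys only; the real list is built from the duplicate groups.
const tableKeys: string[] = ["a::x", "b::x", "c::y"];

// Each element starts as its own parent: [0, 1, 2].
const parent: number[] = Array.from(tableKeys.keys());

// Every rank starts at zero: [0, 0, 0].
const rank: number[] = new Array<number>(tableKeys.length).fill(0);

console.log(parent, rank);
```

Both forms produce the same arrays as the original `map` calls; the choice is mostly stylistic.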
| for (const group of groups) { | ||
| for (const table of group.tables) { | ||
| getOrCreateIndex(table.namespace, table.name); | ||
| } | ||
| } |
| for (const group of groups) { | |
| for (const table of group.tables) { | |
| getOrCreateIndex(table.namespace, table.name); | |
| } | |
| } | |
| groups.flatMap(g => g.tables).forEach(t => getOrCreateIndex(t.namespace, t.name)); |
|
|
||
| // Restore original order and filter empty components | ||
| const result: AutoBeDatabaseComponent[] = processed | ||
| .sort((a, b) => a.second - b.second) |
Nit: it doesn't really matter here because the list is small, but filtering first and then doing the next step takes fewer operations.
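A sketch of the reordering this nit suggests, assuming the processed entries pair a component (`first`) with its original position (`second`) as the `a.second - b.second` comparator implies. `ComponentSketch`, `Indexed`, and `restoreOrder` are hypothetical names, and "empty" is assumed to mean a component with no tables left.

```typescript
// Hypothetical shapes inferred from the reviewed snippet.
interface ComponentSketch {
  namespace: string;
  tables: string[];
}
interface Indexed {
  first: ComponentSketch; // the processed component
  second: number; // its original position
}

// Drop empty components first, then sort only what remains.
function restoreOrder(processed: Indexed[]): ComponentSketch[] {
  return processed
    .filter((p) => p.first.tables.length > 0)
    .sort((a, b) => a.second - b.second)
    .map((p) => p.first);
}
```

Because `filter` returns a new array, the in-place `sort` no longer mutates `processed`, which is a small side benefit on top of sorting fewer elements.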
b7a4a05 to fab4642
This pull request introduces a semantic deduplication step to the Prisma database orchestration pipeline. It adds a new agent system prompt for deduplication, implements logic to detect and group semantically duplicate tables across components, and integrates this process into the orchestration flow. Additionally, it improves developer visibility with logging and ensures that downstream schema generation and review steps operate on deduplicated components.
Major changes include:
Semantic Deduplication Agent & Prompt
DATABASE_DEDUPLICATION.md) detailing the agent's responsibilities, semantic duplicate criteria, workflow, output format, and examples for identifying duplicate tables across components.
Deduplication Orchestration Logic
transformPrismaDeduplicationHistory, which:
Developer Experience
Agent Simulation
databaseDeduplication event type.