fix: prevent duplicate features in bulk creation #148

exbald · 2026-01-31T16:40:11Z

Problem

When feature_create_bulk is called multiple times (e.g., from expand sessions or if the initializer runs again), it creates duplicate features with the same (category, name) pairs. This can lead to hundreds of duplicate features in the database.

Solution

Added deduplication logic to feature_create_bulk:

Query existing features - Check (category, name) pairs already in the database
Filter input batch - Skip features that already exist or appear multiple times in the same batch
Report skipped count - Return skipped_duplicates in the response so callers know how many were filtered

Changes

mcp_server/feature_mcp.py: Added deduplication before feature creation

Testing

Tested on a project that had 607 features (should have been ~98). After identifying the duplicates came from repeated bulk creation calls, this fix prevents the issue from recurring.

Summary by CodeRabbit

Release Notes

New Features
- Bulk feature creation now automatically detects and filters duplicate entries based on category and name.
- Response includes a count of skipped duplicates for transparency.
- Improved data integrity by preventing duplicate features within a single bulk creation request.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

When feature_create_bulk is called multiple times (e.g., from expand sessions), it now checks for existing (category, name) pairs and skips duplicates instead of creating them again. Changes: - Query existing features before creation - Filter duplicates from input batch - Report skipped_duplicates count in response

coderabbitai · 2026-01-31T16:40:28Z

📝 Walkthrough

Walkthrough

The feature_create_bulk function now includes a pre-creation deduplication pass to identify and filter out duplicate (category, name) pairs from the input batch. The function now follows a four-pass process: validation, duplicate filtering, creation, and dependency resolution. The response includes a new skipped_duplicates field reporting filtered duplicates.

Changes

Cohort / File(s)	Summary
Deduplication Logic `mcp_server/feature_mcp.py`	Added pre-creation pass to collect existing (category, name) pairs and filter duplicates from input batch. Second and third passes now operate on deduplicated list. Response includes new `skipped_duplicates` field. Updated control-flow comments to reflect four-pass process.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Whiskers twitching with glee,
Duplicate features fade away,
One by one, clean and neat,
Deduplication saves the day! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: prevent duplicate features in bulk creation' directly and clearly summarizes the main change: adding deduplication logic to the bulk feature creation function to prevent duplicate (category, name) pairs.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

mcp_server/feature_mcp.py (1)

549-591: ⚠️ Potential issue | 🔴 Critical

Critical bug: Dependency indices break when features are skipped.

The depends_on_indices values refer to positions in the original features list, but after filtering into unique_features, the index mapping is invalidated. This causes:

IndexError if idx >= len(unique_features) after filtering
Wrong dependencies if the dependency target was skipped (exists in DB) — the code will reference the wrong feature from created_features instead of looking up the existing feature's ID
Self-dependency in edge cases where a feature incorrectly depends on itself

Example: If features [A, B] where B has depends_on_indices: [0], and A is skipped (already exists with id=5), then unique_features = [B], created_features = [B], and B's dependency resolves to created_features[0].id (B's own ID) instead of 5.

🔧 Proposed fix: maintain index-to-ID mapping for both new and existing features

             # Second pass: check for existing features to avoid duplicates
-            existing = set()
-            for f in session.query(Feature.category, Feature.name).all():
-                existing.add((f.category, f.name))
+            existing_map = {}  # (category, name) -> feature_id
+            for f in session.query(Feature.id, Feature.category, Feature.name).all():
+                existing_map[(f.category, f.name)] = f.id

             # Filter out duplicates from input batch
             unique_features = []
+            original_idx_to_id = {}  # Maps original index -> feature ID (existing or to-be-created)
             skipped = 0
-            for feature_data in features:
+            for orig_idx, feature_data in enumerate(features):
                 key = (feature_data["category"], feature_data["name"])
-                if key in existing:
+                if key in existing_map:
+                    # Track existing feature's ID for dependency resolution
+                    original_idx_to_id[orig_idx] = existing_map[key]
                     skipped += 1
                 else:
-                    unique_features.append(feature_data)
-                    existing.add(key)  # Mark as will-exist for batch dedup
+                    unique_features.append((orig_idx, feature_data))
+                    existing_map[key] = None  # Placeholder; ID assigned after flush

             # Third pass: create unique features only
             created_features: list[Feature] = []
-            for i, feature_data in enumerate(unique_features):
+            for orig_idx, feature_data in unique_features:
                 db_feature = Feature(
-                    priority=start_priority + i,
+                    priority=start_priority + len(created_features),
                     category=feature_data["category"],
                     name=feature_data["name"],
                     description=feature_data["description"],
                     steps=feature_data["steps"],
                     passes=False,
                     in_progress=False,
                 )
                 session.add(db_feature)
                 created_features.append(db_feature)

             # Flush to get IDs assigned
             session.flush()

+            # Build complete index-to-ID mapping after flush
+            for i, (orig_idx, _) in enumerate(unique_features):
+                original_idx_to_id[orig_idx] = created_features[i].id
+
             # Fourth pass: resolve index-based dependencies to actual IDs
             deps_count = 0
-            for i, feature_data in enumerate(unique_features):
+            for i, (orig_idx, feature_data) in enumerate(unique_features):
                 indices = feature_data.get("depends_on_indices", [])
                 if indices:
-                    # Convert indices to actual feature IDs
-                    dep_ids = [created_features[idx].id for idx in indices]
+                    # Convert original indices to actual feature IDs (new or existing)
+                    dep_ids = [original_idx_to_id[idx] for idx in indices]
                     created_features[i].dependencies = sorted(dep_ids)
                     deps_count += 1

🧹 Nitpick comments (1)

mcp_server/feature_mcp.py (1)

549-552: Consider selecting only necessary columns for the existence check.

The query now retrieves all (category, name) pairs into memory. This is fine for moderate datasets, but if the feature table grows large, consider adding a database index on (category, name) or using an EXISTS subquery per feature. For now, this approach is acceptable given the described scale (~600 features).

…AutoForgeAI#148) - Updated display_derivation.py icon values to match rest of codebase (coding->code, testing->test-tube, refactoring->wrench) - feature_compiler.py now imports TASK_TYPE_ICONS and DEFAULT_ICON from display_derivation instead of defining its own duplicate constants - spec_builder.py already delegates to display_derivation (prior session) - Updated all test assertions to match consolidated icon values - Added 34 tests in test_feature_148_display_consolidation.py verifying: - Single source of truth (no duplicate definitions) - Consistent icon values across all code paths - Import chain correctness (feature_compiler -> display_derivation) - AgentSpecs created via any path get correct display values Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Created comprehensive E2E test suite to verify ChatTab handles 404 errors and missing conversations gracefully, as implemented in Features AutoForgeAI#146, AutoForgeAI#147, and AutoForgeAI#148. Test File: ui/e2e/chattab-errors.spec.ts (599 lines, 10 test scenarios) Test Scenarios: 1. ChatTab loads without crashes when project is selected 2. No 404 errors when conversation list is empty 3. No TypeError when conversation is missing or deleted 4. Console shows warnings (not errors) for 404s 5. ChatTab recovers gracefully from API errors 6. Dev Mode toggle resets ChatTab state correctly 7. No console errors when opening browser DevTools 8. No errors on rapid project switching 9. Empty state message displayed when no conversations 10. Validate network requests for conversation endpoints Key Features: - Helper functions for project selection, ChatTab switching, console tracking - Comprehensive error detection (404, TypeError, crashes) - Screenshot assertions for visual verification (13 screenshots) - Network request validation - Console error tracking - Graceful degradation testing Related Features: - Feature AutoForgeAI#146: ChatTab 404 Error Handling - Feature AutoForgeAI#147: Defensive Programming for ChatTab - Feature AutoForgeAI#148: ChatTab State Reset on Mode Switch Verification: - All 10 required test scenarios implemented - Follows existing E2E test patterns (conversation-history.spec.ts) - Uses @playwright/test framework - Can run headless or with UI - Added to E2E test suite How to Run: cd /home/stu/projects/autocoder/ui npm run test:e2e -- chattab-errors.spec.ts npm run test:e2e:ui (interactive mode) Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai bot reviewed Jan 31, 2026

View reviewed changes

rudiheydra added a commit to rudiheydra/AutoBuildr that referenced this pull request Feb 2, 2026

docs: Update progress notes for Feature AutoForgeAI#148 completion

87a7235

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: prevent duplicate features in bulk creation #148

fix: prevent duplicate features in bulk creation #148

Uh oh!

exbald commented Jan 31, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jan 31, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fix: prevent duplicate features in bulk creation #148

Are you sure you want to change the base?

fix: prevent duplicate features in bulk creation #148

Uh oh!

Conversation

exbald commented Jan 31, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Solution

Changes

Testing

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Jan 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

exbald commented Jan 31, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 31, 2026 •

edited

Loading