Skip to content

Commit 19219f1

Browse files
PureWeenCopilot
andauthored
Simplify built-in presets: keep PR Review Squad + add Implement & Challenge (#225)
## Summary Simplify built-in multi-agent presets from 6 down to 2: ### Kept - **📋 PR Review Squad** — Orchestrator mode, 5 specialized reviewers - **⚔️ Implement & Challenge** (new) — OrchestratorReflect adversarial loop ### Removed - Code Review Team (overlapped with PR Review Squad) - Multi-Perspective Analysis (broadcast mode, niche use case) - Quick Reflection Cycle (overlapped with Implement & Challenge) - Deep Research (niche use case) ### Implement & Challenge Preset Uses OrchestratorReflect mode with two workers: - **Implementer** (claude-sonnet-4.6): Codes the solution - **Challenger** (claude-opus-4.6): Reviews, finds issues, suggests improvements The loop iterates until the challenger is satisfied (`[[GROUP_REFLECT_COMPLETE]]`), with stall detection and max 5 iterations (overridable with `--max N`). Users who need the removed presets can recreate them as custom presets or via `.squad/` team definitions. ### Test Changes - Updated all test references from removed presets to remaining ones - All 1400 tests pass --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent b00dc11 commit 19219f1

22 files changed

Lines changed: 1396 additions & 262 deletions

.github/copilot-instructions.md

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,12 @@ Fast Deployment requires `dotnet build -t:Install` — it pushes assemblies to `
3535
**Package name**: `com.microsoft.PolyPilot` (not `com.companyname.PolyPilot`)
3636
**Launch activity**: `crc64ef8e1bf56c865459.MainActivity`
3737

38+
### Windows
39+
```bash
40+
dotnet build -f net10.0-windows10.0.19041.0 # Build only
41+
dotnet run -f net10.0-windows10.0.19041.0 # Build + launch
42+
```
43+
3844
### iOS (physical device)
3945
```bash
4046
dotnet build -f net10.0-ios -r ios-arm64 # Build only
@@ -68,7 +74,7 @@ PolyPilot discovers [bradygaster/squad](https://github.com/bradygaster/squad) te
6874

6975
**Squad write-back:** When saving a multi-agent group as a preset, PolyPilot writes the team definition back to `.squad/` format in the worktree root via `SquadWriter`. This creates `team.md`, `agents/{name}/charter.md`, and optional `decisions.md`/`routing.md`. The preset is also saved to `presets.json` as a personal backup. This enables round-tripping: discover → modify → save back → share via repo.
7076

71-
**Preset priority (three-tier merge):** Built-in presets < User presets (`~/.polypilot/presets.json`) < Repo teams (`.squad/`). Repo teams shadow presets with the same name. The preset picker shows three sections: "📂 From Repo", "⚙️ Built-in", and "👤 My Presets".
77+
**Preset priority (three-tier merge):** Built-in presets are always shown and cannot be overridden. User presets (`~/.polypilot/presets.json`) and repo teams (`.squad/`) with the same name as a built-in are skipped. The preset picker shows three sections: "📂 From Repo" (with delete button), "⚙️ Built-in", and "👤 My Presets".
7278

7379
**Group deletion:** Deleting a multi-agent team closes and removes all its sessions (they're meaningless without the team). Deleting a regular group moves sessions to the default group.
7480

@@ -113,6 +119,9 @@ When switching between Embedded and Persistent modes (via Settings → Save & Re
113119

114120
## Critical Conventions
115121

122+
### Build & Launch
123+
- **NEVER use `--no-build`** when running the app (`dotnet run`). Always do a full build to catch and fix compile errors before launching. Using `--no-build` can cause silent crashes from stale binaries.
124+
116125
### Git Workflow
117126
- **NEVER use `git push --force`** — always use `git push --force-with-lease` instead when a force push is needed (e.g., after a rebase). This prevents overwriting remote changes made by others.
118127
- **NEVER commit screenshots, images, or binary files** — use `git diff --stat` or `git status` before committing to verify no `.png`, `.jpg`, `.bmp`, or other image files are staged. Screenshots from PolyPilot (e.g., `screenshot_*.png`) are generated locally and must NEVER be committed. The `.gitignore` blocks common patterns, but always double-check.

PolyPilot.Tests/MultiAgentRegressionTests.cs

Lines changed: 45 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1243,7 +1243,7 @@ public void OrchestratorPlanningPrompt_IncludesWorkerPersonas()
12431243
Assert.NotNull(method);
12441244

12451245
var workers = new List<string> { "sec-worker", "perf-worker" };
1246-
var result = (string)method!.Invoke(svc, new object?[] { "Review this code", workers, null, null })!;
1246+
var result = (string)method!.Invoke(svc, new object?[] { "Review this code", workers, null, null, false })!;
12471247

12481248
Assert.Contains("security auditor", result);
12491249
Assert.Contains("performance optimizer", result);
@@ -1265,12 +1265,12 @@ public void BuiltInPresets_WorkerSystemPrompts_MatchWorkerCount()
12651265
}
12661266

12671267
/// <summary>
1268-
/// Code Review Team preset has distinct personas for each worker.
1268+
/// Implement & Challenge preset has distinct personas for each worker.
12691269
/// </summary>
12701270
[Fact]
1271-
public void CodeReviewTeam_Preset_HasDistinctPersonas()
1271+
public void ImplementAndChallenge_Preset_HasDistinctPersonas()
12721272
{
1273-
var preset = GroupPreset.BuiltIn.First(p => p.Name == "Code Review Team");
1273+
var preset = GroupPreset.BuiltIn.First(p => p.Name == "Implement & Challenge");
12741274
Assert.NotNull(preset.WorkerSystemPrompts);
12751275
Assert.Equal(2, preset.WorkerSystemPrompts!.Length);
12761276
Assert.All(preset.WorkerSystemPrompts, p => Assert.False(string.IsNullOrWhiteSpace(p)));
@@ -1569,4 +1569,45 @@ public void GetOrchestratorGroupId_ReturnsNull_ForBroadcastMode()
15691569
}
15701570

15711571
#endregion
1572+
1573+
#region Bug #7: FlushedResponse — TurnEnd flush clears CurrentResponse before CompleteResponse reads it
1574+
1575+
[Fact]
1576+
public void ParseTaskAssignments_WithFlushedResponse_ReturnsAssignments()
1577+
{
1578+
// Regression: when FlushCurrentResponse runs on TurnEnd before CompleteResponse
1579+
// on SessionIdle, the orchestrator plan text was lost and ParseTaskAssignments
1580+
// returned 0 assignments — breaking worker delegation.
1581+
var workers = new List<string> { "squad-worker-1", "squad-worker-2" };
1582+
var plan = """
1583+
I'll assign review tasks to each worker.
1584+
1585+
@worker:squad-worker-1
1586+
Review the authentication module for security issues.
1587+
@end
1588+
1589+
@worker:squad-worker-2
1590+
Review the database queries for SQL injection.
1591+
@end
1592+
""";
1593+
1594+
var assignments = CopilotService.ParseTaskAssignments(plan, workers);
1595+
Assert.Equal(2, assignments.Count);
1596+
Assert.Equal("squad-worker-1", assignments[0].WorkerName);
1597+
Assert.Contains("authentication", assignments[0].Task);
1598+
Assert.Equal("squad-worker-2", assignments[1].WorkerName);
1599+
Assert.Contains("SQL injection", assignments[1].Task);
1600+
}
1601+
1602+
[Fact]
1603+
public void ParseTaskAssignments_EmptyResponse_ReturnsNoAssignments()
1604+
{
1605+
// Documents the root cause: when plan response is empty string,
1606+
// no worker assignments can be parsed.
1607+
var workers = new List<string> { "squad-worker-1" };
1608+
var assignments = CopilotService.ParseTaskAssignments("", workers);
1609+
Assert.Empty(assignments);
1610+
}
1611+
1612+
#endregion
15721613
}
Lines changed: 102 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,102 @@
1+
using System.Collections.Concurrent;
2+
using PolyPilot.Models;
3+
using PolyPilot.Services;
4+
5+
namespace PolyPilot.Tests;
6+
7+
/// <summary>
8+
/// Tests for the PendingReasoningMessages dedup mechanism that prevents
9+
/// duplicate reasoning ChatMessages when rapid deltas arrive before
10+
/// InvokeOnUI fires.
11+
/// </summary>
12+
public class ReasoningDedupTests
13+
{
14+
[Fact]
15+
public void PendingReasoningMessages_PreventsDuplicateCreation()
16+
{
17+
// Simulates the race: two rapid deltas for the same reasoningId
18+
// arrive before InvokeOnUI adds the first to History.
19+
var pending = new ConcurrentDictionary<string, ChatMessage>();
20+
var history = new List<ChatMessage>();
21+
var reasoningId = "reason-1";
22+
23+
// First delta: not in pending, not in history → create new
24+
var msg1 = pending.GetValueOrDefault(reasoningId)
25+
?? history.LastOrDefault(m => m.MessageType == ChatMessageType.Reasoning && m.ReasoningId == reasoningId);
26+
27+
Assert.Null(msg1); // nothing found → would create new
28+
29+
var newMsg = ChatMessage.ReasoningMessage(reasoningId);
30+
pending[reasoningId] = newMsg;
31+
32+
// Second delta: found in pending map → reuse existing
33+
var msg2 = pending.GetValueOrDefault(reasoningId)
34+
?? history.LastOrDefault(m => m.MessageType == ChatMessageType.Reasoning && m.ReasoningId == reasoningId);
35+
36+
Assert.NotNull(msg2);
37+
Assert.Same(newMsg, msg2); // Same object, no duplicate
38+
39+
// Simulate InvokeOnUI completing: move to history, remove from pending
40+
history.Add(newMsg);
41+
pending.TryRemove(reasoningId, out _);
42+
43+
// Third delta: now found in history → still no duplicate
44+
var msg3 = pending.GetValueOrDefault(reasoningId)
45+
?? history.LastOrDefault(m => m.MessageType == ChatMessageType.Reasoning && m.ReasoningId == reasoningId);
46+
47+
Assert.NotNull(msg3);
48+
Assert.Same(newMsg, msg3);
49+
Assert.Single(history); // Only one message ever created
50+
}
51+
52+
[Fact]
53+
public void PendingReasoningMessages_DifferentIds_CreateSeparateMessages()
54+
{
55+
var pending = new ConcurrentDictionary<string, ChatMessage>();
56+
57+
var msg1 = ChatMessage.ReasoningMessage("reason-1");
58+
var msg2 = ChatMessage.ReasoningMessage("reason-2");
59+
pending["reason-1"] = msg1;
60+
pending["reason-2"] = msg2;
61+
62+
Assert.Equal(2, pending.Count);
63+
Assert.NotSame(pending["reason-1"], pending["reason-2"]);
64+
}
65+
66+
[Fact]
67+
public void PendingReasoningMessages_ClearedOnNewTurn()
68+
{
69+
// Verify that pending map is properly clearable (as done in SendPromptAsync/CompleteResponse)
70+
var pending = new ConcurrentDictionary<string, ChatMessage>();
71+
pending["reason-1"] = ChatMessage.ReasoningMessage("reason-1");
72+
pending["reason-2"] = ChatMessage.ReasoningMessage("reason-2");
73+
74+
Assert.Equal(2, pending.Count);
75+
76+
pending.Clear();
77+
78+
Assert.Empty(pending);
79+
}
80+
81+
[Fact]
82+
public void PendingReasoningMessages_ConcurrentAccess_IsThreadSafe()
83+
{
84+
// ConcurrentDictionary should handle simultaneous reads/writes without corruption
85+
var pending = new ConcurrentDictionary<string, ChatMessage>();
86+
var reasoningId = "reason-concurrent";
87+
88+
// Simulate 10 concurrent deltas for the same reasoningId
89+
var results = new ConcurrentBag<ChatMessage>();
90+
Parallel.For(0, 10, _ =>
91+
{
92+
var existing = pending.GetOrAdd(reasoningId, _ => ChatMessage.ReasoningMessage(reasoningId));
93+
results.Add(existing);
94+
});
95+
96+
// Only one entry in the dictionary
97+
Assert.Single(pending);
98+
// All 10 callers got the same object reference
99+
var first = results.First();
100+
Assert.All(results, r => Assert.Same(first, r));
101+
}
102+
}

0 commit comments

Comments
 (0)