NMPC fallback never clears on slow-but-feasible solves (test_fallback_clears_on_success regression in CI)

## Symptom

`tests/test_nmpc_controller.py::TestNMPCControllerFallback::test_fallback_clears_on_success` fails on GitHub Actions runners but passes on a local laptop. Surfaced after PR #1 (runbook-sync gate fix) — the runbook gate had been masking it.

```
E   assert True is False
   assert r['nmpc_fallback_active'] is False
```

## Root cause

`control/nmpc_controller.py:937-952` treats two distinct conditions as a single `failed` flag:

```python
is_timeout = result.get('solve_time_ms', 0.0) > p.max_solve_ms   # 20 ms budget
failed = (not result['feasible']) or is_timeout
if failed:
    self._consecutive_failures += 1
    if self._consecutive_failures >= p.max_consecutive_failures:
        self._fallback_active = True
    # NOTE: fallback never clears in this branch
else:
    self._consecutive_failures = 0
    self._fallback_active = False     # only path that clears
```

When the SLSQP solver returns `feasible=True` but exceeds the 20 ms wall-clock budget (common on GH Actions runners, where the perf tests already had to be CI-skipped at 5/8/20 ms), `is_timeout` is True → `failed` is True → fallback never clears even though the math worked.

The test's intent is "fallback clears when the solver succeeds." Production code's effective definition of "succeeds" is "feasible AND fast." That mismatch is the bug.

## Two ways to resolve

### Option A — test-side (low risk, doesn't touch production)

Pin `solve_time_ms = 0.0` in both branches of the test mock so the test exercises only the fallback state machine, not the timeout policy:

```python
def sometimes_fail(*args, **kwargs):
    r = orig_solve(*args, **kwargs)
    if fail_mode[0]:
        r['feasible'] = False
        r['solve_time_ms'] = 0.0
    else:
        r['solve_time_ms'] = 0.0   # ← add this line
    return r
```

Honest with the test's stated intent. Doesn't change runtime behavior. CI green on the next push.

### Option B — production-side (semantic fix, needs review)

A 21 ms feasible solve probably shouldn't *re-enter* fallback or block clearing it. Slow solves should emit a separate "slow" telemetry signal, not equate with infeasibility. Sketch:

```python
is_timeout = result.get('solve_time_ms', 0.0) > p.max_solve_ms
infeasible = not result['feasible']
if infeasible:
    self._consecutive_failures += 1
    if self._consecutive_failures >= p.max_consecutive_failures:
        self._fallback_active = True
    result['steering_normalized'] = self._last_steering
elif is_timeout:
    # slow but valid: keep the result, leave fallback state untouched
    self._last_steering = result['steering_normalized']
else:
    self._consecutive_failures = 0
    self._fallback_active = False
    self._last_steering = result['steering_normalized']
```

This is more correct but changes the controller's failure model. Wants a real driving review (does a single slow frame justify dropping to LMPC for the next ~3 cycles?).

## Recommendation

Apply **Option A** to unblock CI immediately. Defer **Option B** until someone with controller-tuning context can confirm whether slow-feasible solves should affect fallback state.

## Related

- `test_no_oscillation_at_q4` (MPC delay compensation) — separate logic regression also surfaced after PR #1's runbook-sync fix. Different file, different root cause; punted to its own ticket.
- 3 perf tests already CI-skipped via `@pytest.mark.skipif(_IN_CI, reason=...)` in PR #1 commit `71b7a4d`. This NMPC test is the same family of issue (CI hardware too slow for an ms-level budget) but on a different test that wasn't part of that batch.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NMPC fallback never clears on slow-but-feasible solves (test_fallback_clears_on_success regression in CI) #2

Symptom

Root cause

Two ways to resolve

Option A — test-side (low risk, doesn't touch production)

Option B — production-side (semantic fix, needs review)

Recommendation

Related

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

NMPC fallback never clears on slow-but-feasible solves (test_fallback_clears_on_success regression in CI) #2

Description

Symptom

Root cause

Two ways to resolve

Option A — test-side (low risk, doesn't touch production)

Option B — production-side (semantic fix, needs review)

Recommendation

Related

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions