Fix auth-handoff cookies not reaching parallel-session agent leases #1
Merged
The auth portal logs the human in against an identity's BASE profile dir
(auth.py launches Chrome with --user-data-dir={identity.profile_dir}), but
identities with max_parallel_sessions > 1 serve agent leases from per-slot
replica copies under profiles/.replicas/<identity>/<slot>. Replicas are only
rsynced from base when a slot is (re)activated, and the lease path deliberately
reuses warm replicas without re-syncing. A replica synced before the human's
login therefore never picks up the new cookies, so the agent lease sees a
logged-out session (e.g. api.slack.com still logged out after a portal login).
Root-cause fix, preserving per-slot replica isolation:
- pool.lease: when a lease would be served from a warm replica whose cookie DB
predates the identity's base profile, force re-activation so the replica is
re-synced from base before it is handed out (_replica_is_stale_against_base,
_cookie_db_mtime). Self-healing for any base profile update, not just auth.
- identities.invalidate_identity_replicas: on auth completion, drop stale
replicas + their slot configs for every NON-leased slot of the identity so
the next lease rebuilds them from the freshly-authenticated base. Slots with
an active foreign lease are skipped (the pool freshness guard re-syncs them on
their next lease) so live sessions are never corrupted.
- api auth_complete: stop the portal browser FIRST (flush cookies to disk),
then invalidate replicas and verify the target-origin cookie actually landed
in the base profile, surfacing a cookie_verification block + warning event.
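The first two bullets can be illustrated with a minimal sketch. This is not the code from this PR: the function names below only echo `_cookie_db_mtime`, `_replica_is_stale_against_base` and `invalidate_identity_replicas`, and their bodies, the cookie DB locations, and the `leased_slots` parameter are assumptions made for illustration. The real invalidation also drops slot configs, which this sketch omits.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Assumption: Chrome keeps its cookie SQLite DB at one of these paths inside a
# user-data-dir (newer builds use Default/Network/Cookies).
COOKIE_DB_CANDIDATES = ("Default/Network/Cookies", "Default/Cookies")


def cookie_db_mtime(profile_dir: Path) -> float | None:
    """Return the newest mtime among the cookie DB files, or None if none exist."""
    mtimes = [
        (profile_dir / rel).stat().st_mtime
        for rel in COOKIE_DB_CANDIDATES
        if (profile_dir / rel).is_file()
    ]
    return max(mtimes) if mtimes else None


def replica_is_stale_against_base(replica_dir: Path, base_dir: Path) -> bool:
    """A replica is stale when its cookie DB is missing or older than the base's."""
    base = cookie_db_mtime(base_dir)
    if base is None:
        return False  # base has no cookie DB, so there is nothing newer to sync
    replica = cookie_db_mtime(replica_dir)
    return replica is None or replica < base


def invalidate_replicas(replica_root: Path, leased_slots: set[str]) -> list[str]:
    """Delete the replica dir of every slot without an active lease.

    Returns the names of the dropped slots. Leased slots are left untouched so a
    live session is never pulled out from under its browser.
    """
    dropped: list[str] = []
    if not replica_root.is_dir():
        return dropped
    for slot_dir in sorted(replica_root.iterdir()):
        if slot_dir.is_dir() and slot_dir.name not in leased_slots:
            shutil.rmtree(slot_dir)
            dropped.append(slot_dir.name)
    return dropped
```

Comparing mtimes keeps the check cheap enough to run on every lease, at the cost of treating any base cookie write as a reason to re-sync, which matches the "any base profile update" behaviour described above.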
Verified against real diverged on-disk profiles: chrome-depontefede and
chrome-rocketlist replicas with 0 target-origin cookies are correctly flagged
stale and would re-sync from base (which holds the authenticated cookies).
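A count of target-origin cookies like the one used in this verification can be reproduced with a few lines of Python. The sketch below is an assumption about how such a check might look, not the PR's `_verify_auth_cookie_landed`: it relies on Chrome storing cookies in a SQLite table named `cookies` with a `host_key` column, and `count_cookies_for_domain` is a hypothetical helper name.

```python
import sqlite3
from pathlib import Path


def count_cookies_for_domain(cookie_db: Path, domain: str) -> int:
    """Count cookies whose host_key is `domain` or any subdomain of it."""
    # Open read-only so a verification pass can never modify the profile.
    uri = cookie_db.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        row = conn.execute(
            "SELECT COUNT(*) FROM cookies WHERE host_key = ? OR host_key LIKE ?",
            # The LIKE pattern also covers the leading-dot form, e.g. ".slack.com".
            (domain, "%." + domain),
        ).fetchone()
        return row[0]
    finally:
        conn.close()
```

Run against the base profile and against each replica, a nonzero count for base and zero for a replica would reproduce the divergence described here. The browser should be stopped first, since a running Chrome may hold the database locked or keep recent cookies unflushed.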
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Symptom
A human completed a login in the broker auth portal (auth_request → portal_url → noVNC, identity chrome-depontefede), but a concurrent or subsequent agent browser_lease on the SAME identity did NOT see the logged-in session: api.slack.com still showed logged-out in the agent's lease tab. The portal's authenticated cookies did not become visible to the agent lease, breaking the "human logs in once, agent continues" handoff.
Root cause
- The auth portal launches Chrome with --user-data-dir={identity.profile_dir} (auth.py:451), i.e. the identity's base profile dir. The login cookies land there.
- For identities with max_parallel_sessions > 1 (chrome-depontefede = 3), agent leases are served from per-slot replicas under profiles/.replicas/<identity>/<slot> (pool._identity_profile_for_slot).
- Replicas are only rsynced from base when a slot is (re)activated (write_slot_config → _sync_replica_profile), and the lease path deliberately reuses warm replicas without re-syncing (test_identity_lease_prefers_warm_active_replica_without_resync).
Verified empirically: the base chrome-depontefede profile had 23 slack cookies; replica pool-b had 0 and pool-d had 0. Same pattern on chrome-rocketlist (base 11 supabase cookies, replica pool-c 0).
Fix (root cause, preserves replica isolation)
- pool.py: when a lease would be served from a warm replica whose cookie DB predates the identity's base profile, force re-activation so the replica re-syncs from base before it is handed out. Self-healing for any base update, not only auth.
- identities.invalidate_identity_replicas: on /auth/{token}/complete, drop stale replicas + slot configs for every non-leased slot of the identity so the next lease rebuilds from the freshly-authenticated base. Slots with an active foreign lease are skipped (the freshness guard re-syncs them on their next lease) so live sessions are never corrupted.
- api._verify_auth_cookie_landed: after stopping the portal browser (flush to disk), verify the target-origin cookie actually landed in the base profile; surface a cookie_verification block + warning telemetry if not.
Verification
Verified against real diverged on-disk profiles: chrome-depontefede and chrome-rocketlist replicas with 0 target-origin cookies are correctly flagged stale and would re-sync from base.
Deploy note
The broker runtime is the /root/ax-browser-broker checkout (uvicorn imports from cwd). Applying the fix needs a broker restart, which was deliberately NOT performed because 2 active foreign leases (kimi-baradona-gsc on pool-a/pool-b) were live. Restart only after the leases drain.
🤖 Generated with Claude Code