Bug fix release v0.3.46 #50

Merged: abi-chatterjee merged 2 commits into main from develop on May 15, 2026

@abi-chatterjee

Jiva v0.3.46 - Multi-Tenant Concurrency Fixes

Release Date: May 15, 2026


Summary

This release fixes a set of concurrency bugs that made the HTTP/Cloud Run interface unsafe for parallel multi-tenant use. The root cause was a shared mutable context on the StorageProvider singleton: every HTTP session called setContext() on the same object, so concurrent requests from different tenants would corrupt each other's GCS paths. Three related singleton problems in OrchestrationLogger and SessionManager compounded the issue.


Bug Fixes

1. JIVA_STORAGE_PROVIDER=gcp fell through to LocalStorageProvider

Symptom: With JIVA_STORAGE_PROVIDER=gcp set in Cloud Run, the service silently used LocalStorageProvider (ephemeral container filesystem) instead of GCS. Per-tenant MCP server lists, directives, and conversation history were never read from GCS. All GCS diagnostic logs were absent because the GCPBucketProvider code never ran.

Root cause: StorageProviderType.GCP_BUCKET has the value 'gcp-bucket', but the documented/common short form 'gcp' did not match any switch case, so the factory fell through to the default: LocalStorageProvider branch.

Fix: src/storage/factory.ts now normalises well-known aliases before the switch:

'gcp'       → StorageProviderType.GCP_BUCKET  ('gcp-bucket')
's3' | 'aws' → StorageProviderType.AWS_S3      ('aws-s3')

Both the short and canonical forms are now accepted. The Cloud Run env var JIVA_STORAGE_PROVIDER has been updated to gcp-bucket for clarity.
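
The alias normalisation described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/storage/factory.ts: the enum values 'gcp-bucket' and 'aws-s3' and the fallback to the local provider come from these notes, while the 'local' value, the alias Map, and the normaliseProviderType() name are assumptions made for the example.

```typescript
// Sketch only. 'gcp-bucket' and 'aws-s3' are taken from the release notes;
// the LOCAL value and every function/constant name here are hypothetical.
enum StorageProviderType {
  LOCAL = "local", // assumed value
  GCP_BUCKET = "gcp-bucket",
  AWS_S3 = "aws-s3",
}

// Short forms that should resolve to a canonical provider type.
const PROVIDER_ALIASES = new Map<string, StorageProviderType>([
  ["gcp", StorageProviderType.GCP_BUCKET],
  ["s3", StorageProviderType.AWS_S3],
  ["aws", StorageProviderType.AWS_S3],
]);

function normaliseProviderType(raw: string | undefined): StorageProviderType {
  const value = (raw ?? "").trim();

  // 1. Well-known aliases are mapped before any switch/lookup happens.
  const aliased = PROVIDER_ALIASES.get(value);
  if (aliased !== undefined) return aliased;

  // 2. Canonical values pass through unchanged.
  const canonical = (Object.values(StorageProviderType) as string[]).includes(value);
  if (canonical) return value as StorageProviderType;

  // 3. Anything else keeps the pre-existing default: the local provider.
  return StorageProviderType.LOCAL;
}
```

With this shape, normaliseProviderType("gcp") and normaliseProviderType("gcp-bucket") resolve to the same enum member, so the factory's switch only ever sees canonical values.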


2. Shared StorageProvider.context caused cross-tenant GCS path corruption

Symptom: Under concurrent load, tenant A's storage operations (reading config, saving conversations, loading directives) would silently use tenant B's GCS path. Data appeared to go missing or be saved in the wrong location.

Root cause: A single GCPBucketProvider instance is created at server startup and shared across all sessions via SessionConfig.storageProvider. Its this.context field (set by setContext()) was overwritten every time a new session was created. Because createSession() is async and makes several GCS calls, a concurrent setContext() call from another tenant could change the path while an earlier session's calls were still in flight.

Fix: Added StorageProvider.createSessionScoped(context) — a new method that returns an isolated provider instance with a fixed, immutable context. GCPBucketProvider overrides it to share the GCS Bucket client (stateless, safe for concurrent use) and the per-tenant configCache (a Map keyed by tenantId, so reads from different tenants are isolated), but gives each session its own context, logBuffer, and other mutable state. LocalStorageProvider overrides it trivially (creates a new instance with the same basePath).

SessionManager.createSession() now calls createSessionScoped() instead of setContext() and stores the returned instance in ActiveSession.storageProvider. Every storage operation for that session (config reads, MCP server loading, workspace directive, conversation history, log flushing) goes through this per-session instance. The shared singleton's context field is never mutated during request handling.

Files changed:

  • src/storage/provider.ts — added createSessionScoped()
  • src/storage/gcp-bucket-provider.ts — overrode createSessionScoped()
  • src/storage/local-provider.ts — overrode createSessionScoped()
  • src/interfaces/http/session-manager.ts — uses per-session provider throughout
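
To make the isolation model concrete, here is a small sketch of the session-scoped pattern. It is an assumption-laden stand-in, not the real GCPBucketProvider: FakeBucket, SketchProvider, and the "{tenantId}/sessions/{sessionId}.json" path layout are hypothetical. Only the idea of sharing the client and cache while freezing a per-session context is taken from the description above.

```typescript
interface StorageContext {
  tenantId: string;
  sessionId: string;
}

// Hypothetical stand-in for the GCS Bucket client: it only records writes.
class FakeBucket {
  readonly writes = new Map<string, string>();

  async save(path: string, body: string): Promise<void> {
    await Promise.resolve(); // yield, as a network call would
    this.writes.set(path, body);
  }
}

class SketchProvider {
  constructor(
    private readonly bucket: FakeBucket, // shared across sessions
    private readonly configCache: Map<string, unknown>, // shared, keyed by tenantId
    private readonly context?: Readonly<StorageContext>, // fixed per instance
  ) {}

  // Returns a new provider that shares the client and cache but owns its context.
  createSessionScoped(context: StorageContext): SketchProvider {
    return new SketchProvider(this.bucket, this.configCache, Object.freeze({ ...context }));
  }

  async saveConversation(body: string): Promise<string> {
    if (!this.context) throw new Error("no context: call createSessionScoped() first");
    // Path layout is illustrative only.
    const path = `${this.context.tenantId}/sessions/${this.context.sessionId}.json`;
    await this.bucket.save(path, body);
    return path;
  }
}
```

Two scoped providers created from the same parent can then write concurrently, and each write lands under its own tenant prefix because no shared context field is ever reassigned.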

3. GCPBucketProvider.setConfig() captured context after an await

Symptom: Sporadic config writes going to the wrong tenant's GCS path, particularly during session initialisation when model config was being saved for the first time.

Root cause: setConfig() called this.requireContext() and this.getConfigPath() after await this.loadConfigCache(). If a concurrent setContext() call arrived during that await, the path used for the write reflected the new tenant.

Fix: Context and config path are now captured at the very top of setConfig(), before the first await. (src/storage/gcp-bucket-provider.ts)
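
The ordering problem is easy to reproduce in isolation. The sketch below uses a hypothetical RacyConfigStore class (not the real provider) with two variants of the same method: one reads the context after the first await, the other captures it before. The {tenantId}/config.json path mirrors the layout mentioned later in these notes; everything else is an assumption for illustration.

```typescript
interface TenantContext {
  tenantId: string;
}

// Hypothetical class that mimics a provider with a mutable shared context.
class RacyConfigStore {
  private context: TenantContext | undefined;
  readonly writes: string[] = [];

  setContext(ctx: TenantContext): void {
    this.context = ctx;
  }

  private requireContext(): TenantContext {
    if (!this.context) throw new Error("context not set");
    return this.context;
  }

  private async loadConfigCache(): Promise<void> {
    await Promise.resolve(); // stand-in for a GCS read
  }

  // Pre-fix ordering: the context is read AFTER the await, so a concurrent
  // setContext() can change which tenant the write goes to.
  async setConfigLate(key: string): Promise<void> {
    await this.loadConfigCache();
    const path = `${this.requireContext().tenantId}/config.json`;
    this.writes.push(`${path}#${key}`);
  }

  // Post-fix ordering: the path is captured BEFORE the first await.
  async setConfigEarly(key: string): Promise<void> {
    const path = `${this.requireContext().tenantId}/config.json`;
    await this.loadConfigCache();
    this.writes.push(`${path}#${key}`);
  }
}
```

If a caller sets the context to tenantA, starts setConfigLate(), and another caller switches the context to tenantB before the promise settles, the recorded write targets tenantB. The same interleaving with setConfigEarly() still targets tenantA.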


4. SessionManager.getOrCreateSession() had a TOCTOU race creating duplicate sessions

Symptom: Under concurrent requests for the same (tenantId, sessionId) — e.g. two simultaneous HTTP requests arriving on a cold-start session — two separate DualAgent instances and two sets of MCP sub-processes could be created for the same session key. The second one silently overwrote the first in the sessions map, leaking the sub-processes.

Fix: SessionManager now maintains a pendingSessions: Map<string, Promise<ActiveSession>> alongside the sessions map. When a creation is in progress, subsequent concurrent callers await the same Promise instead of starting a second creation. The pending entry is removed (in a finally block) whether creation succeeds or fails. (src/interfaces/http/session-manager.ts)
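
A minimal sketch of the in-flight de-duplication follows. The pendingSessions map and the finally-based cleanup come from the description above; the SessionRegistry class, the "tenantId:sessionId" key format, and the injected create callback are hypothetical simplifications of the real SessionManager.

```typescript
interface ActiveSession {
  key: string;
}

class SessionRegistry {
  private readonly sessions = new Map<string, ActiveSession>();
  private readonly pendingSessions = new Map<string, Promise<ActiveSession>>();

  // `create` stands in for the expensive work (agents, MCP sub-processes).
  constructor(private readonly create: (key: string) => Promise<ActiveSession>) {}

  async getOrCreateSession(tenantId: string, sessionId: string): Promise<ActiveSession> {
    const key = `${tenantId}:${sessionId}`; // key format is an assumption

    const existing = this.sessions.get(key);
    if (existing) return existing;

    // A creation is already running: share its promise instead of starting another.
    const inFlight = this.pendingSessions.get(key);
    if (inFlight) return inFlight;

    // Deferring the start by one microtask guarantees the pending entry is
    // registered before the finally block below can possibly run.
    const creation = Promise.resolve().then(() => this.createAndRegister(key));
    this.pendingSessions.set(key, creation);
    return creation;
  }

  private async createAndRegister(key: string): Promise<ActiveSession> {
    try {
      const session = await this.create(key);
      this.sessions.set(key, session);
      return session;
    } finally {
      // Removed on success and on failure, so a failed creation can be retried.
      this.pendingSessions.delete(key);
    }
  }
}
```

The check-then-set sequence contains no await between the lookups and pendingSessions.set(), so on a single-threaded event loop two callers cannot both pass the checks for the same key.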


5. SessionManager.destroySession() saved conversation to the wrong tenant path

Symptom: When a session was destroyed (idle timeout or graceful shutdown), saveConversation() and flushLogs() were called on the shared singleton provider, which held whatever context the most recently created session had set. Under concurrent load, conversations could be saved to the wrong tenant.

Fix: destroySession() now retrieves session.storageProvider (the session-scoped instance stored in ActiveSession) and uses it for all teardown writes. (src/interfaces/http/session-manager.ts)


6. OrchestrationLogger singleton mixed events across concurrent sessions

Symptom: All Manager/Worker/Client orchestration events from all concurrent sessions were logged to the most-recently-registered session's GCS path. Debug logs from tenant A appeared in tenant B's orchestration log.

Root cause: OrchestrationLogger was a strict singleton. setStorageProvider(provider, sessionId) mutated shared this.storageProvider and this.sessionId fields; concurrent sessions overwrote each other.

Fix: The private constructor restriction is removed. The class now accepts optional (storageProvider, sessionId) constructor arguments: when both are provided it enters cloud mode immediately (no filesystem log file); when omitted it falls back to CLI mode (filesystem log in ~/.jiva/logs/). static getInstance() and the module-level orchestrationLogger singleton export are kept for CLI backward compatibility.

DualAgentConfig gains an optional orchestrationLogger?: OrchestrationLogger field. SessionManager.createSession() creates a fresh instance with new OrchestrationLogger(storageProvider, sessionId) and passes it to DualAgent via this field. DualAgent stores it as this.orchLogger and passes it down to the ManagerAgent, WorkerAgent, and ClientAgent constructors (each accepts it as an optional last parameter, defaulting to the singleton for CLI use). All orchestrationLogger.logXxx() calls in agent code became this.orchLogger.logXxx(). The setStorageProvider() method is kept but marked deprecated, so existing integrations continue to compile.
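
The shape of this change can be illustrated with a reduced sketch. OrchestrationLoggerSketch, LogSink, AgentSketch, and logEvent() are hypothetical names; the real class writes to GCS or to ~/.jiva/logs/, which is replaced here by an injected sink and an in-memory array so the example stays self-contained.

```typescript
// Hypothetical minimal sink, standing in for a session-scoped storage provider.
interface LogSink {
  appendLog(sessionId: string, line: string): void;
}

class OrchestrationLoggerSketch {
  private static instance: OrchestrationLoggerSketch | undefined;

  readonly mode: "cloud" | "cli";
  readonly localLines: string[] = []; // stands in for the filesystem log in CLI mode

  // Public constructor: both arguments present means cloud mode, otherwise CLI mode.
  constructor(
    private readonly sink?: LogSink,
    private readonly sessionId?: string,
  ) {
    this.mode = sink !== undefined && sessionId !== undefined ? "cloud" : "cli";
  }

  // Kept so CLI callers that expect a singleton keep working.
  static getInstance(): OrchestrationLoggerSketch {
    if (OrchestrationLoggerSketch.instance === undefined) {
      OrchestrationLoggerSketch.instance = new OrchestrationLoggerSketch();
    }
    return OrchestrationLoggerSketch.instance;
  }

  logEvent(message: string): void {
    if (this.sink !== undefined && this.sessionId !== undefined) {
      this.sink.appendLog(this.sessionId, message);
    } else {
      this.localLines.push(message);
    }
  }
}

// Agents take the logger as an optional last parameter, defaulting to the singleton.
class AgentSketch {
  constructor(
    readonly name: string,
    private readonly orchLogger: OrchestrationLoggerSketch = OrchestrationLoggerSketch.getInstance(),
  ) {}

  act(): void {
    this.orchLogger.logEvent(`${this.name} acted`);
  }
}
```

Each HTTP session would construct its own logger and hand it to its agents, while an agent constructed without one still logs through the singleton, which is the backward-compatible CLI path.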

Files changed:

  • src/utils/orchestration-logger.ts
  • src/core/dual-agent.ts
  • src/core/manager-agent.ts
  • src/core/worker-agent.ts
  • src/core/client-agent.ts
  • src/interfaces/http/session-manager.ts

7. Additional bug fixes from v0.3.45

The dev release v0.3.45 included additional bug fixes, as documented in v0.3.45.

Architecture Changes

Per-session provider isolation model

bootstrap()
  └─ createStorageProvider()   ← one shared GCPBucketProvider (parent)
       └─ shares: Bucket client, configCache Map

getOrCreateSession(tenantA, sessionX)
  └─ parent.createSessionScoped({tenantA, sessionX})
       └─ returns new GCPBucketProvider
            context = {tenantA, sessionX}  ← fixed, never changes
            bucket  = parent.bucket        ← shared (stateless)
            configCache = parent.configCache ← shared (keyed by tenantId)
            logBuffer = []                 ← own
       └─ stored as ActiveSession.storageProvider
       └─ new OrchestrationLogger(provider, sessionX)
            └─ stored as ActiveSession.orchestrationLogger
       └─ DualAgent({ orchestrationLogger: session.orchestrationLogger })
            └─ ManagerAgent(..., orchLogger)
            └─ WorkerAgent(..., orchLogger)
            └─ ClientAgent(..., orchLogger)

destroySession(tenantA, sessionX)
  └─ session.storageProvider.saveConversation(...)   ← always tenantA path
  └─ session.orchestrationLogger.flush()             ← always sessionX buffer

Backward compatibility

  • CLI mode: unchanged. The orchestrationLogger singleton continues to work, and a DualAgent constructed without orchestrationLogger in its config falls back to it.
  • JIVA_STORAGE_PROVIDER=gcp: still accepted (now aliased). gcp-bucket is the preferred value.
  • All public API changes (StorageProvider, GCPBucketProvider, agent constructors) are additive; nothing was removed.

Upgrade

npm install -g jiva-core@0.3.46

Cloud Run: update JIVA_STORAGE_PROVIDER

The environment variable value gcp now works correctly (it is aliased), but update to the canonical value for clarity:

- name: JIVA_STORAGE_PROVIDER
  value: gcp-bucket   # was: gcp (still works, but gcp-bucket is canonical)

Cloud Run: per-tenant MCP and LLM config

Each tenant's configuration lives at gs://{bucket}/{tenantId}/config.json and is now guaranteed to be read from GCS on every new session (the in-memory cache is invalidated via setContext() override). To configure per-tenant MCP servers or a different LLM, upload:

{
  "models": {
    "reasoning": { "endpoint": "...", "apiKey": "...", "model": "..." },
    "multimodal": null
  },
  "mcpServers": [
    { "name": "tavily-mcp", "command": "npx", "args": ["-y", "tavily-mcp@latest"], "env": { "TAVILY_API_KEY": "..." } }
  ]
}

gsutil cp config.json gs://your-bucket/my-tenant/config.json

abi-chatterjee merged commit c041cce into main on May 15, 2026
3 of 4 checks passed
abi-chatterjee self-assigned this on May 15, 2026
abi-chatterjee added the bug label on May 15, 2026