Raw logs and full repro artifacts are in this secret Gist: https://gist.github.com/vivek100/f3ebec62813042ec63bacb24e855e4f8
Blaxel Incident Report: Agent Drive File API Failures Through Sandbox
Summary
OpenCowork is seeing intermittent failures when accessing an Agent Drive through Blaxel sandboxes. The same flows sometimes work and sometimes fail. The failures appear in three paths:
- OpenCowork HTTP API routes that write/list/read files through a mounted sandbox drive.
- TypeScript SDK calls against @blaxel/core.
- Python SDK calls against blaxel.
In one app-level route test, Blaxel reports the drive mounted, but files written under /workspace are not visible through the drive listing route. In direct SDK repros, sandbox sub-APIs such as drives.list, drives.mount, process.exec, and fs.read fail with connection/fetch errors.
Important nuance: the OpenCowork frontend often surfaces these as HTTP 400 responses because our Express drive routes currently catch sandbox/SDK errors and return res.status(400).json({ error: err.message }). A frontend 400 in this area should therefore be read as "the backend drive operation failed" unless the response body is one of our explicit validation errors such as Invalid drive path, No files provided, Too many files, Missing dataBase64, or exceeds 5MB upload limit.
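Because of that mapping, a client cannot use the status code alone to tell a bad request from a failed sandbox operation. Below is a minimal sketch of the check described above. It assumes the route body has the shape { error: string } produced by res.status(400).json({ error: err.message }). The function name classifyDriveResponse is hypothetical. Substring matching is also an assumption, because the full text of messages such as the 5MB limit may carry extra detail.

```typescript
// Hypothetical helper: interpret a drive-route response under the current
// error mapping, where sandbox/SDK failures are also returned as HTTP 400.

// Explicit validation messages emitted by the OpenCowork drive routes.
// Matched by substring, since some messages may include extra detail.
const VALIDATION_MESSAGES = [
  "Invalid drive path",
  "No files provided",
  "Too many files",
  "Missing dataBase64",
  "exceeds 5MB upload limit",
] as const;

type DriveResponseKind = "ok" | "validation-error" | "backend-failure" | "other-error";

function classifyDriveResponse(status: number, body: unknown): DriveResponseKind {
  if (status >= 200 && status < 300) return "ok";
  const message =
    typeof body === "object" &&
    body !== null &&
    typeof (body as { error?: unknown }).error === "string"
      ? (body as { error: string }).error
      : "";
  if (status === 400) {
    // A 400 is a genuine client error only if it carries one of our messages;
    // anything else is a failed backend drive operation re-labelled as 400.
    return VALIDATION_MESSAGES.some((m) => message.includes(m))
      ? "validation-error"
      : "backend-failure";
  }
  return "other-error";
}
```

For example, status 400 with a body of { error: "fetch failed" } classifies as backend-failure.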
Please inspect Blaxel logs for the sandbox names below around the UTC timestamps listed.
Environment
- Blaxel workspace: openclawguy
- Sandbox image/template: template-guardian
- Agent Drive name/id: open-cowork-agent-drive
- Agent Drive display name: OpenCowork Agent Drive
- Agent Drive region: us-was-1
- Drive mount path: /workspace
- Drive path: /
- OpenCowork agent id: code-agent
- Session-scoped harness id format: augment-agent-<sessionId>
- Session-scoped sandbox name format: augment-<sessionId prefix>
- Node version from failing app test: v22.13.1
- TypeScript SDK package: @blaxel/core@0.2.79
- Python SDK package path in stack trace: blaxel.core.sandbox.default.*
No API keys are included in this report.
Repro Run 1: OpenCowork HTTP API Through Mounted Sandbox
Run time: approximately 2026-05-13T18:00Z.
Command:
```shell
cd <open-cowork>/personalV0/augment/server
npm run test:e2e:drive-api
```
Observed run identifiers:
- Session id prefix: 78e63172
- Inferred sandbox name prefix: augment-78e63172
- Drive: open-cowork-agent-drive
- Mount returned by drives.list: {"driveName":"open-cowork-agent-drive","mountPath":"/workspace","drivePath":"/"}
Relevant output:
```text
> @augment/server@0.1.0 test:e2e:drive-api
> tsx src/tests/e2e-drive-api.ts
[drives.list] raw response: {"mounts":[]}
ok: session created - 78e63172
ok: workspace registered
ok: drive registered - open-cowork-agent-drive
ok: drive endpoint returns id - open-cowork-agent-drive
ok: drive endpoint returns mount path - /workspace
[drives.list] raw response: {"mounts":[{"driveName":"open-cowork-agent-drive","mountPath":"/workspace","drivePath":"/"}]}
Error: root lists demo directory failed
    at assertCheck (...\src\tests\e2e-drive-api.ts:20:11)
    at <anonymous> (...\src\tests\e2e-drive-api.ts:95:3)
```
What the test does:
- Creates a session with workspaceProvider=sandbox, sandboxProvider=blaxel.
- Provisions a real Blaxel sandbox and mounts open-cowork-agent-drive at /workspace.
- Uses sandbox.process.exec to run:
  ```shell
  mkdir -p /workspace/demo/subdir &&
  printf "hello from agent drive\n" > /workspace/demo/hello.txt &&
  printf "nested file\n" > /workspace/demo/subdir/nested.txt
  ```
- Calls the OpenCowork HTTP route GET /api/sessions/:id/drive/files?path=/.
Expected:
- Root listing includes /demo.
Actual:
- Blaxel mount list reports the drive mounted.
- The API does not see /demo, so the test fails at the "root lists demo directory" check.
Local captured log: personalV0/augment/server/drive-api-failure-2026-05-13.log
Control Run: Same Drive Can Work Immediately After Mount
Run time: approximately 2026-05-13T18:08Z.
Sandbox intentionally left for Blaxel inspection:
- Sandbox name: open-cowork-drive-incident-loop-20260513
- Workspace: openclawguy
- Region: us-was-1
- Image/template: template-guardian
- Drive: open-cowork-agent-drive
- Mount path: /workspace
This run created a fresh sandbox, mounted the same drive, then performed eight repeated write/read/list attempts. All attempts passed, both immediately after write and after a 1.5 second delay.
Relevant output excerpt:
```text
{"at":"2026-05-13T18:08:09.128Z","label":"drives.mount","ok":true,"value":{"success":true,"message":"Drive mounted successfully","driveName":"open-cowork-agent-drive","mountPath":"/workspace","drivePath":"/"}}
{"at":"2026-05-13T18:08:09.214Z","label":"drives.list.afterMount","ok":true,"value":[{"driveName":"open-cowork-agent-drive","mountPath":"/workspace","drivePath":"/"}]}
{"at":"2026-05-13T18:08:09.407Z","label":"attempt.1.process.write","ok":true}
{"at":"2026-05-13T18:08:09.643Z","label":"attempt.1.fs.read.immediate","ok":true,"value":"loop-1-1778695689214\n"}
{"at":"2026-05-13T18:08:09.910Z","label":"attempt.1.process.list.immediate","ok":true}
...
{"at":"2026-05-13T18:08:26.852Z","label":"attempt.8.fs.read.afterDelay","ok":true,"value":"loop-8-1778695704901\n"}
{"at":"2026-05-13T18:08:26.992Z","label":"attempt.8.process.list.afterDelay","ok":true}
```
Interpretation of this control run:
- The failure is not deterministic.
- The evidence does not support "all reads immediately after mount fail."
- The same drive and template can work immediately after mount in a fresh sandbox.
- This makes the issue look intermittent or dependent on sandbox instance/readiness/state, rather than a simple required propagation delay.
Local captured log: personalV0/augment/server/drive-loop-probe-2026-05-13.log
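The control run's loop can be sketched as a small harness that emits records in the same { at, label, ok, value } shape as the excerpt above. This is an illustration, not the script that produced the log. The SandboxOps interface, the file names, and the error field are assumptions. They stand in for the SDK calls: process.exec for writes and listings, fs.read for reads. Mounting the drive is left to the caller.

```typescript
// Hypothetical probe harness mirroring the JSON-line log format above.
interface ProbeRecord {
  at: string;
  label: string;
  ok: boolean;
  value?: unknown;
  error?: string;
}

// Assumed stand-in for the SDK calls; not a Blaxel type.
interface SandboxOps {
  write(path: string, content: string): Promise<void>;
  read(path: string): Promise<string>;
  list(dir: string): Promise<unknown>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run one step and record its outcome; failures are logged, not thrown.
async function step(log: ProbeRecord[], label: string, fn: () => Promise<unknown>): Promise<void> {
  try {
    const value = await fn();
    log.push({ at: new Date().toISOString(), label, ok: true, ...(value === undefined ? {} : { value }) });
  } catch (err) {
    log.push({ at: new Date().toISOString(), label, ok: false, error: String(err) });
  }
}

// Write, then read and list immediately and again after a delay.
async function runLoopProbe(ops: SandboxOps, attempts = 8, delayMs = 1500): Promise<ProbeRecord[]> {
  const log: ProbeRecord[] = [];
  for (let i = 1; i <= attempts; i++) {
    const path = `/workspace/loop-${i}.txt`; // hypothetical file name
    const content = `loop-${i}-${Date.now()}\n`;
    await step(log, `attempt.${i}.process.write`, () => ops.write(path, content));
    await step(log, `attempt.${i}.fs.read.immediate`, () => ops.read(path));
    await step(log, `attempt.${i}.process.list.immediate`, () => ops.list("/workspace"));
    await sleep(delayMs);
    await step(log, `attempt.${i}.fs.read.afterDelay`, () => ops.read(path));
    await step(log, `attempt.${i}.process.list.afterDelay`, () => ops.list("/workspace"));
  }
  return log;
}
```

Each record can be written out with JSON.stringify to produce one log line per step.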
Follow-Up Probe: Existing Sandboxes After Standby
Run time: approximately 2026-05-13T18:10Z.
Command output captured in: personalV0/augment/server/drive-old-sandbox-probe-2026-05-13.log
This probe reused three already-created incident sandboxes and performed five write/read/list rounds on each.
Summary:
```text
open-cowork-drive-incident-loop-20260513
  ok=16 fail=2
  FAIL 2026-05-13T18:10:22.090Z drives.list.initial: TypeError: fetch failed
  FAIL 2026-05-13T18:10:22.137Z process.pwd: TypeError: fetch failed
open-cowork-drive-incident-20260513
  ok=18 fail=0
open-cowork-drive-incident-20260513-py
  ok=17 fail=1
  FAIL 2026-05-13T18:10:36.998Z process.pwd: TypeError: fetch failed
```
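Per-sandbox summaries like the one above can be derived from such records. The following is a hypothetical sketch that assumes records shaped like the JSON-line logs in this report. It reproduces the ok=N fail=M and FAIL lines. It also flags whether every failure came before the first successful call, which is the pattern discussed next. If no call succeeds, every failure trivially counts as "before the first success", so read the flag together with the ok count.

```typescript
// Hypothetical summarizer for probe records ({ at, label, ok, error? }).
interface ProbeRecord {
  at: string;
  label: string;
  ok: boolean;
  value?: unknown;
  error?: string;
}

interface ProbeSummary {
  ok: number;
  fail: number;
  failuresOnlyBeforeFirstSuccess: boolean;
  lines: string[];
}

function summarizeProbe(records: ProbeRecord[]): ProbeSummary {
  const firstSuccess = records.findIndex((r) => r.ok);
  let ok = 0;
  let fail = 0;
  let onlyBefore = true;
  const failLines: string[] = [];
  records.forEach((r, i) => {
    if (r.ok) {
      ok++;
      return;
    }
    fail++;
    failLines.push(`FAIL ${r.at} ${r.label}: ${r.error ?? "unknown error"}`);
    // A failure after the first success breaks the "first-call only" pattern.
    if (firstSuccess !== -1 && i > firstSuccess) onlyBefore = false;
  });
  return {
    ok,
    fail,
    failuresOnlyBeforeFirstSuccess: fail > 0 && onlyBefore,
    lines: [`ok=${ok} fail=${fail}`, ...failLines],
  };
}
```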
Interpretation:
- This does not look like "old sandboxes always fail."
- It also does not look like "new sandboxes always work."
- The strongest pattern from this probe is first-call flakiness when an existing sandbox has been in STANDBY and is then reused.
- Once the sandbox completes one operation successfully, subsequent write/read/list calls usually succeed within the same short window.
- This pattern matches the frontend symptom: opening the file viewer or refreshing the tree can produce a transient HTTP 400 (backend error), while a later refresh may work.
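Until the root cause is known, one client-side mitigation consistent with this pattern is to retry a sandbox call a small number of times when it fails at the connection level. The sketch below is a workaround, not a fix, and it rests on three assumptions. First, that TypeError("fetch failed") is the only error worth retrying. Second, that the wrapped call is safe to repeat, which is reasonable for list and read but less clear for mount or a non-idempotent process.exec. Third, that a short linear backoff is enough for a sandbox to leave STANDBY. withConnectionRetry is a hypothetical name.

```typescript
// Connection-level failure as seen in this report: Node's built-in fetch
// throws TypeError("fetch failed") for network errors.
function isFetchFailed(err: unknown): boolean {
  return err instanceof TypeError && err.message === "fetch failed";
}

// Retry `op` up to `retries` extra times on connection failures only,
// waiting backoffMs, 2*backoffMs, ... between attempts.
async function withConnectionRetry<T>(
  op: () => Promise<T>,
  retries = 2,
  backoffMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Typed API errors and validation errors propagate immediately.
      if (!isFetchFailed(err) || attempt >= retries) throw err;
      await new Promise<void>((resolve) => setTimeout(resolve, backoffMs * (attempt + 1)));
    }
  }
}
```

A route could then call, for instance, withConnectionRetry(() => sandbox.drives.list()) instead of sandbox.drives.list() directly. Other errors still surface on the first attempt.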
Repro Run 2: Direct TypeScript SDK
Run time: approximately 2026-05-13T18:02Z.
Sandbox intentionally left for Blaxel inspection:
- Sandbox name: open-cowork-drive-incident-20260513
- Workspace: openclawguy
- Region: us-was-1
- Image/template: template-guardian
- Drive: open-cowork-agent-drive
- Mount path: /workspace
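Command shape:
```typescript
import 'dotenv/config';
import { SandboxInstance, DriveInstance } from '@blaxel/core';

// Uses top-level await, so run it as an ES module (for example with tsx).
// The real script caught and logged errors from individual steps
// (see the output below); that handling is omitted here.
const sandbox = await SandboxInstance.createIfNotExists({
  name: 'open-cowork-drive-incident-20260513',
  // Resolved to template-guardian in this run, per the config output below.
  image: process.env.BL_SANDBOX_TEMPLATE || 'blaxel/base-image:latest',
  memory: 2048,
  region: 'us-was-1',
});

await DriveInstance.createIfNotExists({
  name: 'open-cowork-agent-drive',
  region: 'us-was-1',
  displayName: 'OpenCowork Agent Drive',
});

await sandbox.drives.list();
await sandbox.drives.mount({
  driveName: 'open-cowork-agent-drive',
  mountPath: '/workspace',
  drivePath: '/',
});

// Write a file onto the mounted drive, flush it, and list it from the shell.
await sandbox.process.exec({
  command: "mkdir -p /workspace && printf 'incident repro\\n' > /workspace/incident.txt && sync && ls -la /workspace",
  waitForCompletion: true,
  workingDir: '/',
});

// Read the same file back through the sandbox filesystem API.
await sandbox.fs.read('/workspace/incident.txt');
```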
Observed output:
```text
--- config ---
{
  "workspace": "openclawguy",
  "sandboxName": "open-cowork-drive-incident-20260513",
  "driveName": "open-cowork-agent-drive",
  "region": "us-was-1",
  "mountPath": "/workspace",
  "drivePath": "/",
  "image": "template-guardian"
}
--- drives.list.before.error ---
TypeError: fetch failed
    at async SandboxDrive.list (.../@blaxel/core/dist/esm/sandbox/drive/drive.js:55:26)
--- drives.mount.error ---
TypeError: fetch failed
    at async SandboxDrive.mount (.../@blaxel/core/dist/esm/sandbox/drive/drive.js:17:26)
[drives.list] raw response: {"mounts":[]}
--- drives.list.after ---
[]
--- fatal.error ---
TypeError: fetch failed
    at async SandboxProcess.exec (.../@blaxel/core/dist/esm/sandbox/process/process.js:111:47)
```
Local captured log: personalV0/augment/server/sdk-drive-incident-2026-05-13.log
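"TypeError: fetch failed" is the generic message that Node's built-in fetch (undici) uses for network-level failures. The underlying socket or DNS error is attached as the error's cause, which the one-line messages above do not show. Below is a hypothetical helper for the repro's catch blocks that prints the whole cause chain. It assumes @blaxel/core rethrows the original fetch error rather than replacing it with one that has no cause; that is not confirmed here.

```typescript
// Hypothetical helper: flatten an error and its `cause` chain into one line.
// Inner network errors often carry a `code` such as ECONNREFUSED,
// ECONNRESET or UND_ERR_CONNECT_TIMEOUT, which is included when present.
function describeErrorChain(err: unknown, maxDepth = 5): string {
  const parts: string[] = [];
  let current: unknown = err;
  for (let depth = 0; current != null && depth < maxDepth; depth++) {
    if (current instanceof Error) {
      const code = (current as { code?: unknown }).code;
      parts.push(`${current.name}: ${current.message}` + (typeof code === "string" ? ` [${code}]` : ""));
      current = (current as { cause?: unknown }).cause;
    } else {
      // Non-Error causes (strings, plain objects) end the chain.
      parts.push(String(current));
      break;
    }
  }
  return parts.join(" <- caused by ");
}
```

Logging describeErrorChain(err) in the repro scripts would show whether these failures are refused connections, resets, or timeouts. That distinction bears directly on the reachability questions in the request below.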
Repro Run 3: Direct Python SDK
Run time: approximately 2026-05-13T18:03Z.
Sandbox intentionally left for Blaxel inspection:
- Sandbox name: open-cowork-drive-incident-20260513-py
- Workspace: openclawguy
- Region: us-was-1
- Image/template: template-guardian
- Drive: open-cowork-agent-drive
- Mount path: /workspace
Command shape:
```python
import asyncio

from blaxel.core import SandboxInstance
from blaxel.core.drive import DriveInstance


async def main() -> None:
    # The real script caught and logged errors from individual steps
    # (see the output below); that handling is omitted here.
    sandbox = await SandboxInstance.create_if_not_exists({
        "name": "open-cowork-drive-incident-20260513-py",
        "image": "template-guardian",
        "memory": 2048,
        "region": "us-was-1",
    })
    drive = await DriveInstance.create_if_not_exists({
        "name": "open-cowork-agent-drive",
        "region": "us-was-1",
        "display_name": "OpenCowork Agent Drive",
    })

    await sandbox.drives.list()
    await sandbox.drives.mount(
        drive_name="open-cowork-agent-drive",
        mount_path="/workspace",
        drive_path="/",
    )

    # Write a file onto the mounted drive and flush it.
    await sandbox.process.exec({
        "command": "mkdir -p /workspace && printf 'python incident repro\\n' > /workspace/python-incident.txt && sync",
        "wait_for_completion": True,
        "working_dir": "/",
    })

    # Read the same file back through the sandbox filesystem API.
    await sandbox.fs.read("/workspace/python-incident.txt")


asyncio.run(main())
```
Observed output:
```text
--- config ---
{
  "workspace": "openclawguy",
  "sandboxName": "open-cowork-drive-incident-20260513-py",
  "driveName": "open-cowork-agent-drive",
  "region": "us-was-1",
  "mountPath": "/workspace",
  "drivePath": "/",
  "image": "template-guardian"
}
--- drives.list.before.error ---
httpx.ConnectError
  File "...site-packages\\blaxel\\core\\sandbox\\default\\drive.py", line 72, in list
    response = await client.get("/drives/mount")
--- drives.mount.error ---
httpx.ConnectError
  File "...site-packages\\blaxel\\core\\sandbox\\default\\drive.py", line 40, in mount
    response = await client.post("/drives/mount", json=payload)
--- process.exec.write.error ---
httpx.ConnectError
  File "...site-packages\\blaxel\\core\\sandbox\\default\\process.py", line 252, in exec
    response = await client.post("/process", json=process.to_dict())
--- sandbox.fs.read.error ---
httpx.ConnectError
  File "...site-packages\\blaxel\\core\\sandbox\\default\\filesystem.py", line 156, in read
    response = await client.get(f"/filesystem/{path}")
```
Local captured log: personalV0/augment/server/python-sdk-drive-incident-2026-05-13.log
Expected Behavior
- sandbox.drives.list() should reliably return mounted drives.
- sandbox.drives.mount() should mount open-cowork-agent-drive at /workspace or return a typed API error.
- sandbox.process.exec() should execute inside the sandbox once SandboxInstance.createIfNotExists returns a sandbox.
- Files written under /workspace should be visible through both:
  - shell/process reads inside the sandbox
  - filesystem/list/read APIs used by the SDK and our HTTP routes
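The visibility requirement above can be checked mechanically by comparing the two views of a directory. This minimal sketch assumes both sides have already been reduced to bare entry names. On the shell side, names come from the stdout of something like ls -1A run via sandbox.process.exec. On the API side, they come from the SDK or HTTP drive listing. That listing's response shape is not shown in this report, so extracting names from it is left out. The function names are hypothetical, and only names are compared, not sizes or contents.

```typescript
// Hypothetical consistency check between a shell listing and an API listing
// of the same directory.
interface VisibilityReport {
  missingFromApi: string[]; // present in the shell listing, absent from the API
  missingFromShell: string[]; // present in the API listing, absent from the shell
  consistent: boolean;
}

// Split `ls -1A` style output into names, dropping blank lines and CRs.
function parseLsOutput(stdout: string): string[] {
  return stdout
    .split("\n")
    .map((line) => line.replace(/\r$/, ""))
    .filter((line) => line.length > 0);
}

function compareListings(shellNames: string[], apiNames: string[]): VisibilityReport {
  const shell = new Set(shellNames);
  const api = new Set(apiNames);
  const missingFromApi = Array.from(shell).filter((n) => !api.has(n)).sort();
  const missingFromShell = Array.from(api).filter((n) => !shell.has(n)).sort();
  return {
    missingFromApi,
    missingFromShell,
    consistent: missingFromApi.length === 0 && missingFromShell.length === 0,
  };
}
```

In Repro Run 1 terms, a shell listing of /workspace that contains demo, compared with an API listing of / that lacks it, would yield missingFromApi = ["demo"].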
Actual Behavior
- The TypeScript SDK intermittently throws TypeError: fetch failed from sandbox drive/process APIs.
- The Python SDK intermittently raises httpx.ConnectError from sandbox drive/process/filesystem APIs.
- In the OpenCowork HTTP route repro, drives.list reports the drive mounted, but a file tree written into /workspace is not visible via the drive listing API.
Interpretation
- This looks like a flaky sandbox/drive API or sandbox reachability issue, not a deterministic OpenCowork-only validation problem.
- A fresh control sandbox at 2026-05-13T18:08Z succeeded on immediate post-mount write/read/list operations for eight attempts, so the issue is not simply "read immediately after mount always fails."
- A follow-up probe at 2026-05-13T18:10Z saw first-call failures after sandbox standby/resume, followed by successful operations. That is the strongest current lead.
- In the direct SDK repros (Repro Runs 2 and 3), both SDKs fail at the sandbox API boundary, before any OpenCowork-specific file parsing or preview logic runs.
- The Python stack traces show failures calling these sandbox API endpoints:
  - GET /drives/mount
  - POST /drives/mount
  - POST /process
  - GET /filesystem/{path}
- The TypeScript stack traces show the same classes of failures in @blaxel/core.
- In Repro Run 1, Blaxel reports the drive as mounted, but files written to it are not visible afterward through the drive listing route.
- OpenCowork should improve its own error mapping so SDK/sandbox connection failures are not returned as generic HTTP 400, but that mapping does not explain the underlying SDK connection failures.
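Here is one way to implement that mapping, sketched under assumptions rather than taken from the current code. The drive routes would throw a dedicated DriveValidationError (hypothetical) for their own checks. Connection-level SDK failures would map to 502, because the backend could not reach the sandbox it depends on. Anything else would map to 500. The kind field is also an assumption, added so the frontend does not have to parse messages.

```typescript
// Hypothetical error class for the routes' own validation failures
// (Invalid drive path, No files provided, ...).
class DriveValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "DriveValidationError";
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, DriveValidationError.prototype);
  }
}

// Connection-level failure seen in this report (Node's fetch).
function isConnectionFailure(err: unknown): boolean {
  return err instanceof TypeError && err.message === "fetch failed";
}

interface DriveErrorResponse {
  status: number;
  body: { error: string; kind: "validation" | "sandbox_unreachable" | "internal" };
}

function mapDriveError(err: unknown): DriveErrorResponse {
  const message = err instanceof Error ? err.message : String(err);
  if (err instanceof DriveValidationError) {
    return { status: 400, body: { error: message, kind: "validation" } };
  }
  if (isConnectionFailure(err)) {
    // 502: the backend could not reach the sandbox it depends on.
    return { status: 502, body: { error: message, kind: "sandbox_unreachable" } };
  }
  return { status: 500, body: { error: message, kind: "internal" } };
}
```

In an Express catch block, this would replace res.status(400).json({ error: err.message }) with const { status, body } = mapDriveError(err); followed by res.status(status).json(body);.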
Request for Blaxel
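Please inspect logs/metrics for:
- Workspace: openclawguy
- Sandbox: open-cowork-drive-incident-20260513
- Sandbox: open-cowork-drive-incident-20260513-py
- Sandbox: open-cowork-drive-incident-loop-20260513
- Session/sandbox prefix from HTTP repro: 78e63172 / augment-78e63172...
- Drive: open-cowork-agent-drive
- Time windows: 2026-05-13T18:00:00Z to 2026-05-13T18:05:00Z (Repro Runs 1 to 3), and 2026-05-13T18:08:00Z to 2026-05-13T18:11:00Z (control run and follow-up probe; probe failures at 18:10:22Z and 18:10:36Z)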
Questions:
- Are the sandbox internal API endpoints failing to come up, or becoming unreachable, after sandbox creation/resume?
- Are Agent Drive mounts succeeding but not surfacing a consistent filesystem view at /workspace?
- Is template-guardian missing something required for sandbox API/drive support, or is this happening below the template layer?
- Is there an account/quota/rate-limit condition that would cause fetch failed/httpx.ConnectError instead of a typed Blaxel API error?
- Can the SDKs expose the underlying URL/status/error body for these sandbox API connection failures?