
Local server can crash with unhandled EPIPE during live heartbeat/dashboard activity #14

@aronprins

Description


Summary

The embedded/local Paperclip server can exit unexpectedly with an unhandled EPIPE error while the desktop app is actively viewing dashboard and heartbeat run data.

Observed user-facing message:

Exit code: 1, signal: null
...
node:events:496
throw er; // Unhandled 'error' event
^
Error: write EPIPE
...
code: 'EPIPE',
syscall: 'write'

Evidence

The captured server log showed normal request traffic immediately before the crash, including dashboard, agents, live-runs, heartbeat-runs, heartbeat run events, and workspace-operations requests. The crash came right after a burst of successful 200 responses and a single 404 from the heartbeat run log endpoint.

The 404 does not currently look like the primary trigger. The stronger candidate is a write to a raw socket or pipe after the peer had already disconnected, which matches the failing syscall: 'write' with code: 'EPIPE' in the trace.
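A minimal TypeScript illustration (not Paperclip code) of the EventEmitter behaviour behind the node:events frame in the trace: emitting 'error' with no listener attached throws the error object itself. In the server that throw would happen inside an I/O callback, so nothing catches it and Node exits with code 1, as in the message above. The helper name emitEpipe is made up for this sketch.

```typescript
import { EventEmitter } from "node:events";

// Emit an EPIPE-shaped error and report whether it was handled or thrown.
// Without an 'error' listener, EventEmitter#emit rethrows the error, which is
// the `throw er; // Unhandled 'error' event` frame seen in the crash output.
export function emitEpipe(withListener: boolean): string {
  const emitter = new EventEmitter();
  const err = Object.assign(new Error("write EPIPE"), { code: "EPIPE", syscall: "write" });

  if (withListener) {
    emitter.on("error", () => {
      // Handled: a real server would log or ignore here instead of crashing.
    });
  }

  try {
    emitter.emit("error", err);
    return "handled";
  } catch (thrown) {
    return `thrown: ${(thrown as NodeJS.ErrnoException).code}`;
  }
}

console.log(emitEpipe(false)); // thrown: EPIPE
console.log(emitEpipe(true)); // handled
```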

Initial hypotheses

1. Live events WebSocket upgrade path can write to a closed socket

The server's live events upgrade handler writes directly to the raw socket when rejecting an upgrade:

  • node_modules/@paperclipai/server/dist/realtime/live-events-ws.js
  • rejectUpgrade(socket, statusLine, message)
  • socket.write(...)

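If the client disconnects during or just before that rejection path, socket.write(...) can fail with EPIPE. If the socket has no 'error' listener at that point, the failure is rethrown as an unhandled 'error' event and takes the whole server process down.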
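A possible shape for the defensive fix, sketched in TypeScript. The actual body of rejectUpgrade is not reproduced in this issue, so this is a sketch under assumptions: the name safeRejectUpgrade is hypothetical, statusLine is assumed to look like "401 Unauthorized", and the socket is the stream.Duplex that Node's http 'upgrade' event hands to the handler. It attaches an 'error' listener before writing, skips the write when the socket is already gone, and sends the response with a single end() before destroying the socket, roughly the pattern the ws package uses when aborting a handshake.

```typescript
import { Duplex } from "node:stream";

// Hypothetical hardened variant of rejectUpgrade(socket, statusLine, message).
// Assumes statusLine is "<code> <reason>", e.g. "401 Unauthorized".
export function safeRejectUpgrade(socket: Duplex, statusLine: string, message: string): boolean {
  // Attach the listener first: an EPIPE/ECONNRESET from a vanished peer is
  // then handled here instead of becoming an unhandled 'error' event.
  socket.on("error", () => socket.destroy());

  // The client already went away: there is nobody to send the rejection to.
  if (socket.destroyed || !socket.writable) {
    socket.destroy();
    return false;
  }

  const response =
    `HTTP/1.1 ${statusLine}\r\n` +
    "Connection: close\r\n" +
    "Content-Type: text/plain; charset=utf-8\r\n" +
    `Content-Length: ${Buffer.byteLength(message)}\r\n` +
    "\r\n" +
    message;

  // A single end() writes the whole response; tear down once it is flushed.
  socket.once("finish", () => socket.destroy());
  socket.end(response);
  return true;
}

// Demo with in-memory stand-ins for the upgrade socket.
const sent: Buffer[] = [];
const healthy = new Duplex({
  read() {},
  write(chunk, _encoding, callback) {
    sent.push(Buffer.from(chunk));
    callback();
  },
});

const epipe = Object.assign(new Error("write EPIPE"), { code: "EPIPE", syscall: "write" });
const brokenPeer = new Duplex({
  read() {},
  write(_chunk, _encoding, callback) {
    callback(epipe); // behaves like a peer that has already closed the pipe
  },
});

const alreadyGone = new Duplex({
  read() {},
  write(_chunk, _encoding, callback) {
    callback();
  },
});
alreadyGone.destroy();

const results = [
  safeRejectUpgrade(healthy, "401 Unauthorized", "unauthorized"),
  safeRejectUpgrade(brokenPeer, "401 Unauthorized", "unauthorized"),
  safeRejectUpgrade(alreadyGone, "401 Unauthorized", "unauthorized"),
];
console.log(results); // [ true, true, false ]
```

The broken peer still returns true because the socket looked writable when the call was made; the EPIPE arrives a tick later and is absorbed by the listener instead of crashing the process.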

2. Electron/server stdio piping may also be vulnerable during teardown races

The Electron app spawns the server with piped stdio and forwards/logs its output:

  • src/main.ts
  • startServer() pipes child stdout/stderr
  • boot flow also writes server output to server.log

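If the parent (consumer) side closes its end of the pipe unexpectedly, a later server-side write to stdout/stderr could also fail with EPIPE, and it would crash the server the same way unless process.stdout/process.stderr have an 'error' listener.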
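On the server side, one way to make stdout/stderr logging survive a closed pipe is to listen for 'error' on the stream and stop writing once EPIPE is seen. This is a sketch only: createGuardedWriter is a hypothetical helper, and in the real server the stream would be process.stdout or process.stderr (or whatever the logger writes to). Errors other than EPIPE are rethrown to keep fail-fast behaviour for genuinely unexpected failures.

```typescript
import { Writable } from "node:stream";

// Hypothetical helper: wraps a stdout/stderr-like stream so that a reader
// disappearing (EPIPE) stops log output instead of crashing the process.
// Real usage (assumption): const writeOut = createGuardedWriter(process.stdout);
export function createGuardedWriter(stream: NodeJS.WritableStream): (line: string) => boolean {
  let pipeClosed = false;

  stream.on("error", (err: NodeJS.ErrnoException) => {
    if (err.code === "EPIPE") {
      // The consumer (e.g. the Electron parent during teardown) is gone.
      pipeClosed = true;
      return;
    }
    // Any other error is unexpected: rethrow to keep fail-fast behaviour.
    throw err;
  });

  // Returns true if the line was handed to the stream, false if it was dropped.
  return (line: string): boolean => {
    if (pipeClosed || !stream.writable) return false;
    stream.write(line + "\n");
    return true;
  };
}

// Demo: an in-memory stream whose every write fails with EPIPE.
const epipe = Object.assign(new Error("write EPIPE"), { code: "EPIPE", syscall: "write" });
const deadPipe = new Writable({
  write(_chunk, _encoding, callback) {
    callback(epipe);
  },
});

const serverLog = createGuardedWriter(deadPipe);
const firstAccepted = serverLog("GET /api/heartbeat-runs 200");
setImmediate(() => {
  // By now the EPIPE has been emitted and absorbed by the listener.
  console.log(firstAccepted, serverLog("after teardown")); // true false
});
```

The in-memory Writable stands in for a pipe whose reader went away during Electron teardown.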

Why this looks like a product bug, not just a local one-off

  • Request traffic was otherwise healthy right before the crash.
  • The error is a transport/write failure, not a domain or database error.
  • The failure mode currently terminates the server process instead of degrading gracefully.

Suggested direction

  • Audit raw socket writes in the live events WebSocket upgrade/rejection path.
  • Add defensive error handling around upgrade rejection and other direct socket writes.
  • Review whether server stdout/stderr writes are safe during Electron app teardown.
  • Add a regression test or repro harness for client disconnects during WebSocket upgrade/reconnect races.
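For the regression test / repro harness item, a rough TypeScript stress harness could look like the following. Everything in it is hypothetical (the function name, the /live path, the 20 ms delay standing in for async work in the upgrade handler) and it does not use the real live-events-ws.js handler; it only recreates the race: clients send a WebSocket upgrade request and disconnect before the server writes its raw rejection. Because the race is timing-dependent, it loops many times and cannot guarantee that EPIPE (or ECONNRESET) fires on every run. With attachErrorListener set to true it should finish and report whatever socket error codes it saw; with it set to false, any transport error that does occur becomes an unhandled 'error' event and kills the process, which is the failure mode in this issue.

```typescript
import * as http from "node:http";
import * as net from "node:net";
import type { Duplex } from "node:stream";

const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

// Hypothetical stress harness: clients send a WebSocket upgrade request and
// hang up before the server rejects it. Returns the error codes reported on
// server-side sockets (only collected when attachErrorListener is true).
export async function runUpgradeDisconnectRace(iterations: number, attachErrorListener: boolean): Promise<string[]> {
  const codes: string[] = [];
  const sockets = new Set<net.Socket>();
  const server = http.createServer();
  server.on("connection", (socket: net.Socket) => sockets.add(socket));

  server.on("upgrade", (_req: http.IncomingMessage, socket: Duplex) => {
    if (attachErrorListener) {
      socket.on("error", (err: NodeJS.ErrnoException) => codes.push(err.code ?? "unknown"));
    }
    // Stand-in for async work (auth, lookups) before the raw rejection is written.
    setTimeout(() => {
      socket.write("HTTP/1.1 401 Unauthorized\r\n");
      // A second raw write after the peer has reset is where EPIPE/ECONNRESET tends to surface.
      setTimeout(() => socket.end("Connection: close\r\nContent-Length: 0\r\n\r\n"), 10);
    }, 20);
  });

  await new Promise<void>((resolve) => server.listen(0, "127.0.0.1", () => resolve()));
  const { port } = server.address() as net.AddressInfo;

  for (let i = 0; i < iterations; i++) {
    const client = net.connect(port, "127.0.0.1");
    client.on("error", () => {}); // client-side errors are not what we are testing
    await new Promise<void>((resolve) => client.once("connect", () => resolve()));
    client.write("GET /live HTTP/1.1\r\nHost: localhost\r\nConnection: Upgrade\r\nUpgrade: websocket\r\n\r\n");
    await delay(5);
    client.destroy(); // disconnect before the delayed rejection is written
  }

  await delay(100); // let pending rejections run against the closed peers
  for (const socket of sockets) socket.destroy();
  await new Promise<void>((resolve) => server.close(() => resolve()));
  return codes;
}

// Example: runUpgradeDisconnectRace(50, true).then((codes) => console.log("socket errors seen:", codes));
```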

Source log excerpt

[09:58:24] INFO: GET /companies/.../dashboard 200
[09:58:24] WARN: GET /api/heartbeat-runs/558ebfbc-95cf-4f41-8958-b6b276e02add/log?offset=0&limitBytes=256000 404
[09:58:24] INFO: GET /heartbeat-runs/558ebfbc-95cf-4f41-8958-b6b276e02add/workspace-operations 200
[09:58:24] INFO: GET /heartbeat-runs/558ebfbc-95cf-4f41-8958-b6b276e02add/events?afterSeq=0&limit=200 200
...
Error: write EPIPE
...
code: 'EPIPE',
syscall: 'write'

Follow-up

I am doing a deeper investigation now and will comment back with the confirmed root cause, fix, and validation notes.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), upstream
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
