97 changes: 97 additions & 0 deletions docs/ssh-dashboard-transport.md
@@ -0,0 +1,97 @@
# SSH dashboard transport — design for full support (issue #667)

Status: **design / not yet implemented.** A reliable legacy-HTTP fallback shipped first (see
"Shipped now" below). This document captures the design for making the _dashboard_ chat transport
(profile switching, session history, slash commands, background prompts) work over an SSH tunnel.

## Background: why SSH dashboard chat is broken

The desktop's dashboard chat transport speaks WebSocket JSON-RPC at **`/api/ws`**
(`src/renderer/src/screens/Chat/dashboardGatewayClient.ts`, URL built in `src/main/dashboard.ts`).

In `hermes-agent`, `/api/ws` is served **only** by `hermes dashboard`
(`hermes_cli/web_server.py` → `start_server`, gated by `_DASHBOARD_EMBEDDED_CHAT_ENABLED = True`). It is
**never** served by `hermes gateway` (`gateway/...`, the api_server, which serves `/v1/chat/completions`,
`/health`, etc.).

But SSH mode today:

1. Starts `hermes gateway start` on the remote — `buildGatewayStartCommand` in `src/main/ssh-remote.ts`.
2. Tunnels the **gateway** port (`config.remotePort`, default 8642) — `ensureSshTunnel` in
`src/main/ssh-tunnel.ts`.
3. Connects to `ws://127.0.0.1:{tunnelPort}/api/ws`, which returns 404 because the gateway does not
   serve that path.

A second, independent blocker: `/api/ws` authenticates with `HERMES_DASHBOARD_SESSION_TOKEN`
(`web_server.py` `_SESSION_TOKEN`; `?token=<…>` on loopback), but the SSH path passes the remote
`API_SERVER_KEY` (`sshReadRemoteApiKey` in `src/main/ssh-remote.ts`). Even a correctly-tunneled
dashboard would reject the WS upgrade.

## Shipped now: reliable legacy fallback

`src/renderer/src/screens/Chat/hooks/useDashboardChatTransport.ts` now latches a sticky
`dashboardUnavailableRef` on the first failed `ensureClient` for a remote/SSH connection, so subsequent
messages fall back to the working legacy HTTP transport (`/v1/chat/completions` through the tunnel)
_immediately_ instead of re-running the multi-second status+probe each time. It also fires
`onDashboardUnavailable` once, which `Chat.tsx` surfaces as a one-time toast. The flag resets on any
connection change. This makes SSH chat **work** (degraded: no profile switching / session history /
dashboard slash commands).
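
The latching behaviour described above can be sketched roughly as follows. This is a simplified,
framework-free model, not the hook's actual code: `createDashboardLatch`, `probe`, and `notify` are
hypothetical names, the real hook keeps this state in React refs, and the sketch assumes the notice
is only fired when the flag latches.

```typescript
// Hypothetical, simplified model of the sticky "dashboard unavailable" latch.
// The real hook stores this state in React refs; all names are illustrative.
type ConnectionMode = "local" | "remote" | "ssh";

interface DashboardLatch {
  /** Resolves true when the dashboard transport is usable, false to fall back. */
  tryDashboard(): Promise<boolean>;
  /** Call on any connection change; clears the sticky flag. */
  setMode(mode: ConnectionMode): void;
}

function createDashboardLatch(
  initialMode: ConnectionMode,
  probe: () => Promise<boolean>, // stands in for the slow status + probe round-trip
  notify: () => void, // stands in for the one-time toast callback
): DashboardLatch {
  let mode = initialMode;
  let unavailable = false;

  return {
    async tryDashboard() {
      // Fast path: a latched remote/SSH connection skips the probe entirely.
      if (unavailable) return false;
      const ok = await probe();
      // Only remote/SSH latches; a local dashboard may still be spawning.
      if (!ok && mode !== "local") {
        unavailable = true;
        notify();
      }
      return ok;
    },
    setMode(next) {
      mode = next;
      unavailable = false;
    },
  };
}
```

A caller would treat a `false` result as "send this message over the legacy HTTP transport", which
is the same contract the tests in this PR exercise.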

## Full design: run a remote `hermes dashboard` and tunnel it

Goal: restore the dashboard transport over SSH by talking to a real remote `hermes dashboard`.

### 1. Start the remote dashboard (not the gateway)

Add `sshStartDashboard(config, sessionToken, port)` in `src/main/ssh-remote.ts`, mirroring
`buildGatewayStartCommand`. It should run, detached:

```
HERMES_DASHBOARD_SESSION_TOKEN=<sessionToken> \
nohup hermes dashboard --no-open --host 127.0.0.1 --port <port> \
> $HOME/.hermes/dashboard.log 2>&1 &
```

This mirrors the **local** spawn in `src/main/hermes.ts:559` (`dashboard --no-open --host 127.0.0.1
--port <port>`, gated by `HERMES_DASHBOARD_SESSION_TOKEN`) — reuse that flag/arg shape. Add a matching
status/stop command pair (`buildDashboardStatusCommand` / `buildDashboardStopCommand`).
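
A minimal sketch of what the start-command builder might look like, assuming a POSIX shell on the
remote and `hermes` on its `PATH`. `buildDashboardStartCommand` and `shellQuote` are hypothetical
names chosen for illustration; the flags and the environment variable are the ones quoted above.

```typescript
// Hypothetical sketch of the remote start command; not the shipped code.
// Assumes a POSIX shell on the remote host and `hermes` on its PATH.

/** Wrap a value in single quotes so the remote shell treats it literally. */
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

function buildDashboardStartCommand(sessionToken: string, port: number): string {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError("invalid dashboard port: " + String(port));
  }
  return [
    // The env assignment applies to `nohup`, which passes it on to `hermes`.
    "HERMES_DASHBOARD_SESSION_TOKEN=" + shellQuote(sessionToken),
    "nohup hermes dashboard --no-open --host 127.0.0.1 --port " + String(port),
    // Quoting the log path is a small hardening over the snippet above.
    '> "$HOME/.hermes/dashboard.log" 2>&1 &',
  ].join(" ");
}
```

Quoting the token keeps an unexpected character from being interpreted by the remote shell, and
rejecting out-of-range ports early avoids sending a command that can only fail.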

### 2. Tunnel the dashboard port (in addition to / instead of the gateway port)

The legacy fallback still needs the gateway tunnel for `/v1/chat/completions`, while the dashboard
transport needs the dashboard's `/api/ws`, `/api/sessions`, and `/api/status`. This requires two pieces:

- **Second forward**: generalize `ensureSshTunnel` / `getSshTunnelUrl` in `src/main/ssh-tunnel.ts` to
manage a named set of forwards (gateway + dashboard), each `localPort → 127.0.0.1:remotePort`.
- Remote dashboard port: pick a free remote port over SSH (run a tiny Python one-liner like the
existing helpers in `ssh-remote.ts`) or document a fixed default; surface it in the SSH config UI.
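
One possible shape for the named forward set, shown as a sketch only. The `Forward` type and the
`buildForwardArgs` / `forwardUrl` helpers are hypothetical, the `python3` binary name is an
assumption about the remote host, and the `-L` form is the standard OpenSSH
`bind_address:port:host:hostport` syntax.

```typescript
// Hypothetical model of a named forward set; field names are illustrative.
interface Forward {
  name: "gateway" | "dashboard";
  localPort: number;
  remotePort: number;
}

/** Build the `-L` arguments for a single ssh process carrying every forward. */
function buildForwardArgs(forwards: readonly Forward[]): string[] {
  const args: string[] = [];
  for (const f of forwards) {
    // Bind the local end to loopback and target loopback on the remote.
    args.push("-L", `127.0.0.1:${f.localPort}:127.0.0.1:${f.remotePort}`);
  }
  return args;
}

/** Local base URL for a named forward, or null when that forward is not set up. */
function forwardUrl(forwards: readonly Forward[], name: Forward["name"]): string | null {
  for (const f of forwards) {
    if (f.name === name) return `http://127.0.0.1:${f.localPort}`;
  }
  return null;
}

// Remote one-liner that asks the kernel for a free loopback port and prints it.
// Assumes `python3` exists on the remote host.
const FIND_FREE_PORT_COMMAND =
  `python3 -c "import socket; s = socket.socket(); s.bind(('127.0.0.1', 0)); print(s.getsockname()[1]); s.close()"`;
```

Probing for a free port is inherently racy (another process can claim the port before the dashboard
binds it), so the start step would still need to handle a bind failure, for example by retrying
with a new port.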

### 3. Authenticate `/api/ws` with the session token

The desktop generates a `sessionToken` (e.g. `randomUUID()`), exports it to the remote dashboard's env
(step 1), and builds the WS URL as `ws://127.0.0.1:{dashTunnelPort}/api/ws?token=<sessionToken>` —
replacing `API_SERVER_KEY`. Rewrite `sshDashboardConnectionFromConfig` in `src/main/dashboard.ts` to
this flow (it currently calls `sshStartGateway` + `sshReadRemoteApiKey`). Note `web_server.py`'s
`_ws_auth_ok` accepts the `?token=` query only on loopback / `--insecure`; over the SSH tunnel the
endpoint is loopback on the remote, so this should hold — **verify on a real host**.
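
The URL construction in this step could look roughly like the sketch below. `buildDashboardWsUrl`
and `redactWsUrl` are hypothetical helpers; the token is taken as a parameter here and would be
generated by the caller (for example with `randomUUID()`).

```typescript
// Hypothetical helpers; the caller supplies the token (e.g. from randomUUID()).

/** WebSocket URL for the tunneled dashboard, carrying the session token. */
function buildDashboardWsUrl(dashTunnelPort: number, sessionToken: string): string {
  // The tunnel's local end is loopback, so plain ws:// is used here.
  const token = encodeURIComponent(sessionToken);
  return `ws://127.0.0.1:${dashTunnelPort}/api/ws?token=${token}`;
}

/** Same URL with the token masked, suitable for logs and error messages. */
function redactWsUrl(url: string): string {
  return url.replace(/([?&]token=)[^&]*/, "$1<redacted>");
}
```

Because the token travels in the query string, anything that logs the URL would leak it, which is
why the sketch pairs the builder with a redaction helper.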

### 4. Compatibility

`ensureSshDashboardCompatibility` (`src/main/hermes-agent-compat.ts`) already patches `web_server.py`'s
embedded-chat default and `/api/model/set`; keep it. Confirm the remote `hermes` build accepts
`dashboard --no-open --host --port` and the `HERMES_DASHBOARD_SESSION_TOKEN` env (v0.16.0 has
`_DASHBOARD_EMBEDDED_CHAT_ENABLED = True`, so embedded chat is available).

### 5. Lifecycle

- Stop the remote `hermes dashboard` on disconnect / app quit (best-effort, like `sshStopGateway`).
- Health-check the dashboard (`/api/status` through the tunnel) and restart on failure.
- Keep the gateway running too if other features depend on it.
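
The health-check-and-restart bullet can be sketched as a small supervisor. Everything here is
hypothetical: `createDashboardSupervisor`, the injected `checkStatus` / `restart` callbacks, and the
`maxRestarts` cap (which the design above does not specify) are illustrative choices, not existing
code.

```typescript
// Hypothetical supervisor; `checkStatus` and `restart` are injected so the
// sketch stays independent of the real tunnel and SSH helpers.
type HealthOutcome = "healthy" | "restarted" | "gave-up";

interface DashboardSupervisorDeps {
  checkStatus: () => Promise<boolean>; // e.g. GET /api/status through the tunnel
  restart: () => Promise<void>; // e.g. re-run the remote start command
  maxRestarts: number; // cap on consecutive restarts (an assumption)
}

/** Returns a function that performs one health-check pass per call. */
function createDashboardSupervisor(
  deps: DashboardSupervisorDeps,
): () => Promise<HealthOutcome> {
  let consecutiveRestarts = 0;

  return async function check(): Promise<HealthOutcome> {
    let healthy = false;
    try {
      healthy = await deps.checkStatus();
    } catch {
      healthy = false; // a failed request counts as unhealthy
    }
    if (healthy) {
      consecutiveRestarts = 0;
      return "healthy";
    }
    // Stop restarting after the cap so a broken host is not hammered.
    if (consecutiveRestarts >= deps.maxRestarts) return "gave-up";
    consecutiveRestarts += 1;
    await deps.restart();
    return "restarted";
  };
}
```

A caller might run `check` on a timer and treat `"gave-up"` as the signal to fall back to the legacy
transport.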

## Testing

- **Unit**: command builders (`sshStartDashboard` / status / stop) and the multi-forward tunnel wiring,
following the existing `ssh-remote` / `ssh-tunnel` test patterns.
- **Manual E2E (required before merge)**: an SSH host running `hermes-agent` (see
`docs/SSH-TUNNEL-VPS.md`). Verify: tunnel up → dashboard starts remotely → `/api/ws` connects with the
session token → a sent message streams back → profile switching and session history work. This step
cannot be exercised without a real remote host.
13 changes: 13 additions & 0 deletions src/renderer/src/screens/Chat/Chat.tsx
@@ -1,4 +1,5 @@
import { useCallback, useEffect, useRef, useState } from "react";
import toast from "react-hot-toast";
import { Zap } from "lucide-react";
import { ChatInput, type ChatInputHandle } from "./ChatInput";
import { ChatEmptyState } from "./ChatEmptyState";
@@ -392,6 +393,17 @@ function Chat({
    addAgentMessage,
  });

  // Fired once per connection when the dashboard WebSocket transport can't
  // connect (e.g. SSH tunnel → `hermes gateway`, which has no `/api/ws`, issue
  // #667) and we fall back to legacy chat. A fixed toast id dedupes.
  const handleDashboardUnavailable = useCallback(() => {
    toast(t("chat.dashboardUnavailableFallback"), {
      id: "dashboard-unavailable-fallback",
      icon: "ℹ️",
      duration: 8000,
    });
  }, [t]);

  const dashboardTransport = useDashboardChatTransport({
    activeTurnRef,
    contextFolder,
@@ -409,6 +421,7 @@ function Chat({
    setMessages,
    setToolProgress,
    setUsage,
    onDashboardUnavailable: handleDashboardUnavailable,
  });

  // Defer a message onto the busy queue (used when a slash command resolves to
110 changes: 107 additions & 3 deletions src/renderer/src/screens/Chat/hooks/useDashboardChatTransport.test.tsx
@@ -63,7 +63,17 @@ const activeRecoveryTurn: ActiveTurn = {
  userId: "u-recovery",
};

function Harness({ api }: { api: HarnessApi }): null {
function Harness({
  api,
  fallbackOnUnavailable = false,
  initialConnectionMode = "local",
  onDashboardUnavailable,
}: {
  api: HarnessApi;
  fallbackOnUnavailable?: boolean;
  initialConnectionMode?: "local" | "remote" | "ssh";
  onDashboardUnavailable?: (reason: string) => void;
}): null {
  const [messages, setMessages] = useState<ChatMessage[]>([
    {
      id: "u-bad",
@@ -76,14 +86,14 @@ function Harness({ api }: { api: HarnessApi }): null {
  const [provider, setProvider] = useState("bad-provider");
  const [connectionMode, setConnectionMode] = useState<
    "local" | "remote" | "ssh"
  >("local");
  >(initialConnectionMode);
  const activeTurnRef = useRef<ActiveTurn | null>({ ...activeBadTurn });
  const transport = useDashboardChatTransport({
    activeTurnRef,
    contextFolder: null,
    connectionMode,
    enabled: true,
    fallbackOnUnavailable: false,
    fallbackOnUnavailable,
    hermesSessionId: null,
    messages,
    model,
@@ -94,6 +104,7 @@ function Harness({ api }: { api: HarnessApi }): null {
    setMessages,
    setToolProgress: vi.fn(),
    setUsage: vi.fn(),
    onDashboardUnavailable,
  });

  useEffect(() => {
@@ -343,3 +354,96 @@ describe("useDashboardChatTransport recovery", () => {
    );
  });
});

describe("useDashboardChatTransport unavailable fallback (issue #667)", () => {
  afterEach(() => {
    vi.clearAllMocks();
  });

  function mockStartDashboard(): ReturnType<typeof vi.fn> {
    const startDashboard = vi.fn(async () => ({
      running: false,
      error: "Hermes dashboard chat WebSocket is unavailable (404)",
    }));
    Object.defineProperty(window, "hermesAPI", {
      configurable: true,
      value: {
        recordSessionContinuation: vi.fn(async () => true),
        recordSessionLocalError: vi.fn(async () => true),
        startDashboard,
      },
    });
    return startDashboard;
  }

  it("latches unavailable on SSH and fails fast on later sends, notifying once", async () => {
    const startDashboard = mockStartDashboard();
    const onUnavailable = vi.fn();
    const api: HarnessApi = {};
    render(
      <Harness
        api={api}
        initialConnectionMode="ssh"
        fallbackOnUnavailable
        onDashboardUnavailable={onUnavailable}
      />,
    );

    let first: boolean | undefined;
    await act(async () => {
      first = await api.send?.("hello");
    });
    // Dashboard unavailable → caller falls back to legacy (returns false).
    expect(first).toBe(false);
    expect(startDashboard).toHaveBeenCalledTimes(1);
    expect(onUnavailable).toHaveBeenCalledTimes(1);

    let second: boolean | undefined;
    await act(async () => {
      second = await api.send?.("again");
    });
    expect(second).toBe(false);
    // Fast path: no second status/probe round-trip, no duplicate notice.
    expect(startDashboard).toHaveBeenCalledTimes(1);
    expect(onUnavailable).toHaveBeenCalledTimes(1);
  });

  it("re-probes after the connection changes", async () => {
    const startDashboard = mockStartDashboard();
    const api: HarnessApi = {};
    render(
      <Harness api={api} initialConnectionMode="ssh" fallbackOnUnavailable />,
    );

    await act(async () => {
      await api.send?.("hello");
    });
    expect(startDashboard).toHaveBeenCalledTimes(1);

    // Switching connection clears the sticky flag → the dashboard is retried.
    await act(async () => {
      api.setConnectionMode?.("remote");
    });
    await act(async () => {
      await api.send?.("after change");
    });
    expect(startDashboard).toHaveBeenCalledTimes(2);
  });

  it("keeps retrying on local (does not latch)", async () => {
    const startDashboard = mockStartDashboard();
    const api: HarnessApi = {};
    render(
      <Harness api={api} initialConnectionMode="local" fallbackOnUnavailable />,
    );

    await act(async () => {
      await api.send?.("hello");
    });
    await act(async () => {
      await api.send?.("again");
    });
    // Local dashboard may still be spawning, so each send re-checks.
    expect(startDashboard).toHaveBeenCalledTimes(2);
  });
});