diff --git a/QUICKSTART.md b/QUICKSTART.md
index f2bc6d7..caa9680 100644
--- a/QUICKSTART.md
+++ b/QUICKSTART.md
@@ -106,6 +106,54 @@ deploy the GTM specialist to production   ← Tier 3, asks for YES
 
 ---
 
+## Step 5 — Give your agent a persona
+
+By default Hermes runs as the Super Agent — a generalist that owns workflows and routes tasks. But you can switch it to a specialist persona at any time, right from Telegram.
+
+**See what personas are available:**
+```
+/identity
+```
+Hermes will reply with the current persona and a list of every option — like `coo`, `gtm`, `head_of_ops`.
+
+**Switch to a persona:**
+```
+/identity coo
+```
+From that point on, every message in that chat goes to Alex, your COO, with Alex's role, voice, and tool access baked in. The persona sticks until you change it or the bot process restarts (the selection is held in memory, not saved).
+
+**Create your own persona:**
+
+You don't need to touch any code. Just create a file called `<name>.yaml` in `src/agent_os/orchestrator/config/identities/`. For example, to create a video production agent:
+
+```yaml
+# src/agent_os/orchestrator/config/identities/video_agent.yaml
+name: Vex
+title: Video Production Agent
+system_prompt: |
+  You are Vex, the video production agent. You own the full video pipeline:
+  scripting, transcription, editing workflows, thumbnail generation, and
+  publishing to YouTube/social. You know ffmpeg and have shell access.
+  You remember every project we've worked on together.
+tools_allowed:
+  - hermes_self
+  - terminal
+  - exa
+default_tier_ceiling: 2
+```
+
+Commit that file, deploy, then send `/identity video_agent` in Telegram. That's it — Vex is live.
+
+**If you want a persona to be the default** (so it loads on startup without needing `/identity`), set this in your `.env` or Railway environment variables:
+
+```
+AGENT_IDENTITY=video_agent
+```
+
+Each machine or Railway service can have its own `AGENT_IDENTITY`. One instance can be the Super Agent, another the COO, a third your video agent: all separate processes, all sharing the same memory vault, so each one sees the same conversation history for a given user.
+
+---
+
 ## What to do if something breaks
 
 | Problem | Fix |
diff --git a/SETUP.md b/SETUP.md
index 2097302..6d5feeb 100644
--- a/SETUP.md
+++ b/SETUP.md
@@ -6,6 +6,71 @@ The default deploy is **Railway-managed** — every fabric service runs as its o
 
 ---
 
+## Agent personas — giving each agent its own identity
+
+Every agent in the fleet has a **persona**: a name, a role, a voice, and a list of tools it's allowed to use. Personas are defined in plain YAML files — no code changes needed to create or swap them.
+
+### How it works
+
+When an agent starts up, it reads its identity from an environment variable called `AGENT_IDENTITY` (if the variable is unset, it falls back to `supersan`). That name maps to a file in `src/agent_os/orchestrator/config/identities/<name>.yaml`. The file contains a `system_prompt` that gets injected into every LLM call, so the agent always knows who it is, what it owns, and what tools it has access to.
+
+Four personas ship out of the box:
+
+| Name | Who they are |
+|---|---|
+| `supersan` | The Super Agent — the primary orchestrator. Owns everything, routes work to the right specialist. |
+| `coo` | Alex, the COO — sees the whole org, delegates aggressively, holds everyone accountable. |
+| `gtm` | Jordan, the GTM Agent — owns content, leads, and brand. Knows your CRM and email tools. |
+| `head_of_ops` | Morgan, the Head of Operations — runs the client pipeline, watches the funnel, catches broken jobs. |
+
+### Switching personas from Telegram
+
+You don't need to redeploy to switch personas. In any Telegram conversation with Hermes:
+
+- `/identity` — shows which persona is active and lists all available options
+- `/identity coo` — switches to Alex for the rest of that conversation
+- `/identity video_agent` — switches to any custom persona you've created
+
+The persona you set is remembered per chat until you switch again or the bot process restarts (the selection is held in memory, not persisted). Every message after the switch goes through that agent's system prompt, memory, and tool rules.
+
+### Setting a default persona for a deployment
+
+If you want a service to always start as a specific persona, set this in its environment:
+
+```
+AGENT_IDENTITY=coo
+```
+
+On Railway, set it under the service's Variables tab. On a VPS, add it to the `.env` file. With Docker, pass it with `-e AGENT_IDENTITY=coo`.
+
+### Creating a new persona
+
+1. Create a YAML file at `src/agent_os/orchestrator/config/identities/<name>.yaml`
+2. Give it a `system_prompt` that describes who the agent is, what it owns, and how it should behave
+3. Optionally add `tools_allowed`, `tools_denied`, and `default_tier_ceiling`
+4. Commit and deploy — then send `/identity <name>` in Telegram to activate it
+
+Example — a video production agent:
+
+```yaml
+name: Vex
+title: Video Production Agent
+system_prompt: |
+  You are Vex, the video production agent. You own the full video pipeline:
+  scripting, transcription, editing workflows, thumbnail generation, and
+  publishing to YouTube and social. You have shell access and know ffmpeg.
+  You remember every project we've worked on together.
+tools_allowed:
+  - hermes_self
+  - terminal
+  - exa
+default_tier_ceiling: 2
+```
+
+The file name (without `.yaml`) is what you type after `/identity`. That's all there is to it.
+
+---
+
 ## Why both Railway and DigitalOcean?
 
 - **Railway** runs the **fixed always-on services** (NATS, Temporal, Coordinator, Archon wrapper, Admiral). One Dockerfile per service, auto-restart, public TLS URLs, env vars in a dashboard. You set it up once and forget.
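Review note on the persona lookup described in SETUP.md above: the docs place identity files under `src/agent_os/orchestrator/config/identities/`. Below is a minimal sketch of that name-to-file mapping, assuming the loading module sits two directories below the `agent_os` package root (as `runtimes/hermes_self/invoke.py` and `channels/telegram/bot.py` do). `identity_file` is a hypothetical helper for illustration, not a function in this PR.

```python
from pathlib import Path


def identity_file(module_file: str, name: str) -> Path:
    """Map an identity name to its YAML path, relative to the package root.

    Assumption: `module_file` lives two directories below the package root,
    e.g. <root>/runtimes/hermes_self/invoke.py.
    """
    # parents[0] is the module's own directory, parents[1] its parent,
    # parents[2] the package root in the layout assumed above.
    package_root = Path(module_file).parents[2]
    return package_root / "orchestrator" / "config" / "identities" / f"{name}.yaml"


resolved = identity_file("src/agent_os/runtimes/hermes_self/invoke.py", "coo")
```

For either module, `Path(__file__).parents[2]` is `src/agent_os`, while `parents[3]` is `src`, one level above the directory the docs name.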
diff --git a/pyproject.toml b/pyproject.toml
index 591324f..00b6adb 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -49,6 +49,7 @@ select = ["E", "F", "W", "I", "B", "UP"]
 [tool.pytest.ini_options]
 testpaths = ["tests"]
 asyncio_mode = "auto"
+pythonpath = ["src"]
 
 [build-system]
 requires = ["hatchling"]
diff --git a/src/agent_os/channels/telegram/bot.py b/src/agent_os/channels/telegram/bot.py
index 6fba5f5..569491a 100644
--- a/src/agent_os/channels/telegram/bot.py
+++ b/src/agent_os/channels/telegram/bot.py
@@ -38,6 +38,10 @@
 _PENDING_APPROVALS: dict[int, dict[str, Any]] = {}
 _APPROVAL_TTL_SECONDS = 300
 
+# chat_id → active identity name (set via /identity <name>).
+# Falls back to AGENT_IDENTITY env var when not set.
+_ACTIVE_IDENTITY: dict[int, str] = {}
+
 # Telegram message hard cap is 4096 chars; leave headroom for our wrapper text.
 _MAX_BODY_CHARS = 3500
 
@@ -111,9 +115,9 @@ async def _handle_message(client: httpx.AsyncClient, msg: dict[str, Any]) -> Non
 
     # Lazy imports keep cold-start cheap and avoid hard deps when running
     # other parts of the system.
-    from agent_os.orchestrator import plan_card, tier_classifier, intent_classifier
-    from agent_os.orchestrator.adapters.job_router import Job
+    from agent_os.orchestrator import intent_classifier, plan_card
     from agent_os.orchestrator.adapters import plan_overrides
+    from agent_os.orchestrator.adapters.job_router import Job
     from agent_os.orchestrator.tool_planner import plan as plan_fn
 
     # Override commands (/cancel, /use, /why, /plan on|off, /tier N, YES) — these
@@ -168,6 +172,31 @@ async def _handle_message(client: httpx.AsyncClient, msg: dict[str, Any]) -> Non
                         f"Forced to tier {pending['plan'].tier}. Reply 'yes' to run.")
             return
 
+        if override.kind == "identity":
+            name = override.identity
+            if not name:
+                # List available identities
+                from pathlib import Path  # noqa: PLC0415
+                identity_dir = (
+                    Path(__file__).parents[2]
+                    / "orchestrator/config/identities"
+                )
+                available = sorted(
+                    p.stem for p in identity_dir.glob("*.yaml")
+                )
+                current = _ACTIVE_IDENTITY.get(chat_id) or os.getenv("AGENT_IDENTITY", "supersan")
+                await _send(
+                    client, chat_id,
+                    f"Current identity: {current}\n"
+                    f"Available: {', '.join(available)}\n\n"
+                    "Switch with: /identity <name>",
+                )
+                return
+            _ACTIVE_IDENTITY[chat_id] = name
+            await _send(client, chat_id,
+                        f"Identity set to '{name}'. All future messages will use this persona.")
+            return
+
         if override.kind == "confirm":
             if not pending:
                 await _send(client, chat_id, "No pending tier-3 plan to confirm.")
@@ -210,7 +239,10 @@ async def _handle_message(client: httpx.AsyncClient, msg: dict[str, Any]) -> Non
     # outbound intent it can prove from the wording. No fuzziness, no LLM
     # call, no auto-spawn from ambiguous prompts.
     intent = intent_classifier.classify(text)
-    job = Job(prompt=text, tags=set(intent.tags))
+    meta: dict[str, str] = {"user_id": str(chat_id)}
+    if chat_id in _ACTIVE_IDENTITY:
+        meta["identity"] = _ACTIVE_IDENTITY[chat_id]
+    job = Job(prompt=text, tags=set(intent.tags), metadata=meta)
 
     tool_plan = plan_fn(job, identity="primary_hermes")
     # tool_plan.tier already came from tier_classifier.classify with the
@@ -283,6 +315,8 @@ def _handle_command(text: str) -> str:
             "  /why        explain how this plan was picked\n"
             "  /tier <1|2|3>   force a tier override\n\n"
             "Other commands:\n"
+            "  /identity           show current persona + available options\n"
+            "  /identity <name>    switch persona (e.g. /identity coo)\n"
             "  /status — quick fleet status\n"
             "  /help — this message"
         )
@@ -329,7 +363,7 @@ async def run_bot() -> None:
                         # Handle each message in its own task so a slow LLM
                         # call doesn't block the poll loop.
                         asyncio.create_task(_handle_message(client, msg))
-            except (httpx.HTTPError, asyncio.TimeoutError) as exc:
+            except (TimeoutError, httpx.HTTPError) as exc:
                 logger.warning("Telegram poll error: %s — retrying in 5s", exc)
                 await asyncio.sleep(5)
             except Exception:
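Review note on the `/identity` handler in `bot.py` above: the handler stores whatever name it is given. Below is a minimal sketch of the same per-chat fallback (chat override, then `AGENT_IDENTITY`, then `supersan`) with an existence check added before the switch. `available_identities`, `set_identity`, and `current_identity` are hypothetical helpers, and the validation step is a suggestion rather than behavior in this diff.

```python
from __future__ import annotations

from collections.abc import Mapping
from pathlib import Path

DEFAULT_IDENTITY = "supersan"


def available_identities(identity_dir: Path) -> list[str]:
    """List identity names: one per *.yaml file, without the extension."""
    return sorted(p.stem for p in identity_dir.glob("*.yaml"))


def set_identity(
    active: dict[int, str], chat_id: int, name: str, identity_dir: Path
) -> bool:
    """Record a per-chat persona only if a matching YAML file exists."""
    if name not in available_identities(identity_dir):
        return False
    active[chat_id] = name
    return True


def current_identity(
    active: Mapping[int, str], chat_id: int, env: Mapping[str, str]
) -> str:
    """Per-chat override first, then the env var, then the default."""
    return active.get(chat_id) or env.get("AGENT_IDENTITY", DEFAULT_IDENTITY)
```

Rejecting unknown names at switch time would let the bot reply with an error instead of silently running later jobs with an empty system prompt.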
diff --git a/src/agent_os/orchestrator/adapters/plan_overrides.py b/src/agent_os/orchestrator/adapters/plan_overrides.py
index 6db4013..1608c0b 100644
--- a/src/agent_os/orchestrator/adapters/plan_overrides.py
+++ b/src/agent_os/orchestrator/adapters/plan_overrides.py
@@ -24,7 +24,7 @@
 
 OverrideKind = Literal[
     "cancel", "use", "why", "plan_on", "plan_off",
-    "tier", "confirm", "unknown",
+    "tier", "confirm", "identity", "unknown",
 ]
 
 
@@ -35,11 +35,13 @@ class Override:
     tool: str | None = None
     model: str | None = None
     tier: int | None = None
+    identity: str | None = None
     error: str | None = None
 
 
 _USE_RE = re.compile(r"^/use\s+([A-Za-z0-9_]+)(?:\s+([A-Za-z0-9_.-]+))?\s*$")
 _TIER_RE = re.compile(r"^/tier\s+([123])\s*$")
+_IDENTITY_RE = re.compile(r"^/identity(?:\s+([A-Za-z0-9_]+))?\s*$")
 
 
 def parse(text: str) -> Override | None:
@@ -91,6 +93,16 @@ def parse(text: str) -> Override | None:
             )
         return Override(kind="tier", raw=raw, tier=int(m.group(1)))
 
+    if lower.startswith("/identity"):
+        m = _IDENTITY_RE.match(lower)
+        if not m:
+            return Override(
+                kind="unknown",
+                raw=raw,
+                error="usage: /identity [name]  (e.g. /identity coo)",
+            )
+        return Override(kind="identity", raw=raw, identity=m.group(1))
+
     if raw.startswith("/"):
         return Override(
             kind="unknown",
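Review note on `_IDENTITY_RE` in `plan_overrides.py` above: this is a small standalone sketch of what the pattern accepts and rejects. It assumes the text is stripped and lowercased before matching (the diff matches against `lower`; the stripping is an assumption). `parse_identity` is a hypothetical wrapper that returns `(matched, name)` instead of an `Override`.

```python
from __future__ import annotations

import re

# Same pattern as in the diff: bare "/identity" or "/identity <name>".
_IDENTITY_RE = re.compile(r"^/identity(?:\s+([A-Za-z0-9_]+))?\s*$")


def parse_identity(text: str) -> tuple[bool, str | None]:
    """Return (matched, name); name is None for the bare listing form."""
    m = _IDENTITY_RE.match(text.strip().lower())
    if not m:
        return (False, None)
    return (True, m.group(1))


examples = {
    "/identity": parse_identity("/identity"),
    "/identity coo": parse_identity("/identity coo"),
    "/identity COO ": parse_identity("/identity COO "),
    "/identity ../etc": parse_identity("/identity ../etc"),
    "/identityfoo": parse_identity("/identityfoo"),
}
```

Because the name is later interpolated into a file path, restricting it to `[A-Za-z0-9_]` also keeps path separators and dots out of the lookup. Matching on the lowercased text means identity file names need to be lowercase to be reachable.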
diff --git a/src/agent_os/orchestrator/adapters/vault_memory.py b/src/agent_os/orchestrator/adapters/vault_memory.py
index 4c9713b..60be51a 100644
--- a/src/agent_os/orchestrator/adapters/vault_memory.py
+++ b/src/agent_os/orchestrator/adapters/vault_memory.py
@@ -27,3 +27,33 @@ def append_message(canonical_user_id: str, role: str, content: str) -> None:
 def load_history(canonical_user_id: str) -> str:
     p = conversation_path(canonical_user_id)
     return p.read_text() if p.exists() else ""
+
+
+def parse_history(canonical_user_id: str, limit: int = 10) -> list[dict]:
+    """Return recent conversation as OpenAI-style messages list.
+
+    Parses the markdown log written by append_message() back into
+    [{role, content}, ...] so runtimes can pass it directly to LLM APIs.
+    limit is the number of turns (each turn = one user + one assistant message).
+    """
+    raw = load_history(canonical_user_id)
+    if not raw:
+        return []
+    messages: list[dict] = []
+    current_role: str | None = None
+    current_lines: list[str] = []
+    for line in raw.split("\n"):
+        if line.startswith("## "):
+            if current_role and current_lines:
+                content = "\n".join(current_lines).strip()
+                if content:
+                    messages.append({"role": current_role, "content": content})
+            current_role = line[3:].strip()
+            current_lines = []
+        else:
+            current_lines.append(line)
+    if current_role and current_lines:
+        content = "\n".join(current_lines).strip()
+        if content:
+            messages.append({"role": current_role, "content": content})
+    return messages[-(limit * 2):] if limit > 0 else []
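Review note on `parse_history` in `vault_memory.py` above: this is a standalone sketch of the same parsing approach on a sample log. It assumes `append_message()` writes each message as a `## <role>` heading followed by the body; that format is not shown in this diff, so treat it as an assumption. `parse_markdown_log` is a hypothetical name.

```python
from __future__ import annotations


def parse_markdown_log(raw: str, limit: int = 10) -> list[dict[str, str]]:
    """Split a '## <role>' delimited log into role/content messages."""
    messages: list[dict[str, str]] = []
    role: str | None = None
    lines: list[str] = []

    def flush() -> None:
        # Emit the buffered section, skipping sections with an empty body.
        content = "\n".join(lines).strip()
        if role and content:
            messages.append({"role": role, "content": content})

    for line in raw.split("\n"):
        if line.startswith("## "):
            flush()
            role = line[3:].strip()
            lines = []
        else:
            lines.append(line)
    flush()
    # A slice of [-0:] would return the whole list, so guard limit <= 0.
    return messages[-(limit * 2):] if limit > 0 else []


sample = "## user\nhi\n\n## assistant\nhello\nthere\n\n## user\nbye\n"
recent = parse_markdown_log(sample, limit=1)
```

One caveat with this approach: a message body that itself contains a line starting with `## ` (for example a Markdown heading in an assistant reply) would be split into a new section with that heading text as its role.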
diff --git a/src/agent_os/runtimes/hermes_self/invoke.py b/src/agent_os/runtimes/hermes_self/invoke.py
index cd05682..90e7290 100644
--- a/src/agent_os/runtimes/hermes_self/invoke.py
+++ b/src/agent_os/runtimes/hermes_self/invoke.py
@@ -19,11 +19,43 @@
 import logging
 import os
 import time
+from pathlib import Path
 
 from agent_os.runtimes._base import RuntimeResult, new_job_id, write_run_artifact
 
 logger = logging.getLogger(__name__)
 
+# ---------------------------------------------------------------------------
+# Identity: prompt cached per identity name (job metadata, else AGENT_IDENTITY).
+#
+# Each deployed agent sets AGENT_IDENTITY=<name> (e.g. coo, gtm, head_of_ops,
+# supersan). The name maps directly to a YAML file in
+# orchestrator/config/identities/<name>.yaml.
+# Defaults to "supersan" if unset.
+# ---------------------------------------------------------------------------
+
+_IDENTITY_ROOT = Path(__file__).parents[2] / "orchestrator/config/identities"
+_PROMPT_CACHE: dict[str, str] = {}
+
+
+def _get_system_prompt(identity: str | None = None) -> str:
+    name = identity or os.getenv("AGENT_IDENTITY", "supersan")
+    if name in _PROMPT_CACHE:
+        return _PROMPT_CACHE[name]
+    try:
+        import yaml  # noqa: PLC0415
+        p = _IDENTITY_ROOT / f"{name}.yaml"
+        data = yaml.safe_load(p.read_text())
+        _PROMPT_CACHE[name] = (data.get("system_prompt") or "").strip()
+    except Exception as exc:
+        logger.warning("Could not load system prompt for identity %r: %s", name, exc)
+        _PROMPT_CACHE[name] = ""
+    return _PROMPT_CACHE[name]
+
+
+# ---------------------------------------------------------------------------
+# Model selection
+# ---------------------------------------------------------------------------
 
 def _default_model() -> str:
     """Final fallback — must match `default` task_class in config/models.yaml."""
@@ -34,6 +66,10 @@ def _default_model() -> str:
     )
 
 
+# ---------------------------------------------------------------------------
+# Entry point
+# ---------------------------------------------------------------------------
+
 def invoke(job) -> RuntimeResult:
     """Run a single LLM call for the job's prompt.
 
@@ -52,54 +88,95 @@ def invoke(job) -> RuntimeResult:
     if not isinstance(meta, dict):
         meta = {}
     model = meta.get("model") or meta.get("model_recommendation") or _default_model()
+    user_id = meta.get("user_id", "default")
+    identity = meta.get("identity")  # optional per-job override; falls back to AGENT_IDENTITY env
 
     if not prompt:
         return _result(job_id, "error", {"error": "empty prompt"}, t0)
 
+    # Load identity and conversation history before the LLM call
+    from agent_os.orchestrator.adapters import vault_memory as _vault  # noqa: PLC0415
+    system_prompt = _get_system_prompt(identity)
+    history = _vault.parse_history(user_id, limit=10)
+
     try:
-        text = _call_llm(model, prompt)
+        text = _call_llm(model, prompt, system_prompt=system_prompt, history=history)
     except Exception as exc:
         logger.warning("hermes_self LLM call failed: %s", exc)
         return _result(job_id, "error", {"error": str(exc), "model": model}, t0)
 
+    # Persist both turns so the next call can load them as context
+    _vault.append_message(user_id, "user", prompt)
+    _vault.append_message(user_id, "assistant", text)
+
     return _result(job_id, "completed", {"text": text, "model": model}, t0)
 
 
-def _call_llm(model: str, prompt: str) -> str:
+# ---------------------------------------------------------------------------
+# LLM dispatch
+# ---------------------------------------------------------------------------
+
+def _call_llm(
+    model: str,
+    prompt: str,
+    *,
+    system_prompt: str = "",
+    history: list | None = None,
+) -> str:
     """Single chat completion. Routes by model id prefix."""
     if model.startswith("claude-"):
-        return _call_anthropic(model, prompt)
-    return _call_openai_compat(model, prompt)
+        return _call_anthropic(model, prompt, system_prompt=system_prompt, history=history)
+    return _call_openai_compat(model, prompt, system_prompt=system_prompt, history=history)
 
 
-def _call_anthropic(model: str, prompt: str) -> str:
+def _call_anthropic(
+    model: str,
+    prompt: str,
+    *,
+    system_prompt: str = "",
+    history: list | None = None,
+) -> str:
     api_key = os.getenv("ANTHROPIC_API_KEY", "")
     if not api_key:
         raise RuntimeError("ANTHROPIC_API_KEY not set — cannot call claude")
 
     from anthropic import Anthropic  # imported lazily so missing dep doesn't block boot
     client = Anthropic(api_key=api_key)
-    msg = client.messages.create(
-        model=model,
-        max_tokens=2048,
-        messages=[{"role": "user", "content": prompt}],
-    )
+
+    messages = list(history or [])
+    messages.append({"role": "user", "content": prompt})
+
+    kwargs: dict = {"model": model, "max_tokens": 2048, "messages": messages}
+    if system_prompt:
+        kwargs["system"] = system_prompt
+
+    msg = client.messages.create(**kwargs)
     return "".join(b.text for b in msg.content if hasattr(b, "text"))
 
 
-def _call_openai_compat(model: str, prompt: str) -> str:
+def _call_openai_compat(
+    model: str,
+    prompt: str,
+    *,
+    system_prompt: str = "",
+    history: list | None = None,
+) -> str:
     base_url, api_key = _resolve_openai_compat(model)
     if not api_key:
         raise RuntimeError(f"No API key configured for model {model!r}")
 
     from openai import OpenAI
-    kwargs = {"api_key": api_key}
-    if base_url:
-        kwargs["base_url"] = base_url
-    client = OpenAI(**kwargs)
+    client = OpenAI(api_key=api_key, base_url=base_url) if base_url else OpenAI(api_key=api_key)
+
+    messages = []
+    if system_prompt:
+        messages.append({"role": "system", "content": system_prompt})
+    messages.extend(history or [])
+    messages.append({"role": "user", "content": prompt})
+
     resp = client.chat.completions.create(
         model=model,
-        messages=[{"role": "user", "content": prompt}],
+        messages=messages,
         max_tokens=2048,
     )
     return resp.choices[0].message.content or ""
@@ -114,10 +191,17 @@ def _resolve_openai_compat(model: str) -> tuple[str | None, str]:
     if model.startswith(("kimi", "moonshot")):
         return "https://api.moonshot.ai/v1", os.getenv("MOONSHOT_API_KEY", "")
     if model.startswith(("gemini", "google/")):
-        return "https://generativelanguage.googleapis.com/v1beta/openai", os.getenv("GOOGLE_API_KEY", "")
+        return (
+            "https://generativelanguage.googleapis.com/v1beta/openai",
+            os.getenv("GOOGLE_API_KEY", ""),
+        )
     return "https://openrouter.ai/api/v1", os.getenv("OPENROUTER_API_KEY", "")
 
 
+# ---------------------------------------------------------------------------
+# Result helper
+# ---------------------------------------------------------------------------
+
 def _result(job_id: str, status: str, output: dict, t0: float) -> RuntimeResult:
     result = RuntimeResult(
         runtime="hermes_self",