-
-| [Deep OS Integration](https://microsoft.github.io/UFO) | Picture‑in‑Picture Desktop *(coming soon)* | [Hybrid GUI + API Actions](https://microsoft.github.io/UFO/automator/overview) |
-|---------------------|-------------------------------------------|---------------------------|
-| Combines Windows UIA, Win32 and WinCOM for first‑class control detection and native commands. | Automation runs in a sandboxed virtual desktop so you can keep using your main screen. | Chooses native APIs when available, falls back to clicks/keystrokes when not—fast *and* robust. |
-
-| [Speculative Multi‑Action](https://microsoft.github.io/UFO/advanced_usage/multi_action) | [Continuous Knowledge Substrate](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/overview/) | [UIA + Visual Control Detection](https://microsoft.github.io/UFO/advanced_usage/control_detection/hybrid_detection) |
-|--------------------------|--------------------------------|--------------------------------|
-| Bundles several predicted steps into one LLM call, validated live—up to **51 % fewer** queries. | Mixes docs, Bing search, user demos and execution traces via RAG for agents that learn over time. | Detects standard *and* custom controls with a hybrid UIA + vision pipeline. |
-
-
-
-*See the [documentation](https://microsoft.github.io/UFO/) for full details.*
-
----
-
-## 📢 News
-- 📅 2025-04-19: Version **v2.0.0** Released! We’re excited to announce the release the **UFO²**! UFO² is a major upgrade to the original UFO, featuring with enhanced capabilities. It introduces the **AgentOS** concept, enabling seamless integration of multiple agents for complex tasks. Please check our [new technical report](https://arxiv.org/pdf/2504.14603) for more details.
-- 📅 ...
-- 📅 2024-02-14: Our [technical report](https://arxiv.org/abs/2402.07939) for UFO is online!
-- 📅 2024-02-10: The first version of UFO is released on GitHub🎈. Happy Chinese New year🐉!
-
----
-
-## 🏗️ Architecture overview
-
-
-
-
-
-UFO² operates as a **Desktop AgentOS**, encompassing a multi-agent framework that includes:
-
-1. **HostAgent** – Parses the natural‑language goal, launches the necessary applications, spins up / coordinates AppAgents, and steers a global finite‑state machine (FSM).
-2. **AppAgents** – One per application; each runs a ReAct loop with multimodal perception, hybrid control detection, retrieval‑augmented knowledge, and the **Puppeteer** executor that chooses between GUI actions and native APIs.
-3. **Knowledge Substrate** – Blends offline documentation, online search, demonstrations, and execution traces into a vector store that is retrieved on‑the‑fly at inference.
-4. **Speculative Executor** – Slashes LLM latency by predicting batches of likely actions and validating them against live UIA state in a single shot.
-5. **Picture‑in‑Picture Desktop** *(coming soon)* – Runs the agent in an isolated virtual desktop so your main workspace and input devices remain untouched.
-
-For a deep dive see our [technical report](https://arxiv.org/pdf/2504.14603) or the [docs site](https://microsoft.github.io/UFO).
-
----
-
-## 🌐 Media Coverage
-
-UFO sightings have garnered attention from various media outlets, including:
-- [微软正式开源UFO²,Windows桌面迈入「AgentOS 时代」](https://www.jiqizhixin.com/articles/2025-05-06-13)
-- [Microsoft's UFO abducts traditional user interfaces for a smarter Windows experience](https://the-decoder.com/microsofts-ufo-abducts-traditional-user-interfaces-for-a-smarter-windows-experience/)
-- [🚀 UFO & GPT-4-V: Sit back and relax, mientras GPT lo hace todo🌌](https://www.linkedin.com/posts/gutierrezfrancois_ai-ufo-microsoft-activity-7176819900399652865-pLoo?utm_source=share&utm_medium=member_desktop)
-- [The AI PC - The Future of Computers? - Microsoft UFO](https://www.youtube.com/watch?v=1k4LcffCq3E)
-- [下一代Windows系统曝光:基于GPT-4V,Agent跨应用调度,代号UFO](https://baijiahao.baidu.com/s?id=1790938358152188625&wfr=spider&for=pc)
-- [下一代智能版 Windows 要来了?微软推出首个 Windows Agent,命名为 UFO!](https://blog.csdn.net/csdnnews/article/details/136161570)
-- [Microsoft発のオープンソース版「UFO」登場! Windowsを自動操縦するAIエージェントを試す](https://internet.watch.impress.co.jp/docs/column/shimizu/1570581.html)
-- ...
-
-These sources provide insights into the evolving landscape of technology and the implications of UFO phenomena on various platforms.
-
----
-
-## 🚀 Three‑minute Quickstart
-
-
-### 🛠️ Step 1: Installation
-UFO requires **Python >= 3.10** running on **Windows OS >= 10**. It can be installed by running the following command:
-```powershell
-# [optional to create conda environment]
-# conda create -n ufo python=3.10
-# conda activate ufo
-
-# clone the repository
-git clone https://github.com/microsoft/UFO.git
-cd UFO
-# install the requirements
-pip install -r requirements.txt
-# If you want to use the Qwen as your LLMs, uncomment the related libs.
-```
-
-### ⚙️ Step 2: Configure the LLMs
-Before running UFO, you need to provide your LLM configurations **individually for HostAgent and AppAgent**. You can create your own config file `ufo/config/config.yaml`, by copying the `ufo/config/config.yaml.template` and editing config for **HOST_AGENT** and **APP_AGENT** as follows:
-
-```powershell
-copy ufo\config\config.yaml.template ufo\config\config.yaml
-notepad ufo\config\config.yaml # paste your key & endpoint
-```
-
-#### OpenAI
-```yaml
-VISUAL_MODE: True, # Whether to use the visual mode
-API_TYPE: "openai" , # The API type, "openai" for the OpenAI API.
-API_BASE: "https://api.openai.com/v1/chat/completions", # The the OpenAI API endpoint.
-API_KEY: "sk-", # The OpenAI API key, begin with sk-
-API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
-API_MODEL: "gpt-4o", # The only OpenAI model
-```
-
-#### Azure OpenAI (AOAI)
-```yaml
-VISUAL_MODE: True, # Whether to use the visual mode
-API_TYPE: "aoai" , # The API type, "aoai" for the Azure OpenAI.
-API_BASE: "YOUR_ENDPOINT", # The AOAI API address. Format: https://{your-resource-name}.openai.azure.com
-API_KEY: "YOUR_KEY", # The aoai API key
-API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
-API_MODEL: "gpt-4o", # The only OpenAI model
-API_DEPLOYMENT_ID: "YOUR_AOAI_DEPLOYMENT", # The deployment id for the AOAI API
-```
-
-> Need Qwen, Gemini, non‑visual GPT‑4, or even **OpenAI CUA Operator** as a AppAgent? See the [model guide](https://microsoft.github.io/UFO/supported_models/overview/).
-
-### 📔 Step 3: Additional Setting for RAG (optional).
-If you want to enhance UFO's ability with external knowledge, you can optionally configure it with an external database for retrieval augmented generation (RAG) in the `ufo/config/config.yaml` file.
-
-We provide the following options for RAG to enhance UFO's capabilities:
-- [Offline Help Document](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/learning_from_help_document/) Enable UFO to retrieve information from offline help documents.
-- [Online Bing Search Engine](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/learning_from_bing_search/): Enhance UFO's capabilities by utilizing the most up-to-date online search results.
-- [Self-Experience](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/experience_learning/): Save task completion trajectories into UFO's memory for future reference.
-- [User-Demonstration](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/learning_from_demonstration/): Boost UFO's capabilities through user demonstration.
-
-Consult their respective documentation for more information on how to configure these settings.
-
-
-### 🎉 Step 4: Start UFO
-
-#### ⌨️ You can execute the following on your Windows command Line (CLI):
-
-```powershell
-# assume you are in the cloned UFO folder
-python -m ufo --task
-```
-
-This will start the UFO process and you can interact with it through the command line interface.
-If everything goes well, you will see the following message:
-
-```powershell
-Welcome to use UFO🛸, A UI-focused Agent for Windows OS Interaction.
- _ _ _____ ___
-| | | || ___| / _ \
-| | | || |_ | | | |
-| |_| || _| | |_| |
- \___/ |_| \___/
-Please enter your request to be completed🛸:
-```
-
-Alternatively, you can also directly invoke UFO with a specific task and request by using the following command:
-
-```powershell
-python -m ufo --task -r ""
-```
-
-
-### Step 5 🎥: Execution Logs
-
-You can find the screenshots taken and request & response logs in the following folder:
-```
-./ufo/logs//
-```
-You may use them to debug, replay, or analyze the agent output.
-
-
-## ❓Get help
-* Please first check our our documentation [here](https://microsoft.github.io/UFO/).
-* ❔GitHub Issues (prefered)
-* For other communications, please contact [ufo-agent@microsoft.com](mailto:ufo-agent@microsoft.com).
----
-
-
-## 📊 Evaluation
-
-UFO² is rigorously benchmarked on two publicly‑available live‑task suites:
-
-| Benchmark | Scope | Documents |
-|-----------|-------|-------|
-| [**Windows Agent Arena (WAA)**](https://github.com/nice-mee/WindowsAgentArena) | 154 real Windows tasks across 15 applications (Office, Edge, File Explorer, VS Code, …) | |
-| [**OSWorld (Windows)**](https://github.com/nice-mee/WindowsAgentArena/tree/2020-qqtcg/osworld) | 49 cross‑application tasks that mix Office 365, browser and system utilities | |
-
-The integration of these benchmarks into UFO² is in separate repositories. Please follow the above documents for more details.
-
----
-
-
-## 📚 Citation
-
-If you build on this work, please cite our the AgentOS framework:
-
-**UFO² – The Desktop AgentOS (2025)**
-
-```bibtex
-@article{zhang2025ufo2,
- title = {{UFO2: The Desktop AgentOS}},
- author = {Zhang, Chaoyun and Huang, He and Ni, Chiming and Mu, Jian and Qin, Si and He, Shilin and Wang, Lu and Yang, Fangkai and Zhao, Pu and Du, Chao and Li, Liqun and Kang, Yu and Jiang, Zhao and Zheng, Suzhen and Wang, Rujia and Qian, Jiaxu and Ma, Minghua and Lou, Jian-Guang and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei},
- journal = {arXiv preprint arXiv:2504.14603},
- year = {2025}
-}
-```
-
-**UFO – A UI‑Focused Agent for Windows OS Interaction (2024)**
-
-```bibtex
-@article{zhang2024ufo,
- title = {{UFO: A UI-Focused Agent for Windows OS Interaction}},
- author = {Zhang, Chaoyun and Li, Liqun and He, Shilin and Zhang, Xu and Qiao, Bo and Qin, Si and Ma, Minghua and Kang, Yu and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei and Zhang, Qi},
- journal = {arXiv preprint arXiv:2402.07939},
- year = {2024}
-}
-```
-
-
-
----
-
-## 📝 Roadmap
-
-The UFO² team is actively working on the following features and improvements:
-
-- [ ] **Picture‑in‑Picture Mode** – Completed and will be available in the next release
-- [ ] **AgentOS‑as‑a‑Service** – Completed and will be available in the next release
-- [ ] **Auto‑Debugging Toolkit** – Completed and will be available in the next release
-- [ ] **Integration with MCP and Agent2Agent Communication** – Planned; under implementation
-
-
----
-
-## 🎨 Related Projects
-- **TaskWeaver** — a code‑first LLM agent for data analytics:
-- **LLM‑Brained GUI Agents: A Survey**: • [GitHub](https://github.com/vyokky/LLM-Brained-GUI-Agents-Survey) • [Interactive site](https://vyokky.github.io/LLM-Brained-GUI-Agents-Survey/)
-
----
-
-
-## ⚠️ Disclaimer
-By choosing to run the provided code, you acknowledge and agree to the following terms and conditions regarding the functionality and data handling practices in [DISCLAIMER.md](./DISCLAIMER.md)
-
-
-## Trademarks
-
-This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
-trademarks or logos is subject to and must follow
-[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
-Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
-Any use of third-party trademarks or logos are subject to those third-party's policies.
-
-
----
-
-## ⚖️ License
-This repository is released under the [MIT License](LICENSE) (SPDX‑Identifier: MIT).
-See [DISCLAIMER.md](DISCLAIMER.md) for privacy & safety notices.
-
----
-
-
+
+
+
+---
+
+## 📢 Latest Updates
+
+### 2025-11 – UFO³ Galaxy Framework Released 🌌
+**Major Research Breakthrough:** Multi-Device Orchestration System
+
+- 🌟 **Declarative DAG Decomposition**: TaskConstellation structure for workflow logic and dependencies
+- 🔄 **Dynamic Graph Evolution**: Living constellation that adapts through controlled rewrites
+- 🎯 **Heterogeneous Orchestration**: Safe, asynchronous execution with capability-based device matching
+- 🔌 **Unified AIP Protocol**: WebSocket-based secure agent coordination with fault tolerance
+- 🛠️ **MCP-Empowered Agent Framework**: Template-driven toolkit for rapid device agent development
+- 📄 **Research Paper**: [UFO³: Weaving the Digital Agent Galaxy](https://arxiv.org/abs/2511.11332)
+
+**Key Features:**
+- First multi-device orchestration framework for GUI agents
+- Result-driven adaptive execution instead of rigid workflows
+- Model Context Protocol (MCP) integration for tool augmentation
+- Formally verified correctness and concurrency safety guarantees
+
+### 2025-04 – UFO² v2.0.0
+- 📅 UFO² Desktop AgentOS released
+- 🏗️ Enhanced architecture with AgentOS concept
+- 📄 [Technical Report](https://arxiv.org/pdf/2504.14603) published
+- ✅ Entered Long-Term Support (LTS) status
+
+### 2024-02 – Original UFO
+- 🎈 First UFO release – UI-Focused agent for Windows
+- 📄 [Original Paper](https://arxiv.org/abs/2402.07939)
+- 🌍 Wide media coverage and adoption
+
+---
+
+## 📚 Citation
+
+If you use UFO³ Galaxy or UFO² in your research, please cite the relevant papers:
+
+### UFO³ Galaxy Framework (2025)
+```bibtex
+@article{zhang2025ufo3,
+ title = {{UFO$^3$: Weaving the Digital Agent Galaxy}},
+ author = {Zhang, Chaoyun and Li, Liqun and Huang, He and Ni, Chiming and Qiao, Bo and Qin, Si and Kang, Yu and Ma, Minghua and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei},
+ journal = {arXiv preprint arXiv:2511.11332},
+ year = {2025},
+}
+```
+
+### UFO² Desktop AgentOS (2025)
+```bibtex
+@article{zhang2025ufo2,
+ title = {{UFO2: The Desktop AgentOS}},
+ author = {Zhang, Chaoyun and Huang, He and Ni, Chiming and Mu, Jian and Qin, Si and He, Shilin and Wang, Lu and Yang, Fangkai and Zhao, Pu and Du, Chao and Li, Liqun and Kang, Yu and Jiang, Zhao and Zheng, Suzhen and Wang, Rujia and Qian, Jiaxu and Ma, Minghua and Lou, Jian-Guang and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei},
+ journal = {arXiv preprint arXiv:2504.14603},
+ year = {2025}
+}
+```
+
+### Original UFO (2024)
+```bibtex
+@article{zhang2024ufo,
+ title = {{UFO: A UI-Focused Agent for Windows OS Interaction}},
+ author = {Zhang, Chaoyun and Li, Liqun and He, Shilin and Zhang, Xu and Qiao, Bo and Qin, Si and Ma, Minghua and Kang, Yu and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei and Zhang, Qi},
+ journal = {arXiv preprint arXiv:2402.07939},
+ year = {2024}
+}
+```
+
+---
+
+## 🌐 Media & Community
+
+**Media Coverage:**
+- [微软正式开源UFO²,Windows桌面迈入「AgentOS 时代」](https://www.jiqizhixin.com/articles/2025-05-06-13)
+- [Microsoft's UFO: Smarter Windows Experience](https://the-decoder.com/microsofts-ufo-abducts-traditional-user-interfaces-for-a-smarter-windows-experience/)
+- [下一代Windows系统曝光](https://baijiahao.baidu.com/s?id=1790938358152188625)
+- **[More coverage →](./ufo/README.md#-tracing-the-stars)**
+
+**Community:**
+- 💬 [GitHub Discussions](https://github.com/microsoft/UFO/discussions)
+- 🐛 [Issue Tracker](https://github.com/microsoft/UFO/issues)
+- 📧 Email: [ufo-agent@microsoft.com](mailto:ufo-agent@microsoft.com)
+- 📺 [YouTube Video](https://www.youtube.com/watch?v=QT_OhygMVXU)
+
+---
+
+## 🎨 Related Projects & Research
+
+**Microsoft Research:**
+- **[TaskWeaver](https://github.com/microsoft/TaskWeaver)** – Code-first LLM agent framework for data analytics and task automation
+
+**GUI Agent Research:**
+- **[LLM-Brained GUI Agents Survey](https://github.com/vyokky/LLM-Brained-GUI-Agents-Survey)** – Comprehensive survey of GUI automation agents
+- **[Interactive Survey Site](https://vyokky.github.io/LLM-Brained-GUI-Agents-Survey/)** – Explore latest GUI agent research and developments
+
+**Multi-Agent Systems:**
+- **UFO³ Galaxy** – Multi-device orchestration built on the Constellation framework, which coordinates heterogeneous agents across platforms
+- Galaxy builds on multi-agent coordination research while addressing the challenges specific to cross-device GUI automation
+
+**Benchmarks:**
+- **[Windows Agent Arena (WAA)](https://github.com/nice-mee/WindowsAgentArena)** – Evaluation benchmark for Windows automation agents
+- **[OSWorld](https://github.com/nice-mee/WindowsAgentArena/tree/2020-qqtcg/osworld)** – Cross-application task evaluation suite
+
+---
+
+## 💡 FAQ
+
+
+🤔 Should I use Galaxy or UFO²?
+
+**Start with UFO²** if:
+- You only need Windows automation
+- You want quick setup and learning
+- Tasks are relatively simple
+
+**Choose Galaxy** if:
+- You need cross-device coordination
+- Tasks are complex and multi-step
+- You want advanced orchestration
+- You're comfortable with active development
+
+**Hybrid approach** if:
+- You want the best of both worlds
+- Some tasks are simple (UFO²), some complex (Galaxy)
+- You're gradually migrating
+
+
+
+
+⚠️ Will UFO² be deprecated?
+
+**No!** UFO² has entered **Long-Term Support (LTS)** status:
+- ✅ Actively maintained
+- ✅ Bug fixes and security updates
+- ✅ Performance improvements
+- ✅ Full community support
+- ✅ No plans for deprecation
+
+UFO² is the stable, proven solution for Windows automation.
+
+
+
+
+🔄 How do I migrate from UFO² to Galaxy?
+
+Migration is **gradual and optional**:
+
+1. **Phase 1: Learn** – Understand Galaxy concepts
+2. **Phase 2: Experiment** – Try Galaxy with non-critical tasks
+3. **Phase 3: Hybrid** – Use both frameworks
+4. **Phase 4: Migrate** – Gradually move complex tasks to Galaxy
+
+**No forced migration!** Continue using UFO² as long as it meets your needs.
+
+See [Migration Guide](./documents/docs/getting_started/migration_ufo2_to_galaxy.md) for details.
+
+
+
+
+🎯 Can Galaxy do everything UFO² does?
+
+**Functionally: Yes.** Galaxy can use UFO² as a Windows device agent.
+
+**Practically: It depends.**
+- For **simple Windows tasks**: UFO² standalone is easier and more streamlined
+- For **complex workflows**: Galaxy orchestrates UFO² with other device agents
+
+**Recommendation:** Use the right tool for the job. UFO² can work standalone or as Galaxy's Windows device agent.
+
+
+
+
+📊 How mature is Galaxy?
+
+**Status: Active Development** 🚧
+
+**Stable:**
+- ✅ Core architecture
+- ✅ DAG orchestration
+- ✅ Basic multi-device support
+- ✅ Event system
+
+**In Development:**
+- 🔨 Advanced device types
+- 🔨 Enhanced monitoring
+- 🔨 Performance optimization
+- 🔨 Extended documentation
+
+**Recommendation:** Great for experimentation and non-critical workflows.
+
+
+
+
+🔧 Can I extend or customize?
+
+**Both frameworks are highly extensible:**
+
+**UFO²:**
+- Custom actions and automators
+- Custom knowledge sources (RAG)
+- Custom control detectors
+- Custom evaluation metrics
+
+**Galaxy:**
+- Custom agents
+- Custom device types
+- Custom orchestration strategies
+- Custom visualization components
+
+See the respective documentation for extension guides.
+
+
+
+
+🤝 How can I contribute?
+
+We welcome contributions to both UFO² and Galaxy!
+
+**Ways to contribute:**
+- 🐛 Report bugs and issues
+- 💡 Suggest features and improvements
+- 📝 Improve documentation
+- 🧪 Add tests and examples
+- 🔧 Submit pull requests
+
+See [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.
+
+
+
+
+
+---
+
+## ⚠️ Disclaimer & License
+
+**Disclaimer:** By using this software, you acknowledge and agree to the terms in [DISCLAIMER.md](./DISCLAIMER.md).
+
+**License:** This project is licensed under the [MIT License](LICENSE).
+
+**Trademarks:** Use of Microsoft trademarks follows [Microsoft's Trademark Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
+
+---
+
+
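Review note: the README's "Declarative DAG Decomposition" bullet describes TaskConstellation as a structure for workflow logic and dependencies. The sketch below shows the general idea of scheduling a dependency graph in waves, using only the standard-library `graphlib`. The task names and the `execution_waves` helper are hypothetical, and this is not the TaskConstellation implementation.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical workflow: each task maps to the set of tasks it depends on.
dependencies: Dict[str, Set[str]] = {
    "collect_logs_linux": set(),
    "collect_logs_windows": set(),
    "merge_logs": {"collect_logs_linux", "collect_logs_windows"},
    "write_report": {"merge_logs"},
}


def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; every task in a wave has all dependencies met."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(dependencies)
```

Tasks in the same wave have no dependencies on each other, so in principle they could be dispatched to different devices concurrently.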
diff --git a/aip/__init__.py b/aip/__init__.py
new file mode 100644
index 000000000..194039499
--- /dev/null
+++ b/aip/__init__.py
@@ -0,0 +1,116 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Agent Interaction Protocol (AIP)
+
+A lightweight, persistent, and extensible messaging layer for multi-agent orchestration.
+
+AIP provides:
+- Long-lived agent sessions spanning multiple task executions
+- Low-latency event propagation for dynamic scheduling
+- Standardized communication for registration, task dispatch, and result reporting
+- Resilient connection handling with automatic reconnection
+- Extensible protocol with middleware support
+
+Architecture:
+    Messages (aip.messages) - Strongly-typed message definitions
+        ↓
+    Protocol (aip.protocol) - Protocol logic (registration, task execution, heartbeat)
+        ↓
+    Transport (aip.transport) - Transport abstraction (WebSocket, future: HTTP/3, gRPC)
+        ↓
+    Endpoints (aip.endpoints) - Endpoint implementations (Device Server, Device Client, Constellation)
+        ↓
+    Resilience (aip.resilience) - Reconnection, heartbeat, timeout management
+
+Usage:
+    # Device Server
+    from aip.endpoints import DeviceServerEndpoint
+    endpoint = DeviceServerEndpoint(ws_manager, session_manager)
+    await endpoint.handle_websocket(websocket)
+
+    # Device Client
+    from aip.endpoints import DeviceClientEndpoint
+    endpoint = DeviceClientEndpoint(ws_url, ufo_client)
+    await endpoint.start()
+
+    # Constellation Client
+    from aip.endpoints import ConstellationEndpoint
+    endpoint = ConstellationEndpoint(task_name, message_processor)
+    await endpoint.connect_to_device(device_info, message_processor)
+"""
+
+from . import endpoints, extensions, messages, protocol, resilience, transport
+
+__version__ = "1.0.0"
+
+__all__ = [
+    "messages",
+    "transport",
+    "protocol",
+    "endpoints",
+    "resilience",
+    "extensions",
+]
+
+# Convenience exports
+from .endpoints import (
+    ConstellationEndpoint,
+    DeviceClientEndpoint,
+    DeviceServerEndpoint,
+)
+from .messages import (
+    ClientMessage,
+    ClientMessageType,
+    ClientType,
+    Command,
+    Result,
+    ResultStatus,
+    ServerMessage,
+    ServerMessageType,
+    TaskStatus,
+)
+from .protocol import (
+    AIPProtocol,
+    CommandProtocol,
+    DeviceInfoProtocol,
+    HeartbeatProtocol,
+    RegistrationProtocol,
+    TaskExecutionProtocol,
+)
+from .resilience import HeartbeatManager, ReconnectionStrategy, TimeoutManager
+from .transport import Transport, WebSocketTransport
+
+__all__.extend(
+    [
+        # Messages
+        "ClientMessage",
+        "ServerMessage",
+        "ClientMessageType",
+        "ServerMessageType",
+        "ClientType",
+        "TaskStatus",
+        "Command",
+        "Result",
+        "ResultStatus",
+        # Transport
+        "Transport",
+        "WebSocketTransport",
+        # Protocol
+        "AIPProtocol",
+        "RegistrationProtocol",
+        "TaskExecutionProtocol",
+        "HeartbeatProtocol",
+        "DeviceInfoProtocol",
+        "CommandProtocol",
+        # Endpoints
+        "DeviceServerEndpoint",
+        "DeviceClientEndpoint",
+        "ConstellationEndpoint",
+        # Resilience
+        "ReconnectionStrategy",
+        "HeartbeatManager",
+        "TimeoutManager",
+    ]
+)
diff --git a/aip/endpoints/__init__.py b/aip/endpoints/__init__.py
new file mode 100644
index 000000000..8fc74005e
--- /dev/null
+++ b/aip/endpoints/__init__.py
@@ -0,0 +1,20 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+AIP Endpoints
+
+Provides endpoint implementations for Device Server, Device Client, and Constellation Client.
+"""
+
+from .base import AIPEndpoint
+from .client_endpoint import DeviceClientEndpoint
+from .constellation_endpoint import ConstellationEndpoint
+from .server_endpoint import DeviceServerEndpoint
+
+__all__ = [
+    "AIPEndpoint",
+    "DeviceServerEndpoint",
+    "DeviceClientEndpoint",
+    "ConstellationEndpoint",
+]
diff --git a/aip/endpoints/base.py b/aip/endpoints/base.py
new file mode 100644
index 000000000..4eb08c2a3
--- /dev/null
+++ b/aip/endpoints/base.py
@@ -0,0 +1,147 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Base AIP Endpoint
+
+Provides the foundation for all AIP endpoint implementations.
+"""
+
+import logging
+from abc import ABC, abstractmethod
+from typing import Any, Dict, Optional
+
+from aip.protocol import AIPProtocol
+from aip.resilience import ReconnectionStrategy, TimeoutManager
+
+
+class AIPEndpoint(ABC):
+    """
+    Abstract base class for AIP endpoints.
+
+    An endpoint combines:
+    - Protocol (message handling)
+    - Session management (state tracking)
+    - Resilience (reconnection, heartbeat, timeout)
+
+    Subclasses implement specific endpoint types:
+    - DeviceServerEndpoint: Server-side device connection management
+    - DeviceClientEndpoint: Client-side device operations
+    - ConstellationEndpoint: Constellation client operations
+    """
+
+    def __init__(
+        self,
+        protocol: AIPProtocol,
+        reconnection_strategy: Optional[ReconnectionStrategy] = None,
+        heartbeat_interval: float = 30.0,
+        default_timeout: float = 120.0,
+    ):
+        """
+        Initialize AIP endpoint.
+
+        :param protocol: AIP protocol instance
+        :param reconnection_strategy: Optional reconnection strategy
+        :param heartbeat_interval: Heartbeat interval (seconds)
+        :param default_timeout: Default timeout for operations (seconds)
+        """
+        self.protocol = protocol
+        self.logger = logging.getLogger(self.__class__.__name__)
+
+        # Resilience components
+        self.reconnection_strategy = reconnection_strategy or ReconnectionStrategy()
+        self.timeout_manager = TimeoutManager(default_timeout=default_timeout)
+
+        # Session tracking
+        self.session_handlers: Dict[str, Any] = {}
+
+    @abstractmethod
+    async def start(self) -> None:
+        """
+        Start the endpoint.
+
+        Should establish connections, register handlers, and begin listening for messages.
+        """
+        pass
+
+    @abstractmethod
+    async def stop(self) -> None:
+        """
+        Stop the endpoint.
+
+        Should gracefully close connections and cleanup resources.
+        """
+        pass
+
+    @abstractmethod
+    async def handle_message(self, msg: Any) -> None:
+        """
+        Handle an incoming message.
+
+        :param msg: Message to handle
+        """
+        pass
+
+    def is_connected(self) -> bool:
+        """
+        Check if endpoint is connected.
+
+        :return: True if connected, False otherwise
+        """
+        return self.protocol.is_connected()
+
+    async def send_with_timeout(
+        self, msg: Any, timeout: Optional[float] = None
+    ) -> None:
+        """
+        Send a message with timeout.
+
+        :param msg: Message to send
+        :param timeout: Optional timeout override
+        """
+        await self.timeout_manager.with_timeout(
+            self.protocol.send_message(msg), timeout, "send_message"
+        )
+
+    async def receive_with_timeout(
+        self, message_type: type, timeout: Optional[float] = None
+    ) -> Any:
+        """
+        Receive a message with timeout.
+
+        :param message_type: Expected message type
+        :param timeout: Optional timeout override
+        :return: Received message
+        """
+        return await self.timeout_manager.with_timeout(
+            self.protocol.receive_message(message_type), timeout, "receive_message"
+        )
+
+    @abstractmethod
+    async def reconnect_device(self, device_id: str) -> bool:
+        """
+        Attempt to reconnect to a device.
+
+        :param device_id: Device to reconnect to
+        :return: True if successful, False otherwise
+        """
+        pass
+
+    @abstractmethod
+    async def cancel_device_tasks(self, device_id: str, reason: str) -> None:
+        """
+        Cancel all tasks for a device.
+
+        :param device_id: Device ID
+        :param reason: Cancellation reason
+        """
+        pass
+
+    @abstractmethod
+    async def on_device_disconnected(self, device_id: str) -> None:
+        """
+        Handle device disconnection notification.
+
+        :param device_id: Disconnected device ID
+        """
+        pass
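Review note: `send_with_timeout` and `receive_with_timeout` in `aip/endpoints/base.py` delegate to `TimeoutManager.with_timeout(coro, timeout, name)`, whose implementation is not part of this hunk. The following is a minimal sketch of how such a wrapper could behave, assuming it is built on `asyncio.wait_for`. `SketchTimeoutManager` and its error message are hypothetical and are not the actual `aip.resilience` code.

```python
import asyncio
from typing import Any, Awaitable, Optional, Tuple


class SketchTimeoutManager:
    """Hypothetical stand-in for aip.resilience.TimeoutManager (assumed behaviour)."""

    def __init__(self, default_timeout: float = 120.0) -> None:
        self.default_timeout = default_timeout

    async def with_timeout(
        self, awaitable: Awaitable[Any], timeout: Optional[float], name: str
    ) -> Any:
        # Fall back to the default when no per-call override is given.
        effective = self.default_timeout if timeout is None else timeout
        try:
            return await asyncio.wait_for(awaitable, timeout=effective)
        except asyncio.TimeoutError as exc:
            # Re-raise with the operation name so logs identify what stalled.
            raise TimeoutError(f"{name} timed out after {effective}s") from exc


async def _demo() -> Tuple[str, str]:
    manager = SketchTimeoutManager(default_timeout=1.0)
    # Completes immediately, so the default timeout is never hit.
    fast = await manager.with_timeout(asyncio.sleep(0, result="ok"), None, "fast_op")
    try:
        # Sleeps far longer than the 0.01 s override, so this times out.
        await manager.with_timeout(asyncio.sleep(1), 0.01, "slow_op")
        slow = "completed"
    except TimeoutError as err:
        slow = str(err)
    return fast, slow


demo_result = asyncio.run(_demo())
```

Passing the operation name through to the exception is one way to make timeouts from `send_message` and `receive_message` distinguishable in logs; whether the real manager does this is not visible in the diff.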
diff --git a/aip/endpoints/client_endpoint.py b/aip/endpoints/client_endpoint.py
new file mode 100644
index 000000000..6ea346eb3
--- /dev/null
+++ b/aip/endpoints/client_endpoint.py
@@ -0,0 +1,151 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Device Client Endpoint
+
+Wraps the existing UFO WebSocket client with AIP protocol abstractions.
+"""
+
+import logging
+from typing import Any
+
+from aip.endpoints.base import AIPEndpoint
+from aip.protocol import AIPProtocol, HeartbeatProtocol, RegistrationProtocol
+from aip.resilience import HeartbeatManager, ReconnectionStrategy
+from aip.transport.websocket import WebSocketTransport
+
+
+class DeviceClientEndpoint(AIPEndpoint):
+    """
+    Device Client endpoint for AIP.
+
+    Wraps the existing UFOWebSocketClient to provide AIP protocol support
+    while maintaining full backward compatibility.
+    """
+
+    def __init__(
+        self,
+        ws_url: str,
+        ufo_client: Any,  # UFOClient
+        max_retries: int = 3,
+        timeout: float = 120.0,
+    ):
+        """
+        Initialize device client endpoint.
+
+        :param ws_url: WebSocket server URL
+        :param ufo_client: UFOClient instance
+        :param max_retries: Maximum reconnection retries
+        :param timeout: Connection timeout
+        """
+        # Import here to avoid circular dependency
+        from ufo.client.websocket import UFOWebSocketClient
+
+        # Create transport and protocol
+        transport = WebSocketTransport(
+            ping_interval=20, ping_timeout=180, max_size=100 * 1024 * 1024
+        )
+        protocol = AIPProtocol(transport)
+
+        # Create specialized protocols
+        registration_protocol = RegistrationProtocol(transport)
+        heartbeat_protocol = HeartbeatProtocol(transport)
+
+        # Create reconnection strategy
+        reconnection_strategy = ReconnectionStrategy(
+            max_retries=max_retries,
+            initial_backoff=2.0,
+            max_backoff=60.0,
+        )
+
+        super().__init__(protocol=protocol, reconnection_strategy=reconnection_strategy)
+
+        self.ws_url = ws_url
+        self.ufo_client = ufo_client
+        self.timeout = timeout
+
+        # Use existing client for compatibility
+        self.client = UFOWebSocketClient(ws_url, ufo_client, max_retries, timeout)
+
+        # AIP-specific components
+        self.registration_protocol = registration_protocol
+        self.heartbeat_protocol = heartbeat_protocol
+        self.heartbeat_manager = HeartbeatManager(heartbeat_protocol)
+
+        self.logger = logging.getLogger(f"{__name__}.DeviceClientEndpoint")
+
+    async def start(self) -> None:
+        """
+        Start the endpoint and connect to server.
+        """
+        self.logger.info(f"Starting device client endpoint: {self.ws_url}")
+
+        # Use existing client's connection logic
+        import asyncio
+
+        self._listen_task = asyncio.create_task(self.client.connect_and_listen())
+
+        # Wait for connection
+        await self.client.connected_event.wait()
+
+        self.logger.info("Device client endpoint connected")
+
+    async def stop(self) -> None:
+        """Stop the endpoint."""
+        self.logger.info("Stopping device client endpoint")
+
+        # Stop heartbeat
+        await self.heartbeat_manager.stop_all()
+
+        # Close connection
+        if self.client._ws:
+            await self.client._ws.close()
+
+        await self.protocol.close()
+        self.logger.info("Device client endpoint stopped")
+
+    async def handle_message(self, msg: Any) -> None:
+        """
+        Handle an incoming message.
+
+        :param msg: Message to handle
+        """
+        # Messages are handled by the existing client
+        await self.client.handle_message(msg)
+
+    async def reconnect_device(self, device_id: str) -> bool:
+        """
+        Attempt to reconnect.
+
+        :param device_id: Device ID (unused for client)
+        :return: True if successful
+        """
+        try:
+            await self.start()
+            return True
+        except Exception as e:
+            self.logger.error(f"Reconnection failed: {e}")
+            return False
+
+    async def cancel_device_tasks(self, device_id: str, reason: str) -> None:
+        """
+        Cancel device tasks.
+
+        :param device_id: Device ID
+        :param reason: Cancellation reason
+        """
+        # Client-side task cancellation handled by UFOClient
+        self.logger.info(f"Cancelling tasks for {device_id}: {reason}")
+
+    async def on_device_disconnected(self, device_id: str) -> None:
+        """
+        Handle disconnection.
+
+        :param device_id: Device ID
+        """
+        self.logger.warning(f"Device disconnected: {device_id}")
+
+    def is_connected(self) -> bool:
+        """Check if client is connected."""
+        return self.client.is_connected()
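Review note: `DeviceClientEndpoint` configures `ReconnectionStrategy(max_retries=3, initial_backoff=2.0, max_backoff=60.0)`, but the delay schedule itself lives in `aip.resilience`, outside this hunk. Below is a sketch of one plausible schedule, assuming plain exponential doubling capped at `max_backoff`. The doubling factor and the `backoff_delays` helper are assumptions for illustration, not the library's confirmed behaviour.

```python
from typing import List


def backoff_delays(
    max_retries: int,
    initial_backoff: float,
    max_backoff: float,
    factor: float = 2.0,  # assumed growth factor; not confirmed by the diff
) -> List[float]:
    """Return the wait in seconds before each retry attempt (hypothetical helper)."""
    delays: List[float] = []
    delay = initial_backoff
    for _ in range(max_retries):
        # Never wait longer than the configured ceiling.
        delays.append(min(delay, max_backoff))
        delay *= factor
    return delays


# Parameters taken from DeviceClientEndpoint's defaults in this diff.
client_delays = backoff_delays(max_retries=3, initial_backoff=2.0, max_backoff=60.0)

# A longer run shows the cap taking effect once the delay exceeds max_backoff.
capped_delays = backoff_delays(max_retries=8, initial_backoff=2.0, max_backoff=60.0)
```

Under these assumptions the client would give up after roughly 14 seconds of cumulative waiting with the default three retries, which may be worth confirming against the intended behaviour for flaky networks.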
diff --git a/aip/endpoints/constellation_endpoint.py b/aip/endpoints/constellation_endpoint.py
new file mode 100644
index 000000000..cb53fc571
--- /dev/null
+++ b/aip/endpoints/constellation_endpoint.py
@@ -0,0 +1,171 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Constellation Client Endpoint
+
+Wraps the existing Galaxy constellation client with AIP protocol abstractions.
+"""
+
+import logging
+from typing import Any, Dict, Optional
+
+from aip.endpoints.base import AIPEndpoint
+from aip.protocol import AIPProtocol, RegistrationProtocol
+from aip.resilience import ReconnectionStrategy
+from aip.transport.websocket import WebSocketTransport
+
+
+class ConstellationEndpoint(AIPEndpoint):
+ """
+ Constellation Client endpoint for AIP.
+
+ Wraps the existing WebSocketConnectionManager to provide AIP protocol support.
+ """
+
+ def __init__(
+ self,
+ task_name: str,
+ message_processor: Any = None, # MessageProcessor
+ ):
+ """
+ Initialize constellation endpoint.
+
+ :param task_name: Task name for this constellation
+ :param message_processor: Optional message processor
+ """
+ # Create transport and protocol
+ transport = WebSocketTransport(
+ ping_interval=30, ping_timeout=30, max_size=100 * 1024 * 1024
+ )
+ protocol = AIPProtocol(transport)
+
+ # Create registration protocol
+ registration_protocol = RegistrationProtocol(transport)
+
+ # Create reconnection strategy
+ reconnection_strategy = ReconnectionStrategy(
+ max_retries=5, initial_backoff=1.0, max_backoff=60.0
+ )
+
+ super().__init__(protocol=protocol, reconnection_strategy=reconnection_strategy)
+
+ self.task_name = task_name
+ self.message_processor = message_processor
+ self.registration_protocol = registration_protocol
+
+ # Import here to avoid circular dependency
+ from galaxy.client.components.connection_manager import (
+ WebSocketConnectionManager,
+ )
+
+ self.connection_manager = WebSocketConnectionManager(task_name)
+
+ self.logger = logging.getLogger(f"{__name__}.ConstellationEndpoint")
+
+ async def start(self) -> None:
+ """Start the endpoint."""
+ self.logger.info(f"Constellation endpoint started for {self.task_name}")
+
+ async def stop(self) -> None:
+ """Stop the endpoint and disconnect all devices."""
+ self.logger.info("Stopping constellation endpoint")
+ await self.connection_manager.disconnect_all()
+ await self.protocol.close()
+
+ async def connect_to_device(
+ self, device_info: Any, message_processor: Any = None
+ ) -> Any:
+ """
+ Connect to a device.
+
+ :param device_info: AgentProfile with device information
+ :param message_processor: Optional message processor
+ :return: WebSocket connection
+ """
+ processor = message_processor or self.message_processor
+ return await self.connection_manager.connect_to_device(device_info, processor)
+
+ async def send_task_to_device(self, device_id: str, task_request: Any) -> Any:
+ """
+ Send task to device.
+
+ :param device_id: Target device ID
+ :param task_request: Task request details
+ :return: Execution result
+ """
+ return await self.connection_manager.send_task_to_device(
+ device_id, task_request
+ )
+
+ async def request_device_info(self, device_id: str) -> Optional[Dict[str, Any]]:
+ """
+ Request device information.
+
+ :param device_id: Device ID
+ :return: Device info dictionary or None
+ """
+ return await self.connection_manager.request_device_info(device_id)
+
+ async def disconnect_device(self, device_id: str) -> None:
+ """
+ Disconnect from a device.
+
+ :param device_id: Device ID
+ """
+ await self.connection_manager.disconnect_device(device_id)
+
+ def is_device_connected(self, device_id: str) -> bool:
+ """
+ Check if device is connected.
+
+ :param device_id: Device ID
+ :return: True if connected
+ """
+ return self.connection_manager.is_connected(device_id)
+
+ async def handle_message(self, msg: Any) -> None:
+ """
+ Handle incoming message.
+
+ :param msg: Message to handle
+ """
+ # Messages handled by message processor
+ if self.message_processor:
+ await self.message_processor.process_message(msg)
+
+ async def reconnect_device(self, device_id: str) -> bool:
+ """
+ Attempt to reconnect to device.
+
+ :param device_id: Device ID
+ :return: True if successful
+ """
+ try:
+            # TODO: look up the device's AgentProfile in a device registry and
+            # call connect_to_device(); no registry is available to this class yet.
+ self.logger.warning(f"Reconnection for {device_id} not fully implemented")
+ return False
+ except Exception as e:
+ self.logger.error(f"Reconnection failed for {device_id}: {e}")
+ return False
+
+ async def cancel_device_tasks(self, device_id: str, reason: str) -> None:
+ """
+ Cancel tasks for device.
+
+ :param device_id: Device ID
+ :param reason: Cancellation reason
+ """
+ # Cancel pending tasks managed by connection manager
+ self.connection_manager._cancel_pending_tasks_for_device(device_id)
+ self.logger.info(f"Cancelled tasks for {device_id}: {reason}")
+
+ async def on_device_disconnected(self, device_id: str) -> None:
+ """
+ Handle device disconnection.
+
+ :param device_id: Device ID
+ """
+ self.logger.warning(f"Device {device_id} disconnected from constellation")
+ await self.cancel_device_tasks(device_id, "device_disconnected")
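Note on the retry parameters used in ConstellationEndpoint.__init__: the endpoint builds a ReconnectionStrategy with max_retries=5, initial_backoff=1.0 and max_backoff=60.0, but the strategy's implementation is not part of this section. The sketch below shows what a capped exponential backoff with those values might produce. The helper name `backoff_schedule` and the doubling `factor` are assumptions for illustration, not the library's API.

```python
from typing import List


def backoff_schedule(
    max_retries: int = 5,
    initial_backoff: float = 1.0,
    max_backoff: float = 60.0,
    factor: float = 2.0,  # assumption: the diff does not show the multiplier
) -> List[float]:
    """Return the delay in seconds to wait before each retry attempt."""
    delays: List[float] = []
    delay = initial_backoff
    for _ in range(max_retries):
        # Never wait longer than the configured ceiling.
        delays.append(min(delay, max_backoff))
        delay *= factor
    return delays


if __name__ == "__main__":
    print(backoff_schedule())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Under these assumptions, five retries never reach the 60-second cap; the cap only matters if max_retries is raised to seven or more.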
diff --git a/aip/endpoints/server_endpoint.py b/aip/endpoints/server_endpoint.py
new file mode 100644
index 000000000..7144f7bf8
--- /dev/null
+++ b/aip/endpoints/server_endpoint.py
@@ -0,0 +1,127 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Device Server Endpoint
+
+Wraps the existing UFO server WebSocket handler with AIP protocol abstractions.
+This maintains backward compatibility while providing the AIP interface.
+"""
+
+import logging
+from typing import Any, Optional
+
+from fastapi import WebSocket
+
+from aip.endpoints.base import AIPEndpoint
+from aip.protocol import AIPProtocol
+from aip.resilience import ReconnectionStrategy
+
+
+class DeviceServerEndpoint(AIPEndpoint):
+ """
+ Device Server endpoint for AIP.
+
+ Wraps the existing UFOWebSocketHandler to provide AIP protocol support
+ while maintaining full backward compatibility with existing implementations.
+ """
+
+ def __init__(
+ self,
+ ws_manager: Any, # WSManager
+ session_manager: Any, # SessionManager
+ local: bool = False,
+ protocol: Optional[AIPProtocol] = None,
+ reconnection_strategy: Optional[ReconnectionStrategy] = None,
+ ):
+ """
+ Initialize device server endpoint.
+
+ :param ws_manager: WebSocket manager instance
+ :param session_manager: Session manager instance
+ :param local: Whether running in local mode
+ :param protocol: Optional AIP protocol instance
+ :param reconnection_strategy: Optional reconnection strategy
+ """
+ # Import here to avoid circular dependency
+ from ufo.server.ws.handler import UFOWebSocketHandler
+
+ if protocol is None:
+ # Create a minimal protocol for compatibility
+ from aip.transport.websocket import WebSocketTransport
+
+ protocol = AIPProtocol(WebSocketTransport())
+
+ super().__init__(protocol=protocol, reconnection_strategy=reconnection_strategy)
+
+ self.ws_manager = ws_manager
+ self.session_manager = session_manager
+ self.local = local
+
+ # Use existing handler for actual implementation
+ self.handler = UFOWebSocketHandler(ws_manager, session_manager, local)
+
+ self.logger = logging.getLogger(f"{__name__}.DeviceServerEndpoint")
+
+ async def start(self) -> None:
+ """
+ Start the endpoint.
+
+ Note: For server endpoints, connections are handled per WebSocket.
+ """
+ self.logger.info("Device server endpoint ready")
+
+ async def stop(self) -> None:
+ """Stop the endpoint."""
+ self.logger.info("Device server endpoint stopped")
+
+ async def handle_websocket(self, websocket: WebSocket) -> None:
+ """
+ Handle a WebSocket connection.
+
+ This delegates to the existing UFOWebSocketHandler for full compatibility.
+
+ :param websocket: WebSocket connection
+ """
+ await self.handler.handler(websocket)
+
+ async def handle_message(self, msg: Any) -> None:
+ """
+ Handle an incoming message.
+
+ :param msg: Message to handle
+ """
+ # Messages are handled within the handler per connection
+ pass
+
+ async def reconnect_device(self, device_id: str) -> bool:
+ """
+        The server never initiates a reconnection; it waits for the client to reconnect.
+
+ :param device_id: Device ID
+ :return: False (server waits for client)
+ """
+ self.logger.debug(f"Server endpoint does not actively reconnect to {device_id}")
+ return False
+
+ async def cancel_device_tasks(self, device_id: str, reason: str) -> None:
+ """
+ Cancel all tasks for a device.
+
+ :param device_id: Device ID
+ :param reason: Cancellation reason
+ """
+ session_ids = self.ws_manager.get_device_sessions(device_id)
+ for session_id in session_ids:
+ try:
+ await self.session_manager.cancel_task(session_id, reason=reason)
+ except Exception as e:
+ self.logger.error(f"Error cancelling session {session_id}: {e}")
+
+ async def on_device_disconnected(self, device_id: str) -> None:
+ """
+ Handle device disconnection notification.
+
+ :param device_id: Disconnected device ID
+ """
+ self.logger.info(f"Device {device_id} disconnected")
diff --git a/aip/extensions/__init__.py b/aip/extensions/__init__.py
new file mode 100644
index 000000000..170cf04bb
--- /dev/null
+++ b/aip/extensions/__init__.py
@@ -0,0 +1,13 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+AIP Extension Support
+
+Provides extension points for customizing AIP behavior.
+"""
+
+from .base import AIPExtension
+from .middleware import LoggingExtension, MetricsExtension
+
+__all__ = ["AIPExtension", "LoggingExtension", "MetricsExtension"]
diff --git a/aip/extensions/base.py b/aip/extensions/base.py
new file mode 100644
index 000000000..d9e67ca44
--- /dev/null
+++ b/aip/extensions/base.py
@@ -0,0 +1,66 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Base Extension Interface
+
+Defines the interface for AIP extensions.
+"""
+
+from abc import ABC, abstractmethod
+from typing import Any
+
+
+class AIPExtension(ABC):
+ """
+ Abstract base class for AIP extensions.
+
+ Extensions can customize protocol behavior, add logging,
+ collect metrics, or implement custom business logic.
+ """
+
+ @abstractmethod
+ async def on_message_sent(self, msg: Any) -> None:
+ """
+ Called when a message is sent.
+
+ :param msg: Message that was sent
+ """
+ pass
+
+ @abstractmethod
+ async def on_message_received(self, msg: Any) -> None:
+ """
+ Called when a message is received.
+
+ :param msg: Message that was received
+ """
+ pass
+
+ @abstractmethod
+ async def on_connection_established(self, endpoint_id: str) -> None:
+ """
+ Called when a connection is established.
+
+ :param endpoint_id: Endpoint identifier
+ """
+ pass
+
+ @abstractmethod
+ async def on_connection_closed(self, endpoint_id: str) -> None:
+ """
+ Called when a connection is closed.
+
+ :param endpoint_id: Endpoint identifier
+ """
+ pass
+
+ @abstractmethod
+ async def on_error(self, error: Exception, context: str) -> None:
+ """
+ Called when an error occurs.
+
+ :param error: Exception that occurred
+ :param context: Context where error occurred
+ """
+ pass
diff --git a/aip/extensions/middleware.py b/aip/extensions/middleware.py
new file mode 100644
index 000000000..e42c0b918
--- /dev/null
+++ b/aip/extensions/middleware.py
@@ -0,0 +1,135 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+AIP Extension Middleware
+
+Provides ready-to-use extensions for common use cases.
+"""
+
+import logging
+import time
+from typing import Any, Dict
+
+from aip.extensions.base import AIPExtension
+
+
+class LoggingExtension(AIPExtension):
+ """
+ Extension that logs all protocol events.
+ """
+
+ def __init__(self, log_level: int = logging.INFO):
+ """
+ Initialize logging extension.
+
+ :param log_level: Log level for events
+ """
+ self.logger = logging.getLogger(f"{__name__}.LoggingExtension")
+ self.log_level = log_level
+
+ async def on_message_sent(self, msg: Any) -> None:
+ """Log sent message."""
+ msg_type = getattr(msg, "type", "unknown")
+ self.logger.log(self.log_level, f"[SENT] {msg_type}")
+
+ async def on_message_received(self, msg: Any) -> None:
+ """Log received message."""
+ msg_type = getattr(msg, "type", "unknown")
+ self.logger.log(self.log_level, f"[RECV] {msg_type}")
+
+ async def on_connection_established(self, endpoint_id: str) -> None:
+ """Log connection establishment."""
+ self.logger.log(self.log_level, f"[CONN] Connection established: {endpoint_id}")
+
+ async def on_connection_closed(self, endpoint_id: str) -> None:
+ """Log connection closure."""
+ self.logger.log(self.log_level, f"[DISC] Connection closed: {endpoint_id}")
+
+ async def on_error(self, error: Exception, context: str) -> None:
+ """Log error."""
+ self.logger.error(f"[ERROR] {context}: {error}", exc_info=True)
+
+
+class MetricsExtension(AIPExtension):
+ """
+ Extension that collects protocol metrics.
+ """
+
+ def __init__(self):
+ """Initialize metrics extension."""
+ self.logger = logging.getLogger(f"{__name__}.MetricsExtension")
+ self.metrics: Dict[str, Any] = {
+ "messages_sent": 0,
+ "messages_received": 0,
+ "connections_established": 0,
+ "connections_closed": 0,
+ "errors": 0,
+ "message_types": {},
+ "latencies": [],
+ }
+ self._message_timestamps: Dict[str, float] = {}
+
+ async def on_message_sent(self, msg: Any) -> None:
+ """Track sent message."""
+ self.metrics["messages_sent"] += 1
+ msg_type = str(getattr(msg, "type", "unknown"))
+ self.metrics["message_types"][msg_type] = (
+ self.metrics["message_types"].get(msg_type, 0) + 1
+ )
+
+        # Track send time for latency calculation (monotonic: immune to clock changes)
+        msg_id = getattr(msg, "request_id", None) or getattr(msg, "response_id", None)
+        if msg_id:
+            self._message_timestamps[msg_id] = time.monotonic()
+
+    async def on_message_received(self, msg: Any) -> None:
+        """Track received message."""
+        self.metrics["messages_received"] += 1
+
+        # Calculate latency if we have a matching sent message
+        msg_id = getattr(msg, "request_id", None) or getattr(msg, "response_id", None)
+        if msg_id and msg_id in self._message_timestamps:
+            latency = time.monotonic() - self._message_timestamps[msg_id]
+            self.metrics["latencies"].append(latency)
+            del self._message_timestamps[msg_id]
+
+ async def on_connection_established(self, endpoint_id: str) -> None:
+ """Track connection establishment."""
+ self.metrics["connections_established"] += 1
+
+ async def on_connection_closed(self, endpoint_id: str) -> None:
+ """Track connection closure."""
+ self.metrics["connections_closed"] += 1
+
+ async def on_error(self, error: Exception, context: str) -> None:
+ """Track error."""
+ self.metrics["errors"] += 1
+
+ def get_metrics(self) -> Dict[str, Any]:
+ """
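+        Get a snapshot of the collected metrics.
+
+        :return: Copy of the metrics, with avg/max/min latency when available
+        """
+        metrics = {k: v.copy() if isinstance(v, (dict, list)) else v for k, v in self.metrics.items()}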
+ if metrics["latencies"]:
+ metrics["avg_latency"] = sum(metrics["latencies"]) / len(
+ metrics["latencies"]
+ )
+ metrics["max_latency"] = max(metrics["latencies"])
+ metrics["min_latency"] = min(metrics["latencies"])
+ return metrics
+
+ def reset_metrics(self) -> None:
+ """Reset all metrics."""
+ self.metrics = {
+ "messages_sent": 0,
+ "messages_received": 0,
+ "connections_established": 0,
+ "connections_closed": 0,
+ "errors": 0,
+ "message_types": {},
+ "latencies": [],
+ }
+ self._message_timestamps.clear()
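Note on latency tracking in MetricsExtension: timestamps are keyed by message ID and removed only when a message with the same ID is received, so entries for messages that never get a reply stay in the map. The following is a minimal sketch of one way to bound that map, assuming a hypothetical standalone `LatencyTracker` class (not part of the diff) with an injectable clock so it can be tested deterministically.

```python
import time
from typing import Callable, Dict, List, Optional


class LatencyTracker:
    """Pairs a sent message ID with its reply and records elapsed seconds."""

    def __init__(
        self,
        clock: Callable[[], float] = time.monotonic,
        max_pending: int = 1000,  # assumption: an arbitrary illustrative bound
    ) -> None:
        self._clock = clock
        self._max_pending = max_pending
        self._pending: Dict[str, float] = {}
        self.latencies: List[float] = []

    def sent(self, msg_id: str) -> None:
        """Record the send time, evicting the oldest entry when full."""
        if len(self._pending) >= self._max_pending:
            # Dicts preserve insertion order, so the first key is the oldest.
            oldest = next(iter(self._pending))
            del self._pending[oldest]
        self._pending[msg_id] = self._clock()

    def received(self, msg_id: str) -> Optional[float]:
        """Return the latency for msg_id, or None if it was never tracked."""
        started = self._pending.pop(msg_id, None)
        if started is None:
            return None
        latency = self._clock() - started
        self.latencies.append(latency)
        return latency
```

Evicting the oldest entry trades a lost sample for bounded memory; a time-based expiry would be an equally reasonable choice.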
diff --git a/aip/messages.py b/aip/messages.py
new file mode 100644
index 000000000..97a76f19a
--- /dev/null
+++ b/aip/messages.py
@@ -0,0 +1,555 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Agent Interaction Protocol (AIP) - Message Definitions
+
+This module defines the core message types and structures used in the Agent Interaction Protocol.
+Messages are strongly typed using Pydantic for validation and serialization.
+
+Message Flow:
+ Client → Server: ClientMessage (REGISTER, TASK, HEARTBEAT, COMMAND_RESULTS, etc.)
+ Server → Client: ServerMessage (TASK, COMMAND, TASK_END, HEARTBEAT, etc.)
+
+Key Concepts:
+ - ClientType: Distinguishes between device agents and constellation clients
+ - MessageType: Defines the purpose of each message
+ - TaskStatus: Tracks the state of task execution
+ - Result: Encapsulates command execution outcomes
+"""
+
+from enum import Enum
+from typing import Any, Dict, List, Literal, Optional
+
+from pydantic import BaseModel, ConfigDict, Field
+
+from ufo.client.mcp.mcp_server_manager import BaseMCPServer
+
+
+# ============================================================================
+# Core Data Structures
+# ============================================================================
+
+
+class Rect(BaseModel):
+ """
+ Rectangle coordinates for UI elements.
+ Represents a rectangle with x, y coordinates and width and height.
+ """
+
+ x: int
+ y: int
+ width: int
+ height: int
+
+
+class ControlInfo(BaseModel):
+ """
+ Information about a UI control.
+ """
+
+ annotation_id: Optional[str] = None
+ name: Optional[str] = None
+ title: Optional[str] = None
+ handle: Optional[int] = None
+ class_name: Optional[str] = None
+ rectangle: Optional[Rect] = None
+ control_type: Optional[str] = None
+ automation_id: Optional[str] = None
+ is_enabled: Optional[bool] = None
+ is_visible: Optional[bool] = None
+ source: Optional[str] = None
+ text_content: Optional[str] = None
+
+
+class WindowInfo(ControlInfo):
+ """
+ Information about a window in the UI.
+ """
+
+ process_id: Optional[int] = None
+ process_name: Optional[str] = None
+ is_visible: Optional[bool] = None
+ is_minimized: Optional[bool] = None
+ is_maximized: Optional[bool] = None
+ is_active: Optional[bool] = None
+
+
+class AppWindowControlInfo(BaseModel):
+ """
+ Information about a window and its controls.
+ """
+
+ window_info: WindowInfo
+ controls: Optional[List[ControlInfo]] = None
+
+
+# ============================================================================
+# Tool and Command Structures
+# ============================================================================
+
+
+class MCPToolInfo(BaseModel):
+ """
+ Information about a tool registered with the computer.
+ """
+
+ tool_key: str
+ tool_name: str
+ title: Optional[str] = None
+ namespace: str
+ tool_type: str
+ description: Optional[str] = None
+ input_schema: Optional[Dict[str, Any]] = None
+ output_schema: Optional[Dict[str, Any]] = None
+ meta: Optional[Dict[str, Any]] = None
+ annotations: Optional[Dict[str, Any]] = None
+
+
+class MCPToolCall(BaseModel):
+ """
+ Information about a tool registered with the computer and its associated MCP server.
+ """
+
+ tool_key: str # Unique key for the tool, e.g., "namespace.tool_name"
+ tool_name: str # Name of the tool
+ title: Optional[str] = None # Title of the tool, if any
+ namespace: str # Namespace of the tool, same as the MCP server namespace
+ tool_type: str # Type of the tool (e.g., "action", "data_collection")
+ description: str # Description of the tool
+ input_schema: Optional[Dict[str, Any]] = None # Input schema for the tool, if any
+ output_schema: Optional[Dict[str, Any]] = None # Output schema for the tool, if any
+ parameters: Optional[Dict[str, Any]] = None # Parameters for the tool, if any
+ mcp_server: BaseMCPServer # The BaseMCPServer instance where the tool is registered
+ meta: Optional[Dict[str, Any]] = None # Metadata about the tool, if any
+ annotations: Optional[Dict[str, Any]] = None # Annotations for the tool, if any
+
+ model_config = ConfigDict(arbitrary_types_allowed=True)
+
+ @property
+ def tool_info(self) -> MCPToolInfo:
+ """
+        Get the serializable tool metadata, without the MCP server instance.
+        :return: MCPToolInfo populated from this tool call.
+ """
+ return MCPToolInfo(
+ tool_key=self.tool_key,
+ tool_name=self.tool_name,
+ title=self.title,
+ namespace=self.namespace,
+ tool_type=self.tool_type,
+ description=self.description,
+ input_schema=self.input_schema,
+ output_schema=self.output_schema,
+ meta=self.meta,
+ annotations=self.annotations,
+ )
+
+
+class Command(BaseModel):
+ """
+ Represents a command to be executed by an agent.
+ Commands are atomic units of work dispatched by the orchestrator.
+ """
+
+ tool_name: str = Field(..., description="Name of the tool to execute")
+ parameters: Optional[Dict[str, Any]] = Field(
+ default=None, description="Parameters for the tool"
+ )
+ tool_type: Literal["data_collection", "action"] = Field(
+ ..., description="Type of tool: data_collection or action"
+ )
+ call_id: Optional[str] = Field(
+ default=None, description="Unique identifier for this command call"
+ )
+
+
+# ============================================================================
+# Result and Status Enums
+# ============================================================================
+
+
+class ResultStatus(str, Enum):
+ """
+ Represents the status of a command execution result.
+ """
+
+ SUCCESS = "success"
+ FAILURE = "failure"
+ SKIPPED = "skipped"
+ NONE = "none"
+
+
+class Result(BaseModel):
+ """
+ Represents the result of a command execution.
+ Contains status, error information, and the actual result payload.
+ """
+
+ status: ResultStatus = Field(..., description="Execution status")
+ error: Optional[str] = Field(default=None, description="Error message if failed")
+ result: Any = Field(default=None, description="Result payload")
+ namespace: Optional[str] = Field(
+ default=None, description="Namespace of the executed tool"
+ )
+ call_id: Optional[str] = Field(
+ default=None, description="ID matching the Command.call_id"
+ )
+
+
+class TaskStatus(str, Enum):
+ """
+ Represents the status of a task in the AIP protocol.
+
+ States:
+ CONTINUE: Task is ongoing, more steps needed
+ COMPLETED: Task finished successfully
+ FAILED: Task encountered an error
+ OK: Acknowledgment or health check passed
+ ERROR: Protocol-level error occurred
+ """
+
+ CONTINUE = "continue"
+ COMPLETED = "completed"
+ FAILED = "failed"
+ OK = "ok"
+ ERROR = "error"
+
+
+# ============================================================================
+# Message Type Enums
+# ============================================================================
+
+
+class ClientMessageType(str, Enum):
+ """
+ Message types sent from client to server.
+
+ Registration & Health:
+ REGISTER: Initial registration with server
+ HEARTBEAT: Periodic keepalive signal
+
+ Task Execution:
+ TASK: Request to execute a task
+ TASK_END: Notify task completion
+ COMMAND_RESULTS: Return results of executed commands
+
+ Device Info:
+ DEVICE_INFO_REQUEST: Request device information
+ DEVICE_INFO_RESPONSE: Response with device information
+
+ Error Handling:
+ ERROR: Report an error condition
+ """
+
+ TASK = "task"
+ HEARTBEAT = "heartbeat"
+ COMMAND_RESULTS = "command_results"
+ ERROR = "error"
+ REGISTER = "register"
+ TASK_END = "task_end"
+ DEVICE_INFO_REQUEST = "device_info_request"
+ DEVICE_INFO_RESPONSE = "device_info_response"
+
+
+class ServerMessageType(str, Enum):
+ """
+ Message types sent from server to client.
+
+ Task Execution:
+ TASK: Task assignment to device
+ COMMAND: Command(s) to execute
+ TASK_END: Task completion notification
+
+ Health & Info:
+ HEARTBEAT: Keepalive acknowledgment
+ DEVICE_INFO_REQUEST: Request for device information
+ DEVICE_INFO_RESPONSE: Device information response
+
+ Error Handling:
+ ERROR: Error notification
+ """
+
+ TASK = "task"
+ HEARTBEAT = "heartbeat"
+ TASK_END = "task_end"
+ COMMAND = "command"
+ ERROR = "error"
+ DEVICE_INFO_REQUEST = "device_info_request"
+ DEVICE_INFO_RESPONSE = "device_info_response"
+
+
+class ClientType(str, Enum):
+ """
+ Type of client in the AIP system.
+
+ DEVICE: A device agent that executes tasks
+ CONSTELLATION: An orchestrator that manages multiple devices
+ """
+
+ DEVICE = "device"
+ CONSTELLATION = "constellation"
+
+
+# ============================================================================
+# Core Message Classes
+# ============================================================================
+
+
+class ServerMessage(BaseModel):
+ """
+ Message sent from server to client.
+
+ Represents all server-to-client communications including task assignments,
+ command dispatches, heartbeats, and error notifications.
+
+ Fields:
+ type: Message type (TASK, COMMAND, HEARTBEAT, etc.)
+ status: Task status (CONTINUE, COMPLETED, FAILED, OK, ERROR)
+ user_request: Original user request text
+ agent_name: Name of the agent handling the task
+ process_name: Process name for execution context
+ root_name: Root application name
+ actions: List of commands to execute
+ messages: List of message strings (e.g., logs)
+ error: Error description if status is ERROR
+ session_id: Unique session identifier
+ task_name: Human-readable task name
+ timestamp: ISO 8601 timestamp
+ response_id: Unique response identifier for correlation
+ result: Result payload for TASK_END or DEVICE_INFO_RESPONSE
+ """
+
+ type: ServerMessageType = Field(..., description="Type of server message")
+ status: TaskStatus = Field(..., description="Current task status")
+ user_request: Optional[str] = Field(
+ default=None, description="Original user request"
+ )
+ agent_name: Optional[str] = Field(default=None, description="Agent name")
+ process_name: Optional[str] = Field(default=None, description="Process name")
+ root_name: Optional[str] = Field(default=None, description="Root application name")
+ actions: Optional[List[Command]] = Field(
+ default=None, description="Commands to execute"
+ )
+ messages: Optional[List[str]] = Field(default=None, description="Log messages")
+ error: Optional[str] = Field(default=None, description="Error message")
+ session_id: Optional[str] = Field(default=None, description="Session ID")
+ task_name: Optional[str] = Field(default=None, description="Task name")
+ timestamp: Optional[str] = Field(default=None, description="ISO 8601 timestamp")
+ response_id: Optional[str] = Field(default=None, description="Unique response ID")
+ result: Optional[Any] = Field(default=None, description="Result payload")
+
+
+class ClientMessage(BaseModel):
+ """
+ Message sent from client to server.
+
+ Represents all client-to-server communications including registration,
+ task requests, command results, heartbeats, and error reports.
+
+ Fields:
+ type: Message type (REGISTER, TASK, HEARTBEAT, etc.)
+ status: Task status
+ client_type: Type of client (DEVICE or CONSTELLATION)
+ session_id: Unique session identifier
+ task_name: Human-readable task name
+ client_id: Unique client identifier
+ target_id: Target device ID (for constellation clients)
+ request: Request text (for TASK messages)
+ action_results: Results of executed commands
+ timestamp: ISO 8601 timestamp
+ request_id: Unique request identifier
+ prev_response_id: Previous response ID for correlation
+ error: Error message
+ metadata: Additional metadata (e.g., system info, capabilities)
+ """
+
+ type: ClientMessageType = Field(..., description="Type of client message")
+ status: TaskStatus = Field(..., description="Current task status")
+ client_type: ClientType = Field(
+ default=ClientType.DEVICE, description="Type of client"
+ )
+ session_id: Optional[str] = Field(default=None, description="Session ID")
+ task_name: Optional[str] = Field(default=None, description="Task name")
+ client_id: Optional[str] = Field(default=None, description="Client ID")
+ target_id: Optional[str] = Field(
+ default=None, description="Target device ID (for constellation)"
+ )
+ request: Optional[str] = Field(default=None, description="Request text")
+ action_results: Optional[List[Result]] = Field(
+ default=None, description="Command execution results"
+ )
+ timestamp: Optional[str] = Field(default=None, description="ISO 8601 timestamp")
+ request_id: Optional[str] = Field(default=None, description="Unique request ID")
+ prev_response_id: Optional[str] = Field(
+ default=None, description="Previous response ID"
+ )
+ error: Optional[str] = Field(default=None, description="Error message")
+ metadata: Optional[Dict[str, Any]] = Field(
+ default=None, description="Additional metadata"
+ )
+
+
+# ============================================================================
+# Message Validation and Utilities
+# ============================================================================
+
+
+class MessageValidator:
+ """
+ Validates AIP messages for protocol compliance.
+
+ Provides static methods to validate message structures, required fields,
+ and protocol-level constraints.
+ """
+
+ @staticmethod
+ def validate_registration(msg: ClientMessage) -> bool:
+ """
+ Validate a registration message.
+
+ :param msg: Client message to validate
+ :return: True if valid, False otherwise
+ """
+ if msg.type != ClientMessageType.REGISTER:
+ return False
+ if not msg.client_id:
+ return False
+        # target_id is optional at registration time, even for constellation
+        # clients: the target device can be set later, so it is not
+        # checked here.
+ return True
+
+ @staticmethod
+ def validate_task_request(msg: ClientMessage) -> bool:
+ """
+ Validate a task request message.
+
+ :param msg: Client message to validate
+ :return: True if valid, False otherwise
+ """
+ if msg.type != ClientMessageType.TASK:
+ return False
+ if not msg.request:
+ return False
+ if not msg.client_id:
+ return False
+ return True
+
+ @staticmethod
+ def validate_command_results(msg: ClientMessage) -> bool:
+ """
+ Validate a command results message.
+
+ :param msg: Client message to validate
+ :return: True if valid, False otherwise
+ """
+ if msg.type != ClientMessageType.COMMAND_RESULTS:
+ return False
+ if not msg.prev_response_id:
+ return False
+ if msg.action_results is None:
+ return False
+ return True
+
+ @staticmethod
+ def validate_server_message(msg: ServerMessage) -> bool:
+ """
+ Validate a server message.
+
+ :param msg: Server message to validate
+ :return: True if valid, False otherwise
+ """
+ # Basic validation
+ if not msg.type:
+ return False
+ if not msg.status:
+ return False
+
+ # Type-specific validation
+ if msg.type == ServerMessageType.COMMAND:
+ if not msg.actions:
+ return False
+ if not msg.response_id:
+ return False
+
+ return True
+
+
+# ============================================================================
+# Binary Transfer Message Types (New Feature)
+# ============================================================================
+
+
+class BinaryMetadata(BaseModel):
+ """
+ Metadata for binary data transfer.
+
+ This metadata is sent as a text frame before the actual binary data,
+ allowing receivers to prepare for and validate incoming binary transfers.
+ """
+
+ type: Literal["binary_data"] = "binary_data"
+ filename: Optional[str] = None
+ mime_type: Optional[str] = None
+ size: int = Field(..., description="Size of binary data in bytes")
+ checksum: Optional[str] = Field(
+ None, description="MD5 or SHA256 checksum for validation"
+ )
+ session_id: Optional[str] = None
+ description: Optional[str] = None
+ timestamp: Optional[str] = None
+ # Allow additional custom fields
+ model_config = ConfigDict(extra="allow")
+
+
+class FileTransferStart(BaseModel):
+ """
+ Message to initiate a chunked file transfer.
+
+ Sent before sending file chunks to inform the receiver about
+ the file details and transfer parameters.
+ """
+
+ type: Literal["file_transfer_start"] = "file_transfer_start"
+ filename: str = Field(..., description="Name of file being transferred")
+ size: int = Field(..., description="Total file size in bytes")
+ chunk_size: int = Field(..., description="Size of each chunk in bytes")
+ total_chunks: int = Field(..., description="Total number of chunks")
+ mime_type: Optional[str] = Field(None, description="MIME type of file")
+ session_id: Optional[str] = None
+ description: Optional[str] = None
+ # Allow additional custom fields
+ model_config = ConfigDict(extra="allow")
+
+
+class FileTransferComplete(BaseModel):
+ """
+ Message to signal completion of a chunked file transfer.
+
+ Sent after all file chunks have been transmitted, includes
+ checksum for validation.
+ """
+
+ type: Literal["file_transfer_complete"] = "file_transfer_complete"
+ filename: str = Field(..., description="Name of transferred file")
+ total_chunks: int = Field(..., description="Total chunks sent")
+ checksum: Optional[str] = Field(None, description="MD5 checksum of complete file")
+ session_id: Optional[str] = None
+ # Allow additional custom fields
+ model_config = ConfigDict(extra="allow")
+
+
+class ChunkMetadata(BaseModel):
+ """
+ Metadata for a single file chunk.
+
+ Sent with each chunk during chunked file transfer to track
+ chunk sequence and validate chunk integrity.
+ """
+
+ chunk_num: int = Field(..., description="Chunk sequence number (0-indexed)")
+ chunk_size: int = Field(..., description="Size of this chunk in bytes")
+ checksum: Optional[str] = Field(None, description="Checksum of this chunk")
+ # Allow additional custom fields
+ model_config = ConfigDict(extra="allow")
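Reviewer note, not part of the patch: the three message types above (`file_transfer_start`, per-chunk metadata, `file_transfer_complete`) imply some simple bookkeeping. The sketch below reproduces that bookkeeping with the standard library only, for an in-memory payload. The function name `plan_chunked_transfer` is hypothetical, and plain dicts stand in for the Pydantic models, so treat it as an illustration of the field relationships rather than the project's implementation.

```python
import hashlib
from typing import Any, Dict, List, Tuple


def plan_chunked_transfer(
    filename: str, data: bytes, chunk_size: int
) -> Tuple[Dict[str, Any], List[Tuple[Dict[str, Any], bytes]], Dict[str, Any]]:
    """Build start, per-chunk, and completion metadata for an in-memory payload."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    # Ceiling division: a partial final chunk still counts as one chunk.
    total_chunks = (len(data) + chunk_size - 1) // chunk_size

    start = {
        "type": "file_transfer_start",
        "filename": filename,
        "size": len(data),
        "chunk_size": chunk_size,
        "total_chunks": total_chunks,
    }

    md5 = hashlib.md5()
    chunks: List[Tuple[Dict[str, Any], bytes]] = []
    for chunk_num in range(total_chunks):
        chunk = data[chunk_num * chunk_size : (chunk_num + 1) * chunk_size]
        md5.update(chunk)  # running checksum over the whole payload
        chunks.append(({"chunk_num": chunk_num, "chunk_size": len(chunk)}, chunk))

    complete = {
        "type": "file_transfer_complete",
        "filename": filename,
        "total_chunks": total_chunks,
        "checksum": md5.hexdigest(),
    }
    return start, chunks, complete
```

Note that `chunk_size` in the start message is the nominal size, while `chunk_size` in each chunk's metadata is the actual length, so the last chunk may report a smaller value.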
diff --git a/aip/protocol/__init__.py b/aip/protocol/__init__.py
new file mode 100644
index 000000000..ed8738e22
--- /dev/null
+++ b/aip/protocol/__init__.py
@@ -0,0 +1,26 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+AIP Protocol Layer
+
+Implements the core protocol logic for the Agent Interaction Protocol.
+"""
+
+from .base import AIPProtocol, MessageHandler, ProtocolHandler
+from .command import CommandProtocol
+from .device_info import DeviceInfoProtocol
+from .heartbeat import HeartbeatProtocol
+from .registration import RegistrationProtocol
+from .task_execution import TaskExecutionProtocol
+
+__all__ = [
+ "AIPProtocol",
+ "MessageHandler",
+ "ProtocolHandler",
+ "RegistrationProtocol",
+ "TaskExecutionProtocol",
+ "HeartbeatProtocol",
+ "DeviceInfoProtocol",
+ "CommandProtocol",
+]
diff --git a/aip/protocol/base.py b/aip/protocol/base.py
new file mode 100644
index 000000000..90dd97abf
--- /dev/null
+++ b/aip/protocol/base.py
@@ -0,0 +1,599 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Base Protocol Implementation
+
+Provides the core AIP protocol abstractions and message handling infrastructure.
+"""
+
+import logging
+from abc import ABC, abstractmethod
+from typing import Any, Awaitable, Callable, Dict, List, Optional
+
+from aip.messages import ServerMessage
+from aip.transport import Transport
+
+# Type aliases for clarity
+MessageHandler = Callable[[Any], Awaitable[None]]
+ProtocolHandler = Callable[[Any], Awaitable[Optional[Any]]]
+
+
+class AIPProtocol:
+ """
+ Core AIP protocol implementation.
+
+ This class provides the foundation for all AIP communication:
+ - Message serialization and deserialization
+ - Middleware pipeline for extensibility
+ - Message routing and handler registration
+ - Error handling and logging
+
+ The protocol is transport-agnostic and works with any Transport implementation.
+
+ Usage:
+ transport = WebSocketTransport()
+ protocol = AIPProtocol(transport)
+ await protocol.send_message(ClientMessage(...))
+ message = await protocol.receive_message()
+ """
+
+ def __init__(self, transport: Transport):
+ """
+ Initialize AIP protocol.
+
+ :param transport: Transport layer for sending/receiving messages
+ """
+ self.transport = transport
+ self.message_handlers: Dict[str, List[MessageHandler]] = {}
+ self.middleware_chain: List["ProtocolMiddleware"] = []
+ self.logger = logging.getLogger(f"{__name__}.AIPProtocol")
+
+ async def send_message(self, msg: Any) -> None:
+ """
+ Send a message through the protocol.
+
+ Applies outgoing middleware, serializes the message, and sends via transport.
+
+ :param msg: Message to send (ClientMessage or ServerMessage)
+ :raises ConnectionError: if the transport is not connected
+ :raises IOError: if the send fails
+ """
+ try:
+ # Apply outgoing middleware
+ for middleware in self.middleware_chain:
+ msg = await middleware.process_outgoing(msg)
+
+ # Serialize message
+ if hasattr(msg, "model_dump_json"):
+ # Pydantic model
+ serialized = msg.model_dump_json().encode("utf-8")
+ elif isinstance(msg, str):
+ serialized = msg.encode("utf-8")
+ elif isinstance(msg, bytes):
+ serialized = msg
+ else:
+ raise ValueError(f"Unsupported message type: {type(msg)}")
+
+ # Send via transport
+ await self.transport.send(serialized)
+ self.logger.debug(f"Sent message: {msg.__class__.__name__}")
+
+ except (ConnectionError, IOError, OSError) as e:
+ # Connection closed or I/O error - this is common during disconnection
+ # Log at DEBUG level to avoid alarming ERROR logs during normal shutdown
+ error_msg = str(e).lower()
+ if "closed" in error_msg or "not connected" in error_msg:
+ self.logger.debug(f"Cannot send message (connection closed): {e}")
+ else:
+ self.logger.warning(f"Connection error sending message: {e}")
+ raise
+ except Exception as e:
+ self.logger.error(f"Error sending message: {e}")
+ raise
+
+ async def receive_message(self, message_type: type = ServerMessage) -> Any:
+ """
+ Receive a message through the protocol.
+
+ Receives data from transport, deserializes, and applies incoming middleware.
+
+ :param message_type: Expected message type (ClientMessage or ServerMessage)
+ :return: Deserialized message
+ :raises ConnectionError: if the transport is not connected
+ :raises IOError: if the receive fails
+ """
+ try:
+ # Receive via transport
+ data = await self.transport.receive()
+
+ # Deserialize message
+ if isinstance(data, bytes):
+ data = data.decode("utf-8")
+
+ if hasattr(message_type, "model_validate_json"):
+ # Pydantic model
+ msg = message_type.model_validate_json(data)
+ else:
+ raise ValueError(f"Unsupported message type: {message_type}")
+
+ # Apply incoming middleware
+ for middleware in reversed(self.middleware_chain):
+ msg = await middleware.process_incoming(msg)
+
+ self.logger.debug(f"Received message: {msg.__class__.__name__}")
+ return msg
+
+ except (ConnectionError, IOError, OSError) as e:
+ # Connection closed or I/O error - this is common during disconnection
+ error_msg = str(e).lower()
+ if "closed" in error_msg or "not connected" in error_msg:
+ self.logger.debug(f"Cannot receive message (connection closed): {e}")
+ else:
+ self.logger.warning(f"Connection error receiving message: {e}")
+ raise
+ except Exception as e:
+ self.logger.error(f"Error receiving message: {e}")
+ raise
+
+ def add_middleware(self, middleware: "ProtocolMiddleware") -> None:
+ """
+ Add middleware to the protocol pipeline.
+
+ Middleware is applied in order for outgoing messages,
+ and in reverse order for incoming messages.
+
+ :param middleware: Middleware to add
+ """
+ self.middleware_chain.append(middleware)
+ self.logger.info(f"Added middleware: {middleware.__class__.__name__}")
+
+ def register_handler(self, message_type: str, handler: MessageHandler) -> None:
+ """
+ Register a handler for a specific message type.
+
+ :param message_type: Message type string (e.g., "task", "heartbeat")
+ :param handler: Async function to handle the message
+ """
+ if message_type not in self.message_handlers:
+ self.message_handlers[message_type] = []
+ self.message_handlers[message_type].append(handler)
+ self.logger.debug(f"Registered handler for: {message_type}")
+
+ async def dispatch_message(self, msg: Any) -> None:
+ """
+ Dispatch a message to registered handlers.
+
+ :param msg: Message to dispatch
+ """
+ msg_type = getattr(msg, "type", None)
+ if msg_type and msg_type in self.message_handlers:
+ for handler in self.message_handlers[msg_type]:
+ try:
+ await handler(msg)
+ except Exception as e:
+ self.logger.error(
+ f"Error in handler for {msg_type}: {e}", exc_info=True
+ )
+ else:
+ self.logger.warning(f"No handler for message type: {msg_type}")
+
+ def is_connected(self) -> bool:
+ """Check if protocol transport is connected."""
+ return self.transport.is_connected
+
+ async def send_error(
+ self, error_msg: str, response_id: Optional[str] = None
+ ) -> None:
+ """
+ Send a generic error message (server-side).
+
+ :param error_msg: Error message
+ :param response_id: Optional response ID for correlation
+ """
+ import datetime
+ import uuid
+
+ from aip.messages import ServerMessage, ServerMessageType, TaskStatus
+
+ error_message = ServerMessage(
+ type=ServerMessageType.ERROR,
+ status=TaskStatus.ERROR,
+ error=error_msg,
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+ response_id=response_id or str(uuid.uuid4()),
+ )
+ await self.send_message(error_message)
+
+ async def send_ack(
+ self, session_id: Optional[str] = None, response_id: Optional[str] = None
+ ) -> None:
+ """
+ Send a generic acknowledgment message (server-side).
+
+ :param session_id: Optional session ID
+ :param response_id: Optional response ID for correlation
+ """
+ import datetime
+ import uuid
+
+ from aip.messages import ServerMessage, ServerMessageType, TaskStatus
+
+ ack_message = ServerMessage(
+ type=ServerMessageType.HEARTBEAT,
+ status=TaskStatus.OK,
+ session_id=session_id,
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+ response_id=response_id or str(uuid.uuid4()),
+ )
+ await self.send_message(ack_message)
+
+ async def close(self) -> None:
+ """Close protocol and transport."""
+ await self.transport.close()
+
+ # ========================================================================
+ # Binary Message Handling (New Feature)
+ # ========================================================================
+
+ async def send_binary_message(
+ self, data: bytes, metadata: Optional[Dict[str, Any]] = None
+ ) -> None:
+ """
+ Send a binary message with optional metadata.
+
+ Uses a two-frame approach for structured binary transfers:
+ 1. Text frame with JSON metadata (filename, size, mime_type, checksum, etc.)
+ 2. Binary frame with actual file data
+
+ This approach allows receivers to prepare for incoming binary data
+ and validate it after reception.
+
+ :param data: Binary data to send (image, file, etc.)
+ :param metadata: Optional metadata dict with fields like:
+ - filename: str
+ - mime_type: str (e.g., "image/png", "application/pdf")
+ - size: int (will be auto-filled)
+ - checksum: str (optional, for validation)
+ - session_id: str (optional)
+ - custom fields as needed
+
+ :raises ConnectionError: if the transport is not connected
+ :raises IOError: if the send fails
+
+ Example:
+ # Send an image with metadata
+ with open("screenshot.png", "rb") as f:
+ image_data = f.read()
+
+ await protocol.send_binary_message(
+ data=image_data,
+ metadata={
+ "filename": "screenshot.png",
+ "mime_type": "image/png",
+ "description": "Desktop screenshot"
+ }
+ )
+ """
+ import datetime
+ import json
+
+ try:
+ # 1. Prepare and send metadata as text frame
+ meta = dict(metadata or {})  # copy so the caller's dict is not mutated
+ meta.update(
+ {
+ "type": "binary_data",
+ "size": len(data),
+ "timestamp": datetime.datetime.now(
+ datetime.timezone.utc
+ ).isoformat(),
+ }
+ )
+
+ meta_json = json.dumps(meta)
+ await self.transport.send(meta_json.encode("utf-8"))
+ self.logger.debug(f"Sent binary metadata: {meta}")
+
+ # 2. Send actual data as binary frame
+ await self.transport.send_binary(data)
+ self.logger.debug(f"Sent {len(data)} bytes of binary data")
+
+ except Exception as e:
+ self.logger.error(f"Error sending binary message: {e}")
+ raise
+
+ async def receive_binary_message(
+ self, validate_size: bool = True
+ ) -> tuple[bytes, Dict[str, Any]]:
+ """
+ Receive a binary message with metadata.
+
+ Expects a two-frame sequence:
+ 1. Text frame with JSON metadata
+ 2. Binary frame with actual data
+
+ :param validate_size: If True, validates received size matches metadata
+ :return: Tuple of (binary_data, metadata_dict)
+ :raises ConnectionError: if the connection is closed
+ :raises IOError: if the receive fails
+ :raises ValueError: if size validation fails
+
+ Example:
+ # Receive a binary file
+ data, metadata = await protocol.receive_binary_message()
+
+ filename = metadata.get("filename", "received_file.bin")
+ with open(filename, "wb") as f:
+ f.write(data)
+
+ print(f"Received: {filename} ({len(data)} bytes)")
+ """
+ import json
+
+ try:
+ # 1. Receive metadata as text frame
+ meta_bytes = await self.transport.receive()
+ meta = json.loads(meta_bytes.decode("utf-8"))
+ self.logger.debug(f"Received binary metadata: {meta}")
+
+ # Validate metadata type
+ if meta.get("type") != "binary_data":
+ self.logger.warning(
+ f"Expected binary_data message, got: {meta.get('type')}"
+ )
+
+ # 2. Receive actual binary data
+ data = await self.transport.receive_binary()
+ self.logger.debug(f"Received {len(data)} bytes of binary data")
+
+ # 3. Validate size if requested
+ if validate_size and "size" in meta:
+ expected_size = meta["size"]
+ actual_size = len(data)
+ if actual_size != expected_size:
+ error_msg = (
+ f"Size mismatch: expected {expected_size} bytes, "
+ f"got {actual_size} bytes"
+ )
+ self.logger.error(error_msg)
+ raise ValueError(error_msg)
+
+ return data, meta
+
+ except Exception as e:
+ self.logger.error(f"Error receiving binary message: {e}")
+ raise
+
+ async def send_file(
+ self,
+ file_path: str,
+ chunk_size: int = 1024 * 1024, # 1MB chunks
+ compute_checksum: bool = True,
+ ) -> None:
+ """
+ Send a file in chunks (for large files).
+
+ Sends large files by splitting them into chunks and sending
+ a completion message with checksum for validation.
+
+ Protocol:
+ 1. Send file_transfer_start message (text frame)
+ 2. Send file chunks as binary messages
+ 3. Send file_transfer_complete message with checksum (text frame)
+
+ :param file_path: Path to file to send
+ :param chunk_size: Size of each chunk in bytes (default: 1MB)
+ :param compute_checksum: If True, computes and sends MD5 checksum
+ :raises FileNotFoundError: if the file does not exist
+ :raises IOError: if the send fails
+
+ Example:
+ # Send a large video file
+ await protocol.send_file(
+ "video.mp4",
+ chunk_size=2 * 1024 * 1024 # 2MB chunks
+ )
+ """
+ import hashlib
+ import os
+
+ if not os.path.isfile(file_path):  # also rejects directories
+ raise FileNotFoundError(f"File not found: {file_path}")
+
+ file_size = os.path.getsize(file_path)
+ file_name = os.path.basename(file_path)
+ total_chunks = (file_size + chunk_size - 1) // chunk_size
+
+ # Detect MIME type
+ import mimetypes
+ import json
+
+ mime_type, _ = mimetypes.guess_type(file_path)
+
+ # Send file header (as JSON string)
+ header_msg = {
+ "type": "file_transfer_start",
+ "filename": file_name,
+ "size": file_size,
+ "chunk_size": chunk_size,
+ "total_chunks": total_chunks,
+ "mime_type": mime_type,
+ }
+ await self.transport.send(json.dumps(header_msg).encode("utf-8"))
+
+ # Send file in chunks
+ md5_hash = hashlib.md5() if compute_checksum else None
+
+ with open(file_path, "rb") as f:
+ chunk_num = 0
+
+ while True:
+ chunk = f.read(chunk_size)
+ if not chunk:
+ break
+
+ if md5_hash:
+ md5_hash.update(chunk)
+
+ await self.send_binary_message(
+ chunk, {"chunk_num": chunk_num, "chunk_size": len(chunk)}
+ )
+
+ chunk_num += 1
+ self.logger.info(f"Sent chunk {chunk_num}/{total_chunks}")
+
+ # Send completion with checksum (as JSON string)
+ completion_msg = {
+ "type": "file_transfer_complete",
+ "filename": file_name,
+ "total_chunks": chunk_num,
+ }
+
+ if md5_hash:
+ completion_msg["checksum"] = md5_hash.hexdigest()
+
+ await self.transport.send(json.dumps(completion_msg).encode("utf-8"))
+ self.logger.info(f"File transfer complete: {file_name}")
+
+ async def receive_file(
+ self, output_path: str, validate_checksum: bool = True
+ ) -> Dict[str, Any]:
+ """
+ Receive a file that was sent in chunks.
+
+ Receives a chunked file transfer and writes to the specified path.
+ Validates checksum if provided.
+
+ :param output_path: Path where received file should be saved
+ :param validate_checksum: If True, validates MD5 checksum
+ :return: Dictionary with transfer metadata (filename, size, checksum, etc.)
+ :raises IOError: if the receive fails
+ :raises ValueError: if checksum validation fails
+
+ Example:
+ # Receive a file
+ metadata = await protocol.receive_file("downloads/received_video.mp4")
+ print(f"Received: {metadata['filename']} ({metadata['size']} bytes)")
+ """
+ import hashlib
+ import json
+ import os
+
+ # 1. Receive file header
+ header_bytes = await self.transport.receive()
+ header = json.loads(header_bytes.decode("utf-8"))
+
+ if header.get("type") != "file_transfer_start":
+ raise ValueError(f"Expected file_transfer_start, got: {header.get('type')}")
+
+ filename = header["filename"]
+ total_size = header["size"]
+ total_chunks = header["total_chunks"]
+
+ self.logger.info(
+ f"Receiving file: {filename} ({total_size} bytes, {total_chunks} chunks)"
+ )
+
+ # 2. Receive chunks and write to file
+ md5_hash = hashlib.md5() if validate_checksum else None
+ os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
+
+ with open(output_path, "wb") as f:
+ for chunk_num in range(total_chunks):
+ data, chunk_meta = await self.receive_binary_message()
+
+ if md5_hash:
+ md5_hash.update(data)
+
+ f.write(data)
+ self.logger.info(f"Received chunk {chunk_num + 1}/{total_chunks}")
+
+ # 3. Receive completion message
+ completion_bytes = await self.transport.receive()
+ completion = json.loads(completion_bytes.decode("utf-8"))
+
+ if completion.get("type") != "file_transfer_complete":
+ raise ValueError(
+ f"Expected file_transfer_complete, got: {completion.get('type')}"
+ )
+
+ # 4. Validate checksum
+ if validate_checksum and "checksum" in completion:
+ expected_checksum = completion["checksum"]
+ actual_checksum = md5_hash.hexdigest()
+
+ if actual_checksum != expected_checksum:
+ error_msg = (
+ f"Checksum mismatch: expected {expected_checksum}, "
+ f"got {actual_checksum}"
+ )
+ self.logger.error(error_msg)
+ raise ValueError(error_msg)
+
+ self.logger.info(f"Checksum validated: {actual_checksum}")
+
+ self.logger.info(f"File received successfully: {output_path}")
+
+ return {
+ "filename": filename,
+ "size": total_size,
+ "output_path": output_path,
+ "checksum": completion.get("checksum"),
+ }
+
+
+class ProtocolMiddleware(ABC):
+ """
+ Abstract base class for protocol middleware.
+
+ Middleware can intercept and modify messages in both directions,
+ enabling cross-cutting concerns like logging, metrics, and encryption.
+ """
+
+ @abstractmethod
+ async def process_outgoing(self, msg: Any) -> Any:
+ """
+ Process outgoing message.
+
+ :param msg: Outgoing message
+ :return: Modified message
+ """
+ pass
+
+ @abstractmethod
+ async def process_incoming(self, msg: Any) -> Any:
+ """
+ Process incoming message.
+
+ :param msg: Incoming message
+ :return: Modified message
+ """
+ pass
+
+
+class LoggingMiddleware(ProtocolMiddleware):
+ """
+ Middleware that logs all messages.
+
+ Useful for debugging and monitoring protocol communication.
+ """
+
+ def __init__(self, log_level: int = logging.DEBUG):
+ """
+ Initialize logging middleware.
+
+ :param log_level: Log level for messages
+ """
+ self.logger = logging.getLogger(f"{__name__}.LoggingMiddleware")
+ self.log_level = log_level
+
+ async def process_outgoing(self, msg: Any) -> Any:
+ """Log outgoing message."""
+ self.logger.log(self.log_level, f"[OUT] {msg}")
+ return msg
+
+ async def process_incoming(self, msg: Any) -> Any:
+ """Log incoming message."""
+ self.logger.log(self.log_level, f"[IN] {msg}")
+ return msg
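Reviewer note, not part of the patch: `add_middleware` documents that middleware runs in insertion order for outgoing messages and in reverse order for incoming ones. The sketch below demonstrates that ordering in isolation. `TagMiddleware` and `run_pipeline` are hypothetical stand-ins that mirror only the two loops in `send_message` and `receive_message`; they do not use `AIPProtocol` or a real transport.

```python
import asyncio
from typing import Any, List


class TagMiddleware:
    """Hypothetical middleware that records when it runs and tags the message."""

    def __init__(self, name: str, trace: List[str]) -> None:
        self.name = name
        self.trace = trace

    async def process_outgoing(self, msg: Any) -> Any:
        self.trace.append(f"out:{self.name}")
        return f"{self.name}({msg})"

    async def process_incoming(self, msg: Any) -> Any:
        self.trace.append(f"in:{self.name}")
        return msg


async def run_pipeline(chain: List[TagMiddleware], msg: Any) -> Any:
    # Outgoing: the first-added middleware runs first.
    for middleware in chain:
        msg = await middleware.process_outgoing(msg)
    # Incoming: the last-added middleware runs first.
    for middleware in reversed(chain):
        msg = await middleware.process_incoming(msg)
    return msg


trace: List[str] = []
chain = [TagMiddleware("a", trace), TagMiddleware("b", trace)]
result = asyncio.run(run_pipeline(chain, "msg"))
```

The reversed incoming order is what lets a wrapping middleware (for example, hypothetical encryption added last) unwrap a message before earlier middleware sees it.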
diff --git a/aip/protocol/command.py b/aip/protocol/command.py
new file mode 100644
index 000000000..bc8409238
--- /dev/null
+++ b/aip/protocol/command.py
@@ -0,0 +1,76 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Command Protocol
+
+Handles command execution at a fine-grained level.
+"""
+
+import logging
+from typing import List
+
+from aip.messages import Command, Result
+from aip.protocol.base import AIPProtocol
+
+
+class CommandProtocol(AIPProtocol):
+ """
+ Command execution protocol for AIP.
+
+ Provides fine-grained command execution with:
+ - Typed arguments
+ - Result validation
+ - Error propagation
+ - Batch command support
+ """
+
+ def __init__(self, *args, **kwargs):
+ """Initialize command protocol."""
+ super().__init__(*args, **kwargs)
+ self.logger = logging.getLogger(f"{__name__}.CommandProtocol")
+
+ def validate_command(self, cmd: Command) -> bool:
+ """
+ Validate a command structure.
+
+ :param cmd: Command to validate
+ :return: True if valid, False otherwise
+ """
+ if not cmd.tool_name:
+ self.logger.error("Command missing tool_name")
+ return False
+ if not cmd.tool_type:
+ self.logger.error("Command missing tool_type")
+ return False
+ return True
+
+ def validate_commands(self, commands: List[Command]) -> bool:
+ """
+ Validate a batch of commands.
+
+ :param commands: Commands to validate
+ :return: True if all valid, False otherwise
+ """
+ return all(self.validate_command(cmd) for cmd in commands)
+
+ def validate_result(self, result: Result) -> bool:
+ """
+ Validate a command result.
+
+ :param result: Result to validate
+ :return: True if valid, False otherwise
+ """
+ if not result.status:
+ self.logger.error("Result missing status")
+ return False
+ return True
+
+ def validate_results(self, results: List[Result]) -> bool:
+ """
+ Validate a batch of results.
+
+ :param results: Results to validate
+ :return: True if all valid, False otherwise
+ """
+ return all(self.validate_result(res) for res in results)
diff --git a/aip/protocol/device_info.py b/aip/protocol/device_info.py
new file mode 100644
index 000000000..7d0653c99
--- /dev/null
+++ b/aip/protocol/device_info.py
@@ -0,0 +1,113 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Device Info Protocol
+
+Handles device information requests and responses.
+"""
+
+import datetime
+import logging
+from typing import Any, Dict, Optional
+from uuid import uuid4
+
+from aip.messages import (
+ ClientMessage,
+ ClientMessageType,
+ ClientType,
+ ServerMessage,
+ ServerMessageType,
+ TaskStatus,
+)
+from aip.protocol.base import AIPProtocol
+
+
+class DeviceInfoProtocol(AIPProtocol):
+ """
+ Device information protocol for AIP.
+
+ Handles:
+ - Device info requests from constellation
+ - Device info responses from device
+ - System information exchange
+ """
+
+ def __init__(self, *args, **kwargs):
+ """Initialize device info protocol."""
+ super().__init__(*args, **kwargs)
+ self.logger = logging.getLogger(f"{__name__}.DeviceInfoProtocol")
+
+ async def request_device_info(
+ self,
+ constellation_id: str,
+ target_device: str,
+ request_id: Optional[str] = None,
+ ) -> None:
+ """
+ Request device information (constellation-side).
+
+ :param constellation_id: Constellation client ID
+ :param target_device: Target device ID
+ :param request_id: Optional request ID for correlation
+ """
+ req_msg = ClientMessage(
+ type=ClientMessageType.DEVICE_INFO_REQUEST,
+ client_type=ClientType.CONSTELLATION,
+ client_id=constellation_id,
+ target_id=target_device,
+ request_id=request_id or str(uuid4()),
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+ status=TaskStatus.OK,
+ )
+ await self.send_message(req_msg)
+ self.logger.info(
+ f"Sent device info request: {constellation_id} → {target_device}"
+ )
+
+ async def send_device_info_response(
+ self,
+ device_info: Optional[Dict[str, Any]],
+ request_id: str,
+ error: Optional[str] = None,
+ ) -> None:
+ """
+ Send device information response (server-side).
+
+ :param device_info: Device information dictionary
+ :param request_id: Request ID for correlation
+ :param error: Optional error message
+ """
+ status = TaskStatus.OK if error is None else TaskStatus.ERROR
+ resp_msg = ServerMessage(
+ type=ServerMessageType.DEVICE_INFO_RESPONSE,
+ status=status,
+ result=device_info,
+ error=error,
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+ response_id=request_id,
+ )
+ await self.send_message(resp_msg)
+ self.logger.info(f"Sent device info response (request_id: {request_id})")
+
+ async def send_device_info_push(
+ self,
+ device_id: str,
+ device_info: Dict[str, Any],
+ ) -> None:
+ """
+ Push device information proactively (device-side, future use).
+
+ :param device_id: Device ID
+ :param device_info: Device information dictionary
+ """
+ push_msg = ClientMessage(
+ type=ClientMessageType.DEVICE_INFO_RESPONSE,
+ client_id=device_id,
+ client_type=ClientType.DEVICE,
+ metadata=device_info,
+ status=TaskStatus.OK,
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+ )
+ await self.send_message(push_msg)
+ self.logger.info(f"Pushed device info from {device_id}")
diff --git a/aip/protocol/heartbeat.py b/aip/protocol/heartbeat.py
new file mode 100644
index 000000000..a95a98e0b
--- /dev/null
+++ b/aip/protocol/heartbeat.py
@@ -0,0 +1,122 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Heartbeat Protocol
+
+Handles periodic keepalive messages to maintain connection health.
+"""
+
+import asyncio
+import datetime
+import logging
+from typing import Optional
+from uuid import uuid4
+
+from aip.messages import (
+ ClientMessage,
+ ClientMessageType,
+ ServerMessage,
+ ServerMessageType,
+ TaskStatus,
+)
+from aip.protocol.base import AIPProtocol
+
+
+class HeartbeatProtocol(AIPProtocol):
+ """
+ Heartbeat protocol for AIP.
+
+ Provides:
+ - Periodic heartbeat messages
+ - Connection health monitoring
+ - Automatic heartbeat management
+ """
+
+ def __init__(self, *args, **kwargs):
+ """Initialize heartbeat protocol."""
+ super().__init__(*args, **kwargs)
+ self.logger = logging.getLogger(f"{__name__}.HeartbeatProtocol")
+ self._heartbeat_task: Optional[asyncio.Task] = None
+ self._heartbeat_interval: float = 30.0 # Default: 30 seconds
+
+ async def send_heartbeat(
+ self, client_id: str, metadata: Optional[dict] = None
+ ) -> None:
+ """
+ Send a single heartbeat message (client-side).
+
+ :param client_id: Client ID
+ :param metadata: Optional metadata dictionary
+ """
+ heartbeat_msg = ClientMessage(
+ type=ClientMessageType.HEARTBEAT,
+ client_id=client_id,
+ status=TaskStatus.OK,
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+ metadata=metadata,
+ )
+ await self.send_message(heartbeat_msg)
+ self.logger.debug(f"Sent heartbeat from {client_id}")
+
+ async def send_heartbeat_ack(self, response_id: Optional[str] = None) -> None:
+ """
+ Send heartbeat acknowledgment (server-side).
+
+ :param response_id: Optional response ID
+ """
+ ack_msg = ServerMessage(
+ type=ServerMessageType.HEARTBEAT,
+ status=TaskStatus.OK,
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+ response_id=response_id or str(uuid4()),
+ )
+ await self.send_message(ack_msg)
+ self.logger.debug("Sent heartbeat acknowledgment")
+
+ async def start_heartbeat(self, client_id: str, interval: float = 30.0) -> None:
+ """
+ Start automatic heartbeat sending.
+
+ :param client_id: Client ID
+ :param interval: Interval between heartbeats (seconds)
+ """
+ if self._heartbeat_task is not None:
+ self.logger.warning("Heartbeat already running, stopping existing task")
+ await self.stop_heartbeat()
+
+ self._heartbeat_interval = interval
+ self._heartbeat_task = asyncio.create_task(
+ self._heartbeat_loop(client_id, interval)
+ )
+ self.logger.info(f"Started heartbeat for {client_id} (interval: {interval}s)")
+
+ async def stop_heartbeat(self) -> None:
+ """Stop automatic heartbeat sending."""
+ if self._heartbeat_task is not None:
+ self._heartbeat_task.cancel()
+ try:
+ await self._heartbeat_task
+ except asyncio.CancelledError:
+ pass
+ self._heartbeat_task = None
+ self.logger.info("Stopped heartbeat")
+
+ async def _heartbeat_loop(self, client_id: str, interval: float) -> None:
+ """
+ Internal heartbeat loop.
+
+ :param client_id: Client ID
+ :param interval: Interval between heartbeats (seconds)
+ """
+ try:
+ while True:
+ await asyncio.sleep(interval)
+ if self.is_connected():
+ await self.send_heartbeat(client_id)
+ else:
+ self.logger.warning("Transport not connected, skipping heartbeat")
+ except asyncio.CancelledError:
+ self.logger.debug("Heartbeat loop cancelled")
+ except Exception as e:
+ self.logger.error(f"Error in heartbeat loop: {e}", exc_info=True)
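Reviewer note, not part of the patch: the start/stop logic above follows a common asyncio pattern, which is to create a background task, cancel it on stop, and await it so the cancellation completes. The sketch below isolates that pattern. The `Heartbeat` class and its `send` callback are hypothetical; the callback stands in for `send_heartbeat`, and no transport is involved.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class Heartbeat:
    """Hypothetical periodic sender using the cancel-then-await pattern."""

    def __init__(self, send: Callable[[], Awaitable[None]], interval: float) -> None:
        self._send = send
        self._interval = interval
        self._task: Optional[asyncio.Task] = None

    async def start(self) -> None:
        # Restart cleanly if a loop is already running.
        if self._task is not None:
            await self.stop()
        self._task = asyncio.create_task(self._loop())

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                # Awaiting lets the task finish unwinding before we drop it.
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None

    @property
    def running(self) -> bool:
        return self._task is not None

    async def _loop(self) -> None:
        while True:
            await asyncio.sleep(self._interval)
            await self._send()


async def demo() -> int:
    beats = 0

    async def send() -> None:
        nonlocal beats
        beats += 1

    hb = Heartbeat(send, interval=0.01)
    await hb.start()
    await asyncio.sleep(0.1)
    await hb.stop()
    assert not hb.running
    return beats
```

One difference from the patch: `_heartbeat_loop` swallows `CancelledError` itself, whereas this sketch lets it propagate and handles it in `stop`. Either way the caller of `stop` sees a clean return.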
diff --git a/aip/protocol/registration.py b/aip/protocol/registration.py
new file mode 100644
index 000000000..3e458363a
--- /dev/null
+++ b/aip/protocol/registration.py
@@ -0,0 +1,210 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Registration Protocol
+
+Handles agent registration and capability advertisement in the AIP system.
+"""
+
+import datetime
+import logging
+from typing import Any, Dict, Optional
+
+from aip.messages import (
+ ClientMessage,
+ ClientMessageType,
+ ClientType,
+ ServerMessage,
+ ServerMessageType,
+ TaskStatus,
+)
+from aip.protocol.base import AIPProtocol
+
+
+class RegistrationProtocol(AIPProtocol):
+ """
+ Registration protocol for AIP.
+
+ Handles:
+ - Device agent registration
+ - Constellation client registration
+ - Capability advertisement
+ - Metadata exchange
+ """
+
+ def __init__(self, *args, **kwargs):
+ """Initialize registration protocol."""
+ super().__init__(*args, **kwargs)
+ self.logger = logging.getLogger(f"{__name__}.RegistrationProtocol")
+
+ async def register_as_device(
+ self,
+ device_id: str,
+ metadata: Optional[Dict[str, Any]] = None,
+ platform: str = "windows",
+ ) -> bool:
+ """
+ Register as a device agent.
+
+ :param device_id: Unique device identifier
+ :param metadata: Optional device metadata (system info, capabilities, etc.)
+ :param platform: Platform type (windows, linux, etc.)
+ :return: True if registration successful, False otherwise
+ """
+ try:
+ # Prepare metadata; copy it so the caller's dict
+ # is not mutated by the fields added below
+ metadata = dict(metadata or {})
+
+ # Add platform to metadata
+ if "platform" not in metadata:
+ metadata["platform"] = platform
+
+ # Add registration timestamp
+ metadata["registration_time"] = datetime.datetime.now(
+ datetime.timezone.utc
+ ).isoformat()
+
+ # Create registration message
+ reg_msg = ClientMessage(
+ type=ClientMessageType.REGISTER,
+ client_id=device_id,
+ client_type=ClientType.DEVICE,
+ status=TaskStatus.OK,
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+ metadata=metadata,
+ )
+
+ # Send registration
+ await self.send_message(reg_msg)
+ self.logger.info(f"Sent device registration for {device_id}")
+
+ # Wait for server response
+ response = await self.receive_message(ServerMessage)
+
+ if response.status == TaskStatus.OK:
+ self.logger.info(f"Device {device_id} registered successfully")
+ return True
+ else:
+ self.logger.error(
+ f"Device registration failed: {response.error or 'Unknown error'}"
+ )
+ return False
+
+ except Exception as e:
+ self.logger.error(f"Error during device registration: {e}", exc_info=True)
+ return False
+
+ async def register_as_constellation(
+ self,
+ constellation_id: str,
+ target_device: str,
+ metadata: Optional[Dict[str, Any]] = None,
+ ) -> bool:
+ """
+ Register as a constellation client.
+
+ :param constellation_id: Unique constellation identifier
+ :param target_device: Target device ID for this constellation
+ :param metadata: Optional constellation metadata
+ :return: True if registration successful, False otherwise
+ """
+ try:
+ # Prepare metadata; copy it so the caller's dict
+ # is not mutated by the fields added below
+ metadata = dict(metadata or {})
+
+ # Add constellation-specific metadata
+ metadata.update(
+ {
+ "type": "constellation_client",
+ "targeted_device_id": target_device,
+ "registration_time": datetime.datetime.now(
+ datetime.timezone.utc
+ ).isoformat(),
+ }
+ )
+
+ # Create registration message
+ reg_msg = ClientMessage(
+ type=ClientMessageType.REGISTER,
+ client_id=constellation_id,
+ client_type=ClientType.CONSTELLATION,
+ target_id=target_device,
+ status=TaskStatus.OK,
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+ metadata=metadata,
+ )
+
+ # Send registration
+ await self.send_message(reg_msg)
+ self.logger.info(
+ f"Sent constellation registration for {constellation_id} → {target_device}"
+ )
+
+ # Wait for server response
+ response = await self.receive_message(ServerMessage)
+
+ if response.status == TaskStatus.OK:
+ self.logger.info(
+ f"Constellation {constellation_id} registered successfully"
+ )
+ return True
+ elif response.status == TaskStatus.ERROR:
+ self.logger.error(
+ f"Constellation registration failed: {response.error or 'Unknown error'}"
+ )
+ return False
+ else:
+ self.logger.warning(
+ f"Unexpected registration response: {response.status}"
+ )
+ return False
+
+ except Exception as e:
+ self.logger.error(
+ f"Error during constellation registration: {e}", exc_info=True
+ )
+ return False
+
+ async def send_registration_confirmation(
+ self, response_id: Optional[str] = None
+ ) -> None:
+ """
+ Send registration confirmation (server-side).
+
+ :param response_id: Optional response ID for correlation
+ """
+ confirmation = ServerMessage(
+ type=ServerMessageType.HEARTBEAT,  # confirmation reuses the HEARTBEAT type; clients check only status
+ status=TaskStatus.OK,
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+ response_id=response_id or self._generate_response_id(),
+ )
+ await self.send_message(confirmation)
+
+ async def send_registration_error(
+ self, error: str, response_id: Optional[str] = None
+ ) -> None:
+ """
+ Send registration error (server-side).
+
+ :param error: Error message
+ :param response_id: Optional response ID for correlation
+ """
+ error_msg = ServerMessage(
+ type=ServerMessageType.ERROR,
+ status=TaskStatus.ERROR,
+ error=error,
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+ response_id=response_id or self._generate_response_id(),
+ )
+ await self.send_message(error_msg)
+
+ @staticmethod
+ def _generate_response_id() -> str:
+ """Generate a unique response ID."""
+ import uuid
+
+ return str(uuid.uuid4())
diff --git a/aip/protocol/task_execution.py b/aip/protocol/task_execution.py
new file mode 100644
index 000000000..280d88d75
--- /dev/null
+++ b/aip/protocol/task_execution.py
@@ -0,0 +1,281 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Task Execution Protocol
+
+Handles task assignment, execution coordination, and result reporting.
+"""
+
+import datetime
+import logging
+from typing import Any, List, Optional
+from uuid import uuid4
+
+from aip.messages import (
+ ClientMessage,
+ ClientMessageType,
+ ClientType,
+ Command,
+ Result,
+ ServerMessage,
+ ServerMessageType,
+ TaskStatus,
+)
+from aip.protocol.base import AIPProtocol
+
+
+class TaskExecutionProtocol(AIPProtocol):
+ """
+ Task execution protocol for AIP.
+
+ Handles:
+ - Task assignment from constellation to device
+ - Task status updates
+ - Command execution
+ - Result reporting
+ """
+
+ def __init__(self, *args, **kwargs):
+ """Initialize task execution protocol."""
+ super().__init__(*args, **kwargs)
+ self.logger = logging.getLogger(f"{__name__}.TaskExecutionProtocol")
+
+ async def send_task_request(
+ self,
+ request: str,
+ task_name: str,
+ session_id: str,
+ client_id: str,
+ target_id: Optional[str] = None,
+ client_type: ClientType = ClientType.DEVICE,
+ metadata: Optional[dict] = None,
+ ) -> None:
+ """
+ Send a task request.
+
+ :param request: Task request text
+ :param task_name: Task name
+ :param session_id: Session ID
+ :param client_id: Client ID
+ :param target_id: Target device ID (for constellation)
+ :param client_type: Type of client
+ :param metadata: Optional metadata
+ """
+ task_msg = ClientMessage(
+ type=ClientMessageType.TASK,
+ request=request,
+ task_name=task_name,
+ session_id=session_id,
+ client_id=client_id,
+ target_id=target_id,
+ client_type=client_type,
+ status=TaskStatus.CONTINUE,
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+ request_id=str(uuid4()),
+ metadata=metadata,
+ )
+ await self.send_message(task_msg)
+ self.logger.info(f"Sent task request: {task_name}")
+
+ async def send_task_assignment(
+ self,
+ user_request: str,
+ task_name: str,
+ session_id: str,
+ response_id: str,
+ agent_name: Optional[str] = None,
+ process_name: Optional[str] = None,
+ ) -> None:
+ """
+ Send task assignment to device (server-side).
+
+ :param user_request: User request text
+ :param task_name: Task name
+ :param session_id: Session ID
+ :param response_id: Response ID
+ :param agent_name: Agent name
+ :param process_name: Process name
+ """
+ task_msg = ServerMessage(
+ type=ServerMessageType.TASK,
+ status=TaskStatus.CONTINUE,
+ user_request=user_request,
+ task_name=task_name,
+ session_id=session_id,
+ response_id=response_id,
+ agent_name=agent_name,
+ process_name=process_name,
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+ )
+ await self.send_message(task_msg)
+ self.logger.info(f"Sent task assignment: {task_name}")
+
+ async def send_command(self, server_message: ServerMessage) -> None:
+ """
+ Send command(s) to execute (server-side).
+ Accepts a ServerMessage object directly for backward compatibility.
+
+ :param server_message: ServerMessage with commands to execute
+ """
+ await self.send_message(server_message)
+ actions_count = len(server_message.actions) if server_message.actions else 0
+ self.logger.info(
+ f"Sent {actions_count} command(s) for session {server_message.session_id}"
+ )
+
+ async def send_commands(
+ self,
+ actions: List[Command],
+ session_id: str,
+ response_id: str,
+ status: TaskStatus = TaskStatus.CONTINUE,
+ agent_name: Optional[str] = None,
+ process_name: Optional[str] = None,
+ root_name: Optional[str] = None,
+ task_name: Optional[str] = None,
+ ) -> None:
+ """
+ Send command(s) to execute (server-side).
+ Creates ServerMessage from parameters.
+
+ :param actions: List of commands to execute
+ :param session_id: Session ID
+ :param response_id: Response ID
+ :param status: Task status
+ :param agent_name: Agent name
+ :param process_name: Process name
+ :param root_name: Root name
+ :param task_name: Task name
+ """
+ cmd_msg = ServerMessage(
+ type=ServerMessageType.COMMAND,
+ status=status,
+ actions=actions,
+ session_id=session_id,
+ response_id=response_id,
+ agent_name=agent_name,
+ process_name=process_name,
+ root_name=root_name,
+ task_name=task_name,
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+ )
+ await self.send_message(cmd_msg)
+ self.logger.info(f"Sent {len(actions)} command(s) for session {session_id}")
+
+ async def send_command_results(
+ self,
+ action_results: List[Result],
+ session_id: str,
+ client_id: str,
+ prev_response_id: str,
+ status: TaskStatus = TaskStatus.CONTINUE,
+ ) -> None:
+ """
+ Send command execution results (client-side).
+
+ :param action_results: Results of executed commands
+ :param session_id: Session ID
+ :param client_id: Client ID
+ :param prev_response_id: Previous response ID
+ :param status: Task status
+ """
+ result_msg = ClientMessage(
+ type=ClientMessageType.COMMAND_RESULTS,
+ action_results=action_results,
+ session_id=session_id,
+ client_id=client_id,
+ prev_response_id=prev_response_id,
+ status=status,
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+ request_id=str(uuid4()),
+ )
+ await self.send_message(result_msg)
+ self.logger.info(
+ f"Sent {len(action_results)} result(s) for session {session_id}"
+ )
+
+ async def send_task_result(
+ self,
+ session_id: str,
+ prev_response_id: str,
+ action_results: List[Result],
+ status: TaskStatus = TaskStatus.CONTINUE,
+ client_id: Optional[str] = None,
+ ) -> None:
+ """
+ Convenience method to send task results (client-side).
+ Thin wrapper around send_command_results with a placeholder client_id fallback.
+
+ :param session_id: Session ID
+ :param prev_response_id: Previous response ID
+ :param action_results: Results of executed commands
+ :param status: Task status
+ :param client_id: Client ID (optional; falls back to "unknown_client" when omitted)
+ """
+ # Fall back to a placeholder when no client_id is supplied
+ if not client_id:
+ client_id = "unknown_client" # Fallback
+
+ await self.send_command_results(
+ action_results=action_results,
+ session_id=session_id,
+ client_id=client_id,
+ prev_response_id=prev_response_id,
+ status=status,
+ )
+
+ async def send_task_end(
+ self,
+ session_id: str,
+ status: TaskStatus,
+ result: Optional[Any] = None,
+ error: Optional[str] = None,
+ response_id: Optional[str] = None,
+ ) -> None:
+ """
+ Send task completion notification (server-side).
+
+ :param session_id: Session ID
+ :param status: Final task status (COMPLETED or FAILED)
+ :param result: Task result if successful
+ :param error: Error message if failed
+ :param response_id: Response ID
+ """
+ task_end_msg = ServerMessage(
+ type=ServerMessageType.TASK_END,
+ status=status,
+ session_id=session_id,
+ result=result,
+ error=error,
+ response_id=response_id or str(uuid4()),
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+ )
+ await self.send_message(task_end_msg)
+ self.logger.info(f"Sent task end for session {session_id}, status: {status}")
+
+ async def send_task_end_ack(
+ self,
+ session_id: str,
+ client_id: str,
+ status: TaskStatus,
+ error: Optional[str] = None,
+ ) -> None:
+ """
+ Send task end acknowledgment (client-side).
+
+ :param session_id: Session ID
+ :param client_id: Client ID
+ :param status: Task status
+ :param error: Error message if failed
+ """
+ task_end_msg = ClientMessage(
+ type=ClientMessageType.TASK_END,
+ session_id=session_id,
+ client_id=client_id,
+ status=status,
+ error=error,
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+ )
+ await self.send_message(task_end_msg)
+ self.logger.info(f"Sent task end ack for session {session_id}")
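Every sender in task_execution.py stamps its message with a timezone-aware UTC timestamp, and the client-side request senders also attach a fresh UUID as `request_id`. The sketch below shows that bookkeeping in isolation. It uses a plain dict in place of `ClientMessage`/`ServerMessage`, whose definitions are not part of this hunk, and `make_envelope` is a hypothetical helper name, not something the PR defines.

```python
import datetime
import uuid
from typing import Any, Dict


def make_envelope(msg_type: str, session_id: str, **fields: Any) -> Dict[str, Any]:
    """Build a plain-dict message carrying the same bookkeeping fields."""
    envelope: Dict[str, Any] = {
        "type": msg_type,
        "session_id": session_id,
        # Timezone-aware UTC timestamp in ISO 8601 form
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        # Fresh identifier so a reply can be correlated with this message
        "request_id": str(uuid.uuid4()),
    }
    # Caller-supplied fields (task_name, status, ...) are merged last
    envelope.update(fields)
    return envelope
```

Because the timestamp carries an explicit UTC offset, a receiver in another timezone can parse it with `datetime.datetime.fromisoformat` without guessing the sender's local time.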
diff --git a/aip/resilience/__init__.py b/aip/resilience/__init__.py
new file mode 100644
index 000000000..9eb198298
--- /dev/null
+++ b/aip/resilience/__init__.py
@@ -0,0 +1,20 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+AIP Resilience Mechanisms
+
+Provides connection resilience, reconnection strategies, heartbeat management,
+and timeout handling for reliable agent communication.
+"""
+
+from .heartbeat_manager import HeartbeatManager
+from .reconnection import ReconnectionPolicy, ReconnectionStrategy
+from .timeout import TimeoutManager
+
+__all__ = [
+ "ReconnectionStrategy",
+ "ReconnectionPolicy",
+ "HeartbeatManager",
+ "TimeoutManager",
+]
diff --git a/aip/resilience/heartbeat_manager.py b/aip/resilience/heartbeat_manager.py
new file mode 100644
index 000000000..d902bd14f
--- /dev/null
+++ b/aip/resilience/heartbeat_manager.py
@@ -0,0 +1,146 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Heartbeat Manager
+
+Manages periodic heartbeat messages to monitor connection health
+and detect disconnections early.
+"""
+
+import asyncio
+import logging
+from typing import Dict, Optional
+
+from aip.protocol.heartbeat import HeartbeatProtocol
+
+
+class HeartbeatManager:
+ """
+ Manages heartbeat for multiple clients/devices.
+
+ Features:
+ - Per-client heartbeat tracking
+ - Configurable intervals
+ - Automatic heartbeat sending
+ - Connection health monitoring
+ """
+
+ def __init__(
+ self,
+ protocol: HeartbeatProtocol,
+ default_interval: float = 30.0,
+ ):
+ """
+ Initialize heartbeat manager.
+
+ :param protocol: Heartbeat protocol instance
+ :param default_interval: Default interval between heartbeats (seconds)
+ """
+ self.protocol = protocol
+ self.default_interval = default_interval
+ self.logger = logging.getLogger(f"{__name__}.HeartbeatManager")
+
+ # Track heartbeat tasks per client
+ self._heartbeat_tasks: Dict[str, asyncio.Task] = {}
+ self._intervals: Dict[str, float] = {}
+
+ async def start_heartbeat(
+ self, client_id: str, interval: Optional[float] = None
+ ) -> None:
+ """
+ Start heartbeat for a client.
+
+ :param client_id: Client ID
+ :param interval: Heartbeat interval (default: use default_interval)
+ """
+ if client_id in self._heartbeat_tasks:
+ self.logger.warning(
+ f"Heartbeat already running for {client_id}, stopping existing"
+ )
+ await self.stop_heartbeat(client_id)
+
+ interval = interval or self.default_interval
+ self._intervals[client_id] = interval
+
+ # Create heartbeat task
+ task = asyncio.create_task(self._heartbeat_loop(client_id, interval))
+ self._heartbeat_tasks[client_id] = task
+
+ self.logger.info(f"Started heartbeat for {client_id} (interval: {interval}s)")
+
+ async def stop_heartbeat(self, client_id: str) -> None:
+ """
+ Stop heartbeat for a client.
+
+ :param client_id: Client ID
+ """
+ task = self._heartbeat_tasks.pop(client_id, None)
+ if task:
+ task.cancel()
+ try:
+ await task
+ except asyncio.CancelledError:
+ pass
+ self._intervals.pop(client_id, None)
+ self.logger.info(f"Stopped heartbeat for {client_id}")
+
+ async def stop_all(self) -> None:
+ """Stop all heartbeats."""
+ client_ids = list(self._heartbeat_tasks.keys())
+ for client_id in client_ids:
+ await self.stop_heartbeat(client_id)
+ self.logger.info("Stopped all heartbeats")
+
+ def is_running(self, client_id: str) -> bool:
+ """
+ Check if heartbeat is running for a client.
+
+ :param client_id: Client ID
+ :return: True if running, False otherwise
+ """
+ task = self._heartbeat_tasks.get(client_id)
+ return task is not None and not task.done()
+
+ def get_interval(self, client_id: str) -> Optional[float]:
+ """
+ Get heartbeat interval for a client.
+
+ :param client_id: Client ID
+ :return: Interval in seconds, or None if not running
+ """
+ return self._intervals.get(client_id)
+
+ async def _heartbeat_loop(self, client_id: str, interval: float) -> None:
+ """
+ Internal heartbeat loop for a client.
+
+ :param client_id: Client ID
+ :param interval: Heartbeat interval (seconds)
+ """
+ try:
+ while True:
+ await asyncio.sleep(interval)
+
+ # Check if protocol is still connected
+ if self.protocol.is_connected():
+ try:
+ await self.protocol.send_heartbeat(client_id)
+ self.logger.debug(f"Sent heartbeat for {client_id}")
+ except Exception as e:
+ self.logger.error(
+ f"Error sending heartbeat for {client_id}: {e}"
+ )
+ # Keep looping; the connection manager handles disconnection
+ else:
+ self.logger.warning(
+ f"Protocol not connected for {client_id}, skipping heartbeat"
+ )
+
+ except asyncio.CancelledError:
+ self.logger.debug(f"Heartbeat loop cancelled for {client_id}")
+ except Exception as e:
+ self.logger.error(
+ f"Unexpected error in heartbeat loop for {client_id}: {e}",
+ exc_info=True,
+ )
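The start/stop lifecycle in `HeartbeatManager` relies on a common asyncio pattern: run the loop as a task, then cancel it and await it so the cancellation is fully processed before continuing. The following is a minimal standalone sketch of that pattern. `FakeHeartbeatProtocol` is a stand-in invented for illustration; the real `HeartbeatProtocol` is not shown in this hunk.

```python
import asyncio
from typing import List


class FakeHeartbeatProtocol:
    """Hypothetical stand-in that records heartbeats instead of sending them."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def is_connected(self) -> bool:
        return True

    async def send_heartbeat(self, client_id: str) -> None:
        self.sent.append(client_id)


async def heartbeat_loop(protocol: FakeHeartbeatProtocol, client_id: str, interval: float) -> None:
    # Sleep first, then send, mirroring the order used in _heartbeat_loop
    while True:
        await asyncio.sleep(interval)
        if protocol.is_connected():
            await protocol.send_heartbeat(client_id)


async def run_demo() -> int:
    protocol = FakeHeartbeatProtocol()
    task = asyncio.create_task(heartbeat_loop(protocol, "device-1", 0.01))
    await asyncio.sleep(0.1)  # let a few heartbeats go out
    task.cancel()
    try:
        await task  # wait until the cancellation has actually been delivered
    except asyncio.CancelledError:
        pass
    return len(protocol.sent)
```

Awaiting the cancelled task matters: without it, the loop could still be mid-send when the caller starts a replacement heartbeat for the same client.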
diff --git a/aip/resilience/reconnection.py b/aip/resilience/reconnection.py
new file mode 100644
index 000000000..8cd4e9fa1
--- /dev/null
+++ b/aip/resilience/reconnection.py
@@ -0,0 +1,218 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Reconnection Strategy
+
+Implements automatic reconnection with exponential backoff for handling
+transient network failures and connection interruptions.
+"""
+
+import asyncio
+import logging
+from enum import Enum
+from typing import TYPE_CHECKING, Awaitable, Callable, Optional
+
+if TYPE_CHECKING:
+ from aip.endpoints.base import AIPEndpoint
+
+
+class ReconnectionPolicy(str, Enum):
+ """Reconnection policies."""
+
+ EXPONENTIAL_BACKOFF = "exponential_backoff"
+ LINEAR_BACKOFF = "linear_backoff"
+ IMMEDIATE = "immediate"
+ NONE = "none"
+
+
+class ReconnectionStrategy:
+ """
+ Manages automatic reconnection for AIP endpoints.
+
+ Features:
+ - Exponential backoff
+ - Configurable retry limits
+ - Connection state callbacks
+ - Task cancellation on disconnect
+ """
+
+ def __init__(
+ self,
+ max_retries: int = 5,
+ initial_backoff: float = 1.0,
+ max_backoff: float = 60.0,
+ backoff_multiplier: float = 2.0,
+ policy: ReconnectionPolicy = ReconnectionPolicy.EXPONENTIAL_BACKOFF,
+ ):
+ """
+ Initialize reconnection strategy.
+
+ :param max_retries: Maximum number of reconnection attempts
+ :param initial_backoff: Initial backoff time (seconds)
+ :param max_backoff: Maximum backoff time (seconds)
+ :param backoff_multiplier: Multiplier for exponential backoff
+ :param policy: Reconnection policy
+ """
+ self.max_retries = max_retries
+ self.initial_backoff = initial_backoff
+ self.max_backoff = max_backoff
+ self.backoff_multiplier = backoff_multiplier
+ self.policy = policy
+ self.logger = logging.getLogger(f"{__name__}.ReconnectionStrategy")
+
+ self._retry_count = 0
+ self._reconnection_task: Optional[asyncio.Task] = None
+
+ async def handle_disconnection(
+ self,
+ endpoint: "AIPEndpoint",
+ device_id: str,
+ on_reconnect: Optional[Callable[[], Awaitable[None]]] = None,
+ ) -> None:
+ """
+ Handle device disconnection with automatic reconnection.
+
+ Workflow:
+ 1. Cancel all pending tasks for the device
+ 2. Notify upper layers of disconnection
+ 3. Attempt reconnection with backoff
+ 4. Call on_reconnect callback if successful
+
+ :param endpoint: AIP endpoint managing the connection
+ :param device_id: Device that disconnected
+ :param on_reconnect: Optional callback after successful reconnection
+ """
+ self.logger.warning(f"Device {device_id} disconnected, starting recovery")
+
+ # Step 1: Cancel pending tasks
+ await self._cancel_pending_tasks(endpoint, device_id)
+
+ # Step 2: Notify upper layers
+ await self._notify_disconnection(endpoint, device_id)
+
+ # Step 3: Attempt reconnection
+ if self.policy != ReconnectionPolicy.NONE:
+ reconnected = await self.attempt_reconnection(endpoint, device_id)
+
+ # Step 4: Call reconnection callback
+ if reconnected and on_reconnect:
+ try:
+ await on_reconnect()
+ self.logger.info(f"Reconnection callback executed for {device_id}")
+ except Exception as e:
+ self.logger.error(
+ f"Error in reconnection callback for {device_id}: {e}"
+ )
+
+ async def attempt_reconnection(
+ self, endpoint: "AIPEndpoint", device_id: str
+ ) -> bool:
+ """
+ Attempt to reconnect to a device.
+
+ :param endpoint: AIP endpoint managing the connection
+ :param device_id: Device to reconnect to
+ :return: True if reconnection successful, False otherwise
+ """
+ self._retry_count = 0
+
+ while self._retry_count < self.max_retries:
+ # Calculate backoff time
+ backoff_time = self._calculate_backoff()
+
+ self.logger.info(
+ f"Reconnection attempt {self._retry_count + 1}/{self.max_retries} "
+ f"for {device_id} in {backoff_time:.1f}s"
+ )
+
+ # Wait before attempting reconnection
+ await asyncio.sleep(backoff_time)
+
+ # Try to reconnect
+ try:
+ success = await endpoint.reconnect_device(device_id)
+ if success:
+ self.logger.info(
+ f"Successfully reconnected to {device_id} "
+ f"after {self._retry_count + 1} attempt(s)"
+ )
+ self._retry_count = 0
+ return True
+ else:
+ self.logger.warning(
+ f"Reconnection attempt {self._retry_count + 1} failed for {device_id}"
+ )
+ except Exception as e:
+ self.logger.error(
+ f"Error during reconnection attempt {self._retry_count + 1} for {device_id}: {e}"
+ )
+
+ self._retry_count += 1
+
+ self.logger.error(
+ f"Max reconnection attempts ({self.max_retries}) reached for {device_id}"
+ )
+ return False
+
+ async def _cancel_pending_tasks(
+ self, endpoint: "AIPEndpoint", device_id: str
+ ) -> None:
+ """
+ Cancel all pending tasks for a disconnected device.
+
+ :param endpoint: AIP endpoint
+ :param device_id: Disconnected device ID
+ """
+ try:
+ if hasattr(endpoint, "cancel_device_tasks"):
+ await endpoint.cancel_device_tasks(
+ device_id, reason="device_disconnected"
+ )
+ self.logger.info(f"Cancelled pending tasks for {device_id}")
+ except Exception as e:
+ self.logger.error(
+ f"Error cancelling tasks for {device_id}: {e}", exc_info=True
+ )
+
+ async def _notify_disconnection(
+ self, endpoint: "AIPEndpoint", device_id: str
+ ) -> None:
+ """
+ Notify upper layers of device disconnection.
+
+ :param endpoint: AIP endpoint
+ :param device_id: Disconnected device ID
+ """
+ try:
+ if hasattr(endpoint, "on_device_disconnected"):
+ await endpoint.on_device_disconnected(device_id)
+ self.logger.info(f"Notified disconnection of {device_id}")
+ except Exception as e:
+ self.logger.error(
+ f"Error notifying disconnection for {device_id}: {e}", exc_info=True
+ )
+
+ def _calculate_backoff(self) -> float:
+ """
+ Calculate backoff time based on policy.
+
+ :return: Backoff time in seconds
+ """
+ if self.policy == ReconnectionPolicy.IMMEDIATE:
+ return 0.0
+ elif self.policy == ReconnectionPolicy.LINEAR_BACKOFF:
+ backoff = self.initial_backoff * (self._retry_count + 1)
+ elif self.policy == ReconnectionPolicy.EXPONENTIAL_BACKOFF:
+ backoff = self.initial_backoff * (
+ self.backoff_multiplier**self._retry_count
+ )
+ else:
+ return 0.0
+
+ # Cap at max_backoff
+ return min(backoff, self.max_backoff)
+
+ def reset(self) -> None:
+ """Reset retry counter."""
+ self._retry_count = 0
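To make the backoff policies concrete, the sketch below reproduces the arithmetic of `_calculate_backoff` as a standalone function and lists the delay before each attempt. The function name `backoff_schedule` and the use of plain strings for the policy are assumptions made for illustration; the defaults match the constructor defaults above.

```python
from typing import List


def backoff_schedule(
    policy: str,
    max_retries: int = 5,
    initial_backoff: float = 1.0,
    max_backoff: float = 60.0,
    backoff_multiplier: float = 2.0,
) -> List[float]:
    """Return the delay (seconds) applied before each reconnection attempt."""
    delays: List[float] = []
    for retry_count in range(max_retries):
        if policy == "exponential_backoff":
            delay = initial_backoff * backoff_multiplier ** retry_count
        elif policy == "linear_backoff":
            delay = initial_backoff * (retry_count + 1)
        else:
            # "immediate" (and "none") use no delay
            delay = 0.0
        # Cap every delay at max_backoff
        delays.append(min(delay, max_backoff))
    return delays


print(backoff_schedule("exponential_backoff"))  # [1.0, 2.0, 4.0, 8.0, 16.0]
print(backoff_schedule("linear_backoff"))       # [1.0, 2.0, 3.0, 4.0, 5.0]
```

With the defaults, a device that never comes back is given up on after roughly 31 seconds of cumulative waiting (1 + 2 + 4 + 8 + 16), not counting the time spent in each `reconnect_device` call. The `max_backoff` cap only takes effect from the seventh attempt onward (2^6 = 64 > 60).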
diff --git a/aip/resilience/timeout.py b/aip/resilience/timeout.py
new file mode 100644
index 000000000..147eb6e2c
--- /dev/null
+++ b/aip/resilience/timeout.py
@@ -0,0 +1,85 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Timeout Manager
+
+Handles timeout enforcement for asynchronous operations in AIP.
+"""
+
+import asyncio
+import logging
+from typing import Awaitable, Optional, TypeVar
+
+T = TypeVar("T")
+
+
+class TimeoutManager:
+ """
+ Manages timeouts for asynchronous operations.
+
+ Features:
+ - Configurable default timeout
+ - Per-operation timeout override
+ - Timeout exception wrapping
+ - Detailed logging
+ """
+
+ def __init__(self, default_timeout: float = 120.0):
+ """
+ Initialize timeout manager.
+
+ :param default_timeout: Default timeout for operations (seconds)
+ """
+ self.default_timeout = default_timeout
+ self.logger = logging.getLogger(f"{__name__}.TimeoutManager")
+
+ async def with_timeout(
+ self,
+ coro: Awaitable[T],
+ timeout: Optional[float] = None,
+ operation_name: str = "operation",
+ ) -> T:
+ """
+ Execute a coroutine with timeout.
+
+ :param coro: Coroutine to execute
+ :param timeout: Timeout in seconds (default: use default_timeout)
+ :param operation_name: Name of operation for logging
+ :return: Result of coroutine
+ :raises: asyncio.TimeoutError if operation times out
+ """
+ timeout = self.default_timeout if timeout is None else timeout
+
+ try:
+ self.logger.debug(f"Starting {operation_name} with timeout {timeout}s")
+ result = await asyncio.wait_for(coro, timeout=timeout)
+ self.logger.debug(f"Completed {operation_name}")
+ return result
+
+ except asyncio.TimeoutError:
+ self.logger.error(f"Timeout ({timeout}s) exceeded for {operation_name}")
+ raise asyncio.TimeoutError(f"{operation_name} timed out after {timeout}s")
+ except Exception as e:
+ self.logger.error(f"Error in {operation_name}: {e}", exc_info=True)
+ raise
+
+ async def with_timeout_or_none(
+ self,
+ coro: Awaitable[T],
+ timeout: Optional[float] = None,
+ operation_name: str = "operation",
+ ) -> Optional[T]:
+ """
+ Execute a coroutine with timeout, returning None on timeout.
+
+ :param coro: Coroutine to execute
+ :param timeout: Timeout in seconds (default: use default_timeout)
+ :param operation_name: Name of operation for logging
+ :return: Result of coroutine or None if timeout
+ """
+ try:
+ return await self.with_timeout(coro, timeout, operation_name)
+ except asyncio.TimeoutError:
+ self.logger.warning(f"{operation_name} timed out, returning None")
+ return None
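The two `TimeoutManager` methods differ only in what happens on expiry: one re-raises, the other returns `None`. A minimal sketch of the second behaviour, built directly on `asyncio.wait_for` and without the logging, could look like this. `slow_reply` and `demo` are hypothetical helpers used only to exercise it.

```python
import asyncio
from typing import Awaitable, Optional, Tuple, TypeVar

T = TypeVar("T")


async def with_timeout_or_none(coro: Awaitable[T], timeout: float) -> Optional[T]:
    """Await coro, returning None instead of raising if it exceeds timeout."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout)
    except asyncio.TimeoutError:
        return None


async def slow_reply(delay: float) -> str:
    # Hypothetical operation that takes `delay` seconds to answer
    await asyncio.sleep(delay)
    return "pong"


def demo() -> Tuple[Optional[str], Optional[str]]:
    fast = asyncio.run(with_timeout_or_none(slow_reply(0.01), timeout=1.0))
    slow = asyncio.run(with_timeout_or_none(slow_reply(1.0), timeout=0.05))
    return fast, slow
```

One caveat of the `None`-on-timeout style: it cannot distinguish a timeout from an operation that legitimately returns `None`, so it fits best where the wrapped coroutine always yields a value.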
diff --git a/aip/transport/__init__.py b/aip/transport/__init__.py
new file mode 100644
index 000000000..d72f2aa21
--- /dev/null
+++ b/aip/transport/__init__.py
@@ -0,0 +1,28 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+AIP Transport Layer
+
+Provides transport abstractions for the Agent Interaction Protocol.
+Supports WebSocket and is extensible to other transports (HTTP/3, gRPC, etc.).
+"""
+
+from .adapters import (
+ FastAPIWebSocketAdapter,
+ WebSocketAdapter,
+ WebSocketsLibAdapter,
+ create_adapter,
+)
+from .base import Transport, TransportState
+from .websocket import WebSocketTransport
+
+__all__ = [
+ "Transport",
+ "TransportState",
+ "WebSocketTransport",
+ "WebSocketAdapter",
+ "FastAPIWebSocketAdapter",
+ "WebSocketsLibAdapter",
+ "create_adapter",
+]
diff --git a/aip/transport/adapters.py b/aip/transport/adapters.py
new file mode 100644
index 000000000..350aada10
--- /dev/null
+++ b/aip/transport/adapters.py
@@ -0,0 +1,256 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+WebSocket Adapter Interface
+
+Provides a unified interface for different WebSocket implementations.
+Uses the Adapter pattern to abstract away differences between:
+- FastAPI WebSocket (server-side)
+- websockets library (client-side)
+
+Supports both text and binary frame transmission for efficient file transfer.
+"""
+
+from abc import ABC, abstractmethod
+from typing import Union
+
+from websockets import WebSocketClientProtocol
+
+
+class WebSocketAdapter(ABC):
+ """
+ Abstract adapter for WebSocket operations.
+
+ Provides a consistent interface regardless of the underlying WebSocket implementation.
+ Supports both text frames (for JSON messages) and binary frames (for file transfer).
+ """
+
+ @abstractmethod
+ async def send(self, data: str) -> None:
+ """
+ Send text data through WebSocket.
+
+ :param data: Text data to send
+ :raises: Exception if send fails
+ """
+ pass
+
+ @abstractmethod
+ async def receive(self) -> str:
+ """
+ Receive text data from WebSocket.
+
+ :return: Received text data
+ :raises: Exception if receive fails
+ """
+ pass
+
+ @abstractmethod
+ async def send_bytes(self, data: bytes) -> None:
+ """
+ Send binary data through WebSocket.
+
+ Sends data as a binary WebSocket frame for efficient transmission
+ of images, files, and other binary content.
+
+ :param data: Binary data to send
+ :raises: Exception if send fails
+ """
+ pass
+
+ @abstractmethod
+ async def receive_bytes(self) -> bytes:
+ """
+ Receive binary data from WebSocket.
+
+ Expects a binary WebSocket frame. Raises an error if a text frame is received.
+
+ :return: Received binary data
+ :raises: ValueError if a text frame is received instead of binary
+ :raises: Exception if receive fails
+ """
+ pass
+
+ @abstractmethod
+ async def receive_auto(self) -> Union[str, bytes]:
+ """
+ Receive data and auto-detect frame type (text or binary).
+
+ This method automatically detects whether the received WebSocket frame
+ is text or binary and returns the appropriate type.
+
+ :return: Received data (str for text frames, bytes for binary frames)
+ :raises: Exception if receive fails
+ """
+ pass
+
+ @abstractmethod
+ async def close(self) -> None:
+ """
+ Close the WebSocket connection.
+ """
+ pass
+
+ @abstractmethod
+ def is_open(self) -> bool:
+ """
+ Check if the WebSocket connection is open.
+
+ :return: True if connection is open, False otherwise
+ """
+ pass
+
+
+class FastAPIWebSocketAdapter(WebSocketAdapter):
+ """
+ Adapter for FastAPI/Starlette WebSocket (server-side).
+
+ Used when the server accepts WebSocket connections from clients.
+ Supports both text and binary frame transmission.
+ """
+
+ def __init__(self, websocket):
+ """
+ Initialize FastAPI WebSocket adapter.
+
+ :param websocket: FastAPI WebSocket instance
+ """
+ from fastapi import WebSocket
+
+ self._ws: WebSocket = websocket
+
+ async def send(self, data: str) -> None:
+ """Send text data via FastAPI WebSocket."""
+ await self._ws.send_text(data)
+
+ async def receive(self) -> str:
+ """Receive text data via FastAPI WebSocket."""
+ return await self._ws.receive_text()
+
+ async def send_bytes(self, data: bytes) -> None:
+ """
+ Send binary data via FastAPI WebSocket.
+
+ FastAPI provides native send_bytes() method for binary frames.
+ """
+ await self._ws.send_bytes(data)
+
+ async def receive_bytes(self) -> bytes:
+ """
+ Receive binary data via FastAPI WebSocket.
+
+ FastAPI provides native receive_bytes() method.
+ Raises an error if a text frame is received.
+ """
+ return await self._ws.receive_bytes()
+
+ async def receive_auto(self) -> Union[str, bytes]:
+ """
+ Auto-detect and receive text or binary data.
+
+ Uses FastAPI's receive() to get the raw message and extract
+ the appropriate data type.
+ """
+ message = await self._ws.receive()
+ if message.get("text") is not None:
+ return message["text"]
+ elif message.get("bytes") is not None:
+ return message["bytes"]
+ else:
+ raise ValueError(f"Unknown WebSocket message type: {message}")
+
+ async def close(self) -> None:
+ """Close FastAPI WebSocket connection."""
+ await self._ws.close()
+
+ def is_open(self) -> bool:
+ """Check if FastAPI WebSocket is still connected."""
+ from starlette.websockets import WebSocketState
+
+ return self._ws.client_state == WebSocketState.CONNECTED
+
+
+class WebSocketsLibAdapter(WebSocketAdapter):
+ """
+ Adapter for websockets library (client-side).
+
+ Used when the client connects to a WebSocket server.
+ Supports both text and binary frame transmission.
+ """
+
+ def __init__(self, websocket: WebSocketClientProtocol):
+ """
+ Initialize websockets library adapter.
+
+ :param websocket: websockets library WebSocket instance
+ """
+ self._ws: WebSocketClientProtocol = websocket
+
+ async def send(self, data: str) -> None:
+ """Send text data via websockets library."""
+ await self._ws.send(data)
+
+ async def receive(self) -> str:
+ """Receive data via websockets library (handles both text and bytes)."""
+ received = await self._ws.recv()
+ # websockets library can return either str or bytes
+ if isinstance(received, bytes):
+ return received.decode("utf-8")
+ return received
+
+ async def send_bytes(self, data: bytes) -> None:
+ """
+ Send binary data via websockets library.
+
+ The websockets library automatically detects bytes type and sends
+ as a binary WebSocket frame.
+ """
+ await self._ws.send(data)
+
+ async def receive_bytes(self) -> bytes:
+ """
+ Receive binary data via websockets library.
+
+ Raises ValueError if a text frame is received instead of binary.
+ """
+ received = await self._ws.recv()
+ if isinstance(received, str):
+ raise ValueError(
+ "Expected binary WebSocket frame, but received text frame. "
+ f"Received data: {received[:100]}..."
+ )
+ return received
+
+ async def receive_auto(self) -> Union[str, bytes]:
+ """
+ Auto-detect and receive text or binary data.
+
+ The websockets library's recv() automatically returns the correct type
+ (str for text frames, bytes for binary frames).
+ """
+ return await self._ws.recv()
+
+ async def close(self) -> None:
+ """Close websockets library connection."""
+ await self._ws.close()
+
+ def is_open(self) -> bool:
+ """Check if websockets library connection is still open."""
+ return not self._ws.closed
+
+
+def create_adapter(websocket) -> WebSocketAdapter:
+ """
+ Factory function to create the appropriate WebSocket adapter.
+
+ Auto-detects the WebSocket type and returns the correct adapter.
+
+ :param websocket: Either FastAPI WebSocket or websockets library WebSocket
+ :return: Appropriate adapter instance
+ """
+ # Check if it's a FastAPI WebSocket by looking for server-side attributes
+ if hasattr(websocket, "client_state") or hasattr(websocket, "application_state"):
+ return FastAPIWebSocketAdapter(websocket)
+ else:
+ return WebSocketsLibAdapter(websocket)
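The server-side `receive_auto` works on the raw ASGI message dict. Under the ASGI WebSocket specification, a `websocket.receive` event may carry both `text` and `bytes` keys with one of them set to `None`, and a `websocket.disconnect` event carries neither. The sketch below shows one way to dispatch on such a dict. `extract_payload` is a hypothetical helper, and raising `ConnectionError` on disconnect is a design choice of this sketch, not what the adapter above does.

```python
from typing import Any, Dict, Union


def extract_payload(message: Dict[str, Any]) -> Union[str, bytes]:
    """Return the text or binary payload of an ASGI WebSocket message."""
    if message.get("type") == "websocket.disconnect":
        # Surface a closed connection separately from a malformed message
        raise ConnectionError(f"WebSocket disconnected: code={message.get('code')}")
    text = message.get("text")
    if text is not None:
        return text
    data = message.get("bytes")
    if data is not None:
        return data
    raise ValueError(f"Unknown WebSocket message type: {message}")
```

Checking the values rather than key membership avoids returning `None` from a function annotated as `Union[str, bytes]` when a server sends `{"text": None, "bytes": b"..."}`.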
diff --git a/aip/transport/base.py b/aip/transport/base.py
new file mode 100644
index 000000000..72273611e
--- /dev/null
+++ b/aip/transport/base.py
@@ -0,0 +1,117 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Base Transport Interface
+
+Defines the abstract interface for all AIP transports.
+This allows AIP to work with different underlying communication mechanisms
+while maintaining a consistent protocol layer.
+"""
+
+from abc import ABC, abstractmethod
+from enum import Enum
+
+
+class TransportState(str, Enum):
+ """
+ State of a transport connection.
+
+ DISCONNECTED: Not connected
+ CONNECTING: Connection in progress
+ CONNECTED: Active connection
+ DISCONNECTING: Graceful shutdown in progress
+ ERROR: Transport error occurred
+ """
+
+ DISCONNECTED = "disconnected"
+ CONNECTING = "connecting"
+ CONNECTED = "connected"
+ DISCONNECTING = "disconnecting"
+ ERROR = "error"
+
+
+class Transport(ABC):
+ """
+ Abstract base class for AIP transports.
+
+ A transport handles the low-level sending and receiving of messages
+ between AIP endpoints. It abstracts away the specifics of the
+ underlying communication channel (WebSocket, HTTP, gRPC, etc.).
+
+ Implementations must be:
+ - Asynchronous (use async/await)
+ - Thread-safe for state queries
+ - Resilient to transient errors
+ """
+
+ def __init__(self):
+ """Initialize transport."""
+ self._state: TransportState = TransportState.DISCONNECTED
+
+ @property
+ def state(self) -> TransportState:
+ """Get current transport state."""
+ return self._state
+
+ @property
+ def is_connected(self) -> bool:
+ """Check if transport is connected."""
+ return self._state == TransportState.CONNECTED
+
+ @abstractmethod
+ async def connect(self, url: str, **kwargs) -> None:
+ """
+ Establish connection to the remote endpoint.
+
+ :param url: Target URL/address
+ :param kwargs: Transport-specific connection parameters
+ :raises: ConnectionError if connection fails
+ """
+ pass
+
+ @abstractmethod
+ async def send(self, data: bytes) -> None:
+ """
+ Send data through the transport.
+
+ :param data: Bytes to send
+ :raises: ConnectionError if not connected
+ :raises: IOError if send fails
+ """
+ pass
+
+ @abstractmethod
+ async def receive(self) -> bytes:
+ """
+ Receive data from the transport.
+
+ Blocks until data is available.
+
+ :return: Received bytes
+ :raises: ConnectionError if connection closed
+ :raises: IOError if receive fails
+ """
+ pass
+
+ @abstractmethod
+ async def close(self) -> None:
+ """
+ Close the transport connection.
+
+ Should be idempotent (safe to call multiple times).
+ """
+ pass
+
+ @abstractmethod
+ async def wait_closed(self) -> None:
+ """
+ Wait for transport to fully close.
+
+ Useful for graceful shutdown.
+ """
+ pass
+
+ def __repr__(self) -> str:
+ """String representation of transport."""
+ return f"{self.__class__.__name__}(state={self.state})"
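Reviewer sketch (not part of the patch): one way a concrete transport could satisfy the abstract interface above. `LoopbackTransport` and the `loopback://local` URL are hypothetical names made up for illustration, and the `Transport`/`TransportState` definitions are trimmed stand-ins for `aip/transport/base.py` so the snippet runs on its own.

```python
import asyncio
from abc import ABC, abstractmethod
from enum import Enum


class TransportState(str, Enum):
    # Trimmed stand-in for aip.transport.base.TransportState
    DISCONNECTED = "disconnected"
    CONNECTED = "connected"


class Transport(ABC):
    # Trimmed stand-in for aip.transport.base.Transport (same method names)
    def __init__(self):
        self._state = TransportState.DISCONNECTED

    @property
    def is_connected(self) -> bool:
        return self._state == TransportState.CONNECTED

    @abstractmethod
    async def connect(self, url: str, **kwargs) -> None: ...

    @abstractmethod
    async def send(self, data: bytes) -> None: ...

    @abstractmethod
    async def receive(self) -> bytes: ...

    @abstractmethod
    async def close(self) -> None: ...

    @abstractmethod
    async def wait_closed(self) -> None: ...


class LoopbackTransport(Transport):
    """Hypothetical transport that hands sent data straight back to receive()."""

    def __init__(self):
        super().__init__()
        self._queue: asyncio.Queue = asyncio.Queue()

    async def connect(self, url: str, **kwargs) -> None:
        # Nothing to dial; just flip the state
        self._state = TransportState.CONNECTED

    async def send(self, data: bytes) -> None:
        if not self.is_connected:
            raise ConnectionError("Transport not connected")
        await self._queue.put(data)

    async def receive(self) -> bytes:
        if not self.is_connected:
            raise ConnectionError("Transport not connected")
        return await self._queue.get()

    async def close(self) -> None:
        # Idempotent: calling close() twice leaves the state unchanged
        self._state = TransportState.DISCONNECTED

    async def wait_closed(self) -> None:
        return None


async def demo() -> bytes:
    transport = LoopbackTransport()
    await transport.connect("loopback://local")
    await transport.send(b"ping")
    reply = await transport.receive()
    await transport.close()
    await transport.close()  # safe to repeat
    return reply
```

Running `asyncio.run(demo())` exercises the connect, send, receive, and double-close path that the interface docstrings describe.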
diff --git a/aip/transport/websocket.py b/aip/transport/websocket.py
new file mode 100644
index 000000000..42f048667
--- /dev/null
+++ b/aip/transport/websocket.py
@@ -0,0 +1,420 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+WebSocket Transport Implementation
+
+Implements the Transport interface using WebSockets.
+Provides reliable, bidirectional, full-duplex communication over a single TCP connection.
+Supports both text frames (for JSON messages) and binary frames (for efficient file transfer).
+"""
+
+import asyncio
+import logging
+from typing import Optional, Union
+
+import websockets
+from websockets import WebSocketClientProtocol
+from websockets.exceptions import ConnectionClosed, WebSocketException
+
+from .adapters import WebSocketAdapter, create_adapter
+from .base import Transport, TransportState
+
+
+class WebSocketTransport(Transport):
+ """
+ WebSocket-based transport for AIP.
+
+ Features:
+ - Automatic ping/pong keepalive
+ - Configurable timeouts
+ - Large message support (up to 100MB by default)
+ - Graceful connection shutdown
+ - Text and binary frame support for efficient data transfer
+
+ Usage:
+ # Text messages (JSON)
+ transport = WebSocketTransport(ping_interval=30, ping_timeout=180)
+ await transport.connect("ws://localhost:8000/ws")
+ await transport.send(b"Hello")
+ data = await transport.receive()
+
+ # Binary data (files, images)
+ await transport.send_binary(image_bytes)
+ binary_data = await transport.receive_binary()
+
+ # Auto-detect frame type
+ data = await transport.receive_auto() # Returns str or bytes
+
+ await transport.close()
+ """
+
+ def __init__(
+ self,
+ websocket=None, # Accept existing WebSocket (FastAPI server-side)
+ ping_interval: float = 30.0,
+ ping_timeout: float = 180.0,
+ close_timeout: float = 10.0,
+ max_size: int = 100 * 1024 * 1024, # 100MB
+ ):
+ """
+ Initialize WebSocket transport.
+
+ :param websocket: Optional existing WebSocket connection (for server-side use)
+ :param ping_interval: Interval between ping messages (seconds)
+ :param ping_timeout: Timeout for ping response (seconds)
+ :param close_timeout: Timeout for graceful close (seconds)
+ :param max_size: Maximum message size in bytes
+ """
+ super().__init__()
+ self.ping_interval = ping_interval
+ self.ping_timeout = ping_timeout
+ self.close_timeout = close_timeout
+ self.max_size = max_size
+ self._ws: Optional[WebSocketClientProtocol] = None
+ self._adapter: Optional[WebSocketAdapter] = None
+ self.logger = logging.getLogger(f"{__name__}.WebSocketTransport")
+
+ # If websocket provided (server-side), create adapter and mark as connected
+ if websocket is not None:
+ self._ws = websocket
+ self._adapter = create_adapter(websocket)
+ self._state = TransportState.CONNECTED
+ adapter_type = type(self._adapter).__name__
+ self.logger.info(
+ f"WebSocket transport initialized with existing connection ({adapter_type})"
+ )
+
+ async def connect(self, url: str, **kwargs) -> None:
+ """
+ Connect to WebSocket server.
+
+ :param url: WebSocket URL (ws:// or wss://)
+ :param kwargs: Additional parameters passed to websockets.connect()
+ :raises: ConnectionError if connection fails
+ """
+ if self._state == TransportState.CONNECTED:
+ self.logger.warning("Already connected, disconnecting first")
+ await self.close()
+
+ try:
+ self._state = TransportState.CONNECTING
+ self.logger.info(f"Connecting to {url}")
+
+ # Merge user kwargs with defaults
+ connect_params = {
+ "ping_interval": self.ping_interval,
+ "ping_timeout": self.ping_timeout,
+ "close_timeout": self.close_timeout,
+ "max_size": self.max_size,
+ }
+ connect_params.update(kwargs)
+
+ self._ws = await websockets.connect(url, **connect_params)
+ self._adapter = create_adapter(self._ws)
+ self._state = TransportState.CONNECTED
+ self.logger.info(f"Connected to {url}")
+
+ except WebSocketException as e:
+ self._state = TransportState.ERROR
+ self.logger.error(f"WebSocket error during connection: {e}")
+ raise ConnectionError(f"Failed to connect to {url}: {e}") from e
+ except OSError as e:
+ self._state = TransportState.ERROR
+ self.logger.error(f"Network error during connection: {e}")
+ raise ConnectionError(f"Network error connecting to {url}: {e}") from e
+ except Exception as e:
+ self._state = TransportState.ERROR
+ self.logger.error(f"Unexpected error during connection: {e}")
+ raise ConnectionError(f"Unexpected error connecting to {url}: {e}") from e
+
+ async def send(self, data: bytes) -> None:
+ """
+ Send data through WebSocket.
+
+ :param data: Bytes to send
+ :raises: ConnectionError if not connected
+ :raises: IOError if send fails
+ """
+ if not self.is_connected or self._adapter is None:
+ raise ConnectionError("Transport not connected")
+
+ # Check if WebSocket is still open using adapter
+ if not self._adapter.is_open():
+ self._state = TransportState.DISCONNECTED
+ raise ConnectionError("WebSocket connection is closed")
+
+ try:
+ # Convert bytes to text for consistent transport (JSON messages are text-based)
+ text_data = data.decode("utf-8") if isinstance(data, bytes) else data
+
+ adapter_type = type(self._adapter).__name__
+ self.logger.debug(f"Sending {len(text_data)} chars via {adapter_type}")
+
+ # Use adapter to send (abstracts away FastAPI vs websockets library)
+ await self._adapter.send(text_data)
+
+ self.logger.debug(f"✅ Sent {len(text_data)} chars successfully")
+ except ConnectionClosed as e:
+ self._state = TransportState.DISCONNECTED
+ self.logger.debug(f"Connection closed during send: {e}")
+ raise ConnectionError(f"Connection closed: {e}") from e
+ except (ConnectionError, OSError) as e:
+ self._state = TransportState.ERROR
+ # Check if this is a normal disconnection scenario
+ error_msg = str(e).lower()
+ if "closed" in error_msg or "not connected" in error_msg:
+ self.logger.debug(f"Cannot send (connection closed): {e}")
+ else:
+ self.logger.warning(f"Connection error sending data: {e}")
+ raise IOError(f"Failed to send data: {e}") from e
+ except Exception as e:
+ self._state = TransportState.ERROR
+ self.logger.error(f"Error sending data: {e}")
+ raise IOError(f"Failed to send data: {e}") from e
+
+ async def receive(self) -> bytes:
+ """
+ Receive data from WebSocket.
+
+ Blocks until data is available.
+
+ :return: Received bytes
+ :raises: ConnectionError if connection closed
+ :raises: IOError if receive fails
+ """
+ if not self.is_connected or self._adapter is None:
+ raise ConnectionError("Transport not connected")
+
+ try:
+ adapter_type = type(self._adapter).__name__
+ self.logger.debug(f"🔍 Attempting to receive data via {adapter_type}...")
+
+ # Use adapter to receive (abstracts away FastAPI vs websockets library)
+ text_data = await self._adapter.receive()
+ data = text_data.encode("utf-8")
+
+ self.logger.debug(f"✅ Received {len(data)} bytes successfully")
+ return data
+ except ConnectionClosed as e:
+ self._state = TransportState.DISCONNECTED
+ self.logger.debug(f"Connection closed during receive: {e}")
+ raise ConnectionError(f"Connection closed: {e}") from e
+ except (ConnectionError, OSError) as e:
+ self._state = TransportState.ERROR
+ # Check if this is a normal disconnection scenario
+ error_msg = str(e).lower()
+ if "closed" in error_msg or "not connected" in error_msg:
+ self.logger.debug(f"Cannot receive (connection closed): {e}")
+ else:
+ self.logger.warning(f"Connection error receiving data: {e}")
+ raise IOError(f"Failed to receive data: {e}") from e
+ except Exception as e:
+ self._state = TransportState.ERROR
+ self.logger.error(f"Error receiving data: {e}")
+ raise IOError(f"Failed to receive data: {e}") from e
+
+ async def close(self) -> None:
+ """
+ Close WebSocket connection gracefully.
+
+ Idempotent - safe to call multiple times.
+ """
+ if self._state in (TransportState.DISCONNECTED, TransportState.DISCONNECTING):
+ return
+
+ try:
+ self._state = TransportState.DISCONNECTING
+ if self._adapter is not None:
+ await self._adapter.close()
+ self.logger.info("WebSocket closed")
+ except Exception as e:
+ self.logger.warning(f"Error during close: {e}")
+ finally:
+ self._state = TransportState.DISCONNECTED
+ self._ws = None
+ self._adapter = None
+
+ async def wait_closed(self) -> None:
+ """
+ Wait for WebSocket to fully close.
+
+ Useful for graceful shutdown.
+ """
+ if self._ws is not None and hasattr(self._ws, "wait_closed"):
+ await self._ws.wait_closed()
+ self._state = TransportState.DISCONNECTED
+
+ async def send_binary(self, data: bytes) -> None:
+ """
+ Send binary data through WebSocket as a binary frame.
+
+ This method sends raw binary data (images, files, etc.) without
+ text encoding overhead, providing maximum efficiency for binary transfers.
+
+ :param data: Binary bytes to send
+ :raises: ConnectionError if not connected
+ :raises: IOError if send fails
+
+ Example:
+ # Send an image file
+ with open("screenshot.png", "rb") as f:
+ image_data = f.read()
+ await transport.send_binary(image_data)
+ """
+ if not self.is_connected or self._adapter is None:
+ raise ConnectionError("Transport not connected")
+
+ if not self._adapter.is_open():
+ self._state = TransportState.DISCONNECTED
+ raise ConnectionError("WebSocket connection is closed")
+
+ try:
+ adapter_type = type(self._adapter).__name__
+ self.logger.debug(
+ f"Sending {len(data)} bytes (binary frame) via {adapter_type}"
+ )
+
+ await self._adapter.send_bytes(data)
+
+ self.logger.debug(f"✅ Sent {len(data)} bytes successfully")
+ except ConnectionClosed as e:
+ self._state = TransportState.DISCONNECTED
+ self.logger.debug(f"Connection closed during binary send: {e}")
+ raise ConnectionError(f"Connection closed: {e}") from e
+ except (ConnectionError, OSError) as e:
+ self._state = TransportState.ERROR
+ error_msg = str(e).lower()
+ if "closed" in error_msg or "not connected" in error_msg:
+ self.logger.debug(f"Cannot send binary (connection closed): {e}")
+ else:
+ self.logger.warning(f"Connection error sending binary data: {e}")
+ raise IOError(f"Failed to send binary data: {e}") from e
+ except Exception as e:
+ self._state = TransportState.ERROR
+ self.logger.error(f"Error sending binary data: {e}")
+ raise IOError(f"Failed to send binary data: {e}") from e
+
+ async def receive_binary(self) -> bytes:
+ """
+ Receive binary data from WebSocket as a binary frame.
+
+ This method expects a binary WebSocket frame and returns raw bytes.
+ Raises an error if a text frame is received.
+
+ :return: Received binary bytes
+ :raises: ConnectionError if connection closed
+ :raises: ValueError if a text frame is received instead of binary
+ :raises: IOError if receive fails
+
+ Example:
+ # Receive a binary file
+ file_data = await transport.receive_binary()
+ with open("received_file.bin", "wb") as f:
+ f.write(file_data)
+ """
+ if not self.is_connected or self._adapter is None:
+ raise ConnectionError("Transport not connected")
+
+ try:
+ adapter_type = type(self._adapter).__name__
+ self.logger.debug(
+ f"🔍 Attempting to receive binary data via {adapter_type}..."
+ )
+
+ data = await self._adapter.receive_bytes()
+
+ self.logger.debug(f"✅ Received {len(data)} bytes successfully")
+ return data
+ except ConnectionClosed as e:
+ self._state = TransportState.DISCONNECTED
+ self.logger.debug(f"Connection closed during binary receive: {e}")
+ raise ConnectionError(f"Connection closed: {e}") from e
+ except ValueError as e:
+ # Raised when expecting binary but got text frame
+ self.logger.error(f"Frame type mismatch: {e}")
+ raise
+ except (ConnectionError, OSError) as e:
+ self._state = TransportState.ERROR
+ error_msg = str(e).lower()
+ if "closed" in error_msg or "not connected" in error_msg:
+ self.logger.debug(f"Cannot receive binary (connection closed): {e}")
+ else:
+ self.logger.warning(f"Connection error receiving binary data: {e}")
+ raise IOError(f"Failed to receive binary data: {e}") from e
+ except Exception as e:
+ self._state = TransportState.ERROR
+ self.logger.error(f"Error receiving binary data: {e}")
+ raise IOError(f"Failed to receive binary data: {e}") from e
+
+ async def receive_auto(self) -> Union[bytes, str]:
+ """
+ Receive data and automatically detect frame type (text or binary).
+
+ This method receives a WebSocket frame and returns the appropriate type:
+ - str for text frames (JSON messages)
+ - bytes for binary frames (files, images)
+
+ :return: Received data (str for text frames, bytes for binary frames)
+ :raises: ConnectionError if connection closed
+ :raises: IOError if receive fails
+
+ Example:
+ data = await transport.receive_auto()
+ if isinstance(data, bytes):
+ # Handle binary data
+ print(f"Received {len(data)} bytes")
+ else:
+ # Handle text data
+ message = json.loads(data)
+ """
+ if not self.is_connected or self._adapter is None:
+ raise ConnectionError("Transport not connected")
+
+ try:
+ adapter_type = type(self._adapter).__name__
+ self.logger.debug(
+ f"🔍 Attempting to receive data (auto-detect) via {adapter_type}..."
+ )
+
+ data = await self._adapter.receive_auto()
+
+ if isinstance(data, bytes):
+ self.logger.debug(
+ f"✅ Received {len(data)} bytes (binary frame) successfully"
+ )
+ else:
+ self.logger.debug(
+ f"✅ Received {len(data)} chars (text frame) successfully"
+ )
+
+ return data
+ except ConnectionClosed as e:
+ self._state = TransportState.DISCONNECTED
+ self.logger.debug(f"Connection closed during receive: {e}")
+ raise ConnectionError(f"Connection closed: {e}") from e
+ except (ConnectionError, OSError) as e:
+ self._state = TransportState.ERROR
+ error_msg = str(e).lower()
+ if "closed" in error_msg or "not connected" in error_msg:
+ self.logger.debug(f"Cannot receive (connection closed): {e}")
+ else:
+ self.logger.warning(f"Connection error receiving data: {e}")
+ raise IOError(f"Failed to receive data: {e}") from e
+ except Exception as e:
+ self._state = TransportState.ERROR
+ self.logger.error(f"Error receiving data: {e}")
+ raise IOError(f"Failed to receive data: {e}") from e
+
+ @property
+ def websocket(self) -> Optional[WebSocketClientProtocol]:
+ """
+ Get the underlying WebSocket connection.
+
+ :return: WebSocket connection or None if not connected
+ """
+ return self._ws
+
+ def __repr__(self) -> str:
+ """String representation."""
+ return f"WebSocketTransport(state={self.state}, ping_interval={self.ping_interval})"
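Reviewer sketch (not part of the patch): `receive_auto()` returns `str` for text frames and `bytes` for binary frames, so callers need to branch on the type. The helper below shows one way to do that. The name `handle_frame` and the shape of the returned dictionary are assumptions for illustration only.

```python
import json
from typing import Any, Dict, Union


def handle_frame(data: Union[str, bytes]) -> Dict[str, Any]:
    """Classify a frame as returned by receive_auto() (hypothetical helper)."""
    if isinstance(data, bytes):
        # Binary frame: keep the payload opaque and report only its size
        return {"kind": "binary", "size": len(data)}
    # Text frame: AIP messages are JSON, so parse them
    return {"kind": "text", "message": json.loads(data)}
```

A caller would typically write `result = handle_frame(await transport.receive_auto())` inside its receive loop.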
diff --git a/assets/UFO_paper.pdf b/assets/UFO_paper.pdf
deleted file mode 100644
index f2fa7a991..000000000
Binary files a/assets/UFO_paper.pdf and /dev/null differ
diff --git a/assets/demo_preview.png b/assets/demo_preview.png
new file mode 100644
index 000000000..82e49bb55
Binary files /dev/null and b/assets/demo_preview.png differ
diff --git a/assets/logo3.png b/assets/logo3.png
new file mode 100644
index 000000000..4d6e3a020
Binary files /dev/null and b/assets/logo3.png differ
diff --git a/assets/orchestrator.png b/assets/orchestrator.png
new file mode 100644
index 000000000..14a34bbed
Binary files /dev/null and b/assets/orchestrator.png differ
diff --git a/assets/poster.png b/assets/poster.png
new file mode 100644
index 000000000..92bb74b74
Binary files /dev/null and b/assets/poster.png differ
diff --git a/assets/poster_with_play.png b/assets/poster_with_play.png
new file mode 100644
index 000000000..775ec4057
Binary files /dev/null and b/assets/poster_with_play.png differ
diff --git a/assets/task_constellation.png b/assets/task_constellation.png
new file mode 100644
index 000000000..127d8ceda
Binary files /dev/null and b/assets/task_constellation.png differ
diff --git a/assets/webui.png b/assets/webui.png
new file mode 100644
index 000000000..d2a1d606f
Binary files /dev/null and b/assets/webui.png differ
diff --git a/config/__init__.py b/config/__init__.py
new file mode 100644
index 000000000..ead0c2f10
--- /dev/null
+++ b/config/__init__.py
@@ -0,0 +1,35 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+UFO² Configuration System
+
+Modern, modular configuration system with type safety and backward compatibility.
+"""
+
+from config.config_loader import (
+ ConfigLoader,
+ get_ufo_config,
+ get_galaxy_config,
+ clear_config_cache,
+)
+
+from config.config_schemas import (
+ UFOConfig,
+ GalaxyConfig,
+ AgentConfig,
+ SystemConfig,
+ RAGConfig,
+)
+
+__all__ = [
+ "ConfigLoader",
+ "get_ufo_config",
+ "get_galaxy_config",
+ "clear_config_cache",
+ "UFOConfig",
+ "GalaxyConfig",
+ "AgentConfig",
+ "SystemConfig",
+ "RAGConfig",
+]
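Reviewer sketch (not part of the patch): the loader exported here layers the new `config/ufo/` values over the legacy `ufo/config/` values with a recursive merge. The standalone function below mirrors that merge rule. The sample keys and values are made up for illustration, and the `copy.deepcopy` call is an addition of this sketch so the legacy input is left untouched.

```python
import copy
from typing import Any, Dict


def deep_merge(target: Dict[str, Any], source: Dict[str, Any]) -> None:
    """Merge source into target in place; source wins on conflicts."""
    for key, value in source.items():
        if (
            key in target
            and isinstance(target[key], dict)
            and isinstance(value, dict)
        ):
            # Both sides are mappings: merge key by key
            deep_merge(target[key], value)
        else:
            target[key] = value


# Illustrative data only; not taken from the repository's YAML files
legacy = {"MAX_STEP": 50, "HOST_AGENT": {"API_TYPE": "openai", "API_MODEL": "legacy-model"}}
new = {"HOST_AGENT": {"API_MODEL": "new-model"}}

merged = copy.deepcopy(legacy)  # start from legacy so it fills missing keys
deep_merge(merged, new)         # new path takes priority
```

After the merge, `merged` keeps `MAX_STEP` and `API_TYPE` from the legacy data while `API_MODEL` comes from the new data.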
diff --git a/config/config_loader.py b/config/config_loader.py
new file mode 100644
index 000000000..6feb04382
--- /dev/null
+++ b/config/config_loader.py
@@ -0,0 +1,634 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Modern Configuration Loader for UFO³ and Galaxy
+
+Professional Software Engineering Design:
+- ✅ Separation of Concerns: Modular YAML files for different config domains
+- ✅ Backward Compatibility: Automatic fallback to legacy paths (ufo/config/)
+- ✅ Migration Support: Built-in migration warnings and tools
+- ✅ Type Safety: Pydantic-style typed configs + dynamic YAML fields
+- ✅ Auto-Discovery: Loads all YAML files automatically
+- ✅ Environment Overrides: dev/test/prod environment support
+- ✅ Priority Chain: New path → Legacy path → Environment variables
+- ✅ Zero Breaking Changes: Existing code continues to work
+
+Configuration Structure:
+ New (Recommended):
+ config/ufo/ ← UFO² configurations
+ config/galaxy/ ← Galaxy configurations
+
+ Legacy (Auto-detected):
+ ufo/config/ ← Old UFO configs (still supported)
+
+Priority Rules:
+ 1. config/{module}/ ← Highest priority (new path)
+ 2. {module}/config/ ← Fallback (legacy path)
+ 3. Environment vars ← Lowest priority (base values; YAML overrides them)
+
+Usage Examples:
+ # Load config (automatic fallback to legacy)
+ config = get_ufo_config()
+
+ # Type-safe access (IDE autocomplete!)
+ max_step = config.system.max_step
+ api_model = config.app_agent.api_model
+
+ # Dynamic YAML fields (no code changes needed!)
+ new_field = config.NEW_FEATURE
+ setting = config["CUSTOM_SETTING"]
+
+ # Backward compatible
+ old_style = config["MAX_STEP"] # Still works!
+"""
+
+import logging
+import os
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+import yaml
+
+from config.config_schemas import UFOConfig, GalaxyConfig
+
+logger = logging.getLogger(__name__)
+
+
+class DynamicConfig:
+ """
+ Dynamic configuration object that provides both dict-like and attribute access.
+
+ Usage:
+ config = DynamicConfig(data)
+
+ # Dict-style access (backward compatible)
+ value = config["MAX_STEP"]
+
+ # Attribute-style access (modern)
+ value = config.MAX_STEP
+
+ # Nested access
+ value = config.HOST_AGENT.API_MODEL
+ """
+
+ def __init__(self, data: Dict[str, Any], name: str = "config"):
+ """
+ Initialize DynamicConfig.
+
+ :param data: Configuration data dictionary
+ :param name: Name of this configuration (for debugging)
+ """
+ self._data = data
+ self._name = name
+ self._nested_configs = {}
+
+ # Pre-create nested configs for dict values
+ for key, value in data.items():
+ if isinstance(value, dict):
+ self._nested_configs[key] = DynamicConfig(value, name=key)
+
+ def __getattr__(self, name: str) -> Any:
+ """Attribute-style access: config.MAX_STEP"""
+ if name.startswith("_"):
+ return object.__getattribute__(self, name)
+
+ # Check if we have a pre-created nested config
+ if name in self._nested_configs:
+ return self._nested_configs[name]
+
+ # Return value from data
+ if name in self._data:
+ value = self._data[name]
+ if isinstance(value, dict):
+ # Create nested config on-the-fly
+ nested = DynamicConfig(value, name=name)
+ self._nested_configs[name] = nested
+ return nested
+ return value
+
+ raise AttributeError(f"'{self._name}' configuration has no attribute '{name}'")
+
+ def __getitem__(self, key: str) -> Any:
+ """Dict-style access: config["MAX_STEP"]"""
+ if key in self._nested_configs:
+ return self._nested_configs[key]
+ return self._data[key]
+
+ def __contains__(self, key: str) -> bool:
+ """Support 'in' operator"""
+ return key in self._data
+
+ def get(self, key: str, default: Any = None) -> Any:
+ """Dict-style get with default"""
+ if key in self._nested_configs:
+ return self._nested_configs[key]
+ return self._data.get(key, default)
+
+ def keys(self) -> List[str]:
+ """Get all keys"""
+ return list(self._data.keys())
+
+ def items(self):
+ """Get all items"""
+ return self._data.items()
+
+ def values(self):
+ """Get all values"""
+ return self._data.values()
+
+ def to_dict(self) -> Dict[str, Any]:
+ """Convert to plain dictionary"""
+ return self._data.copy()
+
+ def __repr__(self) -> str:
+ return f"DynamicConfig({self._name})"
+
+ def __str__(self) -> str:
+ return f"DynamicConfig({self._name}): {len(self._data)} keys"
+
+
+class ConfigLoader:
+ """
+ Modern configuration loader with backward compatibility.
+
+ Features:
+ - Automatic discovery of YAML files in config directories
+ - Fallback to legacy paths for backward compatibility
+ - Clear migration warnings to guide users
+ - Deep merging of multiple YAML files
+ - Environment-specific overrides (dev/test/prod)
+
+ Priority Chain (High → Low):
+ 1. config/{module}/*.yaml ← New path (highest priority)
+ 2. {module}/config/*.yaml ← Legacy path (fallback)
+ Within each path, *_{env}.yaml overrides (dev/test/prod) are merged over the base files.
+
+ When both new and legacy paths exist:
+ - New path takes priority
+ - Legacy values fill in missing keys
+ - Clear warning shown to user
+ """
+
+ _instance: Optional["ConfigLoader"] = None
+
+ # Path mappings: new_path → legacy_path
+ LEGACY_PATH_MAP = {
+ "config/ufo": "ufo/config",
+ "config/galaxy": None, # Galaxy has no legacy path
+ }
+
+ def __init__(self, base_path: str = "config"):
+ """
+ Initialize ConfigLoader.
+
+ :param base_path: Base path to configuration directory (default: "config")
+ """
+ self.base_path = Path(base_path)
+ self._cache: Dict[str, Any] = {}
+ self._env = os.getenv("UFO_ENV", "production")
+ self._warnings_shown: set = set() # Track shown warnings
+
+ @classmethod
+ def get_instance(cls, base_path: str = "config") -> "ConfigLoader":
+ """
+ Get or create ConfigLoader singleton.
+
+ :param base_path: Base path to configuration directory
+ :return: ConfigLoader instance
+ """
+ if cls._instance is None:
+ cls._instance = cls(base_path)
+ return cls._instance
+
+ @classmethod
+ def reset(cls) -> None:
+ """Reset singleton instance (useful for testing)"""
+ cls._instance = None
+
+ def _load_yaml(self, path: Path) -> Optional[Dict[str, Any]]:
+ """
+ Load YAML file safely with caching.
+
+ :param path: Path to YAML file
+ :return: Parsed YAML data, or None if the file is missing or cannot be parsed
+ """
+ # Check cache first
+ cache_key = str(path)
+ if cache_key in self._cache:
+ return self._cache[cache_key]
+
+ # Load from file
+ if not path.exists():
+ return None
+
+ try:
+ with open(path, "r", encoding="utf-8") as f:
+ data = yaml.safe_load(f) or {}
+ self._cache[cache_key] = data
+ return data
+ except Exception as e:
+ logger.warning(f"Error loading {path}: {e}")
+ return None
+
+ def _deep_merge(self, target: Dict[str, Any], source: Dict[str, Any]) -> None:
+ """
+ Deep merge source dictionary into target dictionary.
+
+ Source values override target values.
+ Nested dictionaries are merged recursively.
+
+ :param target: Target dictionary to update
+ :param source: Source dictionary
+ """
+ for key, value in source.items():
+ if (
+ key in target
+ and isinstance(target[key], dict)
+ and isinstance(value, dict)
+ ):
+ self._deep_merge(target[key], value)
+ else:
+ target[key] = value
+
+ def _discover_yaml_files(self, directory: Path) -> List[Path]:
+ """
+ Discover all YAML files in a directory.
+
+ Excludes environment-specific files (*_dev.yaml, *_test.yaml, *_prod.yaml),
+ which are loaded separately based on UFO_ENV.
+
+ :param directory: Directory to search
+ :return: List of YAML file paths (sorted for consistent loading)
+ """
+ if not directory.exists():
+ return []
+
+ yaml_files = []
+ for file in directory.glob("*.yaml"):
+ # Skip environment-specific files (loaded separately)
+ if not any(
+ file.stem.endswith(suffix) for suffix in ["_dev", "_test", "_prod"]
+ ):
+ yaml_files.append(file)
+
+ return sorted(yaml_files) # Consistent loading order
+
+ def _load_module_configs(
+ self, module_dir: Path, env: Optional[str] = None
+ ) -> Dict[str, Any]:
+ """
+ Load all configuration files from a module directory and merge them.
+
+ Loading order:
+ 1. Base YAML files (*.yaml)
+ 2. Environment-specific overrides (*_{env}.yaml)
+
+ :param module_dir: Module directory (e.g., config/ufo or config/galaxy)
+ :param env: Environment name for overrides (dev/test/prod)
+ :return: Merged configuration dictionary
+ """
+ merged_config = {}
+
+ # Load all base YAML files
+ yaml_files = self._discover_yaml_files(module_dir)
+ for yaml_file in yaml_files:
+ config_data = self._load_yaml(yaml_file)
+ if config_data:
+ # Special handling for mcp.yaml and agent_mcp.yaml: nest under 'mcp' key
+ if yaml_file.stem in ["mcp", "agent_mcp"]:
+ config_data = {"mcp": config_data}
+ self._deep_merge(merged_config, config_data)
+
+ # Load environment-specific overrides
+ if env and env != "production":
+ for yaml_file in yaml_files:
+ # Look for {stem}_{env}.yaml files
+ env_file = yaml_file.parent / f"{yaml_file.stem}_{env}.yaml"
+ env_data = self._load_yaml(env_file)
+ if env_data:
+ self._deep_merge(merged_config, env_data)
+
+ return merged_config
+
+ def _load_with_fallback(
+ self, module: str, env: Optional[str] = None
+ ) -> Dict[str, Any]:
+ """
+ Load configuration with automatic fallback to legacy paths.
+
+ Priority:
+ 1. config/{module}/ ← New path (priority)
+ 2. {module}/config/ ← Legacy path (fallback)
+
+ Behavior:
+ - If both exist: New overrides legacy, warning shown
+ - If only new: Use new path, no warning
+ - If only legacy: Use legacy, show migration warning
+ - If neither: Raise FileNotFoundError
+
+ :param module: Module name ("ufo" or "galaxy")
+ :param env: Environment override
+ :return: Merged configuration dictionary
+ """
+ new_path = self.base_path / module
+ legacy_path_str = self.LEGACY_PATH_MAP.get(f"config/{module}")
+ legacy_path = Path(legacy_path_str) if legacy_path_str else None
+
+ # Load new configuration
+ new_config = self._load_module_configs(new_path, env)
+ new_exists = bool(new_config)
+
+ # Load legacy configuration (if path exists)
+ legacy_config = {}
+ legacy_exists = False
+ if legacy_path and legacy_path.exists():
+ legacy_config = self._load_module_configs(legacy_path, env)
+ legacy_exists = bool(legacy_config)
+
+ # Determine which config to use and show appropriate warnings
+ if new_exists and legacy_exists:
+ # Both exist: Merge with new taking priority
+ self._warn_duplicate_configs(module, str(new_path), str(legacy_path))
+ merged = legacy_config.copy()
+ self._deep_merge(merged, new_config)
+ return merged
+
+ elif new_exists:
+ # Only new exists: Ideal case
+ return new_config
+
+ elif legacy_exists:
+ # Only legacy exists: Show migration warning
+ self._warn_legacy_config(module, str(legacy_path), str(new_path))
+ return legacy_config
+
+ else:
+ # Neither exists: Error
+ raise FileNotFoundError(
+ f"No configuration found for '{module}'.\n"
+ f"Expected at:\n"
+ f" - {new_path}/ (recommended)\n"
+ + (f" - {legacy_path}/ (legacy)\n" if legacy_path else "")
+ )
+
+ def _warn_duplicate_configs(
+ self, module: str, new_path: str, legacy_path: str
+ ) -> None:
+ """
+ Warn user when both new and legacy configs exist.
+
+ :param module: Module name
+ :param new_path: New configuration path
+ :param legacy_path: Legacy configuration path
+ """
+ warning_key = f"duplicate_{module}"
+ if warning_key in self._warnings_shown:
+ return
+
+ logger.warning(
+ f"\n{'=' * 70}\n"
+ f"⚠️ CONFIG CONFLICT DETECTED: {module.upper()}\n"
+ f"{'=' * 70}\n"
+ f"Found configurations in BOTH locations:\n"
+ f" 1. {new_path}/ ← ACTIVE (takes priority)\n"
+ f" 2. {legacy_path}/ ← FALLBACK (legacy; fills in missing keys)\n\n"
+ f"Recommendation:\n"
+ f" Remove legacy config to avoid confusion:\n"
+ f" rm -rf {legacy_path}/*.yaml\n"
+ f"{'=' * 70}\n"
+ )
+ self._warnings_shown.add(warning_key)
+
+ def _warn_legacy_config(self, module: str, legacy_path: str, new_path: str) -> None:
+ """
+ Warn user when using legacy configuration path.
+
+ :param module: Module name
+ :param legacy_path: Legacy configuration path
+ :param new_path: New configuration path (recommended)
+ """
+ warning_key = f"legacy_{module}"
+ if warning_key in self._warnings_shown:
+ return
+
+ logger.warning(
+ f"\n{'=' * 70}\n"
+ f"⚠️ LEGACY CONFIG PATH DETECTED: {module.upper()}\n"
+ f"{'=' * 70}\n"
+ f"Using legacy config: {legacy_path}/\n"
+ f"Please migrate to: {new_path}/\n\n"
+ f"Quick migration:\n"
+ f" mkdir -p {new_path}\n"
+ f" cp {legacy_path}/*.yaml {new_path}/\n\n"
+ f"Or use migration tool:\n"
+ f" python -m ufo.tools.migrate_config\n"
+ f"{'=' * 70}\n"
+ )
+ self._warnings_shown.add(warning_key)
+
+ def load_ufo_config(self, env: Optional[str] = None) -> UFOConfig:
+ """
+ Load UFO configuration with automatic legacy fallback.
+
+ Automatically discovers and loads all YAML files:
+ - Priority 1: config/ufo/*.yaml (new structure)
+ - Priority 2: ufo/config/*.yaml (legacy fallback)
+
+ Returns UFOConfig with:
+ - Typed fields for common configs (config.system.max_step)
+ - Dynamic access for any YAML field (config.ANY_NEW_KEY)
+
+ :param env: Environment override (dev/test/prod)
+ :return: UFOConfig with typed + dynamic access
+ """
+ env = env or self._env
+
+ # Suppress TensorFlow warnings (from old Config) - BEFORE copying env vars
+ os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
+
+ # Start with environment variables (for backward compatibility with old Config)
+ config_data = dict(os.environ)
+
+ # Load YAML configs with automatic fallback and merge into env vars
+ yaml_config = self._load_with_fallback("ufo", env)
+ config_data.update(yaml_config)
+
+ # Apply legacy API base transformations
+ self._apply_legacy_transforms(config_data)
+
+ # Create typed config with dynamic fields
+ return UFOConfig.from_dict(config_data)
+
+ def load_galaxy_config(self, env: Optional[str] = None) -> GalaxyConfig:
+ """
+ Load Galaxy configuration (Galaxy has no legacy path).
+
+ Automatically discovers and loads all YAML files from config/galaxy/.
+ Returns GalaxyConfig with:
+ - Typed fields for agent config
+ - Dynamic access for any YAML field (config.client_001, etc.)
+
+ :param env: Environment override (dev/test/prod)
+ :return: GalaxyConfig with typed + dynamic access
+ """
+ env = env or self._env
+
+ # Load configuration (Galaxy has no legacy fallback)
+ config_data = self._load_with_fallback("galaxy", env)
+
+ # Apply legacy API base transformations
+ self._apply_legacy_transforms(config_data)
+
+ # Create typed config with dynamic fields
+ return GalaxyConfig.from_dict(config_data)
+
+ def _apply_legacy_transforms(self, config: Dict[str, Any]) -> None:
+ """
+ Apply legacy configuration transformations.
+
+ :param config: Configuration dictionary to transform
+ """
+ # Update API base for various agents
+ for agent_key in [
+ "HOST_AGENT",
+ "APP_AGENT",
+ "BACKUP_AGENT",
+ "EVALUATION_AGENT",
+ "CONSTELLATION_AGENT",
+ ]:
+ if agent_key in config:
+ self._update_api_base(config, agent_key)
+
+ # Ensure CONTROL_BACKEND is a list
+ if "CONTROL_BACKEND" in config and isinstance(config["CONTROL_BACKEND"], str):
+ config["CONTROL_BACKEND"] = [config["CONTROL_BACKEND"]]
+
+ @staticmethod
+ def _update_api_base(config: Dict[str, Any], agent_key: str) -> None:
+ """
+ Update API base URL based on API type (legacy behavior).
+
+ :param config: Configuration dictionary
+ :param agent_key: Agent configuration key
+ """
+ if agent_key not in config:
+ return
+
+ agent_config = config[agent_key]
+ if not isinstance(agent_config, dict):
+ return
+
+ api_type = agent_config.get("API_TYPE", "").lower()
+
+ if api_type == "aoai":
+ # Azure OpenAI - construct deployment URL
+ api_base = agent_config.get("API_BASE", "")
+ if api_base and "deployments" not in api_base:
+ deployment_id = agent_config.get("API_DEPLOYMENT_ID", "")
+ api_version = agent_config.get("API_VERSION", "")
+ if deployment_id:
+ agent_config["API_BASE"] = (
+ f"{api_base.rstrip('/')}/openai/deployments/"
+ f"{deployment_id}/chat/completions?api-version={api_version}"
+ )
+ agent_config["API_MODEL"] = deployment_id
+
+ elif api_type == "openai":
+ # OpenAI - standard API base
+ if not agent_config.get("API_BASE"):
+ agent_config["API_BASE"] = "https://api.openai.com/v1/chat/completions"
+
+
+# Global convenience functions with caching
+
+_global_ufo_config: Optional[UFOConfig] = None
+_global_galaxy_config: Optional[GalaxyConfig] = None
+
+
+def get_ufo_config(reload: bool = False) -> UFOConfig:
+ """
+ Get UFO configuration (cached).
+
+ Returns a hybrid config object with:
+ - Type-safe fixed fields: config.system.max_step, config.app_agent.api_model
+ - Dynamic YAML fields: config.ANY_NEW_KEY, config["NEW_SETTING"]
+ - Backward compatible: config["MAX_STEP"]
+
+ Usage Examples:
+ config = get_ufo_config()
+
+ # Modern typed access (IDE autocomplete!)
+ max_step = config.system.max_step
+ log_level = config.system.log_level
+ model = config.app_agent.api_model
+ rag_enabled = config.rag.experience
+
+ # Dynamic access (no code changes needed for new YAML keys!)
+ if hasattr(config, 'NEW_FEATURE_FLAG'):
+ enabled = config.NEW_FEATURE_FLAG
+
+ new_value = config.get("CUSTOM_SETTING", "default")
+
+ # Legacy dict access (still works)
+ max_step_old = config["MAX_STEP"]
+ agent_config = config["APP_AGENT"]
+
+ :param reload: Force reload configuration from files
+ :return: UFOConfig instance
+ """
+ global _global_ufo_config
+
+ if _global_ufo_config is None or reload:
+ loader = ConfigLoader.get_instance()
+ _global_ufo_config = loader.load_ufo_config()
+
+ return _global_ufo_config
+
+
+def get_galaxy_config(reload: bool = False) -> GalaxyConfig:
+ """
+ Get Galaxy configuration (cached).
+
+ Returns a hybrid config object with:
+    - Type-safe agent config: config.agent.constellation_agent.api_model
+    - Type-safe runtime settings: config.constellation.heartbeat_interval
+    - Dynamic YAML fields by their exact YAML key: config.CONSTELLATION_ID, config.client_001, etc.
+    - Backward compatible: config["CONSTELLATION_AGENT"]
+
+    Usage Examples:
+        config = get_galaxy_config()
+
+        # Modern typed access
+        agent_model = config.agent.constellation_agent.api_model
+
+        # Typed access to constellation settings
+        constellation_id = config.constellation.constellation_id
+        heartbeat = config.constellation.heartbeat_interval
+
+ # Dynamic access to devices
+ device = config.client_001
+ server_url = device.server_url
+ capabilities = device.capabilities
+
+ # Legacy dict access
+ agent_old = config["CONSTELLATION_AGENT"]
+ device_old = config["client_001"]
+
+ :param reload: Force reload configuration from files
+ :return: GalaxyConfig instance
+ """
+ global _global_galaxy_config
+
+ if _global_galaxy_config is None or reload:
+ loader = ConfigLoader.get_instance()
+ _global_galaxy_config = loader.load_galaxy_config()
+
+ return _global_galaxy_config
+
+
+def clear_config_cache() -> None:
+ """Clear configuration cache. Useful for testing or hot-reloading."""
+ global _global_ufo_config, _global_galaxy_config
+ _global_ufo_config = None
+ _global_galaxy_config = None
+ ConfigLoader.reset()
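Reviewer sketch: the `aoai` branch of `_update_api_base` rewrites a bare Azure OpenAI endpoint into a full deployment URL. The standalone function below reproduces that construction outside the loader so the guards are easy to check in isolation. The name `build_aoai_url` and the example endpoint are hypothetical; only the URL template and the three guards are taken from the code above.

```python
def build_aoai_url(api_base: str, deployment_id: str, api_version: str) -> str:
    """Hypothetical helper mirroring the 'aoai' branch of _update_api_base."""
    # Leave the base untouched if it is empty, is already a deployment URL,
    # or if no deployment id is configured (the same guards as the loader).
    if not api_base or "deployments" in api_base or not deployment_id:
        return api_base
    return (
        f"{api_base.rstrip('/')}/openai/deployments/"
        f"{deployment_id}/chat/completions?api-version={api_version}"
    )


url = build_aoai_url(
    "https://example.openai.azure.com/", "gpt-4o", "2025-02-01-preview"
)
print(url)
```

Note that, as in the loader, an empty `api_version` still yields a URL ending in `api-version=`, so callers may want to validate that field separately. The loader also sets `API_MODEL` to the deployment id, which this sketch leaves out.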
diff --git a/config/config_schemas.py b/config/config_schemas.py
new file mode 100644
index 000000000..200473747
--- /dev/null
+++ b/config/config_schemas.py
@@ -0,0 +1,847 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Configuration Schema Definitions
+
+Hybrid design: Fixed typed fields + dynamic field support.
+"""
+
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Optional
+
+
+@dataclass
+class AgentConfig:
+ """
+ Agent configuration with common fields + dynamic extras.
+
+ Fixed fields provide IDE autocomplete and type safety.
+ Any additional fields from YAML are accessible via dict-style or attribute access.
+ """
+
+ # ========== Fixed Common Fields (Type-Safe) ==========
+ visual_mode: bool = True
+ reasoning_model: bool = False
+ api_type: str = "azure_ad"
+ api_base: str = ""
+ api_key: str = ""
+ api_version: str = "2025-02-01-preview"
+ api_model: str = "gpt-4.1-20250414"
+
+ # Azure AD fields
+ aad_tenant_id: Optional[str] = None
+ aad_api_scope: Optional[str] = None
+ aad_api_scope_base: Optional[str] = None
+ api_deployment_id: Optional[str] = None
+
+ # Prompt paths
+ prompt: Optional[str] = None
+ example_prompt: Optional[str] = None
+
+ # ========== Dynamic Fields (Auto-populated from YAML) ==========
+ _extras: Dict[str, Any] = field(default_factory=dict, repr=False)
+
+ def __getattr__(self, name: str) -> Any:
+ """Support dynamic attribute access for extra fields"""
+ if name.startswith("_"):
+ raise AttributeError(
+ f"'{type(self).__name__}' object has no attribute '{name}'"
+ )
+
+ # Support uppercase access (API_MODEL, API_TYPE, etc.)
+ # Map to lowercase attribute if exists
+ lower_name = name.lower()
+ if hasattr(self.__class__, lower_name):
+ return getattr(self, lower_name)
+
+ # Check extras (try both exact name and uppercase version)
+ if name in self._extras:
+ return self._extras[name]
+
+ # If lowercase requested, try uppercase in extras
+ upper_name = name.upper()
+ if upper_name in self._extras:
+ return self._extras[upper_name]
+
+ raise AttributeError(
+ f"'{type(self).__name__}' object has no attribute '{name}'"
+ )
+
+ def __getitem__(self, key: str) -> Any:
+ """Support dict-style access"""
+ # Try fixed fields first
+ if hasattr(self, key) and not key.startswith("_"):
+ return getattr(self, key)
+ # Then try extras
+ if key in self._extras:
+ return self._extras[key]
+ raise KeyError(key)
+
+ def __contains__(self, key: str) -> bool:
+ """Support 'in' operator"""
+ return (hasattr(self, key) and not key.startswith("_")) or (key in self._extras)
+
+ def get(self, key: str, default: Any = None) -> Any:
+ """Dict-style get with default"""
+ try:
+ return self[key]
+ except KeyError:
+ return default
+
+ @classmethod
+ def from_dict(cls, data: Dict[str, Any]) -> "AgentConfig":
+ """
+ Create AgentConfig from dictionary.
+
+ Known fields are mapped to typed attributes.
+ Unknown fields are stored in _extras.
+ """
+ # Known field mappings
+ known_fields = {
+ "VISUAL_MODE": "visual_mode",
+ "REASONING_MODEL": "reasoning_model",
+ "API_TYPE": "api_type",
+ "API_BASE": "api_base",
+ "API_KEY": "api_key",
+ "API_VERSION": "api_version",
+ "API_MODEL": "api_model",
+ "AAD_TENANT_ID": "aad_tenant_id",
+ "AAD_API_SCOPE": "aad_api_scope",
+ "AAD_API_SCOPE_BASE": "aad_api_scope_base",
+ "API_DEPLOYMENT_ID": "api_deployment_id",
+ "PROMPT": "prompt",
+ "EXAMPLE_PROMPT": "example_prompt",
+ }
+
+ # Extract known fields
+ kwargs = {}
+ extras = {}
+
+ for key, value in data.items():
+ if key in known_fields:
+ kwargs[known_fields[key]] = value
+ else:
+ # Store unknown fields as extras
+ extras[key] = value
+
+ # Create instance
+ instance = cls(**kwargs)
+ instance._extras = extras
+
+ return instance
+
+
+@dataclass
+class RAGConfig:
+ """RAG configuration with fixed fields + dynamic extras"""
+
+ # ========== Fixed Fields ==========
+ offline_docs: bool = False
+ offline_docs_retrieved_topk: int = 1
+ online_search: bool = False
+ online_search_topk: int = 5
+ online_retrieved_topk: int = 5
+ experience: bool = False
+ experience_retrieved_topk: int = 5
+ demonstration: bool = False
+ demonstration_retrieved_topk: int = 5
+
+ # ========== Dynamic Fields ==========
+ _extras: Dict[str, Any] = field(default_factory=dict, repr=False)
+
+ def __getattr__(self, name: str) -> Any:
+ if name.startswith("_"):
+ raise AttributeError(
+ f"'{type(self).__name__}' object has no attribute '{name}'"
+ )
+
+ # Support uppercase access with RAG_ prefix (RAG_OFFLINE_DOCS -> offline_docs)
+ if name.startswith("RAG_"):
+ # Remove RAG_ prefix and convert to lowercase
+ field_name = name[4:].lower() # RAG_OFFLINE_DOCS -> offline_docs
+ if hasattr(self.__class__, field_name):
+ return getattr(self, field_name)
+
+ # Support uppercase access without prefix (OFFLINE_DOCS -> offline_docs)
+ lower_name = name.lower()
+ if hasattr(self.__class__, lower_name):
+ return getattr(self, lower_name)
+
+ # Check extras (try both exact name and uppercase version)
+ if name in self._extras:
+ return self._extras[name]
+
+ # If lowercase requested, try uppercase in extras
+ upper_name = name.upper()
+ if upper_name in self._extras:
+ return self._extras[upper_name]
+
+ raise AttributeError(
+ f"'{type(self).__name__}' object has no attribute '{name}'"
+ )
+
+ def __getitem__(self, key: str) -> Any:
+ if hasattr(self, key) and not key.startswith("_"):
+ return getattr(self, key)
+ if key in self._extras:
+ return self._extras[key]
+ raise KeyError(key)
+
+ def get(self, key: str, default: Any = None) -> Any:
+ try:
+ return self[key]
+ except KeyError:
+ return default
+
+ @classmethod
+ def from_dict(cls, data: Dict[str, Any]) -> "RAGConfig":
+ """Create RAGConfig with known fields + extras"""
+ known_mappings = {
+ "RAG_OFFLINE_DOCS": "offline_docs",
+ "RAG_OFFLINE_DOCS_RETRIEVED_TOPK": "offline_docs_retrieved_topk",
+ "RAG_ONLINE_SEARCH": "online_search",
+ "RAG_ONLINE_SEARCH_TOPK": "online_search_topk",
+ "RAG_ONLINE_RETRIEVED_TOPK": "online_retrieved_topk",
+ "RAG_EXPERIENCE": "experience",
+ "RAG_EXPERIENCE_RETRIEVED_TOPK": "experience_retrieved_topk",
+ "RAG_DEMONSTRATION": "demonstration",
+ "RAG_DEMONSTRATION_RETRIEVED_TOPK": "demonstration_retrieved_topk",
+ }
+
+ kwargs = {}
+ extras = {}
+
+ for key, value in data.items():
+ if key in known_mappings:
+ kwargs[known_mappings[key]] = value
+ elif key.startswith("RAG_") or key in [
+ "BING_API_KEY",
+ "EXPERIENCE_SAVED_PATH",
+ "DEMONSTRATION_SAVED_PATH",
+ "EXPERIENCE_PROMPT",
+ "DEMONSTRATION_PROMPT",
+ ]:
+ extras[key] = value
+
+ instance = cls(**kwargs)
+ instance._extras = extras
+ return instance
+
+
+@dataclass
+class SystemConfig:
+ """System configuration with fixed fields + dynamic extras"""
+
+ # ========== LLM Parameters ==========
+ max_tokens: int = 2000
+ max_retry: int = 20
+ temperature: float = 0.0
+ top_p: float = 0.0
+ timeout: int = 60
+
+ # ========== Control Backend ==========
+ control_backend: List[str] = field(default_factory=lambda: ["uia"])
+ iou_threshold_for_merge: float = 0.1
+
+ # ========== Execution Limits ==========
+ max_step: int = 50
+ max_round: int = 1
+ sleep_time: int = 1
+ rectangle_time: int = 1
+
+ # ========== Action Configuration ==========
+ action_sequence: bool = False
+ show_visual_outline_on_screen: bool = False
+ maximize_window: bool = False
+ json_parsing_retry: int = 3
+
+ # ========== Safety ==========
+ safe_guard: bool = False
+ control_list: List[str] = field(
+ default_factory=lambda: [
+ "Button",
+ "Edit",
+ "TabItem",
+ "Document",
+ "ListItem",
+ "MenuItem",
+ "ScrollBar",
+ "TreeItem",
+ "Hyperlink",
+ "ComboBox",
+ "RadioButton",
+ "Spinner",
+ "CheckBox",
+ "Group",
+ "Text",
+ ]
+ )
+
+ # ========== History ==========
+ history_keys: List[str] = field(
+ default_factory=lambda: [
+ "step",
+ "subtask",
+ "action_representation",
+ "user_confirm",
+ ]
+ )
+
+ # ========== Annotation ==========
+ annotation_colors: Dict[str, str] = field(default_factory=dict)
+ highlight_bbox: bool = True
+ annotation_font_size: int = 22
+
+ # ========== Control Actions ==========
+ click_api: str = "click_input"
+ after_click_wait: int = 0
+ input_text_api: str = "type_keys"
+ input_text_enter: bool = False
+ input_text_inter_key_pause: float = 0.05
+
+ # ========== Logging ==========
+ print_log: bool = False
+ concat_screenshot: bool = False
+ log_level: str = "DEBUG"
+ include_last_screenshot: bool = True
+ request_timeout: int = 250
+ log_xml: bool = False
+ log_to_markdown: bool = True
+ screenshot_to_memory: bool = True
+
+ # ========== Image Performance ==========
+ default_png_compress_level: int = 1
+
+ # ========== Save Options ==========
+ save_ui_tree: bool = False
+ save_full_screen: bool = False
+
+ # ========== Task Management ==========
+ task_status: bool = True
+ task_status_file: Optional[str] = None
+ save_experience: str = "always_not"
+
+ # ========== Evaluation ==========
+ eva_session: bool = True
+ eva_round: bool = False
+ eva_all_screenshots: bool = True
+
+ # ========== Customization ==========
+ ask_question: bool = False
+ use_customization: bool = False
+ qa_pair_file: str = "customization/global_memory.jsonl"
+ qa_pair_num: int = 20
+
+ # ========== Omniparser ==========
+ omniparser: Dict[str, Any] = field(default_factory=dict)
+
+ # ========== Control Filtering ==========
+ control_filter_type: List[str] = field(default_factory=list)
+ control_filter_top_k_plan: int = 2
+ control_filter_top_k_semantic: int = 15
+ control_filter_top_k_icon: int = 15
+ control_filter_model_semantic_name: str = "all-MiniLM-L6-v2"
+ control_filter_model_icon_name: str = "clip-ViT-B-32"
+
+ # ========== API Usage ==========
+ use_apis: bool = True
+ api_prompt: str = "ufo/prompts/share/base/api.yaml"
+
+ # ========== MCP (Model Context Protocol) ==========
+ use_mcp: bool = True
+ mcp_servers_config: str = "config/ufo/mcp.yaml"
+ mcp_preferred_apps: List[str] = field(default_factory=list)
+ mcp_fallback_to_ui: bool = True
+ mcp_instructions_path: str = "ufo/config/mcp_instructions"
+ mcp_tool_timeout: int = 30
+ mcp_log_execution: bool = False
+
+ # ========== Device Configuration ==========
+ device_info: str = "config/device_config.yaml"
+
+ # ========== Prompt Paths ==========
+ hostagent_prompt: str = "ufo/prompts/share/base/host_agent.yaml"
+ appagent_prompt: str = "ufo/prompts/share/base/app_agent.yaml"
+ followeragent_prompt: str = "ufo/prompts/share/base/app_agent.yaml"
+ evaluation_prompt: str = "ufo/prompts/evaluation/evaluate.yaml"
+ hostagent_example_prompt: str = (
+ "ufo/prompts/examples/{mode}/host_agent_example.yaml"
+ )
+ appagent_example_prompt: str = "ufo/prompts/examples/{mode}/app_agent_example.yaml"
+ appagent_example_prompt_as: str = (
+ "ufo/prompts/examples/{mode}/app_agent_example_as.yaml"
+ )
+
+ # ========== API and App-specific Prompts ==========
+ app_api_prompt_address: Dict[str, str] = field(default_factory=dict)
+ word_api_prompt: str = "ufo/prompts/apps/word/api.yaml"
+ excel_api_prompt: str = "ufo/prompts/apps/excel/api.yaml"
+
+ # ========== Constellation Prompts ==========
+ constellation_creation_prompt: str = (
+ "galaxy/prompts/constellation/share/constellation_creation.yaml"
+ )
+ constellation_editing_prompt: str = (
+ "galaxy/prompts/constellation/share/constellation_editing.yaml"
+ )
+ constellation_creation_example_prompt: str = (
+ "galaxy/prompts/constellation/examples/constellation_creation_example.yaml"
+ )
+ constellation_editing_example_prompt: str = (
+ "galaxy/prompts/constellation/examples/constellation_editing_example.yaml"
+ )
+
+ # ========== Third-Party Agents ==========
+ enabled_third_party_agents: List[str] = field(default_factory=list)
+ third_party_agent_config: Dict[str, Any] = field(default_factory=dict)
+
+ # ========== Output ==========
+ output_presenter: str = "rich"
+
+ # ========== Prices (from legacy config) ==========
+ prices: Dict[str, Any] = field(default_factory=dict)
+
+ # ========== Dynamic Fields ==========
+ _extras: Dict[str, Any] = field(default_factory=dict, repr=False)
+
+ def __getattr__(self, name: str) -> Any:
+ if name.startswith("_"):
+ raise AttributeError(
+ f"'{type(self).__name__}' object has no attribute '{name}'"
+ )
+
+ # Support uppercase access (MAX_TOKENS, MAX_STEP, etc.)
+ # Map to lowercase attribute if exists
+ lower_name = name.lower()
+ if hasattr(self.__class__, lower_name):
+ return getattr(self, lower_name)
+
+ # Check extras (try both exact name and uppercase version)
+ if name in self._extras:
+ return self._extras[name]
+
+ # If lowercase requested, try uppercase in extras
+ upper_name = name.upper()
+ if upper_name in self._extras:
+ return self._extras[upper_name]
+
+ raise AttributeError(
+ f"'{type(self).__name__}' object has no attribute '{name}'"
+ )
+
+ def __getitem__(self, key: str) -> Any:
+ if hasattr(self, key) and not key.startswith("_"):
+ return getattr(self, key)
+ if key in self._extras:
+ return self._extras[key]
+ raise KeyError(key)
+
+ def get(self, key: str, default: Any = None) -> Any:
+ try:
+ return self[key]
+ except KeyError:
+ return default
+
+ @classmethod
+ def from_dict(cls, data: Dict[str, Any]) -> "SystemConfig":
+ """Create SystemConfig with known fields + extras"""
+ known_mappings = {
+ # LLM Parameters
+ "MAX_TOKENS": "max_tokens",
+ "MAX_RETRY": "max_retry",
+ "TEMPERATURE": "temperature",
+ "TOP_P": "top_p",
+ "TIMEOUT": "timeout",
+ # Control Backend
+ "CONTROL_BACKEND": "control_backend",
+ "IOU_THRESHOLD_FOR_MERGE": "iou_threshold_for_merge",
+ # Execution Limits
+ "MAX_STEP": "max_step",
+ "MAX_ROUND": "max_round",
+ "SLEEP_TIME": "sleep_time",
+ "RECTANGLE_TIME": "rectangle_time",
+ # Action Configuration
+ "ACTION_SEQUENCE": "action_sequence",
+ "SHOW_VISUAL_OUTLINE_ON_SCREEN": "show_visual_outline_on_screen",
+ "MAXIMIZE_WINDOW": "maximize_window",
+ "JSON_PARSING_RETRY": "json_parsing_retry",
+ # Safety
+ "SAFE_GUARD": "safe_guard",
+ "CONTROL_LIST": "control_list",
+ # History
+ "HISTORY_KEYS": "history_keys",
+ # Annotation
+ "ANNOTATION_COLORS": "annotation_colors",
+ "HIGHLIGHT_BBOX": "highlight_bbox",
+ "ANNOTATION_FONT_SIZE": "annotation_font_size",
+ # Control Actions
+ "CLICK_API": "click_api",
+ "AFTER_CLICK_WAIT": "after_click_wait",
+ "INPUT_TEXT_API": "input_text_api",
+ "INPUT_TEXT_ENTER": "input_text_enter",
+ "INPUT_TEXT_INTER_KEY_PAUSE": "input_text_inter_key_pause",
+ # Logging
+ "PRINT_LOG": "print_log",
+ "CONCAT_SCREENSHOT": "concat_screenshot",
+ "LOG_LEVEL": "log_level",
+ "INCLUDE_LAST_SCREENSHOT": "include_last_screenshot",
+ "REQUEST_TIMEOUT": "request_timeout",
+ "LOG_XML": "log_xml",
+ "LOG_TO_MARKDOWN": "log_to_markdown",
+ "SCREENSHOT_TO_MEMORY": "screenshot_to_memory",
+ # Image Performance
+ "DEFAULT_PNG_COMPRESS_LEVEL": "default_png_compress_level",
+ # Save Options
+ "SAVE_UI_TREE": "save_ui_tree",
+ "SAVE_FULL_SCREEN": "save_full_screen",
+ # Task Management
+ "TASK_STATUS": "task_status",
+ "TASK_STATUS_FILE": "task_status_file",
+ "SAVE_EXPERIENCE": "save_experience",
+ # Evaluation
+ "EVA_SESSION": "eva_session",
+ "EVA_ROUND": "eva_round",
+ "EVA_ALL_SCREENSHOTS": "eva_all_screenshots",
+ # Customization
+ "ASK_QUESTION": "ask_question",
+ "USE_CUSTOMIZATION": "use_customization",
+ "QA_PAIR_FILE": "qa_pair_file",
+ "QA_PAIR_NUM": "qa_pair_num",
+ # Omniparser
+ "OMNIPARSER": "omniparser",
+ # Control Filtering
+ "CONTROL_FILTER_TYPE": "control_filter_type",
+ "CONTROL_FILTER_TOP_K_PLAN": "control_filter_top_k_plan",
+ "CONTROL_FILTER_TOP_K_SEMANTIC": "control_filter_top_k_semantic",
+ "CONTROL_FILTER_TOP_K_ICON": "control_filter_top_k_icon",
+ "CONTROL_FILTER_MODEL_SEMANTIC_NAME": "control_filter_model_semantic_name",
+ "CONTROL_FILTER_MODEL_ICON_NAME": "control_filter_model_icon_name",
+ # API Usage
+ "USE_APIS": "use_apis",
+ "API_PROMPT": "api_prompt",
+ # MCP
+ "USE_MCP": "use_mcp",
+ "MCP_SERVERS_CONFIG": "mcp_servers_config",
+ "MCP_PREFERRED_APPS": "mcp_preferred_apps",
+ "MCP_FALLBACK_TO_UI": "mcp_fallback_to_ui",
+ "MCP_INSTRUCTIONS_PATH": "mcp_instructions_path",
+ "MCP_TOOL_TIMEOUT": "mcp_tool_timeout",
+ "MCP_LOG_EXECUTION": "mcp_log_execution",
+ # Device Configuration
+ "DEVICE_INFO": "device_info",
+ # Prompt Paths
+ "HOSTAGENT_PROMPT": "hostagent_prompt",
+ "APPAGENT_PROMPT": "appagent_prompt",
+ "FOLLOWERAGENT_PROMPT": "followeragent_prompt",
+ "EVALUATION_PROMPT": "evaluation_prompt",
+ "HOSTAGENT_EXAMPLE_PROMPT": "hostagent_example_prompt",
+ "APPAGENT_EXAMPLE_PROMPT": "appagent_example_prompt",
+ "APPAGENT_EXAMPLE_PROMPT_AS": "appagent_example_prompt_as",
+ # API and App-specific Prompts
+ "APP_API_PROMPT_ADDRESS": "app_api_prompt_address",
+ "WORD_API_PROMPT": "word_api_prompt",
+ "EXCEL_API_PROMPT": "excel_api_prompt",
+ # Constellation Prompts
+ "CONSTELLATION_CREATION_PROMPT": "constellation_creation_prompt",
+ "CONSTELLATION_EDITING_PROMPT": "constellation_editing_prompt",
+ "CONSTELLATION_CREATION_EXAMPLE_PROMPT": "constellation_creation_example_prompt",
+ "CONSTELLATION_EDITING_EXAMPLE_PROMPT": "constellation_editing_example_prompt",
+ # Third-Party Agents
+ "ENABLED_THIRD_PARTY_AGENTS": "enabled_third_party_agents",
+ "THIRD_PARTY_AGENT_CONFIG": "third_party_agent_config",
+ # Output
+ "OUTPUT_PRESENTER": "output_presenter",
+ # Prices
+ "PRICES": "prices",
+ }
+
+ kwargs = {}
+ extras = {}
+
+ for key, value in data.items():
+ if key in known_mappings:
+ kwargs[known_mappings[key]] = value
+ else:
+ # All other fields go to extras
+ extras[key] = value
+
+ instance = cls(**kwargs)
+ instance._extras = extras
+ return instance
+
+
+@dataclass
+class UFOConfig:
+ """
+ Complete UFO configuration with typed modules + dynamic raw access.
+
+ This hybrid approach provides:
+ 1. Typed access to common configurations: config.system.max_step
+    2. Dynamic access to any YAML key: config.ANY_NEW_KEY or config["ANY_NEW_KEY"]
+ 3. Backward compatibility: config["OLD_KEY"] still works
+ """
+
+ # ========== Typed Module Configs (Recommended) ==========
+ host_agent: AgentConfig
+ app_agent: AgentConfig
+ backup_agent: AgentConfig
+ evaluation_agent: AgentConfig
+ operator: AgentConfig
+ rag: RAGConfig
+ system: SystemConfig
+
+ # ========== Raw Dictionary (Backward Compatible) ==========
+ _raw: Dict[str, Any] = field(default_factory=dict, repr=False)
+
+ def __getattr__(self, name: str) -> Any:
+ """
+ Support dynamic attribute access for any config key.
+
+ Allows: config.ANY_NEW_YAML_KEY
+ """
+ if name.startswith("_"):
+ raise AttributeError(
+ f"'{type(self).__name__}' object has no attribute '{name}'"
+ )
+
+ # Check if it's in raw config
+ if name in self._raw:
+ value = self._raw[name]
+ # Wrap dict values in DynamicConfig for nested access
+ if isinstance(value, dict):
+ from config.config_loader import DynamicConfig
+
+ return DynamicConfig(value, name=name)
+ return value
+
+ raise AttributeError(
+ f"'{type(self).__name__}' object has no attribute '{name}'"
+ )
+
+ def __getitem__(self, key: str) -> Any:
+ """
+ Support dict-style access for backward compatibility.
+
+ Allows: config["ANY_KEY"]
+ """
+ return self._raw[key]
+
+ def __contains__(self, key: str) -> bool:
+ """Support 'in' operator"""
+ return key in self._raw
+
+ def get(self, key: str, default: Any = None) -> Any:
+ """Dict-style get with default"""
+ return self._raw.get(key, default)
+
+ def keys(self):
+ """Get all raw config keys"""
+ return self._raw.keys()
+
+ def items(self):
+ """Get all raw config items"""
+ return self._raw.items()
+
+ def values(self):
+ """Get all raw config values"""
+ return self._raw.values()
+
+ def to_dict(self) -> Dict[str, Any]:
+ """
+ Convert UFOConfig back to dictionary format.
+ Returns the raw config dictionary for backward compatibility.
+ """
+ return self._raw.copy()
+
+ @classmethod
+ def from_dict(cls, data: Dict[str, Any]) -> "UFOConfig":
+ """Create UFOConfig from merged configuration dictionary"""
+ return cls(
+ host_agent=AgentConfig.from_dict(data.get("HOST_AGENT", {})),
+ app_agent=AgentConfig.from_dict(data.get("APP_AGENT", {})),
+ backup_agent=AgentConfig.from_dict(data.get("BACKUP_AGENT", {})),
+ evaluation_agent=AgentConfig.from_dict(data.get("EVALUATION_AGENT", {})),
+ operator=AgentConfig.from_dict(data.get("OPERATOR", {})),
+ rag=RAGConfig.from_dict(data),
+ system=SystemConfig.from_dict(data),
+ _raw=data,
+ )
+
+
+@dataclass
+class ConstellationRuntimeConfig:
+ """
+ Constellation runtime configuration with fixed fields + dynamic extras.
+ """
+
+ # ========== Fixed Fields ==========
+ constellation_id: str = "test_constellation"
+ heartbeat_interval: float = 30.0
+ reconnect_delay: float = 5.0
+ max_concurrent_tasks: int = 6
+ max_step: int = 15
+ device_info: str = "config/galaxy/devices.yaml"
+ log_to_markdown: bool = True
+
+ # ========== Dynamic Fields ==========
+ _extras: Dict[str, Any] = field(default_factory=dict, repr=False)
+
+ def __getattr__(self, name: str) -> Any:
+ if name.startswith("_"):
+ raise AttributeError(
+ f"'{type(self).__name__}' object has no attribute '{name}'"
+ )
+
+ # Support uppercase access (DEVICE_INFO, MAX_STEP, etc.)
+ # Map to lowercase attribute if exists
+ lower_name = name.lower()
+ if hasattr(self.__class__, lower_name):
+ return getattr(self, lower_name)
+
+ # Check extras (try both exact name and uppercase version)
+ if name in self._extras:
+ return self._extras[name]
+
+ # If lowercase requested, try uppercase in extras
+ upper_name = name.upper()
+ if upper_name in self._extras:
+ return self._extras[upper_name]
+
+ raise AttributeError(
+ f"'{type(self).__name__}' object has no attribute '{name}'"
+ )
+
+ def __getitem__(self, key: str) -> Any:
+ if hasattr(self, key) and not key.startswith("_"):
+ return getattr(self, key)
+ if key in self._extras:
+ return self._extras[key]
+ raise KeyError(key)
+
+ def get(self, key: str, default: Any = None) -> Any:
+ try:
+ return self[key]
+ except KeyError:
+ return default
+
+ @classmethod
+ def from_dict(cls, data: Dict[str, Any]) -> "ConstellationRuntimeConfig":
+ """Create ConstellationRuntimeConfig from dictionary"""
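+        known_mappings = {
+            "CONSTELLATION_ID": "constellation_id",
+            "HEARTBEAT_INTERVAL": "heartbeat_interval",
+            "RECONNECT_DELAY": "reconnect_delay",
+            "MAX_CONCURRENT_TASKS": "max_concurrent_tasks",
+            "MAX_STEP": "max_step",
+            "DEVICE_INFO": "device_info",
+            # Without this entry the YAML value lands in _extras and the
+            # typed field silently keeps its default.
+            "LOG_TO_MARKDOWN": "log_to_markdown",
+        }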
+
+ kwargs = {}
+ extras = {}
+
+ for key, value in data.items():
+ if key in known_mappings:
+ kwargs[known_mappings[key]] = value
+ else:
+ extras[key] = value
+
+ instance = cls(**kwargs)
+ instance._extras = extras
+ return instance
+
+
+@dataclass
+class GalaxyAgentConfig:
+ """
+ Galaxy agent configuration wrapper providing typed access.
+ """
+
+ constellation_agent: AgentConfig
+
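+    def __getattr__(self, name: str) -> Any:
+        # Provide case-insensitive access to CONSTELLATION_AGENT. The exact
+        # lowercase name is excluded: __getattr__ only sees it when the field
+        # is unset, and returning self.constellation_agent would then recurse.
+        if name != "constellation_agent" and name.upper() == "CONSTELLATION_AGENT":
+            return self.constellation_agent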
+ raise AttributeError(
+ f"'{type(self).__name__}' object has no attribute '{name}'"
+ )
+
+ def __getitem__(self, key: str) -> Any:
+ if key == "CONSTELLATION_AGENT":
+ return self.constellation_agent
+ raise KeyError(key)
+
+ @classmethod
+ def from_dict(cls, data: Dict[str, Any]) -> "GalaxyAgentConfig":
+ """Create GalaxyAgentConfig from dictionary"""
+ return cls(
+ constellation_agent=AgentConfig.from_dict(
+ data.get("CONSTELLATION_AGENT", {})
+ )
+ )
+
+
+@dataclass
+class GalaxyConfig:
+ """
+ Complete Galaxy configuration with typed modules + dynamic raw access.
+
+ Provides structured access:
+ - config.agent.CONSTELLATION_AGENT → typed agent config
+ - config.constellation.MAX_STEP → typed constellation config
+ - config["ANY_KEY"] → backward compatible dict access
+ """
+
+ # ========== Typed Module Configs ==========
+ agent: GalaxyAgentConfig
+ constellation: ConstellationRuntimeConfig
+
+ # ========== Raw Dictionary (Backward Compatible) ==========
+ _raw: Dict[str, Any] = field(default_factory=dict, repr=False)
+
+ def __getattr__(self, name: str) -> Any:
+ """Support dynamic attribute access"""
+ if name.startswith("_"):
+ raise AttributeError(
+ f"'{type(self).__name__}' object has no attribute '{name}'"
+ )
+
+ if name in self._raw:
+ value = self._raw[name]
+ if isinstance(value, dict):
+ from config.config_loader import DynamicConfig
+
+ return DynamicConfig(value, name=name)
+ return value
+
+ raise AttributeError(
+ f"'{type(self).__name__}' object has no attribute '{name}'"
+ )
+
+ def __getitem__(self, key: str) -> Any:
+ """Support dict-style access"""
+ return self._raw[key]
+
+ def __contains__(self, key: str) -> bool:
+ """Support 'in' operator"""
+ return key in self._raw
+
+ def get(self, key: str, default: Any = None) -> Any:
+ """Dict-style get with default"""
+ return self._raw.get(key, default)
+
+ def keys(self):
+ return self._raw.keys()
+
+ def items(self):
+ return self._raw.items()
+
+ @classmethod
+ def from_dict(cls, data: Dict[str, Any]) -> "GalaxyConfig":
+ """Create GalaxyConfig from merged configuration dictionary"""
+ return cls(
+ agent=GalaxyAgentConfig.from_dict(data),
+ constellation=ConstellationRuntimeConfig.from_dict(data),
+ _raw=data,
+ )
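Reviewer sketch: every schema class in `config_schemas.py` follows the same pattern, typed dataclass fields plus an `_extras` dict reached through `__getattr__`. The reduced example below shows that pattern on a two-field class. `MiniConfig` and the `NEW_FLAG` key are hypothetical names used only for illustration, and the sketch is not the module's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class MiniConfig:
    """Hypothetical two-field config illustrating the typed + extras pattern."""

    max_step: int = 50
    log_level: str = "DEBUG"
    _extras: Dict[str, Any] = field(default_factory=dict, repr=False)

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        lower = name.lower()
        # Uppercase alias for a typed field (MAX_STEP -> max_step).
        if hasattr(type(self), lower):
            return getattr(self, lower)
        # Unknown keys live in _extras under their original YAML spelling.
        for candidate in (name, name.upper()):
            if candidate in self._extras:
                return self._extras[candidate]
        raise AttributeError(name)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MiniConfig":
        known = {"MAX_STEP": "max_step", "LOG_LEVEL": "log_level"}
        kwargs = {known[k]: v for k, v in data.items() if k in known}
        instance = cls(**kwargs)
        instance._extras = {k: v for k, v in data.items() if k not in known}
        return instance


cfg = MiniConfig.from_dict({"MAX_STEP": 10, "NEW_FLAG": True})
print(cfg.max_step, cfg.MAX_STEP, cfg.NEW_FLAG, cfg.new_flag, cfg.log_level)
```

The early return for names starting with an underscore matters: it keeps a lookup of `_extras` on a half-constructed instance from re-entering `__getattr__`.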
diff --git a/config/galaxy/agent.yaml.template b/config/galaxy/agent.yaml.template
new file mode 100644
index 000000000..c32b1d84a
--- /dev/null
+++ b/config/galaxy/agent.yaml.template
@@ -0,0 +1,19 @@
+# Galaxy Constellation Agent Configuration
+
+CONSTELLATION_AGENT:
+ REASONING_MODEL: False
+ API_TYPE: "openai" # The API type: "openai" for OpenAI API, "aoai" for Azure OpenAI, "azure_ad" for Azure AD auth
+ API_BASE: "https://api.openai.com/v1/chat/completions" # The API endpoint
+ API_KEY: "YOUR_KEY"
+ API_VERSION: "2025-02-01-preview"
+ API_MODEL: "gpt-5-chat-20251003" # Updated from legacy config
+
+ AAD_TENANT_ID: "72f988bf-86f1-41af-91ab-2d7cd011db47"
+ AAD_API_SCOPE: "openai"
+ AAD_API_SCOPE_BASE: "feb7b661-cac7-44a8-8dc1-163b63c23df2"
+
+ # Prompt configurations for constellation agent
+ CONSTELLATION_CREATION_PROMPT: "galaxy/prompts/constellation/share/constellation_creation.yaml"
+ CONSTELLATION_EDITING_PROMPT: "galaxy/prompts/constellation/share/constellation_editing.yaml"
+ CONSTELLATION_CREATION_EXAMPLE_PROMPT: "galaxy/prompts/constellation/examples/constellation_creation_example.yaml"
+ CONSTELLATION_EDITING_EXAMPLE_PROMPT: "galaxy/prompts/constellation/examples/constellation_editing_example.yaml"
diff --git a/config/galaxy/constellation.yaml b/config/galaxy/constellation.yaml
new file mode 100644
index 000000000..a39554ad3
--- /dev/null
+++ b/config/galaxy/constellation.yaml
@@ -0,0 +1,15 @@
+# Galaxy Constellation Configuration
+# This configuration defines runtime settings for the constellation system
+
+# Constellation Runtime Settings
+CONSTELLATION_ID: "test_constellation"
+HEARTBEAT_INTERVAL: 30.0 # Heartbeat interval in seconds
+RECONNECT_DELAY: 5.0 # Delay before reconnecting in seconds
+MAX_CONCURRENT_TASKS: 6 # Maximum concurrent tasks across the constellation
+MAX_STEP: 15 # Maximum steps per session
+
+# Device Configuration
+DEVICE_INFO: "config/galaxy/devices.yaml" # Path to device configuration file
+
+# Logging Configuration
+LOG_TO_MARKDOWN: true # Whether to save trajectory logs to markdown format
diff --git a/config/galaxy/devices.yaml b/config/galaxy/devices.yaml
new file mode 100644
index 000000000..a470a2954
--- /dev/null
+++ b/config/galaxy/devices.yaml
@@ -0,0 +1,71 @@
+# Device Configuration - YAML Format
+# This configuration defines devices for the constellation
+# Runtime settings (constellation_id, heartbeat_interval, etc.) are configured in constellation.yaml
+
+devices:
+ # - device_id: "windowsagent"
+ # server_url: "ws://localhost:5005/ws"
+ # os: "windows"
+ # capabilities:
+ # - "web_browsing"
+ # - "office_applications"
+ # - "file_management"
+ # - "send emails"
+ # - "any windows tasks"
+ # metadata:
+ # location: "home_office"
+ # os: "windows"
+ # performance: "medium"
+ # description: "Primary development laptop"
+ # operation_engineer_email: "hidan.zhang@gmail.com"
+ # app_log_file: "log_detailed.xlsx"
+ # sheet_name_for_writing_log_in_excel: "report"
+ # sender_name: "Zac"
+ # operation_engineer_name: "Hidan Zhang"
+ # tips: "If you want to use PowerShell, please launch a new PowerShell window to run the commands."
+ # max_retries: 5
+
+ - device_id: "linux_agent_1"
+ server_url: "ws://localhost:5001/ws"
+ os: "linux"
+ capabilities:
+ - "server"
+ metadata:
+ os: "linux"
+ performance: "medium"
+ logs_file_path: "/root/log/log1.txt"
+ dev_path: "/root/dev1/"
+ warning_log_pattern: "WARN"
+ error_log_pattern: "ERROR or FATAL"
+ auto_connect: true
+ max_retries: 5
+
+ - device_id: "linux_agent_2"
+ server_url: "ws://localhost:5002/ws"
+ os: "linux"
+ capabilities:
+ - "server"
+ metadata:
+ os: "linux"
+ performance: "medium"
+ logs_file_path: "/root/log/log2.txt"
+ dev_path: "/root/dev2/"
+ warning_log_pattern: "WARN"
+ error_log_pattern: "ERROR or FATAL"
+ auto_connect: true
+ max_retries: 5
+
+ - device_id: "linux_agent_3"
+ server_url: "ws://localhost:5003/ws"
+ os: "linux"
+ capabilities:
+ - "server"
+ metadata:
+ os: "linux"
+ performance: "medium"
+ logs_file_path: "/root/log/log3.txt"
+ dev_path: "/root/dev3/"
+ warning_log_pattern: "WARN"
+ error_log_pattern: "ERROR or FATAL"
+ auto_connect: true
+ max_retries: 5
diff --git a/config/ufo/agents.yaml.template b/config/ufo/agents.yaml.template
new file mode 100644
index 000000000..dc48414ad
--- /dev/null
+++ b/config/ufo/agents.yaml.template
@@ -0,0 +1,125 @@
+# UFO Agent Configurations
+# All agent configurations for HOST, APP, BACKUP, EVALUATION, and OPERATOR agents
+# Copy this file to agents.yaml and fill in your API credentials
+
+HOST_AGENT:
+ VISUAL_MODE: True # Whether to use the visual mode
+ REASONING_MODEL: False # Whether the model is a reasoning model. For OpenAI o1, o3, o4-mini, this field must be set to True.
+ API_TYPE: "openai" # The API type: "openai" for OpenAI API, "aoai" for Azure OpenAI, "azure_ad" for Azure AD auth
+ API_BASE: "https://api.openai.com/v1/chat/completions" # The API endpoint
+ API_KEY: "sk-YOUR_KEY_HERE" # The OpenAI API key, which begins with sk-
+ API_VERSION: "2025-02-01-preview" # API version
+ API_MODEL: "gpt-4o" # The model name
+
+ ### Comment above and uncomment these if using "aoai" (Azure OpenAI).
+ # API_TYPE: "aoai"
+ # API_BASE: "https://YOUR_RESOURCE.openai.azure.com" # Format: https://{your-resource-name}.openai.azure.com
+ # API_KEY: "YOUR_AOAI_KEY"
+ # API_VERSION: "2024-02-15-preview"
+ # API_MODEL: "gpt-4o"
+ # API_DEPLOYMENT_ID: "YOUR_DEPLOYMENT_ID" # The deployment id for the AOAI API
+
+ ### For Azure AD authentication (azure_ad)
+ # API_TYPE: "azure_ad"
+ # AAD_TENANT_ID: "YOUR_TENANT_ID" # Set the value to your tenant id for the llm model
+ # AAD_API_SCOPE: "YOUR_SCOPE" # Set the value to your scope for the llm model
+ # AAD_API_SCOPE_BASE: "YOUR_SCOPE_BASE" # Set the value to your scope base for the llm model, whose format is API://YOUR_SCOPE_BASE
+
+ # Prompt configurations (usually don't need to change)
+ PROMPT: "ufo/prompts/share/base/host_agent.yaml"
+ EXAMPLE_PROMPT: "ufo/prompts/examples/{mode}/host_agent_example.yaml"
+
+APP_AGENT:
+ VISUAL_MODE: True # Whether to use the visual mode
+ REASONING_MODEL: False # Whether the model is a reasoning model. For OpenAI o1, o3, o4-mini, this field must be set to True.
+ API_TYPE: "openai" # The API type: "openai" for OpenAI API, "aoai" for Azure OpenAI, "azure_ad" for Azure AD auth
+ API_BASE: "https://api.openai.com/v1/chat/completions" # The API endpoint
+ API_KEY: "sk-YOUR_KEY_HERE" # The OpenAI API key, which begins with sk-
+ API_VERSION: "2025-02-01-preview" # API version
+ API_MODEL: "gpt-4o" # The model name
+
+ ### Comment above and uncomment these if using "aoai" (Azure OpenAI).
+ # API_TYPE: "aoai"
+ # API_BASE: "https://YOUR_RESOURCE.openai.azure.com"
+ # API_KEY: "YOUR_AOAI_KEY"
+ # API_VERSION: "2024-02-15-preview"
+ # API_MODEL: "gpt-4o"
+ # API_DEPLOYMENT_ID: "YOUR_DEPLOYMENT_ID"
+
+ ### For Azure AD authentication (azure_ad)
+ # API_TYPE: "azure_ad"
+ # AAD_TENANT_ID: "YOUR_TENANT_ID"
+ # AAD_API_SCOPE: "YOUR_SCOPE"
+ # AAD_API_SCOPE_BASE: "YOUR_SCOPE_BASE"
+
+ # Prompt configurations (usually don't need to change)
+ PROMPT: "ufo/prompts/share/base/app_agent.yaml"
+ EXAMPLE_PROMPT: "ufo/prompts/examples/{mode}/app_agent_example.yaml"
+ EXAMPLE_PROMPT_AS: "ufo/prompts/examples/{mode}/app_agent_example_as.yaml"
+
+BACKUP_AGENT:
+ VISUAL_MODE: True # Whether to use the visual mode
+ API_TYPE: "openai" # The API type: "openai" for OpenAI API, "aoai" for Azure OpenAI
+ API_BASE: "https://api.openai.com/v1/chat/completions" # The API endpoint
+ API_KEY: "sk-YOUR_KEY_HERE" # The OpenAI API key, which begins with sk-
+ API_VERSION: "2024-02-15-preview" # API version
+ API_MODEL: "gpt-4-vision-preview" # The backup model name
+
+ ### Comment above and uncomment these if using "aoai" (Azure OpenAI).
+ # API_TYPE: "aoai"
+ # API_BASE: "https://YOUR_RESOURCE.openai.azure.com"
+ # API_KEY: "YOUR_AOAI_KEY"
+ # API_VERSION: "2024-02-15-preview"
+ # API_MODEL: "gpt-4-vision-preview"
+ # API_DEPLOYMENT_ID: "gpt-4-visual-preview"
+
+ ### For Azure AD authentication (azure_ad)
+ # API_TYPE: "azure_ad"
+ # AAD_TENANT_ID: "YOUR_TENANT_ID"
+ # AAD_API_SCOPE: "YOUR_SCOPE"
+ # AAD_API_SCOPE_BASE: "YOUR_SCOPE_BASE"
+
+EVALUATION_AGENT:
+ VISUAL_MODE: True # Whether to use the visual mode
+ REASONING_MODEL: False # Whether the model is a reasoning model. For OpenAI o1, o3, o4-mini, this field must be set to True.
+ API_TYPE: "openai" # The API type: "openai" for OpenAI API, "aoai" for Azure OpenAI
+ API_BASE: "https://api.openai.com/v1/chat/completions" # The API endpoint
+ API_KEY: "sk-YOUR_KEY_HERE" # The OpenAI API key, which begins with sk-
+ API_VERSION: "2025-02-01-preview" # API version
+ API_MODEL: "gpt-4o" # The model name
+
+ ### Comment above and uncomment these if using "aoai" (Azure OpenAI).
+ # API_TYPE: "aoai"
+ # API_BASE: "https://YOUR_RESOURCE.openai.azure.com"
+ # API_KEY: "YOUR_AOAI_KEY"
+ # API_VERSION: "2024-02-15-preview"
+ # API_MODEL: "gpt-4o"
+ # API_DEPLOYMENT_ID: "YOUR_DEPLOYMENT_ID"
+
+ ### For Azure AD authentication (azure_ad)
+ # API_TYPE: "azure_ad"
+ # AAD_TENANT_ID: "YOUR_TENANT_ID"
+ # AAD_API_SCOPE: "YOUR_SCOPE"
+ # AAD_API_SCOPE_BASE: "YOUR_SCOPE_BASE"
+
+# Omniparser Configuration (for grounding model)
+OMNIPARSER:
+ ENDPOINT: "http://xxx.xxx.xxx.xxx:xxxx" # The omniparser endpoint, to be filled by the user
+ BOX_THRESHOLD: 0.05 # The box threshold for the omniparser
+ IOU_THRESHOLD: 0.1 # The iou threshold for the omniparser
+ USE_PADDLEOCR: True # Whether to use the paddleocr for the omniparser
+ IMGSZ: 640 # The image size for the omniparser
+
+# GPT Parameters
+MAX_TOKENS: 2000 # The max token limit for the response completion
+MAX_RETRY: 3 # The max retry limit for the response completion
+TEMPERATURE: 0.0 # The temperature of the model: the lower the value, the more consistent the output
+TOP_P: 0.0 # The top_p of the model: the lower the value, the more conservative the output
+TIMEOUT: 60 # The call timeout(s), default is 1 min
+
+# App API Prompt Configuration
+APP_API_PROMPT_ADDRESS:
+ "WINWORD.EXE": "ufo/prompts/apps/word/api.yaml"
+ "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml"
+ "msedge.exe": "ufo/prompts/apps/web/api.yaml"
+ "chrome.exe": "ufo/prompts/apps/web/api.yaml"
diff --git a/config/ufo/mcp.yaml b/config/ufo/mcp.yaml
new file mode 100644
index 000000000..442e26142
--- /dev/null
+++ b/config/ufo/mcp.yaml
@@ -0,0 +1,164 @@
+# MCP (Model Context Protocol) Agent Configuration
+# This file defines the agents and their configurations for the MCP servers.
+# The key structure is:
+# AgentName: # The name of the agent, e.g., "AppAgent", "HostAgent", "HardwareAgent"
+# sub_type: # The sub type of the agent, can be "default" or the app root name
+# data_collection: # The data collection server list configuration for the agent
+# - namespace: # The namespace of the server
+# - type: # The type of the server, can be "local" (as used below), "stdio" or "http"
+# - start_args: # The start arguments for the server (only for stdio)
+# - host: # The host of the server (only for http)
+# - port: # The port of the server (only for http)
+# - path: # The path of the server (only for http)
+# action: # The action configuration server list for the agent
+# ... (same structure as data_collection)
+
+HostAgent:
+ default:
+ data_collection:
+ - namespace: UICollector
+ type: local
+ start_args: []
+ reset: false # Whether to reset the MCP server when switching to a new computer
+ action:
+ - namespace: HostUIExecutor
+ type: local
+ start_args: []
+ reset: false
+ - namespace: CommandLineExecutor
+ type: local
+ start_args: []
+ reset: false
+
+AppAgent:
+ default:
+ data_collection:
+ - namespace: UICollector
+ type: local
+ start_args: []
+ reset: false
+ action:
+ - namespace: AppUIExecutor
+ type: local
+ start_args: []
+ reset: false
+ - namespace: CommandLineExecutor
+ type: local
+ start_args: []
+ reset: false
+
+ WINWORD.EXE:
+ data_collection:
+ - namespace: UICollector
+ type: local
+ start_args: []
+ reset: false
+ action:
+ - namespace: AppUIExecutor
+ type: local
+ start_args: []
+ reset: false
+ - namespace: WordCOMExecutor
+ type: local
+ start_args: []
+ reset: true
+
+ EXCEL.EXE:
+ data_collection:
+ - namespace: UICollector
+ type: local
+ start_args: []
+ reset: false
+ action:
+ - namespace: AppUIExecutor
+ type: local
+ start_args: []
+ reset: false
+ - namespace: ExcelCOMExecutor
+ type: local
+ start_args: []
+ reset: true
+
+ POWERPNT.EXE:
+ data_collection:
+ - namespace: UICollector
+ type: local
+ start_args: []
+ reset: false
+ action:
+ - namespace: AppUIExecutor
+ type: local
+ start_args: []
+ reset: false
+ - namespace: PowerPointCOMExecutor
+ type: local
+ start_args: []
+ reset: true
+
+ explorer.exe:
+ data_collection:
+ - namespace: UICollector
+ type: local
+ start_args: []
+ reset: false
+ action:
+ - namespace: AppUIExecutor
+ type: local
+ start_args: []
+ reset: false
+ - namespace: PDFReaderExecutor
+ type: local
+ start_args: []
+ reset: true
+
+ConstellationAgent:
+ default:
+ action:
+ - namespace: ConstellationEditor
+ type: local
+ start_args: []
+ reset: false
+
+HardwareAgent:
+ default:
+ data_collection:
+ - namespace: HardwareCollector
+ type: http
+ host: "localhost"
+ port: 8006
+ path: "/mcp"
+ reset: false
+ action:
+ - namespace: HardwareExecutor
+ type: http
+ host: "localhost"
+ port: 8006
+ path: "/mcp"
+ reset: false
+
+LinuxAgent:
+ default:
+ action:
+ - namespace: BashExecutor
+ type: http
+ host: "localhost"
+ port: 8010
+ path: "/mcp"
+ reset: false
+
+MobileAgent:
+ default:
+ data_collection:
+ - namespace: MobileDataCollector
+ type: http
+ host: "localhost"
+ port: 8020
+ path: "/mcp"
+ reset: false
+ action:
+ - namespace: MobileActionExecutor
+ type: http
+ host: "localhost"
+ port: 8021
+ path: "/mcp"
+ reset: false
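
Review note on `config/ufo/mcp.yaml`: the header comment says each agent is keyed by a `sub_type` that is either `default` or the app root name. A minimal sketch of how such a lookup could work is shown below. The helper name `resolve_servers` and the fall-back-to-`default` behaviour are assumptions made for illustration, not code taken from this PR, and the dictionary is a hand-copied excerpt of the YAML above.

```python
from typing import Any, Dict, List, Optional

# Hand-copied excerpt of the mapping defined in config/ufo/mcp.yaml.
MCP_CONFIG: Dict[str, Dict[str, Any]] = {
    "AppAgent": {
        "default": {
            "data_collection": [{"namespace": "UICollector", "type": "local"}],
            "action": [
                {"namespace": "AppUIExecutor", "type": "local"},
                {"namespace": "CommandLineExecutor", "type": "local"},
            ],
        },
        "WINWORD.EXE": {
            "data_collection": [{"namespace": "UICollector", "type": "local"}],
            "action": [
                {"namespace": "AppUIExecutor", "type": "local"},
                {"namespace": "WordCOMExecutor", "type": "local"},
            ],
        },
    },
}


def resolve_servers(
    config: Dict[str, Dict[str, Any]],
    agent_name: str,
    app_root_name: Optional[str] = None,
) -> Dict[str, List[str]]:
    """Hypothetical helper: list server namespaces for an agent and app.

    Assumption: when no entry exists for the app root name, the agent's
    "default" entry is used instead.
    """
    agent_cfg = config.get(agent_name, {})
    section = agent_cfg.get(app_root_name) or agent_cfg.get("default", {})
    return {
        kind: [server["namespace"] for server in section.get(kind, [])]
        for kind in ("data_collection", "action")
    }
```

Under that assumption, `resolve_servers(MCP_CONFIG, "AppAgent", "WINWORD.EXE")` picks the Word-specific action servers, while an app with no entry of its own falls through to the `default` list.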
diff --git a/config/ufo/prices.yaml b/config/ufo/prices.yaml
new file mode 100644
index 000000000..caa356965
--- /dev/null
+++ b/config/ufo/prices.yaml
@@ -0,0 +1,76 @@
+# API Pricing Configuration
+# Source: https://openai.com/pricing
+# Prices in $ per 1000 tokens
+# Last updated: 2024-05-13
+
+PRICES:
+ # OpenAI Models
+ "openai/gpt-4-0613": {"input": 0.03, "output": 0.06}
+ "openai/gpt-3.5-turbo-0613": {"input": 0.0015, "output": 0.002}
+ "openai/gpt-4-0125-preview": {"input": 0.01, "output": 0.03}
+ "openai/gpt-4-1106-preview": {"input": 0.01, "output": 0.03}
+ "openai/gpt-4-1106-vision-preview": {"input": 0.01, "output": 0.03}
+ "openai/gpt-4": {"input": 0.03, "output": 0.06}
+ "openai/gpt-4-32k": {"input": 0.06, "output": 0.12}
+ "openai/gpt-4-turbo": {"input": 0.01, "output": 0.03}
+ "openai/gpt-4o": {"input": 0.005, "output": 0.015}
+ "openai/gpt-4o-2024-05-13": {"input": 0.005, "output": 0.015}
+ "openai/gpt-4o-20240513": {"input": 0.0025, "output": 0.01}
+ "openai/gpt-4o-20240806": {"input": 0.0025, "output": 0.01}
+ "openai/gpt-4o-20241120": {"input": 0.0025, "output": 0.01}
+ "openai/gpt-4o-mini-20240718": {"input": 0.00015, "output": 0.0006}
+ "openai/gpt-4.1-2025-04-14": {"input": 0.002, "output": 0.008}
+ "openai/gpt-3.5-turbo-0125": {"input": 0.0005, "output": 0.0015}
+ "openai/gpt-3.5-turbo-1106": {"input": 0.001, "output": 0.002}
+ "openai/gpt-3.5-turbo-instruct": {"input": 0.0015, "output": 0.002}
+ "openai/gpt-3.5-turbo-16k-0613": {"input": 0.003, "output": 0.004}
+ "openai/o1": {"input": 0.015, "output": 0.060}
+ "openai/o1-mini": {"input": 0.0011, "output": 0.0044}
+ "openai/o1-mini-2024-09-12": {"input": 0.0011, "output": 0.0044}
+ "openai/o1-pro": {"input": 0.150, "output": 0.600}
+ "openai/o1-pro-2025-03-19": {"input": 0.150, "output": 0.600}
+ "openai/o4-mini": {"input": 0.0011, "output": 0.0044}
+ "openai/o4-mini-2025-04-16": {"input": 0.0011, "output": 0.0044}
+ "openai/whisper-1": {"input": 0.006, "output": 0.006}
+ "openai/tts-1": {"input": 0.015, "output": 0.015}
+ "openai/tts-hd-1": {"input": 0.03, "output": 0.03}
+ "openai/text-embedding-ada-002-v2": {"input": 0.0001, "output": 0.0001}
+ "openai/text-davinci:003": {"input": 0.02, "output": 0.02}
+ "openai/text-ada-001": {"input": 0.0004, "output": 0.0004}
+
+ # Azure Models
+ "azure/gpt-35-turbo-20220309": {"input": 0.0015, "output": 0.002}
+ "azure/gpt-35-turbo-20230613": {"input": 0.0015, "output": 0.002}
+ "azure/gpt-35-turbo-16k-20230613": {"input": 0.003, "output": 0.004}
+ "azure/gpt-35-turbo-1106": {"input": 0.001, "output": 0.002}
+ "azure/gpt-4-20230321": {"input": 0.03, "output": 0.06}
+ "azure/gpt-4-32k-20230321": {"input": 0.06, "output": 0.12}
+ "azure/gpt-4-1106-preview": {"input": 0.01, "output": 0.03}
+ "azure/gpt-4-0125-preview": {"input": 0.01, "output": 0.03}
+ "azure/gpt-4-visual-preview": {"input": 0.01, "output": 0.03}
+ "azure/gpt-4-turbo-20240409": {"input": 0.01, "output": 0.03}
+ "azure/gpt-4o": {"input": 0.005, "output": 0.015}
+ "azure/gpt-4o-20240513": {"input": 0.0025, "output": 0.01}
+ "azure/gpt-4o-20240806": {"input": 0.0025, "output": 0.01}
+ "azure/gpt-4o-20241120": {"input": 0.0025, "output": 0.01}
+ "azure/gpt-4o-mini-20240718": {"input": 0.00015, "output": 0.0006}
+ "azure/gpt-4.1-20250414": {"input": 0.002, "output": 0.008}
+ "azure/o1-20241217": {"input": 0.015, "output": 0.060}
+ "azure/o1-mini-20240912": {"input": 0.0011, "output": 0.0044}
+ "azure/o3-20250416": {"input": 0.010, "output": 0.040}
+ "azure/o3-mini-20250416": {"input": 0.0011, "output": 0.0044}
+ "azure/o4-mini-20250416": {"input": 0.0011, "output": 0.0044}
+
+ # Other Providers
+ "qwen/qwen-vl-plus": {"input": 0.008, "output": 0.008}
+ "qwen/qwen-vl-max": {"input": 0.02, "output": 0.02}
+ "qwen/qwen-omni-turbo": {"input": 0.0002, "output": 0.0006}
+ "gemini/gemini-1.5-flash": {"input": 0.00035, "output": 0.00105}
+ "gemini/gemini-1.5-pro": {"input": 0.0035, "output": 0.0105}
+ "gemini/gemini-1.0-pro": {"input": 0.0005, "output": 0.0015}
+ "gemini/gemini-2.5-flash-preview-04-17": {"input": 0.00015, "output": 0.0035}
+ "gemini/gemini-2.5-pro-preview-03-25": {"input": 0.000125, "output": 0.01}
+ "gemini/gemini-2.5-pro-exp-03-25": {"input": 0.0, "output": 0.0}
+ "claude/claude-3-5-sonnet-20241022": {"input": 0.0003, "output": 0.0015}
+ "claude/claude-3-5-sonnet": {"input": 0.0003, "output": 0.0015}
+ "claude/claude-3-5-opus": {"input": 0.0015, "output": 0.0075}
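
Review note on `config/ufo/prices.yaml`: the header states that prices are in dollars per 1000 tokens, so a request's cost is the token counts multiplied by the per-1000 rates and divided by 1000. The sketch below illustrates that arithmetic. The function `estimate_cost` and its zero-cost handling of unknown models are assumptions for illustration only and are not taken from this PR; the dictionary is a two-entry excerpt of the table above.

```python
from typing import Dict

# Two entries copied from the PRICES table above ($ per 1000 tokens).
PRICES: Dict[str, Dict[str, float]] = {
    "openai/gpt-4o": {"input": 0.005, "output": 0.015},
    "azure/gpt-4o-20240806": {"input": 0.0025, "output": 0.01},
}


def estimate_cost(
    model: str,
    input_tokens: int,
    output_tokens: int,
    prices: Dict[str, Dict[str, float]] = PRICES,
) -> float:
    """Hypothetical helper: estimate the dollar cost of one request.

    Assumption: a model missing from the table is costed at 0.0
    instead of raising an error.
    """
    entry = prices.get(model)
    if entry is None:
        return 0.0
    # Rates are per 1000 tokens, so scale the token counts accordingly.
    return (input_tokens * entry["input"] + output_tokens * entry["output"]) / 1000.0
```

For example, 1000 input tokens and 1000 output tokens on `openai/gpt-4o` come to 0.005 + 0.015 = 0.02 dollars at the rates listed above.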
diff --git a/config/ufo/rag.yaml b/config/ufo/rag.yaml
new file mode 100644
index 000000000..9dd040318
--- /dev/null
+++ b/config/ufo/rag.yaml
@@ -0,0 +1,26 @@
+# RAG (Retrieval Augmented Generation) Configuration
+
+# Offline Documentation RAG
+RAG_OFFLINE_DOCS: False # Whether to use the offline RAG
+RAG_OFFLINE_DOCS_RETRIEVED_TOPK: 1 # The topk for the offline retrieved documents
+
+# Online Search RAG
+BING_API_KEY: "a5f1dec156334648a2354fabb221ffff" # The Bing search API key
+RAG_ONLINE_SEARCH: False # Whether to use the online search for the RAG
+RAG_ONLINE_SEARCH_TOPK: 5 # The topk for the online search
+RAG_ONLINE_RETRIEVED_TOPK: 1 # The topk for the online retrieved documents
+
+# Experience RAG
+RAG_EXPERIENCE: False # Whether to use the experience RAG
+RAG_EXPERIENCE_RETRIEVED_TOPK: 5 # The topk for the experience retrieved documents
+EXPERIENCE_SAVED_PATH: "vectordb/experience/" # The path to save experience
+
+# Demonstration RAG
+RAG_DEMONSTRATION: False # Whether to use the RAG from user demonstration
+RAG_DEMONSTRATION_RETRIEVED_TOPK: 5 # The topk for the demonstration retrieved documents
+RAG_DEMONSTRATION_COMPLETION_N: 3 # The number of completion choices for the demonstration result
+DEMONSTRATION_SAVED_PATH: "vectordb/demonstration/" # The path to save demonstration
+
+# Prompts for RAG
+EXPERIENCE_PROMPT: "ufo/prompts/experience/experience_summary.yaml"
+DEMONSTRATION_PROMPT: "ufo/prompts/demonstration/demonstration_summary.yaml"
diff --git a/config/ufo/system.yaml b/config/ufo/system.yaml
new file mode 100644
index 000000000..4715be136
--- /dev/null
+++ b/config/ufo/system.yaml
@@ -0,0 +1,118 @@
+# UFO System Configuration
+
+# LLM Parameters
+MAX_TOKENS: 2000 # The max token limit for the response completion
+MAX_RETRY: 20 # The max retry limit for the response completion
+TEMPERATURE: 0.0 # The temperature of the model: the lower the value, the more consistent
+TOP_P: 0.0 # The top_p of the model: the lower the value, the more conservative
+TIMEOUT: 60 # The call timeout(s), default is 1 min
+
+# Control Backend
+CONTROL_BACKEND: ["uia"] # The backend for control action: uia, omniparser
+IOU_THRESHOLD_FOR_MERGE: 0.1 # The iou threshold for merging the boxes between controls
+
+# Execution Limits
+MAX_STEP: 50 # The max step limit for completing the user request
+MAX_ROUND: 1 # The max round limit for completing the user request
+SLEEP_TIME: 1 # The sleep time between each step to wait for the window to be ready
+RECTANGLE_TIME: 1
+
+# Action Configuration
+ACTION_SEQUENCE: False # Whether to output the action sequence (from legacy config)
+SHOW_VISUAL_OUTLINE_ON_SCREEN: False # Skip rendering visual outline on screen if not necessary
+MAXIMIZE_WINDOW: False # Whether to maximize the application window before the action
+JSON_PARSING_RETRY: 3 # The retry times for the json parsing
+
+# Safety
+SAFE_GUARD: False # Whether to use the safe guard to prevent sensitive operations (from legacy config)
+CONTROL_LIST: ["Button", "Edit", "TabItem", "Document", "ListItem", "MenuItem", "ScrollBar", "TreeItem", "Hyperlink", "ComboBox", "RadioButton", "Spinner", "CheckBox", "Group", "Text"]
+
+# History
+HISTORY_KEYS: ["step", "subtask", "action_representation", "user_confirm"]
+
+# Annotation
+ANNOTATION_COLORS:
+ "Button": "#FFF68F"
+ "Edit": "#A5F0B5"
+ "TabItem": "#A5E7F0"
+ "Document": "#FFD18A"
+ "ListItem": "#D9C3FE"
+ "MenuItem": "#E7FEC3"
+ "ScrollBar": "#FEC3F8"
+ "TreeItem": "#D6D6D6"
+ "Hyperlink": "#91FFEB"
+ "ComboBox": "#D8B6D4"
+
+HIGHLIGHT_BBOX: True
+ANNOTATION_FONT_SIZE: 22
+
+# Control Actions
+CLICK_API: "click_input" # The click API
+AFTER_CLICK_WAIT: 0 # The wait time after clicking in seconds
+INPUT_TEXT_API: "type_keys" # The input text API: type_keys or set_text
+INPUT_TEXT_ENTER: False # Whether to press enter after typing the text
+INPUT_TEXT_INTER_KEY_PAUSE: 0.05 # The pause time between each key press
+
+
+# Logging
+PRINT_LOG: False # Whether to print the log
+CONCAT_SCREENSHOT: False # Whether to concat the screenshot for the control item
+LOG_LEVEL: "DEBUG" # The log level
+INCLUDE_LAST_SCREENSHOT: True # Whether to include the last screenshot in the observation
+REQUEST_TIMEOUT: 250 # The call timeout for the GPT-V model
+LOG_XML: False # Whether to log the xml file at every step
+LOG_TO_MARKDOWN: True # Whether to save the log to markdown file
+SCREENSHOT_TO_MEMORY: True # Whether to allow the screenshot to memory
+
+# Image Performance
+DEFAULT_PNG_COMPRESS_LEVEL: 1 # The compress level for PNG image, 0-9
+
+# Save Options
+SAVE_UI_TREE: False # Whether to save the UI tree at each step
+SAVE_FULL_SCREEN: False # Whether to save the full screen at each step
+
+# Task Management
+TASK_STATUS: True # Whether to record the status of the tasks in batch execution mode
+SAVE_EXPERIENCE: "always_not" # always, always_not, ask, auto
+
+# Evaluation
+EVA_SESSION: True # Whether to include the session in the evaluation
+EVA_ROUND: False
+EVA_ALL_SCREENSHOTS: True # Whether to include all the screenshots in the evaluation
+
+# Customization
+ASK_QUESTION: False # Whether to allow the agent to ask questions
+USE_CUSTOMIZATION: False # Whether to use the customization
+QA_PAIR_FILE: "customization/global_memory.jsonl"
+QA_PAIR_NUM: 20 # The number of QA pairs for the customization
+
+# Omniparser
+OMNIPARSER:
+ ENDPOINT: "https://aeb8ef731536d2d6c2.gradio.live"
+ BOX_THRESHOLD: 0.05
+ IOU_THRESHOLD: 0.1
+ USE_PADDLEOCR: True
+ IMGSZ: 640
+
+# Control Filtering Configuration
+CONTROL_FILTER_TYPE: [] # List of control filter types: 'TEXT', 'SEMANTIC', 'ICON'
+CONTROL_FILTER_TOP_K_PLAN: 2 # Control filter effect on top k plans from UFO
+CONTROL_FILTER_TOP_K_SEMANTIC: 15 # Control filter top k for semantic similarity
+CONTROL_FILTER_TOP_K_ICON: 15 # Control filter top k for icon similarity
+CONTROL_FILTER_MODEL_SEMANTIC_NAME: "all-MiniLM-L6-v2" # Semantic similarity model
+CONTROL_FILTER_MODEL_ICON_NAME: "clip-ViT-B-32" # Icon similarity model
+
+
+# MCP (Model Context Protocol) Integration
+USE_MCP: True # Whether to enable MCP integration for tool execution
+MCP_SERVERS_CONFIG: "config/ufo/mcp.yaml" # Path to MCP servers configuration (updated to new path)
+MCP_PREFERRED_APPS: ["POWERPNT.EXE", "WINWORD.EXE", "EXCEL.EXE", "powerpoint", "word", "excel", "web", "shell", "hardware", "hardwareagent"]
+MCP_FALLBACK_TO_UI: True # Whether to fall back to UI automation if MCP execution fails
+MCP_INSTRUCTIONS_PATH: "ufo/config/mcp_instructions" # Path to MCP instructions files
+MCP_TOOL_TIMEOUT: 30 # Timeout in seconds for MCP tool execution
+MCP_LOG_EXECUTION: False # Whether to log MCP tool execution details
+
+
+
+# Enabled Third-Party Agents
+ENABLED_THIRD_PARTY_AGENTS: ["HardwareAgent", "LinuxAgent"]
diff --git a/config/ufo/third_party.yaml b/config/ufo/third_party.yaml
new file mode 100644
index 000000000..dd39c6874
--- /dev/null
+++ b/config/ufo/third_party.yaml
@@ -0,0 +1,27 @@
+# Third-Party Agent Integration Configuration
+# This file configures external/third-party agents that extend UFO's capabilities
+# beyond the core Windows GUI automation
+
+# Enabled Third-Party Agents
+ENABLED_THIRD_PARTY_AGENTS: ["HardwareAgent", "LinuxAgent", "MobileAgent"]
+
+THIRD_PARTY_AGENT_CONFIG:
+ HardwareAgent:
+ VISUAL_MODE: True
+ AGENT_NAME: "HardwareAgent"
+ APPAGENT_PROMPT: "ufo/prompts/share/base/app_agent.yaml"
+ APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/examples/visual/app_agent_example.yaml"
+ API_PROMPT: "ufo/prompts/third_party/hardware_agent_api.yaml"
+ INTRODUCTION: "The HardwareAgent is used to manipulate hardware components of the computer without using GUI, such as robotic arms for keyboard input and mouse control, plug and unplug devices such as USB drives, and other hardware-related tasks."
+
+ LinuxAgent:
+ AGENT_NAME: "LinuxAgent"
+ APPAGENT_PROMPT: "ufo/prompts/third_party/linux_agent.yaml"
+ APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/third_party/linux_agent_example.yaml"
+ INTRODUCTION: "For Linux Use Only."
+
+ MobileAgent:
+ AGENT_NAME: "MobileAgent"
+ APPAGENT_PROMPT: "ufo/prompts/third_party/mobile_agent.yaml"
+ APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/third_party/mobile_agent_example.yaml"
+ INTRODUCTION: "For Android Mobile Device Control. Enables remote control and automation of Android devices via ADB and UI interactions."
diff --git a/dataflow/config/config.py b/dataflow/config/config.py
index 4760ad0c2..d7fd20ae9 100644
--- a/dataflow/config/config.py
+++ b/dataflow/config/config.py
@@ -1,7 +1,7 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
-from ufo.config.config import Config
+from ufo.config import Config
class Config(Config):
diff --git a/dataflow/env/env_manager.py b/dataflow/env/env_manager.py
index a74ebef18..18ecdb711 100644
--- a/dataflow/env/env_manager.py
+++ b/dataflow/env/env_manager.py
@@ -1,15 +1,23 @@
import logging
+import platform
import re
from time import sleep
-from typing import Optional, Tuple, Dict
+from typing import Optional, Tuple, Dict, TYPE_CHECKING, Any
import psutil
from fuzzywuzzy import fuzz
-from pywinauto import Desktop
-from pywinauto.controls.uiawrapper import UIAWrapper
+
+# Conditional imports for Windows-specific packages
+if TYPE_CHECKING or platform.system() == "Windows":
+ from pywinauto import Desktop
+ from pywinauto.controls.uiawrapper import UIAWrapper
+else:
+ Desktop = None
+ UIAWrapper = Any
from dataflow.config.config import Config
-from ufo.config.config import Config as UFOConfig
+from ufo.config import Config as UFOConfig
+from aip.messages import ControlInfo
# Load configuration settings
_configs = Config.get_instance().config_data
@@ -134,14 +142,14 @@ def _match_window_name(self, window_title: str, doc_name: str) -> bool:
logging.exception(f"Unknown match strategy: {_MATCH_STRATEGY}")
raise ValueError(f"Unknown match strategy: {_MATCH_STRATEGY}")
- def _calculate_match_score(self, control, control_text) -> int:
+ def _calculate_match_score(self, control: ControlInfo, control_text: str) -> int:
"""
Calculate the match score between a control and the given text.
:param control: The control object to evaluate.
:param control_text: The target text to match.
:return: An integer score representing the match quality (higher is better).
"""
- control_content = control.window_text() or ""
+ control_content = control.text_content or ""
# Matching strategies
if _MATCH_STRATEGY == "contains":
@@ -155,8 +163,8 @@ def _calculate_match_score(self, control, control_text) -> int:
raise ValueError(f"Unknown match strategy: {_MATCH_STRATEGY}")
def find_matching_controller(
- self, filtered_annotation_dict: Dict[int, UIAWrapper], control_text: str
- ) -> Tuple[str, UIAWrapper]:
+ self, filtered_annotation_dict: Dict[int, ControlInfo], control_text: str
+ ) -> Tuple[str, ControlInfo]:
""" "
Select the best matched controller.
:param filtered_annotation_dict: The filtered annotation dictionary.
diff --git a/dataflow/execution/workflow/execute_flow.py b/dataflow/execution/workflow/execute_flow.py
index 7d0d69624..936a6c986 100644
--- a/dataflow/execution/workflow/execute_flow.py
+++ b/dataflow/execution/workflow/execute_flow.py
@@ -9,9 +9,8 @@
from ufo import utils
from ufo.agents.processors.app_agent_processor import AppAgentProcessor
from ufo.automator.app_apis.basic import WinCOMReceiverBasic
-from ufo.config.config import Config as UFOConfig
+from ufo.config import Config as UFOConfig
from ufo.module.basic import BaseSession, Context, ContextNames
-from ufo.automator.ui_control.screenshot import PhotographerDecorator
_configs = InstantiationConfig.get_instance().config_data
_ufo_configs = UFOConfig.get_instance().config_data
@@ -335,7 +334,7 @@ def execute_action(self) -> None:
control_selected.draw_outline(colour="red", thickness=3)
time.sleep(_ufo_configs.get("RECTANGLE_TIME", 0))
- control_coordinates = PhotographerDecorator.coordinate_adjusted(
+ control_coordinates = utils.coordinate_adjusted(
self.application_window.rectangle(), control_selected.rectangle()
)
diff --git a/dataflow/instantiation/agent/filter_agent.py b/dataflow/instantiation/agent/filter_agent.py
index 239fad5ad..256b12db6 100644
--- a/dataflow/instantiation/agent/filter_agent.py
+++ b/dataflow/instantiation/agent/filter_agent.py
@@ -74,7 +74,7 @@ def message_constructor(self, request: str, app: str) -> List[str]:
return filter_agent_prompt_message
- def process_comfirmation(self) -> None:
+ def process_confirmation(self) -> None:
"""
Confirm the process.
This is the abstract method from BasicAgent that needs to be implemented.
diff --git a/dataflow/instantiation/agent/prefill_agent.py b/dataflow/instantiation/agent/prefill_agent.py
index 33e3a3f5b..406f9b652 100644
--- a/dataflow/instantiation/agent/prefill_agent.py
+++ b/dataflow/instantiation/agent/prefill_agent.py
@@ -41,7 +41,9 @@ def __init__(
)
self._process_name = process_name
- def get_prompter(self, is_visual: bool, main_prompt: str, example_prompt: str, api_prompt: str) -> str:
+ def get_prompter(
+ self, is_visual: bool, main_prompt: str, example_prompt: str, api_prompt: str
+ ) -> str:
"""
Get the prompt for the agent.
This is the abstract method from BasicAgent that needs to be implemented.
@@ -83,10 +85,10 @@ def message_constructor(
return appagent_prompt_message
- def process_comfirmation(self) -> None:
+ def process_confirmation(self) -> None:
"""
Confirm the process.
This is the abstract method from BasicAgent that needs to be implemented.
"""
- pass
\ No newline at end of file
+ pass
diff --git a/dataflow/instantiation/agent/template_agent.py b/dataflow/instantiation/agent/template_agent.py
index be9bf1946..cdb4aa632 100644
--- a/dataflow/instantiation/agent/template_agent.py
+++ b/dataflow/instantiation/agent/template_agent.py
@@ -78,7 +78,7 @@ def message_constructor(
return appagent_prompt_message
- def process_comfirmation(self) -> None:
+ def process_confirmation(self) -> None:
"""
Confirm the process.
This is the abstract method from BasicAgent that needs to be implemented.
diff --git a/dataflow/instantiation/workflow/prefill_flow.py b/dataflow/instantiation/workflow/prefill_flow.py
index 4ee4686ed..c85b213a3 100644
--- a/dataflow/instantiation/workflow/prefill_flow.py
+++ b/dataflow/instantiation/workflow/prefill_flow.py
@@ -11,7 +11,7 @@
from ufo.automator.ui_control.inspector import ControlInspectorFacade
from ufo.automator.ui_control.screenshot import PhotographerFacade
from ufo.module.basic import BaseSession
-from ufo.config.config import Config as UFOConfig
+from ufo.config import Config as UFOConfig
_configs = Config.get_instance().config_data
_ufo_configs = UFOConfig.get_instance().config_data
diff --git a/dataflow/prompter/instantiation/filter_prompter.py b/dataflow/prompter/instantiation/filter_prompter.py
index fd1d4b783..86ce59b47 100644
--- a/dataflow/prompter/instantiation/filter_prompter.py
+++ b/dataflow/prompter/instantiation/filter_prompter.py
@@ -60,7 +60,7 @@ def api_prompt_helper(self, apis: Dict = {}, verbose: int = 1) -> str:
api_list.append(api_text)
- api_prompt = self.retrived_documents_prompt_helper("", "", api_list)
+ api_prompt = self.retrieved_documents_prompt_helper("", "", api_list)
else:
api_list = [
"- The action type are limited to {actions}.".format(
@@ -76,7 +76,7 @@ def api_prompt_helper(self, apis: Dict = {}, verbose: int = 1) -> str:
)
api_list.append(api_text)
- api_prompt = self.retrived_documents_prompt_helper("", "", api_list)
+ api_prompt = self.retrieved_documents_prompt_helper("", "", api_list)
return api_prompt
@@ -157,4 +157,4 @@ def examples_prompt_helper(
example_list += [json.dumps(example) for example in additional_examples]
- return self.retrived_documents_prompt_helper(header, separator, example_list)
+ return self.retrieved_documents_prompt_helper(header, separator, example_list)
diff --git a/dataflow/prompter/instantiation/prefill_prompter.py b/dataflow/prompter/instantiation/prefill_prompter.py
index d98e409c8..32bdfe0af 100644
--- a/dataflow/prompter/instantiation/prefill_prompter.py
+++ b/dataflow/prompter/instantiation/prefill_prompter.py
@@ -59,7 +59,7 @@ def api_prompt_helper(self, verbose: int = 1) -> str:
api_list.append(api_text)
- api_prompt = self.retrived_documents_prompt_helper("", "", api_list)
+ api_prompt = self.retrieved_documents_prompt_helper("", "", api_list)
return api_prompt
@@ -176,4 +176,4 @@ def examples_prompt_helper(
example_list += [json.dumps(example) for example in additional_examples]
- return self.retrived_documents_prompt_helper(header, separator, example_list)
+ return self.retrieved_documents_prompt_helper(header, separator, example_list)
diff --git a/dataflow/tasks/prefill/bulleted.json b/dataflow/tasks/prefill/bulleted.json
deleted file mode 100644
index 237b68eb1..000000000
--- a/dataflow/tasks/prefill/bulleted.json
+++ /dev/null
@@ -1,9 +0,0 @@
-{
- "app": "word",
- "unique_id": "5",
- "task": "Turning lines of text into a bulleted list in Word",
- "refined_steps": [
- "1. Place the cursor at the beginning of the line of text you want to turn into a bulleted list",
- "2. Click the Bullets button in the Paragraph group on the Home tab and choose a bullet style"
- ]
-}
\ No newline at end of file
diff --git a/dataflow/tasks/prefill/watermark.json b/dataflow/tasks/prefill/watermark.json
deleted file mode 100644
index ef14611f2..000000000
--- a/dataflow/tasks/prefill/watermark.json
+++ /dev/null
@@ -1,11 +0,0 @@
-{
- "app": "word",
- "unique_id": "108",
- "task": "Put a watermark on all pages in Word for Office 365",
- "refined_steps": [
- "1.On the **Design** tab, select **Watermark**.",
- "2.Select **Custom Watermark**.",
- "3.Choose **Text watermark** and type your text in the **Text** box.",
- "4.Select **OK**."
- ]
-}
\ No newline at end of file
diff --git a/documents/docs/advanced_usage/batch_mode.md b/documents/docs/advanced_usage/batch_mode.md
deleted file mode 100644
index 054d2b74c..000000000
--- a/documents/docs/advanced_usage/batch_mode.md
+++ /dev/null
@@ -1,67 +0,0 @@
-# Batch Mode
-
-Batch mode is a UFO feature that lets the agent automate tasks in batches.
-
-## Quick Start
-
-### Step 1: Create a Plan file
-
-Before starting the Batch mode, you need to create a plan file that contains the list of steps for the agent to follow. The plan file is a JSON file that contains the following fields:
-
-| Field | Description | Type |
-| ------ | -------------------------------------------------------------------------------------------- | ------- |
-| task | The task description. | String |
-| object | The application or file to interact with. | String |
-| close | Determines whether to close the corresponding application or file after completing the task. | Boolean |
-
-Below is an example of a plan file:
-
-```json
-{
- "task": "Type in a text of 'Test For Fun' with heading 1 level",
- "object": "draft.docx",
- "close": false
-}
-```
-
-!!! note
- The `object` field is the application or file that the agent will interact with. The object **must be active** (can be minimized) when starting the Batch mode.
- The structure of your files should be as follows, where `tasks` is the directory for your tasks and `files` is where your object files are stored:
-
- - Parent
- - tasks
- - files
-
-
-### Step 2: Start the Batch Mode
-To start the Batch mode, run the following command:
-
-```bash
-# assume you are in the cloned UFO folder
-python ufo.py --task_name {task_name} --mode batch_normal --plan {plan_file}
-```
-
-!!! tip
- Replace `{task_name}` with the name of the task and `{plan_file}` with the `Path_to_Parent/Plan_file`.
-
-
-
-## Evaluation
-You may want to evaluate whether the `task` was completed successfully by following the plan. UFO will call the `EvaluationAgent` to evaluate the task if `EVA_SESSION` is set to `True` in the `config_dev.yaml` file.
-
-You can check the evaluation log in the `logs/{task_name}/evaluation.log` file.
-
-# References
-The batch mode employs a `PlanReader` to parse the plan file and create a `FromFileSession` to follow the plan.
-
-## PlanReader
-The `PlanReader` is located in the `ufo/module/sessions/plan_reader.py` file.
-
-:::module.sessions.plan_reader.PlanReader
-
-
-## FromFileSession
-
-The `FromFileSession` is located in the `ufo/module/sessions/session.py` file.
-
-:::module.sessions.session.FromFileSession
\ No newline at end of file
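Review note on the removed `batch_mode.md`: the page describes the plan file as JSON with `task`, `object`, and `close` fields. The sketch below shows one way such a file could be checked before a batch run. It assumes only the three fields and types listed in the table; `validate_plan` is a hypothetical helper and is not part of UFO.

```python
import json
from typing import Any, Dict

# Field names and types taken from the table in the removed page.
REQUIRED_FIELDS = {"task": str, "object": str, "close": bool}


def validate_plan(text: str) -> Dict[str, Any]:
    """Parse a plan file's text and check the documented fields (hypothetical helper)."""
    # json.loads raises json.JSONDecodeError for Python-style literals such as `False`.
    plan = json.loads(text)
    for name, expected in REQUIRED_FIELDS.items():
        if name not in plan:
            raise ValueError(f"missing field: {name}")
        if not isinstance(plan[name], expected):
            raise ValueError(f"field {name!r} should be {expected.__name__}")
    return plan
```

JSON booleans are lowercase (`true` / `false`), so a plan written with a capitalised `False` fails at the parsing step rather than at the field check.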
diff --git a/documents/docs/advanced_usage/control_detection/hybrid_detection.md b/documents/docs/advanced_usage/control_detection/hybrid_detection.md
deleted file mode 100644
index f205e0f75..000000000
--- a/documents/docs/advanced_usage/control_detection/hybrid_detection.md
+++ /dev/null
@@ -1,26 +0,0 @@
-# Hybrid Detection
-
-We also support hybrid control detection using both UIA and OmniParser-v2. This method is useful for detecting standard controls in the application using the UI Automation (UIA) framework, and for detecting custom controls in the application that may not be recognized by standard UIA methods. The visually detected controls are merged with the UIA controls by removing the duplicate controls based on IOU. We illustrate the hybrid control detection in the figure below:
-
-
-
-
-
-
-## Configuration
-
-
-Before using the hybrid control detection, you need to deploy and configure the OmniParser model. You can refer to the [OmniParser deployment](./visual_detection.md) for more details.
-
-To activate hybrid control detection, set `CONTROL_BACKEND` to `["uia", "omniparser"]` in the `config_dev.yaml` file.
-
-```yaml
-CONTROL_BACKEND: ["uia", "omniparser"]
-```
-
-
-# Reference
-The following classes are used for visual control detection in OmniParser:
-
-
-:::automator.ui_control.grounding.omniparser.OmniparserGrounding
\ No newline at end of file
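Review note on the removed `hybrid_detection.md`: the page says visually detected controls are merged with the UIA controls by removing duplicates based on IOU. The sketch below illustrates that idea only. It assumes boxes are `(left, top, right, bottom)` tuples and that UIA boxes take priority on overlap; `iou`, `merge_controls`, and the default threshold are hypothetical names and values chosen for illustration, not UFO's actual implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), an assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_controls(uia: List[Box], visual: List[Box], threshold: float = 0.1) -> List[Box]:
    """Keep every UIA box; add a visual box only if it overlaps no UIA box above the threshold."""
    merged = list(uia)
    for v in visual:
        if all(iou(v, u) <= threshold for u in uia):
            merged.append(v)
    return merged
```

Giving UIA boxes priority is a design assumption here: UIA controls carry names and types, so when both detectors report the same region the UIA entry is the more informative one to keep.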
diff --git a/documents/docs/advanced_usage/control_detection/overview.md b/documents/docs/advanced_usage/control_detection/overview.md
deleted file mode 100644
index 63cdd3fd6..000000000
--- a/documents/docs/advanced_usage/control_detection/overview.md
+++ /dev/null
@@ -1,20 +0,0 @@
-# Control Detection
-
-
-We support different control detection methods to detect the controls in the application to accommodate both standard (UIA) and custom controls (Visual). The control detection methods include:
-
-| Mechanism | Description |
-|-----------|-------------|
-| [**UIA**](./uia_detection.md) | The UI Automation (UIA) framework is used to detect standard controls in the application. It provides a set of APIs to access and manipulate the UI elements in Windows applications. |
-| [**Visual**](./visual_detection.md) | The visual control detection method uses OmniParser visual detection to detect custom controls in the application. It uses computer vision techniques to identify and interact with the UI elements based on their visual appearance. |
-| [**Hybrid**](./hybrid_detection.md) | The hybrid control detection method combines both UIA and visual detection methods to detect the controls in the application. It first tries to use the UIA method, and if it fails, it falls back to the visual method. |
-
-
-
-## Configuration
-To configure the control detection method, set the `CONTROL_BACKEND` parameter in the `config_dev.yaml` file. The available options are `uia` and `omniparser`. To use the hybrid method, set it to `["uia", "omniparser"]`.
-
-```yaml
-CONTROL_BACKEND: ["uia"]
-```
-
diff --git a/documents/docs/advanced_usage/control_detection/uia_detection.md b/documents/docs/advanced_usage/control_detection/uia_detection.md
deleted file mode 100644
index 980473aa3..000000000
--- a/documents/docs/advanced_usage/control_detection/uia_detection.md
+++ /dev/null
@@ -1,19 +0,0 @@
-# UIA Control Detection
-
-UIA control detection is a method to detect standard controls in the application using the UI Automation (UIA) framework. It provides a set of APIs to access and manipulate the UI elements in Windows applications.
-
-!!! note
- The UIA control detection may fail to detect non-standard controls or custom controls in the application.
-
-## Configuration
-
-To activate UIA control detection, set `CONTROL_BACKEND` to `["uia"]` in the `config_dev.yaml` file.
-
-```yaml
-CONTROL_BACKEND: ["uia"]
-```
-
-
-# Reference
-
-:::automator.ui_control.inspector.ControlInspectorFacade
\ No newline at end of file
diff --git a/documents/docs/advanced_usage/control_detection/visual_detection.md b/documents/docs/advanced_usage/control_detection/visual_detection.md
deleted file mode 100644
index 3d355b6d4..000000000
--- a/documents/docs/advanced_usage/control_detection/visual_detection.md
+++ /dev/null
@@ -1,52 +0,0 @@
-# Visual Control Detection (OmniParser)
-
-We also support visual control detection using [OmniParser-v2](https://github.com/microsoft/OmniParser). This method is useful for detecting custom controls in the application that may not be recognized by standard UIA methods. The visual control detection uses computer vision techniques to identify and interact with the UI elements based on their visual appearance.
-
-
-## Deployment
-
-On your remote GPU server, clone the OmniParser repository
-```bash
-git clone https://github.com/microsoft/OmniParser.git
-```
-
-Start `omniparserserver` service
-```bash
-cd OmniParser/omnitool/omniparserserver
-python gradio_demo.py
-```
-
-This will give you a short URL
-```
-* Running on local URL: http://0.0.0.0:7861
-* Running on public URL: https://xxxxxxxxxxxxxxxxxx.gradio.live
-```
-
-> Note: If you have any questions regarding the deployment of OmniParser, please take a look at the [README](https://github.com/microsoft/OmniParser/tree/master/omnitool) from OmniParser repo.
-
-## Configuration
-
-After deploying the OmniParser model, you need to configure the OmniParser settings in the `config.yaml` file:
-
-```yaml
-OMNIPARSER: {
- ENDPOINT: "", # The endpoint for the omniparser deployment
- BOX_THRESHOLD: 0.05, # The box confidence threshold for the omniparser, default is 0.05
- IOU_THRESHOLD: 0.1, # The iou threshold for the omniparser, default is 0.1
- USE_PADDLEOCR: True, # Whether to use the paddleocr for the omniparser
- IMGSZ: 640 # The image size for the omniparser
-}
-```
-
-To activate visual control detection, set `CONTROL_BACKEND` to `["omniparser"]` in the `config_dev.yaml` file.
-
-```yaml
-CONTROL_BACKEND: ["omniparser"]
-```
-
-
-# Reference
-The following classes are used for visual control detection in OmniParser:
-
-
-:::automator.ui_control.grounding.omniparser.OmniparserGrounding
\ No newline at end of file
diff --git a/documents/docs/advanced_usage/control_filtering/icon_filtering.md b/documents/docs/advanced_usage/control_filtering/icon_filtering.md
deleted file mode 100644
index 422fecbf6..000000000
--- a/documents/docs/advanced_usage/control_filtering/icon_filtering.md
+++ /dev/null
@@ -1,16 +0,0 @@
-# Icon Filter
-
-The icon control filter is a method to filter the controls based on the similarity between the control icon image and the agent's plan using the image/text embeddings.
-
-## Configuration
-
-To activate the icon control filtering, you need to add `ICON` to the `CONTROL_FILTER` list in the `config_dev.yaml` file. Below is the detailed icon control filter configuration in the `config_dev.yaml` file:
-
-- `CONTROL_FILTER`: A list of filtering methods that you want to apply to the controls. To activate the icon control filtering, add `ICON` to the list.
-- `CONTROL_FILTER_TOP_K_ICON`: The number of controls to keep after filtering.
-- `CONTROL_FILTER_MODEL_ICON_NAME`: The control filter model name for icon similarity. By default, it is set to "clip-ViT-B-32".
-
-
-# Reference
-
-:::automator.ui_control.control_filter.IconControlFilter
\ No newline at end of file
diff --git a/documents/docs/advanced_usage/control_filtering/overview.md b/documents/docs/advanced_usage/control_filtering/overview.md
deleted file mode 100644
index 98daa4ada..000000000
--- a/documents/docs/advanced_usage/control_filtering/overview.md
+++ /dev/null
@@ -1,22 +0,0 @@
-# Control Filtering
-
-An application may expose many control items that are not relevant to the task. UFO can filter out the irrelevant controls and focus only on the relevant ones, which reduces the complexity of the task.
-
-In addition to configuring the control types for selection via `CONTROL_LIST` in `config_dev.yaml`, UFO also supports filtering the controls based on semantic similarity or keyword matching between the agent's plan and the control's information. We currently support the following filtering methods:
-
-| Filtering Method | Description |
-|------------------|-------------|
-| [`Text`](./text_filtering.md) | Filter the controls based on the control text. |
-| [`Semantic`](./semantic_filtering.md) | Filter the controls based on the semantic similarity. |
-| [`Icon`](./icon_filtering.md) | Filter the controls based on the control icon image. |
-
-
-## Configuration
-You can activate the control filtering by setting the `CONTROL_FILTER` in the `config_dev.yaml` file. The `CONTROL_FILTER` is a list of filtering methods that you want to apply to the controls, which can be `TEXT`, `SEMANTIC`, or `ICON`.
-
-You can configure multiple filtering methods in the `CONTROL_FILTER` list.
-
-# Reference
-The implementation of control filtering is based on the `BasicControlFilter` class located in the `ufo/automator/ui_control/control_filter.py` file. Concrete filter classes inherit from `BasicControlFilter` and implement the `control_filter` method for their specific filtering method.
-
-:::automator.ui_control.control_filter.BasicControlFilter
diff --git a/documents/docs/advanced_usage/control_filtering/semantic_filtering.md b/documents/docs/advanced_usage/control_filtering/semantic_filtering.md
deleted file mode 100644
index 078bfa036..000000000
--- a/documents/docs/advanced_usage/control_filtering/semantic_filtering.md
+++ /dev/null
@@ -1,15 +0,0 @@
-# Semantic Control Filter
-
-The semantic control filter is a method to filter the controls based on the semantic similarity between the agent's plan and the control's text using their embeddings.
-
-## Configuration
-
-To activate the semantic control filtering, you need to add `SEMANTIC` to the `CONTROL_FILTER` list in the `config_dev.yaml` file. Below is the detailed semantic control filter configuration in the `config_dev.yaml` file:
-
-- `CONTROL_FILTER`: A list of filtering methods that you want to apply to the controls. To activate the semantic control filtering, add `SEMANTIC` to the list.
-- `CONTROL_FILTER_TOP_K_SEMANTIC`: The number of controls to keep after filtering.
-- `CONTROL_FILTER_MODEL_SEMANTIC_NAME`: The control filter model name for semantic similarity. By default, it is set to "all-MiniLM-L6-v2".
-
-# Reference
-
-:::automator.ui_control.control_filter.SemanticControlFilter
\ No newline at end of file
diff --git a/documents/docs/advanced_usage/control_filtering/text_filtering.md b/documents/docs/advanced_usage/control_filtering/text_filtering.md
deleted file mode 100644
index 53f845004..000000000
--- a/documents/docs/advanced_usage/control_filtering/text_filtering.md
+++ /dev/null
@@ -1,16 +0,0 @@
-# Text Control Filter
-
-The text control filter is a method to filter the controls based on the control text. The agent's plan on the current step usually contains some keywords or phrases. This method filters the controls based on the matching between the control text and the keywords or phrases in the agent's plan.
-
-## Configuration
-
-To activate the text control filtering, you need to add `TEXT` to the `CONTROL_FILTER` list in the `config_dev.yaml` file. Below is the detailed text control filter configuration in the `config_dev.yaml` file:
-
-- `CONTROL_FILTER`: A list of filtering methods that you want to apply to the controls. To activate the text control filtering, add `TEXT` to the list.
-- `CONTROL_FILTER_TOP_K_PLAN`: The number of agent's plan keywords or phrases to use for filtering the controls.
-
-
-
-# Reference
-
-:::automator.ui_control.control_filter.TextControlFilter
\ No newline at end of file
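Review note on the removed `text_filtering.md`: the page describes keeping controls whose text matches keywords or phrases from the agent's plan. The sketch below shows one simple reading of that, a case-insensitive substring match in either direction. The function name, the dictionary layout (label to control text), and the matching rule are assumptions for illustration and may differ from `TextControlFilter`.

```python
from typing import Dict, List


def text_filter(controls: Dict[str, str], plan_keywords: List[str]) -> Dict[str, str]:
    """Keep controls whose text contains, or is contained in, any plan keyword."""
    kept: Dict[str, str] = {}
    lowered = [k.lower() for k in plan_keywords]
    for label, text in controls.items():
        t = text.lower()
        # Skip empty control text, which would trivially match every keyword.
        if t and any(k in t or t in k for k in lowered):
            kept[label] = text
    return kept
```

For example, with controls `{"1": "Bold", "2": "Italic"}` and the plan phrase `"click the bold button"`, only the control labelled `"1"` is kept.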
diff --git a/documents/docs/advanced_usage/customization.md b/documents/docs/advanced_usage/customization.md
deleted file mode 100644
index 8c38259f1..000000000
--- a/documents/docs/advanced_usage/customization.md
+++ /dev/null
@@ -1,24 +0,0 @@
-# Customization
-
-Sometimes, UFO may need additional context or information to complete a task. This information is important and specific to each user. UFO can ask the user for additional information and save it in local memory for future reference. This customization feature allows UFO to provide a more personalized experience to the user.
-
-## Scenario
-
-Let's consider a scenario where UFO needs additional information to complete a task. UFO is tasked with booking a cab for the user. To book a cab, UFO needs to know the exact address of the user. UFO will ask the user for the address and save it in the local memory for future reference. Next time, when UFO is asked to complete a task that requires the user's address, UFO will use the saved address to complete the task, without asking the user again.
-
-
-## Implementation
-We currently implement the customization feature in the `HostAgent` class. When the `HostAgent` needs additional information, it transitions to the `PENDING` state and asks the user for the information. The user provides the information, and the `HostAgent` saves it in the local memory base for future reference. The saved information is stored in the `blackboard` and can be accessed by all agents in the session.
-
-!!! note
- The customization memory base is only saved in a **local file**. This information is **not** uploaded to the cloud or any other storage, which protects the user's privacy.
-
-## Configuration
-
-You can configure the customization feature by setting the following field in the `config_dev.yaml` file.
-
-| Configuration Option | Description | Type | Default Value |
-|------------------------|----------------------------------------------|---------|---------------------------------------|
-| `USE_CUSTOMIZATION` | Whether to enable the customization. | Boolean | True |
-| `QA_PAIR_FILE` | The path for the historical QA pairs. | String | "customization/historical_qa.txt" |
-| `QA_PAIR_NUM` | The number of QA pairs for the customization.| Integer | 20 |
diff --git a/documents/docs/advanced_usage/follower_mode.md b/documents/docs/advanced_usage/follower_mode.md
deleted file mode 100644
index 448995723..000000000
--- a/documents/docs/advanced_usage/follower_mode.md
+++ /dev/null
@@ -1,83 +0,0 @@
-# Follower Mode
-
-The Follower mode is a feature of UFO in which the agent follows a list of predefined natural-language steps to take actions on applications. Unlike the normal mode, this mode creates a `FollowerAgent` that follows the plan list provided by the user to interact with the application, instead of generating the plan itself. This mode is useful for debugging and for software testing or verification.
-
-## Quick Start
-
-### Step 1: Create a Plan file
-
-Before starting the Follower mode, you need to create a plan file that contains the list of steps for the agent to follow. The plan file is a JSON file that contains the following fields:
-
-| Field | Description | Type |
-| --- | --- | --- |
-| task | The task description. | String |
-| steps | The list of steps for the agent to follow. | List of Strings |
-| object | The application or file to interact with. | String |
-
-Below is an example of a plan file:
-
-```json
-{
- "task": "Type in a text of 'Test For Fun' with heading 1 level",
- "steps":
- [
- "1.type in 'Test For Fun'",
- "2.Select the 'Test For Fun' text",
- "3.Click 'Home' tab to show the 'Styles' ribbon tab",
- "4.Click 'Styles' ribbon tab to show the style 'Heading 1'",
- "5.Click 'Heading 1' style to apply the style to the selected text"
- ],
- "object": "draft.docx"
-}
-```
-
-!!! note
- The `object` field is the application or file that the agent will interact with. The object **must be active** (can be minimized) when starting the Follower mode.
-
-
-### Step 2: Start the Follower Mode
-To start the Follower mode, run the following command:
-
-```bash
-# assume you are in the cloned UFO folder
-python ufo.py --task_name {task_name} --mode follower --plan {plan_file}
-```
-
-!!! tip
- Replace `{task_name}` with the name of the task and `{plan_file}` with the path to the plan file.
-
-
-### Step 3: Run in Batch (Optional)
-
-You can also run the Follower mode in batch mode by providing a folder containing multiple plan files. The agent will follow the plans in the folder one by one. To run in batch mode, run the following command:
-
-```bash
-# assume you are in the cloned UFO folder
-python ufo.py --task_name {task_name} --mode follower --plan {plan_folder}
-```
-
-UFO will automatically detect the plan files in the folder and run them one by one.
-
-!!! tip
- Replace `{task_name}` with the name of the task and `{plan_folder}` with the path to the folder containing plan files.
-
-
-## Evaluation
-You may want to evaluate whether the `task` was completed successfully by following the plan. UFO will call the `EvaluationAgent` to evaluate the task if `EVA_SESSION` is set to `True` in the `config_dev.yaml` file.
-
-You can check the evaluation log in the `logs/{task_name}/evaluation.log` file.
-
-# References
-The follower mode employs a `PlanReader` to parse the plan file and create a `FollowerSession` to follow the plan.
-
-## PlanReader
-The `PlanReader` is located in the `ufo/module/sessions/plan_reader.py` file.
-
-:::module.sessions.plan_reader.PlanReader
-
-
-## FollowerSession
-
-The `FollowerSession` is located in the `ufo/module/sessions/session.py` file.
-
-:::module.sessions.session.FollowerSession
\ No newline at end of file
diff --git a/documents/docs/advanced_usage/multi_action.md b/documents/docs/advanced_usage/multi_action.md
deleted file mode 100644
index 7ec6595d0..000000000
--- a/documents/docs/advanced_usage/multi_action.md
+++ /dev/null
@@ -1,22 +0,0 @@
-# Speculative Multi-Action Execution
-
-UFO² introduces a new feature called **Speculative Multi-Action Execution**. This feature allows the agent to bundle several predicted steps into one LLM call, which are then validated live. This approach can lead to up to **51% fewer queries** compared to inferring each step separately. The agent will first predict a batch of likely actions and then validate them against the live UIA state in a single shot. We illustrate the speculative multi-action execution in the figure below:
-
-
-
-
-
-
-
-## Configuration
-To activate the speculative multi-action execution, you need to set `ACTION_SEQUENCE` to `True` in the `config_dev.yaml` file.
-
-```yaml
-ACTION_SEQUENCE: True
-```
-
-
-# References
-The implementation of the speculative multi-action execution is located in the `ufo/agents/processors/actions.py` file. The following classes are used for the speculative multi-action execution:
-
-:::agents.processors.actions.OneStepAction
diff --git a/documents/docs/advanced_usage/operator_as_app_agent.md b/documents/docs/advanced_usage/operator_as_app_agent.md
deleted file mode 100644
index edc86dc73..000000000
--- a/documents/docs/advanced_usage/operator_as_app_agent.md
+++ /dev/null
@@ -1,46 +0,0 @@
-# Operator as an AppAgent
-
-UFO² supports **wrapping any third-party agent as an AppAgent**, allowing it to be invoked by the HostAgent within a multi-agent workflow. This section demonstrates how to run **Operator**, an OpenAI-based Computer-Using Agent (CUA), as an AppAgent inside the UFO² ecosystem.
-
-
-
-
-
-
-
-## 📦 Prerequisites
-
-Before proceeding, please ensure that the Operator has been properly configured. You can follow the setup instructions in the [OpenAI CUA (Operator) guide](../supported_models/operator.md).
-
-## 🚀 Running the Operator
-
-UFO² provides two modes for running the Operator:
-
-1. **Single Agent Mode** — Use UFO² as the launcher to run Operator in standalone mode.
-2. **AppAgent Mode** — Run Operator as an `AppAgent`, enabling it to be orchestrated by the `HostAgent` as part of a broader task decomposition.
-
-### 🔹 Single Agent Mode
-
-In this mode, the Operator functions independently but is launched through UFO². This is useful for debugging or quick prototyping.
-
-```powershell
-python -m ufo -m operator -t -r
-```
-
-### 🔸 AppAgent Mode
-
-This mode wraps Operator as an AppAgent (`normal_operator`) so that it can be triggered as a sub-agent within a full HostAgent workflow.
-
-```powershell
-python -m ufo -m normal_operator -t -r
-```
-
-## 📝 Logs
-
-In both modes, execution logs will be saved in the following directory:
-
-```
-logs//
-```
-
-These logs follow the same structure and conventions as previous UFO² sessions.
\ No newline at end of file
diff --git a/documents/docs/advanced_usage/reinforce_appagent/experience_learning.md b/documents/docs/advanced_usage/reinforce_appagent/experience_learning.md
deleted file mode 100644
index 7c5cc018f..000000000
--- a/documents/docs/advanced_usage/reinforce_appagent/experience_learning.md
+++ /dev/null
@@ -1,65 +0,0 @@
-# Learning from Self-Experience
-
-When UFO successfully completes a task, the user can choose to save the successful experience to reinforce the AppAgent. The AppAgent can learn from its own successful experiences to improve its performance in the future.
-
-## Mechanism
-
-### Step 1: Complete a Session
-- **Event**: UFO completes a session
-
-### Step 2: Ask User to Save Experience
-- **Action**: The agent prompts the user with a choice to save the successful experience
-
-
-
-
-
-### Step 3: User Chooses to Save
-- **Action**: If the user chooses to save the experience
-
-### Step 4: Summarize and Save the Experience
-- **Tool**: `ExperienceSummarizer`
-- **Process**:
- 1. Summarize the experience into a demonstration example
- 2. Save the demonstration example in the `EXPERIENCE_SAVED_PATH` as specified in the `config_dev.yaml` file
- 3. The demonstration example includes similar [fields](../../prompts/examples_prompts.md) as those used in the AppAgent's prompt
-
-### Step 5: Retrieve and Utilize Saved Experience
-- **When**: The AppAgent encounters a similar task in the future
-- **Action**: Retrieve the saved experience from the experience database
-- **Outcome**: Use the retrieved experience to generate a plan
-
-### Workflow Diagram
-```mermaid
-graph TD;
- A[Complete Session] --> B[Ask User to Save Experience]
- B --> C[User Chooses to Save]
- C --> D[Summarize with ExperienceSummarizer]
- D --> E[Save in EXPERIENCE_SAVED_PATH]
- F[AppAgent Encounters Similar Task] --> G[Retrieve Saved Experience]
- G --> H[Generate Plan]
-```
-
-## Activate the Learning from Self-Experience
-
-### Step 1: Configure the AppAgent
-Configure the following parameters to allow UFO to use the RAG from its self-experience:
-
-| Configuration Option | Description | Type | Default Value |
-|----------------------|-------------|------|---------------|
-| `RAG_EXPERIENCE` | Whether to use the RAG from its self-experience | Boolean | False |
-| `RAG_EXPERIENCE_RETRIEVED_TOPK` | The topk for the offline retrieved documents | Integer | 5 |
-
-# Reference
-
-## Experience Summarizer
-The `ExperienceSummarizer` class is located in the `ufo/experience/experience_summarizer.py` file. The `ExperienceSummarizer` class provides the following methods to summarize the experience:
-
-:::experience.summarizer.ExperienceSummarizer
-
-
-
-## Experience Retriever
-The `ExperienceRetriever` class is located in the `ufo/rag/retriever.py` file. The `ExperienceRetriever` class provides the following methods to retrieve the experience:
-
-:::rag.retriever.ExperienceRetriever
diff --git a/documents/docs/advanced_usage/reinforce_appagent/learning_from_bing_search.md b/documents/docs/advanced_usage/reinforce_appagent/learning_from_bing_search.md
deleted file mode 100644
index 6ee9a5bb9..000000000
--- a/documents/docs/advanced_usage/reinforce_appagent/learning_from_bing_search.md
+++ /dev/null
@@ -1,29 +0,0 @@
-# Learning from Bing Search
-
-UFO can reinforce the AppAgent by searching Bing for up-to-date knowledge about niche tasks or applications that are beyond the `AppAgent`'s knowledge.
-
-## Mechanism
-Upon receiving a request, the `AppAgent` constructs a Bing search query based on the request and retrieves the search results from Bing. The `AppAgent` then extracts the relevant information from the top-k search results from Bing and generates a plan based on the retrieved information.
-
-
-## Activate the Learning from Bing Search
-
-
-### Step 1: Obtain Bing API Key
-To use the Bing search, you need to obtain a Bing API key. You can follow the instructions on the [Microsoft Azure Bing Search API](https://www.microsoft.com/en-us/bing/apis/bing-web-search-api) to get the API key.
-
-
-### Step 2: Configure the AppAgent
-
-Configure the following parameters to allow UFO to use online Bing search for the decision-making process:
-
-| Configuration Option | Description | Type | Default Value |
-|----------------------|-------------|------|---------------|
-| `RAG_ONLINE_SEARCH` | Whether to use the Bing search | Boolean | False |
-| `BING_API_KEY` | The Bing search API key | String | "" |
-| `RAG_ONLINE_SEARCH_TOPK` | The topk for the online search | Integer | 5 |
-| `RAG_ONLINE_RETRIEVED_TOPK` | The topk for the online retrieved searched results | Integer | 1 |
-
-# Reference
-
-:::rag.retriever.OnlineDocRetriever
\ No newline at end of file
diff --git a/documents/docs/advanced_usage/reinforce_appagent/learning_from_demonstration.md b/documents/docs/advanced_usage/reinforce_appagent/learning_from_demonstration.md
deleted file mode 100644
index c396dd09f..000000000
--- a/documents/docs/advanced_usage/reinforce_appagent/learning_from_demonstration.md
+++ /dev/null
@@ -1,50 +0,0 @@
-
-
-# Learning from User Demonstration
-
-For complex tasks, users can demonstrate the task using [Step Recorder](https://support.microsoft.com/en-us/windows/record-steps-to-reproduce-a-problem-46582a9b-620f-2e36-00c9-04e25d784e47) to record the action trajectories. UFO can learn from these user demonstrations to improve the AppAgent's performance.
-
-
-
-## Mechanism
-
-UFO uses the [Step Recorder](https://support.microsoft.com/en-us/windows/record-steps-to-reproduce-a-problem-46582a9b-620f-2e36-00c9-04e25d784e47) tool to record the task and action trajectories. The recorded demonstration is saved as a zip file. The `DemonstrationSummarizer` class extracts and summarizes the demonstration. The summarized demonstration is saved in the `DEMONSTRATION_SAVED_PATH` as specified in the `config_dev.yaml` file. When the AppAgent encounters a similar task, the `DemonstrationRetriever` class retrieves the saved demonstration from the demonstration database and generates a plan based on the retrieved demonstration.
-
-!!! info
- You can find how to record the task and action trajectories using the Step Recorder tool in the [User Demonstration Provision](../../creating_app_agent/demonstration_provision.md) document.
-
-
-You can find a demo video of learning from user demonstrations:
-
-
-
-
-
-
-## Activating Learning from User Demonstrations
-
-### Step 1: User Demonstration
-Please follow the steps in the [User Demonstration Provision](../../creating_app_agent/demonstration_provision.md) document to provide user demonstrations.
-
-### Step 2: Configure the AppAgent
-Configure the following parameters to allow UFO to use RAG from user demonstrations:
-
-| Configuration Option | Description | Type | Default Value |
-|----------------------|-------------|------|---------------|
-| `RAG_DEMONSTRATION` | Whether to use RAG from user demonstrations | Boolean | False |
-| `RAG_DEMONSTRATION_RETRIEVED_TOPK` | The top K documents to retrieve offline | Integer | 5 |
-| `RAG_DEMONSTRATION_COMPLETION_N` | The number of completion choices for the demonstration result | Integer | 3 |
-
-## Reference
-
-### Demonstration Summarizer
-The `DemonstrationSummarizer` class is located in the `record_processor/summarizer/summarizer.py` file. The `DemonstrationSummarizer` class provides methods to summarize the demonstration:
-
-:::summarizer.summarizer.DemonstrationSummarizer
-
-
-
-### Demonstration Retriever
-The `DemonstrationRetriever` class is located in the `rag/retriever.py` file. The `DemonstrationRetriever` class provides methods to retrieve the demonstration:
-
-:::rag.retriever.DemonstrationRetriever
\ No newline at end of file
diff --git a/documents/docs/advanced_usage/reinforce_appagent/learning_from_help_document.md b/documents/docs/advanced_usage/reinforce_appagent/learning_from_help_document.md
deleted file mode 100644
index aca82a561..000000000
--- a/documents/docs/advanced_usage/reinforce_appagent/learning_from_help_document.md
+++ /dev/null
@@ -1,31 +0,0 @@
-# Learning from Help Documents
-
-Users or applications can provide help documents to the AppAgent to reinforce its capabilities. The AppAgent can retrieve knowledge from these documents to improve its understanding of the task, generate high-quality plans, and interact more efficiently with the application. You can find how to provide help documents to the AppAgent in the [Help Document Provision](../../creating_app_agent/help_document_provision.md) section.
-
-
-## Mechanism
-The help documents are provided in the form of **task-solution pairs**. Upon receiving a request, the AppAgent retrieves the relevant help documents by matching the request against the task descriptions in the help documents, then generates a plan based on the retrieved solutions.
-
-!!! note
- Since the retrieved help documents may not be relevant to the request, the `AppAgent` will only take them as references to generate the plan.
-
-## Activate the Learning from Help Documents
-
-Follow the steps below to activate the learning from help documents:
-
-### Step 1: Provide Help Documents
-Please follow the steps in the [Help Document Provision](../../creating_app_agent/help_document_provision.md) document to provide help documents to the AppAgent.
-
-### Step 2: Configure the AppAgent
-
-Configure the following parameters in the `config.yaml` file to activate the learning from help documents:
-
-| Configuration Option | Description | Type | Default Value |
-|----------------------|-------------|------|---------------|
-| `RAG_OFFLINE_DOCS` | Whether to use the offline RAG | Boolean | False |
-| `RAG_OFFLINE_DOCS_RETRIEVED_TOPK` | The number of top-ranked offline documents to retrieve | Integer | 1 |
-
-
-# Reference
-
-:::rag.retriever.OfflineDocRetriever
\ No newline at end of file
diff --git a/documents/docs/advanced_usage/reinforce_appagent/overview.md b/documents/docs/advanced_usage/reinforce_appagent/overview.md
deleted file mode 100644
index 6ebf072f8..000000000
--- a/documents/docs/advanced_usage/reinforce_appagent/overview.md
+++ /dev/null
@@ -1,63 +0,0 @@
-# Reinforcing AppAgent
-
-UFO provides versatile mechanisms to reinforce the AppAgent's capabilities through RAG (Retrieval-Augmented Generation) and other techniques. These enhance the AppAgent's understanding of the task, improve the quality of the generated plans, and increase the efficiency of the AppAgent's interactions with the application.
-
-We currently support the following reinforcement methods:
-
-| Reinforcement Method | Description |
-|----------------------|-------------|
-| [Learning from Help Documents](./learning_from_help_document.md) | Reinforce the AppAgent by retrieving knowledge from help documents. |
-| [Learning from Bing Search](./learning_from_bing_search.md) | Reinforce the AppAgent by searching for information on Bing to obtain up-to-date knowledge. |
-| [Learning from Self-Experience](./experience_learning.md) | Reinforce the AppAgent by learning from its own successful experiences. |
-| [Learning from User Demonstrations](./learning_from_demonstration.md) | Reinforce the AppAgent by learning from action trajectories demonstrated by users. |
-
-## Knowledge Provision
-
-UFO provides the knowledge to the AppAgent through a `context_provision` method defined in the `AppAgent` class:
-
-```python
-def context_provision(self, request: str = "") -> None:
- """
- Provision the context for the app agent.
- :param request: The Bing search query.
- """
-
- # Load the offline document indexer for the app agent if available.
- if configs["RAG_OFFLINE_DOCS"]:
- utils.print_with_color(
- "Loading offline help document indexer for {app}...".format(
- app=self._process_name
- ),
- "magenta",
- )
- self.build_offline_docs_retriever()
-
- # Load the online search indexer for the app agent if available.
-
- if configs["RAG_ONLINE_SEARCH"] and request:
- utils.print_with_color("Creating a Bing search indexer...", "magenta")
- self.build_online_search_retriever(
- request, configs["RAG_ONLINE_SEARCH_TOPK"]
- )
-
- # Load the experience indexer for the app agent if available.
- if configs["RAG_EXPERIENCE"]:
- utils.print_with_color("Creating an experience indexer...", "magenta")
- experience_path = configs["EXPERIENCE_SAVED_PATH"]
- db_path = os.path.join(experience_path, "experience_db")
- self.build_experience_retriever(db_path)
-
- # Load the demonstration indexer for the app agent if available.
- if configs["RAG_DEMONSTRATION"]:
- utils.print_with_color("Creating an demonstration indexer...", "magenta")
- demonstration_path = configs["DEMONSTRATION_SAVED_PATH"]
- db_path = os.path.join(demonstration_path, "demonstration_db")
- self.build_human_demonstration_retriever(db_path)
-```
-
-The `context_provision` method loads the offline document indexer, online search indexer, experience indexer, and demonstration indexer for the AppAgent based on the configuration settings in the `config_dev.yaml` file.
-
-# Reference
-UFO employs the `Retriever` class located in the `ufo/rag/retriever.py` file to retrieve knowledge from various sources. The `Retriever` class provides the following methods to retrieve knowledge:
-
-:::rag.retriever.Retriever
diff --git a/documents/docs/agents/app_agent.md b/documents/docs/agents/app_agent.md
deleted file mode 100644
index a504ce5e0..000000000
--- a/documents/docs/agents/app_agent.md
+++ /dev/null
@@ -1,173 +0,0 @@
-# AppAgent 👾
-
-An `AppAgent` iteratively executes actions on the selected application until the task is successfully concluded within that application. The `AppAgent` is created by the `HostAgent` to fulfill a sub-task within a `Round`. The `AppAgent` has the following features:
-
-1. **[ReAct](https://arxiv.org/abs/2210.03629) with the Application** - The `AppAgent` recursively interacts with the application in a workflow of observation->thought->action, leveraging the multi-modal capabilities of Visual Language Models (VLMs) to comprehend the application UI and fulfill the user's request.
-2. **Comprehension Enhancement** - The `AppAgent` is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including external knowledge bases and demonstration libraries, making the agent an application "expert".
-3. **Versatile Skill Set** - The `AppAgent` is equipped with a diverse set of skills to support comprehensive automation, such as mouse, keyboard, native APIs, and "Copilot".
-
-!!! tip
-    You can find how to enhance the `AppAgent` with external knowledge bases and demonstration libraries in the [Reinforcing AppAgent](../advanced_usage/reinforce_appagent/overview.md) documentation.
-
-
-We show the framework of the `AppAgent` in the following diagram:
-
-
-
-
-
-## AppAgent Input
-
-To interact with the application, the `AppAgent` receives the following inputs:
-
-| Input | Description | Type |
-| --- | --- | --- |
-| User Request | The user's request in natural language. | String |
-| Sub-Task | The sub-task description to be executed by the `AppAgent`, assigned by the `HostAgent`. | String |
-| Current Application | The name of the application to be interacted with. | String |
-| Control Information | Index, name and control type of available controls in the application. | List of Dictionaries |
-| Application Screenshots | Screenshots of the application, including a clean screenshot, an annotated screenshot with labeled controls, and a screenshot with a rectangle around the selected control at the previous step (optional). | List of Strings |
-| Previous Sub-Tasks | The previous sub-tasks and their completion status. | List of Strings |
-| Previous Plan | The previous plan for the following steps. | List of Strings |
-| HostAgent Message | The message from the `HostAgent` for the completion of the sub-task. | String |
-| Retrieved Information | The retrieved information from external knowledge bases or demonstration libraries. | String |
-| Blackboard | The shared memory space for storing and sharing information among the agents. | Dictionary |
-
-
-Below is an example of the annotated application screenshot with labeled controls. This follows the [Set-of-Mark](https://arxiv.org/pdf/2310.11441) paradigm.
-
-
-
-
-
-By processing these inputs, the `AppAgent` determines the necessary actions to fulfill the user's request within the application.
-
-!!! tip
- Whether to concatenate the clean screenshot and annotated screenshot can be configured in the `CONCAT_SCREENSHOT` field in the `config_dev.yaml` file.
-
-!!! tip
- Whether to include the screenshot with a rectangle around the selected control at the previous step can be configured in the `INCLUDE_LAST_SCREENSHOT` field in the `config_dev.yaml` file.
-
-
-## AppAgent Output
-
-With the inputs provided, the `AppAgent` generates the following outputs:
-
-| Output | Description | Type |
-| --- | --- | --- |
-| Observation | The observation of the current application screenshots. | String |
-| Thought | The logical reasoning process of the `AppAgent`. | String |
-| ControlLabel | The index of the selected control to interact with. | String |
-| ControlText | The name of the selected control to interact with. | String |
-| Function | The function to be executed on the selected control. | String |
-| Args | The arguments required for the function execution. | List of Strings |
-| Status | The status of the agent, mapped to the `AgentState`. | String |
-| Plan | The plan for the following steps after the current action. | List of Strings |
-| Comment | Additional comments or information provided to the user. | String |
-| SaveScreenshot | The flag to save the screenshot of the application to the `blackboard` for future reference. | Boolean |
-
-Below is an example of the `AppAgent` output:
-
-```json
-{
- "Observation": "Application screenshot",
- "Thought": "Logical reasoning process",
- "ControlLabel": "Control index",
- "ControlText": "Control name",
- "Function": "Function name",
- "Args": ["arg1", "arg2"],
- "Status": "AgentState",
- "Plan": ["Step 1", "Step 2"],
- "Comment": "Additional comments",
- "SaveScreenshot": true
-}
-```
-
-!!! info
- The `AppAgent` output is formatted as a JSON object by LLMs and can be parsed by the `json.loads` method in Python.
-
-
-## AppAgent State
-The `AppAgent` state is managed by a state machine that determines the next action to be executed based on the current state, as defined in the `ufo/agents/states/app_agent_state.py` module. The states include:
-
-| State | Description |
-|-------------|-----------------------------------------------------------------------------|
-| `CONTINUE` | Main execution loop; evaluates which subtasks are ready to launch or resume. |
-| `ASSIGN` | Selects an available application process and spawns the corresponding `AppAgent`. |
-| `PENDING` | Waits for user input to resolve ambiguity or gather additional task parameters. |
-| `FINISH` | All subtasks complete; cleans up agent instances and finalizes session state. |
-| `FAIL` | Enters recovery or abort mode upon irrecoverable failure. |
-
-
-The state machine diagram for the `AppAgent` is shown below:
-
-
-
-
-The `AppAgent` progresses through these states to execute the necessary actions within the application and fulfill the sub-task assigned by the `HostAgent`.
-
-
-## Knowledge Enhancement
-The `AppAgent` is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including external knowledge bases and demonstration libraries. The `AppAgent` leverages this knowledge to enhance its comprehension of the application and learn from demonstrations to improve its performance.
-
-### Learning from Help Documents
-Users can provide help documents to the `AppAgent` to enhance its comprehension of the application and improve its performance. This feature is configured in the `config.yaml` file.
-
-!!! tip
-    You can find configuration details in the [documentation](../configurations/user_configuration.md).
-!!! tip
-    You may also refer [here]() for how to provide help documents to the `AppAgent`.
-
-
-The `AppAgent` calls `build_offline_docs_retriever` to build a help document retriever and uses `retrived_documents_prompt_helper` to construct its prompt.
-
-
-
-### Learning from Bing Search
-Since help documents may not cover all the required information, or the information may be outdated, the `AppAgent` can also leverage Bing search to retrieve the latest information. You can activate Bing search and configure the search engine in the `config.yaml` file.
-
-!!! tip
-    You can find configuration details in the [documentation](../configurations/user_configuration.md).
-!!! tip
-    You may also refer [here]() for the implementation of Bing search in the `AppAgent`.
-
-The `AppAgent` calls `build_online_search_retriever` to build a Bing search retriever and uses `retrived_documents_prompt_helper` to construct its prompt.
-
-
-### Learning from Self-Demonstrations
-You may save successful action trajectories in the `AppAgent` to learn from self-demonstrations and improve its performance. After the completion of a `session`, the `AppAgent` will ask the user whether to save the action trajectories for future reference. You may configure the use of self-demonstrations in the `config.yaml` file.
-
-!!! tip
- You can find details of the configuration in the [documentation](../configurations/user_configuration.md).
-
-!!! tip
-    You may also refer [here]() for the implementation of self-demonstrations in the `AppAgent`.
-
-The `AppAgent` calls `build_experience_retriever` to build a self-demonstration retriever and uses `rag_experience_retrieve` to retrieve the demonstration.
-
-### Learning from Human Demonstrations
-In addition to self-demonstrations, you can also provide human demonstrations to the `AppAgent` by using the [Step Recorder](https://support.microsoft.com/en-us/windows/record-steps-to-reproduce-a-problem-46582a9b-620f-2e36-00c9-04e25d784e47) tool built into Windows. The `AppAgent` learns from the human demonstrations to improve its performance and achieve better personalization. The use of human demonstrations can be configured in the `config.yaml` file.
-
-!!! tip
- You can find details of the configuration in the [documentation](../configurations/user_configuration.md).
-!!! tip
-    You may also refer [here]() for the implementation of human demonstrations in the `AppAgent`.
-
-The `AppAgent` calls `build_human_demonstration_retriever` to build a human demonstration retriever and uses `rag_experience_retrieve` to retrieve the demonstration.
-
-
-## Skill Set for Automation
-The `AppAgent` is equipped with a versatile skill set to support comprehensive automation within the application by calling the `create_puppeteer_interface` method. The skills include:
-
-| Skill | Description |
-| --- | --- |
-| UI Automation | Mimicking user interactions with the application UI controls using the `UI Automation` and `Win32` API. |
-| Native API | Accessing the application's native API to execute specific functions and actions. |
-| In-App Agent | Leveraging the in-app agent to interact with the application's internal functions and features. |
-
-By utilizing these skills, the `AppAgent` can efficiently interact with the application and fulfill the user's request. You can find more details in the [Automator](../automator/overview.md) documentation and the code in the `ufo/automator` module.
-
-
-# Reference
-
-:::agents.agent.app_agent.AppAgent
\ No newline at end of file
diff --git a/documents/docs/agents/design/memory.md b/documents/docs/agents/design/memory.md
deleted file mode 100644
index 484412aa4..000000000
--- a/documents/docs/agents/design/memory.md
+++ /dev/null
@@ -1,24 +0,0 @@
-# Agent Memory
-
-The `Memory` manages the memory of the agent and stores the information required for the agent to interact with the user and applications at every step. Only some elements of the `Memory` are visible to the agent for decision-making.
-
-
-## MemoryItem
-A `MemoryItem` is a `dataclass` that represents a single step in the agent's memory. The fields of a `MemoryItem` are flexible and can be customized based on the requirements of the agent. The `MemoryItem` class is defined as follows:
-
-::: agents.memory.memory.MemoryItem
-
-!!!info
- At each step, an instance of `MemoryItem` is created and stored in the `Memory` to record the information of the agent's interaction with the user and applications.
-
-
-## Memory
-The `Memory` class is responsible for managing the memory of the agent. It stores a list of `MemoryItem` instances that represent the agent's memory at each step. The `Memory` class is defined as follows:
-
-::: agents.memory.memory.Memory
-
-!!!info
-    Each agent has its own `Memory` instance to store its information.
-
-!!!info
-    Not all information in the `Memory` is provided to the agent for decision-making. The agent can access parts of the memory based on the requirements of the agent's logic.
\ No newline at end of file
diff --git a/documents/docs/agents/design/processor.md b/documents/docs/agents/design/processor.md
deleted file mode 100644
index 7b0ab1c68..000000000
--- a/documents/docs/agents/design/processor.md
+++ /dev/null
@@ -1,28 +0,0 @@
-# Agents Processor
-
-The `Processor` is a key component of the agent that implements the core logic for processing the user's request. The `Processor` is implemented as a class in the `ufo/agents/processors` folder. Each agent has its own `Processor` class within the folder.
-
-## Core Process
-Once called, an agent processes the user's request by calling the `process` method of its `Processor` class, which runs a fixed series of steps. The workflow of `process` is as follows:
-
-| Step | Description | Function |
-| --- | --- | --- |
-| 1 | Print the step information. | `print_step_info` |
-| 2 | Capture the screenshot of the application. | `capture_screenshot` |
-| 3 | Get the control information of the application. | `get_control_info` |
-| 4 | Get the prompt message for the LLM. | `get_prompt_message` |
-| 5 | Generate the response from the LLM. | `get_response` |
-| 6 | Update the cost of the step. | `update_cost` |
-| 7 | Parse the response from the LLM. | `parse_response` |
-| 8 | Execute the action based on the response. | `execute_action` |
-| 9 | Update the memory and blackboard. | `update_memory` |
-| 10 | Update the status of the agent. | `update_status` |
-
-At each step, the `Processor` processes the user's request by invoking the corresponding method sequentially to execute the necessary actions.
-
-
-The process may be paused and later resumed with the `resume` method, depending on the agent's logic and the user's request.
-
-## Reference
-Below is the basic structure of the `Processor` class:
-:::agents.processors.basic.BaseProcessor
\ No newline at end of file
diff --git a/documents/docs/agents/design/prompter.md b/documents/docs/agents/design/prompter.md
deleted file mode 100644
index aefd4a7ba..000000000
--- a/documents/docs/agents/design/prompter.md
+++ /dev/null
@@ -1,47 +0,0 @@
-# Agent Prompter
-
-The `Prompter` is a key component of the UFO framework, responsible for constructing prompts for the LLM to generate responses. The `Prompter` is implemented in the `ufo/prompter` folder. Each agent has its own `Prompter` class that defines the structure of the prompt and the information to be fed to the LLM.
-
-## Components
-
-A prompt fed to the LLM is usually a list of dictionaries, where each dictionary contains the following keys:
-
-| Key | Description |
-| --- | --- |
-| `role` | The role of the text in the prompt, can be `system`, `user`, or `assistant`. |
-| `content` | The content of the text for the specific role. |
-
-!!!tip
- You may find the [official documentation](https://help.openai.com/en/articles/7042661-moving-from-completions-to-chat-completions-in-the-openai-api) helpful for constructing the prompt.
-
-In the `__init__` method of the `Prompter` class, you can define the template of the prompt for each component, and the final prompt message is constructed by combining the templates of each component using the `prompt_construction` method.
-
-### System Prompt
-The system prompt uses the template configured in the `config_dev.yaml` file for each agent. It usually contains the instructions for the agent's role, action, tips, response format, etc.
-You need to use the `system_prompt_construction` method to construct the system prompt.
-
-Prompts for the API instructions and demonstration examples are also included in the system prompt. They are constructed by the `api_prompt_helper` and `examples_prompt_helper` methods respectively. Below are the sub-components of the system prompt:
-
-| Component | Description | Method |
-| --- | --- | --- |
-| `apis` | The API instructions for the agent. | `api_prompt_helper` |
-| `examples` | The demonstration examples for the agent. | `examples_prompt_helper` |
-
-### User Prompt
-The user prompt is constructed based on the information from the agent's observation, external knowledge, and `Blackboard`. You can use the `user_prompt_construction` method to construct the user prompt. Below are the sub-components of the user prompt:
-
-| Component | Description | Method |
-| --- | --- | --- |
-| `observation` | The observation of the agent. | `user_content_construction` |
-| `retrieved_docs` | The knowledge retrieved from the external knowledge base. | `retrived_documents_prompt_helper` |
-| `blackboard` | The information stored in the `Blackboard`. | `blackboard_to_prompt` |
-
-
-# Reference
-You can find the implementation of the `Prompter` in the `ufo/prompter` folder. Below is the basic structure of the `Prompter` class:
-
-:::prompter.basic.BasicPrompter
-
-
-!!!tip
- You can customize the `Prompter` class to tailor the prompt to your requirements.
\ No newline at end of file
diff --git a/documents/docs/agents/design/state.md b/documents/docs/agents/design/state.md
deleted file mode 100644
index a1495711c..000000000
--- a/documents/docs/agents/design/state.md
+++ /dev/null
@@ -1,122 +0,0 @@
-# Agent State
-
-The `State` class is a fundamental component of the UFO agent framework. It represents the current state of the agent and determines the next action and agent to handle the request. Each agent has a specific set of states that define the agent's behavior and workflow.
-
-
-
-## AgentStatus
-The set of states for an agent is defined in the `AgentStatus` class:
-
-```python
-class AgentStatus(Enum):
- """
- The status class for the agent.
- """
-
- ERROR = "ERROR"
- FINISH = "FINISH"
- CONTINUE = "CONTINUE"
- FAIL = "FAIL"
- PENDING = "PENDING"
- CONFIRM = "CONFIRM"
- SCREENSHOT = "SCREENSHOT"
-```
-
-Each agent implements its own set of `AgentStatus` to define the states of the agent.
-
-
-## AgentStateManager
-
-The class `AgentStateManager` manages the state mapping from a string to the corresponding state class. Each state class is registered with the `AgentStateManager` using the `register` decorator to associate the state class with a specific agent, e.g.,
-
-```python
-@AgentStateManager.register
-class SomeAgentState(AgentState):
- """
- The state class for the some agent.
- """
-```
-
-!!! tip
- You can find examples on how to register the state class for the `AppAgent` in the `ufo/agents/states/app_agent_state.py` file.
-
-Below is the basic structure of the `AgentStateManager` class:
-```python
-class AgentStateManager(ABC, metaclass=SingletonABCMeta):
- """
- A abstract class to manage the states of the agent.
- """
-
- _state_mapping: Dict[str, Type[AgentState]] = {}
-
- def __init__(self):
- """
- Initialize the state manager.
- """
-
- self._state_instance_mapping: Dict[str, AgentState] = {}
-
- def get_state(self, status: str) -> AgentState:
- """
- Get the state for the status.
- :param status: The status string.
- :return: The state object.
- """
-
- # Lazy load the state class
- if status not in self._state_instance_mapping:
- state_class = self._state_mapping.get(status)
- if state_class:
- self._state_instance_mapping[status] = state_class()
- else:
- self._state_instance_mapping[status] = self.none_state
-
- state = self._state_instance_mapping.get(status, self.none_state)
-
- return state
-
- def add_state(self, status: str, state: AgentState) -> None:
- """
- Add a new state to the state mapping.
- :param status: The status string.
- :param state: The state object.
- """
- self.state_map[status] = state
-
- @property
- def state_map(self) -> Dict[str, AgentState]:
- """
- The state mapping of status to state.
- :return: The state mapping.
- """
- return self._state_instance_mapping
-
- @classmethod
- def register(cls, state_class: Type[AgentState]) -> Type[AgentState]:
- """
- Decorator to register the state class to the state manager.
- :param state_class: The state class to be registered.
- :return: The state class.
- """
- cls._state_mapping[state_class.name()] = state_class
- return state_class
-
- @property
- @abstractmethod
- def none_state(self) -> AgentState:
- """
- The none state of the state manager.
- """
- pass
-```
-
-## AgentState
-Each state class inherits from the `AgentState` class and must implement the `handle` method to process the action in the state. In addition, the `next_state` and `next_agent` methods are used to determine the next state and agent to handle the transition. Please find below the reference for the `AgentState` class in UFO.
-
-::: agents.states.basic.AgentState
-
-!!!tip
- The state machine diagrams for the `HostAgent` and `AppAgent` are shown in their respective documents.
-
-!!!tip
- A `Round` calls the `handle`, `next_state`, and `next_agent` methods of the current state to process the user request and determine the next state and agent to handle the request, and orchestrates the agents to execute the necessary actions.
diff --git a/documents/docs/agents/evaluation_agent.md b/documents/docs/agents/evaluation_agent.md
deleted file mode 100644
index 6dbfbadf0..000000000
--- a/documents/docs/agents/evaluation_agent.md
+++ /dev/null
@@ -1,72 +0,0 @@
-# EvaluationAgent 🧐
-
-The objective of the `EvaluationAgent` is to evaluate whether a `Session` or `Round` has been successfully completed. The `EvaluationAgent` assesses the performance of the `HostAgent` and `AppAgent` in fulfilling the request. You can configure whether to enable the `EvaluationAgent` in the `config_dev.yaml` file and the detailed documentation can be found [here](../configurations/developer_configuration.md).
-!!! note
-    The `EvaluationAgent` is fully LLM-driven and conducts evaluations based on the action trajectories and screenshots. It may not be 100% accurate since LLMs can make mistakes.
-
-
-We illustrate the evaluation process in the following figure:
-
-
-
-
-## Configuration
-To enable the `EvaluationAgent`, you can configure the following parameters in the `config_dev.yaml` file to evaluate the task completion status at different levels:
-
-| Configuration Option | Description | Type | Default Value |
-|---------------------------|-----------------------------------------------|---------|---------------|
-| `EVA_SESSION` | Whether to include the session in the evaluation. | Boolean | True |
-| `EVA_ROUND` | Whether to include the round in the evaluation. | Boolean | False |
-| `EVA_ALL_SCREENSHOTS` | Whether to include all the screenshots in the evaluation. | Boolean | True |
-
-
-## Evaluation Inputs
-The `EvaluationAgent` takes the following inputs for evaluation:
-
-| Input | Description | Type |
-| --- | --- | --- |
-| User Request | The user's request to be evaluated. | String |
-| APIs Description | The description of the APIs used in the execution. | List of Strings |
-| Action Trajectories | The action trajectories executed by the `HostAgent` and `AppAgent`. | List of Strings |
-| Screenshots | The screenshots captured during the execution. | List of Images |
-
-For more details on how to construct the inputs, please refer to the `EvaluationAgentPrompter` class in `ufo/prompter/eva_prompter.py`.
-
-!!! tip
-    You can configure whether to use all screenshots or only the first and last screenshots for evaluation with the `EVA_ALL_SCREENSHOTS` option in the `config_dev.yaml` file.
-
-
-## Evaluation Outputs
-The `EvaluationAgent` generates the following outputs after evaluation:
-
-| Output | Description | Type |
-| --- | --- | --- |
-| reason | The detailed reason for your judgment, by observing the screenshot differences and the . | String |
-| sub_scores | The sub-score of the evaluation in decomposing the evaluation into multiple sub-goals. | List of Dictionaries |
-| complete | The completion status of the evaluation, can be `yes`, `no`, or `unsure`. | String |
-
-Below is an example of the evaluation output:
-
-```json
-{
-    "reason": "The agent successfully completed the task of sending 'hello' to Zac on Microsoft Teams. The initial screenshot shows the Microsoft Teams application with the chat window of Chaoyun Zhang open. The agent then focused on the chat window, input the message 'hello', and clicked the Send button. The final screenshot confirms that the message 'hello' was sent to Zac.",
-    "sub_scores": {
-        "correct application focus": "yes",
-        "correct message input": "yes",
-        "message sent successfully": "yes"
-    },
-    "complete": "yes"
-}
-```
-
-!!!info
- The log of the evaluation results will be saved in the `logs/{task_name}/evaluation.log` file.
-
-The `EvaluationAgent` employs a chain-of-thought (CoT) mechanism: it first decomposes the evaluation into multiple sub-goals and then evaluates each sub-goal separately. The sub-scores are then aggregated to determine the overall completion status of the evaluation.
-
-# Reference
-
-:::agents.agent.evaluation_agent.EvaluationAgent
-
diff --git a/documents/docs/agents/follower_agent.md b/documents/docs/agents/follower_agent.md
deleted file mode 100644
index 4855366fb..000000000
--- a/documents/docs/agents/follower_agent.md
+++ /dev/null
@@ -1,28 +0,0 @@
-# Follower Agent 🚶🏽♂️
-
-The `FollowerAgent` inherits from the `AppAgent` and is responsible for following the user's instructions to perform specific tasks within the application. The `FollowerAgent` is designed to execute a series of actions based on the user's guidance. It is particularly useful for software testing, where clear instructions are provided to validate the application's behavior.
-
-
-## Different from the AppAgent
-The `FollowerAgent` shares most of its functionality with the `AppAgent`, but it is designed to follow the step-by-step instructions provided by the user instead of doing its own reasoning to determine the next action.
-
-
-## Usage
-The `FollowerAgent` is available in `follower` mode. You can find more details in the [documentation](). It also uses a different `Session` and `Processor` to handle the user's instructions. The step-wise instructions are provided by the user in a JSON file, which is then parsed by the `FollowerAgent` to execute the actions. An example of the JSON file is shown below:
-
-```json
-{
- "task": "Type in a bold text of 'Test For Fun'",
- "steps":
- [
- "1.type in 'Test For Fun'",
- "2.select the text of 'Test For Fun'",
- "3.click on the bold"
- ],
- "object": "draft.docx"
-}
-```
-
-# Reference
-
-:::agents.agent.follower_agent.FollowerAgent
\ No newline at end of file
diff --git a/documents/docs/agents/host_agent.md b/documents/docs/agents/host_agent.md
deleted file mode 100644
index 394613658..000000000
--- a/documents/docs/agents/host_agent.md
+++ /dev/null
@@ -1,160 +0,0 @@
-# HostAgent 🤖
-
-The `HostAgent` assumes five primary responsibilities:
-
-- **Task Decomposition.** Given a user's natural language input, `HostAgent` identifies the underlying task goal and decomposes it into a dependency-ordered subtask graph.
-
-- **Application Lifecycle Management.** For each subtask, `HostAgent` inspects system process metadata (via UIA APIs) to determine whether the target application is running. If not, it launches the program and registers it with the runtime.
-
-- **`AppAgent` Instantiation.** `HostAgent` spawns the corresponding `AppAgent` for each active application, providing it with task context, memory references, and relevant toolchains (e.g., APIs, documentation).
-
-- **Task Scheduling and Control.** The global execution plan is serialized into a finite state machine (FSM), allowing `HostAgent` to enforce execution order, detect failures, and resolve dependencies across agents.
-
-- **Shared State Communication.** `HostAgent` reads from and writes to a global blackboard, enabling inter-agent communication and system-level observability for debugging and replay.
-
-Below is a diagram illustrating the `HostAgent` architecture and its interactions with other components:
-
-
-
-
-
-
-The `HostAgent` activates its `Processor` to process the user's request and decompose it into sub-tasks. Each sub-task is then assigned to an `AppAgent` for execution. The `HostAgent` monitors the progress of the `AppAgents` and ensures the successful completion of the user's request.
-
-## HostAgent Input
-
-The `HostAgent` receives the following inputs:
-
-| Input | Description | Type |
-| --- | --- | --- |
-| User Request | The user's request in natural language. | String |
-| Application Information | Information about the existing active applications. | List of Strings |
-| Desktop Screenshots | Screenshots of the desktop to provide context to the `HostAgent`. | Image |
-| Previous Sub-Tasks | The previous sub-tasks and their completion status. | List of Strings |
-| Previous Plan | The previous plan for the following sub-tasks. | List of Strings |
-| Blackboard | The shared memory space for storing and sharing information among the agents. | Dictionary |
-
-By processing these inputs, the `HostAgent` determines the appropriate application to fulfill the user's request and orchestrates the `AppAgents` to execute the necessary actions.
-
-## HostAgent Output
-
-With the inputs provided, the `HostAgent` generates the following outputs:
-
-| Output | Description | Type |
-| --- | --- | --- |
-| Observation | The observation of current desktop screenshots. | String |
-| Thought | The logical reasoning process of the `HostAgent`. | String |
-| Current Sub-Task | The current sub-task to be executed by the `AppAgent`. | String |
-| Message | The message to be sent to the `AppAgent` for the completion of the sub-task. | String |
-| ControlLabel | The index of the selected application to execute the sub-task. | String |
-| ControlText | The name of the selected application to execute the sub-task. | String |
-| Plan | The plan for the following sub-tasks after the current sub-task. | List of Strings |
-| Status | The status of the agent, mapped to the `AgentState`. | String |
-| Comment | Additional comments or information provided to the user. | String |
-| Questions | The questions to be asked to the user for additional information. | List of Strings |
-| Bash | The bash command to be executed by the `HostAgent`. It can be used to open applications or execute system commands. | String |
-
-
-Below is an example of the `HostAgent` output:
-
-```json
-{
- "Observation": "Desktop screenshot",
- "Thought": "Logical reasoning process",
- "Current Sub-Task": "Sub-task description",
- "Message": "Message to AppAgent",
- "ControlLabel": "Application index",
- "ControlText": "Application name",
- "Plan": ["Sub-task 1", "Sub-task 2"],
- "Status": "AgentState",
- "Comment": "Additional comments",
- "Questions": ["Question 1", "Question 2"],
- "Bash": "Bash command"
-}
-```
-
-!!! info
-    The `HostAgent` output is formatted as a JSON object by LLMs and can be parsed by the `json.loads` method in Python.
-
-
-## HostAgent State
-
-The `HostAgent` progresses through different states, as defined in the `ufo/agents/states/host_agent_states.py` module. The states include:
-
-| State | Description |
-|-------------|-----------------------------------------------------------------------------|
-| `CONTINUE` | Default state for action planning and execution. |
-| `PENDING` | Invoked for safety-critical actions (e.g., destructive operations); requires user confirmation. |
-| `FINISH` | Task completed; execution ends. |
-| `FAIL` | Irrecoverable failure detected (e.g., application crash, permission error). |
-
-
-The state machine diagram for the `HostAgent` is shown below:
-
-
-
-
-
-The `HostAgent` transitions between these states based on the user's request, the application information, and the progress of the `AppAgents` in executing the sub-tasks.
-
-
-## Task Decomposition
-Upon receiving the user's request, the `HostAgent` decomposes it into sub-tasks and assigns each sub-task to an `AppAgent` for execution. The `HostAgent` determines the appropriate application to fulfill the user's request based on the application information and the user's request. It then orchestrates the `AppAgents` to execute the necessary actions to complete the sub-tasks. We show the task decomposition process in the following figure:
-
-
-
-
-
-## Creating and Registering AppAgents
-When the `HostAgent` determines the need for a new `AppAgent` to fulfill a sub-task, it creates an instance of the `AppAgent` and registers it with the `HostAgent`, by calling the `create_subagent` method:
-
-```python
-def create_subagent(
-    self,
-    agent_type: str,
-    agent_name: str,
-    process_name: str,
-    app_root_name: str,
-    is_visual: bool,
-    main_prompt: str,
-    example_prompt: str,
-    api_prompt: str,
-    *args,
-    **kwargs,
-) -> BasicAgent:
-    """
-    Create a SubAgent hosted by the HostAgent.
-    :param agent_type: The type of the agent to create.
-    :param agent_name: The name of the SubAgent.
-    :param process_name: The process name of the app.
-    :param app_root_name: The root name of the app.
-    :param is_visual: The flag indicating whether the agent is visual or not.
-    :param main_prompt: The main prompt file path.
-    :param example_prompt: The example prompt file path.
-    :param api_prompt: The API prompt file path.
-    :return: The created SubAgent.
-    """
-    # Delegate construction to the agent factory.
-    app_agent = self.agent_factory.create_agent(
-        agent_type,
-        agent_name,
-        process_name,
-        app_root_name,
-        is_visual,
-        main_prompt,
-        example_prompt,
-        api_prompt,
-        *args,
-        **kwargs,
-    )
-    # Register the new agent with this host and mark it as active.
-    self.appagent_dict[agent_name] = app_agent
-    app_agent.host = self
-    self._active_appagent = app_agent
-
-    return app_agent
-```
-
-The `HostAgent` then assigns the sub-task to the `AppAgent` for execution and monitors its progress.
-
-# Reference
-
-:::agents.agent.host_agent.HostAgent
diff --git a/documents/docs/agents/overview.md b/documents/docs/agents/overview.md
deleted file mode 100644
index c1992608c..000000000
--- a/documents/docs/agents/overview.md
+++ /dev/null
@@ -1,37 +0,0 @@
-# Agents
-
-In UFO, there are four types of agents: `HostAgent`, `AppAgent`, `FollowerAgent`, and `EvaluationAgent`. Each agent has a specific role in the UFO system and is responsible for different aspects of the user interaction process:
-
-| Agent | Description |
-| -------------------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
-| [`HostAgent`](../agents/host_agent.md) | Decomposes the user request into sub-tasks and selects the appropriate application to fulfill the request. |
-| [`AppAgent`](../agents/app_agent.md) | Executes actions on the selected application. |
-| [`FollowerAgent`](../agents/follower_agent.md) | Follows the user's instructions to complete the task. |
-| [`EvaluationAgent`](../agents/evaluation_agent.md) | Evaluates the completeness of a session or a round. |
-
-In the normal workflow, only the `HostAgent` and `AppAgent` are involved in the user interaction process. The `FollowerAgent` and `EvaluationAgent` are used for specific tasks.
-
-The orchestration of the agents in UFO is shown below:
-
-
-
-
-
-## Main Components
-
-An agent in UFO is composed of the following main components to fulfill its role in the UFO system:
-
-| Component | Description |
-| ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
-| [`State`](../agents/design/state.md) | Represents the current state of the agent and determines the next action and agent to handle the request. |
-| [`Memory`](../agents/design/memory.md) | Stores information about the user request, application state, and other relevant data. |
-| [`Blackboard`](../agents/design/blackboard.md) | Stores information shared between agents. |
-| [`Prompter`](../agents/design/prompter.md) | Generates prompts for the language model based on the user request and application state. |
-| [`Processor`](../agents/design/processor.md) | Processes the workflow of the agent, including handling user requests, executing actions, and memory management. |
-
-## Reference
-
-Below is the reference for the `BasicAgent` class in UFO. All agents in UFO inherit from `BasicAgent` and implement the methods needed to fulfill their roles in the UFO system.
-
-::: agents.agent.basic.BasicAgent
-
diff --git a/documents/docs/aip/endpoints.md b/documents/docs/aip/endpoints.md
new file mode 100644
index 000000000..7e9dfcd8e
--- /dev/null
+++ b/documents/docs/aip/endpoints.md
@@ -0,0 +1,544 @@
+# AIP Endpoints
+
+Endpoints combine protocol, transport, and resilience components to provide production-ready AIP communication for servers, clients, and orchestrators.
+
+## Endpoint Types at a Glance
+
+| Endpoint Type | Role | Used By | Key Features |
+|---------------|------|---------|--------------|
+| **DeviceServerEndpoint** | Server | Device Agent Service | ✅ Multiplexed connections ✅ Session management ✅ Task dispatching ✅ Result aggregation |
+| **DeviceClientEndpoint** | Client | Device Agent Client | ✅ Auto-reconnection ✅ Heartbeat management ✅ Command execution ✅ Telemetry reporting |
+| **ConstellationEndpoint** | Orchestrator | ConstellationClient | ✅ Multi-device coordination ✅ Task distribution ✅ Device info querying ✅ Connection pooling |
+
+---
+
+## Endpoint Architecture
+
+**Endpoint Inheritance Hierarchy:**
+
+AIP provides three specialized endpoint implementations that all inherit common functionality from a shared base class:
+
+```mermaid
+graph TB
+ Base[AIPEndpoint Base]
+ Base --> Server[DeviceServerEndpoint Server-Side]
+ Base --> Client[DeviceClientEndpoint Client-Side]
+ Base --> Constellation[ConstellationEndpoint Orchestrator]
+
+ Base -.->|Protocol| P[Message Handling]
+ Base -.->|Resilience| R[Reconnection + Heartbeat]
+ Base -.->|Sessions| S[State Tracking]
+
+ style Base fill:#e1f5ff
+ style Server fill:#fff4e1
+ style Client fill:#f0ffe1
+ style Constellation fill:#ffe1f5
+```
+
+The dashed arrows indicate capabilities that the base class provides to all subclasses. This inheritance design ensures consistent behavior across all endpoint types while allowing specialization for server, client, and orchestrator roles.
+
+**Base Endpoint Components:**
+
+All endpoints inherit from `AIPEndpoint`, which provides:
+
+- **Protocol**: Message serialization and handling
+- **Reconnection Strategy**: Automatic reconnection with backoff
+- **Timeout Manager**: Operation timeout management
+- **Session Handlers**: Per-session state tracking
+
+## Base Endpoint: AIPEndpoint
+
+### Common Methods
+
+| Method | Purpose | Example Usage |
+|--------|---------|---------------|
+| `start()` | Start endpoint | `await endpoint.start()` |
+| `stop()` | Stop endpoint | `await endpoint.stop()` |
+| `is_connected()` | Check connection | `if endpoint.is_connected(): ...` |
+| `send_with_timeout()` | Send with timeout | `await endpoint.send_with_timeout(msg, 30.0)` |
+| `receive_with_timeout()` | Receive with timeout | `msg = await endpoint.receive_with_timeout(ServerMessage, 60.0)` |
+
+**Basic Usage Pattern:**
+
+```python
+from aip.endpoints.base import AIPEndpoint
+
+# Start endpoint
+await endpoint.start()
+
+# Check connection
+if endpoint.is_connected():
+    await endpoint.handle_message(msg)
+
+# Send with timeout
+await endpoint.send_with_timeout(msg, timeout=30.0)
+
+# Clean shutdown
+await endpoint.stop()
+```
+
+---
+
+## DeviceServerEndpoint
+
+Wraps UFO's server-side WebSocket handler with AIP protocol support for managing multiple device connections simultaneously.
+
+### Configuration
+
+```python
+from aip.endpoints import DeviceServerEndpoint
+
+endpoint = DeviceServerEndpoint(
+ ws_manager=ws_manager, # WebSocket connection manager
+ session_manager=session_manager, # Session state manager
+ local=False # Local vs remote deployment
+)
+```
+
+### Integration with FastAPI
+
+```python
+from fastapi import FastAPI, WebSocket
+from aip.endpoints import DeviceServerEndpoint
+
+app = FastAPI()
+endpoint = DeviceServerEndpoint(ws_manager, session_manager)
+
+@app.websocket("/ws")
+async def websocket_route(websocket: WebSocket):
+    await endpoint.handle_websocket(websocket)
+```
+
+### Key Features
+
+| Feature | Description | Benefit |
+|---------|-------------|---------|
+| **Multiplexed Connections** | Handle multiple clients simultaneously | Scale to many devices |
+| **Session Management** | Track active sessions per device | Maintain conversation context |
+| **Task Dispatching** | Route tasks to appropriate clients | Targeted execution |
+| **Result Aggregation** | Collect and format execution results | Unified response handling |
+| **Auto Task Cancellation** | Cancel tasks on disconnect | Prevent orphaned tasks |
+
+**Backward Compatibility:**
+
+The Device Server Endpoint maintains full compatibility with UFO's existing WebSocket handler.
+
+### Task Cancellation on Disconnection
+
+```python
+# Automatically called when device disconnects
+await endpoint.cancel_device_tasks(
+ device_id="device_001",
+ reason="device_disconnected"
+)
+```
+
+---
+
+## DeviceClientEndpoint
+
+Wraps UFO's client-side WebSocket client with AIP protocol support, automatic reconnection, and heartbeat management.
+
+### Configuration
+
+```python
+from aip.endpoints import DeviceClientEndpoint
+
+endpoint = DeviceClientEndpoint(
+ ws_url="ws://localhost:8000/ws",
+ ufo_client=ufo_client,
+ max_retries=3,
+ timeout=120.0
+)
+```
+
+### Automatic Features
+
+| Feature | Default Behavior | Configuration |
+|---------|------------------|---------------|
+| **Heartbeat** | Starts on connection | 20s interval (fixed) |
+| **Reconnection** | Exponential backoff | `max_retries=3`, `initial_backoff=2.0` |
+| **Message Routing** | Auto-routes to UFO client | Handled internally |
+| **Connection Management** | Auto-connect on start | Transparent to user |
+
+**Lifecycle Management Example:**
+
+```python
+# Start and connect
+await endpoint.start()
+
+# Handle messages automatically
+# (routed to underlying UFO client)
+
+# Stop heartbeat and close
+await endpoint.stop()
+```
+
+### Reconnection Strategy
+
+```python
+from aip.resilience import ReconnectionStrategy
+
+reconnection_strategy = ReconnectionStrategy(
+ max_retries=3,
+ initial_backoff=2.0,
+ max_backoff=60.0
+)
+
+endpoint = DeviceClientEndpoint(
+ ws_url=url,
+ ufo_client=client,
+ reconnection_strategy=reconnection_strategy
+)
+```
+
+---
+
+## ConstellationEndpoint
+
+Enables the ConstellationClient to communicate with multiple devices simultaneously, managing connections, tasks, and queries.
+
+### Configuration
+
+```python
+from aip.endpoints import ConstellationEndpoint
+
+endpoint = ConstellationEndpoint(
+ task_name="multi_device_task",
+ message_processor=processor # Optional custom processor
+)
+```
+
+### Multi-Device Operations
+
+| Operation | Method | Description |
+|-----------|--------|-------------|
+| **Connect** | `connect_to_device()` | Establish connection to device |
+| **Send Task** | `send_task_to_device()` | Dispatch task to specific device |
+| **Query Info** | `request_device_info()` | Get device telemetry |
+| **Check Status** | `is_device_connected()` | Verify connection health |
+| **Disconnect** | `disconnect_device()` | Close device connection |
+| **Disconnect All** | `stop()` | Shutdown all connections |
+
+### Connecting to Devices
+
+```python
+# Connect using AgentProfile
+connection = await endpoint.connect_to_device(
+ device_info=agent_profile, # AgentProfile object
+ message_processor=processor
+)
+```
+
+Learn more about [AgentProfile configuration](../galaxy/client/device_manager.md) in the Galaxy documentation.
+
+### Sending Tasks
+
+```python
+# Dispatch task to specific device
+result = await endpoint.send_task_to_device(
+ device_id="device_001",
+ task_request={
+ "request": "Open Notepad",
+ "task_name": "open_notepad",
+ "session_id": "session_123"
+ }
+)
+```
+
+### Querying Device Info
+
+```python
+# Request telemetry update
+device_info = await endpoint.request_device_info("device_001")
+
+if device_info:
+    print(f"OS: {device_info['os']}")
+    print(f"CPU: {device_info['cpu']}")
+    print(f"GPU: {device_info.get('gpu', 'N/A')}")
+```
+
+### Connection Management
+
+**Managing Multiple Devices:**
+
+```python
+# Check connection before sending
+if endpoint.is_device_connected("device_001"):
+    await endpoint.send_task_to_device(...)
+
+# Disconnect specific device
+await endpoint.disconnect_device("device_001")
+
+# Disconnect all devices
+await endpoint.stop()
+```
+
+### Disconnection Handling
+
+```python
+# Automatically triggered on device disconnect
+await endpoint.on_device_disconnected("device_001")
+
+# Cancels pending tasks
+await endpoint.cancel_device_tasks(
+ device_id="device_001",
+ reason="device_disconnected"
+)
+
+# Attempts reconnection (if enabled)
+success = await endpoint.reconnect_device("device_001")
+```
+
+---
+
+## Endpoint Lifecycle Patterns
+
+### Server Lifecycle
+
+**Server Endpoint State Transitions:**
+
+This state diagram shows the lifecycle of a server endpoint from initialization through connection handling to shutdown:
+
+```mermaid
+stateDiagram-v2
+ [*] --> Initialize: Create endpoint
+ Initialize --> Started: start()
+ Started --> Listening: Accept connections
+ Listening --> Handling: Handle WebSocket
+ Handling --> Listening: Connection closed
+ Listening --> Stopped: stop()
+ Stopped --> [*]
+```
+
+The `Listening → Handling` loop represents the server accepting multiple client connections. Each connection is handled independently while the server remains in the listening state.
+
+**Server Lifecycle Code:**
+
+```python
+# 1. Initialize
+endpoint = DeviceServerEndpoint(ws_manager, session_manager)
+
+# 2. Start
+await endpoint.start()
+
+# 3. Handle connections
+@app.websocket("/ws")
+async def handle_ws(websocket: WebSocket):
+    await endpoint.handle_websocket(websocket)
+
+# 4. Stop (on shutdown)
+await endpoint.stop()
+```
+
+### Client Lifecycle
+
+**Client Endpoint State Transitions with Auto-Reconnection:**
+
+This diagram shows the client lifecycle including automatic reconnection attempts when the connection is lost:
+
+```mermaid
+stateDiagram-v2
+ [*] --> Initialize: Create endpoint
+ Initialize --> Connecting: start()
+ Connecting --> Connected: Connection established
+ Connected --> Heartbeat: Auto-start heartbeat
+ Heartbeat --> Handling: Handle messages
+ Handling --> Heartbeat: Continue
+ Heartbeat --> Reconnecting: Connection lost
+ Reconnecting --> Connected: Reconnect successful
+ Reconnecting --> Stopped: Max retries
+ Connected --> Stopped: stop()
+ Stopped --> [*]
+```
+
+The `Heartbeat → Handling` loop represents normal operation with periodic heartbeats. The `Reconnecting → Connected` transition shows automatic recovery from network failures.
+
+**Client Lifecycle Code:**
+
+```python
+# 1. Initialize
+endpoint = DeviceClientEndpoint(ws_url, ufo_client)
+
+# 2. Connect
+await endpoint.start()
+
+# 3. Handle messages (automatic)
+# UFO client receives and processes messages
+
+# 4. Disconnect
+await endpoint.stop()
+```
+
+### Constellation Lifecycle
+
+**Constellation Lifecycle Code:**
+
+```python
+# 1. Initialize
+endpoint = ConstellationEndpoint(task_name)
+
+# 2. Start
+await endpoint.start()
+
+# 3. Connect to devices
+await endpoint.connect_to_device(device_info1)
+await endpoint.connect_to_device(device_info2)
+
+# 4. Send tasks
+await endpoint.send_task_to_device(device_id, task_request)
+
+# 5. Cleanup (disconnects all devices)
+await endpoint.stop()
+```
+
+---
+
+## Resilience Features
+
+!!!warning "Built-In Resilience"
+    All endpoints include automatic reconnection, timeout management, and heartbeat monitoring for production reliability.
+
+### Resilience Configuration
+
+| Component | Configuration | Purpose |
+|-----------|---------------|---------|
+| **Reconnection** | `ReconnectionStrategy` | Auto-reconnect with backoff |
+| **Timeout** | `TimeoutManager` | Enforce operation timeouts |
+| **Heartbeat** | `HeartbeatManager` | Monitor connection health |
+
+**Configuring Resilience:**
+
+```python
+from aip.resilience import ReconnectionStrategy, ReconnectionPolicy
+
+strategy = ReconnectionStrategy(
+ max_retries=5,
+ initial_backoff=1.0,
+ max_backoff=60.0,
+ backoff_multiplier=2.0,
+ policy=ReconnectionPolicy.EXPONENTIAL_BACKOFF
+)
+
+endpoint = DeviceClientEndpoint(
+ ws_url=url,
+ ufo_client=client,
+ reconnection_strategy=strategy
+)
+```
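+
+**Backoff Schedule Sketch:**
+
+To make the parameters above concrete, the following minimal sketch computes the wait before each retry under a common exponential-backoff formula, `min(initial_backoff * backoff_multiplier ** attempt, max_backoff)`. The `backoff_delays` helper is hypothetical and only illustrates the idea. It is not part of `aip.resilience`, and the actual `ReconnectionStrategy` may compute delays differently (for example, by adding jitter).
+
+```python
+def backoff_delays(max_retries, initial_backoff, max_backoff, backoff_multiplier=2.0):
+    """Return the wait time in seconds before each retry attempt (illustrative only)."""
+    delays = []
+    for attempt in range(max_retries):
+        # Grow the delay geometrically, but never exceed max_backoff.
+        delay = min(initial_backoff * backoff_multiplier ** attempt, max_backoff)
+        delays.append(delay)
+    return delays
+
+
+# Same values as the configuration above.
+schedule = backoff_delays(max_retries=5, initial_backoff=1.0, max_backoff=60.0)
+print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0]
+```
+
+Under this assumed formula, `max_backoff` only takes effect once the uncapped delay would exceed 60 seconds, which happens from the seventh attempt onward with these settings.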
+
+### Timeout Operations
+
+```python
+# Send with custom timeout
+await endpoint.send_with_timeout(msg, timeout=30.0)
+
+# Receive with custom timeout
+msg = await endpoint.receive_with_timeout(ServerMessage, timeout=60.0)
+```
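+
+**Timeout Wrapper Sketch:**
+
+For readers unfamiliar with timeout-bounded awaits, the sketch below shows one way such a wrapper can be built on the standard `asyncio.wait_for`. The names `call_with_timeout` and `slow_send` are hypothetical stand-ins used for illustration; this is not the implementation of `send_with_timeout()` or of `TimeoutManager`.
+
+```python
+import asyncio
+
+
+async def call_with_timeout(operation, msg, timeout):
+    """Await operation(msg); raise asyncio.TimeoutError if it exceeds `timeout` seconds."""
+    return await asyncio.wait_for(operation(msg), timeout=timeout)
+
+
+async def slow_send(msg):
+    # Hypothetical transport call that takes longer than the allowed timeout.
+    await asyncio.sleep(0.2)
+    return "sent"
+
+
+async def demo():
+    try:
+        await call_with_timeout(slow_send, "ping", timeout=0.05)
+        return "ok"
+    except asyncio.TimeoutError:
+        # The pending operation is cancelled by wait_for before the error is raised.
+        return "timed out"
+
+
+outcome = asyncio.run(demo())
+print(outcome)  # timed out
+```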
+
+[→ See detailed resilience documentation](./resilience.md)
+
+---
+
+## Error Handling Patterns
+
+### Connection Errors
+
+```python
+try:
+    await endpoint.start()
+except ConnectionError as e:
+    logger.error(f"Failed to connect: {e}")
+    # Reconnection handled automatically if enabled
+```
+
+### Task Execution Errors
+
+```python
+try:
+    result = await endpoint.send_task_to_device(device_id, task)
+except TimeoutError:
+    logger.error("Task execution timeout")
+except Exception as e:
+    logger.error(f"Task failed: {e}")
+```
+
+### Custom Disconnection Handling
+
+```python
+class CustomEndpoint(DeviceClientEndpoint):
+    async def on_device_disconnected(self, device_id: str) -> None:
+        logger.warning(f"Device {device_id} disconnected")
+
+        # Custom cleanup logic
+        await self.custom_cleanup(device_id)
+
+        # Call parent implementation
+        await super().on_device_disconnected(device_id)
+```
+
+---
+
+## Best Practices
+
+**Endpoint Selection:**
+
+| Use Case | Endpoint Type |
+|----------|---------------|
+| Device agent server | `DeviceServerEndpoint` |
+| Device agent client | `DeviceClientEndpoint` |
+| Multi-device orchestrator | `ConstellationEndpoint` |
+
+!!!warning "Configuration Guidelines"
+    - **Set appropriate timeouts** based on deployment environment
+    - **Configure reconnection** based on network reliability
+    - **Monitor connection health** with `is_connected()` checks
+    - **Implement custom handlers** for application-specific cleanup
+
+!!!success "Resource Management"
+    - **Always call `stop()`** during shutdown to prevent leaks
+    - **Use message processors** for custom message handling
+    - **Handle disconnections** with `on_device_disconnected` overrides
+
+**Custom Message Processor:**
+
+```python
+class MyProcessor:
+    async def process_message(self, msg):
+        # Custom processing
+        logger.info(f"Processing: {msg.type}")
+        # ...
+
+endpoint = ConstellationEndpoint(
+ task_name="task",
+ message_processor=MyProcessor()
+)
+```
+
+---
+
+## Quick Reference
+
+### Import Endpoints
+
+```python
+from aip.endpoints import (
+ AIPEndpoint, # Base class
+ DeviceServerEndpoint, # Server-side
+ DeviceClientEndpoint, # Client-side
+ ConstellationEndpoint, # Orchestrator-side
+)
+```
+
+### Related Documentation
+
+- [Protocol Reference](./protocols.md) - Protocol implementations used by endpoints
+- [Transport Layer](./transport.md) - Transport configuration and options
+- [Resilience](./resilience.md) - Reconnection and heartbeat management
+- [Messages](./messages.md) - Message types and validation
+- [Overview](./overview.md) - System architecture and design
+- [Galaxy Client](../galaxy/client/overview.md) - Multi-device orchestration with ConstellationClient
+- [UFO Server](../server/websocket_handler.md) - WebSocket server implementation
+- [UFO Client](../client/websocket_client.md) - WebSocket client implementation
+
diff --git a/documents/docs/aip/messages.md b/documents/docs/aip/messages.md
new file mode 100644
index 000000000..5c5067615
--- /dev/null
+++ b/documents/docs/aip/messages.md
@@ -0,0 +1,628 @@
+# AIP Message Reference
+
+AIP uses **Pydantic-based messages** for automatic validation, serialization, and type safety. All messages transmit as JSON over WebSocket.
+
+## Message Overview
+
+### Bidirectional Communication
+
+**Message Flow Overview:**
+
+This diagram illustrates all message types and their directions in the AIP protocol, showing how clients and servers communicate bidirectionally:
+
+```mermaid
+graph LR
+ Client[Device Client]
+ Server[Device Service]
+
+ Client -->|REGISTER| Server
+ Client -->|COMMAND_RESULTS| Server
+ Client -->|TASK_END| Server
+ Client -->|HEARTBEAT| Server
+
+ Server -->|TASK| Client
+ Server -->|COMMAND| Client
+ Server -->|HEARTBEAT| Client
+ Server -->|TASK_END| Client
+
+ Client <-->|DEVICE_INFO| Server
+ Client <-->|ERROR| Server
+
+ style Client fill:#f0ffe1
+ style Server fill:#fff4e1
+```
+
+Single-headed arrows mark messages that travel in one direction only, while double-headed arrows (`<-->`) mark messages that either party can initiate. `HEARTBEAT` and `TASK_END` each appear twice because both the client and the server can send them, depending on the scenario.
+
+### Message Types Quick Reference
+
+| Direction | Message Type | Purpose | Key Fields |
+|-----------|--------------|---------|------------|
+| **Client → Server** | | | |
+| | `REGISTER` | Initial capability advertisement | `client_id`, `metadata` |
+| | `COMMAND_RESULTS` | Return command execution results | `action_results`, `prev_response_id` |
+| | `TASK_END` | Notify task completion | `status`, `session_id` |
+| | `HEARTBEAT` | Keepalive signal | `client_id` |
+| **Server → Client** | | | |
+| | `TASK` | Task assignment | `user_request`, `task_name` |
+| | `COMMAND` | Command execution request | `actions`, `response_id` |
+| | `HEARTBEAT` | Keepalive acknowledgment | `response_id` |
+| | `TASK_END` | Task completion notification | `status`, `result` |
+| **Bidirectional** | | | |
+| | `DEVICE_INFO_REQUEST` | Request device telemetry | `request_id` |
+| | `DEVICE_INFO_RESPONSE` | Device information | Device specs |
+| | `ERROR` | Error condition | `error` message |
+
+---
+
+## Core Data Structures
+
+These Pydantic models form the building blocks for all AIP messages.
+
+### Essential Types Summary
+
+| Type | Purpose | Key Fields | Usage |
+|------|---------|------------|-------|
+| **Rect** | UI element coordinates | `x`, `y`, `width`, `height` | UI automation |
+| **ControlInfo** | UI control metadata | `annotation_id`, `name`, `rectangle` | Control discovery |
+| **WindowInfo** | Window metadata | `process_id`, `is_active` (extends ControlInfo) | Window management |
+| **MCPToolInfo** | Tool definition | `tool_key`, `namespace`, `input_schema` | Capability advertisement |
+| **Command** | Execution request | `tool_name`, `parameters`, `call_id` | Action dispatch |
+| **Result** | Execution outcome | `status`, `result`, `error` | Result reporting |
+
+### Rect (Rectangle)
+
+Represents UI element bounding box.
+
+```python
+rect = Rect(x=100, y=200, width=300, height=150)
+```
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `x` | int | X-coordinate (top-left) |
+| `y` | int | Y-coordinate (top-left) |
+| `width` | int | Width in pixels |
+| `height` | int | Height in pixels |
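+
+**Working with Coordinates:**
+
+Because a `Rect` stores a top-left corner plus a size, derived values such as the center point (a typical click target) follow by simple arithmetic. The sketch below uses a plain dataclass as a stand-in for the Pydantic model, and the `center` and `contains` helpers are hypothetical additions for illustration; they are not claimed to exist on AIP's `Rect`.
+
+```python
+from dataclasses import dataclass
+from typing import Tuple
+
+
+@dataclass(frozen=True)
+class Rect:
+    """Stand-in for AIP's Rect with the same four fields."""
+    x: int
+    y: int
+    width: int
+    height: int
+
+    def center(self) -> Tuple[int, int]:
+        # Hypothetical helper: midpoint of the bounding box.
+        return (self.x + self.width // 2, self.y + self.height // 2)
+
+    def contains(self, px: int, py: int) -> bool:
+        # Hypothetical helper: the right and bottom edges are treated as exclusive.
+        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height
+
+
+rect = Rect(x=100, y=200, width=300, height=150)
+print(rect.center())            # (250, 275)
+print(rect.contains(400, 350))  # False: both coordinates lie on the exclusive edges
+```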
+
+### ControlInfo
+
+UI control element metadata.
+
+**ControlInfo Example:**
+
+```python
+control = ControlInfo(
+ annotation_id="ctrl_001",
+ name="Submit Button",
+ class_name="Button",
+ rectangle=Rect(x=100, y=200, width=80, height=30),
+ is_enabled=True,
+ is_visible=True
+)
+```
+
+**Complete Field List:**
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `annotation_id` | str? | Unique annotation identifier |
+| `name` | str? | Control name |
+| `title` | str? | Control title |
+| `handle` | int? | Windows handle (HWND) |
+| `class_name` | str? | UI class name |
+| `rectangle` | Rect? | Bounding rectangle |
+| `control_type` | str? | Type (Button, TextBox, etc.) |
+| `automation_id` | str? | UI Automation ID |
+| `is_enabled` | bool? | Enabled state |
+| `is_visible` | bool? | Visibility state |
+| `source` | str? | Data source identifier |
+| `text_content` | str? | Text content |
+
+### WindowInfo
+
+Window metadata (extends ControlInfo).
+
+**Additional Fields:**
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `process_id` | int? | Process ID (PID) |
+| `process_name` | str? | Process name (e.g., "notepad.exe") |
+| `is_minimized` | bool? | Minimized state |
+| `is_maximized` | bool? | Maximized state |
+| `is_active` | bool? | Has focus |
+
+### MCPToolInfo
+
+MCP tool capability definition.
+
+**Tool Advertisement:**
+
+Device agents use `MCPToolInfo` to advertise their capabilities during registration.
+
+```python
+tool_info = MCPToolInfo(
+ tool_key="ui_automation.click_button",
+ tool_name="click_button",
+ namespace="ui_automation",
+ tool_type="action",
+ description="Click a button by its ID",
+ input_schema={
+ "type": "object",
+ "properties": {
+ "button_id": {"type": "string"}
+ }
+ }
+)
+```
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `tool_key` | str | Unique key (`namespace.tool_name`) |
+| `tool_name` | str | Tool name |
+| `namespace` | str | MCP namespace |
+| `tool_type` | str | `"action"` or `"data_collection"` |
+| `description` | str? | Tool description |
+| `input_schema` | dict? | JSON schema for inputs |
+| `output_schema` | dict? | JSON schema for outputs |
+| `meta` | dict? | Metadata |
+| `annotations` | dict? | Additional annotations |
+
+Learn more about [MCP tools and capabilities](../mcp/overview.md).
+
+---
+
+## Command and Result Structures
+
+### Command
+
+Execution request sent to device agents.
+
+**Command Structure:**
+
+```python
+cmd = Command(
+ tool_name="click_element",
+ parameters={"control_id": "btn_submit"},
+ tool_type="action",
+ call_id="cmd_12345"
+)
+```
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `tool_name` | str | ✅ | Name of tool to execute |
+| `parameters` | dict | | Tool parameters |
+| `tool_type` | str | ✅ | `"data_collection"` or `"action"` |
+| `call_id` | str | | Unique identifier for correlation |
+
+**Call ID Correlation:**
+
+Use `call_id` to match commands with their results in the `Result` object.
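+
+As a minimal sketch of that correlation, the snippet below pairs each command with its result by `call_id`. The `Command` and `Result` dataclasses are simplified stand-ins written for this example (they are not the `aip.messages` models), and `pair_by_call_id` is a hypothetical helper.
+
+```python
+from dataclasses import dataclass, field
+from typing import Any, Optional
+
+
+@dataclass
+class Command:  # stand-in for illustration only
+    tool_name: str
+    tool_type: str
+    parameters: dict = field(default_factory=dict)
+    call_id: Optional[str] = None
+
+
+@dataclass
+class Result:  # stand-in for illustration only
+    status: str
+    result: Any = None
+    error: Optional[str] = None
+    call_id: Optional[str] = None
+
+
+def pair_by_call_id(commands, results):
+    """Return {call_id: (command, result or None)} for commands that carry a call_id."""
+    by_id = {r.call_id: r for r in results if r.call_id is not None}
+    return {c.call_id: (c, by_id.get(c.call_id)) for c in commands if c.call_id is not None}
+
+
+commands = [
+    Command("click_element", "action", {"control_id": "btn_submit"}, call_id="cmd_1"),
+    Command("type_text", "action", {"text": "hi"}, call_id="cmd_2"),
+]
+results = [Result(status="SUCCESS", result={"clicked": True}, call_id="cmd_1")]
+
+# cmd_1 is paired with its result; cmd_2 has no result yet and maps to None
+pairs = pair_by_call_id(commands, results)
+```
+
+A command whose result has not arrived yet maps to `None`, which makes missing results easy to spot.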
+
+### ResultStatus
+
+Execution outcome enumeration.
+
+| Status | Meaning | When to Use |
+|--------|---------|-------------|
+| `SUCCESS` | ✅ Completed successfully | Command executed without errors |
+| `FAILURE` | ❌ Failed with error | Execution encountered an error |
+| `SKIPPED` | ⏭️ Skipped execution | Conditional execution, not run |
+| `NONE` | ⚪ No status | Initial/unknown state |
+
+### Result
+
+Command execution outcome.
+
+!!!warning "Always Check Status"
+    Check `status` before accessing `result`. If the status is `FAILURE`, read the `error` field for diagnostics.
+
+```python
+# Success result
+result = Result(
+ status=ResultStatus.SUCCESS,
+ result={"element_found": True, "clicked": True},
+ namespace="ui_automation",
+ call_id="cmd_12345"
+)
+
+# Failure result
+result = Result(
+ status=ResultStatus.FAILURE,
+ error="Element not found: btn_submit",
+ namespace="ui_automation",
+ call_id="cmd_12345"
+)
+```
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `status` | ResultStatus | Execution status |
+| `error` | str? | Error message (if FAILURE) |
+| `result` | Any | Result payload (type varies by tool) |
+| `namespace` | str? | Namespace of executed tool |
+| `call_id` | str? | Matches Command.call_id |
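+
+One way to apply the status-first rule is sketched below. The `ResultStatus` enum and `Result` dataclass are stand-ins that mirror the tables above (the enum values are assumed), and `unwrap` is a hypothetical helper, not part of AIP.
+
+```python
+from dataclasses import dataclass
+from enum import Enum
+from typing import Any, Optional
+
+
+class ResultStatus(Enum):  # stand-in; values are assumptions
+    SUCCESS = "SUCCESS"
+    FAILURE = "FAILURE"
+    SKIPPED = "SKIPPED"
+    NONE = "NONE"
+
+
+@dataclass
+class Result:  # stand-in for illustration only
+    status: ResultStatus
+    result: Any = None
+    error: Optional[str] = None
+    call_id: Optional[str] = None
+
+
+def unwrap(res):
+    """Return the payload on SUCCESS and raise with the error text on FAILURE."""
+    if res.status is ResultStatus.SUCCESS:
+        return res.result
+    if res.status is ResultStatus.FAILURE:
+        raise RuntimeError(res.error or "command failed without an error message")
+    return None  # SKIPPED / NONE: no payload to rely on
+
+
+payload = unwrap(Result(ResultStatus.SUCCESS, result={"clicked": True}, call_id="cmd_12345"))
+```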
+
+---
+
+## Status Enumerations
+
+### TaskStatus
+
+Task lifecycle states.
+
+**Task Lifecycle State Machine:**
+
+This diagram shows the possible state transitions during task execution, including the multi-turn loop and terminal states:
+
+```mermaid
+stateDiagram-v2
+    [*] --> CONTINUE: Task starts
+    CONTINUE --> CONTINUE: Multi-step
+    CONTINUE --> COMPLETED: Success
+    CONTINUE --> FAILED: Error
+    COMPLETED --> [*]
+    FAILED --> [*]
+    OK --> OK: Heartbeat
+    ERROR --> [*]: Terminal
+```
+
+The `CONTINUE → CONTINUE` self-loop represents multi-turn execution where tasks request additional commands before completion. `COMPLETED` and `FAILED` are terminal success/failure states.
+
+| Status | Meaning | Usage |
+|--------|---------|-------|
+| `CONTINUE` | 🔄 Task ongoing | Multi-turn execution, more steps needed |
+| `COMPLETED` | ✅ Task done | Successful completion |
+| `FAILED` | ❌ Task failed | Error encountered |
+| `OK` | ✓ Acknowledgment | Heartbeat, health check passed |
+| `ERROR` | ⚠️ Protocol error | Protocol-level error |
+
+**Multi-Turn Execution:**
+
+`CONTINUE` enables agents to request additional commands before marking a task as complete, supporting complex multi-step workflows.
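+
+The multi-turn loop can be sketched as follows. This is an illustration under assumptions: the `TaskStatus` enum is a reduced stand-in, and `run_task`, `fake_step`, and the `max_turns` guard are hypothetical names introduced for the example.
+
+```python
+from enum import Enum
+
+
+class TaskStatus(Enum):  # reduced stand-in with only the task lifecycle states
+    CONTINUE = "continue"
+    COMPLETED = "completed"
+    FAILED = "failed"
+
+
+TERMINAL = {TaskStatus.COMPLETED, TaskStatus.FAILED}
+
+
+def run_task(step, max_turns=10):
+    """Call step(turn) until it returns a terminal status or max_turns is reached."""
+    status = TaskStatus.CONTINUE
+    turns = 0
+    while status not in TERMINAL and turns < max_turns:
+        status = step(turns)
+        turns += 1
+    return status, turns
+
+
+def fake_step(turn):
+    # Pretend the task needs three rounds of commands before it completes.
+    return TaskStatus.COMPLETED if turn == 2 else TaskStatus.CONTINUE
+
+
+final_status, turns_used = run_task(fake_step)
+```
+
+The `max_turns` guard is a design choice for the sketch: it keeps a task that never leaves `CONTINUE` from looping forever.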
+
+---
+
+## Client Types
+
+### ClientType
+
+Identifies the type of client connecting to the server.
+
+| Type | Role | Characteristics |
+|------|------|----------------|
+| `DEVICE` | Device agent executor | • Executes tasks locally<br>• Reports telemetry<br>• Single-device focus |
+| `CONSTELLATION` | Multi-device orchestrator | • Manages multiple devices<br>• Coordinates tasks<br>• Requires `target_id` |
+
+**Registration by Type:**
+
+```python
+# Device client
+device_msg = ClientMessage(
+ type=ClientMessageType.REGISTER,
+ client_type=ClientType.DEVICE,
+ client_id="device_001"
+)
+
+# Constellation client
+constellation_msg = ClientMessage(
+ type=ClientMessageType.REGISTER,
+ client_type=ClientType.CONSTELLATION,
+ client_id="orchestrator_001",
+ target_id="device_001" # Target device
+)
+```
+
+---
+
+## ClientMessage (Client → Server)
+
+Devices and constellation clients use `ClientMessage` to communicate with the server.
+
+### Message Types
+
+| Type | Purpose | Required Fields |
+|------|---------|----------------|
+| **REGISTER** | Initial registration | `client_id`, `client_type` |
+| **HEARTBEAT** | Keepalive | `client_id`, `status=OK` |
+| **TASK** | Request task execution | `request`, `client_id` |
+| **TASK_END** | Notify completion | `session_id`, `status` |
+| **COMMAND_RESULTS** | Return results | `action_results`, `prev_response_id` |
+| **DEVICE_INFO_REQUEST** | Request telemetry | `request_id` |
+| **DEVICE_INFO_RESPONSE** | Provide telemetry | Device data |
+| **ERROR** | Report error | `error` |
+
+### Common Fields
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `type` | ClientMessageType | Message type |
+| `status` | TaskStatus | Current task status |
+| `client_type` | ClientType | DEVICE or CONSTELLATION |
+| `session_id` | str? | Session identifier |
+| `task_name` | str? | Human-readable task name |
+| `client_id` | str? | Unique client identifier |
+| `target_id` | str? | Target device (for constellation) |
+| `request` | str? | Request text (for TASK) |
+| `action_results` | List[Result]? | Command results |
+| `timestamp` | str? | ISO 8601 timestamp |
+| `request_id` | str? | Unique request identifier |
+| `prev_response_id` | str? | Previous response ID |
+| `error` | str? | Error message |
+| `metadata` | dict? | Additional metadata |
+
+### Example: REGISTER
+
+```python
+register_msg = ClientMessage(
+ type=ClientMessageType.REGISTER,
+ client_type=ClientType.DEVICE,
+ client_id="windows_agent_001",
+ status=TaskStatus.OK,
+ timestamp="2024-11-04T10:30:00Z",
+ metadata={
+ "platform": "windows",
+ "os_version": "Windows 11",
+ "capabilities": ["ui_automation", "file_operations"]
+ }
+)
+```
+
+### Example: COMMAND_RESULTS
+
+```python
+results_msg = ClientMessage(
+ type=ClientMessageType.COMMAND_RESULTS,
+ client_id="windows_agent_001",
+ session_id="session_123",
+ prev_response_id="resp_456", # Links to server's COMMAND message
+ status=TaskStatus.CONTINUE,
+ action_results=[
+ Result(status=ResultStatus.SUCCESS, result={"clicked": True}),
+ Result(status=ResultStatus.SUCCESS, result={"text_entered": True})
+ ],
+ timestamp="2024-11-04T10:31:00Z",
+ request_id="req_789"
+)
+```
+
+---
+
+## ServerMessage (Server → Client)
+
+Device services use `ServerMessage` to assign tasks and send commands to clients.
+
+### Message Types
+
+| Type | Purpose | Required Fields |
+|------|---------|----------------|
+| **TASK** | Assign task | `user_request`, `task_name`, `session_id` |
+| **COMMAND** | Execute commands | `actions`, `response_id`, `session_id` |
+| **TASK_END** | Notify completion | `status`, `session_id` |
+| **HEARTBEAT** | Keepalive ack | `response_id` |
+| **DEVICE_INFO_REQUEST** | Request telemetry | `request_id` |
+| **DEVICE_INFO_RESPONSE** | Telemetry data | Device info |
+| **ERROR** | Error notification | `error` |
+
+### Common Fields
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `type` | ServerMessageType | Message type |
+| `status` | TaskStatus | Current task status |
+| `user_request` | str? | Original user request |
+| `agent_name` | str? | Agent handling task |
+| `process_name` | str? | Process for execution context |
+| `root_name` | str? | Root application name |
+| `actions` | List[Command]? | Commands to execute |
+| `messages` | List[str]? | Log messages |
+| `error` | str? | Error description |
+| `session_id` | str? | Session identifier |
+| `task_name` | str? | Task name |
+| `timestamp` | str? | ISO 8601 timestamp |
+| `response_id` | str? | Response identifier |
+| `result` | Any? | Result payload |
+
+### Example: TASK Assignment
+
+```python
+task_msg = ServerMessage(
+ type=ServerMessageType.TASK,
+ status=TaskStatus.CONTINUE,
+ user_request="Open Notepad and create a new file",
+ task_name="create_notepad_file",
+ session_id="session_123",
+ response_id="resp_001",
+ agent_name="AppAgent",
+ process_name="notepad.exe",
+ timestamp="2024-11-04T10:30:00Z"
+)
+```
+
+### Example: COMMAND Execution
+
+```python
+command_msg = ServerMessage(
+ type=ServerMessageType.COMMAND,
+ status=TaskStatus.CONTINUE,
+ session_id="session_123",
+ response_id="resp_456",
+ actions=[
+ Command(
+ tool_name="launch_application",
+ parameters={"app_name": "notepad"},
+ tool_type="action",
+ call_id="cmd_001"
+ ),
+ Command(
+ tool_name="type_text",
+ parameters={"text": "Hello World"},
+ tool_type="action",
+ call_id="cmd_002"
+ )
+ ],
+ timestamp="2024-11-04T10:30:30Z"
+)
+```
+
+### Example: TASK_END
+
+```python
+task_end_msg = ServerMessage(
+ type=ServerMessageType.TASK_END,
+ status=TaskStatus.COMPLETED,
+ session_id="session_123",
+ response_id="resp_999",
+ result={
+ "file_created": True,
+ "path": "C:\\Users\\user\\document.txt"
+ },
+ timestamp="2024-11-04T10:35:00Z"
+)
+```
+
+---
+
+## Message Validation
+
+!!!warning "Built-In Validation"
+    AIP provides a `MessageValidator` class for checking message integrity. Validate messages before processing them to prevent protocol errors.
+
+### Validation Methods
+
+| Method | Purpose | Requirements |
+|--------|---------|-------------|
+| `validate_registration()` | Check registration | `type=REGISTER`, `client_id` present |
+| `validate_task_request()` | Check task request | `type=TASK`, `request` and `client_id` present |
+| `validate_command_results()` | Check results | `type=COMMAND_RESULTS`, `prev_response_id` present |
+| `validate_server_message()` | Check server msg | `type` and `status` present |
+
+**Validation Usage:**
+
+```python
+from aip.messages import MessageValidator
+
+# Validate registration
+if MessageValidator.validate_registration(client_message):
+ await process_registration(client_message)
+
+# Validate task request
+if MessageValidator.validate_task_request(client_message):
+ await dispatch_task(client_message)
+
+# Validate command results
+if MessageValidator.validate_command_results(client_message):
+ await process_results(client_message)
+```
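+
+The requirements column in the table above amounts to per-type field checks. The sketch below restates those checks over plain dictionaries so they can be read at a glance; it is not the `MessageValidator` implementation, and `REQUIRED` and `missing_fields` are hypothetical names.
+
+```python
+# Required fields per message type, copied from the validation table above.
+REQUIRED = {
+    "REGISTER": ("client_id",),
+    "TASK": ("request", "client_id"),
+    "COMMAND_RESULTS": ("prev_response_id",),
+}
+
+
+def missing_fields(message):
+    """Return the required fields that are absent or empty for this message type."""
+    required = REQUIRED.get(message.get("type"), ())
+    return [name for name in required if not message.get(name)]
+
+
+ok = missing_fields({"type": "TASK", "request": "Open Notepad", "client_id": "device_001"})
+bad = missing_fields({"type": "COMMAND_RESULTS", "client_id": "device_001"})
+```
+
+Returning the list of missing fields, rather than a bare boolean, gives the caller something concrete to put in an `ERROR` message.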
+
+---
+
+## Message Correlation
+
+AIP uses identifier chains to maintain conversation context across multiple message exchanges.
+
+### Correlation Pattern
+
+**Message Identifier Chaining:**
+
+This sequence diagram demonstrates how messages are linked together using correlation IDs to maintain conversation context:
+
+```mermaid
+sequenceDiagram
+ participant C as Client
+ participant S as Server
+
+ C->>S: request_id: "req_001"
+ S->>C: response_id: "resp_001"
+ C->>S: request_id: "req_002" prev_response_id: "resp_001"
+ S->>C: response_id: "resp_002"
+```
+
+Each new request includes `prev_response_id` pointing to the previous server response, forming a traceable conversation chain. This pattern enables audit trails, debugging, and request-response correlation in multi-turn conversations.
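+
+A short sketch of how such a chain could be checked after the fact, for example when replaying a log. Messages are modeled as plain dictionaries, and `chain_is_consistent` is a hypothetical helper rather than an AIP function.
+
+```python
+def chain_is_consistent(exchanges):
+    """Check that each request points at the response that preceded it.
+
+    exchanges is a list of (request, response) dict pairs in send order.
+    The first request is expected to carry no prev_response_id.
+    """
+    previous_response_id = None
+    for request, response in exchanges:
+        if request.get("prev_response_id") != previous_response_id:
+            return False
+        previous_response_id = response["response_id"]
+    return True
+
+
+exchanges = [
+    ({"request_id": "req_001"}, {"response_id": "resp_001"}),
+    ({"request_id": "req_002", "prev_response_id": "resp_001"}, {"response_id": "resp_002"}),
+]
+consistent = chain_is_consistent(exchanges)
+```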
+
+### Correlation Fields
+
+| Field | Purpose | Example |
+|-------|---------|---------|
+| `request_id` | Unique request identifier | `"req_abc123"` |
+| `response_id` | Unique response identifier | `"resp_def456"` |
+| `prev_response_id` | Links to previous response | `"resp_def456"` |
+| `session_id` | Groups related messages | `"session_xyz"` |
+| `call_id` | Correlates commands/results | `"cmd_001"` |
+
+### Session Tracking
+
+**Session-Based Grouping:**
+
+All messages within a task execution share the same `session_id` for traceability.
+
+```python
+# All messages use same session_id
+SESSION_ID = "session_abc123"
+
+task_msg.session_id = SESSION_ID
+command_msg.session_id = SESSION_ID
+results_msg.session_id = SESSION_ID
+task_end_msg.session_id = SESSION_ID
+```
+
+---
+
+## Best Practices
+
+!!!success "Message Construction"
+    **Timestamps**: Always use ISO 8601 format
+    ```python
+    from datetime import datetime, timezone
+    timestamp = datetime.now(timezone.utc).isoformat()
+    ```
+
+    **Unique IDs**: Generate UUIDs for correlation
+    ```python
+    import uuid
+    request_id = str(uuid.uuid4())
+    ```
+
+!!!warning "Error Handling"
+    - Check `Result.status` before accessing result data
+    - Always provide meaningful error messages
+    - Use `ResultStatus.FAILURE` with descriptive `error` field
+
+**Extensibility:**
+
+- Use `metadata` field for custom data without breaking protocol
+- Leverage Pydantic's validation for type safety
+- Always correlate messages with `prev_response_id`
+
+---
+
+## Quick Reference
+
+### Import Messages
+
+```python
+from aip.messages import (
+ ClientMessage,
+ ServerMessage,
+ ClientMessageType,
+ ServerMessageType,
+ ClientType,
+ TaskStatus,
+ Command,
+ Result,
+ ResultStatus,
+ MessageValidator,
+)
+```
+
+### Related Documentation
+
+- [Protocol Guide](./protocols.md) - How protocols construct and use messages
+- [Endpoints](./endpoints.md) - How endpoints handle messages
+- [Overview](./overview.md) - High-level message flow in system architecture
+- [Transport Layer](./transport.md) - WebSocket transport for message delivery
+- [Resilience](./resilience.md) - Message retry and timeout handling
+- [MCP Integration](../mcp/overview.md) - How MCP tools integrate with AIP messages
diff --git a/documents/docs/aip/overview.md b/documents/docs/aip/overview.md
new file mode 100644
index 000000000..1c08611ab
--- /dev/null
+++ b/documents/docs/aip/overview.md
@@ -0,0 +1,384 @@
+# Agent Interaction Protocol (AIP)
+
+The orchestration model requires a communication substrate that remains **correct under continuous DAG evolution**, **dynamic agent participation**, and **fine-grained event propagation**. Legacy HTTP-based coordination approaches (e.g., A2A, ACP) assume short-lived, stateless interactions, incurring handshake overhead, stale capability views, and fragile recovery when partial failures occur mid-task. These assumptions make them unsuitable for the continuously evolving workflows and long-running reasoning loops characteristic of UFO².
+
+## Design Overview
+
+AIP serves as the **nervous system** of UFO², connecting the ConstellationClient, device agent services, and device clients under a unified, event-driven control plane. It is designed as a lightweight yet evolution-tolerant protocol to satisfy six goals:
+
+**Design Goals:**
+
+- **(G1)** Maintain persistent bidirectional sessions to eliminate per-request overhead
+- **(G2)** Unify heterogeneous capability discovery via multi-source profiling
+- **(G3)** Ensure fine-grained reliability through heartbeats and timeout managers for disconnection and failure detection
+- **(G4)** Preserve deterministic command ordering within sessions
+- **(G5)** Support composable extensibility for new message types and resilience strategies
+- **(G6)** Provide transparent reconnection and task continuity under transient failures
+
+| Legacy HTTP Coordination | AIP WebSocket-Based Design |
+|--------------------------|----------------------------|
+| ❌ Short-lived requests | ✅ Persistent sessions (G1) |
+| ❌ Stateless interactions | ✅ Session-aware task management |
+| ❌ High latency overhead | ✅ Low-latency event streaming |
+| ❌ Poor reconnection support | ✅ Seamless recovery from disconnections (G6) |
+| ❌ Manual state synchronization | ✅ Automatic DAG state propagation |
+| ❌ Fragile partial failures | ✅ Fine-grained reliability (G3) |
+
+## Five-Layer Architecture
+
+To meet these requirements, AIP adopts a persistent, bidirectional WebSocket transport and decomposes the orchestration substrate into **five** logical strata, each responsible for a distinct aspect of reliability and adaptability. The architecture establishes a complete substrate where **L1** defines semantic contracts, **L2** provides transport flexibility, **L3** implements protocol logic, **L4** ensures operational resilience, and **L5** delivers deployment-ready orchestration primitives.
+
+**Architecture Diagram:**
+
+The following diagram illustrates the five-layer architecture and the roles of each component:
+
+
+
+### Layer 1: Message Schema Layer
+
+Defines strongly-typed, Pydantic-validated contracts (`ClientMessage`, `ServerMessage`) for message direction, purpose, and task transitions. All messages are validated at schema level, preventing malformed messages from entering the protocol pipeline, enabling early error detection and simplifying debugging.
+
+| Responsibility | Implementation | Supports |
+|----------------|----------------|----------|
+| Message contracts | Pydantic models with validation | Human-readable + machine-verifiable |
+| Structured metadata | System info, capabilities | Unified capability discovery (G2) |
+| ID correlation | Explicit request/response linking | Deterministic ordering (G4) |
+
+### Layer 2: Transport Abstraction Layer
+
+Provides a protocol-agnostic `Transport` interface with a production-grade WebSocket implementation. The abstraction layer allows transports to be swapped without changing protocol logic, which supports future protocol evolution.
+
+| Feature | Benefit | Goals |
+|---------|---------|-------|
+| Configurable pings/timeouts | Connection health monitoring | G3 |
+| Large payload support | Handles complex task definitions | G1 |
+| Decoupled transport logic | Future extensibility (HTTP/3, gRPC) | G5 |
+| Low-latency persistent sessions | Eliminates per-request overhead | G1 |
+
+### Layer 3: Protocol Orchestration Layer
+
+Implements modular handlers for registration, task execution, heartbeat, and command dispatch. Each handler is independently testable and replaceable, supporting composable extensibility (G5) while maintaining ordered state transitions (G4).
+
+| Component | Purpose | Design |
+|-----------|---------|--------|
+| `AIPProtocol` base | Common handler infrastructure | Extensible base class |
+| Handler modules | Registration, tasks, heartbeat, commands | Pluggable handlers |
+| Middleware hooks | Logging, metrics, authentication | Composable extensions (G5) |
+| State transitions | Ordered message processing | Deterministic ordering (G4) |
+
+**Related Documentation:**
+- [Complete message reference](./messages.md)
+- [Protocol implementation details](./protocols.md)
+
+### Layer 4: Resilience and Health Management Layer
+
+!!!warning "Fault Tolerance"
+    This layer guarantees fine-grained reliability (G3) and seamless task continuity under transient disconnections (G6), preventing cascade failures.
+
+Encapsulates reliability mechanisms ensuring operational continuity under failures:
+
+| Component | Mechanism | Goals |
+|-----------|-----------|-------|
+| `HeartbeatManager` | Periodic keepalive signals | G3 |
+| `TimeoutManager` | Configurable timeout policies | G3 |
+| `ReconnectionStrategy` | Exponential backoff with jitter | G6 |
+| Session recovery | Automatic state restoration | G6 |
+
+[→ Resilience implementation details](./resilience.md)
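+
+The `ReconnectionStrategy` row above names exponential backoff with jitter. A minimal sketch of that general technique follows; `backoff_delays` and its parameters (`base`, `factor`, `cap`) are hypothetical and are not taken from the AIP implementation. It uses the "full jitter" variant, where each delay is drawn uniformly between zero and a capped exponential ceiling.
+
+```python
+import random
+
+
+def backoff_delays(attempts, base=1.0, factor=2.0, cap=30.0, rng=None):
+    """Return one randomized delay (in seconds) per reconnection attempt."""
+    rng = rng or random.Random()
+    delays = []
+    for attempt in range(attempts):
+        ceiling = min(cap, base * factor ** attempt)  # exponential growth, capped
+        delays.append(rng.uniform(0, ceiling))  # jitter spreads out reconnect storms
+    return delays
+
+
+# A seeded generator makes the schedule reproducible for testing.
+delays = backoff_delays(5, rng=random.Random(0))
+```
+
+Jitter matters when many devices lose the same server at once: without it they would all retry on the same schedule.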
+
+### Layer 5: Endpoint Orchestration Layer
+
+Provides role-specific facades integrating lower layers into deployable components. These endpoints unify connection lifecycle, task routing, and health monitoring across roles, reinforcing G1–G6 through consistent implementation of lower-layer capabilities.
+
+| Endpoint | Role | Responsibilities |
+|----------|------|------------------|
+| `ConstellationEndpoint` | Orchestrator | Global agent registry, task assignment, DAG coordination |
+| `DeviceServerEndpoint` | Server | WebSocket connection management, task dispatch, result aggregation |
+| `DeviceClientEndpoint` | Executor | Local task execution, MCP tool invocation, telemetry reporting |
+
+**Endpoint Integration Benefits:**
+
+- ✅ Connection lifecycle management (G1, G6)
+- ✅ Role-specific protocol variants (G5)
+- ✅ Health monitoring integration (G3)
+- ✅ Task routing and session management (G4)
+
+[→ Endpoint setup guide](./endpoints.md)
+
+## Architecture Benefits
+
+Together, these layers form a vertically integrated stack that enables UFO² to maintain **correctness and availability** under challenging conditions:
+
+| Challenge | How AIP Addresses It | Layers Involved |
+|-----------|----------------------|-----------------|
+| **DAG Evolution** | Deterministic ordering, extensible message types | L1, L3, L4, L5 (G4, G5) |
+| **Agent Churn** | Heartbeats, reconnection, session recovery | L4, L5 (G3, G6) |
+| **Heterogeneous Environments** | Persistent sessions, multi-source profiling | L1, L2, L5 (G1, G2) |
+| **Transient Failures** | Timeout management, automatic recovery | L4 (G3, G6) |
+| **Protocol Evolution** | Transport abstraction, middleware hooks | L2, L3 (G5) |
+
+AIP transforms distributed workflow execution into a **coherent, safe, and adaptive system** where reasoning and execution converge seamlessly across diverse agents and environments.
+
+## Core Capabilities
+
+### Agent Registration & Profiling
+
+Each agent is represented by an **AgentProfile** combining data from three sources for comprehensive capability discovery, supporting heterogeneous capability unification (G2):
+
+| Source | Provider | Information |
+|--------|----------|-------------|
+| **User Config** | ConstellationClient | Endpoint URLs, user preferences, device identity |
+| **Service Manifest** | Device Agent Service | Supported tools, capabilities, operational metadata |
+| **Client Telemetry** | Device Agent Client | OS, hardware specs, GPU status, runtime metrics |
+
+**Benefits of Multi-Level Profiling:**
+
+- ✅ Accurate task allocation based on real-time capabilities (G2)
+- ✅ Transparent adaptation to environmental changes (e.g., GPU availability)
+- ✅ No manual updates needed when device state changes
+- ✅ Informed scheduling decisions at scale
+
+!!!tip "Dynamic Profile Updates"
+    Client telemetry refreshes continuously, so the orchestrator always sees the current device state, which is critical for GPU-aware scheduling and cross-device load balancing (G2).
+
+[→ See detailed registration flow](./protocols.md)
+
+### Task Dispatch & Result Delivery
+
+AIP uses **long-lived WebSocket sessions** that span multiple task executions, eliminating per-request connection overhead and preserving context (G1).
+
+**Task Execution Sequence:**
+
+The following sequence diagram shows the complete lifecycle of a task from assignment to completion, including intermediate execution steps and state updates:
+
+```mermaid
+sequenceDiagram
+ participant CC as ConstellationClient
+ participant DAS as Device Service
+ participant DAC as Device Client
+
+ CC->>DAS: TASK message (TaskStar)
+ DAS->>DAC: Stream task payload
+ DAC->>DAC: Execute using MCP tools
+ DAC->>DAS: Stream execution logs
+ DAS->>CC: TASK_END (status, logs, results)
+ CC->>CC: Update TaskConstellation
+ CC->>CC: Notify ConstellationAgent
+```
+
+Each arrow represents a message exchange, with vertical lifelines showing the temporal ordering of events. Note how logs stream back during execution, enabling real-time monitoring.
+
+| Stage | Message Type | Content |
+|-------|-------------|---------|
+| Assignment | `TASK` | TaskStar definition, target device, commands |
+| Execution | (internal) | MCP tool invocations, local computation |
+| Reporting | `TASK_END` | Status, logs, evaluator outputs, results |
+
+!!!warning "Asynchronous Execution"
+    Tasks execute asynchronously. The orchestrator may assign multiple tasks to different devices simultaneously, with results arriving in non-deterministic order.
+
+**Related Documentation:**
+- [Message format details](./messages.md)
+- [TaskConstellation documentation](../galaxy/constellation/task_constellation.md)
+- [TaskStar (task nodes) documentation](../galaxy/constellation/task_star.md)
+
+### Command Execution
+
+Within each task, AIP executes **individual commands** deterministically with preserved ordering, enabling precise control and error handling (G4).
+
+**Command Structure:**
+
+| Field | Purpose | Example |
+|-------|---------|---------|
+| `tool_name` | Tool/action name | `"click_input"` |
+| `parameters` | Typed arguments | `{"target": "Save Button", "button": "left"}` |
+| `tool_type` | Category | `"action"` or `"data_collection"` |
+| `call_id` | Unique identifier | `"cmd_001"` |
+
+**Execution Guarantees:**
+
+- ✅ **Sequential execution** within a session (deterministic order) (G4)
+- ✅ **Command batching** supported (reduces network overhead)
+- ✅ **Structured results** with status codes and error details
+- ✅ **Timeout propagation** for precise recovery strategies (G3)
+
+**Command Batching Example:**
+
+```json
+{
+ "actions": [
+ {"tool_name": "click", "parameters": {"target": "File"}, "call_id": "1"},
+ {"tool_name": "click", "parameters": {"target": "Save As"}, "call_id": "2"},
+ {"tool_name": "type", "parameters": {"text": "document.pdf"}, "call_id": "3"}
+ ]
+}
+```
+
+All three commands are sent in one message and executed sequentially.
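+
+A sketch of what sequential execution of such a batch might look like on the receiving side. The tool functions and `run_batch` are hypothetical stand-ins; in particular, the source does not say whether a failed command aborts the rest of the batch, so continuing after a failure is an assumption of this example.
+
+```python
+def run_batch(actions, tools):
+    """Run each action in list order and return one result dict per action."""
+    results = []
+    for action in actions:
+        call_id = action.get("call_id")
+        handler = tools.get(action["tool_name"])
+        if handler is None:
+            results.append({"call_id": call_id, "status": "FAILURE",
+                            "error": f"unknown tool: {action['tool_name']}"})
+            continue
+        try:
+            payload = handler(**action.get("parameters", {}))
+            results.append({"call_id": call_id, "status": "SUCCESS", "result": payload})
+        except Exception as exc:  # report the failure instead of aborting the batch
+            results.append({"call_id": call_id, "status": "FAILURE", "error": str(exc)})
+    return results
+
+
+executed = []  # records the order in which the stand-in tools ran
+
+
+def click(target):
+    executed.append(("click", target))
+    return {"clicked": target}
+
+
+def type_text(text):
+    executed.append(("type", text))
+    return {"typed": text}
+
+
+batch = [
+    {"tool_name": "click", "parameters": {"target": "File"}, "call_id": "1"},
+    {"tool_name": "click", "parameters": {"target": "Save As"}, "call_id": "2"},
+    {"tool_name": "type", "parameters": {"text": "document.pdf"}, "call_id": "3"},
+]
+results = run_batch(batch, {"click": click, "type": type_text})
+```
+
+Each result carries the `call_id` of its command, so the sender can match results to commands even when some of them fail.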
+
+[→ See command execution protocol](./protocols.md)
+
+## Message Protocol Overview
+
+All AIP messages use **Pydantic models** for automatic validation, serialization, and type safety.
+
+### Bidirectional Message Types
+
+| Direction | Message Type | Purpose |
+|-----------|--------------|---------|
+| **Client → Server** | `REGISTER` | Initial capability advertisement |
+| | `COMMAND_RESULTS` | Return command execution results |
+| | `TASK_END` | Notify task completion |
+| | `HEARTBEAT` | Keepalive signal |
+| | `DEVICE_INFO_RESPONSE` | Device telemetry update |
+| **Server → Client** | `TASK` | Task assignment |
+| | `COMMAND` | Command execution request |
+| | `DEVICE_INFO_REQUEST` | Request telemetry refresh |
+| | `HEARTBEAT` | Keepalive acknowledgment |
+| **Bidirectional** | `ERROR` | Error condition reporting |
+
+**Message Correlation:**
+
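+Messages carry the following correlation fields (most are optional and depend on the message type):
+
+- `timestamp`: ISO 8601 formatted
+- `request_id` / `response_id`: Unique identifier of a client request or server response
+- `prev_response_id`: Links a client message to the server response it follows
+- `session_id`: Session context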
+
+[→ Complete message reference](./messages.md)
+
+## Resilient Connection Protocol
+
+!!!warning "Network Instability Handling (G3, G6)"
+    AIP ensures **continuous orchestration** even under transient network failures or device disconnections through fine-grained reliability mechanisms and transparent reconnection.
+
+### Device Disconnection Flow
+
+**Connection State Transitions:**
+
+This state diagram illustrates how devices transition between connection states and the actions triggered at each transition:
+
+```mermaid
+stateDiagram-v2
+ [*] --> CONNECTED
+ CONNECTED --> DISCONNECTED: Connection lost
+ DISCONNECTED --> CONNECTED: Reconnection succeeds
+ DISCONNECTED --> [*]: Timeout / Manual removal
+
+ note right of DISCONNECTED
+ • Excluded from scheduling
+ • Tasks marked FAILED
+ • Auto-reconnect triggered
+ end note
+```
+
+The `DISCONNECTED` state acts as a quarantine zone where the device is temporarily removed from the scheduling pool while auto-reconnection attempts are made. If reconnection fails after timeout, the device is permanently removed.
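+
+The transitions in the diagram can be written down as a small lookup table. This is a sketch, not the AIP implementation: `ConnState`, the event names, and the `REMOVED` member (standing in for the diagram's final `[*]` state) are assumptions made for the example.
+
+```python
+from enum import Enum
+
+
+class ConnState(Enum):
+    CONNECTED = "connected"
+    DISCONNECTED = "disconnected"
+    REMOVED = "removed"  # stands in for the diagram's final [*] state
+
+
+# (current state, event) -> next state, following the diagram above
+TRANSITIONS = {
+    (ConnState.CONNECTED, "connection_lost"): ConnState.DISCONNECTED,
+    (ConnState.DISCONNECTED, "reconnected"): ConnState.CONNECTED,
+    (ConnState.DISCONNECTED, "timeout"): ConnState.REMOVED,
+    (ConnState.DISCONNECTED, "manual_removal"): ConnState.REMOVED,
+}
+
+
+def next_state(state, event):
+    """Return the state reached from `state` on `event`, or raise if not allowed."""
+    try:
+        return TRANSITIONS[(state, event)]
+    except KeyError:
+        raise ValueError(f"illegal transition: {state.name} on {event!r}") from None
+
+
+def schedulable(state):
+    """Only connected devices take part in scheduling."""
+    return state is ConnState.CONNECTED
+
+
+state = next_state(ConnState.CONNECTED, "connection_lost")
+```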
+
+| Event | Orchestrator Action | Device Action |
+|-------|---------------------|---------------|
+| **Device disconnects** | Mark as `DISCONNECTED`<br>Exclude from scheduling<br>Trigger auto-reconnect (G6) | N/A |
+| **Reconnection succeeds** | Mark as `CONNECTED`<br>Resume scheduling | Session restored (G6) |
+| **Disconnect during task** | Mark tasks as `FAILED`<br>Propagate to ConstellationAgent<br>Trigger DAG edit | N/A |
+
+### ConstellationClient Disconnection
+
+!!!danger "Bidirectional Fault Handling"
+    When the **ConstellationClient** disconnects, all Device Agent Services:
+
+    1. Receive a termination signal
+    2. **Abort all ongoing tasks** tied to that client
+    3. Prevent resource leakage and zombie processes
+    4. Maintain end-to-end consistency
+
+**Guarantees:**
+
+- ✅ No orphaned tasks
+- ✅ Synchronized state across client-server boundary
+- ✅ Rapid recovery when connection restored (G6)
+- ✅ Consistent TaskConstellation state (G4)
+
+[→ See resilience implementation](./resilience.md)
+
+## Extensibility Mechanisms
+
+AIP provides multiple extension points for domain-specific needs without modifying the core protocol, supporting composable extensibility (G5).
+
+### 1. Protocol Middleware
+
+Add custom processing to the message pipeline:
+
+```python
+from aip.protocol.base import ProtocolMiddleware
+
+class AuditMiddleware(ProtocolMiddleware):
+    async def process_outgoing(self, msg):
+        log_to_audit_trail(msg)
+        return msg
+
+    async def process_incoming(self, msg):
+        log_to_audit_trail(msg)
+        return msg
+```
+
+### 2. Custom Message Handlers
+
+Register handlers for new message types:
+
+```python
+protocol.register_handler("custom_type", handle_custom_message)
+```
+
+### 3. Transport Layer
+
+Pluggable transport (default: WebSocket) (G5):
+
+```python
+from aip.transport import CustomTransport
+protocol.transport = CustomTransport(config)
+```
+
+[→ See extensibility guide](./protocols.md)
+
+## Integration with UFO² Ecosystem
+
+| Component | Integration Point | Benefit |
+|-----------|-------------------|---------|
+| **MCP Servers** | Command execution model aligns with MCP message formats | Unified interface for system actions and LLM tool calls |
+| **TaskConstellation** | Real-time state synchronization via AIP messages | Planning DAG always reflects distributed execution state |
+| **Configuration System** | Agent endpoints, capabilities managed via UFO² config | Centralized management, type-safe validation |
+| **Logging & Monitoring** | Comprehensive logging at all protocol layers | Debugging, performance monitoring, audit trails |
+
+AIP abstracts network/device heterogeneity, allowing the orchestrator to treat all agents as **first-class citizens** in a single event-driven control plane.
+
+**Related Documentation:**
+
+- [TaskConstellation (DAG orchestrator)](../galaxy/constellation/task_constellation.md)
+- [ConstellationAgent (orchestration agent)](../galaxy/constellation_agent/overview.md)
+- [MCP Integration Guide](../mcp/overview.md)
+- [Configuration System](../configuration/system/system_config.md)
+
+**Next Steps:**
+
+- 📖 [Message Reference](./messages.md) - Complete message type documentation
+- 🔧 [Protocol Guide](./protocols.md) - Implementation details and best practices
+- 🌐 [Transport Layer](./transport.md) - WebSocket configuration and optimization
+- 🔌 [Endpoints](./endpoints.md) - Endpoint setup and usage patterns
+- 🛡️ [Resilience](./resilience.md) - Connection management and fault tolerance
+
+## Summary
+
+AIP transforms distributed workflow execution into a **coherent, safe, and adaptive system** where reasoning and execution converge seamlessly across diverse agents and environments.
+
+**Key Takeaways:**
+
+| Aspect | Impact | Goals |
+|--------|--------|-------|
+| **Persistence** | Long-lived connections reduce overhead, maintain context | G1 |
+| **Low Latency** | WebSocket enables real-time event propagation | G1 |
+| **Capability Discovery** | Multi-source profiling unifies heterogeneous agents | G2 |
+| **Reliability** | Heartbeats, timeouts, auto-reconnection ensure graceful degradation | G3, G6 |
+| **Determinism** | Sequential command execution, explicit ID correlation | G4 |
+| **Extensibility** | Middleware hooks, pluggable transports, custom handlers | G5 |
+| **Developer UX** | Strongly-typed messages, clear errors reduce integration effort | G5 |
+
+By decomposing orchestration into five logical layers—each addressing specific reliability and adaptability concerns—AIP enables UFO² to maintain **correctness and availability** under DAG evolution (G4, G5), agent churn (G3, G6), and heterogeneous execution environments (G1, G2).
diff --git a/documents/docs/aip/protocols.md b/documents/docs/aip/protocols.md
new file mode 100644
index 000000000..9ac3d4b77
--- /dev/null
+++ b/documents/docs/aip/protocols.md
@@ -0,0 +1,667 @@
+# AIP Protocol Reference
+
+## Protocol Stack Overview
+
+AIP uses a three-layer architecture where specialized protocols handle domain-specific concerns, the core protocol manages message processing, and the transport layer provides network communication.
+
+```mermaid
+graph TB
+ subgraph "Specialized Protocols"
+ RP[RegistrationProtocol]
+ TEP[TaskExecutionProtocol]
+ CP[CommandProtocol]
+ HP[HeartbeatProtocol]
+ DIP[DeviceInfoProtocol]
+ end
+
+ subgraph "Core Protocol"
+ AIP["AIPProtocol<br/>Message serialization<br/>Middleware pipeline<br/>Message routing"]
+ end
+
+ subgraph "Transport Layer"
+ WS[WebSocket]
+ HTTP3[HTTP/3 Future]
+ GRPC[gRPC Future]
+ end
+
+ RP --> AIP
+ TEP --> AIP
+ CP --> AIP
+ HP --> AIP
+ DIP --> AIP
+
+ AIP --> WS
+ AIP -.-> HTTP3
+ AIP -.-> GRPC
+
+ style AIP fill:#e1f5ff
+ style WS fill:#f0ffe1
+```
+
+This layered design enables clean separation of concerns: specialized protocols implement domain logic, the core protocol handles serialization and routing, and the transport layer abstracts network details. Dashed arrows indicate future transport options.
+
+### Protocol Comparison
+
+| Protocol | Purpose | Key Messages | Use When |
+|----------|---------|--------------|----------|
+| **RegistrationProtocol** | Agent capability advertisement | `REGISTER`, `HEARTBEAT(OK)` | Device joins constellation |
+| **TaskExecutionProtocol** | Task lifecycle management | `TASK`, `COMMAND`, `TASK_END` | Executing multi-step tasks |
+| **CommandProtocol** | Command validation | Validation utilities | Before sending/receiving commands |
+| **HeartbeatProtocol** | Connection health monitoring | `HEARTBEAT` | Periodic keepalive |
+| **DeviceInfoProtocol** | Telemetry exchange | `DEVICE_INFO_REQUEST/RESPONSE` | Querying device state |
+
+---
+
+## Core Protocol: AIPProtocol
+
+`AIPProtocol` provides transport-agnostic message handling with middleware support and automatic serialization.
+
+### Quick Start
+
+```python
+from aip.protocol import AIPProtocol
+from aip.transport import WebSocketTransport
+
+transport = WebSocketTransport()
+protocol = AIPProtocol(transport)
+```
+
+### Core Operations
+
+| Operation | Method | Description |
+|-----------|--------|-------------|
+| **Send** | `send_message(msg)` | Serialize and send Pydantic message |
+| **Receive** | `receive_message(MsgType)` | Receive and deserialize to type |
+| **Dispatch** | `dispatch_message(msg)` | Route to registered handler |
+| **Error** | `send_error(error, id)` | Send error notification |
+| **Status** | `is_connected()` | Check connection state |
+
+### Middleware Pipeline
+
+Add middleware for logging, authentication, metrics, or custom transformations.
+
+```python
+import logging
+
+from aip.protocol.base import ProtocolMiddleware
+
+logger = logging.getLogger(__name__)
+
+
+class LoggingMiddleware(ProtocolMiddleware):
+    """Log the type of every message passing through the protocol."""
+
+    async def process_outgoing(self, msg):
+        logger.info(f"→ {msg.type}")
+        return msg
+
+    async def process_incoming(self, msg):
+        logger.info(f"← {msg.type}")
+        return msg
+
+protocol.add_middleware(LoggingMiddleware())
+```
+
+**Execution Order:**
+
+- **Outgoing**: First added → First executed
+- **Incoming**: Last added → First executed (reverse)
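+
+The ordering rule can be illustrated with a small, self-contained sketch. The `MiniPipeline` and `Tag` classes below are hypothetical stand-ins written for this example and are not part of the `aip` package; they only mimic the documented order (outgoing in insertion order, incoming in reverse).
+
+```python
+import asyncio
+
+
+class Tag:
+    """Hypothetical middleware that records when it runs."""
+
+    def __init__(self, name, trace):
+        self.name = name
+        self.trace = trace
+
+    async def process_outgoing(self, msg):
+        self.trace.append(f"out:{self.name}")
+        return msg
+
+    async def process_incoming(self, msg):
+        self.trace.append(f"in:{self.name}")
+        return msg
+
+
+class MiniPipeline:
+    """Illustrative pipeline; not the real AIPProtocol implementation."""
+
+    def __init__(self):
+        self.middleware = []
+
+    def add_middleware(self, mw):
+        self.middleware.append(mw)
+
+    async def run_outgoing(self, msg):
+        for mw in self.middleware:  # first added runs first
+            msg = await mw.process_outgoing(msg)
+        return msg
+
+    async def run_incoming(self, msg):
+        for mw in reversed(self.middleware):  # last added runs first
+            msg = await mw.process_incoming(msg)
+        return msg
+
+
+def demo_order():
+    """Send one message out and one in, returning the recorded call order."""
+    trace = []
+    pipeline = MiniPipeline()
+    pipeline.add_middleware(Tag("auth", trace))
+    pipeline.add_middleware(Tag("log", trace))
+
+    async def run():
+        await pipeline.run_outgoing({"type": "task"})
+        await pipeline.run_incoming({"type": "task"})
+
+    asyncio.run(run())
+    return trace
+```
+
+With `auth` added before `log`, an outgoing message passes through `auth` then `log`, while an incoming message passes through `log` then `auth`.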
+
+### Message Handler Registration
+
+```python
+async def handle_task(msg):
+ logger.info(f"Handling task: {msg.task_name}")
+ # Process task...
+
+protocol.register_handler("task", handle_task)
+
+# Auto-dispatch to handler
+await protocol.dispatch_message(server_msg)
+```
+
+[→ See transport configuration](./transport.md)
+
+---
+
+## RegistrationProtocol {#registration-protocol}
+
+Handles initial registration and capability advertisement when agents join the constellation.
+
+### Registration Flow
+
+The following diagram shows the two-way handshake for device registration, including validation and acknowledgment:
+
+```mermaid
+sequenceDiagram
+ participant C as Client
+ participant S as Server
+
+ C->>S: REGISTER (device_id, metadata, capabilities)
+ S->>S: Validate registration and Store AgentProfile
+ alt Success
+ S->>C: HEARTBEAT (OK)
+ else Failure
+ S->>C: ERROR (reason)
+ end
+```
+
+Upon successful registration, the server stores the `AgentProfile` and responds with a `HEARTBEAT` acknowledgment. Failed registrations (e.g., duplicate device_id) return an `ERROR` message with diagnostic details.
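+
+As a rough illustration of the server-side decision described above, the sketch below validates a registration, stores the profile, and picks the reply type. The `Registry` class and the plain-dict replies are assumptions made for this example; the real server uses `AgentProfile` objects and typed messages.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class Registry:
+    """Hypothetical in-memory store of agent profiles keyed by device_id."""
+
+    profiles: dict = field(default_factory=dict)
+
+    def handle_register(self, device_id, metadata):
+        # Reject empty or duplicate IDs with an ERROR reply.
+        if not device_id:
+            return {"type": "ERROR", "error": "device_id is required"}
+        if device_id in self.profiles:
+            return {"type": "ERROR", "error": f"duplicate device_id: {device_id}"}
+        # Store the profile, then acknowledge with HEARTBEAT (OK).
+        self.profiles[device_id] = dict(metadata)
+        return {"type": "HEARTBEAT", "status": "OK"}
+```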
+
+### Device Registration
+
+**Client-Side Registration:**
+
+```python
+from aip.protocol import RegistrationProtocol
+
+reg_protocol = RegistrationProtocol(transport)
+
+success = await reg_protocol.register_as_device(
+ device_id="windows_agent_001",
+ metadata={
+ "platform": "windows",
+ "os_version": "Windows 11",
+ "cpu": "Intel i7",
+ "ram_gb": 16,
+ "capabilities": ["ui_automation", "file_operations"]
+ },
+ platform="windows"
+)
+```
+
+**Auto-Added Fields:**
+
+- `timestamp`: Registration time (ISO 8601)
+- `client_type`: Set to `ClientType.DEVICE`
+
+[→ See ClientType and ClientMessage in Message Reference](./messages.md)
+
+### Constellation Registration
+
+**Orchestrator Registration:**
+
+```python
+success = await reg_protocol.register_as_constellation(
+ constellation_id="orchestrator_001",
+ target_device="windows_agent_001", # Required
+ metadata={
+ "orchestrator_version": "2.0.0",
+ "max_concurrent_tasks": 10
+ }
+)
+```
+
+!!!warning "Target Device Required"
+    Constellation clients **must** specify `target_device` to indicate which device they coordinate.
+
+### Server-Side Handlers
+
+| Method | Purpose | When to Use |
+|--------|---------|-------------|
+| `send_registration_confirmation()` | Acknowledge successful registration | After validating and storing profile |
+| `send_registration_error()` | Report registration failure | Invalid ID, duplicate, or validation error |
+
+---
+
+## TaskExecutionProtocol {#task-execution-protocol}
+
+Manages the complete task lifecycle: assignment → command execution → result reporting → completion.
+
+### Task Lifecycle
+
+This state diagram shows the complete task execution lifecycle, including the multi-turn command loop where agents can request additional commands before completion:
+
+```mermaid
+stateDiagram-v2
+ [*] --> TaskAssigned: TASK
+ TaskAssigned --> CommandSent: COMMAND
+ CommandSent --> ResultsReceived: COMMAND_RESULTS
+ ResultsReceived --> CommandSent: CONTINUE
+ ResultsReceived --> TaskCompleted: COMPLETED/FAILED
+ TaskCompleted --> [*]: TASK_END
+
+ note right of ResultsReceived
+ Multi-turn: Agent can request
+ more commands before completion
+ end note
+```
+
+The `CONTINUE` loop (ResultsReceived → CommandSent) enables iterative task refinement where the agent can execute commands, evaluate results, and request follow-up commands before declaring completion.
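+
+The lifecycle can also be read as a transition table. The sketch below encodes the edges of the state diagram in a dictionary and replays a list of events; the `TRANSITIONS` table and `run_lifecycle` helper are illustrative names, not part of `TaskExecutionProtocol`.
+
+```python
+# Allowed transitions, copied from the state diagram above.
+TRANSITIONS = {
+    ("TaskAssigned", "COMMAND"): "CommandSent",
+    ("CommandSent", "COMMAND_RESULTS"): "ResultsReceived",
+    ("ResultsReceived", "CONTINUE"): "CommandSent",
+    ("ResultsReceived", "COMPLETED"): "TaskCompleted",
+    ("ResultsReceived", "FAILED"): "TaskCompleted",
+}
+
+
+def run_lifecycle(events, state="TaskAssigned"):
+    """Replay events and return the final state; raise on an illegal step."""
+    for event in events:
+        key = (state, event)
+        if key not in TRANSITIONS:
+            raise ValueError(f"{event} is not valid in state {state}")
+        state = TRANSITIONS[key]
+    return state
+```
+
+A two-turn task would replay `COMMAND`, `COMMAND_RESULTS`, `CONTINUE`, `COMMAND_RESULTS`, `COMPLETED` and end in `TaskCompleted`.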
+
+### Client → Server: Task Request
+
+```python
+from aip.messages import ClientType
+from aip.protocol import TaskExecutionProtocol
+
+task_protocol = TaskExecutionProtocol(transport)
+
+await task_protocol.send_task_request(
+ request="Open Notepad and create test.txt",
+ task_name="create_notepad_file",
+ session_id="session_123",
+ client_id="windows_agent_001",
+ client_type=ClientType.DEVICE,
+ metadata={"priority": "high"}
+)
+```
+
+### Server → Client: Task Assignment
+
+```python
+await task_protocol.send_task_assignment(
+ user_request="Open Notepad and create a file",
+ task_name="create_notepad_file",
+ session_id="session_123",
+ response_id="resp_001",
+ agent_name="AppAgent",
+ process_name="notepad.exe"
+)
+```
+
+### Server → Client: Command Dispatch
+
+Send multiple commands in one message to reduce network overhead.
+
+**Method 1: Using ServerMessage**
+
+```python
+from aip.messages import Command, ServerMessage, ServerMessageType, TaskStatus
+
+server_msg = ServerMessage(
+ type=ServerMessageType.COMMAND,
+ status=TaskStatus.CONTINUE,
+ session_id="session_123",
+ response_id="resp_002",
+ actions=[
+ Command(tool_name="launch_application",
+ parameters={"app_name": "notepad"},
+ tool_type="action", call_id="cmd_001"),
+ Command(tool_name="type_text",
+ parameters={"text": "Hello"},
+ tool_type="action", call_id="cmd_002")
+ ]
+)
+
+await task_protocol.send_command(server_msg)
+```
+
+**Method 2: Using send_commands**
+
+```python
+await task_protocol.send_commands(
+ actions=[Command(...)],
+ session_id="session_123",
+ response_id="resp_003",
+ status=TaskStatus.CONTINUE,
+ agent_name="AppAgent"
+)
+```
+
+### Client → Server: Command Results
+
+```python
+from aip.messages import Result, ResultStatus
+
+await task_protocol.send_command_results(
+ action_results=[
+ Result(status=ResultStatus.SUCCESS,
+ result={"app_launched": True},
+ call_id="cmd_001"),
+ Result(status=ResultStatus.SUCCESS,
+ result={"text_entered": True},
+ call_id="cmd_002")
+ ],
+ session_id="session_123",
+ client_id="windows_agent_001",
+ prev_response_id="resp_002", # Links to COMMAND message
+ status=TaskStatus.CONTINUE
+)
+```
+
+[→ See Result and ResultStatus definitions in Message Reference](./messages.md)
+
+### Task Completion
+
+**Server → Client: Success**
+
+```python
+await task_protocol.send_task_end(
+ session_id="session_123",
+ status=TaskStatus.COMPLETED,
+ result={
+ "file_created": True,
+ "path": "C:\\Users\\user\\test.txt"
+ },
+ response_id="resp_999"
+)
+```
+
+**Server → Client: Failure**
+
+```python
+await task_protocol.send_task_end(
+ session_id="session_123",
+ status=TaskStatus.FAILED,
+ error="Notepad failed to launch: Access denied",
+ response_id="resp_999"
+)
+```
+
+### Complete Task Flow
+
+This sequence diagram shows the full flow from task request to completion, including the multi-turn command loop in which the agent iteratively executes commands and requests follow-up actions:
+
+```mermaid
+sequenceDiagram
+ participant CC as ConstellationClient
+ participant CA as ConstellationAgent
+ participant DS as DeviceService
+ participant DC as DeviceClient
+
+ CC->>CA: TASK request
+ CA->>DS: TASK assignment
+ DS->>DC: TASK (forward)
+
+ loop Multi-turn execution
+ DC->>DS: Request COMMAND
+ DS->>CA: Forward request
+ CA->>CA: Plan next action
+ CA->>DS: COMMAND
+ DS->>DC: COMMAND (forward)
+ DC->>DC: Execute
+ DC->>DS: COMMAND_RESULTS
+ DS->>CA: COMMAND_RESULTS
+ end
+
+ CA->>DS: TASK_END
+ DS->>DC: TASK_END (forward)
+ CC->>CC: Update TaskConstellation
+```
+
+The loop in the middle represents iterative task execution where the agent can perform multiple command cycles before determining the task is complete. Each cycle involves planning, execution, and result evaluation.
+
+---
+
+## CommandProtocol
+
+Provides validation utilities for commands and results before transmission.
+
+### Validation Methods
+
+| Method | Validates | Returns |
+|--------|-----------|---------|
+| `validate_command(cmd)` | Single command structure | `bool` |
+| `validate_commands(cmds)` | List of commands | `bool` |
+| `validate_result(result)` | Single result structure | `bool` |
+| `validate_results(results)` | List of results | `bool` |
+
+### Usage Pattern
+
+```python
+from aip.messages import Command, Result
+from aip.protocol import CommandProtocol
+
+cmd_protocol = CommandProtocol(transport)
+
+# Validate before sending
+cmd = Command(tool_name="click", parameters={"id": "btn"}, tool_type="action")
+
+if cmd_protocol.validate_command(cmd):
+    await task_protocol.send_commands([cmd], ...)
+else:
+    logger.error("Invalid command structure")
+
+# Validate results before transmission
+results = [Result(...), Result(...)]
+
+if cmd_protocol.validate_results(results):
+    await task_protocol.send_command_results(results, ...)
+```
+
+!!!warning "Validation Best Practice"
+    Always validate commands and results before transmission to catch protocol errors early and prevent runtime failures.
+
+---
+
+## HeartbeatProtocol {#heartbeat-protocol}
+
+Periodic keepalive messages detect broken connections and network issues.
+
+### Heartbeat Flow
+
+The heartbeat protocol uses a simple ping-pong pattern to verify connection health at regular intervals:
+
+```mermaid
+sequenceDiagram
+ participant C as Client
+ participant S as Server
+
+ loop Every 20-30s
+ C->>S: HEARTBEAT (client_id)
+ S->>S: Update last_seen timestamp
+ S->>C: HEARTBEAT (OK)
+ end
+
+ Note over C,S: If no response → Connection dead
+```
+
+If the server fails to receive a heartbeat within the timeout window, it marks the connection as dead and triggers disconnection handling. This prevents silent connection failures from going undetected.
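+
+A minimal sketch of the server-side bookkeeping this implies is shown below. `LastSeenTracker` is a hypothetical helper, and timestamps are passed in explicitly so the logic stays deterministic; a real server would more likely read a monotonic clock.
+
+```python
+class LastSeenTracker:
+    """Hypothetical per-client heartbeat bookkeeping."""
+
+    def __init__(self, timeout):
+        self.timeout = timeout  # seconds without a heartbeat before a client is dead
+        self.last_seen = {}
+
+    def record_heartbeat(self, client_id, now):
+        # Called whenever a HEARTBEAT arrives from client_id.
+        self.last_seen[client_id] = now
+
+    def dead_clients(self, now):
+        # Clients whose last heartbeat is older than the timeout window.
+        return sorted(
+            cid for cid, seen in self.last_seen.items() if now - seen > self.timeout
+        )
+```
+
+Each ID returned by `dead_clients` would then be handed to whatever disconnection handling the server uses.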
+
+### Client-Side Heartbeat
+
+```python
+from aip.protocol import HeartbeatProtocol
+
+heartbeat_protocol = HeartbeatProtocol(transport)
+
+await heartbeat_protocol.send_heartbeat(
+ client_id="windows_agent_001",
+ metadata={"custom_info": "value"} # Optional
+)
+```
+
+### Server-Side Response
+
+```python
+await heartbeat_protocol.send_heartbeat_ack(
+ response_id="resp_hb_001"
+)
+```
+
+!!!tip "Automatic Management"
+    The `HeartbeatManager` automates heartbeat sending, so you rarely need to call these methods directly.
+
+[→ See HeartbeatManager](./resilience.md#heartbeat-manager)
+
+---
+
+## DeviceInfoProtocol
+
+Request and report device hardware/software information for informed scheduling.
+
+### Info Request Flow
+
+The server can request fresh device information at any time to make informed scheduling decisions:
+
+```mermaid
+sequenceDiagram
+ participant S as Server
+ participant C as Client
+
+ S->>C: DEVICE_INFO_REQUEST
+ C->>C: Collect telemetry (OS, CPU, GPU, RAM, etc.)
+ C->>S: DEVICE_INFO_RESPONSE (device specs)
+```
+
+This pull-based telemetry model allows the orchestrator to query device capabilities on-demand (e.g., before assigning a GPU-intensive task) rather than relying on stale registration data.
+
+### Constellation → Server: Request Info
+
+```python
+from aip.protocol import DeviceInfoProtocol
+
+info_protocol = DeviceInfoProtocol(transport)
+
+await info_protocol.request_device_info(
+ constellation_id="orchestrator_001",
+ target_device="windows_agent_001",
+ request_id="req_info_001"
+)
+```
+
+### Server → Client: Provide Info
+
+The server responds with device information (or an error if collection failed):
+
+```python
+device_info = {
+ "os": "Windows 11",
+ "cpu": "Intel i7-12700K",
+ "ram_gb": 32,
+ "gpu": "NVIDIA RTX 3080",
+ "disk_free_gb": 500,
+ "active_processes": 145,
+ "network_status": "connected"
+}
+
+await info_protocol.send_device_info_response(
+ device_info=device_info,
+ request_id="req_info_001",
+ error=None # Set to error message string if info collection failed
+)
+```
+
+### Use Cases
+
+!!!success "Device-Aware Task Scheduling"
+    - **GPU-aware scheduling**: Check GPU availability before assigning vision tasks
+    - **Load balancing**: Distribute tasks based on CPU/RAM usage
+    - **Health monitoring**: Track device status over time
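+
+To make the GPU-aware scheduling use case concrete, the sketch below picks a device from a set of `device_info` dictionaries shaped like the response shown earlier. The `pick_device` function and its selection rule (most RAM among eligible devices) are assumptions for illustration, not the scheduler UFO² ships.
+
+```python
+def pick_device(devices, needs_gpu=False, min_ram_gb=0):
+    """Return the ID of the eligible device with the most RAM, or None."""
+    eligible = [
+        (info.get("ram_gb", 0), device_id)
+        for device_id, info in devices.items()
+        if info.get("ram_gb", 0) >= min_ram_gb
+        and (not needs_gpu or info.get("gpu"))
+    ]
+    # max() compares RAM first; ties fall back to the device ID.
+    return max(eligible)[1] if eligible else None
+```
+
+A device that reports no `gpu` value is skipped whenever `needs_gpu` is set, which is the check one would want before assigning a vision task.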
+
+---
+
+## Protocol Patterns
+
+### Multi-Turn Conversations
+
+Use `prev_response_id` to maintain conversation context across multiple exchanges.
+
+This diagram shows how messages are chained together using `prev_response_id` to maintain conversation context:
+
+```mermaid
+graph LR
+    A["Server: COMMAND<br/>response_id=001"] --> B["Client: RESULTS<br/>prev_response_id=001<br/>request_id=002"]
+    B --> C["Server: COMMAND<br/>response_id=003"]
+    C --> D["Client: RESULTS<br/>prev_response_id=003<br/>request_id=004"]
+```
+
+Each response references the previous message's `response_id` in its `prev_response_id` field, forming a traceable conversation chain. This enables debugging, audit trails, and request-response correlation.
+
+```python
+# Turn 1: Server sends command
+await protocol.send_message(ServerMessage(
+ type=ServerMessageType.COMMAND,
+ response_id="resp_001",
+ ...
+))
+
+# Turn 2: Client sends results
+await protocol.send_message(ClientMessage(
+ type=ClientMessageType.COMMAND_RESULTS,
+ request_id="req_001",
+ prev_response_id="resp_001", # Links to previous
+ ...
+))
+```
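+
+Because every client reply names the server message it answers, a recorded exchange can be checked for broken links. The sketch below works on plain dictionaries with a hypothetical `sender` key; it is an illustration of the correlation rule, not an AIP utility.
+
+```python
+def chain_is_consistent(messages):
+    """Check that each client reply points at the latest server response_id."""
+    last_response_id = None
+    for msg in messages:
+        if msg["sender"] == "server":
+            last_response_id = msg["response_id"]
+        elif msg.get("prev_response_id") != last_response_id:
+            return False  # reply does not reference the preceding server message
+    return True
+```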
+
+### Session-Based Communication
+
+All messages in a task share the same `session_id` for traceability.
+
+```python
+SESSION_ID = "session_abc123"
+
+# All use same session_id
+task_msg.session_id = SESSION_ID
+command_msg.session_id = SESSION_ID
+results_msg.session_id = SESSION_ID
+task_end_msg.session_id = SESSION_ID
+```
+
+### Error Recovery
+
+**Protocol-Level Errors (Connection Issues):**
+
+```python
+try:
+ await protocol.send_message(msg)
+except ConnectionError:
+ await reconnect()
+except IOError as e:
+ logger.error(f"I/O error: {e}")
+```
+
+**Application-Level Errors (Task Failures):**
+
+```python
+# Send error through protocol
+await protocol.send_error(
+ error_msg="Invalid command: tool_name missing",
+ response_id=msg.response_id
+)
+```
+
+---
+
+## Best Practices
+
+### Protocol Selection
+
+Use specialized protocols instead of manually constructing messages with `AIPProtocol`.
+
+| Task | Protocol |
+|------|----------|
+| Agent registration | `RegistrationProtocol` |
+| Task execution | `TaskExecutionProtocol` |
+| Command validation | `CommandProtocol` |
+| Keepalive | `HeartbeatProtocol` |
+| Device telemetry | `DeviceInfoProtocol` |
+
+### Validation
+
+- Always validate commands/results before transmission
+- Use `MessageValidator` for message integrity checks
+- Catch validation errors early
+
+### Session Management
+
+- **Always set `session_id`** for task-related messages
+- Use **correlation IDs** (`prev_response_id`) for multi-turn conversations
+- **Generate unique IDs** with `uuid.uuid4()`
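+
+One way to follow the ID guidance is a small helper built on the standard library's `uuid.uuid4()`. The `new_id` name and the prefix convention are assumptions made for this sketch; the examples in this document use short hand-written IDs such as `session_123` for readability.
+
+```python
+import uuid
+
+
+def new_id(prefix):
+    """Return a unique ID such as 'session_<32 hex chars>'."""
+    return f"{prefix}_{uuid.uuid4().hex}"
+
+
+# One session ID is shared by every message of a task;
+# each server message gets its own response ID.
+session_id = new_id("session")
+response_id = new_id("resp")
+```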
+
+### Error Handling
+
+- **Distinguish** protocol errors (connection) from application errors (task failure)
+- **Propagate errors** explicitly through error messages
+- **Leverage middleware** for cross-cutting concerns (logging, metrics, auth)
+
+!!!danger "Resource Cleanup"
+    Always close protocols when done to release transport resources.
+
+---
+
+## Quick Reference
+
+### Import Protocols
+
+```python
+from aip.protocol import (
+ AIPProtocol,
+ RegistrationProtocol,
+ TaskExecutionProtocol,
+ CommandProtocol,
+ HeartbeatProtocol,
+ DeviceInfoProtocol,
+)
+```
+
+### Related Documentation
+
+- [Message Reference](./messages.md) - Message types and structures
+- [Transport Layer](./transport.md) - WebSocket implementation
+- [Endpoints](./endpoints.md) - Protocol usage in endpoints
+- [Resilience](./resilience.md) - Connection management and recovery
+- [Overview](./overview.md) - System architecture
diff --git a/documents/docs/aip/resilience.md b/documents/docs/aip/resilience.md
new file mode 100644
index 000000000..f38136cd1
--- /dev/null
+++ b/documents/docs/aip/resilience.md
@@ -0,0 +1,592 @@
+# AIP Resilience
+
+AIP's resilience layer ensures stable communication and consistent orchestration across distributed agent constellations through automatic reconnection, heartbeat monitoring, and timeout management.
+
+## Resilience Components
+
+| Component | Purpose | Key Features |
+|-----------|---------|--------------|
+| **ReconnectionStrategy** | Auto-reconnect on disconnect | Exponential backoff, max retries, policies |
+| **HeartbeatManager** | Connection health monitoring | Periodic keepalive, failure detection |
+| **TimeoutManager** | Operation timeout enforcement | Configurable timeouts, async cancellation |
+| **ConnectionProtocol** | State management | Bidirectional fault handling, task cleanup |
+
+---
+
+## Resilient Connection Protocol
+
+The Resilient Connection Protocol governs how connection disruptions are detected, handled, and recovered between ConstellationClient and Device Agents.
+
+### Connection State Diagram
+
+This state diagram shows how devices transition between connection states and the internal sub-states during disconnection recovery:
+
+```mermaid
+stateDiagram-v2
+ [*] --> CONNECTED: Initial connection
+ CONNECTED --> DISCONNECTED: Connection lost
+ DISCONNECTED --> CONNECTED: Reconnect succeeds
+ DISCONNECTED --> [*]: Max retries / Manual removal
+
+ state DISCONNECTED {
+ [*] --> DetectFailure
+ DetectFailure --> CancelTasks
+ CancelTasks --> NotifyOrchestrator
+ NotifyOrchestrator --> AttemptReconnect
+ AttemptReconnect --> [*]: Success
+ }
+
+ note right of DISCONNECTED
+ • Invisible to scheduler
+ • Tasks marked FAILED
+ • Auto-reconnect triggered
+ end note
+```
+
+The nested states within `DISCONNECTED` show the cleanup and recovery sequence: detect the failure, cancel running tasks, notify the orchestrator, then attempt reconnection with exponential backoff.
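+
+The cleanup half of that sequence can be sketched with plain dictionaries. Everything below (`handle_disconnect`, `schedulable`, and the dictionary layout) is hypothetical and only mirrors the steps named in the diagram; reconnection itself is left out.
+
+```python
+def handle_disconnect(device):
+    """Mark a device DISCONNECTED and fail its running tasks."""
+    device["status"] = "DISCONNECTED"  # now invisible to the scheduler
+    failed = []
+    for task_id, state in device["tasks"].items():
+        if state == "RUNNING":
+            device["tasks"][task_id] = "FAILED"
+            failed.append(task_id)
+    # The returned list is what would be reported to the orchestrator.
+    return failed
+
+
+def schedulable(devices):
+    """Return the IDs of devices the scheduler may still use."""
+    return [d["id"] for d in devices if d["status"] == "CONNECTED"]
+```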
+
+### Device Disconnection Workflow
+
+!!!danger "Impact on Running Tasks"
+    All tasks running on a disconnected device are **immediately marked as FAILED** to maintain TaskConstellation consistency.
+
+| Phase | Action | Trigger / Effect |
+|-------|--------|---------|
+| **1. Detection** | Connection failure detected | WebSocket close, heartbeat timeout, network error |
+| **2. State Transition** | `CONNECTED` → `DISCONNECTED` | Agent excluded from scheduler |
+| **3. Task Failure** | Mark tasks as `TASK_FAILED` | Propagate to ConstellationAgent |
+| **4. Auto-Reconnect** | Background routine triggered | Exponential backoff |
+| **5. Recovery** | `DISCONNECTED` → `CONNECTED` | Resume scheduling |
+
+**Task Cancellation:**
+
+```python
+# Automatically called on disconnection
+await device_server.cancel_device_tasks(client_id, reason="device_disconnected")
+```
+
+### ConstellationClient Disconnection
+
+When ConstellationClient disconnects, Device Agent Servers proactively clean up to prevent orphaned tasks.
+
+This sequence diagram shows the proactive cleanup sequence when the orchestrator disconnects, ensuring all running tasks are properly aborted:
+
+```mermaid
+sequenceDiagram
+ participant CC as ConstellationClient
+ participant DAS as Device Agent Server
+ participant Tasks as Running Tasks
+
+ CC-xDAS: Connection lost
+ DAS->>DAS: Detect termination signal
+ DAS->>Tasks: Abort all tasks for client
+ Tasks->>Tasks: Cleanup resources
+ DAS->>DAS: Maintain consistency
+
+    Note over DAS: Prevents:<br/>• Resource leaks<br/>• Orphaned tasks<br/>• Inconsistent states
+```
+
+The `x` marker on the connection arrow indicates an abnormal termination. The server immediately detects this and cascades the cleanup signal to all associated tasks, preventing resource leaks.
+
+**Guarantees:**
+
+- ✅ No orphaned tasks or zombie processes
+- ✅ End-to-end consistency across client-server boundary
+- ✅ Automatic resource cleanup
+- ✅ Synchronized task state reflection
+
+---
+
+## ReconnectionStrategy
+
+Manages reconnection attempts with configurable backoff policies to handle transient network failures.
+
+### Configuration
+
+```python
+from aip.resilience import ReconnectionStrategy, ReconnectionPolicy
+
+strategy = ReconnectionStrategy(
+ max_retries=5, # Maximum attempts
+ initial_backoff=1.0, # Initial delay (seconds)
+ max_backoff=60.0, # Maximum delay (seconds)
+ backoff_multiplier=2.0, # Exponential multiplier
+ policy=ReconnectionPolicy.EXPONENTIAL_BACKOFF
+)
+```
+
+[→ See how ReconnectionStrategy is used in endpoints](./endpoints.md)
+
+### Backoff Policies
+
+Select the policy that matches your deployment environment's network characteristics.
+
+| Policy | Backoff Pattern | Best For | Example Sequence |
+|--------|----------------|----------|------------------|
+| **EXPONENTIAL_BACKOFF** | Doubles each attempt | Internet, unreliable networks | 1s → 2s → 4s → 8s → 16s |
+| **LINEAR_BACKOFF** | Linear increase | Local networks, testing | 1s → 2s → 3s → 4s → 5s |
+| **IMMEDIATE** | No delay | ⚠️ Testing only | 0s → 0s → 0s → 0s → 0s |
+| **NONE** | No reconnection | Manual control | Disabled |
+
+!!!danger "IMMEDIATE Policy Warning"
+    `IMMEDIATE` policy can overwhelm servers with rapid retry attempts. **Use only for testing.**
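+
+The example sequences in the table follow from simple arithmetic. The function below is a sketch of that arithmetic under the configuration shown earlier (`initial_backoff=1.0`, `backoff_multiplier=2.0`, `max_backoff=60.0`); the string policy names and the `backoff_delay` helper are illustrative, and the real `ReconnectionStrategy` may compute delays differently.
+
+```python
+def backoff_delay(policy, attempt, initial=1.0, multiplier=2.0, max_backoff=60.0):
+    """Delay in seconds before retry number `attempt` (1-based)."""
+    if policy == "exponential":
+        delay = initial * multiplier ** (attempt - 1)
+    elif policy == "linear":
+        delay = initial * attempt
+    elif policy == "immediate":
+        delay = 0.0
+    else:
+        raise ValueError(f"unknown policy: {policy}")
+    # Never wait longer than the configured ceiling.
+    return min(delay, max_backoff)
+```
+
+For attempts 1 through 5 this yields 1, 2, 4, 8, 16 seconds for the exponential policy and 1, 2, 3, 4, 5 seconds for the linear one; later exponential attempts are capped at `max_backoff`.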
+
+### Reconnection Workflow
+
+This flowchart shows the complete reconnection logic from failure detection through recovery or permanent failure:
+
+```mermaid
+graph TD
+ A[Connection Lost] --> B[Cancel Pending Tasks]
+ B --> C[Notify Upper Layers]
+ C --> D{Retry Count < Max?}
+ D -->|Yes| E[Calculate Backoff]
+ E --> F[Wait Backoff Duration]
+ F --> G[Attempt Reconnect]
+ G --> H{Success?}
+ H -->|Yes| I[Restore Session]
+ H -->|No| J[Increment Retry Count]
+ J --> D
+ D -->|No| K[Max Retries Reached]
+ K --> L[Permanent Failure]
+ I --> M[Resume Operations]
+
+ style I fill:#d4edda
+ style L fill:#f8d7da
+```
+
+The loop between "Attempt Reconnect" and "Increment Retry Count" continues until either reconnection succeeds (green path) or max retries are exhausted (red path). Backoff duration increases with each failed attempt.
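+
+The retry loop in the flowchart can be sketched with `asyncio` as follows. `reconnect_with_retries` and its injectable `connect` and `sleep` callables are assumptions for this example; they stand in for whatever the real strategy calls internally.
+
+```python
+import asyncio
+
+
+async def reconnect_with_retries(connect, max_retries=5, initial_backoff=1.0,
+                                 multiplier=2.0, max_backoff=60.0,
+                                 sleep=asyncio.sleep):
+    """Call `connect()` until it returns True or retries run out.
+
+    Returns the attempt number that succeeded, or None on permanent failure.
+    """
+    delay = initial_backoff
+    for attempt in range(1, max_retries + 1):
+        await sleep(delay)  # wait the backoff duration
+        if await connect():  # attempt reconnect
+            return attempt
+        delay = min(delay * multiplier, max_backoff)  # grow the next wait
+    return None  # max retries reached
+```
+
+Passing `sleep` as a parameter keeps the loop easy to test without real waiting; production code would simply use the default.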
+
+### Reconnection Example
+
+```python
+async def handle_disconnection(
+    endpoint: AIPEndpoint,
+    device_id: str,
+    on_reconnect: Optional[Callable] = None
+):
+    # Step 1: Cancel pending tasks
+    await strategy._cancel_pending_tasks(endpoint, device_id)
+
+    # Step 2: Notify upper layers
+    await strategy._notify_disconnection(endpoint, device_id)
+
+    # Step 3: Attempt reconnection
+    reconnected = await strategy.attempt_reconnection(endpoint, device_id)
+
+    # Step 4: Call reconnection callback
+    if reconnected and on_reconnect:
+        await on_reconnect()
+
+### Custom Reconnection Callback
+
+```python
+async def on_reconnected():
+ logger.info("Device reconnected, resuming tasks")
+ await restore_task_queue()
+ await sync_device_state()
+
+await strategy.handle_disconnection(
+ endpoint=endpoint,
+ device_id="device_001",
+ on_reconnect=on_reconnected
+)
+```
+
+---
+
+## HeartbeatManager {#heartbeat-manager}
+
+Sends periodic keepalive messages to detect broken connections before they cause failures.
+
+### Configuration
+
+```python
+from aip.resilience import HeartbeatManager
+from aip.protocol import HeartbeatProtocol
+
+heartbeat_protocol = HeartbeatProtocol(transport)
+heartbeat_manager = HeartbeatManager(
+ protocol=heartbeat_protocol,
+ default_interval=30.0 # 30 seconds
+)
+```
+
+[→ See HeartbeatProtocol reference](./protocols.md#heartbeat-protocol)
+
+### Lifecycle Management
+
+| Operation | Method | Description |
+|-----------|--------|-------------|
+| **Start** | `start_heartbeat(client_id, interval)` | Begin periodic heartbeat for client |
+| **Stop** | `stop_heartbeat(client_id)` | Stop heartbeat for specific client |
+| **Stop All** | `stop_all()` | Stop all active heartbeats |
+| **Check Status** | `is_running(client_id)` | Verify if heartbeat is active |
+| **Get Interval** | `get_interval(client_id)` | Retrieve current interval |
+
+### Usage Example
+
+```python
+# Start heartbeat for a client
+await heartbeat_manager.start_heartbeat(
+ client_id="device_001",
+ interval=20.0 # Override default
+)
+
+# Check if running
+if heartbeat_manager.is_running("device_001"):
+ logger.info("Heartbeat active")
+
+# Stop for specific client
+await heartbeat_manager.stop_heartbeat("device_001")
+
+# Stop all heartbeats (cleanup)
+await heartbeat_manager.stop_all()
+```
+
+### Heartbeat Loop Internals
+
+The heartbeat manager automatically sends periodic heartbeats. If the protocol is not connected, it logs a warning and continues the loop:
+
+```python
+async def _heartbeat_loop(client_id: str, interval: float):
+    """Internal heartbeat loop (automatic)"""
+    try:
+        while True:
+            await asyncio.sleep(interval)
+
+            if protocol.is_connected():
+                try:
+                    await protocol.send_heartbeat(client_id)
+                except Exception as e:
+                    logger.error(f"Error sending heartbeat: {e}")
+                    # Continue loop, connection manager handles disconnection
+            else:
+                logger.warning("Protocol not connected, skipping heartbeat")
+
+    except asyncio.CancelledError:
+        logger.debug("Heartbeat loop cancelled")
+```
+
+### Failure Detection
+
+When the transport layer fails to send a heartbeat (connection closed), errors are logged but the loop continues running. The connection manager is responsible for detecting the disconnection through transport-level errors and triggering the reconnection strategy.
+
+This sequence diagram shows how heartbeat errors are handled:
+
+```mermaid
+sequenceDiagram
+ participant HM as HeartbeatManager
+ participant P as Protocol
+ participant T as Transport
+
+ loop Every interval
+ HM->>P: send_heartbeat()
+ P->>T: Send via WebSocket
+ alt Connection alive
+ T-->>P: Success
+ P-->>HM: Continue
+ else Connection dead
+ T-xP: ConnectionError
+ P-xHM: Error (caught)
+ HM->>HM: Log error, continue loop
+ Note over HM: Connection manager handles disconnection at transport level
+ end
+ end
+```
+
+The `x` markers indicate error paths. When the transport layer fails to send a heartbeat, the error is caught and logged. The heartbeat loop continues, while the connection manager detects the disconnection at the transport level and initiates recovery.
+
+### Interval Guidelines
+
+| Environment | Recommended Interval | Rationale |
+|-------------|---------------------|-----------|
+| **Local network** | 10-20s | Quick failure detection, low latency |
+| **Internet** | 30-60s | Balance overhead vs detection speed |
+| **Mobile/Unreliable** | 60-120s | Reduce battery/bandwidth usage |
+| **Critical systems** | 5-10s | Fastest failure detection |
+
+---
+
+## TimeoutManager
+
+Prevents operations from hanging indefinitely by enforcing configurable timeouts with automatic cancellation.
+
+### Configuration
+
+```python
+from aip.resilience import TimeoutManager
+
+timeout_manager = TimeoutManager(
+ default_timeout=120.0 # 120 seconds
+)
+```
+
+[→ See how timeouts are used in protocol operations](./protocols.md)
+
+### Usage Patterns
+
+**Default Timeout:**
+
+```python
+result = await timeout_manager.with_timeout(
+ protocol.send_message(msg),
+ operation_name="send_message"
+)
+```
+
+**Custom Timeout:**
+
+```python
+result = await timeout_manager.with_timeout(
+ protocol.receive_message(ServerMessage),
+ timeout=60.0,
+ operation_name="receive_message"
+)
+```
+
+### Error Handling
+
+```python
+from asyncio import TimeoutError
+
+try:
+ result = await timeout_manager.with_timeout(
+ long_running_operation(),
+ timeout=30.0
+ )
+except TimeoutError:
+ logger.error("Operation timed out after 30 seconds")
+ # Handle timeout: retry, fail task, notify user
+```
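+
+For intuition, a wrapper with the same call shape can be built on the standard library's `asyncio.wait_for`, which cancels the awaited operation when the deadline passes. This is a sketch under that assumption, not the actual `TimeoutManager` implementation; the module-level `with_timeout` function is a hypothetical stand-in for the method.
+
+```python
+import asyncio
+
+
+async def with_timeout(coro, timeout=120.0, operation_name="operation"):
+    """Await `coro`, cancelling it if it runs longer than `timeout` seconds."""
+    try:
+        return await asyncio.wait_for(coro, timeout=timeout)
+    except asyncio.TimeoutError:
+        # Re-raise with the operation name so logs show what timed out.
+        raise asyncio.TimeoutError(
+            f"{operation_name} timed out after {timeout}s"
+        ) from None
+```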
+
+### Recommended Timeouts
+
+| Operation | Timeout | Rationale |
+|-----------|---------|-----------|
+| **Registration** | 10-30s | Simple message exchange |
+| **Task Dispatch** | 30-60s | May involve scheduling logic |
+| **Command Execution** | 60-300s | Depends on command complexity |
+| **Heartbeat** | 5-10s | Fast failure detection needed |
+| **Disconnection** | 5-15s | Clean shutdown |
+| **Device Info Query** | 15-30s | Telemetry collection |
+
+---
+
+## Integration with Endpoints
+
+Endpoints automatically integrate all resilience components—no manual wiring needed.
+
+### Example: DeviceClientEndpoint
+
+```python
+from aip.endpoints import DeviceClientEndpoint
+
+endpoint = DeviceClientEndpoint(
+ ws_url="ws://localhost:8000/ws",
+ ufo_client=client,
+ max_retries=3, # Reconnection retries
+ timeout=120.0 # Connection timeout
+)
+
+# Resilience handled automatically on start
+await endpoint.start()
+```
+
+**Note**: The endpoint creates its own `ReconnectionStrategy` internally with the specified `max_retries`.
+
+### Built-In Features
+
+| Feature | Behavior | Configuration |
+|---------|----------|---------------|
+| **Auto-Reconnection** | Triggered on disconnect | Via `ReconnectionStrategy` |
+| **Heartbeat** | Starts on connection | Managed by `HeartbeatManager` |
+| **Timeout Enforcement** | Applied to all operations | Via `TimeoutManager` |
+| **Task Cancellation** | Auto-cancel on disconnect | Built-in to endpoint |
+
+[→ See endpoint documentation](./endpoints.md)
+[→ See WebSocket transport details](./transport.md)
+
+---
+
+## Best Practices by Environment
+
+### Local Network (Low Latency, High Reliability)
+
+```python
+strategy = ReconnectionStrategy(
+ max_retries=3,
+ initial_backoff=1.0,
+ max_backoff=10.0,
+ policy=ReconnectionPolicy.LINEAR_BACKOFF
+)
+heartbeat_interval = 20.0 # Quick detection
+timeout_default = 60.0
+```
+
+### Internet (Variable Latency, Moderate Reliability)
+
+```python
+strategy = ReconnectionStrategy(
+ max_retries=5,
+ initial_backoff=2.0,
+ max_backoff=60.0,
+ policy=ReconnectionPolicy.EXPONENTIAL_BACKOFF
+)
+heartbeat_interval = 30.0 # Balance overhead and detection
+timeout_default = 120.0
+```
+
+### Unreliable Network (High Latency, Low Reliability)
+
+```python
+strategy = ReconnectionStrategy(
+ max_retries=10,
+ initial_backoff=5.0,
+ max_backoff=300.0, # Up to 5 minutes
+ policy=ReconnectionPolicy.EXPONENTIAL_BACKOFF
+)
+heartbeat_interval = 60.0 # Reduce overhead
+timeout_default = 180.0
+```
+
+---
+
+## Error Scenarios
+
+### Scenario 1: Transient Network Failure
+
+**Problem**: Network glitch disconnects client for 3 seconds.
+
+**Resolution**:
+
+1. ✅ Disconnection detected via heartbeat timeout
+2. ✅ Automatic reconnection triggered (1st attempt after 2s)
+3. ✅ Connection restored successfully
+4. ✅ Heartbeat resumes
+5. ✅ Tasks continue
+
+### Scenario 2: Prolonged Outage
+
+**Problem**: Device offline for 10 minutes.
+
+**Resolution**:
+1. ❌ Initial disconnection detected
+2. ⏳ Multiple reconnection attempts (exponential backoff: 2s, 4s, 8s, 16s, 32s)
+3. ❌ All attempts fail (max retries reached)
+4. ⚠️ Tasks marked as FAILED
+5. 📢 ConstellationAgent notified
+6. ♻️ Tasks reassigned to other devices
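+
+The retry delays listed in step 2 can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the `ReconnectionStrategy` implementation: `exponential_delays` is a hypothetical helper, and it assumes each delay doubles the previous one and is capped at `max_backoff`.
+
+```python
+def exponential_delays(initial_backoff: float, max_backoff: float, max_retries: int) -> list[float]:
+    """Return the wait before each retry: initial * 2**attempt, capped at max_backoff."""
+    return [min(initial_backoff * 2**attempt, max_backoff) for attempt in range(max_retries)]
+
+
+# Internet profile from the best-practices section: 5 retries starting at 2 s, capped at 60 s
+delays = exponential_delays(initial_backoff=2.0, max_backoff=60.0, max_retries=5)
+print(delays)       # [2.0, 4.0, 8.0, 16.0, 32.0]
+print(sum(delays))  # 62.0
+```
+
+Under these assumptions the client stops retrying after roughly a minute of waiting, well short of the 10-minute outage, which is why the tasks end up reassigned.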
+
+### Scenario 3: Server Restart
+
+**Problem**: Server restarts, causing all clients to disconnect at once.
+
+**Resolution**:
+1. ⚠️ All clients detect disconnection
+2. ⏳ Each client begins reconnection (with jitter to avoid thundering herd)
+3. ✅ Server restarts and accepts connections
+4. ✅ Clients reconnect and re-register
+5. ✅ Task execution resumes
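+
+Step 2 relies on jitter so that clients do not all retry at the same instant. The sketch below shows one common variant ("full jitter"), assuming the wait is drawn uniformly between zero and the nominal backoff. `jittered_delay` is a hypothetical helper, and the jitter AIP actually applies may differ.
+
+```python
+import random
+
+
+def jittered_delay(base_delay: float, rng: random.Random) -> float:
+    """Pick a random wait in [0, base_delay] so simultaneous clients spread out."""
+    return rng.uniform(0.0, base_delay)
+
+
+rng = random.Random(42)  # seeded only to make the demo repeatable
+waits = [jittered_delay(2.0, rng) for _ in range(5)]  # five clients, same nominal 2 s backoff
+print([round(w, 2) for w in waits])
+```
+
+Because each client draws its own wait, the restarted server sees reconnections spread over the backoff window rather than arriving in one burst.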
+
+### Scenario 4: Heartbeat Timeout
+
+**Problem**: Heartbeat not received within timeout period.
+
+**Resolution**:
+1. ⏰ HeartbeatManager detects missing pong
+2. ⚠️ Connection marked as potentially dead
+3. 🔄 Disconnection handling triggered
+4. ⏳ Reconnection attempted
+5. ✅ If successful, heartbeat resumes
+
+---
+
+## Monitoring and Observability
+
+### Enable Resilience Logging
+
+```python
+import logging
+
+# Enable detailed resilience logs
+logging.getLogger("aip.resilience").setLevel(logging.INFO)
+```
+
+### Custom Event Handlers
+
+```python
+class CustomEndpoint(DeviceClientEndpoint):
+    async def on_device_disconnected(self, device_id: str) -> None:
+        # Custom cleanup
+        await self.cleanup_resources(device_id)
+        logger.warning(f"Device {device_id} disconnected")
+
+        # Call parent implementation
+        await super().on_device_disconnected(device_id)
+
+    async def reconnect_device(self, device_id: str) -> bool:
+        # Custom reconnection logic
+        success = await self.custom_reconnect(device_id)
+
+        if success:
+            await self.restore_state(device_id)
+            logger.info(f"Device {device_id} reconnected")
+
+        return success
+```
+
+### Graceful Degradation
+
+```python
+if not await strategy.attempt_reconnection(endpoint, device_id):
+    logger.error(f"Failed to reconnect {device_id} after max retries")
+
+    # Graceful degradation
+    await notify_operator(f"Device {device_id} offline")
+    await reassign_tasks_to_other_devices(device_id)
+    await update_monitoring_dashboard(device_id, "offline")
+```
+
+---
+
+## Testing Resilience
+
+Test resilience by simulating network failures and verifying recovery.
+
+```python
+# Simulate disconnection
+await transport.close()
+
+# Verify reconnection
+assert await endpoint.reconnect_device(device_id)
+
+# Verify heartbeat resumes
+await asyncio.sleep(1)
+assert heartbeat_manager.is_running(device_id)
+
+# Verify task state
+assert all(task.status == TaskStatus.FAILED for task in orphaned_tasks)
+```
+
+---
+
+## Quick Reference
+
+### Import Resilience Components
+
+```python
+from aip.resilience import (
+ ReconnectionStrategy,
+ ReconnectionPolicy,
+ HeartbeatManager,
+ TimeoutManager,
+)
+```
+
+### Related Documentation
+
+- [Endpoints](./endpoints.md) - How endpoints use resilience
+- [Transport Layer](./transport.md) - Transport-level connection management
+- [Protocol Reference](./protocols.md) - Protocol-level error handling
+- [Overview](./overview.md) - System architecture and design
diff --git a/documents/docs/aip/transport.md b/documents/docs/aip/transport.md
new file mode 100644
index 000000000..fc8715c48
--- /dev/null
+++ b/documents/docs/aip/transport.md
@@ -0,0 +1,693 @@
+# AIP Transport Layer
+
+The transport layer provides a pluggable abstraction for AIP's network communication, decoupling protocol logic from underlying network implementations through a unified Transport interface.
+
+## Transport Architecture
+
+AIP uses a transport abstraction pattern that allows different network protocols to be swapped without changing higher-level protocol logic. The current implementation focuses on WebSocket, with future support planned for HTTP/3 and gRPC:
+
+```mermaid
+graph TD
+ subgraph "Transport Abstraction"
+ TI[Transport Interface]
+ TI --> |implements| WST[WebSocketTransport]
+ TI --> |future| H3T[HTTP/3 Transport]
+ TI --> |future| GRPC[gRPC Transport]
+ end
+
+ subgraph "WebSocket Transport"
+ WST --> |client-side| WSC[websockets library]
+ WST --> |server-side| FAPI[FastAPI WebSocket]
+ WST --> |adapter| ADP[Unified Adapter]
+ end
+
+ subgraph "Protocol Layer"
+ PROTO[AIP Protocols]
+ PROTO --> |uses| TI
+ end
+
+ style WST fill:#d4edda
+ style TI fill:#d1ecf1
+```
+
+The unified adapter bridges client and server WebSocket libraries, providing a consistent interface regardless of which side of the connection you're on. This design pattern enables protocol code to be transport-agnostic.
+
+---
+
+## Transport Interface
+
+All transport implementations must implement the `Transport` interface for interoperability.
+
+### Core Operations
+
+| Method | Purpose | Return Type |
+|--------|---------|-------------|
+| `connect(url, **kwargs)` | Establish connection to remote endpoint | `None` |
+| `send(data)` | Send raw bytes | `None` |
+| `receive()` | Receive raw bytes | `bytes` |
+| `close()` | Close connection gracefully | `None` |
+| `wait_closed()` | Wait for connection to fully close | `None` |
+| `is_connected` (property) | Check connection status | `bool` |
+
+### Interface Definition
+
+```python
+from abc import ABC, abstractmethod
+
+class Transport(ABC):
+    @abstractmethod
+    async def connect(self, url: str, **kwargs) -> None:
+        """Connect to remote endpoint"""
+
+    @abstractmethod
+    async def send(self, data: bytes) -> None:
+        """Send data"""
+
+    @abstractmethod
+    async def receive(self) -> bytes:
+        """Receive data"""
+
+    @abstractmethod
+    async def close(self) -> None:
+        """Close connection"""
+
+    @abstractmethod
+    async def wait_closed(self) -> None:
+        """Wait for connection to fully close"""
+
+    @property
+    @abstractmethod
+    def is_connected(self) -> bool:
+        """Check connection status"""
+```
+
+---
+
+## WebSocket Transport
+
+`WebSocketTransport` provides persistent, full-duplex communication over the WebSocket protocol (RFC 6455).
+
+### Quick Start
+
+**Client-Side:**
+
+```python
+from aip.transport import WebSocketTransport
+
+# Create and configure
+transport = WebSocketTransport(
+ ping_interval=30.0,
+ ping_timeout=180.0,
+ close_timeout=10.0,
+ max_size=100 * 1024 * 1024 # 100MB
+)
+
+# Connect
+await transport.connect("ws://localhost:8000/ws")
+
+# Communicate
+await transport.send(b"Hello Server")
+data = await transport.receive()
+
+# Cleanup
+await transport.close()
+```
+
+**Server-Side (FastAPI):**
+
+```python
+from fastapi import WebSocket
+from aip.transport import WebSocketTransport
+
+async def websocket_endpoint(websocket: WebSocket):
+    await websocket.accept()
+
+    # Wrap existing WebSocket
+    transport = WebSocketTransport(websocket=websocket)
+
+    # Use unified interface
+    data = await transport.receive()
+    await transport.send(b"Response")
+```
+
+**Note**: WebSocketTransport automatically detects whether it's wrapping a FastAPI WebSocket or a client connection and selects the appropriate adapter.
+
+[→ See how endpoints use WebSocketTransport](./endpoints.md)
+
+### Configuration Parameters
+
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| **ping_interval** | `float` | `30.0` | Time between ping messages (seconds). Keepalive mechanism. |
+| **ping_timeout** | `float` | `180.0` | Max wait for pong response (seconds). Connection marked dead if exceeded. |
+| **close_timeout** | `float` | `10.0` | Timeout for graceful close handshake (seconds). |
+| **max_size** | `int` | `104857600` | Max message size in bytes (100MB). Messages exceeding this are rejected. |
+
+
+
+**Usage Guidelines:**
+
+!!!warning "max_size for Large Payloads"
+    Set `max_size` based on application needs. Large screenshots, models, or binary data may require higher limits. Consider compression for payloads approaching this limit.
+
+### Connection States
+
+WebSocket connections transition through multiple states during their lifecycle. This diagram shows all possible states and transitions:
+
+```mermaid
+stateDiagram-v2
+ [*] --> DISCONNECTED
+ DISCONNECTED --> CONNECTING: connect()
+ CONNECTING --> CONNECTED: Success
+ CONNECTING --> ERROR: Failure
+ CONNECTED --> DISCONNECTING: close()
+ DISCONNECTING --> DISCONNECTED: Complete
+ CONNECTED --> ERROR: Network failure
+ ERROR --> DISCONNECTED: Reset
+
+ note right of CONNECTED
+ • is_connected = True
+ • send/receive active
+ • Ping/pong running
+ end note
+```
+
+Only the `CONNECTED` state allows data transmission. A transport in the `ERROR` state cannot send or receive; it must be reset to `DISCONNECTED` before `connect()` is called again.
+
+**State Definitions:**
+
+| State | Meaning | Actions Allowed |
+|-------|---------|-----------------|
+| `DISCONNECTED` | No active connection | `connect()` |
+| `CONNECTING` | Connection in progress | Wait for result |
+| `CONNECTED` | Active connection | `send()`, `receive()`, `close()` |
+| `DISCONNECTING` | Closing in progress | Wait for completion |
+| `ERROR` | Error occurred | Investigate, reset |
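+
+The transitions in the diagram can be written down as a lookup table, which is handy for asserting legal state changes in tests. This is a sketch under stated assumptions: `State` is a local stand-in for `TransportState` (whose actual members and values are not shown here), and `ALLOWED` and `can_transition` are hypothetical names.
+
+```python
+from enum import Enum
+
+
+class State(Enum):
+    DISCONNECTED = "disconnected"
+    CONNECTING = "connecting"
+    CONNECTED = "connected"
+    DISCONNECTING = "disconnecting"
+    ERROR = "error"
+
+
+# Edges copied from the state diagram above
+ALLOWED = {
+    State.DISCONNECTED: {State.CONNECTING},
+    State.CONNECTING: {State.CONNECTED, State.ERROR},
+    State.CONNECTED: {State.DISCONNECTING, State.ERROR},
+    State.DISCONNECTING: {State.DISCONNECTED},
+    State.ERROR: {State.DISCONNECTED},
+}
+
+
+def can_transition(current: State, target: State) -> bool:
+    """Return True if the diagram has an edge from current to target."""
+    return target in ALLOWED[current]
+
+
+print(can_transition(State.DISCONNECTED, State.CONNECTING))  # True
+print(can_transition(State.ERROR, State.CONNECTED))          # False
+```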
+
+**Check State:**
+
+```python
+from aip.transport import TransportState
+
+if transport.state == TransportState.CONNECTED:
+    await transport.send(data)
+else:
+    logger.warning("Transport not connected")
+```
+
+### Ping/Pong Keepalive
+
+`WebSocketTransport` automatically sends ping frames every `ping_interval` seconds to detect broken connections. The sequence diagram below shows the exchange:
+
+```mermaid
+sequenceDiagram
+ participant C as Client
+ participant S as Server
+
+ loop Every ping_interval
+ C->>S: ping frame
+ S->>C: pong frame
+ Note over C: Connection healthy
+ end
+
+ C->>S: ping frame
+ S-xC: No response
+ Note over C: Timeout after ping_timeout
+ C->>C: Mark connection dead
+ C->>C: Close connection
+```
+
+The `x` marker indicates a failed pong response. After `ping_timeout` expires without receiving a pong, the connection is automatically marked dead and closed, triggering reconnection logic.
+
+**Timeout Behavior:**
+
+- ✅ **Pong received within `ping_timeout`**: Connection healthy, continue
+- ❌ **No pong within `ping_timeout`**: Connection marked dead, automatic close triggered
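+
+The two settings together bound how long a dead link can go unnoticed. As a rough sketch, assume the link fails just after a pong arrives: the next ping goes out up to `ping_interval` later and then waits `ping_timeout` for a reply. `worst_case_detection` is a hypothetical helper for this arithmetic, not part of AIP.
+
+```python
+def worst_case_detection(ping_interval: float, ping_timeout: float) -> float:
+    """Longest time (seconds) a dead link can go unnoticed under the ping/pong model above."""
+    return ping_interval + ping_timeout
+
+
+print(worst_case_detection(30.0, 180.0))  # 210.0 with the default settings
+print(worst_case_detection(10.0, 30.0))   # 40.0 with a low-latency profile
+```
+
+If a 3.5-minute detection window is too long for your workload, lowering `ping_timeout` shortens it more than lowering `ping_interval` does.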
+
+### Error Handling
+
+!!!danger "Always Handle ConnectionError"
+    Connection failures can occur at any time due to network issues. Wrap send/receive in try-except blocks.
+
+**Connection Errors:**
+
+```python
+try:
+    await transport.connect("ws://localhost:8000/ws")
+except ConnectionError as e:
+    logger.error(f"Failed to connect: {e}")
+    await handle_connection_failure()
+```
+
+**Send/Receive Errors:**
+
+```python
+try:
+    await transport.send(data)
+    response = await transport.receive()
+except ConnectionError:
+    logger.warning("Connection closed during operation")
+    await reconnect()
+except IOError as e:
+    logger.error(f"I/O error: {e}")
+    await handle_io_error(e)
+```
+
+**Graceful Shutdown:**
+
+```python
+try:
+    # Start the close handshake (bounded by close_timeout)
+    await transport.close()
+
+    # Wait for complete shutdown
+    await transport.wait_closed()
+except Exception as e:
+    logger.error(f"Error during shutdown: {e}")
+```
+
+**Note**: The transport sends a WebSocket close frame and waits for the peer's close frame within `close_timeout` before terminating the connection.
+
+### Adapter Pattern
+
+AIP uses adapters to provide a unified interface across different WebSocket libraries without exposing implementation details.
+
+**Supported WebSocket Implementations:**
+
+| Implementation | Use Case | Adapter |
+|----------------|----------|---------|
+| **websockets library** | Client-side connections | `WebSocketsLibAdapter` |
+| **FastAPI WebSocket** | Server-side endpoints | `FastAPIWebSocketAdapter` |
+
+**Automatic Detection:**
+
+```python
+# Server-side: Automatically uses FastAPIWebSocketAdapter
+transport = WebSocketTransport(websocket=fastapi_websocket)
+
+# Client-side: Automatically uses WebSocketsLibAdapter
+transport = WebSocketTransport()
+await transport.connect("ws://server:8000/ws")
+```
+
+**Benefits:**
+
+- ✅ Protocol-level code remains unchanged across client/server
+- ✅ API differences abstracted by adapters
+- ✅ Easy to add new WebSocket implementations
+- ✅ Testability through adapter mocking
+
+---
+
+## Message Encoding
+
+AIP uses UTF-8 encoded JSON for all messages, leveraging Pydantic for serialization/deserialization.
+
+### Encoding Flow
+
+This diagram shows the transformation steps from Pydantic model to network bytes:
+
+```mermaid
+graph LR
+ A[Pydantic Model] -->|model_dump_json| B[JSON String]
+ B -->|encode utf-8| C[bytes]
+ C -->|transport.send| D[Network]
+
+ style A fill:#d4edda
+ style D fill:#d1ecf1
+```
+
+Pydantic handles type validation and JSON serialization, UTF-8 encoding converts the JSON string to bytes, and the transport layer sends those bytes over the network. Decoding follows the reverse path.
+
+**Send Example:**
+
+```python
+from aip.messages import ClientMessage
+
+# 1. Create Pydantic model
+msg = ClientMessage(
+ message_type="TASK_RESULT",
+ task_id="task_123",
+ result={"status": "success"}
+)
+
+# 2. Serialize to JSON string
+json_str = msg.model_dump_json()
+
+# 3. Encode to bytes
+bytes_data = json_str.encode('utf-8')
+
+# 4. Send via transport
+await transport.send(bytes_data)
+```
+
+### Decoding Flow
+
+```mermaid
+graph LR
+ A[Network] -->|transport.receive| B[bytes]
+ B -->|decode utf-8| C[JSON String]
+ C -->|model_validate_json| D[Pydantic Model]
+
+ style A fill:#d1ecf1
+ style D fill:#d4edda
+```
+
+**Receive Example:**
+
+```python
+from aip.messages import ServerMessage
+
+# 1. Receive bytes
+bytes_data = await transport.receive()
+
+# 2. Decode to JSON string
+json_str = bytes_data.decode('utf-8')
+
+# 3. Deserialize to Pydantic model
+msg = ServerMessage.model_validate_json(json_str)
+
+# 4. Use typed data
+print(f"Task ID: {msg.task_id}")
+```
+
+---
+
+## Performance Optimization
+
+### Performance Comparison
+
+| Scenario | Recommended Configuration | Rationale |
+|----------|---------------------------|-----------|
+| **Large Messages** | `max_size=500MB`, compression | Screenshots, binary data |
+| **High Throughput** | Batch messages, `ping_interval=60s` | Reduce overhead per message |
+| **Low Latency** | Dedicated connections, `ping_interval=10s` | Fast failure detection |
+| **Mobile Networks** | `ping_interval=60s`, compression | Reduce battery/bandwidth usage |
+
+### Optimization Strategies
+
+**Large Messages Strategy:**
+
+For messages approaching `max_size`:
+
+**Option 1: Compression**
+```python
+import gzip
+
+compressed = gzip.compress(large_data)
+await transport.send(compressed)
+```
+
+**Option 2: Chunking**
+```python
+chunk_size = 1024 * 1024  # 1MB chunks
+for i in range(0, len(large_data), chunk_size):
+    chunk = large_data[i:i+chunk_size]
+    await transport.send(chunk)
+```
+
+**Option 3: Streaming Protocol**
+
+Consider implementing a custom streaming protocol for very large payloads.
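+
+Plain chunking leaves the receiver with no way to tell where a payload ends or whether a chunk is missing. One possible framing is sketched below, assuming an 8-byte header of chunk index and chunk count in front of every chunk. The header layout and the `split_into_frames` and `reassemble` helpers are illustrative assumptions, not part of AIP.
+
+```python
+import struct
+
+HEADER = struct.Struct("!II")  # chunk index, total chunks (network byte order)
+
+
+def split_into_frames(payload: bytes, chunk_size: int) -> list[bytes]:
+    """Prefix each chunk with (index, total) so the receiver can reassemble."""
+    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)] or [b""]
+    return [HEADER.pack(i, len(chunks)) + chunk for i, chunk in enumerate(chunks)]
+
+
+def reassemble(frames: list[bytes]) -> bytes:
+    """Rebuild the payload from frames received in any order."""
+    parts: dict[int, bytes] = {}
+    total = 0
+    for frame in frames:
+        index, total = HEADER.unpack_from(frame)
+        parts[index] = frame[HEADER.size:]
+    if len(parts) != total:
+        raise ValueError(f"expected {total} chunks, got {len(parts)}")
+    return b"".join(parts[i] for i in range(total))
+
+
+payload = bytes(range(256)) * 10           # 2560 bytes of sample data
+frames = split_into_frames(payload, 1024)  # 3 frames: 1024 + 1024 + 512 bytes
+assert reassemble(list(reversed(frames))) == payload
+```
+
+Each frame would then go out through `await transport.send(frame)`. A production protocol would also need a transfer ID so that frames from concurrent payloads do not mix.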
+
+[→ See message encoding details in Protocol Reference](./protocols.md)
+
+**High Throughput Strategy:**
+
+For high message rates:
+
+**Batch Messages:**
+```python
+import json
+
+batch = [msg1, msg2, msg3, msg4]
+# mode="json" converts non-JSON types (e.g. datetimes) before json.dumps
+batch_json = json.dumps([msg.model_dump(mode="json") for msg in batch])
+await transport.send(batch_json.encode('utf-8'))
+```
+
+**Reduce Ping Frequency:**
+```python
+transport = WebSocketTransport(
+ ping_interval=60.0 # Less overhead
+)
+```
+
+**Low Latency Strategy:**
+
+For real-time applications:
+
+**Fast Failure Detection:**
+```python
+transport = WebSocketTransport(
+    ping_interval=10.0,  # Quick detection
+    ping_timeout=30.0
+)
+```
+
+**Dedicated Connections:**
+```python
+# One transport per device (no sharing)
+device_transports = {
+ device_id: WebSocketTransport()
+ for device_id in devices
+}
+```
+
+---
+
+## Transport Extensions
+
+!!!warning "Future Implementations"
+    AIP's architecture supports multiple transport implementations. The following are planned but not yet implemented.
+
+### HTTP/3 Transport (Planned)
+
+**Benefits:**
+
+- ✅ Multiplexing without head-of-line blocking (QUIC protocol)
+- ✅ 0-RTT connection resumption (faster reconnection)
+- ✅ Better mobile network performance (connection migration)
+- ✅ Built-in encryption (TLS 1.3)
+
+**Use Cases:**
+
+- High-latency networks (satellite, mobile)
+- Frequent reconnections (mobile roaming)
+- Multiple concurrent streams per connection
+
+### gRPC Transport (Planned)
+
+**Benefits:**
+
+- ✅ Strong typing with Protocol Buffers
+- ✅ Built-in load balancing
+- ✅ Bidirectional streaming RPCs
+- ✅ Code generation for multiple languages
+
+**Use Cases:**
+
+- Cross-language interoperability
+- Microservices communication
+- Performance-critical paths
+
+### Custom Transport Implementation
+
+Implement custom transports for specialized protocols:
+
+```python
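+from aip.transport.base import Transport
+
+class CustomTransport(Transport):
+    def __init__(self) -> None:
+        self._connection = None  # Set by connect()
+
+    async def connect(self, url: str, **kwargs) -> None:
+        # Custom connection logic (custom_protocol is a placeholder for your own library)
+        self._connection = await custom_protocol.connect(url)
+
+    async def send(self, data: bytes) -> None:
+        await self._connection.write(data)
+
+    async def receive(self) -> bytes:
+        return await self._connection.read()
+
+    async def close(self) -> None:
+        await self._connection.shutdown()
+
+    async def wait_closed(self) -> None:
+        # Required by the Transport interface; assumes shutdown() has completed the close
+        self._connection = None
+
+    @property
+    def is_connected(self) -> bool:
+        return self._connection is not None and self._connection.is_open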
+```
+
+**Integration:**
+
+Custom transports can be used directly with protocols:
+
+```python
+from aip.protocol import AIPProtocol
+
+# Use custom transport with protocol
+transport = CustomTransport()
+await transport.connect("custom://server:port")
+
+protocol = AIPProtocol(transport)
+await protocol.send_message(message)
+```
+
+[→ See Transport interface specification above](#transport-interface)
+[→ See Protocol usage examples](./protocols.md)
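+
+A custom transport is also a convenient test double. The sketch below is an in-memory loopback that echoes every sent message back to `receive()`, so protocol code can be exercised without a network. It is self-contained on purpose: the `Transport` class here is a local stand-in that mirrors the interface table above, and `LoopbackTransport` and `demo` are hypothetical names.
+
+```python
+import asyncio
+from abc import ABC, abstractmethod
+
+
+class Transport(ABC):
+    """Local stand-in mirroring the Transport interface described above."""
+
+    @abstractmethod
+    async def connect(self, url: str, **kwargs) -> None: ...
+    @abstractmethod
+    async def send(self, data: bytes) -> None: ...
+    @abstractmethod
+    async def receive(self) -> bytes: ...
+    @abstractmethod
+    async def close(self) -> None: ...
+    @abstractmethod
+    async def wait_closed(self) -> None: ...
+    @property
+    @abstractmethod
+    def is_connected(self) -> bool: ...
+
+
+class LoopbackTransport(Transport):
+    """Echoes every sent message back to receive(); no network involved."""
+
+    def __init__(self) -> None:
+        self._queue: asyncio.Queue = asyncio.Queue()
+        self._connected = False
+
+    async def connect(self, url: str, **kwargs) -> None:
+        self._connected = True
+
+    async def send(self, data: bytes) -> None:
+        if not self._connected:
+            raise ConnectionError("transport is not connected")
+        await self._queue.put(data)
+
+    async def receive(self) -> bytes:
+        if not self._connected:
+            raise ConnectionError("transport is not connected")
+        return await self._queue.get()
+
+    async def close(self) -> None:
+        self._connected = False
+
+    async def wait_closed(self) -> None:
+        return None  # Nothing to wait for in memory
+
+    @property
+    def is_connected(self) -> bool:
+        return self._connected
+
+
+async def demo() -> bytes:
+    transport = LoopbackTransport()
+    await transport.connect("loopback://test")
+    await transport.send(b"ping")
+    echoed = await transport.receive()
+    await transport.close()
+    return echoed
+
+
+print(asyncio.run(demo()))  # b'ping'
+```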
+
+---
+
+## Best Practices
+
+### Environment-Specific Configuration
+
+Adapt transport settings to your deployment environment's characteristics.
+
+| Environment | ping_interval | ping_timeout | max_size | close_timeout |
+|-------------|--------------|--------------|----------|---------------|
+| **Local Network** | 10-20s | 30-60s | 100MB | 5s |
+| **Internet** | 30-60s | 120-180s | 100MB | 10s |
+| **Unreliable Network** | 60-120s | 180-300s | 50MB | 15s |
+| **Mobile** | 60s | 180s | 10MB | 10s |
+
+**Local Network Example:**
+
+```python
+transport = WebSocketTransport(
+ ping_interval=15.0, # Quick failure detection
+ ping_timeout=45.0,
+ close_timeout=5.0
+)
+```
+
+**Internet Example:**
+
+```python
+transport = WebSocketTransport(
+ ping_interval=30.0, # Balance overhead and detection
+ ping_timeout=180.0,
+ close_timeout=10.0
+)
+```
+
+**Mobile Network Example:**
+
+```python
+transport = WebSocketTransport(
+ ping_interval=60.0, # Reduce battery usage
+ ping_timeout=180.0,
+ max_size=10 * 1024 * 1024 # 10MB for mobile
+)
+```
+
+### Connection Health Monitoring
+
+Always verify connection status before critical operations:
+
+```python
+# Check before sending
+if not transport.is_connected:
+    logger.warning("Transport not connected, attempting reconnection")
+    await reconnect_transport()
+
+# Proceed with send
+await transport.send(data)
+```
+
+### Resilience Integration
+
+Transport alone provides low-level communication. Combine with resilience components for production readiness:
+
+```python
+from aip.resilience import ReconnectionStrategy
+
+strategy = ReconnectionStrategy(max_retries=5)
+
+try:
+    await transport.send(data)
+except ConnectionError:
+    # Trigger reconnection
+    await strategy.handle_disconnection(endpoint, device_id)
+```
+
+[→ See Resilience documentation](./resilience.md)
+[→ See HeartbeatManager for connection health monitoring](./resilience.md#heartbeat-manager)
+
+### Logging and Observability
+
+```python
+import logging
+
+from aip.transport import WebSocketTransport
+
+logger = logging.getLogger(__name__)
+
+# Enable transport debug logs
+logging.getLogger("aip.transport").setLevel(logging.DEBUG)
+
+# Custom transport event logging
+class LoggedTransport(WebSocketTransport):
+    async def send(self, data: bytes) -> None:
+        logger.debug(f"Sending {len(data)} bytes")
+        await super().send(data)
+
+    async def receive(self) -> bytes:
+        data = await super().receive()
+        logger.debug(f"Received {len(data)} bytes")
+        return data
+```
+
+### Resource Cleanup
+
+!!!danger "Prevent Resource Leaks"
+    Always close transports to prevent socket/memory leaks.
+
+**Context Manager Pattern (Recommended):**
+
+```python
+async with WebSocketTransport() as transport:
+    await transport.connect("ws://localhost:8000/ws")
+    await transport.send(data)
+    # Automatic cleanup on exit
+```
+
+**Try-Finally Pattern:**
+
+```python
+transport = WebSocketTransport()
+try:
+    await transport.connect("ws://localhost:8000/ws")
+    await transport.send(data)
+finally:
+    await transport.close()
+```
+
+---
+
+## Quick Reference
+
+### Import Transport Components
+
+```python
+from aip.transport import (
+ Transport, # Abstract base class
+ WebSocketTransport, # WebSocket implementation
+ TransportState, # Connection states enum
+)
+```
+
+### Common Patterns
+
+| Pattern | Code |
+|---------|------|
+| **Create transport** | `transport = WebSocketTransport()` |
+| **Connect** | `await transport.connect("ws://host:port/path")` |
+| **Send** | `await transport.send(data.encode('utf-8'))` |
+| **Receive** | `data = await transport.receive()` |
+| **Check status** | `if transport.is_connected: ...` |
+| **Close** | `await transport.close()` |
+
+### Related Documentation
+
+- [Protocol Reference](./protocols.md) - How protocols use transports
+- [Resilience](./resilience.md) - Connection management and reconnection
+- [Endpoints](./endpoints.md) - Transport usage in endpoints
+- [Messages](./messages.md) - Message encoding/decoding
diff --git a/documents/docs/automator/ai_tool_automator.md b/documents/docs/automator/ai_tool_automator.md
deleted file mode 100644
index f29ae42ad..000000000
--- a/documents/docs/automator/ai_tool_automator.md
+++ /dev/null
@@ -1,26 +0,0 @@
-# AI Tool Automator
-
-The AI Tool Automator is a component of the UFO framework that enables the agent to interact with AI tools based on large language models (LLMs). The AI Tool Automator is designed to facilitate the integration of LLM-based AI tools into the UFO framework, enabling the agent to leverage the capabilities of these tools to perform complex tasks.
-
-!!! note
- UFO can also call in-app AI tools, such as `Copilot`, to assist with the automation process. This is achieved by using either `UI Automation` or `API` to interact with the in-app AI tool. These in-app AI tools differ from the AI Tool Automator, which is designed to interact with external AI tools based on LLMs that are not integrated into the application.
-
-## Configuration
-The AI Tool Automator shares the same prompt configuration options as the UI Automator:
-
-| Configuration Option | Description | Type | Default Value |
-|-------------------------|---------------------------------------------------------------------------------------------------------|----------|---------------|
-| `API_PROMPT` | The prompt for the UI automation API. | String | "ufo/prompts/share/base/api.yaml" |
-
-
-## Receiver
-The AI Tool Automator shares the same receiver structure as the UI Automator. Please refer to the [UI Automator Receiver](./ui_automator.md#receiver) section for more details.
-
-## Command
-The command of the AI Tool Automator shares the same structure as the UI Automator. Please refer to the [UI Automator Command](./ui_automator.md#command) section for more details. The list of available commands in the AI Tool Automator is shown below:
-
-| Command Name | Function Name | Description |
-|--------------|---------------|-------------|
-| `AnnotationCommand` | `annotation` | Annotate the control items on the screenshot. |
-| `SummaryCommand` | `summary` | Summarize the observation of the current application window. |
-
diff --git a/documents/docs/automator/bash_automator.md b/documents/docs/automator/bash_automator.md
deleted file mode 100644
index edf8e35f9..000000000
--- a/documents/docs/automator/bash_automator.md
+++ /dev/null
@@ -1,48 +0,0 @@
-# Bash Automator
-
-UFO allows the `HostAgent` to execute bash commands on the host machine. The bash commands can be used to open applications or execute system commands. The `Bash Automator` is implemented in the `ufo/automator/app_apis/shell` module.
-
-
-!!!note
- Only `HostAgent` is currently supported by the Bash Automator.
-
-## Receiver
-The Web Automator receiver is the `ShellReceiver` class defined in the `ufo/automator/app_apis/shell/shell_client.py` file.
-
-::: automator.app_apis.shell.shell_client.ShellReceiver
-
-
-
-
-## Command
-
-We now only support one command in the Bash Automator to execute a bash command on the host machine.
-
-```python
-@ShellReceiver.register
-class RunShellCommand(ShellCommand):
- """
- The command to run the crawler with various options.
- """
-
- def execute(self):
- """
- Execute the command to run the crawler.
- :return: The result content.
- """
- return self.receiver.run_shell(params=self.params)
-
- @classmethod
- def name(cls) -> str:
- """
- The name of the command.
- """
- return "run_shell"
-```
-
-
-Below is the list of available commands in the Web Automator that are currently supported by UFO:
-
-| Command Name | Function Name | Description |
-|--------------|---------------|-------------|
-| `RunShellCommand` | `run_shell` | Get the content of a web page into a markdown format. |
diff --git a/documents/docs/automator/overview.md b/documents/docs/automator/overview.md
deleted file mode 100644
index 5ac1d3870..000000000
--- a/documents/docs/automator/overview.md
+++ /dev/null
@@ -1,80 +0,0 @@
-# Application Puppeteer
-
-The `Puppeteer` is a tool that allows UFO to automate and take actions on applications. Currently, UFO supports two types of actions: `GUI` and `API`. Each application has a shared GUI action interface to operate with mouse and keyboard events, and a private API action interface to operate with the application's native API. We illustrate the `Puppeteer` architecture in the figure below:
-
-The state machine diagram for the `HostAgent` is shown below:
-
-
-
-
-
-
-!!! note
- UFO can also call in-app AI tools, such as `Copilot`, to assist with the automation process. This is achieved by using either `GUI` or `API` to interact with the in-app AI tool.
-
-- [UI Automator](./ui_automator.md) - This action type is used to interact with the application's UI controls, such as buttons, text boxes, and menus. UFO uses the **UIA** or **Win32** APIs to interact with the application's UI controls.
-- [API](./wincom_automator.md) - This action type is used to interact with the application's native API. Users and app developers can create their own API actions to interact with specific applications.
-- [Web](./web_automator.md) - This action type is used to interact with web applications. UFO uses the [**crawl4ai**](https://github.com/unclecode/crawl4ai) library to extract information from web pages.
-- [Bash](./bash_automator.md) - This action type is used to interact with the command line interface (CLI) of an application.
-- [AI Tool](./ai_tool_automator.md) - This action type is used to interact with the LLM-based AI tools.
-
-## Action Design Patterns
-
-Actions in UFO are implemented using the [command](https://refactoring.guru/design-patterns/command) design pattern, which encapsulates a receiver, a command, and an invoker. The receiver is the object that performs the action, the command is the object that encapsulates the action, and the invoker is the object that triggers the action.
-
-The basic classes for implementing actions in UFO are as follows:
-
-| Role | Class | Description |
-| --- | --- | --- |
-| Receiver | `ufo.automator.basic.ReceiverBasic` | The base class for all receivers in UFO. Receivers are objects that perform actions on applications. |
-| Command | `ufo.automator.basic.CommandBasic` | The base class for all commands in UFO. Commands are objects that encapsulate actions to be performed by receivers. |
-| Invoker | `ufo.automator.puppeteer.AppPuppeteer` | The base class for the invoker in UFO. Invokers are objects that trigger commands to be executed by receivers. |
-
-The advantage of using the command design pattern in the agent framework is that it allows for the decoupling of the sender and receiver of the action. This decoupling enables the agent to execute actions on different objects without knowing the details of the object or the action being performed, making the agent more flexible and extensible for new actions.
-
-## Receiver
-
-The `Receiver` is a central component in the Automator application that performs actions on the application. It provides functionalities to interact with the application and execute the action. All available actions are registered in the with the `ReceiverManager` class.
-
-You can find the reference for a basic `Receiver` class below:
-
-::: automator.basic.ReceiverBasic
-
-
-
-## Command
-
-The `Command` is a specific action that the `Receiver` can perform on the application. It encapsulates the function and parameters required to execute the action. The `Command` class is a base class for all commands in the Automator application.
-
-You can find the reference for a basic `Command` class below:
-
-::: automator.basic.CommandBasic
-
-
-
-!!! note
- Each command must register with a specific `Receiver` to be executed using the `register_command` decorator. For example:
- @ReceiverExample.register
- class CommandExample(CommandBasic):
- ...
-
-
-## Invoker (AppPuppeteer)
-
-The `AppPuppeteer` plays the role of the invoker in the Automator application. It triggers the commands to be executed by the receivers. The `AppPuppeteer` equips the `AppAgent` with the capability to interact with the application's UI controls. It provides functionalities to translate action strings into specific actions and execute them. All available actions are registered in the `Puppeteer` with the `ReceiverManager` class.
-
-You can find the implementation of the `AppPuppeteer` class in the `ufo/automator/puppeteer.py` file, and its reference is shown below.
-
-::: automator.puppeteer.AppPuppeteer
-
-
-
-
-## Receiver Manager
-The `ReceiverManager` manages all the receivers and commands in the Automator application. It provides functionalities to register and retrieve receivers and commands. It is a complementary component to the `AppPuppeteer`.
-
-::: automator.puppeteer.ReceiverManager
-
-
-
-For further details, refer to the specific documentation for each component and class in the Automator module.
\ No newline at end of file
diff --git a/documents/docs/automator/ui_automator.md b/documents/docs/automator/ui_automator.md
deleted file mode 100644
index 656647521..000000000
--- a/documents/docs/automator/ui_automator.md
+++ /dev/null
@@ -1,79 +0,0 @@
-# GUI Automator
-
-The GUI Automator enables to mimic the operations of mouse and keyboard on the application's UI controls. UFO uses the **UIA** or **Win32** APIs to interact with the application's UI controls, such as buttons, edit boxes, and menus.
-
-
-## Configuration
-
-There are several configurations that need to be set up before using the UI Automator in the `config_dev.yaml` file. Below is the list of configurations related to the UI Automator:
-
-| Configuration Option | Description | Type | Default Value |
-|-------------------------|---------------------------------------------------------------------------------------------------------|----------|---------------|
-| `CONTROL_BACKEND` | The list of backend for control action, currently supporting `uia` and `win32` and `onmiparser` | List | ["uia"] |
-| `CONTROL_LIST` | The list of widgets allowed to be selected. | List | ["Button", "Edit", "TabItem", "Document", "ListItem", "MenuItem", "ScrollBar", "TreeItem", "Hyperlink", "ComboBox", "RadioButton", "DataItem"] |
-| `ANNOTATION_COLORS` | The colors assigned to different control types for annotation. | Dictionary | {"Button": "#FFF68F", "Edit": "#A5F0B5", "TabItem": "#A5E7F0", "Document": "#FFD18A", "ListItem": "#D9C3FE", "MenuItem": "#E7FEC3", "ScrollBar": "#FEC3F8", "TreeItem": "#D6D6D6", "Hyperlink": "#91FFEB", "ComboBox": "#D8B6D4"} |
-| `API_PROMPT` | The prompt for the UI automation API. | String | "ufo/prompts/share/base/api.yaml" |
-| `CLICK_API` | The API used for click action, can be `click_input` or `click`. | String | "click_input" |
-| `INPUT_TEXT_API` | The API used for input text action, can be `type_keys` or `set_text`. | String | "type_keys" |
-| `INPUT_TEXT_ENTER` | Whether to press enter after typing the text. | Boolean | False |
-
-
-
-## Receiver
-
-The receiver of the UI Automator is the `ControlReceiver` class defined in the `ufo/automator/ui_control/controller/control_receiver` module. It is initialized with the application's window handle and control wrapper that executes the actions. The `ControlReceiver` provides functionalities to interact with the application's UI controls. Below is the reference for the `ControlReceiver` class:
-
-::: automator.ui_control.controller.ControlReceiver
-
-
-
-## Command
-
-The command of the UI Automator is the `ControlCommand` class defined in the `ufo/automator/ui_control/controller/ControlCommand` module. It encapsulates the function and parameters required to execute the action. The `ControlCommand` class is a base class for all commands in the UI Automator application. Below is an example of a `ClickInputCommand` class that inherits from the `ControlCommand` class:
-
-```python
-@ControlReceiver.register
-class ClickInputCommand(ControlCommand):
-    """
-    The click input command class.
-    """
-
-    def execute(self) -> str:
-        """
-        Execute the click input command.
-        :return: The result of the click input command.
-        """
-        return self.receiver.click_input(self.params)
-
-    @classmethod
-    def name(cls) -> str:
-        """
-        Get the name of the atomic command.
-        :return: The name of the atomic command.
-        """
-        return "click_input"
-```
-
-!!! note
-    The concrete command classes must implement the `execute` method to execute the action and the `name` method to return the name of the atomic command.
-
-!!! note
-    Each command must register with a specific `ControlReceiver` to be executed using the `@ControlReceiver.register` decorator.
-
-Below is the list of available commands in the UI Automator that are currently supported by UFO:
-
-| Command Name | Function Name | Description |
-|--------------|---------------|-------------|
-| `ClickInputCommand` | `click_input` | Click the control item with the mouse. |
-| `ClickOnCoordinatesCommand` | `click_on_coordinates` | Click on the specific fractional coordinates of the application window. |
-| `DragOnCoordinatesCommand` | `drag_on_coordinates` | Drag the mouse on the specific fractional coordinates of the application window. |
-| `SetEditTextCommand` | `set_edit_text` | Add new text to the control item. |
-| `GetTextsCommand` | `texts` | Get the text of the control item. |
-| `WheelMouseInputCommand` | `wheel_mouse_input` | Scroll the control item. |
-| `KeyboardInputCommand` | `keyboard_input` | Simulate the keyboard input. |
-
-!!! tip
-    Please refer to the `ufo/prompts/share/base/api.yaml` file for the detailed API documentation of the UI Automator.
-
-!!! tip
-    You can customize the commands by adding new command classes to the `ufo/automator/ui_control/controller/ControlCommand` module.
diff --git a/documents/docs/automator/web_automator.md b/documents/docs/automator/web_automator.md
deleted file mode 100644
index ede352edd..000000000
--- a/documents/docs/automator/web_automator.md
+++ /dev/null
@@ -1,62 +0,0 @@
-# Web Automator
-
-We also support the use of the `Web Automator` to get the content of a web page. The `Web Automator` is implemented in the `ufo/automator/app_apis/web` module.
-
-## Configuration
-
-Several options in the `config_dev.yaml` file need to be set before using the Web Automator. The related configurations are listed below:
-
-| Configuration Option | Description | Type | Default Value |
-|-------------------------|---------------------------------------------------------------------------------------------------------|----------|---------------|
-| `USE_APIS` | Whether to allow the use of application APIs. | Boolean | True |
-| `APP_API_PROMPT_ADDRESS` | The prompt address for the application API. | Dict | {"WINWORD.EXE": "ufo/prompts/apps/word/api.yaml", "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml", "msedge.exe": "ufo/prompts/apps/web/api.yaml", "chrome.exe": "ufo/prompts/apps/web/api.yaml"} |
-
-!!! note
-    Only `msedge.exe` and `chrome.exe` are currently supported by the Web Automator.
-
-## Receiver
-The Web Automator receiver is the `WebReceiver` class defined in the `ufo/automator/app_apis/web/webclient.py` module:
-
-::: automator.app_apis.web.webclient.WebReceiver
-
-
-
-## Command
-
-The Web Automator currently supports a single command, which retrieves the content of a web page in Markdown format. More commands will be added in the future.
-
-```python
-@WebReceiver.register
-class WebCrawlerCommand(WebCommand):
-    """
-    The command to run the crawler with various options.
-    """
-
-    def execute(self):
-        """
-        Execute the command to run the crawler.
-        :return: The result content.
-        """
-        return self.receiver.web_crawler(
-            url=self.params.get("url"),
-            ignore_link=self.params.get("ignore_link", False),
-        )
-
-    @classmethod
-    def name(cls) -> str:
-        """
-        The name of the command.
-        """
-        return "web_crawler"
-```
-
-
-Below is the list of available commands in the Web Automator that are currently supported by UFO:
-
-| Command Name | Function Name | Description |
-|--------------|---------------|-------------|
-| `WebCrawlerCommand` | `web_crawler` | Get the content of a web page into a markdown format. |
-
-
-!!! tip
-    Please refer to the `ufo/prompts/apps/web/api.yaml` file for the prompt details for the `WebCrawlerCommand` command.
\ No newline at end of file
diff --git a/documents/docs/automator/wincom_automator.md b/documents/docs/automator/wincom_automator.md
deleted file mode 100644
index 59b14ae8b..000000000
--- a/documents/docs/automator/wincom_automator.md
+++ /dev/null
@@ -1,85 +0,0 @@
-# API Automator
-
-UFO currently supports an API automator that uses the [`Win32 API`](https://learn.microsoft.com/en-us/windows/win32/api/) to interact with an application's native API. It is implemented in Python using the [`pywin32`](https://pypi.org/project/pywin32/) library. The API automator currently supports the `Word` and `Excel` applications, and support for other applications is in progress.
-
-## Configuration
-
-There are several configurations that need to be set up before using the API Automator in the `config_dev.yaml` file. Below is the list of configurations related to the API Automator:
-
-| Configuration Option | Description | Type | Default Value |
-|-------------------------|---------------------------------------------------------------------------------------------------------|----------|---------------|
-| `USE_APIS` | Whether to allow the use of application APIs. | Boolean | True |
-| `APP_API_PROMPT_ADDRESS` | The prompt address for the application API. | Dict | {"WINWORD.EXE": "ufo/prompts/apps/word/api.yaml", "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml", "msedge.exe": "ufo/prompts/apps/web/api.yaml", "chrome.exe": "ufo/prompts/apps/web/api.yaml"} |
-
-!!! note
- Only `WINWORD.EXE` and `EXCEL.EXE` are currently supported by the API Automator.
-
-
-## Receiver
-The base class for the receiver of the API Automator is the `WinCOMReceiverBasic` class defined in the `ufo/automator/app_apis/basic` module. It is initialized with the application's win32 com object and provides functionalities to interact with the application's native API. Below is the reference for the `WinCOMReceiverBasic` class:
-
-::: automator.app_apis.basic.WinCOMReceiverBasic
-
-The receivers for the `Word` and `Excel` applications inherit from the `WinCOMReceiverBasic` class. The `WordReceiver` and `ExcelReceiver` classes are defined in the `ufo/automator/app_apis/word` and `ufo/automator/app_apis/excel` modules, respectively.
-
-
-## Command
-
-The commands of the API Automator for the `Word` and `Excel` applications are located in the `client` module in the `ufo/automator/app_apis/{app_name}` folder and inherit from the `WinCOMCommand` class. Each command encapsulates the function and parameters required to execute the action. Below is an example of a `SelectTextCommand` class that inherits from the `WinCOMCommand` class:
-
-```python
-@WordWinCOMReceiver.register
-class SelectTextCommand(WinCOMCommand):
-    """
-    The command to select text.
-    """
-
-    def execute(self):
-        """
-        Execute the command to select text.
-        :return: The selected text.
-        """
-        return self.receiver.select_text(self.params.get("text"))
-
-    @classmethod
-    def name(cls) -> str:
-        """
-        The name of the command.
-        """
-        return "select_text"
-```
-
-!!! note
-    The concrete command classes must implement the `execute` method to execute the action and the `name` method to return the name of the atomic command.
-
-!!! note
-    Each command must register with a concrete `WinCOMReceiver` to be executed using the `register` decorator.
-
-Below is the list of available commands in the API Automator that are currently supported by UFO:
-
-### Word API Commands
-
-| Command Name | Function Name | Description |
-|--------------|---------------|-------------|
-| `InsertTableCommand` | `insert_table` | Insert a table to a Word document. |
-| `SelectTextCommand` | `select_text` | Select the text in a Word document. |
-| `SelectTableCommand` | `select_table` | Select a table in a Word document. |
-
-
-### Excel API Commands
-
-| Command Name | Function Name | Description |
-|--------------|---------------|-------------|
-| `GetSheetContentCommand` | `get_sheet_content` | Get the content of a sheet in the Excel app. |
-| `Table2MarkdownCommand` | `table2markdown` | Convert the table content in a sheet of the Excel app to markdown format. |
-| `InsertExcelTableCommand` | `insert_excel_table` | Insert a table to the Excel sheet. |
-
-
-!!! tip
-    Please refer to the `ufo/prompts/apps/{app_name}/api.yaml` file for the prompt details for the commands.
-
-!!! tip
-    You can customize the commands by adding new command classes to the `ufo/automator/app_apis/{app_name}/` module.
-
-
-
\ No newline at end of file
diff --git a/documents/docs/choose_path.md b/documents/docs/choose_path.md
new file mode 100644
index 000000000..47b7cbfd4
--- /dev/null
+++ b/documents/docs/choose_path.md
@@ -0,0 +1,445 @@
+# Choosing Your Path: UFO² or UFO³ Galaxy?
+
+Not sure which UFO framework to use? This guide will help you make the right choice based on your specific needs.
+
+---
+
+## 🗺️ Quick Decision Tree
+
+Use this interactive flowchart to find the best solution for your use case:
+
+
+```mermaid
+graph TD
+ Start[What are you trying to automate?] --> Q1{Involves multiple devices/platforms?}
+
+ Q1 -->|Yes| Q2{Need parallel execution across devices?}
+ Q1 -->|No| Q3{Complex multi-app workflow on Windows?}
+
+ Q2 -->|Yes| Galaxy[✨ Use UFO³ Galaxy]
+ Q2 -->|No, sequential| Q4{Can tasks run independently?}
+
+ Q4 -->|Yes, independent| UFO2_Multi[Use UFO² on each device separately]
+ Q4 -->|No, dependencies| Galaxy
+
+ Q3 -->|Yes| UFO2[🪟 Use UFO²]
+ Q3 -->|No, simple task| UFO2
+
+ Q3 -->|Might scale later| Hybrid[Use UFO² now, Galaxy-ready setup]
+
+ Galaxy --> GalaxyDoc[📖 See Galaxy Quick Start]
+ UFO2 --> UFO2Doc[📖 See UFO² Quick Start]
+ UFO2_Multi --> UFO2Doc
+ Hybrid --> MigrationDoc[📖 See Migration Guide]
+
+ style Galaxy fill:#fff9c4
+ style UFO2 fill:#c8e6c9
+ style UFO2_Multi fill:#c8e6c9
+ style Hybrid fill:#e1bee7
+
+ click GalaxyDoc "./getting_started/quick_start_galaxy.md"
+ click UFO2Doc "./getting_started/quick_start_ufo2.md"
+ click MigrationDoc "./getting_started/migration_ufo2_to_galaxy.md"
+```
+
+---
+
+## 📊 Quick Comparison Matrix
+
+| Dimension | UFO² Desktop AgentOS | UFO³ Galaxy |
+|-----------|---------------------|-------------|
+| **Target Scope** | Single Windows desktop | Multiple devices (Windows/Linux/macOS) |
+| **Best For** | Simple local automation | Complex cross-device workflows |
+| **Setup Complexity** | ⭐ Simple | ⭐⭐⭐ Moderate (requires device pool) |
+| **Learning Curve** | ⭐⭐ Easy | ⭐⭐⭐⭐ Advanced |
+| **Execution Model** | Sequential multi-app | Parallel DAG orchestration |
+| **Network Required** | ❌ No | ✅ Yes (WebSocket between devices) |
+| **Parallelism** | Within single device | Across multiple devices |
+| **Fault Tolerance** | Retry on same device | Retry + task migration |
+| **Typical Latency** | 10-30s (local) | 20-60s (includes orchestration) |
+| **Ideal Task Count** | 1-5 steps | 5-20+ steps with dependencies |
+
+**Quick Rule of Thumb:**
+- **1 device + simple workflow** → UFO²
+- **2+ devices OR complex dependencies** → Galaxy
+- **Not sure?** → Start with UFO², migrate later ([Migration Guide](./getting_started/migration_ufo2_to_galaxy.md))
+
+---
+
+## 🎯 Scenario-Based Recommendations
+
+### Scenario 1: Desktop Productivity Automation
+
+**Task:** "Create a weekly report: extract data from Excel, generate charts in PowerPoint, send via Outlook"
+
+**Recommendation:** ✅ **UFO²**
+
+**Why:**
+- All applications on one Windows desktop
+- Sequential workflow (Excel → PowerPoint → Outlook)
+- No cross-device dependencies
+
+**Learn More:** [UFO² Overview](./ufo2/overview.md)
+
+---
+
+### Scenario 2: Development Workflow Automation
+
+**Task:** "Clone repo on my laptop, build Docker image on GPU server, run tests on CI cluster, open results on my desktop"
+
+**Recommendation:** ✅ **UFO³ Galaxy**
+
+**Why:**
+- Spans four devices (laptop, GPU server, CI cluster, desktop)
+- Sequential dependencies (clone → build → test → display)
+- Requires device coordination and data transfer
+
+**Learn More:** [Galaxy Overview](./galaxy/overview.md)
+
+---
+
+### Scenario 3: Batch Data Processing
+
+**Task:** "Process 100 files: fetch from cloud, clean data, run ML model, save results"
+
+**Recommendation:** **Depends on setup**
+
+| Setup | Recommendation | Why |
+|-------|---------------|-----|
+| **Single powerful workstation** | ✅ UFO² | All processing on one machine, simpler |
+| **Distributed cluster** | ✅ Galaxy | Parallel processing across nodes, faster |
+| **Mix (local + cloud GPU)** | ✅ Galaxy | Heterogeneous resources |
+
+**Learn More:**
+- [UFO² for Single Device](./getting_started/quick_start_ufo2.md)
+- [Galaxy for Distributed](./getting_started/quick_start_galaxy.md)
+
+---
+
+### Scenario 4: Cross-Platform Testing
+
+**Task:** "Test web app on Windows Chrome, Linux Firefox, and macOS Safari"
+
+**Recommendation:** ✅ **UFO³ Galaxy**
+
+**Why:**
+- Requires 3 different OS platforms
+- Parallel execution saves time
+- Centralized result aggregation
+
+**Learn More:** [Galaxy Multi-Platform Support](./galaxy/overview.md#cross-device-collaboration)
+
+---
+
+### Scenario 5: File Management & Organization
+
+**Task:** "Organize Downloads folder by file type, compress old files, upload to cloud"
+
+**Recommendation:** ✅ **UFO²**
+
+**Why:**
+- Single-device local file operations
+- No network dependencies
+- Simple sequential workflow
+
+**Learn More:** [UFO² Quick Start](./getting_started/quick_start_ufo2.md)
+
+---
+
+### Scenario 6: Multi-Stage Data Pipeline
+
+**Task:** "Collect logs from 5 Linux servers, aggregate on central server, analyze, generate dashboard on Windows"
+
+**Recommendation:** ✅ **UFO³ Galaxy**
+
+**Why:**
+- Multiple source devices (5 Linux servers)
+- Parallel log collection (up to 5x faster than collecting from each server sequentially)
+- Cross-platform (Linux → Windows)
+- Complex dependency graph
+
+**Learn More:** [Galaxy Task Constellation](./galaxy/constellation/overview.md)
+
+---
+
+### Scenario 7: Learning Agent Development
+
+**Task:** "I'm new to agent development and want to learn by building simple automation"
+
+**Recommendation:** ✅ **UFO²**
+
+**Why:**
+- Simpler architecture (easier to understand)
+- Faster feedback loop (local execution)
+- Comprehensive documentation and examples
+- Can upgrade to Galaxy later
+
+**Learn More:** [UFO² Quick Start](./getting_started/quick_start_ufo2.md)
+
+---
+
+### Scenario 8: Enterprise Workflow Integration
+
+**Task:** "Integrate with existing CI/CD pipeline across dev laptops, build servers, and test farms"
+
+**Recommendation:** ✅ **UFO³ Galaxy**
+
+**Why:**
+- Enterprise-scale device coordination
+- Fault tolerance with automatic recovery
+- Formal safety guarantees for correctness
+- Supports heterogeneous infrastructure
+
+**Learn More:** [Galaxy Architecture](./galaxy/overview.md#architecture)
+
+---
+
+## 🔀 Hybrid Approaches
+
+You don't have to choose just one! Here are common hybrid patterns:
+
+### Pattern 1: UFO² as Galaxy Device
+
+**Setup:** Run UFO² as a Galaxy device (requires both server and client)
+
+```bash
+# Terminal 1: Start UFO² Server on Windows desktop
+python -m ufo.server.app --port 5000
+
+# Terminal 2: Start UFO² Client (connect to server)
+python -m ufo.client.client --ws --ws-server ws://localhost:5000/ws --client-id my_windows_device --platform windows
+```
+
+**Benefits:**
+- Keep UFO² for local Windows expertise
+- Gain Galaxy's cross-device orchestration
+- Best of both worlds
+
+**Learn More:** [UFO² as Galaxy Device](./ufo2/as_galaxy_device.md)
+
+---
+
+### Pattern 2: Gradual Migration
+
+**Strategy:** Start with UFO² for immediate needs, prepare for Galaxy expansion
+
+**Phase 1:** Use UFO² standalone
+```bash
+python -m ufo --task "Your current task"
+```
+
+**Phase 2:** Make UFO² Galaxy-compatible
+```yaml
+# config/galaxy/devices.yaml (prepare in advance)
+devices:
+  - device_id: "my_windows"
+    server_url: "ws://localhost:5000/ws"  # Where UFO client connects to UFO server
+    os: "windows"
+    capabilities: ["office", "web"]
+```
+
+**Phase 3:** Start UFO device agent and connect to Galaxy
+```bash
+# Terminal 1: Start UFO Server on your Windows machine
+python -m ufo.server.app --port 5000
+
+# Terminal 2: Start UFO Client (connects to UFO server above)
+python -m ufo.client.client --ws --ws-server ws://localhost:5000/ws --client-id my_windows --platform windows
+
+# Terminal 3: Start Galaxy (on control machine, can be same or different)
+python -m galaxy --request "Cross-device workflow"
+```
+
+**Learn More:** [Migration Guide](./getting_started/migration_ufo2_to_galaxy.md)
+
+---
+
+### Pattern 3: Domain-Specific Split
+
+**Strategy:** Use different frameworks for different workflow types
+
+| Workflow Type | Framework | Example |
+|--------------|-----------|---------|
+| **Daily desktop tasks** | UFO² | Email processing, document creation |
+| **Development workflows** | Galaxy | Code build → test → deploy |
+| **Data processing** | Galaxy (if distributed) | Multi-node ML training |
+| **Quick automation** | UFO² | One-off tasks |
+
+**Learn More:** [When to Use Which](./getting_started/migration_ufo2_to_galaxy.md#when-to-use-which)
+
+---
+
+## 🚫 Common Misconceptions
+
+### Misconception 1: "Galaxy is always better because it's newer"
+
+**Reality:** UFO² is better for simple single-device tasks due to:
+- Lower latency (no network overhead)
+- Simpler setup and debugging
+- Battle-tested stability
+
+**Use Galaxy only when you actually need multi-device orchestration.**
+
+---
+
+### Misconception 2: "I need to rewrite everything to migrate to Galaxy"
+
+**Reality:** UFO² can run as a Galaxy device with minimal changes:
+```bash
+# Terminal 1: Start UFO Server
+python -m ufo.server.app --port 5000
+
+# Terminal 2: Start UFO Client in WebSocket mode
+python -m ufo.client.client --ws --ws-server ws://localhost:5000/ws --client-id my_device --platform windows
+```
+
+**Learn More:** [Migration Guide](./getting_started/migration_ufo2_to_galaxy.md#option-2-convert-ufo2-instance-to-galaxy-device)
+
+---
+
+### Misconception 3: "Galaxy can't run on a single device"
+
+**Reality:** Galaxy works perfectly on one device if you need:
+- DAG-based workflow planning
+- Advanced monitoring and trajectory reports
+- Preparation for future multi-device expansion
+
+```yaml
+# Single-device Galaxy setup
+devices:
+  - device_id: "localhost"
+    server_url: "ws://localhost:5005/ws"
+```
+
+---
+
+### Misconception 4: "UFO² is deprecated in favor of Galaxy"
+
+**Reality:** UFO² is actively maintained and recommended for single-device use:
+- More efficient for local tasks
+- Simpler for beginners
+- Core component when used as Galaxy device
+
+**Both frameworks are complementary, not competing.**
+
+---
+
+## 🎓 Learning Paths
+
+### For Beginners
+
+**Week 1-2: Start with UFO²**
+1. [UFO² Quick Start](./getting_started/quick_start_ufo2.md)
+2. Build simple automation (file management, email, etc.)
+3. Understand HostAgent/AppAgent architecture
+
+**Week 3-4: Explore Advanced UFO²**
+4. [Hybrid GUI-API Actions](./ufo2/core_features/hybrid_actions.md)
+5. [MCP Server Integration](./mcp/overview.md)
+6. [Customization & Learning](./ufo2/advanced_usage/customization.md)
+
+**Week 5+: Graduate to Galaxy (if needed)**
+7. [Migration Guide](./getting_started/migration_ufo2_to_galaxy.md)
+8. [Galaxy Quick Start](./getting_started/quick_start_galaxy.md)
+9. Build cross-device workflows
+
+---
+
+### For Experienced Developers
+
+**Direct to Galaxy** if you already know you need multi-device:
+1. [Galaxy Quick Start](./getting_started/quick_start_galaxy.md)
+2. [Task Constellation Concepts](./galaxy/constellation/overview.md)
+3. [ConstellationAgent Deep Dive](./galaxy/constellation_agent/overview.md)
+4. [Performance Monitoring](./galaxy/evaluation/performance_metrics.md)
+
+---
+
+## 📋 Decision Checklist
+
+Still unsure? Answer these questions:
+
+**Q1: Does your workflow involve 2+ physical devices?**
+
+- ✅ Yes → **Galaxy**
+- ❌ No → Continue to Q2
+
+**Q2: Do you need parallel execution across different machines?**
+
+- ✅ Yes → **Galaxy**
+- ❌ No → Continue to Q3
+
+**Q3: Does your workflow have complex dependencies (DAG structure)?**
+
+- ✅ Yes, complex DAG → **Galaxy**
+- ❌ No, simple sequence → Continue to Q4
+
+**Q4: Are you comfortable with distributed systems concepts?**
+
+- ✅ Yes → **Galaxy** (if any of Q1-Q3 is yes)
+- ❌ No → **UFO²** (learn basics first)
+
+**Q5: Do you need cross-platform support (Windows + Linux)?**
+
+- ✅ Yes → **Galaxy**
+- ❌ No, Windows only → **UFO²**
+
+---
+
+**Result:**
+
+- **3+ "Galaxy" answers** → Use Galaxy ([Quick Start](./getting_started/quick_start_galaxy.md))
+- **Mostly "UFO²" answers** → Use UFO² ([Quick Start](./getting_started/quick_start_ufo2.md))
+- **Mixed answers** → Start with UFO², keep Galaxy option open ([Migration Guide](./getting_started/migration_ufo2_to_galaxy.md))
+
+---
+
+## 🔗 Next Steps
+
+### If you chose UFO²:
+1. 📖 [UFO² Quick Start Guide](./getting_started/quick_start_ufo2.md)
+2. 🎯 [UFO² Overview & Architecture](./ufo2/overview.md)
+3. 🛠️ [Configuration Guide](./configuration/system/overview.md)
+
+### If you chose Galaxy:
+1. 📖 [Galaxy Quick Start Guide](./getting_started/quick_start_galaxy.md)
+2. 🎯 [Galaxy Overview & Architecture](./galaxy/overview.md)
+3. 🌟 [Task Constellation Concepts](./galaxy/constellation/overview.md)
+
+### If you're still exploring:
+1. 📊 [Detailed Comparison](./getting_started/migration_ufo2_to_galaxy.md#when-to-use-which)
+2. 🎬 [Demo Video](https://www.youtube.com/watch?v=QT_OhygMVXU)
+3. 📄 [Research Paper](https://arxiv.org/abs/2504.14603)
+
+---
+
+## 💡 Pro Tips
+
+!!! tip "Start Simple"
+    When in doubt, start with **UFO²**. It's easier to scale up to Galaxy later than to debug a complex Galaxy setup when you don't need it.
+
+!!! tip "Hybrid is Valid"
+    Don't feel locked into one choice. You can use **UFO² for local tasks** and **Galaxy for cross-device workflows** simultaneously.
+
+!!! tip "Test Before Committing"
+    Try both for a simple workflow to see which feels more natural for your use case:
+    ```bash
+    # UFO² test
+    python -m ufo --task "Create test report"
+
+    # Galaxy test
+    python -m galaxy --request "Create test report"
+    ```
+
+!!! warning "Network Requirements"
+    Galaxy requires **stable network connectivity** between devices. If your environment has network restrictions, UFO² might be more reliable.
+
+---
+
+## 🤝 Getting Help
+
+- **Documentation:** [https://microsoft.github.io/UFO/](https://microsoft.github.io/UFO/)
+- **GitHub Issues:** [https://github.com/microsoft/UFO/issues](https://github.com/microsoft/UFO/issues)
+- **Discussions:** [https://github.com/microsoft/UFO/discussions](https://github.com/microsoft/UFO/discussions)
+
+Still have questions? Check the [Migration FAQ](./getting_started/migration_ufo2_to_galaxy.md#getting-help) or open a discussion on GitHub!
diff --git a/documents/docs/client/computer.md b/documents/docs/client/computer.md
new file mode 100644
index 000000000..16d98ac18
--- /dev/null
+++ b/documents/docs/client/computer.md
@@ -0,0 +1,583 @@
+# Computer
+
+The **Computer** class is the core execution layer of the UFO client. It manages MCP (Model Context Protocol) tool execution, maintains tool registries, and provides thread-isolated execution for reliability. Each Computer instance represents a distinct execution context with its own namespace and resource management.
+
+## Architecture Overview
+
+The Computer layer provides the execution engine for MCP tools with three main components:
+
+```mermaid
+graph TB
+ CommandRouter["CommandRouter Command Routing"]
+ ComputerManager["ComputerManager Instance Management"]
+ Computer["Computer Core Execution Layer"]
+ MCPServerManager["MCP Server Manager Process Isolation"]
+
+ CommandRouter -->|Routes To| ComputerManager
+ ComputerManager -->|Creates & Manages| Computer
+ Computer -->|Data Collection| DataServers["Data Collection Servers screenshot, ui_detection, etc."]
+ Computer -->|Actions| ActionServers["Action Servers gui_automation, file_operations, etc."]
+ Computer -->|Uses| ToolsRegistry["Tools Registry tool_type::tool_name → MCPToolCall"]
+ Computer -->|Provides| MetaTools["Meta Tools list_tools built-in introspection"]
+ Computer -->|Delegates To| MCPServerManager
+```
+
+- **Computer** manages MCP tool execution with thread isolation and timeout control (6000-second timeout, 10-worker thread pool).
+- **ComputerManager** handles multiple Computer instances with namespace-based routing.
+- **CommandRouter** routes and executes commands across Computer instances with early-exit support.
+
+### Key Responsibilities
+
+- **Tool Registration**: Register tools from multiple MCP servers with namespace isolation
+- **Command Routing**: Convert high-level commands to MCP tool calls
+- **Execution Management**: Execute tools in isolated thread pools with timeout protection
+- **Meta Tools**: Provide introspection capabilities (e.g., `list_tools`)
+
+## Core Components
+
+### 1. Computer Class
+
+The `Computer` class manages a single logical computer with its own set of MCP servers and tools.
+
+#### Key Attributes
+
+| Attribute | Type | Description |
+|-----------|------|-------------|
+| `_name` | `str` | Unique identifier for the computer instance |
+| `_process_name` | `str` | Associated process name for MCP server isolation |
+| `_data_collection_servers` | `Dict[str, BaseMCPServer]` | Servers for data collection (screenshot, UI detection, etc.) |
+| `_action_servers` | `Dict[str, BaseMCPServer]` | Servers for actions (GUI automation, file operations, etc.) |
+| `_tools_registry` | `Dict[str, MCPToolCall]` | Registry of all available tools (key: `tool_type::tool_name`) |
+| `_meta_tools` | `Dict[str, Callable]` | Built-in introspection tools |
+| `_executor` | `ThreadPoolExecutor` | Thread pool for isolated tool execution (10 workers) |
+| `_tool_timeout` | `int` | Tool execution timeout (6000 seconds = 100 minutes) |
+
+#### Tool Namespaces
+
+Computer supports two types of tool namespaces:
+
+- **`data_collection`**: Tools for gathering information (non-destructive operations)
+- **`action`**: Tools for performing actions (state-changing operations)
+
+```python
+# Tool key format: "tool_type::tool_name"
+"data_collection::screenshot" # Take screenshot
+"data_collection::ui_detection" # Detect UI elements
+"action::click" # Click UI element
+"action::type_text" # Type text
+```
+
+> **Note:** Different namespaces allow the same tool name to exist in both data collection and action contexts. For example, both `data_collection::get_file_info` and `action::get_file_info` can coexist.
+
+### 2. ComputerManager Class
+
+The `ComputerManager` creates and manages multiple `Computer` instances based on agent configurations.
+
+#### Computer Instance Key
+
+Each computer instance is identified by a unique key:
+
+```python
+key = f"{agent_name}::{process_name}::{root_name}"
+```
+
+**Example:**
+```python
+"host_agent::chrome::default" # Default chrome computer for host_agent
+"host_agent::vscode::custom_config" # Custom VSCode computer for host_agent
+```
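+
+As a minimal sketch of this key format, the helpers below build and split a key. The names `make_computer_key` and `split_computer_key` are hypothetical and chosen for illustration only; they are not part of the UFO codebase, and the real `ComputerManager` may construct its keys differently.
+
+```python
+from typing import Tuple
+
+KEY_SEPARATOR = "::"  # separator used in the key format shown above
+
+
+def make_computer_key(agent_name: str, process_name: str, root_name: str) -> str:
+    """Build an "agent::process::root" key (hypothetical helper)."""
+    return KEY_SEPARATOR.join((agent_name, process_name, root_name))
+
+
+def split_computer_key(key: str) -> Tuple[str, str, str]:
+    """Split a key back into its three parts; raise ValueError if malformed."""
+    parts = key.split(KEY_SEPARATOR)
+    if len(parts) != 3:
+        raise ValueError(f"Expected 3 parts in computer key, got {len(parts)}: {key!r}")
+    agent_name, process_name, root_name = parts
+    return agent_name, process_name, root_name
+
+
+key = make_computer_key("host_agent", "chrome", "default")
+print(key)  # host_agent::chrome::default
+print(split_computer_key(key))  # ('host_agent', 'chrome', 'default')
+```
+
+Note that this sketch assumes none of the three parts contains `::` itself; a name that did would make the key ambiguous.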
+
+#### Configuration Structure
+
+```yaml
+mcp:
+  host_agent:
+    default:
+      data_collection:
+        - namespace: "screenshot"
+          server_type: "local"
+          module: "ufo.client.mcp.local_servers.screenshot"
+          reset: false
+        - namespace: "ui_detection"
+          server_type: "local"
+          module: "ufo.client.mcp.local_servers.ui_detection"
+          reset: false
+      action:
+        - namespace: "gui_automation"
+          server_type: "local"
+          module: "ufo.client.mcp.local_servers.gui_automation"
+          reset: false
+```
+
+**Configuration Requirements**
+
+- Each agent must have at least a `default` root configuration
+- If `root_name` is not found, the manager falls back to `default`
+- Missing configurations will raise a `ValueError`
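+
+The lookup rules above can be sketched as follows. This is an illustration under stated assumptions, not the actual `ComputerManager` code: the function name `resolve_root_config` and the shape of `mcp_config` (the contents of the `mcp` section, keyed by agent name) are hypothetical.
+
+```python
+from typing import Any, Dict
+
+
+def resolve_root_config(
+    mcp_config: Dict[str, Dict[str, Any]], agent_name: str, root_name: str
+) -> Dict[str, Any]:
+    """Return the config for root_name, falling back to "default" (hypothetical helper)."""
+    agent_config = mcp_config.get(agent_name)
+    if agent_config is None:
+        raise ValueError(f"No MCP configuration found for agent '{agent_name}'")
+    if root_name in agent_config:
+        return agent_config[root_name]
+    if "default" in agent_config:
+        # Requested root is unknown: fall back to the agent's default root
+        return agent_config["default"]
+    raise ValueError(
+        f"Agent '{agent_name}' has neither '{root_name}' nor a 'default' configuration"
+    )
+
+
+# Usage: "custom_config" is not defined, so the default root is returned
+mcp_config = {"host_agent": {"default": {"data_collection": [], "action": []}}}
+config = resolve_root_config(mcp_config, "host_agent", "custom_config")
+print(config)  # {'data_collection': [], 'action': []}
+```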
+
+### 3. CommandRouter Class
+
+The `CommandRouter` executes commands on the appropriate `Computer` instance by routing through the `ComputerManager`.
+
+#### Execution Flow
+
+```mermaid
+graph LR
+ Command --> CommandRouter
+ CommandRouter --> ComputerManager
+ ComputerManager -->|get_or_create| Computer
+ Computer -->|command2tool| ToolCall[MCPToolCall]
+ ToolCall -->|run_actions| Result[MCP Tool Result]
+```
+
+## Initialization
+
+### Computer Initialization
+
+```python
+from ufo.client.computer import Computer
+from ufo.client.mcp.mcp_server_manager import MCPServerManager
+
+# Create MCP server manager
+mcp_manager = MCPServerManager()
+
+# Initialize computer
+computer = Computer(
+    name="my_computer",
+    process_name="my_process",
+    mcp_server_manager=mcp_manager,
+    data_collection_servers_config=[
+        {
+            "namespace": "screenshot",
+            "server_type": "local",
+            "module": "ufo.client.mcp.local_servers.screenshot"
+        }
+    ],
+    action_servers_config=[
+        {
+            "namespace": "gui_automation",
+            "server_type": "local",
+            "module": "ufo.client.mcp.local_servers.gui_automation"
+        }
+    ]
+)
+
+# Async initialization (required)
+await computer.async_init()
+```
+
+> **⚠️ Important:** You **must** call `await computer.async_init()` after creating a `Computer` instance. This registers all MCP servers and their tools asynchronously.
+
+### ComputerManager Initialization
+
+```python
+import yaml
+
+from ufo.client.computer import ComputerManager
+
+# Load configuration
+with open("config.yaml") as f:
+    configs = yaml.safe_load(f)
+
+# Create manager (mcp_manager is an MCPServerManager instance, created as in the previous example)
+manager = ComputerManager(
+    configs=configs,
+    mcp_server_manager=mcp_manager
+)
+
+# Get or create computer instance
+computer = await manager.get_or_create(
+    agent_name="host_agent",
+    process_name="chrome",
+    root_name="default"
+)
+```
+
+## Tool Execution
+
+### Basic Tool Execution
+
+```python
+from aip.messages import MCPToolCall
+
+# Create tool call
+tool_call = MCPToolCall(
+    tool_key="data_collection::screenshot",
+    tool_name="screenshot",
+    parameters={"region": "full_screen"}
+)
+
+# Execute tool
+results = await computer.run_actions([tool_call])
+
+# Check result
+if results[0].is_error:
+    print(f"Error: {results[0].content}")
+else:
+    print(f"Success: {results[0].data}")
+```
+
+### Command to Tool Conversion
+
+The `command2tool()` method converts high-level `Command` objects to `MCPToolCall` objects:
+
+```python
+from aip.messages import Command
+
+# Create command
+command = Command(
+ tool_name="screenshot",
+ tool_type="data_collection",
+ parameters={"region": "active_window"}
+)
+
+# Convert to tool call
+tool_call = computer.command2tool(command)
+
+# Execute
+results = await computer.run_actions([tool_call])
+```
+
+If `tool_type` is not specified in the command, the `command2tool()` method will automatically detect whether the tool is registered as `data_collection` or `action`.
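The lookup described above can be sketched as follows. This is a minimal illustration, assuming the registry is a plain dict keyed by `tool_type::tool_name`; the standalone helpers `make_tool_key` and `resolve_tool_key` and the probing order (`data_collection` before `action`) are assumptions for the example, not the actual `command2tool()` implementation.

```python
from typing import Dict, Optional

# Assumed probing order when no tool_type is given (hypothetical).
TOOL_TYPES = ("data_collection", "action")


def make_tool_key(tool_type: str, tool_name: str) -> str:
    """Build a registry key in the documented `tool_type::tool_name` format."""
    return f"{tool_type}::{tool_name}"


def resolve_tool_key(
    registry: Dict[str, object], tool_name: str, tool_type: Optional[str] = None
) -> str:
    """Return the registry key for a tool, inferring tool_type when omitted."""
    candidates = [tool_type] if tool_type else list(TOOL_TYPES)
    for candidate in candidates:
        key = make_tool_key(candidate, tool_name)
        if key in registry:
            return key
    raise ValueError(f"Tool '{tool_name}' is not registered")


# Stand-in registry; real entries would hold tool metadata, not bare objects.
registry = {"data_collection::screenshot": object(), "action::click": object()}
print(resolve_tool_key(registry, "click"))
print(resolve_tool_key(registry, "screenshot", "data_collection"))
```

Raising `ValueError` for an unknown tool mirrors the "Tool not registered" case handled in the error-handling section of this page.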
+
+### Batch Tool Execution
+
+```python
+# Execute multiple tools sequentially
+tool_calls = [
+ MCPToolCall(tool_key="data_collection::screenshot", tool_name="screenshot"),
+ MCPToolCall(tool_key="data_collection::ui_detection", tool_name="detect_ui"),
+ MCPToolCall(tool_key="action::click", tool_name="click", parameters={"x": 100, "y": 200})
+]
+
+results = await computer.run_actions(tool_calls)
+
+for i, result in enumerate(results):
+ print(f"Tool {i}: {'Success' if not result.is_error else 'Failed'}")
+```
+
+## Thread Isolation & Timeout
+
+### Why Thread Isolation?
+
+MCP tools may contain **blocking operations** (e.g., `time.sleep()`, synchronous I/O) that can block the event loop and cause WebSocket disconnections. To prevent this:
+
+1. Each tool call runs in a **separate thread** with its own event loop
+2. The thread pool has **10 concurrent workers**
+3. Each tool call has a **timeout of 6000 seconds** (100 minutes)
+
+### Implementation Details
+
+```python
+def _call_tool_in_thread():
+    """Execute MCP tool call in isolated thread with its own event loop."""
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+    try:
+        async def _do_call():
+            async with Client(server) as client:
+                return await client.call_tool(
+                    name=tool_name,
+                    arguments=params,
+                    raise_on_error=False
+                )
+        return loop.run_until_complete(_do_call())
+    finally:
+        loop.close()
+
+# Execute in thread pool with timeout
+result = await asyncio.wait_for(
+ loop.run_in_executor(self._executor, _call_tool_in_thread),
+ timeout=self._tool_timeout
+)
+```
+
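+If a tool call exceeds 6000 seconds, the caller stops waiting and a timeout error is returned. Python cannot forcibly interrupt a running worker thread, so a truly stuck tool may keep occupying a pool slot: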
+
+```python
+CallToolResult(
+ is_error=True,
+ content=[TextContent(text="Tool execution timed out after 6000s")]
+)
+```
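The pattern can be tried in isolation with nothing but the standard library. The sketch below is an assumption-laden stand-in: `slow_tool`, `_run_in_own_loop`, and `call_with_timeout` are hypothetical names, a plain dict replaces `CallToolResult`, and the timeouts are shortened so the example finishes quickly.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Mirrors the documented pool size; the prefix only helps when debugging.
_executor = ThreadPoolExecutor(max_workers=10, thread_name_prefix="mcp_tool_")


async def slow_tool(delay: float) -> str:
    """Hypothetical tool that blocks its event loop on purpose."""
    time.sleep(delay)
    return f"done after {delay}s"


def _run_in_own_loop(coro_factory):
    """Run a coroutine to completion on a fresh event loop in this thread."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro_factory())
    finally:
        loop.close()


async def call_with_timeout(coro_factory, timeout: float) -> dict:
    """Offload the call to the pool and give up waiting after `timeout` seconds."""
    main_loop = asyncio.get_running_loop()
    try:
        value = await asyncio.wait_for(
            main_loop.run_in_executor(_executor, _run_in_own_loop, coro_factory),
            timeout=timeout,
        )
        return {"is_error": False, "content": value}
    except asyncio.TimeoutError:
        return {"is_error": True, "content": f"Tool execution timed out after {timeout}s"}


print(asyncio.run(call_with_timeout(lambda: slow_tool(0.01), timeout=2)))
print(asyncio.run(call_with_timeout(lambda: slow_tool(0.3), timeout=0.05)))
```

Because the blocking `time.sleep()` runs on a worker thread, the main loop stays responsive while it waits, which is the property the thread isolation is meant to provide.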
+
+## Meta Tools
+
+Meta tools are **built-in introspection tools** that provide information about the computer's capabilities.
+
+### Registering Meta Tools
+
+Use the `@Computer.meta_tool()` decorator to register a method as a meta tool:
+
+```python
+class Computer:
+    @meta_tool("list_tools")
+    async def list_tools(
+        self,
+        tool_type: Optional[str] = None,
+        namespace: Optional[str] = None,
+        remove_meta: bool = True
+    ) -> CallToolResult:
+        """List all available tools."""
+        # Implementation...
+```
+
+### Using Meta Tools
+
+```python
+# List all action tools
+tool_call = MCPToolCall(
+ tool_key="action::list_tools",
+ tool_name="list_tools",
+ parameters={"tool_type": "action"}
+)
+
+result = await computer.run_actions([tool_call])
+tools = result[0].data # List of available action tools
+```
+
+**Example:**
+
+```python
+# List all tools in "screenshot" namespace
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="data_collection::list_tools",
+ tool_name="list_tools",
+ parameters={"namespace": "screenshot", "remove_meta": True}
+ )
+])
+
+# Returns: [{"tool_name": "take_screenshot", "description": "...", ...}]
+```
+
+## Dynamic Server Management
+
+### Adding a Server
+
+```python
+from ufo.client.mcp.mcp_server_manager import BaseMCPServer
+
+# Create new MCP server
+new_server = mcp_manager.create_or_get_server(
+ mcp_config={
+ "namespace": "custom_tools",
+ "server_type": "local",
+ "module": "my_custom_mcp_server"
+ },
+ reset=False,
+ process_name="my_process"
+)
+
+# Add to computer
+await computer.add_server(
+ namespace="custom_tools",
+ mcp_server=new_server,
+ tool_type="action"
+)
+```
+
+### Removing a Server
+
+```python
+# Remove server and all its tools
+await computer.delete_server(
+ namespace="custom_tools",
+ tool_type="action"
+)
+```
+
+**Use cases for dynamic server management:**
+
+- Add specialized tools for specific tasks
+- Remove servers to reduce memory footprint
+- Hot-reload MCP servers during development
+
+## Command Routing
+
+The `CommandRouter` orchestrates command execution across multiple computers.
+
+### Basic Usage
+
+```python
+from ufo.client.computer import CommandRouter
+from aip.messages import Command, Result
+
+# Create router
+router = CommandRouter(computer_manager=manager)
+
+# Execute commands
+commands = [
+ Command(tool_name="screenshot", tool_type="data_collection"),
+ Command(tool_name="click", tool_type="action", parameters={"x": 100, "y": 200})
+]
+
+results = await router.execute(
+ agent_name="host_agent",
+ process_name="chrome",
+ root_name="default",
+ commands=commands,
+ early_exit=True # Stop on first error
+)
+
+for result in results:
+ print(f"Status: {result.status}")
+ print(f"Data: {result.data}")
+```
+
+### Error Handling
+
+```python
+# early_exit=True: Stop on first error
+results = await router.execute(
+ agent_name="host_agent",
+ process_name="chrome",
+ root_name="default",
+ commands=commands,
+ early_exit=True
+)
+
+# early_exit=False: Execute all commands even if some fail
+results = await router.execute(
+ agent_name="host_agent",
+ process_name="chrome",
+ root_name="default",
+ commands=commands,
+ early_exit=False
+)
+```
+
+> **⚠️ Warning:** When `early_exit=True`, if a command fails, subsequent commands will **not** be executed, and their results will be set to `ResultStatus.SKIPPED`.
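To make the two modes concrete, here is a small self-contained model of the behaviour. It is only a sketch: `Status` stands in for `ResultStatus` (only `SKIPPED` is named in this document; the other members are assumed), and `StepResult`, `run_all`, and the three step functions are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Status(str, Enum):
    """Stand-in for ResultStatus; SUCCESS and FAILURE are assumed names."""
    SUCCESS = "success"
    FAILURE = "failure"
    SKIPPED = "skipped"


@dataclass
class StepResult:
    name: str
    status: Status


def run_all(steps: List[Callable[[], bool]], early_exit: bool = True) -> List[StepResult]:
    """Run steps in order; after a failure, skip the rest when early_exit is set."""
    results: List[StepResult] = []
    failed = False
    for step in steps:
        if failed and early_exit:
            results.append(StepResult(step.__name__, Status.SKIPPED))
            continue
        ok = step()
        results.append(StepResult(step.__name__, Status.SUCCESS if ok else Status.FAILURE))
        failed = failed or not ok
    return results


def screenshot() -> bool:
    return True


def click() -> bool:
    return False  # simulate a failing command


def type_text() -> bool:
    return True


steps = [screenshot, click, type_text]
print([r.status.value for r in run_all(steps, early_exit=True)])
print([r.status.value for r in run_all(steps, early_exit=False)])
```

Returning one result per command, even for skipped ones, keeps the result list aligned with the command list so callers can match them by index.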
+
+## Tool Registry
+
+The tools registry maintains a mapping of all available tools.
+
+### Tool Key Format
+
+```python
+tool_key = f"{tool_type}::{tool_name}"
+
+# Examples:
+"data_collection::screenshot"
+"action::click"
+"data_collection::list_tools" # Meta tool
+```
+
+### Accessing Tools
+
+```python
+# Get tool info
+tool_info = computer._tools_registry.get("action::click")
+
+# Tool info contains:
+print(tool_info.tool_name) # "click"
+print(tool_info.tool_type) # "action"
+print(tool_info.namespace) # e.g., "gui_automation"
+print(tool_info.description) # Tool description
+print(tool_info.input_schema) # JSON schema for input parameters
+print(tool_info.mcp_server) # Reference to MCP server
+```
+
+## Best Practices
+
+### Configuration
+
+1. **Use namespaces wisely**: Group related tools under meaningful namespaces
+2. **Separate concerns**: Use `data_collection` for read-only operations, `action` for state changes
+3. **Configure timeouts**: Adjust `_tool_timeout` for long-running operations
+4. **Use default root**: Always provide a `default` root configuration as fallback
+
+### Performance Optimization
+
+1. **Register servers in parallel**: The `async_init()` method already does this via `asyncio.gather()`
+2. **Reuse Computer instances**: Let `ComputerManager` cache instances rather than creating new ones
+3. **Limit concurrent tools**: The thread pool has 10 workers; excessive parallel tools may queue
+4. **Reset servers carefully**: Setting `reset=True` in server config will restart the MCP server process
+
+### Common Pitfalls
+
+> **⚠️ Important:** Avoid these common mistakes:
+> - **Forgetting `async_init()`**: Always call after creating a `Computer` instance
+> - **Tool key collisions**: Ensure tool names are unique within each `tool_type`
+> - **Timeout too short**: Some operations (e.g., file downloads) may need longer timeouts
+> - **Blocking in meta tools**: Meta tools should be fast; avoid I/O operations
+
+## Error Handling
+
+### Tool Execution Errors
+
+```python
+import asyncio
+
+try:
+    results = await computer.run_actions([tool_call])
+    if results[0].is_error:
+        error_message = results[0].content[0].text
+        print(f"Tool error: {error_message}")
+except ValueError as e:
+    print(f"Tool not registered: {e}")
+except asyncio.TimeoutError:
+    print("Tool execution timed out")
+except Exception as e:
+    print(f"Unexpected error: {e}")
+```
+
+### Configuration Errors
+
+```python
+try:
+ computer = await manager.get_or_create(
+ agent_name="host_agent",
+ process_name="chrome",
+ root_name="invalid_root"
+ )
+except ValueError as e:
+ print(f"Configuration error: {e}")
+ # Fallback to default
+ computer = await manager.get_or_create(
+ agent_name="host_agent",
+ process_name="chrome",
+ root_name="default"
+ )
+```
+
+## Integration Points
+
+### With UFO Client
+
+The `Computer` is created and managed by the `UFOClient`:
+
+```python
+# In UFOClient
+self.command_router = CommandRouter(computer_manager)
+
+# Execute commands from server
+results = await self.command_router.execute(
+ agent_name=self.agent_name,
+ process_name=self.process_name,
+ root_name=self.root_name,
+ commands=command_list
+)
+```
+
+### With MCP Server Manager
+
+The `Computer` relies on `MCPServerManager` for server lifecycle management:
+
+```python
+# Create or get existing MCP server
+mcp_server = self.mcp_server_manager.create_or_get_server(
+ mcp_config=server_config,
+ reset=False,
+ process_name=self._process_name
+)
+```
+
+See [MCP Integration](mcp_integration.md) for more details on MCP server management.
+
+## Related Documentation
+
+- [UFO Client Overview](overview.md) - High-level client architecture
+- [UFO Client](ufo_client.md) - Command execution orchestration
+- [Computer Manager](computer_manager.md) - Multi-computer instance management
+- [MCP Integration](mcp_integration.md) - MCP server details
+- [AIP Messages](../aip/messages.md) - Command and Result message formats
diff --git a/documents/docs/client/computer_manager.md b/documents/docs/client/computer_manager.md
new file mode 100644
index 000000000..67bc888f0
--- /dev/null
+++ b/documents/docs/client/computer_manager.md
@@ -0,0 +1,559 @@
+# Computer Manager & Computer
+
+The **Computer Manager** orchestrates multiple **Computer** instances, each representing an isolated execution namespace with dedicated MCP servers and tools. This enables context-specific tool routing and fine-grained control over data collection vs. action execution.
+
+---
+
+## Overview
+
+The Computer layer consists of two components working together:
+
+- **ComputerManager**: High-level orchestrator managing multiple Computer instances
+- **Computer**: Individual execution namespace with its own MCP servers and tool registry
+
+### Computer Manager Responsibilities
+
+| Capability | Description | Implementation |
+|------------|-------------|----------------|
+| **Multi-Computer Management** | Create and manage multiple Computer instances | Per-process, per-agent namespaces |
+| **Namespace Isolation** | Separate tool namespaces for different contexts | Independent MCP servers per Computer |
+| **Command Routing** | Route commands to appropriate Computer instances | CommandRouter resolves by agent/process/root |
+| **MCP Server Configuration** | Configure data collection and action servers | Config-driven server initialization |
+| **Lifecycle Management** | Initialize, reset, and tear down Computers | Async initialization, cascading reset |
+
+### Computer (Instance) Responsibilities
+
+| Capability | Description | Implementation |
+|------------|-------------|----------------|
+| **Tool Registry** | Maintain registry of available MCP tools | `_tools_registry` dict |
+| **Tool Execution** | Execute MCP tool calls with timeout protection | Thread pool isolation (max 10 workers) |
+| **Server Management** | Manage data collection and action MCP servers | Separate namespaces |
+| **Meta Tools** | Provide built-in tools (list_tools, etc.) | Decorated meta tool methods |
+| **Async Initialization** | Initialize MCP servers asynchronously | `async_init()` |
+
+**Architectural Relationship:**
+
+```mermaid
+graph TB
+ subgraph "Computer Manager Layer"
+ CM[Computer Manager]
+ CR[Command Router]
+ end
+
+ subgraph "Computer Instances"
+ C1[Computer: default]
+ C2[Computer: notepad.exe]
+ C3[Computer: explorer.exe]
+ end
+
+ subgraph "Computer 1 Components"
+ C1 --> DC1[Data Collection Servers]
+ C1 --> AS1[Action Servers]
+ C1 --> TR1[Tool Registry]
+ C1 --> MT1[Meta Tools]
+ end
+
+ CM -->|manages| C1
+ CM -->|manages| C2
+ CM -->|manages| C3
+ CR -->|routes to| C1
+ CR -->|routes to| C2
+ CR -->|routes to| C3
+
+ style CM fill:#ffe0b2
+ style C1 fill:#bbdefb
+ style C2 fill:#bbdefb
+ style C3 fill:#bbdefb
+```
+
+---
+
+## 🏗️ Computer Manager Architecture
+
+### Computer Instance Management
+
+```mermaid
+graph LR
+ subgraph "ComputerManager"
+ Config[UFO Config]
+ Registry[Computer Registry]
+ end
+
+ subgraph "Computers"
+ Default[default_agent]
+ Proc1[notepad.exe]
+ Proc2[explorer.exe]
+ end
+
+ Config -->|creates| Default
+ Config -->|creates| Proc1
+ Config -->|creates| Proc2
+
+ Registry -->|tracks| Default
+ Registry -->|tracks| Proc1
+ Registry -->|tracks| Proc2
+
+ style Config fill:#fff3e0
+ style Registry fill:#e1f5fe
+```
+
+**Computer Namespaces:**
+
+| Namespace Type | Purpose | Example |
+|----------------|---------|---------|
+| **Data Collection** | Gathering information, non-invasive queries | Screenshots, UI element detection, app state |
+| **Action** | Performing actions, invasive operations | GUI automation, file operations, app control |
+
+Data collection tools are designed for non-invasive information gathering, while action tools have full control for state-changing operations.
+
+---
+
+## 🖥️ Computer (Instance) Architecture
+
+### Internal Structure
+
+```mermaid
+graph TB
+ subgraph "Computer Instance"
+ Init[Initialization]
+ Servers[MCP Servers]
+ Registry[Tool Registry]
+ Execution[Tool Execution]
+ end
+
+ subgraph "MCP Servers"
+ Servers --> DC[Data Collection Servers]
+ Servers --> AS[Action Servers]
+ end
+
+ subgraph "Tool Registry"
+ Registry --> TR[_tools_registry Dict]
+ TR -->|key: action::click| T1[MCPToolCall]
+ TR -->|key: data_collection::screenshot| T2[MCPToolCall]
+ TR -->|key: action::list_tools| T3[Meta Tool]
+ end
+
+ subgraph "Execution Engine"
+ Execution --> TP[Thread Pool Executor]
+ Execution --> TO[Timeout Protection]
+ TP -->|max 10 workers| Threads[Isolated Threads]
+ end
+
+ Init --> Servers
+ Servers --> Registry
+ Registry --> Execution
+
+ style Init fill:#c8e6c9
+ style Servers fill:#bbdefb
+ style Registry fill:#fff9c4
+ style Execution fill:#ffccbc
+```
+
+**Key Attributes:**
+
+| Attribute | Type | Purpose |
+|-----------|------|---------|
+| `_name` | `str` | Computer name (identifier) |
+| `_process_name` | `str` | Associated process (e.g., "notepad.exe") |
+| `_data_collection_servers` | `Dict[str, BaseMCPServer]` | Namespace → MCP server mapping (data collection) |
+| `_action_servers` | `Dict[str, BaseMCPServer]` | Namespace → MCP server mapping (actions) |
+| `_tools_registry` | `Dict[str, MCPToolCall]` | Tool key → tool info mapping |
+| `_meta_tools` | `Dict[str, Callable]` | Built-in meta tools |
+| `_executor` | `ThreadPoolExecutor` | Thread pool for tool execution (10 workers) |
+| `_tool_timeout` | `int` | Tool execution timeout: **6000 seconds (100 minutes)** |
+
+> **Note:** The tool execution timeout is 6000 seconds (100 minutes), allowing for very long-running operations while preventing indefinite hangs.
+
+---
+
+## Initialization
+
+### Computer Manager Initialization
+
+**Creating Computer Manager:**
+
+```python
+from ufo.client.computer import ComputerManager
+from ufo.client.mcp.mcp_server_manager import MCPServerManager
+from config.config_loader import get_ufo_config
+
+# 1. Get UFO configuration
+ufo_config = get_ufo_config()
+
+# 2. Initialize MCP server manager
+mcp_server_manager = MCPServerManager()
+
+# 3. Create computer manager
+computer_manager = ComputerManager(
+ ufo_config.to_dict(),
+ mcp_server_manager
+)
+```
+
+### Computer Instance Initialization
+
+**Computer Async Initialization:**
+
+```python
+computer = Computer(
+ name="default_agent",
+ process_name="explorer.exe",
+ mcp_server_manager=mcp_server_manager,
+ data_collection_servers_config=[...],
+ action_servers_config=[...]
+)
+
+# Async initialization (required)
+await computer.async_init()
+```
+
+**Initialization Flow:**
+
+```mermaid
+sequenceDiagram
+ participant Code
+ participant Computer
+ participant MCP as MCP Server Manager
+ participant Servers
+
+ Code->>Computer: __init__(name, process, configs)
+ Computer->>Computer: Create thread pool executor
+ Computer->>Computer: Register meta tools
+
+ Code->>Computer: async_init()
+ Computer->>Computer: _init_data_collection_servers()
+ Computer->>MCP: create_or_get_server(config)
+ MCP-->>Computer: BaseMCPServer
+
+ Computer->>Computer: _init_action_servers()
+ Computer->>MCP: create_or_get_server(config)
+ MCP-->>Computer: BaseMCPServer
+
+ par Register Data Collection Servers
+ Computer->>Servers: register_mcp_servers(data_collection)
+ and Register Action Servers
+ Computer->>Servers: register_mcp_servers(action)
+ end
+
+ Servers-->>Computer: Tools registered
+```
+
+**Configuration Example:**
+
+```yaml
+data_collection_servers:
+  - namespace: screenshot_collector
+    type: local
+    module: ufo.client.mcp.local_servers.screenshot_server
+    reset: false
+  - namespace: ui_collector
+    type: local
+    module: ufo.client.mcp.local_servers.ui_server
+    reset: false
+
+action_servers:
+  - namespace: gui_automator
+    type: local
+    module: ufo.client.mcp.local_servers.automation_server
+    reset: false
+```
+
+---
+
+## 🔀 Command Routing
+
+### CommandRouter
+
+The CommandRouter resolves which Computer instance should handle each command based on agent/process/root context.
+
+**Routing Signature:**
+
+```python
+async def execute(
+ self,
+ agent_name: str,
+ process_name: str,
+ root_name: str,
+ commands: List[Command]
+) -> List[Result]
+```
+
+**Routing Logic:**
+
+```mermaid
+graph TD
+ Start[Command List]
+ Start --> Resolve[Resolve Computer Instance]
+ Resolve -->|agent_name, process_name, root_name| Computer[Get/Create Computer]
+
+ Computer --> Loop[For Each Command]
+ Loop --> Parse[Parse Command to MCPToolCall]
+ Parse --> Lookup[Lookup Tool in Registry]
+
+ Lookup -->|Found| Execute[Execute Tool]
+ Lookup -->|Not Found| Error[Return Error Result]
+
+ Execute --> Timeout[Tool Execution with Timeout]
+ Timeout -->|Success| Result[Return Result]
+ Timeout -->|Timeout| TimeoutError[Timeout Error Result]
+ Timeout -->|Exception| ExecError[Execution Error Result]
+
+ Result --> Collect[Collect Results]
+ Error --> Collect
+ TimeoutError --> Collect
+ ExecError --> Collect
+
+ Collect --> Return["Return List[Result]"]
+
+ style Start fill:#e1f5fe
+ style Computer fill:#bbdefb
+ style Execute fill:#c8e6c9
+ style Collect fill:#fff9c4
+```
+
+---
+
+## 🔧 Tool Execution
+
+### Tool Execution Pipeline
+
+MCP tools run in isolated threads so that blocking calls (such as `time.sleep`) cannot stall the main event loop and cause WebSocket disconnections.
+
+**Execution Flow:**
+
+```mermaid
+sequenceDiagram
+ participant Computer
+ participant TP as Thread Pool
+ participant Thread
+ participant Loop as New Event Loop
+ participant MCP as MCP Server
+
+ Computer->>Computer: _run_action(tool_call)
+ Computer->>Computer: Lookup tool in registry
+
+ alt Meta Tool
+ Computer->>Computer: Execute meta tool directly
+ Computer-->>Computer: Result
+ else MCP Tool
+ Computer->>TP: Submit _call_tool_in_thread()
+ TP->>Thread: Execute in thread
+ Thread->>Loop: Create new event loop
+ Loop->>MCP: client.call_tool(name, params)
+
+ alt Success (within timeout)
+ MCP-->>Loop: Result
+ Loop-->>Thread: Result
+ Thread-->>TP: Result
+ TP-->>Computer: CallToolResult
+ else Timeout (> 6000s)
+ Note over Computer,MCP: Tool execution timeout
+ Computer-->>Computer: TimeoutError Result
+ else Exception
+ Note over Computer,MCP: Tool execution failed
+ Computer-->>Computer: Error Result
+ end
+ end
+```
+
+**Thread Pool Configuration:**
+
+| Parameter | Value | Purpose |
+|-----------|-------|---------|
+| `max_workers` | **10** | Maximum concurrent tool executions |
+| `thread_name_prefix` | `"mcp_tool_"` | Thread naming for debugging |
+| Timeout | **6000 seconds (100 minutes)** | Per-tool execution timeout |
+
+**Code Implementation:**
+
+```python
+def _call_tool_in_thread():
+    """
+    Execute MCP tool call in an isolated thread with its own event loop.
+    This prevents blocking operations in MCP tools from blocking the main event loop.
+    """
+    # Create a new event loop for this thread
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+    try:
+        async def _do_call():
+            async with Client(server) as client:
+                return await client.call_tool(
+                    name=tool_name, arguments=params, raise_on_error=False
+                )
+        return loop.run_until_complete(_do_call())
+    finally:
+        loop.close()
+
+# Execute in thread pool with timeout protection
+result = await asyncio.wait_for(
+ loop.run_in_executor(self._executor, _call_tool_in_thread),
+ timeout=self._tool_timeout
+)
+```
+
+---
+
+## 🛠️ Tool Registry
+
+### Tool Registration
+
+Tools are discovered from MCP servers during initialization and registered with unique keys.
+
+**Tool Key Format:**
+
+```
+<tool_type>::<tool_name>
+
+Examples:
+- action::click
+- action::type_text
+- data_collection::screenshot
+- data_collection::get_ui_elements
+```
+
+**Registration Process:**
+
+```python
+async def register_one_mcp_server(
+    self, namespace: str, tool_type: str, mcp_server: BaseMCPServer
+) -> None:
+    async with Client(mcp_server.server) as client:
+        tools = await client.list_tools()
+
+        for tool in tools:
+            tool_key = self.make_tool_key(tool_type, tool.name)
+
+            self._register_tool(
+                tool_key=tool_key,
+                tool_name=tool.name,
+                title=tool.title,
+                namespace=namespace,
+                tool_type=tool_type,
+                description=tool.description,
+                input_schema=tool.inputSchema,
+                output_schema=tool.outputSchema,
+                mcp_server=mcp_server
+            )
+```
+
+**MCPToolCall Structure:**
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `tool_key` | `str` | Unique key (e.g., "action::click") |
+| `tool_name` | `str` | Tool name (e.g., "click") |
+| `title` | `str` | Display title |
+| `namespace` | `str` | Server namespace |
+| `tool_type` | `str` | "action" or "data_collection" |
+| `description` | `str` | Tool description |
+| `input_schema` | `Dict` | Input parameters schema |
+| `output_schema` | `Dict` | Output schema |
+| `mcp_server` | `BaseMCPServer` | Reference to server |
+
+---
+
+## Meta Tools
+
+Meta tools are built-in methods decorated with `@meta_tool` that provide computer-level operations.
+
+**Example: list_tools Meta Tool**
+
+```python
+@Computer.meta_tool("list_tools")
+async def list_tools(
+    self,
+    tool_type: Optional[str] = None,
+    namespace: Optional[str] = None,
+    remove_meta: bool = True
+) -> CallToolResult:
+    """
+    Get available tools of a specific type.
+    """
+    tools = []
+
+    for tool in self._tools_registry.values():
+        if ((tool_type is None or tool.tool_type == tool_type)
+                and (namespace is None or tool.namespace == namespace)
+                and (not remove_meta or tool.tool_name not in self._meta_tools)):
+            tools.append(tool.tool_info.model_dump())
+
+    return CallToolResult(
+        content=[TextContent(type="text", text=json.dumps(tools))]
+    )
+```
+
+**Meta Tool Registration:**
+
+```python
+# In __init__:
+for attr in dir(self):
+    method = getattr(self, attr)
+    if callable(method) and hasattr(method, "_meta_tool_name"):
+        name = getattr(method, "_meta_tool_name")
+        self._meta_tools[name] = method
+```
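The decorator-plus-discovery mechanism can be reproduced in a few lines. The sketch below is a simplified, synchronous stand-in: `MiniComputer` is a hypothetical class, and the `meta_tool` function shown here only assumes that the real decorator tags methods with a `_meta_tool_name` attribute, as the registration loop above implies.

```python
from typing import Callable, Dict


def meta_tool(name: str) -> Callable:
    """Tag a method with a meta tool name so it can be discovered later."""
    def decorator(func: Callable) -> Callable:
        func._meta_tool_name = name
        return func
    return decorator


class MiniComputer:
    """Hypothetical, stripped-down stand-in for Computer."""

    def __init__(self) -> None:
        self._meta_tools: Dict[str, Callable] = {}
        # Scan every attribute and collect the methods tagged by @meta_tool.
        for attr in dir(self):
            method = getattr(self, attr)
            if callable(method) and hasattr(method, "_meta_tool_name"):
                self._meta_tools[method._meta_tool_name] = method

    @meta_tool("list_tools")
    def list_tools(self) -> list:
        """Return the names of all registered meta tools."""
        return sorted(self._meta_tools)


computer = MiniComputer()
print(computer._meta_tools["list_tools"]())
```

Tagging methods and scanning for the tag at construction time means a new meta tool only needs the decorator; no central list has to be edited.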
+
+---
+
+## 🔄 Lifecycle Management
+
+### Reset
+
+```python
+# Computer Manager reset (cascades to all computers)
+computer_manager.reset()
+
+# Computer instance reset
+computer.reset()
+```
+
+**Reset Operations:**
+
+| Component | Reset Action |
+|-----------|--------------|
+| Computer Manager | Reset all Computer instances |
+| Computer | Clear tool registry, reset MCP servers |
+| MCP Servers | Reset server state |
+
+---
+
+## Best Practices
+
+### Monitor Tool Execution Times
+
+```python
+import logging
+import time
+
+logger = logging.getLogger(__name__)
+
+start = time.perf_counter()
+result = await computer._run_action(tool_call)
+duration = time.perf_counter() - start
+if duration > 300:  # 5 minutes
+    logger.warning(f"Slow tool: {tool_call.tool_name} took {duration:.1f}s")
+```
+
+### Handle Timeouts Gracefully
+
+```python
+# 100-minute timeout is generous but not infinite
+# Design tools to complete within reasonable time
+```
+
+### Use Namespace Isolation
+
+```python
+# Separate data collection from actions
+data_tools = await computer.list_tools(tool_type="data_collection")
+action_tools = await computer.list_tools(tool_type="action")
+```
+
+---
+
+## 🚀 Next Steps
+
+👉 [Device Info Provider](./device_info.md) - System profiling
+👉 [MCP Integration](./mcp_integration.md) - MCP server details
+👉 [UFO Client](./ufo_client.md) - Execution orchestration
+👉 [Quick Start](./quick_start.md) - Get started with client
+👉 [Configuration](../configuration/system/overview.md) - UFO configuration
diff --git a/documents/docs/client/device_info.md b/documents/docs/client/device_info.md
new file mode 100644
index 000000000..90483a7d6
--- /dev/null
+++ b/documents/docs/client/device_info.md
@@ -0,0 +1,485 @@
+# 📱 Device Info Provider
+
+The **Device Info Provider** collects comprehensive system information from client devices during registration, enabling intelligent task assignment and device selection in constellation (multi-device) scenarios.
+
+Device information is proactively collected during client registration and pushed to the server, reducing latency and enabling immediate task routing decisions.
+
+---
+
+## 📋 Overview
+
+**Core Capabilities:**
+
+| Capability | Description | Use Case |
+|------------|-------------|----------|
+| **System Detection** | Auto-detect OS, version, architecture | Platform-specific task routing |
+| **Hardware Profiling** | CPU count, memory capacity | Resource-aware task assignment |
+| **Network Discovery** | Hostname, IP address | Network topology mapping |
+| **Feature Detection** | GUI, CLI, browser, office apps | Capability-based device selection |
+| **Extensibility** | Custom metadata support | Environment-specific configuration |
+
+**Supported Platforms:**
+
+| Platform | Status | Features Detected |
+|----------|--------|-------------------|
+| **Windows** | ✅ Full Support | GUI, CLI, browser, file system, office, Windows apps |
+| **Linux** | ✅ Full Support | GUI, CLI, browser, file system, office, Linux apps |
+| **macOS** | ✅ Full Support | GUI, CLI, browser, file system, office |
+| **Mobile** | 🔮 Planned | Touch, mobile apps, sensors |
+| **IoT** | 🔮 Planned | Sensors, actuators, limited resources |
+
+---
+
+## 🏗️ Architecture
+
+### DeviceSystemInfo Dataclass
+
+The device info structure captures essential information to minimize registration overhead:
+
+```mermaid
+classDiagram
+ class DeviceSystemInfo {
+ +string device_id
+ +string platform
+ +string os_version
+ +int cpu_count
+ +float memory_total_gb
+ +string hostname
+ +string ip_address
+ +List~string~ supported_features
+ +string platform_type
+ +string schema_version
+ +Dict custom_metadata
+ +to_dict() Dict
+ }
+
+ class DeviceInfoProvider {
+ +collect_system_info() DeviceSystemInfo
+ -_get_platform() string
+ -_get_os_version() string
+ -_get_cpu_count() int
+ -_get_memory_total_gb() float
+ -_get_hostname() string
+ -_get_ip_address() string
+ -_detect_features() List~string~
+ -_get_platform_type() string
+ }
+
+ DeviceInfoProvider ..> DeviceSystemInfo : creates
+```
+
+**Field Reference:**
+
+| Field | Type | Description | Example |
+|-------|------|-------------|---------|
+| `device_id` | `str` | Unique client identifier | `"device_windows_001"` |
+| `platform` | `str` | OS platform (lowercase) | `"windows"`, `"linux"`, `"darwin"` |
+| `os_version` | `str` | OS version string | `"10.0.19045"` (Windows 10) |
+| `cpu_count` | `int` | Number of CPU cores | `8` |
+| `memory_total_gb` | `float` | Total RAM in GB (rounded to 2 decimals) | `16.0` |
+| `hostname` | `str` | Network hostname | `"DESKTOP-ABC123"` |
+| `ip_address` | `str` | Local IP address | `"192.168.1.100"` |
+| `supported_features` | `List[str]` | Detected capabilities | `["gui", "cli", "browser", "office"]` |
+| `platform_type` | `str` | Device category | `"computer"`, `"mobile"`, `"web"`, `"iot"` |
+| `schema_version` | `str` | Schema version for compatibility | `"1.0"` |
+| `custom_metadata` | `Dict` | User-defined metadata | `{"environment": "production"}` |
+
+---
+
+## 🔍 Collection Process
+
+### Automatic Collection
+
+```python
+from ufo.client.device_info_provider import DeviceInfoProvider
+
+# Collect system information
+system_info = DeviceInfoProvider.collect_system_info(
+ client_id="device_windows_001",
+ custom_metadata=None # Or load from config
+)
+
+# Result: DeviceSystemInfo object
+print(system_info.platform) # "windows"
+print(system_info.cpu_count) # 8
+print(system_info.memory_total_gb) # 16.0
+print(system_info.supported_features) # ["gui", "cli", "browser", ...]
+
+# Convert to dict for transmission
+device_dict = system_info.to_dict()
+```
+
+**Collection Flow:**
+
+```mermaid
+sequenceDiagram
+ participant Client
+ participant DIP as Device Info Provider
+ participant OS as Operating System
+
+ Client->>DIP: collect_system_info(client_id, custom_metadata)
+
+ par Collect Basic Info
+ DIP->>OS: platform.system()
+ OS-->>DIP: "Windows"
+
+ DIP->>OS: platform.version()
+ OS-->>DIP: "10.0.19045"
+ and Collect Hardware Info
+ DIP->>OS: os.cpu_count()
+ OS-->>DIP: 8
+
+ DIP->>OS: psutil.virtual_memory()
+ OS-->>DIP: 16GB
+ and Collect Network Info
+ DIP->>OS: socket.gethostname()
+ OS-->>DIP: "DESKTOP-ABC123"
+
+ DIP->>OS: socket.getsockname()
+ OS-->>DIP: "192.168.1.100"
+ end
+
+ DIP->>DIP: _detect_features()
+ DIP->>DIP: _get_platform_type()
+
+ DIP-->>Client: DeviceSystemInfo
+```
+
+---
+
+## 🎯 Feature Detection
+
+### Platform-Specific Features
+
+Features are automatically detected based on the platform to enable capability-based device selection.
+
+**Windows Features:**
+
+```python
+features = [
+ "gui", # Graphical user interface
+ "cli", # Command line interface
+ "browser", # Web browser support
+ "file_system", # File system operations
+ "office", # Office applications (Word, Excel, etc.)
+ "windows_apps" # Windows-specific applications
+]
+```
+
+**Linux Features:**
+
+```python
+features = [
+ "gui", # Graphical user interface (X11/Wayland)
+ "cli", # Bash/shell
+ "browser", # Firefox, Chrome, etc.
+ "file_system", # Linux file system
+ "office", # LibreOffice, etc.
+ "linux_apps" # Linux-specific applications
+]
+```
+
+**macOS Features:**
+
+```python
+features = [
+ "gui", # macOS GUI
+ "cli", # Terminal
+ "browser", # Safari, Chrome, etc.
+ "file_system", # macOS file system
+ "office" # Office for Mac
+]
+```
+
+**Feature Detection Logic:**
+
+| Platform | Detected Features | Rationale |
+|----------|-------------------|-----------|
+| `windows`, `linux`, `darwin` | GUI, CLI, browser, file_system, office | Desktop/laptop computers have full capabilities |
+| `android`, `ios` (future) | Touch, mobile apps, camera | Mobile-specific features |
+| Custom | User-defined | Extensible via custom_metadata |
+
+---
+
+## 💡 Usage Examples
+
+### Basic Collection
+
+```python
+from ufo.client.device_info_provider import DeviceInfoProvider
+
+# Collect with auto-detection
+info = DeviceInfoProvider.collect_system_info(
+ client_id="device_001",
+ custom_metadata=None
+)
+
+print(f"Platform: {info.platform}")
+print(f"CPU Cores: {info.cpu_count}")
+print(f"Memory: {info.memory_total_gb} GB")
+print(f"Features: {', '.join(info.supported_features)}")
+```
+
+### With Custom Metadata
+
+```python
+# Add environment-specific metadata
+custom_meta = {
+ "environment": "production",
+ "datacenter": "us-east-1",
+ "role": "automation_worker",
+ "team": "qa"
+}
+
+info = DeviceInfoProvider.collect_system_info(
+ client_id="device_prod_001",
+ custom_metadata=custom_meta
+)
+
+# Custom metadata is preserved
+print(info.custom_metadata["environment"]) # "production"
+```
+
+### JSON Serialization
+
+```python
+# Convert to dictionary for transmission
+device_dict = info.to_dict()
+
+# Serialize to JSON
+import json
+json_str = json.dumps(device_dict, indent=2)
+
+# Example output:
+# {
+# "device_id": "device_001",
+# "platform": "windows",
+# "os_version": "10.0.19045",
+# "cpu_count": 8,
+# "memory_total_gb": 16.0,
+# "hostname": "DESKTOP-ABC123",
+# "ip_address": "192.168.1.100",
+# "supported_features": ["gui", "cli", "browser", "file_system", "office", "windows_apps"],
+# "platform_type": "computer",
+# "schema_version": "1.0",
+# "custom_metadata": {}
+# }
+```
+
+---
+
+## ⚠️ Error Handling
+
+### Graceful Degradation
+
+If any detection method fails, the provider returns minimal info instead of crashing.
+
+**Error Handling Strategy:**
+
+```python
+try:
+    # Attempt full collection
+    return DeviceSystemInfo(...)
+except Exception as e:
+    logger.error(f"Error collecting system info: {e}", exc_info=True)
+    # Return minimal info on error
+    return DeviceSystemInfo(
+        device_id=client_id,
+        platform="unknown",
+        os_version="unknown",
+        cpu_count=0,
+        memory_total_gb=0.0,
+        hostname="unknown",
+        ip_address="unknown",
+        supported_features=[],
+        platform_type="unknown",
+        custom_metadata=custom_metadata or {}
+    )
+```
+
+**Individual Method Failures:**
+
+| Method | Failure Behavior | Fallback Value |
+|--------|------------------|----------------|
+| `_get_platform()` | Catch exception | `"unknown"` |
+| `_get_os_version()` | Catch exception | `"unknown"` |
+| `_get_cpu_count()` | Catch exception | `0` |
+| `_get_memory_total_gb()` | psutil not installed or exception | `0.0` |
+| `_get_hostname()` | Catch exception | `"unknown"` |
+| `_get_ip_address()` | Primary method fails | Try hostname resolution, then `"unknown"` |
+
+---
+
+## 🔧 Memory Detection Details
+
+### psutil Dependency
+
+!!!warning "Optional Dependency"
+    Memory detection requires `psutil`. If it is not installed, memory is reported as `0.0`.
+
+**Installation:**
+
+```bash
+pip install psutil
+```
+
+**Detection Code:**
+
+```python
+@staticmethod
+def _get_memory_total_gb() -> float:
+    """Get total memory in GB"""
+    try:
+        import psutil
+        total_memory = psutil.virtual_memory().total
+        return round(total_memory / (1024**3), 2)  # Convert to GB, round to 2 decimals
+    except ImportError:
+        logger.warning("psutil not installed, memory info unavailable")
+        return 0.0
+    except Exception:
+        return 0.0
+```
+
+---
+
+## 🌐 IP Address Detection
+
+### Multi-Method Approach
+
+!!!tip "Robust IP Detection"
+    IP detection uses a two-stage approach for reliability.
+
+**Primary Method (Socket Connection):**
+
+```python
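+import socket
+
+# Connecting a UDP socket only selects a route; no packet is actually sent
+s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+try:
+    s.connect(("8.8.8.8", 80))  # Google DNS
+    ip = s.getsockname()[0]
+finally:
+    s.close()  # Always release the socket, even if connect() fails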
+```
+
+**Fallback Method (Hostname Resolution):**
+
+```python
+# If primary fails, resolve via hostname
+ip = socket.gethostbyname(socket.gethostname())
+```
+
+**Final Fallback:**
+
+```python
+# If all methods fail
+return "unknown"
+```
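+
+Put together, the three stages above might look like the following. This is a minimal sketch rather than the provider's actual `_get_ip_address()` body: the function name `get_ip_address` and the choice to catch `OSError` are assumptions made for illustration.
+
+```python
+import socket
+
+
+def get_ip_address() -> str:
+    """Best-effort local IP lookup: UDP socket, then hostname, then "unknown"."""
+    try:
+        # connect() on a UDP socket only selects a route; no packet is sent.
+        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
+            s.connect(("8.8.8.8", 80))
+            return s.getsockname()[0]
+    except OSError:
+        pass  # No route available; fall through to hostname resolution
+    try:
+        return socket.gethostbyname(socket.gethostname())
+    except OSError:
+        return "unknown"
+
+
+print(get_ip_address())
+```
+
+Note that the hostname fallback can return a loopback address such as `127.0.0.1` on some systems, which is why the socket method is tried first.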
+
+---
+
+## 🚀 Integration Points
+
+### WebSocket Client Registration
+
+The WebSocket client uses the Device Info Provider during registration:
+
+```python
+# In websocket client's register_client()
+from ufo.client.device_info_provider import DeviceInfoProvider
+
+system_info = DeviceInfoProvider.collect_system_info(
+ self.ufo_client.client_id,
+ custom_metadata=None
+)
+
+metadata = {
+ "system_info": system_info.to_dict(),
+ "registration_time": datetime.now(timezone.utc).isoformat()
+}
+
+await self.registration_protocol.register_as_device(
+ device_id=self.ufo_client.client_id,
+ metadata=metadata,
+ platform=self.ufo_client.platform
+)
+```
+
+See [WebSocket Client](./websocket_client.md) for complete registration flow details.
+
+### Agent Server
+
+The server receives device info during registration and stores it in the agent profile:
+
+```python
+# Server-side AgentProfile integration
+device_info = registration_data["metadata"]["system_info"]
+agent_profile.add_device(device_id, device_info)
+```
+
+See [Server Quick Start](../server/quick_start.md) for server-side processing details.
+
+---
+
+## ✅ Best Practices
+
+**1. Add Custom Metadata for Environment Tracking**
+
+```python
+import os
+
+custom_meta = {
+    "environment": os.getenv("ENVIRONMENT", "development"),
+    "version": "1.0.0",
+    "deployment_region": "us-west-2",
+    "cost_center": "engineering"
+}
+
+system_info = DeviceInfoProvider.collect_system_info(
+ client_id="device_001",
+ custom_metadata=custom_meta
+)
+```
+
+**2. Install psutil for Accurate Memory Detection**
+
+```bash
+pip install psutil
+```
+
+**3. Use Descriptive Client IDs**
+
+```python
+# Include environment and location in client_id
+client_id = f"device_{platform}_{env}_{location}_{instance_id}"
+# Example: "device_windows_prod_us-west_001"
+```
+
+**4. Log Collection Results**
+
+```python
+system_info = DeviceInfoProvider.collect_system_info(...)
+
+logger.info(
+ f"Collected device info: "
+ f"platform={system_info.platform}, "
+ f"cpu={system_info.cpu_count}, "
+ f"memory={system_info.memory_total_gb}GB, "
+ f"features={system_info.supported_features}"
+)
+```
+
+**5. Validate Before Sending**
+
+```python
+system_info = DeviceInfoProvider.collect_system_info(...)
+
+# Validate essential fields
+assert system_info.device_id, "Device ID required"
+assert system_info.platform != "unknown", "Platform detection failed"
+assert system_info.cpu_count > 0, "CPU detection failed"
+```
+
+---
+
+## 🚀 Next Steps
+
+- [WebSocket Client](./websocket_client.md) - See how device info is used in registration
+- [Quick Start](./quick_start.md) - Connect your device to the server
+- [MCP Integration](./mcp_integration.md) - Understand client tool capabilities
+- [Server Quick Start](../server/quick_start.md) - Learn server-side registration processing
diff --git a/documents/docs/client/mcp_integration.md b/documents/docs/client/mcp_integration.md
new file mode 100644
index 000000000..5170e0d6b
--- /dev/null
+++ b/documents/docs/client/mcp_integration.md
@@ -0,0 +1,433 @@
+# 🔌 MCP Integration
+
+**MCP (Model Context Protocol)** provides the tool execution layer in UFO clients, enabling agents to collect system state and execute actions through a standardized interface. This page provides a **client-focused overview** of how MCP integrates into the client architecture.
+
+**Related Documentation:**
+
+- [MCP Overview](../mcp/overview.md) - Core MCP concepts and architecture
+- [Configuration Guide](../mcp/configuration.md) - Server configuration details
+- [Data Collection Servers](../mcp/data_collection.md) - Observation tools
+- [Action Servers](../mcp/action.md) - Execution tools
+- [Creating MCP Servers](../tutorials/creating_mcp_servers.md) - Build custom tools
+
+---
+
+## 🏗️ MCP in Client Architecture
+
+### Role in the Client Stack
+
+```mermaid
+graph TB
+ Server[Agent Server via WebSocket]
+ Client[UFO Client Session Orchestration]
+ Router[Command Router Command Execution]
+ Computer[Computer MCP Tool Manager]
+ MCPMgr[MCP Server Manager Server Lifecycle]
+
+ DataServers[Data Collection Servers UICollector, etc.]
+ ActionServers[Action Servers UIExecutor, CommandLineExecutor]
+
+ Server -->|AIP Commands| Client
+ Client -->|Execute Actions| Router
+ Router -->|Route to Computer| Computer
+ Computer -->|Manage Servers| MCPMgr
+ Computer -->|Register & Execute| DataServers
+ Computer -->|Register & Execute| ActionServers
+
+ style Computer fill:#e1f5ff
+ style MCPMgr fill:#fff4e6
+ style DataServers fill:#e8f5e9
+ style ActionServers fill:#fff3e0
+```
+
+**Key Components:**
+
+| Component | Location | Responsibility |
+|-----------|----------|----------------|
+| **Computer** | `ufo.client.computer.Computer` | Manages MCP servers, routes tool calls, executes in thread pool |
+| **MCP Server Manager** | `ufo.client.mcp.mcp_server_manager.MCPServerManager` | Creates/manages server instances (local/http/stdio) |
+| **Command Router** | `ufo.client.computer.CommandRouter` | Routes commands to appropriate Computer instances |
+| **Data Collection Servers** | Various MCP servers | Tools for gathering system state (read-only) |
+| **Action Servers** | Various MCP servers | Tools for performing state changes |
+
+---
+
+## 🔄 Client-MCP Integration Flow
+
+### End-to-End Execution
+
+```mermaid
+sequenceDiagram
+ participant Server as Agent Server
+ participant Client as UFO Client
+ participant Router as Command Router
+ participant Computer as Computer
+ participant MCP as MCP Server
+
+ Server->>Client: AIP Command (tool_name, parameters)
+ Client->>Router: execute_actions(commands)
+ Router->>Computer: command2tool()
+ Computer->>Computer: Convert to MCPToolCall
+ Router->>Computer: run_actions([tool_call])
+ Computer->>MCP: call_tool(tool_name, parameters)
+ MCP-->>Computer: CallToolResult
+ Computer-->>Router: Results
+ Router-->>Client: List[Result]
+ Client-->>Server: AIP Result message
+```
+
+**Execution Stages:**
+
+| Stage | Component | Description |
+|-------|-----------|-------------|
+| **1. Command Reception** | UFO Client | Receives AIP Command from server |
+| **2. Command Routing** | Command Router | Routes to appropriate Computer instance |
+| **3. Command Conversion** | Computer | AIP Command → MCPToolCall |
+| **4. Tool Execution** | Computer | Executes tool via MCP Server |
+| **5. Result Return** | UFO Client | Packages result for server |
+
+---
+
+## 💻 Computer: The MCP Manager
+
+### Computer Class Overview
+
+The `Computer` class is the **client-side MCP manager**, handling server registration, tool discovery, and execution.
+
+**Core Responsibilities:**
+
+```python
+from ufo.client.computer import Computer
+from ufo.client.mcp.mcp_server_manager import MCPServerManager
+
+# Create the server manager that the Computer will use
+mcp_manager = MCPServerManager()
+
+# Initialize Computer with MCP servers
+computer = Computer(
+    name="notepad_computer",
+    process_name="notepad.exe",
+    mcp_server_manager=mcp_manager,
+    data_collection_servers_config=[
+        {"namespace": "UICollector", "type": "local", "reset": False}
+    ],
+    action_servers_config=[
+        {"namespace": "HostUIExecutor", "type": "local", "reset": False}
+    ]
+)
+
+# Async initialization registers all tools
+await computer.async_init()
+```
+
+**Initialization Sequence:**
+
+| Step | Action | Result |
+|------|--------|--------|
+| 1. Create MCP Server Manager | Initialize server lifecycle manager | Ready to create servers |
+| 2. Initialize data_collection servers | Register observation tools | UICollector ready |
+| 3. Initialize action servers | Register execution tools | HostUIExecutor, CommandLineExecutor ready |
+| 4. Register MCP servers | Query each server for tools | Tool registry populated |
+
+See [Computer](./computer.md) for detailed class documentation.
+
+---
+
+## 🛠️ Two Server Types
+
+### Data Collection vs Action
+
+Understanding the difference between server types is essential for proper MCP usage:
+
+**Comparison:**
+
+| Aspect | Data Collection Servers | Action Servers |
+|--------|------------------------|----------------|
+| **Purpose** | Observe system state | Modify system state |
+| **Examples** | `take_screenshot`, `detect_ui_elements` | `click`, `type_text`, `run_command` |
+| **Invocation** | LLM-selected tools | LLM-selected tools |
+| **Side Effects** | ❌ None (read-only) | ✅ Yes (state changes) |
+| **Tool Type** | `"data_collection"` | `"action"` |
+| **Tool Key Format** | `data_collection::tool_name` | `action::tool_name` |
+
+**Data Collection Example:**
+
+```python
+# Example: Take screenshot for UI analysis
+result = await computer.run_actions([
+ computer.command2tool(Command(
+ tool_name="take_screenshot",
+ tool_type="data_collection",
+ parameters={"region": "active_window"}
+ ))
+])
+```
+
+**Action Example:**
+
+```python
+# Example: Click a button
+result = await computer.run_actions([
+ computer.command2tool(Command(
+ tool_name="click",
+ tool_type="action",
+ parameters={
+ "control_text": "Save",
+ "control_type": "Button"
+ }
+ ))
+])
+```
+
+See [MCP Overview - Server Types](../mcp/overview.md#1-two-server-types) for detailed comparison.
+
+---
+
+## 📋 Server Configuration
+
+### Configuration File
+
+MCP servers are configured in `config/ufo/mcp.yaml`:
+
+```yaml
+HostAgent:
+  default:
+    data_collection:
+      - namespace: UICollector          # Server namespace
+        type: local                     # local, http, or stdio
+        reset: false                    # Reset on each step?
+
+    action:
+      - namespace: HostUIExecutor       # Server namespace
+        type: local
+        reset: false
+
+      - namespace: CommandLineExecutor  # Multiple servers allowed
+        type: local
+        reset: false
+```
+
+**Configuration Parameters:**
+
+| Parameter | Type | Description | Example |
+|-----------|------|-------------|---------|
+| `namespace` | `str` | Server identifier (must match registered name) | `"UICollector"` |
+| `type` | `str` | Deployment type: `local`, `http`, `stdio` | `"local"` |
+| `reset` | `bool` | Reset server state on each step | `false` |
+
+!!!tip "📖 Full Configuration Guide"
+    See [MCP Configuration](../mcp/configuration.md) for advanced configuration including:
+    - HTTP server endpoints
+    - Stdio server commands
+    - Custom server parameters
+    - Environment-specific configs
+
+---
+
+## 🔧 Tool Registry & Execution
+
+### Tool Discovery
+
+The Computer automatically discovers and registers tools from all configured MCP servers during initialization:
+
+**Automatic Registration:**
+
+```python
+# During computer.async_init()
+async def register_mcp_servers(self, servers, tool_type):
+    """Register tools from all MCP servers"""
+    for namespace, server in servers.items():
+        # Connect to MCP server
+        async with Client(server.server) as client:
+            # List available tools
+            tools = await client.list_tools()
+
+            # Register each tool with unique key
+            for tool in tools:
+                tool_key = self.make_tool_key(tool_type, tool.name)
+                self._tools_registry[tool_key] = MCPToolCall(
+                    tool_key=tool_key,
+                    tool_name=tool.name,
+                    title=tool.title,
+                    namespace=namespace,
+                    tool_type=tool_type,
+                    description=tool.description,
+                    input_schema=tool.inputSchema,
+                    output_schema=tool.outputSchema,
+                    mcp_server=server
+                )
+```
+
+**Tool Registry Structure:**
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `tool_key` | `str` | Unique key: `"tool_type::tool_name"` |
+| `tool_name` | `str` | Tool name (e.g., `"take_screenshot"`) |
+| `title` | `str` | Display title |
+| `namespace` | `str` | Server namespace (e.g., `"UICollector"`) |
+| `tool_type` | `str` | `"data_collection"` or `"action"` |
+| `description` | `str` | Tool description |
+| `input_schema` | `dict` | JSON schema for parameters |
+| `output_schema` | `dict` | JSON schema for results |
+| `mcp_server` | `BaseMCPServer` | Server instance |
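+
+To make the key format concrete, the sketch below builds a tiny registry keyed by `tool_type::tool_name`. It is a simplified illustration: `ToolEntry` is a hypothetical, trimmed-down stand-in for `MCPToolCall`, and this `make_tool_key` is only assumed to mirror the documented key format.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class ToolEntry:
+    """Trimmed-down stand-in for MCPToolCall (hypothetical; fields reduced)."""
+    tool_key: str
+    tool_name: str
+    namespace: str
+    tool_type: str
+
+
+def make_tool_key(tool_type: str, tool_name: str) -> str:
+    # Assumed to mirror the documented "tool_type::tool_name" format.
+    return f"{tool_type}::{tool_name}"
+
+
+registry: dict[str, ToolEntry] = {}
+for tool_type, namespace, name in [
+    ("data_collection", "UICollector", "take_screenshot"),
+    ("action", "HostUIExecutor", "click"),
+]:
+    key = make_tool_key(tool_type, name)
+    registry[key] = ToolEntry(key, name, namespace, tool_type)
+
+# Lookup is by tool type and tool name; the namespace is stored on the entry.
+print(registry["action::click"].namespace)
+```
+
+Because the key contains the tool type but not the namespace, two servers of the same type that expose a tool with the same name would collide under this scheme.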
+
+### Tool Execution
+
+Tools execute in isolated threads with timeout protection (default: 6000 seconds = 100 minutes per tool):
+
+```python
+# Thread pool configuration
+self._executor = concurrent.futures.ThreadPoolExecutor(
+ max_workers=10,
+ thread_name_prefix="mcp_tool_"
+)
+self._tool_timeout = 6000 # 100 minutes
+```
+
+See [Computer](./computer.md) for execution details.
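+
+The timeout behavior can be approximated with the standard library alone. The following is a hedged sketch, not the `Computer` implementation: `run_with_timeout` and its `(ok, value)` return shape are hypothetical, and only the pool settings are taken from the configuration above.
+
+```python
+import concurrent.futures
+import time
+
+executor = concurrent.futures.ThreadPoolExecutor(
+    max_workers=10, thread_name_prefix="mcp_tool_"
+)
+
+
+def run_with_timeout(func, timeout_s: float, *args):
+    """Run func on a worker thread; return (ok, result_or_error_message)."""
+    future = executor.submit(func, *args)
+    try:
+        return True, future.result(timeout=timeout_s)
+    except concurrent.futures.TimeoutError:
+        # The worker thread keeps running; only the wait is abandoned.
+        return False, f"timed out after {timeout_s}s"
+    except Exception as exc:
+        # A failing tool is reported as an error instead of crashing the caller.
+        return False, f"tool failed: {exc}"
+
+
+print(run_with_timeout(lambda x: x + 1, 1.0, 41))
+print(run_with_timeout(time.sleep, 0.05, 0.3))
+```
+
+Python threads cannot be forcibly stopped, so a timeout here frees the caller but not the worker. A tool that hangs still occupies one of the pool's worker threads until it returns.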
+
+---
+
+## 🚀 Integration Examples
+
+### Basic Usage
+
+```python
+from ufo.client.computer import ComputerManager, CommandRouter
+from ufo.client.mcp.mcp_server_manager import MCPServerManager
+from aip.messages import Command
+
+# Create MCP server manager
+mcp_server_manager = MCPServerManager()
+
+# Create computer manager (manages Computer instances)
+computer_manager = ComputerManager(config, mcp_server_manager)
+
+# Create command router
+command_router = CommandRouter(computer_manager)
+
+# Execute action through MCP
+command = Command(
+ tool_name="click",
+ tool_type="action",
+ parameters={
+ "control_text": "Save",
+ "control_type": "Button"
+ }
+)
+
+# Router creates Computer instance and executes
+results = await command_router.execute(
+ agent_name="HostAgent",
+ process_name="notepad.exe",
+ root_name="default",
+ commands=[command]
+)
+```
+
+### Custom MCP Server
+
+```python
+from fastmcp import FastMCP
+
+# Define custom MCP server
+mcp = FastMCP("CustomTools")
+
+@mcp.tool()
+async def custom_action(param: str) -> str:
+ """Execute custom action"""
+ return f"Executed: {param}"
+
+# Register in config/ufo/mcp.yaml:
+# action:
+# - namespace: CustomTools
+# type: local
+# reset: false
+```
+
+**For step-by-step instructions:**
+
+- [Creating MCP Servers](../tutorials/creating_mcp_servers.md) - Build your own MCP tools
+
+---
+
+## 🔗 Integration Points
+
+### With Other Client Components
+
+**UFO Client:**
+- Receives AIP Commands from server
+- Delegates to Command Router
+- Returns AIP Results
+
+**Command Router:**
+- Routes commands to appropriate Computer instance (by agent/process/root name)
+- Manages command execution with early-exit support
+
+**Computer:**
+- **MCP entry point**: Manages all MCP servers
+- Executes tools via MCP Server Manager
+- Maintains tool registry
+
+**MCP Server Manager:**
+- Creates and manages MCP server instances
+- Supports local, HTTP, and stdio deployment types
+
+See [UFO Client](./ufo_client.md) and [Computer](./computer.md) for integration details.
+
+---
+
+## 📚 Related Documentation
+
+### Client Components
+
+| Component | Description | Link |
+|-----------|-------------|------|
+| **Computer** | Core MCP execution layer | [Computer](./computer.md) |
+| **UFO Client** | Session orchestration | [UFO Client](./ufo_client.md) |
+| **WebSocket Client** | Server communication | [WebSocket Client](./websocket_client.md) |
+
+### MCP Deep Dive
+
+| Topic | Description | Link |
+|-------|-------------|------|
+| **MCP Overview** | Architecture, concepts, deployment models | [Overview](../mcp/overview.md) |
+| **Data Collection** | Observation tools (UI, screenshots, system) | [Data Collection](../mcp/data_collection.md) |
+| **Action Servers** | Execution tools (click, type, run) | [Action](../mcp/action.md) |
+| **Configuration** | YAML configuration guide | [Configuration](../mcp/configuration.md) |
+| **Local Servers** | Built-in in-process servers | [Local Servers](../mcp/local_servers.md) |
+| **Remote Servers** | HTTP/Stdio deployment | [Remote Servers](../mcp/remote_servers.md) |
+| **Creating MCP Servers** | Build your own tools | [Creating MCP Servers](../tutorials/creating_mcp_servers.md) |
+
+---
+
+## 🎯 Key Takeaways
+
+**MCP in Client - Summary**
+
+**1. Computer is the MCP Manager**
+- Manages all MCP server instances
+- Routes tool calls to appropriate servers
+- Executes in thread pool for isolation
+
+**2. Two Server Types**
+- **Data Collection**: Read-only, observation tools
+- **Action**: State-changing, execution tools
+
+**3. Configuration-Driven**
+- Servers configured in `config/ufo/mcp.yaml`
+- Supports local, HTTP, and stdio deployment
+
+**4. Automatic Registration**
+- Tools auto-discovered during initialization
+- Tool registry built from server metadata
+
+**5. Detailed Docs Available**
+- Full MCP section at [MCP Overview](../mcp/overview.md)
+- Custom server guides, examples, troubleshooting
+
+---
+
+## 🚀 Next Steps
+
+- [MCP Overview](../mcp/overview.md) - Understand MCP architecture in depth
+- [Computer](./computer.md) - See how MCP servers are managed
+- [Creating MCP Servers](../tutorials/creating_mcp_servers.md) - Build your own MCP tools
diff --git a/documents/docs/client/overview.md b/documents/docs/client/overview.md
new file mode 100644
index 000000000..816a5734f
--- /dev/null
+++ b/documents/docs/client/overview.md
@@ -0,0 +1,847 @@
+# UFO Client Overview
+
+The **UFO Client** runs on target devices and serves as the **execution layer** of UFO's distributed agent system. It manages MCP (Model Context Protocol) servers, executes commands deterministically, and communicates with the Agent Server through the Agent Interaction Protocol (AIP).
+
+**Quick Start:** Jump to the [Quick Start Guide](./quick_start.md) to connect your device. Make sure the [Agent Server](../server/quick_start.md) is running first.
+
+---
+
+## 🎯 What is the UFO Client?
+
+```mermaid
+graph LR
+ subgraph "Agent Server (Brain)"
+ Reasoning[High-Level Reasoning]
+ Planning[Task Planning]
+ Strategy[Strategy Selection]
+ end
+
+ subgraph "Agent Client (Hands)"
+ Execution[Command Execution]
+ Tools[Tool Management]
+ Reporting[Status Reporting]
+ end
+
+ subgraph "Device Environment"
+ Apps[Applications]
+ Files[File System]
+ UI[User Interface]
+ end
+
+ Reasoning -->|Directives| Execution
+ Planning -->|Commands| Execution
+ Strategy -->|Tasks| Execution
+
+ Execution --> Tools
+ Tools --> Apps
+ Tools --> Files
+ Tools --> UI
+
+ Reporting -->|Results| Reasoning
+
+ style Reasoning fill:#bbdefb
+ style Execution fill:#c8e6c9
+ style Tools fill:#fff9c4
+```
+
+**The UFO Client is a stateless execution agent that:**
+
+| Capability | Description | Benefit |
+|------------|-------------|---------|
+| **🔧 Executes Commands** | Translates server directives into concrete actions | Deterministic, reliable execution |
+| **🛠️ Manages MCP Servers** | Orchestrates local and remote tool interfaces | Extensible tool ecosystem |
+| **📊 Reports Device Info** | Provides hardware and software profile to server | Intelligent task assignment |
+| **📡 Communicates via AIP** | Maintains persistent WebSocket connection | Real-time bidirectional communication |
+| **🚫 Remains Stateless** | Executes directives without high-level reasoning | Independent updates, simple architecture |
+
+**Stateless Design Philosophy:** The client focuses purely on execution; all reasoning and decision-making happens on the server. This separation lets server logic and client tools be updated independently, keeps the client architecture simple, lets the server orchestrate multiple clients, and keeps the client's resource usage low.
+
+**Architecture:** The UFO Client is part of UFO's distributed **server-client architecture**, where it handles command execution and resource access while the [Agent Server](../server/overview.md) handles orchestration and decision-making. See [Server-Client Architecture](../infrastructure/agents/server_client_architecture.md) for the complete design rationale, communication protocols, and deployment patterns.
+
+---
+
+## 🏗️ Architecture
+
+The client implements a **layered architecture** separating communication, execution, and tool management for maximum flexibility and maintainability.
+
+```mermaid
+graph TB
+ subgraph "Communication"
+ WSC[WebSocket Client AIP Protocol]
+ end
+
+ subgraph "Orchestration"
+ UFC[UFO Client]
+ CM[Computer Manager]
+ end
+
+ subgraph "Execution"
+ COMP[Computer]
+ MCPM[MCP Manager]
+ end
+
+ subgraph "Tools"
+ LOCAL[Local MCP Servers]
+ REMOTE[Remote MCP Servers]
+ end
+
+ WSC --> UFC
+ UFC --> CM
+ CM --> COMP
+ COMP --> MCPM
+ MCPM --> LOCAL
+ MCPM --> REMOTE
+
+ style WSC fill:#bbdefb
+ style UFC fill:#c8e6c9
+ style COMP fill:#fff9c4
+ style MCPM fill:#ffcdd2
+```
+
+### Core Components
+
+| Component | Responsibility | Key Features | Documentation |
+|-----------|---------------|--------------|---------------|
+| **WebSocket Client** | AIP communication | • Connection management • Registration • Heartbeat monitoring • Message routing | [Details →](./websocket_client.md) |
+| **UFO Client** | Execution orchestration | • Command execution • Result aggregation • Error handling • Session management | [Details →](./ufo_client.md) |
+| **Computer Manager** | Multi-computer abstraction | • Computer instance management • Namespace routing • Resource isolation | [Details →](./computer_manager.md) |
+| **Computer** | Tool management | • MCP server registration • Tool registry • Execution isolation • Thread pool management | [Details →](./computer.md) |
+| **MCP Server Manager** | MCP lifecycle | • Server creation • Configuration loading • Connection pooling • Health monitoring | [MCP Documentation →](../mcp/overview.md) |
+| **Device Info Provider** | System profiling | • Hardware detection • Capability reporting • Platform identification • Feature enumeration | [Details →](./device_info.md) |
+
+For detailed component documentation:
+
+- [WebSocket Client](./websocket_client.md) - AIP protocol implementation
+- [UFO Client](./ufo_client.md) - Execution orchestration
+- [Computer Manager](./computer_manager.md) - Multi-computer management
+- [Device Info Provider](./device_info.md) - System profiling
+- [MCP Integration](../mcp/overview.md) - MCP server management (comprehensive documentation)
+
+---
+
+## 🚀 Key Capabilities
+
+### 1. Deterministic Command Execution
+
+The client executes commands **exactly as specified** without interpretation or reasoning, ensuring predictable behavior.
+
+```mermaid
+sequenceDiagram
+ participant Server
+ participant Client as UFO Client
+ participant Computer
+ participant Tool as MCP Tool
+
+ Server->>Client: COMMAND (AIP)
+ Client->>Computer: Execute Command
+ Computer->>Computer: Lookup Tool
+ Computer->>Tool: Execute with Timeout
+ Tool-->>Computer: Result
+ Computer-->>Client: Aggregated Result
+ Client-->>Server: COMMAND_RESULTS (AIP)
+```
+
+**Execution Flow:**
+
+| Step | Action | Purpose |
+|------|--------|---------|
+| 1️⃣ **Receive** | Get structured command from server via AIP | Ensure well-formed input |
+| 2️⃣ **Route** | Dispatch to appropriate computer instance | Support multi-namespace execution |
+| 3️⃣ **Lookup** | Find tool in MCP registry | Dynamic tool resolution |
+| 4️⃣ **Execute** | Run tool in isolated thread pool | Fault isolation and timeout protection |
+| 5️⃣ **Aggregate** | Combine results from multiple tools | Structured response format |
+| 6️⃣ **Return** | Send results back to server via AIP | Complete the execution loop |
+
+**Execution Guarantees:**
+- **Isolation**: Each tool call runs on a worker thread from a thread pool
+- **Timeouts**: Configurable timeout (default: 6000 seconds/100 minutes)
+- **Fault Tolerance**: One failed tool doesn't crash entire client
+- **Thread Safety**: Concurrent tool execution supported
+- **Error Reporting**: Structured errors returned to server
+
+### 2. MCP Server Management
+
+The client manages a collection of **MCP (Model Context Protocol) servers** to provide diverse tool access for automation tasks. The client is responsible for registering, managing, and executing these tools, while the [Agent Server](../server/overview.md) handles command orchestration. See [Server-Client Architecture](../infrastructure/agents/server_client_architecture.md#client-command-execution-and-resource-access) for how MCP integration fits into the overall architecture.
+
+**MCP Server Categories:**
+
+**Data Collection Servers** gather information from the device:
+
+| Server Type | Tools Provided | Use Cases |
+|-------------|---------------|-----------|
+| **System Info** | CPU, memory, disk stats | Resource monitoring |
+| **Application State** | Running apps, windows | Context awareness |
+| **Screenshot** | Screen capture | Visual verification |
+| **UI Element Detection** | Control trees, accessibility | UI automation |
+
+Example Tools: `get_system_info()`, `list_running_apps()`, `capture_screenshot()`, `get_ui_tree()`
+
+**Action Servers** perform actions on the device:
+
+| Server Type | Tools Provided | Use Cases |
+|-------------|---------------|-----------|
+| **GUI Automation** | Keyboard, mouse, clicks | UI interaction |
+| **Application Control** | Launch, close, focus | App management |
+| **File System** | Read, write, delete | File operations |
+| **Command Execution** | Shell commands | System automation |
+
+Example Tools: `click_button(label)`, `type_text(text)`, `open_application(name)`, `execute_command(cmd)`
+
+**Server Types:**
+
+| Type | Deployment | Pros | Cons |
+|------|------------|------|------|
+| **Local MCP Servers** | Run in same process via FastMCP | Fast, no network overhead | Limited to local capabilities |
+| **Remote MCP Servers** | Connect via HTTP/SSE | Scalable, shared services | Network latency, external dependency |
+
+**Example MCP Server Configuration:**
+
+```yaml
+mcp_servers:
+  data_collection:
+    - name: "system_info"
+      type: "local"
+      class: "SystemInfoServer"
+    - name: "ui_detector"
+      type: "local"
+      class: "UIDetectionServer"
+
+  action:
+    - name: "gui_automation"
+      type: "local"
+      class: "GUIAutomationServer"
+    - name: "file_ops"
+      type: "remote"
+      url: "http://localhost:8080/mcp"
+```
+
+See [MCP Integration](../mcp/overview.md) for comprehensive MCP server documentation.
+
+### 3. Device Profiling
+
+The client automatically collects and reports **device information** to enable the server to make intelligent task routing decisions.
+
+**Device Profile Structure:**
+
+```json
+{
+ "device_id": "device_windows_001",
+ "platform": "windows",
+ "platform_type": "computer",
+ "os_version": "10.0.22631",
+ "system_info": {
+ "cpu_count": 8,
+ "memory_total_gb": 16.0,
+ "disk_total_gb": 512.0,
+ "hostname": "DESKTOP-ABC123",
+ "ip_address": "192.168.1.100"
+ },
+ "supported_features": [
+ "gui_automation",
+ "cli_execution",
+ "browser_control",
+ "office_integration",
+ "windows_apps"
+ ],
+ "installed_applications": [
+ "Chrome",
+ "Excel",
+ "PowerPoint",
+ "VSCode"
+ ],
+ "screen_resolution": "1920x1080",
+ "connected_at": "2025-11-05T10:30:00Z"
+}
+```
+
+**Profile Usage on Server:**
+
+```mermaid
+graph LR
+ Client[Client Detects Device Info]
+ Server[Server Stores Profile]
+ Route[Server Routes Tasks]
+
+ Client -->|Report Profile| Server
+ Server -->|Match Requirements| Route
+ Route -->|Dispatch Task| Client
+
+ style Client fill:#bbdefb
+ style Server fill:#c8e6c9
+ style Route fill:#fff9c4
+```
+
+**Server Uses Profile For:**
+
+| Use Case | Example Logic |
+|----------|--------------|
+| **Platform Matching** | Route Excel task to Windows device |
+| **Capability Filtering** | Only send browser tasks to devices with Chrome |
+| **Load Balancing** | Distribute tasks based on CPU/memory |
+| **Failure Recovery** | Reassign task if device disconnects |
+
+See [Device Info Provider](./device_info.md) for detailed profiling documentation.
+
+### 4. Resilient Communication
+
+Robust, fault-tolerant communication with the server using strongly-typed AIP messages.
+
+**Connection Lifecycle:**
+
+```mermaid
+stateDiagram-v2
+ [*] --> Disconnected
+ Disconnected --> Connecting: Initiate Connection
+ Connecting --> Registering: WebSocket Established
+ Registering --> Connected: Registration Success
+ Connecting --> Disconnected: Connection Failed
+ Registering --> Disconnected: Registration Failed
+
+ Connected --> Heartbeating: Start Heartbeat Loop
+ Heartbeating --> Heartbeating: Send/Receive Heartbeat
+ Heartbeating --> Disconnected: Heartbeat Timeout
+ Heartbeating --> Disconnected: WebSocket Closed
+
+ Disconnected --> Connecting: Retry (Exponential Backoff)
+
+ note right of Connected
+ • Receive commands
+ • Execute tasks
+ • Report results
+ end note
+
+ note right of Heartbeating
+ Default interval: 30s
+ Timeout: 60s
+ end note
+```
+
+**Connection Features:**
+
+| Feature | Description | Configuration |
+|---------|-------------|---------------|
+| **Auto Registration** | Registers with server on connect | Device ID, platform, capabilities |
+| **Exponential Backoff** | Smart retry on connection failure | Max retries: 5 (default) |
+| **Heartbeat Monitoring** | Keep-alive mechanism | Interval: 30s (configurable) |
+| **Graceful Reconnection** | Resume operation after disconnect | Auto-reconnect on network recovery |
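+
+As a rough illustration of the retry schedule, exponential backoff doubles the wait after each failed attempt. Only the default of 5 retries comes from the table above; the 1-second base delay, the 30-second cap, and the `backoff_delays` helper are assumptions made for this sketch.
+
+```python
+def backoff_delays(
+    max_retries: int = 5, base_s: float = 1.0, cap_s: float = 30.0
+) -> list[float]:
+    """Delay before each retry: base_s * 2**attempt, capped at cap_s."""
+    return [min(cap_s, base_s * 2 ** attempt) for attempt in range(max_retries)]
+
+
+print(backoff_delays())  # → [1.0, 2.0, 4.0, 8.0, 16.0]
+```
+
+Real clients often add random jitter to each delay so that many devices do not reconnect at the same instant after a server restart.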
+
+**Message Types:**
+
+| Message | Direction | Purpose |
+|---------|-----------|---------|
+| `REGISTRATION` | Client → Server | Register device with capabilities |
+| `REGISTRATION_ACK` | Server → Client | Confirm registration |
+| `HEARTBEAT` | Client ↔ Server | Keep connection alive |
+| `COMMAND` | Server → Client | Execute task command |
+| `COMMAND_RESULTS` | Client → Server | Return execution results |
+| `ERROR` | Client → Server | Report execution errors |
+
+See [WebSocket Client](./websocket_client.md) and [AIP Protocol](../aip/overview.md) for protocol details.
+
+---
+
+## 📋 Workflow Examples
+
+### Client Initialization & Registration
+
+```mermaid
+sequenceDiagram
+ participant Main as Client Main
+ participant MCP as MCP Manager
+ participant WSC as WebSocket Client
+ participant Server
+
+ Main->>MCP: Initialize MCP Servers
+ MCP-->>Main: Server Registry Ready
+
+ Main->>WSC: Create Client & Connect
+ WSC->>Server: WebSocket Connect
+ Server-->>WSC: Connection Established
+
+ WSC->>WSC: Collect Device Info
+ WSC->>Server: REGISTRATION
+ Server-->>WSC: REGISTRATION_ACK
+
+ WSC->>WSC: Start Heartbeat Loop
+
+ loop Every 30 seconds
+ WSC->>Server: HEARTBEAT
+ Server-->>WSC: HEARTBEAT_ACK
+ end
+
+ Note over WSC,Server: Ready to Execute Commands
+```
+
+**Initialization Steps:**
+
+| Step | Action | Details |
+|------|--------|---------|
+| 1️⃣ **Parse Args** | Process command-line arguments | `--client-id`, `--ws-server`, `--platform` |
+| 2️⃣ **Load Config** | Load UFO configuration | MCP servers, tools, settings |
+| 3️⃣ **Init MCP** | Initialize MCP server manager | Create local/remote servers |
+| 4️⃣ **Create Managers** | Create computer manager | Register MCP servers with computers |
+| 5️⃣ **Connect** | Establish WebSocket connection | Connect to server |
+| 6️⃣ **Register** | Send device profile | Platform, capabilities, system info |
+| 7️⃣ **Heartbeat** | Start keep-alive loop | Default: 30s interval |
+| 8️⃣ **Listen** | Wait for commands | Ready for task execution |
+
+### Command Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant Server
+ participant Client as UFO Client
+ participant Comp as Computer
+ participant Tool as MCP Tool
+
+ Server->>Client: COMMAND {type: "click_button", args: {...}}
+ Client->>Comp: execute_command()
+ Comp->>Comp: find_tool("click_button")
+
+ alt Tool Found
+ Comp->>Tool: execute(args)
+ Note over Tool: Thread Pool Execution 6000s timeout
+ Tool-->>Comp: Success
+ Comp-->>Client: Result
+ Client-->>Server: COMMAND_RESULTS {status: "completed"}
+ else Tool Not Found
+ Comp-->>Client: Error
+ Client-->>Server: ERROR {error: "Tool not found"}
+ end
+```
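+
+The flow above can be sketched with the standard library's thread pool. This is an illustration under stated assumptions, not the client's code: `TOOLS`, `execute_command`, and the returned dictionaries are hypothetical, and only the 6000-second timeout and the two outcomes are taken from the diagram.
+
+```python
+from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
+
+# Hypothetical registry mapping tool names to plain callables.
+TOOLS = {"echo": lambda args: args["text"]}
+
+_pool = ThreadPoolExecutor(max_workers=4)
+
+
+def execute_command(name: str, args: dict, timeout: float = 6000.0) -> dict:
+    """Look up a tool and run it on a worker thread with a timeout."""
+    tool = TOOLS.get(name)
+    if tool is None:
+        # Mirrors the "Tool Not Found" branch of the diagram.
+        return {"type": "ERROR", "error": "Tool not found"}
+    future = _pool.submit(tool, args)
+    try:
+        result = future.result(timeout=timeout)
+    except FutureTimeout:
+        # We stop waiting; the worker thread itself keeps running.
+        return {"type": "ERROR", "error": "Tool execution timeout"}
+    except Exception as exc:
+        # Report tool failures instead of letting them crash the caller.
+        return {"type": "ERROR", "error": str(exc)}
+    return {"type": "COMMAND_RESULTS", "status": "completed", "result": result}
+```
+
+Note that a Python thread cannot be forcibly stopped, so a timeout in this sketch only abandons the result; a real implementation would need its own way to cancel a hung tool.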
+
+---
+
+## 🖥️ Platform Support
+
+The client supports multiple platforms with platform-specific tool implementations.
+
+| Platform | Status | Features | Native Tools |
+|----------|--------|----------|--------------|
+| **Windows** | ✅ **Full Support** | • UI Automation (UIAutomation API)<br>• COM API integration<br>• Office automation<br>• Windows-specific apps | PowerShell, Registry, WMI, Win32 API |
+| **Linux** | ✅ **Full Support** | • Bash automation<br>• X11/Wayland GUI tools<br>• Package managers<br>• Linux applications | bash, apt/yum, systemd, xdotool |
+| **macOS** | 🚧 **In Development** | • macOS applications<br>• Automator integration<br>• AppleScript support | osascript, Automator, launchctl |
+| **Mobile** | 🔮 **Planned** | • Touch interface<br>• Mobile apps<br>• Gesture control | ADB (Android), XCTest (iOS) |
+
+**Platform Detection:**
+
+- **Automatic**: Detected via `platform.system()` on startup
+- **Override**: Use `--platform` flag to specify manually
+- **Validation**: Server validates platform matches task requirements
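+
+A minimal sketch of detection with an optional override, assuming the client lower-cases the value returned by `platform.system()`. `resolve_platform` is a hypothetical helper name, not a function from the UFO codebase.
+
+```python
+import platform
+from typing import Optional
+
+
+def resolve_platform(override: Optional[str] = None) -> str:
+    """Return the override if one is given, else the detected platform.
+
+    platform.system() returns names such as "Windows" or "Linux"
+    (or an empty string if the system cannot be determined).
+    """
+    return (override or platform.system()).lower()
+```
+
+Calling `resolve_platform("linux")` corresponds to passing `--platform linux`; calling it with no argument corresponds to automatic detection.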
+
+**Platform-Specific Example:**
+
+**Windows:**
+```python
+# Windows-specific tools
+tools = [
+ "open_windows_app(name='Excel')",
+ "execute_powershell(script='Get-Process')",
+ "read_registry(key='HKLM\\Software')"
+]
+```
+
+**Linux:**
+```python
+# Linux-specific tools
+tools = [
+ "execute_bash(command='ls -la')",
+ "install_package(name='vim')",
+ "control_systemd(service='nginx', action='restart')"
+]
+```
+
+---
+
+## ⚙️ Configuration
+
+### Command-Line Arguments
+
+Start the UFO client with:
+
+```bash
+python -m ufo.client.client [OPTIONS]
+```
+
+**Available Options:**
+
+| Option | Type | Default | Description | Example |
+|--------|------|---------|-------------|---------|
+| `--client-id` | `str` | `client_001` | Unique client identifier | `--client-id device_win_001` |
+| `--ws-server` | `str` | `ws://localhost:5000/ws` | WebSocket server URL | `--ws-server ws://192.168.1.10:5000/ws` |
+| `--ws` | `flag` | `False` | **Enable WebSocket mode** (required) | `--ws` |
+| `--max-retries` | `int` | `5` | Connection retry limit | `--max-retries 10` |
+| `--platform` | `str` | Auto-detect | Platform override | `--platform windows` |
+| `--log-level` | `str` | `WARNING` | Logging verbosity | `--log-level DEBUG` |
+
+**Quick Start Command:**
+
+```bash
+# Minimal command (default server)
+python -m ufo.client.client --ws --client-id my_device
+
+# Production command (custom server)
+python -m ufo.client.client \
+ --ws \
+ --client-id device_production_01 \
+ --ws-server ws://ufo-server.company.com:5000/ws \
+ --max-retries 10 \
+ --log-level INFO
+```
+
+### UFO Configuration
+
+The client inherits settings from `config_dev.yaml`:
+
+**Key Configuration Sections:**
+
+| Section | Purpose | Example |
+|---------|---------|---------|
+| **MCP Servers** | Define data collection and action servers | `mcp_servers.data_collection`, `mcp_servers.action` |
+| **Tool Settings** | Tool-specific parameters | Timeouts, retries, API keys |
+| **Logging** | Log levels, formats, destinations | File logging, console output |
+| **Platform Settings** | OS-specific configurations | Windows UI automation settings |
+
+**Sample Configuration:**
+
+```yaml
+client:
+  heartbeat_interval: 30      # seconds
+  command_timeout: 6000       # seconds (100 minutes)
+  max_concurrent_tools: 10
+
+mcp_servers:
+  data_collection:
+    - name: system_info
+      type: local
+      enabled: true
+  action:
+    - name: gui_automation
+      type: local
+      enabled: true
+      settings:
+        click_delay: 0.5
+        typing_speed: 100     # chars per minute
+
+logging:
+  level: INFO
+  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+  file: "logs/client.log"
+```
+
+See [Configuration Guide](../configuration/system/overview.md) for comprehensive documentation.
+
+---
+
+## ⚠️ Error Handling
+
+The client is designed to handle various failure scenarios gracefully without crashing.
+
+### Connection Failures
+
+```mermaid
+stateDiagram-v2
+ [*] --> Attempting
+ Attempting --> Connected: Success
+ Attempting --> Failed: Error
+
+ Failed --> Waiting: Exponential Backoff
+ Waiting --> Attempting: Retry (2^n seconds)
+
+ Failed --> [*]: Max Retries Exceeded
+
+ note right of Waiting
+ Retry Delays:
+ 1st: 2s
+ 2nd: 4s
+ 3rd: 8s
+ 4th: 16s
+ 5th: 32s
+ end note
+```
+
+**Connection Error Handling:**
+
+| Scenario | Client Behavior | Configuration |
+|----------|----------------|---------------|
+| **Initial Connection Failed** | Exponential backoff retry | `--max-retries` (default: 5) |
+| **Connection Lost** | Attempt reconnection | Automatic |
+| **Max Retries Exceeded** | Exit with error code | Log error, exit |
+| **Server Unreachable** | Log error, retry | Backoff between retries |
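+
+The retry schedule in the diagram waits `2^n` seconds before the n-th retry. The sketch below reproduces that schedule; `backoff_delays` and `connect_with_retry` are hypothetical helpers, and treating `OSError` as the retryable failure is an assumption rather than documented client behavior.
+
+```python
+import asyncio
+
+
+def backoff_delays(max_retries: int = 5, base: float = 2.0) -> list:
+    """Delay before retry n (1-based) is base**n seconds: 2, 4, 8, 16, 32."""
+    return [base ** n for n in range(1, max_retries + 1)]
+
+
+async def connect_with_retry(connect, max_retries: int = 5, sleep=asyncio.sleep):
+    """Await `connect()` until it succeeds or the retries are used up."""
+    last_error = None
+    # One immediate attempt, then one attempt after each backoff delay.
+    for delay in [0.0] + backoff_delays(max_retries):
+        if delay:
+            await sleep(delay)
+        try:
+            return await connect()
+        except OSError as exc:  # includes ConnectionRefusedError
+            last_error = exc
+    raise ConnectionError("max retries exceeded") from last_error
+```
+
+Passing `sleep` as a parameter keeps the schedule testable without real waiting.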
+
+### Tool Execution Failures
+
+**Protection Mechanisms:**
+
+| Mechanism | Purpose | Default Value |
+|-----------|---------|---------------|
+| **Thread Pool Isolation** | Prevent one tool from blocking others | Enabled |
+| **Execution Timeout** | Kill hung tools | 6000 seconds (100 minutes) |
+| **Exception Catching** | Graceful error handling | All tools wrapped |
+| **Error Reporting** | Notify server of failures | Structured error messages |
+
+**Error Handling Example:**
+
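+```python
+# Illustrative sketch of how the client wraps each tool call
+def run_tool(tool, args):
+    try:
+        result = tool.execute(args)
+        return {"status": "success", "result": result}
+    except TimeoutError:
+        # The tool exceeded the execution timeout
+        return {"status": "error", "error": "Tool execution timeout"}
+    except Exception as e:
+        # Any other failure is reported instead of crashing the client
+        return {"status": "error", "error": str(e)}
+```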
+
+### Server Disconnection
+
+**Graceful Shutdown Process:**
+
+1. **Detect Disconnection** - WebSocket connection lost
+2. **Stop Heartbeat** - Terminate keep-alive loop
+3. **Cancel Pending Tasks** - Abort in-progress commands
+4. **Attempt Reconnection** - Use exponential backoff
+5. **Clean Shutdown** - If max retries exceeded
+
+---
+
+## ✅ Best Practices
+
+### Development Best Practices
+
+**1. Use Unique Client IDs**
+
+```bash
+# Bad: Generic ID
+--client-id client_001
+
+# Good: Descriptive ID
+--client-id device_win_dev_john_laptop
+```
+
+**2. Start with WARNING Logging**
+
+```bash
+# Development: WARNING for normal operation (default)
+--log-level WARNING
+
+# Debugging: DEBUG for troubleshooting
+--log-level DEBUG
+```
+
+**3. Test MCP Connectivity First**
+
+```python
+# Verify MCP servers are accessible before running client
+from ufo.client.mcp.mcp_server_manager import MCPServerManager
+
+manager = MCPServerManager()
+# Test server creation from configuration
+```
+
+### Production Best Practices
+
+**1. Use Descriptive Client IDs**
+
+```bash
+# Include environment, location, purpose
+--client-id device_windows_production_office_01
+--client-id device_linux_staging_lab_02
+```
+
+**2. Configure Automatic Restart**
+
+**systemd (Linux):**
+
+```ini
+[Unit]
+Description=UFO Agent Client
+After=network.target
+
+[Service]
+Type=simple
+User=ufo
+WorkingDirectory=/opt/ufo
+ExecStart=/usr/bin/python3 -m ufo.client.client \
+ --ws \
+ --client-id device_linux_prod_01 \
+ --ws-server ws://ufo-server.internal:5000/ws \
+ --log-level INFO
+Restart=always
+RestartSec=10
+
+[Install]
+WantedBy=multi-user.target
+```
+
+**PM2 (Cross-platform):**
+
+```json
+{
+ "apps": [{
+ "name": "ufo-client",
+ "script": "python",
+ "args": [
+ "-m", "ufo.client.client",
+ "--ws",
+ "--client-id", "device_win_prod_01",
+ "--ws-server", "ws://ufo-server.internal:5000/ws",
+ "--log-level", "INFO"
+ ],
+ "cwd": "C:\\ufo",
+ "restart_delay": 5000,
+ "max_restarts": 10
+ }]
+}
+```
+
+**3. Monitor Connection Health**
+
+```bash
+# Check logs for connection status
+tail -f logs/client.log | grep -E "Connected|Disconnected|ERROR"
+```
+
+### Security Best Practices
+
+!!! warning "Security Considerations"
+
+    | Practice | Description | Implementation |
+    |----------|-------------|----------------|
+    | **Use WSS** | Encrypt WebSocket communication | `wss://server:5000/ws` instead of `ws://` |
+    | **Validate Server** | Verify server certificate | Configure SSL/TLS verification |
+    | **Restrict Tools** | Limit MCP server access | Only enable necessary tools |
+    | **Least Privilege** | Run with minimum permissions | Create dedicated user account |
+    | **Network Isolation** | Use firewalls and VPNs | Restrict server access to internal network |
+
+---
+
+## 🎓 Documentation Map
+
+### Getting Started
+
+| Document | Purpose | When to Read |
+|----------|---------|--------------|
+| [Quick Start](./quick_start.md) | Connect your device quickly | First time setup |
+| [Server Quick Start](../server/quick_start.md) | Understand server-side setup | Before running client |
+
+### Component Details
+
+| Document | Component | Topics Covered |
+|----------|-----------|----------------|
+| [WebSocket Client](./websocket_client.md) | Communication layer | AIP protocol, connection management |
+| [UFO Client](./ufo_client.md) | Orchestration | Session tracking, command execution |
+| [Computer Manager](./computer_manager.md) | Multi-computer abstraction | Namespace management, routing |
+| [Computer](./computer.md) | Tool management | MCP registry, execution |
+| [Device Info](./device_info.md) | System profiling | Hardware detection, capabilities |
+| [MCP Integration](./mcp_integration.md) | MCP servers | Server types, configuration |
+
+### Related Documentation
+
+| Document | Topic | Relevance |
+|----------|-------|-----------|
+| [Server Overview](../server/overview.md) | Server architecture | Understand the other half |
+| [AIP Protocol](../aip/overview.md) | Communication protocol | Deep dive into messaging |
+| [Configuration](../configuration/system/overview.md) | UFO configuration | Customize behavior |
+
+---
+
+## 🔄 Client vs. Server
+
+Understanding the **clear division** between client and server responsibilities is crucial for effective system design.
+
+**Responsibility Matrix:**
+
+| Aspect | Client (Execution) | Server (Orchestration) |
+|--------|-------------------|------------------------|
+| **Primary Role** | Execute directives deterministically | Reason about tasks, plan actions |
+| **State Management** | Stateless (no session memory) | Stateful (maintains sessions) |
+| **Reasoning** | None (pure execution) | Full (high-level decision-making) |
+| **Tools** | MCP servers (local/remote) | Agent strategies, prompts, LLMs |
+| **Communication** | Device ↔ Server (AIP) | Multi-client coordination |
+| **Updates** | Tool implementation changes | Strategy and logic updates |
+| **Complexity** | Low (simple execution loop) | High (complex orchestration) |
+| **Dependencies** | MCP servers, system APIs | LLMs, databases, client registry |
+
+**Workflow Comparison:**
+
+```mermaid
+graph TB
+ subgraph "Server Workflow"
+ S1[Receive User Request]
+ S2[Reason About Task]
+ S3[Plan Execution Steps]
+ S4[Select Target Device]
+ S5[Send Commands]
+ end
+
+ subgraph "Client Workflow"
+ C1[Receive Command]
+ C2[Lookup Tool]
+ C3[Execute Tool]
+ C4[Return Result]
+ end
+
+ S1 --> S2
+ S2 --> S3
+ S3 --> S4
+ S4 --> S5
+ S5 -.->|AIP| C1
+ C1 --> C2
+ C2 --> C3
+ C3 --> C4
+ C4 -.->|AIP| S5
+
+ style S1 fill:#bbdefb
+ style S2 fill:#bbdefb
+ style S3 fill:#bbdefb
+ style C1 fill:#c8e6c9
+ style C2 fill:#c8e6c9
+ style C3 fill:#c8e6c9
+```
+
+**Decoupled Architecture Benefits:**
+
+- **Independent Updates**: Modify server logic without touching clients
+- **Flexible Deployment**: Run clients on any platform
+- **Scalability**: Add more clients without server changes
+- **Maintainability**: Simpler client code, easier debugging
+- **Testability**: Test client and server independently
+
+---
+
+## 🚀 Next Steps
+
+**1. Run Your First Client**
+
+```bash
+# Follow the quick start guide
+python -m ufo.client.client \
+ --ws \
+ --client-id my_first_device \
+ --ws-server ws://localhost:5000/ws
+```
+
+👉 [Quick Start Guide](./quick_start.md)
+
+**2. Understand Registration Process**
+
+Learn how clients register with the server, device profile structure, and registration acknowledgment.
+
+👉 [Server Quick Start](../server/quick_start.md) - Start server and connect clients
+
+**3. Explore MCP Integration**
+
+Learn about MCP servers, configure custom tools, and create your own MCP servers.
+
+👉 [MCP Integration](../mcp/overview.md)
+
+**4. Configure for Your Environment**
+
+Customize MCP servers, adjust timeouts and retries, and configure platform-specific settings.
+
+👉 [Configuration Guide](../configuration/system/overview.md)
+
+**5. Master the Protocol**
+
+Dive deep into AIP messages, message flow, and error-handling patterns.
+
+👉 [AIP Protocol](../aip/overview.md)
diff --git a/documents/docs/client/quick_start.md b/documents/docs/client/quick_start.md
new file mode 100644
index 000000000..395bfb7a7
--- /dev/null
+++ b/documents/docs/client/quick_start.md
@@ -0,0 +1,1103 @@
+# ⚡ Quick Start
+
+Get your device connected to the UFO Agent Server and start executing tasks in minutes. No complex setup—just run a single command.
+
+---
+
+## 📋 Prerequisites
+
+Before connecting a client device, ensure these requirements are met:
+
+| Requirement | Version/Details | Verification Command |
+|-------------|-----------------|----------------------|
+| **Python** | 3.10 or higher | `python --version` |
+| **UFO Installation** | Latest version with dependencies | `python -c "import ufo; print('✅ Installed')"` |
+| **Running Server** | Agent server accessible on network | `curl http://server:5000/api/health` |
+| **Network Access** | Client can reach server WebSocket endpoint | Test connectivity to server |
+
+!!! tip "Server First!"
+    **Always start the Agent Server before connecting clients.** The server must be running and accessible for clients to register successfully.
+
+    👉 [Server Quick Start Guide](../server/quick_start.md)
+
+**Verify Server Status:**
+
+**Windows:**
+```powershell
+# Test HTTP API
+Invoke-WebRequest -Uri http://localhost:5000/api/health
+
+# Test WebSocket (requires wscat)
+wscat -c ws://localhost:5000/ws
+```
+
+**Linux/macOS:**
+```bash
+# Test HTTP API
+curl http://localhost:5000/api/health
+
+# Test WebSocket (requires wscat)
+wscat -c ws://localhost:5000/ws
+```
+
+---
+
+## 🚀 Starting a Device Client
+
+### Minimal Command (Local Server)
+
+Connect to a server running on the same machine with default settings:
+
+```bash
+python -m ufo.client.client --ws --client-id my_device
+```
+
+**What This Does:**
+
+| Parameter | Value Used | Purpose |
+|-----------|------------|---------|
+| `--ws` | Set (flag) | **Enable WebSocket mode** (required) |
+| `--client-id` | `my_device` (default is `client_001`) | Unique identifier for this device |
+| `--ws-server` | `ws://localhost:5000/ws` (default) | Connect to local server |
+| `--platform` | Auto-detected (default) | Detected from `platform.system()` |
+| `--max-retries` | `5` (default) | Connection retry attempts |
+
+### Connect to Remote Server
+
+Connect to a server running on a different machine in your network:
+
+```bash
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://192.168.1.100:5000/ws \
+ --client-id device_windows_001
+```
+
+**Network Requirements:**
+
+- ✅ Client can ping the server: `ping 192.168.1.100`
+- ✅ Port **5000** is accessible (firewall allows)
+- ✅ Server is running and listening on correct port
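+
+The port check can also be scripted. The following sketch uses only the standard library; `can_reach` is a hypothetical helper, and a successful TCP connection shows only that the port is open, not that the WebSocket handshake will succeed.
+
+```python
+import socket
+
+
+def can_reach(host: str, port: int = 5000, timeout: float = 3.0) -> bool:
+    """Return True if a TCP connection to host:port can be opened."""
+    try:
+        with socket.create_connection((host, port), timeout=timeout):
+            return True
+    except OSError:
+        # Covers refused connections, timeouts, and unresolvable hosts.
+        return False
+```
+
+For the example above, `can_reach("192.168.1.100")` should return `True` before the client is started.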
+
+### Override Platform Detection
+
+!!! tip "When to Override"
+    Normally, the client auto-detects the platform (`windows` or `linux`). Override when:
+
+    - Running in container/VM with mismatched OS
+    - Testing cross-platform behavior
+    - Platform detection fails
+
+```bash
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://127.0.0.1:5000/ws \
+ --client-id my_linux_device \
+ --platform linux
+```
+
+### Complete Command (All Options)
+
+Production-ready configuration with all available options:
+
+```bash
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://192.168.1.100:5000/ws \
+ --client-id device_windows_prod_01 \
+ --platform windows \
+ --max-retries 10 \
+ --log-level WARNING
+```
+
+**Enhancements:**
+
+- 🔁 **10 retries**: Resilient to temporary network issues
+- 📋 **WARNING logging**: Default level (less verbose than INFO)
+- 🏷️ **Descriptive ID**: `device_windows_prod_01` clearly identifies environment
+
+---
+
+## 📝 Connection Parameters Reference
+
+All available command-line options for the UFO client.
+
+### Required Parameters
+
+| Parameter | Description | Example |
+|-----------|-------------|---------|
+| `--ws` | **Enable WebSocket mode** (flag, no value) | `--ws` |
+
+### Connection Parameters
+
+| Parameter | Type | Default | Description | Example |
+|-----------|------|---------|-------------|---------|
+| `--ws-server` | `str` | `ws://localhost:5000/ws` | WebSocket server URL | `--ws-server ws://192.168.1.10:5000/ws` |
+| `--max-retries` | `int` | `5` | Maximum connection retry attempts | `--max-retries 10` |
+
+### Device Parameters
+
+| Parameter | Type | Default | Description | Example |
+|-----------|------|---------|-------------|---------|
+| `--client-id` | `str` | `client_001` | **Unique device identifier** | `--client-id device_win_prod_01` |
+| `--platform` | `str` | Auto-detect | Platform override: `windows` or `linux` | `--platform linux` |
+
+### Logging Parameters
+
+| Parameter | Type | Default | Description | Example |
+|-----------|------|---------|-------------|---------|
+| `--log-level` | `str` | `WARNING` | Logging verbosity: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`, `OFF` | `--log-level DEBUG` |
+
+!!! warning "Unique Client IDs - Critical!"
+    **Each device MUST have a unique `--client-id`.** Duplicate IDs will cause:
+
+    - ❌ Connection conflicts (devices disconnecting each other)
+    - ❌ Task routing failures (tasks sent to wrong device)
+    - ❌ Session corruption (server state confusion)
+
+    **Best Practice:** Use descriptive IDs:
+    ```
+    ✅ device_windows_prod_datacenter1_rack3
+    ✅ device_linux_staging_jenkins_worker2
+    ❌ client_001
+    ❌ device1
+    ```
+
+---
+
+## ✅ Successful Connection
+
+### Client Logs
+
+When the client connects successfully, you'll see this sequence:
+
+```log
+INFO - Platform detected/specified: windows
+INFO - UFO Client initialized for platform: windows
+INFO - [WS] Connecting to ws://127.0.0.1:5000/ws (attempt 1/5)
+INFO - [WS] [AIP] Collected device info: platform=windows, cpu=8, memory=16.0GB
+INFO - [WS] [AIP] Attempting to register as device_windows_001
+INFO - [WS] [AIP] ✅ Successfully registered as device_windows_001
+INFO - [WS] Heartbeat loop started (interval: 30s)
+```
+
+**Registration Flow:**
+
+```mermaid
+sequenceDiagram
+ participant C as Client
+ participant S as Server
+
+ C->>C: Load Config & Initialize MCP
+ C->>S: WebSocket Connect
+ S-->>C: Connection Ack
+
+ C->>C: Collect Device Info
+ C->>S: REGISTRATION (id, platform, capabilities)
+ S->>S: Validate & Store
+ S-->>C: REGISTRATION_ACK
+
+ loop Every 30s
+ C->>S: HEARTBEAT
+ S-->>C: HEARTBEAT_ACK
+ end
+
+ Note over C,S: Ready for Commands
+```
+
+### Server Logs
+
+On the server side, you'll see:
+
+```log
+INFO - [WS] ✅ Registered device client: device_windows_001
+INFO - [WS] Device device_windows_001 capabilities: {
+ "platform": "windows",
+ "cpu_count": 8,
+ "memory_gb": 16.0,
+ "mcp_servers": ["system_info", "gui_automation"]
+}
+```
+
+---
+
+## 🔍 Verify Connection
+
+### Check Connected Clients (HTTP API)
+
+From the server machine or any network-accessible machine:
+
+**cURL:**
+```bash
+curl http://localhost:5000/api/clients
+```
+
+**PowerShell:**
+```powershell
+Invoke-RestMethod -Uri http://localhost:5000/api/clients | ConvertTo-Json
+```
+
+**Python:**
+```python
+import requests
+response = requests.get("http://localhost:5000/api/clients")
+print(response.json())
+```
+
+**Expected Response:**
+
+```json
+{
+ "clients": [
+ {
+ "client_id": "device_windows_001",
+ "type": "device",
+ "platform": "windows",
+ "connected_at": 1730736000.0,
+ "uptime_seconds": 45,
+ "capabilities": {
+ "cpu_count": 8,
+ "memory_gb": 16.0,
+ "mcp_servers": ["system_info", "gui_automation"]
+ }
+ }
+ ],
+ "total": 1
+}
+```
+
+**Client Status Indicators:**
+
+| Field | Description | Example |
+|-------|-------------|---------|
+| `client_id` | Unique device identifier | `device_windows_001` |
+| `type` | Client type (always `"device"`) | `device` |
+| `platform` | Operating system | `windows`, `linux` |
+| `connected_at` | Unix timestamp of connection | `1730736000.0` |
+| `uptime_seconds` | Seconds since connection | `45` |
+| `capabilities` | Device hardware/software profile | CPU, memory, MCP servers |
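+
+To check for a specific device from a script, the response can be searched by `client_id`. A minimal sketch, assuming the response shape shown above; `find_client` is a hypothetical helper and `SAMPLE` is a trimmed copy of the example response.
+
+```python
+import json
+
+# Trimmed copy of the example response from /api/clients.
+SAMPLE = (
+    '{"clients": [{"client_id": "device_windows_001", "type": "device", '
+    '"platform": "windows", "uptime_seconds": 45}], "total": 1}'
+)
+
+
+def find_client(payload: dict, client_id: str):
+    """Return the entry for client_id from the payload, or None if absent."""
+    for entry in payload.get("clients", []):
+        if entry.get("client_id") == client_id:
+            return entry
+    return None
+
+
+payload = json.loads(SAMPLE)
+entry = find_client(payload, "device_windows_001")
+print(entry["platform"], entry["uptime_seconds"])  # prints: windows 45
+```
+
+In practice the payload would come from `requests.get(...).json()` as in the Python example above.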
+
+### Monitor Heartbeats
+
+The client sends **heartbeat messages every 30 seconds** to prove it's still alive.
+
+**Client Logs (DEBUG level):**
+
+```log
+DEBUG - [WS] [AIP] Heartbeat sent
+DEBUG - [WS] [AIP] Heartbeat acknowledged
+```
+
+**Server Logs (DEBUG level):**
+
+```log
+DEBUG - [WS] Heartbeat received from device_windows_001
+DEBUG - [WS] Heartbeat acknowledged for device_windows_001
+```
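+
+On the monitoring side, a device can be treated as unresponsive once no heartbeat has arrived within a timeout. A minimal sketch, assuming the 30-second interval above and a 60-second timeout (the value shown in the client overview's lifecycle diagram); `is_stale` is a hypothetical helper, not part of the server API.
+
+```python
+import time
+from typing import Optional
+
+HEARTBEAT_INTERVAL = 30.0  # seconds between heartbeats (documented default)
+HEARTBEAT_TIMEOUT = 60.0   # seconds without a heartbeat before giving up (assumed)
+
+
+def is_stale(last_heartbeat: float, now: Optional[float] = None,
+             timeout: float = HEARTBEAT_TIMEOUT) -> bool:
+    """Return True if the last heartbeat is older than `timeout` seconds."""
+    if now is None:
+        now = time.monotonic()
+    return now - last_heartbeat > timeout
+```
+
+With these values a device survives one missed heartbeat and is flagged shortly after the second.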
+
+---
+
+## 🎯 Running Your First Task
+
+Once the client is connected, dispatch a simple task from the server to verify end-to-end functionality.
+
+### Dispatch Task via HTTP API
+
+**cURL:**
+```bash
+curl -X POST http://localhost:5000/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{
+ "client_id": "device_windows_001",
+ "request": "Open Notepad and type Hello from UFO"
+ }'
+```
+
+**PowerShell:**
+```powershell
+$body = @{
+ client_id = "device_windows_001"
+ request = "Open Notepad and type Hello from UFO"
+} | ConvertTo-Json
+
+Invoke-RestMethod -Uri http://localhost:5000/api/dispatch `
+ -Method POST `
+ -ContentType "application/json" `
+ -Body $body
+```
+
+**Python:**
+```python
+import requests
+
+response = requests.post(
+ "http://localhost:5000/api/dispatch",
+ json={
+ "client_id": "device_windows_001",
+ "request": "Open Notepad and type Hello from UFO"
+ }
+)
+print(response.json())
+```
+
+### Server Response
+
+```json
+{
+ "status": "success",
+ "session_id": "session_20251104_143022_abc123",
+ "message": "Task dispatched to device_windows_001",
+ "client_id": "device_windows_001"
+}
+```
+
+**Response Fields:**
+
+| Field | Description |
+|-------|-------------|
+| `status` | `"success"` or `"error"` |
+| `session_id` | Unique session identifier for tracking |
+| `message` | Human-readable status message |
+| `client_id` | Target device that received the task |
+
+### Client Execution Logs
+
+```log
+INFO - [WS] Starting task: Open Notepad and type Hello from UFO
+INFO - [WS] [AIP] Sent task request with platform: windows
+INFO - Executing 3 actions in total
+INFO - [WS] [AIP] Sent client result for prev_response_id: resp_abc123
+INFO - [WS] Task session_20251104_143022_abc123 completed
+```
+
+**Execution Flow:**
+
+```mermaid
+sequenceDiagram
+ participant API as HTTP API
+ participant Server
+ participant Client
+ participant App as Notepad
+
+ API->>Server: POST /api/dispatch
+ Server->>Server: Create Session
+ Server-->>API: {session_id, status}
+
+ Server->>Client: COMMAND
+ Client->>App: Launch & Type
+ App-->>Client: Done
+ Client->>Server: COMMAND_RESULTS
+```
+
+---
+
+## ⚠️ Common Issues
+
+### 1. Connection Refused
+
+**Symptom:**
+```log
+ERROR - [WS] Unexpected error: [Errno 10061] Connect call failed
+ERROR - [WS] Max retries reached. Exiting.
+```
+
+**Root Causes:**
+
+| Cause | Verification | Solution |
+|-------|--------------|----------|
+| Server not running | `curl http://localhost:5000/api/health` | Start server first |
+| Wrong port | Check server startup logs | Use correct port (`--ws-server ws://...`) |
+| Firewall blocking | `telnet server 5000` | Allow port 5000 in firewall |
+| Server using `--local` flag | Check server CLI args | Connect from localhost only |
+
+**Solutions:**
+
+**Verify Server:**
+```bash
+# Check if server is running
+curl http://localhost:5000/api/health
+
+# Expected response:
+# {"status": "healthy", "uptime_seconds": 123}
+```
+
+**Check Firewall:**
+```bash
+# Windows: Check if port is listening
+netstat -an | findstr ":5000"
+
+# Linux: Check if port is listening
+netstat -tuln | grep :5000
+```
+
+**Fix Connection:**
+```bash
+# Ensure server and client match:
+# Server: --port 5000
+# Client: --ws-server ws://localhost:5000/ws
+```
+
+### 2. Registration Failed
+
+**Symptom:**
+```log
+ERROR - [WS] [AIP] ❌ Failed to register as device_windows_001
+RuntimeError: Registration failed for device_windows_001
+```
+
+**Root Causes:**
+
+| Cause | Explanation | Solution |
+|-------|-------------|----------|
+| Duplicate client ID | Another device using same ID | Use unique `--client-id` |
+| Server rejecting connection | Server validation error | Check server logs for details |
+| Network interruption | Connection dropped during registration | Retry connection |
+| Device info collection error | Failed to gather system info | Check MCP server initialization |
+
+**Solutions:**
+
+**Check Duplicate IDs:**
+```bash
+# List all connected clients
+curl http://localhost:5000/api/clients | grep client_id
+
+# If your ID appears, choose a different one
+python -m ufo.client.client --ws --client-id NEW_UNIQUE_ID
+```
+
+**Check Server Logs:**
+```bash
+# Server logs show detailed rejection reasons
+# Example: "Client ID already exists"
+# Example: "Platform mismatch"
+```
+
+### 3. Platform Detection Issues
+
+**Symptom:**
+```log
+WARNING - Platform not detected correctly
+WARNING - Defaulting to platform: unknown
+```
+
+**Solution:**
+
+Explicitly set the platform:
+
+```bash
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://127.0.0.1:5000/ws \
+ --client-id my_device \
+ --platform windows # or 'linux'
+```
+
+**Platform Values:**
+
+| Value | OS | Auto-Detection |
+|-------|----|----------------|
+| `windows` | Windows 10/11, Server 2016+ | `platform.system() == "Windows"` |
+| `linux` | Ubuntu, Debian, RHEL, etc. | `platform.system() == "Linux"` |
+
+### 4. Heartbeat Timeout
+
+**Symptom:**
+```log
+ERROR - [WS] Connection closed: ConnectionClosedError
+INFO - [WS] Reconnecting... (attempt 2/5)
+```
+
+**Root Causes:**
+
+| Cause | Description | Solution |
+|-------|-------------|----------|
+| Network instability | Wi-Fi dropouts, packet loss | Use wired connection |
+| Server crashed | Server process terminated | Restart server |
+| Proxy interference | Corporate proxy blocking WebSocket | Configure proxy bypass |
+| Firewall timeout | Idle connection timeout | Reduce heartbeat interval |
+
+**Solutions:**
+
+**Increase Retries:**
+```bash
+# For unreliable networks
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://server:5000/ws \
+ --client-id my_device \
+ --max-retries 20
+```
+
+**Check Network:**
+```bash
+# Test sustained connection
+ping -t server # Windows
+ping server # Linux (Ctrl+C to stop)
+```
+
+**Verify Server:**
+```bash
+# Check if server is still running
+curl http://server:5000/api/health
+```
+
+---
+
+## 🌐 Multiple Devices
+
+Connect multiple devices to the same server for **fleet management** and **task distribution**.
+
+### Example Configuration
+
+**Device 1 (Windows Desktop):**
+
+```bash
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://192.168.1.100:5000/ws \
+ --client-id device_windows_desktop_001
+```
+
+**Device 2 (Linux Server):**
+
+```bash
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://192.168.1.100:5000/ws \
+ --client-id device_linux_server_001 \
+ --platform linux
+```
+
+**Device 3 (Windows Laptop):**
+
+```bash
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://192.168.1.100:5000/ws \
+ --client-id device_windows_laptop_002
+```
+
+### Verify All Connected
+
+```bash
+curl http://192.168.1.100:5000/api/clients
+```
+
+**Expected Response:**
+
+```json
+{
+ "clients": [
+ {
+ "client_id": "device_windows_desktop_001",
+ "type": "device",
+ "platform": "windows",
+ "uptime_seconds": 120
+ },
+ {
+ "client_id": "device_linux_server_001",
+ "type": "device",
+ "platform": "linux",
+ "uptime_seconds": 95
+ },
+ {
+ "client_id": "device_windows_laptop_002",
+ "type": "device",
+ "platform": "windows",
+ "uptime_seconds": 45
+ }
+ ],
+ "total": 3
+}
+```
+
+**Client ID Naming Convention:**
+
+```
+device_<platform>_<environment>_<location>_<identifier>
+
+Examples:
+- device_windows_prod_datacenter1_001
+- device_linux_staging_cloud_aws_002
+- device_windows_dev_office_laptop_john
+```
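+
+IDs in this style can be generated rather than typed by hand. A minimal sketch, assuming the convention is `device` followed by platform, environment, location, and an identifier joined by underscores, as the examples above suggest; `build_client_id` is a hypothetical helper.
+
+```python
+import re
+
+
+def build_client_id(platform: str, environment: str, location: str, ident: str) -> str:
+    """Join the parts with underscores after normalising each one."""
+    parts = ["device", platform, environment, location, ident]
+    # Lower-case, then collapse runs of other characters into one underscore.
+    cleaned = [re.sub(r"[^a-z0-9]+", "_", p.lower()).strip("_") for p in parts]
+    if not all(cleaned):
+        raise ValueError("every part of the client ID must be non-empty")
+    return "_".join(cleaned)
+```
+
+For example, `build_client_id("Windows", "prod", "datacenter1", "001")` yields `device_windows_prod_datacenter1_001`, which can be passed directly to `--client-id`.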
+
+---
+
+## 🔧 Running as Background Service
+
+!!! tip "Production Deployment"
+    For production use, run the client as a **system service** that starts automatically and restarts on failure.
+
+### Linux (systemd)
+
+Create `/etc/systemd/system/ufo-client.service`:
+
+```ini
+[Unit]
+Description=UFO Device Client - Execution Agent
+Documentation=https://github.com/microsoft/UFO
+After=network-online.target
+Wants=network-online.target
+# Start rate limiting belongs in [Unit]: give up after 5 starts within 5 minutes
+StartLimitBurst=5
+StartLimitIntervalSec=300
+
+[Service]
+Type=simple
+User=ufouser
+Group=ufouser
+WorkingDirectory=/home/ufouser/UFO2
+
+# Environment variables (if needed)
+Environment="PYTHONUNBUFFERED=1"
+
+# Main command
+ExecStart=/usr/bin/python3 -m ufo.client.client \
+    --ws \
+    --ws-server ws://192.168.1.100:5000/ws \
+    --client-id device_linux_prod_01 \
+    --platform linux \
+    --log-level INFO
+
+# Restart policy
+Restart=always
+RestartSec=10
+
+# Resource limits (optional); MemoryMax= replaces the deprecated MemoryLimit=
+LimitNOFILE=65536
+MemoryMax=2G
+
+# Logging
+StandardOutput=journal
+StandardError=journal
+SyslogIdentifier=ufo-client
+
+[Install]
+WantedBy=multi-user.target
+```
+
+**Enable and Start:**
+
+```bash
+# Reload systemd configuration
+sudo systemctl daemon-reload
+
+# Enable service (start on boot)
+sudo systemctl enable ufo-client
+
+# Start service now
+sudo systemctl start ufo-client
+
+# Check status
+sudo systemctl status ufo-client
+
+# View logs
+sudo journalctl -u ufo-client -f
+```
+
+**Service Management:**
+
+| Command | Purpose |
+|---------|---------|
+| `systemctl start ufo-client` | Start the service |
+| `systemctl stop ufo-client` | Stop the service |
+| `systemctl restart ufo-client` | Restart the service |
+| `systemctl status ufo-client` | Check service status |
+| `journalctl -u ufo-client -f` | Follow logs in real-time |
+| `systemctl disable ufo-client` | Disable auto-start |
+
+### Windows (NSSM)
+
+**NSSM** (Non-Sucking Service Manager) wraps any application as a Windows service.
+
+**1. Download NSSM:**
+
+Download from [nssm.cc](https://nssm.cc/download)
+
+**2. Install Service:**
+
+```powershell
+# Install as service
+nssm install UFOClient "C:\Python310\python.exe" `
+ "-m" "ufo.client.client" `
+ "--ws" `
+ "--ws-server" "ws://192.168.1.100:5000/ws" `
+ "--client-id" "device_windows_prod_01" `
+ "--log-level" "INFO"
+
+# Set working directory
+nssm set UFOClient AppDirectory "C:\UFO2"
+
+# Set restart policy
+nssm set UFOClient AppExit Default Restart
+nssm set UFOClient AppRestartDelay 10000
+
+# Set logging
+nssm set UFOClient AppStdout "C:\UFO2\logs\client-stdout.log"
+nssm set UFOClient AppStderr "C:\UFO2\logs\client-stderr.log"
+```
+
+**3. Manage Service:**
+
+```powershell
+# Start service
+nssm start UFOClient
+
+# Check status
+nssm status UFOClient
+
+# Stop service
+nssm stop UFOClient
+
+# Remove service
+nssm remove UFOClient confirm
+```
+
+**Alternative: Windows Task Scheduler**
+
+```powershell
+# Create scheduled task to run on startup
+$action = New-ScheduledTaskAction -Execute "python.exe" `
+ -Argument "-m ufo.client.client --ws --ws-server ws://server:5000/ws --client-id device_win_01"
+$trigger = New-ScheduledTaskTrigger -AtStartup
+$settings = New-ScheduledTaskSettingsSet -RestartCount 3 -RestartInterval (New-TimeSpan -Minutes 1)
+
+Register-ScheduledTask -TaskName "UFOClient" `
+ -Action $action `
+ -Trigger $trigger `
+ -Settings $settings `
+ -User "System" `
+ -RunLevel Highest
+```
+
+### PM2 (Cross-Platform)
+
+**PM2** is a cross-platform, Node.js-based process manager that can also supervise non-Node programs such as this Python client, providing monitoring and automatic restarts.
+
+**1. Install PM2:**
+
+```bash
+npm install -g pm2
+```
+
+**2. Create Ecosystem File (`ecosystem.config.js`):**
+
+```javascript
+module.exports = {
+ apps: [{
+ name: "ufo-client",
+ script: "python",
+ args: [
+ "-m", "ufo.client.client",
+ "--ws",
+ "--ws-server", "ws://192.168.1.100:5000/ws",
+ "--client-id", "device_prod_01",
+ "--log-level", "INFO"
+ ],
+ cwd: "/home/user/UFO2",
+ interpreter: "none",
+ autorestart: true,
+ watch: false,
+ max_restarts: 10,
+ min_uptime: "10s",
+ restart_delay: 5000,
+ env: {
+ PYTHONUNBUFFERED: "1"
+ }
+ }]
+};
+```
+
+**3. Start with PM2:**
+
+```bash
+# Start from ecosystem file
+pm2 start ecosystem.config.js
+
+# Or start directly
+pm2 start "python -m ufo.client.client --ws --ws-server ws://192.168.1.100:5000/ws --client-id device_001" \
+ --name ufo-client
+
+# Save PM2 configuration
+pm2 save
+
+# Enable startup script (auto-start on boot)
+pm2 startup
+# Follow the instructions printed by the command
+
+# Monitor
+pm2 monit
+
+# View logs
+pm2 logs ufo-client
+```
+
+**PM2 Management:**
+
+| Command | Purpose |
+|---------|---------|
+| `pm2 list` | List all processes |
+| `pm2 start ufo-client` | Start process |
+| `pm2 stop ufo-client` | Stop process |
+| `pm2 restart ufo-client` | Restart process |
+| `pm2 delete ufo-client` | Remove process |
+| `pm2 logs ufo-client` | View logs |
+| `pm2 monit` | Real-time monitoring dashboard |
+
+---
+
+## 🏭 Production Deployment Best Practices
+
+Follow these best practices for reliable production deployments.
+
+### 1. Descriptive Client IDs
+
+```bash
+# ❌ Bad: Generic, non-unique
+--client-id client_001
+--client-id device1
+
+# ✅ Good: Descriptive, environment, location
+--client-id production_windows_datacenter1_rack3_slot1
+--client-id staging_linux_cloud_aws_us-east-1_worker2
+--client-id dev_windows_office_john_laptop
+```
+
+**ID Structure:**
+
+```
+<environment>_<platform>_<location>_<identifier>
+
+- environment: production, staging, development, test
+- platform: windows, linux
+- location: datacenter1, cloud_aws, office
+- identifier: unique number or name
+```
+
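The ID structure above can be enforced with a small helper so that every deployment script produces IDs in the same shape. The following is a minimal sketch: the function name `build_client_id`, its signature, and the character rule for `location` and `identifier` are assumptions made for illustration; only the allowed environment and platform values come from the list above.

```python
import re

# Allowed values come from the ID structure above; everything else here
# (function name, signature, character rule) is a hypothetical illustration.
ENVIRONMENTS = {"production", "staging", "development", "test"}
PLATFORMS = {"windows", "linux"}
_SEGMENT = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_-]*$")


def build_client_id(environment: str, platform: str, location: str, identifier: str) -> str:
    """Join the four parts into environment_platform_location_identifier."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    if platform not in PLATFORMS:
        raise ValueError(f"unknown platform: {platform!r}")
    for name, value in (("location", location), ("identifier", identifier)):
        if not _SEGMENT.match(value):
            raise ValueError(f"invalid {name}: {value!r}")
    return "_".join((environment, platform, location, identifier))


print(build_client_id("staging", "linux", "cloud_aws_us-east-1", "worker2"))
```

Because `location` may itself contain underscores, the resulting ID cannot be split back into its parts unambiguously; keep the individual fields in your inventory if you need them later.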
+### 2. Structured Logging
+
+**File Logging:**
+```bash
+# Append stdout and stderr to a log file.
+# Shell redirection does not rotate logs; use a tool such as logrotate for that.
+python -m ufo.client.client \
+    --ws \
+    --ws-server ws://server:5000/ws \
+    --client-id device_prod_01 \
+    --log-level INFO \
+    >> /var/log/ufo-client.log 2>&1
+```
+
+**Systemd Journal:**
+```bash
+# Already configured in systemd service
+# View logs:
+journalctl -u ufo-client -f --since "1 hour ago"
+```
+
+**Syslog:**
+```bash
+# Configure Python logging to send to syslog
+# Add to config_dev.yaml:
+# logging:
+#   handlers:
+#     syslog:
+#       class: logging.handlers.SysLogHandler
+#       address: /dev/log
+```
+
+### 3. Automatic Restart on Failure
+
+**Service Configuration:**
+
+| Platform | Mechanism | Restart Delay | Max Restarts |
+|----------|-----------|---------------|--------------|
+| Linux | systemd | 10 seconds | No fixed cap; rate-limited to 5 starts per 300 seconds |
+| Windows | NSSM | 10 seconds | Unlimited |
+| Cross-platform | PM2 | 5 seconds | 10 attempts, then manual |
+
+### 4. Health Monitoring
+
+**Monitoring Script:**
+
+```bash
+#!/bin/bash
+# check-ufo-client.sh
+
+CLIENT_ID="device_prod_01"
+SERVER_URL="http://192.168.1.100:5000"
+
+# Check if client is connected
+response=$(curl -s "${SERVER_URL}/api/clients" | grep -c "${CLIENT_ID}")
+
+if [ "$response" -eq "0" ]; then
+    echo "ALERT: Client ${CLIENT_ID} is not connected!"
+    # Send alert (email, Slack, PagerDuty, etc.)
+    exit 1
+else
+    echo "OK: Client ${CLIENT_ID} is connected"
+    exit 0
+fi
+```
+
+**Run via cron:**
+```cron
+# Check every 5 minutes
+*/5 * * * * /usr/local/bin/check-ufo-client.sh
+```
+
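The shell check above matches the client ID as a substring, so `device_prod_01` would also match `device_prod_010`. A hedged alternative is to parse the JSON and compare IDs exactly. This sketch assumes the `/api/clients` endpoint returns the `clients` array shown in the expected response earlier in this guide; the function names are illustrative and not part of UFO.

```python
import json
import urllib.request


def is_client_connected(payload: dict, client_id: str) -> bool:
    """Return True if client_id exactly matches an entry in payload["clients"]."""
    return any(c.get("client_id") == client_id for c in payload.get("clients", []))


def fetch_clients(server_url: str, timeout: float = 5.0) -> dict:
    """GET <server_url>/api/clients and decode the JSON body (assumed shape)."""
    with urllib.request.urlopen(f"{server_url}/api/clients", timeout=timeout) as resp:
        return json.load(resp)


# Offline demonstration with a payload shaped like the documented response.
sample = {"clients": [{"client_id": "device_prod_010", "type": "device"}], "total": 1}
print(is_client_connected(sample, "device_prod_01"))   # no exact match
print(is_client_connected(sample, "device_prod_010"))  # exact match
```

In a cron job, call `fetch_clients("http://192.168.1.100:5000")`, pass the result to `is_client_connected`, and exit non-zero when it returns `False`.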
+### 5. Secure Communication
+
+!!! danger "Production Security"
+    **Never expose clients to the internet without these security measures:**
+
+**Use WSS (WebSocket Secure):**
+
+```bash
+# Production: Encrypted WebSocket
+--ws-server wss://ufo-server.company.com/ws
+
+# Development only: Unencrypted
+--ws-server ws://localhost:5000/ws
+```
+
+**Server-Side TLS Configuration:**
+
+```bash
+# Server with TLS
+python -m ufo.server.app \
+ --port 5000 \
+ --ssl-cert /path/to/cert.pem \
+ --ssl-key /path/to/key.pem
+```
+
+**Network Security:**
+
+| Measure | Implementation |
+|---------|----------------|
+| **Firewall Rules** | Restrict traffic on port 5000 to the server and known client IPs |
+| **VPN/Private Network** | Run server on internal network only |
+| **Authentication** | Implement client authentication (future feature) |
+| **Certificate Validation** | Verify server TLS certificates |
+
+---
+
+## 🔧 Troubleshooting Commands
+
+Use these commands to diagnose connection and execution issues.
+
+### Test Server Connectivity
+
+**HTTP Health Check:**
+```bash
+curl http://localhost:5000/api/health
+
+# Expected response:
+# {"status": "healthy", "uptime_seconds": 3456}
+```
+
+**WebSocket Test:**
+```bash
+# Install wscat
+npm install -g wscat
+
+# Test WebSocket connection
+wscat -c ws://localhost:5000/ws
+
+# You should see connection established
+# Send a test message (will likely be rejected, but connection works)
+```
+
+**Network Connectivity:**
+```bash
+# Test if server is reachable
+ping 192.168.1.100
+
+# Test if port is open
+telnet 192.168.1.100 5000 # Windows/Linux
+nc -zv 192.168.1.100 5000 # Linux/macOS
+```
+
+### Check Connected Clients
+
+```bash
+# List all connected clients
+curl http://localhost:5000/api/clients | python -m json.tool
+
+# Check specific client
+curl http://localhost:5000/api/clients | grep "device_windows_001"
+```
+
+### Monitor Client Logs
+
+**Increase Verbosity:**
+```bash
+# Enable DEBUG logging
+python -m ufo.client.client \
+ --ws \
+ --client-id my_device \
+ --log-level DEBUG
+```
+
+**Filter Logs:**
+```bash
+# Only show errors
+python -m ufo.client.client --ws --client-id my_device 2>&1 | grep ERROR
+
+# Only show connection events
+python -m ufo.client.client --ws --client-id my_device 2>&1 | grep -E "Connect|Register"
+```
+
+### Test Task Dispatch
+
+```bash
+# Dispatch simple test task
+curl -X POST http://localhost:5000/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{
+ "client_id": "device_windows_001",
+ "request": "List all files in the current directory"
+ }'
+```
+
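The same dispatch request can be built from Python when `curl` is not available. This is a minimal sketch using only the standard library; the endpoint path and the two payload fields are taken from the `curl` command above, while the helper name `build_dispatch_request` is an assumption for illustration.

```python
import json
import urllib.request


def build_dispatch_request(server_url: str, client_id: str, request_text: str) -> urllib.request.Request:
    """Build (but do not send) the POST request equivalent to the curl command."""
    body = json.dumps({"client_id": client_id, "request": request_text}).encode("utf-8")
    return urllib.request.Request(
        f"{server_url}/api/dispatch",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_dispatch_request(
    "http://localhost:5000", "device_windows_001", "List all files in the current directory"
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```

Building the request separately from sending it makes it easy to inspect the method, URL, and body before touching a live server.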
+---
+
+## 🚀 Next Steps
+
+!!! tip "Continue Learning"
+    Now that your client is connected and running tasks:
+
+**1. Understand Registration Flow**
+
+Learn how clients register with the server and exchange device profiles:
+
+👉 [UFO Client Overview](./overview.md)
+
+**2. Explore Device Information**
+
+Deep dive into what device information is collected and how it's used for task assignment:
+
+👉 [Device Info Provider](./device_info.md)
+
+**3. Master WebSocket Communication**
+
+Understand the AIP protocol and WebSocket message flow:
+
+👉 [WebSocket Client](./websocket_client.md)
+
+**4. Configure MCP Servers**
+
+Learn how to add custom tools and configure MCP servers:
+
+👉 [MCP Integration](../mcp/overview.md)
+
+**5. Study the AIP Protocol**
+
+Deep dive into message types, flow control, and error handling:
+
+👉 [AIP Protocol](../aip/overview.md)
+
+**6. Production Deployment**
+
+Best practices for running clients in production environments:
+
+👉 [Configuration Guide](../configuration/system/overview.md)
diff --git a/documents/docs/client/ufo_client.md b/documents/docs/client/ufo_client.md
new file mode 100644
index 000000000..ff1c73236
--- /dev/null
+++ b/documents/docs/client/ufo_client.md
@@ -0,0 +1,862 @@
+# 🎯 UFO Client
+
+The **UFO Client** is the execution engine that receives commands from the server, routes them to appropriate tools via the CommandRouter, and aggregates results. It focuses on stateless command execution, delegating all decision-making to the server.
+
+## 📋 Overview
+
+The UFO Client bridges network communication and local tool execution.
+
+**Key Capabilities:**
+
+| Capability | Description | Implementation |
+|------------|-------------|----------------|
+| **Command Execution** | Processes server commands deterministically | `execute_step()`, `execute_actions()` |
+| **Session Management** | Tracks session state and metadata | Session ID, agent/process/root names |
+| **Result Aggregation** | Collects and structures tool execution results | Returns `List[Result]` |
+| **Thread Safety** | Serializes task execution so only one task runs at a time | `asyncio.Lock` (`task_lock`) |
+| **State Management** | Maintains agent, process, and root names | Property setters with validation |
+| **Manager Coordination** | Orchestrates ComputerManager and MCPServerManager | `reset()` cascades to all managers |
+
+The UFO Client follows a stateless execution philosophy:
+
+- Executes commands sent by the server
+- Routes commands to the appropriate tools
+- Returns execution results
+- Does **not** decide which commands to run
+- Does **not** interpret user requests
+- Does **not** store long-term state
+
+**Architectural Position:**
+
+```mermaid
+graph LR
+ subgraph Server["Server Side (Orchestration)"]
+ SRV[Agent Server]
+ LLM[LLM Reasoning]
+ end
+
+ subgraph Network["Network Layer"]
+ WSC[WebSocket Client]
+ end
+
+ subgraph Client["Client Side (Execution)"]
+ UFC[UFO Client]
+ CR[Command Router]
+ Tools[MCP Tools]
+ end
+
+ SRV -->|Commands| WSC
+ WSC -->|execute_step| UFC
+ UFC -->|execute| CR
+ CR -->|tool calls| Tools
+ Tools -->|results| CR
+ CR -->|results| UFC
+ UFC -->|results| WSC
+ WSC -->|results| SRV
+
+ LLM -->|planning| SRV
+
+ style SRV fill:#ffe0b2
+ style UFC fill:#bbdefb
+ style Tools fill:#c8e6c9
+```
+
+## 🏗️ Architecture
+
+The UFO Client has a minimal API surface—just initialization, execution, and reset.
+
+### Component Structure
+
+```mermaid
+graph TB
+ subgraph "UFOClient"
+ State[Session State]
+ Execution[Execution Methods]
+ Dependencies[Manager Dependencies]
+ end
+
+ subgraph "Session State"
+ State1[session_id]
+ State2[agent_name]
+ State3[process_name]
+ State4[root_name]
+ State5[task_lock]
+ end
+
+ subgraph "Execution Methods"
+ Exec1[execute_step]
+ Exec2[execute_actions]
+ Exec3[reset]
+ end
+
+ subgraph "Dependencies"
+ Dep1[CommandRouter]
+ Dep2[ComputerManager]
+ Dep3[MCPServerManager]
+ end
+
+ State --> State1
+ State --> State2
+ State --> State3
+ State --> State4
+ State --> State5
+
+ Execution --> Exec1
+ Execution --> Exec2
+ Execution --> Exec3
+
+ Dependencies --> Dep1
+ Dependencies --> Dep2
+ Dependencies --> Dep3
+
+ Exec1 --> Exec2
+ Exec2 --> Dep1
+ Exec3 --> Dep2
+ Exec3 --> Dep3
+
+ style State fill:#e3f2fd
+ style Execution fill:#f1f8e9
+ style Dependencies fill:#fff3e0
+```
+
+**Class Attributes:**
+
+| Attribute | Type | Purpose |
+|-----------|------|---------|
+| `mcp_server_manager` | `MCPServerManager` | Manages MCP server lifecycle |
+| `computer_manager` | `ComputerManager` | Manages computer instances (tool namespaces) |
+| `command_router` | `CommandRouter` | Routes commands to appropriate computers |
+| `task_lock` | `asyncio.Lock` | Serializes task execution (one task at a time) |
+| `client_id` | `str` | Unique identifier for this client (default: `"client_001"`) |
+| `platform` | `str` | Platform type (`"windows"` or `"linux"`) - auto-detected if not provided |
+| `session_id` | `Optional[str]` | Current session identifier |
+| `agent_name` | `Optional[str]` | Active agent (e.g., `"HostAgent"`, `"AppAgent"`) |
+| `process_name` | `Optional[str]` | Process context (e.g., `"notepad.exe"`) |
+| `root_name` | `Optional[str]` | Root operation name |
+
+## 🚀 Initialization
+
+Creating a UFO Client requires two manager instances: MCPServerManager and ComputerManager.
+
+```python
+from ufo.client.ufo_client import UFOClient
+from ufo.client.computer import ComputerManager
+from ufo.client.mcp.mcp_server_manager import MCPServerManager
+
+# 1. Initialize MCP Server Manager
+mcp_server_manager = MCPServerManager()
+mcp_server_manager.create_servers_from_config() # Load from config_dev.yaml
+
+# 2. Initialize Computer Manager
+# NOTE: `ufo_config` is the already-loaded UFO configuration object;
+# loading it is not shown in this snippet.
+computer_manager = ComputerManager(
+    ufo_config.to_dict(),
+    mcp_server_manager
+)
+
+# 3. Create UFO Client
+client = UFOClient(
+    mcp_server_manager=mcp_server_manager,
+    computer_manager=computer_manager,
+    client_id="device_windows_001",
+    platform="windows"
+)
+```
+
+**Constructor Parameters:**
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `mcp_server_manager` | `MCPServerManager` | ✅ Yes | - | MCP server lifecycle manager |
+| `computer_manager` | `ComputerManager` | ✅ Yes | - | Computer instance manager |
+| `client_id` | `str` | No | `"client_001"` | Unique client identifier |
+| `platform` | `str` | No | Auto-detected | Platform type: `"windows"` or `"linux"` |
+
+**Initialization Side Effects:**
+
+1. Creates `CommandRouter` instance (delegates to ComputerManager)
+2. Initializes `task_lock` (`asyncio.Lock()`)
+3. Sets session state to `None` (session_id, agent_name, process_name, root_name)
+
+## 📊 Session State Management
+
+The UFO Client maintains contextual metadata for the current execution session.
+
+### Session ID
+
+**Purpose:** Unique identifier for the current task session
+
+```python
+# Set session ID (typically set by server)
+client.session_id = "session_20251104_143022_abc123"
+
+# Get session ID
+current_session = client.session_id # "session_20251104_143022_abc123"
+
+# Clear session ID
+client.reset() # Sets session_id to None
+```
+
+**Validation:**
+
+```python
+# ✅ Valid
+client.session_id = "session_123"
+client.session_id = None
+
+# ❌ Invalid - raises ValueError
+client.session_id = 12345 # Not a string
+```
+
+### Agent Name
+
+**Purpose:** Identifies the active agent (HostAgent, AppAgent, etc.)
+
+```python
+# Set agent name (from server message)
+client.agent_name = "HostAgent"
+
+# Get agent name
+agent = client.agent_name # "HostAgent"
+```
+
+**Common Agent Names:**
+
+| Agent Name | Purpose |
+|------------|---------|
+| `HostAgent` | OS-level operations (start apps, manage files) |
+| `AppAgent` | Application-specific operations (UI automation) |
+| `FollowerAgent` | Follow predefined workflows |
+
+### Process Name
+
+**Purpose:** Identifies the process context
+
+```python
+# Set process name (from server message)
+client.process_name = "notepad.exe"
+
+# Get process name
+process = client.process_name # "notepad.exe"
+```
+
+**Usage:** Helps route commands to the correct application context
+
+### Root Name
+
+**Purpose:** Identifies the root operation name
+
+```python
+# Set root name (from server message)
+client.root_name = "open_application"
+
+# Get root name
+root = client.root_name # "open_application"
+```
+
+**Property Validation:**
+
+All properties validate their inputs:
+
+```python
+try:
+    client.agent_name = 123  # Not a string
+except ValueError as e:
+    print(e)  # "Agent name must be a string or None."
+```
+
+**Validation Table:**
+
+| Property | Valid Types | Raises on Invalid |
+|----------|-------------|-------------------|
+| `session_id` | `str`, `None` | `ValueError` |
+| `agent_name` | `str`, `None` | `ValueError` |
+| `process_name` | `str`, `None` | `ValueError` |
+| `root_name` | `str`, `None` | `ValueError` |
+
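The validation behavior in the table can be pictured as an ordinary Python property whose setter checks the type before storing the value. The class below is a hypothetical stand-in written for illustration, not the actual `UFOClient` source; only the error message is taken from the example above.

```python
from typing import Optional


class SessionState:
    """Hypothetical stand-in for one of the validated properties."""

    def __init__(self) -> None:
        self._agent_name: Optional[str] = None

    @property
    def agent_name(self) -> Optional[str]:
        return self._agent_name

    @agent_name.setter
    def agent_name(self, value: Optional[str]) -> None:
        # Accept only str or None, mirroring the validation table.
        if value is not None and not isinstance(value, str):
            raise ValueError("Agent name must be a string or None.")
        self._agent_name = value


state = SessionState()
state.agent_name = "HostAgent"  # accepted
try:
    state.agent_name = 123      # rejected, previous value is kept
except ValueError as exc:
    print(exc)
```

Because the check runs before the assignment, a rejected value leaves the previous state untouched.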
+## ⚙️ Command Execution
+
+### Execute Step (Main Entry Point)
+
+`execute_step()` processes one complete server message, extracting metadata and executing all commands.
+
+**Signature:**
+
+```python
+async def execute_step(self, response: ServerMessage) -> List[Result]:
+    """
+    Perform a single step execution.
+    :param response: The ServerMessage instance to process.
+    :return: A list of Result instances.
+    """
+```
+
+**Execution Flow:**
+
+```mermaid
+sequenceDiagram
+ participant WSC as WebSocket Client
+ participant UFC as UFO Client
+ participant CR as Command Router
+ participant Tools
+
+ WSC->>UFC: execute_step(ServerMessage)
+
+ Note over UFC: 1. Extract Metadata
+ UFC->>UFC: self.agent_name = response.agent_name
+ UFC->>UFC: self.process_name = response.process_name
+ UFC->>UFC: self.root_name = response.root_name
+
+ Note over UFC: 2. Execute Actions
+ UFC->>UFC: execute_actions(response.actions)
+
+    UFC->>CR: command_router.execute(agent_name, process_name, root_name, commands)
+
+ CR->>Tools: Route commands to tools
+ Tools-->>CR: Results
+ CR-->>UFC: List[Result]
+
+ UFC-->>WSC: List[Result]
+```
+
+**Implementation:**
+
+```python
+async def execute_step(self, response: ServerMessage) -> List[Result]:
+    """Perform a single step execution."""
+
+    # Extract metadata from server response
+    self.agent_name = response.agent_name
+    self.process_name = response.process_name
+    self.root_name = response.root_name
+
+    # Execute actions
+    action_results = await self.execute_actions(response.actions)
+
+    return action_results
+```
+
+**Example Usage:**
+
+```python
+from aip.messages import ServerMessage
+
+# Receive server message (`msg` is the raw JSON text received over the WebSocket)
+server_response = ServerMessage.model_validate_json(msg)
+
+# Execute step
+action_results = await client.execute_step(server_response)
+
+# action_results is List[Result]
+for result in action_results:
+    print(f"Action: {result.action}, Status: {result.status}")
+```
+
+### Execute Actions
+
+`execute_actions()` executes a list of commands via the CommandRouter.
+
+**Signature:**
+
+```python
+async def execute_actions(self, commands: Optional[List[Command]]) -> List[Result]:
+    """
+    Execute the actions provided by the server.
+    :param commands: List of actions to execute.
+    :returns: Results of the executed actions.
+    """
+```
+
+**Implementation:**
+
+```python
+async def execute_actions(self, commands: Optional[List[Command]]) -> List[Result]:
+    """Execute the actions provided by the server."""
+
+    action_results = []
+
+    if commands:
+        self.logger.info(f"Executing {len(commands)} actions in total")
+
+        # Delegate to CommandRouter
+        action_results = await self.command_router.execute(
+            agent_name=self.agent_name,
+            process_name=self.process_name,
+            root_name=self.root_name,
+            commands=commands
+        )
+
+    return action_results
+```
+
+**Example:**
+
+```python
+from aip.messages import Command
+
+commands = [
+    Command(
+        action="click",
+        parameters={"control_label": "Start", "x": 10, "y": 10}
+    ),
+    Command(
+        action="type_text",
+        parameters={"text": "notepad"}
+    ),
+    Command(
+        action="press_key",
+        parameters={"key": "enter"}
+    )
+]
+
+# Execute all commands
+results = await client.execute_actions(commands)
+
+# results contains one Result object per command
+```
+
+**Command Execution Table:**
+
+| Step | Action | Component |
+|------|--------|-----------|
+| 1 | Receive commands | UFO Client |
+| 2 | Log command count | UFO Client |
+| 3 | Call CommandRouter | UFO Client |
+| 4 | Route to Computer | CommandRouter |
+| 5 | Execute via MCP | Computer |
+| 6 | Collect results | CommandRouter |
+| 7 | Return results | UFO Client |
+
+See [Computer Manager](./computer_manager.md) for command routing details.
+
+## 🔄 State Reset
+
+!!! warning "Critical for Multi-Task Execution"
+    Always reset state between tasks to prevent data leakage between sessions.
+
+**Signature:**
+
+```python
+def reset(self):
+    """Reset session state and dependent managers."""
+```
+
+**Implementation:**
+
+```python
+def reset(self):
+    """Reset session state and dependent managers."""
+
+    # Clear session state
+    self._session_id = None
+    self._agent_name = None
+    self._process_name = None
+    self._root_name = None
+
+    # Reset managers
+    self.computer_manager.reset()
+    self.mcp_server_manager.reset()
+
+    self.logger.info("Client state has been reset.")
+```
+
+**Reset Cascade:**
+
+```mermaid
+graph TD
+ Reset[client.reset]
+
+ Reset --> S1[session_id = None]
+ Reset --> S2[agent_name = None]
+ Reset --> S3[process_name = None]
+ Reset --> S4[root_name = None]
+
+ Reset --> M1[computer_manager.reset]
+ Reset --> M2[mcp_server_manager.reset]
+
+ M1 --> C1[Clear computer instances]
+ M2 --> M3[Reset MCP servers]
+
+ style Reset fill:#ffcdd2
+ style M1 fill:#fff9c4
+ style M2 fill:#fff9c4
+```
+
+**When to Reset:**
+
+| Scenario | Why Reset |
+|----------|-----------|
+| **Before starting new task** | Clear previous task state |
+| **On task completion** | Prepare for next task |
+| **On task failure** | Clean up failed state |
+| **On server disconnection** | Reset to known good state |
+
+**Note:** The WebSocket client automatically calls `reset()` before starting new tasks:
+
+```python
+async with self.ufo_client.task_lock:
+    self.ufo_client.reset()  # Automatic
+    await self.task_protocol.send_task_request(...)
+```
+
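One way to check the reset cascade in a unit test is to substitute managers that only count how often `reset()` was called. The classes below are simplified, hypothetical models written for this illustration; they are not the real `UFOClient`, `ComputerManager`, or `MCPServerManager`.

```python
class CountingManager:
    """Hypothetical stand-in for ComputerManager / MCPServerManager."""

    def __init__(self) -> None:
        self.reset_calls = 0

    def reset(self) -> None:
        self.reset_calls += 1


class ToyClient:
    """Simplified model of the reset cascade; not the real UFOClient."""

    def __init__(self, computer_manager: CountingManager, mcp_server_manager: CountingManager) -> None:
        self.computer_manager = computer_manager
        self.mcp_server_manager = mcp_server_manager
        self.session_id = None
        self.agent_name = None

    def reset(self) -> None:
        # Clear session state first, then cascade to both managers.
        self.session_id = None
        self.agent_name = None
        self.computer_manager.reset()
        self.mcp_server_manager.reset()


computers, servers = CountingManager(), CountingManager()
client = ToyClient(computers, servers)
client.session_id, client.agent_name = "session_123", "HostAgent"
client.reset()
print(client.session_id, client.agent_name, computers.reset_calls, servers.reset_calls)
```

After `reset()`, no session metadata from the previous task remains and each manager has been reset exactly once.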
+## 🔒 Thread Safety
+
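+The UFO Client uses an `asyncio.Lock` to serialize task execution and prevent concurrent state modifications within a single event loop. Note that `asyncio.Lock` coordinates coroutines, not OS threads, so it must not be shared across threads.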
+
+**Lock Implementation:**
+
+```python
+# In UFOClient.__init__
+self.task_lock = asyncio.Lock()
+```
+
+**Usage in WebSocket Client:**
+
+```python
+# In WebSocket client
+async with client.task_lock:
+    client.reset()
+    await client.execute_step(server_response)
+```
+
+**Protected Operations:**
+
+| Operation | Protected By | Reason |
+|-----------|--------------|--------|
+| Session state modifications | `task_lock` | Prevent race conditions |
+| Command execution | `task_lock` | Ensure one task at a time |
+| State reset | `task_lock` | Atomic reset operation |
+
+!!! warning "Single Task Execution"
+    The lock ensures only **one task executes at a time**. A second coroutine that tries to acquire the lock waits, without blocking the event loop, until the lock is released.
+
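The effect of the lock can be seen in a small standalone experiment. In this sketch, two coroutines compete for one `asyncio.Lock`; the names `run_task` and `events` are illustrative and unrelated to the UFO code base.

```python
import asyncio


async def run_task(name: str, lock: asyncio.Lock, events: list) -> None:
    async with lock:               # only one holder at a time
        events.append(f"{name}:start")
        await asyncio.sleep(0.01)  # yield to the event loop mid-task
        events.append(f"{name}:end")


async def main() -> list:
    lock = asyncio.Lock()
    events: list = []
    # Both tasks are started together, but the lock serializes their bodies.
    await asyncio.gather(run_task("A", lock, events), run_task("B", lock, events))
    return events


events = asyncio.run(main())
print(events)  # ['A:start', 'A:end', 'B:start', 'B:end']
```

Without the lock, the `sleep` would let task B start before task A finished, and the start and end events would interleave.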
+## 📋 Complete Execution Pipeline
+
+```mermaid
+sequenceDiagram
+ participant Server
+ participant WSC as WebSocket Client
+ participant UFC as UFO Client
+ participant CR as Command Router
+ participant CM as Computer Manager
+ participant Comp as Computer
+ participant Tool as MCP Tool
+
+ Note over Server,Tool: Full Execution Pipeline
+
+ Server->>WSC: COMMAND message
+ WSC->>UFC: execute_step(ServerMessage)
+
+ Note over UFC: Extract Metadata
+ UFC->>UFC: agent_name = "HostAgent"
+ UFC->>UFC: process_name = "explorer.exe"
+ UFC->>UFC: root_name = "navigate"
+
+ Note over UFC: Execute Actions
+ UFC->>CR: execute(agent, process, root, commands)
+
+ CR->>CM: Route commands
+ CM->>Comp: Get computer instance
+ Comp->>Tool: Execute tool
+
+ Tool-->>Comp: Result
+ Comp-->>CM: Result
+ CM-->>CR: List[Result]
+ CR-->>UFC: List[Result]
+
+ UFC-->>WSC: List[Result]
+ WSC->>Server: COMMAND_RESULTS (via AIP)
+```
+
+## ⚠️ Error Handling
+
+### Command Execution Errors
+
+Individual command failures are captured in `Result` objects rather than raised as exceptions.
+
+**Error Result Structure:**
+
+```python
+from aip.messages import Result, ResultStatus
+
+error_result = Result(
+    action="click",
+    status=ResultStatus.ERROR,
+    error_message="Control not found",
+    observation="Failed to locate control with label 'Start'"
+)
+```
+
+**Handling Execution Errors:**
+
+```python
+try:
+    results = await client.execute_actions(commands)
+
+    # Check each result
+    for result in results:
+        if result.status == ResultStatus.ERROR:
+            logger.error(f"Action {result.action} failed: {result.error_message}")
+        else:
+            logger.info(f"Action {result.action} succeeded")
+
+except Exception as e:
+    # Unexpected error (not tool failure)
+    logger.error(f"Command execution failed: {e}", exc_info=True)
+```
+
+### Property Validation Errors
+
+```python
+try:
+    client.session_id = 12345  # Invalid type
+except ValueError as e:
+    logger.error(f"Invalid session ID: {e}")
+    # ValueError: Session ID must be a string or None.
+```
+
+**Error Handling Table:**
+
+| Error Type | Raised By | Handling |
+|------------|-----------|----------|
+| Tool execution error | MCP tools | Captured in `Result.error_message` |
+| Property validation error | Property setters | `ValueError` exception |
+| Unexpected errors | Any component | Logged, may propagate |
+
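The first row of the table, where a tool failure becomes data instead of an exception, can be sketched with plain Python. Everything here (`ToyResult`, `Status`, `run_commands`) is a hypothetical simplification for illustration; the real `Result` and `ResultStatus` types live in `aip.messages` and may differ in detail.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class Status(Enum):
    SUCCESS = "success"
    ERROR = "error"


@dataclass
class ToyResult:
    action: str
    status: Status
    observation: str = ""
    error_message: Optional[str] = None


def run_commands(actions: List[str], tools: Dict[str, Callable[[], str]]) -> List[ToyResult]:
    """Run each action; convert any tool exception into an ERROR result."""
    results = []
    for action in actions:
        try:
            results.append(ToyResult(action, Status.SUCCESS, observation=tools[action]()))
        except Exception as exc:  # a failing tool does not abort the batch
            results.append(ToyResult(action, Status.ERROR, error_message=str(exc)))
    return results


def failing_click() -> str:
    raise RuntimeError("Control not found")


results = run_commands(["type_text", "click"], {"type_text": lambda: "typed", "click": failing_click})
for r in results:
    print(r.action, r.status.name, r.error_message)
```

The caller always receives one result per command and can decide per action how to react, which is why checking `status` on every result matters.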
+## 📝 Logging
+
+The UFO Client logs all major events for debugging and monitoring.
+
+**Log Examples:**
+
+**Initialization:**
+
+```log
+INFO - UFO Client initialized for platform: windows
+```
+
+**Session State Changes:**
+
+```log
+INFO - Session ID set to: session_20251104_143022_abc123
+INFO - Agent name set to: HostAgent
+INFO - Process name set to: notepad.exe
+INFO - Root name set to: open_application
+```
+
+**Execution:**
+
+```log
+INFO - Executing 5 actions in total
+```
+
+**Reset:**
+
+```log
+INFO - Client state has been reset.
+```
+
+**Log Level Recommendations:**
+
+| Environment | Level | Rationale |
+|-------------|-------|-----------|
+| Development | `DEBUG` | See all operations |
+| Staging | `INFO` | Track execution flow |
+| Production | `INFO` | Monitor without spam |
+| Troubleshooting | `DEBUG` | Diagnose issues |
+
+## 💡 Usage Example
+
+### Complete Workflow
+
+This example shows how to use the UFO Client in a typical workflow.
+
+```python
+import asyncio
+from ufo.client.ufo_client import UFOClient
+from aip.messages import Command, ResultStatus, ServerMessage, ServerMessageType, TaskStatus
+
+async def main():
+    # 1. Initialize client
+    #    (mcp_manager and computer_manager are created as shown in the
+    #    Initialization section)
+    client = UFOClient(
+        mcp_server_manager=mcp_manager,
+        computer_manager=computer_manager,
+        client_id="device_windows_001",
+        platform="windows"
+    )
+
+    # 2. Simulate server message
+    server_msg = ServerMessage(
+        type=ServerMessageType.COMMAND,
+        session_id="session_123",
+        response_id="resp_456",
+        agent_name="HostAgent",
+        process_name="explorer.exe",
+        root_name="navigate_folder",
+        actions=[
+            Command(action="click", parameters={"label": "File"}),
+            Command(action="click", parameters={"label": "New Folder"})
+        ],
+        status=TaskStatus.PROCESSING
+    )
+
+    # 3. Execute step
+    async with client.task_lock:  # Serialize with any other running task
+        results = await client.execute_step(server_msg)
+
+    # 4. Process results
+    for result in results:
+        print(f"Action: {result.action}")
+        print(f"Status: {result.status}")
+        print(f"Observation: {result.observation}")
+        if result.status == ResultStatus.ERROR:
+            print(f"Error: {result.error_message}")
+
+    # 5. Reset for next task
+    client.reset()
+
+asyncio.run(main())
+```
+
+## ✅ Best Practices
+
+### Development Best Practices
+
+**1. Always Reset Between Tasks**
+
+```python
+async with client.task_lock:
+    client.reset()  # Clear previous state
+    await client.execute_step(new_server_response)
+```
+
+**2. Use Property Setters (Not Direct Assignment)**
+
+```python
+# ✅ Good - validates input
+client.session_id = "session_123"
+
+# ❌ Bad - bypasses validation
+client._session_id = "session_123"
+```
+
+**3. Log Execution Progress**
+
+```python
+self.logger.info(f"Executing {len(commands)} actions for {self.agent_name}")
+```
+
+**4. Handle Errors Gracefully**
+
+```python
+try:
+    results = await client.execute_actions(commands)
+except Exception as e:
+    # Only unexpected errors reach this point; tool failures are
+    # reported in the returned Result objects instead
+    self.logger.error(f"Execution failed: {e}", exc_info=True)
+```
+
+### Production Best Practices
+
+**1. Use `task_lock` Consistently**
+
+```python
+# Always use task_lock for state operations
+async with client.task_lock:
+    client.reset()
+    results = await client.execute_step(msg)
+```
+
+**2. Monitor Execution Times**
+
+```python
+import time
+
+start = time.time()
+results = await client.execute_actions(commands)
+duration = time.time() - start
+
+if duration > 60:  # Alert if > 1 minute
+    logger.warning(f"Slow execution: {duration:.1f}s for {len(commands)} commands")
+```
+
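When several call sites need the same timing check, it can be wrapped in a reusable context manager. The sketch below is one possible shape, not part of UFO: the names `timed`, `threshold_s`, and `slow` are assumptions, and `time.perf_counter()` is used in place of `time.time()` because it is a monotonic clock intended for measuring durations.

```python
import asyncio
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, threshold_s: float, slow: list):
    """Record (label, seconds) in `slow` when the block exceeds threshold_s."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if elapsed > threshold_s:
            slow.append((label, elapsed))


async def fake_execute() -> None:
    await asyncio.sleep(0.05)  # stands in for client.execute_actions(...)


async def main() -> list:
    slow: list = []
    with timed("execute_actions", threshold_s=0.01, slow=slow):
        await fake_execute()
    with timed("noop", threshold_s=10.0, slow=slow):
        pass
    return slow


slow = asyncio.run(main())
print([label for label, _ in slow])
```

Only the block that exceeded its threshold is recorded; in production the `slow.append` call would typically be replaced by a log warning or a metrics update.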
+**3. Validate Results**
+
+```python
+# Check for failures
+failed_actions = [r for r in results if r.status == ResultStatus.ERROR]
+if failed_actions:
+    logger.error(f"{len(failed_actions)} actions failed")
+    # Report to monitoring system
+```
+
+## 🔗 Integration Points
+
+### WebSocket Client Integration
+
+The WebSocket client uses UFO Client for all command execution.
+
+**Integration:**
+
+```python
+# In WebSocket client
+action_results = await self.ufo_client.execute_step(server_response)
+```
+
+See [WebSocket Client](./websocket_client.md) for communication details.
+
+### Command Router Integration
+
+The UFO Client delegates all execution to the CommandRouter.
+
+**Integration:**
+
+```python
+action_results = await self.command_router.execute(
+    agent_name=self.agent_name,
+    process_name=self.process_name,
+    root_name=self.root_name,
+    commands=commands
+)
+```
+
+See [Computer Manager](./computer_manager.md) for routing details.
+
+### Computer Manager Integration
+
+The Computer Manager maintains computer instances for tool execution.
+
+**Integration:**
+
+```python
+# Reset cascades to computer manager
+self.computer_manager.reset()
+```
+
+See [Computer Manager](./computer_manager.md) for management details.
+
+### MCP Server Manager Integration
+
+The MCP Server Manager handles MCP server creation and cleanup.
+
+**Integration:**
+
+```python
+# Reset cascades to MCP server manager
+self.mcp_server_manager.reset()
+```
+
+See [MCP Integration](./mcp_integration.md) for MCP details.
+
+## 🚀 Next Steps
+
+**Continue Learning**
+
+1. **Understand Network Communication** - Learn how the WebSocket client uses UFO Client: [WebSocket Client](./websocket_client.md)
+
+2. **Explore Command Routing** - See how commands are routed to the right tools: [Computer Manager](./computer_manager.md)
+
+3. **Study Device Profiling** - Understand device information collection: [Device Info Provider](./device_info.md)
+
+4. **Learn About MCP Integration** - Deep dive into MCP server management: [MCP Integration](./mcp_integration.md)
+
+5. **Master AIP Messages** - Understand message structures: [AIP Messages](../aip/messages.md)
\ No newline at end of file
diff --git a/documents/docs/client/websocket_client.md b/documents/docs/client/websocket_client.md
new file mode 100644
index 000000000..24abea8eb
--- /dev/null
+++ b/documents/docs/client/websocket_client.md
@@ -0,0 +1,1105 @@
+# 🔌 WebSocket Client
+
+The **WebSocket Client** implements the **AIP (Agent Interaction Protocol)** for reliable, bidirectional communication between device clients and the Agent Server. It provides the low-level communication infrastructure for UFO device clients.
+
+## 📋 Overview
+
+The WebSocket client handles all network communication aspects, allowing the UFO Client to focus on task execution.
+
+**Key Responsibilities:**
+
+| Capability | Description | Implementation |
+|------------|-------------|----------------|
+| **Connection Management** | Persistent WebSocket connection with automatic retry | Exponential backoff, configurable max retries |
+| **AIP Protocol Implementation** | Structured message handling via Registration, Heartbeat, Task Execution | Three protocol handlers |
+| **Device Registration** | Automatic registration with device profile on connect | Push model (proactive info collection) |
+| **Heartbeat Monitoring** | Regular keepalive messages for connection health | Configurable interval (method default: 30s; the client passes its `timeout`, 120s by default) |
+| **Message Routing** | Dispatch incoming messages to appropriate handlers | Type-based routing |
+| **Error Handling** | Graceful error recovery and reporting | Retry logic, error propagation via AIP |
+
+**Message Flow Overview:**
+
+```mermaid
+graph LR
+ subgraph "Client Side"
+ WSC[WebSocket Client]
+ AIP[AIP Protocols]
+ UFC[UFO Client]
+ end
+
+ subgraph "Network"
+ WS[WebSocket Connection]
+ end
+
+ subgraph "Server Side"
+ Server[Agent Server]
+ end
+
+ WSC <-->|AIP Messages| AIP
+ AIP <-->|WebSocket| WS
+ WS <-->|TCP/IP| Server
+ WSC -->|Delegate Execution| UFC
+
+ style WSC fill:#bbdefb
+ style AIP fill:#c8e6c9
+ style Server fill:#ffe0b2
+```
+
+## 🏗️ Architecture
+
+The WebSocket client is organized into distinct layers for connection management, protocol handling, and message routing.
+
+### Component Structure
+
+```mermaid
+graph TB
+ subgraph "UFOWebSocketClient"
+ CM[Connection Management Layer]
+ PH[Protocol Handler Layer]
+ MR[Message Routing Layer]
+ end
+
+ subgraph "Connection Management"
+ CM1[connect_and_listen]
+ CM2[Retry Logic]
+ CM3[State Tracking]
+ end
+
+ subgraph "AIP Protocols"
+ PH1[RegistrationProtocol]
+ PH2[HeartbeatProtocol]
+ PH3[TaskExecutionProtocol]
+ end
+
+ subgraph "Message Handlers"
+ MR1[recv_loop]
+ MR2[handle_message]
+ MR3[handle_commands]
+ MR4[handle_task_end]
+ end
+
+ CM --> CM1
+ CM --> CM2
+ CM --> CM3
+
+ PH --> PH1
+ PH --> PH2
+ PH --> PH3
+
+ MR --> MR1
+ MR --> MR2
+ MR --> MR3
+ MR --> MR4
+
+ CM1 --> PH
+ PH --> MR
+
+ style CM fill:#e3f2fd
+ style PH fill:#f1f8e9
+ style MR fill:#fff3e0
+```
+
+### Class Structure
+
+| Component | Type | Purpose |
+|-----------|------|---------|
+| **UFOWebSocketClient** | Main Class | Orchestrates all WebSocket communication |
+| **WebSocketTransport** | AIP Component | Low-level WebSocket send/receive |
+| **RegistrationProtocol** | AIP Protocol | Client registration messages |
+| **HeartbeatProtocol** | AIP Protocol | Connection keepalive messages |
+| **TaskExecutionProtocol** | AIP Protocol | Task request/result messages |
+
+---
+
+## 🔄 Connection Lifecycle
+
+### Initialization & Connection Flow
+
+```mermaid
+sequenceDiagram
+ participant Main as Client Main
+ participant WSC as WebSocket Client
+ participant WS as WebSocket
+ participant Server
+
+ Note over Main: 1. Initialization
+ Main->>WSC: Create UFOWebSocketClient(ws_url, ufo_client)
+ WSC->>WSC: Initialize attributes (max_retries=3, timeout=120)
+
+ Note over WSC,Server: 2. Connection Attempt
+ WSC->>WS: websockets.connect(ws_url)
+ WS->>Server: TCP Handshake
+ Server-->>WS: WebSocket Upgrade
+ WS-->>WSC: Connection Established
+
+ Note over WSC,Server: 3. AIP Protocol Initialization
+ WSC->>WSC: Create WebSocketTransport(ws)
+ WSC->>WSC: Create RegistrationProtocol(transport)
+ WSC->>WSC: Create HeartbeatProtocol(transport)
+ WSC->>WSC: Create TaskExecutionProtocol(transport)
+
+ Note over WSC,Server: 4. Device Registration
+ WSC->>WSC: Collect Device Info
+ WSC->>Server: REGISTRATION (via AIP)
+ Server-->>WSC: REGISTRATION_ACK
+ WSC->>WSC: Set connected_event
+
+ Note over WSC,Server: 5. Message Handling
+ par Receive Loop
+ loop Continuous
+ Server->>WSC: Server Messages
+ WSC->>WSC: Route to Handlers
+ end
+ and Heartbeat Loop
+ loop Every 30s
+ WSC->>Server: HEARTBEAT
+ Server-->>WSC: HEARTBEAT_ACK
+ end
+ end
+```
+
+### Initialization Code
+
+Creating a WebSocket client:
+
+```python
+from ufo.client.websocket import UFOWebSocketClient
+from ufo.client.ufo_client import UFOClient
+
+# Create UFO client (execution engine)
+ufo_client = UFOClient(
+    mcp_server_manager=mcp_manager,
+    computer_manager=computer_manager,
+    client_id="device_windows_001",
+    platform="windows"
+)
+
+# Create WebSocket client (communication layer)
+ws_client = UFOWebSocketClient(
+    ws_url="ws://localhost:5000/ws",
+    ufo_client=ufo_client,
+    max_retries=3,  # Default: 3 attempts
+    timeout=120     # Heartbeat interval in seconds (default: 120)
+)
+
+# Connect and start listening (blocking call)
+await ws_client.connect_and_listen()
+```
+
+**Constructor Parameters:**
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `ws_url` | `str` | Required | WebSocket server URL (e.g., `ws://localhost:5000/ws`) |
+| `ufo_client` | `UFOClient` | Required | UFO client instance for command execution |
+| `max_retries` | `int` | `3` | Maximum connection retry attempts |
+| `timeout` | `float` | `120` | Heartbeat interval in seconds (passed to `heartbeat_loop()`) |
+
+**Note:** The constructor's `timeout` value is passed to `heartbeat_loop(interval)` and sets the heartbeat frequency. The `heartbeat_loop()` signature defaults to 30 seconds, but that default applies only when the method is called without an argument; a client built with the default `timeout` sends a heartbeat every 120 seconds.
+
+### Connection Establishment Details
+
+The client uses specific WebSocket parameters optimized for long-running task execution:
+
+**WebSocket Connection Parameters:**
+
+```python
+async with websockets.connect(
+    self.ws_url,
+    ping_interval=20,            # Send WebSocket ping every 20 seconds
+    ping_timeout=180,            # Wait up to 3 minutes for pong response
+    close_timeout=10,            # 10 second close handshake timeout
+    max_size=100 * 1024 * 1024   # 100MB max message size
+) as ws:
+    ...  # Connection established
+```
+
+**Parameter Rationale:**
+
+| Parameter | Value | Reason |
+|-----------|-------|--------|
+| `ping_interval` | **20 seconds** | Frequent keepalive to detect connection loss quickly |
+| `ping_timeout` | **180 seconds** | Tolerates long-running operations (e.g., complex tasks) |
+| `close_timeout` | **10 seconds** | Quick cleanup on intentional disconnect |
+| `max_size` | **100 MB** | Supports large screenshots, logs, file transfers |
+
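+**Note:** The 180-second `ping_timeout` gives a busy peer up to three minutes to answer each ping, so the connection is not dropped while a lengthy tool execution (up to 100 minutes per tool) is in progress. The trade-off is slower failure detection: a dead connection may go unnoticed for roughly `ping_interval + ping_timeout` (about 200 seconds).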
+
+## 📝 Registration Flow
+
+### Device Information Collection
+
+UFO uses a **push model** for device information: clients proactively send their profile during registration, rather than waiting for the server to request it. This reduces latency for constellation (multi-client) scenarios.
+
+**Device Info Collection:**
+
+```python
+from ufo.client.device_info_provider import DeviceInfoProvider
+
+# Collect comprehensive system information
+system_info = DeviceInfoProvider.collect_system_info(
+ client_id=self.ufo_client.client_id,
+ custom_metadata=None # Server adds custom metadata if configured
+)
+
+# System info includes:
+# - platform (windows/linux/darwin)
+# - os_version
+# - cpu_count
+# - memory_total_gb
+# - hostname
+# - ip_address
+# - supported_features
+# - platform_type
+```
+
+**Metadata Structure:**
+
+```python
+metadata = {
+ "system_info": {
+ "platform": "windows",
+ "os_version": "Windows-10-10.0.19045",
+ "cpu_count": 8,
+ "memory_total_gb": 16.0,
+ "hostname": "DESKTOP-ABC123",
+ "ip_address": "192.168.1.100",
+ # ... additional fields
+ },
+ "registration_time": "2025-11-05T14:30:00.123Z"
+}
+```
+
+See [Device Info Provider](./device_info.md) for complete field descriptions.
+
+### Registration Message Exchange
+
+```mermaid
+sequenceDiagram
+ participant Client
+ participant AIP as AIP Registration Protocol
+ participant Server
+
+ Note over Client: Collect Device Info
+ Client->>Client: DeviceInfoProvider.collect_system_info()
+
+ Note over Client,Server: Registration Request
+ Client->>AIP: register_as_device(device_id, metadata, platform)
+ AIP->>Server: REGISTRATION {device_id, metadata, platform}
+
+ Note over Server: Validate & Store
+ Server->>Server: Check for duplicate ID
+ Server->>Server: Store device info
+ Server->>Server: Add to client registry
+
+ Note over Client,Server: Registration Response
+ Server-->>AIP: REGISTRATION_ACK {success: true}
+ AIP-->>Client: success = True
+
+ Client->>Client: Set connected_event
+ Client->>Client: Log success
+```
+
+**Registration Code:**
+
+```python
+async def register_client(self):
+    """Send client_id and device system information to server."""
+
+    # Collect device info
+    try:
+        system_info = DeviceInfoProvider.collect_system_info(
+            self.ufo_client.client_id,
+            custom_metadata=None
+        )
+        metadata = {
+            "system_info": system_info.to_dict(),
+            "registration_time": datetime.datetime.now(
+                datetime.timezone.utc
+            ).isoformat(),
+        }
+        self.logger.info(
+            f"[WS] [AIP] Collected device info: platform={system_info.platform}, "
+            f"cpu={system_info.cpu_count}, memory={system_info.memory_total_gb}GB"
+        )
+    except Exception as e:
+        self.logger.error(f"[WS] [AIP] Error collecting device info: {e}")
+        # Continue with minimal metadata
+        metadata = {
+            "registration_time": datetime.datetime.now(
+                datetime.timezone.utc
+            ).isoformat(),
+        }
+
+    # Use AIP RegistrationProtocol
+    success = await self.registration_protocol.register_as_device(
+        device_id=self.ufo_client.client_id,
+        metadata=metadata,
+        platform=self.ufo_client.platform
+    )
+
+    if success:
+        self.connected_event.set()  # Signal successful registration
+        self.logger.info(f"[WS] [AIP] ✅ Successfully registered as {self.ufo_client.client_id}")
+    else:
+        self.logger.error(f"[WS] [AIP] ❌ Failed to register as {self.ufo_client.client_id}")
+        raise RuntimeError(f"Registration failed for {self.ufo_client.client_id}")
+```
+
+### Registration Outcomes
+
+**Success Scenario:**
+
+```log
+INFO - [WS] [AIP] Collected device info: platform=windows, cpu=8, memory=16.0GB
+INFO - [WS] [AIP] Attempting to register as device_windows_001
+INFO - [WS] [AIP] ✅ Successfully registered as device_windows_001
+```
+
+- `connected_event` is set (allows task requests)
+- Client enters message handling loops
+
+**Failure Scenario:**
+
+```log
+ERROR - [WS] [AIP] ❌ Failed to register as device_windows_001
+RuntimeError: Registration failed for device_windows_001
+```
+
+- Connection is closed
+- Retry logic engages (exponential backoff)
+
+**Common Failure Causes:**
+
+| Cause | Server Behavior | Client Action |
+|-------|----------------|---------------|
+| Duplicate client ID | Reject registration | Change client ID, retry |
+| Server capacity limit | Reject registration | Wait and retry later |
+| Network interruption | Timeout | Automatic retry with backoff |
+| Invalid platform | Reject registration | Fix platform parameter |
+
+---
+
+## 💓 Heartbeat Mechanism
+
+Heartbeats prove the client is still alive and responsive, allowing the server to detect disconnected clients quickly.
+
+### Heartbeat Loop Implementation
+
+**Default Configuration:**
+
+| Parameter | Value | Configurable |
+|-----------|-------|--------------|
+| **Interval** | 30 seconds (method default; the client passes its `timeout`, 120 seconds by default) | ✅ Yes (function parameter) |
+| **Protocol** | AIP HeartbeatProtocol | No |
+| **Error Handling** | Break loop on failure | No |
+
+**Heartbeat Code:**
+
+```python
+async def heartbeat_loop(self, interval: float = 30) -> None:
+    """
+    Send periodic heartbeat messages using AIP HeartbeatProtocol.
+    :param interval: Interval between heartbeats in seconds (default: 30)
+    """
+    while True:
+        await asyncio.sleep(interval)
+        try:
+            await self.heartbeat_protocol.send_heartbeat(
+                self.ufo_client.client_id
+            )
+            self.logger.debug("[WS] [AIP] Heartbeat sent")
+        except (ConnectionError, IOError) as e:
+            self.logger.debug(
+                f"[WS] [AIP] Heartbeat failed (connection closed): {e}"
+            )
+            break  # Exit loop if connection is closed
+```
+
+**Customizing Heartbeat Interval:**
+
+Adjust the interval when calling the heartbeat loop:
+
+```python
+# In handle_messages():
+await asyncio.gather(
+ self.recv_loop(),
+ self.heartbeat_loop(interval=60) # Custom 60-second interval
+)
+```
+
+### Heartbeat Message Structure
+
+**Client → Server (Heartbeat):**
+
+```json
+{
+ "type": "HEARTBEAT",
+ "client_id": "device_windows_001",
+ "timestamp": "2025-11-05T14:30:22.123Z"
+}
+```
+
+**Server → Client (Heartbeat Ack - Optional):**
+
+```json
+{
+ "type": "HEARTBEAT",
+ "timestamp": "2025-11-05T14:30:22.456Z"
+}
+```
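+
+The client-side heartbeat payload can be reproduced with the standard library. The following is a minimal sketch, not the `HeartbeatProtocol` implementation (whose internals are not shown in this document): `build_heartbeat` is a hypothetical helper, and the field names are assumed to match the JSON example above.
+
+```python
+import datetime
+import json
+
+
+def build_heartbeat(client_id: str) -> str:
+    """Serialize a heartbeat with the three fields shown in the JSON example."""
+    now = datetime.datetime.now(datetime.timezone.utc)
+    payload = {
+        "type": "HEARTBEAT",
+        "client_id": client_id,
+        # Millisecond precision with a trailing "Z", as in the example above.
+        "timestamp": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
+    }
+    return json.dumps(payload)
+
+
+if __name__ == "__main__":
+    print(build_heartbeat("device_windows_001"))
+```
+
+Using a timezone-aware UTC timestamp keeps heartbeats from different machines comparable on the server side.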
+
+### Heartbeat State Diagram
+
+```mermaid
+stateDiagram-v2
+ [*] --> Sleeping
+ Sleeping --> SendingHeartbeat: After interval (30s)
+ SendingHeartbeat --> Success: Sent successfully
+ SendingHeartbeat --> Failed: Connection error
+
+ Success --> Sleeping: Continue loop
+ Failed --> [*]: Exit loop
+
+ note right of Sleeping
+ Wait for interval duration
+ (default: 30 seconds)
+ end note
+
+ note right of Failed
+ Connection closed
+ recv_loop will also exit
+ Outer retry logic activates
+ end note
+```
+
+---
+
+## 📨 Message Handling
+
+### Message Router
+
+All incoming messages are validated against the AIP schema and routed based on their `type` field.
+
+**Message Dispatcher Code:**
+
+```python
+async def handle_message(self, msg: str):
+    """Dispatch messages based on their type."""
+    try:
+        # Parse and validate message
+        data = ServerMessage.model_validate_json(msg)
+        msg_type = data.type
+
+        self.logger.info(f"[WS] Received message: {data}")
+
+        # Route by type
+        if msg_type == ServerMessageType.TASK:
+            await self.start_task(data.user_request, data.task_name)
+        elif msg_type == ServerMessageType.HEARTBEAT:
+            self.logger.info("[WS] Heartbeat received")
+        elif msg_type == ServerMessageType.TASK_END:
+            await self.handle_task_end(data)
+        elif msg_type == ServerMessageType.ERROR:
+            self.logger.error(f"[WS] Server error: {data.error}")
+        elif msg_type == ServerMessageType.COMMAND:
+            await self.handle_commands(data)
+        else:
+            self.logger.warning(f"[WS] Unknown message type: {msg_type}")
+
+    except Exception as e:
+        self.logger.error(f"[WS] Error handling message: {e}", exc_info=True)
+```
+
+**Message Type Routing:**
+
+| Server Message Type | Handler Method | Purpose |
+|---------------------|----------------|---------|
+| `TASK` | `start_task()` | Begin new task execution |
+| `COMMAND` | `handle_commands()` | Execute specific commands |
+| `TASK_END` | `handle_task_end()` | Process task completion |
+| `HEARTBEAT` | Log only | Acknowledge keepalive |
+| `ERROR` | Log error | Handle server-side errors |
+| Unknown | Log warning | Ignore unrecognized types |
+
+### Task Start Handler
+
+!!!warning "Single Task Execution"
+    The client executes **only one task at a time**. New task requests are ignored if a task is currently running.
+
+**Task Start Flow:**
+
+```mermaid
+sequenceDiagram
+ participant Server
+ participant WSC as WebSocket Client
+ participant UFC as UFO Client
+ participant Task as Task Coroutine
+
+ Server->>WSC: TASK message {user_request, task_name}
+
+ alt Current Task Running
+ WSC->>WSC: Check current_task.done()
+ WSC->>WSC: ⚠️ Ignore new request (log warning)
+ else No Task Running
+ WSC->>Task: Create task_loop() coroutine
+ Task->>UFC: Reset session state
+ Task->>Task: Build metadata (platform)
+ Task->>Server: TASK_REQUEST (via AIP)
+ Server-->>Task: Acknowledgment
+ Task->>WSC: Task coroutine running
+ end
+```
+
+**Task Start Code:**
+
+```python
+async def start_task(self, request_text: str, task_name: str | None):
+    """Start a new task based on server request."""
+
+    # Check if task is already running
+    if self.current_task is not None and not self.current_task.done():
+        self.logger.warning(
+            f"[WS] Task {self.session_id} is still running, ignoring new task"
+        )
+        return
+
+    self.logger.info(f"[WS] Starting task: {request_text}")
+
+    async def task_loop():
+        try:
+            async with self.ufo_client.task_lock:
+                self.ufo_client.reset()  # Clear previous session state
+
+                # Build metadata with platform info
+                metadata = {}
+                if self.ufo_client.platform:
+                    metadata["platform"] = self.ufo_client.platform
+
+                # Send task request via AIP
+                await self.task_protocol.send_task_request(
+                    request=request_text,
+                    task_name=task_name if task_name else str(uuid4()),
+                    session_id=self.ufo_client.session_id,
+                    client_id=self.ufo_client.client_id,
+                    metadata=metadata if metadata else None
+                )
+
+                self.logger.info(
+                    f"[WS] [AIP] Sent task request with platform: {self.ufo_client.platform}"
+                )
+        except Exception as e:
+            self.logger.error(f"[WS] [AIP] Error sending task request: {e}")
+            # Send error via AIP
+            error_msg = ClientMessage(
+                type=ClientMessageType.ERROR,
+                error=str(e),
+                client_id=self.ufo_client.client_id,
+                timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat()
+            )
+            await self.transport.send(error_msg.model_dump_json().encode())
+
+    # Create task coroutine
+    self.current_task = asyncio.create_task(task_loop())
+```
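+
+The single-task guard relies on `asyncio.Task.done()`. The pattern can be tried in isolation with the sketch below; `Runner` and `demo` are hypothetical names used only for illustration and are not part of the UFO client.
+
+```python
+import asyncio
+from typing import Optional
+
+
+class Runner:
+    """Hypothetical stand-in for the client: runs at most one task at a time."""
+
+    def __init__(self) -> None:
+        self.current_task: Optional[asyncio.Task] = None
+
+    async def start_task(self, name: str) -> bool:
+        # Same guard as in start_task(): ignore the request while a task is live.
+        if self.current_task is not None and not self.current_task.done():
+            return False
+
+        async def task_loop() -> None:
+            await asyncio.sleep(0.05)  # simulated work for task `name`
+
+        self.current_task = asyncio.create_task(task_loop())
+        return True
+
+
+async def demo() -> list:
+    runner = Runner()
+    first = await runner.start_task("a")   # accepted
+    second = await runner.start_task("b")  # ignored: "a" is still running
+    await runner.current_task              # let "a" finish
+    third = await runner.start_task("c")   # accepted again
+    await runner.current_task
+    return [first, second, third]
+
+
+if __name__ == "__main__":
+    print(asyncio.run(demo()))  # [True, False, True]
+```
+
+A rejected request is simply dropped in this pattern, so a caller that needs queuing has to add it on top.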
+
+### Command Execution Handler
+
+The server sends specific commands (tool calls) to execute, and the client returns results.
+
+**Command Execution Flow:**
+
+```python
+async def handle_commands(self, server_response: ServerMessage):
+    """
+    Handle commands received from server.
+    Uses AIP TaskExecutionProtocol to send results back.
+    """
+    response_id = server_response.response_id
+    task_status = server_response.status
+    self.session_id = server_response.session_id
+
+    # Execute commands via UFO Client
+    action_results = await self.ufo_client.execute_step(server_response)
+
+    # Send results via AIP
+    await self.task_protocol.send_task_result(
+        session_id=self.session_id,
+        prev_response_id=response_id,
+        action_results=action_results,
+        status=task_status,
+        client_id=self.ufo_client.client_id
+    )
+
+    self.logger.info(
+        f"[WS] [AIP] Sent client result for prev_response_id: {response_id}"
+    )
+
+    # Check for task completion
+    if task_status in [TaskStatus.COMPLETED, TaskStatus.FAILED]:
+        await self.handle_task_end(server_response)
+```
+
+**Execution Steps:**
+
+1. **Extract Metadata**: Get `response_id`, `task_status`, `session_id`
+2. **Execute Commands**: Delegate to `ufo_client.execute_step()`
+3. **Send Results**: Use `TaskExecutionProtocol.send_task_result()`
+4. **Check Completion**: Handle task end if status is terminal
+
+### Task Completion Handler
+
+```python
+async def handle_task_end(self, server_response: ServerMessage):
+    """Handle task end messages from server."""
+
+    if server_response.status == TaskStatus.COMPLETED:
+        self.logger.info(
+            f"[WS] Task {self.session_id} completed, result: {server_response.result}"
+        )
+    elif server_response.status == TaskStatus.FAILED:
+        self.logger.info(
+            f"[WS] Task {self.session_id} failed, with error: {server_response.error}"
+        )
+    else:
+        self.logger.warning(
+            f"[WS] Unknown task status for {self.session_id}: {server_response.status}"
+        )
+```
+
+---
+
+## ⚠️ Error Handling
+
+### Connection Error Recovery
+
+The client automatically retries failed connections using exponential backoff to avoid overwhelming the server.
+
+**Retry Logic:**
+
+```python
+async def connect_and_listen(self):
+    """Connect with automatic retry."""
+    while self.retry_count < self.max_retries:
+        try:
+            async with websockets.connect(...) as ws:
+                # Initialize protocols
+                self.transport = WebSocketTransport(ws)
+                self.registration_protocol = RegistrationProtocol(self.transport)
+                self.heartbeat_protocol = HeartbeatProtocol(self.transport)
+                self.task_protocol = TaskExecutionProtocol(self.transport)
+
+                await self.register_client()
+                self.retry_count = 0  # Reset on successful connection
+                await self.handle_messages()
+
+        except (websockets.ConnectionClosedError, websockets.ConnectionClosedOK) as e:
+            self.logger.error(f"[WS] Connection closed: {e}")
+            self.retry_count += 1
+            await self._maybe_retry()
+
+        except Exception as e:
+            self.logger.error(f"[WS] Unexpected error: {e}", exc_info=True)
+            self.retry_count += 1
+            await self._maybe_retry()
+
+    self.logger.error("[WS] Max retries reached. Exiting.")
+```
+
+**Exponential Backoff:**
+
+```python
+async def _maybe_retry(self):
+    """Exponential backoff before retry."""
+    if self.retry_count < self.max_retries:
+        wait_time = 2 ** self.retry_count  # 2s, 4s, 8s, 16s...
+        self.logger.info(f"[WS] Retrying in {wait_time}s...")
+        await asyncio.sleep(wait_time)
+```
+
+**Retry Schedule:**
+
+| Failed attempt | `retry_count` afterwards | Wait before next attempt | Cumulative wait |
+|----------------|--------------------------|--------------------------|-----------------|
+| 1st | 1 | 2 seconds | 2s |
+| 2nd | 2 | 4 seconds | 6s |
+| 3rd | 3 | None: `retry_count` reaches `max_retries`, client exits | 6s |
+
+**Default Max Retries = 3**
+
+The constructor declares `max_retries: int = 3`. Increase the value for unreliable networks:
+
+```python
+ws_client = UFOWebSocketClient(
+ ws_url="ws://...",
+ ufo_client=ufo_client,
+ max_retries=10 # More resilient
+)
+```
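+
+The wait times produced by the retry loop can be checked without a server. The following is a minimal standalone sketch, not code from UFO: `backoff_schedule` is a hypothetical helper that replays the counter logic of `connect_and_listen()` and `_maybe_retry()` as shown above, assuming every connection attempt fails.
+
+```python
+def backoff_schedule(max_retries: int) -> list:
+    """Return the waits (in seconds) between attempts when every attempt fails."""
+    waits = []
+    retry_count = 0
+    while retry_count < max_retries:
+        # A failed attempt increments the counter first...
+        retry_count += 1
+        # ...and a wait only happens if another attempt is still allowed.
+        if retry_count < max_retries:
+            waits.append(2 ** retry_count)
+    return waits
+
+
+if __name__ == "__main__":
+    print(backoff_schedule(3))        # [2, 4]
+    print(sum(backoff_schedule(10)))  # 1022
+```
+
+With `max_retries=10` the final wait alone is 512 seconds, so a cap on the wait time may be worth considering before raising the limit much further.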
+
+### Message Parsing Errors
+
+**Graceful Error Handling:**
+
+```python
+try:
+    data = ServerMessage.model_validate_json(msg)
+    # Process message...
+except Exception as e:
+    self.logger.error(f"[WS] Error handling message: {e}", exc_info=True)
+    # Message is dropped, client continues listening
+```
+
+Message parsing errors don't crash the client—the error is logged and the receive loop continues.
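+
+The drop-and-continue behavior can be illustrated with the standard library alone. This is a sketch under stated assumptions: the real client validates with Pydantic (`ServerMessage.model_validate_json`), whereas the hypothetical `dispatch` function below uses plain `json` and a handler dictionary to show the same idea.
+
+```python
+import json
+
+
+def dispatch(raw: str, handlers: dict) -> str:
+    """Route one raw message by its "type" field; never raise on bad input."""
+    try:
+        data = json.loads(raw)
+        handler = handlers.get(data["type"])
+    except (json.JSONDecodeError, KeyError, TypeError) as exc:
+        # Malformed message: report and drop it, the caller keeps listening.
+        return f"dropped: {type(exc).__name__}"
+    if handler is None:
+        return f"unknown type: {data['type']}"
+    return handler(data)
+
+
+if __name__ == "__main__":
+    handlers = {"HEARTBEAT": lambda data: "heartbeat logged"}
+    for raw in ['{"type": "HEARTBEAT"}', "not json", '{"type": "PING"}', "{}"]:
+        print(dispatch(raw, handlers))
+```
+
+Because every failure path returns instead of raising, a receive loop that calls `dispatch` for each frame keeps running after a bad message.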
+
+### Registration Error Handling
+
+**Fallback to Minimal Metadata:**
+
+```python
+try:
+    system_info = DeviceInfoProvider.collect_system_info(...)
+    metadata = {"system_info": system_info.to_dict()}
+except Exception as e:
+    self.logger.error(f"[WS] [AIP] Error collecting device info: {e}")
+    # Continue with minimal metadata
+    metadata = {
+        "registration_time": datetime.datetime.now(datetime.timezone.utc).isoformat()
+    }
+```
+
+If device info collection fails, registration still proceeds with minimal metadata (timestamp only).
+
+---
+
+## 🔌 AIP Protocol Integration
+
+The WebSocket client uses three specialized AIP protocols for different communication patterns.
+
+### 1. Registration Protocol
+
+**Purpose:** Client registration and device profile exchange
+
+```python
+from aip.protocol.registration import RegistrationProtocol
+
+self.registration_protocol = RegistrationProtocol(self.transport)
+
+# Register as device
+success = await self.registration_protocol.register_as_device(
+ device_id="device_windows_001",
+ metadata={"system_info": {...}},
+ platform="windows"
+)
+```
+
+**Key Methods:**
+
+| Method | Parameters | Returns | Purpose |
+|--------|------------|---------|---------|
+| `register_as_device()` | `device_id`, `metadata`, `platform` | `bool` | Register client as device |
+
+See [AIP Registration Protocol](../aip/protocols.md#registration-protocol) for message format details.
+
+### 2. Heartbeat Protocol
+
+**Purpose:** Connection keepalive and health monitoring
+
+```python
+from aip.protocol.heartbeat import HeartbeatProtocol
+
+self.heartbeat_protocol = HeartbeatProtocol(self.transport)
+
+# Send heartbeat
+await self.heartbeat_protocol.send_heartbeat("device_windows_001")
+```
+
+**Key Methods:**
+
+| Method | Parameters | Returns | Purpose |
+|--------|------------|---------|---------|
+| `send_heartbeat()` | `client_id` | `None` | Send keepalive message |
+
+See [AIP Heartbeat Protocol](../aip/protocols.md#heartbeat-protocol) for message format details.
+
+### 3. Task Execution Protocol
+
+**Purpose:** Task request and result exchange
+
+```python
+from aip.protocol.task_execution import TaskExecutionProtocol
+
+self.task_protocol = TaskExecutionProtocol(self.transport)
+
+# Send task request
+await self.task_protocol.send_task_request(
+ request="Open Notepad",
+ task_name="task_001",
+ session_id=None,
+ client_id="device_windows_001",
+ metadata={"platform": "windows"}
+)
+
+# Send task result
+await self.task_protocol.send_task_result(
+ session_id="session_123",
+ prev_response_id="resp_456",
+ action_results=[...],
+ status=TaskStatus.COMPLETED,
+ client_id="device_windows_001"
+)
+```
+
+**Key Methods:**
+
+| Method | Parameters | Returns | Purpose |
+|--------|------------|---------|---------|
+| `send_task_request()` | `request`, `task_name`, `session_id`, `client_id`, `metadata` | `None` | Request task execution |
+| `send_task_result()` | `session_id`, `prev_response_id`, `action_results`, `status`, `client_id` | `None` | Return execution results |
+
+See [AIP Task Execution Protocol](../aip/protocols.md#task-execution-protocol) for message format details.
+
+---
+
+## 🔍 Connection State Management
+
+### State Checking
+
+Use `is_connected()` to check if the client is ready to send messages.
+
+**Implementation:**
+
+```python
+def is_connected(self) -> bool:
+    """Check if WebSocket is connected and registered."""
+    return (
+        self.connected_event.is_set()  # Registration succeeded
+        and self._ws is not None       # WebSocket exists
+        and not self._ws.closed        # WebSocket is open
+    )
+```
+
+**Usage Example:**
+
+```python
+if ws_client.is_connected():
+    await ws_client.start_task("Open Calculator", "task_calc")
+else:
+    logger.error("Not connected to server - cannot send task")
+```
+
+### Connected Event
+
+The `connected_event` is an `asyncio.Event` that signals successful registration.
+
+**Usage Pattern:**
+
+```python
+# Wait for connection before sending requests
+await ws_client.connected_event.wait()
+
+# Now safe to send task requests
+await ws_client.start_task("Open Notepad", "task_notepad")
+```
+
+**Event Lifecycle:**
+
+| State | Event Status | Meaning |
+|-------|--------------|---------|
+| Initial | Not set | Client not connected |
+| Connecting | Not set | WebSocket connecting, registering |
+| Registered | **Set** | ✅ Ready to send/receive messages |
+| Disconnected | Cleared | Connection lost, will retry |
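+
+Waiting on the event without a deadline blocks forever if registration never succeeds. A bounded wait is one way to avoid that; the sketch below uses a plain `asyncio.Event` in place of `ws_client.connected_event`, and `wait_until_connected` is a hypothetical helper, not part of the client API.
+
+```python
+import asyncio
+
+
+async def wait_until_connected(event: asyncio.Event, timeout: float) -> bool:
+    """Return True if the event is set within `timeout` seconds, else False."""
+    try:
+        await asyncio.wait_for(event.wait(), timeout=timeout)
+        return True
+    except asyncio.TimeoutError:
+        return False
+
+
+async def demo() -> tuple:
+    event = asyncio.Event()  # stands in for ws_client.connected_event
+    before = await wait_until_connected(event, timeout=0.05)  # not set yet
+    event.set()  # what register_client() does on success
+    after = await wait_until_connected(event, timeout=0.05)
+    return before, after
+
+
+if __name__ == "__main__":
+    print(asyncio.run(demo()))  # (False, True)
+```
+
+The timeout value is a judgment call; it should comfortably exceed the time the retry schedule needs to establish a connection.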
+
+## ✅ Best Practices
+
+### Development Best Practices
+
+**1. Enable DEBUG Logging**
+
+```python
+import logging
+logging.basicConfig(level=logging.DEBUG)
+```
+
+**Output:**
+```log
+DEBUG - [WS] [AIP] Heartbeat sent
+DEBUG - [WS] [AIP] Heartbeat failed (connection closed): ...
+INFO - [WS] Received message: ServerMessage(type='COMMAND', ...)
+```
+
+**2. Test Connection Before Full Integration**
+
+```python
+# Test just connection and registration
+ws_client = UFOWebSocketClient(ws_url, ufo_client)
+await ws_client.connect_and_listen() # Should register successfully
+```
+
+**3. Handle Connection Loss Gracefully**
+
+```python
+try:
+    await ws_client.connect_and_listen()
+except Exception as e:
+    logger.error(f"WebSocket client error: {e}")
+    # Implement recovery (e.g., alert, restart)
+```
+
+### Production Best Practices
+
+**1. Use Appropriate Retry Limits**
+
+For production networks with occasional instability:
+
+```python
+ws_client = UFOWebSocketClient(
+ ws_url="wss://production-server.com/ws",
+ ufo_client=ufo_client,
+ max_retries=10 # More retries for resilience
+)
+```
+
+**2. Monitor Connection Health**
+
+Log heartbeat success/failure for alerting:
+
+```python
+# In heartbeat_loop (add custom monitoring):
+try:
+    await self.heartbeat_protocol.send_heartbeat(...)
+    self.logger.debug("[WS] ✅ Heartbeat sent successfully")
+    # Update metrics: heartbeat_success_count++
+except Exception as e:
+    self.logger.error(f"[WS] ❌ Heartbeat failed: {e}")
+    # Trigger alert: connection_health_alert()
+```
+
+**3. Use Secure WebSocket (WSS)**
+
+```python
+# Production: Encrypted WebSocket
+ws_client = UFOWebSocketClient(
+ ws_url="wss://ufo-server.company.com/ws", # WSS, not WS
+ ufo_client=ufo_client
+)
+```
+
+**4. Clean State on Reconnection**
+
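+The client resets session state automatically at the start of every task, so a task started after a reconnect does not inherit stale state:
+
+```python
+async with self.ufo_client.task_lock:
+    self.ufo_client.reset()  # Clears session state
+    # Send new task request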
+```
+
+### Error Handling Best Practices
+
+!!!warning "Defensive Programming"
+
+    **1. Expect Transient Failures**
+    ```python
+    # Increase retries for unreliable networks
+    max_retries=10
+
+    # Monitor retry count in logs
+    self.logger.info(f"[WS] Retry {self.retry_count}/{self.max_retries}")
+    ```
+
+    **2. Validate Messages Before Processing**
+    ```python
+    # Already handled by Pydantic in source code:
+    data = ServerMessage.model_validate_json(msg)  # Raises on invalid
+    ```
+
+    **3. Report Errors via AIP**
+    ```python
+    # Send structured error messages back to server
+    error_msg = ClientMessage(
+        type=ClientMessageType.ERROR,
+        error=str(e),
+        client_id=self.ufo_client.client_id,
+        timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat()
+    )
+    await self.transport.send(error_msg.model_dump_json().encode())
+    ```
+
+---
+
+## 🔗 Integration Points
+
+### UFO Client Integration
+
+The WebSocket client delegates all command execution to the UFO Client.
+
+**Execution Flow:**
+
+```python
+# WebSocket client receives command
+action_results = await self.ufo_client.execute_step(server_response)
+```
+
+**Integration:**
+
+| WebSocket Client Role | UFO Client Role |
+|----------------------|-----------------|
+| Receive commands from server | Execute commands via MCP tools |
+| Parse server messages | Manage computer/tool registry |
+| Send results back | Collect execution results |
+| Handle connection errors | Handle execution errors |
+
+See [UFO Client](./ufo_client.md) for execution details.
+
+### Device Info Provider Integration
+
+Device information is collected once during registration.
+
+**Integration:**
+
+```python
+from ufo.client.device_info_provider import DeviceInfoProvider
+
+system_info = DeviceInfoProvider.collect_system_info(
+ client_id=self.ufo_client.client_id,
+ custom_metadata=None
+)
+```
+
+See [Device Info Provider](./device_info.md) for profiling details.
+
+### AIP Transport Integration
+
+All messages go through the WebSocket transport layer.
+
+**Transport Creation:**
+
+```python
+from aip.transport.websocket import WebSocketTransport
+
+self.transport = WebSocketTransport(ws)
+```
+
+**Transport Usage:**
+
+- **Protocols use transport** for sending messages
+- **Direct transport access** for error messages
+
+See [AIP Transport Layer](../aip/transport.md) for transport details.
+
+## 🚀 Next Steps
+
+**Continue Learning**
+
+1. **Connect Your Client** - Follow the step-by-step guide: [Quick Start Guide](./quick_start.md)
+
+2. **Understand Command Execution** - Learn how the UFO Client executes commands: [UFO Client Documentation](./ufo_client.md)
+
+3. **Explore Device Profiling** - See what device information is collected: [Device Info Provider](./device_info.md)
+
+4. **Master the AIP Protocol** - Deep dive into message formats: [AIP Protocol Guide](../aip/protocols.md)
+
+5. **Study Server-Side Registration** - Understand how the server handles registration: [Server Overview](../server/overview.md)
diff --git a/documents/docs/configuration/models/azure_openai.md b/documents/docs/configuration/models/azure_openai.md
new file mode 100644
index 000000000..ee8381a44
--- /dev/null
+++ b/documents/docs/configuration/models/azure_openai.md
@@ -0,0 +1,96 @@
+# Azure OpenAI (AOAI)
+
+## Step 1: Create Azure OpenAI Resource
+
+To use the Azure OpenAI API, create an account on the [Azure OpenAI website](https://azure.microsoft.com/en-us/products/ai-services/openai-service). After creating an account, deploy a model and obtain your API key and endpoint.
+
+## Step 2: Configure Agent Settings
+
+Configure the `HOST_AGENT` and `APP_AGENT` in the `config/ufo/agents.yaml` file to use the Azure OpenAI API.
+
+If the file doesn't exist, copy it from the template:
+
+```powershell
+Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml
+```
+
+Edit `config/ufo/agents.yaml` with your Azure OpenAI configuration:
+
+### Option 1: API Key Authentication (Recommended for Development)
+
+```yaml
+HOST_AGENT:
+ VISUAL_MODE: True # Enable visual mode to understand screenshots
+ REASONING_MODEL: False # Set to True for o-series models
+ API_TYPE: "aoai" # Use Azure OpenAI API
+ API_BASE: "https://YOUR_RESOURCE.openai.azure.com" # Your Azure endpoint
+ API_KEY: "YOUR_AOAI_KEY" # Your Azure OpenAI API key
+ API_VERSION: "2024-02-15-preview" # API version
+ API_MODEL: "gpt-4o" # Model name
+ API_DEPLOYMENT_ID: "YOUR_DEPLOYMENT_ID" # Your deployment name
+
+APP_AGENT:
+ VISUAL_MODE: True
+ REASONING_MODEL: False
+ API_TYPE: "aoai"
+ API_BASE: "https://YOUR_RESOURCE.openai.azure.com"
+ API_KEY: "YOUR_AOAI_KEY"
+ API_VERSION: "2024-02-15-preview"
+ API_MODEL: "gpt-4o-mini" # Use gpt-4o-mini for cost efficiency
+ API_DEPLOYMENT_ID: "YOUR_DEPLOYMENT_ID"
+```
+
+### Option 2: Azure AD Authentication (Recommended for Production)
+
+For Azure Active Directory (now Microsoft Entra ID) authentication, use `API_TYPE: "azure_ad"`:
+
+```yaml
+HOST_AGENT:
+ VISUAL_MODE: True
+ REASONING_MODEL: False
+ API_TYPE: "azure_ad" # Use Azure AD authentication
+ API_BASE: "https://YOUR_RESOURCE.openai.azure.com" # Your Azure endpoint
+ API_VERSION: "2024-02-15-preview"
+ API_MODEL: "gpt-4o"
+ API_DEPLOYMENT_ID: "YOUR_DEPLOYMENT_ID"
+
+ # Azure AD Configuration
+ AAD_TENANT_ID: "YOUR_TENANT_ID" # Your Azure tenant ID
+ AAD_API_SCOPE: "YOUR_SCOPE" # Your API scope
+ AAD_API_SCOPE_BASE: "YOUR_SCOPE_BASE" # Scope base (without api:// prefix)
+
+APP_AGENT:
+ VISUAL_MODE: True
+ REASONING_MODEL: False
+ API_TYPE: "azure_ad"
+ API_BASE: "https://YOUR_RESOURCE.openai.azure.com"
+ API_VERSION: "2024-02-15-preview"
+ API_MODEL: "gpt-4o-mini"
+ API_DEPLOYMENT_ID: "YOUR_DEPLOYMENT_ID"
+ AAD_TENANT_ID: "YOUR_TENANT_ID"
+ AAD_API_SCOPE: "YOUR_SCOPE"
+ AAD_API_SCOPE_BASE: "YOUR_SCOPE_BASE"
+```
+
+**Configuration Fields:**
+
+- **`VISUAL_MODE`**: Set to `True` to enable vision capabilities. Ensure your deployment supports visual inputs.
+- **`REASONING_MODEL`**: Set to `True` when the deployment is an o-series model
+- **`API_TYPE`**: Use `"aoai"` for API key auth or `"azure_ad"` for Azure AD auth
+- **`API_BASE`**: Your Azure OpenAI endpoint URL (format: `https://{resource-name}.openai.azure.com`)
+- **`API_KEY`**: Your Azure OpenAI API key (not needed for Azure AD auth)
+- **`API_VERSION`**: Azure API version (e.g., `"2024-02-15-preview"`)
+- **`API_MODEL`**: Model identifier (e.g., `gpt-4o`, `gpt-4o-mini`)
+- **`API_DEPLOYMENT_ID`**: Your Azure deployment name (required for AOAI)
+- **`AAD_TENANT_ID`**: Azure tenant ID (required for Azure AD auth)
+- **`AAD_API_SCOPE`**: Azure AD API scope (required for Azure AD auth)
+- **`AAD_API_SCOPE_BASE`**: Scope base without `api://` prefix (required for Azure AD auth)
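+
+The field requirements above can be checked before launching UFO. The following is a minimal sketch and not part of UFO: `missing_azure_fields` and `REQUIRED_FIELDS` are hypothetical names, and the required-field sets are transcribed from the list above (treating `API_DEPLOYMENT_ID` as required for both authentication types is an assumption based on the two examples), not taken from UFO's own loader.
+
+```python
+from typing import Dict, List
+
+# Required keys per API_TYPE, transcribed from the field list above.
+# Assumption: UFO's own config loader may enforce a different set.
+REQUIRED_FIELDS: Dict[str, List[str]] = {
+    "aoai": ["API_BASE", "API_KEY", "API_VERSION", "API_MODEL", "API_DEPLOYMENT_ID"],
+    "azure_ad": [
+        "API_BASE", "API_VERSION", "API_MODEL", "API_DEPLOYMENT_ID",
+        "AAD_TENANT_ID", "AAD_API_SCOPE", "AAD_API_SCOPE_BASE",
+    ],
+}
+
+
+def missing_azure_fields(agent_cfg: Dict[str, object]) -> List[str]:
+    """Return the required keys that are absent or empty in one agent section."""
+    api_type = agent_cfg.get("API_TYPE")
+    if api_type not in REQUIRED_FIELDS:
+        raise ValueError(f"Unsupported API_TYPE for Azure OpenAI: {api_type!r}")
+    return [key for key in REQUIRED_FIELDS[api_type] if not agent_cfg.get(key)]
+
+
+if __name__ == "__main__":
+    # An Azure AD section that is still missing its AAD_* settings.
+    host_agent = {
+        "API_TYPE": "azure_ad",
+        "API_BASE": "https://YOUR_RESOURCE.openai.azure.com",
+        "API_VERSION": "2024-02-15-preview",
+        "API_MODEL": "gpt-4o",
+        "API_DEPLOYMENT_ID": "YOUR_DEPLOYMENT_ID",
+    }
+    print(missing_azure_fields(host_agent))
+```
+
+Passing each parsed agent section (for example `HOST_AGENT` and `APP_AGENT`) through a check like this surfaces a forgotten key before the first API call fails.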
+
+**For detailed configuration options, see:**
+
+- [Agent Configuration Guide](../system/agents_config.md) - Complete agent settings reference
+- [Model Configuration Overview](overview.md) - Compare different LLM providers
+- [OpenAI](openai.md) - Standard OpenAI API setup
+
+## Step 3: Start Using UFO
+
+After configuration, you can start using UFO with the Azure OpenAI API. Refer to the [Quick Start Guide](../../getting_started/quick_start_ufo2.md) for detailed instructions on running your first tasks.
\ No newline at end of file
diff --git a/documents/docs/configuration/models/claude.md b/documents/docs/configuration/models/claude.md
new file mode 100644
index 000000000..162d0b35a
--- /dev/null
+++ b/documents/docs/configuration/models/claude.md
@@ -0,0 +1,69 @@
+# Anthropic Claude
+
+## Step 1: Obtain API Key
+
+To use the Claude API, create an account on the [Anthropic Console](https://console.anthropic.com/) and access your API key from the API keys section.
+
+## Step 2: Install Dependencies
+
+Install the required Anthropic Python package:
+
+```bash
+pip install -U anthropic==0.37.1
+```
+
+## Step 3: Configure Agent Settings
+
+Configure the `HOST_AGENT` and `APP_AGENT` in the `config/ufo/agents.yaml` file to use the Claude API.
+
+If the file doesn't exist, copy it from the template:
+
+```powershell
+Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml
+```
+
+Edit `config/ufo/agents.yaml` with your Claude configuration:
+
+```yaml
+HOST_AGENT:
+ VISUAL_MODE: True # Enable visual mode to understand screenshots
+ API_TYPE: "claude" # Use Claude API
+ API_BASE: "https://api.anthropic.com" # Claude API endpoint
+ API_KEY: "YOUR_CLAUDE_API_KEY" # Your Claude API key
+ API_MODEL: "claude-3-5-sonnet-20241022" # Model name
+ API_VERSION: "2023-06-01" # API version
+
+APP_AGENT:
+ VISUAL_MODE: True
+ API_TYPE: "claude"
+ API_BASE: "https://api.anthropic.com"
+ API_KEY: "YOUR_CLAUDE_API_KEY"
+ API_MODEL: "claude-3-5-sonnet-20241022"
+ API_VERSION: "2023-06-01"
+```
+
+**Configuration Fields:**
+
+- **`VISUAL_MODE`**: Set to `True` to enable vision capabilities. Most Claude 3+ models support visual inputs (see [Claude models](https://www.anthropic.com/pricing#anthropic-api))
+- **`API_TYPE`**: Use `"claude"` for the Claude API (the value is case-sensitive and must be lowercase)
+- **`API_BASE`**: Claude API endpoint - `https://api.anthropic.com`
+- **`API_KEY`**: Your Anthropic API key from the console
+- **`API_MODEL`**: Model identifier (e.g., `claude-3-5-sonnet-20241022`, `claude-3-opus-20240229`)
+- **`API_VERSION`**: API version identifier
+
+**Available Models:**
+
+- **Claude 3.5 Sonnet**: `claude-3-5-sonnet-20241022` - Best balance of intelligence and speed
+- **Claude 3 Opus**: `claude-3-opus-20240229` - Most capable model
+- **Claude 3 Sonnet**: `claude-3-sonnet-20240229` - Balanced performance
+- **Claude 3 Haiku**: `claude-3-haiku-20240307` - Fast and cost-effective
+
+**For detailed configuration options, see:**
+
+- [Agent Configuration Guide](../system/agents_config.md) - Complete agent settings reference
+- [Model Configuration Overview](overview.md) - Compare different LLM providers
+- [Anthropic Documentation](https://docs.anthropic.com/) - Official Claude API docs
+
+## Step 4: Start Using UFO
+
+After configuration, you can start using UFO with the Claude API. Refer to the [Quick Start Guide](../../getting_started/quick_start_ufo2.md) for detailed instructions on running your first tasks.
\ No newline at end of file
diff --git a/documents/docs/configuration/models/custom_model.md b/documents/docs/configuration/models/custom_model.md
new file mode 100644
index 000000000..3b893c624
--- /dev/null
+++ b/documents/docs/configuration/models/custom_model.md
@@ -0,0 +1,121 @@
+# Customized LLM Models
+
+UFO supports custom LLMs. To use your own model with UFO, follow the steps below.
+
+## Step 1: Create and Serve Your Model
+
+Create a custom LLM model and serve it on your local or remote environment. Ensure your model has an accessible API endpoint.
+
+## Step 2: Implement Model Service Class
+
+Create a Python script under the `ufo/llm` directory and implement your own LLM model class by inheriting the `BaseService` class from `ufo/llm/base.py`.
+
+**Reference Example:** See `PlaceHolderService` in `ufo/llm/placeholder.py` as a template.
+
+You must implement the `chat_completion` method:
+
+```python
+from typing import Any, Dict, List, Optional, Tuple
+
+from ufo.llm.base import BaseService
+
+
+class YourModelService(BaseService):
+    def chat_completion(
+        self,
+        messages: List[Dict[str, str]],
+        n: int = 1,
+        temperature: Optional[float] = None,
+        max_tokens: Optional[int] = None,
+        top_p: Optional[float] = None,
+        **kwargs: Any,
+    ) -> Tuple[List[str], Optional[float]]:
+        """
+        Generates completions for a given list of messages.
+
+        Args:
+            messages: The list of messages to generate completions for.
+            n: The number of completions to generate for each message.
+            temperature: Controls the randomness (higher = more random).
+            max_tokens: The maximum number of tokens in completions.
+            top_p: Controls diversity (higher = more diverse).
+            **kwargs: Additional keyword arguments.
+
+        Returns:
+            Tuple[List[str], Optional[float]]:
+                - List of generated completions for each message
+                - Cost of the API call (None if not applicable)
+
+        Raises:
+            Exception: If an error occurs while making the API request.
+        """
+        # Your implementation here
+        raise NotImplementedError("Replace with a call to your model's API")
+```
+
+**Key Implementation Points:**
+
+- Handle message formatting according to your model's API
+- Process visual inputs if `VISUAL_MODE` is enabled
+- Implement retry logic for failed requests
+- Calculate and return cost if applicable
+
+## Step 3: Configure Agent Settings
+
+Configure the `HOST_AGENT` and `APP_AGENT` in the `config/ufo/agents.yaml` file to use your custom model.
+
+If the file doesn't exist, copy it from the template:
+
+```powershell
+Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml
+```
+
+Edit `config/ufo/agents.yaml` with your custom model configuration:
+
+```yaml
+HOST_AGENT:
+ VISUAL_MODE: True # Set based on your model's capabilities
+ API_TYPE: "custom_model" # Use custom model type
+ API_BASE: "http://your-endpoint:port" # Your model's API endpoint
+ API_KEY: "YOUR_API_KEY" # Your API key (if required)
+ API_MODEL: "your-model-name" # Your model identifier
+
+APP_AGENT:
+ VISUAL_MODE: True
+ API_TYPE: "custom_model"
+ API_BASE: "http://your-endpoint:port"
+ API_KEY: "YOUR_API_KEY"
+ API_MODEL: "your-model-name"
+```
+
+**Configuration Fields:**
+
+- **`VISUAL_MODE`**: Set to `True` if your model supports visual inputs
+- **`API_TYPE`**: Use `"custom_model"` for custom implementations
+- **`API_BASE`**: Your custom model's API endpoint URL
+- **`API_KEY`**: Authentication key (if your model requires it)
+- **`API_MODEL`**: Model identifier or name
+
+**For detailed configuration options, see:**
+
+- [Agent Configuration Guide](../system/agents_config.md) - Complete agent settings reference
+- [Model Configuration Overview](overview.md) - Compare different LLM providers
+
+## Step 4: Register Your Model
+
+Update the model factory in `ufo/llm/__init__.py` to include your custom model class:
+
+```python
+from ufo.llm.your_model import YourModelService
+
+# Add to the model factory mapping
+MODEL_FACTORY = {
+ # ... existing models ...
+ "custom_model": YourModelService,
+}
+```
+
+## Step 5: Start Using UFO
+
+After configuration, you can start using UFO with your custom model. Refer to the [Quick Start Guide](../../getting_started/quick_start_ufo2.md) for detailed instructions on running your first tasks.
+
+**Testing Your Integration:**
+
+1. Test with simple requests first
+2. Verify visual mode works (if applicable)
+3. Check error handling and retry logic
+4. Monitor response quality and latency
\ No newline at end of file
diff --git a/documents/docs/configuration/models/deepseek.md b/documents/docs/configuration/models/deepseek.md
new file mode 100644
index 000000000..d76a4f3fa
--- /dev/null
+++ b/documents/docs/configuration/models/deepseek.md
@@ -0,0 +1,54 @@
+# DeepSeek Model
+
+## Step 1: Obtain API Key
+
+DeepSeek is developed by DeepSeek AI. To use DeepSeek models, go to [DeepSeek Platform](https://www.deepseek.com/), register an account, and obtain your API key from the API management console.
+
+## Step 2: Configure Agent Settings
+
+Configure the `HOST_AGENT` and `APP_AGENT` in the `config/ufo/agents.yaml` file to use the DeepSeek model.
+
+If the file doesn't exist, copy it from the template:
+
+```powershell
+Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml
+```
+
+Edit `config/ufo/agents.yaml` with your DeepSeek configuration:
+
+```yaml
+HOST_AGENT:
+ VISUAL_MODE: False # DeepSeek models typically don't support visual inputs
+ API_TYPE: "deepseek" # Use DeepSeek API
+ API_KEY: "YOUR_DEEPSEEK_API_KEY" # Your DeepSeek API key
+ API_MODEL: "deepseek-chat" # Model name
+
+APP_AGENT:
+ VISUAL_MODE: False
+ API_TYPE: "deepseek"
+ API_KEY: "YOUR_DEEPSEEK_API_KEY"
+ API_MODEL: "deepseek-chat"
+```
+
+**Configuration Fields:**
+
+- **`VISUAL_MODE`**: Set to `False` - Most DeepSeek models don't support visual inputs
+- **`API_TYPE`**: Use `"deepseek"` for the DeepSeek API (the value is case-sensitive and must be lowercase)
+- **`API_KEY`**: Your DeepSeek API key
+- **`API_MODEL`**: Model identifier (e.g., `deepseek-chat`, `deepseek-coder`)
+
+**Available Models:**
+
+- **DeepSeek-Chat**: `deepseek-chat` - General conversation model
+- **DeepSeek-Coder**: `deepseek-coder` - Code-specialized model
+
+**For detailed configuration options, see:**
+
+- [Agent Configuration Guide](../system/agents_config.md) - Complete agent settings reference
+- [Model Configuration Overview](overview.md) - Compare different LLM providers
+
+## Step 3: Start Using UFO
+
+After configuration, you can start using UFO with the DeepSeek model. Refer to the [Quick Start Guide](../../getting_started/quick_start_ufo2.md) for detailed instructions on running your first tasks.
+
+**Note:** With `VISUAL_MODE` set to `False`, UFO operates in text-only mode, which may limit UI automation capabilities that rely on screenshot understanding.
diff --git a/documents/docs/configuration/models/gemini.md b/documents/docs/configuration/models/gemini.md
new file mode 100644
index 000000000..e4d02ff35
--- /dev/null
+++ b/documents/docs/configuration/models/gemini.md
@@ -0,0 +1,78 @@
+# Google Gemini
+
+## Step 1: Obtain API Key
+
+To use the Google Gemini API, create an account on [Google AI Studio](https://ai.google.dev/) and generate your API key from the API keys section.
+
+## Step 2: Install Dependencies
+
+Install the required Google GenAI Python package:
+
+```bash
+pip install -U google-genai==1.12.1
+```
+
+## Step 3: Configure Agent Settings
+
+Configure the `HOST_AGENT` and `APP_AGENT` in the `config/ufo/agents.yaml` file to use the Google Gemini API.
+
+If the file doesn't exist, copy it from the template:
+
+```powershell
+Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml
+```
+
+Edit `config/ufo/agents.yaml` with your Gemini configuration:
+
+```yaml
+HOST_AGENT:
+ VISUAL_MODE: True # Enable visual mode to understand screenshots
+ JSON_SCHEMA: True # Enable JSON schema for structured responses
+ API_TYPE: "gemini" # Use Gemini API
+ API_BASE: "https://generativelanguage.googleapis.com" # Gemini API endpoint
+ API_KEY: "YOUR_GEMINI_API_KEY" # Your Gemini API key
+ API_MODEL: "gemini-2.0-flash-exp" # Model name
+ API_VERSION: "v1beta" # API version
+
+APP_AGENT:
+ VISUAL_MODE: True
+ JSON_SCHEMA: True
+ API_TYPE: "gemini"
+ API_BASE: "https://generativelanguage.googleapis.com"
+ API_KEY: "YOUR_GEMINI_API_KEY"
+ API_MODEL: "gemini-2.0-flash-exp"
+ API_VERSION: "v1beta"
+```
+
+**Configuration Fields:**
+
+- **`VISUAL_MODE`**: Set to `True` to enable vision capabilities. Most Gemini models support visual inputs (see [Gemini models](https://ai.google.dev/gemini-api/docs/models/gemini))
+- **`JSON_SCHEMA`**: Set to `True` to enable structured JSON output formatting
+- **`API_TYPE`**: Use `"gemini"` for the Google Gemini API (the value is case-sensitive and must be lowercase)
+- **`API_BASE`**: Gemini API endpoint - `https://generativelanguage.googleapis.com`
+- **`API_KEY`**: Your Google AI API key
+- **`API_MODEL`**: Model identifier (e.g., `gemini-2.0-flash-exp`, `gemini-1.5-pro`)
+- **`API_VERSION`**: API version (typically `v1beta`)
+
+**Available Models:**
+
+- **Gemini 2.0 Flash**: `gemini-2.0-flash-exp` - Latest experimental model with multimodal capabilities
+- **Gemini 1.5 Pro**: `gemini-1.5-pro` - Advanced reasoning and long context
+- **Gemini 1.5 Flash**: `gemini-1.5-flash` - Fast and efficient
+
+**Rate Limits:**
+
+If you encounter `429 Resource has been exhausted` errors, you've hit the rate limit of your Gemini API quota. Consider:
+
+- Reducing request frequency
+- Upgrading your API tier
+- Using exponential backoff for retries
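+
+Exponential backoff can be sketched as below. This is a generic illustration, not UFO's retry logic: `call_with_backoff` and its parameters are hypothetical, and how a 429 error surfaces (exception type, message text) depends on the SDK version, so the `is_rate_limited` predicate is left to the caller.
+
+```python
+import random
+import time
+from typing import Callable, TypeVar
+
+T = TypeVar("T")
+
+
+def call_with_backoff(
+    request: Callable[[], T],
+    is_rate_limited: Callable[[Exception], bool],
+    max_attempts: int = 5,
+    base_delay: float = 1.0,
+    max_delay: float = 30.0,
+    sleep: Callable[[float], None] = time.sleep,
+) -> T:
+    """Retry `request` with exponentially growing, jittered delays."""
+    for attempt in range(max_attempts):
+        try:
+            return request()
+        except Exception as error:
+            # Give up on errors that are not rate limits, and on the final attempt.
+            if not is_rate_limited(error) or attempt == max_attempts - 1:
+                raise
+            # The delay ceiling doubles each attempt: base, 2*base, 4*base, ...
+            delay = min(max_delay, base_delay * (2 ** attempt))
+            # Full jitter spreads out retries from concurrent clients.
+            sleep(random.uniform(0, delay))
+    raise RuntimeError("max_attempts must be at least 1")
+```
+
+A caller might pass `is_rate_limited=lambda e: "429" in str(e)`, assuming the error message contains the status code.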
+
+**For detailed configuration options, see:**
+
+- [Agent Configuration Guide](../system/agents_config.md) - Complete agent settings reference
+- [Model Configuration Overview](overview.md) - Compare different LLM providers
+- [Gemini API Documentation](https://ai.google.dev/gemini-api) - Official Gemini API docs
+
+## Step 4: Start Using UFO
+
+After configuration, you can start using UFO with the Gemini API. Refer to the [Quick Start Guide](../../getting_started/quick_start_ufo2.md) for detailed instructions on running your first tasks.
\ No newline at end of file
diff --git a/documents/docs/configuration/models/ollama.md b/documents/docs/configuration/models/ollama.md
new file mode 100644
index 000000000..27a58e367
--- /dev/null
+++ b/documents/docs/configuration/models/ollama.md
@@ -0,0 +1,104 @@
+# Ollama
+
+## Step 1: Install and Start Ollama
+
+Go to [Ollama](https://github.com/jmorganca/ollama) and follow the installation instructions for your platform.
+
+**For Linux & WSL2:**
+
+```bash
+# Install Ollama
+curl -fsSL https://ollama.ai/install.sh | sh
+
+# Start the Ollama server
+ollama serve
+```
+
+**For Windows/Mac:** Download and install from the [Ollama website](https://ollama.ai/).
+
+## Step 2: Pull and Test a Model
+
+Open a new terminal and pull a model:
+
+```bash
+# Pull a model (e.g., llama2)
+ollama pull llama2
+
+# Test the model
+ollama run llama2
+```
+
+By default, Ollama starts a server at `http://localhost:11434`, which will be used as the API base in your configuration.
+
+## Step 3: Configure Agent Settings
+
+Configure the `HOST_AGENT` and `APP_AGENT` in the `config/ufo/agents.yaml` file to use Ollama.
+
+If the file doesn't exist, copy it from the template:
+
+```powershell
+Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml
+```
+
+Edit `config/ufo/agents.yaml` with your Ollama configuration:
+
+```yaml
+HOST_AGENT:
+  VISUAL_MODE: False                    # llama2 is text-only; set True only for vision models (e.g., llava)
+  API_TYPE: "ollama"                    # Use Ollama API
+  API_BASE: "http://localhost:11434"    # Ollama server endpoint
+  API_KEY: "ollama"                     # Placeholder (not used but required)
+  API_MODEL: "llama2"                   # Model name (must match pulled model)
+
+APP_AGENT:
+  VISUAL_MODE: False
+  API_TYPE: "ollama"
+  API_BASE: "http://localhost:11434"
+  API_KEY: "ollama"
+  API_MODEL: "llama2"
+```
+
+**Configuration Fields:**
+
+- **`VISUAL_MODE`**: Set to `True` only for vision-capable models like `llava`
+- **`API_TYPE`**: Use `"ollama"` for the Ollama API (the value is case-sensitive and must be lowercase)
+- **`API_BASE`**: Ollama server URL (default: `http://localhost:11434`)
+- **`API_KEY`**: Placeholder value (not used but required in config)
+- **`API_MODEL`**: Model name matching your pulled model
+
+**Important: Increase Context Length**
+
+UFO requires a context window of at least 20,000 tokens to function properly. Ollama's default context length is 2048 tokens, which is insufficient, so you must create a custom model with a larger context window:
+
+1. Create a `Modelfile`:
+
+```text
+FROM llama2
+PARAMETER num_ctx 32768
+```
+
+2. Build the custom model:
+
+```bash
+ollama create llama2-max-ctx -f Modelfile
+```
+
+3. Use the custom model in your config:
+
+```yaml
+API_MODEL: "llama2-max-ctx"
+```
+
+For more details, see [Ollama's Modelfile documentation](https://github.com/ollama/ollama/blob/main/docs/modelfile.md).
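+
+If you maintain several local models, the Modelfile from the steps above can be generated rather than written by hand. A minimal sketch, assuming only the two directives shown above; `render_modelfile` is a hypothetical helper, and the 20,000-token floor mirrors the requirement stated in this section.
+
+```python
+def render_modelfile(base_model: str, num_ctx: int, min_ctx: int = 20000) -> str:
+    """Build Modelfile text that raises the context window of `base_model`."""
+    if num_ctx < min_ctx:
+        raise ValueError(f"num_ctx={num_ctx} is below the {min_ctx}-token minimum")
+    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"
+
+
+if __name__ == "__main__":
+    # Save the printed text as `Modelfile`, then run `ollama create` as in step 2.
+    print(render_modelfile("llama2", 32768), end="")
+```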
+
+**For detailed configuration options, see:**
+
+- [Agent Configuration Guide](../system/agents_config.md) - Complete agent settings reference
+- [Model Configuration Overview](overview.md) - Compare different LLM providers
+
+## Step 4: Start Using UFO
+
+After configuration, you can start using UFO with Ollama. Refer to the [Quick Start Guide](../../getting_started/quick_start_ufo2.md) for detailed instructions on running your first tasks.
+
+
+
diff --git a/documents/docs/configuration/models/openai.md b/documents/docs/configuration/models/openai.md
new file mode 100644
index 000000000..2955218aa
--- /dev/null
+++ b/documents/docs/configuration/models/openai.md
@@ -0,0 +1,57 @@
+# OpenAI
+
+## Step 1: Obtain API Key
+
+To use the OpenAI API, create an account on the [OpenAI website](https://platform.openai.com/signup). After creating an account, you can access your API key from the [API keys page](https://platform.openai.com/account/api-keys).
+
+## Step 2: Configure Agent Settings
+
+After obtaining the API key, configure the `HOST_AGENT` and `APP_AGENT` in the `config/ufo/agents.yaml` file to use the OpenAI API.
+
+If the file doesn't exist, copy it from the template:
+
+```powershell
+Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml
+```
+
+Edit `config/ufo/agents.yaml` with your OpenAI configuration:
+
+```yaml
+HOST_AGENT:
+ VISUAL_MODE: True # Enable visual mode to understand screenshots
+ REASONING_MODEL: False # Set to True for o-series models (o1, o3, o3-mini)
+ API_TYPE: "openai" # Use OpenAI API
+ API_BASE: "https://api.openai.com/v1" # OpenAI API endpoint
+ API_KEY: "sk-YOUR_KEY_HERE" # Your OpenAI API key (starts with sk-)
+ API_VERSION: "2025-02-01-preview" # API version
+ API_MODEL: "gpt-4o" # Model name (gpt-4o, gpt-4o-mini, etc.)
+
+APP_AGENT:
+ VISUAL_MODE: True
+ REASONING_MODEL: False
+ API_TYPE: "openai"
+ API_BASE: "https://api.openai.com/v1"
+ API_KEY: "sk-YOUR_KEY_HERE"
+ API_VERSION: "2025-02-01-preview"
+ API_MODEL: "gpt-4o-mini" # Use gpt-4o-mini for cost efficiency
+```
+
+**Configuration Fields:**
+
+- **`VISUAL_MODE`**: Set to `True` to enable vision capabilities. Ensure your selected model supports visual inputs (see [OpenAI models](https://platform.openai.com/docs/models))
+- **`REASONING_MODEL`**: Set to `True` when using o-series models (o1, o3, o3-mini), which behave differently from standard chat models
+- **`API_TYPE`**: Use `"openai"` for OpenAI API
+- **`API_BASE`**: OpenAI API base URL - `https://api.openai.com/v1`
+- **`API_KEY`**: Your OpenAI API key from the API keys page
+- **`API_VERSION`**: API version identifier
+- **`API_MODEL`**: Model identifier (e.g., `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`)
+
+**For detailed configuration options, see:**
+
+- [Agent Configuration Guide](../system/agents_config.md) - Complete agent settings reference
+- [Model Configuration Overview](overview.md) - Compare different LLM providers
+- [Azure OpenAI](azure_openai.md) - Alternative Azure-hosted OpenAI setup
+
+## Step 3: Start Using UFO
+
+After configuration, you can start using UFO with the OpenAI API. Refer to the [Quick Start Guide](../../getting_started/quick_start_ufo2.md) for detailed instructions on running your first tasks.
\ No newline at end of file
diff --git a/documents/docs/configuration/models/operator.md b/documents/docs/configuration/models/operator.md
new file mode 100644
index 000000000..c8bb9a6e6
--- /dev/null
+++ b/documents/docs/configuration/models/operator.md
@@ -0,0 +1,82 @@
+# OpenAI CUA (Operator)
+
+[Operator](https://openai.com/index/computer-using-agent/) is a specialized agentic model tailored for Computer-Using Agents (CUA). It is currently available through the Azure OpenAI API (AOAI) via the [Responses API](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/responses?tabs=python-secure).
+
+## Step 1: Create Azure OpenAI Resource
+
+To use the Operator model, create an account on the [Azure OpenAI website](https://azure.microsoft.com/en-us/products/ai-services/openai-service). After creating an account, deploy the Operator model and access your API key.
+
+## Step 2: Configure Operator Agent
+
+Configure the `OPERATOR` section in the `config/ufo/agents.yaml` file to use the Azure OpenAI Operator model.
+
+If the file doesn't exist, copy it from the template:
+
+```powershell
+Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml
+```
+
+Edit `config/ufo/agents.yaml` with your Operator configuration:
+
+```yaml
+OPERATOR:
+ SCALER: [1024, 768] # Visual input resolution [width, height]
+ API_TYPE: "azure_ad" # Use Azure AD authentication
+ API_MODEL: "computer-use-preview-20250311" # Operator model name
+ API_VERSION: "2025-03-01-preview" # API version for Operator
+ API_BASE: "https://YOUR_RESOURCE.openai.azure.com" # Your Azure endpoint
+
+ # Azure AD Authentication (required)
+ AAD_TENANT_ID: "YOUR_TENANT_ID" # Your Azure tenant ID
+ AAD_API_SCOPE: "YOUR_SCOPE" # Your API scope
+ AAD_API_SCOPE_BASE: "YOUR_SCOPE_BASE" # Scope base (without api:// prefix)
+```
+
+**Configuration Fields:**
+
+- **`SCALER`**: Resolution for visual input `[width, height]` (recommended: `[1024, 768]`)
+- **`API_TYPE`**: Use `"azure_ad"` for Azure AD authentication (or `"aoai"` for API key auth)
+- **`API_MODEL`**: Operator model identifier (e.g., `computer-use-preview-20250311`)
+- **`API_VERSION`**: API version for Operator (e.g., `2025-03-01-preview`)
+- **`API_BASE`**: Your Azure OpenAI endpoint URL
+- **`AAD_TENANT_ID`**: Azure tenant ID (required for Azure AD auth)
+- **`AAD_API_SCOPE`**: Azure AD API scope (required for Azure AD auth)
+- **`AAD_API_SCOPE_BASE`**: Scope base without `api://` prefix (required for Azure AD auth)
+
+**For API Key Authentication (Development):**
+
+If you prefer API key authentication instead of Azure AD:
+
+```yaml
+OPERATOR:
+ SCALER: [1024, 768]
+ API_TYPE: "aoai" # Use API key authentication
+ API_MODEL: "computer-use-preview-20250311"
+ API_VERSION: "2025-03-01-preview"
+ API_BASE: "https://YOUR_RESOURCE.openai.azure.com"
+ API_KEY: "YOUR_AOAI_KEY" # Your Azure OpenAI API key
+ API_DEPLOYMENT_ID: "YOUR_DEPLOYMENT_ID" # Your deployment name
+```
+
+## Step 3: Run Operator in UFO
+
+UFO supports running Operator in two modes:
+
+1. **Standalone Agent**: Run Operator as a single agent
+2. **As AppAgent**: Call Operator as a separate `AppAgent` from the `HostAgent`
+
+Operator uses a specialized visual-only workflow different from other models and currently does not support the standard `AppAgent` workflow.
+
+**For detailed usage instructions, see:**
+
+- [Operator as AppAgent](../../ufo2/advanced_usage/operator_as_app_agent.md) - How to integrate Operator into UFO workflows
+- [Agent Configuration Guide](../system/agents_config.md) - Complete agent settings reference
+- [Azure OpenAI](azure_openai.md) - General Azure OpenAI setup
+
+**Important Notes:**
+
+- Operator is a visual-only model optimized for computer control tasks
+- It uses a different workflow from standard text-based models
+- Best suited for direct UI manipulation and visual understanding tasks
+- Requires Azure OpenAI deployment (not available via standard OpenAI API)
+
diff --git a/documents/docs/configuration/models/overview.md b/documents/docs/configuration/models/overview.md
new file mode 100644
index 000000000..1927296fa
--- /dev/null
+++ b/documents/docs/configuration/models/overview.md
@@ -0,0 +1,96 @@
+# Supported Models
+
+UFO supports a wide variety of LLMs and APIs. You can configure different models for `HOST_AGENT`, `APP_AGENT`, `BACKUP_AGENT`, and `EVALUATION_AGENT` in the `config/ufo/agents.yaml` file to optimize for performance, cost, or specific capabilities.
+
+## Available Model Integrations
+
+| Provider | Documentation | Visual Support | Authentication |
+| --- | --- | --- | --- |
+| **OpenAI** | [OpenAI API](./openai.md) | ✅ | API Key |
+| **Azure OpenAI (AOAI)** | [Azure OpenAI API](./azure_openai.md) | ✅ | API Key / Azure AD |
+| **Google Gemini** | [Gemini API](./gemini.md) | ✅ | API Key |
+| **Anthropic Claude** | [Claude API](./claude.md) | ✅ | API Key |
+| **Qwen (Alibaba)** | [Qwen API](./qwen.md) | ✅ | API Key |
+| **DeepSeek** | [DeepSeek API](./deepseek.md) | ❌ | API Key |
+| **Ollama** | [Ollama API](./ollama.md) | ⚠️ Limited | Local |
+| **OpenAI Operator** | [Operator (CUA)](./operator.md) | ✅ | Azure AD / API Key |
+| **Custom Models** | [Custom API](./custom_model.md) | Depends | Varies |
+
+## Model Selection Guide
+
+### By Use Case
+
+**For Production Deployments:**
+
+- **Primary**: OpenAI GPT-4o or Azure OpenAI (enterprise features)
+- **Cost-optimized**: GPT-4o-mini for APP_AGENT, GPT-4o for HOST_AGENT
+- **Privacy-sensitive**: Ollama (local models)
+
+**For Development & Testing:**
+
+- **Fast iteration**: Gemini 2.0 Flash (high speed, low cost)
+- **Local testing**: Ollama with llama2 or similar
+- **Budget-friendly**: DeepSeek or Qwen models
+
+**For Specialized Tasks:**
+
+- **Computer control**: OpenAI Operator (CUA model)
+- **Code generation**: DeepSeek-Coder or Claude
+- **Long context**: Gemini 1.5 Pro (large context window)
+
+### By Capability
+
+**Vision Support (Screenshot Understanding):**
+
+- ✅ OpenAI GPT-4o, GPT-4-turbo
+- ✅ Azure OpenAI (vision-enabled deployments)
+- ✅ Google Gemini (all 1.5+ models)
+- ✅ Claude 3+ (all variants)
+- ✅ Qwen-VL models
+- ⚠️ Ollama (llava models only)
+- ❌ DeepSeek (text-only)
+
+**JSON Schema Support:**
+
+- ✅ OpenAI / Azure OpenAI
+- ✅ Google Gemini
+- ⚠️ Limited: Claude, Qwen, Ollama
+
+## Configuration Architecture
+
+Each model is implemented as a separate class in the `ufo/llm` directory, inheriting from the `BaseService` class in `ufo/llm/base.py`. All models implement the `chat_completion` method to maintain a consistent interface.
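+
+The shared-interface design described above can be illustrated with a small registry. This is a sketch under stated assumptions, not UFO's actual factory: every `Sketch*` class, the `SERVICES` mapping, and `build_service` are hypothetical names, and the stub methods return placeholder strings instead of calling a real API.
+
+```python
+from typing import Any, Dict, List, Optional, Tuple, Type
+
+
+class SketchBaseService:
+    """Stand-in for BaseService; the real class lives in ufo/llm/base.py."""
+
+    def __init__(self, config: Dict[str, Any]):
+        self.config = config
+
+    def chat_completion(
+        self, messages: List[Dict[str, str]], **kwargs: Any
+    ) -> Tuple[List[str], Optional[float]]:
+        raise NotImplementedError
+
+
+class SketchOpenAIService(SketchBaseService):
+    def chat_completion(self, messages, **kwargs):
+        # Placeholder response and cost; a real service would call the provider.
+        return [f"openai:{self.config['API_MODEL']}"], 0.0
+
+
+class SketchOllamaService(SketchBaseService):
+    def chat_completion(self, messages, **kwargs):
+        # Local models have no per-call price, so the cost is None.
+        return [f"ollama:{self.config['API_MODEL']}"], None
+
+
+# One entry per API_TYPE value; lookups are exact, so the value must be lowercase.
+SERVICES: Dict[str, Type[SketchBaseService]] = {
+    "openai": SketchOpenAIService,
+    "ollama": SketchOllamaService,
+}
+
+
+def build_service(agent_cfg: Dict[str, Any]) -> SketchBaseService:
+    """Pick the service class for one agent section by its API_TYPE."""
+    api_type = agent_cfg["API_TYPE"]
+    if api_type not in SERVICES:
+        raise ValueError(f"Unknown API_TYPE: {api_type!r}")
+    return SERVICES[api_type](agent_cfg)
+```
+
+Because every service exposes the same `chat_completion` signature, agent code can stay unchanged when `API_TYPE` switches from one provider to another.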
+
+**Key Configuration Files:**
+
+- **`config/ufo/agents.yaml`**: Primary agent configuration (HOST, APP, BACKUP, EVALUATION, OPERATOR)
+- **`config/ufo/system.yaml`**: System-wide LLM parameters (MAX_TOKENS, TEMPERATURE, etc.)
+- **`config/ufo/prices.yaml`**: Cost tracking for different models
+
+## Multi-Provider Setup
+
+You can mix and match providers for different agents to optimize cost and performance:
+
+```yaml
+# Use OpenAI for planning
+HOST_AGENT:
+ API_TYPE: "openai"
+ API_MODEL: "gpt-4o"
+
+# Use Azure OpenAI for execution (cost control)
+APP_AGENT:
+ API_TYPE: "aoai"
+ API_MODEL: "gpt-4o-mini"
+
+# Use Claude for evaluation
+EVALUATION_AGENT:
+ API_TYPE: "claude"
+ API_MODEL: "claude-3-5-sonnet-20241022"
+```
+
+## Getting Started
+
+1. Choose your LLM provider from the table above
+2. Follow the provider-specific documentation to obtain API keys
+3. Configure `config/ufo/agents.yaml` with your credentials
+4. Refer to the [Quick Start Guide](../../getting_started/quick_start_ufo2.md) to begin
+
+**For detailed configuration options:**
+
+- [Agent Configuration Guide](../system/agents_config.md) - Complete configuration reference
+- [System Configuration](../system/system_config.md) - LLM parameters and behavior
+- [Quick Start Guide](../../getting_started/quick_start_ufo2.md) - Step-by-step setup
\ No newline at end of file
diff --git a/documents/docs/configuration/models/qwen.md b/documents/docs/configuration/models/qwen.md
new file mode 100644
index 000000000..9242e5240
--- /dev/null
+++ b/documents/docs/configuration/models/qwen.md
@@ -0,0 +1,53 @@
+# Qwen Model
+
+## Step 1: Obtain API Key
+
+Qwen (Tongyi Qianwen) is developed by Alibaba Cloud. To use Qwen models, go to [DashScope](https://dashscope.aliyun.com/), register an account, and obtain your API key. Detailed instructions are available in the [DashScope documentation](https://help.aliyun.com/zh/dashscope/developer-reference/activate-dashscope-and-create-an-api-key) (Chinese).
+
+## Step 2: Configure Agent Settings
+
+Configure the `HOST_AGENT` and `APP_AGENT` in the `config/ufo/agents.yaml` file to use the Qwen model.
+
+If the file doesn't exist, copy it from the template:
+
+```powershell
+Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml
+```
+
+Edit `config/ufo/agents.yaml` with your Qwen configuration:
+
+```yaml
+HOST_AGENT:
+ VISUAL_MODE: True # Enable visual mode for vision-capable models
+ API_TYPE: "qwen" # Use Qwen API
+ API_KEY: "YOUR_QWEN_API_KEY" # Your DashScope API key
+ API_MODEL: "qwen-vl-max" # Model name (e.g., qwen-vl-max, qwen-max)
+
+APP_AGENT:
+ VISUAL_MODE: True
+ API_TYPE: "qwen"
+ API_KEY: "YOUR_QWEN_API_KEY"
+ API_MODEL: "qwen-vl-max"
+```
+
+**Configuration Fields:**
+
+- **`VISUAL_MODE`**: Set to `True` for vision-capable models (qwen-vl-*). Set to `False` for text-only models
+- **`API_TYPE`**: Use `"qwen"` for the Qwen API (the value is case-sensitive and must be lowercase)
+- **`API_KEY`**: Your DashScope API key
+- **`API_MODEL`**: Model identifier (see [Qwen model list](https://help.aliyun.com/zh/dashscope/developer-reference/model-square/))
+
+**Available Models:**
+
+- **Qwen-VL-Max**: `qwen-vl-max` - Vision and language model
+- **Qwen-Max**: `qwen-max` - Text-only advanced model
+- **Qwen-Plus**: `qwen-plus` - Balanced performance model
+
+**For detailed configuration options, see:**
+
+- [Agent Configuration Guide](../system/agents_config.md) - Complete agent settings reference
+- [Model Configuration Overview](overview.md) - Compare different LLM providers
+
+## Step 3: Start Using UFO
+
+After configuration, you can start using UFO with the Qwen model. Refer to the [Quick Start Guide](../../getting_started/quick_start_ufo2.md) for detailed instructions on running your first tasks.
diff --git a/documents/docs/configuration/system/agents_config.md b/documents/docs/configuration/system/agents_config.md
new file mode 100644
index 000000000..bcb3a7b54
--- /dev/null
+++ b/documents/docs/configuration/system/agents_config.md
@@ -0,0 +1,504 @@
+# Agent Configuration (agents.yaml)
+
+Configure all LLM models and agent-specific settings for UFO². Each agent type can use different models and API configurations for optimal performance.
+
+## Overview
+
+The `agents.yaml` file defines LLM settings for all agents in UFO². This is the **most important configuration file** as it contains your API keys and model selections.
+
+**File Location**: `config/ufo/agents.yaml`
+
+**Initial Setup Required:**
+
+1. **Copy the template file**:
+ ```powershell
+ Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml
+ ```
+
+2. **Edit `config/ufo/agents.yaml`** with your API keys and settings
+
+3. **Never commit `agents.yaml`** to version control (it contains secrets)
+
+## Quick Start
+
+### Step 1: Create Configuration File
+
+```powershell
+# Copy template to create your configuration
+Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml
+```
+
+### Step 2: Configure Your LLM Provider
+
+Choose your LLM provider and edit `config/ufo/agents.yaml`:
+
+**OpenAI:**
+```yaml
+HOST_AGENT:
+ VISUAL_MODE: True
+ API_TYPE: "openai"
+ API_BASE: "https://api.openai.com/v1/chat/completions"
+ API_KEY: "sk-YOUR_OPENAI_KEY_HERE"
+ API_MODEL: "gpt-4o"
+ API_VERSION: "2025-02-01-preview"
+
+APP_AGENT:
+ VISUAL_MODE: True
+ API_TYPE: "openai"
+ API_BASE: "https://api.openai.com/v1/chat/completions"
+ API_KEY: "sk-YOUR_OPENAI_KEY_HERE"
+ API_MODEL: "gpt-4o-mini"
+ API_VERSION: "2025-02-01-preview"
+```
+
+**Azure OpenAI:**
+```yaml
+HOST_AGENT:
+ VISUAL_MODE: True
+ API_TYPE: "aoai"
+ API_BASE: "https://YOUR_RESOURCE.openai.azure.com"
+ API_KEY: "YOUR_AOAI_KEY"
+ API_MODEL: "gpt-4o"
+ API_VERSION: "2024-02-15-preview"
+ API_DEPLOYMENT_ID: "gpt-4o-deployment"
+
+APP_AGENT:
+ VISUAL_MODE: True
+ API_TYPE: "aoai"
+ API_BASE: "https://YOUR_RESOURCE.openai.azure.com"
+ API_KEY: "YOUR_AOAI_KEY"
+ API_MODEL: "gpt-4o-mini"
+ API_VERSION: "2024-02-15-preview"
+ API_DEPLOYMENT_ID: "gpt-4o-mini-deployment"
+```
+
+**Google Gemini:**
+```yaml
+HOST_AGENT:
+ VISUAL_MODE: True
+ API_TYPE: "gemini"
+ API_BASE: "https://generativelanguage.googleapis.com"
+ API_KEY: "YOUR_GEMINI_API_KEY"
+ API_MODEL: "gemini-2.0-flash-exp"
+ API_VERSION: "v1beta"
+```
+
+**Anthropic Claude:**
+```yaml
+HOST_AGENT:
+ VISUAL_MODE: True
+ API_TYPE: "claude"
+ API_BASE: "https://api.anthropic.com"
+ API_KEY: "YOUR_CLAUDE_API_KEY"
+ API_MODEL: "claude-3-5-sonnet-20241022"
+ API_VERSION: "2023-06-01"
+```
+
+### Step 3: Verify Configuration
+
+```python
+from config.config_loader import get_ufo_config
+
+config = get_ufo_config()
+print(f"HOST_AGENT model: {config.host_agent.api_model}")
+print(f"APP_AGENT model: {config.app_agent.api_model}")
+```
+
+---
+
+## Agent Types
+
+UFO² uses different agents for different purposes. Each can be configured with different models.
+
+| Agent | Purpose | Recommended Model | Frequency |
+|-------|---------|-------------------|-----------|
+| **HOST_AGENT** | Task planning, app coordination | GPT-4o, GPT-4 | Low (planning) |
+| **APP_AGENT** | Action execution, UI interaction | GPT-4o-mini, GPT-4o | High (every action) |
+| **BACKUP_AGENT** | Fallback when others fail | GPT-4-vision-preview | Rare (errors) |
+| **EVALUATION_AGENT** | Task completion evaluation | GPT-4o | Low (end of task) |
+| **OPERATOR** | CUA-based automation | computer-use-preview | Optional |
+
+**Cost Optimization Tips:**
+
+- Use **GPT-4o** for HOST_AGENT (complex planning)
+- Use **GPT-4o-mini** for APP_AGENT (frequent actions, 60% cheaper)
+- The same model can be used for both BACKUP_AGENT and EVALUATION_AGENT
+
+## Configuration Fields
+
+### Common Fields (All Agents)
+
+These fields are available for `HOST_AGENT`, `APP_AGENT`, `BACKUP_AGENT`, `EVALUATION_AGENT`, and `OPERATOR`.
+
+#### Core Settings
+
+| Field | Type | Required | Default | Description |
+|-------|------|----------|---------|-------------|
+| `VISUAL_MODE` | Boolean | ❌ | `True` | Enable vision capabilities (screenshot understanding) |
+| `REASONING_MODEL` | Boolean | ❌ | `False` | Whether model is a reasoning model (o1, o3, o3-mini) |
+| `API_TYPE` | String | ✅ | `"openai"` | LLM provider type |
+| `API_BASE` | String | ✅ | varies | API endpoint URL |
+| `API_KEY` | String | ✅ | `""` | API authentication key |
+| `API_MODEL` | String | ✅ | varies | Model identifier |
+| `API_VERSION` | String | ❌ | `"2025-02-01-preview"` | API version |
+
+**Legend:** ✅ = Required (must be set), ❌ = Optional (has default value)
+
+#### API_TYPE Options
+
+| API_TYPE | Provider | Example API_BASE |
+|----------|----------|------------------|
+| `"openai"` | OpenAI | `https://api.openai.com/v1/chat/completions` |
+| `"aoai"` | Azure OpenAI | `https://YOUR_RESOURCE.openai.azure.com` |
+| `"azure_ad"` | Azure OpenAI (AD auth) | `https://YOUR_RESOURCE.openai.azure.com` |
+| `"gemini"` | Google Gemini | `https://generativelanguage.googleapis.com` |
+| `"claude"` | Anthropic Claude | `https://api.anthropic.com` |
+| `"qwen"` | Alibaba Qwen | varies |
+| `"ollama"` | Ollama (local) | `http://localhost:11434` |
+
+#### Azure OpenAI Additional Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `API_DEPLOYMENT_ID` | String | ✅ (for AOAI) | Azure deployment name |
+
+**Example**:
+```yaml
+HOST_AGENT:
+ API_TYPE: "aoai"
+ API_BASE: "https://myresource.openai.azure.com"
+ API_KEY: "abc123..."
+ API_MODEL: "gpt-4o"
+ API_DEPLOYMENT_ID: "gpt-4o-deployment-name"
+```
+
+#### Azure AD Authentication Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `AAD_TENANT_ID` | String | ✅ (for azure_ad) | Azure AD tenant ID |
+| `AAD_API_SCOPE` | String | ✅ (for azure_ad) | Azure AD API scope |
+| `AAD_API_SCOPE_BASE` | String | ✅ (for azure_ad) | Scope base URL |
+
+**Example**:
+```yaml
+HOST_AGENT:
+ API_TYPE: "azure_ad"
+ API_BASE: "https://myresource.openai.azure.com"
+ AAD_TENANT_ID: "your-tenant-id"
+ AAD_API_SCOPE: "your-scope"
+ AAD_API_SCOPE_BASE: "API://your-scope-base"
+ API_MODEL: "gpt-4o"
+ API_DEPLOYMENT_ID: "gpt-4o-deployment"
+```
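+
+The field tables above imply a few rules that can be checked before UFO² starts. The sketch below is illustrative only: `missing_fields` is a hypothetical helper (not part of UFO²), it works on a plain dict such as one agent block parsed from `agents.yaml`, and it assumes that `API_KEY` may be omitted for `azure_ad`, as in the example above.
+
+```python
+# Required keys per the tables above. Treating API_KEY as optional for
+# azure_ad is an assumption based on the example, not a documented rule.
+COMMON_REQUIRED = ("API_TYPE", "API_BASE", "API_MODEL")
+EXTRA_REQUIRED = {
+    "aoai": ("API_KEY", "API_DEPLOYMENT_ID"),
+    "azure_ad": ("AAD_TENANT_ID", "AAD_API_SCOPE", "AAD_API_SCOPE_BASE"),
+}
+
+
+def missing_fields(agent_cfg: dict) -> list:
+    """Return the required keys that are absent or empty in one agent block."""
+    api_type = agent_cfg.get("API_TYPE", "")
+    # Providers without a dedicated entry are assumed to need only an API key.
+    required = COMMON_REQUIRED + EXTRA_REQUIRED.get(api_type, ("API_KEY",))
+    return [key for key in required if not agent_cfg.get(key)]
+
+
+host_agent = {
+    "API_TYPE": "aoai",
+    "API_BASE": "https://myresource.openai.azure.com",
+    "API_KEY": "abc123",
+    "API_MODEL": "gpt-4o",
+}
+print(missing_fields(host_agent))  # ['API_DEPLOYMENT_ID']
+```
+
+Returning a list of missing keys, rather than raising on the first one, lets you report every problem in a single pass.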
+
+#### Prompt Configuration
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `PROMPT` | String | ❌ | Path to main prompt template |
+| `EXAMPLE_PROMPT` | String | ❌ | Path to example prompt template |
+| `API_PROMPT` | String | ❌ | Path to API usage prompt (APP_AGENT only) |
+
+**Default Prompt Paths:**
+```yaml
+HOST_AGENT:
+ PROMPT: "ufo/prompts/share/base/host_agent.yaml"
+ EXAMPLE_PROMPT: "ufo/prompts/examples/{mode}/host_agent_example.yaml"
+
+APP_AGENT:
+ PROMPT: "ufo/prompts/share/base/app_agent.yaml"
+ EXAMPLE_PROMPT: "ufo/prompts/examples/{mode}/app_agent_example.yaml"
+ API_PROMPT: "ufo/prompts/share/base/api.yaml"
+```
+
+You can customize prompts by creating your own YAML files and updating these paths. See the [Customization Guide](../../ufo2/advanced_usage/customization.md) for details.
+
+#### OPERATOR-Specific Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `SCALER` | List[int] | ❌ | Screen dimensions for visual input `[width, height]`, default: `[1024, 768]` |
+
+**Example:**
+```yaml
+OPERATOR:
+ SCALER: [1920, 1080] # Full HD resolution
+ API_MODEL: "computer-use-preview-20250311"
+ # ... other settings
+```
+
+## Complete Configuration Example
+
+Here's a complete `agents.yaml` with all agent types configured:
+
+```yaml
+# HOST_AGENT - Task planning and coordination
+HOST_AGENT:
+ VISUAL_MODE: True
+ REASONING_MODEL: False
+ API_TYPE: "openai"
+ API_BASE: "https://api.openai.com/v1/chat/completions"
+ API_KEY: "sk-YOUR_KEY_HERE"
+ API_MODEL: "gpt-4o"
+ API_VERSION: "2025-02-01-preview"
+ PROMPT: "ufo/prompts/share/base/host_agent.yaml"
+ EXAMPLE_PROMPT: "ufo/prompts/examples/{mode}/host_agent_example.yaml"
+
+# APP_AGENT - Action execution
+APP_AGENT:
+ VISUAL_MODE: True
+ REASONING_MODEL: False
+ API_TYPE: "openai"
+ API_BASE: "https://api.openai.com/v1/chat/completions"
+ API_KEY: "sk-YOUR_KEY_HERE"
+ API_MODEL: "gpt-4o-mini" # Cheaper for frequent actions
+ API_VERSION: "2025-02-01-preview"
+ PROMPT: "ufo/prompts/share/base/app_agent.yaml"
+ EXAMPLE_PROMPT: "ufo/prompts/examples/{mode}/app_agent_example.yaml"
+ API_PROMPT: "ufo/prompts/share/base/api.yaml"
+
+# BACKUP_AGENT - Fallback agent
+BACKUP_AGENT:
+ VISUAL_MODE: True
+ REASONING_MODEL: False
+ API_TYPE: "openai"
+ API_BASE: "https://api.openai.com/v1/chat/completions"
+ API_KEY: "sk-YOUR_KEY_HERE"
+ API_MODEL: "gpt-4-vision-preview"
+ API_VERSION: "2025-02-01-preview"
+
+# EVALUATION_AGENT - Task evaluation
+EVALUATION_AGENT:
+ VISUAL_MODE: True
+ REASONING_MODEL: False
+ API_TYPE: "openai"
+ API_BASE: "https://api.openai.com/v1/chat/completions"
+ API_KEY: "sk-YOUR_KEY_HERE"
+ API_MODEL: "gpt-4o"
+ API_VERSION: "2025-02-01-preview"
+
+# OPERATOR - OpenAI Operator (optional)
+OPERATOR:
+ SCALER: [1024, 768] # Screen resolution for visual input
+ VISUAL_MODE: True
+ REASONING_MODEL: False
+ API_TYPE: "openai"
+ API_BASE: "https://api.openai.com/v1/chat/completions"
+ API_KEY: "sk-YOUR_KEY_HERE"
+ API_MODEL: "computer-use-preview-20250311"
+ API_VERSION: "2025-03-01-preview"
+```
+
+## Multi-Provider Configuration
+
+You can use different providers for different agents:
+
+```yaml
+# Use OpenAI for planning
+HOST_AGENT:
+ API_TYPE: "openai"
+ API_BASE: "https://api.openai.com/v1/chat/completions"
+ API_KEY: "sk-YOUR_OPENAI_KEY"
+ API_MODEL: "gpt-4o"
+
+# Use Azure OpenAI for actions (cost control)
+APP_AGENT:
+ API_TYPE: "aoai"
+ API_BASE: "https://mycompany.openai.azure.com"
+ API_KEY: "YOUR_AZURE_KEY"
+ API_MODEL: "gpt-4o-mini"
+ API_DEPLOYMENT_ID: "gpt-4o-mini-deploy"
+
+# Use Claude for evaluation
+EVALUATION_AGENT:
+ API_TYPE: "claude"
+ API_BASE: "https://api.anthropic.com"
+ API_KEY: "YOUR_CLAUDE_KEY"
+ API_MODEL: "claude-3-5-sonnet-20241022"
+```
+
+## Model Recommendations
+
+### For HOST_AGENT (Planning)
+
+| Model | Provider | Pros | Cons |
+|-------|----------|------|------|
+| **gpt-4o** | OpenAI | Best overall, fast, multimodal | $$ |
+| **gpt-4-turbo** | OpenAI | Good quality, cheaper than GPT-4 | Slower |
+| **claude-3-5-sonnet** | Anthropic | Excellent reasoning, multimodal | Requires a separate Anthropic API key |
+| **gemini-2.0-flash** | Google | Fast, cheap, multimodal | New, less tested |
+
+### For APP_AGENT (Execution)
+
+| Model | Provider | Pros | Cons |
+|-------|----------|------|------|
+| **gpt-4o-mini** | OpenAI | 60% cheaper, fast, good quality | Slightly less capable |
+| **gpt-4o** | OpenAI | Best quality | More expensive |
+| **gemini-1.5-flash** | Google | Very cheap, fast | Less accurate |
+
+### For OPERATOR (CUA Mode)
+
+| Model | Provider | Notes |
+|-------|----------|-------|
+| **computer-use-preview-20250311** | OpenAI | Supported model for Operator mode (Computer Use Agent) |
+
+## Reasoning Models
+
+For models like OpenAI o1, o3, o3-mini, set `REASONING_MODEL: True`:
+
+```yaml
+HOST_AGENT:
+ REASONING_MODEL: True # Enable for o1/o3/o3-mini
+ API_TYPE: "openai"
+ API_MODEL: "o3-mini"
+ # ... other settings
+```
+
+**Note:** Reasoning models behave differently from standard chat models: they may not stream responses, can have different token limits, and may be priced differently.
+
+## Environment Variables
+
+Instead of hardcoding API keys, you can use environment variables:
+
+```yaml
+HOST_AGENT:
+ API_KEY: "${OPENAI_API_KEY}" # Reads from environment variable
+
+APP_AGENT:
+ API_KEY: "${AZURE_OPENAI_KEY}"
+```
+
+**Setting environment variables**:
+
+**Windows (PowerShell):**
+```powershell
+$env:OPENAI_API_KEY = "sk-your-key"
+$env:AZURE_OPENAI_KEY = "your-azure-key"
+```
+
+**Windows (Persistent):**
+```powershell
+[System.Environment]::SetEnvironmentVariable('OPENAI_API_KEY', 'sk-your-key', 'User')
+```
+
+**Linux/macOS:**
+```bash
+export OPENAI_API_KEY="sk-your-key"
+export AZURE_OPENAI_KEY="your-azure-key"
+```
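+
+The loader's own placeholder expansion is not shown in this guide. As a rough sketch of how a `${NAME}` reference could be resolved if you need to do it in your own tooling, consider the following; `resolve_placeholders` is a hypothetical helper, not a UFO² API, and failing loudly on an unset variable is a design choice made here.
+
+```python
+import os
+import re
+
+# Matches ${NAME} where NAME is a conventional environment variable name.
+_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")
+
+
+def resolve_placeholders(value: str, env=None) -> str:
+    """Replace each ${NAME} in value with the matching environment variable."""
+    env = os.environ if env is None else env
+
+    def _lookup(match):
+        name = match.group(1)
+        if name not in env:
+            raise KeyError(f"Environment variable {name} is not set")
+        return env[name]
+
+    return _PLACEHOLDER.sub(_lookup, value)
+
+
+# A dict stands in for the real environment so the example is reproducible.
+print(resolve_placeholders("${OPENAI_API_KEY}", env={"OPENAI_API_KEY": "sk-demo"}))
+```
+
+Raising on a missing variable surfaces a misconfigured shell early, instead of sending an empty key to the provider.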
+
+## Programmatic Access
+
+```python
+from config.config_loader import get_ufo_config
+
+config = get_ufo_config()
+
+# Access HOST_AGENT settings
+host_model = config.host_agent.api_model
+host_type = config.host_agent.api_type
+host_visual = config.host_agent.visual_mode
+
+# Access APP_AGENT settings
+app_model = config.app_agent.api_model
+app_key = config.app_agent.api_key
+
+# Check if agent is configured
+if config.host_agent.api_key:
+ print("HOST_AGENT is configured")
+else:
+ print("Warning: HOST_AGENT API key not set")
+```
+
+## Troubleshooting
+
+### Issue 1: "agents.yaml not found"
+
+**Error Message:**
+```
+FileNotFoundError: config/ufo/agents.yaml not found
+```
+
+**Solution:** Copy the template file
+```powershell
+Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml
+```
+
+### Issue 2: API Authentication Errors
+
+**Error Message:**
+```
+openai.AuthenticationError: Invalid API key
+```
+
+**Solutions:**
+1. Verify API key is correct
+2. Check for extra spaces or quotes
+3. Ensure API_TYPE matches your provider
+4. For Azure, verify API_DEPLOYMENT_ID is set
+
+### Issue 3: Model Not Found
+
+**Error Message:**
+```
+openai.NotFoundError: The model 'gpt-4o' does not exist
+```
+
+**Solutions:**
+1. Verify model name is correct (check provider's documentation)
+2. For Azure, ensure deployment exists and API_DEPLOYMENT_ID matches
+3. Check if you have access to the model
+
+### Issue 4: Rate Limits
+
+**Error Message:**
+```
+openai.RateLimitError: Rate limit exceeded
+```
+
+**Solutions:**
+1. Add delays between requests (configure in `system.yaml`)
+2. Upgrade your API plan
+3. Use different API keys for different agents
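+
+If you call an LLM from your own scripts, the first option can be approximated with a generic retry-with-exponential-backoff wrapper. This is a common pattern, not UFO²'s built-in behavior; `call_with_backoff` and its parameters are hypothetical names chosen for this sketch.
+
+```python
+import random
+import time
+
+
+def call_with_backoff(func, max_attempts=5, base_delay=1.0,
+                      retry_on=(Exception,), sleep=time.sleep):
+    """Call func(); on a retryable error wait base_delay * 2**attempt plus jitter, then retry."""
+    for attempt in range(max_attempts):
+        try:
+            return func()
+        except retry_on:
+            if attempt == max_attempts - 1:
+                raise  # out of attempts: surface the last error
+            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1 * base_delay)
+            sleep(delay)
+
+
+# Stand-in for an LLM request that fails twice before succeeding.
+attempts = {"count": 0}
+
+
+def flaky_request():
+    attempts["count"] += 1
+    if attempts["count"] < 3:
+        raise RuntimeError("rate limit exceeded")
+    return "ok"
+
+
+print(call_with_backoff(flaky_request, base_delay=0.01))
+```
+
+In real use you would narrow `retry_on` to the provider's rate-limit exception (for example `openai.RateLimitError`) so that other errors are not retried.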
+
+## Security Best Practices
+
+**API Key Security Guidelines:**
+
+1. ✅ **Never commit `agents.yaml` to Git**
+ - Add to `.gitignore`
+ - Only commit `agents.yaml.template`
+
+2. ✅ **Use environment variables** for production
+ ```yaml
+ API_KEY: "${OPENAI_API_KEY}"
+ ```
+
+3. ✅ **Rotate keys regularly**
+
+4. ✅ **Use separate keys** for dev/prod environments
+
+5. ✅ **Restrict key permissions** (e.g., read-only for evaluation agents)
+
+## Related Documentation
+
+- **[Third-Party Agent Configuration](third_party_config.md)** - Configure external agents like LinuxAgent and HardwareAgent
+- **[Creating Custom Third-Party Agents](../../tutorials/creating_third_party_agents.md)** - Build your own specialized agents
+- **[System Configuration](system_config.md)** - Runtime and execution settings
+- **[MCP Configuration](mcp_reference.md)** - Tool server configuration
+- **[RAG Configuration](rag_config.md)** - Knowledge retrieval settings
+- **[Model Setup Guide](../models/overview.md)** - Provider-specific setup
+- **[Migration Guide](migration.md)** - Migrating from legacy config
+
+## Summary
+
+**Key Takeaways:**
+
+✅ **Copy template first**: `Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml`
+✅ **Add your API keys**: Edit `agents.yaml` with your credentials
+✅ **Choose models wisely**: GPT-4o for planning, GPT-4o-mini for actions
+✅ **Never commit secrets**: Keep `agents.yaml` out of version control
+✅ **Use environment variables**: For production deployments
+
+**Your agents are now ready to work!** 🚀
diff --git a/documents/docs/configuration/system/extending.md b/documents/docs/configuration/system/extending.md
new file mode 100644
index 000000000..09bffa80d
--- /dev/null
+++ b/documents/docs/configuration/system/extending.md
@@ -0,0 +1,334 @@
+# Extending Configuration
+
+This guide shows you how to add custom configuration options to UFO².
+
+**Three Ways to Extend:**
+
+1. **Simple YAML files** - Quick custom settings in existing files
+2. **New configuration files** - Organize new features separately
+3. **Typed configuration schemas** - Full type safety with Python dataclasses
+
+## Method 1: Adding Fields to Existing Files
+
+For simple customizations, add fields directly to existing configuration files.
+
+```yaml
+# config/ufo/system.yaml
+MAX_STEP: 20
+SLEEP_TIME: 1.0
+
+# Your custom fields
+CUSTOM_TIMEOUT: 300
+DEBUG_MODE: true
+FEATURE_FLAGS:
+  enable_telemetry: false
+  use_experimental_api: true
+```
+
+### Accessing Custom Fields
+
+```python
+from config.config_loader import get_ufo_config
+
+config = get_ufo_config()
+
+# Access custom fields dynamically
+timeout = config.system.CUSTOM_TIMEOUT  # 300
+debug = config.system.DEBUG_MODE  # True
+use_experimental = config.system.FEATURE_FLAGS['use_experimental_api']  # True
+```
+
+Custom fields are automatically discovered and loaded - no code modifications needed!
+
+---
+
+## Method 2: Creating New Configuration Files
+
+For larger features, create dedicated configuration files.
+
+```yaml
+# config/ufo/analytics.yaml
+ANALYTICS:
+  enabled: true
+  backend: "influxdb"
+  endpoint: "http://localhost:8086"
+  database: "ufo_metrics"
+  retention: "30d"
+
+  metrics:
+    - name: "task_duration"
+      type: "histogram"
+    - name: "success_rate"
+      type: "counter"
+```
+
+### Automatic Discovery
+
+The config loader automatically discovers and loads all YAML files in `config/ufo/`:
+
+```python
+from config.config_loader import get_ufo_config
+
+# No registration needed!
+config = get_ufo_config()
+
+# Your new file is automatically loaded
+analytics_enabled = config.ANALYTICS['enabled']
+metrics = config.ANALYTICS['metrics']
+```
+
+---
+
+## Method 3: Typed Configuration Schemas
+
+For production features requiring type safety and validation, define typed schemas.
+
+### Step 1: Define the Schema
+
+```python
+# config/config_schemas.py
+from dataclasses import dataclass, field
+from typing import List, Literal
+
+@dataclass
+class MetricConfig:
+    """Configuration for a single metric."""
+    name: str
+    type: Literal["counter", "histogram", "gauge"]
+    tags: List[str] = field(default_factory=list)
+
+@dataclass
+class AnalyticsConfig:
+    """Analytics system configuration."""
+
+    # Required fields
+    enabled: bool
+    backend: Literal["influxdb", "prometheus", "datadog"]
+    endpoint: str
+
+    # Optional fields with defaults
+    database: str = "ufo_metrics"
+    retention: str = "30d"
+    batch_size: int = 100
+    flush_interval: float = 10.0
+
+    # Nested configuration
+    metrics: List[MetricConfig] = field(default_factory=list)
+
+    def __post_init__(self):
+        """Validate configuration after initialization."""
+        if self.enabled and not self.endpoint:
+            raise ValueError("endpoint required when analytics enabled")
+
+        if self.batch_size <= 0:
+            raise ValueError("batch_size must be positive")
+```
+
+### Step 2: Integrate into UFOConfig
+
+```python
+# config/config_schemas.py
+from dataclasses import dataclass
+
+@dataclass
+class UFOConfig:
+    """Main UFO configuration."""
+    host_agent: AgentConfig
+    app_agent: AgentConfig
+    system: SystemConfig
+    rag: RAGConfig
+    analytics: AnalyticsConfig  # Add your new config
+
+    # ... rest of implementation
+```
+
+### Step 3: Use Typed Configuration
+
+```python
+from config.config_loader import get_ufo_config
+
+config = get_ufo_config()
+
+# Type-safe access with IDE autocomplete
+if config.analytics.enabled:
+    for metric in config.analytics.metrics:
+        print(f"Metric: {metric.name}, Type: {metric.type}")
+
+# Validation happens automatically
+batch_size = config.analytics.batch_size  # Guaranteed > 0
+```
+
+---
+
+## Common Patterns
+
+### Environment-Specific Overrides
+
+```yaml
+ # config/ufo/system.yaml (base)
+ LOG_LEVEL: "INFO"
+ DEBUG_MODE: false
+ CACHE_SIZE: 1000
+
+ # config/ufo/system.dev.yaml (development override)
+ LOG_LEVEL: "DEBUG"
+ DEBUG_MODE: true
+ PROFILING_ENABLED: true
+
+ # config/ufo/system.prod.yaml (production override)
+ LOG_LEVEL: "WARNING"
+ CACHE_SIZE: 10000
+ MONITORING_ENABLED: true
+```
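+
+How the loader selects and applies an override file such as `system.dev.yaml` is not described here. One plausible reading is a recursive merge in which override values win; the sketch below illustrates that idea with a hypothetical `deep_merge` helper operating on plain dicts, and is not a description of the actual loader.
+
+```python
+import copy
+
+
+def deep_merge(base: dict, override: dict) -> dict:
+    """Return a new dict where override values win; nested dicts merge recursively."""
+    merged = copy.deepcopy(base)
+    for key, value in override.items():
+        if isinstance(value, dict) and isinstance(merged.get(key), dict):
+            merged[key] = deep_merge(merged[key], value)
+        else:
+            merged[key] = copy.deepcopy(value)
+    return merged
+
+
+# Values taken from the base and development files shown above.
+base = {"LOG_LEVEL": "INFO", "DEBUG_MODE": False, "CACHE_SIZE": 1000}
+dev = {"LOG_LEVEL": "DEBUG", "DEBUG_MODE": True, "PROFILING_ENABLED": True}
+print(deep_merge(base, dev))
+```
+
+Keys that appear only in the base file (here `CACHE_SIZE`) survive the merge, which is what lets an override file stay short.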
+
+### Feature Flags
+
+```yaml
+ # config/ufo/features.yaml
+ FEATURES:
+ experimental_actions: false
+ multi_device_mode: true
+ advanced_logging: false
+
+ # Per-agent feature flags
+ agent_features:
+ host_agent:
+ use_vision_model: true
+ parallel_processing: false
+ app_agent:
+ speculative_execution: true
+ action_batching: true
+```
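+
+Reading these flags in code could look like the sketch below. It assumes that `agent_features` is nested under `FEATURES` and that a per-agent entry takes precedence over a global flag of the same name; `is_enabled` is a hypothetical helper, and the dict stands in for the parsed YAML.
+
+```python
+from typing import Optional
+
+
+def is_enabled(features: dict, flag: str, agent: Optional[str] = None) -> bool:
+    """Look up a flag; a per-agent entry, when present, overrides the global one."""
+    if agent is not None:
+        agent_flags = features.get("agent_features", {}).get(agent, {})
+        if flag in agent_flags:
+            return bool(agent_flags[flag])
+    # Unknown flags default to disabled.
+    return bool(features.get(flag, False))
+
+
+FEATURES = {
+    "experimental_actions": False,
+    "multi_device_mode": True,
+    "agent_features": {
+        "app_agent": {"speculative_execution": True, "action_batching": True},
+    },
+}
+
+print(is_enabled(FEATURES, "multi_device_mode"))
+print(is_enabled(FEATURES, "speculative_execution", agent="app_agent"))
+```
+
+Defaulting unknown flags to `False` keeps a typo in a flag name from silently enabling experimental behavior.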
+
+### Plugin Configuration
+
+```yaml
+ # config/ufo/plugins.yaml
+ PLUGINS:
+ enabled: true
+ auto_discover: true
+ load_order:
+ - "core"
+ - "analytics"
+ - "custom"
+
+ plugins:
+ analytics:
+ enabled: true
+ config_file: "config/plugins/analytics.yaml"
+
+ custom_processor:
+ enabled: false
+ class: "plugins.custom.MyProcessor"
+ priority: 100
+```
+
+---
+
+## Best Practices
+
+**DO - Recommended Practices**
+
+- ✅ **Group related settings** in dedicated files
+- ✅ **Use typed schemas** for production features
+- ✅ **Provide sensible defaults** for all optional fields
+- ✅ **Add validation** in `__post_init__` methods
+- ✅ **Document all fields** with docstrings
+- ✅ **Use environment overrides** for deployment-specific settings
+- ✅ **Version your config schemas** when making breaking changes
+- ✅ **Test configuration loading** in CI/CD pipelines
+
+**DON'T - Anti-Patterns**
+
+- ❌ **Don't hardcode secrets** - use environment variables
+- ❌ **Don't duplicate settings** across multiple files
+- ❌ **Don't use dynamic field names** - breaks type safety
+- ❌ **Don't skip validation** - catch errors early
+- ❌ **Don't mix concerns** - keep configs focused
+- ❌ **Don't ignore warnings** from config loader
+- ❌ **Don't commit sensitive data** - use .env files
+
+---
+
+## Security Considerations
+
+!!!warning "Secrets Management"
+    Never commit sensitive data to configuration files:
+
+    ```yaml
+    # ❌ BAD - Hardcoded secrets
+    DATABASE:
+      password: "my-secret-password"
+      api_key: "sk-1234567890"
+
+    # ✅ GOOD - Environment variable references
+    DATABASE:
+      password: "${DB_PASSWORD}"
+      api_key: "${API_KEY}"
+    ```
+
+### Environment Variables
+Use environment variables for secrets:
+
+```python
+import os
+from config.config_loader import get_ufo_config
+
+config = get_ufo_config()
+
+# Resolve environment variables
+db_password = os.getenv('DB_PASSWORD')
+api_key = os.getenv('API_KEY')
+```
+
+---
+
+## Testing Your Configuration
+
+```python
+import pytest
+from config.config_loader import ConfigLoader
+# AnalyticsConfig is defined in config/config_schemas.py (see Method 3)
+from config.config_schemas import AnalyticsConfig
+
+def test_analytics_config_defaults():
+    """Test analytics configuration defaults."""
+    config_data = {
+        'enabled': True,
+        'backend': 'influxdb',
+        'endpoint': 'http://localhost:8086'
+    }
+
+    analytics = AnalyticsConfig(**config_data)
+
+    assert analytics.enabled is True
+    assert analytics.database == 'ufo_metrics'  # Default
+    assert analytics.batch_size == 100  # Default
+
+def test_analytics_config_validation():
+    """Test analytics configuration validation."""
+    with pytest.raises(ValueError, match="endpoint required"):
+        AnalyticsConfig(enabled=True, backend='influxdb', endpoint='')
+
+    with pytest.raises(ValueError, match="batch_size must be positive"):
+        AnalyticsConfig(
+            enabled=True,
+            backend='influxdb',
+            endpoint='http://localhost',
+            batch_size=-1
+        )
+
+def test_config_loading():
+    """Test full configuration loading."""
+    loader = ConfigLoader()
+    config = loader.load_ufo_config('config/ufo')
+
+    # Verify custom configuration loaded
+    assert hasattr(config, 'analytics')
+    assert config.analytics.enabled in [True, False]
+```
+
+---
+
+## Next Steps
+
+- **[Agents Configuration](./agents_config.md)** - LLM and agent settings
+- **[System Configuration](./system_config.md)** - Runtime and execution settings
+- **[RAG Configuration](./rag_config.md)** - Knowledge retrieval settings
+- **[Migration Guide](./migration.md)** - Migrate from legacy configuration
+- **[Configuration Overview](./overview.md)** - Understand configuration system design
diff --git a/documents/docs/configuration/system/galaxy_agent.md b/documents/docs/configuration/system/galaxy_agent.md
new file mode 100644
index 000000000..b86f93fa1
--- /dev/null
+++ b/documents/docs/configuration/system/galaxy_agent.md
@@ -0,0 +1,467 @@
+# Galaxy Constellation Agent Configuration
+
+**agent.yaml** configures the **Constellation Agent** - the AI agent responsible for creating constellations (task decomposition) and editing them based on execution results.
+
+---
+
+## Overview
+
+The **agent.yaml** configuration file provides **LLM and API settings** for the Constellation Agent. This agent is responsible for:
+
+- **Constellation Creation**: Breaking down user requests into device-specific tasks
+- **Constellation Editing**: Adjusting task plans based on execution results
+- **Device Selection**: Choosing appropriate devices for each sub-task
+- **Task Orchestration**: Coordinating multi-device workflows
+
+**Configuration Separation:**
+
+- **agent.yaml** - LLM configuration for constellation agent (this document)
+- **constellation.yaml** - Runtime settings for orchestrator ([Galaxy Constellation Configuration](./galaxy_constellation.md))
+- **devices.yaml** - Device definitions ([Galaxy Devices Configuration](./galaxy_devices.md))
+
+**Agent Role in System:**
+
+```mermaid
+graph TB
+ A[User Request] -->|Natural Language| B[Constellation Agent]
+ B -->|Uses LLM Config| C[agent.yaml]
+ B -->|Creates/Edits| D[Constellation Plan]
+ D -->|Tasks| E[Device Agent 1]
+ D -->|Tasks| F[Device Agent 2]
+ D -->|Tasks| G[Device Agent N]
+
+ style B fill:#e1f5ff
+ style C fill:#ffe1e1
+ style D fill:#fff4e1
+```
+
+---
+
+## File Location
+
+**Standard Location:**
+
+```
+UFO2/
+├── config/
+│   └── galaxy/
+│       ├── agent.yaml              # ← Constellation agent config (copy from template)
+│       ├── agent.yaml.template     # ← Template for initial setup
+│       ├── constellation.yaml      # ← Runtime settings
+│       └── devices.yaml            # ← Device definitions
+```
+
+!!!warning "Setup Required"
+ 1. Copy `agent.yaml.template` to `agent.yaml`
+ 2. Fill in your API credentials (API_KEY, AAD_TENANT_ID, etc.)
+ 3. Never commit `agent.yaml` with real credentials to version control
+
+**Loading in Code:**
+
+```python
+from config.config_loader import get_galaxy_config
+
+# Load Galaxy configuration (includes agent settings)
+config = get_galaxy_config()
+
+# Access constellation agent settings
+agent_config = config.constellation_agent
+reasoning_model = agent_config.reasoning_model
+api_type = agent_config.api_type
+api_model = agent_config.api_model
+```
+
+---
+
+## Configuration Schema
+
+### Complete Schema
+
+```yaml
+# Galaxy Constellation Agent Configuration
+
+CONSTELLATION_AGENT:
+ # Reasoning
+ REASONING_MODEL: bool # Enable reasoning/chain-of-thought
+
+ # API Connection
+ API_TYPE: string # API provider type
+ API_BASE: string # API base URL
+ API_KEY: string # API authentication key
+ API_VERSION: string # API version
+ API_MODEL: string # Model name/deployment
+
+ # Azure AD Authentication (for azure_ad API_TYPE)
+ AAD_TENANT_ID: string # Azure AD tenant ID
+ AAD_API_SCOPE: string # API scope name
+ AAD_API_SCOPE_BASE: string # API scope base GUID
+
+ # Prompt Configuration
+ CONSTELLATION_CREATION_PROMPT: string # Path to creation prompt template
+ CONSTELLATION_EDITING_PROMPT: string # Path to editing prompt template
+ CONSTELLATION_CREATION_EXAMPLE_PROMPT: string # Path to creation examples
+ CONSTELLATION_EDITING_EXAMPLE_PROMPT: string # Path to editing examples
+```
+
+---
+
+## Configuration Fields
+
+### Reasoning Capabilities
+
+| Field | Type | Required | Default | Description |
+|-------|------|----------|---------|-------------|
+| `REASONING_MODEL` | `bool` | No | `False` | Enable chain-of-thought reasoning for complex planning |
+
+**Example:**
+
+```yaml
+CONSTELLATION_AGENT:
+ REASONING_MODEL: False # Standard LLM response (faster)
+```
+
+!!!tip "Reasoning Model"
+ Set `REASONING_MODEL: True` for:
+ - Complex multi-device workflows
+ - Tasks requiring step-by-step planning
+ - Debugging constellation failures
+
+ **Trade-off:** Slower response time, higher token cost
+
+---
+
+### API Connection Settings
+
+| Field | Type | Required | Default | Description |
+|-------|------|----------|---------|-------------|
+| `API_TYPE` | `string` | Yes | - | API provider: `"openai"`, `"azure"`, `"azure_ad"`, `"aoai"` |
+| `API_BASE` | `string` | Yes* | - | API base URL (required for Azure) |
+| `API_KEY` | `string` | Yes* | - | API authentication key (required for non-AAD auth) |
+| `API_VERSION` | `string` | Yes* | - | API version (required for Azure) |
+| `API_MODEL` | `string` | Yes | - | Model name or deployment name |
+
+**Supported API Types:**
+
+| API_TYPE | Provider | Authentication | Example API_BASE |
+|----------|----------|----------------|------------------|
+| `openai` | OpenAI | API Key | Not required (uses default) |
+| `azure` | Azure OpenAI | API Key | `https://your-resource.openai.azure.com/` |
+| `azure_ad` | Azure OpenAI | Azure AD (AAD) | `https://your-resource.azure-api.net/` |
+| `aoai` | Azure OpenAI (alias) | API Key | `https://your-resource.openai.azure.com/` |
+
+---
+
+#### Example 1: OpenAI Configuration
+
+```yaml
+CONSTELLATION_AGENT:
+ API_TYPE: "openai"
+ API_KEY: "sk-proj-..." # Your OpenAI API key
+ API_MODEL: "gpt-4o" # OpenAI model name
+ API_VERSION: "2024-02-01" # Optional for OpenAI
+```
+
+---
+
+#### Example 2: Azure OpenAI (API Key Auth)
+
+```yaml
+CONSTELLATION_AGENT:
+ API_TYPE: "azure"
+ API_BASE: "https://my-resource.openai.azure.com/"
+ API_KEY: "abc123..." # Azure OpenAI API key
+ API_VERSION: "2025-02-01-preview"
+ API_MODEL: "gpt-4o-deployment" # Your deployment name
+```
+
+---
+
+#### Example 3: Azure OpenAI (Azure AD Auth)
+
+```yaml
+CONSTELLATION_AGENT:
+ API_TYPE: "azure_ad"
+ API_BASE: "https://cloudgpt-openai.azure-api.net/"
+ API_VERSION: "2025-02-01-preview"
+ API_MODEL: "gpt-5-chat-20251003"
+
+ # Azure AD Configuration
+ AAD_TENANT_ID: "72f988bf-86f1-41af-91ab-2d7cd011db47"
+ AAD_API_SCOPE: "openai"
+ AAD_API_SCOPE_BASE: "feb7b661-cac7-44a8-8dc1-163b63c23df2"
+```
+
+!!!warning "Azure AD Authentication"
+ When using `API_TYPE: "azure_ad"`:
+ - No `API_KEY` needed (uses Azure AD token)
+ - Requires `AAD_TENANT_ID`, `AAD_API_SCOPE`, `AAD_API_SCOPE_BASE`
+ - User must be authenticated with `az login` or have proper AAD credentials
+
+---
+
+### Azure AD Fields (azure_ad API_TYPE only)
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `AAD_TENANT_ID` | `string` | Yes* | Azure AD tenant GUID |
+| `AAD_API_SCOPE` | `string` | Yes* | API scope identifier (e.g., "openai") |
+| `AAD_API_SCOPE_BASE` | `string` | Yes* | API scope base GUID |
+
+*Required only when `API_TYPE: "azure_ad"`
+
+---
+
+### Prompt Configuration Paths
+
+| Field | Type | Required | Default | Description |
+|-------|------|----------|---------|-------------|
+| `CONSTELLATION_CREATION_PROMPT` | `string` | Yes | - | Path to constellation creation prompt template |
+| `CONSTELLATION_EDITING_PROMPT` | `string` | Yes | - | Path to constellation editing prompt template |
+| `CONSTELLATION_CREATION_EXAMPLE_PROMPT` | `string` | Yes | - | Path to creation examples (few-shot learning) |
+| `CONSTELLATION_EDITING_EXAMPLE_PROMPT` | `string` | Yes | - | Path to editing examples (few-shot learning) |
+
+**Default Prompt Paths:**
+
+```yaml
+CONSTELLATION_AGENT:
+ CONSTELLATION_CREATION_PROMPT: "galaxy/prompts/constellation/share/constellation_creation.yaml"
+ CONSTELLATION_EDITING_PROMPT: "galaxy/prompts/constellation/share/constellation_editing.yaml"
+ CONSTELLATION_CREATION_EXAMPLE_PROMPT: "galaxy/prompts/constellation/examples/constellation_creation_example.yaml"
+ CONSTELLATION_EDITING_EXAMPLE_PROMPT: "galaxy/prompts/constellation/examples/constellation_editing_example.yaml"
+```
+
+!!!tip "Custom Prompts"
+ You can customize prompts for your use case:
+ ```yaml
+ CONSTELLATION_CREATION_PROMPT: "custom_prompts/my_constellation_creation.yaml"
+ ```
+
+---
+
+## Complete Examples
+
+### Example 1: Production (Azure AD)
+
+```yaml
+# Galaxy Constellation Agent Configuration - Production
+# Uses Azure OpenAI with Azure AD authentication
+
+CONSTELLATION_AGENT:
+ # Capabilities
+ REASONING_MODEL: False
+
+ # Azure OpenAI (Azure AD Auth)
+ API_TYPE: "azure_ad"
+ API_BASE: "https://cloudgpt-openai.azure-api.net/"
+ API_VERSION: "2025-02-01-preview"
+ API_MODEL: "gpt-5-chat-20251003"
+
+ # Azure AD Configuration
+ AAD_TENANT_ID: "72f988bf-86f1-41af-91ab-2d7cd011db47"
+ AAD_API_SCOPE: "openai"
+ AAD_API_SCOPE_BASE: "feb7b661-cac7-44a8-8dc1-163b63c23df2"
+
+ # Prompt Configurations
+ CONSTELLATION_CREATION_PROMPT: "galaxy/prompts/constellation/share/constellation_creation.yaml"
+ CONSTELLATION_EDITING_PROMPT: "galaxy/prompts/constellation/share/constellation_editing.yaml"
+ CONSTELLATION_CREATION_EXAMPLE_PROMPT: "galaxy/prompts/constellation/examples/constellation_creation_example.yaml"
+ CONSTELLATION_EDITING_EXAMPLE_PROMPT: "galaxy/prompts/constellation/examples/constellation_editing_example.yaml"
+```
+
+---
+
+### Example 2: Development (OpenAI)
+
+```yaml
+# Galaxy Constellation Agent Configuration - Development
+# Uses OpenAI API for quick testing
+
+CONSTELLATION_AGENT:
+ # Capabilities
+ REASONING_MODEL: True # Enable for debugging
+
+ # OpenAI API
+ API_TYPE: "openai"
+ API_KEY: "sk-proj-..." # Your OpenAI API key (DO NOT COMMIT!)
+ API_MODEL: "gpt-4o"
+ API_VERSION: "2024-02-01"
+
+ # Prompt Configurations (default paths)
+ CONSTELLATION_CREATION_PROMPT: "galaxy/prompts/constellation/share/constellation_creation.yaml"
+ CONSTELLATION_EDITING_PROMPT: "galaxy/prompts/constellation/share/constellation_editing.yaml"
+ CONSTELLATION_CREATION_EXAMPLE_PROMPT: "galaxy/prompts/constellation/examples/constellation_creation_example.yaml"
+ CONSTELLATION_EDITING_EXAMPLE_PROMPT: "galaxy/prompts/constellation/examples/constellation_editing_example.yaml"
+```
+
+---
+
+### Example 3: Azure OpenAI (API Key)
+
+```yaml
+# Galaxy Constellation Agent Configuration - Azure (API Key Auth)
+# Uses Azure OpenAI with API key authentication
+
+CONSTELLATION_AGENT:
+ # Capabilities
+ REASONING_MODEL: False
+
+ # Azure OpenAI (API Key Auth)
+ API_TYPE: "azure"
+ API_BASE: "https://my-openai-resource.openai.azure.com/"
+ API_KEY: "abc123..." # Azure OpenAI API key (DO NOT COMMIT!)
+ API_VERSION: "2025-02-01-preview"
+ API_MODEL: "gpt-4o-deployment-name"
+
+ # Prompt Configurations
+ CONSTELLATION_CREATION_PROMPT: "galaxy/prompts/constellation/share/constellation_creation.yaml"
+ CONSTELLATION_EDITING_PROMPT: "galaxy/prompts/constellation/share/constellation_editing.yaml"
+ CONSTELLATION_CREATION_EXAMPLE_PROMPT: "galaxy/prompts/constellation/examples/constellation_creation_example.yaml"
+ CONSTELLATION_EDITING_EXAMPLE_PROMPT: "galaxy/prompts/constellation/examples/constellation_editing_example.yaml"
+```
+
+---
+
+## Security Best Practices
+
+!!!danger "Never Commit Credentials"
+ **DO NOT commit `agent.yaml` with real credentials to version control!**
+
+ ✅ **Recommended Workflow:**
+ ```bash
+ # 1. Copy template
+ cp config/galaxy/agent.yaml.template config/galaxy/agent.yaml
+
+ # 2. Edit agent.yaml with your credentials
+ # (This file is .gitignored)
+
+ # 3. Commit only the template
+ git add config/galaxy/agent.yaml.template
+ git commit -m "Update agent template"
+ ```
+
+**Use Environment Variables for Sensitive Data:**
+
+```yaml
+# In agent.yaml
+CONSTELLATION_AGENT:
+ API_KEY: ${GALAXY_API_KEY} # Read from environment variable
+```
+
+```bash
+# In your shell
+export GALAXY_API_KEY="sk-proj-..."
+```
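
Note that a plain YAML parser such as PyYAML's `safe_load` reads `${GALAXY_API_KEY}` as a literal string, so the placeholder only works if something substitutes it. Whether the Galaxy config loader does this is not stated here. The sketch below shows one way to expand placeholders in the raw text before parsing; `expand_env_placeholders` is a hypothetical helper, not a UFO function.

```python
import os
from string import Template


def expand_env_placeholders(raw_yaml: str, env=None) -> str:
    """Replace ${NAME} placeholders in raw YAML text with environment values.

    Hypothetical helper: fails loudly when a referenced variable is unset,
    so a missing key is caught at load time instead of at the first API call.
    """
    env = os.environ if env is None else env
    try:
        return Template(raw_yaml).substitute(env)
    except KeyError as missing:
        raise RuntimeError(f"Environment variable {missing} is not set") from None


raw = "CONSTELLATION_AGENT:\n  API_KEY: ${GALAXY_API_KEY}\n"
# An explicit mapping is passed here so the example does not depend on the shell.
expanded = expand_env_placeholders(raw, {"GALAXY_API_KEY": "sk-proj-example"})
print(expanded)
```

The expanded text can then be passed to `yaml.safe_load`. With `string.Template`, a literal dollar sign in the file has to be written as `$$`.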
+
+---
+
+## Integration with Other Configurations
+
+The agent configuration works together with other Galaxy configs:
+
+**agent.yaml** (LLM config) + **constellation.yaml** (runtime) + **devices.yaml** (devices) → **Complete Galaxy System**
+
+### Complete Initialization Example
+
+```python
+import asyncio
+
+import yaml
+
+from config.config_loader import get_galaxy_config
+from galaxy.agents.constellation_agent import ConstellationAgent
+from galaxy.client.device_manager import ConstellationDeviceManager
+
+
+async def main() -> None:
+    # 1. Load all Galaxy configurations
+    galaxy_config = get_galaxy_config()
+    agent_config = galaxy_config.constellation_agent
+
+    # 2. Initialize Constellation Agent with LLM config
+    agent = ConstellationAgent(
+        reasoning_model=agent_config.reasoning_model,
+        api_type=agent_config.api_type,
+        api_base=agent_config.api_base,
+        api_key=agent_config.api_key,
+        api_version=agent_config.api_version,
+        api_model=agent_config.api_model,
+    )
+
+    # 3. Load constellation runtime settings
+    with open("config/galaxy/constellation.yaml", "r", encoding="utf-8") as f:
+        constellation_config = yaml.safe_load(f)
+
+    # 4. Initialize Device Manager with runtime settings
+    #    (optional fields fall back to their documented defaults)
+    device_manager = ConstellationDeviceManager(
+        task_name=constellation_config["CONSTELLATION_ID"],
+        heartbeat_interval=constellation_config.get("HEARTBEAT_INTERVAL", 30.0),
+        reconnect_delay=constellation_config.get("RECONNECT_DELAY", 5.0),
+    )
+
+    # 5. Load and register devices
+    with open(constellation_config["DEVICE_INFO"], "r", encoding="utf-8") as f:
+        devices_config = yaml.safe_load(f)
+
+    for device in devices_config["devices"]:
+        await device_manager.register_device(**device)
+
+    print("✅ Galaxy Constellation System Initialized")
+    print(f" Agent Model: {agent_config.api_model}")
+    print(f" Constellation ID: {constellation_config['CONSTELLATION_ID']}")
+    print(f" Devices: {len(devices_config['devices'])}")
+
+
+# register_device is awaited, so the script needs an event loop
+asyncio.run(main())
+```
+
+---
+
+## Best Practices
+
+**Configuration Best Practices:**
+
+1. **Use Templates for Team Collaboration**
+ ```bash
+ # Share template, not credentials
+ config/galaxy/agent.yaml.template # ✅ Commit this
+ config/galaxy/agent.yaml # ❌ Never commit this
+ ```
+
+2. **Test with OpenAI, Deploy with Azure**
+ ```yaml
+ # Development: OpenAI (fast iteration)
+ API_TYPE: "openai"
+
+ # Production: Azure (enterprise features)
+ API_TYPE: "azure_ad"
+ ```
+
+3. **Use Reasoning Mode Selectively**
+ ```yaml
+ # For complex workflows
+ REASONING_MODEL: True
+
+ # For simple tasks
+ REASONING_MODEL: False # Faster
+ ```
+
+---
+
+## Related Documentation
+
+| Topic | Document | Description |
+|-------|----------|-------------|
+| **Constellation Runtime** | [Galaxy Constellation Configuration](./galaxy_constellation.md) | Runtime settings for orchestrator |
+| **Device Configuration** | [Galaxy Devices Configuration](./galaxy_devices.md) | Device definitions |
+| **System Configuration** | [Configuration Overview](./overview.md) | Overall configuration architecture |
+
+---
+
+## Next Steps
+
+1. **Copy Template**: `cp config/galaxy/agent.yaml.template config/galaxy/agent.yaml`
+2. **Configure Credentials**: Fill in `API_KEY` or the `AAD_*` settings
+3. **Configure Runtime**: See [Galaxy Constellation Configuration](./galaxy_constellation.md)
+4. **Configure Devices**: See [Galaxy Devices Configuration](./galaxy_devices.md)
+5. **Test Constellation**: Run Galaxy orchestrator
+
+---
+
+## Source Code References
+
+- **ConstellationAgent**: `galaxy/agents/constellation_agent.py`
+- **Configuration Loading**: `config/config_loader.py`
+- **Configuration Schemas**: `config/config_schemas.py`
+- **Prompt Templates**: `galaxy/prompts/constellation/`
diff --git a/documents/docs/configuration/system/galaxy_constellation.md b/documents/docs/configuration/system/galaxy_constellation.md
new file mode 100644
index 000000000..33cc7de69
--- /dev/null
+++ b/documents/docs/configuration/system/galaxy_constellation.md
@@ -0,0 +1,459 @@
+# Galaxy Constellation Runtime Configuration
+
+**constellation.yaml** defines constellation-wide runtime settings that control how the Galaxy orchestrator manages devices, tasks, and logging across the entire constellation system.
+
+---
+
+## Overview
+
+The **constellation.yaml** configuration file provides **constellation-level runtime settings** that apply to the entire Galaxy system. These settings control:
+
+- Constellation identification and logging
+- Heartbeat and connection management
+- Task concurrency and step limits
+- Device configuration file path
+
+**Configuration Separation:**
+
+- **constellation.yaml** - Runtime settings for the constellation orchestrator (this document)
+- **devices.yaml** - Individual device definitions ([Galaxy Devices Configuration](./galaxy_devices.md))
+- **agent.yaml** - LLM configuration for constellation agent ([Galaxy Agent Configuration](./galaxy_agent.md))
+
+**Configuration Relationship:**
+
+```mermaid
+graph TB
+ A[constellation.yaml] -->|Runtime Settings| B[ConstellationDeviceManager]
+ C[devices.yaml] -->|Device Definitions| B
+ D[agent.yaml] -->|LLM Config| E[ConstellationAgent]
+ B -->|Orchestrates| F[Device Agents]
+ E -->|Plans Tasks| B
+
+ style A fill:#e1f5ff
+ style C fill:#fff4e1
+ style D fill:#ffe1e1
+```
+
+---
+
+## File Location
+
+**Standard Location:**
+
+```
+UFO2/
+├── config/
+│   └── galaxy/
+│       ├── constellation.yaml   # ← Runtime settings (this file)
+│       ├── devices.yaml         # ← Device definitions
+│       └── agent.yaml.template  # ← Agent LLM configuration template
+```
+
+**Loading in Code:**
+
+```python
+import asyncio
+
+import yaml
+
+from galaxy.client.device_manager import ConstellationDeviceManager
+
+
+async def main() -> None:
+    # Load constellation configuration
+    with open("config/galaxy/constellation.yaml", "r", encoding="utf-8") as f:
+        config = yaml.safe_load(f)
+
+    # Initialize ConstellationDeviceManager with runtime settings;
+    # optional fields fall back to their documented defaults
+    manager = ConstellationDeviceManager(
+        task_name=config["CONSTELLATION_ID"],
+        heartbeat_interval=config.get("HEARTBEAT_INTERVAL", 30.0),
+        reconnect_delay=config.get("RECONNECT_DELAY", 5.0),
+    )
+
+    # Load device configuration from the path given by DEVICE_INFO
+    with open(config["DEVICE_INFO"], "r", encoding="utf-8") as f:
+        devices_config = yaml.safe_load(f)
+
+    # Register devices (register_device must be awaited)
+    for device in devices_config["devices"]:
+        await manager.register_device(**device)
+
+
+asyncio.run(main())
+```
+
+---
+
+## Configuration Schema
+
+### Complete Schema
+
+```yaml
+# Galaxy Constellation Configuration
+# Runtime settings for constellation system
+
+# Constellation Identity & Logging
+CONSTELLATION_ID: string # Unique constellation identifier
+LOG_TO_MARKDOWN: bool # Save trajectory logs to markdown
+
+# Connection & Health Management
+HEARTBEAT_INTERVAL: float # Heartbeat check interval (seconds)
+RECONNECT_DELAY: float # Reconnection delay (seconds)
+
+# Task & Execution Limits
+MAX_CONCURRENT_TASKS: int # Maximum concurrent tasks
+MAX_STEP: int # Maximum steps per session
+
+# Device Configuration Reference
+DEVICE_INFO: string # Path to devices.yaml file
+```
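
One way to work with this schema in code is to map it onto a typed object that applies the defaults listed in the field tables below. This is a sketch under stated assumptions: `ConstellationSettings` is a hypothetical class, not the schema in `config/config_schemas.py`, and it takes a dictionary such as the one `yaml.safe_load` returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstellationSettings:
    """Typed view of constellation.yaml (hypothetical, for illustration)."""

    constellation_id: str
    device_info: str
    log_to_markdown: bool = True
    heartbeat_interval: float = 30.0
    reconnect_delay: float = 5.0
    max_concurrent_tasks: int = 6
    max_step: int = 15

    @classmethod
    def from_mapping(cls, raw: dict) -> "ConstellationSettings":
        # CONSTELLATION_ID and DEVICE_INFO are the two required fields.
        for key in ("CONSTELLATION_ID", "DEVICE_INFO"):
            if key not in raw:
                raise KeyError(f"constellation.yaml is missing required field {key}")
        # Optional fields fall back to the dataclass defaults declared above.
        return cls(
            constellation_id=str(raw["CONSTELLATION_ID"]),
            device_info=str(raw["DEVICE_INFO"]),
            log_to_markdown=bool(raw.get("LOG_TO_MARKDOWN", cls.log_to_markdown)),
            heartbeat_interval=float(raw.get("HEARTBEAT_INTERVAL", cls.heartbeat_interval)),
            reconnect_delay=float(raw.get("RECONNECT_DELAY", cls.reconnect_delay)),
            max_concurrent_tasks=int(raw.get("MAX_CONCURRENT_TASKS", cls.max_concurrent_tasks)),
            max_step=int(raw.get("MAX_STEP", cls.max_step)),
        )


settings = ConstellationSettings.from_mapping(
    {
        "CONSTELLATION_ID": "dev_testing",
        "DEVICE_INFO": "config/galaxy/devices_dev.yaml",
        "MAX_STEP": 50,
    }
)
print(settings)
```

Keeping the defaults in one place avoids repeating `30.0` and `5.0` at every call site that reads the file.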
+
+---
+
+## Configuration Fields
+
+### Constellation Identity & Logging
+
+| Field | Type | Required | Default | Description |
+|-------|------|----------|---------|-------------|
+| `CONSTELLATION_ID` | `string` | Yes | - | Unique identifier for this constellation instance |
+| `LOG_TO_MARKDOWN` | `bool` | No | `true` | Whether to save trajectory logs in markdown format |
+
+**Example:**
+
+```yaml
+CONSTELLATION_ID: "production_constellation"
+LOG_TO_MARKDOWN: true
+```
+
+**Constellation ID Best Practices:**
+
+Use descriptive names that indicate environment and purpose:
+- `production_main` - Main production constellation
+- `dev_testing` - Development testing constellation
+- `qa_regression` - QA regression testing constellation
+
+---
+
+### Connection & Health Management
+
+| Field | Type | Required | Default | Description |
+|-------|------|----------|---------|-------------|
+| `HEARTBEAT_INTERVAL` | `float` | No | `30.0` | Interval (in seconds) between heartbeat checks for connected devices |
+| `RECONNECT_DELAY` | `float` | No | `5.0` | Delay (in seconds) before attempting to reconnect a failed device |
+
+**Example:**
+
+```yaml
+HEARTBEAT_INTERVAL: 30.0 # Check device health every 30 seconds
+RECONNECT_DELAY: 5.0 # Wait 5 seconds before reconnecting
+```
+
+!!!info "Heartbeat Mechanism"
+ The heartbeat system monitors device agent connections:
+ - Every `HEARTBEAT_INTERVAL` seconds, the constellation checks if devices are responsive
+ - If a device fails to respond, it is marked as `FAILED`
+ - After `RECONNECT_DELAY` seconds, automatic reconnection is attempted
+ - Reconnection continues until `max_retries` is reached (configured per-device in devices.yaml)
+
+**Tuning Guidelines:**
+
+| Environment | HEARTBEAT_INTERVAL | RECONNECT_DELAY | Rationale |
+|-------------|-------------------|-----------------|-----------|
+| **Production** | 10.0 - 30.0 | 5.0 - 10.0 | Balance responsiveness with network overhead |
+| **Development** | 30.0 - 60.0 | 3.0 - 5.0 | Reduce noise during debugging |
+| **Testing** | 5.0 - 10.0 | 2.0 - 3.0 | Faster failure detection for tests |
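
To compare candidate values, it can help to estimate how long a failed device stays in the retry cycle. The sketch below is a rough model, not the manager's actual timing: it assumes a failure is noticed at the next heartbeat, that each reconnection attempt is preceded by one `RECONNECT_DELAY`, and that the attempts themselves take no time. The function name is hypothetical.

```python
def seconds_until_retries_exhausted(
    heartbeat_interval: float, reconnect_delay: float, max_retries: int
) -> float:
    """Rough estimate of the time between a device failing and retries running out."""
    detection = heartbeat_interval            # failure is noticed at the next heartbeat
    retrying = max_retries * reconnect_delay  # one delay before each reconnection attempt
    return detection + retrying


# Documented defaults: 30 s heartbeat, 5 s delay, 5 retries
default_window = seconds_until_retries_exhausted(30.0, 5.0, max_retries=5)
# Tighter test-style settings: 5 s heartbeat, 2 s delay, 3 retries
test_window = seconds_until_retries_exhausted(5.0, 2.0, max_retries=3)
print(default_window, test_window)
```

Real recovery times will be longer, because connection attempts have their own timeouts.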
+
+---
+
+### Task & Execution Limits
+
+| Field | Type | Required | Default | Description |
+|-------|------|----------|---------|-------------|
+| `MAX_CONCURRENT_TASKS` | `int` | No | `6` | Maximum number of tasks that can run concurrently across all devices |
+| `MAX_STEP` | `int` | No | `15` | Maximum number of steps allowed per session before termination |
+
+**Example:**
+
+```yaml
+MAX_CONCURRENT_TASKS: 6 # Allow 6 tasks to run simultaneously
+MAX_STEP: 15 # Limit sessions to 15 steps
+```
+
+!!!warning "Concurrency Considerations"
+ - **MAX_CONCURRENT_TASKS** controls task queue parallelism across the entire constellation
+ - Each device can handle 1 task at a time (per device, not global)
+ - Example: 6 devices + MAX_CONCURRENT_TASKS=6 → All devices can be busy simultaneously
+ - Example: 10 devices + MAX_CONCURRENT_TASKS=4 → Only 4 devices busy at once, 6 idle
+
+**Task Concurrency Calculation:**
+
+```python
+def effective_concurrency(num_registered_devices: int, max_concurrent_tasks: int) -> int:
+    """Each device runs one task at a time, so the smaller value is the bound."""
+    return min(num_registered_devices, max_concurrent_tasks)
+
+
+# Example 1: 3 devices, MAX_CONCURRENT_TASKS=6 (device-limited)
+assert effective_concurrency(3, 6) == 3
+
+# Example 2: 10 devices, MAX_CONCURRENT_TASKS=4 (config-limited)
+assert effective_concurrency(10, 4) == 4
+```
+
+**MAX_STEP Guidelines:**
+
+| Use Case | MAX_STEP | Rationale |
+|----------|----------|-----------|
+| **Simple Automation** | 5 - 10 | Quick tasks (open app, click button) |
+| **Complex Workflows** | 15 - 30 | Multi-step processes (data entry, reporting) |
+| **Unrestricted** | 100+ | Research, exploratory tasks |
+
+---
+
+### Device Configuration Reference
+
+| Field | Type | Required | Default | Description |
+|-------|------|----------|---------|-------------|
+| `DEVICE_INFO` | `string` | Yes | - | Relative or absolute path to `devices.yaml` configuration file |
+
+**Example:**
+
+```yaml
+DEVICE_INFO: "config/galaxy/devices.yaml"
+```
+
+**Path Resolution:**
+
+- **Relative paths** are resolved from the UFO2 project root
+- **Absolute paths** are supported for external configuration files
+- The loader validates that the file exists and is readable
+
+**Example Paths:**
+
+```yaml
+# Relative path (recommended)
+DEVICE_INFO: "config/galaxy/devices.yaml"
+
+# Absolute path
+DEVICE_INFO: "/etc/ufo/galaxy/devices.yaml"
+
+# Different config for testing
+DEVICE_INFO: "config/galaxy/devices_test.yaml"
+```
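
The resolution rules above can be sketched with `pathlib`. This is an illustration only: `resolve_device_info` is a hypothetical helper, the project root is passed in explicitly, and the actual loader may resolve and validate paths differently.

```python
import tempfile
from pathlib import Path


def resolve_device_info(device_info: str, project_root: Path) -> Path:
    """Resolve DEVICE_INFO against the project root and check that it is a file."""
    path = Path(device_info)
    if not path.is_absolute():
        path = project_root / path  # relative paths start at the project root
    if not path.is_file():
        raise FileNotFoundError(f"DEVICE_INFO points to a missing file: {path}")
    return path


# Demonstration in a throwaway directory that stands in for the project root
with tempfile.TemporaryDirectory() as root:
    galaxy_dir = Path(root) / "config" / "galaxy"
    galaxy_dir.mkdir(parents=True)
    (galaxy_dir / "devices.yaml").write_text("devices: []\n", encoding="utf-8")

    resolved = resolve_device_info("config/galaxy/devices.yaml", Path(root))
    print(resolved.name)
```

Failing at resolution time gives a clearer error than a later `open()` call deep inside device registration.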
+
+---
+
+## Complete Examples
+
+### Example 1: Production Configuration
+
+```yaml
+# Galaxy Constellation Configuration - Production
+# High reliability, high concurrency
+
+# Identity & Logging
+CONSTELLATION_ID: "production_main"
+LOG_TO_MARKDOWN: true
+
+# Connection & Health
+HEARTBEAT_INTERVAL: 15.0 # Fast failure detection
+RECONNECT_DELAY: 10.0 # Give devices time to recover
+
+# Task Limits
+MAX_CONCURRENT_TASKS: 10 # High concurrency for production load
+MAX_STEP: 20 # Allow complex workflows
+
+# Device Configuration
+DEVICE_INFO: "config/galaxy/devices.yaml"
+```
+
+**Use Case:** Production constellation managing office automation across 10+ devices.
+
+---
+
+### Example 2: Development Configuration
+
+```yaml
+# Galaxy Constellation Configuration - Development
+# Relaxed settings for testing and debugging
+
+# Identity & Logging
+CONSTELLATION_ID: "dev_testing"
+LOG_TO_MARKDOWN: true
+
+# Connection & Health
+HEARTBEAT_INTERVAL: 60.0 # Reduce noise during debugging
+RECONNECT_DELAY: 5.0 # Fast reconnects for quick iteration
+
+# Task Limits
+MAX_CONCURRENT_TASKS: 3 # Limit concurrency for easier debugging
+MAX_STEP: 50 # Allow exploration and experimentation
+
+# Device Configuration
+DEVICE_INFO: "config/galaxy/devices_dev.yaml"
+```
+
+**Use Case:** Development environment with 2-3 test devices for feature development.
+
+---
+
+### Example 3: Testing/CI Configuration
+
+```yaml
+# Galaxy Constellation Configuration - CI/CD
+# Fast failure detection, limited concurrency
+
+# Identity & Logging
+CONSTELLATION_ID: "ci_regression"
+LOG_TO_MARKDOWN: true
+
+# Connection & Health
+HEARTBEAT_INTERVAL: 5.0 # Very fast detection for CI
+RECONNECT_DELAY: 2.0 # Quick retries in CI environment
+
+# Task Limits
+MAX_CONCURRENT_TASKS: 4 # Parallel test execution
+MAX_STEP: 15 # Strict limits for regression tests
+
+# Device Configuration
+DEVICE_INFO: "config/galaxy/devices_ci.yaml"
+```
+
+**Use Case:** Automated testing in CI/CD pipeline with controlled test devices.
+
+---
+
+## Integration with Device Configuration
+
+The constellation configuration works together with device configuration:
+
+**constellation.yaml (runtime)** + **devices.yaml (device definitions)** → **Complete Constellation System**
+
+### Loading Workflow
+
+```mermaid
+sequenceDiagram
+ participant App as Application
+ participant Config as constellation.yaml
+ participant DevConfig as devices.yaml
+ participant Manager as ConstellationDeviceManager
+
+ App->>Config: Load constellation.yaml
+ Config-->>App: Runtime settings
+
+ App->>Manager: Initialize with runtime settings
+ Note over Manager: CONSTELLATION_ID, HEARTBEAT_INTERVAL, etc.
+
+ App->>Config: Read DEVICE_INFO path
+ Config-->>App: "config/galaxy/devices.yaml"
+
+ App->>DevConfig: Load devices.yaml from path
+ DevConfig-->>App: Device definitions
+
+ App->>Manager: Register devices
+ Manager->>Manager: Apply runtime settings to all devices
+```
+
+### Example: Complete Initialization
+
+```python
+import asyncio
+
+import yaml
+
+from galaxy.client.device_manager import ConstellationDeviceManager
+
+
+async def main() -> None:
+    # 1. Load constellation runtime settings
+    with open("config/galaxy/constellation.yaml", "r", encoding="utf-8") as f:
+        constellation_config = yaml.safe_load(f)
+
+    # 2. Initialize manager with runtime settings
+    #    (optional fields fall back to their documented defaults)
+    manager = ConstellationDeviceManager(
+        task_name=constellation_config["CONSTELLATION_ID"],
+        heartbeat_interval=constellation_config.get("HEARTBEAT_INTERVAL", 30.0),
+        reconnect_delay=constellation_config.get("RECONNECT_DELAY", 5.0),
+    )
+
+    # 3. Load device configuration from path specified in constellation.yaml
+    with open(constellation_config["DEVICE_INFO"], "r", encoding="utf-8") as f:
+        devices_config = yaml.safe_load(f)
+
+    # 4. Register all devices
+    for device in devices_config["devices"]:
+        await manager.register_device(
+            device_id=device["device_id"],
+            server_url=device["server_url"],
+            os=device.get("os"),
+            capabilities=device.get("capabilities", []),
+            metadata=device.get("metadata", {}),
+            max_retries=device.get("max_retries", 5),
+            auto_connect=device.get("auto_connect", True),
+        )
+
+    print(f"✅ Constellation '{constellation_config['CONSTELLATION_ID']}' initialized")
+    print(f" Devices registered: {len(devices_config['devices'])}")
+    print(f" Max concurrent tasks: {constellation_config.get('MAX_CONCURRENT_TASKS', 6)}")
+
+
+# register_device is awaited, so the script needs an event loop
+asyncio.run(main())
+```
+
+---
+
+## Best Practices
+
+**Configuration Best Practices:**
+
+1. **Use Environment-Specific Configurations**
+ ```bash
+ config/galaxy/
+ ├── constellation.yaml # Base production config
+ ├── constellation_dev.yaml # Development overrides
+ └── constellation_test.yaml # Testing overrides
+ ```
+
+2. **Tune Heartbeat for Your Network**
+ ```yaml
+ # Local network - fast heartbeats
+ HEARTBEAT_INTERVAL: 10.0
+
+ # WAN/Internet - slower heartbeats
+ HEARTBEAT_INTERVAL: 60.0
+ ```
+
+3. **Match Concurrency to Use Case**
+ ```yaml
+ # High-throughput automation
+ MAX_CONCURRENT_TASKS: 20
+
+ # Resource-constrained environment
+ MAX_CONCURRENT_TASKS: 3
+ ```
+
+4. **Set Reasonable Step Limits**
+ ```yaml
+ # Prevent runaway sessions
+ MAX_STEP: 30
+
+ # For debugging (see all steps)
+ MAX_STEP: 100
+ ```
+
+---
+
+## Related Documentation
+
+| Topic | Document | Description |
+|-------|----------|-------------|
+| **Device Configuration** | [Galaxy Devices Configuration](./galaxy_devices.md) | Device definitions and capabilities |
+| **Agent Configuration** | [Galaxy Agent Configuration](./galaxy_agent.md) | LLM settings for constellation agent |
+| **Agent Registration** | [Agent Registration Overview](../../galaxy/agent_registration/overview.md) | Registration process and architecture |
+| **System Configuration** | [Configuration Overview](./overview.md) | Overall configuration architecture |
+
+---
+
+## Next Steps
+
+1. **Configure Devices**: See [Galaxy Devices Configuration](./galaxy_devices.md)
+2. **Configure Agent**: See [Galaxy Agent Configuration](./galaxy_agent.md)
+3. **Understand Registration**: Read [Agent Registration Overview](../../galaxy/agent_registration/overview.md)
+4. **Run Constellation**: Check Galaxy orchestrator documentation
+
+---
+
+## Source Code References
+
+- **ConstellationDeviceManager**: `galaxy/client/device_manager.py`
+- **Configuration Loading**: `config/config_loader.py`
+- **Configuration Schemas**: `config/config_schemas.py`
diff --git a/documents/docs/configuration/system/galaxy_devices.md b/documents/docs/configuration/system/galaxy_devices.md
new file mode 100644
index 000000000..9d81664aa
--- /dev/null
+++ b/documents/docs/configuration/system/galaxy_devices.md
@@ -0,0 +1,780 @@
+# Galaxy Devices Configuration
+
+Device configuration in **devices.yaml** defines the constellation's device agents, providing device identity, capabilities, metadata, and connection parameters for each agent in the constellation.
+
+---
+
+## Overview
+
+The **devices.yaml** configuration file defines the **devices array** for the Galaxy constellation system. It provides:
+
+- Device identity and endpoint information
+- User-specified capabilities
+- Custom metadata and preferences
+- Connection and retry parameters
+
+**Constellation vs Device Configuration:**
+
+- **devices.yaml** - Defines individual device agents (this document)
+- **constellation.yaml** - Defines constellation-wide runtime settings
+- See [Galaxy Constellation Configuration](./galaxy_constellation.md) for runtime settings
+
+**Configuration Flow:**
+
+```mermaid
+graph LR
+ A[devices.yaml] -->|Load| B[ConstellationDeviceManager]
+ B -->|Parse| C[Device Entries]
+ C -->|For Each Device| D[DeviceRegistry.register_device]
+ D -->|Create| E[AgentProfile v1]
+ E -->|If auto_connect| F[Connection Process]
+ F -->|Merge| G[Complete AgentProfile]
+
+ style A fill:#e1f5ff
+ style E fill:#fff4e1
+ style G fill:#c8e6c9
+```
+
+---
+
+## 📁 File Location
+
+**Standard Location:**
+
+```
+UFO2/
+└── config/
+    └── galaxy/
+        ├── devices.yaml         # 📄 Device definitions (this file)
+        ├── constellation.yaml   # ⚙️ Runtime settings
+        └── agent.yaml.template  # 🤖 Agent LLM configuration template
+```
+
+**Loading in Code:**
+
+```python
+import asyncio
+
+import yaml
+
+from galaxy.client.device_manager import ConstellationDeviceManager
+
+
+async def main() -> None:
+    # Load device configuration
+    with open("config/galaxy/devices.yaml", "r", encoding="utf-8") as f:
+        devices_config = yaml.safe_load(f)
+
+    # Load constellation configuration
+    with open("config/galaxy/constellation.yaml", "r", encoding="utf-8") as f:
+        constellation_config = yaml.safe_load(f)
+
+    # Initialize manager with constellation settings
+    manager = ConstellationDeviceManager(
+        task_name=constellation_config.get("CONSTELLATION_ID", "default"),
+        heartbeat_interval=constellation_config.get("HEARTBEAT_INTERVAL", 30.0),
+        reconnect_delay=constellation_config.get("RECONNECT_DELAY", 5.0),
+    )
+
+    # Register devices from devices.yaml (register_device must be awaited)
+    for device_config in devices_config["devices"]:
+        await manager.register_device(
+            device_id=device_config["device_id"],
+            server_url=device_config["server_url"],
+            os=device_config.get("os"),
+            capabilities=device_config.get("capabilities", []),
+            metadata=device_config.get("metadata", {}),
+            max_retries=device_config.get("max_retries", 5),
+            auto_connect=device_config.get("auto_connect", True),
+        )
+
+
+asyncio.run(main())
+```
+
+---
+
+## 📝 Configuration Schema
+
+### File Structure
+
+```yaml
+# Device Configuration - YAML Format
+# Defines devices for the constellation
+# Runtime settings are configured in constellation.yaml
+
+devices: # List of device configurations
+ - device_id: string # Unique device identifier
+ server_url: string # WebSocket URL of device agent
+ os: string # Operating system
+ capabilities: list[string] # Device capabilities
+ metadata: dict # Custom metadata
+ max_retries: int # Connection retry limit
+ auto_connect: bool # Auto-connect on registration
+```
+
+---
+
+### Device Configuration Fields
+
+#### Required Fields
+
+| Field | Type | Description | Example |
+|-------|------|-------------|---------|
+| `device_id` | `string` | **Unique device identifier** | `"windowsagent"`, `"linux_server_01"` |
+| `server_url` | `string` | **WebSocket endpoint URL** | `"ws://localhost:5005/ws"` |
+
+!!!danger "Required Fields"
+ `device_id` and `server_url` are **required** for every device. Registration will fail without them.
+
+#### Optional Fields
+
+| Field | Type | Default | Description | Example |
+|-------|------|---------|-------------|---------|
+| `os` | `string` | `None` | Operating system type | `"windows"`, `"linux"`, `"darwin"` |
+| `capabilities` | `list[string]` | `[]` | Device capabilities | `["web_browsing", "office"]` |
+| `metadata` | `dict` | `{}` | Custom metadata | See [Metadata Fields](#metadata-fields) |
+| `max_retries` | `int` | `5` | Maximum connection retries | `3`, `10` |
+| `auto_connect` | `bool` | `true` | Auto-connect after registration | `true`, `false` |
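
The required and optional fields above can be applied to a parsed `devices` list before registration. The following is a minimal sketch under stated assumptions: `normalize_devices` is a hypothetical helper, the defaults are copied from the table above, and the duplicate check follows from `device_id` being described as unique.

```python
import copy

# Defaults copied from the Optional Fields table
DEVICE_DEFAULTS = {
    "os": None,
    "capabilities": [],
    "metadata": {},
    "max_retries": 5,
    "auto_connect": True,
}


def normalize_devices(entries: list[dict]) -> list[dict]:
    """Check required fields and fill in defaults for each device entry."""
    seen_ids = set()
    normalized = []
    for index, entry in enumerate(entries):
        for required in ("device_id", "server_url"):
            if not entry.get(required):
                raise ValueError(f"devices[{index}] is missing required field '{required}'")
        if entry["device_id"] in seen_ids:
            raise ValueError(f"duplicate device_id: {entry['device_id']}")
        seen_ids.add(entry["device_id"])
        # Deep-copy the defaults so entries never share the same list or dict.
        normalized.append({**copy.deepcopy(DEVICE_DEFAULTS), **entry})
    return normalized


devices = normalize_devices(
    [{"device_id": "dev_linux", "server_url": "ws://localhost:5001/ws", "auto_connect": False}]
)
print(devices[0])
```

Each normalized entry can then be passed to `register_device` as keyword arguments.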
+
+---
+
+### Metadata Fields
+
+The `metadata` dictionary is **completely flexible** and can contain any custom fields. However, some common patterns are recommended:
+
+**Recommended Metadata Fields:**
+
+| Field | Type | Description | Example |
+|-------|------|-------------|---------|
+| `location` | `string` | Physical location | `"office_desktop"`, `"datacenter_rack_a42"` |
+| `performance` | `string` | Performance tier | `"low"`, `"medium"`, `"high"`, `"very_high"` |
+| `description` | `string` | Human-readable description | `"Primary Windows workstation"` |
+| `tags` | `list[string]` | Custom tags | `["production", "gpu", "critical"]` |
+| `operation_engineer_email` | `string` | Contact email | `"admin@example.com"` |
+| `operation_engineer_name` | `string` | Contact name | `"John Doe"` |
+
+**Custom Fields (Application-Specific):**
+
+```yaml
+metadata:
+ # File paths
+ logs_file_path: "/var/log/application.log"
+ dev_path: "/home/deploy/projects/"
+ app_log_file: "log_detailed.xlsx"
+
+ # Excel logging
+ sheet_name_for_writing_log_in_excel: "report"
+
+ # Email configuration
+ sender_name: "Automation Bot"
+
+ # Log patterns
+ warning_log_pattern: "WARN"
+ error_log_pattern: "ERROR or FATAL"
+
+ # GPU information
+ gpu_type: "NVIDIA RTX 4090"
+ gpu_count: 2
+ gpu_memory_gb: 48
+```
+
+---
+
+## 📚 Complete Examples
+
+### Example 1: Multi-Device Constellation
+
+```yaml
+# Device Configuration - YAML Format
+# Defines devices for the constellation
+# Runtime settings (CONSTELLATION_ID, HEARTBEAT_INTERVAL, etc.) are configured in constellation.yaml
+
+devices:
+ # ===== Windows Desktop Agent =====
+ - device_id: "windowsagent"
+ server_url: "ws://localhost:5005/ws"
+ os: "windows"
+ capabilities:
+ - "web_browsing"
+ - "office_applications"
+ - "file_management"
+ - "email_sending"
+ metadata:
+ location: "office_desktop"
+ performance: "high"
+ description: "Primary Windows workstation for office automation"
+ operation_engineer_email: "admin@example.com"
+ operation_engineer_name: "John Doe"
+ sender_name: "Office Bot"
+ app_log_file: "automation_log.xlsx"
+ sheet_name_for_writing_log_in_excel: "report"
+ tags:
+ - "production"
+ - "office"
+ - "critical"
+ max_retries: 5
+ auto_connect: true
+
+ # ===== Linux Server 1 =====
+ - device_id: "linux_server_01"
+ server_url: "ws://10.0.1.50:5001/ws"
+ os: "linux"
+ capabilities:
+ - "server_management"
+ - "log_monitoring"
+ - "database_operations"
+ metadata:
+ location: "datacenter_rack_a42"
+ performance: "medium"
+ description: "Production Linux server for backend services"
+ logs_file_path: "/var/log/application.log"
+ dev_path: "/home/deploy/projects/"
+ warning_log_pattern: "WARN"
+ error_log_pattern: "ERROR or FATAL"
+ tags:
+ - "production"
+ - "backend"
+ - "monitoring"
+ max_retries: 3
+ auto_connect: true
+
+ # ===== Linux Server 2 =====
+ - device_id: "linux_server_02"
+ server_url: "ws://10.0.1.51:5002/ws"
+ os: "linux"
+ capabilities:
+ - "server_management"
+ - "log_monitoring"
+ - "database_operations"
+ metadata:
+ location: "datacenter_rack_a43"
+ performance: "medium"
+ description: "Secondary Linux server for load balancing"
+ logs_file_path: "/var/log/application.log"
+ dev_path: "/home/deploy/projects/"
+ warning_log_pattern: "WARN"
+ error_log_pattern: "ERROR or FATAL"
+ tags:
+ - "production"
+ - "backend"
+ - "load_balancer"
+ max_retries: 3
+ auto_connect: true
+
+ # ===== GPU Workstation =====
+ - device_id: "gpu_workstation"
+ server_url: "ws://192.168.1.100:5005/ws"
+ os: "windows"
+ capabilities:
+ - "gpu_computation"
+ - "model_training"
+ - "data_processing"
+ - "deep_learning"
+ metadata:
+ location: "ml_lab"
+ performance: "very_high"
+ description: "High-performance GPU workstation for ML training"
+ operation_engineer_email: "ml-team@example.com"
+ gpu_type: "NVIDIA RTX 4090"
+ gpu_count: 2
+ gpu_memory_gb: 48
+ cpu_count: 32
+ memory_total_gb: 128
+ tags:
+ - "production"
+ - "ml"
+ - "gpu"
+ - "high_priority"
+ max_retries: 10
+ auto_connect: true
+```
+
+### Example 2: Development Environment
+
+```yaml
+# Device Configuration - YAML Format
+# Runtime settings are configured in constellation.yaml
+
+devices:
+ - device_id: "dev_windows"
+ server_url: "ws://localhost:5005/ws"
+ os: "windows"
+ capabilities:
+ - "web_browsing"
+ - "office_applications"
+ metadata:
+ location: "developer_laptop"
+ performance: "medium"
+ description: "Development Windows machine"
+ environment: "development"
+ max_retries: 3
+ auto_connect: true
+
+ - device_id: "dev_linux"
+ server_url: "ws://localhost:5001/ws"
+ os: "linux"
+ capabilities:
+ - "cli"
+ - "file_system"
+ metadata:
+ location: "developer_laptop"
+ performance: "medium"
+ description: "Development Linux VM"
+ environment: "development"
+ max_retries: 3
+ auto_connect: false # Manual connection for debugging
+```
+
+---
+
+## 🔄 Multi-Source Metadata Merging
+
+The `metadata` field in configuration is **Source 1** in the multi-source profiling architecture. It will be merged with:
+
+- **Source 2**: Service-level manifest (registration data)
+- **Source 3**: Client telemetry (DeviceInfoProvider)
+
+### Merging Process
+
+```mermaid
+graph TB
+ subgraph "Source 1: User Config"
+ UC[metadata in devices.yaml]
+ UC --> |location, performance, tags| Final
+ end
+
+ subgraph "Source 2: Service Manifest"
+ SM[AIP Registration]
+ SM --> |platform, registration_time| Final
+ end
+
+ subgraph "Source 3: Client Telemetry"
+ CT[DeviceInfoProvider]
+ CT --> |system_info object| Final
+ end
+
+ Final[Complete metadata in AgentProfile]
+
+ style UC fill:#e1f5ff
+ style SM fill:#fff4e1
+ style CT fill:#e8f5e9
+ style Final fill:#f3e5f5
+```
+
+**Before Merging (User Config Only):**
+
+```yaml
+metadata:
+ location: "office_desktop"
+ performance: "high"
+ description: "Primary Windows workstation"
+```
+
+**After Merging (All Sources):**
+
+```python
+metadata = {
+ # Source 1: User Config
+ "location": "office_desktop",
+ "performance": "high",
+ "description": "Primary Windows workstation",
+
+ # Source 2: Service Manifest
+ "platform": "windows",
+ "registration_time": "2025-11-06T10:30:00Z",
+
+ # Source 3: Client Telemetry
+ "system_info": {
+ "platform": "windows",
+ "os_version": "10.0.22631",
+ "cpu_count": 16,
+ "memory_total_gb": 32.0,
+ "hostname": "DESKTOP-DEV01",
+ "ip_address": "192.168.1.100",
+ "platform_type": "computer",
+ "schema_version": "1.0"
+ }
+}
+```
+
+See [AgentProfile Documentation](../../galaxy/agent_registration/agent_profile.md#multi-source-construction) for merging details.
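+
+The before/after example above can be approximated in a few lines. This is a minimal sketch, not the project's merging code: `merge_metadata` and its parameter names are hypothetical, and it assumes (as the example output suggests) that user-config and manifest fields sit at the top level while telemetry is nested under `system_info`. Letting manifest values win on a key collision is also an assumption.
+
+```python
+from typing import Any, Dict, Optional
+
+
+def merge_metadata(
+    user_config: Dict[str, Any],
+    service_manifest: Optional[Dict[str, Any]] = None,
+    system_info: Optional[Dict[str, Any]] = None,
+) -> Dict[str, Any]:
+    """Hypothetical helper: combine the three metadata sources into one dict."""
+    merged = dict(user_config)  # Source 1: user config
+    # Source 2: service manifest (assumed to win on key collisions)
+    merged.update(service_manifest or {})
+    if system_info is not None:
+        # Source 3: client telemetry, kept as a nested object
+        merged["system_info"] = dict(system_info)
+    return merged
+
+
+metadata = merge_metadata(
+    {"location": "office_desktop", "performance": "high"},
+    {"platform": "windows", "registration_time": "2025-11-06T10:30:00Z"},
+    {"platform": "windows", "cpu_count": 16, "memory_total_gb": 32.0},
+)
+print(sorted(metadata))
+```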
+
+---
+
+## 🎯 Use Cases and Patterns
+
+### Pattern 1: Office Automation
+
+```yaml
+devices:
+  - device_id: "office_pc"
+    server_url: "ws://localhost:5005/ws"
+    os: "windows"
+    capabilities:
+      - "web_browsing"
+      - "office_applications"
+      - "email_sending"
+      - "file_management"
+    metadata:
+      location: "office_desktop"
+      performance: "medium"
+      description: "Office PC for daily automation tasks"
+      operation_engineer_email: "it@company.com"
+      sender_name: "Office Automation"
+      app_log_file: "office_automation.xlsx"
+```
+
+**Task Assignment:**
+
+```python
+# Find device with office capabilities
+devices = manager.get_all_devices(connected=True)
+for device_id, profile in devices.items():
+    if "office_applications" in profile.capabilities:
+        await manager.assign_task_to_device(
+            task_id="create_report",
+            device_id=device_id,
+            task_description="Create monthly report in Excel",
+            task_data={"template": "monthly_template.xlsx"}
+        )
+```
+
+### Pattern 2: Server Monitoring
+
+```yaml
+devices:
+  - device_id: "prod_server_01"
+    server_url: "ws://10.0.1.50:5001/ws"
+    os: "linux"
+    capabilities:
+      - "server_management"
+      - "log_monitoring"
+    metadata:
+      location: "datacenter_us_west"
+      performance: "high"
+      logs_file_path: "/var/log/app.log"
+      warning_log_pattern: "WARN"
+      error_log_pattern: "ERROR|FATAL"
+```
+
+**Task Assignment:**
+
+```python
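+# Monitor server logs
+devices = manager.get_all_devices(connected=True)
+profile = devices.get("prod_server_01")  # None if the device is not connected
+if profile is not None:
+    await manager.assign_task_to_device(
+        task_id="monitor_logs",
+        device_id="prod_server_01",
+        task_description="Check logs for errors",
+        task_data={
+            "log_file": profile.metadata["logs_file_path"],
+            "error_pattern": profile.metadata["error_log_pattern"]
+        }
+    )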
+```
+
+### Pattern 3: GPU Computation
+
+```yaml
+devices:
+  - device_id: "gpu_node_01"
+    server_url: "ws://192.168.1.100:5005/ws"
+    os: "linux"
+    capabilities:
+      - "gpu_computation"
+      - "model_training"
+      - "data_processing"
+    metadata:
+      location: "ml_lab_rack_01"
+      performance: "very_high"
+      gpu_type: "NVIDIA A100"
+      gpu_count: 4
+      gpu_memory_gb: 320  # 4 × 80GB
+      cpu_count: 96
+      memory_total_gb: 1024
+```
+
+**Task Assignment:**
+
+```python
+# Select GPU device based on metadata
+devices = manager.get_all_devices(connected=True)
+for device_id, profile in devices.items():
+    metadata = profile.metadata
+    if (
+        "gpu_computation" in profile.capabilities
+        and metadata.get("gpu_count", 0) >= 4
+        and metadata.get("gpu_memory_gb", 0) >= 300
+    ):
+        await manager.assign_task_to_device(
+            task_id="train_model",
+            device_id=device_id,
+            task_description="Train large language model",
+            task_data={"model": "llama-70b", "dataset": "training_data.json"}
+        )
+```
+
+---
+
+## ⚠️ Validation and Best Practices
+
+### Required Field Validation
+
+```python
+import logging
+
+logger = logging.getLogger(__name__)
+
+
+def validate_device_config(device: dict) -> bool:
+    """Validate device configuration."""
+
+    # Required fields
+    if "device_id" not in device:
+        logger.error("Missing required field: device_id")
+        return False
+
+    if "server_url" not in device:
+        logger.error("Missing required field: server_url")
+        return False
+
+    # Validate server_url format (must be a WebSocket URL)
+    if not device["server_url"].startswith(("ws://", "wss://")):
+        logger.error(f"Invalid server_url: {device['server_url']}")
+        return False
+
+    return True
+```
+
+### Best Practices
+
+!!!tip "Configuration Best Practices"
+
+    **1. Use Meaningful device_id**
+    ```yaml
+    # ✅ Good: Descriptive and unique
+    device_id: "windows_office_pc_01"
+    device_id: "linux_prod_server_us_west_01"
+    device_id: "gpu_ml_workstation_lab_a"
+
+    # ❌ Bad: Generic or ambiguous
+    device_id: "device1"
+    device_id: "test"
+    device_id: "agent"
+    ```
+
+    **2. Specify Granular Capabilities**
+    ```yaml
+    # ✅ Good: Specific capabilities
+    capabilities:
+      - "web_browsing_chrome"
+      - "office_excel_automation"
+      - "email_outlook"
+
+    # ❌ Bad: Vague capabilities
+    capabilities:
+      - "office"
+      - "internet"
+    ```
+
+    **3. Include Rich Metadata**
+    ```yaml
+    # ✅ Good: Comprehensive metadata
+    metadata:
+      location: "datacenter_us_west_rack_a42"
+      performance: "very_high"
+      description: "Production GPU server for ML training"
+      tags: ["production", "ml", "gpu", "critical"]
+      operation_engineer_email: "ml-ops@company.com"
+      gpu_type: "NVIDIA A100"
+      gpu_count: 4
+
+    # ❌ Bad: Minimal metadata
+    metadata:
+      location: "server room"
+    ```
+
+    **4. Set Appropriate max_retries**
+    ```yaml
+    # Critical production devices
+    max_retries: 10
+
+    # Development/test devices
+    max_retries: 3
+    ```
+
+    **5. Use auto_connect Wisely**
+    ```yaml
+    # Production: auto-connect
+    auto_connect: true
+
+    # Development/debugging: manual connect
+    auto_connect: false
+    ```
+
+---
+
+## 🔧 Loading and Parsing
+
+### Basic Loading
+
+```python
+import yaml
+
+with open("config/galaxy/devices.yaml", "r", encoding="utf-8") as f:
+    config = yaml.safe_load(f)
+
+# Access constellation-level settings
+# (runtime settings normally live in constellation.yaml, so these fall back
+# to the defaults unless devices.yaml also defines them)
+constellation_id = config.get("constellation_id", "default")
+heartbeat_interval = config.get("heartbeat_interval", 30.0)
+
+# Access devices
+devices = config.get("devices", [])
+```
+
+### Loading with Validation
+
+```python
+import yaml
+from typing import Dict, List, Any
+
+def load_and_validate_config(config_path: str) -> Dict[str, Any]:
+    """Load and validate devices configuration."""
+
+    with open(config_path, "r", encoding="utf-8") as f:
+        config = yaml.safe_load(f)
+
+    # Validate top-level structure (safe_load returns None for an empty file)
+    if not isinstance(config, dict) or "devices" not in config:
+        raise ValueError("Configuration must contain 'devices' list")
+
+    if not isinstance(config["devices"], list):
+        raise ValueError("'devices' must be a list")
+
+    # Validate each device
+    for i, device in enumerate(config["devices"]):
+        if "device_id" not in device:
+            raise ValueError(f"Device {i}: Missing 'device_id'")
+
+        if "server_url" not in device:
+            raise ValueError(f"Device {i}: Missing 'server_url'")
+
+        # Validate URL format
+        if not device["server_url"].startswith(("ws://", "wss://")):
+            raise ValueError(
+                f"Device {device['device_id']}: Invalid server_url format"
+            )
+
+    return config
+```
+
+### Registration from Config
+
+```python
+# Reuses `List`, `logger`, and `load_and_validate_config` from the examples above;
+# `ConstellationDeviceManager` must be imported from the Galaxy client package.
+async def register_devices_from_config(
+    manager: ConstellationDeviceManager,
+    config_path: str
+) -> List[str]:
+    """Register all devices from configuration file."""
+
+    config = load_and_validate_config(config_path)
+
+    registered = []
+    failed = []
+
+    for device_config in config["devices"]:
+        try:
+            success = await manager.register_device(
+                device_id=device_config["device_id"],
+                server_url=device_config["server_url"],
+                os=device_config.get("os"),
+                capabilities=device_config.get("capabilities", []),
+                metadata=device_config.get("metadata", {}),
+                max_retries=device_config.get("max_retries", 5),
+                auto_connect=device_config.get("auto_connect", True)
+            )
+
+            if success:
+                registered.append(device_config["device_id"])
+            else:
+                failed.append(device_config["device_id"])
+
+        except Exception as e:
+            logger.error(
+                f"Failed to register {device_config['device_id']}: {e}"
+            )
+            failed.append(device_config["device_id"])
+
+    logger.info(f"Registered: {len(registered)} devices")
+    if failed:
+        logger.warning(f"Failed: {len(failed)} devices - {failed}")
+
+    return registered
+```
+
+---
+
+## 🔗 Related Documentation
+
+| Topic | Document | Description |
+|-------|----------|-------------|
+| **Overview** | [Agent Registration Overview](./overview.md) | Registration architecture |
+| **AgentProfile** | [AgentProfile](../../galaxy/agent_registration/agent_profile.md) | Profile structure and merging |
+| **Registration Flow** | [Registration Flow](../../galaxy/agent_registration/registration_flow.md) | Registration process |
+| **Device Registry** | [Device Registry](../../galaxy/agent_registration/device_registry.md) | Registry component |
+| **Device Info** | [Device Info Provider](../../client/device_info.md) | Telemetry (Source 3) |
+
+---
+
+## 💡 Tips and Tricks
+
+!!!tip "Advanced Configuration Tips"
+
+    **Use YAML Anchors for Reusable Metadata**
+    ```yaml
+    # Define reusable metadata templates
+    _metadata_templates:
+      production_server: &prod_server
+        environment: "production"
+        tags: ["production", "critical"]
+        max_retries: 10  # Merged under metadata; the device-level max_retries is a separate key
+
+      dev_server: &dev_server
+        environment: "development"
+        tags: ["development", "testing"]
+        max_retries: 3
+
+    devices:
+      - device_id: "prod_server_01"
+        server_url: "ws://10.0.1.50:5001/ws"
+        metadata:
+          <<: *prod_server  # Merge production template
+          location: "datacenter_us_west"
+
+      - device_id: "dev_server_01"
+        server_url: "ws://localhost:5001/ws"
+        metadata:
+          <<: *dev_server  # Merge dev template
+          location: "developer_laptop"
+    ```
+
+    **Environment Variable Substitution**
+    ```yaml
+    # Use environment variables for sensitive data
+    devices:
+      - device_id: "prod_server"
+        server_url: "${SERVER_URL}"  # From environment
+        metadata:
+          api_key: "${API_KEY}"
+    ```
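+
+PyYAML's `safe_load` does not expand `${VAR}` placeholders by itself, so a substitution step has to run after parsing. Whether the project's loader already does this is not shown in this document; the sketch below assumes it does not. `expand_env` is a hypothetical helper name, and the value assigned to `SERVER_URL` is only there to make the demo self-contained.
+
+```python
+import os
+from typing import Any
+
+
+def expand_env(value: Any) -> Any:
+    """Hypothetical helper: recursively expand ${VAR} references in parsed YAML."""
+    if isinstance(value, str):
+        # Unset variables are left in place unchanged by expandvars
+        return os.path.expandvars(value)
+    if isinstance(value, dict):
+        return {key: expand_env(item) for key, item in value.items()}
+    if isinstance(value, list):
+        return [expand_env(item) for item in value]
+    return value
+
+
+os.environ["SERVER_URL"] = "ws://10.0.1.50:5001/ws"  # demo value only
+device = expand_env({"device_id": "prod_server", "server_url": "${SERVER_URL}"})
+print(device["server_url"])
+```
+
+Running the expansion on the parsed structure rather than on the raw file text keeps secrets containing YAML-special characters from changing how the file parses.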
+
+---
+
+## 🚀 Next Steps
+
+1. **Create Your Configuration**: Copy example and customize
+2. **Validate Configuration**: Use validation function
+3. **Register Devices**: Load config and register
+4. **Monitor Status**: Check device status after registration
+
+---
+
+## 📚 Source Code References
+
+- **Example Config**: `config/galaxy/devices.yaml`
+- **Loading Logic**: `galaxy/client/device_manager.py`
+- **DeviceRegistry**: `galaxy/client/components/device_registry.py`
+- **AgentProfile**: `galaxy/client/components/types.py`
diff --git a/documents/docs/configuration/system/mcp_reference.md b/documents/docs/configuration/system/mcp_reference.md
new file mode 100644
index 000000000..e1f2efb53
--- /dev/null
+++ b/documents/docs/configuration/system/mcp_reference.md
@@ -0,0 +1,168 @@
+# MCP Configuration Reference
+
+This document provides a quick reference for MCP (Model Context Protocol) server configuration in UFO².
+
+For a comprehensive MCP configuration guide with examples, best practices, and detailed explanations, see:
+
+- **[MCP Configuration Guide](../../mcp/configuration.md)** - Complete configuration documentation
+- [MCP Overview](../../mcp/overview.md) - Architecture and concepts
+- [Data Collection Servers](../../mcp/data_collection.md) - Observation tools
+- [Action Servers](../../mcp/action.md) - Execution tools
+
+## Quick Reference
+
+**Configuration File**: `config/ufo/mcp.yaml`
+
+### Structure
+
+```yaml
+AgentName:                    # e.g., "HostAgent", "AppAgent"
+  SubType:                    # "default" or app name (e.g., "WINWORD.EXE")
+    data_collection:          # Data collection servers (read-only)
+      - namespace: ...
+        type: ...             # "local", "http", or "stdio"
+    action:                   # Action servers (state-changing)
+      - namespace: ...
+        type: ...
+```
+
+### Server Types
+
+| Type | Description | Use Case |
+|------|-------------|----------|
+| `local` | In-process server | Fast, built-in tools |
+| `http` | Remote HTTP server | Cross-machine, language-agnostic |
+| `stdio` | Child process via stdin/stdout | Process isolation |
+
+### Common Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `namespace` | String | ✅ Yes | Unique server identifier |
+| `type` | String | ✅ Yes | Server type: `local`, `http`, or `stdio` |
+| `reset` | Boolean | ❌ No | Reset on context switch (default: `false`) |
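+
+The common-field rules in the table can be checked with a short function. This is an illustrative sketch only: `check_server_entry` is a hypothetical name and is not part of UFO²'s loader, which may validate entries differently.
+
+```python
+from typing import Any, Dict, List
+
+VALID_SERVER_TYPES = {"local", "http", "stdio"}
+
+
+def check_server_entry(entry: Dict[str, Any]) -> List[str]:
+    """Hypothetical check: return the problems found in one MCP server entry."""
+    problems = []
+    if not entry.get("namespace"):
+        problems.append("missing required field: namespace")
+    if entry.get("type") not in VALID_SERVER_TYPES:
+        problems.append(f"type must be one of {sorted(VALID_SERVER_TYPES)}")
+    # reset is optional and defaults to false
+    if not isinstance(entry.get("reset", False), bool):
+        problems.append("reset must be a boolean")
+    return problems
+
+
+print(check_server_entry({"namespace": "UICollector", "type": "local"}))
+print(check_server_entry({"type": "ftp", "reset": "yes"}))
+```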
+
+### Local Server Example
+
+```yaml
+HostAgent:
+  default:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        start_args: []
+        reset: false
+    action:
+      - namespace: HostUIExecutor
+        type: local
+        reset: false
+```
+
+### HTTP Server Example
+
+```yaml
+HardwareAgent:
+  default:
+    data_collection:
+      - namespace: HardwareCollector
+        type: http
+        host: "localhost"
+        port: 8006
+        path: "/mcp"
+        reset: false
+```
+
+### Stdio Server Example
+
+```yaml
+CustomAgent:
+  default:
+    action:
+      - namespace: CustomProcessor
+        type: stdio
+        command: "python"
+        start_args: ["-m", "custom_mcp_server"]
+        env: {"API_KEY": "secret"}
+        cwd: "/path/to/server"
+```
+
+## Built-in Agent Configurations
+
+### HostAgent (System-Level)
+
+- **Data Collection**: UICollector
+- **Actions**: HostUIExecutor, CommandLineExecutor
+
+### AppAgent (Application-Level)
+
+**Default**: UICollector, AppUIExecutor, CommandLineExecutor
+
+**App-Specific**:
+- **WINWORD.EXE**: + WordCOMExecutor
+- **EXCEL.EXE**: + ExcelCOMExecutor
+- **POWERPNT.EXE**: + PowerPointCOMExecutor
+- **explorer.exe**: + PDFReaderExecutor
+
+### ConstellationAgent
+
+- **Actions**: ConstellationEditor
+
+### HardwareAgent
+
+- **Data Collection**: HardwareCollector (HTTP)
+- **Actions**: HardwareExecutor (HTTP)
+
+### LinuxAgent
+
+- **Actions**: BashExecutor (HTTP)
+
+## Reset Behavior
+
+!!!tip "When to Use `reset: true`"
+    - **COM executors** (Word, Excel, PowerPoint) - Prevents state leakage between documents
+    - **Stateful tools** - Require a clean state per task
+
+    **Default: `false`** - Server persists across context switches
+
+## Access in Code
+
+```python
+from config.config_loader import get_ufo_config
+
+config = get_ufo_config()
+mcp_config = config.MCP
+
+# Get agent-specific config
+host_agent = mcp_config.get("HostAgent", {})
+app_agent = mcp_config.get("AppAgent", {})
+
+# Get sub-type config
+word_config = app_agent.get("WINWORD.EXE", app_agent.get("default", {}))
+```
+
+## Complete Documentation
+
+The complete guide covers:
+- Complete field reference for all server types
+- Agent-specific configuration examples
+- Best practices and anti-patterns
+- Configuration validation
+- Debugging and troubleshooting
+- Migration guide
+
+See the **[MCP Configuration Guide](../../mcp/configuration.md)**.
+
+!!!tip "Creating Custom MCP Servers"
+ Want to create your own MCP servers? See the **[Creating Custom MCP Servers Tutorial](../../tutorials/creating_mcp_servers.md)** for step-by-step instructions on building local, HTTP, and stdio servers.
+
+## Related Documentation
+
+- [MCP Overview](../../mcp/overview.md) - MCP architecture
+- [Data Collection Servers](../../mcp/data_collection.md) - Read-only tools
+- [Action Servers](../../mcp/action.md) - State-changing tools
+- [Local Servers](../../mcp/local_servers.md) - Built-in servers
+- [Remote Servers](../../mcp/remote_servers.md) - HTTP/Stdio deployment
+- **[Creating Custom MCP Servers Tutorial](../../tutorials/creating_mcp_servers.md)** - Build your own servers
+- [Configuration Overview](./overview.md) - General configuration system
+- [System Configuration](./system_config.md) - MCP-related system settings
+
diff --git a/documents/docs/configuration/system/migration.md b/documents/docs/configuration/system/migration.md
new file mode 100644
index 000000000..aa2ae317e
--- /dev/null
+++ b/documents/docs/configuration/system/migration.md
@@ -0,0 +1,444 @@
+# Configuration Migration Guide
+
+This guide helps you migrate from the legacy configuration system (`ufo/config/config.yaml`) to the new modular configuration system (`config/ufo/`).
+
+**Migration Overview:** Migrating to the new configuration system is **optional but recommended**. Your existing configuration will continue to work, but the new system offers better organization, type safety, and IDE support.
+
+## Why Migrate?
+
+The new configuration system offers several advantages:
+
+| Feature | Legacy (`ufo/config/`) | New (`config/ufo/`) |
+|---------|----------------------|-------------------|
+| **Structure** | Single monolithic YAML | Modular domain-specific files |
+| **Type Safety** | Dict access only | Typed + dynamic access |
+| **IDE Support** | No autocomplete | Full IntelliSense |
+| **Scalability** | Hard to maintain | Easy to extend |
+| **Documentation** | External docs | Self-documenting structure |
+| **Environment Support** | Manual | Built-in dev/test/prod |
+
+## Migration Methods
+
+### Option 1: Automatic Migration (Recommended)
+
+Use the built-in migration tool:
+
+```bash
+# From UFO2 root directory
+python -m ufo.tools.migrate_config
+
+# Or with options
+python -m ufo.tools.migrate_config --backup --validate
+```
+
+**What it does**:
+1. ✅ Reads your legacy `ufo/config/config.yaml`
+2. ✅ Splits into modular files by domain
+3. ✅ Creates backup of original file
+4. ✅ Validates the new configuration
+5. ✅ Provides migration report
+
+!!!warning "Backup Reminder"
+    Always back up your configuration before migration! The tool creates a backup automatically, but it's good practice to keep your own copy.
+
+### Option 2: Manual Migration
+
+Step-by-step manual migration process.
+
+#### Step 1: Create Directory Structure
+
+```bash
+# Create new config directories
+mkdir -p config/ufo
+mkdir -p config/galaxy # If using Galaxy
+```
+
+#### Step 2: Copy Templates
+
+```bash
+# Copy template files
+cp config/ufo/agents.yaml.template config/ufo/agents.yaml
+cp config/galaxy/agent.yaml.template config/galaxy/agent.yaml # If using Galaxy
+```
+
+#### Step 3: Split Configuration
+
+Split your `ufo/config/config.yaml` into modular files:
+
+**Legacy config.yaml**:
+```yaml
+# ufo/config/config.yaml (OLD - Monolithic)
+HOST_AGENT:
+  API_TYPE: "openai"
+  API_KEY: "sk-..."
+  API_MODEL: "gpt-4o"
+
+APP_AGENT:
+  API_TYPE: "openai"
+  API_KEY: "sk-..."
+  API_MODEL: "gpt-4o"
+
+MAX_STEP: 50
+MAX_RETRY: 20
+TEMPERATURE: 0.0
+
+RAG_OFFLINE_DOCS: False
+RAG_EXPERIENCE: True
+```
+
+**New modular structure**:
+
+`config/ufo/agents.yaml`:
+```yaml
+# Agent LLM configurations
+HOST_AGENT:
+  API_TYPE: "openai"
+  API_KEY: "sk-..."
+  API_MODEL: "gpt-4o"
+
+APP_AGENT:
+  API_TYPE: "openai"
+  API_KEY: "sk-..."
+  API_MODEL: "gpt-4o"
+```
+
+`config/ufo/system.yaml`:
+```yaml
+# System and runtime configurations
+MAX_STEP: 50
+MAX_RETRY: 20
+TEMPERATURE: 0.0
+```
+
+`config/ufo/rag.yaml`:
+```yaml
+# RAG knowledge configurations
+RAG_OFFLINE_DOCS: False
+RAG_EXPERIENCE: True
+```
+
+#### Step 4: Verify Configuration
+
+**Verification Script**:
+
+```python
+# Test your new configuration
+from config.config_loader import get_ufo_config
+
+config = get_ufo_config()
+
+# Verify values loaded correctly
+print(f"Max step: {config.system.max_step}")
+print(f"Host agent model: {config.host_agent.api_model}")
+print(f"RAG experience: {config.rag.experience}")
+```
+
+#### Step 5: Update Code (Optional)
+
+Modernize configuration access patterns:
+
+```python
+# OLD (still works but deprecated)
+config = Config()
+max_step = config["MAX_STEP"]
+api_model = config["HOST_AGENT"]["API_MODEL"]
+
+# NEW (recommended)
+config = get_ufo_config()
+max_step = config.system.max_step # Type-safe!
+api_model = config.host_agent.api_model # IDE autocomplete!
+```
+
+#### Step 6: Clean Up Legacy Config
+
+!!!danger "Remove Legacy Config Only After Verification"
+ Only remove the legacy config after thoroughly testing that the new configuration works correctly!
+
+```bash
+# Backup legacy config
+cp ufo/config/config.yaml ufo/config/config.yaml.backup
+
+# Remove legacy config (after verifying new config works)
+rm ufo/config/config.yaml
+```
+
+## Field Mapping Reference
+
+### Agent Configurations
+
+| Legacy Location | New Location | Notes |
+|----------------|--------------|-------|
+| `HOST_AGENT.*` | `config/ufo/agents.yaml` → `HOST_AGENT.*` | Same structure |
+| `APP_AGENT.*` | `config/ufo/agents.yaml` → `APP_AGENT.*` | Same structure |
+| `BACKUP_AGENT.*` | `config/ufo/agents.yaml` → `BACKUP_AGENT.*` | Same structure |
+| `EVALUATION_AGENT.*` | `config/ufo/agents.yaml` → `EVALUATION_AGENT.*` | Same structure |
+| `OPERATOR.*` | `config/ufo/agents.yaml` → `OPERATOR.*` | New in UFO² |
+
+### System Configurations
+
+| Legacy Field | New Location | New Access Pattern |
+|-------------|--------------|-------------------|
+| `MAX_STEP` | `config/ufo/system.yaml` | `config.system.max_step` |
+| `MAX_RETRY` | `config/ufo/system.yaml` | `config.system.max_retry` |
+| `TEMPERATURE` | `config/ufo/system.yaml` | `config.system.temperature` |
+| `CONTROL_BACKEND` | `config/ufo/system.yaml` | `config.system.control_backend` |
+| `ACTION_SEQUENCE` | `config/ufo/system.yaml` | `config.system.action_sequence` |
+
+### RAG Configurations
+
+| Legacy Field | New Location | New Access Pattern |
+|-------------|--------------|-------------------|
+| `RAG_OFFLINE_DOCS` | `config/ufo/rag.yaml` | `config.rag.offline_docs` |
+| `RAG_EXPERIENCE` | `config/ufo/rag.yaml` | `config.rag.experience` |
+| `RAG_DEMONSTRATION` | `config/ufo/rag.yaml` | `config.rag.demonstration` |
+| `BING_API_KEY` | `config/ufo/rag.yaml` | `config.rag.BING_API_KEY` |
+
+### MCP Configurations
+
+| Legacy Field | New Location | Notes |
+|-------------|--------------|-------|
+| `USE_MCP` | `config/ufo/system.yaml` | Keep in system config |
+| `MCP_SERVERS_CONFIG` | `config/ufo/system.yaml` | Points to `config/ufo/mcp.yaml` |
+| MCP server definitions | `config/ufo/mcp.yaml` | New dedicated file |
+
+## Common Migration Scenarios
+
+### Scenario 1: Different Models for Different Agents
+
+**Legacy approach** (duplicated config):
+```yaml
+# ufo/config/config.yaml
+HOST_AGENT:
+  API_MODEL: "gpt-4o"
+  # ... other settings
+
+APP_AGENT:
+  API_MODEL: "gpt-4o-mini"  # Different model
+  # ... other settings
+```
+
+**New approach** (clear separation):
+```yaml
+# config/ufo/agents.yaml
+HOST_AGENT:
+  API_MODEL: "gpt-4o"
+
+APP_AGENT:
+  API_MODEL: "gpt-4o-mini"
+```
+
+### Scenario 2: Environment-Specific Settings
+
+**Legacy approach** (manual switching):
+```yaml
+# ufo/config/config.yaml
+# Manually comment/uncomment for different environments
+# MAX_STEP: 10 # Development
+MAX_STEP: 50 # Production
+```
+
+**New approach** (automatic environment support):
+```yaml
+# config/ufo/system.yaml (base)
+MAX_STEP: 50
+
+# config/ufo/system_dev.yaml (development override)
+MAX_STEP: 10
+LOG_LEVEL: "DEBUG"
+```
+
+```bash
+# Set environment
+export UFO_ENV=dev # Automatically uses system_dev.yaml overrides
+```
+
+### Scenario 3: Custom Experimental Features
+
+**Legacy approach** (modify code):
+```python
+# Had to modify Config class
+class Config:
+    def __init__(self):
+        self.MY_CUSTOM_FEATURE = True  # Added to code
+```
+
+**New approach** (just add to YAML):
+```yaml
+# config/ufo/custom.yaml (new file)
+MY_CUSTOM_FEATURE: True
+EXPERIMENTAL_SETTING: "value"
+```
+
+```python
+# Automatically available
+config = get_ufo_config()
+if config.MY_CUSTOM_FEATURE:
+    value = config.EXPERIMENTAL_SETTING
+```
+
+## Validation After Migration
+
+### 1. Test Configuration Loading
+
+```python
+from config.config_loader import get_ufo_config
+
+# Load configuration
+config = get_ufo_config()
+
+# Verify critical settings
+assert config.system.max_step > 0
+assert config.host_agent.api_key != ""
+assert config.app_agent.api_model != ""
+
+print("✅ Configuration loaded successfully!")
+```
+
+### 2. Test Backward Compatibility
+
+```python
+# Old access patterns should still work
+config = get_ufo_config()
+
+# Dict-style access (legacy)
+max_step_old = config["MAX_STEP"]
+host_agent_old = config["HOST_AGENT"]
+
+# Verify they match new access
+assert max_step_old == config.system.max_step
+assert host_agent_old["API_MODEL"] == config.host_agent.api_model
+
+print("✅ Backward compatibility verified!")
+```
+
+### 3. Run Application Tests
+
+```bash
+# Test with simple task
+python -m ufo --task "Open Notepad"
+
+# Check logs for configuration warnings
+# Should not see "LEGACY CONFIG PATH DETECTED" after migration
+```
+
+## Troubleshooting
+
+### Issue: "No configuration found"
+
+**Cause**: Configuration files not in expected locations
+
+!!!bug "Solution"
+ Verify file locations and permissions
+
+```bash
+# Verify file locations
+ls config/ufo/agents.yaml
+ls config/ufo/system.yaml
+
+# Check file permissions
+chmod 644 config/ufo/*.yaml
+```
+
+### Issue: "Configuration conflicts detected"
+
+**Cause**: Both legacy and new configs exist
+
+!!!warning "Conflict Resolution"
+ Choose one of these options to resolve conflicts
+
+```bash
+# Option 1: Remove legacy config (after backup)
+mv ufo/config/config.yaml ufo/config/config.yaml.backup
+
+# Option 2: Keep both files and rely on the loader (Python, not a shell command):
+#   config = get_ufo_config()  # Will warn but use new path
+```
+
+### Issue: "Missing required fields"
+
+**Cause**: Required fields not present in new configuration
+
+!!!failure "Required Fields Missing"
+ Ensure all required agent fields are present
+
+```yaml
+# config/ufo/agents.yaml
+# Ensure all required agent fields present:
+HOST_AGENT:
+  API_TYPE: "openai"  # Required
+  API_BASE: "..."     # Required
+  API_KEY: "..."      # Required
+  API_MODEL: "..."    # Required
+```
+
+### Issue: "Type errors in code"
+
+**Cause**: Using old dict-style access with new typed config
+
+**Solution**:
+```python
+# OLD (can cause type issues)
+config["HOST_AGENT"]["API_MODEL"]
+
+# NEW (type-safe)
+config.host_agent.api_model
+
+# Or keep old style for now
+config["HOST_AGENT"]["API_MODEL"] # Still works!
+```
+
+## Migration Checklist
+
+- [ ] Backup legacy configuration
+- [ ] Create `config/ufo/` directory
+- [ ] Copy and customize template files
+- [ ] Split monolithic config into modular files
+- [ ] Test configuration loading
+- [ ] Verify backward compatibility
+- [ ] Update code to use new access patterns (optional)
+- [ ] Run application tests
+- [ ] Remove legacy configuration (after verification)
+- [ ] Update documentation/README
+- [ ] Commit changes to version control
+
+## Rollback Procedure
+
+If migration causes issues:
+
+!!!danger "Emergency Rollback"
+ Your application will immediately fall back to the legacy configuration without any code changes.
+
+```bash
+# 1. Restore legacy config from backup
+cp ufo/config/config.yaml.backup ufo/config/config.yaml
+
+# 2. Remove new config files
+rm -rf config/ufo/*.yaml
+
+# 3. Restart application
+# Old configuration will be used automatically
+```
+
+## Getting Help
+
+If you encounter issues during migration:
+
+1. **Check the logs** for detailed error messages
+2. **Review configuration guides** ([Agents Config](./agents_config.md), [System Config](./system_config.md), [RAG Config](./rag_config.md)) for correct field names
+3. **Consult [Configuration Overview](./overview.md)** for system design
+4. **Open an issue** on GitHub with:
+   - Your legacy config (with sensitive data redacted)
+   - Error messages
+   - Steps you've tried
+
+## Next Steps
+
+After successful migration:
+
+- **[Agents Configuration](./agents_config.md)** - Configure LLM and agent settings
+- **[System Configuration](./system_config.md)** - Configure runtime and execution settings
+- **[RAG Configuration](./rag_config.md)** - Configure knowledge retrieval
+- **[Extending Configuration](./extending.md)** - Learn how to add custom settings
diff --git a/documents/docs/configuration/system/overview.md b/documents/docs/configuration/system/overview.md
new file mode 100644
index 000000000..72b151b3f
--- /dev/null
+++ b/documents/docs/configuration/system/overview.md
@@ -0,0 +1,397 @@
+# Configuration Architecture
+
+UFO² features a modern, modular configuration system designed for flexibility, maintainability, and backward compatibility. This guide explains the overall architecture and design principles.
+
+## Design Philosophy
+
+The configuration system follows professional software engineering best practices:
+
+### Separation of Concerns
+
+Configuration files are organized by domain rather than kept in a single monolithic file:
+
+- **Agent configurations** (`agents.yaml`) - LLM settings for different agents → [Agent Config Guide](./agents_config.md)
+- **System configurations** (`system.yaml`) - Execution and runtime settings → [System Config Guide](./system_config.md)
+- **RAG configurations** (`rag.yaml`) - Knowledge retrieval settings → [RAG Config Guide](./rag_config.md)
+- **MCP configurations** (`mcp.yaml`) - Model Context Protocol servers → [MCP Config Guide](./mcp_reference.md)
+- **Pricing configurations** (`prices.yaml`) - Cost tracking for different models → [Pricing Config Guide](./prices_config.md)
+- **Third-party configurations** (`third_party.yaml`) - External agent integration (LinuxAgent, HardwareAgent) → [Third-Party Config Guide](./third_party_config.md)
+
+### Type Safety + Flexibility
+
+Hybrid approach combining:
+
+- **Fixed typed fields** - IDE autocomplete, type checking, and IntelliSense
+- **Dynamic YAML fields** - Add new settings without code changes
+
+**Example:**
+
+```python
+# Type-safe access (recommended)
+config = get_ufo_config()
+max_step = config.system.max_step # IDE autocomplete!
+api_model = config.app_agent.api_model
+
+# Dynamic access (for custom fields)
+custom_value = config.CUSTOM_FEATURE_FLAG
+new_setting = config["NEW_YAML_KEY"]
+
+# Backward compatible (legacy code still works)
+max_step_old = config["MAX_STEP"]
+```
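+
+One way such a hybrid object could be built is sketched below. It is not the project's `config_loader` implementation: `HybridConfig` and `SystemConfig` are hypothetical classes that only show how typed attributes, dynamic attribute lookup, and dict-style access can coexist on one object.
+
+```python
+from dataclasses import dataclass, field
+from typing import Any, Dict
+
+
+@dataclass
+class SystemConfig:
+    max_step: int = 50  # typed field: visible to IDEs and type checkers
+
+
+@dataclass
+class HybridConfig:
+    system: SystemConfig
+    raw: Dict[str, Any] = field(default_factory=dict)  # everything from YAML
+
+    def __getattr__(self, name: str) -> Any:
+        # Called only when normal attribute lookup fails: serve dynamic keys
+        raw = self.__dict__.get("raw", {})
+        if name in raw:
+            return raw[name]
+        raise AttributeError(name)
+
+    def __getitem__(self, key: str) -> Any:
+        # Dict-style access for legacy code
+        return self.raw[key]
+
+
+raw = {"MAX_STEP": 50, "CUSTOM_FEATURE_FLAG": True}
+config = HybridConfig(system=SystemConfig(max_step=raw["MAX_STEP"]), raw=raw)
+print(config.system.max_step, config.CUSTOM_FEATURE_FLAG, config["MAX_STEP"])
+```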
+
+### Backward Compatibility
+
+Zero breaking changes - existing code continues to work:
+
+- Old configuration paths still supported (`ufo/config/`)
+- Old access patterns still work (`config["MAX_STEP"]`)
+- Automatic migration warnings guide users to new structure
+
+Your existing code will continue to work without any modifications. The system automatically falls back to legacy paths and access patterns. See the [Migration Guide](./migration.md) for details on upgrading to the new structure.
+
+### Auto-Discovery
+
+No manual file registration needed:
+
+- All `*.yaml` files in `config/ufo/` are automatically loaded
+- Files are merged intelligently with deep merging
+- Environment-specific overrides (`*_dev.yaml`, `*_test.yaml`) supported
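+
+Deep merging means nested mappings are combined key by key instead of being replaced wholesale. The sketch below shows the general technique; `deep_merge` is a hypothetical function written for this example and is not taken from the loader, whose exact merge rules (for lists, for instance) may differ.
+
+```python
+from typing import Any, Dict
+
+
+def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
+    """Return a new dict where override wins, merging nested dicts recursively."""
+    merged = dict(base)
+    for key, value in override.items():
+        if isinstance(value, dict) and isinstance(merged.get(key), dict):
+            merged[key] = deep_merge(merged[key], value)
+        else:
+            merged[key] = value  # scalars and lists are replaced outright
+    return merged
+
+
+base = {"HOST_AGENT": {"API_TYPE": "openai", "API_MODEL": "gpt-4o"}, "MAX_STEP": 50}
+dev_override = {"HOST_AGENT": {"API_MODEL": "gpt-4o-mini"}, "MAX_STEP": 10}
+merged = deep_merge(base, dev_override)
+print(merged)
+```
+
+With a shallow `dict.update`, the override's `HOST_AGENT` block would have discarded `API_TYPE`; the recursive version keeps it.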
+
+## Directory Structure
+
+```
+UFO/
+├── config/ ← New Configuration Root (Recommended)
+│ ├── ufo/ ← UFO² Configurations
+│ │ ├── agents.yaml # LLM agent settings
+│ │ ├── agents.yaml.template # Template for setup
+│ │ ├── system.yaml # System and runtime settings
+│ │ ├── rag.yaml # RAG knowledge settings
+│ │ ├── mcp.yaml # MCP server configurations
+│ │ ├── prices.yaml # Model pricing
+│ │ └── third_party.yaml # Third-party agents (optional)
+│ │
+│ ├── galaxy/ ← Galaxy Configurations
+│ │ ├── agent.yaml # Constellation agent settings
+│ │ ├── agent.yaml.template # Template for setup
+│ │ ├── constellation.yaml # Constellation runtime settings
+│ │ └── devices.yaml # Device/client configurations
+│ │
+│ ├── config_loader.py # Modern config loader
+│ └── config_schemas.py # Type definitions
+│
+└── ufo/config/ ← Legacy Path (Still Supported)
+ └── config.yaml # Old monolithic config
+```
+
+---
+
+## Galaxy Configuration Files
+
+The Galaxy constellation system has its own set of configuration files in `config/galaxy/`:
+
+| File | Purpose | Template | Documentation |
+|------|---------|----------|---------------|
+| **constellation.yaml** | Constellation runtime settings (heartbeat, concurrency, step limits) | No | [Galaxy Constellation Config](./galaxy_constellation.md) |
+| **devices.yaml** | Device agent definitions (device_id, server_url, capabilities, metadata) | No | [Galaxy Devices Config](./galaxy_devices.md) |
+| **agent.yaml** | Constellation agent LLM configuration (API settings, prompts) | **Yes** (.template) | [Galaxy Agent Config](./galaxy_agent.md) |
+
+### Galaxy Configuration Structure
+
+```
+config/galaxy/
+├── constellation.yaml # Runtime settings for orchestrator
+│ ├── CONSTELLATION_ID # Constellation identifier
+│ ├── HEARTBEAT_INTERVAL # Health check frequency
+│ ├── RECONNECT_DELAY # Reconnection delay
+│ ├── MAX_CONCURRENT_TASKS # Task concurrency limit
+│ ├── MAX_STEP # Step limit per session
+│ ├── DEVICE_INFO # Path to devices.yaml
+│ └── LOG_TO_MARKDOWN # Markdown logging flag
+│
+├── devices.yaml # Device definitions
+│ └── devices: [] # Array of device configurations
+│ ├── device_id # Unique device identifier
+│ ├── server_url # WebSocket endpoint
+│ ├── os # Operating system
+│ ├── capabilities # Device capabilities
+│ ├── metadata # Custom metadata
+│ ├── max_retries # Connection retry limit
+│ └── auto_connect # Auto-connect flag
+│
+└── agent.yaml # Constellation agent LLM config
+ └── CONSTELLATION_AGENT:
+ ├── REASONING_MODEL # Enable reasoning mode
+ ├── API_TYPE # API provider (openai, azure, azure_ad)
+ ├── API_BASE # API base URL
+ ├── API_KEY # API authentication key
+ ├── API_VERSION # API version
+ ├── API_MODEL # Model name/deployment
+ ├── AAD_* # Azure AD auth settings
+ └── *_PROMPT # Prompt template paths
+```
+
+### Galaxy Configuration Loading
+
+```python
+# Load Galaxy configurations
+from config.config_loader import get_galaxy_config
+
+# Load Galaxy configuration (includes agent and constellation settings)
+galaxy_config = get_galaxy_config()
+
+# Access agent configuration (LLM settings)
+agent_config = galaxy_config.agent.constellation_agent
+
+# Access constellation runtime settings
+constellation_settings = galaxy_config.constellation
+
+# Or use raw dict access for backward compatibility
+constellation_id = galaxy_config["CONSTELLATION_ID"]
+```
+
+**Galaxy vs UFO Configuration:**
+
+- **UFO Configurations** (`config/ufo/`) - Single-agent automation settings
+- **Galaxy Configurations** (`config/galaxy/`) - Multi-device constellation settings
+- Both systems can coexist in the same project
+
+---
+
+## Configuration Loading Process
+
+### Priority Chain
+
+The configuration system uses a clear priority chain (highest to lowest):
+
+1. **New modular configs** - `config/{module}/*.yaml`
+2. **Legacy monolithic config** - `{module}/config/config.yaml`
+3. **Environment variables** - Base values read from the process environment
+
+When the same setting exists in multiple locations, the **new modular config** takes precedence over the legacy config, which in turn takes precedence over environment variables. Sources are loaded from lowest to highest priority, so each source overrides the values loaded before it.
+
+### Loading Algorithm
+
+```python
+# Pseudocode: exists, load_yaml, merge and discover stand in for the
+# loader's internal helpers.
+def load_config():
+    # Step 1: Start with environment variables (lowest priority)
+    config_data = dict(os.environ)
+
+    # Step 2: Load legacy config if exists (middle priority)
+    if exists("ufo/config/config.yaml"):
+        legacy_data = load_yaml("ufo/config/config.yaml")
+        merge(config_data, legacy_data)
+
+    # Step 3: Load new modular configs (highest priority)
+    for yaml_file in discover("config/ufo/*.yaml"):
+        new_data = load_yaml(yaml_file)
+        merge(config_data, new_data)
+
+    # Step 4: Create typed config object
+    return UFOConfig.from_dict(config_data)
+```
+
+### Deep Merging
+
+Configuration files are merged recursively, allowing you to split configurations across multiple files without duplication:
+
+```yaml
+# config/ufo/agents.yaml
+HOST_AGENT:
+ API_TYPE: "openai"
+ API_MODEL: "gpt-4o"
+
+# config/ufo/custom.yaml (added later)
+HOST_AGENT:
+ TEMPERATURE: 0.5 # Added to HOST_AGENT
+
+# Result: HOST_AGENT has all three fields
+```
+
+Fields from later files are added to earlier configurations rather than replacing them. When the same key appears in both, the later value wins.
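+
+The merge behavior described above can be sketched in a few lines. This is a minimal illustration, not the loader's actual code: the function name `deep_merge` and the rule that lists are replaced rather than concatenated are assumptions made for the example.
+
+```python
+from copy import deepcopy
+from typing import Any, Dict
+
+
+def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
+    """Return a new dict in which `override` is merged recursively into `base`."""
+    merged = deepcopy(base)
+    for key, value in override.items():
+        if isinstance(value, dict) and isinstance(merged.get(key), dict):
+            # Both sides are mappings: merge them key by key.
+            merged[key] = deep_merge(merged[key], value)
+        else:
+            # Scalars, lists, and type mismatches: the later value wins.
+            merged[key] = deepcopy(value)
+    return merged
+
+
+# Mirrors the agents.yaml / custom.yaml example above.
+agents = {"HOST_AGENT": {"API_TYPE": "openai", "API_MODEL": "gpt-4o"}}
+custom = {"HOST_AGENT": {"TEMPERATURE": 0.5}}
+result = deep_merge(agents, custom)
+print(result)
+```
+
+Returning a new dictionary instead of mutating `base` keeps each source file's data intact, which makes it easier to trace where a value came from.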
+
+## File Organization Patterns
+
+### Split by Domain (Current Approach)
+
+```
+config/ufo/
+├── agents.yaml # All agent LLM configs
+├── system.yaml # All system settings
+├── rag.yaml # All RAG settings
+├── mcp.yaml # All MCP servers
+└── prices.yaml # Model pricing
+```
+
+**Advantages:** Easy to find related settings, clear separation of concerns, good for documentation.
+
+### Alternative: Split by Agent
+
+```
+config/ufo/
+├── host_agent.yaml # HOST_AGENT config
+├── app_agent.yaml # APP_AGENT config
+├── system.yaml # Shared system config
+└── rag.yaml # Shared RAG config
+```
+
+**Advantages:** Agent-specific settings isolated, easy to customize per agent, good for multi-agent scenarios.
+
+Both patterns work! The loader auto-discovers and merges all YAML files.
+
+## Environment-Specific Overrides
+
+Environment-specific files let you keep separate settings for development, testing, and production:
+
+```
+# Base configuration
+config/ufo/agents.yaml # All environments
+
+# Environment-specific overrides
+config/ufo/agents_dev.yaml # Development only
+config/ufo/agents_test.yaml # Testing only
+config/ufo/agents_prod.yaml # Production only
+```
+
+**Activation**:
+```bash
+# Set environment
+export UFO_ENV=dev # Linux/Mac
+$env:UFO_ENV = "dev" # Windows PowerShell
+
+# Configuration loads:
+# 1. agents.yaml (base)
+# 2. agents_dev.yaml (overrides)
+```
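+
+As a rough sketch of the load order above, the helper below lists which files would be read for a given environment. The function name `config_files_for` is hypothetical, and the sketch assumes the `<name>_<env>.yaml` naming shown in this section; the actual loader may resolve files differently.
+
+```python
+import os
+from typing import List, Optional
+
+
+def config_files_for(base_name: str, env: Optional[str] = None) -> List[str]:
+    """List the files to load for `base_name`: base file first, override last."""
+    if env is None:
+        # Fall back to the UFO_ENV variable described above.
+        env = os.environ.get("UFO_ENV", "")
+    files = [f"config/ufo/{base_name}.yaml"]
+    if env:
+        files.append(f"config/ufo/{base_name}_{env}.yaml")
+    return files
+
+
+print(config_files_for("agents", env="dev"))
+```
+
+Because the override file comes last in the list, merging the files in order lets the environment-specific values win.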
+
+## Type System
+
+### Fixed Types (Recommended)
+
+Provides IDE autocomplete and type safety:
+
+```python
+from dataclasses import dataclass
+
+@dataclass
+class SystemConfig:
+    max_step: int = 50
+    max_retry: int = 20
+    temperature: float = 0.0
+    # ...
+
+# Usage - IDE knows the types!
+config.system.max_step     # int
+config.system.temperature  # float
+```
+
+### Dynamic Types (Flexible)
+
+For custom or experimental settings. Learn more about adding custom fields in the [Extending Configuration guide](./extending.md).
+
+**Example:**
+
+```python
+# In YAML:
+#   MY_CUSTOM_FEATURE: True
+#   NEW_EXPERIMENTAL_SETTING: "value"
+
+# In code - dynamic access
+if config.MY_CUSTOM_FEATURE:
+    setting = config.NEW_EXPERIMENTAL_SETTING
+```
+
+### Hybrid Approach
+
+Best of both worlds:
+
+```python
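+from dataclasses import dataclass, field
+from typing import Any, Dict
+
+@dataclass
+class SystemConfig:
+    # Fixed fields
+    max_step: int = 50
+
+    # Dynamic extras
+    _extras: Dict[str, Any] = field(default_factory=dict)
+
+    def __getattr__(self, name):
+        # Called only when normal lookup fails. Private names must raise,
+        # otherwise a missing _extras would recurse forever.
+        if name.startswith("_"):
+            raise AttributeError(name)
+        # Try extras for unknown fields
+        return self._extras.get(name)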
+```
+
+## Migration Warnings
+
+The system provides clear warnings when using legacy paths:
+
+```
+⚠️ LEGACY CONFIG PATH DETECTED: UFO
+
+Using legacy config: ufo/config/
+Please migrate to: config/ufo/
+
+Quick migration:
+ mkdir -p config/ufo
+ cp ufo/config/*.yaml config/ufo/
+
+Or use migration tool:
+ python -m ufo.tools.migrate_config
+```
+
+These warnings appear once per session and guide you to migrate to the new structure.
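+
+One way to get the "once per session" behavior is to remember which modules have already triggered the warning. The sketch below is an assumption about how this could be done, not the project's implementation; `warn_legacy_path` and `_warned_modules` are hypothetical names.
+
+```python
+import warnings
+
+# Modules that have already produced a legacy-path warning in this process.
+_warned_modules = set()
+
+
+def warn_legacy_path(module: str) -> bool:
+    """Emit the legacy-path warning at most once per module; return True if emitted."""
+    if module in _warned_modules:
+        return False
+    _warned_modules.add(module)
+    warnings.warn(
+        f"Using legacy config: {module}/config/ - please migrate to: config/{module}/",
+        DeprecationWarning,
+        stacklevel=2,
+    )
+    return True
+
+
+first = warn_legacy_path("ufo")   # emits the warning
+second = warn_legacy_path("ufo")  # silent: already reported
+print(first, second)
+```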
+
+## Best Practices
+
+**Recommended Practices:**
+
+- **Use modular files** - Split by domain or agent
+- **Use typed access** - `config.system.max_step` over `config["MAX_STEP"]`
+- **Add templates** - Provide `.template` files for sensitive data
+- **Document custom fields** - Add comments in YAML
+- **Use environment overrides** - For dev/test/prod differences
+
+**Anti-Patterns to Avoid:**
+
+- **Mixing old and new** - Migrate fully to the new structure instead
+- **Putting secrets in YAML** - Use environment variables instead
+- **Duplicating settings** - Leverage deep merging
+- **Breaking backward compatibility** - Keep `config["OLD_KEY"]` working
+- **Hardcoding paths** - Use the config system
+
+## Configuration Lifecycle
+
+```mermaid
+graph LR
+ A[Application Start] --> B[Load Environment Vars]
+ B --> C[Check for Legacy Config]
+ C --> D[Load New Modular Configs]
+ D --> E[Deep Merge All Sources]
+ E --> F[Apply Transformations]
+ F --> G[Create Typed Config Object]
+ G --> H[Cache for Reuse]
+ H --> I[Application Running]
+```
+
+## Next Steps
+
+### UFO Configuration Guides
+- **[Agent Configuration](./agents_config.md)** - LLM and API settings for all agents
+- **[System Configuration](./system_config.md)** - Runtime and execution settings
+- **[RAG Configuration](./rag_config.md)** - Knowledge retrieval and learning settings
+- **[MCP Configuration](./mcp_reference.md)** - Model Context Protocol servers
+- **[Pricing Configuration](./prices_config.md)** - LLM cost tracking
+- **[Third-Party Configuration](./third_party_config.md)** - External agent integration (LinuxAgent, HardwareAgent)
+- **[Migration Guide](./migration.md)** - How to migrate from old to new config
+- **[Extending Configuration](./extending.md)** - How to add new configuration options
+
+### Galaxy Configuration Guides
+- **[Galaxy Constellation Configuration](./galaxy_constellation.md)** - Runtime settings for constellation orchestrator
+- **[Galaxy Devices Configuration](./galaxy_devices.md)** - Device definitions and capabilities
+- **[Galaxy Agent Configuration](./galaxy_agent.md)** - LLM configuration for constellation agent
+
+## API Reference
+
+For detailed API documentation of configuration classes and methods, see:
+
+- `config.config_loader.ConfigLoader` - Configuration loading and caching
+- `config.config_schemas.UFOConfig` - UFO configuration schema
+- `config.config_schemas.GalaxyConfig` - Galaxy configuration schema
+- `config.config_loader.get_ufo_config()` - Get UFO configuration instance
+- `config.config_loader.get_galaxy_config()` - Get Galaxy configuration instance
diff --git a/documents/docs/configuration/system/prices_config.md b/documents/docs/configuration/system/prices_config.md
new file mode 100644
index 000000000..5546ce857
--- /dev/null
+++ b/documents/docs/configuration/system/prices_config.md
@@ -0,0 +1,265 @@
+# Pricing Configuration (prices.yaml)
+
+Configure token pricing for different LLM models to track and estimate API costs during UFO² execution.
+
+---
+
+## Overview
+
+The `prices.yaml` file defines the cost per 1,000 tokens for different LLM models. UFO² uses this information to calculate and report the estimated cost of task executions.
+
+**File Location**: `config/ufo/prices.yaml`
+
+!!!warning "Pricing May Be Outdated"
+ The pricing information in this file **may not be current**. LLM providers frequently update their pricing.
+
+ - Always verify current pricing on provider websites
+ - This file will be updated periodically
+ - Use these values as estimates only
+
+---
+
+## Quick Start
+
+### View Current Pricing
+
+```yaml
+# config/ufo/prices.yaml
+gpt-4o:
+ prompt: 0.0025
+ completion: 0.01
+
+gpt-4o-mini:
+ prompt: 0.00015
+ completion: 0.0006
+
+gpt-4-turbo:
+ prompt: 0.01
+ completion: 0.03
+```
+
+### Add Your Model
+
+```yaml
+# Add pricing for your custom model
+my-custom-model:
+ prompt: 0.001 # USD per 1K prompt tokens
+ completion: 0.003 # USD per 1K completion tokens
+```
+
+---
+
+## Configuration Format
+
+### Structure
+
+Each model has two pricing fields:
+
+```yaml
+model-name:
+  prompt:      # Float, USD per 1K prompt tokens
+  completion:  # Float, USD per 1K completion tokens
+```
+
+| Field | Type | Unit | Description |
+|-------|------|------|-------------|
+| `prompt` | Float | USD/1K tokens | Cost per 1,000 input (prompt) tokens |
+| `completion` | Float | USD/1K tokens | Cost per 1,000 output (completion) tokens |
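+
+A quick sanity check on the parsed file can catch a missing or mistyped field before it skews cost estimates. The sketch below works on an already-loaded dictionary; `validate_prices` is a hypothetical helper, not part of UFO²'s API.
+
+```python
+from typing import Any, Dict, List
+
+
+def validate_prices(prices: Dict[str, Any]) -> List[str]:
+    """Return a list of problems; an empty list means the data looks valid."""
+    problems = []
+    for model, entry in prices.items():
+        if not isinstance(entry, dict):
+            problems.append(f"{model}: expected a mapping with 'prompt' and 'completion'")
+            continue
+        for field in ("prompt", "completion"):
+            value = entry.get(field)
+            # bool is a subclass of int, so reject it explicitly.
+            if isinstance(value, bool) or not isinstance(value, (int, float)):
+                problems.append(f"{model}.{field}: missing or not a number")
+            elif value < 0:
+                problems.append(f"{model}.{field}: must not be negative")
+    return problems
+
+
+sample = {
+    "gpt-4o": {"prompt": 0.0025, "completion": 0.01},
+    "my-custom-model": {"prompt": 0.001},  # completion is missing
+}
+print(validate_prices(sample))
+```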
+
+---
+
+## Common Models (As of Template)
+
+!!!info "Verify Current Pricing"
+ These prices are from the template and **may be outdated**. Always check provider websites for current pricing:
+
+ - [OpenAI Pricing](https://openai.com/pricing)
+ - [Azure OpenAI Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/)
+ - [Anthropic Pricing](https://www.anthropic.com/pricing)
+ - [Google AI Pricing](https://ai.google.dev/pricing)
+
+### OpenAI Models
+
+| Model | Prompt ($/1K) | Completion ($/1K) | Notes |
+|-------|---------------|-------------------|-------|
+| `gpt-4o` | $0.0025 | $0.01 | GPT-4o ("omni") multimodal model |
+| `gpt-4o-mini` | $0.00015 | $0.0006 | Cheaper alternative |
+| `gpt-4-turbo` | $0.01 | $0.03 | GPT-4 Turbo |
+| `gpt-4-vision-preview` | $0.01 | $0.03 | GPT-4 with vision |
+| `gpt-3.5-turbo` | $0.0005 | $0.0015 | GPT-3.5 |
+
+### Example Configuration
+
+```yaml
+# OpenAI Models
+gpt-4o:
+ prompt: 0.0025
+ completion: 0.01
+
+gpt-4o-mini:
+ prompt: 0.00015
+ completion: 0.0006
+
+gpt-4-turbo:
+ prompt: 0.01
+ completion: 0.03
+
+gpt-4-vision-preview:
+ prompt: 0.01
+ completion: 0.03
+
+gpt-3.5-turbo:
+ prompt: 0.0005
+ completion: 0.0015
+
+# Claude Models (example)
+claude-3-5-sonnet-20241022:
+ prompt: 0.003
+ completion: 0.015
+
+# Gemini Models (example)
+gemini-2.0-flash-exp:
+ prompt: 0.0
+ completion: 0.0
+```
+
+---
+
+## Cost Tracking
+
+UFO² automatically tracks costs when pricing information is available.
+
+### During Execution
+
+```
+UFO² automatically calculates costs.
+Session logs show:
+- Total prompt tokens used
+- Total completion tokens used
+- Estimated cost (based on prices.yaml)
+```
+
+### View Cost Summary
+
+After task execution, check logs:
+
+```
+logs//cost_summary.json
+```
+
+**Example output**:
+```json
+{
+ "total_cost_usd": 0.0325,
+ "prompt_tokens": 5000,
+ "completion_tokens": 2000,
+ "model": "gpt-4o"
+}
+```
+
+---
+
+## Updating Pricing
+
+### Step 1: Check Current Pricing
+
+Visit your LLM provider's pricing page:
+
+- **OpenAI**: https://openai.com/pricing
+- **Azure OpenAI**: https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/
+- **Anthropic**: https://www.anthropic.com/pricing
+- **Google**: https://ai.google.dev/pricing
+
+### Step 2: Update prices.yaml
+
+```yaml
+# Update with current pricing
+gpt-4o:
+ prompt: 0.0025 # Update if changed
+ completion: 0.01
+```
+
+### Step 3: Add New Models
+
+```yaml
+# Add newly released models (placeholder values; use the provider's published rates)
+gpt-5:
+ prompt: 0.005
+ completion: 0.02
+```
+
+---
+
+## Programmatic Access
+
+```python
+from config.config_loader import get_ufo_config
+
+config = get_ufo_config()
+
+# Get pricing for a specific model
+model_name = "gpt-4o"
+if model_name in config.prices:
+ prompt_cost = config.prices[model_name]["prompt"]
+ completion_cost = config.prices[model_name]["completion"]
+ print(f"{model_name}:")
+ print(f" Prompt: ${prompt_cost}/1K tokens")
+ print(f" Completion: ${completion_cost}/1K tokens")
+else:
+ print(f"No pricing info for {model_name}")
+```
+
+---
+
+## Cost Estimation Example
+
+```python
+# Example: Estimate cost for a task
+prompt_tokens = 10000 # 10K prompt tokens
+completion_tokens = 5000 # 5K completion tokens
+model = "gpt-4o"
+
+# Get pricing
+prompt_cost_per_1k = 0.0025
+completion_cost_per_1k = 0.01
+
+# Calculate
+total_cost = (
+ (prompt_tokens / 1000) * prompt_cost_per_1k +
+ (completion_tokens / 1000) * completion_cost_per_1k
+)
+
+print(f"Estimated cost: ${total_cost:.4f}")
+# Output: Estimated cost: $0.0750
+```
+
+---
+
+## Notes
+
+!!!info "Important Notes"
+ - ✅ Pricing is for **cost estimation only**, not billing
+ - ✅ Actual costs may vary based on your provider contract
+ - ✅ Different Azure regions may have different pricing
+ - ✅ Some models have tiered pricing based on volume
+ - ✅ Prices change frequently - update regularly
+
+---
+
+## Related Documentation
+
+- **[Agent Configuration](agents_config.md)** - LLM model selection
+- **[System Configuration](system_config.md)** - Token limits and usage
+
+---
+
+## Summary
+
+!!!success "Key Takeaways"
+ ✅ **prices.yaml tracks LLM costs** - Estimates API spending
+ ✅ **Pricing may be outdated** - Always verify current rates
+ ✅ **Update regularly** - Providers change pricing frequently
+ ✅ **Add new models** - Include pricing for any custom models
+ ✅ **Cost tracking is automatic** - UFO² calculates costs during execution
+
+ **Keep pricing updated for accurate cost tracking!** 💰
diff --git a/documents/docs/configuration/system/rag_config.md b/documents/docs/configuration/system/rag_config.md
new file mode 100644
index 000000000..297cc0478
--- /dev/null
+++ b/documents/docs/configuration/system/rag_config.md
@@ -0,0 +1,620 @@
+# RAG Configuration (rag.yaml)
+
+Configure Retrieval-Augmented Generation (RAG) to enhance UFO² with external knowledge sources, online search, experience learning, and demonstration-based learning.
+
+---
+
+## Overview
+
+The `rag.yaml` file configures knowledge retrieval systems that augment UFO²'s capabilities beyond its base LLM knowledge. RAG helps UFO² make better decisions by providing:
+
+- **Offline Documentation**: Application manuals and documentation
+- **Online Search**: Real-time web search via Bing
+- **Experience Learning**: Learn from past successful executions
+- **Demonstration Learning**: Learn from user demonstrations
+
+**File Location**: `config/ufo/rag.yaml`
+
+**Optional Configuration:** RAG features are **optional**. UFO² works without them, but they can significantly improve performance on complex or domain-specific tasks.
+
+---
+
+## Quick Start
+
+### Disable All RAG (Default)
+
+```yaml
+# Minimal configuration - no external knowledge
+RAG_OFFLINE_DOCS: False
+RAG_ONLINE_SEARCH: False
+RAG_EXPERIENCE: False
+RAG_DEMONSTRATION: False
+```
+
+### Enable Online Search Only
+
+```yaml
+# Most useful for general tasks
+RAG_OFFLINE_DOCS: False
+
+RAG_ONLINE_SEARCH: True
+BING_API_KEY: "YOUR_BING_API_KEY_HERE"
+RAG_ONLINE_SEARCH_TOPK: 5
+RAG_ONLINE_RETRIEVED_TOPK: 5
+
+RAG_EXPERIENCE: False
+RAG_DEMONSTRATION: False
+```
+
+### Enable Experience Learning
+
+```yaml
+# Learn from past executions
+RAG_OFFLINE_DOCS: False
+RAG_ONLINE_SEARCH: False
+
+RAG_EXPERIENCE: True
+RAG_EXPERIENCE_RETRIEVED_TOPK: 5
+
+RAG_DEMONSTRATION: False
+```
+
+### Enable All Features
+
+```yaml
+# Maximum knowledge augmentation
+RAG_OFFLINE_DOCS: True
+RAG_OFFLINE_DOCS_RETRIEVED_TOPK: 1
+
+RAG_ONLINE_SEARCH: True
+BING_API_KEY: "YOUR_BING_API_KEY_HERE"
+RAG_ONLINE_SEARCH_TOPK: 5
+RAG_ONLINE_RETRIEVED_TOPK: 5
+
+RAG_EXPERIENCE: True
+RAG_EXPERIENCE_RETRIEVED_TOPK: 5
+
+RAG_DEMONSTRATION: True
+RAG_DEMONSTRATION_RETRIEVED_TOPK: 5
+```
+
+---
+
+## RAG Components
+
+### 1. Offline Documentation
+
+Retrieve relevant documentation from local knowledge bases (app manuals, guides, API docs).
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `RAG_OFFLINE_DOCS` | Boolean | `False` | Enable offline documentation retrieval |
+| `RAG_OFFLINE_DOCS_RETRIEVED_TOPK` | Integer | `1` | Number of documents to retrieve |
+
+**Example**:
+```yaml
+RAG_OFFLINE_DOCS: True
+RAG_OFFLINE_DOCS_RETRIEVED_TOPK: 1
+```
+
+!!!info "Use Case"
+ - Application-specific tasks (Excel formulas, Word formatting)
+ - Domain-specific workflows (accounting, design)
+ - Requires pre-indexed documentation
+
+**Setup**:
+1. Place documentation in `vectordb/docs/`
+2. Index documents: `python -m learner`
+3. Enable in `rag.yaml`
+
+---
+
+### 2. Online Search
+
+Search the web in real-time using Bing Search API.
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `RAG_ONLINE_SEARCH` | Boolean | `False` | Enable online Bing search |
+| `BING_API_KEY` | String | `""` | Bing Search API key |
+| `RAG_ONLINE_SEARCH_TOPK` | Integer | `5` | Number of search results to fetch |
+| `RAG_ONLINE_RETRIEVED_TOPK` | Integer | `5` | Number of results to include in prompt |
+
+**Example**:
+```yaml
+RAG_ONLINE_SEARCH: True
+BING_API_KEY: "abc123xyz..."
+RAG_ONLINE_SEARCH_TOPK: 5
+RAG_ONLINE_RETRIEVED_TOPK: 5
+```
+
+!!!tip "Getting Bing API Key"
+ 1. Go to [Azure Portal](https://portal.azure.com)
+ 2. Create a "Bing Search v7" resource
+ 3. Copy the API key from "Keys and Endpoint"
+ 4. Add to `rag.yaml`: `BING_API_KEY: "your-key"`
+
+**Use Cases**:
+- Tasks requiring current information
+- Unfamiliar applications or features
+- Troubleshooting specific error messages
+- Finding how-to guides dynamically
+
+**Example Query Flow**:
+```
+User Request: "Create a pivot table in Excel"
+↓
+Bing Search: "how to create pivot table in Excel"
+↓
+Retrieved: Top 5 results about pivot tables
+↓
+LLM receives context from search results
+↓
+Better action decisions
+```
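+
+The two Top-K settings act at different stages: `RAG_ONLINE_SEARCH_TOPK` limits how many results are fetched, and `RAG_ONLINE_RETRIEVED_TOPK` limits how many of those reach the prompt. The sketch below illustrates the second stage only. The keyword-overlap score is a deliberately crude stand-in chosen for the example, and both function names are hypothetical; UFO²'s actual ranking may work differently.
+
+```python
+from typing import List
+
+
+def keyword_overlap(query: str, text: str) -> float:
+    """Crude relevance score: fraction of query words that also occur in `text`."""
+    query_words = set(query.lower().split())
+    text_words = set(text.lower().split())
+    if not query_words:
+        return 0.0
+    return len(query_words & text_words) / len(query_words)
+
+
+def select_for_prompt(query: str, fetched: List[str], retrieved_topk: int) -> List[str]:
+    """Keep the `retrieved_topk` highest-scoring results out of those fetched."""
+    ranked = sorted(fetched, key=lambda text: keyword_overlap(query, text), reverse=True)
+    return ranked[:retrieved_topk]
+
+
+# Pretend these came back from the search stage.
+fetched = [
+    "Insert a chart in Excel",
+    "How to create a pivot table in Excel step by step",
+    "Word mail merge basics",
+]
+selected = select_for_prompt("create pivot table in Excel", fetched, retrieved_topk=1)
+print(selected)
+```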
+
+---
+
+### 3. Experience Learning
+
+Learn from UFO²'s own past successful task executions.
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `RAG_EXPERIENCE` | Boolean | `False` | Enable experience learning |
+| `RAG_EXPERIENCE_RETRIEVED_TOPK` | Integer | `5` | Number of past experiences to retrieve |
+| `EXPERIENCE_SAVED_PATH` | String | Auto-generated | Path to experience database |
+| `EXPERIENCE_PROMPT` | String | Auto-generated | Experience prompt template |
+
+**Example**:
+```yaml
+RAG_EXPERIENCE: True
+RAG_EXPERIENCE_RETRIEVED_TOPK: 5
+```
+
+!!!info "How It Works"
+ 1. UFO² completes a task successfully
+ 2. Task steps are saved to experience database
+ 3. For future similar tasks, relevant past experiences are retrieved
+ 4. LLM learns from successful patterns
+
+**Use Cases**:
+- Repetitive tasks with slight variations
+- Learning organization-specific workflows
+- Improving over time on common tasks
+
+**Example**:
+```
+First Time: "Create a monthly sales report"
+→ Task succeeds, 15 steps recorded
+
+Second Time: "Create a quarterly sales report"
+→ Retrieves "monthly report" experience
+→ Adapts the pattern, faster execution
+```
+
+**Default Paths**:
+```yaml
+# Auto-generated if not specified
+EXPERIENCE_SAVED_PATH: "vectordb/experience"
+EXPERIENCE_PROMPT: "ufo/prompts/share/experience/experience.yaml"
+```
+
+---
+
+### 4. Demonstration Learning
+
+Learn from user demonstrations (you show UFO² how to do a task).
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `RAG_DEMONSTRATION` | Boolean | `False` | Enable demonstration learning |
+| `RAG_DEMONSTRATION_RETRIEVED_TOPK` | Integer | `5` | Number of demonstrations to retrieve |
+| `DEMONSTRATION_SAVED_PATH` | String | Auto-generated | Path to demonstration database |
+| `DEMONSTRATION_PROMPT` | String | Auto-generated | Demonstration prompt template |
+
+**Example**:
+```yaml
+RAG_DEMONSTRATION: True
+RAG_DEMONSTRATION_RETRIEVED_TOPK: 5
+```
+
+!!!info "How It Works"
+ 1. User demonstrates a task (UFO² records it)
+ 2. Demonstration is saved with annotations
+ 3. For similar future tasks, demonstrations are retrieved
+ 4. LLM mimics the demonstrated behavior
+
+**Use Cases**:
+- Complex, domain-specific workflows
+- Organization-specific procedures
+- Tasks with many edge cases
+
+**Workflow**:
+```
+1. Record Demonstration:
+ python -m ufo --mode demonstration
+ → Perform task manually
+ → UFO² records your actions
+
+2. Save Demonstration:
+ → Stored in vectordb/demonstration/
+
+3. Future Task:
+ "Do the same report formatting"
+ → Retrieves your demonstration
+ → Replicates your steps
+```
+
+**Default Paths**:
+```yaml
+# Auto-generated if not specified
+DEMONSTRATION_SAVED_PATH: "vectordb/demonstration"
+DEMONSTRATION_PROMPT: "ufo/prompts/share/demonstration/demonstration.yaml"
+```
+
+---
+
+## Complete Configuration Examples
+
+### Minimal (No RAG)
+
+```yaml
+# config/ufo/rag.yaml
+RAG_OFFLINE_DOCS: False
+RAG_ONLINE_SEARCH: False
+RAG_EXPERIENCE: False
+RAG_DEMONSTRATION: False
+```
+
+### Online Search Only
+
+```yaml
+RAG_OFFLINE_DOCS: False
+
+RAG_ONLINE_SEARCH: True
+BING_API_KEY: "your-bing-api-key-here"
+RAG_ONLINE_SEARCH_TOPK: 5
+RAG_ONLINE_RETRIEVED_TOPK: 5
+
+RAG_EXPERIENCE: False
+RAG_DEMONSTRATION: False
+```
+
+### Experience Learning Only
+
+```yaml
+RAG_OFFLINE_DOCS: False
+RAG_ONLINE_SEARCH: False
+
+RAG_EXPERIENCE: True
+RAG_EXPERIENCE_RETRIEVED_TOPK: 5
+EXPERIENCE_SAVED_PATH: "vectordb/experience"
+EXPERIENCE_PROMPT: "ufo/prompts/share/experience/experience.yaml"
+
+RAG_DEMONSTRATION: False
+```
+
+### Full RAG Setup
+
+```yaml
+# Offline docs
+RAG_OFFLINE_DOCS: True
+RAG_OFFLINE_DOCS_RETRIEVED_TOPK: 1
+
+# Online search
+RAG_ONLINE_SEARCH: True
+BING_API_KEY: "your-bing-api-key"
+RAG_ONLINE_SEARCH_TOPK: 5
+RAG_ONLINE_RETRIEVED_TOPK: 5
+
+# Experience
+RAG_EXPERIENCE: True
+RAG_EXPERIENCE_RETRIEVED_TOPK: 5
+EXPERIENCE_SAVED_PATH: "vectordb/experience"
+EXPERIENCE_PROMPT: "ufo/prompts/share/experience/experience.yaml"
+
+# Demonstration
+RAG_DEMONSTRATION: True
+RAG_DEMONSTRATION_RETRIEVED_TOPK: 5
+DEMONSTRATION_SAVED_PATH: "vectordb/demonstration"
+DEMONSTRATION_PROMPT: "ufo/prompts/share/demonstration/demonstration.yaml"
+```
+
+---
+
+## Setting Up Each RAG Component
+
+### Setup: Offline Documentation
+
+**Step 1**: Prepare documentation
+```powershell
+# Place docs in vectordb/docs/
+New-Item -ItemType Directory -Path "vectordb\docs\excel" -Force
+Copy-Item "C:\path\to\excel_guide.pdf" "vectordb\docs\excel\"
+```
+
+**Step 2**: Index documents
+```powershell
+python -m learner --index-docs
+```
+
+**Step 3**: Enable in config
+```yaml
+RAG_OFFLINE_DOCS: True
+RAG_OFFLINE_DOCS_RETRIEVED_TOPK: 1
+```
+
+---
+
+### Setup: Online Search
+
+**Step 1**: Get Bing API key
+
+1. Go to [Azure Portal](https://portal.azure.com)
+2. Create resource → Search for "Bing Search v7"
+3. Create the resource
+4. Go to "Keys and Endpoint"
+5. Copy Key 1
+
+**Step 2**: Add to config
+```yaml
+RAG_ONLINE_SEARCH: True
+BING_API_KEY: "your-copied-key-here"
+RAG_ONLINE_SEARCH_TOPK: 5
+RAG_ONLINE_RETRIEVED_TOPK: 5
+```
+
+**Step 3**: Test
+```python
+from config.config_loader import get_ufo_config
+
+config = get_ufo_config()
+print(f"Bing search enabled: {config.rag.online_search}")
+print(f"API key set: {bool(config.rag.BING_API_KEY)}")
+```
+
+---
+
+### Setup: Experience Learning
+
+**Step 1**: Enable in config
+```yaml
+RAG_EXPERIENCE: True
+RAG_EXPERIENCE_RETRIEVED_TOPK: 5
+```
+
+**Step 2**: Run tasks normally
+```powershell
+python -m ufo --request "Create a sales report"
+```
+
+**Step 3**: Successful tasks are auto-saved
+```
+Experience saved to: vectordb/experience/
+```
+
+**Step 4**: Future tasks retrieve experiences
+```powershell
+# Similar task will use past experience
+python -m ufo --request "Create a quarterly report"
+```
+
+---
+
+### Setup: Demonstration Learning
+
+**Step 1**: Record demonstration
+```powershell
+python -m ufo --mode demonstration --task "format_monthly_report"
+```
+
+**Step 2**: Perform task manually
+- UFO² records your every action
+- Add annotations/comments
+
+**Step 3**: Save demonstration
+```
+Demonstration saved to: vectordb/demonstration/
+```
+
+**Step 4**: Enable in config
+```yaml
+RAG_DEMONSTRATION: True
+RAG_DEMONSTRATION_RETRIEVED_TOPK: 5
+```
+
+**Step 5**: Use demonstrations
+```powershell
+python -m ufo --request "Format the report like I showed you"
+```
+
+---
+
+## Programmatic Access
+
+```python
+from config.config_loader import get_ufo_config
+
+config = get_ufo_config()
+
+# Check RAG settings
+if config.rag.online_search:
+    print("Online search enabled")
+    print(f"Top K: {config.rag.online_search_topk}")
+
+if config.rag.experience:
+    print("Experience learning enabled")
+    print(f"Experience path: {config.rag.EXPERIENCE_SAVED_PATH}")
+
+if config.rag.offline_docs:
+    print("Offline docs enabled")
+
+# Access specific fields
+bing_key = config.rag.BING_API_KEY
+exp_topk = config.rag.experience_retrieved_topk
+```
+
+---
+
+## Performance Considerations
+
+### Impact on Speed
+
+| RAG Type | Speed Impact | When to Use |
+|----------|--------------|-------------|
+| **Offline Docs** | Low | Always (if indexed) |
+| **Online Search** | Medium | For unfamiliar tasks |
+| **Experience** | Low | Always (improves over time) |
+| **Demonstration** | Low | For specific workflows |
+
+### Impact on Cost
+
+| RAG Type | Cost Impact | Notes |
+|----------|-------------|-------|
+| **Offline Docs** | None | One-time indexing cost |
+| **Online Search** | Low | Bing API: ~$3/1000 queries |
+| **Experience** | None | Free storage |
+| **Demonstration** | None | Free storage |
+
+!!!tip "Recommended Configuration"
+ For most users:
+ ```yaml
+ RAG_ONLINE_SEARCH: True # Useful for general tasks
+ RAG_EXPERIENCE: True # Improves over time
+ RAG_OFFLINE_DOCS: False # Unless you have specific docs
+ RAG_DEMONSTRATION: False # Unless training specific workflows
+ ```
+
+---
+
+## Troubleshooting
+
+### Issue 1: Bing Search Not Working
+
+!!!bug "Error Message"
+ ```
+ BingSearchError: Invalid API key
+ ```
+
+ **Solutions**:
+ 1. Verify API key is correct
+ 2. Check key has not expired
+ 3. Ensure Bing Search v7 resource is active
+ 4. Check Azure subscription is active
+
+---
+
+### Issue 2: Experience Not Retrieved
+
+!!!bug "Symptom"
+ UFO² doesn't seem to learn from past tasks
+
+ **Solutions**:
+ 1. Check experience database exists:
+ ```powershell
+ Test-Path "vectordb\experience"
+ ```
+ 2. Verify tasks completed successfully
+ 3. Check similarity threshold (may be too strict)
+ 4. Increase `RAG_EXPERIENCE_RETRIEVED_TOPK`
+
+---
+
+### Issue 3: Offline Docs Not Indexed
+
+!!!bug "Error Message"
+ ```
+ No offline documents found
+ ```
+
+ **Solutions**:
+ 1. Run indexing:
+ ```powershell
+ python -m learner --index-docs
+ ```
+ 2. Check documents are in `vectordb/docs/`
+ 3. Verify supported formats (PDF, TXT, MD)
+
+---
+
+### Issue 4: Too Much Context
+
+!!!bug "Symptom"
+ Token limits exceeded, slow responses
+
+ **Solution**: Reduce Top-K values
+ ```yaml
+ RAG_ONLINE_RETRIEVED_TOPK: 3 # Instead of 5
+ RAG_EXPERIENCE_RETRIEVED_TOPK: 3
+ RAG_DEMONSTRATION_RETRIEVED_TOPK: 3
+ ```
+
+---
+
+## Best Practices
+
+### When to Enable Each Component
+
+| Scenario | Recommended RAG |
+|----------|----------------|
+| **General automation** | Online Search |
+| **Repetitive tasks** | Experience Learning |
+| **Domain-specific workflows** | Offline Docs + Demonstration |
+| **Learning over time** | Experience |
+| **New to UFO²** | Online Search only |
+| **Production deployment** | Experience + Offline Docs |
+
+### Top-K Selection
+
+| Field | Recommended Range | Notes |
+|-------|-------------------|-------|
+| `RAG_ONLINE_SEARCH_TOPK` | 3-10 | More = better context, slower |
+| `RAG_ONLINE_RETRIEVED_TOPK` | 3-5 | Balance quality vs tokens |
+| `RAG_EXPERIENCE_RETRIEVED_TOPK` | 3-5 | Most relevant experiences |
+| `RAG_DEMONSTRATION_RETRIEVED_TOPK` | 1-3 | Usually need few examples |
+| `RAG_OFFLINE_DOCS_RETRIEVED_TOPK` | 1-2 | Docs are usually long |
+
+---
+
+## Environment Variables
+
+Store API keys securely:
+
+```yaml
+# Use environment variable instead of hardcoded key
+BING_API_KEY: "${BING_API_KEY}"
+```
+
+**Set environment variable**:
+
+**Windows PowerShell:**
+```powershell
+$env:BING_API_KEY = "your-key-here"
+```
+
+**Windows (Persistent):**
+```powershell
+[System.Environment]::SetEnvironmentVariable('BING_API_KEY', 'your-key', 'User')
+```
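+
+To make the `${BING_API_KEY}` placeholder concrete, the sketch below shows one way such placeholders could be expanded after the YAML is parsed. It is an illustration under assumptions: `expand_env` is a hypothetical helper, and this section does not specify how the loader itself resolves placeholders.
+
+```python
+import os
+import re
+from typing import Any, Mapping
+
+# Matches ${NAME} where NAME is a typical environment variable name.
+_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")
+
+
+def expand_env(value: Any, env: Mapping[str, str] = os.environ) -> Any:
+    """Recursively replace ${NAME} placeholders in strings with values from `env`."""
+    if isinstance(value, str):
+        # Unknown variables are left untouched so the problem stays visible.
+        return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)
+    if isinstance(value, dict):
+        return {key: expand_env(item, env) for key, item in value.items()}
+    if isinstance(value, list):
+        return [expand_env(item, env) for item in value]
+    return value
+
+
+raw = {"RAG_ONLINE_SEARCH": True, "BING_API_KEY": "${BING_API_KEY}"}
+expanded = expand_env(raw, env={"BING_API_KEY": "example-key"})
+print(expanded)
+```
+
+Passing `env` explicitly keeps the function easy to test; in normal use the default `os.environ` would supply the values.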
+
+---
+
+## Related Documentation
+
+- **[Agent Configuration](agents_config.md)** - LLM settings
+- **[System Configuration](system_config.md)** - Runtime settings
+
+---
+
+## Summary
+
+!!!success "Key Takeaways"
+ ✅ **RAG is optional** - UFO² works without it
+ ✅ **Online Search** - Most useful for general tasks (needs Bing API key)
+ ✅ **Experience** - Free, improves over time automatically
+ ✅ **Offline Docs** - Great for domain-specific knowledge
+ ✅ **Demonstration** - Best for complex, specific workflows
+ ✅ **Start simple** - Enable Online Search first, add others as needed
+
+ **Enhance UFO² with knowledge retrieval!** 🧠
diff --git a/documents/docs/configuration/system/system_config.md b/documents/docs/configuration/system/system_config.md
new file mode 100644
index 000000000..08c092597
--- /dev/null
+++ b/documents/docs/configuration/system/system_config.md
@@ -0,0 +1,679 @@
+# System Configuration (system.yaml)
+
+Configure UFO²'s runtime behavior, execution limits, control backends, logging, and operational parameters. This file controls how UFO² interacts with the Windows environment.
+
+## Overview
+
+The `system.yaml` file defines runtime settings that control UFO²'s behavior during task execution. Unlike `agents.yaml` (which configures LLMs), this file configures **how** UFO² operates on Windows.
+
+**File Location**: `config/ufo/system.yaml`
+
+**Note:** Unlike `agents.yaml`, the `system.yaml` file is **already present** in the repository with sensible defaults. You can use it as-is or customize it for your needs.
+
+## Quick Configuration
+
+### Default Configuration (Works Out of Box)
+
+```yaml
+# Most users can use default settings
+MAX_STEP: 50
+MAX_ROUND: 1
+CONTROL_BACKEND: ["uia"]
+USE_MCP: True
+PRINT_LOG: False
+```
+
+### Recommended for Development
+
+```yaml
+# More verbose logging for debugging
+MAX_STEP: 50
+MAX_ROUND: 1
+PRINT_LOG: True
+LOG_LEVEL: "DEBUG"
+CONTROL_BACKEND: ["uia"]
+```
+
+### Recommended for Production
+
+```yaml
+# Optimized for reliability
+MAX_STEP: 100
+MAX_ROUND: 3
+CONTROL_BACKEND: ["uia"]
+USE_MCP: True
+SAFE_GUARD: True
+LOG_TO_MARKDOWN: True
+```
+
+## Configuration Categories
+
+The `system.yaml` file is organized into logical sections:
+
+| Category | Purpose | Key Fields |
+|----------|---------|------------|
+| **[LLM Parameters](#llm-parameters)** | API call settings | `MAX_TOKENS`, `TEMPERATURE`, `TIMEOUT` |
+| **[Execution Limits](#execution-limits)** | Task boundaries | `MAX_STEP`, `MAX_ROUND`, `SLEEP_TIME` |
+| **[Control Backend](#control-backend)** | UI detection methods | `CONTROL_BACKEND`, `IOU_THRESHOLD` |
+| **[Action Configuration](#action-configuration)** | Interaction behavior | `CLICK_API`, `INPUT_TEXT_API`, `MAXIMIZE_WINDOW` |
+| **[Logging](#logging)** | Output and debugging | `PRINT_LOG`, `LOG_LEVEL`, `LOG_XML` |
+| **[MCP Settings](#mcp-settings)** | Tool server integration | `USE_MCP`, `MCP_SERVERS_CONFIG` |
+| **[Safety](#safety)** | Security controls | `SAFE_GUARD`, `CONTROL_LIST` |
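+| **[Control Filtering](#control-filtering)** | UI element filtering | `CONTROL_FILTER_TYPE`, `CONTROL_FILTER_TOP_K_SEMANTIC` |
+| **[API Usage](#api-usage-configuration)** | Native API usage | `USE_APIS`, `API_PROMPT` |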
+
+## LLM Parameters
+
+These settings control how UFO² communicates with LLM APIs.
+
+### Fields
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `MAX_TOKENS` | Integer | `2000` | Maximum tokens for LLM response |
+| `MAX_RETRY` | Integer | `20` | Maximum retries for failed API calls |
+| `TEMPERATURE` | Float | `0.0` | Sampling temperature (0.0 = deterministic, 1.0 = creative) |
+| `TOP_P` | Float | `0.0` | Nucleus sampling threshold |
+| `TIMEOUT` | Integer | `60` | API call timeout (seconds) |
+
+### Example
+
+```yaml
+# Conservative settings (recommended)
+MAX_TOKENS: 2000
+MAX_RETRY: 20
+TEMPERATURE: 0.0 # Deterministic
+TOP_P: 0.0
+TIMEOUT: 60
+
+# Creative settings (experimental)
+# MAX_TOKENS: 4000
+# TEMPERATURE: 0.7 # More creative
+# TOP_P: 0.9
+```
+
+**When to Adjust:**
+
+- **Increase `MAX_TOKENS`** if responses are getting cut off
+- **Increase `TEMPERATURE`** only if you want more varied responses (not recommended for automation)
+- **Keep `TEMPERATURE` at 0.0** for consistent, repeatable automation
+- **Increase `TIMEOUT`** for slow API connections
+
+## Execution Limits
+
+Control how many steps and rounds UFO² may spend on a task, and how long it waits between steps.
+
+### Fields
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `MAX_STEP` | Integer | `50` | Maximum steps per task |
+| `MAX_ROUND` | Integer | `1` | Maximum rounds per task (retries from start) |
+| `SLEEP_TIME` | Integer | `1` | Wait time between steps (seconds) |
+| `RECTANGLE_TIME` | Integer | `1` | Duration to show visual highlights (seconds) |
+
+### Example
+
+```yaml
+# Default settings
+MAX_STEP: 50
+MAX_ROUND: 1
+SLEEP_TIME: 1
+RECTANGLE_TIME: 1
+
+# For complex tasks
+# MAX_STEP: 100
+# MAX_ROUND: 3
+
+# For faster execution (risky)
+# SLEEP_TIME: 0
+```
+
+**Note on Step vs Round:**
+
+- **STEP**: Individual action (click, type, etc.)
+- **ROUND**: Complete task attempt from start
+
+Example: If `MAX_ROUND: 3`, UFO² will retry the entire task up to 3 times if it fails.
+
+## Control Backend
+
+Configure how UFO² detects and interacts with UI elements.
+
+### Fields
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `CONTROL_BACKEND` | List[String] | `["uia"]` | UI detection backends to use |
+| `IOU_THRESHOLD_FOR_MERGE` | Float | `0.1` | IoU threshold for merging overlapping controls |
+
+### Available Backends
+
+| Backend | Description | Pros | Cons |
+|---------|-------------|------|------|
+| `"uia"` | UI Automation | Fast, reliable, Windows native | May miss some controls |
+| `"omniparser"` | Vision-based | Finds visual-only elements | Requires GPU, slow |
+
+**Note:** `win32` backend is no longer supported.
+
+### Example
+
+```yaml
+# Recommended: Use UIA (default)
+CONTROL_BACKEND: ["uia"]
+IOU_THRESHOLD_FOR_MERGE: 0.1
+
+# With vision-based parsing (slow)
+# CONTROL_BACKEND: ["uia", "omniparser"]
+```
+
+**Best Practice:** Use `["uia"]` as the default backend. Add `"omniparser"` only if you need vision-based control detection.
+
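To make the role of `IOU_THRESHOLD_FOR_MERGE` more concrete, here is a minimal sketch of how controls from two backends could be merged by intersection-over-union (IoU). It is an illustration under assumptions, not UFO²'s actual implementation: the `Box` layout and the names `iou` and `merge_controls` are hypothetical, and the policy of dropping a vision-detected box once it overlaps any UIA box by more than the threshold is assumed.

```python
from typing import List, Tuple

# Hypothetical box layout: (left, top, right, bottom) in screen pixels.
Box = Tuple[int, int, int, int]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned rectangles."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_controls(primary: List[Box], secondary: List[Box],
                   threshold: float = 0.1) -> List[Box]:
    """Keep every primary box; add a secondary box only if it does not
    overlap any primary box by more than `threshold` (assumed policy)."""
    merged = list(primary)
    for box in secondary:
        if all(iou(box, kept) <= threshold for kept in primary):
            merged.append(box)
    return merged


uia_boxes = [(0, 0, 100, 50)]
vision_boxes = [(5, 5, 95, 45), (200, 200, 260, 230)]
merged_boxes = merge_controls(uia_boxes, vision_boxes, threshold=0.1)
```

In this sketch, the vision-detected box that mostly coincides with the UIA control is treated as a duplicate, while the box in an area UIA did not cover is kept.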
+## Action Configuration
+
+Configure how UFO² performs actions on UI elements.
+
+### Core Action Settings
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `ACTION_SEQUENCE` | Boolean | `False` | Enable multi-action sequences in one step |
+| `SHOW_VISUAL_OUTLINE_ON_SCREEN` | Boolean | `False` | Show visual highlights during execution |
+| `MAXIMIZE_WINDOW` | Boolean | `False` | Maximize application windows before actions |
+| `JSON_PARSING_RETRY` | Integer | `3` | Retries for parsing LLM JSON responses |
+
+### Click Settings
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `CLICK_API` | String | `"click_input"` | Click method to use |
+| `AFTER_CLICK_WAIT` | Integer | `0` | Wait time after clicking (seconds) |
+
+### Input Settings
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `INPUT_TEXT_API` | String | `"type_keys"` | Text input method |
+| `INPUT_TEXT_ENTER` | Boolean | `False` | Press Enter after typing |
+| `INPUT_TEXT_INTER_KEY_PAUSE` | Float | `0.05` | Pause between keystrokes (seconds) |
+
+### Example
+
+```yaml
+# Recommended settings
+ACTION_SEQUENCE: True # Enable multi-action for speed
+SHOW_VISUAL_OUTLINE_ON_SCREEN: False
+MAXIMIZE_WINDOW: False
+JSON_PARSING_RETRY: 3
+
+CLICK_API: "click_input"
+AFTER_CLICK_WAIT: 0
+
+INPUT_TEXT_API: "type_keys"
+INPUT_TEXT_ENTER: False
+INPUT_TEXT_INTER_KEY_PAUSE: 0.05
+
+# For visual debugging
+# SHOW_VISUAL_OUTLINE_ON_SCREEN: True
+
+# If clicks are too fast
+# AFTER_CLICK_WAIT: 1
+
+# For automation that needs Enter key
+# INPUT_TEXT_ENTER: True
+```
+
+!!!info "Input Methods"
+    - **`type_keys`**: Simulates keyboard input (slower, more realistic)
+    - **`set_text`**: Direct text insertion (faster, may not trigger events)
+
+---
+
+## Logging
+
+Control UFO²'s logging output and debugging information.
+
+### Fields
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `PRINT_LOG` | Boolean | `False` | Print logs to console |
+| `LOG_LEVEL` | String | `"DEBUG"` | Logging verbosity level |
+| `LOG_TO_MARKDOWN` | Boolean | `True` | Save logs as Markdown files |
+| `LOG_XML` | Boolean | `False` | Log UI tree XML at each step |
+| `CONCAT_SCREENSHOT` | Boolean | `False` | Concatenate control screenshots |
+| `INCLUDE_LAST_SCREENSHOT` | Boolean | `True` | Include previous screenshot in context |
+| `SCREENSHOT_TO_MEMORY` | Boolean | `True` | Load screenshots into memory |
+| `REQUEST_TIMEOUT` | Integer | `250` | Request timeout for vision models (seconds) |
+
+### Log Levels
+
+| Level | Usage | When to Use |
+|-------|-------|-------------|
+| `"DEBUG"` | Detailed debugging info | Development, troubleshooting |
+| `"INFO"` | General information | Normal operation |
+| `"WARNING"` | Warning messages | Production |
+| `"ERROR"` | Errors only | Production (minimal logs) |
+
+### Example
+
+```yaml
+# Development settings
+PRINT_LOG: True
+LOG_LEVEL: "DEBUG"
+LOG_TO_MARKDOWN: True
+LOG_XML: True # Useful for debugging UI detection
+
+# Production settings
+# PRINT_LOG: False
+# LOG_LEVEL: "WARNING"
+# LOG_TO_MARKDOWN: True
+# LOG_XML: False
+
+# Memory optimization
+# SCREENSHOT_TO_MEMORY: False
+```
+
+!!!tip "Log Files Location"
+    Logs are saved under the `logs/` directory.
+
+---
+
+## MCP Settings
+
+Configure Model Context Protocol (MCP) tool servers.
+
+### Fields
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `USE_MCP` | Boolean | `True` | Enable MCP tool integration |
+| `MCP_SERVERS_CONFIG` | String | `"config/ufo/mcp.yaml"` | Path to MCP servers config |
+| `MCP_PREFERRED_APPS` | List[String] | `[]` | Apps that prefer MCP over UI automation |
+| `MCP_FALLBACK_TO_UI` | Boolean | `True` | Fall back to UI if MCP fails |
+| `MCP_INSTRUCTIONS_PATH` | String | `"ufo/config/mcp_instructions"` | MCP instruction templates path |
+| `MCP_TOOL_TIMEOUT` | Integer | `30` | MCP tool execution timeout (seconds) |
+| `MCP_LOG_EXECUTION` | Boolean | `False` | Log detailed MCP execution |
+
+### Example
+
+```yaml
+# Recommended settings
+USE_MCP: True
+MCP_SERVERS_CONFIG: "config/ufo/mcp.yaml"
+MCP_FALLBACK_TO_UI: True
+MCP_TOOL_TIMEOUT: 30
+MCP_LOG_EXECUTION: False
+
+# Prefer MCP for VS Code and Terminal
+MCP_PREFERRED_APPS:
+ - "Code.exe"
+ - "WindowsTerminal.exe"
+
+# Debugging MCP issues
+# MCP_LOG_EXECUTION: True
+# MCP_TOOL_TIMEOUT: 60
+```
+
+!!!info "What is MCP?"
+    MCP (Model Context Protocol) provides programmatic APIs for applications, offering more reliable automation than UI-based control.
+
+    See [MCP Configuration](mcp_reference.md) for details.
+
+---
+
+## Safety
+
+Security and safety controls to prevent dangerous operations.
+
+### Fields
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `SAFE_GUARD` | Boolean | `False` | Enable safety checks |
+| `CONTROL_LIST` | List[String] | See below | Allowed UI control types |
+
+### Default CONTROL_LIST
+
+```yaml
+CONTROL_LIST:
+ - "Button"
+ - "Edit"
+ - "TabItem"
+ - "Document"
+ - "ListItem"
+ - "MenuItem"
+ - "ScrollBar"
+ - "TreeItem"
+ - "Hyperlink"
+ - "ComboBox"
+ - "RadioButton"
+ - "Spinner"
+ - "CheckBox"
+ - "Group"
+ - "Text"
+```
+
+### Example
+
+```yaml
+# Enable safety for production
+SAFE_GUARD: True
+CONTROL_LIST:
+ - "Button"
+ - "Edit"
+ - "TabItem"
+ # Add only safe control types
+
+# Disable for full automation (risky)
+# SAFE_GUARD: False
+```
+
+!!!danger "Safety Warning"
+    When `SAFE_GUARD: True`, UFO² will only interact with control types in `CONTROL_LIST`. This prevents accidental dangerous operations but may limit functionality.
+
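The allow-list behavior described above can be sketched as a simple filter. This is a hedged illustration, not the code UFO² runs: the function name `allowed_controls` and the `control_type` dictionary key are assumptions made for the example.

```python
from typing import Dict, Iterable, List


def allowed_controls(controls: Iterable[Dict[str, str]],
                     control_list: List[str],
                     safe_guard: bool) -> List[Dict[str, str]]:
    """Return the controls the agent may interact with.

    With safe_guard off, everything passes through; with it on, only
    controls whose type appears in control_list are kept.
    """
    if not safe_guard:
        return list(controls)
    allowed = set(control_list)
    return [c for c in controls if c["control_type"] in allowed]


# Hypothetical detected controls; "DataItem" is not in the allow-list.
detected = [
    {"name": "OK", "control_type": "Button"},
    {"name": "Grid cell", "control_type": "DataItem"},
]
restricted = allowed_controls(detected, ["Button", "Edit", "TabItem"], safe_guard=True)
unrestricted = allowed_controls(detected, ["Button", "Edit", "TabItem"], safe_guard=False)
```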
+---
+
+## Control Filtering
+
+Advanced UI element filtering using semantic and icon similarity.
+
+### Fields
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `CONTROL_FILTER_TYPE` | List[String] | `[]` | Filter types to enable |
+| `CONTROL_FILTER_TOP_K_PLAN` | Integer | `2` | Top K plans to consider |
+| `CONTROL_FILTER_TOP_K_SEMANTIC` | Integer | `15` | Top K controls by text similarity |
+| `CONTROL_FILTER_TOP_K_ICON` | Integer | `15` | Top K controls by icon similarity |
+| `CONTROL_FILTER_MODEL_SEMANTIC_NAME` | String | `"all-MiniLM-L6-v2"` | Semantic embedding model |
+| `CONTROL_FILTER_MODEL_ICON_NAME` | String | `"clip-ViT-B-32"` | Icon embedding model |
+
+### Filter Types
+
+| Type | Description | Use Case |
+|------|-------------|----------|
+| `"TEXT"` | Text-based filtering | Filter by control labels |
+| `"SEMANTIC"` | Semantic similarity | Find similar controls by meaning |
+| `"ICON"` | Icon similarity | Find controls by icon appearance |
+
+### Example
+
+```yaml
+# Disable filtering (use all controls)
+CONTROL_FILTER_TYPE: []
+
+# Enable semantic filtering (when too many controls are detected)
+# CONTROL_FILTER_TYPE: ["SEMANTIC"]
+# CONTROL_FILTER_TOP_K_SEMANTIC: 15
+# CONTROL_FILTER_MODEL_SEMANTIC_NAME: "all-MiniLM-L6-v2"
+
+# Enable all filtering (most selective)
+# CONTROL_FILTER_TYPE: ["TEXT", "SEMANTIC", "ICON"]
+# CONTROL_FILTER_TOP_K_SEMANTIC: 20
+# CONTROL_FILTER_TOP_K_ICON: 20
+```
+
+!!!warning "Performance Impact"
+    - Filtering reduces the number of controls sent to the LLM (faster, cheaper)
+    - But it may filter out the target control (less reliable)
+    - Start without filtering, and add it only if you have too many controls
+
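As a rough illustration of what a top-K semantic filter does, the sketch below ranks controls by cosine similarity to a plan embedding and keeps the best `k`. It is an assumption-laden toy, not UFO²'s filter: the vectors are hand-written stand-ins for what an embedding model such as `all-MiniLM-L6-v2` would produce, and `cosine` and `top_k_controls` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_controls(plan_vec: Sequence[float],
                   control_vecs: Dict[str, Sequence[float]],
                   k: int) -> List[str]:
    """Names of the k controls most similar to the plan, best first."""
    ranked = sorted(control_vecs,
                    key=lambda name: cosine(plan_vec, control_vecs[name]),
                    reverse=True)
    return ranked[:k]


# Toy 2-D "embeddings"; real ones would come from an embedding model.
plan = [1.0, 0.0]
controls = {"Save": [0.9, 0.1], "Cancel": [0.0, 1.0], "Save As": [0.7, 0.3]}
kept = top_k_controls(plan, controls, k=2)
```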
+---
+
+## API Usage Configuration
+
+Configure native API usage for Office applications.
+
+### Fields
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `USE_APIS` | Boolean | `True` | Enable COM API usage for Office applications |
+| `API_PROMPT` | String | `"ufo/prompts/share/base/api.yaml"` | API prompt template |
+| `APP_API_PROMPT_ADDRESS` | Dict | See below | App-specific API prompts |
+
+### Default APP_API_PROMPT_ADDRESS
+
+```yaml
+APP_API_PROMPT_ADDRESS:
+ "WINWORD.EXE": "ufo/prompts/apps/word/api.yaml"
+ "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml"
+ "msedge.exe": "ufo/prompts/apps/web/api.yaml"
+ "chrome.exe": "ufo/prompts/apps/web/api.yaml"
+ "POWERPNT.EXE": "ufo/prompts/apps/powerpoint/api.yaml"
+```
+
+### Example
+
+```yaml
+# Enable API usage (recommended for Office)
+USE_APIS: True
+API_PROMPT: "ufo/prompts/share/base/api.yaml"
+APP_API_PROMPT_ADDRESS:
+ "WINWORD.EXE": "ufo/prompts/apps/word/api.yaml"
+ "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml"
+
+# Disable for pure UI automation
+# USE_APIS: False
+```
+
+!!!tip "When to Use APIs"
+    COM APIs are faster and more reliable for Office applications. Keep `USE_APIS: True` for best results with Word, Excel, and PowerPoint.
+
+---
+
+## Complete Example Configuration
+
+Here's a complete, production-ready `system.yaml`:
+
+```yaml
+# LLM Parameters
+MAX_TOKENS: 2000
+MAX_RETRY: 20
+TEMPERATURE: 0.0
+TOP_P: 0.0
+TIMEOUT: 60
+
+# Execution Limits
+MAX_STEP: 100
+MAX_ROUND: 3
+SLEEP_TIME: 1
+RECTANGLE_TIME: 1
+
+# Control Backend
+CONTROL_BACKEND: ["uia"]
+IOU_THRESHOLD_FOR_MERGE: 0.1
+
+# Action Configuration
+ACTION_SEQUENCE: True
+SHOW_VISUAL_OUTLINE_ON_SCREEN: False
+MAXIMIZE_WINDOW: False
+JSON_PARSING_RETRY: 3
+
+CLICK_API: "click_input"
+AFTER_CLICK_WAIT: 0
+
+INPUT_TEXT_API: "type_keys"
+INPUT_TEXT_ENTER: False
+INPUT_TEXT_INTER_KEY_PAUSE: 0.05
+
+# Logging
+PRINT_LOG: False
+LOG_LEVEL: "INFO"
+LOG_TO_MARKDOWN: True
+LOG_XML: False
+CONCAT_SCREENSHOT: False
+INCLUDE_LAST_SCREENSHOT: True
+SCREENSHOT_TO_MEMORY: True
+REQUEST_TIMEOUT: 250
+
+# MCP Settings
+USE_MCP: True
+MCP_SERVERS_CONFIG: "config/ufo/mcp.yaml"
+MCP_PREFERRED_APPS:
+ - "Code.exe"
+ - "WindowsTerminal.exe"
+MCP_FALLBACK_TO_UI: True
+MCP_TOOL_TIMEOUT: 30
+MCP_LOG_EXECUTION: False
+
+# Safety
+SAFE_GUARD: True
+CONTROL_LIST:
+ - "Button"
+ - "Edit"
+ - "TabItem"
+ - "Document"
+ - "ListItem"
+ - "MenuItem"
+ - "ScrollBar"
+ - "TreeItem"
+ - "Hyperlink"
+ - "ComboBox"
+ - "RadioButton"
+
+# API Usage
+USE_APIS: True
+API_PROMPT: "ufo/prompts/share/base/api.yaml"
+APP_API_PROMPT_ADDRESS:
+ "WINWORD.EXE": "ufo/prompts/apps/word/api.yaml"
+ "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml"
+ "msedge.exe": "ufo/prompts/apps/web/api.yaml"
+
+# Control Filtering (disabled by default)
+CONTROL_FILTER_TYPE: []
+CONTROL_FILTER_TOP_K_PLAN: 2
+CONTROL_FILTER_TOP_K_SEMANTIC: 15
+CONTROL_FILTER_TOP_K_ICON: 15
+CONTROL_FILTER_MODEL_SEMANTIC_NAME: "all-MiniLM-L6-v2"
+CONTROL_FILTER_MODEL_ICON_NAME: "clip-ViT-B-32"
+```
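Before running with a customized file, it can help to sanity-check a few values against the options documented on this page. The sketch below works on an already-loaded settings dictionary. `check_system_settings` is a hypothetical helper, not part of UFO², and it only covers the handful of fields whose allowed values are listed above.

```python
from typing import Any, Dict, List

# Values taken from the tables on this page.
KNOWN_BACKENDS = {"uia", "omniparser"}
KNOWN_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def check_system_settings(settings: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a settings dictionary."""
    problems: List[str] = []
    for key in ("MAX_STEP", "MAX_ROUND"):
        value = settings.get(key)
        if value is None:
            continue  # field not set, nothing to check
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            problems.append(f"{key} should be a positive integer, got {value!r}")
    unknown = [b for b in settings.get("CONTROL_BACKEND", [])
               if b not in KNOWN_BACKENDS]
    if unknown:
        problems.append(f"CONTROL_BACKEND has unsupported entries: {unknown}")
    level = settings.get("LOG_LEVEL")
    if level is not None and level not in KNOWN_LOG_LEVELS:
        problems.append(f"LOG_LEVEL {level!r} is not a documented level")
    return problems


good = check_system_settings(
    {"MAX_STEP": 100, "MAX_ROUND": 3, "CONTROL_BACKEND": ["uia"], "LOG_LEVEL": "INFO"}
)
bad = check_system_settings({"MAX_STEP": 0, "CONTROL_BACKEND": ["uia", "win32"]})
```

Here `good` is empty, while `bad` reports the non-positive `MAX_STEP` and the unsupported `win32` backend.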
+
+---
+
+## Programmatic Access
+
+```python
+from config.config_loader import get_ufo_config
+
+config = get_ufo_config()
+
+# Access system settings
+max_step = config.system.max_step
+log_level = config.system.log_level
+control_backends = config.system.control_backend
+
+# Check MCP settings
+if config.system.use_mcp:
+    mcp_config_path = config.system.mcp_servers_config
+    print(f"MCP enabled, config: {mcp_config_path}")
+
+# Modify at runtime (not recommended)
+# config.system.max_step = 200
+```
+
+---
+
+## Troubleshooting
+
+### Issue 1: Tasks Failing After X Steps
+
+!!!bug "Error Message"
+    ```
+    Task stopped: Maximum steps (50) reached
+    ```
+
+    **Solution**: Increase `MAX_STEP`
+    ```yaml
+    MAX_STEP: 100  # or higher
+    ```
+
+### Issue 2: Controls Not Detected
+
+**Symptom:** UFO² can't find UI elements
+
+**Solutions:**
+1. Try enabling omniparser for vision-based detection:
+ ```yaml
+ CONTROL_BACKEND: ["uia", "omniparser"]
+ ```
+2. Disable filtering:
+ ```yaml
+ CONTROL_FILTER_TYPE: []
+ ```
+
+### Issue 3: Actions Too Fast
+
+**Symptom:** Actions execute before UI is ready
+
+**Solution:** Add delays
+```yaml
+SLEEP_TIME: 2
+AFTER_CLICK_WAIT: 1
+```
+
+### Issue 4: Logs Too Verbose
+
+**Symptom:** Too much console output
+
+**Solution:** Reduce logging
+```yaml
+PRINT_LOG: False
+LOG_LEVEL: "WARNING"
+```
+
+---
+
+## Performance Tuning
+
+### For Speed
+
+```yaml
+MAX_STEP: 50
+SLEEP_TIME: 0
+CONTROL_BACKEND: ["uia"]
+CONTROL_FILTER_TYPE: ["SEMANTIC"] # Reduce LLM input
+ACTION_SEQUENCE: True # Multi-action in one step
+```
+
+### For Reliability
+
+```yaml
+MAX_STEP: 100
+MAX_ROUND: 3
+SLEEP_TIME: 2
+AFTER_CLICK_WAIT: 1
+CONTROL_BACKEND: ["uia"]
+CONTROL_FILTER_TYPE: [] # Don't filter out controls
+```
+
+### For Debugging
+
+```yaml
+PRINT_LOG: True
+LOG_LEVEL: "DEBUG"
+LOG_XML: True
+SHOW_VISUAL_OUTLINE_ON_SCREEN: True
+MCP_LOG_EXECUTION: True
+```
+
+---
+
+## Related Documentation
+
+- **[Agent Configuration](agents_config.md)** - LLM and API settings
+- **[MCP Configuration](mcp_reference.md)** - Tool server configuration
+- **[RAG Configuration](rag_config.md)** - Knowledge retrieval
+
+## Summary
+
+**Key Takeaways:**
+
+- ✅ **Default settings work** - Start with defaults, adjust as needed
+- ✅ **Increase MAX_STEP** for complex tasks
+- ✅ **Use ["uia"]** for control detection
+- ✅ **Enable ACTION_SEQUENCE** for faster execution
+- ✅ **Adjust logging** based on dev vs production
+- ✅ **Enable MCP** for better Office automation
+
+**Fine-tune system settings for optimal performance!** ⚙️
diff --git a/documents/docs/configuration/system/third_party_config.md b/documents/docs/configuration/system/third_party_config.md
new file mode 100644
index 000000000..014d03946
--- /dev/null
+++ b/documents/docs/configuration/system/third_party_config.md
@@ -0,0 +1,389 @@
+# Third-Party Agent Configuration (third_party.yaml)
+
+Configure third-party agents that extend UFO²'s capabilities beyond Windows GUI automation, such as LinuxAgent for CLI operations and HardwareAgent for physical device control.
+
+---
+
+## Overview
+
+The `third_party.yaml` file configures external agents that integrate with UFO² to provide specialized capabilities. These agents work alongside the standard HostAgent and AppAgent to handle tasks that require non-GUI interactions.
+
+**File Location**: `config/ufo/third_party.yaml`
+
+**Advanced Feature:** Third-party agent configuration is optional. Most users only need the core agents (HostAgent, AppAgent). Configure third-party agents only when you need specialized capabilities.
+
+!!!tip "Creating Custom Third-Party Agents"
+    Want to build your own third-party agent? See the **[Creating Custom Third-Party Agents Tutorial](../../tutorials/creating_third_party_agents.md)** for a complete step-by-step guide using HardwareAgent as an example.
+
+---
+
+## Quick Start
+
+### Default Configuration
+
+```yaml
+# Enable third-party agents
+ENABLED_THIRD_PARTY_AGENTS: ["LinuxAgent"]
+
+THIRD_PARTY_AGENT_CONFIG:
+  LinuxAgent:
+    AGENT_NAME: "LinuxAgent"
+    APPAGENT_PROMPT: "ufo/prompts/third_party/linux_agent.yaml"
+    APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/third_party/linux_agent_example.yaml"
+    INTRODUCTION: "For Linux Use Only."
+```
+
+### Disable All Third-Party Agents
+
+```yaml
+# Disable all third-party agents
+ENABLED_THIRD_PARTY_AGENTS: []
+```
+
+---
+
+## Available Third-Party Agents
+
+### LinuxAgent
+
+**Purpose**: Execute Linux CLI commands and server operations.
+
+!!!info "UFO³ Integration"
+    LinuxAgent is used by **UFO³ Galaxy** to orchestrate Linux devices as sub-agents in multi-device workflows. When Galaxy routes a task to a Linux device, it uses LinuxAgent to execute commands via CLI.
+
+**Configuration**:
+```yaml
+THIRD_PARTY_AGENT_CONFIG:
+  LinuxAgent:
+    AGENT_NAME: "LinuxAgent"
+    APPAGENT_PROMPT: "ufo/prompts/third_party/linux_agent.yaml"
+    APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/third_party/linux_agent_example.yaml"
+    INTRODUCTION: "For Linux Use Only."
+```
+
+**Fields**:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `AGENT_NAME` | String | Agent identifier (must be "LinuxAgent") |
+| `APPAGENT_PROMPT` | String | Path to main prompt template |
+| `APPAGENT_EXAMPLE_PROMPT` | String | Path to example prompt template |
+| `INTRODUCTION` | String | Agent description for LLM context |
+
+**When to Enable**:
+- ✅ Using UFO³ Galaxy with Linux devices
+- ✅ Need to execute Linux CLI commands
+- ✅ Managing Linux servers from Windows
+- ✅ Cross-platform automation workflows
+
+**Related Documentation**:
+- [Linux Agent as Galaxy Device](../../linux/as_galaxy_device.md)
+- [Linux Agent Quick Start](../../getting_started/quick_start_linux.md)
+
+---
+
+### HardwareAgent
+
+**Purpose**: Control physical hardware components (robotic arms, USB devices, etc.).
+
+!!!warning "Experimental Feature"
+    HardwareAgent is an experimental feature for controlling physical hardware. It requires a specialized hardware setup and is not commonly used.
+
+**Configuration**:
+```yaml
+THIRD_PARTY_AGENT_CONFIG:
+  HardwareAgent:
+    VISUAL_MODE: True
+    AGENT_NAME: "HardwareAgent"
+    APPAGENT_PROMPT: "ufo/prompts/share/base/app_agent.yaml"
+    APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/examples/visual/app_agent_example.yaml"
+    API_PROMPT: "ufo/prompts/third_party/hardware_agent_api.yaml"
+    INTRODUCTION: "The HardwareAgent is used to manipulate hardware components of the computer without using GUI, such as robotic arms for keyboard input and mouse control, plug and unplug devices such as USB drives, and other hardware-related tasks."
+```
+
+**Fields**:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `VISUAL_MODE` | Boolean | Enable visual mode (screenshot understanding) |
+| `AGENT_NAME` | String | Agent identifier (must be "HardwareAgent") |
+| `APPAGENT_PROMPT` | String | Path to main prompt template |
+| `APPAGENT_EXAMPLE_PROMPT` | String | Path to example prompt template |
+| `API_PROMPT` | String | Path to hardware API prompt template |
+| `INTRODUCTION` | String | Agent description for LLM context |
+
+**When to Enable**:
+- ✅ Using robotic arms for physical input
+- ✅ Automated USB device management
+- ✅ Physical hardware testing/automation
+- ✅ Research projects with hardware control
+
+**Related Documentation**:
+- [Creating Custom Third-Party Agents](../../tutorials/creating_third_party_agents.md) - Tutorial using HardwareAgent as example
+
+---
+
+## Configuration Fields
+
+### ENABLED_THIRD_PARTY_AGENTS
+
+**Type**: `List[String]`
+**Default**: `[]` (no third-party agents enabled)
+
+List of third-party agent names to enable. Only agents listed here will be loaded and available.
+
+**Options**:
+- `"LinuxAgent"` - Linux CLI execution
+- `"HardwareAgent"` - Physical hardware control
+
+**Examples**:
+```yaml
+# Enable LinuxAgent only (recommended for UFO³)
+ENABLED_THIRD_PARTY_AGENTS: ["LinuxAgent"]
+
+# Enable both agents
+ENABLED_THIRD_PARTY_AGENTS: ["LinuxAgent", "HardwareAgent"]
+
+# Disable all third-party agents
+ENABLED_THIRD_PARTY_AGENTS: []
+```
+
+### THIRD_PARTY_AGENT_CONFIG
+
+**Type**: `Dict[String, Dict]`
+
+Configuration dictionary for each third-party agent. Each agent has its own configuration block.
+
+**Structure**:
+```yaml
+THIRD_PARTY_AGENT_CONFIG:
+  AgentName:
+    AGENT_NAME: "AgentName"
+    # Agent-specific fields...
+```
+
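Because an agent must be both listed in `ENABLED_THIRD_PARTY_AGENTS` and described in `THIRD_PARTY_AGENT_CONFIG`, a small consistency check can catch mismatches early. The sketch below is illustrative only: `find_agent_config_issues` is a hypothetical helper operating on plain Python values, and the set of required fields is assumed from the field tables on this page.

```python
from typing import Any, Dict, List

# Fields that both documented agents share (assumed to be required).
REQUIRED_FIELDS = ("AGENT_NAME", "APPAGENT_PROMPT",
                   "APPAGENT_EXAMPLE_PROMPT", "INTRODUCTION")


def find_agent_config_issues(enabled: List[str],
                             agent_config: Dict[str, Dict[str, Any]]) -> List[str]:
    """List problems that would keep an enabled agent from being usable."""
    issues: List[str] = []
    for name in enabled:
        block = agent_config.get(name)
        if block is None:
            issues.append(f"{name}: enabled but has no THIRD_PARTY_AGENT_CONFIG block")
            continue
        for field in REQUIRED_FIELDS:
            if field not in block:
                issues.append(f"{name}: missing field {field}")
        # AGENT_NAME is documented as having to match the agent itself.
        if block.get("AGENT_NAME", name) != name:
            issues.append(f"{name}: AGENT_NAME is {block['AGENT_NAME']!r}")
    return issues


linux_block = {
    "AGENT_NAME": "LinuxAgent",
    "APPAGENT_PROMPT": "ufo/prompts/third_party/linux_agent.yaml",
    "APPAGENT_EXAMPLE_PROMPT": "ufo/prompts/third_party/linux_agent_example.yaml",
    "INTRODUCTION": "For Linux Use Only.",
}
issues = find_agent_config_issues(["LinuxAgent", "HardwareAgent"],
                                  {"LinuxAgent": linux_block})
```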
+---
+
+## Complete Configuration Example
+
+### For UFO³ Galaxy (Recommended)
+
+```yaml
+# Enable LinuxAgent for UFO³ Galaxy multi-device orchestration
+ENABLED_THIRD_PARTY_AGENTS: ["LinuxAgent"]
+
+THIRD_PARTY_AGENT_CONFIG:
+  LinuxAgent:
+    AGENT_NAME: "LinuxAgent"
+    APPAGENT_PROMPT: "ufo/prompts/third_party/linux_agent.yaml"
+    APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/third_party/linux_agent_example.yaml"
+    INTRODUCTION: "For Linux Use Only."
+```
+
+### With Hardware Support
+
+```yaml
+# Enable both Linux and Hardware agents
+ENABLED_THIRD_PARTY_AGENTS: ["LinuxAgent", "HardwareAgent"]
+
+THIRD_PARTY_AGENT_CONFIG:
+  LinuxAgent:
+    AGENT_NAME: "LinuxAgent"
+    APPAGENT_PROMPT: "ufo/prompts/third_party/linux_agent.yaml"
+    APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/third_party/linux_agent_example.yaml"
+    INTRODUCTION: "For Linux Use Only."
+
+  HardwareAgent:
+    VISUAL_MODE: True
+    AGENT_NAME: "HardwareAgent"
+    APPAGENT_PROMPT: "ufo/prompts/share/base/app_agent.yaml"
+    APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/examples/visual/app_agent_example.yaml"
+    API_PROMPT: "ufo/prompts/third_party/hardware_agent_api.yaml"
+    INTRODUCTION: "The HardwareAgent is used to manipulate hardware components of the computer without using GUI, such as robotic arms for keyboard input and mouse control, plug and unplug devices such as USB drives, and other hardware-related tasks."
+```
+
+### Minimal (No Third-Party Agents)
+
+```yaml
+# Disable all third-party agents (default for standalone UFO²)
+ENABLED_THIRD_PARTY_AGENTS: []
+```
+
+---
+
+## UFO³ Galaxy Integration
+
+When using UFO³ Galaxy for multi-device orchestration, LinuxAgent must be enabled to support Linux devices.
+
+### Setup for Galaxy
+
+**Step 1**: Enable LinuxAgent in `config/ufo/third_party.yaml`
+
+```yaml
+ENABLED_THIRD_PARTY_AGENTS: ["LinuxAgent"]
+
+THIRD_PARTY_AGENT_CONFIG:
+  LinuxAgent:
+    AGENT_NAME: "LinuxAgent"
+    APPAGENT_PROMPT: "ufo/prompts/third_party/linux_agent.yaml"
+    APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/third_party/linux_agent_example.yaml"
+    INTRODUCTION: "For Linux Use Only."
+```
+
+**Step 2**: Configure Linux devices in `config/galaxy/devices.yaml`
+
+```yaml
+devices:
+  - device_id: "linux_server_1"
+    server_url: "ws://192.168.1.100:5001/ws"
+    os: "linux"
+    capabilities:
+      - "server"
+      - "cli"
+      - "database"
+```
+
+**Step 3**: Start Linux Agent components
+
+See [Linux Agent as Galaxy Device](../../linux/as_galaxy_device.md) for complete setup.
+
+---
+
+## Programmatic Access
+
+```python
+from config.config_loader import get_ufo_config
+
+config = get_ufo_config()
+
+# Check which third-party agents are enabled
+enabled_agents = config.ENABLED_THIRD_PARTY_AGENTS
+print(f"Enabled third-party agents: {enabled_agents}")
+
+# Access agent configuration
+if "LinuxAgent" in enabled_agents:
+    linux_config = config.THIRD_PARTY_AGENT_CONFIG["LinuxAgent"]
+    print(f"LinuxAgent prompt: {linux_config['APPAGENT_PROMPT']}")
+
+# Check if specific agent is enabled
+linux_enabled = "LinuxAgent" in config.ENABLED_THIRD_PARTY_AGENTS
+print(f"LinuxAgent enabled: {linux_enabled}")
+```
+
+---
+
+## Adding Custom Third-Party Agents
+
+You can add your own third-party agents by following the patterns described below. For a complete tutorial, see **[Creating Custom Third-Party Agents](../../tutorials/creating_third_party_agents.md)**.
+
+### Quick Overview
+
+### Step 1: Create Agent Implementation
+
+```python
+# ufo/agents/third_party/my_agent.py
+class MyCustomAgent:
+    def __init__(self, config):
+        self.config = config
+        # Initialize your agent
+```
+
+### Step 2: Add Configuration
+
+```yaml
+ENABLED_THIRD_PARTY_AGENTS: ["MyCustomAgent"]
+
+THIRD_PARTY_AGENT_CONFIG:
+  MyCustomAgent:
+    AGENT_NAME: "MyCustomAgent"
+    APPAGENT_PROMPT: "ufo/prompts/third_party/my_agent.yaml"
+    APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/third_party/my_agent_example.yaml"
+    INTRODUCTION: "Custom agent description."
+    # Your custom fields
+    CUSTOM_FIELD: "value"
+```
+
+### Step 3: Register Agent
+
+Add your agent to the third-party agent registry in UFO²'s agent loader.
+
+---
+
+## Troubleshooting
+
+### Issue 1: LinuxAgent Not Working
+
+!!!bug "Error Message"
+    ```
+    LinuxAgent not found or not enabled
+    ```
+
+    **Solution**: Check configuration
+    ```yaml
+    # Verify LinuxAgent is in enabled list
+    ENABLED_THIRD_PARTY_AGENTS: ["LinuxAgent"]
+    ```
+
+### Issue 2: Prompt Files Not Found
+
+!!!bug "Error Message"
+    ```
+    FileNotFoundError: ufo/prompts/third_party/linux_agent.yaml
+    ```
+
+    **Solution**: Verify prompt files exist
+    ```powershell
+    # Check if prompt files exist
+    Test-Path "ufo\prompts\third_party\linux_agent.yaml"
+    Test-Path "ufo\prompts\third_party\linux_agent_example.yaml"
+    ```
+
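The same existence check can be done from Python for every configured agent at once. This is a minimal sketch under assumptions: `missing_prompt_files` is a hypothetical helper, it takes the `THIRD_PARTY_AGENT_CONFIG` mapping as a plain dictionary, and it resolves paths relative to a repository root you pass in.

```python
from pathlib import Path
from typing import Any, Dict, List

# Config keys on this page whose values are prompt file paths.
PROMPT_KEYS = ("APPAGENT_PROMPT", "APPAGENT_EXAMPLE_PROMPT", "API_PROMPT")


def missing_prompt_files(agent_config: Dict[str, Dict[str, Any]],
                         root: str = ".") -> List[str]:
    """Return 'Agent.KEY: path' entries for prompt files that do not exist."""
    missing: List[str] = []
    for agent, block in agent_config.items():
        for key in PROMPT_KEYS:
            path = block.get(key)
            if path and not (Path(root) / path).is_file():
                missing.append(f"{agent}.{key}: {path}")
    return missing


# Example call; run it from the repository root so relative paths resolve.
report = missing_prompt_files(
    {"LinuxAgent": {"APPAGENT_PROMPT": "ufo/prompts/third_party/linux_agent.yaml"}}
)
```

An empty `report` means every referenced prompt file was found under `root`.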
+### Issue 3: Agent Configuration Not Loaded
+
+!!!bug "Symptom"
+    Third-party agent configuration changes do not take effect
+
+    **Solution**: Restart the UFO² application
+    ```powershell
+    # Configuration is loaded at startup
+    # Restart UFO² to apply changes
+    ```
+
+---
+
+## Best Practices
+
+!!!tip "Recommendations"
+    - ✅ **Enable only what you need** - Don't enable agents you're not using
+    - ✅ **For UFO³ Galaxy** - Always enable LinuxAgent when using Linux devices
+    - ✅ **Keep prompts up to date** - Ensure prompt files exist and are current
+    - ✅ **Document custom agents** - Add clear introduction text for LLM context
+    - ✅ **Test configurations** - Verify agents load correctly after configuration changes
+
+!!!danger "Warnings"
+    - ❌ **Don't enable HardwareAgent** without proper hardware setup
+    - ❌ **Don't modify AGENT_NAME** - Must match the agent class name
+    - ❌ **Don't delete prompt files** - Agents will fail to initialize
+
+---
+
+## Related Documentation
+
+- **[Creating Custom Third-Party Agents](../../tutorials/creating_third_party_agents.md)** - Complete tutorial for building third-party agents
+- **[Linux Agent as Galaxy Device](../../linux/as_galaxy_device.md)** - Using LinuxAgent in UFO³
+- **[Linux Agent Quick Start](../../getting_started/quick_start_linux.md)** - Setting up Linux Agent
+- **[Agent Configuration](./agents_config.md)** - Core agent LLM settings
+- **[Galaxy Devices Configuration](./galaxy_devices.md)** - Multi-device setup
+
+---
+
+## Summary
+
+!!!success "Key Takeaways"
+    - ✅ **third_party.yaml is optional** - Only needed for specialized agents
+    - ✅ **LinuxAgent for UFO³** - Required when using Linux devices in Galaxy
+    - ✅ **HardwareAgent is experimental** - For physical hardware control
+    - ✅ **Enable selectively** - Only enable agents you actually use
+    - ✅ **Configuration is simple** - Just add agent names to enabled list
+
+    **Extend UFO² with specialized capabilities!** 🔧
diff --git a/documents/docs/configurations/developer_configuration.md b/documents/docs/configurations/developer_configuration.md
deleted file mode 100644
index dc5dac203..000000000
--- a/documents/docs/configurations/developer_configuration.md
+++ /dev/null
@@ -1,125 +0,0 @@
-# Developer Configuration
-
-This section provides detailed information on how to configure the UFO agent for developers. The configuration file `config_dev.yaml` is located in the `ufo/config` directory and contains various settings and switches to customize the UFO agent for development purposes.
-
-## System Configuration
-
-The following parameters are included in the system configuration of the UFO agent:
-
-| Configuration Option | Description | Type | Default Value |
-|-------------------------|---------------------------------------------------------------------------------------------------------|----------|---------------|
-| `CONTROL_BACKEND` | The list of backend for control action, currently supporting `uia` and `win32` and `onmiparser` | List | ["uia"] |
-| `ACTION_SEQUENCE` | Whether to use output multiple actions in a single step. | Boolean | False |
-| `MAX_STEP` | The maximum step limit for completing the user request in a session. | Integer | 100 |
-| `MAX_ROUND` | The maximum round limit for completing the user request in a session. | Integer | 10 |
-| `SLEEP_TIME` | The sleep time in seconds between each step to wait for the window to be ready. | Integer | 5 |
-| `RECTANGLE_TIME` | The time in seconds for the rectangle display around the selected control. | Integer | 1 |
-| `SAFE_GUARD` | Whether to use the safe guard to ask for user confirmation before performing sensitive operations. | Boolean | True |
-| `CONTROL_LIST` | The list of widgets allowed to be selected. | List | ["Button", "Edit", "TabItem", "Document", "ListItem", "MenuItem", "ScrollBar", "TreeItem", "Hyperlink", "ComboBox", "RadioButton", "DataItem"] |
-| `HISTORY_KEYS` | The keys of the step history added to the [`Blackboard`](../agents/design/blackboard.md) for agent decision-making. | List | ["Step", "Thought", "ControlText", "Subtask", "Action", "Comment", "Results", "UserConfirm"] |
-| `ANNOTATION_COLORS` | The colors assigned to different control types for annotation. | Dictionary | {"Button": "#FFF68F", "Edit": "#A5F0B5", "TabItem": "#A5E7F0", "Document": "#FFD18A", "ListItem": "#D9C3FE", "MenuItem": "#E7FEC3", "ScrollBar": "#FEC3F8", "TreeItem": "#D6D6D6", "Hyperlink": "#91FFEB", "ComboBox": "#D8B6D4"} |
-| `ANNOTATION_FONT_SIZE` | The font size for the annotation. | Integer | 22 |
-| `PRINT_LOG` | Whether to print the log in the console. | Boolean | False |
-| `CONCAT_SCREENSHOT` | Whether to concatenate the screenshots into a single image for the LLM input. | Boolean | False |
-| `INCLUDE_LAST_SCREENSHOT` | Whether to include the screenshot from the last step in the observation. | Boolean | True |
-| `LOG_LEVEL` | The log level for the UFO agent. | String | "DEBUG" |
-| `REQUEST_TIMEOUT` | The call timeout in seconds for the LLM model. | Integer | 250 |
-| `USE_APIS` | Whether to allow the use of application APIs. | Boolean | True |
-| `LOG_XML` | Whether to log the XML file at every step. | Boolean | False |
-| `SCREENSHOT_TO_MEMORY` | Whether to save screenshots to the [`Blackboard`](../agents/design/blackboard.md) for the agent's decision-making. | Boolean | True |
-| `SAVE_UI_TREE` | Whether to save the UI tree in the log. | Boolean | False |
-| `SAVE_EXPERIENCE` | Whether to save the experience: "always" to always save, "always_not" to never save, or "ask" to ask the user each time. | String | "always_not" |
-| `TASK_STATUS` | Whether to record the status of the tasks in batch execution mode. | Boolean | True |
-
-
-## Main Prompt Configuration
-
-### Main Prompt Templates
-
-The main prompt templates include the prompts in the UFO agent for both `system` and `user` roles.
-
-| Configuration Option | Description | Type | Default Value |
-|-------------------------|---------------------------------------------------------------------|--------|----------------------------------------------------|
-| `HOSTAGENT_PROMPT` | The main prompt template for the `HostAgent`. | String | "ufo/prompts/share/base/host_agent.yaml" |
-| `APPAGENT_PROMPT` | The main prompt template for the `AppAgent`. | String | "ufo/prompts/share/base/app_agent.yaml" |
-| `FOLLOWERAGENT_PROMPT` | The main prompt template for the `FollowerAgent`. | String | "ufo/prompts/share/base/app_agent.yaml" |
-| `EVALUATION_PROMPT` | The prompt template for the evaluation. | String | "ufo/prompts/evaluation/evaluate.yaml" |
-
-Lite versions of the main prompt templates can be found in the `ufo/prompts/share/lite` directory to reduce the input size for specific token limits.
-
-### Example Prompt Templates
-
-Example prompt templates are used for demonstration purposes in the UFO agent.
-
-| Configuration Option | Description | Type | Default Value |
-|------------------------------|------------------------------------------------------------------------|--------|-----------------------------------------------------|
-| `HOSTAGENT_EXAMPLE_PROMPT` | The example prompt template for the `HostAgent` used for demonstration. | String | "ufo/prompts/examples/{mode}/host_agent_example.yaml"|
-| `APPAGENT_EXAMPLE_PROMPT` | The example prompt template for the `AppAgent` used for demonstration. | String | "ufo/prompts/examples/{mode}/app_agent_example.yaml" |
-
-Lite versions of the example prompt templates can be found in the `ufo/prompts/examples/lite/{mode}` directory to reduce the input size for demonstration purposes.
-
-### Experience and Demonstration Learning
-
-These configuration parameters are used for experience and demonstration learning in the UFO agent.
-
-| Configuration Option | Description | Type | Default Value |
-|-------------------------------|------------------------------------------------|--------|----------------------------------------------------|
-| `EXPERIENCE_PROMPT` | The prompt for self-experience learning. | String | "ufo/prompts/experience/experience_summary.yaml" |
-| `EXPERIENCE_SAVED_PATH` | The path to save the experience learning data. | String | "vectordb/experience/" |
-| `DEMONSTRATION_PROMPT` | The prompt for user demonstration learning. | String | "ufo/prompts/demonstration/demonstration_summary.yaml" |
-| `DEMONSTRATION_SAVED_PATH` | The path to save the demonstration learning data. | String | "vectordb/demonstration/" |
-
-### Application API Configuration
-
-These prompt configuration parameters are used for the application and control APIs in the UFO agent.
-
-| Configuration Option | Description | Type | Default Value |
-|------------------------|-------------------------------------|--------|--------------------------------------------|
-| `API_PROMPT` | The prompt for the UI automation API. | String | "ufo/prompts/share/base/api.yaml" |
-| `APP_API_PROMPT_ADDRESS` | The prompt address for the application API. | Dict | {"WINWORD.EXE": "ufo/prompts/apps/word/api.yaml", "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml", "msedge.exe": "ufo/prompts/apps/web/api.yaml", "chrome.exe": "ufo/prompts/apps/web/api.yaml"} |
-
-## pywinauto Configuration
-
-The API configuration parameters are used for the pywinauto API in the UFO agent.
-
-| Configuration Option | Description | Type | Default Value |
-|--------------------------|--------------------------------------------------|---------|---------------|
-| `CLICK_API` | The API used for click action, can be `click_input` or `click`. | String | "click_input" |
-| `INPUT_TEXT_API` | The API used for input text action, can be `type_keys` or `set_text`. | String | "type_keys" |
-| `INPUT_TEXT_ENTER` | Whether to press enter after typing the text. | Boolean | False |
-
-## Control Filtering
-
-The control filtering configuration parameters are used for control filtering in the agent's observation.
-
-| Configuration Option | Description | Type | Default Value |
-|-------------------------------------|--------------------------------------------------|---------|-------------------------|
-| `CONTROL_FILTER` | The control filter type, can be `TEXT`, `SEMANTIC`, or `ICON`. | List | [] |
-| `CONTROL_FILTER_TOP_K_PLAN`        | The number of top plans from the agent that the control filter is applied to. | Integer | 2                       |
-| `CONTROL_FILTER_TOP_K_SEMANTIC` | The control filter top k for semantic similarity. | Integer | 15 |
-| `CONTROL_FILTER_TOP_K_ICON` | The control filter top k for icon similarity. | Integer | 15 |
-| `CONTROL_FILTER_MODEL_SEMANTIC_NAME`| The control filter model name for semantic similarity. | String | "all-MiniLM-L6-v2" |
-| `CONTROL_FILTER_MODEL_ICON_NAME` | The control filter model name for icon similarity. | String | "clip-ViT-B-32" |
-
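To make the top-k idea behind the `CONTROL_FILTER_TOP_K_*` options concrete, here is a minimal sketch that ranks control labels against a plan and keeps the best `k`. It is an illustration only: the function name `top_k_controls` is hypothetical, and plain word overlap (Jaccard similarity) stands in for the embedding models named in the table, so it does not reflect how UFO actually scores controls.

```python
from typing import List


def top_k_controls(plan: str, control_texts: List[str], k: int) -> List[str]:
    """Return the k control labels most similar to the plan (hypothetical helper)."""
    plan_words = set(plan.lower().split())

    def overlap(text: str) -> float:
        # Jaccard similarity between the plan's words and the label's words.
        words = set(text.lower().split())
        union = plan_words | words
        return len(plan_words & words) / len(union) if union else 0.0

    # sorted() is stable, so labels with equal scores keep their original order.
    ranked = sorted(control_texts, key=overlap, reverse=True)
    return ranked[:k]


controls = ["Save", "Open file", "Save as", "Close"]
kept = top_k_controls("click the save button", controls, k=2)
print(kept)
```

A larger `k` keeps more candidate controls in the agent's observation at the cost of a longer prompt, which is the trade-off the top-k settings expose.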
-## Customizations
-
-The customization configuration parameters are used for customizations in the UFO agent.
-
-| Configuration Option | Description | Type | Default Value |
-|------------------------|----------------------------------------------|---------|---------------------------------------|
-| `ASK_QUESTION`         | Whether to allow the agent to ask the user questions. | Boolean | True                                  |
-| `USE_CUSTOMIZATION` | Whether to enable the customization. | Boolean | True |
-| `QA_PAIR_FILE` | The path for the historical QA pairs. | String | "customization/historical_qa.txt" |
-| `QA_PAIR_NUM` | The number of QA pairs for the customization.| Integer | 20 |
-
-## Evaluation
-
-The evaluation configuration parameters are used for the evaluation in the UFO agent.
-
-| Configuration Option | Description | Type | Default Value |
-|---------------------------|-----------------------------------------------|---------|---------------|
-| `EVA_SESSION` | Whether to include the session in the evaluation. | Boolean | True |
-| `EVA_ROUND` | Whether to include the round in the evaluation. | Boolean | False |
-| `EVA_ALL_SCREENSHOTS` | Whether to include all the screenshots in the evaluation. | Boolean | True |
-
-You can customize the configuration parameters in the `config_dev.yaml` file to suit your development needs and enhance the functionality of the UFO agent.
\ No newline at end of file
diff --git a/documents/docs/configurations/pricing_configuration.md b/documents/docs/configurations/pricing_configuration.md
deleted file mode 100644
index 5bdac7d2c..000000000
--- a/documents/docs/configurations/pricing_configuration.md
+++ /dev/null
@@ -1,51 +0,0 @@
-# Pricing Configuration
-
-We provide a configuration file `pricing_config.yaml` to calculate the pricing of the UFO agent using different LLM APIs. The pricing configuration file is located in the `ufo/config` directory. Note that the pricing configuration file is only used for reference and may not be up-to-date. Please refer to the official pricing documentation of the respective LLM API provider for the most accurate pricing information.
-
-You can also customize the pricing configuration file based on the configured model names and their respective input and output prices by adding or modifying the pricing information in the `pricing_config.yaml` file. Below is the default pricing configuration:
-
-```yaml
-# Prices in $ per 1000 tokens
-# Last updated: 2024-05-13
-PRICES: {
- "openai/gpt-4-0613": {"input": 0.03, "output": 0.06},
- "openai/gpt-3.5-turbo-0613": {"input": 0.0015, "output": 0.002},
- "openai/gpt-4-0125-preview": {"input": 0.01, "output": 0.03},
- "openai/gpt-4-1106-preview": {"input": 0.01, "output": 0.03},
- "openai/gpt-4-1106-vision-preview": {"input": 0.01, "output": 0.03},
- "openai/gpt-4": {"input": 0.03, "output": 0.06},
- "openai/gpt-4-32k": {"input": 0.06, "output": 0.12},
- "openai/gpt-4-turbo": {"input":0.01,"output": 0.03},
- "openai/gpt-4o": {"input": 0.005,"output": 0.015},
- "openai/gpt-4o-2024-05-13": {"input": 0.005, "output": 0.015},
- "openai/gpt-3.5-turbo-0125": {"input": 0.0005, "output": 0.0015},
- "openai/gpt-3.5-turbo-1106": {"input": 0.001, "output": 0.002},
- "openai/gpt-3.5-turbo-instruct": {"input": 0.0015, "output": 0.002},
- "openai/gpt-3.5-turbo-16k-0613": {"input": 0.003, "output": 0.004},
- "openai/whisper-1": {"input": 0.006, "output": 0.006},
- "openai/tts-1": {"input": 0.015, "output": 0.015},
- "openai/tts-hd-1": {"input": 0.03, "output": 0.03},
- "openai/text-embedding-ada-002-v2": {"input": 0.0001, "output": 0.0001},
- "openai/text-davinci:003": {"input": 0.02, "output": 0.02},
- "openai/text-ada-001": {"input": 0.0004, "output": 0.0004},
- "azure/gpt-35-turbo-20220309":{"input": 0.0015, "output": 0.002},
- "azure/gpt-35-turbo-20230613":{"input": 0.0015, "output": 0.002},
- "azure/gpt-35-turbo-16k-20230613":{"input": 0.003, "output": 0.004},
- "azure/gpt-35-turbo-1106":{"input": 0.001, "output": 0.002},
- "azure/gpt-4-20230321":{"input": 0.03, "output": 0.06},
- "azure/gpt-4-32k-20230321":{"input": 0.06, "output": 0.12},
- "azure/gpt-4-1106-preview": {"input": 0.01, "output": 0.03},
- "azure/gpt-4-0125-preview": {"input": 0.01, "output": 0.03},
- "azure/gpt-4-visual-preview": {"input": 0.01, "output": 0.03},
- "azure/gpt-4-turbo-20240409": {"input":0.01,"output": 0.03},
- "azure/gpt-4o": {"input": 0.005,"output": 0.015},
- "azure/gpt-4o-20240513": {"input": 0.005, "output": 0.015},
- "qwen/qwen-vl-plus": {"input": 0.008, "output": 0.008},
- "qwen/qwen-vl-max": {"input": 0.02, "output": 0.02},
- "gemini/gemini-1.5-flash": {"input": 0.00035, "output": 0.00105},
- "gemini/gemini-1.5-pro": {"input": 0.0035, "output": 0.0105},
- "gemini/gemini-1.0-pro": {"input": 0.0005, "output": 0.0015},
-}
-```
-
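As a rough illustration of how such a table can be used, the sketch below estimates the cost of one request from its token counts. The helper `estimate_cost` and the two-entry `PRICES` subset are assumptions made for this example; they are not UFO's actual pricing code, and the prices are copied from the reference table above, which may be out of date.

```python
# Illustrative subset of the PRICES table above (USD per 1,000 tokens).
PRICES = {
    "openai/gpt-4o": {"input": 0.005, "output": 0.015},
    "azure/gpt-4-turbo-20240409": {"input": 0.01, "output": 0.03},
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Hypothetical helper: estimated cost in USD, or 0.0 if the model has no price entry."""
    price = PRICES.get(model)
    if price is None:
        return 0.0
    # Prices are quoted per 1,000 tokens, so divide the weighted sum by 1000.
    return (prompt_tokens * price["input"] + completion_tokens * price["output"]) / 1000


cost = estimate_cost("openai/gpt-4o", prompt_tokens=2000, completion_tokens=500)
print(f"Estimated cost: ${cost:.4f}")
```

Returning `0.0` for unknown models is a design choice of this sketch; a stricter variant could raise an error so that missing price entries are noticed.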
-Please refer to the official pricing documentation of the respective LLM API provider for the most accurate pricing information.
\ No newline at end of file
diff --git a/documents/docs/configurations/user_configuration.md b/documents/docs/configurations/user_configuration.md
deleted file mode 100644
index 9bdc3ca11..000000000
--- a/documents/docs/configurations/user_configuration.md
+++ /dev/null
@@ -1,87 +0,0 @@
-# User Configuration
-
-This page gives an overview of the user configuration options available in UFO. To configure the LLMs and other custom settings, rename `config.yaml.template` in the `ufo/config` folder to `config.yaml`.
-
-## LLM Configuration
-
-You can configure the LLMs for the `HOST_AGENT` and `APP_AGENT` separately in the `config.yaml` file. The `FollowerAgent` and `EvaluationAgent` share the same LLM configuration as the `APP_AGENT`. Additionally, you can configure a backup LLM engine in the `BACKUP_AGENT` field to handle cases where the primary engines fail during inference.
-
-Below are the configuration options for the LLMs, using OpenAI and Azure OpenAI (AOAI) as examples. You can find the settings for other LLM API configurations and usage in the `Supported Models` section of the documentation.
-
-| Configuration Option | Description | Type | Default Value |
-|----------------------|-------------|------|---------------|
-| `VISUAL_MODE` | Whether to use visual mode to understand screenshots and take actions | Boolean | True |
-| `API_TYPE` | The API type: "openai" for the OpenAI API, "aoai" for the AOAI API. | String | "openai" |
-| `API_BASE` | The API endpoint for the LLM | String | "https://api.openai.com/v1/chat/completions" |
-| `API_KEY` | The API key for the LLM | String | "sk-" |
-| `API_VERSION` | The version of the API | String | "2024-02-15-preview" |
-| `API_MODEL` | The LLM model name | String | "gpt-4-vision-preview" |
-
-### For Azure OpenAI (AOAI) API
-The following additional configuration option is available for the AOAI API:
-
-| Configuration Option | Description | Type | Default Value |
-|----------------------|-------------|------|---------------|
-| `API_DEPLOYMENT_ID` | The deployment ID, only available for the AOAI API | String | "" |
-
-Be sure to fill in the necessary API details for both the `HOST_AGENT` and `APP_AGENT` so that UFO can interact with the LLMs.
-
-### LLM Parameters
-You can also configure additional parameters for the LLMs in the `config.yaml` file:
-
-| Configuration Option | Description | Type | Default Value |
-|----------------------|-------------|------|---------------|
-| `MAX_TOKENS` | The maximum token limit for the response completion | Integer | 2000 |
-| `MAX_RETRY` | The maximum retry limit for the response completion | Integer | 3 |
-| `TEMPERATURE` | The temperature of the model: the lower the value, the more consistent the output of the model | Float | 0.0 |
-| `TOP_P` | The top_p of the model: the lower the value, the more conservative the output of the model | Float | 0.0 |
-| `TIMEOUT` | The call timeout in seconds | Integer | 60 |
-
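The interplay between `MAX_RETRY` and a backup engine can be sketched as follows. This is a simplified illustration under stated assumptions: `complete_with_fallback`, `primary`, and `backup` are hypothetical names, the engines are plain callables rather than real LLM clients, and UFO's own retry logic may differ.

```python
from typing import Callable, Dict, Optional, Sequence

calls: Dict[str, int] = {"primary": 0, "backup": 0}


def complete_with_fallback(
    engines: Sequence[Callable[[str], str]],
    prompt: str,
    max_retry: int = 3,
) -> str:
    """Try each engine in order, giving each one up to max_retry attempts."""
    last_error: Optional[Exception] = None
    for engine in engines:
        for _ in range(max_retry):
            try:
                return engine(prompt)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
    raise RuntimeError("all engines failed") from last_error


def primary(prompt: str) -> str:
    # Simulates a primary engine that always times out.
    calls["primary"] += 1
    raise TimeoutError("primary engine timed out")


def backup(prompt: str) -> str:
    # Simulates a healthy backup engine.
    calls["backup"] += 1
    return f"backup answer to: {prompt}"


result = complete_with_fallback([primary, backup], "open Notepad", max_retry=3)
print(result, calls)
```

In this sketch the primary engine is tried `max_retry` times before the backup is consulted, so the worst-case latency grows with both `MAX_RETRY` and `TIMEOUT`.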
-### RAG Configuration to Enhance the UFO Agent
-You can configure the RAG parameters in the `config.yaml` file to enhance the UFO agent with additional knowledge sources:
-
-#### RAG Configuration for the Offline Docs
-Configure the following parameters to allow UFO to use offline documents for the decision-making process:
-
-| Configuration Option | Description | Type | Default Value |
-|----------------------|-------------|------|---------------|
-| `RAG_OFFLINE_DOCS` | Whether to use the offline RAG | Boolean | False |
-| `RAG_OFFLINE_DOCS_RETRIEVED_TOPK` | The topk for the offline retrieved documents | Integer | 1 |
-
-
-#### RAG Configuration for the Bing search
-Configure the following parameters to allow UFO to use online Bing search for the decision-making process:
-
-| Configuration Option | Description | Type | Default Value |
-|----------------------|-------------|------|---------------|
-| `RAG_ONLINE_SEARCH` | Whether to use the Bing search | Boolean | False |
-| `BING_API_KEY` | The Bing search API key | String | "" |
-| `RAG_ONLINE_SEARCH_TOPK` | The topk for the online search | Integer | 5 |
-| `RAG_ONLINE_RETRIEVED_TOPK` | The topk for the online retrieved searched results | Integer | 1 |
-
-
-#### RAG Configuration for experience
-Configure the following parameters to allow UFO to use the RAG from its self-experience:
-
-| Configuration Option | Description | Type | Default Value |
-|----------------------|-------------|------|---------------|
-| `RAG_EXPERIENCE` | Whether to use the RAG from its self-experience | Boolean | False |
-| `RAG_EXPERIENCE_RETRIEVED_TOPK` | The topk for the retrieved experience records | Integer | 5 |
-
-#### RAG Configuration for demonstration
-Configure the following parameters to allow UFO to use the RAG from user demonstration:
-
-| Configuration Option | Description | Type | Default Value |
-|----------------------|-------------|------|---------------|
-| `RAG_DEMONSTRATION` | Whether to use the RAG from its user demonstration | Boolean | False |
-| `RAG_DEMONSTRATION_RETRIEVED_TOPK` | The topk for the retrieved demonstrations | Integer | 5 |
-| `RAG_DEMONSTRATION_COMPLETION_N` | The number of completion choices for the demonstration result | Integer | 3 |
-
-
-Explore the various RAG configurations to enhance the UFO agent with additional knowledge sources and improve its decision-making capabilities.
-
-
-
-
-
-
diff --git a/documents/docs/creating_app_agent/overview.md b/documents/docs/creating_app_agent/overview.md
deleted file mode 100644
index a6b9da299..000000000
--- a/documents/docs/creating_app_agent/overview.md
+++ /dev/null
@@ -1,11 +0,0 @@
-# Creating Your AppAgent
-
-UFO provides a flexible framework and SDK for application developers to empower their applications with AI capabilities by wrapping them into an `AppAgent`. By creating an `AppAgent`, you can leverage the power of UFO to interact with your application and automate tasks.
-
-To create an `AppAgent`, you can provide the following components:
-
-| Component | Description | Usage Documentation |
-| --- | --- | --- |
-| [Help Documents](./help_document_provision.md) | The help documents for the application to guide the `AppAgent` in executing tasks. | [Learning from Help Documents](../advanced_usage/reinforce_appagent/learning_from_help_document.md) |
-| [User Demonstrations](./demonstration_provision.md) | The user demonstrations for the application to guide the `AppAgent` in executing tasks. | [Learning from User Demonstrations](../advanced_usage/reinforce_appagent/learning_from_demonstration.md) |
-| [Native API Wrappers](./warpping_app_native_api.md) | The native API wrappers for the application to interact with the application. | [Automator](../automator/overview.md) |
\ No newline at end of file
diff --git a/documents/docs/creating_app_agent/warpping_app_native_api.md b/documents/docs/creating_app_agent/warpping_app_native_api.md
deleted file mode 100644
index a73bdc526..000000000
--- a/documents/docs/creating_app_agent/warpping_app_native_api.md
+++ /dev/null
@@ -1,259 +0,0 @@
-# Wrapping Your App's Native API
-
-UFO takes actions on applications based on UI controls, but adding native APIs to its toolbox can make those actions more efficient and accurate. This document explains how to wrap your application's native API into UFO's toolbox.
-
-## How to Wrap Your App's Native API?
-
-Before developing the native API wrappers, we strongly recommend that you read the design of the [Automator](../automator/overview.md).
-
-### Step 1: Create a Receiver for the Native API
-
-The `Receiver` is a class that receives the native API calls from the `AppAgent` and executes them. To wrap your application's native API, you need to create a `Receiver` class that contains the methods to execute the native API calls.
-
-To create a `Receiver` class, follow these steps:
-
-#### 1. Create a Folder for Your Application
-
-- Navigate to the `ufo/automator/app_api/` directory.
-- Create a folder named after your application.
-
-#### 2. Create a Python File
-
-- Inside the folder you just created, add a Python file named after your application, for example, `{your_application}_client.py`.
-
-#### 3. Define the Receiver Class
-
-- In the Python file, define a class named `{Your_Receiver}`, inheriting from the `ReceiverBasic` class located in `ufo/automator/basic.py`.
-- Initialize the `Your_Receiver` class with the object that executes the native API calls. For example, if your API is based on a `com` object, initialize the `com` object in the `__init__` method of the `Your_Receiver` class.
-
-Example of `WinCOMReceiverBasic` class:
-
-```python
-class WinCOMReceiverBasic(ReceiverBasic):
-    """
-    The base class for Windows COM client.
-    """
-
-    _command_registry: Dict[str, Type[CommandBasic]] = {}
-
-    def __init__(self, app_root_name: str, process_name: str, clsid: str) -> None:
-        """
-        Initialize the Windows COM client.
-        :param app_root_name: The app root name.
-        :param process_name: The process name.
-        :param clsid: The CLSID of the COM object.
-        """
-
-        self.app_root_name = app_root_name
-        self.process_name = process_name
-        self.clsid = clsid
-        self.client = win32com.client.Dispatch(self.clsid)
-        self.com_object = self.get_object_from_process_name()
-```
----
-
-#### 4. Define Methods to Execute Native API Calls
-
-- Define the methods in the `Your_Receiver` class to execute the native API calls.
-
-Example of `ExcelWinCOMReceiver` class:
-
-```python
-def table2markdown(self, sheet_name: str) -> str:
-    """
-    Convert the table in the sheet to a markdown table string.
-    :param sheet_name: The sheet name.
-    :return: The markdown table string.
-    """
-
-    sheet = self.com_object.Sheets(sheet_name)
-    data = sheet.UsedRange()
-    df = pd.DataFrame(data[1:], columns=data[0])
-    df = df.dropna(axis=0, how="all")
-    df = df.applymap(self.format_value)
-
-    return df.to_markdown(index=False)
-```
----
-
-
-#### 5. Create a Factory Class
-
-- Create your Factory class inheriting from the `APIReceiverFactory` class to manage multiple `Receiver` classes that share the same API type.
-- Implement the `create_receiver` and `name` methods in your factory class. The `create_receiver` method should return an instance of the `Receiver` class.
-- By default, `create_receiver` takes `app_root_name` and `process_name` as parameters and returns the `Receiver` instance.
-- Register the `ReceiverFactory` class with the decorator `@ReceiverManager.register`.
-
-Example of the `COMReceiverFactory` class:
-
-```python
-from ufo.automator.puppeteer import ReceiverManager
-
-@ReceiverManager.register
-class COMReceiverFactory(APIReceiverFactory):
-    """
-    The factory class for the COM receiver.
-    """
-
-    def create_receiver(self, app_root_name: str, process_name: str) -> WinCOMReceiverBasic:
-        """
-        Create the wincom receiver.
-        :param app_root_name: The app root name.
-        :param process_name: The process name.
-        :return: The receiver.
-        """
-
-        com_receiver = self.__com_client_mapper(app_root_name)
-        clsid = self.__app_root_mappping(app_root_name)
-
-        if clsid is None or com_receiver is None:
-            # print_with_color(f"Warning: Win32COM API is not supported for {process_name}.", "yellow")
-            return None
-
-        return com_receiver(app_root_name, process_name, clsid)
-
-    @classmethod
-    def name(cls) -> str:
-        """
-        Get the name of the receiver factory.
-        :return: The name of the receiver factory.
-        """
-        return "COM"
-```
-
-!!! note
-    The `create_receiver` method should return `None` if the application is not supported.
-
-!!! note
-    You must register your `ReceiverFactory` with the decorator `@ReceiverManager.register` for the `ReceiverManager` to manage the `ReceiverFactory`.
-
-The `Receiver` class is now ready to receive the native API calls from the `AppAgent`.
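
The registration flow described in this step can be pictured with a small, self-contained sketch. `ToyReceiverManager`, `ToyFactoryBase`, and `ToyCOMFactory` are hypothetical stand-ins written for this example, not UFO's actual classes; the sketch only assumes that a manager keeps one factory per API type and skips factories that return `None`.

```python
from typing import Dict, Optional, Type


class ToyFactoryBase:
    """Simplified stand-in for a receiver factory base class (hypothetical)."""

    def create_receiver(self, app_root_name: str, process_name: str) -> Optional[str]:
        raise NotImplementedError

    @classmethod
    def name(cls) -> str:
        raise NotImplementedError


class ToyReceiverManager:
    """Keeps one factory instance per API type, keyed by the factory's name()."""

    _factories: Dict[str, ToyFactoryBase] = {}

    @classmethod
    def register(cls, factory_cls: Type[ToyFactoryBase]) -> Type[ToyFactoryBase]:
        # Used as a class decorator: store an instance, return the class unchanged.
        cls._factories[factory_cls.name()] = factory_cls()
        return factory_cls

    @classmethod
    def create_receivers(cls, app_root_name: str, process_name: str) -> Dict[str, str]:
        receivers: Dict[str, str] = {}
        for api_name, factory in cls._factories.items():
            receiver = factory.create_receiver(app_root_name, process_name)
            if receiver is not None:  # None means "application not supported"
                receivers[api_name] = receiver
        return receivers


@ToyReceiverManager.register
class ToyCOMFactory(ToyFactoryBase):
    SUPPORTED = {"EXCEL.EXE", "WINWORD.EXE"}

    def create_receiver(self, app_root_name: str, process_name: str) -> Optional[str]:
        if app_root_name not in self.SUPPORTED:
            return None
        # A real factory would build a receiver object; a string keeps the sketch short.
        return f"COM receiver for {app_root_name}"

    @classmethod
    def name(cls) -> str:
        return "COM"


supported = ToyReceiverManager.create_receivers("EXCEL.EXE", "excel")
unsupported = ToyReceiverManager.create_receivers("notepad.exe", "notepad")
print(supported, unsupported)
```

Returning `None` rather than raising keeps unsupported applications cheap to handle: the manager simply ends up with no receiver for that API type.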
-
-### Step 2: Create a Command for the Native API
-
-Commands are the actions that the `AppAgent` can execute on the application. To create a command for the native API, you need to create a `Command` class that contains the method to execute the native API calls.
-
-#### 1. Create a Command Class
-
-- Create a `Command` class in the same Python file where the `Receiver` class is located. The `Command` class should inherit from the `CommandBasic` class located in `ufo/automator/basic.py`.
-
-Example:
-```python
-class WinCOMCommand(CommandBasic):
-    """
-    The abstract command interface.
-    """
-
-    def __init__(self, receiver: WinCOMReceiverBasic, params=None) -> None:
-        """
-        Initialize the command.
-        :param receiver: The receiver of the command.
-        """
-        self.receiver = receiver
-        self.params = params if params is not None else {}
-
-    @abstractmethod
-    def execute(self):
-        pass
-
-    @classmethod
-    def name(cls) -> str:
-        """
-        Get the name of the command.
-        :return: The name of the command.
-        """
-        return cls.__name__
-```
----
-
-#### 2. Define the Execute Method
-
-- Define the `execute` method in the `Command` class to call the receiver to execute the native API calls.
-
-Example:
-```python
-def execute(self):
-    """
-    Execute the command to insert a table.
-    :return: The inserted table.
-    """
-    return self.receiver.insert_excel_table(
-        sheet_name=self.params.get("sheet_name", 1),
-        table=self.params.get("table"),
-        start_row=self.params.get("start_row", 1),
-        start_col=self.params.get("start_col", 1),
-    )
-```
-
-
-#### 3. Register the Command Class
-
-- Register the `Command` class in the corresponding `Receiver` class using the `@your_receiver.register` decorator.
-
-Example:
-```python
-@ExcelWinCOMReceiver.register
-class InsertExcelTable(WinCOMCommand):
-    ...
-```
-
-The `Command` class is now registered in the `Receiver` class and available for the `AppAgent` to execute the native API calls.
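
To show how a receiver, a command, and the `register` decorator fit together end to end, here is a minimal sketch of the same pattern. All names (`ToyReceiver`, `ToyCommand`, `AddNumbers`, `run`) are hypothetical and exist only for this example; UFO's real base classes live in `ufo/automator/basic.py` and may differ in detail.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Type


class ToyCommand(ABC):
    """Simplified command base: holds a receiver and a parameter dictionary."""

    def __init__(self, receiver: "ToyReceiver", params: Optional[Dict[str, Any]] = None) -> None:
        self.receiver = receiver
        self.params = params if params is not None else {}

    @abstractmethod
    def execute(self) -> Any:
        ...

    @classmethod
    def name(cls) -> str:
        return cls.__name__


class ToyReceiver:
    """Simplified receiver: owns the native calls and a registry of commands."""

    _command_registry: Dict[str, Type[ToyCommand]] = {}

    @classmethod
    def register(cls, command_cls: Type[ToyCommand]) -> Type[ToyCommand]:
        # Class decorator: index the command by its name() and return it unchanged.
        cls._command_registry[command_cls.name()] = command_cls
        return command_cls

    def run(self, command_name: str, params: Dict[str, Any]) -> Any:
        # Look up the command class by name, bind it to this receiver, and execute it.
        command_cls = self._command_registry[command_name]
        return command_cls(self, params).execute()

    def add(self, a: int, b: int) -> int:
        # Stands in for a native API call.
        return a + b


@ToyReceiver.register
class AddNumbers(ToyCommand):
    def execute(self) -> int:
        return self.receiver.add(self.params.get("a", 0), self.params.get("b", 0))

    @classmethod
    def name(cls) -> str:
        return "add_numbers"


result = ToyReceiver().run("add_numbers", {"a": 2, "b": 3})
print(result)
```

Because commands are looked up by the string returned from `name()`, that string is what any prompt description has to refer to.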
-
-### Step 3: Provide Prompt Descriptions for the Native API
-
-To let the `AppAgent` know the usage of the native API calls, you need to provide prompt descriptions.
-
-#### 1. Create an api.yaml File
-
-- Create an `api.yaml` file in the `ufo/prompts/apps/{your_app_name}` directory.
-
-#### 2. Define Prompt Descriptions
-
-- Define the prompt descriptions for the native API calls in the `api.yaml` file.
-
-Example:
-
-```yaml
-table2markdown:
-  summary: |-
-    "table2markdown" is to get the table content in a sheet of the Excel app and convert it to markdown format.
-  class_name: |-
-    GetSheetContent
-  usage: |-
-    [1] API call: table2markdown(sheet_name: str)
-    [2] Args:
-      - sheet_name: The name of the sheet in the Excel app.
-    [3] Example: table2markdown(sheet_name="Sheet1")
-    [4] Available control item: Any control item in the Excel app.
-    [5] Return: the markdown format string of the table content of the sheet.
-```
-
-
-!!! note
-    The `table2markdown` is the name of the native API call. It **must** match the `name()` defined in the corresponding `Command` class!
-
-
-#### 3. Register the Prompt Address in `config_dev.yaml`
-
-- Register the prompt address by adding an entry to the `APP_API_PROMPT_ADDRESS` field of the `config_dev.yaml` file, with the application program name as the key and the prompt file address as the value.
-
-Example:
-```yaml
-APP_API_PROMPT_ADDRESS: {
-    "WINWORD.EXE": "ufo/prompts/apps/word/api.yaml",
-    "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml",
-    "msedge.exe": "ufo/prompts/apps/web/api.yaml",
-    "chrome.exe": "ufo/prompts/apps/web/api.yaml",
-    "your_application_program_name": "YOUR_APPLICATION_API_PROMPT"
-}
-```
-
-!!! note
-    The `your_application_program_name` **must** match the name of the application program.
-
-The `AppAgent` can now use the prompt descriptions to understand the usage of the native API calls.
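
The sketch below illustrates why the key has to match the program name. It assumes an exact, case-sensitive dictionary lookup, which is an assumption of this example rather than a documented guarantee of UFO; `find_api_prompt` is a hypothetical helper.

```python
from typing import Dict, Optional

# Mirrors the default APP_API_PROMPT_ADDRESS mapping shown above.
APP_API_PROMPT_ADDRESS: Dict[str, str] = {
    "WINWORD.EXE": "ufo/prompts/apps/word/api.yaml",
    "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml",
    "msedge.exe": "ufo/prompts/apps/web/api.yaml",
    "chrome.exe": "ufo/prompts/apps/web/api.yaml",
}


def find_api_prompt(program_name: str) -> Optional[str]:
    """Hypothetical helper: return the prompt file for a program, or None if unregistered."""
    # Exact match only: "excel.exe" and "EXCEL.EXE" are different keys.
    return APP_API_PROMPT_ADDRESS.get(program_name)


print(find_api_prompt("EXCEL.EXE"))
print(find_api_prompt("excel.exe"))
```

Under this assumption, a key whose spelling or casing differs from the program name would leave the application without API prompts.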
-
----
-
-By following these steps, you will have successfully wrapped the native API of your application into UFO's toolboxes, allowing the `AppAgent` to execute the native API calls on the application!
diff --git a/documents/docs/faq.md b/documents/docs/faq.md
index 9f0ca0e7f..2f9c7d28d 100644
--- a/documents/docs/faq.md
+++ b/documents/docs/faq.md
@@ -1,31 +1,560 @@
-# FAQ
+# Frequently Asked Questions (FAQ)
-We provide answers to some frequently asked questions about the UFO.
+Quick answers to common questions about UFO³ Galaxy, UFO², Linux Agents, and general troubleshooting.
-## Q1: Why is it called UFO?
+---
-A: UFO stands for **U**I **Fo**cused agent. The name is inspired by the concept of an unidentified flying object (UFO) that is mysterious and futuristic.
+## 🎯 General Questions
-## Q2: Can I use UFO on Linux or macOS?
-A: UFO is currently only supported on Windows OS.
+### Q: What is UFO³?
-## Q3: Why the latency of UFO is high?
-A: The latency of UFO depends on the response time of the LLMs and the network speed. If you are using GPT, it usually takes dozens of seconds to generate a response in one step. The workload of the GPT endpoint may also affect the latency.
+**A:** UFO³ is the third iteration of the UFO project, encompassing three major frameworks:
-## Q4: What models does UFO support?
-A: UFO supports various language models, including OpenAI and Azure OpenAI models, Qwen, Google Gemini, Ollama, and more. You can find the full list of supported models in the `Supported Models` section of the documentation.
+- **UFO²** - Desktop AgentOS for Windows automation
+- **UFO³ Galaxy** - Multi-device orchestration framework
+- **Linux Agent** - Server and CLI automation for Linux
-## Q5: Can I use non-vision models in UFO?
-A: Yes, you can use non-vision models in UFO. You can set the `VISUAL_MODE` to `False` in the `config.yaml` file to disable the visual mode and use non-vision models. However, UFO is designed to work with vision models, and using non-vision models may affect the performance.
+### Q: Why is it called UFO?
-## Q6: Can I host my own LLM endpoint?
-A: Yes, you can host your custom LLM endpoint and configure UFO to use it. Check the documentation in the `Supported Models` section for more details.
+**A:** UFO stands for **U**I **Fo**cused agent. The name was given to the first version of the project and has been retained through all iterations (UFO v1, UFO², UFO³) as the project evolved from a simple UI-focused agent to a comprehensive multi-device orchestration framework.
-## Q7: Can I use non-English requests in UFO?
-A: It depends on the language model you are using. Most LLMs support multiple languages, and you can specify the language in the request. However, the performance may vary across languages.
+### Q: Which version should I use?
-## Q8: Why does it show the error `Error making API request: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))`?
-A: This means the LLM endpoint is not accessible. You can check the network connection (e.g. VPN) and the status of the LLM endpoint.
+**A:** Choose based on your needs:
-!!! info
-    To get more support, please submit an issue on the [GitHub Issues](https://github.com/microsoft/UFO/issues), or send an email to [ufo-agent@microsoft.com](mailto:ufo-agent@microsoft.com).
\ No newline at end of file
+| Use Case | Recommended Version |
+|----------|-------------------|
+| Windows desktop automation only | [UFO²](getting_started/quick_start_ufo2.md) |
+| Cross-device workflows (Windows + Linux) | [UFO³ Galaxy](getting_started/quick_start_galaxy.md) |
+| Linux server management only | [Linux Agent](getting_started/quick_start_linux.md) |
+| Multi-device orchestration | [UFO³ Galaxy](getting_started/quick_start_galaxy.md) |
+
+### Q: What's the difference between UFO² and UFO³ Galaxy?
+
+**A:** **UFO²** is for single Windows desktop automation with:
+- Deep Windows OS integration (UIA, Win32, COM)
+- Office application automation
+- GUI + API hybrid execution
+
+**UFO³ Galaxy** orchestrates multiple devices with:
+- Cross-platform support (Windows + Linux)
+- Distributed task execution
+- Device capability-based routing
+- Constellation-based DAG orchestration
+
+See [Migration Guide](getting_started/migration_ufo2_to_galaxy.md) for details.
+
+### Q: Can I use UFO on Linux or macOS?
+
+**A:** Partially. UFO³ runs on Windows and Linux, but not on macOS:
+
+- **✅ Windows:** Full UFO² desktop automation support
+- **✅ Linux:** Supported via Linux Agent for server/CLI automation
+- **❌ macOS:** Not currently supported
+
+---
+
+## 🔧 Installation & Setup
+
+### Q: Which Python version do I need?
+
+**A:** Python **3.10 or higher** is required for all UFO³ components.
+
+```bash
+# Check your Python version
+python --version
+```
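
If you prefer to check the requirement from inside Python, for example at the top of a setup script, a minimal sketch looks like this. The helper name `python_is_supported` is hypothetical and not part of UFO.

```python
import sys

MIN_VERSION = (3, 10)  # minimum version stated above


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_VERSION


if not python_is_supported():
    print(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ is required, found {sys.version.split()[0]}")
```

Comparing version tuples avoids the pitfalls of comparing version strings, where "3.9" would sort after "3.10".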
+
+### Q: What models does UFO support?
+
+**A:** UFO³ supports multiple LLM providers:
+
+- **OpenAI** - GPT-4o, GPT-4, GPT-3.5
+- **Azure OpenAI** - All Azure-hosted models
+- **Google Gemini** - Gemini Pro, Gemini Flash
+- **Anthropic Claude** - Claude 3.5, Claude 3
+- **Qwen** - Local or API deployment
+- **DeepSeek** - DeepSeek models
+- **Ollama** - Local model hosting
+- And more...
+
+See [Model Configuration Guide](configuration/models/overview.md) for the complete list and setup instructions.
+
+### Q: Can I use non-vision models in UFO?
+
+**A:** Yes! You can disable visual mode:
+
+```yaml
+# config/ufo/system.yaml
+VISUAL_MODE: false
+```
+
+However, UFO² is designed for vision models. Non-vision models may have reduced performance for GUI automation tasks.
+
+### Q: Can I host my own LLM endpoint?
+
+**A:** Yes! UFO³ supports custom endpoints:
+
+```yaml
+# config/ufo/agents.yaml
+HOST_AGENT:
+  API_TYPE: "openai"  # Or compatible API
+  API_BASE: "http://your-endpoint.com/v1/chat/completions"
+  API_KEY: "your-key"
+  API_MODEL: "your-model-name"
+```
+
+See [Model Configuration](configuration/models/overview.md) for details.
+
+### Q: Do I need API keys for all agents?
+
+**A:** No, only for LLM-powered agents:
+
+| Component | Requires API Key | Purpose |
+|-----------|-----------------|---------|
+| **ConstellationAgent** (Galaxy) | ✅ Yes | Orchestration reasoning |
+| **HostAgent** (UFO²) | ✅ Yes | Task planning |
+| **AppAgent** (UFO²) | ✅ Yes | Action execution |
+| **LinuxAgent** | ✅ Yes | Command planning |
+| **Device Server** | ❌ No | Message routing only |
+| **MCP Servers** | ❌ No | Tool provider only |
+
+---
+
+## ⚙️ Configuration
+
+### Q: Where are configuration files located?
+
+**A:** UFO³ uses a modular configuration system in `config/`:
+
+```
+config/
+├── ufo/                        # UFO² configuration
+│   ├── agents.yaml             # LLM and agent settings
+│   ├── system.yaml             # Runtime settings
+│   ├── rag.yaml                # Knowledge retrieval
+│   └── mcp.yaml                # MCP server configuration
+└── galaxy/                     # Galaxy configuration
+    ├── agent.yaml              # ConstellationAgent LLM
+    ├── devices.yaml            # Device pool
+    └── constellation.yaml      # Runtime settings
+
+### Q: Can I still use the old `ufo/config/config.yaml`?
+
+**A:** Yes, for backward compatibility, but we recommend migrating to the new modular system:
+
+```bash
+# Check current configuration
+python -m ufo.tools.validate_config ufo --show-config
+
+# Migrate from legacy to new
+python -m ufo.tools.migrate_config
+```
+
+See [Configuration Migration Guide](configuration/system/migration.md) for details.
+
+### Q: How do I protect my API keys?
+
+**A:** Best practices for API key security:
+
+1. **Never commit `.yaml` files with keys** - Use `.template` files
+   ```bash
+   # Good pattern
+   config/ufo/agents.yaml.template  # Commit this (with placeholders)
+   config/ufo/agents.yaml           # DON'T commit (has real keys)
+   ```
+
+2. **Use environment variables** for sensitive data:
+   ```yaml
+   # In agents.yaml
+   HOST_AGENT:
+     API_KEY: ${OPENAI_API_KEY}  # Reads from environment
+   ```
+
+3. **Add to `.gitignore`**:
+   ```
+   config/**/agents.yaml
+   config/**/agent.yaml
+   !**/*.template
+   ```
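
If your own scripts also need the key, the environment-variable practice can be applied in Python as well. The sketch below is an assumption-laden illustration: `load_api_key` and `mask` are hypothetical helpers, not UFO functions, and the variable name `OPENAI_API_KEY` is taken from the YAML example above.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read an API key from the environment and fail loudly when it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key


def mask(key: str, visible: int = 4) -> str:
    """Mask a key for logging, keeping only the last `visible` characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# Demo with a placeholder value (not a real key).
print(mask("sk-abc12345"))  # *******2345
```

Failing at startup when the variable is unset is usually easier to debug than an authentication error several steps into a task.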
+
+---
+
+## 🌌 UFO³ Galaxy Questions
+
+### Q: What's the minimum number of devices for Galaxy?
+
+**A:** Galaxy requires **at least 1 device agent** (Windows or Linux). You can start with a single device and add more later.
+
+```yaml
+# Minimal Galaxy setup (1 device)
+devices:
+  - device_id: "my_windows_pc"
+    server_url: "ws://localhost:5000/ws"
+    os: "windows"
+```
+
+### Q: Can Galaxy mix Windows and Linux devices?
+
+**A:** Yes! Galaxy can orchestrate heterogeneous devices:
+
+```yaml
+devices:
+  - device_id: "windows_desktop"
+    os: "windows"
+    capabilities: ["office", "excel", "outlook"]
+
+  - device_id: "linux_server"
+    os: "linux"
+    capabilities: ["server", "database", "log_analysis"]
+```
+
+Galaxy automatically routes tasks based on device capabilities.
+
+### Q: Do all devices need to be on the same network?
+
+**A:** No, devices can be distributed across networks using SSH tunneling:
+
+- **Same network:** Direct WebSocket connections
+- **Different networks:** Use SSH tunnels (reverse/forward)
+- **Cloud + local:** SSH tunnels with public gateways
+
+See [Linux Quick Start - SSH Tunneling](getting_started/quick_start_linux.md#network-connectivity-ssh-tunneling) for examples.
+
+### Q: How does Galaxy decide which device to use?
+
+**A:** Galaxy uses **capability-based routing**:
+
+1. Analyzes the task requirements
+2. Matches against device `capabilities` in `devices.yaml`
+3. Considers device `metadata` (OS, performance, etc.)
+4. Selects the best-fit device(s)
+
+Example:
+```yaml
+# Task: "Analyze error logs on the production server"
+# → Galaxy routes to device with:
+capabilities:
+ - "log_analysis"
+ - "server_management"
+os: "linux"
+```
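
The matching step (step 2 above) can be sketched in a few lines of Python. This is only an illustration of static capability matching under assumed data shapes: `route_task` is a hypothetical helper, the device dictionaries mirror the `devices.yaml` examples in this FAQ, and Galaxy's real routing logic is not shown here.

```python
from typing import Dict, List, Optional


def route_task(
    required: List[str],
    devices: Dict[str, Dict],
    os_hint: Optional[str] = None,
) -> List[str]:
    """Return ids of devices whose capabilities cover `required`, in config order."""
    matches = []
    for device_id, spec in devices.items():
        # Skip devices on the wrong OS when the task names one.
        if os_hint and spec.get("os") != os_hint:
            continue
        # Every required capability must be advertised by the device.
        if set(required).issubset(spec.get("capabilities", [])):
            matches.append(device_id)
    return matches


devices = {
    "windows_desktop": {"os": "windows", "capabilities": ["office", "excel", "outlook"]},
    "linux_server": {"os": "linux", "capabilities": ["server", "database", "log_analysis"]},
}

print(route_task(["log_analysis"], devices))  # ['linux_server']
```

An empty result means no configured device advertises the needed capabilities, which is a useful first thing to check when a task is never dispatched.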
+
+---
+
+## 🐧 Linux Agent Questions
+
+### Q: Does the Linux Agent require a GUI?
+
+**A:** No! The Linux Agent is designed for headless servers:
+
+- Executes CLI commands via MCP
+- No X11/desktop environment needed
+- Works over SSH
+- Perfect for remote servers
+
+### Q: Can I run multiple Linux Agents on one machine?
+
+**A:** Yes, using different ports and client IDs:
+
+```bash
+# Agent 1
+python -m ufo.server.app --port 5001
+python -m ufo.client.client --ws --ws-server ws://localhost:5001/ws --client-id linux_1 --platform linux
+
+# Agent 2 (same machine)
+python -m ufo.server.app --port 5002
+python -m ufo.client.client --ws --ws-server ws://localhost:5002/ws --client-id linux_2 --platform linux
+```
+
+### Q: What's the MCP service for?
+
+**A:** The MCP (Model Context Protocol) service provides the **actual command execution tools** for the Linux Agent:
+
+```
+Linux Agent (LLM reasoning)
+ ↓
+MCP Service (tool provider)
+ ↓
+Bash commands (actual execution)
+```
+
+Without MCP, the Linux Agent can't execute commands - it can only plan them.
+
+---
+
+## 🪟 UFO² Questions
+
+### Q: Does UFO² work on Windows 10?
+
+**A:** Yes! UFO² supports:
+- ✅ Windows 11 (recommended)
+- ✅ Windows 10 (fully supported)
+- ❌ Windows 8.1 or earlier (not tested)
+
+### Q: Can UFO² automate Office apps?
+
+**A:** Yes! UFO² has enhanced Office support through:
+- **MCP Office servers** - Direct API access to Excel, Word, Outlook, PowerPoint
+- **GUI automation** - Fallback for unsupported operations
+- **Hybrid execution** - Automatically chooses API or GUI
+
+Enable MCP in `config/ufo/mcp.yaml` for better Office automation.
+
+### Q: Does UFO² interrupt my work?
+
+**A:** UFO² can run automation tasks on your current desktop. For non-disruptive operation, you can run it on a separate machine or virtual desktop environment.
+
+> **Note:** Picture-in-Picture mode is planned for future releases.
+
+### Q: Can I use UFO² without MCP?
+
+**A:** UFO² requires MCP (Model Context Protocol) servers for tool execution. MCP provides the interface between the LLM agents and system operations (Windows APIs, Office automation, etc.). Without MCP, UFO² cannot perform actions.
+
+---
+
+## 🐛 Common Issues & Troubleshooting
+
+### Issue: "Configuration file not found"
+
+**Error:**
+```
+FileNotFoundError: config/ufo/agents.yaml not found
+```
+
+**Solution:**
+```bash
+# Copy template files
+cp config/ufo/agents.yaml.template config/ufo/agents.yaml
+
+# Edit with your API keys
+notepad config/ufo/agents.yaml # Windows
+nano config/ufo/agents.yaml # Linux
+```
+
+### Issue: "API Authentication Error"
+
+**Error:**
+```
+openai.AuthenticationError: Invalid API key
+```
+
+**Solutions:**
+
+1. **Check API key format:**
+ ```yaml
+ API_KEY: "sk-..." # OpenAI starts with sk-
+ API_KEY: "..." # Azure uses deployment key
+ ```
+
+2. **Verify API_TYPE matches your provider:**
+ ```yaml
+ API_TYPE: "openai" # For OpenAI
+ API_TYPE: "aoai" # For Azure OpenAI
+ ```
+
+3. **Check for extra spaces/quotes** in YAML
+
+4. **For Azure:** Verify `API_DEPLOYMENT_ID` is set
+
+### Issue: "Connection aborted / Remote end closed connection"
+
+**Error:**
+```
+Error making API request: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
+```
+
+**Solutions:**
+
+- Check network connection (VPN, proxy, firewall)
+- Verify LLM endpoint is accessible: `curl https://api.openai.com/v1/models`
+- Check endpoint status (Azure, OpenAI, etc.)
+- Try increasing timeout in config
+- Verify API base URL is correct
+
+### Issue: "Device not connecting to Galaxy"
+
+**Error:**
+```
+ERROR - [WS] Failed to connect to ws://localhost:5000/ws
+Connection refused
+```
+
+**Checklist:**
+
+- [ ] Is the server running? (`curl http://localhost:5000/api/health`)
+- [ ] Port number correct? (Server: `--port 5000`, Client: `ws://...:5000/ws`)
+- [ ] Platform flag set? (`--platform windows` or `--platform linux`)
+- [ ] Firewall blocking? (Allow port 5000)
+- [ ] SSH tunnel established? (If using remote devices)
+
+### Issue: "device_id mismatch in Galaxy"
+
+**Error:**
+```
+ERROR - Device 'linux_agent_1' not found in configuration
+```
+
+**Cause:** Mismatch between `devices.yaml` and client command
+
+**Solution:** Ensure exact match:
+
+| Location | Field | Example |
+|----------|-------|---------|
+| `devices.yaml` | `device_id:` | `"linux_agent_1"` |
+| Client command | `--client-id` | `linux_agent_1` |
+
+**Critical:** IDs must match **exactly** (case-sensitive, no typos).
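
A quick way to spot this kind of mismatch is to compare the client ID against the configured IDs and flag near misses. The sketch below is hypothetical tooling, not part of UFO: `diagnose_client_id` takes the IDs as a plain list, so loading them from `devices.yaml` is left to you.

```python
from typing import Iterable


def diagnose_client_id(client_id: str, configured_ids: Iterable[str]) -> str:
    """Classify a client id as an exact match, a near miss, or unknown."""
    ids = list(configured_ids)
    if client_id in ids:
        return "ok"
    # Near miss: same id once case and surrounding whitespace are ignored.
    normalized = client_id.strip().lower()
    for candidate in ids:
        if candidate.strip().lower() == normalized:
            return f"near miss: did you mean {candidate!r}?"
    return "not found"


print(diagnose_client_id("Linux_Agent_1", ["linux_agent_1"]))
# near miss: did you mean 'linux_agent_1'?
```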
+
+### Issue: "MCP service not responding (Linux)"
+
+**Error:**
+```
+ERROR - Cannot connect to MCP server at http://127.0.0.1:8010
+```
+
+**Solutions:**
+
+1. **Check if MCP service is running:**
+ ```bash
+ curl http://localhost:8010/health
+ ps aux | grep linux_mcp_server
+ ```
+
+2. **Restart MCP service:**
+ ```bash
+ pkill -f linux_mcp_server
+ python -m ufo.client.mcp.http_servers.linux_mcp_server
+ ```
+
+3. **Check port conflict:**
+ ```bash
+ lsof -i :8010
+ # If port taken, use different port:
+ python -m ufo.client.mcp.http_servers.linux_mcp_server --port 8011
+ ```
+
+### Issue: "Tasks failing after X steps"
+
+**Cause:** `MAX_STEP` limit reached
+
+**Solution:** Increase step limit in `config/ufo/system.yaml`:
+
+```yaml
+# Default is 50
+MAX_STEP: 100  # For complex tasks
+
+# Or disable the limit (not recommended)
+# MAX_STEP: -1
+```
+
+### Issue: "Too many LLM calls / high cost"
+
+**Solutions:**
+
+1. **Enable action sequences** (bundles actions):
+   ```yaml
+   # config/ufo/system.yaml
+   ACTION_SEQUENCE: true
+   ```
+
+2. **Use vision-capable models for GUI tasks:**
+   ```yaml
+   # config/ufo/agents.yaml
+   APP_AGENT:
+     API_MODEL: "gpt-4o"  # Use vision models for GUI automation
+   ```
+
+   > **Note:** Non-vision models like gpt-3.5-turbo cannot process screenshots. Use them for GUI automation only with `VISUAL_MODE: false`, and expect reduced performance.
+
+3. **Enable experience learning** (reuse patterns):
+   ```yaml
+   # config/ufo/rag.yaml
+   RAG_EXPERIENCE: true
+   ```
+
+### Issue: "Why is the latency high?"
+
+**A:** Latency depends on several factors:
+
+- **LLM response time** - GPT-4o typically takes 10-30 seconds per step
+- **Network speed** - API calls to OpenAI/Azure endpoints
+- **Endpoint workload** - Provider server load
+- **Visual mode** - Image processing adds overhead
+
+**To reduce latency:**
+- Use a faster model where the task allows (for example gpt-3.5-turbo instead of gpt-4o; non-vision models require `VISUAL_MODE: false`)
+- Enable action sequences to batch operations
+- Use local models (Ollama) if acceptable
+- Disable visual mode if not needed
+
+### Issue: "Can I use non-English requests?"
+
+**A:** Yes! Most modern LLMs support multiple languages:
+
+- GPT-4o, GPT-4: Excellent multilingual support
+- Gemini: Good multilingual support
+- Qwen: Excellent for Chinese
+- Claude: Good multilingual support
+
+Performance may vary by language and model. Test with your specific language and model combination.
+
+---
+
+## 📚 Where to Find More Help
+
+### Documentation
+
+| Topic | Link |
+|-------|------|
+| **Getting Started** | [UFO² Quick Start](getting_started/quick_start_ufo2.md), [Galaxy Quick Start](getting_started/quick_start_galaxy.md), [Linux Quick Start](getting_started/quick_start_linux.md) |
+| **Configuration** | [Configuration Overview](configuration/system/overview.md) |
+| **Troubleshooting** | Quick start guides have detailed troubleshooting sections |
+| **Architecture** | [Project Structure](project_directory_structure.md) |
+| **More Guidance** | [User & Developer Guide](getting_started/more_guidance.md) |
+
+### Community & Support
+
+- **GitHub Discussions:** [https://github.com/microsoft/UFO/discussions](https://github.com/microsoft/UFO/discussions)
+- **GitHub Issues:** [https://github.com/microsoft/UFO/issues](https://github.com/microsoft/UFO/issues)
+- **Email:** ufo-agent@microsoft.com
+
+### Debugging Tips
+
+1. **Enable debug logging:**
+ ```yaml
+ # config/ufo/system.yaml
+ LOG_LEVEL: "DEBUG"
+ ```
+
+2. **Check log files:**
+ ```
+ logs//
+ ├── request.log # Request logs
+ ├── response.log # Response logs
+ ├── action_step*.png # Screenshots at each step
+ └── action_step*_annotated.png # Annotated screenshots
+ ```
+
+3. **Validate configuration:**
+ ```bash
+ python -m ufo.tools.validate_config ufo --show-config
+ python -m ufo.tools.validate_config galaxy --show-config
+ ```
+
+4. **Test LLM connectivity:**
+ ```python
+ # Test your API key
+ from openai import OpenAI
+ client = OpenAI(api_key="your-key")
+ response = client.chat.completions.create(
+ model="gpt-4o",
+ messages=[{"role": "user", "content": "Hello"}]
+ )
+ print(response.choices[0].message.content)
+ ```
+
+---
+
+> **💡 Still have questions?** Check the [More Guidance](getting_started/more_guidance.md) page for additional resources, or reach out to the community!
diff --git a/documents/docs/galaxy/agent_registration/agent_profile.md b/documents/docs/galaxy/agent_registration/agent_profile.md
new file mode 100644
index 000000000..42a27712d
--- /dev/null
+++ b/documents/docs/galaxy/agent_registration/agent_profile.md
@@ -0,0 +1,808 @@
+# 📊 AgentProfile - Comprehensive Agent Representation
+
+The **AgentProfile** is a multi-source data structure that consolidates administrator configuration, service-level capabilities, and real-time client telemetry into a unified, dynamically updated representation of each constellation agent.
+
+---
+
+## 📋 Overview
+
+The **AgentProfile** is the primary data structure representing a registered constellation agent. It aggregates information from **three distinct sources** to provide a comprehensive view of each agent's identity, capabilities, operational status, and hardware characteristics.
+
+For a complete understanding of how agents work in the constellation system, see:
+
+- [Constellation Overview](../constellation/overview.md) - Architecture and multi-device coordination
+- [Constellation Agent](../constellation_agent/overview.md) - Agent behavior and lifecycle
+
+| Function | Description |
+|----------|-------------|
+| **Identity Management** | Unique identification and endpoint tracking |
+| **Capability Advertisement** | Declare supported features and tools |
+| **Status Monitoring** | Real-time operational state tracking |
+| **Resource Profiling** | Hardware and system information |
+| **Task Assignment** | Enable intelligent task routing decisions |
+
+---
+
+## 🏗️ Structure Definition
+
+### Core Dataclass
+
+```python
+from dataclasses import dataclass, field
+from typing import Dict, List, Optional, Any
+from datetime import datetime
+from enum import Enum
+
+class DeviceStatus(Enum):
+    """Device connection status"""
+    DISCONNECTED = "disconnected"
+    CONNECTING = "connecting"
+    CONNECTED = "connected"
+    FAILED = "failed"
+    REGISTERING = "registering"
+    BUSY = "busy"
+    IDLE = "idle"
+
+@dataclass
+class AgentProfile:
+    """
+    Device information and capabilities.
+
+    Consolidates information from three sources:
+    1. User-specified registration (devices.yaml)
+    2. Service-level manifest (AIP registration)
+    3. Client-side telemetry (DeviceInfoProvider)
+    """
+
+    # === Identity ===
+    device_id: str  # Unique device identifier
+    server_url: str  # WebSocket endpoint URL
+
+    # === Platform & Capabilities ===
+    os: Optional[str] = None  # Operating system (windows, linux, darwin)
+    capabilities: List[str] = field(default_factory=list)  # Advertised capabilities
+    metadata: Dict[str, Any] = field(default_factory=dict)  # Multi-source metadata
+
+    # === Operational Status ===
+    status: DeviceStatus = DeviceStatus.DISCONNECTED  # Current state
+    last_heartbeat: Optional[datetime] = None  # Last heartbeat timestamp
+
+    # === Connection Management ===
+    connection_attempts: int = 0  # Connection retry counter
+    max_retries: int = 5  # Maximum retry attempts
+
+    # === Task Execution ===
+    current_task_id: Optional[str] = None  # Currently executing task ID
+
+---
+
+## 🔍 Field Reference
+
+### Identity Fields
+
+| Field | Type | Source | Description | Example |
+|-------|------|--------|-------------|---------|
+| `device_id` | `str` | User Config | Unique identifier for the device | `"windowsagent"`, `"linux_gpu_01"` |
+| `server_url` | `str` | User Config | WebSocket endpoint of device agent server | `"ws://localhost:5005/ws"` |
+
+The `device_id` must be unique across the entire constellation. Attempting to register a duplicate `device_id` will fail.
+
+### Platform & Capabilities
+
+| Field | Type | Source | Description | Example |
+|-------|------|--------|-------------|---------|
+| `os` | `Optional[str]` | User Config + Telemetry | Operating system type | `"windows"`, `"linux"`, `"darwin"` |
+| `capabilities` | `List[str]` | User Config + Telemetry | Advertised capabilities/features | `["gui", "browser", "office"]` |
+| `metadata` | `Dict[str, Any]` | All Sources | Multi-source metadata aggregation | See [Metadata Structure](#metadata-structure) |
+
+**Capabilities Merging:**
+
+```python
+# Initial capabilities from user config
+capabilities = ["web_browsing", "office_applications"]
+
+# After telemetry collection, auto-detected features are merged
+# Result: ["web_browsing", "office_applications", "gui", "cli", "browser", "file_system"]
+```
+
+### Operational Status
+
+| Field | Type | Source | Description | Example |
+|-------|------|--------|-------------|---------|
+| `status` | `DeviceStatus` | Runtime | Current connection/operational state | `DeviceStatus.IDLE` |
+| `last_heartbeat` | `Optional[datetime]` | Runtime | Timestamp of last heartbeat | `2025-11-06T10:30:45Z` |
+
+**Status Values:**
+
+```python
+DeviceStatus.DISCONNECTED # Not connected
+DeviceStatus.CONNECTING # Connection in progress
+DeviceStatus.CONNECTED # WebSocket established
+DeviceStatus.REGISTERING # Performing AIP registration
+DeviceStatus.IDLE # Ready for tasks
+DeviceStatus.BUSY # Executing a task
+DeviceStatus.FAILED # Connection or execution failed
+```
+
+### Connection Management
+
+| Field | Type | Source | Description | Example |
+|-------|------|--------|-------------|---------|
+| `connection_attempts` | `int` | Runtime | Number of connection attempts made | `0`, `3` |
+| `max_retries` | `int` | User Config | Maximum reconnection attempts before giving up | `5`, `10` |
+
+When a device disconnects, the system automatically retries connection up to `max_retries` times with exponential backoff.
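
To make the retry schedule concrete, the sketch below computes the wait before each attempt. The base delay, growth factor, and cap are assumptions chosen for illustration; this page only states that backoff is exponential and bounded by `max_retries`, and `backoff_delays` is a hypothetical helper.

```python
from typing import List


def backoff_delays(
    max_retries: int,
    base_delay: float = 1.0,   # assumed first wait, in seconds
    factor: float = 2.0,       # assumed growth factor per attempt
    max_delay: float = 60.0,   # assumed upper bound on any single wait
) -> List[float]:
    """Return the delay before each retry: base_delay * factor**attempt, capped."""
    return [
        min(base_delay * factor ** attempt, max_delay)
        for attempt in range(max_retries)
    ]


print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

With these assumed values, the default `max_retries=5` gives a total wait of 31 seconds before the device is treated as failed.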
+
+### Task Execution
+
+| Field | Type | Source | Description | Example |
+|-------|------|--------|-------------|---------|
+| `current_task_id` | `Optional[str]` | Runtime | ID of task currently being executed | `"task_12345"`, `None` |
+
+**Usage in Task Queue:**
+
+```python
+# When task is assigned
+profile.status = DeviceStatus.BUSY
+profile.current_task_id = "task_12345"
+
+# When task completes
+profile.status = DeviceStatus.IDLE
+profile.current_task_id = None
+```
+
+---
+
+## 🗂️ Metadata Structure
+
+The `metadata` dictionary is a flexible container that aggregates information from all three profiling sources:
+
+### Metadata Schema
+
+```python
+metadata = {
+ # ===== Source 1: User Configuration =====
+ "location": str, # Physical location
+ "performance": str, # Performance tier
+ "description": str, # Human-readable description
+ "operation_engineer_email": str, # Contact information
+ "tags": List[str], # Custom tags
+ # ... any custom user-defined fields
+
+ # ===== Source 2: Service Manifest =====
+ "platform": str, # Platform type (from registration)
+ "registration_time": str, # ISO timestamp of registration
+
+ # ===== Source 3: Client Telemetry =====
+ "system_info": {
+ "platform": str, # OS platform (windows, linux, darwin)
+ "os_version": str, # OS version string
+ "cpu_count": int, # Number of CPU cores
+ "memory_total_gb": float, # Total RAM in GB
+ "hostname": str, # Device hostname
+ "ip_address": str, # Device IP address
+ "platform_type": str, # Device category (computer, mobile, etc.)
+ "schema_version": str # Telemetry schema version
+ },
+ "custom_metadata": { # Optional custom metadata from config
+ "datacenter": str,
+ "tier": str,
+ # ... server-configured metadata
+ }
+}
+```
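
Because `metadata` is an untyped dictionary, it can help to check the telemetry portion against the schema before relying on it. The following sketch is hypothetical (`validate_system_info` is not a UFO function); the field names and types are copied from the `system_info` schema above.

```python
from typing import Any, Dict, List

# Field names and types taken from the system_info schema on this page.
SYSTEM_INFO_FIELDS = {
    "platform": str,
    "os_version": str,
    "cpu_count": int,
    "memory_total_gb": float,
    "hostname": str,
    "ip_address": str,
    "platform_type": str,
    "schema_version": str,
}


def validate_system_info(info: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the dict fits the schema."""
    problems = []
    for name, expected in SYSTEM_INFO_FIELDS.items():
        if name not in info:
            problems.append(f"missing: {name}")
            continue
        value = info[name]
        # Accept ints where a float is expected, but never bools.
        accepted = (int, float) if expected is float else expected
        if isinstance(value, bool) or not isinstance(value, accepted):
            problems.append(f"wrong type: {name}")
    return problems
```

Extra keys such as `supported_features` are ignored, so the check stays compatible with telemetry that carries more than the documented fields.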
+
+### Example Metadata
+
+```python
+# Complete metadata example from a Windows GPU workstation
+metadata = {
+ # User Configuration
+ "location": "office_desktop",
+ "performance": "very_high",
+ "description": "Primary Windows workstation with GPU",
+ "operation_engineer_email": "admin@example.com",
+ "tags": ["production", "gpu-enabled", "high-priority"],
+
+ # Service Manifest
+ "platform": "windows",
+ "registration_time": "2025-11-06T10:30:00.000Z",
+
+ # Client Telemetry
+ "system_info": {
+ "platform": "windows",
+ "os_version": "10.0.22631",
+ "cpu_count": 16,
+ "memory_total_gb": 32.0,
+ "hostname": "DESKTOP-GPU01",
+ "ip_address": "192.168.1.100",
+ "platform_type": "computer",
+ "schema_version": "1.0"
+ },
+ "custom_metadata": {
+ "datacenter": "us-west-2",
+ "tier": "premium",
+ "gpu_type": "NVIDIA RTX 4090",
+ "gpu_count": 1
+ }
+}
+```
+
+---
+
+## 🔄 Multi-Source Construction
+
+### Three-Source Architecture
+
+```mermaid
+graph LR
+    A[User Config<br/>devices.yaml]
+    B[AIP Registration<br/>Service Manifest]
+    C[Device Telemetry<br/>DeviceInfoProvider]
+
+    A -->|device_id, server_url<br/>capabilities, metadata| D[AgentProfile]
+    B -->|platform, registration_time| D
+    C -->|system_info, features| D
+
+ style A fill:#e1f5ff
+ style B fill:#fff4e1
+ style C fill:#e8f5e9
+ style D fill:#f3e5f5
+```
+
+### Construction Timeline
+
+```mermaid
+sequenceDiagram
+ participant Config as devices.yaml
+ participant Manager as DeviceManager
+ participant Server as UFO Server
+ participant Telemetry as DeviceInfoProvider
+
+ Note over Config,Telemetry: Phase 1: Initial Registration
+ Config->>Manager: Load device config
+ Manager->>Manager: Create AgentProfile (device_id, server_url, capabilities)
+
+ Note over Config,Telemetry: Phase 2: Service Registration
+ Manager->>Server: WebSocket REGISTER
+ Server-->>Manager: Add platform, registration_time
+
+ Note over Config,Telemetry: Phase 3: Telemetry Collection
+ Manager->>Server: request_device_info()
+ Server->>Telemetry: collect_system_info()
+ Telemetry-->>Server: system_info
+ Server-->>Manager: system_info
+ Manager->>Manager: Update AgentProfile (merge system_info & features)
+```
+
+### Merging Strategy
+
+**1. User Configuration (Priority: Baseline)**
+
+```python
+# Initial AgentProfile creation
+profile = AgentProfile(
+ device_id="windowsagent",
+ server_url="ws://localhost:5005/ws",
+ os="windows",
+ capabilities=["web_browsing", "office_applications"],
+ metadata={
+ "location": "office_desktop",
+ "performance": "high"
+ }
+)
+```
+
+**2. Service Manifest (Priority: Override `os`, Add registration data)**
+
+```python
+# During AIP registration
+profile.metadata.update({
+ "platform": "windows", # From registration message
+ "registration_time": "2025-11-06T10:30:00Z"
+})
+```
+
+**3. Client Telemetry (Priority: Merge capabilities, Add system_info)**
+
+```python
+# After DeviceInfoProvider collects data
+system_info = {
+ "platform": "windows",
+ "os_version": "10.0.22631",
+ "cpu_count": 16,
+ "memory_total_gb": 32.0,
+ "hostname": "DESKTOP-GPU01",
+ "ip_address": "192.168.1.100",
+ "supported_features": ["gui", "cli", "browser", "file_system", "office", "windows_apps"],
+ "platform_type": "computer"
+}
+
+# Update OS if not already set
+if not profile.os:
+    profile.os = system_info["platform"]
+
+# Merge capabilities (avoid duplicates)
+existing_caps = set(profile.capabilities)
+new_caps = set(system_info["supported_features"])
+profile.capabilities = list(existing_caps.union(new_caps))
+# Result (set union, so order is not guaranteed):
+# ["web_browsing", "office_applications", "gui", "cli", "browser", "file_system", "office", "windows_apps"]
+
+# Add system_info to metadata
+profile.metadata["system_info"] = system_info
+```
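
The telemetry step can also be written as one function. The sketch below is a variation rather than the actual implementation: it uses `dict.fromkeys` instead of a set union so that the merged capability list keeps a stable, first-seen order, and `MiniProfile` is a trimmed, hypothetical stand-in for `AgentProfile` with only the fields the merge touches.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class MiniProfile:
    """Trimmed stand-in for AgentProfile (hypothetical, for illustration)."""
    os: Optional[str] = None
    capabilities: List[str] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)


def merge_telemetry(profile: MiniProfile, system_info: Dict[str, Any]) -> None:
    """Fold client telemetry into a profile without overriding user config."""
    # User-configured OS wins; telemetry only fills a gap.
    if not profile.os:
        profile.os = system_info.get("platform")
    # dict.fromkeys removes duplicates while preserving first-seen order.
    features = list(system_info.get("supported_features", []))
    profile.capabilities = list(dict.fromkeys(profile.capabilities + features))
    profile.metadata["system_info"] = system_info


profile = MiniProfile(os="windows", capabilities=["web_browsing", "office_applications"])
merge_telemetry(
    profile,
    {"platform": "windows", "supported_features": ["gui", "cli", "web_browsing"]},
)
print(profile.capabilities)
# ['web_browsing', 'office_applications', 'gui', 'cli']
```

A stable order makes profiles easier to compare in logs and tests, at no cost to the deduplication behavior.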
+
+---
+
+## 📊 Example Profiles
+
+### Example 1: Windows GPU Workstation
+
+```python
+AgentProfile(
+ # Identity
+ device_id="gpu_workstation_01",
+ server_url="ws://192.168.1.100:5005/ws",
+
+ # Platform & Capabilities
+ os="windows",
+ capabilities=[
+ # User-configured
+ "web_browsing",
+ "office_applications",
+ "gpu_computation",
+ "model_training",
+ # Auto-detected
+ "gui",
+ "cli",
+ "browser",
+ "file_system",
+ "windows_apps"
+ ],
+
+ # Metadata
+ metadata={
+ # User Configuration
+ "location": "office_desktop",
+ "performance": "very_high",
+ "description": "Primary GPU workstation for ML training",
+ "operation_engineer_email": "ml-team@example.com",
+ "tags": ["production", "gpu", "ml"],
+
+ # Service Manifest
+ "platform": "windows",
+ "registration_time": "2025-11-06T10:30:00Z",
+
+ # Client Telemetry
+ "system_info": {
+ "platform": "windows",
+ "os_version": "10.0.22631",
+ "cpu_count": 16,
+ "memory_total_gb": 64.0,
+ "hostname": "DESKTOP-GPU01",
+ "ip_address": "192.168.1.100",
+ "platform_type": "computer",
+ "schema_version": "1.0"
+ },
+ "custom_metadata": {
+ "gpu_type": "NVIDIA RTX 4090",
+ "gpu_count": 2,
+ "gpu_memory_gb": 48
+ }
+ },
+
+ # Status
+ status=DeviceStatus.IDLE,
+ last_heartbeat=datetime(2025, 11, 6, 10, 45, 30),
+
+ # Connection
+ connection_attempts=0,
+ max_retries=5,
+
+ # Task
+ current_task_id=None
+)
+```
+
+### Profile Summary
+
+```mermaid
+graph TB
+ subgraph "AgentProfile: gpu_workstation_01"
+        A["Status: IDLE<br/>Last Heartbeat: 10:45:30"]
+
+        B["System<br/>━━━━━<br/>OS: Windows 10.0.22631<br/>CPU: 16 cores<br/>Memory: 64.0 GB<br/>Host: DESKTOP-GPU01<br/>IP: 192.168.1.100"]
+
+        C["Capabilities<br/>━━━━━<br/>• web_browsing<br/>• office_applications<br/>• gpu_computation<br/>• model_training<br/>• gui, cli, browser<br/>• file_system"]
+
+        D["Metadata<br/>━━━━━<br/>Location: office_desktop<br/>Performance: very_high<br/>Tags: production, gpu, ml<br/>GPU: 2× NVIDIA RTX 4090"]
+ end
+
+ style A fill:#e3f2fd
+ style B fill:#f3e5f5
+ style C fill:#e8f5e9
+ style D fill:#fff3e0
+```
+
+### Example 2: Linux Server
+
+```python
+AgentProfile(
+ # Identity
+ device_id="linux_server_01",
+ server_url="ws://10.0.0.50:5001/ws",
+
+ # Platform & Capabilities
+ os="linux",
+ capabilities=[
+ # User-configured
+ "server_management",
+ "log_monitoring",
+ "database_operations",
+ # Auto-detected
+ "cli",
+ "file_system",
+ "linux_apps"
+ ],
+
+ # Metadata
+ metadata={
+ # User Configuration
+ "location": "datacenter_rack_a42",
+ "performance": "medium",
+ "description": "Production Linux server for backend services",
+ "logs_file_path": "/var/log/application.log",
+ "dev_path": "/home/deploy/",
+
+ # Service Manifest
+ "platform": "linux",
+ "registration_time": "2025-11-06T09:15:00Z",
+
+ # Client Telemetry
+ "system_info": {
+ "platform": "linux",
+ "os_version": "#1 SMP PREEMPT_DYNAMIC Wed Nov 1 15:36:23 UTC 2023",
+ "cpu_count": 8,
+ "memory_total_gb": 16.0,
+ "hostname": "prod-server-01",
+ "ip_address": "10.0.0.50",
+ "platform_type": "computer",
+ "schema_version": "1.0"
+ }
+ },
+
+ # Status
+ status=DeviceStatus.BUSY,
+ last_heartbeat=datetime(2025, 11, 6, 10, 44, 15),
+
+ # Connection
+ connection_attempts=0,
+ max_retries=3,
+
+ # Task
+ current_task_id="task_monitoring_567"
+)
+```
+
+---
+
+## 🔄 Lifecycle Operations
+
+### Creation
+
+```python
+from galaxy.client.components import DeviceRegistry, AgentProfile, DeviceStatus
+
+registry = DeviceRegistry()
+
+# Create AgentProfile during registration
+profile = registry.register_device(
+ device_id="windowsagent",
+ server_url="ws://localhost:5005/ws",
+ os="windows",
+ capabilities=["web_browsing", "office"],
+ metadata={"location": "office"},
+ max_retries=5
+)
+
+print(f"Created: {profile.device_id}")
+print(f"Status: {profile.status.value}") # "disconnected"
+```
+
+### Status Updates
+
+```python
+# Update connection status
+registry.update_device_status("windowsagent", DeviceStatus.CONNECTING)
+registry.update_device_status("windowsagent", DeviceStatus.CONNECTED)
+registry.update_device_status("windowsagent", DeviceStatus.IDLE)
+
+# Set device busy with task
+registry.set_device_busy("windowsagent", task_id="task_123")
+profile = registry.get_device("windowsagent")
+print(f"Status: {profile.status.value}") # "busy"
+print(f"Current Task: {profile.current_task_id}") # "task_123"
+
+# Set device idle (task complete)
+registry.set_device_idle("windowsagent")
+profile = registry.get_device("windowsagent")
+print(f"Status: {profile.status.value}") # "idle"
+print(f"Current Task: {profile.current_task_id}") # None
+```
+
+### System Info Updates
+
+```python
+# Update with telemetry data
+system_info = {
+ "platform": "windows",
+ "os_version": "10.0.22631",
+ "cpu_count": 16,
+ "memory_total_gb": 32.0,
+ "hostname": "DESKTOP-DEV01",
+ "ip_address": "192.168.1.100",
+ "supported_features": ["gui", "cli", "browser", "file_system", "office"],
+ "platform_type": "computer",
+ "schema_version": "1.0"
+}
+
+registry.update_device_system_info("windowsagent", system_info)
+
+# Verify update
+profile = registry.get_device("windowsagent")
+print(f"OS: {profile.os}") # "windows"
+print(f"CPU Cores: {profile.metadata['system_info']['cpu_count']}") # 16
+print(f"Memory: {profile.metadata['system_info']['memory_total_gb']} GB") # 32.0
+print(f"Capabilities: {profile.capabilities}")
+# ["web_browsing", "office", "gui", "cli", "browser", "file_system"]
+```
+
+### Heartbeat Tracking
+
+```python
+from datetime import datetime, timezone
+
+# Update heartbeat
+registry.update_heartbeat("windowsagent")
+
+profile = registry.get_device("windowsagent")
+print(f"Last Heartbeat: {profile.last_heartbeat}")
+# 2025-11-06 10:45:30.123456+00:00
+```
+
+### Connection Retry Management
+
+```python
+# Increment connection attempts
+attempts = registry.increment_connection_attempts("windowsagent")
+print(f"Attempts: {attempts}/{profile.max_retries}")
+
+# Reset after successful connection
+registry.reset_connection_attempts("windowsagent")
+profile = registry.get_device("windowsagent")
+print(f"Attempts: {profile.connection_attempts}") # 0
+```
+
+---
+
+## 🎯 Usage Patterns
+
+The following patterns demonstrate how AgentProfile is used for intelligent task routing and device management. For more details on task constellation concepts, see [Constellation Overview](../constellation/overview.md).
+
+### Task Assignment Decision
+
+```python
+def can_assign_task(profile: AgentProfile, required_capabilities: List[str]) -> bool:
+    """
+    Check if device can handle a task based on its profile.
+    """
+    # Check if device is available
+    if profile.status != DeviceStatus.IDLE:
+        return False
+
+    # Check if all required capabilities are supported
+    device_caps = set(profile.capabilities)
+    required_caps = set(required_capabilities)
+
+    if not required_caps.issubset(device_caps):
+        return False
+
+    # Optional: Check system resources
+    system_info = profile.metadata.get("system_info", {})
+    if system_info.get("memory_total_gb", 0) < 8:  # Require at least 8GB
+        return False
+
+    return True
+
+# Usage
+profile = registry.get_device("windowsagent")
+if can_assign_task(profile, ["browser", "gui"]):
+    await manager.assign_task_to_device(
+        task_id="task_web_001",
+        device_id="windowsagent",
+        task_description="Navigate to website and extract data",
+        task_data={"url": "https://example.com"}
+    )
+
+### Device Selection
+
+```python
+def select_best_device(
+    all_devices: Dict[str, AgentProfile],
+    required_capabilities: List[str],
+    prefer_high_performance: bool = True
+) -> Optional[str]:
+    """
+    Select the best available device for a task.
+    """
+    candidates = []
+
+    for device_id, profile in all_devices.items():
+        # Must be idle
+        if profile.status != DeviceStatus.IDLE:
+            continue
+
+        # Must have required capabilities
+        device_caps = set(profile.capabilities)
+        if not set(required_capabilities).issubset(device_caps):
+            continue
+
+        # Calculate score; the performance tier only counts when requested
+        score = 0.0
+        if prefer_high_performance:
+            performance = profile.metadata.get("performance")
+            if performance == "very_high":
+                score += 10
+            elif performance == "high":
+                score += 5
+
+        # Prefer devices with more memory
+        system_info = profile.metadata.get("system_info", {})
+        score += system_info.get("memory_total_gb", 0) / 10
+
+        candidates.append((device_id, score))
+
+    if not candidates:
+        return None
+
+    # Sort by score (descending)
+    candidates.sort(key=lambda x: x[1], reverse=True)
+    return candidates[0][0]
+
+# Usage
+all_devices = registry.get_all_devices(connected=True)
+best_device = select_best_device(
+    all_devices,
+    required_capabilities=["gpu_computation", "model_training"],
+    prefer_high_performance=True
+)
+print(f"Selected device: {best_device}")
+
+### Health Monitoring
+
+```python
+from datetime import datetime, timezone, timedelta
+from typing import Any, Dict
+
+def check_device_health(profile: AgentProfile) -> Dict[str, Any]:
+    """
+    Check device health based on profile data.
+    """
+    health = {
+        "device_id": profile.device_id,
+        "healthy": True,
+        "warnings": [],
+        "errors": []
+    }
+
+    # Check heartbeat freshness
+    if profile.last_heartbeat:
+        age = datetime.now(timezone.utc) - profile.last_heartbeat
+        if age > timedelta(minutes=5):
+            health["warnings"].append(
+                f"No heartbeat for {age.total_seconds():.0f} seconds"
+            )
+        if age > timedelta(minutes=10):
+            health["errors"].append("Heartbeat timeout")
+            health["healthy"] = False
+
+    # Check connection attempts
+    if profile.connection_attempts > profile.max_retries / 2:
+        health["warnings"].append(
+            f"High connection attempts: {profile.connection_attempts}/{profile.max_retries}"
+        )
+
+    # Report the running task; detecting a device stuck in BUSY would also
+    # require checking the task's age, which the profile does not record
+    if profile.status == DeviceStatus.BUSY and profile.current_task_id:
+        health["warnings"].append(f"Device busy with task {profile.current_task_id}")
+
+    return health
+
+# Usage
+profile = registry.get_device("windowsagent")
+health = check_device_health(profile)
+print(f"Health: {health['healthy']}")
+print(f"Warnings: {health['warnings']}")
+print(f"Errors: {health['errors']}")
+```
+
+---
+
+## 🔗 Related Documentation
+
+| Topic | Document | Description |
+|-------|----------|-------------|
+| **Overview** | [Agent Registration Overview](./overview.md) | Registration architecture and process |
+| **Registration Flow** | [Registration Flow](./registration_flow.md) | Step-by-step registration process |
+| **Device Registry** | [Device Registry](./device_registry.md) | Registry component implementation |
+| **Galaxy Devices Config** | [Galaxy Devices Configuration](../../configuration/system/galaxy_devices.md) | YAML configuration reference |
+| **Device Info** | [Device Info Provider](../../client/device_info.md) | Telemetry collection details |
+| **AIP Protocol** | [AIP Overview](../../aip/overview.md) | Agent Interaction Protocol |
+| **Constellation System** | [Constellation Overview](../constellation/overview.md) | Multi-device coordination |
+| **WebSocket Client** | [Client AIP Integration](../client/aip_integration.md) | Client-side implementation |
+
+---
+
+## 💡 Best Practices
+
+### 1. Meaningful Capabilities
+
+```python
+# ✅ Good: Specific, actionable capabilities
+capabilities = ["web_browsing", "office_excel", "file_management", "email_sending"]
+
+# ❌ Bad: Vague capabilities
+capabilities = ["desktop", "general"]
+```
+
+### 2. Rich Metadata
+
+```python
+# ✅ Good: Comprehensive metadata for smart routing
+metadata = {
+ "location": "datacenter_us_west",
+ "performance": "high",
+ "description": "GPU workstation for ML training",
+ "tags": ["production", "gpu", "ml"],
+ "operation_engineer_email": "ml-team@example.com"
+}
+```
+
+### 3. Monitor Heartbeats
+
+```python
+# Regularly check heartbeat freshness
+if profile.last_heartbeat:
+    age = datetime.now(timezone.utc) - profile.last_heartbeat
+    if age > timedelta(minutes=5):
+        logger.warning(f"Device {profile.device_id} heartbeat stale")
+```
+
+### 4. Use System Info for Resource-Aware Routing
+
+```python
+# Check if device has enough resources
+# (memory_total_gb may be None when telemetry did not report it)
+system_info = profile.metadata.get("system_info", {})
+if (system_info.get("memory_total_gb") or 0) >= 16:
+    # Assign memory-intensive task
+    pass
+```
+
+---
+
+## 🚀 Next Steps
+
+1. **Learn Registration Process**: Read [Registration Flow](./registration_flow.md)
+2. **Configure Devices**: See [Galaxy Devices Configuration](../../configuration/system/galaxy_devices.md)
+3. **Understand DeviceRegistry**: Check [Device Registry](./device_registry.md)
+4. **Study Telemetry**: Read [Device Info Provider](../../client/device_info.md)
+
+---
+
+## 📚 Source Code References
+
+- **AgentProfile Definition**: `galaxy/client/components/types.py`
+- **DeviceRegistry**: `galaxy/client/components/device_registry.py`
+- **ConstellationDeviceManager**: `galaxy/client/device_manager.py`
+- **DeviceInfoProvider**: `ufo/client/device_info_provider.py`
diff --git a/documents/docs/galaxy/agent_registration/device_registry.md b/documents/docs/galaxy/agent_registration/device_registry.md
new file mode 100644
index 000000000..04d7ab300
--- /dev/null
+++ b/documents/docs/galaxy/agent_registration/device_registry.md
@@ -0,0 +1,976 @@
+# 🗄️ DeviceRegistry - Device Data Management
+
+## 📋 Overview
+
+The **DeviceRegistry** is a focused component that manages device registration and information storage, providing a clean separation of concerns in the constellation architecture. It is responsible for **device data management only** - storing, retrieving, and updating AgentProfile instances without handling networking, task execution, or protocol logic.
+
+> For details on how devices connect and register using the AIP protocol, see [Registration Flow](./registration_flow.md).
+
+**Core Responsibilities:**
+
+| Responsibility | Description |
+|----------------|-------------|
+| **Registration** | Create and store AgentProfile instances |
+| **Status Tracking** | Update device connection and operational states |
+| **Metadata Management** | Store and update device metadata from all sources |
+| **Information Retrieval** | Provide device information to other components |
+| **Task State Tracking** | Track which device is executing which task |
+
+**Delegation to Other Components:**
+
+- Network communication → [`WebSocketConnectionManager`](../client/components.md#websocketconnectionmanager-network-communication-handler)
+- Message processing → [`MessageProcessor`](../client/components.md#messageprocessor-message-router-and-handler)
+- Task execution → [`TaskQueueManager`](../client/components.md#taskqueuemanager-task-scheduling-and-queuing)
+- Heartbeat monitoring → [`HeartbeatManager`](../client/components.md#heartbeatmanager-connection-health-monitor)
+
+## 🏗️ Architecture
+
+### Class Structure
+
+```mermaid
+classDiagram
+ class DeviceRegistry {
+ -Dict~str, AgentProfile~ _devices
+ -Dict~str, Dict~ _device_capabilities
+ -Logger logger
+
+ +register_device(device_id, server_url, ...) AgentProfile
+ +get_device(device_id) Optional~AgentProfile~
+ +get_all_devices(connected) Dict~str, AgentProfile~
+ +update_device_status(device_id, status)
+ +set_device_busy(device_id, task_id)
+ +set_device_idle(device_id)
+ +is_device_busy(device_id) bool
+ +get_current_task(device_id) Optional~str~
+ +increment_connection_attempts(device_id) int
+ +reset_connection_attempts(device_id)
+ +update_heartbeat(device_id)
+ +update_device_system_info(device_id, system_info) bool
+ +get_device_system_info(device_id) Optional~Dict~
+ +get_device_capabilities(device_id) Dict
+ +get_connected_devices() List~str~
+ +is_device_registered(device_id) bool
+ +remove_device(device_id) bool
+ }
+
+ class AgentProfile {
+ +str device_id
+ +str server_url
+ +Optional~str~ os
+ +List~str~ capabilities
+ +Dict metadata
+ +DeviceStatus status
+ +Optional~datetime~ last_heartbeat
+ +int connection_attempts
+ +int max_retries
+ +Optional~str~ current_task_id
+ }
+
+ class DeviceStatus {
+        <<enumeration>>
+ DISCONNECTED
+ CONNECTING
+ CONNECTED
+ REGISTERING
+ IDLE
+ BUSY
+ FAILED
+ }
+
+ DeviceRegistry "1" --> "*" AgentProfile : manages
+ AgentProfile --> DeviceStatus : has
+```
+
+### Internal Storage
+
+```python
+class DeviceRegistry:
+ def __init__(self):
+ # Primary storage: device_id -> AgentProfile
+ self._devices: Dict[str, AgentProfile] = {}
+
+ # Secondary storage: device_id -> capabilities dict
+ # (Legacy, mostly superseded by AgentProfile.capabilities)
+ self._device_capabilities: Dict[str, Dict[str, Any]] = {}
+
+ self.logger = logging.getLogger(f"{__name__}.DeviceRegistry")
+```
+
+**Storage Structure:**
+
+```python
+# Internal state example
+_devices = {
+ "windowsagent": AgentProfile(
+ device_id="windowsagent",
+ server_url="ws://localhost:5005/ws",
+ os="windows",
+ capabilities=["gui", "browser", "office"],
+ metadata={...},
+ status=DeviceStatus.IDLE,
+ ...
+ ),
+ "linux_server_01": AgentProfile(
+ device_id="linux_server_01",
+ server_url="ws://10.0.0.50:5001/ws",
+ os="linux",
+ capabilities=["cli", "server"],
+ metadata={...},
+ status=DeviceStatus.BUSY,
+ current_task_id="task_123"
+ )
+}
+```
+
+---
+
+## 🔧 Core Operations
+
+### 1. Device Registration
+
+#### Method: `register_device()`
+
+```python
+def register_device(
+ self,
+ device_id: str,
+ server_url: str,
+ os: Optional[str] = None,
+ capabilities: Optional[List[str]] = None,
+ metadata: Optional[Dict[str, Any]] = None,
+ max_retries: int = 5,
+) -> AgentProfile:
+ """
+ Register a new device.
+
+ :param device_id: Unique device identifier
+ :param server_url: UFO WebSocket server URL
+ :param os: Operating system type
+ :param capabilities: Device capabilities
+ :param metadata: Additional metadata
+ :param max_retries: Maximum connection retry attempts
+ :return: Created AgentProfile object
+ """
+```
+
+**Process:**
+
+```mermaid
+sequenceDiagram
+ participant Caller
+ participant Registry as DeviceRegistry
+ participant Profile as AgentProfile
+
+ Caller->>Registry: register_device(device_id, server_url, ...)
+
+ Registry->>Profile: Create AgentProfile
+    Note over Profile: device_id, server_url<br/>os, capabilities<br/>metadata, max_retries<br/>status=DISCONNECTED
+
+ Registry->>Registry: Store in _devices[device_id]
+ Registry->>Registry: Log registration
+
+ Registry-->>Caller: Return AgentProfile
+```
+
+**Example:**
+
+```python
+registry = DeviceRegistry()
+
+# Register device
+profile = registry.register_device(
+ device_id="windowsagent",
+ server_url="ws://localhost:5005/ws",
+ os="windows",
+ capabilities=["web_browsing", "office_applications"],
+ metadata={
+ "location": "office_desktop",
+ "performance": "high"
+ },
+ max_retries=5
+)
+
+print(f"Registered: {profile.device_id}")
+print(f"Status: {profile.status.value}") # "disconnected"
+```
+
+> **Note:** The `register_device()` method will overwrite an existing device if the same `device_id` is used. Consider adding validation if duplicate prevention is needed.
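
If silent overwriting is not acceptable, the caller can check for an existing entry before registering. The following is a minimal sketch and not part of `DeviceRegistry`: `MiniRegistry` and `register_device_once` are hypothetical names used only to keep the example self-contained. With the real registry, the same guard could presumably be built on `is_device_registered()`.

```python
from typing import Dict


class MiniRegistry:
    """Hypothetical stand-in that mimics the overwrite behaviour described above."""

    def __init__(self) -> None:
        self._devices: Dict[str, dict] = {}

    def is_device_registered(self, device_id: str) -> bool:
        return device_id in self._devices

    def register_device(self, device_id: str, server_url: str) -> dict:
        # Like the documented method, this silently replaces an existing entry.
        profile = {"device_id": device_id, "server_url": server_url}
        self._devices[device_id] = profile
        return profile


def register_device_once(registry: MiniRegistry, device_id: str, server_url: str) -> dict:
    """Register a device, but refuse to overwrite an existing entry."""
    if registry.is_device_registered(device_id):
        raise ValueError(f"Device {device_id!r} is already registered")
    return registry.register_device(device_id, server_url)


registry = MiniRegistry()
register_device_once(registry, "windowsagent", "ws://localhost:5005/ws")

try:
    register_device_once(registry, "windowsagent", "ws://localhost:5006/ws")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Raising an exception is one design choice; returning the existing profile unchanged would be an equally reasonable alternative when repeated registration is expected.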
+
+### 2. Device Retrieval
+
+#### Method: `get_device()`
+
+```python
+def get_device(self, device_id: str) -> Optional[AgentProfile]:
+ """Get device information by ID"""
+ return self._devices.get(device_id)
+```
+
+**Example:**
+
+```python
+profile = registry.get_device("windowsagent")
+
+if profile:
+ print(f"Device: {profile.device_id}")
+ print(f"Status: {profile.status.value}")
+ print(f"Capabilities: {profile.capabilities}")
+else:
+ print("Device not found")
+```
+
+#### Method: `get_all_devices()`
+
+```python
+def get_all_devices(self, connected: bool = False) -> Dict[str, AgentProfile]:
+ """
+ Get all registered devices
+ :param connected: If True, return only connected devices
+ :return: Dictionary of device_id to AgentProfile
+ """
+```
+
+**Example:**
+
+```python
+# Get all devices
+all_devices = registry.get_all_devices(connected=False)
+print(f"Total devices: {len(all_devices)}")
+
+# Get only connected devices
+connected_devices = registry.get_all_devices(connected=True)
+print(f"Connected devices: {len(connected_devices)}")
+
+for device_id, profile in connected_devices.items():
+ print(f" - {device_id}: {profile.status.value}")
+```
+
+**Connected Device Filter:**
+
+```python
+# Implementation detail
+if connected:
+ return {
+ device_id: device_info
+ for device_id, device_info in self._devices.items()
+ if device_info.status in [
+ DeviceStatus.CONNECTED,
+ DeviceStatus.IDLE,
+ DeviceStatus.BUSY
+ ]
+ }
+```
+
+#### Method: `get_connected_devices()`
+
+```python
+def get_connected_devices(self) -> List[str]:
+ """Get list of connected device IDs"""
+ return [
+ device_id
+ for device_id, device_info in self._devices.items()
+ if device_info.status == DeviceStatus.CONNECTED
+ ]
+```
+
+**Example:**
+
+```python
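+connected = registry.get_connected_devices()
+print(f"Connected: {connected}")
+# e.g. ['windowsagent'] - only devices whose status is exactly CONNECTED.
+# IDLE and BUSY devices are excluded; use get_all_devices(connected=True) for those.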
+```
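
The two "connected" filters shown above are not equivalent, which is easy to miss. The sketch below reproduces both predicates on plain data so the difference is visible. `DeviceStatus` here is a local stand-in: the member names come from the class diagram, but the string values for members other than `disconnected`, `idle`, and `busy` are assumptions, and the device IDs `mac_mini` and `offline_box` are made up for illustration.

```python
from enum import Enum
from typing import Dict, List


class DeviceStatus(Enum):
    """Local stand-in for the DeviceStatus values listed in the class diagram."""
    DISCONNECTED = "disconnected"
    CONNECTING = "connecting"
    CONNECTED = "connected"
    REGISTERING = "registering"
    IDLE = "idle"
    BUSY = "busy"
    FAILED = "failed"


# Statuses that get_all_devices(connected=True) treats as connected.
CONNECTED_LIKE = {DeviceStatus.CONNECTED, DeviceStatus.IDLE, DeviceStatus.BUSY}


def all_devices_connected(statuses: Dict[str, DeviceStatus]) -> List[str]:
    """Mirror the get_all_devices(connected=True) filter."""
    return [d for d, s in statuses.items() if s in CONNECTED_LIKE]


def connected_only(statuses: Dict[str, DeviceStatus]) -> List[str]:
    """Mirror the get_connected_devices() filter (exact CONNECTED match)."""
    return [d for d, s in statuses.items() if s == DeviceStatus.CONNECTED]


statuses = {
    "windowsagent": DeviceStatus.IDLE,
    "linux_server_01": DeviceStatus.BUSY,
    "mac_mini": DeviceStatus.CONNECTED,        # hypothetical device ID
    "offline_box": DeviceStatus.DISCONNECTED,  # hypothetical device ID
}

print(all_devices_connected(statuses))  # ['windowsagent', 'linux_server_01', 'mac_mini']
print(connected_only(statuses))         # ['mac_mini']
```

When looking for devices that can receive work, the broader filter followed by an explicit `IDLE` check is usually what is wanted.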
+
+---
+
+### 3. Status Management
+
+#### Method: `update_device_status()`
+
+```python
+def update_device_status(self, device_id: str, status: DeviceStatus) -> None:
+ """Update device connection status"""
+ if device_id in self._devices:
+ self._devices[device_id].status = status
+```
+
+**Example:**
+
+```python
+# Update status progression
+registry.update_device_status("windowsagent", DeviceStatus.CONNECTING)
+registry.update_device_status("windowsagent", DeviceStatus.CONNECTED)
+registry.update_device_status("windowsagent", DeviceStatus.IDLE)
+```
+
+**Status Lifecycle:**
+
+```mermaid
+stateDiagram-v2
+ [*] --> DISCONNECTED: register_device()
+
+ DISCONNECTED --> CONNECTING: update_device_status()
+ CONNECTING --> CONNECTED: update_device_status()
+ CONNECTING --> FAILED: update_device_status()
+
+ CONNECTED --> REGISTERING: update_device_status()
+ REGISTERING --> IDLE: update_device_status()
+ REGISTERING --> FAILED: update_device_status()
+
+ IDLE --> BUSY: set_device_busy()
+ BUSY --> IDLE: set_device_idle()
+
+ IDLE --> DISCONNECTED: update_device_status()
+ BUSY --> DISCONNECTED: update_device_status()
+
+ FAILED --> CONNECTING: update_device_status()
+
+ DISCONNECTED --> [*]: remove_device()
+```
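
`update_device_status()` as shown simply assigns the new status, so nothing in the registry enforces this lifecycle. If enforcement is wanted, the diagram can be transcribed into a lookup table. This is a sketch under that assumption: `ALLOWED_TRANSITIONS` and `is_valid_transition` are hypothetical helpers, not part of the documented API, and status names are plain strings to keep the block self-contained.

```python
from typing import Dict, Set

# Allowed transitions, transcribed from the state diagram above.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "DISCONNECTED": {"CONNECTING"},
    "CONNECTING": {"CONNECTED", "FAILED"},
    "CONNECTED": {"REGISTERING"},
    "REGISTERING": {"IDLE", "FAILED"},
    "IDLE": {"BUSY", "DISCONNECTED"},
    "BUSY": {"IDLE", "DISCONNECTED"},
    "FAILED": {"CONNECTING"},
}


def is_valid_transition(current: str, new: str) -> bool:
    """Return True if the diagram contains an edge from current to new."""
    return new in ALLOWED_TRANSITIONS.get(current, set())


print(is_valid_transition("IDLE", "BUSY"))          # True
print(is_valid_transition("DISCONNECTED", "IDLE"))  # False
print(is_valid_transition("CONNECTED", "IDLE"))     # False
```

Note that the usage examples in this document move a device from `CONNECTED` straight to `IDLE`, which this strict table rejects. If that shortcut is intended, add `"IDLE"` to the `CONNECTED` entry.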
+
+---
+
+### 4. Task State Management
+
+#### Method: `set_device_busy()`
+
+```python
+def set_device_busy(self, device_id: str, task_id: str) -> None:
+ """
+ Set device to BUSY status and track current task.
+
+ :param device_id: Device ID
+ :param task_id: Task ID being executed
+ """
+ if device_id in self._devices:
+ self._devices[device_id].status = DeviceStatus.BUSY
+ self._devices[device_id].current_task_id = task_id
+ self.logger.info(f"🔄 Device {device_id} set to BUSY (task: {task_id})")
+```
+
+**Example:**
+
+```python
+# Assign task to device
+registry.set_device_busy("windowsagent", task_id="task_12345")
+
+profile = registry.get_device("windowsagent")
+print(f"Status: {profile.status.value}") # "busy"
+print(f"Current Task: {profile.current_task_id}") # "task_12345"
+```
+
+#### Method: `set_device_idle()`
+
+```python
+def set_device_idle(self, device_id: str) -> None:
+ """
+ Set device to IDLE status and clear current task.
+
+ :param device_id: Device ID
+ """
+ if device_id in self._devices:
+ self._devices[device_id].status = DeviceStatus.IDLE
+ self._devices[device_id].current_task_id = None
+ self.logger.info(f"✅ Device {device_id} set to IDLE")
+```
+
+**Example:**
+
+```python
+# Task completes
+registry.set_device_idle("windowsagent")
+
+profile = registry.get_device("windowsagent")
+print(f"Status: {profile.status.value}") # "idle"
+print(f"Current Task: {profile.current_task_id}") # None
+```
+
+#### Method: `is_device_busy()`
+
+```python
+def is_device_busy(self, device_id: str) -> bool:
+ """
+ Check if device is currently busy.
+
+ :param device_id: Device ID
+ :return: True if device is busy
+ """
+ if device_id in self._devices:
+ return self._devices[device_id].status == DeviceStatus.BUSY
+ return False
+```
+
+**Example:**
+
+```python
+if registry.is_device_busy("windowsagent"):
+ print("Device is busy, task will be queued")
+else:
+ print("Device is available")
+```
+
+#### Method: `get_current_task()`
+
+```python
+def get_current_task(self, device_id: str) -> Optional[str]:
+ """
+ Get the current task ID being executed on device.
+
+ :param device_id: Device ID
+ :return: Current task ID or None
+ """
+ if device_id in self._devices:
+ return self._devices[device_id].current_task_id
+ return None
+```
+
+**Example:**
+
+```python
+task_id = registry.get_current_task("windowsagent")
+if task_id:
+ print(f"Device executing: {task_id}")
+else:
+ print("Device idle")
+```
+
+---
+
+### 5. Connection Management
+
+#### Method: `increment_connection_attempts()`
+
+```python
+def increment_connection_attempts(self, device_id: str) -> int:
+ """Increment connection attempts counter"""
+ if device_id in self._devices:
+ self._devices[device_id].connection_attempts += 1
+ return self._devices[device_id].connection_attempts
+ return 0
+```
+
+**Example:**
+
+```python
+attempts = registry.increment_connection_attempts("windowsagent")
+print(f"Attempts: {attempts}")
+
+profile = registry.get_device("windowsagent")
+if profile.connection_attempts >= profile.max_retries:
+ print("Max retries reached, giving up")
+```
+
+#### Method: `reset_connection_attempts()`
+
+```python
+def reset_connection_attempts(self, device_id: str) -> None:
+ """Reset connection attempts counter to 0"""
+ if device_id in self._devices:
+ self._devices[device_id].connection_attempts = 0
+ self.logger.info(f"🔄 Reset connection attempts for device {device_id}")
+```
+
+**Example:**
+
+```python
+# After successful connection
+registry.reset_connection_attempts("windowsagent")
+
+profile = registry.get_device("windowsagent")
+print(f"Attempts: {profile.connection_attempts}") # 0
+```
+
+---
+
+### 6. Heartbeat Tracking
+
+#### Method: `update_heartbeat()`
+
+```python
+def update_heartbeat(self, device_id: str) -> None:
+ """Update last heartbeat timestamp"""
+ if device_id in self._devices:
+ self._devices[device_id].last_heartbeat = datetime.now(timezone.utc)
+```
+
+**Example:**
+
+```python
+from datetime import datetime, timezone, timedelta
+
+# Update heartbeat
+registry.update_heartbeat("windowsagent")
+
+profile = registry.get_device("windowsagent")
+print(f"Last heartbeat: {profile.last_heartbeat}")
+
+# Check heartbeat freshness
+age = datetime.now(timezone.utc) - profile.last_heartbeat
+if age > timedelta(minutes=5):
+ print("⚠️ Heartbeat stale!")
+```
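
The freshness check above works because `update_heartbeat()` was just called. For a device that has never sent a heartbeat, `last_heartbeat` is `None` and the subtraction would raise a `TypeError`. A small helper can cover all three cases. This is a sketch: `heartbeat_state` is a hypothetical name, and the 5-minute threshold is taken from the examples in this documentation rather than from any configuration value.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def heartbeat_state(
    last_heartbeat: Optional[datetime],
    now: Optional[datetime] = None,
    stale_after: timedelta = timedelta(minutes=5),
) -> str:
    """Classify a heartbeat as 'missing', 'stale', or 'fresh'."""
    if last_heartbeat is None:
        return "missing"
    # Allow the caller to inject the current time, which makes testing easier.
    now = now or datetime.now(timezone.utc)
    return "stale" if now - last_heartbeat > stale_after else "fresh"


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(heartbeat_state(None, now))                        # missing
print(heartbeat_state(now - timedelta(minutes=1), now))  # fresh
print(heartbeat_state(now - timedelta(minutes=6), now))  # stale
```

Both timestamps must be timezone-aware, as they are when produced by `datetime.now(timezone.utc)`; mixing aware and naive datetimes raises a `TypeError`.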
+
+---
+
+### 7. System Information Management
+
+#### Method: `update_device_system_info()`
+
+```python
+def update_device_system_info(
+ self, device_id: str, system_info: Dict[str, Any]
+) -> bool:
+ """
+ Update AgentProfile with system information retrieved from server.
+
+ This method updates the device's OS, capabilities, and metadata with
+ the system information that was automatically collected by the device
+ and stored on the server.
+
+ :param device_id: Device ID
+ :param system_info: System information dictionary from server
+ :return: True if update successful, False if device not found
+ """
+```
+
+> **Note:** System information is collected from the device agent and retrieved via the server. See [Client Connection Manager](../../server/client_connection_manager.md) for server-side information management.
+
+**Process:**
+
+```mermaid
+sequenceDiagram
+ participant Caller
+ participant Registry as DeviceRegistry
+ participant Profile as AgentProfile
+
+ Caller->>Registry: update_device_system_info(device_id, system_info)
+
+ Registry->>Profile: Get device
+
+ alt Device exists
+ Registry->>Profile: Update os = system_info["platform"]
+ Registry->>Profile: Merge supported_features into capabilities
+ Registry->>Profile: Add system_info to metadata
+ Registry->>Profile: Add custom_metadata if present
+ Registry->>Registry: Log update
+ Registry-->>Caller: True
+ else Device not found
+ Registry->>Registry: Log warning
+ Registry-->>Caller: False
+ end
+```
+
+**Implementation:**
+
+```python
+device_info = self.get_device(device_id)
+if not device_info:
+ self.logger.warning(f"Cannot update system info: device {device_id} not found")
+ return False
+
+# 1. Update OS information
+if "platform" in system_info:
+ device_info.os = system_info["platform"]
+
+# 2. Merge capabilities with supported features (avoid duplicates)
+if "supported_features" in system_info:
+ features = system_info["supported_features"]
+ existing_caps = set(device_info.capabilities)
+ new_caps = existing_caps.union(set(features))
+ device_info.capabilities = list(new_caps)
+
+# 3. Update metadata with system information
+device_info.metadata.update({
+ "system_info": {
+ "platform": system_info.get("platform"),
+ "os_version": system_info.get("os_version"),
+ "cpu_count": system_info.get("cpu_count"),
+ "memory_total_gb": system_info.get("memory_total_gb"),
+ "hostname": system_info.get("hostname"),
+ "ip_address": system_info.get("ip_address"),
+ "platform_type": system_info.get("platform_type"),
+ "schema_version": system_info.get("schema_version"),
+ }
+})
+
+# 4. Add custom metadata if present
+if "custom_metadata" in system_info:
+ device_info.metadata["custom_metadata"] = system_info["custom_metadata"]
+
+# 5. Add tags if present
+if "tags" in system_info:
+ device_info.metadata["tags"] = system_info["tags"]
+
+return True
+```
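
Step 2 above merges capabilities through a set union, so the order of the resulting list is not guaranteed. Where a stable order matters, for example in logs or test assertions, an order-preserving merge is a possible alternative. This is a sketch only: `merge_capabilities` is a hypothetical helper, not the registry's implementation.

```python
from typing import Iterable, List


def merge_capabilities(existing: Iterable[str], features: Iterable[str]) -> List[str]:
    """Merge two capability lists, dropping duplicates but keeping first-seen order."""
    # dict preserves insertion order, so fromkeys() de-duplicates without reordering.
    return list(dict.fromkeys([*existing, *features]))


merged = merge_capabilities(
    ["web_browsing", "gui"],
    ["gui", "cli", "browser"],
)
print(merged)  # ['web_browsing', 'gui', 'cli', 'browser']
```

User-configured capabilities stay at the front of the list and telemetry-reported features are appended after them.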
+
+**Example:**
+
+```python
+system_info = {
+ "platform": "windows",
+ "os_version": "10.0.22631",
+ "cpu_count": 16,
+ "memory_total_gb": 32.0,
+ "hostname": "DESKTOP-DEV01",
+ "ip_address": "192.168.1.100",
+ "supported_features": ["gui", "cli", "browser", "file_system", "office"],
+ "platform_type": "computer",
+ "schema_version": "1.0"
+}
+
+success = registry.update_device_system_info("windowsagent", system_info)
+
+if success:
+ profile = registry.get_device("windowsagent")
+ print(f"OS: {profile.os}") # "windows"
+ print(f"CPU: {profile.metadata['system_info']['cpu_count']}") # 16
+ print(f"Memory: {profile.metadata['system_info']['memory_total_gb']} GB") # 32.0
+```
+
+#### Method: `get_device_system_info()`
+
+```python
+def get_device_system_info(self, device_id: str) -> Optional[Dict[str, Any]]:
+ """
+ Get device system information (hardware, OS, features).
+
+ :param device_id: Device ID
+ :return: System information dictionary or None if not available
+ """
+ device_info = self.get_device(device_id)
+ if not device_info:
+ return None
+
+ return device_info.metadata.get("system_info")
+```
+
+**Example:**
+
+```python
+system_info = registry.get_device_system_info("windowsagent")
+
+if system_info:
+ print(f"Platform: {system_info['platform']}")
+ print(f"CPU Cores: {system_info['cpu_count']}")
+ print(f"Memory: {system_info['memory_total_gb']} GB")
+ print(f"Hostname: {system_info['hostname']}")
+else:
+ print("System info not available")
+```
+
+---
+
+### 8. Capabilities Management
+
+#### Method: `set_device_capabilities()`
+
+```python
+def set_device_capabilities(
+ self, device_id: str, capabilities: Dict[str, Any]
+) -> None:
+ """Store device capabilities information"""
+ self._device_capabilities[device_id] = capabilities
+
+ # Also update device info with capabilities
+ if device_id in self._devices:
+ device_info = self._devices[device_id]
+ if "capabilities" in capabilities:
+ device_info.capabilities.extend(capabilities["capabilities"])
+ if "metadata" in capabilities:
+ device_info.metadata.update(capabilities["metadata"])
+```
+
+> **Note:** This method is primarily for backwards compatibility. Modern code should use `update_device_system_info()` instead.
+
+#### Method: `get_device_capabilities()`
+
+```python
+def get_device_capabilities(self, device_id: str) -> Dict[str, Any]:
+ """Get device capabilities"""
+ return self._device_capabilities.get(device_id, {})
+```
+
+---
+
+### 9. Utility Methods
+
+#### Method: `is_device_registered()`
+
+```python
+def is_device_registered(self, device_id: str) -> bool:
+ """Check if device is registered"""
+ return device_id in self._devices
+```
+
+**Example:**
+
+```python
+if registry.is_device_registered("windowsagent"):
+ print("Device exists")
+else:
+ print("Device not registered")
+```
+
+#### Method: `remove_device()`
+
+```python
+def remove_device(self, device_id: str) -> bool:
+ """Remove a device from registry"""
+ if device_id in self._devices:
+ del self._devices[device_id]
+ self._device_capabilities.pop(device_id, None)
+ return True
+ return False
+```
+
+**Example:**
+
+```python
+success = registry.remove_device("windowsagent")
+if success:
+ print("Device removed")
+else:
+ print("Device not found")
+```
+
+---
+
+## 💡 Usage Patterns
+
+### Pattern 1: Complete Registration Flow
+
+```python
+from galaxy.client.components import DeviceRegistry, DeviceStatus
+
+registry = DeviceRegistry()
+
+# 1. Register device
+profile = registry.register_device(
+ device_id="windowsagent",
+ server_url="ws://localhost:5005/ws",
+ os="windows",
+ capabilities=["web_browsing"],
+ metadata={"location": "office"},
+ max_retries=5
+)
+
+# 2. Update status through connection process
+registry.update_device_status("windowsagent", DeviceStatus.CONNECTING)
+registry.increment_connection_attempts("windowsagent")
+registry.update_device_status("windowsagent", DeviceStatus.CONNECTED)
+registry.reset_connection_attempts("windowsagent")
+
+# 3. Update with system info
+system_info = {
+ "platform": "windows",
+ "cpu_count": 16,
+ "memory_total_gb": 32.0,
+ "supported_features": ["gui", "cli", "browser"]
+}
+registry.update_device_system_info("windowsagent", system_info)
+
+# 4. Set to IDLE (ready for tasks)
+registry.set_device_idle("windowsagent")
+
+# 5. Update heartbeat
+registry.update_heartbeat("windowsagent")
+```
+
+### Pattern 2: Task Assignment
+
+```python
+# Check if device can accept task
+# (is_device_busy() also returns False for unknown or disconnected devices,
+# so confirm separately that the device is registered and connected)
+if not registry.is_device_busy("windowsagent"):
+    # Assign task
+    registry.set_device_busy("windowsagent", task_id="task_123")
+
+    # ... execute task ...
+
+    # Task complete
+    registry.set_device_idle("windowsagent")
+else:
+    print("Device busy, task queued")
+```
+
+### Pattern 3: Device Selection
+
+```python
+def find_available_device_with_capability(
+ registry: DeviceRegistry,
+ required_capability: str
+) -> Optional[str]:
+ """Find an idle device with specific capability."""
+
+ all_devices = registry.get_all_devices(connected=True)
+
+ for device_id, profile in all_devices.items():
+ # Check if idle
+ if profile.status != DeviceStatus.IDLE:
+ continue
+
+ # Check capability
+ if required_capability in profile.capabilities:
+ return device_id
+
+ return None
+
+# Usage
+device_id = find_available_device_with_capability(registry, "browser")
+if device_id:
+ print(f"Selected: {device_id}")
+```
+
+### Pattern 4: Health Monitoring
+
+```python
+from datetime import datetime, timezone, timedelta
+
+def check_all_devices_health(registry: DeviceRegistry):
+ """Check health of all registered devices."""
+
+ all_devices = registry.get_all_devices()
+
+ for device_id, profile in all_devices.items():
+ print(f"\n{device_id}:")
+ print(f" Status: {profile.status.value}")
+
+ # Check heartbeat
+ if profile.last_heartbeat:
+ age = datetime.now(timezone.utc) - profile.last_heartbeat
+ print(f" Heartbeat age: {age.total_seconds():.0f}s")
+
+ if age > timedelta(minutes=5):
+ print(f" ⚠️ WARNING: Stale heartbeat!")
+ else:
+ print(f" ⚠️ WARNING: No heartbeat recorded")
+
+ # Check connection attempts
+ if profile.connection_attempts > 0:
+ print(f" Connection attempts: {profile.connection_attempts}/{profile.max_retries}")
+
+ # Check task status
+ if profile.current_task_id:
+ print(f" Current task: {profile.current_task_id}")
+```
+
+---
+
+## 🔗 Integration with Other Components
+
+DeviceRegistry is used internally by other components in the constellation system. See [Components Overview](../client/components.md) for details on the component architecture.
+
+### With ConstellationDeviceManager
+
+```python
+# ConstellationDeviceManager uses DeviceRegistry internally
+
+class ConstellationDeviceManager:
+ def __init__(self, ...):
+ self.device_registry = DeviceRegistry() # Internal registry
+
+ async def register_device(self, ...):
+ # Delegate to registry
+ self.device_registry.register_device(...)
+
+ def get_device_info(self, device_id: str):
+ # Delegate to registry
+ return self.device_registry.get_device(device_id)
+```
+
+### With MessageProcessor
+
+```python
+# MessageProcessor updates registry when messages arrive
+
+class MessageProcessor:
+ def __init__(self, device_registry: DeviceRegistry, ...):
+ self.device_registry = device_registry
+
+ async def handle_heartbeat(self, device_id: str):
+ # Update heartbeat in registry
+ self.device_registry.update_heartbeat(device_id)
+```
+
+### With TaskQueueManager
+
+```python
+# TaskQueueManager checks device status via registry
+
+class TaskQueueManager:
+ def can_assign_task(self, device_id: str) -> bool:
+ # Check if device is busy
+ return not self.device_registry.is_device_busy(device_id)
+```
+
+---
+
+## 🔗 Related Documentation
+
+| Topic | Document | Description |
+|-------|----------|-------------|
+| **Overview** | [Agent Registration Overview](./overview.md) | Registration architecture |
+| **AgentProfile** | [AgentProfile](./agent_profile.md) | Profile structure details |
+| **Registration Flow** | [Registration Flow](./registration_flow.md) | Step-by-step registration |
+| **Galaxy Devices Config** | [Galaxy Devices Configuration](../../configuration/system/galaxy_devices.md) | YAML config reference |
+| **Components** | [Client Components](../client/components.md) | Component architecture |
+
+---
+
+## 💡 Best Practices
+
+**1. Always Check Device Exists**
+
+```python
+profile = registry.get_device(device_id)
+if not profile:
+ logger.error(f"Device {device_id} not found")
+ return
+```
+
+**2. Use Defensive Copies for Lists/Dicts**
+
+```python
+# Registry already creates copies, but be aware
+capabilities = ["web", "office"]
+registry.register_device(..., capabilities=capabilities)
+# Modifying original list won't affect registry
+capabilities.append("new") # Safe
+```
+
+**3. Monitor Heartbeats Regularly**
+
+```python
+# Periodic check
+for device_id, profile in registry.get_all_devices().items():
+    if profile.last_heartbeat:
+        age = datetime.now(timezone.utc) - profile.last_heartbeat
+        if age > timedelta(minutes=5):
+            logger.warning(f"Stale heartbeat: {device_id}")
+
+**4. Clear Task State After Completion**
+
+```python
+# Always set to IDLE after task completes
+registry.set_device_idle(device_id)
+# This automatically clears current_task_id
+```
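
If the task raises an exception, a plain call to `set_device_idle()` after the task body never runs and the device stays `BUSY`. Wrapping the busy/idle pair in a context manager guarantees the reset. The sketch below assumes only the two registry calls documented above. `MiniRegistry` and `device_task` are hypothetical names introduced to keep the example self-contained.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, Optional


class MiniRegistry:
    """Hypothetical stand-in exposing only the two calls used below."""

    def __init__(self) -> None:
        self.current_task: Dict[str, Optional[str]] = {}

    def set_device_busy(self, device_id: str, task_id: str) -> None:
        self.current_task[device_id] = task_id

    def set_device_idle(self, device_id: str) -> None:
        self.current_task[device_id] = None


@contextmanager
def device_task(registry: MiniRegistry, device_id: str, task_id: str) -> Iterator[None]:
    """Mark the device busy for the duration of the block, then always reset it."""
    registry.set_device_busy(device_id, task_id)
    try:
        yield
    finally:
        # Runs on success and on exceptions alike.
        registry.set_device_idle(device_id)


registry = MiniRegistry()
try:
    with device_task(registry, "windowsagent", "task_123"):
        raise RuntimeError("simulated task failure")
except RuntimeError:
    pass

print(registry.current_task["windowsagent"])  # None
```

The same `try`/`finally` shape applies inside an `async` function; only the task body changes.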
+
+---
+
+## 🚀 Next Steps
+
+1. **Understand AgentProfile**: Read [AgentProfile Documentation](./agent_profile.md)
+2. **Learn Configuration**: See [Galaxy Devices Configuration](../../configuration/system/galaxy_devices.md)
+3. **Study Registration**: Check [Registration Flow](./registration_flow.md)
+4. **Explore Components**: See ConstellationDeviceManager implementation
+
+---
+
+## 📚 Source Code Reference
+
+- **DeviceRegistry**: `galaxy/client/components/device_registry.py`
+- **AgentProfile**: `galaxy/client/components/types.py`
+- **ConstellationDeviceManager**: `galaxy/client/device_manager.py`
diff --git a/documents/docs/galaxy/agent_registration/overview.md b/documents/docs/galaxy/agent_registration/overview.md
new file mode 100644
index 000000000..fa4d1fda0
--- /dev/null
+++ b/documents/docs/galaxy/agent_registration/overview.md
@@ -0,0 +1,600 @@
+# 🌟 Agent Registration & Profiling - Overview
+
+**Agent Registration** is the cornerstone of the AIP (Agent Interaction Protocol) initialization process. It enables dynamic discovery, capability advertisement, and intelligent task allocation across distributed constellation agents.
+
+---
+
+## 📋 Introduction
+
+
+*An overview of the Constellation Agent architecture showing the registration and profiling system.*
+
+At the core of AIP's initialization process is the **ConstellationClient** (implemented as `ConstellationDeviceManager`), which maintains a global registry of active agents. Any device agent service that exposes a WebSocket endpoint and implements the AIP task dispatch and result-return protocol can be integrated into UFO, which keeps the system **extensible**.
+
+The multi-source profiling pipeline enables **transparent capability discovery** and **safe adaptation** to environmental drift without direct administrator intervention.
+
+For a complete understanding of the constellation system, see:
+
+- [Constellation Overview](../constellation/overview.md) - Multi-device coordination architecture
+- [Constellation Agent Overview](../constellation_agent/overview.md) - Agent behavior and patterns
+- [AIP Protocol Overview](../../aip/overview.md) - Message protocol details
+
+---
+
+## 🎯 Core Concepts
+
+### Agent Registry
+
+The agent registry is a centralized store that tracks all active constellation agents. Each registered agent is represented by an **AgentProfile** that consolidates comprehensive information about the agent's capabilities, system resources, and operational status.
+
+| Component | Responsibility | Location |
+|-----------|---------------|----------|
+| **ConstellationDeviceManager** | Central coordinator for device management | `galaxy/client/device_manager.py` |
+| **DeviceRegistry** | Device registration and information storage | `galaxy/client/components/device_registry.py` |
+| **AgentProfile** | Multi-source agent metadata representation | `galaxy/client/components/types.py` |
+| **ClientConnectionManager** | Server-side client connection tracking | `ufo/server/services/client_connection_manager.py` |
+
+### Multi-Source Profiling
+
+Each **AgentProfile** consolidates information from **three distinct sources**, creating a comprehensive and dynamically updated view of each agent.
+
+```mermaid
+graph TB
+ subgraph Sources
+ UC[User Config devices.yaml]
+ SM[AIP Registration Service Manifest]
+ CT[Device Telemetry DeviceInfoProvider]
+ end
+
+ UC -->|device_id, capabilities, metadata| AP[AgentProfile]
+ SM -->|platform, client_type, registration_time| AP
+ CT -->|system_info, supported_features| AP
+
+ AP --> CR[ConstellationDeviceManager]
+ CR --> TA[Intelligent Task Routing]
+
+ style UC fill:#e1f5ff
+ style SM fill:#fff4e1
+ style CT fill:#e8f5e9
+ style AP fill:#f3e5f5
+```
+
+**Source Details:**
+
+| Source | Provider | Information Type | Update Frequency |
+|--------|----------|------------------|------------------|
+| **1. User Configuration** | Administrator (devices.yaml + constellation.yaml) | Endpoint identity, user preferences, capabilities | Static (config load) |
+| **2. Service Manifest** | Device Agent Service (AIP) | Client type, platform, registration metadata | On registration |
+| **3. Client Telemetry** | Device Client (DeviceInfoProvider) | Hardware specs, OS info, network status | On connection + periodic updates |
+
+**Note:** While constellation.yaml contains runtime settings like heartbeat intervals, the device-specific configuration is in devices.yaml.
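+
+A minimal sketch of how the three sources could be folded into one metadata dictionary. The helper name `merge_profile_metadata` and the merge order are assumptions made for illustration; the actual merge logic lives in `DeviceRegistry` and may differ.
+
+```python
+from typing import Any, Dict, Optional
+
+
+def merge_profile_metadata(
+    user_metadata: Dict[str, Any],
+    manifest: Dict[str, Any],
+    system_info: Optional[Dict[str, Any]] = None,
+) -> Dict[str, Any]:
+    """Combine the three profiling sources (hypothetical helper)."""
+    merged: Dict[str, Any] = dict(user_metadata)  # Source 1: devices.yaml
+    merged.update(manifest)  # Source 2: registration manifest
+    if system_info is not None:
+        # Source 3: telemetry stays nested under its own key
+        merged["system_info"] = dict(system_info)
+    return merged
+
+
+profile_metadata = merge_profile_metadata(
+    {"location": "home_office", "performance": "high"},
+    {"platform": "windows", "registration_time": "2025-11-06T10:30:00Z"},
+    {"cpu_count": 16, "memory_total_gb": 32.0},
+)
+print(sorted(profile_metadata))
+```
+
+Keeping telemetry under `system_info` means a later telemetry refresh can replace that one key without touching the administrator-supplied values.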
+
+---
+
+## 🔄 Registration Flow
+
+
+*Agent registration flow: multi-source AgentProfile construction and registration.*
+
+### Registration Process Overview
+
+The registration process follows a well-defined sequence that ensures comprehensive profiling and validation:
+
+```mermaid
+sequenceDiagram
+ participant Admin as Administrator
+ participant CDM as ConstellationDeviceManager
+ participant Server as UFO Server
+ participant DIP as DeviceInfoProvider
+
+ Note over Admin,DIP: Phase 1: User Configuration
+ Admin->>CDM: register_device(device_id, capabilities)
+ CDM->>CDM: Create AgentProfile
+
+ Note over Admin,DIP: Phase 2: WebSocket Connection
+ CDM->>Server: connect_device()
+ Server-->>CDM: Connection established
+
+ Note over Admin,DIP: Phase 3: Service Registration
+ CDM->>Server: REGISTER message
+ Server-->>CDM: Registration confirmed
+
+ Note over Admin,DIP: Phase 4: Telemetry Collection
+ CDM->>Server: request_device_info()
+ Server->>DIP: collect_system_info()
+ DIP-->>Server: system_info
+ Server-->>CDM: system_info
+ CDM->>CDM: Merge into AgentProfile
+
+ Note over Admin,DIP: Phase 5: Ready for Tasks
+ CDM->>CDM: Set device to IDLE
+```
+
+**Registration Phases:**
+
+| Phase | Description | Components Involved | Result |
+|-------|-------------|---------------------|--------|
+| **1. User Configuration** | Administrator registers device with endpoint and capabilities | ConstellationDeviceManager, DeviceRegistry | AgentProfile created with user-specified data |
+| **2. WebSocket Connection** | Establish persistent connection to device agent server | WebSocketConnectionManager | Active WebSocket channel |
+| **3. Service Registration** | AIP registration protocol exchange with capability advertisement | RegistrationProtocol, UFOWebSocketHandler | Client type and platform recorded |
+| **4. Telemetry Collection** | Retrieve runtime system information from device | DeviceInfoProvider, DeviceInfoProtocol | Hardware, OS, and feature data merged |
+| **5. Activation** | Set device to IDLE state, ready for task assignment | DeviceRegistry | Agent ready for constellation tasks |
+
+Devices can be registered with `auto_connect=True` to automatically establish connection, or `auto_connect=False` to require manual connection via `connect_device()`.
+
+---
+
+## 📊 AgentProfile Structure
+
+The **AgentProfile** is the primary data structure representing a registered constellation agent. For detailed information about the AgentProfile and its lifecycle operations, see [Agent Profile Documentation](./agent_profile.md).
+
+### Core Fields
+
+The core fields are defined as follows:
+
+```python
+from dataclasses import dataclass, field
+from datetime import datetime
+from typing import Any, Dict, List, Optional
+
+# DeviceStatus is the enum listed under "State Definitions".
+
+
+@dataclass
+class AgentProfile:
+    """Device information and capabilities"""
+
+    # Identity
+    device_id: str  # Unique device identifier
+    server_url: str  # WebSocket endpoint URL
+
+    # Platform & Capabilities
+    os: Optional[str] = None  # Operating system (windows, linux, darwin)
+    capabilities: List[str] = field(default_factory=list)  # Advertised capabilities/features
+    metadata: Dict[str, Any] = field(default_factory=dict)  # Additional metadata
+
+    # Operational Status (fields after a defaulted field need defaults too)
+    status: DeviceStatus = DeviceStatus.DISCONNECTED  # Current connection/operational status
+    last_heartbeat: Optional[datetime] = None  # Last heartbeat timestamp
+
+    # Connection Management
+    connection_attempts: int = 0  # Connection retry counter
+    max_retries: int = 5  # Maximum retry attempts
+
+    # Task Execution
+    current_task_id: Optional[str] = None  # Currently executing task ID
+
+**Field Categories:**
+
+| Category | Fields | Purpose |
+|----------|--------|---------|
+| **Identity** | `device_id`, `server_url` | Unique identification and endpoint location |
+| **Platform** | `os`, `capabilities`, `metadata` | System type and advertised features |
+| **Status** | `status`, `last_heartbeat` | Real-time operational state tracking |
+| **Resilience** | `connection_attempts`, `max_retries` | Connection retry management |
+| **Execution** | `current_task_id` | Task assignment tracking |
+
+### Metadata Structure
+
+The `metadata` field is a flexible dictionary that aggregates information from all three sources:
+
+```python
+metadata = {
+ # From User Configuration (Source 1)
+ "location": "home_office",
+ "performance": "high",
+ "description": "Primary development laptop",
+ "operation_engineer_email": "admin@example.com",
+
+ # From Service Manifest (Source 2)
+ "platform": "windows",
+ "registration_time": "2025-11-06T10:30:00Z",
+
+ # From Client Telemetry (Source 3)
+ "system_info": {
+ "platform": "windows",
+ "os_version": "10.0.22631",
+ "cpu_count": 16,
+ "memory_total_gb": 32.0,
+ "hostname": "DESKTOP-DEV01",
+ "ip_address": "192.168.1.100",
+ "platform_type": "computer",
+ "schema_version": "1.0"
+ },
+ "custom_metadata": {
+ "datacenter": "us-west-2",
+ "tier": "production"
+ }
+}
+```
+
+For a complete example, see the [Agent Profile Documentation](./agent_profile.md#example-profiles).
+
+---
+
+## 🔄 Agent Lifecycle States
+
+
+*Lifecycle state transitions of the Constellation Agent.*
+
+The agent lifecycle is managed through a state machine that tracks connection, registration, and task execution states. For more details on agent behavior and state management, see [Constellation Agent State Management](../constellation_agent/state.md).
+
+### State Definitions
+
+```python
+class DeviceStatus(Enum):
+ """Device connection status"""
+ DISCONNECTED = "disconnected" # Not connected to server
+ CONNECTING = "connecting" # Attempting to establish connection
+ CONNECTED = "connected" # Connected, initializing
+ REGISTERING = "registering" # Performing registration handshake
+ IDLE = "idle" # Connected and ready for tasks
+ BUSY = "busy" # Executing a task
+ FAILED = "failed" # Connection/execution failed
+```
+
+### State Transition Diagram
+
+```mermaid
+stateDiagram-v2
+ [*] --> DISCONNECTED: Initial State
+
+ DISCONNECTED --> CONNECTING: register_device() / connect_device()
+ CONNECTING --> CONNECTED: WebSocket established
+ CONNECTING --> FAILED: Connection error
+
+ CONNECTED --> REGISTERING: Send REGISTER message
+ REGISTERING --> IDLE: Registration confirmed + system info collected
+ REGISTERING --> FAILED: Registration rejected
+
+ IDLE --> BUSY: assign_task_to_device()
+ BUSY --> IDLE: Task completed
+ BUSY --> FAILED: Task failed / device disconnected
+
+ FAILED --> CONNECTING: Automatic reconnection
+
+ IDLE --> DISCONNECTED: disconnect_device() / connection lost
+ BUSY --> DISCONNECTED: disconnect_device() / connection lost
+
+ DISCONNECTED --> [*]: shutdown()
+```
+
+**Transition Events:**
+
+| From State | To State | Trigger | Action |
+|------------|----------|---------|--------|
+| DISCONNECTED | CONNECTING | `connect_device()` | Initiate WebSocket connection |
+| CONNECTING | CONNECTED | WebSocket handshake complete | Update status |
+| CONNECTED | REGISTERING | Send REGISTER message | AIP registration protocol |
+| REGISTERING | IDLE | Registration confirmed | Collect system info, ready for tasks |
+| IDLE | BUSY | `assign_task_to_device()` | Execute task |
+| BUSY | IDLE | Task completes | Clear current_task_id |
+| Any | DISCONNECTED | Connection lost | Cleanup, schedule reconnection |
+| FAILED | CONNECTING | Retry timer | Attempt reconnection (if under max_retries) |
+
+**Important:** When a device disconnects or enters the FAILED state, the system automatically schedules reconnection attempts, up to `max_retries` times, waiting `reconnect_delay` seconds between attempts.
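+
+The transitions above can be sketched as a lookup table. This is an illustrative check that follows the edges of the state diagram; the names `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical and not part of the Galaxy client API.
+
+```python
+from enum import Enum
+
+
+class DeviceStatus(Enum):
+    DISCONNECTED = "disconnected"
+    CONNECTING = "connecting"
+    CONNECTED = "connected"
+    REGISTERING = "registering"
+    IDLE = "idle"
+    BUSY = "busy"
+    FAILED = "failed"
+
+
+# One entry per source state, listing the targets drawn in the diagram.
+ALLOWED_TRANSITIONS = {
+    DeviceStatus.DISCONNECTED: {DeviceStatus.CONNECTING},
+    DeviceStatus.CONNECTING: {DeviceStatus.CONNECTED, DeviceStatus.FAILED},
+    DeviceStatus.CONNECTED: {DeviceStatus.REGISTERING},
+    DeviceStatus.REGISTERING: {DeviceStatus.IDLE, DeviceStatus.FAILED},
+    DeviceStatus.IDLE: {DeviceStatus.BUSY, DeviceStatus.DISCONNECTED},
+    DeviceStatus.BUSY: {
+        DeviceStatus.IDLE,
+        DeviceStatus.FAILED,
+        DeviceStatus.DISCONNECTED,
+    },
+    DeviceStatus.FAILED: {DeviceStatus.CONNECTING},
+}
+
+
+def can_transition(current: DeviceStatus, target: DeviceStatus) -> bool:
+    """Return True if the diagram contains an edge current -> target."""
+    return target in ALLOWED_TRANSITIONS.get(current, set())
+
+
+print(can_transition(DeviceStatus.IDLE, DeviceStatus.BUSY))
+print(can_transition(DeviceStatus.DISCONNECTED, DeviceStatus.IDLE))
+```
+
+A table like this makes it easy to reject an invalid status update (for example, assigning a task to a device that never finished registration) before it reaches the registry.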
+
+---
+
+## 🛠️ Key Components
+
+### 1. ConstellationDeviceManager
+
+**File:** `galaxy/client/device_manager.py`
+
+The central coordinator for all device management operations in the constellation system.
+
+**Responsibilities:**
+
+- Device registration and lifecycle management
+- Connection establishment and monitoring
+- Task assignment and execution coordination
+- Automatic reconnection handling
+
+**Key Methods:**
+
+```python
+class ConstellationDeviceManager:
+ async def register_device(
+ device_id: str,
+ server_url: str,
+ os: str,
+ capabilities: List[str],
+ metadata: Dict[str, Any],
+ auto_connect: bool = True
+ ) -> bool
+
+ async def connect_device(device_id: str) -> bool
+
+ async def assign_task_to_device(
+ task_id: str,
+ device_id: str,
+ task_description: str,
+ task_data: Dict[str, Any]
+ ) -> ExecutionResult
+
+ def get_device_info(device_id: str) -> Optional[AgentProfile]
+```
+
+See [Device Registry Documentation](./device_registry.md) for DeviceRegistry details.
+
+### 2. DeviceRegistry
+
+**File:** `galaxy/client/components/device_registry.py`
+
+Manages device registration and information storage with a focus on data management.
+
+**Responsibilities:**
+
+- Store and retrieve AgentProfile instances
+- Update device status and metadata
+- Track connection attempts and heartbeats
+- Merge multi-source information
+
+**Key Methods:**
+
+```python
+class DeviceRegistry:
+ def register_device(...) -> AgentProfile
+ def update_device_status(device_id: str, status: DeviceStatus)
+ def update_device_system_info(device_id: str, system_info: Dict)
+ def set_device_busy(device_id: str, task_id: str)
+ def set_device_idle(device_id: str)
+```
+
+### 3. RegistrationProtocol (AIP)
+
+**File:** `aip/protocol/registration.py`
+
+Handles AIP registration message exchange for both device and constellation clients.
+
+**Responsibilities:**
+
+- Device agent registration
+- Constellation client registration
+- Capability advertisement
+- Registration validation and confirmation
+
+**Key Methods:**
+
+```python
+class RegistrationProtocol(AIPProtocol):
+ async def register_as_device(
+ device_id: str,
+ metadata: Dict[str, Any],
+ platform: str
+ ) -> bool
+
+ async def register_as_constellation(
+ constellation_id: str,
+ target_device: str,
+ metadata: Dict[str, Any]
+ ) -> bool
+
+ async def send_registration_confirmation()
+ async def send_registration_error(error: str)
+```
+
+See [AIP Protocol Documentation](../../aip/overview.md) for protocol details.
+
+### 4. DeviceInfoProvider
+
+**File:** `ufo/client/device_info_provider.py`
+
+Collects device system information (telemetry source).
+
+**Responsibilities:**
+
+- Auto-detect platform, OS, and hardware
+- Collect CPU, memory, network information
+- Detect supported features based on platform
+- Provide DeviceSystemInfo structure
+
+**Key Methods:**
+
+```python
+class DeviceInfoProvider:
+ @staticmethod
+ def collect_system_info(
+ client_id: str,
+ custom_metadata: Optional[Dict]
+ ) -> DeviceSystemInfo
+```
+
+See [Device Info Provider Documentation](../../client/device_info.md) for telemetry details.
+
+### 5. ClientConnectionManager (Server)
+
+**File:** `ufo/server/services/client_connection_manager.py`
+
+Server-side client connection tracking and management. For detailed information about the server-side implementation, see [Client Connection Manager Documentation](../../server/client_connection_manager.md).
+
+**Responsibilities:**
+
+- Track connected clients (devices and constellations)
+- Store device system information received during registration
+- Manage session-to-client mappings
+- Merge server configuration with client telemetry
+
+**Key Methods:**
+
+```python
+class ClientConnectionManager:
+ def add_client(
+ client_id: str,
+ platform: str,
+ ws: WebSocket,
+ client_type: ClientType,
+ metadata: Dict
+ )
+ def get_device_system_info(device_id: str) -> Optional[Dict]
+```
+
+---
+
+## 📝 Configuration
+
+Agent registration uses two configuration files:
+
+**1. `config/galaxy/devices.yaml`** - Device definitions:
+- Device endpoints and identities
+- User-specified capabilities and metadata
+- Connection parameters (max retries, auto-connect)
+
+**2. `config/galaxy/constellation.yaml`** - Runtime settings:
+- Constellation identification and logging
+- Heartbeat interval and reconnection delay
+- Task concurrency and step limits
+
+See [Galaxy Devices Configuration](../../configuration/system/galaxy_devices.md) and [Galaxy Constellation Configuration](../../configuration/system/galaxy_constellation.md) for details.
+
+**Example Device Configuration (devices.yaml):**
+
+```yaml
+# Device Configuration - YAML Format
+# Runtime settings are configured in constellation.yaml
+
+devices:
+  - device_id: "windowsagent"
+    server_url: "ws://localhost:5005/ws"
+    os: "windows"
+    capabilities: ["web_browsing", "office_applications"]
+    metadata:
+      location: "office_desktop"
+      performance: "high"
+    max_retries: 5
+    auto_connect: true
+```
+
+For complete configuration schema, examples, and best practices, see:
+
+👉 **[Galaxy Devices Configuration Guide](../../configuration/system/galaxy_devices.md)**
+
+---
+
+## 🚀 Usage Example
+
+### Basic Registration
+
+```python
+from galaxy.client.device_manager import ConstellationDeviceManager
+
+# Initialize manager
+manager = ConstellationDeviceManager(
+ task_name="test_constellation",
+ heartbeat_interval=30.0,
+ reconnect_delay=5.0
+)
+
+# Register and connect device
+success = await manager.register_device(
+ device_id="windows_workstation",
+ server_url="ws://localhost:5005/ws",
+ os="windows",
+ capabilities=["gui", "browser", "office"],
+ metadata={
+ "location": "home_office",
+ "performance": "medium"
+ },
+ auto_connect=True # Automatically connect after registration
+)
+
+if success:
+ print("✅ Device registered and connected")
+
+ # Get device profile
+ profile = manager.get_device_info("windows_workstation")
+ print(f"Device: {profile.device_id}")
+ print(f"Status: {profile.status.value}")
+ print(f"Capabilities: {profile.capabilities}")
+ print(f"System Info: {profile.metadata.get('system_info')}")
+```
+
+### Task Assignment
+
+```python
+# Assign task to registered device
+result = await manager.assign_task_to_device(
+ task_id="task_001",
+ device_id="windows_workstation",
+ task_description="Open Excel and create a report",
+ task_data={"file_path": "C:\\Reports\\monthly.xlsx"},
+ timeout=300.0
+)
+
+print(f"Task Status: {result.status}")
+print(f"Result: {result.result}")
+```
+
+For more details on task assignment and execution, see:
+- [Registration Flow Documentation](./registration_flow.md) - Detailed examples
+- [Constellation Task Distribution](../constellation/overview.md) - Task routing strategies
+
+---
+
+## 🔗 Cross-References
+
+### Related Documentation
+
+| Topic | Document | Description |
+|-------|----------|-------------|
+| **Device Info Collection** | [Device Info Provider](../../client/device_info.md) | Client-side telemetry collection |
+| **AIP Protocol** | [AIP Overview](../../aip/overview.md) | Agent Interaction Protocol fundamentals |
+| **AIP Messages** | [AIP Messages](../../aip/messages.md) | Message structure and types |
+| **Agent Profile** | [Agent Profile](./agent_profile.md) | Detailed AgentProfile structure |
+| **Registration Flow** | [Registration Flow](./registration_flow.md) | Step-by-step registration process |
+| **Galaxy Devices Config** | [Galaxy Devices Configuration](../../configuration/system/galaxy_devices.md) | YAML configuration reference |
+| **Device Registry** | [Device Registry](./device_registry.md) | Registry component details |
+| **Constellation System** | [Constellation Overview](../constellation/overview.md) | Multi-device coordination |
+| **Client Connection Manager** | [Server Connection Manager](../../server/client_connection_manager.md) | Server-side connection tracking |
+
+### Architecture Diagrams
+
+- **Constellation Agent Overview**: `documents/docs/img/constellation_agent.png`
+- **Agent Registration Flow**: `documents/docs/img/agent_registry.png`
+- **Agent Lifecycle States**: `documents/docs/img/agent_state.png`
+
+---
+
+## 💡 Key Benefits
+
+The multi-source profiling approach provides several advantages:
+
+**1. Improved Task Allocation Accuracy**
+
+- Administrators specify high-level capabilities
+- Service manifests advertise supported tools
+- Telemetry provides real-time hardware status
+
+**2. Transparent Capability Discovery**
+
+- No manual system info entry required
+- Automatic feature detection based on platform
+- Dynamic updates without configuration changes
+
+**3. Safe Adaptation to Environmental Drift**
+
+- System changes (upgrades, hardware additions) automatically reflected
+- No administrator intervention needed for routine updates
+- Consistent metadata across distributed agents
+
+**4. Reliable Scheduling Decisions**
+
+- Fresh and accurate information for task routing
+- Hardware-aware task assignment (CPU/memory requirements)
+- Platform-specific capability matching
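+
+As a rough illustration of hardware-aware, capability-based matching, the sketch below picks the first idle device that advertises every required capability and reports enough memory. The function `pick_device` and the plain-dictionary profile layout are assumptions for this example; the constellation's real routing strategy is not shown in this document.
+
+```python
+from typing import Any, Dict, Iterable, List, Optional
+
+
+def pick_device(
+    profiles: List[Dict[str, Any]],
+    required_capabilities: Iterable[str],
+    min_memory_gb: float = 0.0,
+) -> Optional[str]:
+    """Return the device_id of the first suitable idle device, or None."""
+    required = set(required_capabilities)
+    for profile in profiles:
+        if profile.get("status") != "idle":
+            continue  # only idle devices can accept a task
+        if not required <= set(profile.get("capabilities", [])):
+            continue  # a required capability is missing
+        system_info = profile.get("metadata", {}).get("system_info", {})
+        if system_info.get("memory_total_gb", 0.0) >= min_memory_gb:
+            return profile["device_id"]
+    return None
+
+
+profiles = [
+    {
+        "device_id": "laptop",
+        "status": "idle",
+        "capabilities": ["browser"],
+        "metadata": {"system_info": {"memory_total_gb": 8.0}},
+    },
+    {
+        "device_id": "workstation",
+        "status": "idle",
+        "capabilities": ["browser", "office"],
+        "metadata": {"system_info": {"memory_total_gb": 32.0}},
+    },
+]
+chosen = pick_device(profiles, ["browser", "office"], min_memory_gb=16.0)
+print(chosen)
+```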
+
+---
+
+## 🎯 Next Steps
+
+1. **Understand AgentProfile in Detail**: Read [Agent Profile Documentation](./agent_profile.md)
+2. **Learn Registration Process**: Follow [Registration Flow](./registration_flow.md)
+3. **Configure Your Devices**: See [Galaxy Devices Configuration](../../configuration/system/galaxy_devices.md)
+4. **Explore Device Registry**: Check [Device Registry](./device_registry.md)
+5. **Study AIP Protocol**: Read [AIP Documentation](../../aip/overview.md)
+
+---
+
+## 📚 Additional Resources
+
+- **Source Code**: `galaxy/client/device_manager.py`
+- **AIP Protocol**: `aip/protocol/registration.py`
+- **Device Info**: `ufo/client/device_info_provider.py`
+- **Configuration**: `config/galaxy/devices.yaml`
+
+**Best Practice:** Always configure devices with meaningful metadata and capabilities to enable intelligent task routing. The system will automatically enhance this information with telemetry data.
diff --git a/documents/docs/galaxy/agent_registration/registration_flow.md b/documents/docs/galaxy/agent_registration/registration_flow.md
new file mode 100644
index 000000000..0352e8d0e
--- /dev/null
+++ b/documents/docs/galaxy/agent_registration/registration_flow.md
@@ -0,0 +1,938 @@
+# 🔄 Registration Flow - Complete Process Guide
+
+## 📋 Overview
+
+The registration flow transforms a device configuration entry into a fully profiled, connected, and task-ready constellation agent through a coordinated **5-phase process**:
+
+1. **Load user configuration** from YAML
+2. **Establish a WebSocket connection** to the device agent server
+3. **Perform the AIP registration protocol** exchange
+4. **Collect client telemetry** data
+5. **Activate the agent** as task-ready
+
+See [Agent Registration Overview](./overview.md) for architecture context and [DeviceRegistry](./device_registry.md) for data management details.
+
+
+*Multi-source AgentProfile construction and registration flow.*
+
+## 🎯 Registration Phases
+
+### Phase Overview
+
+```mermaid
+graph TB
+ Start([Start]) --> P1[Phase 1: User Configuration]
+ P1 --> P2[Phase 2: WebSocket Connection]
+ P2 --> P3[Phase 3: Service Registration]
+ P3 --> P4[Phase 4: Telemetry Collection]
+ P4 --> P5[Phase 5: Agent Activation]
+ P5 --> End([Agent Ready])
+
+ P2 -->|Connection Failed| Retry{Retry < Max?}
+ Retry -->|Yes| P2
+ Retry -->|No| Failed([Failed])
+
+ P3 -->|Registration Rejected| Failed
+
+ style P1 fill:#e1f5ff
+ style P2 fill:#fff4e1
+ style P3 fill:#ffe1e1
+ style P4 fill:#e8f5e9
+ style P5 fill:#f3e5f5
+ style End fill:#c8e6c9
+ style Failed fill:#ffcdd2
+```
+
+| Phase | Duration | Can Fail? | Retry? | Result |
+|-------|----------|-----------|--------|--------|
+| **1. User Configuration** | < 1s | Yes | No | AgentProfile created |
+| **2. WebSocket Connection** | 1-5s | Yes | Yes (up to max_retries) | Active WebSocket |
+| **3. Service Registration** | 1-2s | Yes | No | Client type recorded |
+| **4. Telemetry Collection** | 1-3s | No (graceful degradation) | No | System info merged |
+| **5. Agent Activation** | < 1s | No | No | Status = IDLE |
+
+## 📝 Phase 1: User Configuration
+
+### Purpose
+
+Load the device configuration from the YAML file and create an initial AgentProfile populated with user-specified data.
+
+### Input
+
+`config/galaxy/devices.yaml`:
+
+```yaml
+devices:
+  - device_id: "windowsagent"
+    server_url: "ws://localhost:5005/ws"
+    os: "windows"
+    capabilities:
+      - "web_browsing"
+      - "office_applications"
+      - "file_management"
+    metadata:
+      location: "office_desktop"
+      performance: "high"
+      description: "Primary Windows workstation"
+      operation_engineer_email: "admin@example.com"
+    max_retries: 5
+    auto_connect: true
+```
+
+### Process
+
+```mermaid
+sequenceDiagram
+ participant YAML as devices.yaml
+ participant Manager as DeviceManager
+ participant Registry as DeviceRegistry
+
+ YAML->>Manager: Load configuration
+ Manager->>Manager: Parse YAML
+
+ loop For each device in config
+ Manager->>Registry: register_device(device_id, server_url, ...)
+ Registry->>Registry: Create AgentProfile
+ Registry->>Registry: Set status = DISCONNECTED
+ Registry-->>Manager: AgentProfile created
+ end
+
+ Manager->>Manager: Check auto_connect flag
+
+ alt auto_connect == true
+ Manager->>Manager: Schedule connect_device()
+ end
+```
+
+### Code Example
+
+```python
+from galaxy.client.device_manager import ConstellationDeviceManager
+
+# Initialize manager
+manager = ConstellationDeviceManager(
+ task_name="production_constellation",
+ heartbeat_interval=30.0,
+ reconnect_delay=5.0
+)
+
+# Phase 1: Register device from configuration
+success = await manager.register_device(
+ device_id="windowsagent",
+ server_url="ws://localhost:5005/ws",
+ os="windows",
+ capabilities=["web_browsing", "office_applications", "file_management"],
+ metadata={
+ "location": "office_desktop",
+ "performance": "high",
+ "description": "Primary Windows workstation"
+ },
+ auto_connect=True # Proceed to Phase 2 automatically
+)
+```
+
+### Output
+
+**AgentProfile (Version 1):**
+
+```python
+AgentProfile(
+ device_id="windowsagent",
+ server_url="ws://localhost:5005/ws",
+ os="windows",
+ capabilities=["web_browsing", "office_applications", "file_management"],
+ metadata={
+ "location": "office_desktop",
+ "performance": "high",
+ "description": "Primary Windows workstation"
+ },
+ status=DeviceStatus.DISCONNECTED,
+ last_heartbeat=None,
+ connection_attempts=0,
+ max_retries=5,
+ current_task_id=None
+)
+```
+
+> **Phase 1 Complete:** Device registered in local registry with user-specified configuration. Status: `DISCONNECTED`
+
+## 🌐 Phase 2: WebSocket Connection
+
+### Purpose
+
+Establish a persistent WebSocket connection to the device agent's UFO server. This connection is managed by the `WebSocketConnectionManager` component.
+
+See [Client Components](../client/components.md) for component architecture details.
+
+### Process
+
+```mermaid
+sequenceDiagram
+ participant Manager as DeviceManager
+ participant Registry as DeviceRegistry
+ participant WSManager as WebSocketConnectionManager
+ participant Server as UFO Server
+ participant MsgProc as MessageProcessor
+ participant HB as HeartbeatManager
+
+ Manager->>Registry: update_device_status(CONNECTING)
+ Manager->>Registry: increment_connection_attempts()
+
+ Manager->>WSManager: connect_to_device(device_info, message_processor)
+ WSManager->>Server: WebSocket handshake (ws://...)
+
+ alt Connection Successful
+ Server-->>WSManager: Connection accepted
+ WSManager->>MsgProc: start_message_handler(device_id, websocket)
+ MsgProc->>MsgProc: Start listening for messages
+ WSManager-->>Manager: Connection established
+
+ Manager->>Registry: update_device_status(CONNECTED)
+ Manager->>Registry: update_heartbeat()
+ Manager->>HB: start_heartbeat(device_id)
+ HB->>HB: Start periodic heartbeat checks
+ else Connection Failed
+ Server-->>WSManager: Connection refused / timeout
+ WSManager-->>Manager: ConnectionError
+ Manager->>Registry: update_device_status(FAILED)
+ Manager->>Manager: schedule_reconnection()
+ end
+```
+
+### Connection Parameters
+
+| Parameter | Value | Description |
+|-----------|-------|-------------|
+| **URL** | `ws://host:port/ws` | WebSocket endpoint from configuration |
+| **Timeout** | 30 seconds | Connection timeout |
+| **Protocols** | WebSocket standard | No special sub-protocols |
+| **Headers** | None | Standard WebSocket headers |
+
+### Retry Strategy
+
+```python
+async def connect_device(self, device_id: str, is_reconnection: bool = False) -> bool:
+    """Connect to a registered device with retry logic."""
+
+    device_info = self.device_registry.get_device(device_id)
+
+    # Update status
+    self.device_registry.update_device_status(device_id, DeviceStatus.CONNECTING)
+
+    # Increment attempts (only for initial connection, not reconnections)
+    if not is_reconnection:
+        self.device_registry.increment_connection_attempts(device_id)
+
+    try:
+        # Establish WebSocket connection
+        await self.connection_manager.connect_to_device(
+            device_info,
+            message_processor=self.message_processor
+        )
+
+        # Success: Update status
+        self.device_registry.update_device_status(device_id, DeviceStatus.CONNECTED)
+        self.device_registry.update_heartbeat(device_id)
+
+        # Start heartbeat monitoring
+        self.heartbeat_manager.start_heartbeat(device_id)
+
+        return True
+
+    except (websockets.WebSocketException, OSError, asyncio.TimeoutError) as e:
+        self.logger.error(f"Connection failed: {e}")
+        self.device_registry.update_device_status(device_id, DeviceStatus.FAILED)
+
+        # Schedule reconnection if under retry limit
+        if device_info.connection_attempts < device_info.max_retries:
+            self._schedule_reconnection(device_id)
+
+        return False
+```
+
+### Reconnection Logic
+
+```mermaid
+graph TB
+ Disconnect[Connection Lost] --> CheckRetries{Attempts < Max?}
+
+ CheckRetries -->|Yes| Wait[Wait reconnect_delay seconds]
+ Wait --> Attempt[Attempt Reconnection]
+ Attempt --> Success{Success?}
+
+ Success -->|Yes| Connected[CONNECTED]
+ Success -->|No| Increment[Increment Retry Counter]
+ Increment --> CheckRetries
+
+ CheckRetries -->|No| Failed[FAILED - Give Up]
+
+ Connected --> End([Ready for Phase 3])
+ Failed --> End2([Registration Failed])
+
+ style Connected fill:#c8e6c9
+ style Failed fill:#ffcdd2
+```
+
+**Reconnection Parameters:**
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `max_retries` | 5 | Maximum reconnection attempts |
+| `reconnect_delay` | 5.0 seconds | Delay between attempts |
+| `retry_counter` | Per-device | Tracked in AgentProfile.connection_attempts |
+
+> **Warning:** If a device fails to connect after `max_retries` attempts, it enters `FAILED` status and requires manual intervention (e.g., restarting the device agent server).
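+
+The retry behaviour described above can be approximated with a small standalone loop. This is a sketch only: `connect_with_retries` and `flaky_connect` are hypothetical names, and the sketch assumes `max_retries` counts total attempts; it is not the project's `_schedule_reconnection` implementation.
+
+```python
+import asyncio
+from typing import Awaitable, Callable
+
+
+async def connect_with_retries(
+    connect: Callable[[], Awaitable[bool]],
+    max_retries: int = 5,
+    reconnect_delay: float = 5.0,
+) -> bool:
+    """Call `connect` until it succeeds or max_retries attempts are used."""
+    for attempt in range(1, max_retries + 1):
+        if await connect():
+            return True
+        if attempt < max_retries:
+            # Wait before the next attempt, but not after the last one
+            await asyncio.sleep(reconnect_delay)
+    return False
+
+
+attempts = 0
+
+
+async def flaky_connect() -> bool:
+    """Stand-in for a real connection call: succeeds on the third try."""
+    global attempts
+    attempts += 1
+    return attempts >= 3
+
+
+connected = asyncio.run(connect_with_retries(flaky_connect, reconnect_delay=0.0))
+print(connected, attempts)
+```
+
+Once the loop returns `False`, the caller would leave the device in `FAILED` status, matching the manual-intervention case in the warning.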
+
+### Output
+
+- **WebSocket connection** established and active
+- **Message handler** listening for incoming messages
+- **Heartbeat monitoring** started
+- **Status**: `CONNECTED`
+
+> **Phase 2 Complete:** WebSocket connection established. Message handler and heartbeat monitoring active.
+
+## 📡 Phase 3: Service Registration (AIP)
+
+### Purpose
+
+Perform AIP registration protocol exchange to:
+
+- Identify client type (DEVICE vs CONSTELLATION)
+- Advertise platform information
+- Validate registration with server
+
+See [AIP Protocol Documentation](../../aip/protocols.md#registration-protocol) for detailed protocol specifications.
+
+### Process
+
+```mermaid
+sequenceDiagram
+ participant Manager as DeviceManager
+ participant WSManager as WebSocketConnectionManager
+ participant Transport as WebSocketTransport
+ participant RegProtocol as RegistrationProtocol
+ participant Server as UFO Server Handler
+
+ Note over Manager,Server: Device Agent Client Registration
+
+ Manager->>RegProtocol: register_as_device(device_id, metadata, platform)
+
+ RegProtocol->>RegProtocol: Prepare ClientMessage
+ Note over RegProtocol: type: REGISTER<br/>client_type: DEVICE<br/>metadata: system_info, etc.
+
+ RegProtocol->>Transport: send_message(ClientMessage)
+ Transport->>Server: WebSocket: REGISTER message
+
+ Server->>Server: Parse ClientMessage
+ Server->>Server: Validate registration
+ Server->>Server: Extract metadata, system_info
+ Server->>Server: Store in ClientConnectionManager
+
+ Server->>Transport: ServerMessage (status: OK)
+ Transport->>RegProtocol: receive_message(ServerMessage)
+
+ alt Registration Successful
+ RegProtocol-->>Manager: True (registration successful)
+ Manager->>Registry: update_device_status(CONNECTED)
+ else Registration Failed
+ RegProtocol-->>Manager: False (registration failed)
+ Manager->>Registry: update_device_status(FAILED)
+ end
+```
+
+### Registration Message Structure
+
+**Client → Server (REGISTER message):**
+
+```python
+ClientMessage(
+ type=ClientMessageType.REGISTER,
+ client_id="windowsagent",
+ client_type=ClientType.DEVICE,
+ status=TaskStatus.OK,
+ timestamp="2025-11-06T10:30:00.000Z",
+ metadata={
+ "platform": "windows",
+ "registration_time": "2025-11-06T10:30:00.000Z",
+ "system_info": {
+ "platform": "windows",
+ "os_version": "10.0.22631",
+ "cpu_count": 16,
+ "memory_total_gb": 32.0,
+ "hostname": "DESKTOP-DEV01",
+ "ip_address": "192.168.1.100",
+ "supported_features": ["gui", "cli", "browser", "file_system", "office"],
+ "platform_type": "computer",
+ "schema_version": "1.0"
+ }
+ }
+)
+```
+
+**Server → Client (Confirmation):**
+
+```python
+ServerMessage(
+ type=ServerMessageType.HEARTBEAT,
+ status=TaskStatus.OK,
+ timestamp="2025-11-06T10:30:01.000Z",
+ response_id="reg_confirmation_12345"
+)
+```
+
+### Server-Side Processing
+
+```python
+# In UFOWebSocketHandler.connect()
+
+async def connect(self, websocket: WebSocket) -> str:
+ """Server-side registration handling."""
+
+ await websocket.accept()
+
+ # Initialize AIP protocols
+ self.transport = WebSocketTransport(websocket)
+ self.registration_protocol = RegistrationProtocol(self.transport)
+
+ # Parse registration message
+ reg_info = await self._parse_registration_message()
+
+ # Validate client type
+ client_type = reg_info.client_type # DEVICE or CONSTELLATION
+ platform = reg_info.metadata.get("platform", "windows")
+
+ # Register client
+ client_id = reg_info.client_id
+ self.client_manager.add_client(
+ client_id,
+ platform,
+ websocket,
+ client_type,
+ reg_info.metadata # Contains system_info
+ )
+
+ # Send confirmation
+ await self._send_registration_confirmation()
+
+ return client_id
+```
+
+### Constellation Client Registration
+
+For constellation clients (not device agents), the registration differs:
+
+```python
+# Constellation client registration
+ClientMessage(
+ type=ClientMessageType.REGISTER,
+ client_id="constellation_orchestrator",
+ client_type=ClientType.CONSTELLATION,
+ target_id="windowsagent", # Target device for this constellation
+ status=TaskStatus.OK,
+ timestamp="2025-11-06T10:30:00.000Z",
+ metadata={
+ "type": "constellation_client",
+ "targeted_device_id": "windowsagent"
+ }
+)
+```
+
+> **Note:** Device clients register as `ClientType.DEVICE`, while constellation orchestrators register as `ClientType.CONSTELLATION` with a `target_id` pointing to the device they want to control.
+
+### Output
+
+- Client registered in server's `ClientConnectionManager`
+- Client type (DEVICE/CONSTELLATION) recorded
+- Platform information stored
+- Registration confirmation received
+
+> **Phase 3 Complete:** AIP registration protocol completed. Client type and platform recorded on server.
+
+## 📊 Phase 4: Telemetry Collection
+
+### Purpose
+
+Collect real-time system information from the device client and merge it into the AgentProfile. The system information is collected by the device's `DeviceInfoProvider` during registration and sent to the server as part of the registration metadata.
+
+See [Device Info Provider](../../client/device_info.md) for details on telemetry collection.
+
+### Process
+
+```mermaid
+sequenceDiagram
+ participant Manager as DeviceManager
+ participant WSManager as WebSocketConnectionManager
+ participant Server as UFO Server
+ participant DIP as DeviceInfoProvider
+ participant Registry as DeviceRegistry
+
+ Note over Manager,Registry: Request Device Info
+
+ Manager->>WSManager: request_device_info(device_id)
+ WSManager->>Server: Send DEVICE_INFO_REQUEST
+
+ Note over Server,DIP: Server has already received system_info during registration
+
+ Server->>Server: Retrieve stored system_info from ClientConnectionManager
+ Server-->>WSManager: Return system_info
+ WSManager-->>Manager: system_info dict
+
+ Note over Manager,Registry: Merge System Info into AgentProfile
+
+ Manager->>Registry: update_device_system_info(device_id, system_info)
+
+    Registry->>Registry: Update OS from reported platform
+ Registry->>Registry: Merge supported_features into capabilities
+ Registry->>Registry: Add system_info to metadata
+ Registry->>Registry: Add custom_metadata if present
+
+ Registry-->>Manager: Update complete
+```
+
+### DeviceInfoProvider (Client-Side)
+
+The device client collects system info **during registration** (before Phase 4):
+
+```python
+# In WebSocket client's register_client() method
+
+from datetime import datetime, timezone
+
+from ufo.client.device_info_provider import DeviceInfoProvider
+
+# Collect device info
+system_info = DeviceInfoProvider.collect_system_info(
+ client_id=self.ufo_client.client_id,
+ custom_metadata=None
+)
+
+# Prepare metadata for registration
+metadata = {
+ "system_info": system_info.to_dict(),
+ "registration_time": datetime.now(timezone.utc).isoformat()
+}
+
+# Register with AIP (includes system_info in metadata)
+await self.registration_protocol.register_as_device(
+ device_id=self.ufo_client.client_id,
+ metadata=metadata,
+ platform=self.ufo_client.platform
+)
+```
+
+### System Info Structure
+
+```python
+{
+ "platform": "windows",
+ "os_version": "10.0.22631",
+ "cpu_count": 16,
+ "memory_total_gb": 32.0,
+ "hostname": "DESKTOP-DEV01",
+ "ip_address": "192.168.1.100",
+ "supported_features": [
+ "gui",
+ "cli",
+ "browser",
+ "file_system",
+ "office",
+ "windows_apps"
+ ],
+ "platform_type": "computer",
+ "schema_version": "1.0"
+}
+```
+
+See [Device Info Provider](../../client/device_info.md) for telemetry collection details.
+
+### Merging Logic
+
+```python
+def update_device_system_info(
+ self, device_id: str, system_info: Dict[str, Any]
+) -> bool:
+ """Update AgentProfile with system information."""
+
+ device_info = self.get_device(device_id)
+ if not device_info:
+ return False
+
+ # 1. Update OS information
+ if "platform" in system_info:
+ device_info.os = system_info["platform"]
+
+ # 2. Merge capabilities with supported features (avoid duplicates)
+ if "supported_features" in system_info:
+ features = system_info["supported_features"]
+ existing_caps = set(device_info.capabilities)
+ new_caps = existing_caps.union(set(features))
+ device_info.capabilities = list(new_caps)
+
+ # 3. Update metadata with system information
+ device_info.metadata.update({
+ "system_info": {
+ "platform": system_info.get("platform"),
+ "os_version": system_info.get("os_version"),
+ "cpu_count": system_info.get("cpu_count"),
+ "memory_total_gb": system_info.get("memory_total_gb"),
+ "hostname": system_info.get("hostname"),
+ "ip_address": system_info.get("ip_address"),
+ "platform_type": system_info.get("platform_type"),
+ "schema_version": system_info.get("schema_version")
+ }
+ })
+
+ # 4. Add custom metadata if present
+ if "custom_metadata" in system_info:
+ device_info.metadata["custom_metadata"] = system_info["custom_metadata"]
+
+ # 5. Add tags if present
+ if "tags" in system_info:
+ device_info.metadata["tags"] = system_info["tags"]
+
+ return True
+```
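+
+Because step 2 merges through `set`, the order of the resulting `capabilities` list is not guaranteed. Where a stable order matters (for display or for diffing profiles), an order-preserving merge is one option. The following is only a sketch: `merge_capabilities` is a hypothetical helper introduced for illustration and is not part of `DeviceRegistry`.
+
+```python
+from typing import List
+
+
+def merge_capabilities(existing: List[str], features: List[str]) -> List[str]:
+    """Merge two capability lists, dropping duplicates but keeping first-seen order."""
+    # dict preserves insertion order (Python 3.7+), so it acts as an ordered set here
+    return list(dict.fromkeys([*existing, *features]))
+
+
+merged = merge_capabilities(
+    ["web_browsing", "office_applications", "file_management"],  # user config
+    ["gui", "cli", "browser", "file_management"],  # reported features, one duplicate
+)
+print(merged)
+```
+
+User-configured capabilities stay first, and auto-detected features are appended in the order the device reported them.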
+
+### Before & After
+
+**Before Telemetry (AgentProfile v2):**
+
+```python
+AgentProfile(
+ device_id="windowsagent",
+ os="windows", # From user config
+ capabilities=["web_browsing", "office_applications", "file_management"],
+ metadata={
+ "location": "office_desktop",
+ "performance": "high"
+ }
+)
+```
+
+**After Telemetry (AgentProfile v3 - Complete):**
+
+```python
+AgentProfile(
+ device_id="windowsagent",
+ os="windows", # Confirmed by telemetry
+ capabilities=[
+ "web_browsing", "office_applications", "file_management", # User config
+ "gui", "cli", "browser", "file_system", "office", "windows_apps" # Auto-detected
+ ],
+ metadata={
+ # User config
+ "location": "office_desktop",
+ "performance": "high",
+
+ # Telemetry
+ "system_info": {
+ "platform": "windows",
+ "os_version": "10.0.22631",
+ "cpu_count": 16,
+ "memory_total_gb": 32.0,
+ "hostname": "DESKTOP-DEV01",
+ "ip_address": "192.168.1.100",
+ "platform_type": "computer",
+ "schema_version": "1.0"
+ }
+ }
+)
+```
+
+> **Phase 4 Complete:** System information collected and merged into AgentProfile. Capabilities expanded with auto-detected features.
+
+## ✅ Phase 5: Agent Activation
+
+### Purpose
+
+Finalize agent registration and set it to IDLE status, ready to accept task assignments.
+
+### Process
+
+```mermaid
+sequenceDiagram
+ participant Manager as DeviceManager
+ participant Registry as DeviceRegistry
+ participant HB as HeartbeatManager
+
+ Manager->>Registry: set_device_idle(device_id)
+
+ Registry->>Registry: Update status = IDLE
+ Registry->>Registry: Clear current_task_id = None
+ Registry-->>Manager: Status updated
+
+ Manager->>HB: Verify heartbeat active
+ HB-->>Manager: Heartbeat OK
+
+ Manager->>Manager: Log successful registration
+ Note over Manager: ✅ Device ready for tasks
+```
+
+### Code
+
+```python
+# Set device to IDLE (ready to accept tasks)
+self.device_registry.set_device_idle(device_id)
+
+self.logger.info(f"✅ Successfully connected to device {device_id}")
+```
+
+### Final AgentProfile State
+
+```python
+AgentProfile(
+ # Identity
+ device_id="windowsagent",
+ server_url="ws://localhost:5005/ws",
+
+ # Platform & Capabilities
+ os="windows",
+ capabilities=[
+ "web_browsing", "office_applications", "file_management",
+ "gui", "cli", "browser", "file_system", "office", "windows_apps"
+ ],
+ metadata={
+ "location": "office_desktop",
+ "performance": "high",
+ "platform": "windows",
+ "registration_time": "2025-11-06T10:30:00Z",
+ "system_info": {
+ "platform": "windows",
+ "os_version": "10.0.22631",
+ "cpu_count": 16,
+ "memory_total_gb": 32.0,
+ "hostname": "DESKTOP-DEV01",
+ "ip_address": "192.168.1.100",
+ "platform_type": "computer",
+ "schema_version": "1.0"
+ }
+ },
+
+ # Status
+ status=DeviceStatus.IDLE, # ✅ Ready for tasks!
+ last_heartbeat=datetime(2025, 11, 6, 10, 30, 45),
+
+ # Connection
+ connection_attempts=0, # Reset after successful connection
+ max_retries=5,
+
+ # Task
+ current_task_id=None
+)
+```
+
+> **Phase 5 Complete:** Agent fully registered, profiled, and activated. Status: `IDLE` - Ready to accept task assignments.
+
+## 🎯 Complete End-to-End Example
+
+### Scenario: Register Windows Workstation
+
+```python
+import asyncio
+from galaxy.client.device_manager import ConstellationDeviceManager
+
+async def register_windows_workstation():
+ """Complete registration flow example."""
+
+ # Initialize manager
+ manager = ConstellationDeviceManager(
+ task_name="office_constellation",
+ heartbeat_interval=30.0,
+ reconnect_delay=5.0
+ )
+
+ print("📝 Phase 1: User Configuration")
+ # Register device from user config
+ success = await manager.register_device(
+ device_id="windowsagent",
+ server_url="ws://localhost:5005/ws",
+ os="windows",
+ capabilities=["web_browsing", "office_applications", "file_management"],
+ metadata={
+ "location": "office_desktop",
+ "performance": "high",
+ "description": "Primary Windows workstation",
+ "operation_engineer_email": "admin@example.com"
+ },
+ max_retries=5,
+ auto_connect=True # Will proceed to Phase 2-5 automatically
+ )
+
+ if success:
+ print("✅ Registration successful!")
+
+ # Get complete profile
+ profile = manager.get_device_info("windowsagent")
+
+ print(f"\n📊 AgentProfile:")
+ print(f" Device ID: {profile.device_id}")
+ print(f" Status: {profile.status.value}")
+ print(f" OS: {profile.os}")
+ print(f" Capabilities: {profile.capabilities}")
+ print(f" System Info:")
+ system_info = profile.metadata.get("system_info", {})
+ print(f" - CPU Cores: {system_info.get('cpu_count')}")
+ print(f" - Memory: {system_info.get('memory_total_gb')} GB")
+ print(f" - Hostname: {system_info.get('hostname')}")
+ print(f" - IP: {system_info.get('ip_address')}")
+
+ # Device is now ready for tasks
+ print(f"\n🚀 Device is ready to receive tasks!")
+
+ else:
+ print("❌ Registration failed")
+
+# Run the example
+asyncio.run(register_windows_workstation())
+```
+
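+**Expected output** (the Phase 2-5 lines illustrate the manager's internal progress; the script above prints only the Phase 1 line and the final summary):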
+
+```
+📝 Phase 1: User Configuration
+🌐 Phase 2: WebSocket Connection
+ Connecting to ws://localhost:5005/ws...
+ Connection established
+📡 Phase 3: Service Registration
+ Sending REGISTER message...
+ Registration confirmed
+📊 Phase 4: Telemetry Collection
+ Collecting system information...
+ System info merged
+✅ Phase 5: Agent Activation
+ Device set to IDLE
+
+✅ Registration successful!
+
+📊 AgentProfile:
+ Device ID: windowsagent
+ Status: idle
+ OS: windows
+ Capabilities: ['web_browsing', 'office_applications', 'file_management', 'gui', 'cli', 'browser', 'file_system', 'office', 'windows_apps']
+ System Info:
+ - CPU Cores: 16
+ - Memory: 32.0 GB
+ - Hostname: DESKTOP-DEV01
+ - IP: 192.168.1.100
+
+🚀 Device is ready to receive tasks!
+```
+
+---
+
+## 🔧 Error Handling
+
+### Connection Failures
+
+```python
+try:
+    success = await manager.register_device(...)
+except asyncio.TimeoutError:
+    # Checked first: since Python 3.11 this is the built-in TimeoutError,
+    # a subclass of OSError, so it would never be reached after `except OSError`
+    logger.error("Connection timeout")
+    # Server may be down or unreachable
+except websockets.WebSocketException as e:
+    logger.error(f"WebSocket error: {e}")
+    # Will automatically retry if under max_retries
+except OSError as e:
+    logger.error(f"Network error: {e}")
+    # Check network connectivity
+```
+
+### Registration Rejection
+
+```python
+# Server-side validation: reject a constellation client whose target device is not connected
+if not self.client_manager.is_device_connected(claimed_device_id):
+ error_msg = f"Target device '{claimed_device_id}' is not connected"
+ await self._send_error_response(error_msg)
+ await self.transport.close()
+ raise ValueError(error_msg)
+```
+
+### Telemetry Collection Failure
+
+```python
+# Graceful degradation - always returns valid DeviceSystemInfo
+try:
+ return DeviceSystemInfo(...)
+except Exception as e:
+ logger.error(f"Error collecting system info: {e}")
+ # Return minimal info instead of failing
+ return DeviceSystemInfo(
+ device_id=client_id,
+ platform="unknown",
+ os_version="unknown",
+ cpu_count=0,
+ memory_total_gb=0.0,
+ hostname="unknown",
+ ip_address="unknown",
+ supported_features=[],
+ platform_type="unknown"
+ )
+```
+
+---
+
+## 🔗 Related Documentation
+
+| Topic | Document | Description |
+|-------|----------|-------------|
+| **Overview** | [Agent Registration Overview](./overview.md) | Registration architecture |
+| **AgentProfile** | [AgentProfile](./agent_profile.md) | Profile structure details |
+| **Device Registry** | [Device Registry](./device_registry.md) | Registry component |
+| **Galaxy Devices Config** | [Galaxy Devices Configuration](../../configuration/system/galaxy_devices.md) | YAML config reference |
+| **Device Info** | [Device Info Provider](../../client/device_info.md) | Telemetry collection |
+| **AIP Protocol** | [AIP Overview](../../aip/overview.md) | Protocol fundamentals |
+
+## 💡 Best Practices
+
+**1. Use auto_connect for Production**
+
+```python
+await manager.register_device(..., auto_connect=True)
+# Automatically completes all 5 phases
+```
+
+**2. Configure Appropriate max_retries**
+
+```python
+# Critical devices: higher retries
+max_retries=10 # For production servers
+
+# Test devices: lower retries
+max_retries=3 # For development environments
+```
+
+**3. Monitor Registration Status**
+
+```python
+profile = manager.get_device_info(device_id)
+if profile.status == DeviceStatus.FAILED:
+ logger.error(f"Device {device_id} failed to register")
+ # Take corrective action
+```
+
+**4. Provide Rich Metadata**
+
+```python
+metadata={
+ "location": "datacenter_us_west",
+ "performance": "high",
+ "tags": ["production", "critical"],
+ "operation_engineer_email": "ops@example.com"
+}
+```
+
+## 🚀 Next Steps
+
+1. **Configure Devices**: Read [Galaxy Devices Configuration](../../configuration/system/galaxy_devices.md)
+2. **Understand DeviceRegistry**: Check [Device Registry](./device_registry.md)
+3. **Learn Task Assignment**: See [Task Execution Documentation](../constellation_orchestrator/overview.md)
+4. **Study AIP Messages**: Read [AIP Messages](../../aip/messages.md)
+
+## 📚 Source Code References
+
+- **ConstellationDeviceManager**: `galaxy/client/device_manager.py`
+- **DeviceRegistry**: `galaxy/client/components/device_registry.py`
+- **RegistrationProtocol**: `aip/protocol/registration.py`
+- **UFOWebSocketHandler**: `ufo/server/ws/handler.py`
+- **DeviceInfoProvider**: `ufo/client/device_info_provider.py`
diff --git a/documents/docs/galaxy/client/aip_integration.md b/documents/docs/galaxy/client/aip_integration.md
new file mode 100644
index 000000000..cbded5d08
--- /dev/null
+++ b/documents/docs/galaxy/client/aip_integration.md
@@ -0,0 +1,1072 @@
+# AIP Protocol Integration
+
+The Agent Interaction Protocol (AIP) is the communication protocol used throughout Galaxy Client for device coordination. This document explains how Galaxy Client integrates with AIP, the message flow patterns, and how different components use the protocol.
+
+## Related Documentation
+
+- [Overview](./overview.md) - Overall Galaxy Client architecture
+- [DeviceManager](./device_manager.md) - Connection management using AIP
+- [Components](./components.md) - Component-level AIP usage
+- [AIP Protocol Specification](../../aip/overview.md) - Complete protocol reference
+- [AIP Message Reference](../../aip/messages.md) - Detailed message structures
+
+---
+
+## What is AIP?
+
+AIP (Agent Interaction Protocol) is a structured message protocol for agent communication, carried over WebSocket in Galaxy Client. It defines message types, status codes, and communication patterns for device registration, task execution, health monitoring, and information exchange.
+
+**Core Principles:**
+
+**Transport Agnostic**: AIP runs over WebSocket in Galaxy Client, but the protocol itself is transport-independent. You could implement AIP over HTTP, gRPC, or any other transport.
+
+**Strongly Typed**: All messages are Pydantic models with strict validation. Invalid messages are rejected immediately, preventing protocol errors from propagating.
+
+**Bidirectional**: Both client and server can initiate messages. Clients send messages such as REGISTER, HEARTBEAT, DEVICE_INFO_REQUEST, COMMAND_RESULTS, and TASK_END; the server sends messages such as TASK, COMMAND, and DEVICE_INFO_RESPONSE.
+
+**Status-Based**: Every message includes a status field (OK, ERROR, CONTINUE, COMPLETED, FAILED) indicating the message's semantic meaning and guiding response handling.
+
+**Key Message Types:**
+
+```
+Registration & Connection:
+- REGISTER: Device announces itself to server
+- REGISTER_CONFIRMATION: Server acknowledges registration (delivered as a HEARTBEAT message with OK status)
+
+Health Monitoring:
+- HEARTBEAT (client→server): "I'm alive"
+- HEARTBEAT (server→client): "Are you alive?"
+
+Task Execution:
+- TASK (server→client): "Execute this task"
+- COMMAND (server→client): "Execute these commands"
+- COMMAND_RESULTS (client→server): "Command execution results"
+- TASK_END (client→server): "Task completed"
+
+Device Information:
+- DEVICE_INFO_REQUEST (client→server): "What are your system specs?"
+- DEVICE_INFO_RESPONSE (server→client): "Here's my system info"
+
+Error Handling:
+- ERROR: "Something went wrong"
+```
+
+---
+
+## Protocol Architecture in Galaxy Client
+
+Galaxy Client uses AIP at multiple levels:
+
+### Layer 1: Transport (WebSocket)
+
+WebSocketConnectionManager handles raw WebSocket communication:
+
+```python
+# Establish WebSocket connection
+ws = await websockets.connect(server_url)
+
+# Send raw bytes
+await ws.send(message_bytes)
+
+# Receive raw bytes
+message_bytes = await ws.recv()
+```
+
+WebSocketConnectionManager knows nothing about AIP message structure. It's purely a transport layer.
+
+### Layer 2: Protocol (AIP)
+
+AIPProtocol class (from `aip/protocol/base.py`) handles message serialization and deserialization:
+
+```python
+from aip.protocol import AIPProtocol
+from aip.transport import WebSocketTransport
+
+# Wrap WebSocket in Transport abstraction
+transport = WebSocketTransport(ws)
+
+# Create protocol handler
+protocol = AIPProtocol(transport)
+
+# Send structured message
+await protocol.send_message(ClientMessage(
+ type=ClientMessageType.REGISTER,
+ payload={"device_id": "windows_pc"}
+))
+
+# Receive structured message
+message = await protocol.receive_message(ServerMessage)
+```
+
+AIPProtocol converts between Pydantic models and bytes, applies middleware, and handles serialization errors.
+
+### Layer 3: Message Processing (MessageProcessor)
+
+MessageProcessor (from DeviceManager components) routes messages to handlers:
+
+```python
+# Register handler for TASK messages
+message_processor.register_handler(
+ message_type="task",
+ handler=handle_task_message
+)
+
+# Start listening for messages
+await message_processor.start_message_handler(device_id)
+
+# Messages automatically routed to registered handlers
+```
+
+MessageProcessor implements the observer pattern, dispatching incoming messages to registered callbacks.
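+
+The dispatch idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual `MessageProcessor`: `MiniDispatcher`, its `dispatch` method, and the dict-shaped messages are hypothetical names chosen for the example.
+
+```python
+import asyncio
+from typing import Any, Awaitable, Callable, Dict, List
+
+Handler = Callable[[Dict[str, Any]], Awaitable[None]]
+
+
+class MiniDispatcher:
+    """Hypothetical stand-in that routes messages to registered callbacks."""
+
+    def __init__(self) -> None:
+        self._handlers: Dict[str, Handler] = {}
+
+    def register_handler(self, message_type: str, handler: Handler) -> None:
+        self._handlers[message_type] = handler
+
+    async def dispatch(self, message: Dict[str, Any]) -> bool:
+        """Route one message; return False when no handler is registered."""
+        handler = self._handlers.get(message["type"])
+        if handler is None:
+            return False
+        await handler(message)
+        return True
+
+
+received: List[str] = []
+
+
+async def handle_task(message: Dict[str, Any]) -> None:
+    received.append(message["user_request"])
+
+
+async def demo() -> List[bool]:
+    dispatcher = MiniDispatcher()
+    dispatcher.register_handler("task", handle_task)
+    handled = await dispatcher.dispatch({"type": "task", "user_request": "Open Excel"})
+    ignored = await dispatcher.dispatch({"type": "unknown"})
+    return [handled, ignored]
+
+
+results = asyncio.run(demo())
+```
+
+Keeping the registry separate from the receive loop lets components subscribe to message types without knowing how messages arrive.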
+
+### Layer 4: Application Logic (DeviceManager, ConstellationClient)
+
+Application components use MessageProcessor to send/receive messages without dealing with protocol details:
+
+```python
+# Send REGISTER message
+await message_processor.send_message(
+ device_id=device_id,
+ message_type="REGISTER",
+ payload={"device_id": device_id, "capabilities": ["office"]}
+)
+
+# Wait for REGISTER_CONFIRMATION
+confirmation = await message_processor.wait_for_response(
+ device_id=device_id,
+ message_type="REGISTER_CONFIRMATION",
+ timeout=10.0
+)
+```
+
+This layered architecture separates concerns and makes each layer testable.
+
+---
+
+## Message Flow Patterns
+
+### Device Registration Flow
+
+When DeviceManager connects to a device, it performs AIP registration:
+
+```mermaid
+sequenceDiagram
+ participant DM as DeviceManager
+ participant MP as MessageProcessor
+ participant P as AIPProtocol
+ participant T as WebSocketTransport
+ participant Server
+
+ Note over DM,Server: 1. WebSocket Connection
+ DM->>T: connect(server_url)
+ T->>Server: WebSocket handshake
+ Server-->>T: Connection established
+
+ Note over DM,Server: 2. Device Registration
+ DM->>MP: send_message(REGISTER)
+ MP->>P: send_message(ClientMessage)
+ P->>P: Serialize to JSON
+ P->>T: send(bytes)
+ T->>Server: REGISTER message
+
+ Note over DM,Server: 3. Server Confirmation
+ Server->>T: REGISTER_CONFIRMATION
+ T-->>P: recv(bytes)
+ P->>P: Deserialize from JSON
+ P-->>MP: ServerMessage(REGISTER_CONFIRMATION)
+ MP-->>DM: Registration confirmed
+
+ Note over DM,Server: 4. Device Info Exchange
+ DM->>MP: send_message(DEVICE_INFO_REQUEST)
+ MP->>Server: DEVICE_INFO_REQUEST
+ Server->>MP: DEVICE_INFO_RESPONSE
+ MP-->>DM: Device telemetry
+```
+
+**Message Details:**
+
+**Step 2 - REGISTER Message:**
+```json
+{
+ "type": "register",
+ "client_type": "constellation",
+ "payload": {
+ "device_id": "windows_pc",
+ "capabilities": ["office", "web", "email"],
+ "metadata": {
+ "location": "office",
+ "user": "john"
+ }
+ },
+ "status": "ok",
+ "timestamp": "2025-11-06T10:30:00Z"
+}
+```
+
+**Step 3 - REGISTER_CONFIRMATION:**
+```json
+{
+ "type": "heartbeat",
+ "status": "ok",
+ "timestamp": "2025-11-06T10:30:01Z",
+ "response_id": "reg_conf_abc123"
+}
+```
+
+Note: The server confirms registration by sending a HEARTBEAT message with OK status, which serves as the registration confirmation in the AIP protocol.
+
+**Step 4 - DEVICE_INFO_REQUEST:**
+```json
+{
+ "type": "device_info_request",
+ "client_type": "constellation",
+ "payload": {
+ "request_id": "req_xyz789"
+ },
+ "status": "ok",
+ "timestamp": "2025-11-06T10:30:02Z"
+}
+```
+
+**Step 4 - DEVICE_INFO_RESPONSE:**
+```json
+{
+ "type": "device_info_response",
+ "status": "ok",
+ "result": {
+ "device_id": "windows_pc",
+ "device_info": {
+ "os": "Windows 11",
+ "cpu_count": 8,
+ "memory_gb": 32,
+ "screen_resolution": "1920x1080",
+ "python_version": "3.11.5",
+ "installed_apps": ["Microsoft Office", "Chrome", "VSCode"]
+ }
+ },
+ "timestamp": "2025-11-06T10:30:03Z",
+ "response_id": "info_resp_xyz789"
+}
+```
+
+### Heartbeat Flow
+
+HeartbeatManager sends periodic HEARTBEAT messages to monitor device health:
+
+```mermaid
+sequenceDiagram
+ participant HM as HeartbeatManager
+ participant MP as MessageProcessor
+    participant Server
+    participant DM as DeviceManager
+
+ Note over HM: Every 30 seconds
+
+ HM->>MP: send_message(HEARTBEAT)
+ MP->>Server: HEARTBEAT
+
+ alt Server responds
+ Server-->>MP: HEARTBEAT (response)
+ MP-->>HM: Response received
+ HM->>HM: Update last_heartbeat timestamp
+ else Timeout (no response in 10s)
+ MP-->>HM: TimeoutError
+ HM->>HM: Mark device as unhealthy
+ HM->>DM: _handle_device_disconnection("heartbeat_timeout")
+ end
+```
+
+**HEARTBEAT Message (client→server):**
+```json
+{
+ "type": "heartbeat",
+ "client_type": "constellation",
+ "client_id": "constellation_client_id",
+ "status": "ok",
+ "timestamp": "2025-11-06T10:35:00Z"
+}
+```
+
+**HEARTBEAT Response (server→client):**
+```json
+{
+ "type": "heartbeat",
+ "status": "ok",
+ "timestamp": "2025-11-06T10:35:00Z",
+ "response_id": "hb_resp_123"
+}
+```
+
+Heartbeat is a simple request-response pattern. If the server does not respond within the timeout (10 seconds in the diagram above), HeartbeatManager assumes the connection has failed and triggers reconnection.
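+
+A single heartbeat round trip with a timeout can be sketched with `asyncio.wait_for`. The sketch below is illustrative only: `heartbeat_once`, `fast_server`, and `silent_server` are hypothetical names, and the sleeps stand in for a real send-and-wait over the connection.
+
+```python
+import asyncio
+from typing import Awaitable, Callable, List
+
+
+async def heartbeat_once(
+    send_and_wait: Callable[[], Awaitable[None]], timeout: float
+) -> bool:
+    """Return True if the peer answered within `timeout` seconds, else False."""
+    try:
+        await asyncio.wait_for(send_and_wait(), timeout=timeout)
+        return True
+    except asyncio.TimeoutError:
+        # No reply in time: the caller would mark the device unhealthy
+        return False
+
+
+async def fast_server() -> None:
+    await asyncio.sleep(0.01)  # replies almost immediately
+
+
+async def silent_server() -> None:
+    await asyncio.sleep(1.0)  # never replies within the timeout
+
+
+async def demo() -> List[bool]:
+    healthy = await heartbeat_once(fast_server, timeout=0.2)
+    unhealthy = await heartbeat_once(silent_server, timeout=0.05)
+    return [healthy, unhealthy]
+
+
+outcomes = asyncio.run(demo())
+```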
+
+### Task Execution Flow
+
+Task execution involves multiple message exchanges:
+
+```mermaid
+sequenceDiagram
+ participant Orch as TaskOrchestrator
+ participant DM as DeviceManager
+ participant MP as MessageProcessor
+ participant Server
+ participant Device
+
+ Note over Orch,Device: 1. Task Assignment
+ Orch->>DM: assign_task_to_device(task_id, device_id, ...)
+ DM->>MP: send_message(TASK)
+ MP->>Server: TASK
+ Server->>Device: Forward TASK
+
+ Note over Orch,Device: 2. Task Execution (on device)
+ Device->>Device: Plan task steps
+
+ loop For each step
+ Device->>Server: Request command execution
+ Server->>Device: COMMAND (action to take)
+ Device->>Device: Execute command
+ Device->>Server: COMMAND_RESULTS
+ Server->>Server: Store results
+ end
+
+ Note over Orch,Device: 3. Task Completion
+ Device->>Server: TASK_END (status=completed)
+ Server->>MP: TASK_END
+ MP-->>DM: Task result
+ DM-->>Orch: Task completed
+```
+
+**TASK Message (server→client):**
+```json
+{
+ "type": "task",
+ "status": "continue",
+ "user_request": "Open Excel and create a chart",
+ "task_name": "galaxy/production/excel_task",
+ "session_id": "sess_task_abc123",
+ "timestamp": "2025-11-06T10:40:00Z",
+ "response_id": "task_req_001"
+}
+```
+
+**COMMAND Message (server→client):**
+```json
+{
+ "type": "command",
+ "status": "continue",
+ "actions": [
+ {
+ "action": "launch_app",
+ "parameters": {
+ "app_name": "Excel"
+ }
+ },
+ {
+ "action": "open_file",
+ "parameters": {
+ "file_path": "sales_report.xlsx"
+ }
+ }
+ ],
+ "session_id": "sess_task_abc123",
+ "response_id": "cmd_001"
+}
+```
+
+**COMMAND_RESULTS Message (client→server):**
+```json
+{
+ "type": "command_results",
+ "client_type": "device",
+ "client_id": "device_agent_id",
+ "status": "continue",
+ "action_results": [
+ {
+ "action": "launch_app",
+ "status": "completed",
+ "result": "Excel launched successfully"
+ },
+ {
+ "action": "open_file",
+ "status": "completed",
+ "result": "File opened: sales_report.xlsx"
+ }
+ ],
+ "session_id": "sess_task_abc123",
+ "prev_response_id": "cmd_001"
+}
+```
+
+**TASK_END Message (client→server):**
+```json
+{
+ "type": "task_end",
+ "client_type": "device",
+ "client_id": "device_agent_id",
+ "status": "completed",
+ "result": {
+ "success": true,
+ "output": "Created bar chart showing quarterly sales",
+ "artifacts": [
+ {
+ "type": "file",
+ "path": "sales_report_with_chart.xlsx"
+ }
+ ]
+ },
+ "session_id": "sess_task_abc123",
+ "timestamp": "2025-11-06T10:40:15Z"
+}
+```
+
+This multi-message pattern allows streaming execution updates and early error detection.
+
+---
+
+## Error Handling
+
+AIP uses ERROR messages for protocol-level errors:
+
+### Error Types
+
+**Connection Errors**: WebSocket closed, network failure
+- Handled by: WebSocketConnectionManager
+- Recovery: Reconnection with exponential backoff
+
+**Protocol Errors**: Invalid message format, unknown message type
+- Handled by: AIPProtocol
+- Recovery: Send ERROR message, log warning, continue
+
+**Task Errors**: Command execution failure, task timeout
+- Handled by: Device agent
+- Recovery: Send TASK_END with status=failed
+
+**Application Errors**: Device not found, capability mismatch
+- Handled by: DeviceManager, ConstellationClient
+- Recovery: Application-specific (queue task, fail request, etc.)
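+
+The exponential backoff mentioned for connection errors can be illustrated with a small delay schedule. This is a sketch under assumptions: `backoff_delays` is a hypothetical helper, and the base delay, growth factor, and cap are example values rather than settings confirmed for WebSocketConnectionManager.
+
+```python
+from typing import List
+
+
+def backoff_delays(
+    base: float, factor: float, max_delay: float, attempts: int
+) -> List[float]:
+    """Delay before each retry: base * factor**n, capped at max_delay."""
+    return [min(base * factor**n, max_delay) for n in range(attempts)]
+
+
+delays = backoff_delays(base=5.0, factor=2.0, max_delay=60.0, attempts=5)
+print(delays)
+```
+
+Capping the delay keeps a long outage from pushing retries arbitrarily far apart, while the growth factor avoids hammering a server that is still starting up.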
+
+### ERROR Message Format
+
+```json
+{
+ "type": "error",
+ "status": "error",
+ "error": "Task execution exceeded 300 second timeout",
+ "session_id": "sess_task_abc123",
+ "timestamp": "2025-11-06T10:45:00Z",
+ "response_id": "err_001",
+ "metadata": {
+ "error_code": "TASK_TIMEOUT",
+ "elapsed_time": 315.2,
+ "last_command": "create_chart"
+ }
+}
+```
+
+**Error Codes:**
+
+- `CONNECTION_FAILED`: WebSocket connection failed
+- `REGISTRATION_FAILED`: Device registration rejected
+- `TASK_TIMEOUT`: Task execution exceeded timeout
+- `COMMAND_FAILED`: Individual command failed
+- `PROTOCOL_ERROR`: Invalid message format or type
+- `DEVICE_NOT_FOUND`: Target device doesn't exist
+- `CAPABILITY_MISMATCH`: Device lacks required capability
+
+### Error Handling Example
+
+```python
+try:
+ # Send TASK message
+ await message_processor.send_message(
+ device_id=device_id,
+ message_type="TASK",
+ payload=task_data
+ )
+
+ # Wait for TASK_END
+ result = await message_processor.wait_for_response(
+ device_id=device_id,
+ message_type="TASK_END",
+ timeout=300.0
+ )
+
+ if result.status == TaskStatus.FAILED:
+ # Task failed on device
+ error_info = result.payload.get("error")
+ logger.error(f"Task failed: {error_info}")
+ # Application-specific recovery
+
+except TimeoutError:
+ # No response within timeout
+ logger.error("Task timeout, marking device as failed")
+ await device_manager._handle_device_disconnection(
+ device_id,
+ reason="task_timeout"
+ )
+
+except ConnectionError:
+ # Connection lost during execution
+ logger.error("Connection lost during task")
+ await device_manager._handle_device_disconnection(
+ device_id,
+ reason="connection_lost"
+ )
+```
+
+---
+
+## Message Processing Implementation
+
+### MessageProcessor Component
+
+MessageProcessor (from DeviceManager components) implements AIP message handling:
+
+**Key Responsibilities:**
+
+1. **Message Sending**: Serialize and send messages via AIPProtocol
+2. **Message Receiving**: Deserialize and route incoming messages
+3. **Handler Registration**: Allow components to register callbacks for message types
+4. **Request-Response Pattern**: Let callers await a specific reply, giving request-response semantics over the asynchronous WebSocket stream
+
+**Internal Architecture:**
+
+```python
+import asyncio
+from typing import Any, Callable, Dict, Tuple
+
+
+class MessageProcessor:
+    def __init__(self):
+        self._protocols: Dict[str, AIPProtocol] = {}  # device_id → protocol
+        self._handlers: Dict[str, Dict[str, Callable]] = {}  # device_id → {msg_type → handler}
+        # Keyed by the (device_id, msg_type) tuple, so the key type is a Tuple
+        self._response_queues: Dict[Tuple[str, str], asyncio.Queue] = {}
+
+    async def send_message(
+        self,
+        device_id: str,
+        message_type: str,
+        payload: Dict[str, Any]
+    ):
+        """Send message to device."""
+        protocol = self._protocols[device_id]
+
+        # Create message
+        msg = ClientMessage(
+            type=message_type,
+            payload=payload,
+            client_type=ClientType.CONSTELLATION,
+            status=TaskStatus.OK
+        )
+
+        # Send via protocol
+        await protocol.send_message(msg)
+
+    async def wait_for_response(
+        self,
+        device_id: str,
+        message_type: str,
+        timeout: float = 30.0
+    ) -> ServerMessage:
+        """Wait for specific message type from device."""
+        queue_key = (device_id, message_type)
+
+        # Create queue if not exists
+        if queue_key not in self._response_queues:
+            self._response_queues[queue_key] = asyncio.Queue()
+
+        # Wait for message with timeout
+        try:
+            message = await asyncio.wait_for(
+                self._response_queues[queue_key].get(),
+                timeout=timeout
+            )
+            return message
+        except asyncio.TimeoutError:
+            raise TimeoutError(f"No {message_type} received from {device_id} within {timeout}s")
+
+    async def start_message_handler(self, device_id: str):
+        """Start background loop to receive and route messages."""
+        protocol = self._protocols[device_id]
+
+        while True:
+            try:
+                # Receive message
+                message = await protocol.receive_message(ServerMessage)
+
+                # Route to handler
+                msg_type = message.type
+                if msg_type in self._handlers.get(device_id, {}):
+                    handler = self._handlers[device_id][msg_type]
+                    await handler(message)
+
+                # Also add to response queue (only if a waiter has created one)
+                queue_key = (device_id, msg_type)
+                if queue_key in self._response_queues:
+                    await self._response_queues[queue_key].put(message)
+
+            except ConnectionError:
+                # Connection closed, exit loop
+                break
+            except Exception as e:
+                logger.error(f"Error processing message: {e}")
+```
+
+This implementation supports both callback-based handlers and awaitable request-response patterns.
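+
+One property of the code above is worth noting: the receive loop enqueues a message only if a queue for its `(device_id, msg_type)` key already exists, and that queue is created inside `wait_for_response`. A reply that arrives before the caller starts waiting would therefore not be queued. The sketch below illustrates the effect and one possible remedy, creating the queue before sending; `expect` and `deliver` are hypothetical helpers, not MessageProcessor methods.
+
+```python
+import asyncio
+from typing import Any, Dict, List, Tuple
+
+queues: Dict[Tuple[str, str], asyncio.Queue] = {}
+
+
+def expect(device_id: str, message_type: str) -> asyncio.Queue:
+    """Create (or fetch) the reply queue before the request is sent."""
+    return queues.setdefault((device_id, message_type), asyncio.Queue())
+
+
+async def deliver(device_id: str, message_type: str, message: Any) -> bool:
+    """Mimic the receive loop: enqueue only when a queue already exists."""
+    queue = queues.get((device_id, message_type))
+    if queue is None:
+        return False  # nobody is waiting yet, so the message is dropped
+    await queue.put(message)
+    return True
+
+
+async def demo() -> List[Any]:
+    dropped = await deliver("pc", "task_end", "early reply")  # no queue yet
+    queue = expect("pc", "task_end")  # register interest first
+    kept = await deliver("pc", "task_end", "reply")
+    reply = await asyncio.wait_for(queue.get(), timeout=1.0)
+    return [dropped, kept, reply]
+
+
+flow = asyncio.run(demo())
+```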
+
+---
+
+## AIP Extensions and Middleware
+
+### Protocol Middleware
+
+AIPProtocol supports middleware for cross-cutting concerns:
+
+```python
+import logging
+from typing import Any
+
+from aip.protocol.base import ProtocolMiddleware
+
+logger = logging.getLogger(__name__)
+
+class LoggingMiddleware(ProtocolMiddleware):
+    """Log all messages for debugging."""
+
+    async def process_outgoing(self, message: Any) -> Any:
+        """Called before sending message."""
+        logger.debug(f"Sending: {message.type} to {message.device_id}")
+        return message
+
+    async def process_incoming(self, message: Any) -> Any:
+        """Called after receiving message."""
+        logger.debug(f"Received: {message.type} from device")
+        return message
+
+class MetricsMiddleware(ProtocolMiddleware):
+    """Track message statistics."""
+
+    def __init__(self):
+        self.sent_count = 0
+        self.received_count = 0
+
+    async def process_outgoing(self, message: Any) -> Any:
+        self.sent_count += 1
+        # `metrics` is the application's own metrics client (not defined here)
+        metrics.increment("aip.messages.sent", tags={"type": message.type})
+        return message
+
+    async def process_incoming(self, message: Any) -> Any:
+        self.received_count += 1
+        metrics.increment("aip.messages.received", tags={"type": message.type})
+        return message
+
+# Add middleware to an existing protocol instance
+protocol.middleware_chain.append(LoggingMiddleware())
+protocol.middleware_chain.append(MetricsMiddleware())
+```
+
+Middleware runs for every message, allowing logging, metrics, validation, transformation, etc.
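+
+How a chain like this might be applied can be sketched as follows. This is a minimal, self-contained illustration and not the actual AIP implementation: the stand-in `Middleware` base class, the `run_outgoing` and `run_incoming` helpers, and the choice to run incoming middleware in reverse order are all assumptions made for the example.
+
+```python
+import asyncio
+from typing import Any, List
+
+
+class Middleware:
+    """Hypothetical stand-in for ProtocolMiddleware (pass-through by default)."""
+
+    async def process_outgoing(self, message: Any) -> Any:
+        return message
+
+    async def process_incoming(self, message: Any) -> Any:
+        return message
+
+
+class TagMiddleware(Middleware):
+    """Example middleware: records its name on every message it sees."""
+
+    def __init__(self, name: str) -> None:
+        self.name = name
+
+    async def process_outgoing(self, message: dict) -> dict:
+        return {**message, "seen_by": message.get("seen_by", []) + [self.name]}
+
+    async def process_incoming(self, message: dict) -> dict:
+        return {**message, "seen_by": message.get("seen_by", []) + [self.name]}
+
+
+async def run_outgoing(chain: List[Middleware], message: Any) -> Any:
+    # Each middleware receives the previous middleware's output.
+    for middleware in chain:
+        message = await middleware.process_outgoing(message)
+    return message
+
+
+async def run_incoming(chain: List[Middleware], message: Any) -> Any:
+    # Assumption: incoming messages pass through the chain in reverse order.
+    for middleware in reversed(chain):
+        message = await middleware.process_incoming(message)
+    return message
+
+
+chain = [TagMiddleware("logging"), TagMiddleware("metrics")]
+sent = asyncio.run(run_outgoing(chain, {"type": "task"}))
+received = asyncio.run(run_incoming(chain, {"type": "task_end"}))
+```
+
+Because each middleware returns the message it was given (or a modified copy), a validation or transformation step can be added to the chain without changing the others.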
+
+### Custom Message Types
+
+Extend AIP with custom message types:
+
+```python
+from enum import Enum
+from typing import Any, Dict, Literal
+
+from pydantic import BaseModel
+
+# Define custom message type
+class CustomMessageType(str, Enum):
+    DEVICE_SCREENSHOT = "device_screenshot"
+    PERFORMANCE_METRICS = "performance_metrics"
+
+# Define message structure
+class ScreenshotRequest(BaseModel):
+    type: Literal["device_screenshot"]
+    payload: Dict[str, Any]
+
+# Register handler
+message_processor.register_handler(
+    message_type="device_screenshot",
+    handler=handle_screenshot_request
+)
+
+# Send custom message
+await message_processor.send_message(
+    device_id=device_id,
+    message_type="device_screenshot",
+    payload={"region": "full_screen", "format": "png"}
+)
+```
+
+---
+
+## Complete Message Flow: ConstellationClient to Device Agent
+
+This section shows the complete end-to-end message flow from when a ConstellationClient assigns a task through the Agent Server to the final execution on a Device Agent.
+
+### Architecture Overview
+
+The message routing follows a three-tier architecture:
+
+```
+ConstellationClient (Galaxy Client)
+ ↓ WebSocket + AIP
+UFOWebSocketHandler (Agent Server)
+ ↓ WebSocket + AIP
+Device Agent Client
+```
+
+**Related Documentation:**
+
+- [AIP Overview](../../aip/overview.md) - Protocol specification
+
+### Task Execution End-to-End Flow
+
+When ConstellationClient assigns a task to a device, the message passes through multiple layers:
+
+```mermaid
+sequenceDiagram
+ participant CC as ConstellationClient
+ participant DM as DeviceManager
+ participant MP as MessageProcessor
+ participant WS1 as WebSocket(Client→Server)
+ participant Server as UFOWebSocketHandler
+ participant WS2 as WebSocket(Server→Device)
+ participant Device as Device Agent
+
+ Note over CC,Device: 1. Task Assignment Request
+ CC->>DM: assign_task_to_device(task_id, device_id, ...)
+ DM->>DM: Check device status (IDLE/BUSY)
+ DM->>MP: send_message(TASK)
+
+ Note over CC,Device: 2. Client → Server
+ MP->>MP: Create ClientMessage(type=TASK, client_type=CONSTELLATION)
+ MP->>WS1: Send TASK via WebSocket
+
+ Note over CC,Device: 3. Server Receives & Routes
+ WS1->>Server: TASK message arrives
+ Server->>Server: handle_message() parses ClientMessage
+ Server->>Server: handle_task_request()
+ Server->>Server: client_type=CONSTELLATION, extract target_id
+ Server->>Server: Create session_id, register constellation session
+ Server->>Server: Track device session mapping
+
+ Note over CC,Device: 4. Server → Device
+ Server->>WS2: Forward TASK to target device via AIP
+ WS2->>Device: TASK message
+ Device->>Device: Execute task (multiple rounds)
+
+ Note over CC,Device: 5. Task Execution on Device
+ loop For each action step
+ Device->>WS2: Request COMMAND
+ WS2->>Server: COMMAND request
+ Server->>Server: Generate action commands
+ Server->>WS2: COMMAND response
+ WS2->>Device: Action commands
+ Device->>Device: Execute commands
+ Device->>WS2: COMMAND_RESULTS
+ WS2->>Server: Command results
+ end
+
+ Note over CC,Device: 6. Task Completion Device → Server
+ Device->>WS2: TASK_END (status=completed)
+ WS2->>Server: Task completion
+ Server->>Server: Invoke callback send_result()
+
+ Note over CC,Device: 7. Server → Client (Dual Send)
+ Server->>WS1: TASK_END to ConstellationClient
+ Server->>WS2: TASK_END to Device Agent
+ WS1->>MP: TASK_END message
+ MP->>DM: Task result
+ DM->>CC: ExecutionResult
+```
+
+### Message Details at Each Layer
+
+#### Layer 1: ConstellationClient to Server
+
+**ConstellationClient sends:**
+
+```json
+{
+ "type": "task",
+ "client_type": "constellation",
+ "client_id": "constellation_abc123",
+ "target_id": "windows_pc",
+ "session_id": "sess_xyz789",
+ "task_name": "open_excel_task",
+ "request": "Open Excel and create a chart",
+ "payload": {
+ "task_id": "task_001",
+ "description": "Open Excel and create a chart",
+ "data": {
+ "file_path": "sales_report.xlsx",
+ "chart_type": "bar"
+ }
+ },
+ "status": "ok",
+ "timestamp": "2025-11-06T10:40:00Z"
+}
+```
+
+**Key Fields:**
+
+- `client_type: "constellation"`: Identifies this as a constellation client (not device)
+- `target_id`: The device that should execute this task (e.g., "windows_pc")
+- `session_id`: Constellation session identifier for tracking
+- `task_name`: Human-readable task identifier
+
+#### Layer 2: Server Processing
+
+The `UFOWebSocketHandler` receives the message and processes it:
+
+**handle_task_request() Logic:**
+
+```python
+async def handle_task_request(self, data: ClientMessage) -> None:
+    # Note: creation of session_id and the send_result callback is omitted here
+    client_type = data.client_type
+    client_id = data.client_id
+    target_device_id = None
+
+    if client_type == ClientType.CONSTELLATION:
+        # Extract target device from constellation request
+        target_device_id = data.target_id
+        self.logger.info(f"🌟 Constellation task for device {target_device_id}")
+
+        # Track session mapping
+        self.client_manager.add_constellation_session(client_id, session_id)
+        self.client_manager.add_device_session(target_device_id, session_id)
+
+        # Get target device's task protocol
+        target_protocol = self.client_manager.get_task_protocol(target_device_id)
+
+    else:
+        # Direct device request
+        target_protocol = self.client_manager.get_task_protocol(client_id)
+
+    # Start task in background (non-blocking)
+    await self.session_manager.execute_task_async(
+        session_id=session_id,
+        task_protocol=target_protocol,  # Send to target device
+        callback=send_result  # Called when task completes
+    )
+```
+
+**Session Tracking:**
+
+The server maintains two mappings:
+
+1. **Constellation Sessions**: Maps constellation_client_id → [session_ids]
+2. **Device Sessions**: Maps device_id → [session_ids]
+
+This allows the server to:
+
+- Cancel all sessions when a constellation client disconnects
+- Cancel all sessions when a device disconnects
+- Send results to both constellation client AND device
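+
+The two mappings can be sketched with a pair of dictionaries. This is an illustrative sketch only: `SessionTracker` and the `pop_*` methods are hypothetical names, and the real `client_manager` may store and clean up sessions differently. Only `add_constellation_session` and `add_device_session` appear in the handler code above.
+
+```python
+from collections import defaultdict
+from typing import DefaultDict, List, Set
+
+
+class SessionTracker:
+    """Hypothetical sketch of the two session mappings."""
+
+    def __init__(self) -> None:
+        self.constellation_sessions: DefaultDict[str, Set[str]] = defaultdict(set)
+        self.device_sessions: DefaultDict[str, Set[str]] = defaultdict(set)
+
+    def add_constellation_session(self, client_id: str, session_id: str) -> None:
+        self.constellation_sessions[client_id].add(session_id)
+
+    def add_device_session(self, device_id: str, session_id: str) -> None:
+        self.device_sessions[device_id].add(session_id)
+
+    def pop_constellation_sessions(self, client_id: str) -> List[str]:
+        # Remove and return every session started by a disconnected client.
+        return sorted(self.constellation_sessions.pop(client_id, set()))
+
+    def pop_device_sessions(self, device_id: str) -> List[str]:
+        # Remove and return every session running on a disconnected device.
+        return sorted(self.device_sessions.pop(device_id, set()))
+
+
+tracker = SessionTracker()
+tracker.add_constellation_session("constellation_abc123", "sess_1")
+tracker.add_device_session("windows_pc", "sess_1")
+tracker.add_constellation_session("constellation_abc123", "sess_2")
+tracker.add_device_session("linux_server", "sess_2")
+
+# When the constellation client disconnects, both of its sessions are cancelled.
+to_cancel = tracker.pop_constellation_sessions("constellation_abc123")
+```
+
+Keeping the same `session_id` in both mappings is what lets either side's disconnect find the sessions it needs to cancel.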
+
+#### Layer 3: Server to Device
+
+The server forwards the task to the target device via its WebSocket connection:
+
+**Message sent to device:**
+
+```json
+{
+ "type": "task",
+ "session_id": "sess_xyz789",
+ "payload": {
+ "request": "Open Excel and create a chart",
+ "task_data": {
+ "file_path": "sales_report.xlsx",
+ "chart_type": "bar"
+ }
+ },
+ "status": "ok",
+ "timestamp": "2025-11-06T10:40:01Z"
+}
+```
+
+The device receives this via its own WebSocket connection and begins execution.
+
+#### Layer 4: Task Execution and Results
+
+During execution, the device exchanges multiple messages with the server:
+
+**Device requests commands:**
+```json
+{
+ "type": "command_request",
+ "session_id": "sess_xyz789",
+ "round": 1
+}
+```
+
+**Server responds with commands:**
+```json
+{
+ "type": "command",
+ "payload": {
+ "commands": [
+ {"action": "launch_app", "parameters": {"app_name": "Excel"}},
+ {"action": "open_file", "parameters": {"file_path": "sales_report.xlsx"}}
+ ]
+ }
+}
+```
+
+**Device sends results:**
+```json
+{
+ "type": "command_results",
+ "client_type": "device",
+ "client_id": "windows_pc",
+ "session_id": "sess_xyz789",
+ "payload": {
+ "results": [
+ {"action": "launch_app", "status": "completed"},
+ {"action": "open_file", "status": "completed"}
+ ]
+ }
+}
+```
+
+**Device signals completion:**
+```json
+{
+ "type": "task_end",
+ "client_type": "device",
+ "client_id": "windows_pc",
+ "session_id": "sess_xyz789",
+ "status": "completed",
+ "payload": {
+ "result": {
+ "success": true,
+ "output": "Created bar chart in sales_report.xlsx"
+ }
+ }
+}
+```
+
+#### Layer 5: Dual Result Delivery
+
+When the task completes, the server's callback `send_result()` sends TASK_END to **both**:
+
+1. **ConstellationClient** (the requester):
+```python
+requester_protocol = self.client_manager.get_task_protocol(client_id)
+await requester_protocol.send_task_end(
+ session_id=session_id,
+ status=result_msg.status,
+ result=result_msg.result
+)
+```
+
+2. **Device Agent** (the executor):
+```python
+if client_type == ClientType.CONSTELLATION and target_device_id:
+    target_protocol = self.client_manager.get_task_protocol(target_device_id)
+    await target_protocol.send_task_end(
+        session_id=session_id,
+        status=result_msg.status,
+        result=result_msg.result
+    )
+```
+
+This ensures both parties know the task completed.
+
+### Disconnection Handling
+
+The server handles disconnections at multiple levels:
+
+**Constellation Client Disconnects:**
+
+```python
+# Cancel all sessions started by this constellation
+session_ids = self.client_manager.get_constellation_sessions(client_id)
+for session_id in session_ids:
+    await self.session_manager.cancel_task(
+        session_id,
+        reason="constellation_disconnected"
+    )
+```
+
+**Device Disconnects:**
+
+```python
+# Cancel all sessions running on this device
+session_ids = self.client_manager.get_device_sessions(device_id)
+for session_id in session_ids:
+    await self.session_manager.cancel_task(
+        session_id,
+        reason="device_disconnected"
+    )
+```
+
+On the ConstellationClient side, DeviceManager detects disconnection via:
+
+- Heartbeat timeout (no response to HEARTBEAT within 10s)
+- WebSocket connection closed
+- Message send failure
+
+When any of these occurs, DeviceManager triggers automatic reconnection with exponential backoff.
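+
+Exponential backoff means the wait between reconnection attempts grows geometrically up to a cap. The sketch below shows the arithmetic only. The base delay, growth factor, and cap are assumed values for illustration, and `backoff_delays` is a hypothetical helper, not part of DeviceManager.
+
+```python
+from typing import List
+
+
+def backoff_delays(base: float, factor: float, max_delay: float, attempts: int) -> List[float]:
+    """Return the wait time (in seconds) before each reconnection attempt."""
+    return [min(base * factor**attempt, max_delay) for attempt in range(attempts)]
+
+
+# Assumed parameters: start at 1s, double each time, never wait more than 30s.
+delays = backoff_delays(base=1.0, factor=2.0, max_delay=30.0, attempts=6)
+```
+
+With these assumed parameters the delays are 1, 2, 4, 8, 16, and 30 seconds. The cap keeps a long outage from pushing the retry interval arbitrarily high.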
+
+### Client Type Distinction
+
+The server handles two client types differently:
+
+| Aspect | CONSTELLATION Client | DEVICE Client |
+|--------|---------------------|---------------|
+| Task Request | Includes `target_id` field | No `target_id`, executes locally |
+| Session Tracking | Tracked in constellation_sessions | Tracked in device_sessions |
+| Result Delivery | Receives TASK_END | Receives TASK_END |
+| Disconnection | Cancels all its sessions | Cancels sessions on this device |
+
+This allows the same server to support both direct device connections and constellation-mediated connections.
+
+---
+
+## Summary
+
+AIP integration in Galaxy Client follows a layered architecture:
+
+1. **Transport**: WebSocketConnectionManager handles raw WebSocket I/O via AIP WebSocketTransport
+2. **Protocol**: AIP protocol classes (RegistrationProtocol, TaskExecutionProtocol, HeartbeatProtocol, DeviceInfoProtocol) handle message serialization and protocol logic
+3. **Message Processing**: MessageProcessor routes messages to handlers
+4. **Application**: DeviceManager and ConstellationClient use messages for coordination
+5. **Server Routing**: UFOWebSocketHandler routes messages between constellation clients and devices
+6. **Device Execution**: Device agents execute tasks and return results
+
+**Key Message Flows:**
+
+- **Registration**: REGISTER → HEARTBEAT (OK) → DEVICE_INFO_REQUEST → DEVICE_INFO_RESPONSE
+- **Heartbeat**: HEARTBEAT (request) → HEARTBEAT (response), every 30 seconds
+- **Task Execution (Constellation)**: ConstellationClient TASK → Server routes → Device executes → Server routes → ConstellationClient TASK_END
+- **Task Execution (Direct)**: Device TASK → Server orchestrates → Device TASK_END
+
+**Error Handling:**
+
+- Connection errors trigger reconnection
+- Protocol errors send ERROR messages
+- Task errors return TASK_END with status=failed
+- Application errors use application-specific recovery
+- Disconnections cancel all associated sessions
+
+**Complete Architecture:**
+
+```
+User Request
+ ↓
+GalaxyClient (session management)
+ ↓
+ConstellationClient (device coordination)
+ ↓
+DeviceManager (connection orchestration)
+ ↓
+MessageProcessor (AIP messaging)
+ ↓
+WebSocket → UFOWebSocketHandler (server routing)
+ ↓
+WebSocket → Device Agent (task execution)
+```
+
+AIP provides a robust, extensible protocol for agent communication with strong typing, clear message flows, comprehensive error handling, and intelligent routing between constellation clients and devices.
+
+## Next Steps
+
+- See [DeviceManager](./device_manager.md) for connection management details
+- See [Components](./components.md) for MessageProcessor and WebSocketConnectionManager implementation
+- See [ConstellationClient](./constellation_client.md) for device coordination API
+- See [AIP Protocol Specification](../../aip/overview.md) for complete protocol reference
+- See [AIP Message Reference](../../aip/messages.md) for detailed message structures and examples
+- See [Server Documentation](../../server/websocket_handler.md) for server-side routing details
diff --git a/documents/docs/galaxy/client/components.md b/documents/docs/galaxy/client/components.md
new file mode 100644
index 000000000..87da9ec5d
--- /dev/null
+++ b/documents/docs/galaxy/client/components.md
@@ -0,0 +1,541 @@
+# Galaxy Client Components
+
+Galaxy Client is built from focused, single-responsibility components that work together to provide device management capabilities. This document explains how these components interact and what each one does.
+
+## Related Documentation
+
+- [Overview](./overview.md) - Overall Galaxy Client architecture
+- [DeviceManager](./device_manager.md) - How DeviceManager orchestrates these components
+- [ConstellationClient](./constellation_client.md) - How components are used in the coordination layer
+- [AIP Integration](./aip_integration.md) - Message protocol used by components
+
+---
+
+## Component Architecture Overview
+
+Galaxy Client uses 8 modular components divided into three categories: **Device Management**, **Display & UI**, and **Support Components**. Understanding how these components work together is key to understanding Galaxy Client's design.
+
+### The Big Picture: How Components Collaborate
+
+When DeviceManager needs to manage a device connection, it doesn't do everything itself. Instead, it delegates specific responsibilities to specialized components:
+
+```mermaid
+graph TB
+ DM[DeviceManager Orchestrator]
+
+ subgraph "State Management"
+ DR[DeviceRegistry Device State Storage]
+ end
+
+ subgraph "Connection Layer"
+ WS[WebSocketConnectionManager Network Communication]
+ HM[HeartbeatManager Health Monitoring]
+ MP[MessageProcessor Message Handling]
+ end
+
+ subgraph "Task Layer"
+ TQ[TaskQueueManager Task Scheduling]
+ end
+
+ DM --> DR
+ DM --> WS
+ DM --> HM
+ DM --> MP
+ DM --> TQ
+
+ WS -.->|updates| DR
+ HM -.->|reads| DR
+ HM -.->|uses| WS
+ MP -.->|updates| DR
+ MP -.->|uses| WS
+
+ style DM fill:#e1f5ff
+ style DR fill:#fff4e1
+```
+
+This diagram shows the component relationships. DeviceManager acts as the orchestrator, creating and coordinating all other components. DeviceRegistry serves as the single source of truth for device state. WebSocketConnectionManager, HeartbeatManager, and MessageProcessor all depend on both DeviceRegistry (for state) and each other (for operations). TaskQueueManager works independently, managing task queues.
+
+**Key Design Principles:**
+
+1. **Single Source of Truth**: DeviceRegistry is the only component that stores device state. All other components read from or write to DeviceRegistry, never maintaining their own state.
+
+2. **Dependency Injection**: DeviceManager creates all components and injects dependencies. For example, HeartbeatManager receives references to both WebSocketConnectionManager (to send heartbeats) and DeviceRegistry (to update timestamps).
+
+3. **Background Services**: HeartbeatManager and MessageProcessor run as independent asyncio tasks. They operate continuously in the background without blocking the main execution flow.
+
+4. **Component Independence**: Each component can be tested and understood in isolation. Changing one component's implementation doesn't affect others as long as the interface remains the same.
+
+---
+
+## Device Management Components
+
+These components handle the core device lifecycle: registration, connection, monitoring, and task execution.
+
+### DeviceRegistry: The Single Source of Truth
+
+**Purpose**: DeviceRegistry is the central repository for all device information. Every component that needs to know about device state queries DeviceRegistry.
+
+**What It Stores**: Each device is represented by an `AgentProfile` object containing:
+
+```python
+from dataclasses import dataclass
+from datetime import datetime
+from typing import Any, Dict, List, Optional
+
+@dataclass
+class AgentProfile:
+    device_id: str                  # Unique device identifier
+    server_url: str                 # WebSocket endpoint
+    os: str                         # Operating system (windows/linux/mac)
+    status: DeviceStatus            # Current state (DISCONNECTED/CONNECTING/CONNECTED/IDLE/BUSY/FAILED)
+    capabilities: List[str]         # What the device can do (["office", "web", "email"])
+    metadata: Dict[str, Any]        # Custom device properties
+    last_heartbeat: datetime        # Last successful heartbeat timestamp
+    connection_attempts: int        # Number of connection attempts made
+    max_retries: int                # Maximum reconnection attempts allowed
+    current_task_id: Optional[str]  # Task being executed (None if idle)
+    system_info: Dict               # Hardware/software details from device
+```
+
+The `status` field is particularly important as it drives the system's behavior. When a device is IDLE, it can accept new tasks. When BUSY, tasks are queued. When DISCONNECTED, reconnection is attempted.
+
+**Key Operations**:
+
+```python
+# Registration and lookup
+registry.register_device(device_id, server_url, os, capabilities, metadata)
+profile = registry.get_device(device_id)
+all_devices = registry.get_all_devices(connected=True)
+
+# Status management
+registry.update_device_status(device_id, DeviceStatus.CONNECTED)
+is_busy = registry.is_device_busy(device_id)
+registry.set_device_busy(device_id, task_id)
+registry.set_device_idle(device_id)
+
+# Health tracking
+registry.update_heartbeat(device_id)
+registry.increment_connection_attempts(device_id)
+registry.reset_connection_attempts(device_id)
+```
+
+**Why It Matters**: Having a single registry prevents state inconsistencies. Without DeviceRegistry, each component might have its own view of device state, leading to race conditions and bugs. For example, HeartbeatManager might think a device is connected while MessageProcessor thinks it's disconnected.
+
+### WebSocketConnectionManager: Network Communication Handler
+
+**Purpose**: Manages the low-level WebSocket connections to Agent Server and handles message transmission.
+
+**Connection Lifecycle**:
+
+When `connect_to_device()` is called, WebSocketConnectionManager performs these steps:
+
+1. **Establish WebSocket**: Creates an AIP `WebSocketTransport` and connects to the device's server_url. This is an async operation that may timeout or fail due to network issues.
+
+2. **Start Message Handler BEFORE Registration**: Crucially, this happens *before* sending REGISTER to prevent race conditions. The message handler is started via MessageProcessor to ensure we don't miss the server's response.
+
+3. **Send REGISTER**: Uses `RegistrationProtocol` to send an AIP REGISTER message identifying this client to the server. The server responds with a HEARTBEAT message with OK status to confirm registration.
+
+4. **Store Transport**: Saves the WebSocketTransport object and initializes AIP protocol handlers (`RegistrationProtocol`, `TaskExecutionProtocol`, `DeviceInfoProtocol`) for this connection.
+
+**Task Execution**:
+
+When sending a task to a device, WebSocketConnectionManager:
+
+```python
+async def send_task_to_device(self, device_id, task_request):
+    # 1. Get Transport and TaskExecutionProtocol
+    transport = self._transports[device_id]
+    task_protocol = self._task_protocols[device_id]
+
+    # 2. Create AIP ClientMessage for task execution
+    task_message = ClientMessage(
+        type=ClientMessageType.TASK,
+        client_type=ClientType.CONSTELLATION,
+        client_id=task_client_id,
+        target_id=device_id,
+        task_name=f"galaxy/{task_name}/{task_request.task_name}",
+        request=task_request.request,
+        session_id=constellation_task_id,
+        status=TaskStatus.CONTINUE,
+        ...
+    )
+
+    # 3. Send message via AIP transport
+    await transport.send(task_message.model_dump_json().encode("utf-8"))
+
+    # 4. Wait for response (handled via future)
+    result = await self._wait_for_task_response(device_id, constellation_task_id)
+
+    return ExecutionResult(...)
+```
+
+The `_wait_for_task_response()` method creates an `asyncio.Future` that MessageProcessor completes when it receives the TASK_END message from the device.
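+
+The future-based hand-off between the sender and the message loop can be sketched like this. It is a simplified illustration under stated assumptions: `PendingResponses` is a hypothetical class, the futures are keyed by task id only, and the timeout handling is not taken from the real WebSocketConnectionManager.
+
+```python
+import asyncio
+from typing import Any, Dict
+
+
+class PendingResponses:
+    """Hypothetical registry of futures keyed by task id."""
+
+    def __init__(self) -> None:
+        self._pending: Dict[str, asyncio.Future] = {}
+
+    async def wait_for_task_response(self, task_id: str, timeout: float) -> Any:
+        # Sender side: register a future, then wait until it is resolved.
+        future = asyncio.get_running_loop().create_future()
+        self._pending[task_id] = future
+        try:
+            return await asyncio.wait_for(future, timeout)
+        finally:
+            self._pending.pop(task_id, None)
+
+    def complete_task_response(self, task_id: str, result: Any) -> None:
+        # Receiver side: called by the message loop when TASK_END arrives.
+        future = self._pending.get(task_id)
+        if future is not None and not future.done():
+            future.set_result(result)
+
+
+async def demo() -> Any:
+    pending = PendingResponses()
+    waiter = asyncio.create_task(pending.wait_for_task_response("task_001", timeout=1.0))
+    await asyncio.sleep(0)  # let the waiter register its future
+    pending.complete_task_response("task_001", {"status": "completed"})
+    return await waiter
+
+
+result = asyncio.run(demo())
+```
+
+The sender never polls: it suspends on the future, and the receive loop wakes it by setting the result.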
+
+**Error Handling**: WebSocketConnectionManager catches connection errors (InvalidURI, WebSocketException, OSError, TimeoutError) and returns False, allowing DeviceManager to trigger reconnection logic.
+
+### HeartbeatManager: Connection Health Monitor
+
+**Purpose**: Continuously monitors device health by sending periodic heartbeat messages. This detects connection failures faster than waiting for a task to timeout.
+
+**How It Works**:
+
+For each connected device, HeartbeatManager starts an independent background task that uses AIP `HeartbeatProtocol` to send HEARTBEAT messages periodically and verify the device is still responsive.
+
+**Timeout Detection**: Uses a timeout mechanism to detect when devices stop responding. If no heartbeat response arrives within the expected timeframe, the device is considered disconnected and HeartbeatManager triggers the disconnection handler.
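+
+A heartbeat loop with timeout detection might look like the sketch below. The `heartbeat_loop` function, its callback parameters, and the interval and timeout values are assumptions for illustration. The real HeartbeatManager sends its messages through AIP `HeartbeatProtocol`.
+
+```python
+import asyncio
+from typing import Awaitable, Callable, List
+
+
+async def heartbeat_loop(
+    send_heartbeat: Callable[[], Awaitable[None]],
+    on_disconnect: Callable[[], None],
+    interval: float,
+    timeout: float,
+) -> None:
+    """Send heartbeats until one is not acknowledged within `timeout` seconds."""
+    while True:
+        try:
+            await asyncio.wait_for(send_heartbeat(), timeout)
+        except asyncio.TimeoutError:
+            on_disconnect()
+            return
+        await asyncio.sleep(interval)
+
+
+async def demo() -> List[str]:
+    events: List[str] = []
+    calls = 0
+
+    async def fake_heartbeat() -> None:
+        nonlocal calls
+        calls += 1
+        if calls >= 3:
+            await asyncio.sleep(1.0)  # simulate a device agent that stopped responding
+        events.append(f"ack-{calls}")
+
+    await heartbeat_loop(
+        fake_heartbeat,
+        lambda: events.append("disconnected"),
+        interval=0.01,
+        timeout=0.05,
+    )
+    return events
+
+
+events = asyncio.run(demo())
+```
+
+In the demo, the third heartbeat never completes, so the loop reports the disconnection and exits instead of waiting indefinitely.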
+
+**Why Not Just Use TCP Keepalive?**: WebSocket runs over TCP, which has its own keepalive mechanism. However, TCP keepalive operates at a much longer timescale (typically 2 hours by default) and only detects network-level failures, not application-level hangs. HeartbeatManager detects if the device agent is responsive, not just if the TCP connection is alive.
+
+### MessageProcessor: Message Router and Handler
+
+**Purpose**: Runs a continuous message receiving loop for each device, dispatching incoming AIP messages to appropriate handlers.
+
+**The Message Loop**:
+
+MessageProcessor runs a background task that receives messages from the AIP transport and routes them based on message type. It handles `TASK_END` messages by completing the corresponding future that WebSocketConnectionManager is waiting on, enabling async task execution patterns.
+
+**Task Completion Handling**: When a TASK_END message arrives, MessageProcessor uses the `complete_task_response()` method in WebSocketConnectionManager to resolve the pending future for that task.
+
+**Why Run in Background**: The message loop runs continuously as an asyncio task. This allows it to receive messages asynchronously while the main execution flow (e.g., sending tasks) continues unblocked. Without this, we'd need to alternate between sending and receiving, making the code much more complex.
+
+### TaskQueueManager: Task Scheduling and Queuing
+
+**Purpose**: Manages per-device task queues, ensuring tasks execute sequentially when devices are busy.
+
+**Queue Behavior**:
+
+When a task is assigned to a device that's already executing another task:
+
+```python
+# In DeviceManager.assign_task_to_device()
+if self.device_registry.is_device_busy(device_id):
+    # Device is BUSY - enqueue task
+    future = self.task_queue_manager.enqueue_task(device_id, task_request)
+    # Wait for task to complete
+    result = await future
+    return result
+else:
+    # Device is IDLE - execute immediately
+    return await self._execute_task_on_device(device_id, task_request)
+```
+
+**How Queuing Works**:
+
+TaskQueueManager maintains a dictionary of queues: `{device_id: queue}`. Each queue is a list of `(task_request, future)` tuples. When a task is enqueued:
+
+```python
+def enqueue_task(self, device_id, task_request):
+    # Create a future for this task
+    future = asyncio.Future()
+
+    # Add to device's queue
+    self.queues[device_id].append((task_request, future))
+
+    # Return future so caller can await result
+    return future
+```
+
+When a device completes a task and becomes IDLE, DeviceManager calls:
+
+```python
+async def _process_next_queued_task(self, device_id):
+    if self.task_queue_manager.has_queued_tasks(device_id):
+        task_request = self.task_queue_manager.dequeue_task(device_id)
+        # Execute next task (don't await to avoid blocking)
+        asyncio.create_task(self._execute_task_on_device(device_id, task_request))
+```
+
+**Why Futures?**: Using asyncio.Future allows the calling code to await task completion even though the task is queued. The caller doesn't need to know whether the task executed immediately or was queued—it just awaits the future and gets the result when ready.
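+
+To make the role of the future concrete, here is a self-contained sketch of a per-device queue whose futures are resolved as tasks finish. It is an illustration, not the actual TaskQueueManager: `SimpleTaskQueue`, `run_task`, and `drain` are hypothetical, and `dequeue_task` is assumed to return the `(task_request, future)` pair so the result can be delivered to the waiting caller.
+
+```python
+import asyncio
+from collections import defaultdict, deque
+from typing import Any, DefaultDict, Deque, List, Tuple
+
+
+class SimpleTaskQueue:
+    """Hypothetical per-device FIFO of (task_request, future) pairs."""
+
+    def __init__(self) -> None:
+        self._queues: DefaultDict[str, Deque[Tuple[Any, asyncio.Future]]] = defaultdict(deque)
+
+    def enqueue_task(self, device_id: str, task_request: Any) -> asyncio.Future:
+        future = asyncio.get_running_loop().create_future()
+        self._queues[device_id].append((task_request, future))
+        return future
+
+    def has_queued_tasks(self, device_id: str) -> bool:
+        return bool(self._queues[device_id])
+
+    def dequeue_task(self, device_id: str) -> Tuple[Any, asyncio.Future]:
+        return self._queues[device_id].popleft()
+
+
+async def run_task(task_request: str) -> str:
+    await asyncio.sleep(0)  # stand-in for real execution on the device
+    return f"done:{task_request}"
+
+
+async def drain(queue: SimpleTaskQueue, device_id: str) -> None:
+    # Run queued tasks one at a time, resolving each caller's future.
+    while queue.has_queued_tasks(device_id):
+        task_request, future = queue.dequeue_task(device_id)
+        try:
+            future.set_result(await run_task(task_request))
+        except Exception as exc:
+            future.set_exception(exc)
+
+
+async def demo() -> List[str]:
+    queue = SimpleTaskQueue()
+    first = queue.enqueue_task("windows_pc", "open_excel")
+    second = queue.enqueue_task("windows_pc", "create_chart")
+    await drain(queue, "windows_pc")
+    return [await first, await second]
+
+
+results = asyncio.run(demo())
+```
+
+The important detail is that whoever executes a dequeued task must also set the result (or exception) on its future. Otherwise the caller awaiting that future would wait forever.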
+
+---
+
+## Display Component
+
+### ClientDisplay: User Interface and Console Output
+
+**Purpose**: Provides Rich-based console output for interactive mode and status reporting. This component is only used by GalaxyClient, not by ConstellationClient or DeviceManager.
+
+**Key Features**:
+
+**Banner and Branding**: Shows ASCII art banner when GalaxyClient starts, creating a visual identity for the framework.
+
+**Progress Indication**: Uses Rich Progress bars for long-running operations like initialization:
+
+```python
+with display.show_initialization_progress() as progress:
+    task = progress.add_task("[cyan]Initializing...", total=None)
+    # ... initialization work ...
+    progress.update(task, description="[green]Complete!")
+```
+
+**Result Display**: Formats execution results in readable tables:
+
+```python
+display.display_result({
+ "status": "completed",
+ "execution_time": 23.45,
+ "rounds": 2,
+ "constellation": {"task_count": 5}
+})
+```
+
+This creates a formatted table showing status, time, rounds, and task count in color-coded output.
+
+**Interactive Input**: Provides user input prompts with styling:
+
+```python
+user_input = display.get_user_input("UFO[0]")
+```
+
+**Colored Messages**: Semantic color coding for different message types:
+
+- Green (success): Task completed, connection established
+- Red (error): Task failed, connection error
+- Yellow (warning): Device disconnected, timeout
+- Cyan (info): Status updates, progress
+
+**Why Separate Component?**: Keeping display logic separate from business logic makes it easy to replace or disable. For example, a web-based frontend could replace ClientDisplay without touching any other components.
+
+---
+
+## Support Components
+
+These components support higher-level client operations by providing status aggregation and configuration management capabilities.
+
+### StatusManager: System-Wide Status Aggregation
+
+**Purpose**: Provides consolidated views of system health and performance across all devices. While DeviceRegistry stores individual device status, StatusManager aggregates this into system-wide metrics.
+
+**Health Summary Example**:
+
+```python
+summary = status_manager.get_device_health_summary()
+# Returns:
+{
+ "total_devices": 5,
+ "connected_devices": 3,
+ "disconnected_devices": 2,
+ "connection_rate": 0.6, # 60% connected
+ "devices_by_status": {
+ "CONNECTED": 2,
+ "IDLE": 1,
+ "DISCONNECTED": 1,
+ "FAILED": 1
+ },
+ "devices_with_issues": [
+ {
+ "device_id": "device_3",
+ "issue": "multiple_connection_attempts",
+ "attempts": 4,
+ "max_retries": 5
+ }
+ ]
+}
+```
+
+**Task Statistics**:
+
+```python
+stats = status_manager.get_task_statistics()
+# Returns:
+{
+ "total_tasks_executed": 127,
+ "successful_tasks": 120,
+ "failed_tasks": 7,
+ "success_rate": 0.945,
+ "average_execution_time": 15.3, # seconds
+ "tasks_by_device": {
+ "windows_pc": 65,
+ "linux_server": 62
+ }
+}
+```
+
+**Why This Matters**: In production, you need to monitor system health. StatusManager provides the data needed for dashboards, alerts, and capacity planning. For example, if connection_rate drops below 80%, you might trigger an alert.
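+
+The aggregation behind such a summary can be sketched in a few lines. This is a simplified illustration: `health_summary` is a hypothetical function, it works on plain status strings, and the set of statuses counted as connected is an assumption chosen to match the example numbers above.
+
+```python
+from collections import Counter
+from typing import Any, Dict
+
+# Assumption: these statuses count as "connected" for the summary.
+CONNECTED_STATES = {"CONNECTED", "IDLE", "BUSY"}
+
+
+def health_summary(statuses: Dict[str, str]) -> Dict[str, Any]:
+    """Aggregate per-device status strings into system-wide numbers."""
+    by_status = Counter(statuses.values())
+    total = len(statuses)
+    connected = sum(count for status, count in by_status.items() if status in CONNECTED_STATES)
+    return {
+        "total_devices": total,
+        "connected_devices": connected,
+        "disconnected_devices": total - connected,
+        "connection_rate": connected / total if total else 0.0,
+        "devices_by_status": dict(by_status),
+    }
+
+
+summary = health_summary({
+    "device_1": "CONNECTED",
+    "device_2": "CONNECTED",
+    "device_3": "FAILED",
+    "device_4": "IDLE",
+    "device_5": "DISCONNECTED",
+})
+
+# Example alert rule from the text: fire when fewer than 80% of devices are connected.
+needs_alert = summary["connection_rate"] < 0.8
+```
+
+With three of five devices connected, the rate is 0.6, so the example alert rule would fire.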
+
+---
+
+## How Components Work Together: A Complete Example
+
+Let's trace what happens when you call `device_manager.connect_device("windows_pc")`:
+
+**Step 1: DeviceManager Initiates Connection**
+
+```python
+# DeviceManager.connect_device()
+device_info = self.device_registry.get_device(device_id) # Get device details
+self.device_registry.update_device_status(device_id, DeviceStatus.CONNECTING) # Update status
+```
+
+**Step 2: WebSocketConnectionManager Establishes Connection**
+
+```python
+# WebSocketConnectionManager.connect_to_device()
+transport = WebSocketTransport(...)
+await transport.connect(device_info.server_url) # Create AIP transport
+self._transports[device_id] = transport # Store transport
+
+# Initialize AIP protocols for this connection
+self._registration_protocols[device_id] = RegistrationProtocol(transport)
+self._task_protocols[device_id] = TaskExecutionProtocol(transport)
+self._device_info_protocols[device_id] = DeviceInfoProtocol(transport)
+
+# ⚠️ CRITICAL: Start message handler BEFORE sending registration
+# This ensures we don't miss the server's registration response
+self.message_processor.start_message_handler(device_id, transport)
+await asyncio.sleep(0.05) # Small delay to ensure handler is listening
+
+# Register as constellation client using AIP RegistrationProtocol
+await self._register_constellation_client(device_info)
+```
+
+**Step 3: MessageProcessor Starts Background Loop**
+
+```python
+# MessageProcessor.start_message_handler()
+task = asyncio.create_task(self._handle_device_messages(device_id, transport))
+self._message_handlers[device_id] = task # Store task for later cancellation
+```
+
+Now MessageProcessor is running in the background, ready to receive messages via the AIP transport.
+
+**Step 4: Device Registration Completes**
+
+The server sends back a HEARTBEAT message with OK status, which serves as registration confirmation. WebSocketConnectionManager then requests device info via `DeviceInfoProtocol`.
+
+**Step 5: DeviceRegistry Updated with System Info**
+
+```python
+# DeviceManager.connect_device() continues
+self.device_registry.update_device_system_info(device_id, device_system_info)
+self.device_registry.update_device_status(device_id, DeviceStatus.CONNECTED)
+self.device_registry.set_device_idle(device_id) # Ready for tasks
+```
+
+**Step 6: HeartbeatManager Starts Monitoring**
+
+```python
+# HeartbeatManager.start_heartbeat()
+task = asyncio.create_task(self._send_heartbeat_loop(device_id))
+self.heartbeat_tasks[device_id] = task
+```
+
+Now HeartbeatManager is running in the background, sending heartbeats every 30 seconds.
+
+**Step 7: Connection Complete**
+
+All components are now working together:
+
+- DeviceRegistry knows the device is IDLE and ready
+- WebSocketConnectionManager has an active AIP Transport with initialized protocols
+- MessageProcessor is listening for incoming messages via the transport
+- HeartbeatManager is monitoring connection health
+- TaskQueueManager is ready to queue tasks if device becomes busy
+
+This coordinated setup ensures reliable device communication.
+
+---
+
+## Component Dependencies
+
+Understanding component dependencies helps when debugging or extending the system:
+
+```
+DeviceManager (creates all components)
+├── DeviceRegistry (no dependencies - foundational)
+├── WebSocketConnectionManager (depends on: task_name only)
+├── HeartbeatManager (depends on: WebSocketConnectionManager, DeviceRegistry)
+├── MessageProcessor (depends on: DeviceRegistry, HeartbeatManager, WebSocketConnectionManager)
+└── TaskQueueManager (no dependencies - independent)
+```
+
+**Construction Order**: DeviceManager must create components in dependency order:
+
+```python
+def __init__(self, task_name, heartbeat_interval, reconnect_delay):
+    # 1. DeviceRegistry first (no dependencies)
+    self.device_registry = DeviceRegistry()
+
+    # 2. WebSocketConnectionManager (needs task_name only)
+    self.connection_manager = WebSocketConnectionManager(task_name)
+
+    # 3. HeartbeatManager (depends on connection_manager and device_registry)
+    self.heartbeat_manager = HeartbeatManager(
+        self.connection_manager,
+        self.device_registry,
+        heartbeat_interval
+    )
+
+    # 4. MessageProcessor (depends on all previous components)
+    self.message_processor = MessageProcessor(
+        self.device_registry,
+        self.heartbeat_manager,
+        self.connection_manager
+    )
+
+    # 5. TaskQueueManager (independent)
+    self.task_queue_manager = TaskQueueManager()
+```
+
+**Why This Order Matters**: If DeviceManager tried to create MessageProcessor before HeartbeatManager, `self.heartbeat_manager` would not exist yet and the constructor call would fail with an `AttributeError`. The dependency graph dictates the construction order.
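+
+As an illustration of how a dependency graph dictates construction order, the sketch below derives a valid order with the standard-library `graphlib` module (Python 3.9+). The `DEPENDENCIES` mapping mirrors the constructor arguments shown above; the `construction_order` helper is hypothetical and not part of Galaxy Client.
+
+```python
+from graphlib import TopologicalSorter
+
+# Each component maps to the components its constructor receives.
+DEPENDENCIES = {
+    "DeviceRegistry": set(),
+    "WebSocketConnectionManager": set(),
+    "HeartbeatManager": {"WebSocketConnectionManager", "DeviceRegistry"},
+    "MessageProcessor": {
+        "DeviceRegistry",
+        "HeartbeatManager",
+        "WebSocketConnectionManager",
+    },
+    "TaskQueueManager": set(),
+}
+
+
+def construction_order(dependencies):
+    """Return one valid construction order: dependencies always come first."""
+    return list(TopologicalSorter(dependencies).static_order())
+
+
+order = construction_order(DEPENDENCIES)
+```
+
+Any order returned this way places HeartbeatManager before MessageProcessor; the relative position of independent components such as TaskQueueManager is not fixed.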
+
+---
+
+## Testing Components
+
+The modular design makes components easy to test in isolation:
+
+**Testing DeviceRegistry**:
+
+```python
+# No external dependencies needed
+registry = DeviceRegistry()
+registry.register_device("test_device", "ws://localhost:5000", "windows", ["test"])
+assert registry.is_device_registered("test_device")
+```
+
+**Testing WebSocketConnectionManager**:
+
+```python
+from unittest.mock import AsyncMock
+
+# Mock the WebSocket connection
+mock_websocket = AsyncMock()
+connection_manager = WebSocketConnectionManager("test")
+connection_manager.connections["test_device"] = mock_websocket
+
+# Test message sending (run inside an async test; task_request is a prepared request object)
+await connection_manager.send_task_to_device("test_device", task_request)
+mock_websocket.send.assert_called_once()
+```
+
+**Testing HeartbeatManager**:
+
+```python
+import asyncio
+from unittest.mock import Mock
+
+# Inject mock dependencies
+mock_connection_manager = Mock()
+mock_registry = Mock()
+heartbeat_manager = HeartbeatManager(mock_connection_manager, mock_registry, 30.0)
+
+# Test heartbeat loop (run inside an async test)
+heartbeat_manager.start_heartbeat("test_device")
+await asyncio.sleep(0.1)  # Let loop run
+assert mock_connection_manager.get_connection.called
+```
+
+**Why Testability Matters**: Complex systems are hard to test. By breaking DeviceManager into 5 focused components, we can write targeted unit tests for each component's specific behavior, making bugs easier to find and fix.
+
+---
+
+## Summary
+
+Galaxy Client's component architecture demonstrates several important design principles:
+
+**Single Responsibility**: Each component does one thing well. DeviceRegistry stores state, WebSocketConnectionManager handles networking, HeartbeatManager monitors health, MessageProcessor routes messages, TaskQueueManager manages queues.
+
+**Dependency Injection**: DeviceManager creates components and injects dependencies, making the system flexible and testable. Want to replace WebSocketConnectionManager with a different implementation? Just swap it out while keeping the interface.
+
+**Separation of Concerns**: Business logic (in DeviceManager) is separate from display logic (in ClientDisplay) and orchestration support (in StatusManager). Each layer can evolve independently.
+
+**Asynchronous Background Services**: HeartbeatManager and MessageProcessor run as independent asyncio tasks, enabling concurrent operations without blocking the main execution flow.
+
+This design makes Galaxy Client maintainable, extensible, and testable. When you understand how components collaborate, you can confidently modify or extend the system.
+
+## Related Documentation
+
+- [DeviceManager Reference](./device_manager.md) - See how DeviceManager orchestrates these components
+- [ConstellationClient](./constellation_client.md) - Learn how components are used in the coordination layer
+- [Overview](./overview.md) - Understand the broader Galaxy Client architecture
+- [AIP Integration](./aip_integration.md) - Learn about the message protocol components use
+- [DeviceRegistry Details](../agent_registration/device_registry.md) - Deep dive into device state management
diff --git a/documents/docs/galaxy/client/constellation_client.md b/documents/docs/galaxy/client/constellation_client.md
new file mode 100644
index 000000000..8b6cba7ca
--- /dev/null
+++ b/documents/docs/galaxy/client/constellation_client.md
@@ -0,0 +1,608 @@
+# ConstellationClient Reference
+
+ConstellationClient is the device coordination layer in Galaxy Client. It provides a clean API for registering devices, managing connections, and assigning tasks. Most applications interact with ConstellationClient rather than the lower-level DeviceManager.
+
+## Related Documentation
+
+- [Overview](./overview.md) - Overall architecture and workflow
+- [DeviceManager](./device_manager.md) - Internal connection management
+- [Components](./components.md) - Modular component details
+- [Configuration](../../configuration/system/galaxy_constellation.md) - Device configuration
+- [GalaxyClient](./galaxy_client.md) - Session wrapper on top of ConstellationClient
+
+## What ConstellationClient Does
+
+ConstellationClient implements the Facade pattern, providing a simplified interface to the complex device management system underneath. Think of it as the "device management API" for Galaxy.
+
+**Core Responsibilities:**
+
+**Device Lifecycle Management**: ConstellationClient handles the complete lifecycle of device connections. When you register a device, it stores the device information (ID, server URL, capabilities) in DeviceRegistry. When you connect, it coordinates with DeviceManager to establish WebSocket connections, perform AIP registration, and start health monitoring. When you disconnect, it cleanly tears down all resources.
+
+**Task Assignment**: When you have a task to execute, ConstellationClient determines which device should run it (based on capabilities), checks if the device is available, and delegates to DeviceManager for actual execution. It abstracts away details like task queuing when devices are busy or handling connection failures during execution.
+
+**Configuration Management**: ConstellationClient loads device configurations from YAML files or programmatic APIs, validates settings, and maintains the runtime configuration. This centralizes all configuration logic so other components don't need to worry about it.
+
+**Status Reporting**: Applications need to know what's happening with devices. ConstellationClient provides methods to query device status, get health summaries, and retrieve execution statistics. This information is aggregated from multiple components (DeviceRegistry, DeviceManager, TaskQueueManager) and presented in a unified format.
+
+**What ConstellationClient Does NOT Do:**
+
+- **DAG Planning**: Task decomposition is handled by ConstellationAgent
+- **DAG Execution**: Coordinating task dependencies is handled by TaskConstellationOrchestrator
+- **Session Management**: Multi-round interactions are handled by GalaxySession
+- **Low-Level Connection Management**: WebSocket lifecycle is handled by DeviceManager
+
+This separation of concerns keeps ConstellationClient focused on device-level operations.
+
+## Initialization
+
+### Constructor
+
+```python
+def __init__(
+ self,
+ config: Optional[ConstellationConfig] = None,
+ task_name: Optional[str] = None,
+):
+ """
+ Initialize ConstellationClient with configuration.
+
+ Args:
+ config: Device configuration (creates default if None)
+ task_name: Override task name from config
+ """
+```
+
+When you create a ConstellationClient, it performs these initialization steps:
+
+1. **Load or Create Configuration**: If you provide a `config` parameter, it uses that. Otherwise, it creates a default `ConstellationConfig` object. This config contains device information, heartbeat settings, and other parameters.
+
+2. **Override Task Name**: If you provide `task_name`, it overrides the task name from the configuration. The task name identifies this constellation instance in logs and messages.
+
+3. **Create DeviceManager**: ConstellationClient creates an internal DeviceManager instance, passing the task name and connection settings (heartbeat interval, reconnect delay). DeviceManager is the component that actually manages connections.
+
+**Initialization Examples:**
+
+```python
+# Simple: Use default configuration
+client = ConstellationClient()
+
+# Load configuration from YAML
+config = ConstellationConfig.from_yaml("config/devices.yaml")
+client = ConstellationClient(config=config)
+
+# Override task name for this instance
+client = ConstellationClient(
+ config=config,
+ task_name="data_processing_pipeline"
+)
+```
+
+The task name appears in logs and helps identify which constellation instance generated which messages, which is useful when running multiple constellations simultaneously.
+
+### Async Initialize Method
+
+```python
+async def initialize(self) -> Dict[str, bool]:
+ """
+ Register and optionally connect all devices from configuration.
+
+ Returns:
+ Dictionary mapping device_id to registration success status
+ """
+```
+
+After creating a ConstellationClient, you must call `initialize()` before using it. This method processes all devices defined in the configuration:
+
+**Registration Process:**
+
+For each device in the configuration, `initialize()` calls `register_device_from_config()`, which:
+
+1. Extracts device parameters (device_id, server_url, os, capabilities, metadata)
+2. Calls DeviceManager to register the device
+3. If `auto_connect: true` is set, immediately connects to the device
+
+**Auto-Connect Behavior:**
+
+The `auto_connect` flag in configuration determines whether devices connect during initialization or wait for explicit `connect_device()` calls. Auto-connect is convenient for simple scenarios but may not be suitable if you need fine-grained control over connection timing.
+
+**Return Value:**
+
+The method returns a dictionary showing which devices successfully registered:
+
+```python
+results = await client.initialize()
+# Example: {"windows_pc": True, "linux_server": True, "failed_device": False}
+
+# Check for failures
+failed = [device_id for device_id, success in results.items() if not success]
+if failed:
+ print(f"Failed to register: {failed}")
+```
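+
+A minimal sketch of the per-device loop and its result dictionary follows. The `register` callback and the dictionary-shaped device entries are hypothetical stand-ins, and treating a raised exception as a failed registration is an assumption rather than documented behavior.
+
+```python
+import asyncio
+
+
+async def initialize_all(device_configs, register):
+    """Register each configured device and report success per device_id."""
+    results = {}
+    for cfg in device_configs:
+        try:
+            results[cfg["device_id"]] = bool(await register(cfg))
+        except Exception:
+            # Assumption: one failing device must not abort the others
+            results[cfg["device_id"]] = False
+    return results
+
+
+async def fake_register(cfg):
+    """Hypothetical registration callback: fails for entries without a URL."""
+    if not cfg.get("server_url"):
+        raise ValueError("missing server_url")
+    return True
+```
+
+Returning a per-device map instead of raising on the first error lets the caller decide whether a partially registered constellation is acceptable.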
+
+**Typical Initialization Flow:**
+
+```mermaid
+sequenceDiagram
+ participant App
+ participant CC as ConstellationClient
+ participant DM as DeviceManager
+ participant Server as Agent Server
+
+ App->>CC: ConstellationClient(config)
+ CC->>CC: Create DeviceManager
+
+ App->>CC: initialize()
+
+ loop For each device in config
+ CC->>DM: register_device()
+ DM->>DM: Store in DeviceRegistry
+
+ alt auto_connect = true
+ DM->>Server: WebSocket connect
+ Server-->>DM: Connection established
+ DM->>Server: REGISTER (AIP)
+ Server-->>DM: REGISTER_CONFIRMATION
+ DM->>Server: DEVICE_INFO_REQUEST
+ Server-->>DM: Device telemetry
+ DM->>DM: Start heartbeat & message handler
+ end
+
+ DM-->>CC: Success/failure
+ end
+
+ CC-->>App: {"device1": true, "device2": true}
+```
+
+This diagram shows the initialization sequence. For each configured device, ConstellationClient delegates to DeviceManager, which handles the low-level connection setup if auto-connect is enabled.
+
+## Device Management Methods
+
+### Register Device
+
+```python
+async def register_device(
+ self,
+ device_id: str,
+ server_url: str,
+ capabilities: Optional[List[str]] = None,
+ metadata: Optional[Dict[str, Any]] = None,
+ auto_connect: bool = True,
+) -> bool:
+```
+
+This method registers a device programmatically (outside of configuration). It's useful for dynamically adding devices at runtime.
+
+!!! warning "Known Limitation"
+    The current implementation does not pass the OS parameter to the underlying `DeviceManager`. For proper device registration with OS information, use configuration-based registration via `register_device_from_config()` or ensure the OS is included in the device metadata.
+
+**Parameters Explained:**
+
+- **device_id**: Unique identifier for the device. Used in all subsequent operations.
+- **server_url**: WebSocket endpoint of the Agent Server (e.g., `ws://192.168.1.100:5000/ws`)
+- **capabilities**: List of capabilities this device provides (e.g., `["office", "web", "email"]`)
+- **metadata**: Additional device properties (e.g., `{"location": "datacenter", "gpu": "RTX 4090"}`)
+- **auto_connect**: Whether to immediately connect after registration
+
+**Usage Example:**
+
+```python
+# Register a Windows device with Office capabilities
+success = await client.register_device(
+ device_id="workstation_001",
+ server_url="ws://192.168.1.50:5000/ws",
+ capabilities=["office", "web", "email"],
+ metadata={"location": "office", "user": "john"},
+ auto_connect=True
+)
+
+if success:
+ print("Device registered and connected")
+else:
+ print("Registration failed")
+```
+
+### Connect and Disconnect
+
+```python
+async def connect_device(self, device_id: str) -> bool:
+ """Connect to a registered device."""
+
+async def disconnect_device(self, device_id: str) -> bool:
+ """Disconnect from a device."""
+
+async def connect_all_devices(self) -> Dict[str, bool]:
+ """Connect to all registered devices."""
+
+async def disconnect_all_devices(self) -> None:
+ """Disconnect from all devices."""
+```
+
+These methods control device connections. You might disconnect devices to save resources or reconnect after configuration changes.
+
+**Connection Example:**
+
+```python
+# Connect to specific device
+await client.connect_device("windows_pc")
+
+# Connect to all registered devices
+results = await client.connect_all_devices()
+print(f"Connected to {sum(results.values())} devices")
+
+# Disconnect when done
+await client.disconnect_device("windows_pc")
+```
+
+Connection establishment involves WebSocket handshake, AIP registration, device info exchange, and starting background monitoring services (heartbeat and message processing).
+
+## Task Execution
+
+### Assign Task to Device
+
+ConstellationClient does not expose `assign_task_to_device()` in its public API; that method lives on DeviceManager and is called by higher-level orchestrators such as TaskConstellationOrchestrator. Knowing how task assignment works still helps you understand the system:
+
+**Task Assignment Process:**
+
+1. **Device Status Check**: DeviceManager checks if the target device is IDLE or BUSY
+2. **Immediate Execution**: If IDLE, the task executes immediately
+3. **Queuing**: If BUSY, the task enters the device's queue
+4. **Task Transmission**: WebSocketConnectionManager sends TASK message via AIP
+5. **Result Waiting**: MessageProcessor waits for TASK_END message
+6. **Completion**: Device returns to IDLE, next queued task starts
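+
+The IDLE/BUSY decision and the queue hand-off in the steps above can be sketched as follows. `DeviceSlot` is a hypothetical class written for illustration; it leaves out the AIP messaging and is not the actual DeviceManager or TaskQueueManager code.
+
+```python
+from collections import deque
+
+
+class DeviceSlot:
+    """Hypothetical per-device state: a status flag plus a FIFO queue."""
+
+    def __init__(self):
+        self.status = "IDLE"
+        self.current_task = None
+        self.queue = deque()
+
+    def assign(self, task_id):
+        """Start the task if the device is IDLE, otherwise queue it."""
+        if self.status == "IDLE":
+            self.status = "BUSY"
+            self.current_task = task_id
+            return "started"
+        self.queue.append(task_id)
+        return "queued"
+
+    def complete(self):
+        """Finish the current task and start the next queued one, if any."""
+        finished = self.current_task
+        self.status = "IDLE"
+        self.current_task = None
+        if self.queue:
+            self.assign(self.queue.popleft())
+        return finished
+```
+
+A FIFO queue keeps tasks in submission order, which matches the "next queued task starts" behavior described in step 6.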
+
+**Why Task Assignment is Internal:**
+
+ConstellationClient focuses on device management, not task orchestration. Task assignment is exposed through higher-level APIs:
+
+- TaskConstellationOrchestrator assigns tasks based on DAG dependencies
+- GalaxySession coordinates multi-round task execution
+- Direct device-level task assignment is available through DeviceManager if needed
+
+This layering ensures each component has a clear responsibility.
+
+## Status and Information
+
+### Get Device Status
+
+```python
+def get_device_status(self, device_id: Optional[str] = None) -> Dict[str, Any]:
+ """
+ Get device status information.
+
+ If device_id is provided, returns status for that device.
+ If device_id is None, returns status for all connected devices.
+ """
+```
+
+Device status includes:
+
+```python
+{
+ "device_id": "windows_pc",
+ "status": "IDLE", # DISCONNECTED/CONNECTING/CONNECTED/IDLE/BUSY/FAILED
+ "server_url": "ws://192.168.1.100:5000/ws",
+ "capabilities": ["office", "web"],
+ "last_heartbeat": "2025-11-06T10:30:45",
+ "connection_attempts": 1,
+ "max_retries": 5,
+ "current_task_id": None, # Task ID if device is BUSY
+ "queued_tasks": 0, # Number of queued tasks
+ "system_info": { # From device telemetry
+ "cpu_count": 8,
+ "memory_gb": 32,
+ "os_version": "Windows 11",
+ ...
+ }
+}
+```
+
+The status provides a comprehensive view of device health and activity, useful for monitoring dashboards or debugging connection issues.
+
+### Get Connected Devices
+
+```python
+def get_connected_devices(self) -> List[str]:
+ """Get list of device IDs that are currently connected."""
+```
+
+Returns a list of device IDs in CONNECTED, IDLE, or BUSY status. Useful for determining which devices are available for task assignment.
+
+```python
+connected = client.get_connected_devices()
+print(f"Available devices: {', '.join(connected)}")
+
+# Check if specific device is connected
+if "windows_pc" in connected:
+ # Assign task to this device
+ ...
+```
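+
+The status filter behind this method can be sketched in a few lines. The `connected_device_ids` helper and the sample status map are hypothetical; only the three status names come from the description above.
+
+```python
+CONNECTED_STATES = {"CONNECTED", "IDLE", "BUSY"}
+
+
+def connected_device_ids(statuses):
+    """Return the IDs whose status counts as connected, in insertion order."""
+    return [dev for dev, status in statuses.items() if status in CONNECTED_STATES]
+
+
+statuses = {
+    "windows_pc": "IDLE",
+    "linux_server": "BUSY",
+    "old_laptop": "DISCONNECTED",
+    "mac_mini": "FAILED",
+}
+connected = connected_device_ids(statuses)
+```
+
+Note that a BUSY device still counts as connected, so a caller that needs a device free right now should also check for the IDLE status.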
+
+### Get Constellation Info
+
+```python
+def get_constellation_info(self) -> Dict[str, Any]:
+ """Get overall constellation status and configuration."""
+```
+
+Returns constellation-level information:
+
+```python
+{
+ "constellation_id": "production_constellation",
+ "connected_devices": 3, # Number currently connected
+ "total_devices": 5, # Total registered devices
+ "configuration": {
+ "heartbeat_interval": 30.0,
+ "reconnect_delay": 5.0,
+ "max_concurrent_tasks": 10
+ }
+}
+```
+
+This provides a high-level view of the entire constellation, useful for monitoring overall system health.
+
+## Configuration Management
+
+### Validate Configuration
+
+```python
+def validate_config(self, config: Optional[ConstellationConfig] = None) -> Dict[str, Any]:
+ """
+ Validate constellation configuration.
+
+ Checks:
+ - task_name is provided
+ - devices are configured
+ - settings are in valid ranges
+ """
+```
+
+Validation catches configuration errors early:
+
+```python
+result = client.validate_config()
+
+if not result["valid"]:
+    print("Configuration errors:")
+    for error in result["errors"]:
+        print(f"  - {error}")
+
+if result["warnings"]:
+    print("Warnings:")
+    for warning in result["warnings"]:
+        print(f"  - {warning}")
+```
+
+### Get Configuration Summary
+
+```python
+def get_config_summary(self) -> Dict[str, Any]:
+ """Get summary of current configuration."""
+```
+
+Returns a human-readable configuration summary:
+
+```python
+{
+ "task_name": "production_constellation",
+ "devices_count": 3,
+ "devices": [
+ {
+ "device_id": "windows_pc",
+ "server_url": "ws://192.168.1.100:5000/ws",
+ "capabilities": ["office", "web"],
+ "auto_connect": True
+ },
+ ...
+ ],
+ "settings": {
+ "heartbeat_interval": 30.0,
+ "reconnect_delay": 5.0,
+ "max_concurrent_tasks": 10
+ }
+}
+```
+
+### Add Device to Configuration
+
+```python
+async def add_device_to_config(
+ self,
+ device_id: str,
+ server_url: str,
+ capabilities: Optional[List[str]] = None,
+ metadata: Optional[Dict[str, Any]] = None,
+ auto_connect: bool = True,
+ register_immediately: bool = True,
+) -> bool:
+```
+
+Dynamically adds a device to the configuration and optionally registers it:
+
+```python
+# Add device to config and register
+await client.add_device_to_config(
+ device_id="new_device",
+ server_url="ws://192.168.1.200:5000/ws",
+ capabilities=["database"],
+ register_immediately=True # Register right away
+)
+
+# Add to config only, register later
+await client.add_device_to_config(
+ device_id="staging_device",
+ server_url="ws://staging.example.com:5000/ws",
+ register_immediately=False # Just update config
+)
+```
+
+This is useful for dynamic device discovery scenarios where devices are added at runtime.
+
+## Lifecycle Management
+
+### Shutdown
+
+```python
+async def shutdown(self) -> None:
+ """
+ Gracefully shutdown the constellation client.
+
+ Stops all background services and disconnects all devices.
+ """
+```
+
+Shutdown performs cleanup in this order:
+
+1. **Stop Task Queues**: Cancel all queued tasks across all devices
+2. **Stop Message Handlers**: Stop MessageProcessor loops for all devices
+3. **Stop Heartbeats**: Stop HeartbeatManager loops for all devices
+4. **Disconnect Devices**: Close WebSocket connections to all devices
+5. **Cancel Reconnection Tasks**: Cancel any pending reconnection attempts
+
+**Proper Shutdown Example:**
+
+```python
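+# Create the client outside the try block so that `client` is always
+# defined when the finally clause runs
+client = ConstellationClient(config)
+try:
+    await client.initialize()
+
+    # Use the client
+    ...
+
+finally:
+    # Always shut down to clean up resources
+    await client.shutdown()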
+```
+
+Without proper shutdown, background tasks continue running, WebSocket connections remain open, and resources leak.
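+
+One way to make the shutdown call hard to forget is an async context manager. The sketch below is an assumption-laden illustration: `managed_client` and `FakeClient` are hypothetical helpers, and the only things taken from this page are the `initialize()` and `shutdown()` coroutine methods.
+
+```python
+import asyncio
+from contextlib import asynccontextmanager
+
+
+@asynccontextmanager
+async def managed_client(client):
+    """Initialize `client` on entry and always shut it down on exit."""
+    try:
+        await client.initialize()
+        yield client
+    finally:
+        await client.shutdown()
+
+
+class FakeClient:
+    """Stand-in exposing the two coroutine methods used above."""
+
+    def __init__(self):
+        self.events = []
+
+    async def initialize(self):
+        self.events.append("initialize")
+        return {}
+
+    async def shutdown(self):
+        self.events.append("shutdown")
+
+
+async def demo():
+    """Show that shutdown still runs when the body raises."""
+    fake = FakeClient()
+    try:
+        async with managed_client(fake):
+            raise RuntimeError("task failed")
+    except RuntimeError:
+        pass
+    return fake.events
+```
+
+With a real client the usage would read `async with managed_client(ConstellationClient(config)) as client:`, and cleanup happens even if the body raises.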
+
+## Usage Patterns
+
+### Basic Device Management
+
+```python
+# Create and initialize client
+client = ConstellationClient()
+await client.initialize()
+
+# Check which devices connected
+connected = client.get_connected_devices()
+print(f"Connected: {connected}")
+
+# Get status for specific device
+status = client.get_device_status("windows_pc")
+print(f"Status: {status['status']}, Tasks queued: {status['queued_tasks']}")
+
+# Shutdown when done
+await client.shutdown()
+```
+
+### Dynamic Device Addition
+
+```python
+# Start with base configuration
+client = ConstellationClient(base_config)
+await client.initialize()
+
+# Discover new device at runtime
+new_device_info = await discover_device()
+
+# Add and connect
+await client.add_device_to_config(
+ device_id=new_device_info["id"],
+ server_url=new_device_info["url"],
+ capabilities=new_device_info["capabilities"],
+ register_immediately=True
+)
+
+# Verify connection
+if new_device_info["id"] in client.get_connected_devices():
+ print("New device ready")
+```
+
+### Health Monitoring
+
+```python
+import asyncio
+from datetime import datetime
+
+async def monitor_health(client):
+    """Continuously monitor device health."""
+    while True:
+        info = client.get_constellation_info()
+
+        # Check connection rate (skipped when no devices are registered)
+        total = info["total_devices"]
+        if total:
+            connection_rate = info["connected_devices"] / total
+            if connection_rate < 0.8:  # Less than 80% connected
+                print(f"Warning: Only {connection_rate:.0%} devices connected")
+
+        # Check individual device health
+        for device_id in client.get_connected_devices():
+            status = client.get_device_status(device_id)
+
+            # Check heartbeat freshness
+            last_hb = datetime.fromisoformat(status["last_heartbeat"])
+            age = datetime.now() - last_hb
+            if age.total_seconds() > 60:  # No heartbeat in 60 seconds
+                print(f"Warning: {device_id} heartbeat stale")
+
+        await asyncio.sleep(30)  # Check every 30 seconds
+```
+
+## Integration with Other Components
+
+### Used by GalaxyClient
+
+GalaxyClient wraps ConstellationClient for session management:
+
+```python
+class GalaxyClient:
+    def __init__(self, ...):
+        # Create internal ConstellationClient
+        self._client = ConstellationClient(config, task_name)
+
+    async def initialize(self):
+        # Initialize ConstellationClient
+        await self._client.initialize()
+
+    async def process_request(self, request):
+        # Use ConstellationClient for device coordination
+        # while GalaxySession handles task orchestration
+        session = GalaxySession(client=self._client, ...)
+        await session.run()
+```
+
+### Used by TaskConstellationOrchestrator
+
+TaskConstellationOrchestrator uses ConstellationClient's DeviceManager for task assignment:
+
+```python
+# Orchestrator assigns tasks to devices based on capabilities
+for task in dag.tasks:
+ device_id = select_device_for_task(task)
+
+ # Assign through DeviceManager (internal to ConstellationClient)
+ result = await constellation_client.device_manager.assign_task_to_device(
+ task_id=task.id,
+ device_id=device_id,
+ task_description=task.description,
+ task_data=task.data
+ )
+```
+
+## Summary
+
+ConstellationClient is the primary interface for device management in Galaxy Client. It provides:
+
+- **Simple API**: Clean methods for registration, connection, status queries
+- **Configuration Management**: Load from files, validate, modify at runtime
+- **Delegation**: Hides complexity of DeviceManager and its components
+- **Focused Scope**: Device management only, not DAG planning or session management
+
+For most applications, ConstellationClient (or GalaxyClient which wraps it) is all you need. Only advanced scenarios require working directly with DeviceManager or its components.
+
+**Next Steps:**
+
+- See [DeviceManager](./device_manager.md) for low-level connection management details
+- See [Components](./components.md) for modular component architecture
+- See [Overview](./overview.md) for overall system architecture
+- See [GalaxyClient](./galaxy_client.md) for session-level API
diff --git a/documents/docs/galaxy/client/device_manager.md b/documents/docs/galaxy/client/device_manager.md
new file mode 100644
index 000000000..652fad149
--- /dev/null
+++ b/documents/docs/galaxy/client/device_manager.md
@@ -0,0 +1,968 @@
+# DeviceManager Reference
+
+DeviceManager is the connection orchestration layer in Galaxy Client. While ConstellationClient provides the high-level device management API, DeviceManager handles the low-level details of WebSocket connections, health monitoring, message routing, and task queuing.
+
+## Related Documentation
+
+- [Overview](./overview.md) - Overall Galaxy Client architecture and workflow
+- [ConstellationClient](./constellation_client.md) - High-level device management API
+- [Components](./components.md) - Detailed documentation for each DeviceManager component
+- [AIP Integration](./aip_integration.md) - Protocol details and message flows
+
+---
+
+## What DeviceManager Does
+
+DeviceManager acts as the orchestration coordinator, managing the lifecycle of device connections from initial registration through task execution to disconnection. It doesn't perform these operations itself; instead, it coordinates five specialized components to handle different aspects of device management.
+
+**Orchestration Philosophy:**
+
+DeviceManager follows the Coordinator pattern. When you call `register_device()`, DeviceManager doesn't directly store device information—it delegates to DeviceRegistry. When you call `connect_device()`, DeviceManager doesn't create WebSocket connections itself—it delegates to WebSocketConnectionManager. When a device sends a message, DeviceManager doesn't process it—MessageProcessor handles that.
+
+This separation of concerns makes each component focused and testable. DeviceManager simply coordinates the flow of operations across components.
+
+**Core Responsibilities:**
+
+**Device Registration**: When a device registers, DeviceManager creates an AgentProfile containing device metadata (ID, server URL, capabilities, OS) and delegates to DeviceRegistry for storage. DeviceRegistry becomes the single source of truth for device state.
+
+**Connection Establishment**: When you connect to a device, DeviceManager coordinates multiple steps: WebSocketConnectionManager establishes the WebSocket connection, MessageProcessor sends the REGISTER message per AIP protocol, DeviceManager requests device telemetry, and HeartbeatManager starts background health monitoring.
+
+**Disconnection Handling**: When a device disconnects (intentionally or due to failure), DeviceManager coordinates cleanup: HeartbeatManager stops health checks, MessageProcessor stops the message handling loop, WebSocketConnectionManager closes the WebSocket, TaskQueueManager clears pending tasks, and DeviceRegistry updates device status.
+
+**Reconnection Logic**: For network failures, DeviceManager implements exponential backoff reconnection. It tracks connection attempts, waits progressively longer between retries (5s, 10s, 20s, ...), and gives up after max retries. Reconnection happens automatically without user intervention.
+
+**Task Assignment Coordination**: When assigning a task, DeviceManager checks device status via DeviceRegistry, queues tasks via TaskQueueManager if the device is busy, and delegates execution to MessageProcessor when the device becomes available.
+
+**What DeviceManager Does NOT Do:**
+
+- **WebSocket I/O**: Handled by WebSocketConnectionManager
+- **Health Monitoring**: Handled by HeartbeatManager
+- **Message Processing**: Handled by MessageProcessor
+- **Device State Storage**: Handled by DeviceRegistry
+- **Task Queuing**: Handled by TaskQueueManager
+
+DeviceManager coordinates these components but doesn't duplicate their functionality.
+
+---
+
+## Component Architecture
+
+DeviceManager uses a modular architecture with five components, each responsible for a specific aspect of device management:
+
+```
+DeviceManager (Orchestrator)
+ |
+ +-- DeviceRegistry (Device State)
+ | Stores AgentProfiles, device status
+ |
+ +-- WebSocketConnectionManager (Connection Lifecycle)
+ | Establishes/closes WebSocket connections
+ |
+ +-- HeartbeatManager (Health Monitoring)
+ | Sends periodic heartbeats, detects failures
+ |
+ +-- MessageProcessor (Message Routing)
+ | Routes AIP messages, handles responses
+ |
+ +-- TaskQueueManager (Task Queuing)
+ Queues tasks when devices busy
+```
+
+**Why This Architecture?**
+
+**Single Responsibility**: Each component has one job. DeviceRegistry manages state, WebSocketConnectionManager manages connections, HeartbeatManager monitors health. This makes each component easy to understand, test, and modify.
+
+**Testability**: You can test each component in isolation. Mock DeviceRegistry to test connection logic. Mock WebSocketConnectionManager to test message processing. This simplifies unit testing.
+
+**Extensibility**: Adding new functionality means adding or modifying a single component. Need different health monitoring? Replace HeartbeatManager. Need different queuing strategies? Modify TaskQueueManager. Other components remain unchanged.
+
+**Clarity**: When debugging, you know where to look. Connection failures? Check WebSocketConnectionManager. Missed heartbeats? Check HeartbeatManager. Status inconsistencies? Check DeviceRegistry.
+
+**Component Interactions:**
+
+Components interact through DeviceManager as the coordinator:
+
+1. **Registration Flow**: DeviceManager → DeviceRegistry (store profile)
+2. **Connection Flow**: DeviceManager → WebSocketConnectionManager (connect) → MessageProcessor (send REGISTER) → DeviceRegistry (update status) → HeartbeatManager (start monitoring)
+3. **Task Assignment Flow**: DeviceManager → DeviceRegistry (check status) → TaskQueueManager (queue if busy) → MessageProcessor (send TASK)
+4. **Disconnection Flow**: DeviceManager → HeartbeatManager (stop) → MessageProcessor (stop) → WebSocketConnectionManager (close) → TaskQueueManager (clear) → DeviceRegistry (update status)
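+
+To make the coordinator role concrete, the sketch below replays the disconnection flow from the list above against recording stubs. `Step` and `disconnect` are hypothetical names invented for this illustration; the real components expose their own methods.
+
+```python
+class Step:
+    """Hypothetical stand-in for one component; records when it is invoked."""
+
+    def __init__(self, component, action, log):
+        self.component = component
+        self.action = action
+        self.log = log
+
+    def run(self, device_id):
+        self.log.append((self.component, self.action, device_id))
+
+
+def disconnect(device_id, log):
+    """Run the disconnection flow in the documented order."""
+    flow = [
+        ("HeartbeatManager", "stop"),
+        ("MessageProcessor", "stop"),
+        ("WebSocketConnectionManager", "close"),
+        ("TaskQueueManager", "clear"),
+        ("DeviceRegistry", "update status"),
+    ]
+    for component, action in flow:
+        Step(component, action, log).run(device_id)
+    return [entry[0] for entry in log]
+```
+
+The order matters: background loops stop before the socket closes, so nothing tries to send on a closed connection, and the registry is updated last so the recorded status reflects the finished cleanup.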
+
+The coordinator pattern keeps direct dependencies between components to a minimum, which reduces coupling.
+
+---
+
+## Initialization
+
+### Constructor
+
+```python
+def __init__(
+ self,
+ task_name: str = "test_task",
+ heartbeat_interval: float = 30.0,
+ reconnect_delay: float = 5.0,
+):
+ """
+ Initialize DeviceManager.
+
+ Args:
+ task_name: Identifier for this constellation instance (default "test_task")
+ heartbeat_interval: Seconds between heartbeat checks (default 30s)
+ reconnect_delay: Initial delay before reconnection attempt (default 5s)
+ """
+```
+
+When you create a DeviceManager, it initializes the five components and then stores its own configuration:
+
+1. **Create DeviceRegistry**: Initializes empty device storage
+2. **Create WebSocketConnectionManager**: Prepares connection handling infrastructure
+3. **Create HeartbeatManager**: Creates heartbeat scheduler with specified interval
+4. **Create MessageProcessor**: Creates message routing infrastructure
+5. **Create TaskQueueManager**: Creates per-device task queues
+6. **Store Configuration**: Saves task_name, reconnect settings for later use
+
+**Parameter Explanations:**
+
+**task_name**: This identifier appears in log messages and helps distinguish between multiple constellation instances running simultaneously. For example, "production_constellation" vs "test_constellation".
+
+**heartbeat_interval**: How often (in seconds) HeartbeatManager checks device health. Lower values (e.g., 10s) detect failures faster but increase network traffic. Higher values (e.g., 60s) reduce overhead but delay failure detection. Default 30s balances responsiveness and efficiency.
+
+**reconnect_delay**: Initial delay before first reconnection attempt. DeviceManager uses exponential backoff, so subsequent delays double: 5s, 10s, 20s, 40s, 80s. Lower values reconnect faster but may overwhelm unstable networks. Higher values give networks more recovery time.
+
+**max_retries**: The maximum number of reconnection attempts is configured per-device during registration via the `max_retries` parameter (default 5) in `AgentProfile`. This allows different devices to have different retry limits based on their reliability characteristics.
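+
+The backoff schedule described above is simple arithmetic, sketched here under the stated defaults. `backoff_delays` is a hypothetical helper written for this example; it only computes the delays and does not reproduce DeviceManager's reconnection code.
+
+```python
+def backoff_delays(reconnect_delay=5.0, max_retries=5):
+    """Delay before each reconnection attempt; the initial delay doubles every retry."""
+    return [reconnect_delay * (2 ** attempt) for attempt in range(max_retries)]
+
+
+delays = backoff_delays()  # [5.0, 10.0, 20.0, 40.0, 80.0]
+total_wait = sum(delays)   # 155.0 seconds of waiting before giving up
+```
+
+With the defaults, a device that never comes back is abandoned after roughly two and a half minutes of waiting, not counting the time spent on the connection attempts themselves.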
+
+---
+
+## Device Lifecycle Methods
+
+### Register Device
+
+```python
+ async def register_device(
+     self,
+     device_id: str,
+     server_url: str,
+     os: str,
+     capabilities: Optional[List[str]] = None,
+     metadata: Optional[Dict[str, Any]] = None,
+     max_retries: int = 5,
+     auto_connect: bool = True,
+ ) -> bool:
+     """
+     Register a device for management.
+
+     Creates an AgentProfile and stores it in DeviceRegistry.
+     Registration itself does not open a connection; when auto_connect
+     is True (the default), connect_device() is called afterwards.
+     """
+```
+
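+ Registration itself only stores device information. With `auto_connect=False` you can register all devices at startup and connect selectively based on runtime conditions; with the default `auto_connect=True`, DeviceManager calls `connect_device()` right after a successful registration.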
+
+**Registration Process:**
+
+1. **Create AgentProfile**: DeviceManager creates an AgentProfile object containing:
+ - `device_id`: Unique identifier
+ - `server_url`: WebSocket endpoint
+ - `os`: Operating system (Windows, Linux, macOS)
+ - `capabilities`: List of capability tags (e.g., ["office", "web", "email"])
+ - `metadata`: Arbitrary key-value data (e.g., {"location": "datacenter", "gpu": "RTX 4090"})
+ - `status`: Initially set to DISCONNECTED
+
+2. **Store in DeviceRegistry**: DeviceManager delegates to DeviceRegistry, which:
+ - Validates device_id is unique
+ - Stores the AgentProfile
+ - Initializes device status to DISCONNECTED
+
+3. **Return Success**: Returns True if registration succeeds, False if device_id already exists
+
+**When Registration Fails:**
+
+Registration fails if:
+- Device ID already registered (must use unique IDs)
+- Invalid server URL format
+- Validation errors in AgentProfile creation
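
The failure cases above can be sketched with a small in-memory registry. This is only an illustration under stated assumptions: `Profile` and `MiniRegistry` are hypothetical stand-ins for `AgentProfile` and `DeviceRegistry`, and the URL check (scheme must be `ws` or `wss`) is an assumed validation rule, not one confirmed by the source.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List
from urllib.parse import urlparse


@dataclass
class Profile:
    """Hypothetical stand-in for AgentProfile (fields taken from the list above)."""
    device_id: str
    server_url: str
    os: str
    capabilities: List[str] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)
    status: str = "DISCONNECTED"


class MiniRegistry:
    """Hypothetical registry that rejects duplicate IDs and non-WebSocket URLs."""

    def __init__(self) -> None:
        self._devices: Dict[str, Profile] = {}

    def register(self, device_id: str, server_url: str, os: str) -> bool:
        if device_id in self._devices:
            return False                      # device ID already registered
        parsed = urlparse(server_url)
        if parsed.scheme not in ("ws", "wss") or not parsed.netloc:
            return False                      # assumed rule: must be a ws:// or wss:// URL
        self._devices[device_id] = Profile(device_id, server_url, os)
        return True


registry = MiniRegistry()
print(registry.register("office_pc", "ws://192.168.1.100:5000/ws", "Windows"))  # True
print(registry.register("office_pc", "ws://192.168.1.100:5000/ws", "Windows"))  # False
print(registry.register("lab_pc", "http://example.com", "Linux"))               # False
```

Returning `False` instead of raising matches the boolean contract of `register_device()` described above.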
+
+**Example:**
+
+```python
+ # Register device without connecting (auto_connect defaults to True)
+ success = await device_manager.register_device(
+     device_id="office_pc",
+     server_url="ws://192.168.1.100:5000/ws",
+     os="Windows",
+     capabilities=["office", "web"],
+     metadata={"location": "office_building_a", "user": "john"},
+     auto_connect=False,
+ )
+
+ if success:
+     print("Device registered, ready to connect")
+ else:
+     print("Registration failed (ID already exists?)")
+```
+
+### Connect Device
+
+```python
+async def connect_device(self, device_id: str, is_reconnection: bool = False) -> bool:
+ """
+ Establish connection to a registered device.
+
+ Performs WebSocket handshake, AIP registration, device info exchange,
+ and starts background monitoring services.
+ """
+```
+
+Connection is a multi-step process involving several components working together:
+
+**Step 1: Verify Registration**
+
+DeviceManager queries DeviceRegistry to verify the device is registered. If not registered, connection fails immediately.
+
+**Step 2: WebSocket Connection**
+
+ DeviceManager delegates to WebSocketConnectionManager and passes in the MessageProcessor, so that message handling starts before registration. This avoids a race in which a server reply arrives before anything is listening:
+
+```python
+# Connect and automatically start message handler
+await connection_manager.connect_to_device(
+ device_info,
+ message_processor=self.message_processor
+)
+```
+
+WebSocketConnectionManager creates an AIP `WebSocketTransport`, establishes the connection, starts the message handler (via MessageProcessor), and performs AIP registration using `RegistrationProtocol`.
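
One way to picture why the handler is started first: if nothing is listening when the server answers the REGISTER message, the reply has nowhere to go. The sketch below is an illustration under assumptions, not the AIP implementation. An `asyncio.Queue` stands in for the WebSocket, and `register_with_listener`, `pending`, and the message dictionaries are hypothetical names.

```python
import asyncio


async def register_with_listener() -> dict:
    inbox: asyncio.Queue = asyncio.Queue()   # stands in for the WebSocket receive side
    pending: dict = {}                       # message type -> Future awaiting that reply

    async def listener() -> None:
        # Background message handler: route each incoming message to its waiter
        while True:
            message = await inbox.get()
            waiter = pending.pop(message["type"], None)
            if waiter is not None and not waiter.done():
                waiter.set_result(message)

    # 1. Start the listener BEFORE sending anything
    listener_task = asyncio.create_task(listener())

    # 2. Record what we are waiting for, then "send" REGISTER
    confirmation = asyncio.get_running_loop().create_future()
    pending["HEARTBEAT"] = confirmation
    await inbox.put({"type": "HEARTBEAT", "status": "OK"})   # simulated server reply

    # 3. The already-running listener delivers the reply
    try:
        return await asyncio.wait_for(confirmation, timeout=1.0)
    finally:
        listener_task.cancel()


print(asyncio.run(register_with_listener()))   # {'type': 'HEARTBEAT', 'status': 'OK'}
```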
+
+**Step 3: Update Status and Start Heartbeat**
+
+After WebSocket connects successfully:
+
+```python
+# Update status to CONNECTED
+device_registry.update_device_status(device_id, DeviceStatus.CONNECTED)
+device_registry.update_heartbeat(device_id)
+
+# Start heartbeat monitoring
+heartbeat_manager.start_heartbeat(device_id)
+```
+
+Note: The message handler was already started in `connect_to_device()` to prevent race conditions.
+
+**Step 4: Device Info Exchange**
+
+ DeviceManager requests the device's system information from the server. The device pushes this information to the server during registration, and the server stores it:
+
+```python
+device_system_info = await connection_manager.request_device_info(device_id)
+ if device_system_info:
+     device_registry.update_device_system_info(device_id, device_system_info)
+```
+
+Device info includes CPU count, memory, OS version, screen resolution, and other system details stored in the AgentProfile.
+
+**Step 5: Set Device to IDLE**
+
+DeviceManager updates device status to ready for tasks:
+
+```python
+device_registry.set_device_idle(device_id)
+```
+
+Device is now ready to accept tasks. Note that HeartbeatManager was already started in Step 3, and MessageProcessor's message handler was started automatically during the WebSocket connection in Step 2.
+
+**Connection Sequence Diagram:**
+
+```mermaid
+sequenceDiagram
+ participant DM as DeviceManager
+ participant DR as DeviceRegistry
+ participant WSM as WebSocketConnectionManager
+ participant MP as MessageProcessor
+ participant HM as HeartbeatManager
+ participant Server as Agent Server
+
+ DM->>DR: Get device profile
+ DR-->>DM: AgentProfile
+
+ DM->>WSM: connect_to_device(device_info, message_processor)
+ WSM->>Server: WebSocket handshake (via AIP Transport)
+ Server-->>WSM: Connection established
+
+ Note over WSM,MP: CRITICAL: Start message handler BEFORE registration
+ WSM->>MP: start_message_handler(device_id, transport)
+ MP-->>MP: Start background message listener
+
+ WSM->>Server: REGISTER (via RegistrationProtocol)
+ Server-->>WSM: HEARTBEAT (OK status = registration confirmed)
+ WSM-->>DM: Connection successful
+
+ DM->>DR: update_device_status(CONNECTED)
+ DM->>DR: update_heartbeat()
+
+ DM->>HM: start_heartbeat(device_id)
+ HM-->>HM: Start background heartbeat loop
+
+ DM->>WSM: request_device_info(device_id)
+ WSM->>Server: DEVICE_INFO_REQUEST
+ Server-->>WSM: DEVICE_INFO_RESPONSE
+ WSM-->>DM: Device system info
+
+ DM->>DR: update_device_system_info()
+ DM->>DR: set_device_idle()
+
+ DM-->>DM: Connection complete
+```
+
+This diagram shows the entire connection flow, from initial WebSocket handshake through AIP registration to background service startup.
+
+**When Connection Fails:**
+
+Connection can fail at multiple points:
+
+- **WebSocket Failure**: Network unreachable, server not running, firewall blocking
+- **Registration Failure**: Server rejects device (invalid credentials, server full)
+- **Timeout**: Server doesn't respond within timeout period
+- **Protocol Error**: Server sends unexpected message format
+
+When connection fails, DeviceManager:
+
+1. Closes WebSocket if partially connected
+2. Updates device status to FAILED
+3. Schedules reconnection attempt (if retries remain)
+
+### Disconnect Device
+
+```python
+async def disconnect_device(self, device_id: str) -> None:
+ """
+ Disconnect from a device and cleanup resources.
+
+ Stops background services, closes WebSocket, and updates status.
+ """
+```
+
+Disconnection performs cleanup in reverse order of connection:
+
+**Step 1: Stop Heartbeat**
+
+```python
+await heartbeat_manager.stop_heartbeat(device_id)
+```
+
+This cancels the background heartbeat task, preventing further heartbeat messages.
+
+**Step 2: Stop Message Handler**
+
+```python
+await message_processor.stop_message_handler(device_id)
+```
+
+This cancels the background message listener task, preventing further message processing.
+
+**Step 3: Clear Task Queue**
+
+```python
+task_queue_manager.clear_queue(device_id)
+```
+
+Any queued tasks are cancelled. In-progress tasks are allowed to complete (graceful shutdown).
+
+**Step 4: Close WebSocket**
+
+```python
+await websocket_connection_manager.disconnect(device_id)
+```
+
+This sends WebSocket CLOSE frame and closes the connection.
+
+**Step 5: Update Status**
+
+```python
+device_registry.update_status(device_id, DeviceStatus.DISCONNECTED)
+```
+
+Device status becomes DISCONNECTED, indicating it's no longer available.
+
+**Graceful vs Forceful Disconnection:**
+
+Current implementation is graceful: it waits for in-progress tasks to complete before closing the connection. For forceful disconnection (immediate shutdown), you would:
+
+1. Cancel in-progress tasks
+2. Clear task queue
+3. Close WebSocket immediately without waiting
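
The difference between the two modes can be sketched with plain asyncio tasks. This is a simplified model under assumptions: `shutdown` is a hypothetical helper, the list stands in for the device's task queue, and no WebSocket is involved.

```python
import asyncio


async def shutdown(in_progress: asyncio.Task, queue: list, force: bool) -> str:
    if force:
        in_progress.cancel()           # forceful: cancel the running task
    queue.clear()                      # queued tasks are dropped in both modes
    try:
        return await in_progress       # graceful: wait for the task to finish
    except asyncio.CancelledError:
        return "cancelled"


async def demo(force: bool) -> str:
    async def running_task() -> str:
        await asyncio.sleep(0.05)      # stands in for a task executing on the device
        return "finished"

    task = asyncio.create_task(running_task())
    await asyncio.sleep(0)             # let the task start
    return await shutdown(task, ["task_2", "task_3"], force)


print(asyncio.run(demo(force=False)))   # finished
print(asyncio.run(demo(force=True)))    # cancelled
```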
+
+---
+
+## Task Assignment
+
+### Assign Task to Device
+
+```python
+async def assign_task_to_device(
+ self,
+ task_id: str,
+ device_id: str,
+ task_description: str,
+ task_data: Dict[str, Any],
+ timeout: float = 1000,
+) -> ExecutionResult:
+ """
+ Assign a task to a device for execution.
+
+ If device is IDLE, executes immediately.
+ If device is BUSY, queues task for later execution.
+ """
+```
+
+Task assignment involves checking device status, potentially queuing, and sending the TASK message:
+
+**Step 1: Check Device Status**
+
+```python
+profile = device_registry.get_device(device_id)
+status = profile.status
+```
+
+Device must be CONNECTED, IDLE, or BUSY. If DISCONNECTED or FAILED, task assignment fails immediately.
+
+**Step 2: Queue if Busy**
+
+```python
+ if status == DeviceStatus.BUSY:
+     # Add to queue
+     task_queue_manager.add_task(
+         device_id=device_id,
+         task_id=task_id,
+         task_description=task_description,
+         task_data=task_data
+     )
+     return {"status": "queued", "task_id": task_id}
+```
+
+ TaskQueueManager maintains per-device FIFO queues. When the device completes its current task, DeviceManager asks TaskQueueManager for the next queued task and assigns it (see Step 3).
+
+**Step 3: Execute Immediately**
+
+```python
+ if status == DeviceStatus.IDLE:
+     # Update status to BUSY
+     device_registry.update_status(device_id, DeviceStatus.BUSY)
+
+     # Send TASK message
+     await message_processor.send_message(
+         device_id=device_id,
+         message_type="TASK",
+         payload={
+             "task_id": task_id,
+             "description": task_description,
+             "data": task_data
+         }
+     )
+
+     # Wait for TASK_END, honoring the caller's timeout (default 1000 seconds)
+     result = await message_processor.wait_for_response(
+         device_id=device_id,
+         message_type="TASK_END",
+         timeout=timeout
+     )
+
+     # Update status back to IDLE
+     device_registry.update_status(device_id, DeviceStatus.IDLE)
+
+     # Execute next queued task if any
+     next_task = task_queue_manager.get_next_task(device_id)
+     if next_task:
+         await self.assign_task_to_device(**next_task)
+
+     return result
+```
+
+This flow ensures devices never have more than one task executing at a time, preventing resource contention.
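
The queue-when-busy behavior can be tried out in isolation. The following is a minimal sketch, not the DeviceManager code: `MiniDevice` is a hypothetical class, a short sleep stands in for the TASK / TASK_END round trip, and the queue is drained with a loop rather than the recursive call shown above.

```python
import asyncio
from collections import deque


class MiniDevice:
    """Hypothetical single device that runs one task at a time."""

    def __init__(self) -> None:
        self.busy = False
        self.queue: deque = deque()      # FIFO queue of waiting task IDs
        self.executed: list = []         # order in which tasks actually ran

    async def _run(self, task_id: str) -> None:
        await asyncio.sleep(0.01)        # stands in for TASK ... TASK_END
        self.executed.append(task_id)

    async def assign(self, task_id: str) -> str:
        if self.busy:
            self.queue.append(task_id)   # device is BUSY: queue and return at once
            return "queued"
        self.busy = True                 # device is IDLE: mark BUSY and execute
        try:
            await self._run(task_id)
            while self.queue:            # drain queued tasks in FIFO order
                await self._run(self.queue.popleft())
        finally:
            self.busy = False
        return "completed"


async def demo():
    device = MiniDevice()
    results = await asyncio.gather(
        device.assign("task_1"), device.assign("task_2"), device.assign("task_3")
    )
    return results, device.executed


print(asyncio.run(demo()))
# (['completed', 'queued', 'queued'], ['task_1', 'task_2', 'task_3'])
```

Only the first caller receives the execution result here. The queued callers get an immediate `"queued"` acknowledgement, mirroring the `{"status": "queued", ...}` return value in Step 2.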
+
+**Task Assignment Sequence:**
+
+```mermaid
+sequenceDiagram
+ participant App
+ participant DM as DeviceManager
+ participant DR as DeviceRegistry
+ participant TQM as TaskQueueManager
+ participant MP as MessageProcessor
+ participant Device
+
+ App->>DM: assign_task_to_device(task_id, device_id, ...)
+
+ DM->>DR: get_device(device_id)
+ DR-->>DM: AgentProfile (status=IDLE)
+
+ DM->>DR: update_status(BUSY)
+
+ DM->>MP: send_message(TASK)
+ MP->>Device: TASK message
+
+ Device-->>Device: Execute task
+
+ Device->>MP: TASK_END
+ MP-->>DM: Task result
+
+ DM->>DR: update_status(IDLE)
+
+ DM->>TQM: get_next_task(device_id)
+
+ alt Queue has tasks
+ TQM-->>DM: Next task
+ DM->>DM: assign_task_to_device (recursive)
+ else Queue empty
+ TQM-->>DM: None
+ end
+
+ DM-->>App: Task result
+```
+
+This diagram shows the complete task assignment flow, including automatic processing of queued tasks after completion.
+
+**Task Timeout Handling:**
+
+If a task doesn't complete within the timeout period (default 1000 seconds):
+
+1. MessageProcessor raises TimeoutError
+2. DeviceManager marks device as FAILED
+3. DeviceManager attempts reconnection
+4. Queued tasks remain in queue and execute after reconnection
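
Steps 1 and 2 of this list map naturally onto `asyncio.wait_for`. The sketch below assumes a plain dictionary as the device record and a hypothetical helper `wait_for_task_end`; how MessageProcessor really implements its timeout is not shown in this document.

```python
import asyncio


async def wait_for_task_end(response: asyncio.Future, timeout: float, device: dict) -> dict:
    """Wait for TASK_END; on timeout mark the (hypothetical) device record FAILED."""
    try:
        return await asyncio.wait_for(response, timeout)
    except asyncio.TimeoutError:
        device["status"] = "FAILED"      # step 2 of the list above
        return {"status": "timeout"}


async def demo() -> tuple:
    device = {"status": "BUSY"}
    never_answered = asyncio.get_running_loop().create_future()  # TASK_END never arrives
    result = await wait_for_task_end(never_answered, timeout=0.01, device=device)
    return result, device


print(asyncio.run(demo()))   # ({'status': 'timeout'}, {'status': 'FAILED'})
```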
+
+---
+
+## Disconnection and Reconnection
+
+### Handle Device Disconnection
+
+```python
+async def _handle_device_disconnection(
+ self,
+ device_id: str,
+ reason: str = "unknown",
+) -> None:
+ """
+ Internal handler for unexpected disconnections.
+
+ Performs cleanup and initiates reconnection if retries remain.
+ """
+```
+
+When a device disconnects unexpectedly (network failure, server crash, heartbeat timeout), DeviceManager performs cleanup and attempts reconnection:
+
+**Step 1: Log Disconnection**
+
+```python
+logger.warning(f"Device {device_id} disconnected: {reason}")
+```
+
+Reason indicates why disconnection occurred: "heartbeat_timeout", "websocket_error", "protocol_error", etc.
+
+**Step 2: Cleanup Resources**
+
+ Similar to `disconnect_device()`, except that the task queue is kept and the status becomes FAILED instead of DISCONNECTED:
+- Stop heartbeat
+- Stop message handler
+- Close WebSocket
+- Update status to FAILED
+
+**Step 3: Check Reconnection Eligibility**
+
+```python
+ profile = device_registry.get_device(device_id)
+ attempts = profile.connection_attempts
+ max_retries = profile.max_retries  # configured per device at registration
+
+ if attempts < max_retries:
+     # Schedule reconnection
+     await self._schedule_reconnection(device_id)
+ else:
+     # Give up
+     logger.error(f"Device {device_id} exceeded max retries ({max_retries})")
+     device_registry.update_status(device_id, DeviceStatus.FAILED)
+
+ DeviceRegistry tracks connection attempts per device. Once the device's `max_retries` limit is reached, DeviceManager gives up and leaves the device marked as FAILED.
+
+**Step 4: Schedule Reconnection**
+
+```python
+ async def _schedule_reconnection(self, device_id: str) -> None:
+     """Schedule reconnection with exponential backoff."""
+     profile = self.device_registry.get_device(device_id)
+     attempts = profile.connection_attempts
+     max_retries = profile.max_retries
+
+     # Calculate delay: 5s, 10s, 20s, 40s, 80s (with the default reconnect_delay of 5s)
+     delay = self.reconnect_delay * (2 ** attempts)
+
+     logger.info(f"Reconnecting to {device_id} in {delay}s (attempt {attempts+1}/{max_retries})")
+
+     # Wait
+     await asyncio.sleep(delay)
+
+     # Increment attempt counter
+     self.device_registry.increment_attempts(device_id)
+
+     # Try to reconnect
+     success = await self.connect_device(device_id, is_reconnection=True)
+
+     if success:
+         # Reset attempt counter on success
+         self.device_registry.reset_attempts(device_id)
+         logger.info(f"Device {device_id} reconnected successfully")
+     else:
+         # Reconnection failed, will retry again
+         await self._handle_device_disconnection(device_id, "reconnection_failed")
+
+Exponential backoff prevents overwhelming unstable networks with rapid reconnection attempts.
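
The same policy can also be written as a loop instead of the mutual recursion shown above. This is a sketch under assumptions, not the DeviceManager implementation: `reconnect_with_backoff` is a hypothetical helper, and the `connect` and `sleep` callables are injected so the schedule can be checked without real waiting.

```python
import asyncio
from typing import Awaitable, Callable, List


async def reconnect_with_backoff(
    connect: Callable[[], Awaitable[bool]],
    reconnect_delay: float = 5.0,
    max_retries: int = 5,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> bool:
    """Retry connect() up to max_retries times, doubling the delay each time."""
    for attempt in range(max_retries):
        await sleep(reconnect_delay * (2 ** attempt))
        if await connect():
            return True
    return False                              # retries exhausted: give up


async def demo():
    waited: List[float] = []
    outcomes = iter([False, False, True])     # third attempt succeeds

    async def fake_sleep(seconds: float) -> None:
        waited.append(seconds)                # record the delay instead of sleeping

    async def fake_connect() -> bool:
        return next(outcomes)

    ok = await reconnect_with_backoff(fake_connect, sleep=fake_sleep)
    return ok, waited


print(asyncio.run(demo()))   # (True, [5.0, 10.0, 20.0])
```

With the defaults, a device that never comes back is abandoned after 5 + 10 + 20 + 40 + 80 = 155 seconds of accumulated delay.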
+
+**Reconnection Flow:**
+
+```mermaid
+sequenceDiagram
+ participant HM as HeartbeatManager
+ participant DM as DeviceManager
+ participant DR as DeviceRegistry
+ participant Device
+
+ HM->>HM: Send heartbeat
+ Note over HM,Device: No response (timeout)
+
+ HM->>DM: _handle_device_disconnection("heartbeat_timeout")
+
+ DM->>DM: Stop heartbeat
+ DM->>DM: Stop message handler
+ DM->>DM: Close WebSocket
+ DM->>DR: update_status(FAILED)
+
+ DM->>DR: get connection_attempts
+ DR-->>DM: attempts = 1
+
+ alt attempts < max_retries
+ DM->>DM: Calculate delay (5s * 2^1 = 10s)
+ DM->>DM: await asyncio.sleep(10)
+
+ DM->>DR: increment_attempts (now 2)
+
+ DM->>Device: connect_device()
+
+ alt Connection succeeds
+ Device-->>DM: Success
+ DM->>DR: reset_attempts (back to 0)
+ DM->>DR: update_status(IDLE)
+ else Connection fails
+ Device-->>DM: Failure
+ DM->>DM: _handle_device_disconnection (recursive)
+ Note over DM: Next attempt in 20s
+ end
+ else attempts >= max_retries
+ DM->>DR: update_status(FAILED)
+ Note over DM: Give up
+ end
+```
+
+This diagram shows the reconnection loop with exponential backoff.
+
+**Queued Task Handling During Reconnection:**
+
+Tasks queued when a device disconnects remain in the queue. After successful reconnection, TaskQueueManager automatically starts processing queued tasks. This ensures no task loss during temporary network failures.
+
+---
+
+## Component Integration Example
+
+Here's a complete example showing how all components work together during a typical device lifecycle:
+
+```python
+# 1. Create DeviceManager
+manager = DeviceManager(
+ task_name="production_constellation",
+ heartbeat_interval=30.0,
+ reconnect_delay=5.0
+)
+
+# This creates all five components:
+# - DeviceRegistry (stores device state)
+# - WebSocketConnectionManager (handles connections)
+# - HeartbeatManager (monitors health)
+# - MessageProcessor (routes messages)
+# - TaskQueueManager (manages queues)
+
+# 2. Register device
+await manager.register_device(
+ device_id="office_pc",
+ server_url="ws://192.168.1.100:5000/ws",
+ os="Windows",
+ capabilities=["office", "web"],
+ max_retries=5,
+ auto_connect=True # Will automatically connect after registration
+)
+# DeviceManager → DeviceRegistry (store AgentProfile)
+# If auto_connect=True → DeviceManager → connect_device()
+
+# 3. Connect device (if auto_connect was False)
+# await manager.connect_device("office_pc")
+# DeviceManager → WebSocketConnectionManager (connect, start message handler)
+# → DeviceRegistry (update status to CONNECTED, then IDLE)
+# → HeartbeatManager (start heartbeat loop)
+
+ # 4. Assign first task (device is IDLE). It is started as a background
+ #    asyncio task so that step 5 can run while task_1 is still executing.
+ task1 = asyncio.create_task(
+     manager.assign_task_to_device(
+         task_id="task_1",
+         device_id="office_pc",
+         task_description="Open Excel",
+         task_data={"file": "report.xlsx"},
+         timeout=300
+     )
+ )
+ await asyncio.sleep(0)  # yield once so task_1 starts and marks the device BUSY
+ # DeviceManager → DeviceRegistry (check status: IDLE)
+ # → DeviceRegistry (update status to BUSY via set_device_busy)
+ # → WebSocketConnectionManager (send TASK via TaskExecutionProtocol)
+ # [wait for TASK_END]
+ # → DeviceRegistry (update status to IDLE via set_device_idle)
+
+ # 5. Assign second task while first is running (device is BUSY)
+ queued = await manager.assign_task_to_device(
+     task_id="task_2",
+     device_id="office_pc",
+     task_description="Send email",
+     task_data={"to": "john@example.com"},
+     timeout=300
+ )
+ # DeviceManager → DeviceRegistry (check status: BUSY)
+ # → TaskQueueManager (add to queue)
+ # [returns immediately with "queued" status]
+
+ result1 = await task1
+
+# When task_1 completes:
+# MessageProcessor → DeviceManager (TASK_END received)
+# DeviceManager → DeviceRegistry (update status to IDLE)
+# → TaskQueueManager (get_next_task)
+# → TaskQueueManager (returns task_2)
+# → DeviceManager (assign_task_to_device recursively for task_2)
+
+# 6. Simulate network failure
+# HeartbeatManager → [send heartbeat]
+# → [timeout waiting for response]
+# → DeviceManager (_handle_device_disconnection)
+
+# DeviceManager → HeartbeatManager (stop)
+# → MessageProcessor (stop)
+# → WebSocketConnectionManager (disconnect)
+# → TaskQueueManager (tasks remain queued)
+# → DeviceRegistry (update status to FAILED)
+# → [schedule reconnection attempt]
+# → [wait reconnect_delay seconds]
+# → connect_device (reconnection attempt with is_reconnection=True)
+
+# 7. Reconnection succeeds
+# After reconnection:
+# DeviceManager → DeviceRegistry (reset attempts, update status to IDLE)
+# → TaskQueueManager (get_next_task)
+# [if tasks queued, automatically start execution]
+
+# 8. Disconnect device
+await manager.disconnect_device("office_pc")
+# DeviceManager → HeartbeatManager (stop)
+# → MessageProcessor (stop)
+# → WebSocketConnectionManager (disconnect)
+# → TaskQueueManager (clear queue)
+# → DeviceRegistry (update status to DISCONNECTED)
+```
+
+This complete example demonstrates how DeviceManager coordinates all five components throughout the device lifecycle.
+
+---
+
+## Internal Architecture Details
+
+### Component Responsibilities
+
+**DeviceRegistry:**
+
+- Stores AgentProfile objects (one per device)
+ - Manages device status transitions among DISCONNECTED, CONNECTED, IDLE, BUSY, and FAILED
+- Tracks connection attempts for reconnection logic
+- Provides thread-safe access to device state
+
+DeviceRegistry is the single source of truth. All other components query DeviceRegistry for device information rather than maintaining their own state copies.
+
+**WebSocketConnectionManager:**
+
+- Establishes WebSocket connections using `websockets` library
+- Maintains WebSocket object per device
+- Sends messages over WebSocket
+- Handles WebSocket-level errors (connection refused, SSL errors, etc.)
+- Closes connections gracefully
+
+ WebSocketConnectionManager does not track device status. It is the transport layer: it opens and closes connections and, as described under Connect Device, runs the AIP registration handshake over them.
+
+**HeartbeatManager:**
+
+- Runs background loop per device (every `heartbeat_interval` seconds)
+- Sends HEARTBEAT message via MessageProcessor
+- Waits for HEARTBEAT response
+- Calls DeviceManager's disconnection handler on timeout
+- Cancellable via `stop_heartbeat()`
+
+HeartbeatManager detects connection failures that WebSocket layer might miss (e.g., server hangs without closing connection).
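
A per-device heartbeat loop of this kind could look roughly like the following. This is a sketch under assumptions: `heartbeat_loop`, `ping`, and `on_timeout` are hypothetical names, and the separate `timeout` argument is an assumed detail since the document only specifies `heartbeat_interval`.

```python
import asyncio
from typing import Awaitable, Callable


async def heartbeat_loop(
    ping: Callable[[], Awaitable[None]],
    on_timeout: Callable[[str], Awaitable[None]],
    interval: float,
    timeout: float,
) -> None:
    """Ping every `interval` seconds; report a timeout once and stop."""
    while True:
        await asyncio.sleep(interval)
        try:
            await asyncio.wait_for(ping(), timeout)
        except asyncio.TimeoutError:
            await on_timeout("heartbeat_timeout")
            return


async def demo():
    pings = 0
    reasons = []

    async def ping() -> None:
        nonlocal pings
        pings += 1
        if pings >= 3:
            await asyncio.sleep(1)       # third ping hangs: server stopped answering

    async def on_timeout(reason: str) -> None:
        reasons.append(reason)           # stands in for the disconnection handler

    await heartbeat_loop(ping, on_timeout, interval=0.01, timeout=0.05)
    return pings, reasons


print(asyncio.run(demo()))   # (3, ['heartbeat_timeout'])
```

Because the loop is an ordinary coroutine, wrapping it in `asyncio.create_task()` and calling `.cancel()` on the task gives the cancellable behavior listed above.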
+
+**MessageProcessor:**
+
+- Routes incoming messages by type (REGISTER_CONFIRMATION, DEVICE_INFO, TASK_END, HEARTBEAT)
+- Implements request-response pattern for synchronous messaging
+- Runs background message listener loop per device
+- Queues responses for `wait_for_response()` calls
+- Handles protocol-level errors
+
+MessageProcessor implements the AIP protocol message routing. It's the component that "speaks AIP".
+
+**TaskQueueManager:**
+
+- Maintains FIFO queue per device
+- Adds tasks when device is BUSY
+- Returns next task when device becomes IDLE
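+ - Clears queue on explicit disconnection (queued tasks are kept across unexpected disconnections so they can run after reconnection)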
+- Thread-safe for concurrent access
+
+TaskQueueManager ensures tasks execute in order and prevents task loss when devices are busy.
+
+### Component Communication Pattern
+
+ DeviceManager is the coordinator. Calling code should go through DeviceManager rather than invoking the individual components directly:
+
+**Wrong (direct component communication):**
+```python
+# DON'T do this
+websocket_manager.connect(device_id)
+message_processor.send_message(device_id, "REGISTER")
+device_registry.update_status(device_id, DeviceStatus.IDLE)
+```
+
+**Correct (through DeviceManager):**
+```python
+# DO this
+await device_manager.connect_device(device_id)
+# DeviceManager internally coordinates:
+# websocket_manager.connect()
+# message_processor.send_message()
+# device_registry.update_status()
+```
+
+This pattern enforces proper coordination and ensures all necessary steps happen in the correct order.
+
+---
+
+## Advanced Usage Patterns
+
+### Custom Reconnection Logic
+
+Override disconnection handler for custom reconnection behavior:
+
+```python
+ class CustomDeviceManager(DeviceManager):
+     async def _handle_device_disconnection(
+         self, device_id: str, reason: str = "unknown"
+     ) -> None:
+         # Custom logic: Only reconnect for specific reasons
+         if reason == "heartbeat_timeout":
+             # Network glitch, reconnect immediately
+             await self.connect_device(device_id, is_reconnection=True)
+         elif reason == "protocol_error":
+             # Protocol mismatch, don't reconnect
+             logger.error(f"Protocol error on {device_id}, not reconnecting")
+             self.device_registry.update_status(device_id, DeviceStatus.FAILED)
+         else:
+             # Use default exponential backoff
+             await super()._handle_device_disconnection(device_id, reason)
+
+### Priority Task Queue
+
+Extend TaskQueueManager for priority queuing:
+
+```python
+ class PriorityTaskQueueManager(TaskQueueManager):
+     def add_task(self, device_id: str, task_id: str, priority: int, **kwargs):
+         """Add task with priority (lower number = higher priority)."""
+         if device_id not in self._queues:
+             self._queues[device_id] = []
+
+         # Insert in priority order
+         task = {"task_id": task_id, "priority": priority, **kwargs}
+         queue = self._queues[device_id]
+
+         # Find insertion point: before the first task with a larger priority number
+         insert_idx = 0
+         for i, queued_task in enumerate(queue):
+             if queued_task["priority"] > priority:
+                 insert_idx = i
+                 break
+         else:
+             # No lower-priority task found: append at the end
+             insert_idx = len(queue)
+
+         queue.insert(insert_idx, task)
+
+     def get_next_task(self, device_id: str):
+         """Get highest priority task."""
+         if device_id in self._queues and self._queues[device_id]:
+             return self._queues[device_id].pop(0)  # First is highest priority
+         return None
+
+ # Use custom queue manager
+ manager = DeviceManager(task_name="production")
+ manager.task_queue_manager = PriorityTaskQueueManager()
+```
+
+### Connection Pool Management
+
+ Limit how many connection attempts run at the same time:
+
+```python
+ class PooledDeviceManager(DeviceManager):
+     def __init__(self, *args, max_concurrent_connections: int = 10, **kwargs):
+         super().__init__(*args, **kwargs)
+         self.max_concurrent = max_concurrent_connections
+         self.connection_semaphore = asyncio.Semaphore(max_concurrent_connections)
+
+     async def connect_device(self, device_id: str, is_reconnection: bool = False) -> bool:
+         async with self.connection_semaphore:
+             # At most max_concurrent connection attempts run at the same time;
+             # the slot is released as soon as connect_device() returns
+             return await super().connect_device(device_id, is_reconnection)
+
+ # Allow at most 5 connection attempts in flight
+ manager = PooledDeviceManager(
+     task_name="production",
+     max_concurrent_connections=5
+ )
+```
+
+---
+
+## Summary
+
+DeviceManager is the orchestration layer that coordinates five specialized components to manage device connections. It doesn't perform low-level operations itself; instead, it delegates to components and ensures they work together correctly.
+
+**Key Concepts:**
+
+- **Orchestrator Pattern**: DeviceManager coordinates components but doesn't duplicate their functionality
+- **Modular Architecture**: Five components with single responsibilities (DeviceRegistry, WebSocketConnectionManager, HeartbeatManager, MessageProcessor, TaskQueueManager)
+- **Lifecycle Management**: Register → Connect → Execute → Disconnect → Reconnect
+- **Automatic Reconnection**: Exponential backoff with configurable retries per device
+- **Task Queuing**: Automatic queuing when devices are busy
+
+**When to Use DeviceManager Directly:**
+
+Most applications should use ConstellationClient, which wraps DeviceManager. Use DeviceManager directly only for:
+
+- Custom reconnection strategies
+- Custom task queuing logic
+- Fine-grained control over component behavior
+- Advanced monitoring and debugging
+
+**Next Steps:**
+
+- See [Components](./components.md) for detailed component documentation
+- See [ConstellationClient](./constellation_client.md) for high-level API
+- See [AIP Integration](./aip_integration.md) for protocol details and message flows
+- See [Overview](./overview.md) for overall Galaxy Client architecture
+- See [Agent Registration](../agent_registration/overview.md) for device registration details
diff --git a/documents/docs/galaxy/client/galaxy_client.md b/documents/docs/galaxy/client/galaxy_client.md
new file mode 100644
index 000000000..58672ee6a
--- /dev/null
+++ b/documents/docs/galaxy/client/galaxy_client.md
@@ -0,0 +1,739 @@
+# GalaxyClient Reference
+
+GalaxyClient is an optional session management wrapper on top of ConstellationClient. It provides a convenient high-level API for initializing the system, processing user requests through GalaxySession, and running interactive sessions. Most applications use GalaxyClient as the main entry point.
+
+## Related Documentation
+
+- [Overview](./overview.md) - Overall architecture and workflow
+- [ConstellationClient](./constellation_client.md) - Device coordination layer
+
+## What GalaxyClient Does
+
+GalaxyClient is the "easy mode" API for Galaxy. While you can use ConstellationClient directly for device management, GalaxyClient adds session management, request processing, and interactive mode on top.
+
+**Think of it this way:**
+
+- **ConstellationClient**: "I need to register devices and assign tasks"
+- **GalaxyClient**: "I have a user request, please execute it across my devices"
+
+GalaxyClient handles the entire request lifecycle: parsing the request, creating a GalaxySession, coordinating with ConstellationAgent for task planning, executing the DAG across devices, and returning results to the user.
+
+**Core Responsibilities:**
+
+**Session Management**: GalaxyClient creates and manages GalaxySession objects. Each session represents one user request and contains the conversation history, task planning, and execution state. Sessions are isolated—failures in one session don't affect others.
+
+**Request Processing**: When you call `process_request()`, GalaxyClient:
+1. Creates a GalaxySession with the request
+2. Passes the session to ConstellationAgent for DAG planning
+3. Uses TaskConstellationOrchestrator to execute the DAG across devices
+4. Collects results and returns them to you
+
+**Interactive Mode**: GalaxyClient provides an interactive CLI loop where users can type requests, see execution progress, and view results. This is useful for demos, debugging, and manual testing.
+
+**Configuration Integration**: GalaxyClient loads configurations from YAML files, validates settings, and passes them to ConstellationClient. This centralizes configuration management.
+
+**What GalaxyClient Does NOT Do:**
+
+- **Device Connection Management**: Handled by ConstellationClient → DeviceManager
+- **Task Planning**: Handled by ConstellationAgent
+- **DAG Execution**: Handled by TaskConstellationOrchestrator
+- **Multi-round Interaction Logic**: Handled by GalaxySession
+
+GalaxyClient is the orchestrator at the highest level, delegating to specialized components for each concern.
+
+## When to Use GalaxyClient
+
+**Use GalaxyClient when:**
+
+- You want a simple API for processing user requests
+- You need session management for multi-round interactions
+- You want interactive mode for demos or debugging
+- You're building a conversational agent or task automation system
+
+**Use ConstellationClient directly when:**
+
+- You only need device management without session/request processing
+- You're building a custom orchestrator
+- You need fine-grained control over task assignment
+- Sessions are managed by your own higher-level system
+
+**Example Use Cases:**
+
+**GalaxyClient**: Chatbot that processes natural language requests ("Open PowerPoint and create a presentation about AI")
+
+**ConstellationClient**: Monitoring system that assigns health check tasks to devices every 5 minutes
+
+## Initialization
+
+### Constructor
+
+```python
+def __init__(
+ self,
+ session_name: Optional[str] = None,
+ task_name: Optional[str] = None,
+ max_rounds: int = 10,
+ log_level: str = "INFO",
+ output_dir: Optional[str] = None,
+):
+ """
+ Initialize GalaxyClient.
+
+ Args:
+ session_name: Name for the Galaxy session (auto-generated if None)
+ task_name: Name for the task (auto-generated if None)
+ max_rounds: Maximum number of rounds per session (default: 10)
+ log_level: Logging level (default: "INFO")
+ output_dir: Output directory for logs and results
+ """
+```
+
+ **Automatic Configuration Loading:**
+
+ When it is constructed, GalaxyClient automatically loads device configuration from the centralized Galaxy config system:
+
+```python
+# Configuration is loaded automatically
+client = GalaxyClient(
+ session_name="production_session",
+ task_name="email_automation",
+ max_rounds=10
+)
+```
+
+Internally, GalaxyClient:
+
+1. Loads Galaxy configuration using `get_galaxy_config()`
+2. Extracts device info path from `galaxy_config.constellation.DEVICE_INFO`
+3. Loads ConstellationConfig from the YAML file
+4. Creates internal ConstellationClient with this configuration
+
+**Session and Task Names:**
+
+```python
+# Use custom names
+client = GalaxyClient(
+ session_name="production_session",
+ task_name="email_task"
+)
+
+# Auto-generate names with timestamps
+client = GalaxyClient()
+# session_name: "galaxy_session_20251106_103045"
+# task_name: "request_20251106_103045"
+```
+
+Session name identifies the overall session, while task name identifies individual tasks within the session.
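
The auto-generated names shown above follow a `YYYYMMDD_HHMMSS` timestamp pattern. A helper along these lines would produce them. This is a guess at the format based only on the two example names; `default_names` is a hypothetical function, not part of GalaxyClient.

```python
from datetime import datetime
from typing import Optional, Tuple


def default_names(now: Optional[datetime] = None) -> Tuple[str, str]:
    """Return (session_name, task_name) stamped with the given or current time."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"galaxy_session_{stamp}", f"request_{stamp}"


print(default_names(datetime(2025, 11, 6, 10, 30, 45)))
# ('galaxy_session_20251106_103045', 'request_20251106_103045')
```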
+
+**Max Rounds:**
+
+```python
+# Limit conversation rounds
+client = GalaxyClient(max_rounds=5)
+```
+
+Max rounds controls how many back-and-forth exchanges the agent can have during task execution. Higher values allow more complex tasks but take longer.
+
+**Output Directory:**
+
+```python
+# Custom output directory
+client = GalaxyClient(output_dir="./custom_logs")
+```
+
+If not specified, uses the default session log path from configuration.
+
+**Internal ConstellationClient Creation:**
+
+After loading configuration, GalaxyClient creates an internal ConstellationClient:
+
+```python
+self._constellation_client = ConstellationClient(
+ config=self.config,
+ task_name=self.task_name
+)
+```
+
+All device management operations delegate to this internal client.
+
+### Async Initialize Method
+
+```python
+async def initialize(self) -> None:
+ """
+ Initialize the Galaxy Client and connect to devices.
+
+ This calls ConstellationClient.initialize() to register and
+ optionally connect to all configured devices.
+ """
+```
+
+After creating a GalaxyClient, you must call `initialize()`:
+
+```python
+client = GalaxyClient(session_name="my_session")
+await client.initialize()
+
+# Now ready to process requests
+result = await client.process_request("Open Excel and create a chart")
+```
+
+Initialization creates and initializes the internal ConstellationClient, which:
+
+1. Registers all devices from configuration
+2. Connects to devices with `auto_connect: true`
+3. Starts heartbeat monitoring
+4. Starts message handlers
+
+**Initialization Failures:**
+
+If some devices fail to connect during initialization, `initialize()` logs warnings but continues. You can check connection status after initialization:
+
+```python
+await client.initialize()
+
+# Check which devices connected (uses the public wrapper rather than the private attribute)
+connected = client.get_connected_devices()
+if not connected:
+    raise RuntimeError("No devices connected")
+```
+
+## Request Processing
+
+### Process Request
+
+```python
+async def process_request(
+ self,
+ request: str,
+ context: Optional[Dict[str, Any]] = None,
+) -> Dict[str, Any]:
+ """
+ Process a user request end-to-end.
+
+ Args:
+ request: Natural language user request
+ context: Additional context (previous results, user preferences, etc.)
+
+ Returns:
+ Dictionary containing execution results, session info, and metadata
+ """
+```
+
+This is the primary method you'll use. It handles the entire request lifecycle:
+
+**Step 1: Create Session**
+
+```python
+session = GalaxySession(
+ task=task_name,
+ should_evaluate=False,
+ id=session_id,
+ client=self._constellation_client,
+ initial_request=request
+)
+```
+
+GalaxySession encapsulates one request execution, including conversation history, task planning, and execution state.
+
+**Step 2: Execute Session**
+
+```python
+result = await session.run()
+```
+
+Session execution involves:
+
+1. **ConstellationAgent Planning**: Agent analyzes the request, determines required capabilities, and creates a DAG (Directed Acyclic Graph) of tasks
+2. **Device Selection**: For each task, select a device with matching capabilities
+3. **DAG Execution**: TaskConstellationOrchestrator executes tasks respecting dependencies
+4. **Result Collection**: Gather results from all tasks
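+
+The planning and device-selection steps above can be illustrated with a small, self-contained sketch. It uses the standard-library `graphlib` module instead of the real TaskConstellationOrchestrator, and the task names, capability names, device table, and both helper functions are assumptions made for illustration only:
+
+```python
+from graphlib import TopologicalSorter
+from typing import Dict, List, Optional, Set, Tuple
+
+# Hypothetical plan: capability required by each task, and each task's prerequisites.
+REQUIRED: Dict[str, str] = {
+    "download": "email",
+    "analyze": "office",
+    "report": "pdf_generation",
+}
+DEPENDS_ON: Dict[str, Set[str]] = {
+    "download": set(),
+    "analyze": {"download"},
+    "report": {"analyze"},
+}
+# Hypothetical device table: device ID -> advertised capabilities.
+DEVICES: Dict[str, List[str]] = {
+    "windows_pc": ["office", "web", "email"],
+    "linux_server": ["pdf_generation"],
+}
+
+
+def select_device(capability: str) -> Optional[str]:
+    """Return the first device that advertises the capability, or None."""
+    for device_id, capabilities in DEVICES.items():
+        if capability in capabilities:
+            return device_id
+    return None
+
+
+def plan_assignments() -> List[Tuple[str, Optional[str]]]:
+    """Walk the DAG in dependency order and pair each task with a device."""
+    order = TopologicalSorter(DEPENDS_ON).static_order()
+    return [(task, select_device(REQUIRED[task])) for task in order]
+
+
+for task, device in plan_assignments():
+    print(f"{task} -> {device}")
+```
+
+A real orchestrator would also run independent tasks concurrently and handle the case where no device matches; this sketch only shows ordering and matching.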
+
+**Step 3: Return Results**
+
+```python
+return {
+ "success": result.success,
+ "output": result.output,
+ "session_id": session.session_id,
+ "task_count": len(session.dag.tasks),
+ "execution_time": result.execution_time,
+ "errors": result.errors
+}
+```
+
+**Complete Request Processing Flow:**
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant GC as GalaxyClient
+ participant Session as GalaxySession
+ participant Agent as ConstellationAgent
+ participant Orch as TaskConstellationOrchestrator
+ participant CC as ConstellationClient
+ participant Devices
+
+ User->>GC: process_request("Create PowerPoint about AI")
+
+ GC->>Session: Create GalaxySession
+ GC->>Session: run()
+
+ Session->>Agent: Analyze request
+ Agent->>Agent: Create DAG
+ Agent-->>Session: DAG (tasks + dependencies)
+
+ Session->>Orch: execute_dag()
+
+ loop For each task in topological order
+ Orch->>Orch: Select device by capabilities
+ Orch->>CC: assign_task_to_device()
+ CC->>Devices: Send TASK (AIP)
+ Devices-->>CC: TASK_END (results)
+ CC-->>Orch: Task result
+ end
+
+ Orch-->>Session: All task results
+
+ Session-->>GC: Execution result
+ GC-->>User: {"success": true, "output": "..."}
+```
+
+**Example Usage:**
+
+```python
+# Simple request
+result = await client.process_request(
+    request="Open Excel and create a chart showing quarterly sales"
+)
+
+if result["success"]:
+    print(f"Completed {result['task_count']} tasks in {result['execution_time']:.2f}s")
+    print(f"Output: {result['output']}")
+else:
+    print(f"Errors: {result['errors']}")
+
+# Request with context
+result = await client.process_request(
+    request="Update the chart with new data",
+    context={
+        "previous_file": "Q1_sales.xlsx",
+        "user_preferences": {"chart_type": "bar"}
+    }
+)
+```
+
+Context is useful for multi-round conversations where later requests reference earlier results.
+
+## Interactive Mode
+
+### Interactive Mode
+
+```python
+async def interactive_mode(self) -> None:
+ """
+ Start an interactive CLI loop for processing user requests.
+
+ Users can type requests, see execution progress, and view results.
+ Type 'quit' or 'exit' to stop.
+ """
+```
+
+Interactive mode provides a REPL (Read-Eval-Print Loop) for manual testing:
+
+```python
+client = GalaxyClient(config_path="config/devices.yaml")
+await client.initialize()
+
+# Start interactive loop
+await client.interactive_mode()
+```
+
+**Interactive Session Example:**
+
+```
+=== Galaxy Client Interactive Mode ===
+Connected to 3 devices: windows_pc, linux_server, mac_laptop
+Type 'quit' or 'exit' to stop.
+
+> Open PowerPoint and create a presentation about AI
+
+[ConstellationAgent] Analyzing request...
+[ConstellationAgent] Created DAG with 3 tasks:
+ - Task 1: Open PowerPoint
+ - Task 2: Create new presentation
+ - Task 3: Add slides about AI
+
+[TaskOrchestrator] Executing task 1 on windows_pc...
+[TaskOrchestrator] Task 1 completed successfully
+
+[TaskOrchestrator] Executing task 2 on windows_pc...
+[TaskOrchestrator] Task 2 completed successfully
+
+[TaskOrchestrator] Executing task 3 on windows_pc...
+[TaskOrchestrator] Task 3 completed successfully
+
+✓ Request completed successfully (3 tasks, 15.3s)
+Output: Created presentation "AI_Overview.pptx" with 5 slides
+
+> Send the presentation via email to john@example.com
+
+[ConstellationAgent] Analyzing request...
+[ConstellationAgent] Using context from previous task
+
+[TaskOrchestrator] Executing task 1 on windows_pc...
+[TaskOrchestrator] Task 1 completed successfully
+
+✓ Request completed successfully (1 task, 3.2s)
+Output: Email sent to john@example.com with attachment AI_Overview.pptx
+
+> quit
+
+Shutting down Galaxy Client...
+Disconnected from all devices.
+Goodbye!
+```
+
+**Interactive Mode Features:**
+
+**Persistent Session Context**: Interactive mode maintains context across requests, so later requests can reference earlier results ("Send the presentation" knows which presentation).
+
+**Real-time Progress**: Shows task execution progress as it happens, useful for understanding what's happening during long-running requests.
+
+**Error Display**: Shows detailed error messages if tasks fail, helpful for debugging.
+
+**Device Status**: Shows which devices are connected at startup.
+
+## Lifecycle Management
+
+### Shutdown
+
+```python
+async def shutdown(self) -> None:
+ """
+ Gracefully shutdown the Galaxy Client.
+
+ Stops all sessions, disconnects all devices, and cleans up resources.
+ """
+```
+
+Always call `shutdown()` to clean up resources:
+
+```python
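+# Create the client outside the try block so `client` is always defined in `finally`
+client = GalaxyClient(config_path="config.yaml")
+
+try:
+    await client.initialize()
+
+    # Use the client
+    await client.process_request("...")
+
+finally:
+    # Always shut down, even if initialization or a request fails
+    await client.shutdown()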
+```
+
+Shutdown delegates to ConstellationClient, which:
+
+1. Stops all task queues
+2. Stops message handlers
+3. Stops heartbeat monitoring
+4. Closes WebSocket connections
+5. Cancels background tasks
+
+Without proper shutdown, background tasks continue running, connections stay open, and resources leak.
+
+**Context Manager Pattern** (recommended):
+
+```python
+async with GalaxyClient(config_path="config.yaml") as client:
+    await client.initialize()
+    result = await client.process_request("Open Excel")
+
+# Automatically calls shutdown() on exit
+```
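+
+If you want the same guarantee around any object that exposes `initialize()` and `shutdown()`, a thin wrapper built on `contextlib.asynccontextmanager` is one option. This is a sketch under stated assumptions: `StubClient` is a stand-in for GalaxyClient so the example runs without devices, and `managed` is a hypothetical helper, not part of the Galaxy API:
+
+```python
+import asyncio
+from contextlib import asynccontextmanager
+
+
+class StubClient:
+    """Stand-in for GalaxyClient that only records lifecycle calls."""
+
+    def __init__(self) -> None:
+        self.calls = []
+
+    async def initialize(self) -> None:
+        self.calls.append("initialize")
+
+    async def shutdown(self) -> None:
+        self.calls.append("shutdown")
+
+
+@asynccontextmanager
+async def managed(client):
+    """Initialize on entry; shut down on exit, even if the body raises."""
+    await client.initialize()
+    try:
+        yield client
+    finally:
+        await client.shutdown()
+
+
+async def demo():
+    client = StubClient()
+    try:
+        async with managed(client):
+            raise RuntimeError("simulated request failure")
+    except RuntimeError:
+        pass
+    return client.calls
+
+
+print(asyncio.run(demo()))  # ['initialize', 'shutdown']
+```
+
+Note that in this sketch `shutdown()` is skipped if `initialize()` itself raises; whether that is the right choice depends on how much state a failed initialization leaves behind.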
+
+## Configuration Management
+
+### Get Device Status
+
+```python
+def get_device_status(self, device_id: Optional[str] = None) -> Dict[str, Any]:
+ """Get device status from underlying ConstellationClient."""
+ return self._constellation_client.get_device_status(device_id)
+```
+
+GalaxyClient exposes device status from ConstellationClient:
+
+```python
+# Get all device statuses
+all_status = client.get_device_status()
+
+# Get specific device status
+pc_status = client.get_device_status("windows_pc")
+print(f"Status: {pc_status['status']}")
+print(f"Current task: {pc_status['current_task_id']}")
+print(f"Queued tasks: {pc_status['queued_tasks']}")
+```
+
+### Get Connected Devices
+
+```python
+def get_connected_devices(self) -> List[str]:
+ """Get list of connected device IDs."""
+ return self._constellation_client.get_connected_devices()
+```
+
+Check which devices are available:
+
+```python
+connected = client.get_connected_devices()
+
+if "windows_pc" not in connected:
+ print("Warning: Windows PC not connected")
+```
+
+### Add Device
+
+```python
+async def add_device(
+ self,
+ device_id: str,
+ server_url: str,
+ capabilities: Optional[List[str]] = None,
+ metadata: Optional[Dict[str, Any]] = None,
+) -> bool:
+ """Add and connect a new device at runtime."""
+```
+
+Dynamically add devices:
+
+```python
+# Add new device discovered at runtime
+success = await client.add_device(
+ device_id="new_workstation",
+ server_url="ws://192.168.1.200:5000/ws",
+ capabilities=["office", "web", "design"],
+ metadata={"location": "design_team", "gpu": "RTX 4090"}
+)
+
+if success:
+ print("New device ready for tasks")
+```
+
+This delegates to ConstellationClient, which registers and connects the device.
+
+## Usage Patterns
+
+### Basic Request Processing
+
+```python
+import asyncio
+
+async def main():
+    # Initialize client
+    client = GalaxyClient(session_name="automation_session")
+    await client.initialize()
+
+    try:
+        # Process single request
+        result = await client.process_request(
+            request="Open Word and create a document about machine learning"
+        )
+
+        if result["success"]:
+            print(f"Completed in {result['execution_time']:.1f}s")
+        else:
+            print(f"Failed: {result['errors']}")
+
+    finally:
+        await client.shutdown()
+
+asyncio.run(main())
+```
+
+### Multi-Round Conversation
+
+```python
+async def multi_round_conversation():
+    client = GalaxyClient(session_name="conversation", max_rounds=15)
+    await client.initialize()
+
+    try:
+        # First request
+        result1 = await client.process_request(
+            request="Create a sales report spreadsheet"
+        )
+
+        # Second request references first
+        result2 = await client.process_request(
+            request="Add a pie chart showing regional distribution"
+        )
+
+        # Third request references both
+        result3 = await client.process_request(
+            request="Email the report to the team"
+        )
+
+    finally:
+        await client.shutdown()
+```
+
+### Error Handling
+
+```python
+async def robust_processing():
+    client = GalaxyClient(session_name="robust")
+
+    try:
+        await client.initialize()
+    except Exception as e:
+        print(f"Initialization failed: {e}")
+        return
+
+    try:
+        result = await client.process_request("Open Excel")
+
+        if not result["success"]:
+            # Handle execution errors
+            for error in result["errors"]:
+                print(f"Task {error['task_id']} failed: {error['message']}")
+
+                # Retry the whole request once if an error looks connection-related
+                if "connection" in error["message"].lower():
+                    print("Retrying due to connection error...")
+                    result = await client.process_request("Open Excel")
+                    break  # stop iterating over the stale error list
+
+    except Exception as e:
+        # Handle unexpected errors
+        print(f"Unexpected error: {e}")
+
+    finally:
+        await client.shutdown()
+```
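+
+The example above retries at most once. When connection errors are transient, a bounded retry with exponential backoff is a common alternative. The sketch below assumes the result shape used in this document (`success`, and `errors` as a list of dicts with a `message` key); the helper name `process_with_retry` and its parameters are hypothetical:
+
+```python
+import asyncio
+from typing import Any, Awaitable, Callable, Dict
+
+
+async def process_with_retry(
+    process: Callable[[str], Awaitable[Dict[str, Any]]],
+    request: str,
+    max_attempts: int = 3,
+    base_delay: float = 0.5,
+) -> Dict[str, Any]:
+    """Call `process` until it succeeds, a non-connection error occurs,
+    or `max_attempts` is reached. Returns the last result either way."""
+    result: Dict[str, Any] = {"success": False, "errors": []}
+    for attempt in range(1, max_attempts + 1):
+        result = await process(request)
+        if result["success"]:
+            return result
+        retryable = any(
+            "connection" in error["message"].lower() for error in result["errors"]
+        )
+        if not retryable or attempt == max_attempts:
+            return result
+        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
+        await asyncio.sleep(base_delay * 2 ** (attempt - 1))
+    return result
+```
+
+With a real client, this could be called as `await process_with_retry(client.process_request, "Open Excel")`.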
+
+### Dynamic Device Management
+
+```python
+import asyncio
+
+async def adaptive_constellation():
+    client = GalaxyClient(session_name="adaptive")
+    await client.initialize()
+
+    try:
+        # Monitor device health
+        while True:
+            connected = client.get_connected_devices()
+
+            if len(connected) < 2:
+                # Not enough devices, add more
+                print("Adding fallback device...")
+                await client.add_device(
+                    device_id="fallback_device",
+                    server_url="ws://backup.example.com:5000/ws",
+                    capabilities=["office", "web"]
+                )
+
+            # Process request
+            result = await client.process_request("Create report")
+
+            # Sleep before next iteration
+            await asyncio.sleep(60)
+
+    finally:
+        await client.shutdown()
+```
+
+## Integration with Other Components
+
+### GalaxyClient vs ConstellationClient
+
+```python
+# GalaxyClient: High-level request processing
+galaxy_client = GalaxyClient(session_name="production")
+await galaxy_client.initialize()
+
+result = await galaxy_client.process_request("Open PowerPoint")
+# Internally:
+# 1. Creates GalaxySession
+# 2. ConstellationAgent plans DAG
+# 3. TaskConstellationOrchestrator executes DAG
+# 4. ConstellationClient assigns tasks to devices
+
+# ConstellationClient: Device management only
+constellation_client = ConstellationClient(config)
+await constellation_client.initialize()
+
+await constellation_client.connect_device("windows_pc")
+# No automatic task planning, you control everything
+```
+
+### Using GalaxyClient in Web Applications
+
+```python
+from fastapi import FastAPI, HTTPException
+
+app = FastAPI()
+
+# Global GalaxyClient instance
+galaxy_client = None
+
+@app.on_event("startup")
+async def startup():
+    global galaxy_client
+    galaxy_client = GalaxyClient(session_name="api_server")
+    await galaxy_client.initialize()
+
+@app.on_event("shutdown")
+async def shutdown():
+    global galaxy_client
+    if galaxy_client:
+        await galaxy_client.shutdown()
+
+@app.post("/execute")
+async def execute_request(request: str):
+    """Execute user request via Galaxy."""
+    if not galaxy_client:
+        raise HTTPException(status_code=500, detail="Galaxy not initialized")
+
+    result = await galaxy_client.process_request(request)
+
+    if result["success"]:
+        return {"status": "completed", "output": result["output"]}
+    else:
+        raise HTTPException(
+            status_code=500,
+            detail={"status": "failed", "errors": result["errors"]}
+        )
+
+@app.get("/devices")
+async def list_devices():
+    """Get connected device status."""
+    if not galaxy_client:
+        raise HTTPException(status_code=500, detail="Galaxy not initialized")
+
+    return {
+        "connected": galaxy_client.get_connected_devices(),
+        "status": galaxy_client.get_device_status()
+    }
+```
+
+## Summary
+
+GalaxyClient is the high-level entry point for Galaxy Client, providing:
+
+- **Simple API**: Single method (`process_request`) for end-to-end execution
+- **Session Management**: Creates and manages GalaxySession objects
+- **Interactive Mode**: CLI loop for demos and debugging
+- **Configuration Management**: Loads and validates configurations
+- **Delegation**: Wraps ConstellationClient for device management
+
+**When to Use:**
+
+- **GalaxyClient**: Processing natural language requests, multi-round conversations, interactive demos
+- **ConstellationClient**: Direct device management, custom orchestration, fine-grained control
+
+For most applications, GalaxyClient provides the right level of abstraction. Use ConstellationClient directly only when you need custom orchestration or don't need session management.
+
+**Next Steps:**
+
+- See [ConstellationClient](./constellation_client.md) for device management details
+- See [Overview](./overview.md) for overall architecture
diff --git a/documents/docs/galaxy/client/overview.md b/documents/docs/galaxy/client/overview.md
new file mode 100644
index 000000000..b763ab835
--- /dev/null
+++ b/documents/docs/galaxy/client/overview.md
@@ -0,0 +1,437 @@
+# Galaxy Client Overview
+
+Galaxy Client is the client-side layer responsible for multi-device coordination in the UFO³ framework. At its core is **ConstellationClient**, which manages device registration, connection, and task assignment. **GalaxyClient** provides a lightweight wrapper offering convenient session management interfaces.
+
+## Related Documentation
+
+- [ConstellationClient](./constellation_client.md) - Core device coordination component
+- [DeviceManager](./device_manager.md) - Low-level connection management
+- [Components](./components.md) - Modular component architecture
+- [AIP Integration](./aip_integration.md) - Communication protocol integration
+- [GalaxyClient](./galaxy_client.md) - Session wrapper API
+- [Configuration](../../configuration/system/galaxy_constellation.md) - Device configuration guide
+
+## The Complete Path: From User Request to Device Execution
+
+To understand Galaxy Client, we first need to see the entire system workflow. When a user submits a task request, the system processes it through several layers:
+
+### 1. User Interaction Layer (Optional)
+
+Users can interact with the Galaxy system in two ways:
+
+**Interactive Mode**: Users input natural language requests through a command-line interface (CLI), which are received and processed by GalaxyClient. This mode is primarily used for rapid prototyping and manual testing.
+
+**Programmatic Mode**: Developers directly call the Python API of ConstellationClient or GalaxyClient, integrating Galaxy into their applications. This is the recommended approach for production environments.
+
+### 2. Session Management Layer (GalaxyClient)
+
+GalaxyClient's role is to manage the lifecycle of task sessions. It doesn't handle specific device operations but instead:
+
+- Initializes and holds a ConstellationClient instance
+- Creates a GalaxySession for each user request
+- Passes requests to ConstellationAgent for DAG planning (task decomposition)
+- Coordinates TaskConstellationOrchestrator to execute the DAG
+- Collects and aggregates execution results
+
+**GalaxyClient is optional**. If your application doesn't need session management, you can use ConstellationClient directly.
+
+### 3. Device Coordination Layer (ConstellationClient)
+
+ConstellationClient is the heart of Galaxy Client. It is responsible for:
+
+**Device Management**: Registering devices (each device has a unique ID, server URL, capability list, etc.), connecting to devices (via WebSocket), disconnecting devices, and monitoring device health status.
+
+**Task Assignment**: Receiving task requests from upper layers (TaskConstellationOrchestrator), selecting appropriate devices based on capabilities, sending tasks to devices via the AIP protocol, and waiting for and collecting task execution results.
+
+ConstellationClient doesn't concern itself with how tasks are decomposed (that's ConstellationAgent's responsibility) or how DAGs are executed (that's TaskConstellationOrchestrator's responsibility). It focuses on "device-level matters."
+
+### 4. Connection Management Layer (DeviceManager)
+
+DeviceManager is the core internal component of ConstellationClient, responsible for all low-level connection management:
+
+**WebSocket Connection Establishment**: Establishes WebSocket connections with Agent Server, sends AIP REGISTER messages to register device identity, and requests device system information (DEVICE_INFO_REQUEST).
+
+**Connection Monitoring**: Sends HEARTBEAT messages every 20-30 seconds to check if devices are online. If a timeout occurs with no response, it triggers disconnection handling and automatically attempts reconnection (up to max_retries times).
+
+**Message Routing**: Starts a background message processing loop, receives messages returned by devices (TASK_END, COMMAND_RESULTS, etc.), and dispatches messages to appropriate handlers.
+
+**Task Queuing**: If a device is busy executing another task, new tasks are queued and automatically dequeued when the device becomes idle.
+
+### 5. Protocol Layer (AIP)
+
+All communication with devices goes through the [Agent Interaction Protocol (AIP)](../../aip/overview.md). AIP is a WebSocket-based messaging protocol that defines standard message types and interaction flows. Main message types used by Galaxy Client include:
+
+- `REGISTER`: Register device identity with Agent Server
+- `DEVICE_INFO_REQUEST/RESPONSE`: Request and return device system information
+- `TASK`: Assign task to device
+- `TASK_END`: Device reports task completion
+- `HEARTBEAT/HEARTBEAT_ACK`: Heartbeat health check
+- `COMMAND_RESULTS`: Device reports intermediate execution results
+- `ERROR`: Error reporting
+
+For detailed AIP explanation, see [AIP Integration](./aip_integration.md).
+
+## Component Responsibilities
+
+Having understood the overall flow, let's examine the specific responsibilities of each component:
+
+### ConstellationClient: The Device Coordination Facade
+
+ConstellationClient implements the Facade pattern. It provides simple device management APIs externally while delegating actual work to DeviceManager internally.
+
+**What it does:**
+
+```python
+# Register device
+await client.register_device(
+ device_id="windows_pc",
+ server_url="ws://192.168.1.100:5000/ws",
+ os="windows",
+ capabilities=["office", "web", "email"]
+)
+
+# Connect device
+success = await client.connect_device("windows_pc")
+
+# Assign task
+result = await client.assign_task_to_device(
+ device_id="windows_pc",
+ task_request=TaskRequest(...)
+)
+
+# Query status
+status = client.get_device_status("windows_pc")
+```
+
+**What it doesn't do:**
+
+- DAG planning (handled by ConstellationAgent)
+- DAG execution (handled by TaskConstellationOrchestrator)
+- Session management (handled by GalaxySession)
+
+See [ConstellationClient documentation](./constellation_client.md) for detailed API reference.
+
+### DeviceManager: The Connection Management Engine
+
+DeviceManager is the "engine" of ConstellationClient. It uses 5 modular components to accomplish connection management:
+
+**DeviceRegistry**: Stores AgentProfiles for all registered devices (including device ID, URL, status, capabilities, metadata, etc.). This component maintains the single source of truth for device state. When a device connects, disconnects, or changes status, DeviceRegistry is updated. Other components query DeviceRegistry to make decisions.
+
+**WebSocketConnectionManager**: Manages WebSocket connection lifecycle (connect, disconnect, send messages). This component handles the low-level WebSocket operations, including establishing connections, handling connection errors, and sending AIP messages. It maintains a mapping from device_id to WebSocket objects.
+
+**HeartbeatManager**: Background heartbeat loop that periodically sends HEARTBEAT to check device health. This runs as an independent asyncio task for each connected device. If a device fails to respond within the timeout period (2 × heartbeat_interval), HeartbeatManager triggers the disconnection handler, allowing the system to detect and respond to connection failures quickly.
+
+**MessageProcessor**: Background message processing loop that receives and routes AIP messages. This component runs a continuous loop for each device, receiving messages from the WebSocket and dispatching them to appropriate handlers. For example, TASK_END messages are used to complete task futures, COMMAND_RESULTS are logged for progress tracking, and ERROR messages trigger error handling.
+
+**TaskQueueManager**: Manages task queue for each device, queuing tasks when device is busy. When a task is assigned to a busy device, it's placed in that device's queue. When the device completes its current task and becomes IDLE, TaskQueueManager automatically dequeues the next task and executes it. This ensures tasks are never lost even when devices are overloaded.
+
+This modular design ensures each component has a single responsibility, making testing and maintenance easier. See [DeviceManager documentation](./device_manager.md) for details.
+
+### GalaxyClient: Session Management Wrapper
+
+GalaxyClient provides a higher-level abstraction on top of ConstellationClient:
+
+```python
+client = GalaxyClient()
+await client.initialize() # Initialize ConstellationClient and connect devices
+
+# Process user request (internally creates GalaxySession, calls ConstellationAgent for DAG planning)
+result = await client.process_request("Open Excel and create a sales chart")
+
+await client.shutdown() # Cleanup resources
+```
+
+GalaxyClient's main value lies in:
+
+- Simplifying initialization flow (automatically loads device info from config)
+- Providing session management (creates independent GalaxySession for each request)
+- Integrating display components (Rich console output, progress bars, etc.)
+- Supporting interactive mode (command-line interface)
+
+If your application already has its own session management logic, you can skip GalaxyClient and use ConstellationClient directly. See [GalaxyClient documentation](./galaxy_client.md) for detailed API.
+
+## Typical Workflow Example
+
+Let's walk through a complete example, from user request to device execution:
+
+### Scenario: Processing a Multi-Device Task
+
+Suppose a user submits: "Download sales.xlsx from email, analyze it in Excel on Windows, then generate a report PDF on Linux".
+
+**Step 1: Initialize GalaxyClient**
+
+```python
+client = GalaxyClient()
+await client.initialize()
+```
+
+What happens inside `initialize()`:
+
+1. GalaxyClient loads device information from config file (`device_info.yaml`)
+2. Creates ConstellationClient instance and passes configuration
+3. ConstellationClient calls `device_manager.register_device()` to register each device
+4. If `auto_connect: true` is configured, automatically calls `device_manager.connect_device()`
+5. DeviceManager executes connection flow for each device (detailed below)
+
+**Step 2: Device Connection Flow (Inside DeviceManager)**
+
+For each device (e.g., "windows_pc" and "linux_server"), DeviceManager executes:
+
+```mermaid
+sequenceDiagram
+ participant DM as DeviceManager
+ participant WS as WebSocketConnectionManager
+ participant Server as Agent Server
+ participant Device as Device Agent
+
+ Note over DM,Device: 1. Establish WebSocket Connection
+ DM->>WS: connect_to_device(device_info)
+ WS->>Server: WebSocket handshake
+ Server-->>WS: Connection established
+
+ Note over DM,Device: 2. Register Device Identity (AIP REGISTER)
+ WS->>Server: REGISTER message
+ Server->>Device: Forward registration
+ Device-->>Server: Service manifest (available MCP servers)
+ Server-->>WS: REGISTER_CONFIRMATION
+
+ Note over DM,Device: 3. Request Device System Info
+ WS->>Server: DEVICE_INFO_REQUEST
+ Server->>Device: Request system info
+ Device-->>Server: System info (CPU, memory, OS, etc.)
+ Server-->>WS: DEVICE_INFO_RESPONSE
+ WS->>DM: Update AgentProfile
+
+ Note over DM,Device: 4. Start Background Services
+ DM->>DM: Start MessageProcessor (message handling loop)
+ DM->>DM: Start HeartbeatManager (heartbeat loop)
+ DM->>DM: Set device status to IDLE
+```
+
+This sequence diagram shows the connection establishment process. First, a WebSocket connection is established with the Agent Server. Then, the device registers its identity through the AIP REGISTER message, allowing the server to know which device is connecting and what capabilities it offers. Next, the client requests detailed system information from the device to populate the AgentProfile with actual hardware and software details. Finally, background services are started to maintain the connection and handle incoming messages.
+
+**Step 3: User Request Processing**
+
+```python
+result = await client.process_request("Download sales.xlsx...")
+```
+
+Inside `process_request()`:
+
+1. GalaxyClient creates a GalaxySession
+2. GalaxySession calls ConstellationAgent for task planning
+3. ConstellationAgent (LLM-powered) decomposes the task into a DAG:
+    - Task 1: Download sales.xlsx from email (requires "email" capability)
+    - Task 2: Analyze in Excel (requires "office" capability, depends on Task 1)
+    - Task 3: Generate PDF on Linux (requires "pdf_generation" capability, depends on Task 2)
+4. TaskConstellationOrchestrator executes the DAG:
+    - Based on capability matching, Task 1 is assigned to a device with the "email" capability
+    - Task 2 is assigned to "windows_pc" (has "office" capability)
+    - Task 3 is assigned to "linux_server" (has "pdf_generation" capability)
+
+The DAG structure ensures tasks execute in the correct order respecting dependencies, while allowing independent tasks to run in parallel across different devices.
+
+**Step 4: Task Assignment and Execution (ConstellationClient/DeviceManager)**
+
+For each task, TaskConstellationOrchestrator calls ConstellationClient (`client` below):
+
+```python
+result = await client.assign_task_to_device(
+ device_id="windows_pc",
+ task_request=TaskRequest(
+ task_id="task_2",
+ request="Analyze sales.xlsx in Excel",
+ ...
+ )
+)
+```
+
+Inside `assign_task_to_device()`:
+
+1. DeviceManager checks device status (via DeviceRegistry)
+2. If device is IDLE, execute task immediately
+3. If device is BUSY, task enters queue (TaskQueueManager)
+4. WebSocketConnectionManager sends TASK message to device via AIP
+5. MessageProcessor waits in background for device to return COMMAND_RESULTS and TASK_END
+6. When task completes, DeviceManager changes device status back to IDLE
+7. If there are queued tasks, automatically dequeue and execute next task
+
+The queuing mechanism ensures no tasks are lost when devices are busy, and tasks are executed in order as devices become available.
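+
+The IDLE/BUSY queuing behavior described in the steps above can be sketched with a toy per-device queue. This is not the TaskQueueManager implementation: `DeviceQueue` and its methods are hypothetical, and `asyncio.sleep(0)` stands in for the TASK to TASK_END round trip:
+
+```python
+import asyncio
+from collections import deque
+from typing import Deque, List
+
+
+class DeviceQueue:
+    """Toy per-device queue: one task runs at a time, the rest wait in FIFO order."""
+
+    def __init__(self) -> None:
+        self.busy = False
+        self.pending: Deque[str] = deque()
+        self.completed: List[str] = []
+
+    async def submit(self, task_id: str) -> None:
+        if self.busy:
+            # Device is BUSY: remember the task instead of dropping it.
+            self.pending.append(task_id)
+            return
+        await self._run(task_id)
+
+    async def _run(self, task_id: str) -> None:
+        self.busy = True
+        await asyncio.sleep(0)  # placeholder for sending TASK and awaiting TASK_END
+        self.completed.append(task_id)
+        self.busy = False  # device is IDLE again
+        if self.pending:
+            # Automatically dequeue the next waiting task.
+            await self._run(self.pending.popleft())
+
+
+async def demo() -> List[str]:
+    queue = DeviceQueue()
+    await asyncio.gather(*(queue.submit(t) for t in ("task_1", "task_2", "task_3")))
+    return queue.completed
+
+
+print(asyncio.run(demo()))  # ['task_1', 'task_2', 'task_3']
+```
+
+In a fuller design each submitter would also await a future tied to its own task so it can receive that task's result; the sketch omits this to keep the queuing logic visible.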
+
+**Step 5: Connection Monitoring (Continuous Background Process)**
+
+Throughout task execution, HeartbeatManager continuously monitors each device:
+
+- Sends HEARTBEAT message every 20-30 seconds
+- If device responds, updates `last_heartbeat` timestamp
+- If timeout with no response (2 × heartbeat_interval), triggers disconnection handling:
+    - Stops MessageProcessor and HeartbeatManager
+    - Sets device status to DISCONNECTED
+    - If device was executing a task, marks task as failed
+    - Attempts automatic reconnection (up to max_retries times)
+
+This continuous monitoring ensures the system quickly detects and responds to connection failures, maintaining reliable communication with devices.
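+
+The timeout rule above (no response within 2 × heartbeat_interval) reduces to a small piece of bookkeeping. The following sketch is illustrative only: `HeartbeatState` and its methods are hypothetical names, and timestamps are passed in explicitly so the logic can be checked without waiting:
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class HeartbeatState:
+    """Tracks the last acknowledged heartbeat for one device."""
+
+    heartbeat_interval: float  # seconds between HEARTBEAT messages
+    last_heartbeat: float      # timestamp of the most recent HEARTBEAT_ACK
+
+    @property
+    def timeout(self) -> float:
+        # The document states the timeout as 2 x heartbeat_interval.
+        return 2 * self.heartbeat_interval
+
+    def record_ack(self, now: float) -> None:
+        self.last_heartbeat = now
+
+    def is_timed_out(self, now: float) -> bool:
+        return now - self.last_heartbeat > self.timeout
+
+
+state = HeartbeatState(heartbeat_interval=30.0, last_heartbeat=0.0)
+print(state.is_timed_out(now=59.0))  # False
+print(state.is_timed_out(now=61.0))  # True
+```
+
+When `is_timed_out` returns true, the disconnection handling listed above would take over.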
+
+**Step 6: Result Collection and Return**
+
+After all tasks complete:
+
+1. TaskConstellationOrchestrator aggregates all task results
+2. GalaxySession generates session results (including execution time, rounds, DAG statistics)
+3. GalaxyClient returns results to user
+4. Results are automatically saved to log directory
+
+The complete execution trace is preserved in logs for debugging and analysis.
+
+## Relationships with Other System Components
+
+Galaxy Client is not an isolated system—it closely collaborates with other UFO³ components:
+
+### Depends on Agent Server for Message Routing
+
+Galaxy Client doesn't connect directly to devices but routes through [Agent Server](../../server/overview.md). Agent Server's role is to:
+
+**Maintain Device Registry**: Tracks which devices are online and their connection details. When a device connects, Agent Server registers it in the central registry.
+
+**Route Messages**: Forwards TASK messages from Galaxy Client to the correct device based on device_id. The server acts as a message broker, decoupling clients from devices.
+
+**Broadcast Device Status**: Notifies clients when devices come online or go offline, enabling clients to maintain accurate device availability information.
+
+**Load Balancing**: If multiple clients connect to the same device, Agent Server can distribute load and prevent conflicts.
+
+### Uses ConstellationAgent for Task Planning
+
+When GalaxyClient receives a user request, it calls [ConstellationAgent](../constellation_agent/overview.md) to decompose the request into a DAG (Directed Acyclic Graph). ConstellationAgent is LLM-powered and can:
+
+**Understand Natural Language**: Parses user requests to identify subtasks and their relationships. For example, "Download file and then analyze it" is recognized as two sequential tasks.
+
+**Identify Task Dependencies**: Determines which tasks must complete before others can start, constructing a proper dependency graph.
+
+**Suggest Device Assignments**: Based on device capabilities, recommends which device should execute each task. If a task requires "office" capability, it's assigned to devices that advertise this capability.
+
+**Dynamically Adjust DAG**: If issues arise during execution (e.g., a device fails), ConstellationAgent can replan and modify the DAG to adapt to the new situation.
+
+For more details, see [ConstellationAgent Documentation](../constellation_agent/overview.md).
+
+### Coordinates with TaskConstellationOrchestrator for DAG Execution
+
+Once ConstellationAgent creates the DAG, [TaskConstellationOrchestrator](../constellation_orchestrator/overview.md) executes it across devices. The orchestrator:
+
+- **Respects Dependencies**: Ensures tasks execute in the correct order based on the DAG structure
+- **Selects Devices**: Chooses appropriate devices based on capability matching
+- **Runs in Parallel**: Executes independent tasks concurrently across different devices
+- **Handles Failures**: Manages task failures and triggers replanning if needed
+
+For more details, see [TaskConstellationOrchestrator Documentation](../constellation_orchestrator/overview.md).
+
+### Collaborates with Device Agents for Task Execution
+
+The actual task execution happens on [Device Agents](../../client/overview.md) running on each device (such as UFO² Desktop Agent, Linux Agent, etc.). Device Agents are responsible for:
+
+**Receiving Tasks**: Accepts tasks from Agent Server and parses task requirements. Each task specifies what action to perform and what parameters to use.
+
+**Invoking MCP Servers**: Calls local MCP servers to perform specific operations (such as opening Excel, running commands, etc.). MCP servers provide the actual execution capabilities.
+
+**Reporting Progress**: Sends intermediate execution results through COMMAND_RESULTS messages, allowing clients to track progress in real time.
+
+**Handling Errors**: Catches local errors and exceptions and reports them back to the client through ERROR messages.
+
+### Unified Communication through AIP Protocol
+
+All cross-component communication follows the [AIP protocol](../../aip/overview.md). AIP provides:
+
+**Standardized Message Formats**: Uses Pydantic models to define message structure, ensuring type safety and validation at both ends of communication.
+
+**Type-Safe Message Validation**: Automatically validates message fields using Pydantic, catching errors early before they propagate through the system.
+
+**Request-Response Correlation**: Uses request_id/response_id fields to match requests with their responses, enabling proper async handling.
+
+**Error Handling Mechanism**: Defines standard ERROR message types for reporting and handling failures consistently across all components.
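+
+To make request-response correlation concrete, here is a small sketch using only the standard library. It is an assumption-laden illustration, not the AIP implementation: the `Correlator` class and its methods are hypothetical, and plain dictionaries stand in for the Pydantic message models.
+
+```python
+import asyncio
+import uuid
+from typing import Any, Dict, Tuple
+
+
+class Correlator:
+    """Hypothetical helper that pairs responses with pending requests."""
+
+    def __init__(self) -> None:
+        self._pending: Dict[str, asyncio.Future] = {}
+
+    def new_request(self) -> Tuple[str, asyncio.Future]:
+        """Create a request ID and a future that will hold the response."""
+        request_id = uuid.uuid4().hex
+        future = asyncio.get_running_loop().create_future()
+        self._pending[request_id] = future
+        return request_id, future
+
+    def resolve(self, request_id: str, payload: Any) -> bool:
+        """Deliver a response; returns False for unknown or finished requests."""
+        future = self._pending.pop(request_id, None)
+        if future is None or future.done():
+            return False
+        future.set_result(payload)
+        return True
+
+
+async def demo() -> Any:
+    correlator = Correlator()
+    request_id, future = correlator.new_request()
+    # Simulate the matching response arriving a little later.
+    asyncio.get_running_loop().call_later(
+        0.01, correlator.resolve, request_id, {"status": "ok"}
+    )
+    return await asyncio.wait_for(future, timeout=1.0)
+
+
+result = asyncio.run(demo())
+print(result)
+```
+
+Keying pending futures by ID lets many requests be in flight at once, and the timeout keeps a lost response from blocking the caller indefinitely.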
+
+## Configuration and Deployment
+
+### Device Configuration
+
+Device information is defined through configuration files. See [Galaxy Configuration](../../configuration/system/galaxy_constellation.md) for complete configuration options.
+
+A typical configuration example:
+
+```yaml
+# config/galaxy/constellation.yaml
+task_name: "production_constellation"
+heartbeat_interval: 30.0 # Heartbeat interval (seconds)
+reconnect_delay: 5.0 # Reconnection delay (seconds)
+max_concurrent_tasks: 5 # Max concurrent tasks per device
+
+devices:
+  - device_id: "windows_pc"
+    server_url: "ws://192.168.1.100:5000/ws"
+    os: "windows"
+    capabilities: ["office", "email", "web"]
+    auto_connect: true
+    max_retries: 5 # Maximum reconnection attempts
+
+  - device_id: "linux_server"
+    server_url: "ws://192.168.1.101:5000/ws"
+    os: "linux"
+    capabilities: ["database", "api", "pdf_generation"]
+    auto_connect: true
+    max_retries: 10
+```
+
+Configuration fields explained:
+
+- **task_name**: Unique identifier for this constellation, used in logs and debugging
+- **heartbeat_interval**: How often to check device health (recommended: 20-30 seconds)
+- **reconnect_delay**: Wait time between reconnection attempts (recommended: 3-5 seconds)
+- **max_concurrent_tasks**: Maximum tasks a device can execute simultaneously
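+- **device_id**: Identifier of the device, used to route tasks to it
+- **server_url**: WebSocket URL of the Agent Server for this device
+- **os**: Operating system of the device
+- **capabilities**: List of capabilities each device provides, used for task assignment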
+- **auto_connect**: Whether to automatically connect when client initializes
+- **max_retries**: Maximum reconnection attempts before giving up
+
+### Development vs Production Environment
+
+**Development Recommendations:**
+
+- Use interactive mode for quick testing: `python -m galaxy --interactive`
+- Enable DEBUG log level for detailed information
+- Single-device configuration to simplify debugging
+- Use local Agent Server (`ws://127.0.0.1:5000/ws`)
+- Lower heartbeat_interval (e.g., 10 seconds) for faster failure detection
+
+**Production Recommendations:**
+
+- Use WSS (secure WebSocket) instead of WS for encrypted communication
+- Configure reasonable heartbeat_interval (20-30 seconds) to balance responsiveness and network overhead
+- Set appropriate max_retries (5-10 attempts) based on network reliability
+- Enable automatic reconnection (`auto_connect: true`) for resilience
+- Monitor device health status via `get_device_status()` API and set up alerts
+- Configure log rotation and archiving to prevent disk space issues
+- Use connection pooling if connecting to many devices
+- Implement circuit breaker pattern for failing devices
+
+## Detailed Component Documentation
+
+- [ConstellationClient API Reference](./constellation_client.md) - Complete device coordination API
+- [DeviceManager Internals](./device_manager.md) - Detailed connection management mechanisms
+- [Components Module](./components.md) - Detailed explanation of 5 core components
+- [AIP Integration](./aip_integration.md) - How to use the communication protocol
+- [GalaxyClient Session Wrapper](./galaxy_client.md) - Session management API
+
+## Summary
+
+Galaxy Client provides the core multi-device coordination capabilities in UFO³. Through layered design, it simplifies complex distributed system management into clear APIs:
+
+- **ConstellationClient** is the core of device management, handling device registration, connection, and task assignment
+- **DeviceManager** is the underlying engine, processing WebSocket, heartbeat, message routing, and task queuing
+- **GalaxyClient** is an optional session wrapper, providing more convenient high-level APIs
+
+If you're new to Galaxy Client, we recommend reading the documentation in this order:
+
+1. This Overview (understand overall architecture and workflow)
+2. [ConstellationClient](./constellation_client.md) (learn core API)
+3. [Components](./components.md) (understand modular components)
+4. [DeviceManager](./device_manager.md) (dive deep into connection management)
+5. [AIP Integration](./aip_integration.md) (master communication protocol)
+
+If you need to get started quickly, jump directly to [GalaxyClient](./galaxy_client.md) example code.
diff --git a/documents/docs/galaxy/constellation/constellation_editor.md b/documents/docs/galaxy/constellation/constellation_editor.md
new file mode 100644
index 000000000..6fc4f5b00
--- /dev/null
+++ b/documents/docs/galaxy/constellation/constellation_editor.md
@@ -0,0 +1,842 @@
+# ConstellationEditor — Interactive DAG Editor
+
+---
+
+## 📋 Overview
+
+**ConstellationEditor** provides a high-level interface, built on the Command Pattern, for safely manipulating a TaskConstellation. It offers undo/redo, batch operations, validation, and observer callbacks for building, modifying, and managing complex workflow DAGs interactively.
+
+The editor uses the **Command Pattern** to encapsulate all operations as reversible command objects, enabling undo/redo with full command history, transactional safety with atomic operations, complete operation tracking for auditability, and easy extensibility for new command types.
+
+**Usage in Galaxy**: The ConstellationEditor is primarily used by the [Constellation Agent](../constellation_agent/overview.md) to programmatically build task workflows, but can also be used directly for manual constellation creation and debugging.
+
+---
+
+## 🏗️ Architecture
+
+### Core Components
+
+```mermaid
+graph TD
+ A[ConstellationEditor] -->|manages| B[TaskConstellation]
+ A -->|uses| C[CommandInvoker]
+ C -->|executes| D[Commands]
+ D -->|modifies| B
+ A -->|notifies| E[Observers]
+
+ style A fill:#87CEEB
+ style B fill:#90EE90
+ style C fill:#FFD700
+ style D fill:#FFB6C1
+ style E fill:#DDA0DD
+```
+
+| Component | Purpose |
+|-----------|---------|
+| **ConstellationEditor** | High-level interface for constellation editing |
+| **CommandInvoker** | Manages command execution, history, undo/redo |
+| **Commands** | Encapsulated operations (Add, Remove, Update, etc.) |
+| **Observers** | Callback functions notified on changes |
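+
+The relationship between commands and the invoker can be illustrated with a stripped-down sketch. It is not the editor's actual code: `AddTaskCommand` and `Invoker` are hypothetical stand-ins, and a plain dictionary plays the role of the constellation.
+
+```python
+from typing import Dict, List
+
+
+class AddTaskCommand:
+    """Hypothetical reversible command that adds one task to a dict of tasks."""
+
+    def __init__(self, tasks: Dict[str, str], task_id: str, description: str):
+        self.tasks, self.task_id, self.description = tasks, task_id, description
+
+    def execute(self) -> None:
+        if self.task_id in self.tasks:
+            raise ValueError(f"duplicate task id: {self.task_id}")
+        self.tasks[self.task_id] = self.description
+
+    def undo(self) -> None:
+        del self.tasks[self.task_id]
+
+
+class Invoker:
+    """Keeps an undo stack and a redo stack of executed commands."""
+
+    def __init__(self) -> None:
+        self.undo_stack: List[AddTaskCommand] = []
+        self.redo_stack: List[AddTaskCommand] = []
+
+    def run(self, command: AddTaskCommand) -> None:
+        command.execute()        # a failed command never enters the history
+        self.undo_stack.append(command)
+        self.redo_stack.clear()  # a new command invalidates the redo chain
+
+    def undo(self) -> bool:
+        if not self.undo_stack:
+            return False
+        command = self.undo_stack.pop()
+        command.undo()
+        self.redo_stack.append(command)
+        return True
+
+    def redo(self) -> bool:
+        if not self.redo_stack:
+            return False
+        command = self.redo_stack.pop()
+        command.execute()
+        self.undo_stack.append(command)
+        return True
+
+
+tasks: Dict[str, str] = {}
+invoker = Invoker()
+invoker.run(AddTaskCommand(tasks, "fetch_data", "Download dataset"))
+invoker.undo()
+after_undo = dict(tasks)
+invoker.redo()
+after_redo = dict(tasks)
+print(after_undo, after_redo)
+```
+
+Because each command knows how to reverse itself, the invoker needs no knowledge of what a command does, which is what makes new command types easy to add.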
+
+---
+
+## 💻 Basic Usage
+
+### Creating an Editor
+
+```python
+from galaxy.constellation import TaskConstellation
+from galaxy.constellation.editor import ConstellationEditor
+
+# Create editor with new constellation
+editor = ConstellationEditor()
+
+# Create editor with existing constellation
+existing = TaskConstellation(name="my_workflow")
+editor = ConstellationEditor(
+ constellation=existing,
+ enable_history=True, # Enable undo/redo
+ max_history_size=100 # Keep last 100 commands
+)
+
+# Access constellation
+print(f"Editing: {editor.constellation.name}")
+```
+
+---
+
+## 🎯 Task Operations
+
+### Adding Tasks
+
+```python
+from galaxy.constellation import TaskStar
+
+# Method 1: Add existing TaskStar
+task = TaskStar(
+ task_id="fetch_data",
+ description="Download dataset from S3",
+ target_device_id="linux_server_1"
+)
+added_task = editor.add_task(task)
+
+# Method 2: Add from dictionary
+task_dict = {
+ "task_id": "preprocess",
+ "description": "Clean and normalize data",
+ "target_device_id": "linux_server_2",
+ "timeout": 300.0
+}
+added_task = editor.add_task(task_dict)
+
+# Method 3: Create and add in one step
+task = editor.create_and_add_task(
+ task_id="train_model",
+ description="Train neural network on preprocessed data",
+ name="Model Training",
+ target_device_id="gpu_server",
+ priority="HIGH",
+ timeout=3600.0,
+ retry_count=2
+)
+```
+
+### Updating Tasks
+
+```python
+# Update task properties
+updated_task = editor.update_task(
+ task_id="train_model",
+ description="Train BERT model on preprocessed text data",
+ timeout=7200.0,
+ priority="CRITICAL"
+)
+
+# Update with task_data
+editor.update_task(
+ task_id="train_model",
+ task_data={
+ "model_type": "BERT",
+ "epochs": 10,
+ "batch_size": 32
+ }
+)
+```
+
+### Removing Tasks
+
+```python
+# Remove task (also removes related dependencies)
+removed_id = editor.remove_task("preprocess")
+
+print(f"Removed task: {removed_id}")
+```
+
+### Querying Tasks
+
+```python
+# Get specific task
+task = editor.get_task("fetch_data")
+
+# List all tasks
+all_tasks = editor.list_tasks()
+
+for task in all_tasks:
+ print(f"{task.name}: {task.status.value}")
+
+# Get ready tasks
+ready = editor.get_ready_tasks()
+```
+
+---
+
+## 🔗 Dependency Operations
+
+### Adding Dependencies
+
+```python
+from galaxy.constellation import TaskStarLine
+
+# Method 1: Add existing TaskStarLine
+dep = TaskStarLine.create_success_only(
+ from_task_id="fetch_data",
+ to_task_id="preprocess",
+ description="Preprocess after successful download"
+)
+added_dep = editor.add_dependency(dep)
+
+# Method 2: Add from dictionary
+dep_dict = {
+ "from_task_id": "preprocess",
+ "to_task_id": "train_model",
+ "dependency_type": "SUCCESS_ONLY",
+ "condition_description": "Train on preprocessed data"
+}
+added_dep = editor.add_dependency(dep_dict)
+
+# Method 3: Create and add in one step
+dep = editor.create_and_add_dependency(
+ from_task_id="train_model",
+ to_task_id="evaluate_model",
+ dependency_type="UNCONDITIONAL",
+ condition_description="Evaluate after training completes"
+)
+```
+
+### Updating Dependencies
+
+```python
+# Update dependency properties
+updated_dep = editor.update_dependency(
+ dependency_id=dep.line_id,
+ dependency_type="CONDITIONAL",
+ condition_description="Evaluate only if training accuracy > 90%"
+)
+```
+
+### Removing Dependencies
+
+```python
+# Remove dependency
+removed_id = editor.remove_dependency(dep.line_id)
+```
+
+### Querying Dependencies
+
+```python
+# Get specific dependency
+dep = editor.get_dependency(dep.line_id)
+
+# List all dependencies
+all_deps = editor.list_dependencies()
+
+# Get dependencies for specific task
+task_deps = editor.get_task_dependencies("train_model")
+```
+
+---
+
+## 🔄 Undo/Redo Operations
+
+### Basic Undo/Redo
+
+```python
+# Add a task
+task = editor.create_and_add_task(
+ task_id="test_task",
+ description="Run unit tests"
+)
+
+# Oops, didn't mean to add that
+if editor.can_undo():
+ editor.undo()
+ print("✅ Task addition undone")
+
+# Actually, let's keep it
+if editor.can_redo():
+ editor.redo()
+ print("✅ Task addition redone")
+```
+
+### Checking Undo/Redo Availability
+
+```python
+# Check if undo/redo is available
+print(f"Can undo: {editor.can_undo()}")
+print(f"Can redo: {editor.can_redo()}")
+
+# Get description of what would be undone/redone
+if editor.can_undo():
+ print(f"Undo: {editor.get_undo_description()}")
+
+if editor.can_redo():
+ print(f"Redo: {editor.get_redo_description()}")
+```
+
+### Command History
+
+```python
+# Get command history
+history = editor.get_history()
+for i, cmd_desc in enumerate(history):
+ print(f"{i+1}. {cmd_desc}")
+
+# Example output:
+# 1. Add task: fetch_data
+# 2. Add task: preprocess
+# 3. Add dependency: fetch_data → preprocess
+# 4. Update task: preprocess
+
+# Clear history (cannot undo after this)
+editor.clear_history()
+```
+
+---
+
+## 🏗️ Bulk Operations
+
+### Building from Configuration
+
+```python
+from galaxy.agents.schema import TaskConstellationSchema
+
+# Build constellation from schema
+config = TaskConstellationSchema(
+ name="ml_pipeline",
+ tasks=[
+ {
+ "task_id": "fetch",
+ "description": "Fetch data",
+ "target_device_id": "server_1"
+ },
+ {
+ "task_id": "process",
+ "description": "Process data",
+ "target_device_id": "server_2"
+ }
+ ],
+ dependencies=[
+ {
+ "from_task_id": "fetch",
+ "to_task_id": "process",
+ "dependency_type": "SUCCESS_ONLY"
+ }
+ ]
+)
+
+constellation = editor.build_constellation(
+ config=config,
+ clear_existing=True # Clear current constellation first
+)
+```
+
+### Building from Lists
+
+```python
+# Build from task and dependency lists
+tasks = [
+ {
+ "task_id": "a",
+ "description": "Task A",
+ "target_device_id": "device_1"
+ },
+ {
+ "task_id": "b",
+ "description": "Task B",
+ "target_device_id": "device_2"
+ }
+]
+
+dependencies = [
+ {
+ "from_task_id": "a",
+ "to_task_id": "b",
+ "dependency_type": "UNCONDITIONAL"
+ }
+]
+
+constellation = editor.build_from_tasks_and_dependencies(
+ tasks=tasks,
+ dependencies=dependencies,
+ clear_existing=True,
+ metadata={"version": "1.0", "author": "system"}
+)
+```
+
+### Clearing Constellation
+
+```python
+# Remove all tasks and dependencies
+cleared = editor.clear_constellation()
+
+print(f"Constellation cleared: {cleared.task_count == 0}")
+```
+
+---
+
+## 💾 File Operations
+
+### Saving Constellation
+
+```python
+# Save to JSON file
+file_path = editor.save_constellation("my_workflow.json")
+
+print(f"Saved to: {file_path}")
+```
+
+### Loading Constellation
+
+```python
+# Load from JSON file
+loaded = editor.load_constellation("my_workflow.json")
+
+print(f"Loaded: {loaded.name}")
+print(f"Tasks: {loaded.task_count}")
+print(f"Dependencies: {loaded.dependency_count}")
+```
+
+### Loading from Data
+
+```python
+# Load from dictionary
+data = {
+ "name": "test_workflow",
+ "tasks": {...},
+ "dependencies": {...}
+}
+constellation = editor.load_from_dict(data)
+
+# Load from JSON string
+json_string = '{"name": "workflow", "tasks": {...}}'
+constellation = editor.load_from_json_string(json_string)
+```
+
+---
+
+## 🔍 Validation and Analysis
+
+### DAG Validation
+
+```python
+# Validate constellation structure
+is_valid, errors = editor.validate_constellation()
+
+if not is_valid:
+    print("❌ Validation errors:")
+    for error in errors:
+        print(f" - {error}")
+else:
+    print("✅ Constellation is valid")
+
+# Check for cycles
+if editor.has_cycles():
+ print("❌ Constellation contains cycles")
+```
+
+### Topological Analysis
+
+```python
+# Get topological order
+try:
+ order = editor.get_topological_order()
+ print(f"Execution order: {' → '.join(order)}")
+except ValueError as e:
+ print(f"Cannot get order: {e}")
+```
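+
+For readers unfamiliar with topological ordering, the following sketch shows one common way to compute it, Kahn's algorithm, including the cycle check. It is an independent illustration with a hypothetical `topological_order` function; the editor's own implementation may differ.
+
+```python
+from collections import deque
+from typing import Dict, List, Tuple
+
+
+def topological_order(task_ids: List[str], edges: List[Tuple[str, str]]) -> List[str]:
+    """Kahn's algorithm; raises ValueError if the graph contains a cycle."""
+    successors: Dict[str, List[str]] = {t: [] for t in task_ids}
+    in_degree: Dict[str, int] = {t: 0 for t in task_ids}
+    for src, dst in edges:
+        successors[src].append(dst)
+        in_degree[dst] += 1
+
+    # Start with tasks that have no prerequisites.
+    ready = deque(t for t in task_ids if in_degree[t] == 0)
+    order: List[str] = []
+    while ready:
+        current = ready.popleft()
+        order.append(current)
+        for nxt in successors[current]:
+            in_degree[nxt] -= 1
+            if in_degree[nxt] == 0:
+                ready.append(nxt)
+
+    # Tasks left unprocessed are part of (or downstream of) a cycle.
+    if len(order) != len(task_ids):
+        raise ValueError("graph contains a cycle")
+    return order
+
+
+ids = ["fetch_data", "preprocess", "train_model"]
+deps = [("fetch_data", "preprocess"), ("preprocess", "train_model")]
+order = topological_order(ids, deps)
+print(" → ".join(order))
+```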
+
+### Statistics
+
+```python
+# Get comprehensive statistics
+stats = editor.get_statistics()
+
+print(f"Constellation: {stats['constellation_id']}")
+print(f"Tasks: {stats['total_tasks']}")
+print(f"Dependencies: {stats['total_dependencies']}")
+print(f"Longest path: {stats['longest_path_length']}")
+print(f"Max width: {stats['max_width']}")
+print(f"Parallelism ratio: {stats['parallelism_ratio']:.2f}")
+
+# Editor-specific stats
+print(f"Commands executed: {stats['editor_execution_count']}")
+print(f"History size: {stats['editor_history_size']}")
+print(f"Can undo: {stats['editor_can_undo']}")
+print(f"Can redo: {stats['editor_can_redo']}")
+```
+
+---
+
+## 👀 Observer Pattern
+
+### Adding Observers
+
+```python
+# Define observer callback
+def on_change(editor, command, result):
+ print(f"Operation: {command}")
+ print(f"Result: {result}")
+ print(f"Constellation state: {editor.constellation.state.value}")
+
+# Add observer
+editor.add_observer(on_change)
+
+# Now all operations trigger the observer
+task = editor.create_and_add_task(
+ task_id="observed_task",
+ description="This triggers the observer"
+)
+# Output:
+# Operation: add_task
+# Result:
+# Constellation state: ready
+```
+
+### Removing Observers
+
+```python
+# Remove specific observer
+editor.remove_observer(on_change)
+
+# Operations no longer trigger this observer
+```
+
+### Multiple Observers
+
+```python
+def log_observer(editor, command, result):
+    with open("constellation_log.txt", "a") as f:
+        f.write(f"{command}: {result}\n")
+
+def metrics_observer(editor, command, result):
+    stats = editor.get_statistics()
+    print(f"Current metrics: P={stats['parallelism_ratio']:.2f}")
+
+# Add multiple observers
+editor.add_observer(log_observer)
+editor.add_observer(metrics_observer)
+
+# All observers are notified on each operation
+```
+
+---
+
+## 🎨 Advanced Features
+
+### Batch Operations
+
+```python
+# Execute multiple operations in sequence
+operations = [
+ lambda e: e.create_and_add_task("task_a", "Task A"),
+ lambda e: e.create_and_add_task("task_b", "Task B"),
+ lambda e: e.create_and_add_dependency("task_a", "task_b", "UNCONDITIONAL"),
+]
+
+results = editor.batch_operations(operations)
+
+for i, result in enumerate(results):
+    if isinstance(result, Exception):
+        print(f"Operation {i+1} failed: {result}")
+    else:
+        print(f"Operation {i+1} succeeded: {result}")
+```
+
+### Creating Subgraphs
+
+```python
+# Extract subgraph with specific tasks
+task_ids = ["fetch_data", "preprocess", "train_model"]
+subgraph_editor = editor.create_subgraph(task_ids)
+
+print(f"Subgraph tasks: {subgraph_editor.constellation.task_count}")
+print(f"Subgraph deps: {subgraph_editor.constellation.dependency_count}")
+
+# Subgraph includes only dependencies between included tasks
+```
+
+### Merging Constellations
+
+```python
+# Create two separate workflows
+editor1 = ConstellationEditor()
+editor1.create_and_add_task("task_a", "Task A from editor1")
+
+editor2 = ConstellationEditor()
+editor2.create_and_add_task("task_b", "Task B from editor2")
+
+# Merge editor2 into editor1 with prefix
+editor1.merge_constellation(
+ other_editor=editor2,
+ prefix="imported_"
+)
+
+# editor1 now contains: task_a, imported_task_b
+```
+
+---
+
+## 🛡️ Error Handling
+
+### Validation Errors
+
+```python
+try:
+    # Try to add task with duplicate ID
+    editor.create_and_add_task("existing_id", "Duplicate task")
+except Exception as e:
+    print(f"❌ Error: {e}")
+    # Can undo to previous valid state
+    if editor.can_undo():
+        editor.undo()
+```
+
+### Cyclic Dependency Detection
+
+```python
+# Create cycle: A → B → C → A
+editor.create_and_add_task("a", "Task A")
+editor.create_and_add_task("b", "Task B")
+editor.create_and_add_task("c", "Task C")
+
+editor.create_and_add_dependency("a", "b", "UNCONDITIONAL")
+editor.create_and_add_dependency("b", "c", "UNCONDITIONAL")
+
+try:
+    # This would close the cycle A → B → C → A
+    editor.create_and_add_dependency("c", "a", "UNCONDITIONAL")
+except Exception as e:
+    print(f"❌ Cycle detected: {e}")
+    # No undo is needed: the operation fails before execution
+```
+
+---
+
+## 📊 Complete Example Workflow
+
+```python
+from galaxy.constellation.editor import ConstellationEditor
+
+# Create editor
+editor = ConstellationEditor(enable_history=True)
+
+# Build ML training pipeline
+# Step 1: Add tasks
+fetch = editor.create_and_add_task(
+ task_id="fetch_data",
+ description="Download dataset from S3",
+ target_device_id="linux_server_1",
+ timeout=300.0
+)
+
+preprocess = editor.create_and_add_task(
+ task_id="preprocess",
+ description="Clean and normalize data",
+ target_device_id="linux_server_2",
+ timeout=600.0
+)
+
+train = editor.create_and_add_task(
+ task_id="train_model",
+ description="Train BERT model",
+ target_device_id="gpu_server_a100",
+ priority="HIGH",
+ timeout=7200.0,
+ retry_count=2
+)
+
+evaluate = editor.create_and_add_task(
+ task_id="evaluate",
+ description="Evaluate model on test set",
+ target_device_id="linux_server_3"
+)
+
+# Step 2: Add dependencies
+editor.create_and_add_dependency(
+ "fetch_data", "preprocess", "SUCCESS_ONLY"
+)
+editor.create_and_add_dependency(
+ "preprocess", "train_model", "SUCCESS_ONLY"
+)
+editor.create_and_add_dependency(
+ "train_model", "evaluate", "UNCONDITIONAL"
+)
+
+# Step 3: Validate
+is_valid, errors = editor.validate_constellation()
+assert is_valid, f"Validation failed: {errors}"
+
+# Step 4: Analyze
+stats = editor.get_statistics()
+print(f"Pipeline: {stats['total_tasks']} tasks, {stats['total_dependencies']} dependencies")
+print(f"Critical path: {stats['longest_path_length']}")
+print(f"Parallelism: {stats['parallelism_ratio']:.2f}")
+
+# Step 5: Save
+editor.save_constellation("ml_training_pipeline.json")
+
+# Step 6: Execute (via orchestrator)
+constellation = editor.constellation
+# Pass to ConstellationOrchestrator for distributed execution
+# See: ../constellation_orchestrator/overview.md for execution details
+```
+
+For details on executing the built constellation, see the [Constellation Orchestrator documentation](../constellation_orchestrator/overview.md).
+
+---
+
+## 🎯 Best Practices
+
+### Editor Usage Guidelines
+
+1. **Enable history**: Always enable undo/redo for interactive editing sessions
+2. **Validate frequently**: Run `validate_constellation()` after major structural changes
+3. **Use observers**: Add observers for logging, metrics tracking, or UI updates
+4. **Batch operations**: Use `batch_operations()` for multiple related changes to improve efficiency
+5. **Save incrementally**: Create constellation checkpoints during complex editing workflows
+
+### Command Pattern Benefits
+
+The command pattern architecture provides several key advantages:
+
+- **Undo/Redo**: Full operation history with rollback capabilities
+- **Audit trail**: Every change is recorded and traceable
+- **Transaction safety**: Operations are atomic and validated
+- **Extensibility**: New operation types can be added easily
+
+!!!warning "Common Pitfalls"
+    - **Forgetting to validate**: Always validate before passing to orchestrator for execution
+    - **Clearing history prematurely**: Cannot undo operations after calling `clear_history()`
+    - **Modifying running constellations**: Editor operations will fail if constellation is currently executing
+    - **Ignoring observer errors**: Observers should handle their own exceptions to avoid breaking the editor
+
+---
+
+## 📚 Command Registry
+
+### Available Commands
+
+```python
+# List all available commands
+commands = editor.list_available_commands()
+
+for name, metadata in commands.items():
+ print(f"{name}: {metadata['description']}")
+ print(f" Category: {metadata['category']}")
+
+# Get command categories
+categories = editor.get_command_categories()
+print(f"Categories: {categories}")
+
+# Get metadata for specific command
+metadata = editor.get_command_metadata("add_task")
+print(metadata)
+```
+
+### Executing Commands by Name
+
+```python
+# Execute command using registry
+result = editor.execute_command_by_name(
+ "add_task",
+ task_data={"task_id": "new_task", "description": "New task"}
+)
+
+# This is equivalent to:
+# editor.add_task({"task_id": "new_task", "description": "New task"})
+```
+
+---
+
+## 🔗 Related Components
+
+- **[TaskStar](task_star.md)** — Individual tasks that can be edited and managed
+- **[TaskStarLine](task_star_line.md)** — Dependencies between tasks that define execution order
+- **[TaskConstellation](task_constellation.md)** — The constellation DAG being edited
+- **[Overview](overview.md)** — Task Constellation framework overview
+
+### Related Documentation
+
+- **[Constellation Orchestrator](../constellation_orchestrator/overview.md)** — Learn how edited constellations are scheduled and executed
+- **[Constellation Agent](../constellation_agent/overview.md)** — Understand how agents use the editor to build constellations
+- **[Command Pattern](https://en.wikipedia.org/wiki/Command_pattern)** — More about the command design pattern
+
+---
+
+## 📚 API Reference
+
+### Constructor
+
+```python
+ConstellationEditor(
+ constellation: Optional[TaskConstellation] = None,
+ enable_history: bool = True,
+ max_history_size: int = 100
+)
+```
+
+### Task Operations
+
+| Method | Description |
+|--------|-------------|
+| `add_task(task)` | Add task (TaskStar or dict), returns TaskStar |
+| `create_and_add_task(task_id, description, name, **kwargs)` | Create and add new task, returns TaskStar |
+| `update_task(task_id, **updates)` | Update task properties, returns updated TaskStar |
+| `remove_task(task_id)` | Remove task and related dependencies, returns removed task ID (str) |
+| `get_task(task_id)` | Get task by ID, returns Optional[TaskStar] |
+| `list_tasks()` | Get all tasks, returns List[TaskStar] |
+
+### Dependency Operations
+
+| Method | Description |
+|--------|-------------|
+| `add_dependency(dependency)` | Add dependency (TaskStarLine or dict), returns TaskStarLine |
+| `create_and_add_dependency(from_task_id, to_task_id, dependency_type, **kwargs)` | Create and add dependency, returns TaskStarLine |
+| `update_dependency(dependency_id, **updates)` | Update dependency properties, returns updated TaskStarLine |
+| `remove_dependency(dependency_id)` | Remove dependency, returns removed dependency ID (str) |
+| `get_dependency(dependency_id)` | Get dependency by ID, returns Optional[TaskStarLine] |
+| `list_dependencies()` | Get all dependencies, returns List[TaskStarLine] |
+| `get_task_dependencies(task_id)` | Get dependencies for specific task, returns List[TaskStarLine] |
+
+### Bulk Operations
+
+| Method | Description |
+|--------|-------------|
+| `build_constellation(config, clear_existing)` | Build constellation from TaskConstellationSchema |
+| `build_from_tasks_and_dependencies(tasks, dependencies, ...)` | Build constellation from task and dependency lists (returns TaskConstellation) |
+| `clear_constellation()` | Remove all tasks and dependencies from constellation |
+| `batch_operations(operations)` | Execute multiple operations in sequence, returning list of results |
+
+### File Operations
+
+| Method | Description |
+|--------|-------------|
+| `save_constellation(file_path)` | Save constellation to JSON file, returns file path |
+| `load_constellation(file_path)` | Load constellation from JSON file, returns TaskConstellation |
+| `load_from_dict(data)` | Load constellation from dictionary, returns TaskConstellation |
+| `load_from_json_string(json_string)` | Load constellation from JSON string, returns TaskConstellation |
+
+### History Operations
+
+| Method | Description |
+|--------|-------------|
+| `undo()` | Undo last command, returns True if successful, False if no undo available |
+| `redo()` | Redo next command, returns True if successful, False if no redo available |
+| `can_undo()` | Check if undo is available (returns bool) |
+| `can_redo()` | Check if redo is available (returns bool) |
+| `get_undo_description()` | Get description of operation that would be undone (returns Optional[str]) |
+| `get_redo_description()` | Get description of operation that would be redone (returns Optional[str]) |
+| `clear_history()` | Clear command history (no return value) |
+| `get_history()` | Get list of command descriptions (returns List[str]) |
+
+### Validation
+
+| Method | Description |
+|--------|-------------|
+| `validate_constellation()` | Validate DAG structure, returns tuple of (is_valid: bool, errors: List[str]) |
+| `has_cycles()` | Check for cycles in the DAG, returns bool |
+| `get_topological_order()` | Get topological ordering of tasks, returns List[str] of task IDs |
+| `get_ready_tasks()` | Get tasks ready to execute (no pending dependencies), returns List[TaskStar] |
+| `get_statistics()` | Get comprehensive constellation and editor statistics, returns Dict[str, Any] |
+
+### Observers
+
+| Method | Description |
+|--------|-------------|
+| `add_observer(observer)` | Add change observer callable that receives (editor, command, result) |
+| `remove_observer(observer)` | Remove previously added observer |
+
+### Advanced
+
+| Method | Description |
+|--------|-------------|
+| `create_subgraph(task_ids)` | Extract subgraph with specific tasks |
+| `merge_constellation(other_editor, prefix)` | Merge another constellation with optional ID prefix |
+| `display_constellation(mode)` | Display visualization (modes: 'overview', 'topology', 'details', 'execution') |
+
+For interactive web-based visualization and editing, see the [Galaxy WebUI](../webui.md).
+
+---
+
+**ConstellationEditor** — Safe, interactive, and reversible constellation manipulation
diff --git a/documents/docs/galaxy/constellation/overview.md b/documents/docs/galaxy/constellation/overview.md
new file mode 100644
index 000000000..64481a2cb
--- /dev/null
+++ b/documents/docs/galaxy/constellation/overview.md
@@ -0,0 +1,409 @@
+# Task Constellation — Overview
+
+
+
+
+*Figure: Example of a Task Constellation illustrating both sequential and parallel dependencies.*
+
+
+---
+
+## 🌌 Introduction
+
+The **Task Constellation** is the central abstraction in Galaxy that captures the concurrent and asynchronous structure of distributed task execution. It provides a formal, directed acyclic graph (DAG) representation of complex workflows, enabling consistent scheduling, fault-tolerant orchestration, and runtime dynamism across heterogeneous devices.
+
+At its core, a Task Constellation decomposes complex user requests into interdependent subtasks connected through explicit dependency edges. This formalism not only enables correct distributed execution but also supports runtime adaptation—allowing new tasks or dependencies to be introduced as the workflow evolves.
+
+For information on how Task Constellations are orchestrated and scheduled, see the [Constellation Orchestrator](../constellation_orchestrator/overview.md) documentation. To understand how agents interact with constellations, refer to the [Constellation Agent](../constellation_agent/overview.md) guide.
+
+---
+
+## 🎯 Core Components
+
+The Task Constellation framework consists of four primary components:
+
+| Component | Purpose | Key Features |
+|-----------|---------|--------------|
+| **[TaskStar](task_star.md)** | Atomic execution unit | Self-contained task with description, device assignment, execution state, dependencies |
+| **[TaskStarLine](task_star_line.md)** | Dependency relationship | Directed edge with conditional logic, success-only, completion-only, or unconditional execution |
+| **[TaskConstellation](task_constellation.md)** | DAG orchestrator | Complete workflow graph with validation, scheduling, and dynamic modification |
+| **[ConstellationEditor](constellation_editor.md)** | Interactive editor | Command pattern-based interface with undo/redo for safe constellation manipulation |
+
+---
+
+## 📐 Formal Model
+
+### Mathematical Foundation
+
+A Task Constellation $\mathcal{C}$ is formally defined as a directed acyclic graph (DAG):
+
+$$
+\mathcal{C} = (\mathcal{T}, \mathcal{E})
+$$
+
+where:
+- $\mathcal{T}$ is the set of all **TaskStars** (task nodes)
+- $\mathcal{E}$ is the set of **TaskStarLines** (dependency edges)
+
+### TaskStar Representation
+
+Each TaskStar $t_i \in \mathcal{T}$ encapsulates a complete task specification:
+
+$$
+t_i = (\text{name}_ i, \text{description}_ i, \text{target\_device\_id}_ i, \text{tips}_ i, \text{status}_ i, \text{dependencies}_ i)
+$$
+
+**Components:**
+- **name**: Short name for the task
+- **description**: Natural-language specification sent to the device agent
+- **target_device_id**: ID of the device agent responsible for execution
+- **tips**: List of guidance hints to help the device agent complete the task
+- **status**: Current execution state (pending, running, completed, failed, cancelled, waiting_dependency)
+- **dependencies**: Set of prerequisite task IDs that must complete first
+
+### TaskStarLine Representation
+
+Each TaskStarLine $e_{i \rightarrow j} \in \mathcal{E}$ represents a dependency from task $t_i$ to task $t_j$.
+
+**Dependency Types:**
+
+| Type | Behavior |
+|------|----------|
+| **Unconditional** | $t_j$ always waits for $t_i$ to complete |
+| **Success-only** | $t_j$ proceeds only if $t_i$ succeeds |
+| **Completion-only** | $t_j$ proceeds when $t_i$ completes (regardless of success/failure) |
+| **Conditional** | $t_j$ proceeds based on a user-defined or runtime condition |
+
+---
+
+## ✨ Key Advantages
+
+### 1. Explicit Task Ordering
+Task dependencies are explicitly captured in the DAG structure, ensuring correctness across distributed execution without ambiguity.
+
+### 2. Natural Parallelism
+The DAG topology naturally exposes parallelizable tasks, enabling efficient concurrent execution across heterogeneous devices.
+
+### 3. Runtime Dynamism
+Unlike static DAG schedulers, Task Constellations are **mutable objects**. Tasks and dependency edges can be:
+- **Added**: Introduce new subtasks or diagnostic tasks
+- **Removed**: Prune completed or redundant nodes
+- **Modified**: Rewire dependencies, update conditions, change device assignments
+
+This enables adaptive execution without restarting the entire workflow.
+
+### 4. Formal Guarantees
+The DAG representation provides formal properties:
+- **Acyclicity**: No circular dependencies
+- **Causal consistency**: Execution respects logical ordering
+- **Safe concurrency**: Parallel execution without race conditions
+
+---
+
+## 🔄 Lifecycle States
+
+The Task Constellation progresses through several states during its lifecycle:
+
+```mermaid
+stateDiagram-v2
+ [*] --> CREATED: Initialize
+ CREATED --> READY: Add tasks & dependencies
+ READY --> EXECUTING: Start execution
+ EXECUTING --> EXECUTING: Tasks running
+ EXECUTING --> COMPLETED: All tasks succeed
+ EXECUTING --> FAILED: All tasks fail
+ EXECUTING --> PARTIALLY_FAILED: Some succeed, some fail
+ COMPLETED --> [*]
+ FAILED --> [*]
+ PARTIALLY_FAILED --> [*]
+```
+
+| State | Description |
+|-------|-------------|
+| **CREATED** | Constellation initialized, no tasks added |
+| **READY** | Tasks and dependencies configured, ready to execute |
+| **EXECUTING** | At least one task is running or completed |
+| **COMPLETED** | All tasks completed successfully |
+| **FAILED** | All tasks failed |
+| **PARTIALLY_FAILED** | Some tasks succeeded, some failed |
+
+---
+
+## 📊 DAG Metrics
+
+### Parallelism Analysis
+
+The Task Constellation provides several metrics to analyze workflow parallelism:
+
+#### Critical Path Length ($L$)
+The longest serial dependency chain in the constellation:
+
+$$
+L = \max_{p \in \text{paths}} |p|
+$$
+
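+where $p$ ranges over paths from any root to any leaf node and $|p|$ is the length of $p$, measured in the same units as $W$ (tasks in node-count mode, summed durations in actual-time mode).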
+
+#### Total Work ($W$)
+Sum of all task execution durations:
+
+$$
+W = \sum_{t_i \in \mathcal{T}} \text{duration}(t_i)
+$$
+
+#### Parallelism Ratio ($P$)
+Measure of achievable parallelism:
+
+$$
+P = \frac{W}{L}
+$$
+
+- $P = 1$: Completely serial execution
+- $P > 1$: Parallel execution possible
+- Higher $P$ indicates more parallelism
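+
+As a worked example, consider a fan-out constellation in which one root task $A$ feeds three independent tasks $B$, $C$, and $D$. Assuming every task counts as one unit of work (node-count mode) and path length is measured in tasks, the longest chain is $A \rightarrow B$, so:
+
+$$
+W = 4, \qquad L = 2, \qquad P = \frac{W}{L} = \frac{4}{2} = 2
+$$
+
+On average, two tasks can make progress at once, even though up to three can run concurrently after $A$ finishes.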
+
+#### Maximum Width
+Maximum number of tasks that can execute concurrently:
+
+$$
+\text{MaxWidth} = \max_{\text{level}} |\text{tasks at level}|
+$$
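+
+This page does not spell out how levels are assigned. One common convention, assumed here for illustration, places each task at its longest distance from any root:
+
+$$
+\text{level}(t_j) =
+\begin{cases}
+0 & \text{if } t_j \text{ has no predecessors} \\
+1 + \max_{e_{i \rightarrow j} \in \mathcal{E}} \text{level}(t_i) & \text{otherwise}
+\end{cases}
+$$
+
+Under this convention, a diamond with edges $A \rightarrow B$, $A \rightarrow C$, $B \rightarrow D$, $C \rightarrow D$ has levels $\{A\}$, $\{B, C\}$, $\{D\}$, which gives $\text{MaxWidth} = 2$.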
+
+!!! info "Calculation Modes"
+    The constellation supports two calculation modes:
+
+    - **Node Count Mode**: Uses task counts when execution is incomplete
+    - **Actual Time Mode**: Uses real execution durations when all tasks are terminal
+
+---
+
+## 🛠️ Core Operations
+
+### DAG Construction
+
+```python
+from galaxy.constellation import TaskConstellation, TaskStar, TaskStarLine
+
+# Create constellation
+constellation = TaskConstellation(name="my_workflow")
+
+# Add tasks
+task_a = TaskStar(name="task_a", description="Checkout code on laptop")
+task_b = TaskStar(name="task_b", description="Build on GPU server")
+task_c = TaskStar(name="task_c", description="Deploy to staging")
+
+constellation.add_task(task_a)
+constellation.add_task(task_b)
+constellation.add_task(task_c)
+
+# Add dependencies
+dep_ab = TaskStarLine.create_success_only(
+ from_task_id=task_a.task_id,
+ to_task_id=task_b.task_id,
+ description="Build depends on successful checkout"
+)
+
+dep_bc = TaskStarLine.create_unconditional(
+ from_task_id=task_b.task_id,
+ to_task_id=task_c.task_id,
+ description="Deploy after build"
+)
+
+constellation.add_dependency(dep_ab)
+constellation.add_dependency(dep_bc)
+```
+
+### DAG Validation
+
+```python
+# Validate structure
+is_valid, errors = constellation.validate_dag()
+if not is_valid:
+    print(f"Validation errors: {errors}")
+
+# Check for cycles
+has_cycles = constellation.has_cycle()
+
+# Get topological order
+order = constellation.get_topological_order()
+print(f"Execution order: {order}")
+```
+
+### Parallelism Analysis
+
+```python
+# Get parallelism metrics
+metrics = constellation.get_parallelism_metrics()
+
+print(f"Critical Path Length: {metrics['critical_path_length']}")
+print(f"Total Work: {metrics['total_work']}")
+print(f"Parallelism Ratio: {metrics['parallelism_ratio']}")
+print(f"Critical Path: {metrics['critical_path_tasks']}")
+
+# Get maximum width
+max_width = constellation.get_max_width()
+print(f"Maximum concurrent tasks: {max_width}")
+```
+
+---
+
+## 🔧 Dynamic Modification
+
+### Safe Editing with ConstellationEditor
+
+```python
+from galaxy.constellation.editor import ConstellationEditor
+
+# Create editor with undo/redo support
+editor = ConstellationEditor(constellation)
+
+# Add a new diagnostic task
+diagnostic_task = editor.create_and_add_task(
+ task_id="diag_1",
+ description="Check server health",
+ name="Server Health Check"
+)
+
+# Add conditional dependency
+editor.create_and_add_dependency(
+ from_task_id=task_b.task_id,
+ to_task_id=diagnostic_task.task_id,
+ dependency_type="CONDITIONAL",
+ condition_description="Run diagnostic if build fails"
+)
+
+# Undo the last edit if needed
+# (`something_wrong` stands in for your own check)
+if something_wrong:
+    editor.undo()
+
+# Get modifiable components
+modifiable_tasks = constellation.get_modifiable_tasks()
+modifiable_deps = constellation.get_modifiable_dependencies()
+```
+
+!!! warning "Modification Safety"
+    Tasks and dependencies can be modified only while they are in `PENDING` or `WAITING_DEPENDENCY` status. To preserve execution consistency, running and completed tasks cannot be modified.
+
+---
+
+## 📈 Example Workflows
+
+### Sequential Workflow
+
+```mermaid
+graph LR
+ A[Task A] --> B[Task B]
+ B --> C[Task C]
+```
+
+- **Parallelism Ratio**: 1.0 (completely serial)
+- **Maximum Width**: 1
+
+### Parallel Workflow
+
+```mermaid
+graph LR
+ A[Task A] --> B[Task B]
+ A --> C[Task C]
+ B --> D[Task D]
+ C --> D
+```
+
+- **Parallelism Ratio**: ~1.33 in node-count mode ($W = 4$, $L = 3$; B and C can run in parallel)
+- **Maximum Width**: 2
+
+### Complex Workflow
+
+```mermaid
+graph LR
+ A[Task A] --> B[Task B]
+ A --> C[Task C]
+ B --> D[Task D]
+ C --> E[Task E]
+ D --> F[Task F]
+ E --> F
+```
+
+- **Parallelism Ratio**: 1.5 in node-count mode ($W = 6$, $L = 4$)
+- **Maximum Width**: 2 (B and C can run concurrently after A completes, then D and E)
+
+---
+
+## 🎨 Visualization
+
+The Task Constellation provides multiple visualization modes for monitoring and debugging:
+
+### Overview Mode
+High-level constellation structure with task counts and state
+
+### Topology Mode
+DAG graph showing task relationships and dependencies
+
+### Details Mode
+Detailed task information including execution times and status
+
+### Execution Mode
+Real-time execution flow with progress tracking
+
+```python
+# Display constellation
+constellation.display_dag(mode="overview") # or "topology", "details", "execution"
+```
+
+For interactive web-based visualization, check out the [Galaxy WebUI](../webui.md).
+
+---
+
+## 📚 Component Documentation
+
+Explore detailed documentation for each component:
+
+- **[TaskStar](task_star.md)** — Atomic execution units representing individual tasks in the constellation
+- **[TaskStarLine](task_star_line.md)** — Dependency relationships connecting tasks with conditional logic
+- **[TaskConstellation](task_constellation.md)** — Complete DAG orchestrator managing workflow execution and coordination
+- **[ConstellationEditor](constellation_editor.md)** — Interactive editor with command pattern and undo/redo capabilities
+
+### Related Documentation
+
+- **[Constellation Orchestrator](../constellation_orchestrator/overview.md)** — Learn how constellations are scheduled and executed across devices
+- **[Constellation Agent](../constellation_agent/overview.md)** — Understand how agents plan and manage constellation lifecycles
+- **[Evaluation & Metrics](../evaluation/performance_metrics.md)** — Monitor constellation performance and analyze execution patterns
+
+---
+
+## 🔬 Research Background
+
+The Task Constellation model is grounded in formal DAG theory and distributed systems research. Key properties include:
+
+- **Acyclicity guarantees** through Kahn's algorithm for topological sorting
+- **Topological ordering** for consistent execution
+- **Critical path analysis** for performance optimization
+- **Dynamic graph evolution** without compromising consistency
+
+For more on Galaxy's architecture and design principles, see the [Galaxy Overview](../overview.md).
+
+---
+
+## 💡 Best Practices
+
+!!! tip "Designing Effective Constellations"
+    1. **Keep tasks atomic**: Each TaskStar should represent a single, well-defined operation
+    2. **Minimize dependencies**: Reduce unnecessary dependencies to maximize parallelism
+    3. **Use appropriate dependency types**: Choose conditional dependencies for error handling
+    4. **Validate early**: Run `validate_dag()` before execution
+    5. **Monitor metrics**: Track parallelism ratio to optimize workflow design
+
+**Common Patterns:**
+
+- **Fan-out**: One task spawns multiple independent parallel tasks
+- **Fan-in**: Multiple parallel tasks converge to a single task
+- **Pipeline**: Sequential stages with parallel tasks within each stage
+- **Conditional branching**: Use conditional dependencies for error handling paths
+
+---
+
+## 🚀 Next Steps
+
+- Learn about **[TaskStar](task_star.md)** — Atomic task execution units
+- Explore **[TaskStarLine](task_star_line.md)** — Dependency relationships
+- Master **[TaskConstellation](task_constellation.md)** — DAG orchestration
+- Try **[ConstellationEditor](constellation_editor.md)** — Interactive editing
diff --git a/documents/docs/galaxy/constellation/task_constellation.md b/documents/docs/galaxy/constellation/task_constellation.md
new file mode 100644
index 000000000..a7ada24b1
--- /dev/null
+++ b/documents/docs/galaxy/constellation/task_constellation.md
@@ -0,0 +1,811 @@
+# TaskConstellation — DAG Orchestrator
+
+## Overview
+
+**TaskConstellation** is the complete DAG (Directed Acyclic Graph) orchestration system that manages distributed workflows across heterogeneous devices. It provides comprehensive task management, dependency validation, execution scheduling, and runtime dynamism for complex cross-device orchestration.
+
+**Formal Definition:** A TaskConstellation $\mathcal{C}$ is a DAG defined as:
+
+$$
+\mathcal{C} = (\mathcal{T}, \mathcal{E})
+$$
+
+where $\mathcal{T}$ is the set of TaskStars and $\mathcal{E}$ is the set of TaskStarLines.
+
+---
+
+## Architecture
+
+### Core Components
+
+| Component | Type | Description |
+|-----------|------|-------------|
+| **constellation_id** | `str` | Unique identifier for the constellation |
+| **name** | `str` | Human-readable constellation name |
+| **state** | `ConstellationState` | Current execution state |
+| **tasks** | `Dict[str, TaskStar]` | All tasks in the constellation |
+| **dependencies** | `Dict[str, TaskStarLine]` | All dependency relationships |
+| **metadata** | `Dict[str, Any]` | Additional constellation metadata |
+
+### Execution Tracking
+
+| Property | Type | Description |
+|----------|------|-------------|
+| **execution_start_time** | `datetime` | When execution started |
+| **execution_end_time** | `datetime` | When execution completed |
+| **execution_duration** | `float` | Total execution time in seconds |
+| **created_at** | `datetime` | Constellation creation timestamp |
+| **updated_at** | `datetime` | Last modification timestamp |
+
+---
+
+## Constellation Lifecycle
+
+```mermaid
+stateDiagram-v2
+ [*] --> CREATED: Initialize
+ CREATED --> READY: Add tasks & validate
+ READY --> EXECUTING: Start execution
+ EXECUTING --> EXECUTING: Tasks running
+ EXECUTING --> COMPLETED: All succeed
+ EXECUTING --> FAILED: All fail
+ EXECUTING --> PARTIALLY_FAILED: Mixed results
+ COMPLETED --> [*]
+ FAILED --> [*]
+ PARTIALLY_FAILED --> [*]
+```
+
+### State Definitions
+
+| State | Description | Transition Trigger |
+|-------|-------------|-------------------|
+| **CREATED** | Empty constellation, no tasks added | Initialization |
+| **READY** | Tasks added, validated, ready to execute | Tasks added, no running tasks |
+| **EXECUTING** | At least one task running or completed | First task starts |
+| **COMPLETED** | All tasks completed successfully | Last task succeeds |
+| **FAILED** | All tasks failed | Last task fails, no successes |
+| **PARTIALLY_FAILED** | Some tasks succeeded, some failed | Mixed terminal states |
+
+---
+
+## Core Operations
+
+### Creating a Constellation
+
+```python
+from galaxy.constellation import TaskConstellation
+
+# Create with auto-generated ID
+constellation = TaskConstellation()
+print(f"ID: {constellation.constellation_id}")
+# Example output: constellation_20251106_143052_a1b2c3d4
+
+# Create with a custom name and ID
+constellation = TaskConstellation(
+    name="ml_training_pipeline",
+    constellation_id="pipeline_001"
+)
+```
+
+---
+
+### Adding Tasks
+
+```python
+from galaxy.constellation import TaskStar
+
+# Create tasks
+task_a = TaskStar(
+ task_id="fetch_data",
+ description="Download training dataset",
+ target_device_id="linux_server_1"
+)
+
+task_b = TaskStar(
+ task_id="preprocess",
+ description="Preprocess and normalize data",
+ target_device_id="linux_server_2"
+)
+
+task_c = TaskStar(
+ task_id="train_model",
+ description="Train neural network",
+ target_device_id="gpu_server_1"
+)
+
+# Add to constellation
+constellation.add_task(task_a)
+constellation.add_task(task_b)
+constellation.add_task(task_c)
+
+print(f"Total tasks: {constellation.task_count}")
+# Output: Total tasks: 3
+```
+
+---
+
+### Adding Dependencies
+
+```python
+from galaxy.constellation import TaskStarLine
+
+# Create dependencies
+dep1 = TaskStarLine.create_success_only(
+ from_task_id="fetch_data",
+ to_task_id="preprocess",
+ description="Preprocess after successful download"
+)
+
+dep2 = TaskStarLine.create_success_only(
+ from_task_id="preprocess",
+ to_task_id="train_model",
+ description="Train on preprocessed data"
+)
+
+# Add to constellation
+constellation.add_dependency(dep1)
+constellation.add_dependency(dep2)
+
+print(f"Total dependencies: {constellation.dependency_count}")
+# Output: Total dependencies: 2
+```
+
+---
+
+### Removing Tasks and Dependencies
+
+```python
+# Look up a specific task or dependency
+task = constellation.get_task("fetch_data")
+dep = constellation.get_dependency(dep1.line_id)
+
+# Remove a dependency
+constellation.remove_dependency(dep1.line_id)
+
+# Remove a task (also removes its remaining dependencies)
+constellation.remove_task("preprocess")
+```
+
+---
+
+## DAG Validation
+
+### Cycle Detection
+
+```python
+# Check for cycles
+has_cycles = constellation.has_cycle()
+
+if has_cycles:
+    print("❌ Constellation contains cycles!")
+else:
+    print("✅ DAG is acyclic")
+
+# Comprehensive validation
+is_valid, errors = constellation.validate_dag()
+
+if not is_valid:
+    for error in errors:
+        print(f"❌ {error}")
+else:
+    print("✅ Constellation is valid")
+```
+
+### Topological Ordering
+
+```python
+try:
+    # Get topological order (raises ValueError if the graph is cyclic)
+    order = constellation.get_topological_order()
+    print(f"Execution order: {' → '.join(order)}")
+    # Output: fetch_data → preprocess → train_model
+
+except ValueError as e:
+    print(f"Cannot get topological order: {e}")
+```
+
+---
+
+## Scheduling and Execution
+
+### Getting Ready Tasks
+
+```python
+# Get tasks ready to execute (no pending dependencies),
+# sorted by priority (highest first)
+ready_tasks = constellation.get_ready_tasks()
+
+for task in ready_tasks:
+    print(f"Ready: {task.name} (priority: {task.priority.value})")
+```
+
+### Execution Flow
+
+```python
+# Start constellation execution
+constellation.start_execution()
+
+# Start a specific task
+constellation.start_task("fetch_data")
+
+# Mark task as completed
+newly_ready = constellation.mark_task_completed(
+ task_id="fetch_data",
+ success=True,
+ result={"rows": 10000, "status": "success"}
+)
+
+# newly_ready contains tasks that became ready after this completion
+for task in newly_ready:
+    print(f"Now ready: {task.name}")
+```
+
+### Querying Task Status
+
+```python
+# Get tasks by status
+running = constellation.get_running_tasks()
+completed = constellation.get_completed_tasks()
+failed = constellation.get_failed_tasks()
+pending = constellation.get_pending_tasks()
+
+print(f"Running: {len(running)}")
+print(f"Completed: {len(completed)}")
+print(f"Failed: {len(failed)}")
+print(f"Pending: {len(pending)}")
+
+# Check if entire constellation is complete
+if constellation.is_complete():
+    constellation.complete_execution()
+    print(f"State: {constellation.state}")
+```
+
+---
+
+## Parallelism Analysis
+
+### DAG Metrics
+
+```python
+# Get longest path (critical path) using node counts
+longest_path_length, longest_path = constellation.get_longest_path()
+
+print(f"Critical path length: {longest_path_length}")
+print(f"Critical path: {' → '.join(longest_path)}")
+
+# Get maximum width (max concurrent tasks)
+max_width = constellation.get_max_width()
+print(f"Maximum parallelism: {max_width} tasks")
+```
+
+### Parallelism Ratio
+
+```python
+# Calculate parallelism metrics (L, W, P)
+metrics = constellation.get_parallelism_metrics()
+
+print(f"Critical Path Length (L): {metrics['critical_path_length']}")
+print(f"Total Work (W): {metrics['total_work']}")
+print(f"Parallelism Ratio (P): {metrics['parallelism_ratio']:.2f}")
+print(f"Calculation Mode: {metrics['calculation_mode']}")
+
+# Interpretation:
+# P = 1.0 → Completely serial
+# P = 2.0 → 2x parallelism on average
+# P = 3.5 → 3.5x parallelism on average
+```
+
+**Note:** Calculation modes depend on task completion status:
+- **node_count**: Used when tasks are incomplete (counts each task as 1 unit)
+- **actual_time**: Used when all tasks are terminal (uses real execution durations)
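+
+Written out, and using $d(t_i)$ as shorthand (introduced here) for the measured execution duration of task $t_i$, the two modes compute the ratio as:
+
+$$
+P_{\text{node\_count}} = \frac{|\mathcal{T}|}{\max_{p} |p|}, \qquad
+P_{\text{actual\_time}} = \frac{\sum_{t_i \in \mathcal{T}} d(t_i)}{\max_{p} \sum_{t_i \in p} d(t_i)}
+$$
+
+where $p$ ranges over root-to-leaf paths and $|p|$ is the number of tasks on $p$. This is a sketch that follows the definitions of $W$ and $L$ rather than a statement of the exact implementation.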
+
+### Time-Based Critical Path
+
+```python
+# Get critical path using actual execution times
+# Only valid when all tasks are completed or failed
+critical_time, critical_path_tasks = constellation.get_critical_path_length_with_time()
+
+print(f"Critical path duration: {critical_time:.2f} seconds")
+print(f"Tasks on critical path: {critical_path_tasks}")
+
+# Get total work
+total_work = constellation.get_total_work()
+print(f"Total work: {total_work:.2f} seconds")
+
+# Calculate speedup
+speedup = total_work / critical_time if critical_time > 0 else 0
+print(f"Speedup: {speedup:.2f}x")
+```
+
+---
+
+## Statistics and Monitoring
+
+### Comprehensive Statistics
+
+```python
+stats = constellation.get_statistics()
+
+print(f"Constellation: {stats['name']}")
+print(f"State: {stats['state']}")
+print(f"Tasks: {stats['total_tasks']}")
+print(f"Dependencies: {stats['total_dependencies']}")
+print(f"Longest Path: {stats['longest_path_length']}")
+print(f"Max Width: {stats['max_width']}")
+print(f"Parallelism Ratio: {stats['parallelism_ratio']:.2f}")
+
+# Task status breakdown
+status_counts = stats['task_status_counts']
+for status, count in status_counts.items():
+    print(f" {status}: {count}")
+
+# Execution duration (skipped if not yet available)
+if stats['execution_duration'] is not None:
+    print(f"Duration: {stats['execution_duration']:.2f} seconds")
+```
+
+---
+
+## Dynamic Modification
+
+### Modifiable Components
+
+```python
+# Get tasks that can be safely modified
+modifiable_tasks = constellation.get_modifiable_tasks()
+# Only tasks in PENDING or WAITING_DEPENDENCY status
+
+# Get modifiable dependencies
+modifiable_deps = constellation.get_modifiable_dependencies()
+# Dependencies whose target task hasn't started
+
+# Check specific task/dependency
+can_modify_task = constellation.is_task_modifiable("task_a")
+can_modify_dep = constellation.is_dependency_modifiable("dep_1")
+```
+
+### Runtime Graph Evolution
+
+```python
+# Add diagnostic task during execution
+diagnostic_task = TaskStar(
+ task_id="health_check",
+ description="Check server health after failure"
+)
+constellation.add_task(diagnostic_task)
+
+# Add conditional fallback dependency
+fallback_dep = TaskStarLine.create_conditional(
+ from_task_id="train_model",
+ to_task_id="health_check",
+ condition_description="Run health check if training fails",
+ condition_evaluator=lambda result: result is None
+)
+constellation.add_dependency(fallback_dep)
+
+# Update constellation state
+constellation.update_state()
+```
+
+!!! warning "Modification Safety"
+    The constellation enforces safe modification:
+
+    - **RUNNING tasks**: Cannot be modified
+    - **Completed/Failed tasks**: Cannot be modified
+    - **Dependencies to running tasks**: Cannot be modified
+
+    This ensures execution consistency and prevents race conditions.
+
+---
+
+## Serialization and Persistence
+
+### JSON Export/Import
+
+```python
+# Export to JSON string
+json_string = constellation.to_json()
+
+# Save to file
+constellation.to_json(save_path="constellation_backup.json")
+
+# Load from JSON string
+restored = TaskConstellation.from_json(json_data=json_string)
+
+# Load from file
+loaded = TaskConstellation.from_json(file_path="constellation_backup.json")
+```
+
+### Dictionary Conversion
+
+```python
+# Convert to dictionary
+constellation_dict = constellation.to_dict()
+
+# Create from dictionary
+new_constellation = TaskConstellation.from_dict(constellation_dict)
+
+# Dictionary structure includes:
+# - constellation_id, name, state
+# - tasks (dict of task_id -> TaskStar dict)
+# - dependencies (dict of line_id -> TaskStarLine dict)
+# - metadata, timestamps
+```
+
+### Pydantic Schema
+
+```python
+# Convert to Pydantic BaseModel
+schema = constellation.to_basemodel()
+
+# Create from schema
+constellation_from_schema = TaskConstellation.from_basemodel(schema)
+```
+
+---
+
+## Visualization
+
+### Display Modes
+
+```python
+# Overview mode - high-level structure
+constellation.display_dag(mode="overview")
+
+# Topology mode - detailed DAG graph
+constellation.display_dag(mode="topology")
+
+# Details mode - task execution details
+constellation.display_dag(mode="details")
+
+# Execution mode - real-time flow
+constellation.display_dag(mode="execution")
+```
+
+---
+
+## Querying Dependencies
+
+### Task-Specific Dependencies
+
+```python
+# Get all dependencies for a specific task
+task_deps = constellation.get_task_dependencies("train_model")
+
+for dep in task_deps:
+ print(f"{dep.from_task_id} → {dep.to_task_id} ({dep.dependency_type.value})")
+
+# Get all dependencies in constellation
+all_deps = constellation.get_all_dependencies()
+```
+
+---
+
+## Example Workflows
+
+### Simple Linear Pipeline
+
+```mermaid
+graph LR
+ A[Task A] --> B[Task B]
+ B --> C[Task C]
+```
+
+```python
+# Create: A → B → C
+constellation = TaskConstellation(name="linear_pipeline")
+
+task_a = TaskStar(task_id="a", description="Task A")
+task_b = TaskStar(task_id="b", description="Task B")
+task_c = TaskStar(task_id="c", description="Task C")
+
+constellation.add_task(task_a)
+constellation.add_task(task_b)
+constellation.add_task(task_c)
+
+dep_ab = TaskStarLine.create_unconditional("a", "b")
+dep_bc = TaskStarLine.create_unconditional("b", "c")
+
+constellation.add_dependency(dep_ab)
+constellation.add_dependency(dep_bc)
+
+# Validate
+is_valid, errors = constellation.validate_dag()
+assert is_valid
+
+# Get metrics
+metrics = constellation.get_parallelism_metrics()
+assert metrics['parallelism_ratio'] == 1.0 # Completely serial
+```
+
+### Parallel Fan-Out
+
+```mermaid
+graph LR
+ A[Task A] --> B[Task B]
+ A --> C[Task C]
+ A --> D[Task D]
+```
+
+```python
+# Create: A → [B, C, D]
+constellation = TaskConstellation(name="fan_out")
+
+task_a = TaskStar(task_id="a", description="Root task")
+task_b = TaskStar(task_id="b", description="Parallel task 1")
+task_c = TaskStar(task_id="c", description="Parallel task 2")
+task_d = TaskStar(task_id="d", description="Parallel task 3")
+
+for task in [task_a, task_b, task_c, task_d]:
+    constellation.add_task(task)
+
+# B, C, and D each depend only on A, so they can run in parallel
+for target_id in ["b", "c", "d"]:
+    dep = TaskStarLine.create_success_only("a", target_id)
+    constellation.add_dependency(dep)
+
+# Check the maximum width
+max_width = constellation.get_max_width()
+assert max_width >= 3  # Can run 3 tasks in parallel
+```
+
+### Complex Diamond Pattern
+
+```mermaid
+graph LR
+ A[Task A] --> B[Task B]
+ A --> C[Task C]
+ B --> D[Task D]
+ C --> D
+```
+
+```python
+# Create: A → [B, C] → D
+constellation = TaskConstellation(name="diamond")
+
+tasks = {
+ "a": TaskStar(task_id="a", description="Start"),
+ "b": TaskStar(task_id="b", description="Path 1"),
+ "c": TaskStar(task_id="c", description="Path 2"),
+ "d": TaskStar(task_id="d", description="Merge")
+}
+
+for task in tasks.values():
+    constellation.add_task(task)
+
+# Fan-out: A → B, A → C
+constellation.add_dependency(TaskStarLine.create_success_only("a", "b"))
+constellation.add_dependency(TaskStarLine.create_success_only("a", "c"))
+
+# Fan-in: B → D, C → D
+constellation.add_dependency(TaskStarLine.create_success_only("b", "d"))
+constellation.add_dependency(TaskStarLine.create_success_only("c", "d"))
+
+# Analyze
+order = constellation.get_topological_order()
+print(f"Possible order: {order}") # ['a', 'b', 'c', 'd'] or ['a', 'c', 'b', 'd']
+
+longest_path_length, path = constellation.get_longest_path()
+assert longest_path_length == 3 # A → B/C → D
+```
+
+---
+
+## Error Handling
+
+### Cycle Detection
+
+```python
+# Attempt to create a cycle
+try:
+    # This would create A → B → C → A
+    constellation.add_dependency(
+        TaskStarLine.create_unconditional("c", "a")
+    )
+except ValueError as e:
+    print(f"❌ {e}")
+    # Output: Adding dependency would create a cycle
+```
+
+### Missing Task References
+
+```python
+# Try to add a dependency that references a non-existent task
+try:
+    dep = TaskStarLine.create_unconditional(
+        "nonexistent_task",
+        "task_b"
+    )
+    constellation.add_dependency(dep)
+except ValueError as e:
+    print(f"❌ {e}")
+    # Output: Source task nonexistent_task not found
+```
+
+### Modifying Running Tasks
+
+```python
+# Try to remove a running task
+task.start_execution()
+
+try:
+    constellation.remove_task(task.task_id)
+except ValueError as e:
+    print(f"❌ {e}")
+    # Output: Cannot remove running task
+```
+
+---
+
+## Best Practices
+
+### Constellation Design Guidelines
+
+1. **Validate early**: Run `validate_dag()` before execution
+2. **Minimize dependencies**: Reduce unnecessary edges to maximize parallelism
+3. **Use appropriate dependency types**: Match dependency type to workflow logic
+4. **Monitor metrics**: Track parallelism ratio to optimize design
+5. **Handle failures**: Use conditional dependencies for error recovery
+
+### Optimization Patterns
+
+**Before (Serial):**
+
+```mermaid
+graph LR
+ A[A] --> B[B]
+ B --> C[C]
+ C --> D[D]
+ D --> E[E]
+ E --> F[F]
+```
+
+Parallelism Ratio: 1.0
+
+**After (Optimized):**
+
+```mermaid
+graph LR
+ A[A] --> B[B]
+ A --> C[C]
+ A --> D[D]
+ B --> F[F]
+ C --> F
+ D --> E[E]
+```
+
+Parallelism Ratio: 2.0 (node-count mode: $W = 6$, $L = 3$)
+
+!!! warning "Common Pitfalls"
+    - **Over-parallelization**: Too many parallel tasks can overwhelm resources
+    - **Tight coupling**: Excessive dependencies reduce parallelism
+    - **Missing validation**: Always validate before execution
+    - **Ignoring state**: Check constellation state before modifications
+
+---
+
+## Formal Properties
+
+### Acyclicity Guarantee
+
+The TaskConstellation enforces **acyclicity** through:
+
+1. **DFS-based cycle detection** before adding dependencies
+2. **Topological ordering** validation using Kahn's algorithm
+3. **Runtime validation** during DAG modification
+
+### Causal Consistency
+
+Task dependencies ensure **causal consistency**:
+
+- If task $t_j$ depends on $t_i$, then $t_i$ must complete before $t_j$ starts
+- Transitive dependencies are preserved
+- Concurrent tasks have no causal ordering
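+
+One way to state this ordering, writing $\text{start}(t)$ and $\text{end}(t)$ for a task's start and completion times (notation assumed here, not taken from the API):
+
+$$
+\forall\, e_{i \rightarrow j} \in \mathcal{E}: \quad \text{end}(t_i) \le \text{start}(t_j)
+$$
+
+Because $\text{start}(t_j) \le \text{end}(t_j)$ for every task, chaining this inequality along a path $t_i \rightarrow t_j \rightarrow t_k$ yields $\text{end}(t_i) \le \text{start}(t_k)$, which is the transitive guarantee listed above.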
+
+### Concurrency Safety
+
+The constellation provides **safe concurrent execution**:
+
+- **Read-only queries** are always safe
+- **Modifications** are protected by state checks
+- **Assignment locking** prevents race conditions (handled by orchestrator)
+
+---
+
+## Related Components
+
+- **[TaskStar](task_star.md)** — Atomic task execution units
+- **[TaskStarLine](task_star_line.md)** — Dependency relationships
+- **[ConstellationEditor](constellation_editor.md)** — Safe editing with undo/redo
+- **[Overview](overview.md)** — Framework overview
+
+---
+
+## API Reference
+
+### Constructor
+
+```python
+TaskConstellation(
+    constellation_id: Optional[str] = None,
+    name: Optional[str] = None
+)
+```
+
+### Task Management
+
+| Method | Description |
+|--------|-------------|
+| `add_task(task)` | Add task to constellation |
+| `remove_task(task_id)` | Remove task and related dependencies |
+| `get_task(task_id)` | Get task by ID |
+| `get_all_tasks()` | Get all tasks |
+| `get_ready_tasks()` | Get tasks ready to execute |
+| `get_running_tasks()` | Get currently running tasks |
+| `get_completed_tasks()` | Get completed tasks |
+| `get_failed_tasks()` | Get failed tasks |
+| `get_pending_tasks()` | Get pending tasks |
+| `get_modifiable_tasks()` | Get tasks safe to modify |
+
+### Dependency Management
+
+| Method | Description |
+|--------|-------------|
+| `add_dependency(dependency)` | Add dependency edge |
+| `remove_dependency(dependency_id)` | Remove dependency |
+| `get_dependency(dependency_id)` | Get dependency by ID |
+| `get_all_dependencies()` | Get all dependencies |
+| `get_task_dependencies(task_id)` | Get dependencies for specific task |
+| `get_modifiable_dependencies()` | Get dependencies safe to modify |
+
+### Validation
+
+| Method | Description |
+|--------|-------------|
+| `validate_dag()` | Validate DAG structure, returns `(bool, List[str])` with validation errors |
+| `has_cycle()` | Check for cycles (returns `bool`) |
+| `get_topological_order()` | Get topological ordering (returns `List[str]`, raises `ValueError` if cyclic) |
+
+### Execution
+
+| Method | Description |
+|--------|-------------|
+| `start_execution()` | Mark constellation as started |
+| `start_task(task_id)` | Start specific task |
+| `mark_task_completed(task_id, success, result, error)` | Mark task done, returns `List[TaskStar]` of newly ready tasks |
+| `complete_execution()` | Mark constellation as completed |
+| `is_complete()` | Check if all tasks are terminal (returns `bool`) |
+| `update_state()` | Update constellation state based on task states |
+
+### Analysis
+
+| Method | Description |
+|--------|-------------|
+| `get_longest_path()` | Get critical path using node count, returns `(int, List[str])` |
+| `get_critical_path_length_with_time()` | Get critical path using actual time, returns `(float, List[str])` |
+| `get_max_width()` | Get maximum parallelism (returns `int`) |
+| `get_total_work()` | Get sum of execution durations (returns `float`) |
+| `get_parallelism_metrics()` | Get comprehensive parallelism metrics (returns `Dict[str, Any]`) |
+| `get_statistics()` | Get all constellation statistics (returns `Dict[str, Any]`) |
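+
+To make the node-count critical path concrete, here is a small standalone sketch. It assumes the DAG is given as a mapping from each task ID to the set of IDs it depends on; `longest_path` is a hypothetical helper, not the library's `get_longest_path()`.
+
+```python
+from graphlib import TopologicalSorter
+from typing import Dict, List, Set, Tuple
+
+
+def longest_path(predecessors: Dict[str, Set[str]]) -> Tuple[int, List[str]]:
+    """Return (node_count, path) for the longest dependency chain."""
+    best_len: Dict[str, int] = {}
+    best_prev: Dict[str, str] = {}
+    # static_order() yields every task after all of its prerequisites.
+    for task in TopologicalSorter(predecessors).static_order():
+        best_len[task] = 1
+        for pred in predecessors.get(task, ()):
+            if best_len[pred] + 1 > best_len[task]:
+                best_len[task] = best_len[pred] + 1
+                best_prev[task] = pred
+
+    if not best_len:
+        return 0, []
+
+    # Walk back from the task that ends the longest chain.
+    node = max(best_len, key=best_len.get)
+    length = best_len[node]
+    path = [node]
+    while node in best_prev:
+        node = best_prev[node]
+        path.append(node)
+    return length, path[::-1]
+
+
+dag = {
+    "checkout": set(),
+    "build": {"checkout"},
+    "lint": {"checkout"},
+    "test": {"build"},
+    "deploy": {"test"},
+}
+print(longest_path(dag))
+```
+
+In this example the chain `checkout`, `build`, `test`, `deploy` has four nodes, while the `lint` branch has only two, so the longer chain is reported.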
+
+### Serialization
+
+| Method | Description |
+|--------|-------------|
+| `to_dict()` | Convert to dictionary |
+| `to_json(save_path)` | Export to JSON string or file |
+| `from_dict(data)` | Create from dictionary (classmethod) |
+| `from_json(json_data, file_path)` | Create from JSON (classmethod) |
+| `to_basemodel()` | Convert to Pydantic schema |
+| `from_basemodel(schema)` | Create from Pydantic schema (classmethod) |
+
+### Visualization
+
+| Method | Description |
+|--------|-------------|
+| `display_dag(mode)` | Display constellation (modes: overview, topology, details, execution) |
+
+---
+
+*TaskConstellation — Orchestrating distributed workflows across the digital galaxy*
diff --git a/documents/docs/galaxy/constellation/task_star.md b/documents/docs/galaxy/constellation/task_star.md
new file mode 100644
index 000000000..864144dce
--- /dev/null
+++ b/documents/docs/galaxy/constellation/task_star.md
@@ -0,0 +1,559 @@
+# TaskStar — Atomic Execution Unit
+
+## Overview
+
+**TaskStar** is the atomic unit of computation in UFO Galaxy: the smallest indivisible task scheduled on a device agent. Each TaskStar encapsulates the complete context needed for autonomous execution, including its semantic description, assigned device, execution state, and dependency relationships.
+
+**Formal Definition:** A TaskStar $t_i$ is defined as the tuple:
+
+$$
+t_i = (\text{name}_i, \text{description}_i, \text{device}_i, \text{tips}_i, \text{status}_i, \text{dependencies}_i)
+$$
+
+---
+
+## Architecture
+
+### Core Properties
+
+| Property | Type | Description |
+|----------|------|-------------|
+| **task_id** | `str` | Unique identifier (auto-generated UUID if not provided) |
+| **name** | `str` | Short, human-readable task name |
+| **description** | `str` | Natural-language specification of what the task should do |
+| **tips** | `List[str]` | Guidance list to help device agent complete the task |
+| **target_device_id** | `str` | ID of the device agent responsible for execution |
+| **device_type** | `DeviceType` | Type of target device (Windows, Linux, Android, etc.) |
+| **status** | `TaskStatus` | Current execution state |
+| **priority** | `TaskPriority` | Priority level for scheduling (LOW, MEDIUM, HIGH, CRITICAL) |
+| **timeout** | `float` | Maximum execution time in seconds |
+| **retry_count** | `int` | Number of allowed retries on failure |
+| **task_data** | `Dict[str, Any]` | Additional data needed for task execution |
+| **expected_output_type** | `str` | Expected type/format of the output |
+
+**Note:** The property `task_description` is available as a backward-compatible alias for `description`.
+
+### Execution Tracking
+
+| Property | Type | Description |
+|----------|------|-------------|
+| **result** | `Any` | Task execution result (if completed successfully) |
+| **error** | `Exception` | Error information (if failed) |
+| **execution_start_time** | `datetime` | Timestamp when execution started |
+| **execution_end_time** | `datetime` | Timestamp when execution ended |
+| **execution_duration** | `float` | Duration in seconds (calculated) |
+| **created_at** | `datetime` | Task creation timestamp |
+| **updated_at** | `datetime` | Last modification timestamp |
+
+**Note:** All execution tracking properties are read-only and automatically managed by the TaskStar lifecycle methods.
+
+### Computed Properties
+
+| Property | Type | Description |
+|----------|------|-------------|
+| **is_terminal** | `bool` | True if task is in a terminal state (COMPLETED, FAILED, or CANCELLED) |
+| **is_ready_to_execute** | `bool` | True if task is PENDING and has no pending dependencies |
+
+---
+
+## Task Status Lifecycle
+
+```mermaid
+stateDiagram-v2
+ [*] --> PENDING: Create
+ PENDING --> WAITING_DEPENDENCY: Has dependencies
+ WAITING_DEPENDENCY --> PENDING: Dependencies satisfied
+ PENDING --> RUNNING: Start execution
+ RUNNING --> COMPLETED: Success
+ RUNNING --> FAILED: Error
+ RUNNING --> CANCELLED: User cancels
+ FAILED --> PENDING: Retry
+ COMPLETED --> [*]
+ FAILED --> [*]
+ CANCELLED --> [*]
+```
+
+### Status Definitions
+
+| Status | Description | Terminal |
+|--------|-------------|----------|
+| **PENDING** | Task is ready to execute (no pending dependencies) | ❌ |
+| **WAITING_DEPENDENCY** | Task is waiting for prerequisite tasks | ❌ |
+| **RUNNING** | Task is currently executing on device | ❌ |
+| **COMPLETED** | Task finished successfully | ✅ |
+| **FAILED** | Task encountered an error | ✅ |
+| **CANCELLED** | Task was cancelled by user | ✅ |
+
+**Note:** Terminal states (COMPLETED, FAILED, CANCELLED) are final. The only exit shown in the lifecycle diagram is an explicit retry, which returns a FAILED task to PENDING.
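+
+The lifecycle can be read as a small state machine. The sketch below encodes the transitions from the diagram in a lookup table; `Status`, `ALLOWED`, and `transition` are hypothetical names (and the enum values are placeholders), so treat it as an illustration rather than the `TaskStatus` implementation.
+
+```python
+from enum import Enum
+
+
+class Status(Enum):
+    PENDING = "pending"
+    WAITING_DEPENDENCY = "waiting_dependency"
+    RUNNING = "running"
+    COMPLETED = "completed"
+    FAILED = "failed"
+    CANCELLED = "cancelled"
+
+
+# Allowed transitions, copied from the state diagram.
+ALLOWED = {
+    Status.PENDING: {Status.WAITING_DEPENDENCY, Status.RUNNING},
+    Status.WAITING_DEPENDENCY: {Status.PENDING},
+    Status.RUNNING: {Status.COMPLETED, Status.FAILED, Status.CANCELLED},
+    Status.FAILED: {Status.PENDING},  # only through an explicit retry
+    Status.COMPLETED: set(),
+    Status.CANCELLED: set(),
+}
+
+TERMINAL = {Status.COMPLETED, Status.FAILED, Status.CANCELLED}
+
+
+def transition(current: Status, target: Status) -> Status:
+    """Return `target` if the move is allowed, otherwise raise ValueError."""
+    if target not in ALLOWED[current]:
+        raise ValueError(f"Cannot move from {current.name} to {target.name}")
+    return target
+
+
+state = transition(Status.PENDING, Status.RUNNING)
+state = transition(state, Status.FAILED)
+state = transition(state, Status.PENDING)  # retry
+print(state.name)
+```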
+
+---
+
+## Priority Levels
+
+Tasks are scheduled based on priority when multiple tasks are ready to execute:
+
+| Priority | Value | Use Case |
+|----------|-------|----------|
+| **LOW** | 1 | Background tasks, cleanup operations |
+| **MEDIUM** | 2 | Standard tasks (default) |
+| **HIGH** | 3 | Important tasks requiring quick execution |
+| **CRITICAL** | 4 | Time-sensitive tasks, system health checks |
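+
+As an illustration of priority-based ordering, the sketch below sorts a list of ready tasks from highest to lowest priority. `Priority` mirrors the values in the table, while `order_by_priority` and the stable tie-breaking are assumptions; how the actual scheduler breaks ties is not specified here.
+
+```python
+from enum import IntEnum
+from typing import List, Tuple
+
+
+class Priority(IntEnum):
+    LOW = 1
+    MEDIUM = 2
+    HIGH = 3
+    CRITICAL = 4
+
+
+def order_by_priority(ready: List[Tuple[str, Priority]]) -> List[str]:
+    """Return task IDs sorted from highest to lowest priority.
+
+    sorted() is stable, so tasks with equal priority keep their input order.
+    """
+    return [task_id for task_id, _ in sorted(ready, key=lambda item: -item[1])]
+
+
+ready = [
+    ("cleanup", Priority.LOW),
+    ("build", Priority.HIGH),
+    ("docs", Priority.MEDIUM),
+    ("health_check", Priority.CRITICAL),
+    ("deploy", Priority.HIGH),
+]
+print(order_by_priority(ready))
+```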
+
+---
+
+## Usage Examples
+
+### Creating a TaskStar
+
+```python
+from galaxy.constellation import TaskStar
+from galaxy.constellation.enums import DeviceType, TaskPriority
+
+# Basic task creation
+task = TaskStar(
+ task_id="build_docker_image",
+ name="Docker Build",
+ description="Build the Docker image from Dockerfile in the current directory",
+ tips=[
+ "Use docker build command",
+ "Tag the image as 'myapp:latest'",
+ "Check for build errors in output"
+ ],
+ target_device_id="linux_gpu_server",
+ device_type=DeviceType.LINUX,
+ priority=TaskPriority.HIGH,
+ timeout=300.0, # 5 minutes
+ retry_count=2
+)
+```
+
+### Task with Additional Data
+
+```python
+# Task with custom data payload
+task = TaskStar(
+ task_id="process_dataset",
+ description="Preprocess the dataset and save to output directory",
+ task_data={
+ "input_path": "/data/raw/dataset.csv",
+ "output_path": "/data/processed/dataset_clean.csv",
+ "columns_to_drop": ["temp_col1", "temp_col2"],
+ "normalization": "min-max"
+ },
+ target_device_id="linux_cpu_1",
+ device_type=DeviceType.LINUX
+)
+```
+
+### Auto-Generated Task
+
+```python
+# Minimal creation with auto-generated ID and defaults
+task = TaskStar(
+ description="Run unit tests",
+ target_device_id="windows_desktop"
+)
+
+print(task.task_id) # Auto-generated UUID
+print(task.name) # Auto-generated: "task_{first 8 chars of UUID}"
+print(task.priority) # Default: TaskPriority.MEDIUM
+```
+
+---
+
+## Core Operations
+
+### Execution Management
+
+```python
+# Start execution
+task.start_execution()
+print(f"Started at: {task.execution_start_time}")
+
+# Mark as completed (success)
+result = {"status": "success", "output": "Tests passed: 45/45"}
+task.complete_with_success(result)
+print(f"Duration: {task.execution_duration} seconds")
+
+# Alternatively, mark a RUNNING task as failed (instead of the success path above)
+try:
+    # ... execution code ...
+    raise Exception("Docker build failed")
+except Exception as e:
+    task.complete_with_failure(e)
+    print(f"Error: {task.error}")
+```
+
+### Retry Logic
+
+```python
+# Check if task should retry
+if task.should_retry():
+ task.retry()
+ print(f"Retry attempt {task._current_retry}/{task._retry_count}")
+ # Task status is now PENDING again
+```
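+
+The interplay between `should_retry()` and `retry()` can be pictured with a simple counter. This is a sketch under the assumption that a task tracks the retries used against the configured `retry_count`; `RetryState` is a hypothetical stand-in, and whether the real `retry()` raises once the limit is reached is not confirmed here.
+
+```python
+class RetryState:
+    """Hypothetical stand-in for the retry bookkeeping of a task."""
+
+    def __init__(self, retry_count: int) -> None:
+        self.retry_count = retry_count  # retries allowed after the first attempt
+        self.current_retry = 0          # retries used so far
+
+    def should_retry(self) -> bool:
+        return self.current_retry < self.retry_count
+
+    def retry(self) -> None:
+        if not self.should_retry():
+            raise ValueError("Retry limit reached")
+        self.current_retry += 1
+
+
+state = RetryState(retry_count=2)
+attempts = 1  # the initial attempt
+while state.should_retry():
+    state.retry()
+    attempts += 1
+print(attempts)
+```
+
+With `retry_count=2`, the sketch allows one initial attempt plus two retries before `should_retry()` turns false.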
+
+### Validation
+
+```python
+# Validate task configuration
+if task.validate():
+ print("Task configuration is valid")
+else:
+ errors = task.get_validation_errors()
+ print(f"Validation errors: {errors}")
+```
+
+---
+
+## State Queries
+
+### Checking Task State
+
+```python
+from datetime import datetime, timezone
+
+# TaskStatus is assumed to live alongside DeviceType and TaskPriority
+from galaxy.constellation.enums import TaskStatus
+
+# Check if task is ready to execute
+if task.is_ready_to_execute:
+    print("Task can be started")
+
+# Check if task is in terminal state
+if task.is_terminal:
+    print("Task has finished executing")
+
+# Query specific status
+if task.status == TaskStatus.RUNNING:
+    elapsed = datetime.now(timezone.utc) - task.execution_start_time
+    print(f"Running for {elapsed.total_seconds()} seconds")
+```
+
+### Accessing Results
+
+```python
+# Access execution results
+if task.status == TaskStatus.COMPLETED:
+ print(f"Result: {task.result}")
+ print(f"Duration: {task.execution_duration}s")
+
+elif task.status == TaskStatus.FAILED:
+ print(f"Error: {task.error}")
+ print(f"Failed at: {task.execution_end_time}")
+```
+
+---
+
+## Serialization
+
+### JSON Export/Import
+
+```python
+# Export to JSON
+json_string = task.to_json()
+print(json_string)
+
+# Save to file
+task.to_json(save_path="task_backup.json")
+
+# Load from JSON string
+restored_task = TaskStar.from_json(json_data=json_string)
+
+# Load from file
+loaded_task = TaskStar.from_json(file_path="task_backup.json")
+```
+
+### Dictionary Conversion
+
+```python
+# Convert to dictionary
+task_dict = task.to_dict()
+
+# Create from dictionary
+new_task = TaskStar.from_dict(task_dict)
+```
+
+### Pydantic Schema Conversion
+
+```python
+# Convert to Pydantic BaseModel
+schema = task.to_basemodel()
+
+# Create from Pydantic schema
+task_from_schema = TaskStar.from_basemodel(schema)
+```
+
+---
+
+## Advanced Features
+
+### Request String Formatting
+
+The `to_request_string()` method formats the task for device agent consumption:
+
+```python
+request = task.to_request_string()
+
+# Output:
+# Task Description: Build the Docker image from Dockerfile
+# Tips for Completion:
+# - Use docker build command
+# - Tag the image as 'myapp:latest'
+# - Check for build errors in output
+```
+
+This formatted string is sent to device agents for execution.
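+
+A rough equivalent of this formatting, inferred only from the sample output above, might look like the following. `format_request` is a hypothetical function, and omitting the tips section when no tips are given is an assumption.
+
+```python
+from typing import List, Optional
+
+
+def format_request(description: str, tips: Optional[List[str]] = None) -> str:
+    """Build a request string in the layout shown in the sample output."""
+    lines = [f"Task Description: {description}"]
+    if tips:
+        lines.append("Tips for Completion:")
+        lines.extend(f"- {tip}" for tip in tips)
+    return "\n".join(lines)
+
+
+print(format_request(
+    "Build the Docker image from Dockerfile",
+    ["Use docker build command", "Tag the image as 'myapp:latest'"],
+))
+```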
+
+### Dynamic Data Updates
+
+```python
+# Update task data
+task.update_task_data({
+ "additional_flags": ["--no-cache", "--pull"],
+ "build_args": {"VERSION": "1.2.3"}
+})
+
+# Access task data
+data = task.task_data
+print(data["additional_flags"])
+```
+
+!!! warning "Modification Restrictions"
+    Task properties cannot be modified while the task is in `RUNNING` status. This prevents race conditions and ensures execution consistency.
+
+---
+
+## Dependency Management
+
+### Internal Dependency Tracking
+
+TaskStar maintains internal sets of dependencies and dependents:
+
+```python
+# Add dependency (internal use by TaskConstellation)
+task.add_dependency("prerequisite_task_id")
+
+# Remove dependency
+task.remove_dependency("prerequisite_task_id")
+
+# Add dependent task
+task.add_dependent("dependent_task_id")
+
+# Check dependencies
+print(f"Dependencies: {task._dependencies}")
+print(f"Dependents: {task._dependents}")
+```
+
+!!! note "Managed by TaskConstellation"
+    Dependency management methods are primarily used internally by `TaskConstellation`. Direct manipulation is not recommended; use `ConstellationEditor` for safe editing with undo/redo support.
+
+---
+
+## Integration with Constellation
+
+### Adding to Constellation
+
+```python
+from galaxy.constellation import TaskConstellation
+
+constellation = TaskConstellation(name="my_workflow")
+
+# Add task to constellation
+constellation.add_task(task)
+
+# Task is now managed by constellation
+ready_tasks = constellation.get_ready_tasks()
+```
+
+### Execution via Device Manager
+
+```python
+import asyncio
+
+from galaxy.client.device_manager import ConstellationDeviceManager
+
+
+# run_task is an illustrative wrapper: `await` must run inside a coroutine
+async def run_task(task):
+    # Execute task using device manager
+    device_manager = ConstellationDeviceManager()
+
+    # Execute returns an ExecutionResult object
+    execution_result = await task.execute(device_manager)
+
+    print(f"Status: {execution_result.status}")
+    print(f"Result: {execution_result.result}")
+    print(f"Execution Time: {execution_result.execution_time}s")
+
+
+asyncio.run(run_task(task))
+```
+
+---
+
+## Error Handling
+
+### Validation Errors
+
+```python
+task = TaskStar(
+ task_id="", # Invalid: empty ID
+ name="", # Invalid: empty name
+ description="", # Invalid: empty description
+ timeout=-1.0 # Invalid: negative timeout
+)
+
+if not task.validate():
+    for error in task.get_validation_errors():
+        print(f"❌ {error}")
+
+# Output:
+# ❌ Task ID must be a non-empty string
+# ❌ Task name must be a non-empty string
+# ❌ Task description must be a non-empty string
+# ❌ Timeout must be a positive number
+```
+
+### Execution Errors
+
+```python
+try:
+ task.start_execution()
+except ValueError as e:
+ print(f"Cannot start: {e}")
+ # Example: "Cannot start task in status RUNNING"
+
+try:
+ task.complete_with_success(result)
+except ValueError as e:
+ print(f"Cannot complete: {e}")
+ # Example: "Cannot complete task in status PENDING"
+```
+
+---
+
+## Example Workflows
+
+### Simple Task Execution
+
+```python
+# Create task
+task = TaskStar(
+ description="Run Python script",
+ target_device_id="linux_server_1",
+ timeout=60.0
+)
+
+# Execute
+task.start_execution()
+try:
+ # ... actual execution ...
+ result = {"output": "Script completed", "exit_code": 0}
+ task.complete_with_success(result)
+except Exception as e:
+ task.complete_with_failure(e)
+
+# Check result
+if task.status == TaskStatus.COMPLETED:
+ print(f"✅ Success: {task.result}")
+else:
+ print(f"❌ Failed: {task.error}")
+```
+
+### Retry on Failure
+
+```python
+max_attempts = 3
+attempt = 0
+
+while attempt < max_attempts:
+    attempt += 1
+    task.start_execution()
+
+    try:
+        # ... execution code ...
+        result = {"exit_code": 0}  # placeholder for the real execution output
+        task.complete_with_success(result)
+        break
+    except Exception as e:
+        task.complete_with_failure(e)
+
+        if task.should_retry():
+            task.retry()
+            print(f"Retry {attempt}/{max_attempts}")
+        else:
+            print("Max retries exceeded")
+            break
+```
+
+---
+
+## Best Practices
+
+### Task Design Guidelines
+
+1. **Keep tasks atomic**: Each task should represent a single, well-defined operation
+2. **Provide clear descriptions**: Use natural language that device agents can understand
+3. **Include helpful tips**: Guide the agent with specific instructions or common pitfalls
+4. **Set appropriate timeouts**: Prevent hanging tasks with realistic timeout values
+5. **Use retry wisely**: Enable retries for transient failures, not logic errors
+
+### Good vs. Bad Task Descriptions
+
+✅ **Good**: "Build the Docker image from the Dockerfile in /app directory and tag it as 'myapp:v1.2.3'"
+
+❌ **Bad**: "Build stuff"
+
+✅ **Good**: "Run pytest on the test/ directory and generate a coverage report in HTML format"
+
+❌ **Bad**: "Test the code"
+
+!!! warning "Common Pitfalls"
+    - **Don't modify running tasks**: Attempting to change properties during execution raises `ValueError`
+    - **Don't forget validation**: Always validate tasks before adding to constellation
+    - **Don't ignore timeouts**: Set realistic timeouts to prevent resource exhaustion
+
+---
+
+## Related Components
+
+- **[TaskStarLine](task_star_line.md)** — Dependency relationships between tasks
+- **[TaskConstellation](task_constellation.md)** — DAG orchestration and execution
+- **[ConstellationEditor](constellation_editor.md)** — Safe task editing with undo/redo
+- **[ConstellationDeviceManager](../client/device_manager.md)** — Device management and task assignment
+- **[Overview](overview.md)** — Task Constellation framework overview
+
+---
+
+## API Reference
+
+### Constructor
+
+```python
+TaskStar(
+    task_id: Optional[str] = None,
+    name: str = "",
+    description: str = "",
+    tips: Optional[List[str]] = None,
+    target_device_id: Optional[str] = None,
+    device_type: Optional[DeviceType] = None,
+    priority: TaskPriority = TaskPriority.MEDIUM,
+    timeout: Optional[float] = None,
+    retry_count: int = 0,
+    task_data: Optional[Dict[str, Any]] = None,
+    expected_output_type: Optional[str] = None,
+    config: Optional[TaskConfiguration] = None
+)
+```
+
+### Key Methods
+
+| Method | Description |
+|--------|-------------|
+| `execute(device_manager)` | Execute task using device manager (async, returns `ExecutionResult`) |
+| `validate()` | Validate task configuration (returns `bool`) |
+| `get_validation_errors()` | Get list of validation errors (returns `List[str]`) |
+| `start_execution()` | Mark task as started |
+| `complete_with_success(result)` | Mark task as completed successfully |
+| `complete_with_failure(error)` | Mark task as failed |
+| `retry()` | Reset task for retry attempt |
+| `cancel()` | Cancel the task |
+| `should_retry()` | Check if task should be retried (returns `bool`) |
+| `to_dict()` | Convert to dictionary |
+| `to_json(save_path)` | Export to JSON string or file |
+| `from_dict(data)` | Create from dictionary (classmethod) |
+| `from_json(json_data, file_path)` | Create from JSON (classmethod) |
+| `to_basemodel()` | Convert to Pydantic BaseModel schema |
+| `from_basemodel(schema)` | Create from Pydantic schema (classmethod) |
+
+---
+
+*TaskStar — The atomic building block of distributed workflows*
diff --git a/documents/docs/galaxy/constellation/task_star_line.md b/documents/docs/galaxy/constellation/task_star_line.md
new file mode 100644
index 000000000..90c001f9f
--- /dev/null
+++ b/documents/docs/galaxy/constellation/task_star_line.md
@@ -0,0 +1,688 @@
+# TaskStarLine — Dependency Relationship
+
+## Overview
+
+**TaskStarLine** represents a directed dependency relationship between two TaskStars, forming an edge in the task constellation DAG. Each TaskStarLine defines how tasks depend on each other, with support for conditional logic, success-only execution, and custom condition evaluation.
+
+**Formal Definition:** A TaskStarLine $e_{i \rightarrow j}$ specifies a dependency from task $t_i$ to task $t_j$:
+
+$$
+e_{i \rightarrow j} = (\text{from\_task}_i, \text{to\_task}_j, \text{type}, \text{description})
+$$
+
+Task $t_j$ cannot begin until certain conditions on $t_i$ are satisfied, based on the dependency type.
+
+---
+
+## Architecture
+
+### Core Properties
+
+| Property | Type | Description |
+|----------|------|-------------|
+| **line_id** | `str` | Unique identifier (auto-generated UUID if not provided) |
+| **from_task_id** | `str` | ID of the prerequisite task (source) |
+| **to_task_id** | `str` | ID of the dependent task (target) |
+| **dependency_type** | `DependencyType` | Type of dependency relationship |
+| **condition_description** | `str` | Natural language description of the condition |
+| **condition_evaluator** | `Callable` | Function to evaluate if condition is met |
+| **metadata** | `Dict[str, Any]` | Additional metadata for the dependency |
+
+**Note:** The properties `source_task_id` and `target_task_id` are available as aliases for `from_task_id` and `to_task_id` respectively (for IDependency interface compatibility).
+
+### State Tracking
+
+| Property | Type | Description |
+|----------|------|-------------|
+| **is_satisfied** | `bool` | Whether the dependency condition is currently satisfied (queried through the `is_satisfied()` method) |
+| **last_evaluation_result** | `bool` | Result of the most recent condition evaluation |
+| **last_evaluation_time** | `datetime` | Timestamp of last condition evaluation |
+| **created_at** | `datetime` | Dependency creation timestamp |
+| **updated_at** | `datetime` | Last modification timestamp |
+
+**Note:** All state tracking properties are read-only and automatically managed by TaskStarLine methods.
+
+---
+
+## Dependency Types
+
+TaskStarLine supports four types of dependency relationships:
+
+### 1. Unconditional (`UNCONDITIONAL`)
+
+Task $t_j$ **always** waits for $t_i$ to complete, regardless of success or failure.
+
+```mermaid
+graph LR
+ A[Task A] -->|UNCONDITIONAL| B[Task B]
+ style A fill:#90EE90
+ style B fill:#87CEEB
+```
+
+**Use Cases:**
+- Sequential pipeline stages
+- Resource cleanup after any task completion
+- Logging or notification tasks
+
+**Example:**
+```python
+# Task B always runs after Task A completes
+dep = TaskStarLine.create_unconditional(
+ from_task_id="task_a",
+ to_task_id="task_b",
+ description="B runs after A regardless of outcome"
+)
+```
+
+---
+
+### 2. Success-Only (`SUCCESS_ONLY`)
+
+Task $t_j$ proceeds **only if** $t_i$ completes successfully (result is not `None`).
+
+```mermaid
+graph LR
+ A[Task A] -->|SUCCESS_ONLY| B[Task B]
+ A -->|FAILED| C[Skip B]
+ style A fill:#90EE90
+ style B fill:#87CEEB
+ style C fill:#FFB6C1
+```
+
+**Use Cases:**
+- Build pipeline (deploy only if build succeeds)
+- Multi-step data processing
+- Conditional workflow branches
+
+**Example:**
+```python
+# Task B only runs if Task A succeeds
+dep = TaskStarLine.create_success_only(
+ from_task_id="build_task",
+ to_task_id="deploy_task",
+ description="Deploy only if build succeeds"
+)
+```
+
+**Note:** Success is determined by the prerequisite task returning a non-`None` result.
+
+---
+
+### 3. Completion-Only (`COMPLETION_ONLY`)
+
+Task $t_j$ proceeds when $t_i$ completes, **regardless of success or failure**.
+
+```mermaid
+graph LR
+ A[Task A] -->|COMPLETION_ONLY| B[Task B]
+ A -->|SUCCESS or FAIL| B
+ style A fill:#90EE90
+ style B fill:#87CEEB
+```
+
+**Use Cases:**
+- Cleanup tasks
+- Notification tasks
+- Audit logging
+
+**Example:**
+```python
+# Task B runs after Task A finishes, regardless of outcome
+dep = TaskStarLine(
+ from_task_id="main_task",
+ to_task_id="cleanup_task",
+ dependency_type=DependencyType.COMPLETION_ONLY,
+ condition_description="Cleanup runs regardless of main task outcome"
+)
+```
+
+---
+
+### 4. Conditional (`CONDITIONAL`)
+
+Task $t_j$ proceeds based on a **user-defined condition** evaluated on $t_i$'s result.
+
+```mermaid
+graph LR
+ A[Task A] -->|CONDITIONAL| B{Evaluate}
+ B -->|True| C[Task B runs]
+ B -->|False| D[Task B skipped]
+ style A fill:#90EE90
+ style B fill:#FFD700
+ style C fill:#87CEEB
+ style D fill:#FFB6C1
+```
+
+**Use Cases:**
+- Error handling branches
+- Result-based routing
+- Performance-based optimization
+
+**Example:**
+```python
+# Define custom condition evaluator
+def check_coverage_threshold(result):
+    """Run next task only if test coverage > 80%"""
+    if result and isinstance(result, dict):
+        coverage = result.get("coverage_percent", 0)
+        return coverage > 80
+    return False
+
+# Create conditional dependency
+dep = TaskStarLine.create_conditional(
+ from_task_id="test_task",
+ to_task_id="quality_gate_task",
+ condition_description="Proceed if test coverage > 80%",
+ condition_evaluator=check_coverage_threshold
+)
+```
+
+**Note:** If no `condition_evaluator` is provided for a CONDITIONAL dependency, it defaults to SUCCESS_ONLY behavior (checks if result is not `None`).
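+
+The four dependency types can be summarised as one decision function. The sketch below follows the behaviour described in this section (a non-`None` result counts as success, a CONDITIONAL dependency without an evaluator falls back to the success check, and an evaluator that raises counts as not satisfied); `DepType` and `evaluate` are hypothetical names rather than the `TaskStarLine` API.
+
+```python
+from enum import Enum
+from typing import Any, Callable, Optional
+
+
+class DepType(Enum):
+    UNCONDITIONAL = "unconditional"
+    SUCCESS_ONLY = "success_only"
+    COMPLETION_ONLY = "completion_only"
+    CONDITIONAL = "conditional"
+
+
+def evaluate(dep_type: DepType, result: Any,
+             evaluator: Optional[Callable[[Any], bool]] = None) -> bool:
+    """Decide whether a dependent task may run once its prerequisite has finished."""
+    if dep_type in (DepType.UNCONDITIONAL, DepType.COMPLETION_ONLY):
+        return True  # the outcome of the prerequisite is irrelevant
+    if dep_type is DepType.SUCCESS_ONLY or evaluator is None:
+        return result is not None  # CONDITIONAL without an evaluator lands here
+    try:
+        return bool(evaluator(result))
+    except Exception:
+        return False  # an evaluator that raises blocks the dependent task
+
+
+print(evaluate(DepType.SUCCESS_ONLY, None))
+print(evaluate(DepType.CONDITIONAL, {"coverage_percent": 85},
+               lambda r: r["coverage_percent"] > 80))
+```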
+
+---
+
+## Dependency Lifecycle
+
+```mermaid
+stateDiagram-v2
+ [*] --> Created: Initialize
+ Created --> Waiting: Prerequisite running
+ Waiting --> Evaluating: Prerequisite completes
+ Evaluating --> Satisfied: Condition met
+ Evaluating --> Unsatisfied: Condition not met
+ Satisfied --> [*]: Dependent can run
+ Unsatisfied --> [*]: Dependent blocked
+```
+
+---
+
+## Usage Examples
+
+### Creating Dependencies
+
+```python
+from galaxy.constellation import TaskStarLine
+from galaxy.constellation.enums import DependencyType
+
+# 1. Unconditional dependency
+dep1 = TaskStarLine.create_unconditional(
+ from_task_id="checkout_code",
+ to_task_id="build_project",
+ description="Build after checkout"
+)
+
+# 2. Success-only dependency
+dep2 = TaskStarLine.create_success_only(
+ from_task_id="build_project",
+ to_task_id="deploy_staging",
+ description="Deploy only if build succeeds"
+)
+
+# 3. Conditional dependency with custom logic
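+def check_test_results(result):
+    # Require an explicit total so that missing data never counts as a pass
+    if not isinstance(result, dict) or "total_tests" not in result:
+        return False
+    return result.get("tests_passed", 0) == result["total_tests"]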
+
+dep3 = TaskStarLine.create_conditional(
+ from_task_id="run_tests",
+ to_task_id="deploy_production",
+ condition_description="Deploy to production only if all tests pass",
+ condition_evaluator=check_test_results
+)
+
+# 4. Manual construction
+dep4 = TaskStarLine(
+ from_task_id="task_a",
+ to_task_id="task_b",
+ dependency_type=DependencyType.COMPLETION_ONLY,
+ condition_description="Task B runs after Task A completes",
+ metadata={"priority": "high", "category": "cleanup"}
+)
+```
+
+---
+
+## Core Operations
+
+### Condition Evaluation
+
+```python
+# Evaluate condition with prerequisite result
+prerequisite_result = {
+ "status": "success",
+ "coverage_percent": 85,
+ "tests_passed": 120,
+ "total_tests": 120
+}
+
+is_satisfied = dep.evaluate_condition(prerequisite_result)
+
+if is_satisfied:
+ print("✅ Dependency satisfied, dependent task can run")
+ print(f"Evaluated at: {dep.last_evaluation_time}")
+else:
+ print("❌ Dependency not satisfied, dependent task blocked")
+
+# Check evaluation history
+print(f"Last result: {dep.last_evaluation_result}")
+```
+
+### Manual Satisfaction Control
+
+```python
+# Manually mark dependency as satisfied (override)
+dep.mark_satisfied()
+
+# Reset satisfaction status
+dep.reset_satisfaction()
+
+# Check satisfaction
+if dep.is_satisfied():
+ print("Dependency is satisfied")
+```
+
+---
+
+## State Queries
+
+### Checking Dependency State
+
+```python
+# Method 1: Check using completed tasks list (for IDependency interface)
+# Returns True if from_task_id is in the completed_tasks list
+completed_tasks = ["task_a", "task_b", "task_c"]
+if dep.is_satisfied(completed_tasks):
+ print("Prerequisite task is completed")
+
+# Method 2: Check internal satisfaction state (without parameter)
+# Returns the internal _is_satisfied flag set by evaluate_condition
+if dep.is_satisfied():
+ print("Dependency condition is satisfied")
+
+# Get last evaluation details
+print(f"Last evaluated: {dep.last_evaluation_time}")
+print(f"Result: {dep.last_evaluation_result}")
+
+# Access metadata
+print(f"Metadata: {dep.metadata}")
+```
+
+---
+
+## Modification
+
+### Updating Dependency Properties
+
+```python
+# Change dependency type
+dep.dependency_type = DependencyType.SUCCESS_ONLY
+
+# Update condition description
+dep.condition_description = "Updated: Deploy only after successful validation"
+
+# Set new condition evaluator
+def new_evaluator(result):
+ return result.get("validation_score", 0) > 0.95
+
+dep.set_condition_evaluator(new_evaluator)
+
+# Update metadata
+dep.update_metadata({
+ "updated_by": "admin",
+ "reason": "Stricter validation threshold"
+})
+```
+
+!!! warning "Modification During Execution"
+    Changing `dependency_type` or `condition_evaluator` resets the satisfaction status. Be cautious when modifying dependencies during active constellation execution.
+
+---
+
+## Serialization
+
+### JSON Export/Import
+
+```python
+# Export to JSON
+json_string = dep.to_json()
+print(json_string)
+
+# Save to file
+dep.to_json(save_path="dependency_backup.json")
+
+# Load from JSON string
+restored_dep = TaskStarLine.from_json(json_data=json_string)
+
+# Load from file
+loaded_dep = TaskStarLine.from_json(file_path="dependency_backup.json")
+```
+
+### Dictionary Conversion
+
+```python
+# Convert to dictionary
+dep_dict = dep.to_dict()
+
+# Create from dictionary
+new_dep = TaskStarLine.from_dict(dep_dict)
+
+# Dictionary structure
+print(dep_dict)
+# {
+# "line_id": "uuid-string",
+# "from_task_id": "task_a",
+# "to_task_id": "task_b",
+# "dependency_type": "success_only",
+# "condition_description": "...",
+# "metadata": {...},
+# "is_satisfied": False,
+# "last_evaluation_result": None,
+# "created_at": "2025-11-06T...",
+# "updated_at": "2025-11-06T..."
+# }
+```
+
+### Pydantic Schema Conversion
+
+```python
+# Convert to Pydantic BaseModel
+schema = dep.to_basemodel()
+
+# Create from Pydantic schema
+dep_from_schema = TaskStarLine.from_basemodel(schema)
+```
+
+---
+
+## Integration with Constellation
+
+### Adding to Constellation
+
+```python
+from galaxy.constellation import TaskConstellation
+
+constellation = TaskConstellation(name="my_workflow")
+
+# Add tasks first
+constellation.add_task(task_a)
+constellation.add_task(task_b)
+
+# Add dependency
+try:
+ constellation.add_dependency(dep)
+ print("✅ Dependency added successfully")
+except ValueError as e:
+ print(f"❌ Failed to add dependency: {e}")
+```
+
+### Dependency Validation
+
+```python
+# TaskConstellation validates dependencies automatically
+try:
+ # This would fail if it creates a cycle
+ constellation.add_dependency(cyclic_dep)
+except ValueError as e:
+ print(f"Validation error: {e}")
+ # Output: "Adding dependency would create a cycle"
+
+# Check DAG validity
+is_valid, errors = constellation.validate_dag()
+if not is_valid:
+    for error in errors:
+        print(f"❌ {error}")
+```
+
+---
+
+## Advanced Patterns
+
+### Conditional Error Handling
+
+```python
+# Main task
+main_task = TaskStar(
+ task_id="main_process",
+ description="Process data"
+)
+
+# Success path
+success_task = TaskStar(
+ task_id="success_notification",
+ description="Send success notification"
+)
+
+# Error path
+error_task = TaskStar(
+ task_id="error_recovery",
+ description="Attempt recovery"
+)
+
+# Success-only dependency
+success_dep = TaskStarLine.create_success_only(
+ from_task_id="main_process",
+ to_task_id="success_notification"
+)
+
+# Failure-only dependency (using conditional)
+def on_failure(result):
+ return result is None # Task failed if result is None
+
+failure_dep = TaskStarLine.create_conditional(
+ from_task_id="main_process",
+ to_task_id="error_recovery",
+ condition_description="Run recovery if main task fails",
+ condition_evaluator=on_failure
+)
+```
+
+### Performance-Based Routing
+
+```python
+# Route to different processing paths based on data size
+def route_large_dataset(result):
+ data_size = result.get("row_count", 0)
+ return data_size > 1_000_000 # Route to GPU if > 1M rows
+
+# Route to GPU for large datasets
+gpu_dep = TaskStarLine.create_conditional(
+ from_task_id="analyze_dataset",
+ to_task_id="process_on_gpu",
+ condition_description="Use GPU for datasets > 1M rows",
+ condition_evaluator=route_large_dataset
+)
+
+# Route to CPU for small datasets
+def route_small_dataset(result):
+ data_size = result.get("row_count", 0)
+ return data_size <= 1_000_000
+
+cpu_dep = TaskStarLine.create_conditional(
+ from_task_id="analyze_dataset",
+ to_task_id="process_on_cpu",
+ condition_description="Use CPU for datasets <= 1M rows",
+ condition_evaluator=route_small_dataset
+)
+```
+
+---
+
+## Error Handling
+
+### Validation
+
+```python
+# A self-loop is rejected when the dependency is added to a constellation
+try:
+    invalid_dep = TaskStarLine(
+        from_task_id="task_a",
+        to_task_id="task_a",  # Self-loop!
+        dependency_type=DependencyType.UNCONDITIONAL
+    )
+    constellation.add_dependency(invalid_dep)
+except ValueError as e:
+    print(f"Validation error: {e}")
+    # TaskConstellation detects the cycle
+```
+
+### Evaluation Errors
+
+```python
+def risky_evaluator(result):
+ # This might raise an exception
+ return result["complex_calculation"] / result["divisor"]
+
+dep = TaskStarLine.create_conditional(
+ from_task_id="task_a",
+ to_task_id="task_b",
+ condition_description="Conditional with potential error",
+ condition_evaluator=risky_evaluator
+)
+
+# evaluate_condition catches exceptions and returns False
+result = {"complex_calculation": 100} # Missing "divisor"
+is_satisfied = dep.evaluate_condition(result)
+print(is_satisfied) # False (evaluator raised KeyError, caught internally)
+print(dep.last_evaluation_result) # False
+```
+
+---
+
+## Example Workflows
+
+### Build Pipeline
+
+```python
+# checkout → build → test → deploy
+checkout = TaskStar(task_id="checkout", description="Checkout code")
+build = TaskStar(task_id="build", description="Build project")
+test = TaskStar(task_id="test", description="Run tests")
+deploy = TaskStar(task_id="deploy", description="Deploy to production")
+
+# Sequential success-only dependencies
+dep1 = TaskStarLine.create_success_only("checkout", "build")
+dep2 = TaskStarLine.create_success_only("build", "test")
+dep3 = TaskStarLine.create_success_only("test", "deploy")
+```
+
+### Fan-Out Pattern
+
+```python
+# analyze → [process_gpu, process_cpu, process_edge]
+analyze = TaskStar(task_id="analyze", description="Analyze data")
+process_gpu = TaskStar(task_id="gpu", description="Process on GPU")
+process_cpu = TaskStar(task_id="cpu", description="Process on CPU")
+process_edge = TaskStar(task_id="edge", description="Process on edge device")
+
+# All three can start after analyze completes
+dep1 = TaskStarLine.create_unconditional("analyze", "gpu")
+dep2 = TaskStarLine.create_unconditional("analyze", "cpu")
+dep3 = TaskStarLine.create_unconditional("analyze", "edge")
+```
+
+### Fan-In Pattern
+
+```python
+# [task_a, task_b, task_c] → aggregate
+task_a = TaskStar(task_id="task_a", description="Process batch A")
+task_b = TaskStar(task_id="task_b", description="Process batch B")
+task_c = TaskStar(task_id="task_c", description="Process batch C")
+aggregate = TaskStar(task_id="aggregate", description="Aggregate results")
+
+# Aggregate waits for all three to complete successfully
+dep1 = TaskStarLine.create_success_only("task_a", "aggregate")
+dep2 = TaskStarLine.create_success_only("task_b", "aggregate")
+dep3 = TaskStarLine.create_success_only("task_c", "aggregate")
+```
+
+---
+
+## Best Practices
+
+### Dependency Design Guidelines
+
+1. **Use the right type**: Choose the dependency type that matches your workflow logic
+2. **Keep conditions simple**: Condition evaluators should be fast and deterministic
+3. **Handle evaluator errors**: Avoid raising from evaluators; exceptions are caught and logged internally, and the condition then evaluates to `False`
+4. **Document conditions**: Use clear `condition_description` for debugging
+5. **Avoid cycles**: TaskConstellation rejects cyclic dependencies, but design the graph so that they are never attempted
+
+### Good vs. Bad Condition Evaluators
+
+✅ **Good**: Simple, fast, defensive
+
+```python
+def check_success(result):
+ return result is not None and result.get("status") == "success"
+```
+
+❌ **Bad**: Complex, slow, error-prone
+
+```python
+def check_success(result):
+ # Slow database query
+ db_status = query_database(result["task_id"])
+ # Complex logic with potential errors
+ return eval(result["complex_expression"]) and db_status
+```
+
+!!! warning "Common Pitfalls"
+    - **Cyclic dependencies**: Always validate DAG before execution
+    - **Missing tasks**: Ensure both `from_task_id` and `to_task_id` exist in constellation
+    - **Stateful evaluators**: Avoid evaluators that depend on external state
+    - **Slow evaluators**: Keep evaluation fast; avoid I/O or expensive computation
+
+---
+
+## Related Components
+
+- **[TaskStar](task_star.md)** — Atomic execution units that TaskStarLines connect
+- **[TaskConstellation](task_constellation.md)** — DAG manager that validates and executes dependencies
+- **[ConstellationEditor](constellation_editor.md)** — Safe dependency editing with undo/redo
+- **[Overview](overview.md)** — Task Constellation framework overview
+
+---
+
+## API Reference
+
+### Constructor
+
+```python
+TaskStarLine(
+ from_task_id: str,
+ to_task_id: str,
+ dependency_type: DependencyType = DependencyType.UNCONDITIONAL,
+ condition_description: Optional[str] = None,
+ condition_evaluator: Optional[Callable[[Any], bool]] = None,
+ line_id: Optional[str] = None,
+ metadata: Optional[Dict[str, Any]] = None
+)
+```
+
+### Factory Methods
+
+| Method | Description |
+|--------|-------------|
+| `create_unconditional(from_id, to_id, desc)` | Create unconditional dependency (classmethod) |
+| `create_success_only(from_id, to_id, desc)` | Create success-only dependency (classmethod) |
+| `create_conditional(from_id, to_id, desc, evaluator)` | Create conditional dependency (classmethod) |
+
+### Key Methods
+
+| Method | Description |
+|--------|-------------|
+| `evaluate_condition(result)` | Evaluate if condition is satisfied (returns `bool`) |
+| `mark_satisfied()` | Manually mark as satisfied |
+| `reset_satisfaction()` | Reset satisfaction status |
+| `is_satisfied(completed_tasks=None)` | Check whether the dependency is satisfied (returns `bool`). With `completed_tasks`, checks whether the source task has completed; without it, checks the internal satisfaction state |
+| `set_condition_evaluator(evaluator)` | Set new condition evaluator |
+| `update_metadata(metadata)` | Update metadata |
+| `to_dict()` | Convert to dictionary |
+| `to_json(save_path)` | Export to JSON |
+| `from_dict(data)` | Create from dictionary (classmethod) |
+| `from_json(json_data, file_path)` | Create from JSON (classmethod) |
+| `to_basemodel()` | Convert to Pydantic BaseModel schema |
+| `from_basemodel(schema)` | Create from Pydantic schema (classmethod) |
+
+---
+
+*TaskStarLine — Connecting tasks with intelligent dependency logic*
diff --git a/documents/docs/galaxy/constellation_agent/command.md b/documents/docs/galaxy/constellation_agent/command.md
new file mode 100644
index 000000000..277798202
--- /dev/null
+++ b/documents/docs/galaxy/constellation_agent/command.md
@@ -0,0 +1,765 @@
+# Constellation MCP Server — Structured Task Management
+
+## Overview
+
+The **Constellation MCP Server** provides a standardized, idempotent interface for manipulating Task Constellations. Through Model Context Protocol (MCP), it exposes task and dependency management primitives that bridge LLM-level reasoning and concrete execution state, ensuring reproducibility and auditability.
+
+The Constellation MCP Server is a lightweight component that operationalizes dynamic graph construction for the Constellation Agent. It serves as the **structured manipulation layer** between LLM reasoning and the Task Constellation data structure.
+
+### Design Principles
+
+| Principle | Description |
+|-----------|-------------|
+| **Idempotency** | Each operation can be safely retried without side effects |
+| **Atomicity** | Single operation per tool call with clear success/failure |
+| **Consistency** | Returns globally valid constellation snapshots after each operation |
+| **Auditability** | All operations are logged and traceable |
+| **Type Safety** | Pydantic schema validation for all inputs/outputs |
+
+### Architecture
+
+```mermaid
+graph TB
+ subgraph "Constellation Agent"
+ Agent[Agent Logic]
+ Prompter[Prompter]
+ end
+
+ subgraph "MCP Server"
+ MCP[FastMCP Server]
+ Editor[ConstellationEditor]
+ Constellation[TaskConstellation]
+ end
+
+ Agent --> Prompter
+ Prompter -->|Tool Descriptions| Agent
+ Agent -->|Execute Command| MCP
+ MCP --> Editor
+ Editor --> Constellation
+ Constellation -->|JSON Response| MCP
+ MCP -->|Updated State| Agent
+
+ style MCP fill:#e1f5ff
+ style Editor fill:#fff4e1
+ style Constellation fill:#e8f5e9
+```
+
+---
+
+## 🛠️ Core Tools
+
+The MCP server exposes **7 core tools** organized into three categories:
+
+### Tool Categories
+
+```mermaid
+mindmap
+  root((MCP Tools))
+    Task Management
+      add_task
+      remove_task
+      update_task
+    Dependency Management
+      add_dependency
+      remove_dependency
+      update_dependency
+    Bulk Operations
+      build_constellation
+```
+
+---
+
+## 📦 Task Management Tools
+
+### add_task
+
+Add a new atomic task (TaskStar) to the constellation.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `task_id` | `str` | ✅ Yes | Unique identifier for the task (e.g., `"open_browser"`, `"login_system"`) |
+| `name` | `str` | ✅ Yes | Human-readable name (e.g., `"Open Browser"`, `"Login to System"`) |
+| `description` | `str` | ✅ Yes | Detailed task specification including steps and expected outcomes |
+| `target_device_id` | `str` | ❌ No (default: `None`) | Device where task executes (e.g., `"DESKTOP-ABC123"`, `"iPhone-001"`) |
+| `tips` | `List[str]` | ❌ No (default: `None`) | Critical hints for successful execution |
+
+#### Return Value
+
+```json
+{
+ "type": "string",
+ "description": "JSON string of complete updated TaskConstellation after adding task"
+}
+```
+
+#### Example Usage
+
+```python
+import json
+
+# Add a task to download data
+result = await mcp_client.call_tool(
+ tool_name="add_task",
+ parameters={
+ "task_id": "download_dataset",
+ "name": "Download MNIST Dataset",
+ "description": "Download MNIST dataset from official source, verify checksums, extract to data/ directory",
+ "target_device_id": "laptop_001",
+ "tips": [
+ "Ensure stable internet connection",
+ "Verify disk space > 500MB",
+ "Resume download if interrupted"
+ ]
+ }
+)
+
+# Returns complete constellation JSON
+constellation = json.loads(result)
+```
+
+#### Validation
+
+- **Unique task_id**: Must not conflict with existing tasks
+- **Auto-timestamps**: `created_at` and `updated_at` are automatically set
+- **Default values**: `status=PENDING`, `priority=MEDIUM` if not specified
+
+**Task ID Naming Best Practice**: Use descriptive, action-oriented identifiers:
+
+✅ Good: `"fetch_user_data"`, `"train_model"`, `"send_notification"`
+❌ Avoid: `"task1"`, `"t"`, `"temp"`
+
+---
+
+### remove_task
+
+Remove a task and all associated dependencies from the constellation.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `task_id` | `str` | ✅ Yes | Unique identifier of task to remove |
+
+#### Return Value
+
+```json
+{
+ "type": "string",
+ "description": "JSON string of complete updated TaskConstellation after removing task"
+}
+```
+
+#### Example Usage
+
+```python
+# Remove a task
+result = await mcp_client.call_tool(
+ tool_name="remove_task",
+ parameters={"task_id": "download_dataset"}
+)
+
+# Returns updated constellation without the task
+constellation = json.loads(result)
+```
+
+#### Side Effects
+
+**Cascade Deletion**: Removing a task automatically removes:
+
+- All **incoming dependencies** (edges pointing to this task)
+- All **outgoing dependencies** (edges from this task)
+
+This maintains DAG integrity by preventing dangling references.
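+
+The cascade can be pictured with a minimal, self-contained sketch. The `tasks` and `dependencies` dictionaries and the `remove_task_cascade` helper are hypothetical stand-ins chosen for illustration; they are not the ConstellationEditor's actual data structures.
+
+```python
+from typing import Dict, Tuple
+
+
+def remove_task_cascade(
+    tasks: Dict[str, str],
+    dependencies: Dict[str, Tuple[str, str]],
+    task_id: str,
+) -> None:
+    """Remove a task and every dependency that touches it (in place)."""
+    if task_id not in tasks:
+        raise KeyError(f"Task not found: {task_id}")
+    del tasks[task_id]
+    # Collect first, then delete, so the dict is not mutated while iterating
+    dangling = [
+        dep_id
+        for dep_id, (src, dst) in dependencies.items()
+        if task_id in (src, dst)
+    ]
+    for dep_id in dangling:
+        del dependencies[dep_id]
+
+
+tasks = {"download": "Download data", "process": "Process data", "train": "Train"}
+dependencies = {
+    "download->process": ("download", "process"),
+    "process->train": ("process", "train"),
+}
+remove_task_cascade(tasks, dependencies, "process")
+```
+
+After the call, both edges that referenced `process` are gone, so no dependency points at a task that no longer exists.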
+
+#### Validation
+
+- **Task exists**: `task_id` must exist in constellation
+- **Modifiable status**: Task must not be in `RUNNING`, `COMPLETED`, or `FAILED` states
+
+---
+
+### update_task
+
+Modify specific fields of an existing task.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `task_id` | `str` | ✅ Yes | Task identifier |
+| `name` | `str` | ❌ No (default: `None`) | New human-readable name |
+| `description` | `str` | ❌ No (default: `None`) | New detailed description |
+| `target_device_id` | `str` | ❌ No (default: `None`) | New target device |
+| `tips` | `List[str]` | ❌ No (default: `None`) | New tips list |
+
+#### Return Value
+
+```json
+{
+ "type": "string",
+ "description": "JSON string of complete updated TaskConstellation after updating task"
+}
+```
+
+#### Example Usage
+
+```python
+# Update task device assignment
+result = await mcp_client.call_tool(
+ tool_name="update_task",
+ parameters={
+ "task_id": "train_model",
+ "target_device_id": "gpu_server_002", # Switch to different GPU
+ "tips": [
+ "Use mixed precision training",
+ "Monitor GPU memory usage",
+ "Save checkpoints every 1000 steps"
+ ]
+ }
+)
+```
+
+#### Partial Updates
+
+Only provided fields are modified — others remain unchanged:
+
+```python
+# Update only description
+result = await mcp_client.call_tool(
+ tool_name="update_task",
+ parameters={
+ "task_id": "process_data",
+ "description": "Process data with enhanced validation and error handling"
+ # name, target_device_id, tips remain unchanged
+ }
+)
+```
+
+#### Validation
+
+- **At least one field**: Must provide at least one field to update
+- **Modifiable status**: Task must be in modifiable state
+- **Auto-update timestamp**: `updated_at` is automatically refreshed
+
+---
+
+## 🔗 Dependency Management Tools
+
+### add_dependency
+
+Create a dependency relationship (TaskStarLine) between two tasks.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `dependency_id` | `str` | ✅ Yes | Unique line identifier (e.g., `"task_a->task_b"`, `"line_001"`) |
+| `from_task_id` | `str` | ✅ Yes | Source/prerequisite task that must complete first |
+| `to_task_id` | `str` | ✅ Yes | Target/dependent task that waits for source |
+| `condition_description` | `str` | ❌ No (default: `None`) | Human-readable explanation of dependency logic |
+
+#### Return Value
+
+```json
+{
+ "type": "string",
+ "description": "JSON string of complete updated TaskConstellation after adding dependency"
+}
+```
+
+#### Example Usage
+
+```python
+# Add unconditional dependency
+result = await mcp_client.call_tool(
+ tool_name="add_dependency",
+ parameters={
+ "dependency_id": "download->process",
+ "from_task_id": "download_dataset",
+ "to_task_id": "process_data",
+ "condition_description": "Processing requires dataset to be fully downloaded and verified"
+ }
+)
+```
+
+#### Dependency Types
+
+Currently defaults to **UNCONDITIONAL** dependency:
+
+```python
+{
+ "dependency_type": "unconditional" # Always wait for source to complete
+}
+```
+
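+Future extensions may expose other dependency types:
+
+- `SUCCESS_ONLY`: Run the target only if the source succeeds
+- `CONDITIONAL`: Run the target only if a custom condition evaluates to true
+- `COMPLETION_ONLY`: Run the target once the source finishes, regardless of success or failure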
+
+#### Validation
+
+- **Both tasks exist**: `from_task_id` and `to_task_id` must exist in constellation
+- **No cycles**: Adding dependency cannot create cycles in the DAG
+- **Unique dependency_id**: `dependency_id` must not conflict with an existing dependency (line) identifier
+- **No self-loops**: `from_task_id != to_task_id`
+
+**Cycle Detection**: The server validates DAG acyclicity after adding each dependency:
+
+```
+A → B → C
+        ↓
+        A   ❌ Creates cycle!
+```
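+
+A depth-first reachability check is one way to express this rule. The sketch below is an assumption-laden illustration rather than the server's code: `would_create_cycle` and the adjacency-list layout are hypothetical, and the check runs before the edge is added instead of afterwards.
+
+```python
+from typing import Dict, List, Set
+
+
+def would_create_cycle(adjacency: Dict[str, List[str]], src: str, dst: str) -> bool:
+    """Return True if adding the edge src -> dst would close a cycle."""
+    if src == dst:
+        return True  # self-loop
+    # The new edge closes a cycle exactly when src is already reachable from dst
+    stack: List[str] = [dst]
+    seen: Set[str] = set()
+    while stack:
+        node = stack.pop()
+        if node == src:
+            return True
+        if node in seen:
+            continue
+        seen.add(node)
+        stack.extend(adjacency.get(node, []))
+    return False
+
+
+graph = {"A": ["B"], "B": ["C"], "C": []}
+closes_cycle = would_create_cycle(graph, "C", "A")   # the C -> A edge from the diagram
+stays_acyclic = would_create_cycle(graph, "A", "C")  # a shortcut edge is harmless
+```
+
+Each node is expanded at most once, so a single check costs time linear in the number of tasks plus dependencies.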
+
+---
+
+### remove_dependency
+
+Remove a specific dependency relationship.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `dependency_id` | `str` | ✅ Yes | Line identifier to remove |
+
+#### Return Value
+
+```json
+{
+ "type": "string",
+ "description": "JSON string of complete updated TaskConstellation after removing dependency"
+}
+```
+
+#### Example Usage
+
+```python
+# Remove a dependency
+result = await mcp_client.call_tool(
+ tool_name="remove_dependency",
+ parameters={"dependency_id": "download->process"}
+)
+
+# Now process_data can run independently of download_dataset
+```
+
+#### Side Effects
+
+- Removing dependency does **NOT** affect the tasks themselves
+- Target task may become immediately ready if no other dependencies remain
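+
+To illustrate the second point, the sketch below computes which tasks are ready before and after a dependency is removed. The `ready_tasks` helper and the tuple-based dependency map are hypothetical, chosen only to keep the example self-contained.
+
+```python
+from typing import Dict, List, Set, Tuple
+
+
+def ready_tasks(
+    task_ids: Set[str],
+    dependencies: Dict[str, Tuple[str, str]],
+    completed: Set[str],
+) -> List[str]:
+    """Tasks that are not completed and whose prerequisites have all completed."""
+    blocked = {dst for src, dst in dependencies.values() if src not in completed}
+    return sorted(task_ids - completed - blocked)
+
+
+task_ids = {"download_dataset", "process_data"}
+dependencies = {"download->process": ("download_dataset", "process_data")}
+before = ready_tasks(task_ids, dependencies, completed=set())
+del dependencies["download->process"]  # conceptually what remove_dependency does
+after = ready_tasks(task_ids, dependencies, completed=set())
+```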
+
+---
+
+### update_dependency
+
+Modify the condition description of an existing dependency.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `dependency_id` | `str` | ✅ Yes | Line identifier |
+| `condition_description` | `str` | ✅ Yes | New explanation of dependency logic |
+
+#### Return Value
+
+```json
+{
+ "type": "string",
+ "description": "JSON string of complete updated TaskConstellation after updating dependency"
+}
+```
+
+#### Example Usage
+
+```python
+# Update dependency description
+result = await mcp_client.call_tool(
+ tool_name="update_dependency",
+ parameters={
+ "dependency_id": "train->evaluate",
+ "condition_description": "Evaluation requires model training to complete successfully with validation loss < 0.5"
+ }
+)
+```
+
+---
+
+## 🏗️ Bulk Operations
+
+### build_constellation
+
+Batch-create a complete constellation from structured configuration.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `config` | `TaskConstellationSchema` | ✅ Yes | Constellation configuration with tasks and dependencies |
+| `clear_existing` | `bool` | ❌ No (default: `True`) | Clear existing constellation before building |
+
+#### Configuration Schema
+
+```text
+{
+ "tasks": [
+ {
+ "task_id": "string (required)",
+ "name": "string (optional)",
+ "description": "string (required)",
+ "target_device_id": "string (optional)",
+ "tips": ["string", ...] (optional),
+ "priority": int (1-4, optional),
+ "status": "string (optional)",
+ "task_data": dict (optional)
+ }
+ ],
+ "dependencies": [
+ {
+ "from_task_id": "string (required)",
+ "to_task_id": "string (required)",
+ "dependency_type": "string (optional)",
+ "condition_description": "string (optional)"
+ }
+ ],
+ "metadata": dict (optional)
+}
+```
+
+#### Return Value
+
+```json
+{
+ "type": "string",
+ "description": "JSON string of built TaskConstellation with all tasks, dependencies, and metadata"
+}
+```
+
+#### Example Usage
+
+```python
+# Build complete ML training pipeline
+config = {
+ "tasks": [
+ {
+ "task_id": "fetch_data",
+ "name": "Fetch Training Data",
+ "description": "Download CIFAR-10 dataset from S3",
+ "target_device_id": "laptop_001"
+ },
+ {
+ "task_id": "preprocess",
+ "name": "Preprocess Data",
+ "description": "Normalize images, augment with rotations",
+ "target_device_id": "server_001"
+ },
+ {
+ "task_id": "train",
+ "name": "Train Model",
+ "description": "Train ResNet-50 for 100 epochs",
+ "target_device_id": "gpu_server_001",
+ "tips": ["Use mixed precision", "Save checkpoints every 10 epochs"]
+ },
+ {
+ "task_id": "evaluate",
+ "name": "Evaluate Model",
+ "description": "Run inference on test set, compute metrics",
+ "target_device_id": "test_server_001"
+ }
+ ],
+ "dependencies": [
+ {
+ "from_task_id": "fetch_data",
+ "to_task_id": "preprocess",
+ "condition_description": "Preprocessing requires raw data"
+ },
+ {
+ "from_task_id": "preprocess",
+ "to_task_id": "train",
+ "condition_description": "Training requires preprocessed data"
+ },
+ {
+ "from_task_id": "train",
+ "to_task_id": "evaluate",
+ "condition_description": "Evaluation requires trained model"
+ }
+ ],
+ "metadata": {
+ "project": "image_classification",
+ "version": "1.0"
+ }
+}
+
+result = await mcp_client.call_tool(
+ tool_name="build_constellation",
+ parameters={
+ "config": config,
+ "clear_existing": True
+ }
+)
+```
+
+#### Execution Order
+
+1. **Clear existing** (if `clear_existing=True`)
+2. **Create all tasks** sequentially
+3. **Create all dependencies** sequentially
+4. **Validate DAG** structure (acyclicity, task references)
+5. **Return constellation** snapshot
+
+#### Validation
+
+- **Task references**: All `from_task_id` and `to_task_id` in dependencies must exist in tasks
+- **DAG acyclicity**: Final graph must have no cycles
+- **Schema compliance**: Pydantic validation ensures type correctness
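+
+The first two checks can be sketched on a plain dictionary config. This is an illustrative approximation under stated assumptions: `validate_config` is a hypothetical helper, it uses Kahn's algorithm for the acyclicity check, and the real server relies on `TaskConstellationSchema` and its own DAG validation.
+
+```python
+from collections import deque
+from typing import Any, Dict, List
+
+
+def validate_config(config: Dict[str, Any]) -> List[str]:
+    """Return a list of problems found in a constellation config (empty if valid)."""
+    task_ids = {task["task_id"] for task in config.get("tasks", [])}
+    deps = config.get("dependencies", [])
+    errors: List[str] = []
+
+    # 1. Every dependency endpoint must reference a declared task
+    for dep in deps:
+        for key in ("from_task_id", "to_task_id"):
+            if dep[key] not in task_ids:
+                errors.append(f"Unknown task in {key}: {dep[key]}")
+    if errors:
+        return errors
+
+    # 2. Kahn's algorithm: a DAG can be fully consumed in topological order
+    indegree = {task_id: 0 for task_id in task_ids}
+    children: Dict[str, List[str]] = {task_id: [] for task_id in task_ids}
+    for dep in deps:
+        children[dep["from_task_id"]].append(dep["to_task_id"])
+        indegree[dep["to_task_id"]] += 1
+    queue = deque(t for t, degree in indegree.items() if degree == 0)
+    visited = 0
+    while queue:
+        node = queue.popleft()
+        visited += 1
+        for child in children[node]:
+            indegree[child] -= 1
+            if indegree[child] == 0:
+                queue.append(child)
+    if visited != len(task_ids):
+        errors.append("Dependency graph contains a cycle")
+    return errors
+
+
+valid = {
+    "tasks": [{"task_id": "fetch_data"}, {"task_id": "preprocess"}],
+    "dependencies": [{"from_task_id": "fetch_data", "to_task_id": "preprocess"}],
+}
+cyclic = {
+    "tasks": [{"task_id": "a"}, {"task_id": "b"}],
+    "dependencies": [
+        {"from_task_id": "a", "to_task_id": "b"},
+        {"from_task_id": "b", "to_task_id": "a"},
+    ],
+}
+valid_errors = validate_config(valid)
+cyclic_errors = validate_config(cyclic)
+```
+
+Running a check like this on the agent side before calling `build_constellation` can surface malformed plans without a round trip to the server.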
+
+**Creation Mode Usage**: In creation mode, the Constellation Agent uses `build_constellation` to generate the initial constellation in a single operation, which is more efficient than incremental `add_task` calls.
+
+---
+
+## 📊 Tool Comparison Table
+
+| Tool | Category | Granularity | Creates | Modifies | Deletes | Returns |
+|------|----------|-------------|---------|----------|---------|---------|
+| `add_task` | Task | Single | ✅ Task | ❌ | ❌ | Full constellation |
+| `remove_task` | Task | Single | ❌ | ❌ | ✅ Task + deps | Full constellation |
+| `update_task` | Task | Single | ❌ | ✅ Task | ❌ | Full constellation |
+| `add_dependency` | Dependency | Single | ✅ Dependency | ❌ | ❌ | Full constellation |
+| `remove_dependency` | Dependency | Single | ❌ | ❌ | ✅ Dependency | Full constellation |
+| `update_dependency` | Dependency | Single | ❌ | ✅ Dependency | ❌ | Full constellation |
+| `build_constellation` | Bulk | Batch | ✅ Many | ✅ Full | ✅ All (if clear) | Full constellation |
+
+---
+
+## 🔄 Usage Patterns
+
+### Creation Mode Pattern
+
+```python
+# Agent creates initial constellation via build_constellation
+config = {
+ "tasks": [...],
+ "dependencies": [...]
+}
+
+constellation_json = await mcp_client.call_tool(
+ "build_constellation",
+ {"config": config, "clear_existing": True}
+)
+
+# Parse and start orchestration
+constellation = TaskConstellation.from_json(constellation_json)
+```
+
+### Editing Mode Pattern
+
+```python
+# Agent edits constellation incrementally based on events
+
+# Scenario: Training failed, add diagnostic task
+diagnostic_json = await mcp_client.call_tool(
+ "add_task",
+ {
+ "task_id": "diagnose_failure",
+ "name": "Diagnose Training Failure",
+ "description": "Check logs, GPU memory, data integrity",
+ "target_device_id": "gpu_server_001"
+ }
+)
+
+# Add dependency from failed task to diagnostic
+dep_json = await mcp_client.call_tool(
+ "add_dependency",
+ {
+ "dependency_id": "train->diagnose",
+ "from_task_id": "train",
+ "to_task_id": "diagnose_failure",
+ "condition_description": "Run diagnostics after training failure"
+ }
+)
+
+# Remove original deployment task (no longer needed)
+final_json = await mcp_client.call_tool(
+ "remove_task",
+ {"task_id": "deploy_model"}
+)
+```
+
+### Modification Constraints
+
+```python
+# Check if task is modifiable before editing
+modifiable_tasks = constellation.get_modifiable_tasks()
+modifiable_task_ids = {t.task_id for t in modifiable_tasks}
+
+if "train_model" in modifiable_task_ids:
+ # Safe to modify
+ await mcp_client.call_tool("update_task", {...})
+else:
+ # Task is RUNNING, COMPLETED, or FAILED - read-only
+ print("Task cannot be modified in current state")
+```
+
+---
+
+## 🛡️ Error Handling
+
+### Common Errors
+
+| Error | Cause | Solution |
+|-------|-------|----------|
+| `Task not found` | Invalid `task_id` | Verify task exists in constellation |
+| `Dependency creates cycle` | Adding edge violates DAG | Remove conflicting dependencies |
+| `Task not modifiable` | Task is running/completed | Wait or skip modification |
+| `Duplicate task_id` | ID already exists | Use unique identifier |
+| `Invalid device` | `target_device_id` not in registry | Choose from available devices |
+| `At least one field required` | Empty `update_task` call | Provide fields to update |
+
+### Exception Handling
+
+```python
+from fastmcp.exceptions import ToolError
+
+try:
+ result = await mcp_client.call_tool(
+ "add_dependency",
+ {
+ "dependency_id": "c->a",
+ "from_task_id": "task_c",
+ "to_task_id": "task_a"
+ }
+ )
+except ToolError as e:
+ print(f"Operation failed: {e}")
+ # Output: "Failed to add dependency: Adding edge would create cycle"
+```
+
+---
+
+## 📈 Performance Characteristics
+
+### Operation Complexity
+
+| Tool | Time Complexity | Space Complexity | Notes |
+|------|----------------|------------------|-------|
+| `add_task` | $O(1)$ | $O(1)$ | Constant time insertion |
+| `remove_task` | $O(e)$ | $O(1)$ | Must remove $e$ dependencies |
+| `update_task` | $O(1)$ | $O(1)$ | In-place field update |
+| `add_dependency` | $O(n + e)$ | $O(n)$ | Cycle detection via DFS |
+| `remove_dependency` | $O(1)$ | $O(1)$ | Direct deletion |
+| `update_dependency` | $O(1)$ | $O(1)$ | In-place update |
+| `build_constellation` | $O(n + e)$ | $O(n + e)$ | Full constellation rebuild |
+
+Where:
+
+- $n$ = number of tasks
+- $e$ = number of dependencies
+
+### Scalability
+
+| Metric | Typical | Maximum Tested |
+|--------|---------|----------------|
+| Tasks per constellation | 5-20 | 100+ |
+| Dependencies per constellation | 4-30 | 200+ |
+| build_constellation latency | 50-200ms | 1s |
+| add_task latency | 10-50ms | 100ms |
+| Constellation JSON size | 5-50 KB | 500 KB |
+
+---
+
+## 💡 Best Practices
+
+### Tool Selection
+
+**Creation Mode:** Use `build_constellation` for initial synthesis
+
+**Editing Mode:** Use granular tools (`add_task`, `update_task`, etc.)
+
+**Bulk Edits:** Accumulate changes and apply via `build_constellation` with `clear_existing=False`
+
+### Modification Safety
+
+Always check task/dependency modifiability before calling update/remove tools:
+
+```python
+modifiable_ids = {t.task_id for t in constellation.get_modifiable_tasks()}
+if "train_model" in modifiable_ids:
+    await mcp_client.call_tool("update_task", ...)
+```
+
+### Idempotent Operations
+
+Design agent logic to be idempotent:
+
+```python
+# Safe to retry: a repeated add_task fails with a ToolError instead of creating a duplicate
+try:
+    await mcp_client.call_tool("add_task", {...})
+except ToolError:
+    # Most likely the task already exists; note that this also hides other failures
+    pass
+```
+
+---
+
+## 🔗 Related Documentation
+
+- [Constellation Agent Overview](overview.md) — Architecture and weaving modes
+- [Constellation Agent State Machine](state.md) — FSM lifecycle and transitions
+- [Constellation Agent Strategy Pattern](strategy.md) — Processing strategies and prompters
+- [Constellation Editor MCP Server](../../mcp/servers/constellation_editor.md) — Detailed MCP server reference
+- [Task Constellation Overview](../constellation/overview.md) — DAG model and data structures
+- [Processor Framework](../../infrastructure/agents/design/processor.md) — Agent processing architecture
+
+---
+
+## 📋 API Reference
+
+### Tool Signatures
+
+```python
+# Task Management
+def add_task(
+ task_id: str,
+ name: str,
+ description: str,
+ target_device_id: Optional[str] = None,
+ tips: Optional[List[str]] = None
+) -> str # JSON string
+
+def remove_task(task_id: str) -> str # JSON string
+
+def update_task(
+ task_id: str,
+ name: Optional[str] = None,
+ description: Optional[str] = None,
+ target_device_id: Optional[str] = None,
+ tips: Optional[List[str]] = None
+) -> str # JSON string
+
+# Dependency Management
+def add_dependency(
+ dependency_id: str,
+ from_task_id: str,
+ to_task_id: str,
+ condition_description: Optional[str] = None
+) -> str # JSON string
+
+def remove_dependency(dependency_id: str) -> str # JSON string
+
+def update_dependency(
+ dependency_id: str,
+ condition_description: str
+) -> str # JSON string
+
+# Bulk Operations
+def build_constellation(
+ config: TaskConstellationSchema,
+ clear_existing: bool = True
+) -> str # JSON string
+```
+
+---
+
+**Constellation MCP Server — Structured, idempotent task manipulation for adaptive orchestration**
diff --git a/documents/docs/galaxy/constellation_agent/overview.md b/documents/docs/galaxy/constellation_agent/overview.md
new file mode 100644
index 000000000..d9a6431df
--- /dev/null
+++ b/documents/docs/galaxy/constellation_agent/overview.md
@@ -0,0 +1,545 @@
+# Constellation Agent — The Centralized Constellation Weaver
+
+The **Constellation Agent** serves as the central intelligence of UFO³ Galaxy, acting as both a planner and replanner. It interprets user intent, constructs executable Task Constellations, and dynamically steers their evolution across heterogeneous devices. By bridging high-level natural-language goals and concrete multi-agent execution, the Constellation Agent provides unified orchestration through a feedback-driven control loop.
+
+For an overview of the Galaxy system architecture, see [Galaxy Overview](../overview.md).
+
+## 🌟 Introduction
+
+
+**Figure:** An overview of the Constellation Agent showing the dual-mode control cycle between creation and editing phases.
+
+The Constellation Agent extends the abstract [Task Constellation](../constellation/overview.md) model into runtime execution. Residing within the **ConstellationClient** (see [Galaxy Client](../client/overview.md)), it transforms user requests into structured DAG workflows and continuously refines them as distributed agents provide feedback.
+
+Unlike traditional static DAG schedulers, the Constellation Agent operates as a **dynamic orchestrator** powered by an LLM-driven architecture and governed by a finite-state machine (FSM). This design enables it to alternate between two complementary operating modes:
+
+- **Creation Mode**: Synthesizes initial Task Constellations from user instructions
+- **Editing Mode**: Incrementally refines constellations based on runtime feedback
+
+This feedback-driven control loop achieves tight coupling between symbolic reasoning and distributed execution, maintaining global consistency while adapting to changing device conditions.
+
+## 🎯 Core Responsibilities
+
+The Constellation Agent orchestrates distributed workflows through structured feedback loops, alternating between creation and editing phases with explicit operational boundaries. For details on task execution, see [Constellation Orchestrator](../constellation_orchestrator/overview.md).
+
+### Primary Functions
+
+| Function | Description | Mode |
+|----------|-------------|------|
+| **Request Interpretation** | Parse user goals and context into actionable requirements | Creation |
+| **DAG Synthesis** | Decompose requests into structured Task Constellations with dependencies | Creation |
+| **Device Assignment** | Map tasks to appropriate devices based on AgentProfile capabilities | Creation |
+| **Runtime Monitoring** | Track task completion events and constellation state | Editing |
+| **Dynamic Adaptation** | Add, remove, or modify tasks/dependencies based on feedback | Editing |
+| **Consistency Maintenance** | Ensure DAG validity and execution correctness throughout lifecycle | Both |
+
+## 🏗️ Architecture
+
+### Dual-Mode Control System
+
+The Constellation Agent implements a **dual-mode control pattern** that separates planning from replanning:
+
+```mermaid
+graph LR
+ A[User Request] --> B[Creation Mode]
+ B --> C[Initial Constellation]
+ C --> D[Orchestrator]
+ D --> E[Task Execution]
+ E --> F{Event Queue}
+ F -->|Task Completed| G[Editing Mode]
+ G --> H[Updated Constellation]
+ H --> D
+ F -->|All Complete| I[Finish]
+
+ style B fill:#e1f5ff
+ style G fill:#fff4e1
+ style I fill:#e8f5e9
+```
+
+### Component Integration
+
+```mermaid
+graph TB
+ subgraph "Constellation Agent"
+ FSM[Finite State Machine]
+ Prompter[Prompter]
+ Processor[Agent Processor]
+ end
+
+ subgraph "MCP Layer"
+ Dispatcher[Command Dispatcher]
+ MCP[MCP Server Manager]
+ Editor[Constellation Editor MCP]
+ end
+
+ subgraph "Execution Layer"
+ Orchestrator[Task Orchestrator]
+ EventBus[Event Bus]
+ end
+
+ FSM --> Prompter
+ Prompter --> Processor
+ Processor --> Dispatcher
+ Dispatcher --> MCP
+ MCP --> Editor
+ Editor --> Orchestrator
+ Orchestrator --> EventBus
+ EventBus -->|Task Events| FSM
+
+ style FSM fill:#e1f5ff
+ style MCP fill:#fff4e1
+ style Orchestrator fill:#e8f5e9
+```
+
+## 🔄 Creation Mode
+
+In creation mode, the Constellation Agent receives a user request and generates the initial Task Constellation.
+
+### Inputs
+
+| Input | Type | Description |
+|-------|------|-------------|
+| **User Request** | `str` | Natural language goal or structured command |
+| **AgentProfile Registry** | `Dict[str, AgentProfile]` | Available device agents with capabilities and metadata |
+| **Demonstration Examples** | `List[Example]` | In-context learning examples for task decomposition |
+
+### Processing Flow
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant Agent as Constellation Agent
+ participant Prompter
+ participant LLM
+ participant Dispatcher as Command Dispatcher
+ participant MCP as MCP Server Manager
+ participant Editor as Constellation Editor MCP
+ participant Orchestrator
+
+ User->>Agent: Submit Request
+ Agent->>Prompter: Format Creation Prompt
+ Prompter->>LLM: Send Prompt + Examples
+ LLM->>Agent: Return Constellation JSON
+ Agent->>Dispatcher: Execute build_constellation
+ Dispatcher->>MCP: Route Command
+ MCP->>Editor: Call build_constellation
+ Editor->>MCP: Return Built Constellation
+ MCP->>Dispatcher: Return Result
+ Dispatcher->>Agent: Constellation Ready
+ Agent->>Orchestrator: Start Execution
+ Orchestrator-->>Agent: Constellation Started
+ Agent->>User: Display Initial Plan
+```
+
+### Outputs
+
+| Output | Type | Description |
+|--------|------|-------------|
+| **Task Constellation** | `TaskConstellation` | Structured DAG with tasks and dependencies |
+| **Observation** | `str` | Analysis of input context and device profiles |
+| **Thought** | `str` | Reasoning trace explaining decomposition logic |
+| **State** | `ConstellationAgentStatus` | Next FSM state (typically `CONTINUE`) |
+| **Result** | `Any` | Summary for user or error message |
+
+**Example: Creation Mode Response**
+
+**User Request:** "Download dataset on laptop, preprocess on server, train model on GPU"
+
+**Generated Constellation:**
+
+- Task 1: `fetch_data` → Device: laptop
+- Task 2: `preprocess` → Device: linux_server (depends on Task 1)
+- Task 3: `train_model` → Device: gpu_server (depends on Task 2)
+
+**Thought:** "Decomposed into 3 sequential tasks based on computational requirements. Laptop handles download, server preprocesses data, GPU server trains model."
+
+## ✏️ Editing Mode
+
+During execution, the Constellation Agent enters editing mode to process task completion events and adapt the constellation.
+
+### Inputs
+
+| Input | Type | Description |
+|-------|------|-------------|
+| **Original Request** | `str` | The initial user request for context |
+| **AgentProfile Registry** | `Dict[str, AgentProfile]` | Current device availability |
+| **Current Constellation** | `TaskConstellation` | Serialized constellation snapshot |
+| **Task Events** | `List[TaskEvent]` | Completion/failure events from orchestrator |
+| **Demonstration Examples** | `List[Example]` | In-context learning examples for editing |
+
+### Processing Flow
+
+```mermaid
+sequenceDiagram
+ participant Orchestrator
+ participant EventBus
+ participant Agent as Constellation Agent
+ participant Prompter
+ participant LLM
+ participant Dispatcher as Command Dispatcher
+ participant MCP as MCP Server Manager
+ participant Editor as Constellation Editor MCP
+
+ Orchestrator->>EventBus: Task Completed Event
+ EventBus->>Agent: Queue Event
+ Agent->>Agent: Collect Pending Events
+ Agent->>Dispatcher: Sync Constellation State
+ Dispatcher->>MCP: build_constellation (sync)
+ MCP->>Editor: Update State
+ Agent->>Prompter: Format Editing Prompt
+ Prompter->>LLM: Send Current State + Events
+ LLM->>Agent: Return Modification Actions
+ Agent->>Dispatcher: Execute Modification Commands
+ Dispatcher->>MCP: Route Commands
+ MCP->>Editor: Apply Modifications
+ Editor->>MCP: Return Updated Constellation
+ MCP->>Dispatcher: Return Results
+ Dispatcher->>Agent: Constellation Updated
+ Agent->>EventBus: Publish Modified Event
+ Agent->>Orchestrator: Continue Execution
+```
+
+### Editing Operations
+
+The agent can perform the following modifications through the MCP-based Constellation Editor:
+
+| Operation | Use Case | Example |
+|-----------|----------|---------|
+| **Add Task** | Introduce follow-up or diagnostic tasks | Add health check after training fails |
+| **Remove Task** | Prune redundant or obsolete tasks | Remove preprocessing if data is pre-processed |
+| **Update Task** | Modify description, device, or tips | Switch training to different GPU |
+| **Add Dependency** | Establish new task relationships | Make validation depend on training |
+| **Remove Dependency** | Decouple independent tasks | Remove unnecessary sequential constraint |
+| **Update Dependency** | Change conditional logic | Update success criteria for task trigger |
+
+> **Note:** Only tasks in `PENDING` or `WAITING_DEPENDENCY` status can be modified. Running or completed tasks are **read-only** to ensure execution consistency.
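+
+The read-only rule in the note can be expressed as a simple guard. This sketch assumes a hypothetical `TaskStatus` enum containing only the statuses named in this section; the real status type may define more members.
+
+```python
+from enum import Enum
+
+
+class TaskStatus(Enum):
+    """Hypothetical status enum limited to the statuses named in the note."""
+    PENDING = "PENDING"
+    WAITING_DEPENDENCY = "WAITING_DEPENDENCY"
+    RUNNING = "RUNNING"
+    COMPLETED = "COMPLETED"
+
+
+# Only tasks that have not started yet may be edited.
+MODIFIABLE_STATUSES = frozenset(
+    {TaskStatus.PENDING, TaskStatus.WAITING_DEPENDENCY}
+)
+
+
+def is_modifiable(status: TaskStatus) -> bool:
+    """Return True if a task in this status may still be edited."""
+    return status in MODIFIABLE_STATUSES
+
+
+for status in TaskStatus:
+    print(f"{status.value}: modifiable={is_modifiable(status)}")
+```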
+
+### Outputs
+
+| Output | Type | Description |
+|--------|------|-------------|
+| **Updated Constellation** | `TaskConstellation` | Modified DAG with new tasks/dependencies |
+| **Thought** | `str` | Reasoning explaining modifications or no-op |
+| **State** | `ConstellationAgentStatus` | Next FSM state (`CONTINUE`, `FINISH`, or `FAIL`) |
+| **Result** | `Any` | Summary of changes or completion status |
+
+## 🔁 Finite-State Machine Lifecycle
+
+
+**Figure:** Lifecycle state transitions of the Constellation Agent FSM.
+
+The Constellation Agent's behavior is governed by a **4-state finite-state machine**:
+
+| State | Description | Triggers |
+|-------|-------------|----------|
+| **START** | Initialize constellation, begin orchestration | Agent instantiation, or a restart requested from `CONTINUE` |
+| **CONTINUE** | Monitor events, process feedback, update constellation | Task completion/failure events |
+| **FINISH** | Successful termination, aggregate results | All tasks completed successfully |
+| **FAIL** | Terminal error state, abort execution | Irrecoverable errors, validation failures |
+
+### State Transition Rules
+
+```mermaid
+stateDiagram-v2
+ [*] --> START: Initialize Agent
+ START --> CONTINUE: Constellation Created
+ START --> FAIL: Creation Failed
+
+ CONTINUE --> CONTINUE: Process Events
+ CONTINUE --> FINISH: All Tasks Complete
+ CONTINUE --> FAIL: Critical Error
+ CONTINUE --> START: New Constellation Needed
+
+ FINISH --> [*]
+ FAIL --> [*]
+
+ note right of START
+ Creation Mode:
+ - Generate initial constellation
+ - Validate DAG structure
+ - Start orchestration
+ end note
+
+ note right of CONTINUE
+ Editing Mode:
+ - Wait for task events
+ - Process completion feedback
+ - Apply modifications
+ end note
+```
+
+For detailed state machine documentation, see [State Machine Details](state.md).
+
+## 🛠️ MCP-Based Constellation Editor
+
+The Constellation Agent interacts with the **Constellation Editor** through the **Model Context Protocol (MCP)** layer. The architecture uses:
+
+- **MCP Server Manager**: Routes commands to appropriate MCP servers
+- **Command Dispatcher**: Provides a unified interface for executing MCP commands
+- **Constellation Editor MCP Server**: Implements the actual constellation manipulation operations
+
+This MCP-based architecture provides:
+
+- **Protocol Standardization**: Consistent interface across all agent types
+- **Loose Coupling**: Agent logic decoupled from editor implementation
+- **Extensibility**: Easy to add new operations or alternative editors
+- **Tool Discovery**: Dynamic tool listing via `list_tools` command
+
+### Core MCP Operations
+
+The Constellation Editor MCP Server exposes the following operations:
+
+| Operation | Purpose | Inputs | Output |
+|------|---------|--------|--------|
+| `build_constellation` | Batch-create constellation from config | Configuration dict, clear flag | Built constellation |
+| `add_task` | Add atomic task node | Task ID, name, description, device, tips | Updated constellation |
+| `remove_task` | Remove task and dependencies | Task ID | Updated constellation |
+| `update_task` | Modify task fields | Task ID + updated fields | Updated constellation |
+| `add_dependency` | Create dependency edge | From/to task IDs, type, condition | Updated constellation |
+| `remove_dependency` | Delete dependency | Dependency ID | Updated constellation |
+| `update_dependency` | Update dependency logic | Dependency ID, condition | Updated constellation |
+
+All operations are:
+
+- **Idempotent**: Safe to retry without side effects
+- **Atomic**: Single operation per command
+- **Validated**: Ensures DAG consistency after each modification
+- **Auditable**: All changes are logged and traceable
+
+For complete MCP command specifications and examples, see [Command Reference](command.md). For details on the underlying Task Constellation structure, see [Task Constellation Overview](../constellation/overview.md).
+
+## 📋 Processing Pipeline
+
+The Constellation Agent follows a **4-phase processing pipeline** for both creation and editing modes:
+
+### Phase 1: Context Provision
+
+```python
+# Load available MCP tools from Constellation Editor
+await agent.context_provision(context=context)
+# Queries MCP server for available operations via list_tools
+# Formats tools into LLM-compatible prompt
+```
+
+### Phase 2: LLM Interaction
+
+```python
+# Construct prompt based on mode
+prompt = agent.message_constructor(
+ request=user_request,
+ device_info=agent_profiles,
+ constellation=current_constellation
+)
+
+# Get LLM response
+response = await llm.query(prompt)
+# Returns: ConstellationAgentResponse with thought, status, actions
+```
+
+### Phase 3: Action Execution
+
+```python
+# Execute MCP commands via Command Dispatcher, collecting each result
+results = []
+for command in response.actions:
+    result = await context.command_dispatcher.execute_commands([command])
+    results.append(result)
+
+# Validate constellation
+is_valid, errors = constellation.validate_dag()
+```
+
+### Phase 4: Memory Update
+
+```python
+# Update global context
+context.set(ContextNames.CONSTELLATION, updated_constellation)
+context.set(ContextNames.ROUND_RESULT, results)
+
+# Log to memory
+memory.add_round_log(
+ step=step,
+ weaving_mode=mode,
+ request=request,
+ constellation=constellation,
+ response=response
+)
+```
+
+## 🎭 Prompter Architecture
+
+The Constellation Agent uses the **Factory Pattern** to create appropriate prompters for different weaving modes (creation and editing).
+
+### Prompter Hierarchy
+
+```mermaid
+classDiagram
+ class BaseConstellationPrompter {
+ <<abstract>>
+ +format_agent_profile()
+ +format_constellation()
+ +user_content_construction()
+ +system_prompt_construction()
+ }
+
+ class ConstellationCreationPrompter {
+ +user_prompt_construction()
+ +examples_prompt_helper()
+ }
+
+ class ConstellationEditingPrompter {
+ +user_prompt_construction()
+ +examples_prompt_helper()
+ }
+
+ class ConstellationPrompterFactory {
+ +create_prompter(mode)
+ +get_supported_modes()
+ }
+
+ BaseConstellationPrompter <|-- ConstellationCreationPrompter
+ BaseConstellationPrompter <|-- ConstellationEditingPrompter
+ ConstellationPrompterFactory --> BaseConstellationPrompter
+```
+
+### Factory Pattern Benefits
+
+| Benefit | Description |
+|---------|-------------|
+| **Mode Isolation** | Creation and editing prompts remain independent |
+| **Extensibility** | New modes can be added without modifying existing code |
+| **Type Safety** | Prompter selection can be statically checked via type hints |
+| **Testability** | Each prompter can be unit tested independently |
+
+For complete prompter architecture documentation, see [Prompter Details](strategy.md).
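+
+As a rough illustration of the factory described above, the sketch below maps each weaving mode to a prompter class. All class names, the enum values, and the prompt strings are simplified assumptions for illustration, not the project's actual implementation.
+
+```python
+from abc import ABC, abstractmethod
+from enum import Enum
+from typing import Dict, List, Type
+
+
+class WeavingMode(Enum):
+    """The two modes named in this document; the values are assumed."""
+    CREATION = "creation"
+    EDITING = "editing"
+
+
+class BasePrompter(ABC):
+    @abstractmethod
+    def user_prompt_construction(self, request: str) -> str:
+        """Build the user-facing part of the prompt."""
+
+
+class CreationPrompter(BasePrompter):
+    def user_prompt_construction(self, request: str) -> str:
+        return f"Decompose into a task DAG: {request}"
+
+
+class EditingPrompter(BasePrompter):
+    def user_prompt_construction(self, request: str) -> str:
+        return f"Revise the existing DAG for: {request}"
+
+
+class PrompterFactory:
+    # One entry per mode; adding a mode means adding one entry here.
+    _registry: Dict[WeavingMode, Type[BasePrompter]] = {
+        WeavingMode.CREATION: CreationPrompter,
+        WeavingMode.EDITING: EditingPrompter,
+    }
+
+    @classmethod
+    def create_prompter(cls, mode: WeavingMode) -> BasePrompter:
+        try:
+            return cls._registry[mode]()
+        except KeyError:
+            raise ValueError(f"Unsupported weaving mode: {mode!r}") from None
+
+    @classmethod
+    def get_supported_modes(cls) -> List[WeavingMode]:
+        return list(cls._registry)
+
+
+prompter = PrompterFactory.create_prompter(WeavingMode.EDITING)
+print(prompter.user_prompt_construction("train model on GPU"))
+```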
+
+## 💡 Key Design Benefits
+
+### 1. Unified Reasoning and Control
+
+High-level task synthesis and low-level execution coordination are decoupled yet tightly synchronized through the Task Constellation abstraction. The agent focuses on semantic reasoning while the orchestrator handles distributed execution.
+
+### 2. Dynamic Adaptability
+
+The editable constellation enables:
+- **Failure Recovery**: Add diagnostic tasks after failures
+- **Resource Reallocation**: Switch tasks to available devices
+- **Opportunistic Execution**: Insert new tasks as conditions permit
+
+### 3. End-to-End Observability
+
+Complete lineage tracking of:
+- **State Transitions**: FSM state changes logged with timestamps
+- **Modifications**: All edits tracked with before/after snapshots
+- **Events**: Task completion events queued and processed
+- **Reasoning Traces**: LLM thought processes captured in memory
+
+### 4. Safe Modification Guarantees
+
+The FSM + MCP Server architecture ensures:
+- **Acyclicity**: DAG validation prevents circular dependencies
+- **Consistency**: Only modifiable tasks can be edited
+- **Atomicity**: Each MCP operation is atomic and idempotent
+- **Auditability**: Full modification history maintained
+
+## 🔍 Example Workflow
+
+### User Request
+```
+"Download MNIST dataset on laptop, train CNN on GPU server,
+evaluate on test server, deploy to production if accuracy > 95%"
+```
+
+### Creation Mode Output
+
+```json
+{
+ "thought": "Decomposed into 4 tasks: (1) download on laptop, (2) train on GPU, (3) evaluate on test server, (4) conditional deploy based on accuracy",
+ "status": "CONTINUE",
+ "constellation": {
+ "tasks": [
+ {"task_id": "task_001", "name": "download_mnist", "device": "laptop"},
+ {"task_id": "task_002", "name": "train_cnn", "device": "gpu_server"},
+ {"task_id": "task_003", "name": "evaluate", "device": "test_server"},
+ {"task_id": "task_004", "name": "deploy", "device": "prod_server"}
+ ],
+ "dependencies": [
+ {"from": "task_001", "to": "task_002", "type": "SUCCESS_ONLY"},
+ {"from": "task_002", "to": "task_003", "type": "SUCCESS_ONLY"},
+ {"from": "task_003", "to": "task_004", "type": "CONDITIONAL",
+ "condition": "accuracy > 0.95"}
+ ]
+ }
+}
+```
+
+### Editing Mode Event
+
+```
+Task task_003 (evaluate) completed with result: {"accuracy": 0.92}
+```
+
+### Editing Mode Output
+
+```json
+{
+ "thought": "Evaluation accuracy (92%) did not meet deployment threshold (95%). Adding retraining task with adjusted hyperparameters. Removing original deployment task.",
+ "status": "CONTINUE",
+ "actions": [
+ {"tool": "add_task", "parameters": {
+ "task_id": "task_005",
+ "name": "retrain_with_tuning",
+ "device": "gpu_server",
+ "description": "Retrain with learning rate decay and data augmentation"
+ }},
+ {"tool": "add_dependency", "parameters": {
+ "from": "task_003", "to": "task_005", "type": "SUCCESS_ONLY"
+ }},
+ {"tool": "remove_task", "parameters": {"task_id": "task_004"}}
+ ]
+}
+```
+
+## 📊 Performance Characteristics
+
+### Creation Complexity
+
+- **Time**: $O(n \cdot m)$ where $n$ is task count, $m$ is LLM inference time
+- **Space**: $O(n + e)$ for $n$ tasks and $e$ edges
+- **Validation**: $O(n + e)$ for DAG cycle detection (DFS)
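+
+The DFS-based validation mentioned above can be sketched as a three-colour traversal that visits each node and edge once. This is an illustrative stand-in, not the project's `validate_dag()` implementation; the `has_cycle` name and the adjacency-list input format are assumptions.
+
+```python
+from typing import Dict, List
+
+WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
+
+
+def has_cycle(graph: Dict[str, List[str]]) -> bool:
+    """Return True if the directed graph contains a cycle (O(n + e))."""
+    nodes = set(graph) | {succ for succs in graph.values() for succ in succs}
+    color = {node: WHITE for node in nodes}
+
+    def visit(node: str) -> bool:
+        color[node] = GRAY
+        for succ in graph.get(node, []):
+            if color[succ] == GRAY:  # back edge: succ is on the current path
+                return True
+            if color[succ] == WHITE and visit(succ):
+                return True
+        color[node] = BLACK
+        return False
+
+    return any(color[node] == WHITE and visit(node) for node in nodes)
+
+
+chain = {"task_001": ["task_002"], "task_002": ["task_003"]}
+looped = {**chain, "task_003": ["task_001"]}
+print(has_cycle(chain), has_cycle(looped))
+```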
+
+### Editing Complexity
+
+- **Event Processing**: $O(k)$ for $k$ queued events (batched)
+- **Modification**: $O(1)$ per MCP command (constant time)
+- **Re-validation**: $O(n + e)$ for modified constellation
+
+### Scalability
+
+| Metric | Typical | Maximum Tested |
+|--------|---------|----------------|
+| Tasks per Constellation | 5-20 | 100+ |
+| Dependencies per Constellation | 4-30 | 200+ |
+| Editing Events per Session | 1-10 | 50+ |
+| LLM Response Time | 2-5s | 15s |
+
+## 🔗 Related Components
+
+- **[Task Constellation](../constellation/overview.md)** — Abstract DAG model
+- **[TaskStar](../constellation/task_star.md)** — Atomic execution units
+- **[TaskStarLine](../constellation/task_star_line.md)** — Dependency relationships
+- **[Constellation Orchestrator](../constellation_orchestrator/overview.md)** — Distributed executor
+- **[State Machine](state.md)** — FSM lifecycle details
+- **[Prompter Details](strategy.md)** — Prompter architecture
+- **[Command Reference](command.md)** — Editor operation specifications
+
+## 🎯 Summary
+
+The Constellation Agent serves as the **central weaver** of distributed intelligence in UFO³ Galaxy. Through its dual-mode control loop, finite-state machine governance, and MCP-based constellation manipulation, it transforms abstract user goals into live, evolving constellations—maintaining both rigor and adaptability across the complete lifecycle of multi-device orchestration.
+
+**Key Capabilities:**
+
+- **Semantic Decomposition**: Natural language → structured DAG
+- **Dynamic Adaptation**: Runtime graph evolution based on feedback
+- **MCP Integration**: Protocol-based tool invocation for extensibility
+- **Formal Guarantees**: DAG validity + safe concurrent modification
+- **Complete Observability**: Full lineage tracking and reasoning traces
+- **Modular Design**: Clean separation between reasoning and execution
diff --git a/documents/docs/galaxy/constellation_agent/state.md b/documents/docs/galaxy/constellation_agent/state.md
new file mode 100644
index 000000000..3bca4c7d9
--- /dev/null
+++ b/documents/docs/galaxy/constellation_agent/state.md
@@ -0,0 +1,643 @@
+# Constellation Agent State Machine
+
+The Constellation Agent's finite-state machine provides deterministic lifecycle management while enabling dynamic constellation evolution. This FSM governs how the agent transitions between creation, monitoring, success, and failure states—ensuring predictable behavior in complex distributed workflows.
+
+For an overview of the Constellation Agent architecture, see [Overview](overview.md).
+
+## 📐 State Machine Overview
+
+
+**Figure:** Lifecycle state transitions of the Constellation Agent showing the 4-state FSM.
+
+The Constellation Agent implements a **4-state finite-state machine (FSM)** that provides clear, enforceable structure for task lifecycle management. This design separates LLM reasoning from deterministic control logic, improving safety and debuggability.
+
+### State Space
+
+```mermaid
+stateDiagram-v2
+ [*] --> START: Agent Initialization
+ START --> CONTINUE: Constellation Created Successfully
+ START --> FAIL: Creation Failed
+
+ CONTINUE --> CONTINUE: Process Task Events
+ CONTINUE --> FINISH: All Tasks Complete
+ CONTINUE --> FAIL: Critical Error
+ CONTINUE --> START: Restart Needed
+
+ FINISH --> [*]: Success
+ FAIL --> [*]: Abort
+```
+
+## 🎯 State Definitions
+
+### State Enumeration
+
+```python
+class ConstellationAgentStatus(Enum):
+    """Constellation Agent states"""
+    START = "START"
+    CONTINUE = "CONTINUE"
+    FINISH = "FINISH"
+    FAIL = "FAIL"
+```
+
+| State | Type | Description | Entry Conditions |
+|-------|------|-------------|------------------|
+| **START** | Initial | Initialize and create constellation | Agent instantiation, or a restart requested from `CONTINUE` |
+| **CONTINUE** | Steady-State | Monitor events and process feedback | Constellation created successfully |
+| **FINISH** | Terminal | Successful termination | All tasks completed, no edits needed |
+| **FAIL** | Terminal | Error termination | Irrecoverable errors, validation failures |
+
+## 🚀 START State
+
+### Purpose
+
+The START state is the **initialization and creation phase** where the agent:
+1. Generates the initial Task Constellation from user request
+2. Validates DAG structure for correctness
+3. Launches background orchestration
+4. Transitions to monitoring mode
+
+### State Handler Implementation
+
+```python
+@ConstellationAgentStateManager.register
+class StartConstellationAgentState(ConstellationAgentState):
+    """Start state - create and execute constellation"""
+
+    async def handle(self, agent: "ConstellationAgent", context: Context) -> None:
+        # Skip if already in terminal state
+        if agent.status in [
+            ConstellationAgentStatus.FINISH.value,
+            ConstellationAgentStatus.FAIL.value,
+        ]:
+            return
+
+        # Initialize timing_info
+        timing_info = {}
+
+        # Create constellation if not exists
+        if not agent.current_constellation:
+            context.set(ContextNames.WEAVING_MODE, WeavingMode.CREATION)
+
+            agent._current_constellation, timing_info = (
+                await agent.process_creation(context)
+            )
+
+        # Start orchestration in background
+        if agent.current_constellation:
+            asyncio.create_task(
+                agent.orchestrator.orchestrate_constellation(
+                    agent.current_constellation,
+                    metadata=timing_info
+                )
+            )
+            agent.status = ConstellationAgentStatus.CONTINUE.value
+        elif agent.status == ConstellationAgentStatus.CONTINUE.value:
+            agent.status = ConstellationAgentStatus.FAIL.value
+```
+
+### Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant FSM as State Machine
+ participant Agent
+ participant Creation as Creation Process
+ participant Validator
+ participant Orchestrator
+
+ FSM->>Agent: handle(START)
+ Agent->>Agent: Check if constellation exists
+
+ alt No Constellation
+ Agent->>Creation: process_creation(context)
+ Creation->>Agent: Return constellation + timing
+ Agent->>Validator: validate_dag()
+
+ alt Valid DAG
+ Validator-->>Agent: Success
+ Agent->>Orchestrator: orchestrate_constellation()
+ Note over Orchestrator: Background task started
+ Agent->>FSM: Set status = CONTINUE
+ else Invalid DAG
+ Validator-->>Agent: Errors
+ Agent->>FSM: Set status = FAIL
+ end
+ else Constellation Exists
+ Agent->>Orchestrator: orchestrate_constellation()
+ Agent->>FSM: Set status = CONTINUE
+ end
+```
+
+### Behaviors
+
+| Scenario | Action | Next State |
+|----------|--------|------------|
+| **First Execution** | Generate constellation via LLM | `CONTINUE` (success) / `FAIL` (error) |
+| **Restart Trigger** | Use existing constellation | `CONTINUE` |
+| **Creation Failure** | Log error, no constellation created | `FAIL` |
+| **Validation Failure** | DAG contains cycles or invalid structure | `FAIL` |
+| **Already Terminal** | No-op, return immediately | Same state |
+
+> **Tip:** Orchestration is launched as a **non-blocking** background task using `asyncio.create_task()`. This allows the agent to transition to CONTINUE state immediately and begin monitoring for events.
+
+### Error Handling
+
+```python
+import traceback
+
+try:
+    # Creation logic
+    agent._current_constellation, timing_info = (
+        await agent.process_creation(context)
+    )
+except AttributeError as e:
+    agent.logger.error(f"Attribute error: {traceback.format_exc()}")
+    agent.status = ConstellationAgentStatus.FAIL.value
+except KeyError as e:
+    agent.logger.error(f"Missing key: {traceback.format_exc()}")
+    agent.status = ConstellationAgentStatus.FAIL.value
+except Exception as e:
+    agent.logger.error(f"Unexpected error: {traceback.format_exc()}")
+    agent.status = ConstellationAgentStatus.FAIL.value
+```
+
+## 🔄 CONTINUE State
+
+### Purpose
+
+The CONTINUE state is the **steady-state monitoring and editing phase** where the agent:
+1. Waits for task completion/failure events from orchestrator
+2. Collects batched events from the queue
+3. Merges constellation state with latest modifications
+4. Processes events and applies edits
+5. Loops until all tasks complete or critical error occurs
+
+### State Handler Implementation
+
+```python
+@ConstellationAgentStateManager.register
+class ContinueConstellationAgentState(ConstellationAgentState):
+    """Continue state - wait for task completion events"""
+
+    async def handle(self, agent: "ConstellationAgent", context=None) -> None:
+        # Set editing mode
+        context.set(ContextNames.WEAVING_MODE, WeavingMode.EDITING)
+
+        # Collect task completion events (batched)
+        completed_task_events = []
+
+        # Wait for at least one event (blocking)
+        first_event = await agent.task_completion_queue.get()
+        completed_task_events.append(first_event)
+
+        # Collect other pending events (non-blocking)
+        while not agent.task_completion_queue.empty():
+            try:
+                event = agent.task_completion_queue.get_nowait()
+                completed_task_events.append(event)
+            except asyncio.QueueEmpty:
+                break
+
+        # Get latest constellation and merge states
+        latest_constellation = completed_task_events[-1].data.get("constellation")
+        merged_constellation = await self._get_merged_constellation(
+            agent, latest_constellation
+        )
+
+        # Process editing with all collected events
+        await agent.process_editing(
+            context=context,
+            task_ids=[e.task_id for e in completed_task_events],
+            before_constellation=merged_constellation
+        )
+```
+
+### Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant FSM as State Machine
+ participant Agent
+ participant Queue as Event Queue
+ participant Sync as State Synchronizer
+ participant Editing as Editing Process
+
+ FSM->>Agent: handle(CONTINUE)
+ Agent->>Queue: Wait for event (blocking)
+ Queue-->>Agent: Task Event 1
+
+ loop Collect Pending
+ Agent->>Queue: Get nowait()
+ Queue-->>Agent: Task Event N
+ end
+
+ Agent->>Sync: Merge constellation states
+ Sync-->>Agent: Merged constellation
+
+ Agent->>Editing: process_editing(events, constellation)
+ Editing->>Agent: Updated constellation
+
+ Agent->>FSM: Update status
+```
+
+### Event Batching
+
+**Why Batch Events?**
+
+If multiple tasks complete simultaneously (e.g., parallel execution), the agent collects **all pending events** before processing. This enables:
+
+- **Single LLM call** instead of multiple sequential calls
+- **Atomic modifications** reflecting multiple completions
+- **Reduced latency** and lower API costs
+
+```python
+# Example: 3 tasks complete in quick succession
+# Without batching: 3 LLM calls, 3 editing sessions
+# With batching: 1 LLM call, 1 editing session processing all 3 events
+```
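+
+The batching behaviour can be tried in isolation with a plain `asyncio.Queue`. The sketch below is self-contained; the `collect_batch` helper and the string task IDs are illustrative assumptions, not the agent's actual event objects.
+
+```python
+import asyncio
+from typing import List
+
+
+async def collect_batch(queue: asyncio.Queue) -> List[str]:
+    """Block for the first event, then drain whatever else is already queued."""
+    batch = [await queue.get()]  # suspends until at least one event arrives
+    while True:
+        try:
+            batch.append(queue.get_nowait())  # never suspends
+        except asyncio.QueueEmpty:
+            break
+    return batch
+
+
+async def demo() -> List[str]:
+    queue: asyncio.Queue = asyncio.Queue()
+    # Simulate three tasks finishing before the agent wakes up.
+    for task_id in ("task_001", "task_002", "task_003"):
+        queue.put_nowait(task_id)
+    return await collect_batch(queue)
+
+
+batch = asyncio.run(demo())
+print(batch)
+```
+
+All three events land in one batch, so a single editing pass can account for them together.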
+
+### State Merging
+
+The **state synchronizer** merges the orchestrator's constellation with agent modifications:
+
+```python
+async def _get_merged_constellation(
+    self, agent: "ConstellationAgent", orchestrator_constellation
+):
+    """
+    Get real-time merged constellation from synchronizer.
+
+    Ensures agent processes with most up-to-date state, including
+    structural modifications from previous editing sessions.
+    """
+    synchronizer = agent.orchestrator._modification_synchronizer
+
+    if not synchronizer:
+        return orchestrator_constellation
+
+    merged_constellation = synchronizer.merge_and_sync_constellation_states(
+        orchestrator_constellation=orchestrator_constellation
+    )
+
+    agent.logger.info(
+        f"Merged constellation for editing. "
+        f"Tasks before: {len(orchestrator_constellation.tasks)}, "
+        f"Tasks after merge: {len(merged_constellation.tasks)}"
+    )
+
+    return merged_constellation
+```
+
+> **Warning:** State synchronization is critical. Consider this scenario:
+>
+> 1. Task A completes → Agent edits constellation (adds Task C)
+> 2. Task B completes **while editing is happening**
+> 3. Without merging: Task B editing sees **old state** (no Task C)
+> 4. With merging: Task B editing sees **merged state** (includes Task C)
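+
+The scenario in the warning can be mimicked with two dictionaries. This is a deliberately simplified sketch: the real synchronizer's merge rules are not shown in this document, so the policy used here (structure from the agent's view, runtime status from the orchestrator's view) and the `merge_states` name are assumptions.
+
+```python
+from typing import Dict
+
+
+def merge_states(
+    agent_view: Dict[str, str], orchestrator_view: Dict[str, str]
+) -> Dict[str, str]:
+    """Keep the agent's task set, overlaying the orchestrator's statuses."""
+    merged = dict(agent_view)  # preserves tasks added by earlier edits
+    for task_id, status in orchestrator_view.items():
+        if task_id in merged:
+            merged[task_id] = status  # the orchestrator knows runtime status
+    return merged
+
+
+# The agent already added Task C; the orchestrator's snapshot predates it.
+agent_view = {"A": "COMPLETED", "B": "RUNNING", "C": "PENDING"}
+orchestrator_view = {"A": "COMPLETED", "B": "COMPLETED"}
+
+merged = merge_states(agent_view, orchestrator_view)
+print(merged)
+```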
+
+### Behaviors
+
+| Scenario | Action | Next State |
+|----------|--------|------------|
+| **Task Completed** | Process event, apply edits | `CONTINUE` |
+| **Multiple Tasks Completed** | Batch process, single edit session | `CONTINUE` |
+| **All Tasks Done** | Agent decides to finish | `FINISH` |
+| **Critical Error** | Exception during processing | `FAIL` |
+| **Restart Needed** | New constellation required | `START` |
+
+### Transition Logic
+
+```python
+# Agent's editing process sets status based on analysis:
+
+if constellation.is_complete() and no_more_edits_needed:
+    agent.status = ConstellationAgentStatus.FINISH.value
+elif critical_error_occurred:
+    agent.status = ConstellationAgentStatus.FAIL.value
+elif new_constellation_needed:
+    agent.status = ConstellationAgentStatus.START.value
+else:
+    agent.status = ConstellationAgentStatus.CONTINUE.value  # Keep monitoring
+```
+
+## ✅ FINISH State
+
+### Purpose
+
+The FINISH state represents **successful termination** when:
+- All tasks in the constellation have completed successfully
+- No further edits are necessary
+- User goal has been achieved
+
+### State Handler Implementation
+
+```python
+@ConstellationAgentStateManager.register
+class FinishConstellationAgentState(ConstellationAgentState):
+    """Finish state - task completed successfully"""
+
+    async def handle(self, agent: "ConstellationAgent", context=None) -> None:
+        agent.logger.info("Galaxy task completed successfully")
+        agent._status = ConstellationAgentStatus.FINISH.value
+
+    def next_state(self, agent: "ConstellationAgent") -> AgentState:
+        return self  # Terminal state - no transitions
+
+    def is_round_end(self) -> bool:
+        return True
+
+    def is_subtask_end(self) -> bool:
+        return True
+```
+
+### Characteristics
+
+| Property | Value | Description |
+|----------|-------|-------------|
+| **Terminal** | Yes | No outgoing transitions |
+| **Round End** | Yes | Marks execution round complete |
+| **Subtask End** | Yes | Marks all subtasks complete |
+
+### Entry Conditions
+
+```python
+# LLM decides to finish based on constellation state
+{
+ "thought": "All tasks completed successfully. No further actions needed.",
+ "status": "FINISH",
+ "result": {
+ "summary": "Dataset downloaded, model trained, deployed to production",
+ "total_tasks": 5,
+ "completed": 5,
+ "failed": 0
+ }
+}
+```
+
+**Clean Termination:**
+
+The FINISH state ensures graceful shutdown with:
+
+- All resources released
+- Final results aggregated
+- Memory logs persisted
+- Success metrics recorded
+
+## ❌ FAIL State
+
+### Purpose
+
+The FAIL state represents **error termination** when:
+- Irrecoverable errors occur during creation or editing
+- DAG validation fails
+- Critical system failures prevent continuation
+
+### State Handler Implementation
+
+```python
+@ConstellationAgentStateManager.register
+class FailConstellationAgentState(ConstellationAgentState):
+    """Fail state - task failed"""
+
+    async def handle(self, agent: "ConstellationAgent", context=None) -> None:
+        agent.logger.error("Galaxy task failed")
+        agent._status = ConstellationAgentStatus.FAIL.value
+
+    def next_state(self, agent: "ConstellationAgent") -> AgentState:
+        return self  # Terminal state - no transitions
+
+    def is_round_end(self) -> bool:
+        return True
+
+    def is_subtask_end(self) -> bool:
+        return True
+```
+
+### Failure Scenarios
+
+| Scenario | Trigger | Recovery |
+|----------|---------|----------|
+| **Creation Failure** | LLM cannot decompose request | User reformulates request |
+| **Validation Failure** | Generated DAG has cycles | Agent retries or manual fix |
+| **Critical Exception** | Unexpected system error | Check logs, restart agent |
+| **Timeout** | Processing exceeds limits | Increase timeout or simplify task |
+
+### Error Propagation
+
+```python
+# Example error chain:
+try:
+    # process_creation returns the constellation together with timing info
+    constellation, timing_info = await agent.process_creation(context)
+except Exception as e:
+    agent.logger.error(f"Creation failed: {e}")
+    agent.status = ConstellationAgentStatus.FAIL.value
+    # State machine handles transition to FAIL state
+```
+
+> **Important:** Both FINISH and FAIL states are **terminal** — they have no outgoing transitions. This ensures the agent cannot accidentally resume execution after completion or failure.
+
+## 🔀 State Transitions
+
+### Transition Matrix
+
+| From ↓ / To → | START | CONTINUE | FINISH | FAIL |
+|---------------|-------|----------|--------|------|
+| **START** | ❌ | ✅ (success) | ❌ | ✅ (error) |
+| **CONTINUE** | ✅ (restart) | ✅ (loop) | ✅ (done) | ✅ (error) |
+| **FINISH** | ❌ | ❌ | ✅ (stay) | ❌ |
+| **FAIL** | ❌ | ❌ | ❌ | ✅ (stay) |
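+
+The matrix above can be encoded as a lookup table to check a proposed transition before applying it. This is a small sketch; the `ALLOWED_TRANSITIONS` table and `can_transition` helper are illustrative names, not part of the agent's code.
+
+```python
+from typing import Dict, Set
+
+# Rows of the transition matrix: source state -> permitted target states.
+ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
+    "START": {"CONTINUE", "FAIL"},
+    "CONTINUE": {"START", "CONTINUE", "FINISH", "FAIL"},
+    "FINISH": {"FINISH"},  # terminal: may only stay
+    "FAIL": {"FAIL"},      # terminal: may only stay
+}
+
+
+def can_transition(source: str, target: str) -> bool:
+    """Return True if the matrix permits moving from source to target."""
+    return target in ALLOWED_TRANSITIONS.get(source, set())
+
+
+print(can_transition("START", "CONTINUE"))
+print(can_transition("FINISH", "START"))
+```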
+
+### Transition Rules
+
+```python
+class ConstellationAgentState(AgentState):
+    """Base state for Constellation Agent"""
+
+    def next_state(self, agent: "ConstellationAgent") -> AgentState:
+        """Determine next state based on agent status"""
+        status = agent.status
+        state = ConstellationAgentStateManager().get_state(status)
+        return state
+```
+
+### State Manager
+
+```python
+class ConstellationAgentStateManager(AgentStateManager):
+    """State manager for Constellation Agent"""
+
+    _state_mapping: Dict[str, Type[AgentState]] = {}
+
+    @property
+    def none_state(self) -> AgentState:
+        return StartConstellationAgentState()
+```
+
+The state manager uses the **@register decorator** pattern to automatically register state classes. For more details on the overall agent architecture, see [Constellation Agent Overview](overview.md).
+
+```python
+@ConstellationAgentStateManager.register
+class StartConstellationAgentState(ConstellationAgentState):
+    @classmethod
+    def name(cls) -> str:
+        return ConstellationAgentStatus.START.value
+```
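+
+To show how a `register` decorator of this kind can work, here is a minimal, self-contained sketch. The `StateManager` and `StartState` classes are simplified stand-ins; the actual `AgentStateManager` base class is not shown in this document and may behave differently.
+
+```python
+from typing import Dict
+
+
+class StateManager:
+    """Simplified stand-in for a state manager with a class-level registry."""
+
+    _state_mapping: Dict[str, type] = {}
+
+    @classmethod
+    def register(cls, state_class: type) -> type:
+        # Index the class under the name it reports, then return it unchanged
+        # so the decorator does not alter the decorated class.
+        cls._state_mapping[state_class.name()] = state_class
+        return state_class
+
+    def get_state(self, status: str):
+        # Instantiate the state class registered for this status string.
+        return self._state_mapping[status]()
+
+
+@StateManager.register
+class StartState:
+    @classmethod
+    def name(cls) -> str:
+        return "START"
+
+
+state = StateManager().get_state("START")
+print(type(state).__name__)
+```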
+
+## 📊 State Metrics
+
+### Execution Timeline
+
+```mermaid
+gantt
+ title Constellation Agent State Timeline
+ dateFormat YYYY-MM-DD
+ section States
+ START :start1, 2024-01-01, 3s
+ CONTINUE :cont1, after start1, 30s
+ CONTINUE :cont2, after cont1, 25s
+ CONTINUE :cont3, after cont2, 20s
+ FINISH :finish1, after cont3, 1s
+```
+
+### Typical Duration
+
+| State | Typical Duration | Factors |
+|-------|------------------|---------|
+| **START** | 2-5 seconds | LLM response time, validation complexity |
+| **CONTINUE** | Variable (10s - 10min) | Task execution time, parallelism |
+| **FINISH** | < 1 second | Logging and cleanup |
+| **FAIL** | < 1 second | Error logging |
+
+## 🛡️ Error Handling
+
+### Exception Hierarchy
+
+```python
+# START State Error Handling
+try:
+    constellation, timing = await agent.process_creation(context)
+except AttributeError as e:
+    # Missing attribute (e.g., context field)
+    agent.logger.error(f"Attribute error: {e}")
+    agent.status = ConstellationAgentStatus.FAIL.value
+except KeyError as e:
+    # Missing key in dictionary
+    agent.logger.error(f"Missing key: {e}")
+    agent.status = ConstellationAgentStatus.FAIL.value
+except Exception as e:
+    # Catch-all for unexpected errors
+    agent.logger.error(f"Unexpected error: {e}")
+    agent.status = ConstellationAgentStatus.FAIL.value
+```
+
+### Recovery Strategies
+
+| Error Type | State | Recovery Action |
+|------------|-------|-----------------|
+| **Temporary Network Failure** | CONTINUE | Retry with backoff |
+| **Invalid LLM Response** | CONTINUE | Re-prompt with examples |
+| **DAG Cycle Detected** | START | Fail fast, require user intervention |
+| **Task Execution Timeout** | CONTINUE | Mark task failed, continue constellation |
+| **Critical System Error** | Any | Transition to FAIL immediately |
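+
+The "retry with backoff" strategy from the table can be sketched as follows. This is a generic illustration, not code from the agent: the `retry_with_backoff` helper, its parameters, and the choice of `ConnectionError` as the transient error type are all assumptions.
+
+```python
+import asyncio
+from typing import Awaitable, Callable, TypeVar
+
+T = TypeVar("T")
+
+
+async def retry_with_backoff(
+    operation: Callable[[], Awaitable[T]],
+    retries: int = 3,
+    base_delay: float = 0.5,
+) -> T:
+    """Retry a transiently failing coroutine, doubling the delay each time."""
+    if retries < 1:
+        raise ValueError("retries must be at least 1")
+    for attempt in range(retries):
+        try:
+            return await operation()
+        except ConnectionError:
+            if attempt == retries - 1:
+                raise  # out of attempts: let the caller decide what to do
+            await asyncio.sleep(base_delay * 2 ** attempt)
+    raise AssertionError("unreachable")
+
+
+calls = {"count": 0}
+
+
+async def flaky_request() -> str:
+    """Fail twice with a transient error, then succeed."""
+    calls["count"] += 1
+    if calls["count"] < 3:
+        raise ConnectionError("transient network failure")
+    return "ok"
+
+
+result = asyncio.run(retry_with_backoff(flaky_request, retries=3, base_delay=0.0))
+print(result, calls["count"])
+```
+
+Once the attempts are exhausted the exception propagates, which is the point at which a handler would set the status to `FAIL`.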
+
+## 🔍 State Inspection
+
+### Agent State Query
+
+```python
+# Check current state
+current_state = agent.current_state
+print(f"State: {current_state.name()}")
+
+# Check if terminal
+if current_state.is_round_end():
+    print("Agent execution completed")
+
+# Get status
+status = agent.status
+print(f"Status: {status}") # "START", "CONTINUE", "FINISH", or "FAIL"
+```
+
+### State History
+
+The agent maintains state transition history in memory logs:
+
+```python
+{
+ "step": 1,
+ "state": "START",
+ "timestamp": "2024-01-01T10:00:00",
+ "constellation_id": "constellation_abc123"
+}
+```
+
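As a rough sketch of how an entry with this shape could be produced, the snippet below builds the same four fields from a dataclass. `StateTransitionRecord` and `record_transition` are hypothetical names for this example; the agent's real memory logging may be structured differently.

```python
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass
class StateTransitionRecord:
    """Hypothetical record mirroring the fields of the log entry above."""
    step: int
    state: str
    timestamp: str
    constellation_id: str


def record_transition(history, state, constellation_id, now=None):
    """Append one history entry to `history` and return it as a plain dict."""
    now = now or datetime.now()
    record = StateTransitionRecord(
        step=len(history) + 1,  # steps are 1-based and follow insertion order
        state=state,
        timestamp=now.isoformat(timespec="seconds"),
        constellation_id=constellation_id,
    )
    history.append(asdict(record))
    return history[-1]
```

Passing `now` explicitly keeps the helper deterministic in tests, while production callers can omit it.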
+## 💡 Best Practices
+
+**State Machine Design:**
+
+1. **Keep states focused**: Each state should have a single, clear responsibility
+2. **Minimize transitions**: Fewer transitions = simpler debugging
+3. **Log all transitions**: Record state changes with context
+4. **Handle errors explicitly**: Don't rely on implicit error propagation
+5. **Use terminal states**: Ensure execution cannot resume accidentally
+
+**Common Pitfalls to Avoid:**
+
+- **Infinite loops in CONTINUE**: Always check termination conditions
+- **Missing error handling**: Unhandled exceptions → unpredictable state
+- **Blocking operations**: Use async/await to prevent deadlocks
+- **State pollution**: Don't modify agent state outside state handlers
+
+**Example: State Transition Logging**
+
+```python
+agent.logger.info(
+ f"State transition: {old_state.name()} → {new_state.name()}"
+)
+```
+
+## 🔗 Related Documentation
+
+- **[Overview](overview.md)** — Constellation Agent architecture
+- **[Prompter Details](strategy.md)** — Prompter implementation
+- **[Command Reference](command.md)** — MCP tool specifications
+- **[Task Constellation](../constellation/overview.md)** — DAG model
+- **[Constellation Orchestrator](../constellation_orchestrator/overview.md)** — Task execution engine
+
+## 📋 State Interface Reference
+
+### AgentState Base Class
+
+```python
+from abc import ABC, abstractmethod
+
+
+class AgentState(ABC):
+    """Base interface for agent states"""
+
+    @abstractmethod
+    async def handle(self, agent, context) -> None:
+        """Execute state-specific logic"""
+        pass
+
+    # Quoted annotation: the class name is not bound yet inside its own body
+    def next_state(self, agent) -> "AgentState":
+        """Determine next state based on agent status"""
+        pass
+
+    def next_agent(self, agent):
+        """Get next agent (for multi-agent systems)"""
+        return agent
+
+    @abstractmethod
+    def is_round_end(self) -> bool:
+        """Check if this state marks round end"""
+        pass
+
+    @abstractmethod
+    def is_subtask_end(self) -> bool:
+        """Check if this state marks subtask end"""
+        pass
+
+    @classmethod
+    @abstractmethod
+    def name(cls) -> str:
+        """State identifier"""
+        pass
+```
diff --git a/documents/docs/galaxy/constellation_agent/strategy.md b/documents/docs/galaxy/constellation_agent/strategy.md
new file mode 100644
index 000000000..84add3e80
--- /dev/null
+++ b/documents/docs/galaxy/constellation_agent/strategy.md
@@ -0,0 +1,916 @@
+# Processing Strategy Pattern
+
+## Overview
+
+The Constellation Agent uses a **multi-phase processing architecture** built on the [`ProcessorTemplate`](../../infrastructure/agents/design/processor.md) framework. Its central orchestrator, `ConstellationAgentProcessor`, assembles one processing strategy for each of three phases: **LLM Interaction**, **Action Execution**, and **Memory Update**. This modular design separates concerns, allows mode-specific behavior, and lets each phase configure its own error handling through `fail_fast`.
+
+The processor creates and configures these strategies according to the weaving mode (CREATION vs. EDITING), combining the Template Method pattern with Strategy composition.
+
+### Core Architecture
+
+```mermaid
+classDiagram
+ class ProcessorTemplate {
+ <<abstract>>
+ +process()*
+ +_setup_strategies()*
+ +_setup_middleware()*
+ -strategies: Dict
+ -middleware_chain: List
+ }
+
+ class ConstellationAgentProcessor {
+ +_setup_strategies()
+ +_setup_middleware()
+ +_get_processor_specific_context_data()
+ }
+
+ class ConstellationStrategyFactory {
+ +create_llm_interaction_strategy()
+ +create_action_execution_strategy(mode)
+ +create_memory_update_strategy()
+ }
+
+ class ConstellationLLMInteractionStrategy {
+ +execute()
+ -_build_comprehensive_prompt()
+ -_get_llm_response_with_retry()
+ -_parse_and_validate_response()
+ }
+
+ class BaseConstellationActionExecutionStrategy {
+ <<abstract>>
+ +execute()
+ +_create_mode_specific_action_info()*
+ +publish_actions()*
+ +sync_constellation()*
+ -_execute_constellation_action()
+ }
+
+ class ConstellationCreationActionExecutionStrategy {
+ +_create_mode_specific_action_info()
+ +publish_actions()
+ +sync_constellation()
+ }
+
+ class ConstellationEditingActionExecutionStrategy {
+ +_create_mode_specific_action_info()
+ +publish_actions()
+ +sync_constellation()
+ }
+
+ class ConstellationMemoryUpdateStrategy {
+ +execute()
+ -_create_additional_memory_data()
+ -_create_and_populate_memory_item()
+ }
+
+ ProcessorTemplate <|-- ConstellationAgentProcessor
+ ConstellationAgentProcessor --> ConstellationStrategyFactory : uses
+ ConstellationStrategyFactory --> ConstellationLLMInteractionStrategy : creates
+ ConstellationStrategyFactory --> BaseConstellationActionExecutionStrategy : creates
+ ConstellationStrategyFactory --> ConstellationMemoryUpdateStrategy : creates
+ BaseConstellationActionExecutionStrategy <|-- ConstellationCreationActionExecutionStrategy
+ BaseConstellationActionExecutionStrategy <|-- ConstellationEditingActionExecutionStrategy
+```
+
+### Processing Phases
+
+| Phase | Strategy | Purpose | Mode-Specific |
+|-------|----------|---------|---------------|
+| **LLM Interaction** | `ConstellationLLMInteractionStrategy` | Prompt construction, LLM response parsing | ❌ Shared |
+| **Action Execution** | `ConstellationCreation/EditingActionExecutionStrategy` | Action generation and execution | ✅ Mode-specific |
+| **Memory Update** | `ConstellationMemoryUpdateStrategy` | Memory logging and state tracking | ❌ Shared |
+
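The three phases in the table run in order, and each phase can read what earlier phases produced. The sketch below shows that flow with toy strategies. Every class and function name in it (`PhaseResult`, `EchoLLMStrategy`, `run_phases`, and so on) is a hypothetical stand-in, not part of the `ProcessorTemplate` API, and the real processor adds middleware, typed contexts, and error handling that are omitted here.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class PhaseResult:
    """Hypothetical, simplified stand-in for ProcessingResult."""
    success: bool
    data: dict = field(default_factory=dict)


class EchoLLMStrategy:
    async def execute(self, context):
        # A real strategy would prompt the LLM; here the response is canned.
        return PhaseResult(True, {"parsed_response": {"status": "CONTINUE"}})


class CountingActionStrategy:
    async def execute(self, context):
        # Reads the output of the LLM phase from the shared context.
        status = context["parsed_response"]["status"]
        return PhaseResult(True, {"execution_result": [], "status": status})


class ListMemoryStrategy:
    def __init__(self, memory):
        self.memory = memory

    async def execute(self, context):
        self.memory.append(dict(context))  # snapshot of everything so far
        return PhaseResult(True, {"memory_keys_count": len(context)})


async def run_phases(strategies):
    """Run (phase_name, strategy) pairs in order, merging each result into the context."""
    context = {}
    for _phase, strategy in strategies:
        result = await strategy.execute(context)
        if not result.success:
            break
        context.update(result.data)
    return context
```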
+---
+
+## Processor Framework
+
+### ConstellationAgentProcessor
+
+The `ConstellationAgentProcessor` extends `ProcessorTemplate` to orchestrate the entire processing workflow. It assembles strategies based on weaving mode and manages the execution pipeline.
+
+#### Initialization
+
+```python
+class ConstellationAgentProcessor(ProcessorTemplate):
+ """Enhanced processor for Constellation Agent."""
+
+ processor_context_class: Type[ConstellationProcessorContext] = (
+ ConstellationProcessorContext
+ )
+
+ def __init__(
+ self,
+ agent: "ConstellationAgent",
+ global_context: Context
+ ) -> None:
+ """Initialize with agent and global context."""
+ super().__init__(agent, global_context)
+```
+
+#### Strategy Assembly
+
+The processor creates appropriate strategies based on weaving mode:
+
+```python
+def _setup_strategies(self) -> None:
+ """Configure processing strategies using factory pattern."""
+
+ # Get weaving mode from context
+ weaving_mode = self.global_context.get(ContextNames.WEAVING_MODE)
+
+ if not weaving_mode:
+ raise ValueError("Weaving mode must be specified in global context")
+
+ # Create strategies via factory
+ self.strategies[ProcessingPhase.LLM_INTERACTION] = (
+ ConstellationStrategyFactory.create_llm_interaction_strategy(
+ fail_fast=True, # LLM interaction failure should trigger recovery
+ )
+ )
+
+ self.strategies[ProcessingPhase.ACTION_EXECUTION] = (
+ ConstellationStrategyFactory.create_action_execution_strategy(
+ weaving_mode=weaving_mode,
+ fail_fast=False, # Action failures can be handled gracefully
+ )
+ )
+
+ self.strategies[ProcessingPhase.MEMORY_UPDATE] = (
+ ConstellationStrategyFactory.create_memory_update_strategy(
+ fail_fast=False # Memory update failures shouldn't stop the process
+ )
+ )
+```
+
+#### Middleware Configuration
+
+```python
+def _setup_middleware(self) -> None:
+ """Set up enhanced middleware chain with comprehensive monitoring."""
+ self.middleware_chain = [
+ ConstellationLoggingMiddleware() # Specialized logging for Constellation Agent
+ ]
+```
+
+#### Context Management
+
+```python
+def _get_processor_specific_context_data(self) -> Dict[str, Any]:
+ """Provide Constellation-specific context initialization."""
+
+ before_constellation = self.global_context.get(
+ ContextNames.CONSTELLATION
+ )
+
+ return {
+ "weaving_mode": self.global_context.get(ContextNames.WEAVING_MODE),
+ "device_info": self.global_context.get(ContextNames.DEVICE_INFO),
+ "constellation_before": (
+ before_constellation.to_json() if before_constellation else None
+ ),
+ }
+```
+
+### Processing Context
+
+The `ConstellationProcessorContext` extends `BasicProcessorContext` with constellation-specific data:
+
+```python
+@dataclass
+class ConstellationProcessorContext(BasicProcessorContext):
+ """Constellation-specific processor context."""
+
+ # Agent metadata
+ agent_type: str = "ConstellationAgent"
+ weaving_mode: str = "CREATION"
+
+ # Device and constellation state
+ device_info: List[Dict] = field(default_factory=list)
+ constellation_before: Optional[str] = None
+ constellation_after: Optional[str] = None
+
+ # Action information
+ action_info: Optional[ActionCommandInfo] = None
+ target: Optional[TargetInfo] = None
+
+ # Performance tracking
+ llm_cost: float = 0.0
+ prompt_tokens: int = 0
+ completion_tokens: int = 0
+```
+
+---
+
+## Strategy Factory
+
+### ConstellationStrategyFactory
+
+The factory provides centralized strategy creation with mode-aware instantiation.
+
+#### Factory Methods
+
+```python
+class ConstellationStrategyFactory:
+ """Factory for creating Constellation processing strategies."""
+
+ _action_execution_strategies: Dict[WeavingMode, Type[BaseProcessingStrategy]] = {
+ WeavingMode.CREATION: ConstellationCreationActionExecutionStrategy,
+ WeavingMode.EDITING: ConstellationEditingActionExecutionStrategy,
+ }
+
+ @classmethod
+ def create_llm_interaction_strategy(
+ cls,
+ fail_fast: bool = True
+ ) -> BaseProcessingStrategy:
+ """Create LLM interaction strategy (shared across modes)."""
+ return ConstellationLLMInteractionStrategy(fail_fast)
+
+ @classmethod
+ def create_action_execution_strategy(
+ cls,
+ weaving_mode: WeavingMode,
+ fail_fast: bool = False
+ ) -> BaseProcessingStrategy:
+ """Create mode-specific action execution strategy."""
+
+ if weaving_mode not in cls._action_execution_strategies:
+ raise ValueError(f"Unsupported mode: {weaving_mode}")
+
+ strategy_class = cls._action_execution_strategies[weaving_mode]
+ return strategy_class(fail_fast=fail_fast)
+
+ @classmethod
+ def create_memory_update_strategy(
+ cls,
+ fail_fast: bool = False
+ ) -> BaseProcessingStrategy:
+ """Create memory update strategy (shared across modes)."""
+ return ConstellationMemoryUpdateStrategy(fail_fast=fail_fast)
+```
+
+#### Batch Strategy Creation
+
+```python
+@classmethod
+def create_all_strategies(
+ cls,
+ weaving_mode: WeavingMode,
+ llm_fail_fast: bool = True,
+ action_fail_fast: bool = False,
+ memory_fail_fast: bool = False,
+) -> Dict[str, BaseProcessingStrategy]:
+ """Create all required strategies for a weaving mode."""
+
+ return {
+ "llm_interaction": cls.create_llm_interaction_strategy(llm_fail_fast),
+ "action_execution": cls.create_action_execution_strategy(
+ weaving_mode, action_fail_fast
+ ),
+ "memory_update": cls.create_memory_update_strategy(memory_fail_fast),
+ }
+```
+
+**Note:** `create_llm_interaction_strategy()` returns a shared `ConstellationLLMInteractionStrategy` rather than a mode-specific one, because the LLM interaction logic is identical in creation and editing modes.
+
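The registry lookup behind `create_action_execution_strategy()` can be tried in isolation with the reduced sketch below. `Mode`, `CreationStrategy`, `EditingStrategy`, and `StrategyFactory` are hypothetical stand-ins for `WeavingMode` and the real strategy classes. Keeping the mapping in a class-level dict means that supporting a new mode is a one-line change, and an unknown mode fails with a clear `ValueError` instead of a bare `KeyError`.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for WeavingMode."""
    CREATION = "creation"
    EDITING = "editing"


class CreationStrategy:
    def __init__(self, fail_fast=False):
        self.fail_fast = fail_fast


class EditingStrategy:
    def __init__(self, fail_fast=False):
        self.fail_fast = fail_fast


class StrategyFactory:
    # Mode -> strategy class; the factory never branches on the mode itself.
    _registry = {
        Mode.CREATION: CreationStrategy,
        Mode.EDITING: EditingStrategy,
    }

    @classmethod
    def create(cls, mode, fail_fast=False):
        try:
            strategy_class = cls._registry[mode]
        except KeyError:
            raise ValueError(f"Unsupported mode: {mode}") from None
        return strategy_class(fail_fast=fail_fast)
```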
+---
+
+## LLM Interaction Strategy
+
+### ConstellationLLMInteractionStrategy
+
+Handles prompt construction, LLM communication, and response parsing. This strategy is **shared across both creation and editing modes**, with mode-specific prompt generation delegated to the agent's prompter.
+
+#### Strategy Execution
+
+```python
+@provides(
+ "parsed_response",
+ "response_text",
+ "llm_cost",
+ "prompt_message",
+ "status",
+)
+class ConstellationLLMInteractionStrategy(BaseProcessingStrategy):
+ """LLM interaction strategy for Constellation Agent."""
+
+ async def execute(
+ self,
+ agent: "ConstellationAgent",
+ context: ProcessingContext
+ ) -> ProcessingResult:
+ """Execute LLM interaction with retry logic."""
+
+ try:
+ # Extract context
+ session_step = context.get_local("session_step", 0)
+ device_info = context.get_local("device_info", {})
+ constellation = context.get_global("CONSTELLATION")
+ request = context.get("request", "")
+
+ # Build prompt (delegates to agent's prompter)
+ prompt_message = await self._build_comprehensive_prompt(
+ agent, device_info, constellation, request, ...
+ )
+
+ # Get LLM response with retry
+ response_text, llm_cost = await self._get_llm_response_with_retry(
+ agent, prompt_message
+ )
+
+ # Parse and validate
+ parsed_response = self._parse_and_validate_response(
+ agent, response_text
+ )
+
+ return ProcessingResult(
+ success=True,
+ data={
+ "parsed_response": parsed_response,
+ "response_text": response_text,
+ "llm_cost": llm_cost,
+ **parsed_response.model_dump(),
+ },
+ phase=ProcessingPhase.LLM_INTERACTION,
+ )
+
+ except Exception as e:
+ return self.handle_error(e, ProcessingPhase.LLM_INTERACTION, context)
+```
+
+#### Prompt Construction
+
+The strategy delegates mode-specific prompt building to the agent's prompter:
+
+```python
+async def _build_comprehensive_prompt(
+ self,
+ agent: "ConstellationAgent",
+ device_info: Dict,
+ constellation: TaskConstellation,
+ request: str,
+ ...
+) -> Dict[str, Any]:
+ """Build prompt using agent's mode-specific prompter."""
+
+ # Agent's message_constructor uses the appropriate prompter
+ # (ConstellationCreationPrompter or ConstellationEditingPrompter)
+ prompt_message = agent.message_constructor(
+ request=request,
+ device_info=device_info,
+ constellation=constellation
+ )
+
+ # Log request for debugging
+ self._log_request_data(...)
+
+ return prompt_message
+```
+
+The LLM strategy doesn't implement prompt construction directly. Instead, it calls `agent.message_constructor()`, which delegates to the appropriate prompter based on weaving mode. For details on prompter design, see the [Prompter Framework](../../infrastructure/agents/design/prompter.md). The prompters are responsible for mode-specific prompt formatting.
+
+#### Retry Logic
+
+```python
+async def _get_llm_response_with_retry(
+    self,
+    agent: "ConstellationAgent",
+    prompt_message: Dict[str, Any]
+) -> tuple[str, float]:
+    """Get LLM response with retry for JSON parsing failures."""
+
+    max_retries = ufo_config.system.JSON_PARSING_RETRY
+
+    for retry_count in range(max_retries):
+        try:
+            # Get response from LLM (blocking call, run in the default executor)
+            response_text, cost = await asyncio.get_event_loop().run_in_executor(
+                None,
+                agent.get_response,
+                prompt_message,
+                AgentType.CONSTELLATION,
+                True  # use_backup_engine
+            )
+
+            # Validate JSON parsing
+            agent.response_to_dict(response_text)
+
+            return response_text, cost
+
+        except Exception as e:
+            if retry_count < max_retries - 1:
+                self.logger.warning(f"Retry {retry_count + 1}/{max_retries}")
+            else:
+                # Chain the original error so the traceback keeps its cause
+                raise Exception(f"Failed after {max_retries} attempts: {e}") from e
+```
+
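The retry pattern above combines two ideas: a blocking call is moved off the event loop with `run_in_executor`, and the response is accepted only if it parses as JSON. A self-contained sketch of the same idea follows. It assumes a hypothetical `get_json_with_retry` helper and uses the standard `json` module in place of `agent.response_to_dict`.

```python
import asyncio
import json


async def get_json_with_retry(blocking_call, prompt, max_retries=3):
    """Run blocking_call(prompt) in a worker thread until it returns valid JSON text.

    Hypothetical helper for illustration only.
    """
    loop = asyncio.get_running_loop()
    last_error = None
    for _attempt in range(max_retries):
        try:
            text = await loop.run_in_executor(None, blocking_call, prompt)
            json.loads(text)  # validation only; the caller parses the text again
            return text
        except json.JSONDecodeError as exc:
            last_error = exc  # remember the failure and try again
    raise RuntimeError(f"Failed after {max_retries} attempts") from last_error
```

Unlike the method above, this sketch retries only on JSON decoding errors, so unrelated failures surface on the first attempt.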
+#### Response Validation
+
+```python
+def _parse_and_validate_response(
+ self,
+ agent: "ConstellationAgent",
+ response_text: str
+) -> ConstellationAgentResponse:
+ """Parse and validate LLM response."""
+
+ response_dict = agent.response_to_dict(response_text)
+ parsed_response = ConstellationAgentResponse.model_validate(response_dict)
+
+ # Validate required fields
+ if not parsed_response.thought:
+ raise ValueError("Missing 'thought' field")
+ if not parsed_response.status:
+ raise ValueError("Missing 'status' field")
+
+ agent.print_response(parsed_response)
+ return parsed_response
+```
+
+---
+
+## Action Execution Strategies
+
+### Base Action Execution Strategy
+
+The `BaseConstellationActionExecutionStrategy` provides shared logic for action execution, with abstract methods for mode-specific behaviors.
+
+```python
+@depends_on("parsed_response")
+@provides("execution_result", "action_info", "status")
+class BaseConstellationActionExecutionStrategy(BaseProcessingStrategy):
+ """Base strategy for executing Constellation actions."""
+
+ def __init__(self, weaving_mode: WeavingMode, fail_fast: bool = False):
+ super().__init__(
+ name=f"constellation_action_execution_{weaving_mode.value}",
+ fail_fast=fail_fast
+ )
+ self.weaving_mode = weaving_mode
+
+ async def execute(
+ self,
+ agent: "ConstellationAgent",
+ context: ProcessingContext
+ ) -> ProcessingResult:
+ """Execute constellation actions with mode-specific logic."""
+
+ parsed_response = context.get_local("parsed_response")
+ command_dispatcher = context.global_context.command_dispatcher
+
+ # Create mode-specific action info (abstract method)
+ action_info = await self._create_mode_specific_action_info(
+ agent, parsed_response
+ )
+
+ # Execute actions via dispatcher
+ execution_results = await self._execute_constellation_action(
+ command_dispatcher, action_info
+ )
+
+ # Sync constellation state (abstract method)
+ self.sync_constellation(execution_results, context)
+
+ # Create action info for memory
+ actions = self._create_action_info(action_info, execution_results)
+
+ # Publish actions (abstract method)
+ action_list_info = ListActionCommandInfo(actions)
+ await self.publish_actions(agent, action_list_info)
+
+ return ProcessingResult(
+ success=True,
+ data={
+ "execution_result": execution_results,
+ "action_info": action_list_info,
+ "status": parsed_response.status,
+ },
+ phase=ProcessingPhase.ACTION_EXECUTION,
+ )
+
+ @abstractmethod
+ async def _create_mode_specific_action_info(
+ self, agent, parsed_response
+ ) -> ActionCommandInfo | List[ActionCommandInfo]:
+ """Must be implemented by subclasses."""
+ pass
+
+ @abstractmethod
+ async def publish_actions(
+ self, agent, actions
+ ) -> None:
+ """Must be implemented by subclasses."""
+ pass
+
+ @abstractmethod
+ def sync_constellation(self, results, context) -> None:
+ """Must be implemented by subclasses."""
+ pass
+```
+
+#### Shared Action Execution
+
+```python
+async def _execute_constellation_action(
+ self,
+ command_dispatcher: BasicCommandDispatcher,
+ actions: ActionCommandInfo | List[ActionCommandInfo],
+) -> List[Result]:
+ """Execute actions via command dispatcher."""
+
+ if isinstance(actions, ActionCommandInfo):
+ actions = [actions]
+
+ commands = [
+ Command(
+ tool_name=action.function,
+ parameters=action.arguments or {},
+ tool_type="action"
+ )
+ for action in actions if action.function
+ ]
+
+ return await command_dispatcher.execute_commands(commands)
+```
+
+### Creation Mode Strategy
+
+The `ConstellationCreationActionExecutionStrategy` implements creation-specific action generation.
+
+```python
+class ConstellationCreationActionExecutionStrategy(
+ BaseConstellationActionExecutionStrategy
+):
+ """Action execution for constellation creation mode."""
+
+ def __init__(self, fail_fast: bool = False):
+ super().__init__(weaving_mode=WeavingMode.CREATION, fail_fast=fail_fast)
+
+ async def _create_mode_specific_action_info(
+ self,
+ agent: "ConstellationAgent",
+ parsed_response: ConstellationAgentResponse
+ ) -> List[ActionCommandInfo]:
+ """Create constellation building action."""
+
+ if not parsed_response.constellation:
+ self.logger.warning("No constellation in response")
+ return []
+
+ return [
+ ActionCommandInfo(
+ function=agent._constellation_creation_tool_name, # "build_constellation"
+ arguments={"config": parsed_response.constellation},
+ )
+ ]
+
+ def sync_constellation(
+ self,
+ results: List[Result],
+ context: ProcessingContext
+ ) -> None:
+ """Sync newly created constellation to context."""
+
+ constellation_json = results[0].result if results else None
+ if constellation_json:
+ constellation = TaskConstellation.from_json(constellation_json)
+ context.global_context.set(ContextNames.CONSTELLATION, constellation)
+
+ async def publish_actions(
+ self, agent, actions: ListActionCommandInfo
+ ) -> None:
+ """Publish constellation creation actions as events."""
+ # Publishes simplified event for WebUI display
+ pass
+```
+
+### Editing Mode Strategy
+
+The `ConstellationEditingActionExecutionStrategy` implements editing-specific action extraction and constellation synchronization.
+
+```python
+class ConstellationEditingActionExecutionStrategy(
+ BaseConstellationActionExecutionStrategy
+):
+ """Action execution for constellation editing mode."""
+
+ def __init__(self, fail_fast: bool = False):
+ super().__init__(weaving_mode=WeavingMode.EDITING, fail_fast=fail_fast)
+
+ async def _create_mode_specific_action_info(
+ self,
+ agent: "ConstellationAgent",
+ parsed_response: ConstellationAgentResponse
+ ) -> List[ActionCommandInfo]:
+ """Extract editing actions from LLM response."""
+
+ if parsed_response.action:
+ return parsed_response.action
+ else:
+ return []
+
+ def sync_constellation(
+ self,
+ results: List[Result],
+ context: ProcessingContext
+ ) -> None:
+ """Sync modified constellation from MCP tool results."""
+
+ # Find last successful result with constellation data
+ constellation_json = None
+ for result in reversed(results):
+ if result.status == ResultStatus.SUCCESS and result.result:
+ if isinstance(result.result, str):
+ if '"constellation_id"' in result.result or '"tasks"' in result.result:
+ constellation_json = result.result
+ break
+ elif isinstance(result.result, dict):
+ if "constellation_id" in result.result or "tasks" in result.result:
+ constellation_json = result.result
+ break
+
+ if constellation_json:
+ if isinstance(constellation_json, str):
+ constellation = TaskConstellation.from_json(constellation_json)
+ else:
+ constellation = TaskConstellation.from_dict(constellation_json)
+
+ context.global_context.set(ContextNames.CONSTELLATION, constellation)
+ self.logger.info(f"Synced constellation: {constellation.constellation_id}")
+
+ async def publish_actions(self, agent, actions: ListActionCommandInfo) -> None:
+ """Publish editing actions as events for WebUI display."""
+ # Publishes detailed action events
+ pass
+```
+
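The scan in `sync_constellation()` walks the results from newest to oldest and stops at the first successful one that looks like constellation data. The sketch below isolates that search so it can be tested without a dispatcher. `ToolResult` and `find_latest_constellation` are hypothetical names, and the string `"success"` stands in for `ResultStatus.SUCCESS`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolResult:
    """Hypothetical stand-in for the dispatcher's Result type."""
    status: str
    result: Any = None


def find_latest_constellation(results) -> Optional[Any]:
    """Return the payload of the last successful result that carries constellation data."""
    for item in reversed(results):
        if item.status != "success" or not item.result:
            continue
        payload = item.result
        if isinstance(payload, str):
            # Serialized JSON: look for the quoted key names
            if '"constellation_id"' in payload or '"tasks"' in payload:
                return payload
        elif isinstance(payload, dict):
            if "constellation_id" in payload or "tasks" in payload:
                return payload
    return None
```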
+---
+
+## Memory Update Strategy
+
+### ConstellationMemoryUpdateStrategy
+
+The memory update strategy is **shared across both modes** and handles comprehensive memory logging.
+
+```python
+@depends_on("parsed_response")
+@provides("additional_memory", "memory_item", "memory_keys_count")
+class ConstellationMemoryUpdateStrategy(BaseProcessingStrategy):
+ """Memory update strategy (shared across modes)."""
+
+ async def execute(
+ self,
+ agent: "ConstellationAgent",
+ context: ProcessingContext
+ ) -> ProcessingResult:
+ """Execute comprehensive memory update."""
+
+ parsed_response = context.get_local("parsed_response")
+
+ # Create additional memory data
+ additional_memory = self._create_additional_memory_data(agent, context)
+
+ # Create and populate memory item
+ memory_item = self._create_and_populate_memory_item(
+ parsed_response, additional_memory
+ )
+
+ # Add to agent memory
+ agent.add_memory(memory_item)
+
+ # Update structural logs
+ self._update_structural_logs(memory_item, context.global_context)
+
+ return ProcessingResult(
+ success=True,
+ data={
+ "additional_memory": additional_memory,
+ "memory_item": memory_item,
+ "memory_keys_count": len(memory_item.to_dict()),
+ },
+ phase=ProcessingPhase.MEMORY_UPDATE,
+ )
+```
+
+#### Memory Data Creation
+
+```python
+def _create_additional_memory_data(
+ self,
+ agent: "ConstellationAgent",
+ context: ProcessingContext
+) -> ConstellationProcessorContext:
+ """Create comprehensive memory data from processing context."""
+
+ constellation_context = context.local_context
+
+ # Update with current state
+ constellation_context.session_step = context.get_global("SESSION_STEP", 0)
+ constellation_context.round_step = context.get_global("CURRENT_ROUND_STEP", 0)
+ constellation_context.round_num = context.get_global("CURRENT_ROUND_ID", 0)
+ constellation_context.agent_step = agent.step
+
+ # Update action information
+ action_info = constellation_context.action_info
+ if action_info:
+ constellation_context.action = [info.model_dump() for info in action_info.actions]
+ constellation_context.function_call = [info.function for info in action_info.actions]
+ constellation_context.arguments = [info.arguments for info in action_info.actions]
+
+ # Update constellation_after
+ constellation_after = context.get_global("CONSTELLATION")
+ if constellation_after:
+ constellation_context.constellation_after = constellation_after.to_json()
+
+ return constellation_context
+```
+
+---
+
+## Mode Comparison
+
+### Strategy Differences by Mode
+
+| Aspect | Creation Mode | Editing Mode |
+|--------|---------------|--------------|
+| **LLM Interaction** | Shared strategy | Shared strategy |
+| **Prompt Generation** | `ConstellationCreationPrompter` | `ConstellationEditingPrompter` |
+| **Action Generation** | `build_constellation` with JSON | Extract `action` field from response |
+| **Action Execution** | Single bulk creation | Multiple MCP commands |
+| **Constellation Sync** | Set from creation result | Extract from last successful MCP result |
+| **Action Publishing** | Simplified event for WebUI | Detailed action events for WebUI |
+| **Memory Update** | Shared strategy | Shared strategy |
+
+### Processing Pipeline Comparison
+
+```mermaid
+sequenceDiagram
+ participant Agent
+ participant Processor
+ participant Factory
+ participant LLMStrat
+ participant ActionStrat
+ participant MemStrat
+
+ Note over Agent,MemStrat: CREATION MODE
+ Agent->>Processor: process()
+ Processor->>Factory: create_action_execution_strategy(CREATION)
+ Factory->>Processor: ConstellationCreationActionExecutionStrategy
+ Processor->>LLMStrat: execute() [shared]
+ LLMStrat->>Processor: parsed_response with constellation JSON
+ Processor->>ActionStrat: execute()
+ ActionStrat->>ActionStrat: Create build_constellation command
+ ActionStrat->>Processor: execution_result
+ Processor->>MemStrat: execute() [shared]
+ MemStrat->>Processor: memory_item
+
+ Note over Agent,MemStrat: EDITING MODE
+ Agent->>Processor: process()
+ Processor->>Factory: create_action_execution_strategy(EDITING)
+ Factory->>Processor: ConstellationEditingActionExecutionStrategy
+ Processor->>LLMStrat: execute() [shared]
+ LLMStrat->>Processor: parsed_response with action list
+ Processor->>ActionStrat: execute()
+ ActionStrat->>ActionStrat: Extract MCP commands
+ ActionStrat->>Processor: execution_result
+ Processor->>MemStrat: execute() [shared]
+ MemStrat->>Processor: memory_item
+```
+
+---
+
+## Error Handling
+
+### Fail-Fast Configuration
+
+Each strategy can be configured with `fail_fast` to control error propagation:
+
+```python
+# LLM failures should trigger recovery
+ConstellationStrategyFactory.create_llm_interaction_strategy(
+ fail_fast=True
+)
+
+# Action failures can be handled gracefully
+ConstellationStrategyFactory.create_action_execution_strategy(
+ weaving_mode=mode,
+ fail_fast=False
+)
+
+# Memory failures shouldn't stop the process
+ConstellationStrategyFactory.create_memory_update_strategy(
+ fail_fast=False
+)
+```
+
+### Strategy-Level Error Handling
+
+```python
+class BaseProcessingStrategy:
+    def handle_error(
+        self,
+        error: Exception,
+        phase: ProcessingPhase,
+        context: ProcessingContext
+    ) -> ProcessingResult:
+        """Handle strategy execution errors."""
+
+        error_msg = f"{self.name} failed: {str(error)}"
+        self.logger.error(error_msg)
+
+        if self.fail_fast:
+            raise error
+
+        return ProcessingResult(
+            success=False,
+            data={"error": error_msg},
+            phase=phase
+        )
+```
+
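The effect of `fail_fast` can be seen in a stripped-down form below. `run_phase` is a hypothetical helper that uses plain dicts instead of `ProcessingResult`. With `fail_fast=True` the original exception reaches the caller; with `fail_fast=False` it is converted into a failed result so later phases can still run.

```python
def run_phase(name, func, fail_fast):
    """Run func(); re-raise on failure if fail_fast, otherwise return an error result.

    Hypothetical helper for illustration only.
    """
    try:
        return {"success": True, "data": func()}
    except Exception as error:
        if fail_fast:
            raise  # propagate unchanged so the caller can trigger recovery
        return {"success": False, "data": {"error": f"{name} failed: {error}"}}
```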
+---
+
+## Best Practices
+
+### Strategy Design
+
+1. **Keep strategies focused**: Each strategy handles one processing phase
+2. **Use dependencies**: Declare data dependencies with `@depends_on` and `@provides`
+3. **Handle errors gracefully**: Configure `fail_fast` appropriately per strategy
+4. **Log comprehensively**: Use structured logging for debugging
+5. **Validate outputs**: Ensure each strategy produces expected data structures
+
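To make point 2 concrete, the sketch below shows one way such declarations could be recorded and checked. It is a simplified, hypothetical re-implementation: the framework's real `@depends_on` and `@provides` decorators may store and validate dependencies differently, and `check_order` is a name invented for this example.

```python
def provides(*keys):
    """Record the context keys a strategy class produces (hypothetical version)."""
    def decorate(cls):
        cls.provided_keys = tuple(keys)
        return cls
    return decorate


def depends_on(*keys):
    """Record the context keys a strategy class requires (hypothetical version)."""
    def decorate(cls):
        cls.required_keys = tuple(keys)
        return cls
    return decorate


def check_order(strategy_classes):
    """Raise ValueError if a strategy needs a key that no earlier strategy provides."""
    available = set()
    for cls in strategy_classes:
        missing = set(getattr(cls, "required_keys", ())) - available
        if missing:
            raise ValueError(f"{cls.__name__} is missing {sorted(missing)}")
        available.update(getattr(cls, "provided_keys", ()))


@provides("parsed_response", "status")
class LLMPhase:
    pass


@depends_on("parsed_response")
@provides("execution_result", "action_info")
class ActionPhase:
    pass
```

Calling `check_order([LLMPhase, ActionPhase])` passes silently, while reversing the order raises because `parsed_response` is not yet available when `ActionPhase` runs.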
+### Mode Selection
+
+```python
+def determine_strategy_mode(constellation: Optional[TaskConstellation]) -> WeavingMode:
+    """Determine appropriate mode based on constellation state."""
+
+    if constellation is None or len(constellation.tasks) == 0:
+        return WeavingMode.CREATION
+    else:
+        return WeavingMode.EDITING
+```
+
+### Testing Strategies
+
+```python
+import unittest
+from unittest.mock import MagicMock
+
+
+class TestConstellationStrategies(unittest.IsolatedAsyncioTestCase):
+    """Async test case: the strategy methods under test are coroutines."""
+
+    def setUp(self):
+        # Stand-in agent; the creation strategy only reads the tool name
+        self.agent = MagicMock()
+        self.agent._constellation_creation_tool_name = "build_constellation"
+
+    async def test_creation_action_strategy(self):
+        """Test creation strategy generates build_constellation action."""
+
+        strategy = ConstellationCreationActionExecutionStrategy()
+        response = ConstellationAgentResponse(
+            constellation={"tasks": [...], "dependencies": [...]}
+        )
+
+        actions = await strategy._create_mode_specific_action_info(
+            self.agent, response
+        )
+
+        self.assertEqual(len(actions), 1)
+        self.assertEqual(actions[0].function, "build_constellation")
+
+    async def test_editing_action_strategy(self):
+        """Test editing strategy extracts actions from response."""
+
+        strategy = ConstellationEditingActionExecutionStrategy()
+        response = ConstellationAgentResponse(
+            action=[
+                ActionCommandInfo(function="add_task", arguments={...}),
+                ActionCommandInfo(function="add_dependency", arguments={...})
+            ]
+        )
+
+        actions = await strategy._create_mode_specific_action_info(
+            self.agent, response
+        )
+
+        self.assertEqual(len(actions), 2)
+```
+
+---
+
+## Summary
+
+The Constellation Agent's processing strategy pattern provides:
+
+- **Modular Processing**: Three distinct phases (LLM, Action, Memory) with dedicated strategies assembled by `ConstellationAgentProcessor`
+- **Mode Flexibility**: Factory-based strategy creation adapts to CREATION vs. EDITING modes
+- **Shared Logic**: LLM interaction and memory update strategies are mode-agnostic
+- **Targeted Customization**: Only action execution varies by mode (creation builds entire constellation, editing applies MCP commands)
+- **Robust Error Handling**: Per-strategy fail-fast configuration
+- **Clean Architecture**: ProcessorTemplate provides the orchestration framework, strategies implement phase-specific logic
+- **Testability**: Each strategy can be tested in isolation
+
+This architecture lets the Constellation Agent handle both initial constellation creation and later modifications with the appropriate processing strategies while keeping concerns cleanly separated. The processor assembles the strategies dynamically based on weaving mode; the prompters act as supporting components rather than as the focus of the strategy pattern.
+
+## Related Documentation
+
+- [Constellation Agent Overview](overview.md) - Learn about constellation creation and editing modes
+- [Constellation Agent State Machine](state.md) - Understand the state transitions and lifecycle
+- [Processor Framework Design](../../infrastructure/agents/design/processor.md) - Deep dive into the ProcessorTemplate architecture
+- [Prompter Framework](../../infrastructure/agents/design/prompter.md) - Mode-specific prompt generation framework
+- [Constellation Editor MCP Server](../../mcp/servers/constellation_editor.md) - MCP commands for constellation manipulation
diff --git a/documents/docs/galaxy/constellation_orchestrator/api_reference.md b/documents/docs/galaxy/constellation_orchestrator/api_reference.md
new file mode 100644
index 000000000..eaa911842
--- /dev/null
+++ b/documents/docs/galaxy/constellation_orchestrator/api_reference.md
@@ -0,0 +1,869 @@
+# API Reference
+
+## Overview
+
+This document provides comprehensive API documentation for the Constellation Orchestrator system. The API is organized into three main components:
+
+- **TaskConstellationOrchestrator** - Main orchestration engine
+- **ConstellationManager** - Device assignment and resource management
+- **ConstellationModificationSynchronizer** - Safe concurrent editing
+
+## TaskConstellationOrchestrator
+
+The main orchestration engine that coordinates task execution across devices.
+
+**Module**: `galaxy.constellation.orchestrator.orchestrator`
+
+### Constructor
+
+```python
+TaskConstellationOrchestrator(
+ device_manager: Optional[ConstellationDeviceManager] = None,
+ enable_logging: bool = True,
+ event_bus = None
+)
+```
+
+**Parameters**:
+
+| Parameter | Type | Description | Default |
+|-----------|------|-------------|---------|
+| `device_manager` | `ConstellationDeviceManager` or `None` | Device manager for communication | `None` |
+| `enable_logging` | `bool` | Enable logging output | `True` |
+| `event_bus` | `EventBus` or `None` | Custom event bus instance | `None` (uses global) |
+
+**Example**:
+```python
+from galaxy.constellation.orchestrator import TaskConstellationOrchestrator
+from galaxy.client.device_manager import ConstellationDeviceManager
+
+device_manager = ConstellationDeviceManager()
+orchestrator = TaskConstellationOrchestrator(
+ device_manager=device_manager,
+ enable_logging=True
+)
+```
+
+### Core Methods
+
+#### orchestrate_constellation()
+
+Main entry point for orchestrating a constellation's execution.
+
+```python
+async def orchestrate_constellation(
+ self,
+ constellation: TaskConstellation,
+ device_assignments: Optional[Dict[str, str]] = None,
+ assignment_strategy: Optional[str] = None,
+ metadata: Optional[Dict] = None,
+) -> Dict[str, Any]
+```
+
+**Parameters**:
+
+| Parameter | Type | Description | Required |
+|-----------|------|-------------|----------|
+| `constellation` | `TaskConstellation` | The constellation to orchestrate | Yes |
+| `device_assignments` | `Dict[str, str]` or `None` | Manual task→device mapping | No |
+| `assignment_strategy` | `str` or `None` | Strategy: `"round_robin"`, `"capability_match"`, or `"load_balance"` | No |
+| `metadata` | `Dict` or `None` | Additional orchestration metadata | No |
+
+**Returns**: `Dict[str, Any]` with keys:
+```python
+{
+    "results": dict,        # Task results
+    "status": str,          # Overall status, e.g. "completed"
+    "total_tasks": int,     # Number of tasks
+    "statistics": dict,     # Execution statistics
+}
+```
+
+**Raises**:
+- `ValueError`: Invalid DAG structure or device assignments
+- `RuntimeError`: Orchestration execution error
+- `asyncio.CancelledError`: Orchestration cancelled
+
+**Example**:
+```python
+# With automatic assignment
+results = await orchestrator.orchestrate_constellation(
+ constellation=my_constellation,
+ assignment_strategy="capability_match"
+)
+
+# With manual assignments
+device_assignments = {
+ "task_1": "windows_main",
+ "task_2": "android_device",
+ "task_3": "windows_main"
+}
+results = await orchestrator.orchestrate_constellation(
+ constellation=my_constellation,
+ device_assignments=device_assignments
+)
+```
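+
+The returned dictionary can be read with ordinary dict access. The sketch below is a hypothetical helper (`summarize_results` is not part of the Galaxy API) that relies only on the keys listed under **Returns** and assumes `results` maps task IDs to per-task results:
+
+```python
+from typing import Any, Dict
+
+
+def summarize_results(results: Dict[str, Any]) -> str:
+    """Build a one-line summary from an orchestration result dict.
+
+    Hypothetical helper: uses only the documented keys and
+    falls back to defaults when a key is missing.
+    """
+    status = results.get("status", "unknown")
+    total = results.get("total_tasks", 0)
+    recorded = len(results.get("results", {}))
+    return f"{status}: {recorded}/{total} task results recorded"
+
+
+# Sample data shaped like the documented return value (values are made up).
+sample = {
+    "results": {"task_1": "ok", "task_2": "ok"},
+    "status": "completed",
+    "total_tasks": 3,
+    "statistics": {},
+}
+print(summarize_results(sample))
+```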
+
+#### execute_single_task()
+
+Execute a single task independently (without constellation context).
+
+```python
+async def execute_single_task(
+ self,
+ task: TaskStar,
+ target_device_id: Optional[str] = None,
+) -> Any
+```
+
+**Parameters**:
+
+| Parameter | Type | Description | Required |
+|-----------|------|-------------|----------|
+| `task` | `TaskStar` | Task to execute | Yes |
+| `target_device_id` | `str` or `None` | Device for execution | No (auto-assigned if None) |
+
+**Returns**: Task execution result content (extracts `result.result` from task execution)
+
+**Raises**:
+- `ValueError`: No available devices for task execution
+
+**Example**:
+```python
+task = TaskStar(
+ task_id="standalone_task",
+ description="Collect system information"
+)
+
+result = await orchestrator.execute_single_task(
+ task=task,
+ target_device_id="windows_main"
+)
+```
+
+#### get_constellation_status()
+
+Get detailed status of a constellation during execution.
+
+```python
+async def get_constellation_status(
+ self,
+ constellation: TaskConstellation
+) -> Dict[str, Any]
+```
+
+**Parameters**:
+
+| Parameter | Type | Description | Required |
+|-----------|------|-------------|----------|
+| `constellation` | `TaskConstellation` | Constellation to query | Yes |
+
+**Returns**: Status dictionary from ConstellationManager
+
+**Note**: This method delegates to `ConstellationManager.get_constellation_status()` using the constellation's ID.
+
+**Example**:
+```python
+status = await orchestrator.get_constellation_status(constellation)
+if status:
+ print(f"State: {status['state']}")
+ print(f"Running: {len(status['running_tasks'])}")
+```
+
+#### get_available_devices()
+
+Get list of available devices from device manager.
+
+```python
+async def get_available_devices(self) -> List[Dict[str, Any]]
+```
+
+**Returns**: List of device info dictionaries
+
+**Example**:
+```python
+devices = await orchestrator.get_available_devices()
+for device in devices:
+ print(f"{device['device_id']}: {device['device_type']}")
+```
+
+### Configuration Methods
+
+#### set_device_manager()
+
+Set or update the device manager.
+
+```python
+def set_device_manager(
+ self,
+ device_manager: ConstellationDeviceManager
+) -> None
+```
+
+**Parameters**:
+
+| Parameter | Type | Description | Required |
+|-----------|------|-------------|----------|
+| `device_manager` | `ConstellationDeviceManager` | Device manager instance | Yes |
+
+**Example**:
+```python
+new_device_manager = ConstellationDeviceManager()
+orchestrator.set_device_manager(new_device_manager)
+```
+
+#### set_modification_synchronizer()
+
+Attach a modification synchronizer for safe concurrent editing.
+
+```python
+def set_modification_synchronizer(
+ self,
+ synchronizer: ConstellationModificationSynchronizer
+) -> None
+```
+
+**Parameters**:
+
+| Parameter | Type | Description | Required |
+|-----------|------|-------------|----------|
+| `synchronizer` | `ConstellationModificationSynchronizer` | Synchronizer instance | Yes |
+
+**Example**:
+```python
+from galaxy.session.observers.constellation_sync_observer import (
+ ConstellationModificationSynchronizer
+)
+
+synchronizer = ConstellationModificationSynchronizer(orchestrator)
+orchestrator.set_modification_synchronizer(synchronizer)
+```
+
+---
+
+## ConstellationManager
+
+Manages device assignments, resource allocation, and constellation lifecycle.
+
+**Module**: `galaxy.constellation.orchestrator.constellation_manager`
+
+### Constructor
+
+```python
+ConstellationManager(
+ device_manager: Optional[ConstellationDeviceManager] = None,
+ enable_logging: bool = True
+)
+```
+
+**Parameters**:
+
+| Parameter | Type | Description | Default |
+|-----------|------|-------------|---------|
+| `device_manager` | `ConstellationDeviceManager` or `None` | Device manager instance | `None` |
+| `enable_logging` | `bool` | Enable logging | `True` |
+
+### Device Assignment Methods
+
+#### assign_devices_automatically()
+
+Automatically assign devices to all tasks using a strategy.
+
+```python
+async def assign_devices_automatically(
+ self,
+ constellation: TaskConstellation,
+ strategy: str = "round_robin",
+ device_preferences: Optional[Dict[str, str]] = None,
+) -> Dict[str, str]
+```
+
+**Parameters**:
+
+| Parameter | Type | Description | Default |
+|-----------|------|-------------|---------|
+| `constellation` | `TaskConstellation` | Constellation to assign | Required |
+| `strategy` | `str` | Assignment strategy | `"round_robin"` |
+| `device_preferences` | `Dict[str, str]` or `None` | Preferred task→device mappings | `None` |
+
+**Strategies**:
+- `"round_robin"`: Distribute tasks evenly
+- `"capability_match"`: Match device types to task requirements
+- `"load_balance"`: Minimize maximum device load
+
+For more details on device assignment strategies, see [Constellation Manager](constellation_manager.md).
+
+**Returns**: `Dict[str, str]` mapping task_id → device_id
+
+**Raises**:
+- `ValueError`: No available devices or invalid strategy
+
+**Example**:
+```python
+assignments = await manager.assign_devices_automatically(
+ constellation,
+ strategy="capability_match",
+ device_preferences={"critical_task": "windows_main"}
+)
+```
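+
+To make the strategy names concrete, here is a minimal sketch of what round-robin and load-balanced assignment could look like. It is an illustration under simplifying assumptions (plain task and device ID lists, a hypothetical `current_load` mapping), not the `ConstellationManager` implementation:
+
+```python
+from itertools import cycle
+from typing import Dict, List
+
+
+def round_robin(task_ids: List[str], device_ids: List[str]) -> Dict[str, str]:
+    """Cycle through devices so tasks are spread evenly."""
+    if not device_ids:
+        raise ValueError("No available devices")
+    return dict(zip(task_ids, cycle(device_ids)))
+
+
+def load_balance(
+    task_ids: List[str], device_ids: List[str], current_load: Dict[str, int]
+) -> Dict[str, str]:
+    """Greedily give each task to the currently least-loaded device."""
+    if not device_ids:
+        raise ValueError("No available devices")
+    load = {d: current_load.get(d, 0) for d in device_ids}
+    assignments: Dict[str, str] = {}
+    for task_id in task_ids:
+        device = min(device_ids, key=lambda d: load[d])  # ties: first device wins
+        assignments[task_id] = device
+        load[device] += 1
+    return assignments
+
+
+tasks = ["task_1", "task_2", "task_3"]
+devices = ["windows_main", "android_device"]
+print(round_robin(tasks, devices))
+print(load_balance(tasks, devices, {"windows_main": 2}))
+```
+
+With `windows_main` already carrying two tasks, the load-balanced variant sends the first two new tasks to `android_device` before the loads even out.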
+
+#### reassign_task_device()
+
+Reassign a single task to a different device.
+
+```python
+def reassign_task_device(
+ self,
+ constellation: TaskConstellation,
+ task_id: str,
+ new_device_id: str,
+) -> bool
+```
+
+**Parameters**:
+
+| Parameter | Type | Description | Required |
+|-----------|------|-------------|----------|
+| `constellation` | `TaskConstellation` | Constellation containing task | Yes |
+| `task_id` | `str` | ID of task to reassign | Yes |
+| `new_device_id` | `str` | New device ID | Yes |
+
+**Returns**: `True` if successful, `False` if task not found
+
+**Example**:
+```python
+success = manager.reassign_task_device(
+ constellation,
+ task_id="task_5",
+ new_device_id="android_backup"
+)
+```
+
+#### clear_device_assignments()
+
+Clear all device assignments from a constellation.
+
+```python
+def clear_device_assignments(
+ self,
+ constellation: TaskConstellation
+) -> int
+```
+
+**Returns**: Number of assignments cleared
+
+### Validation Methods
+
+#### validate_constellation_assignments()
+
+Validate that all tasks have valid device assignments.
+
+```python
+def validate_constellation_assignments(
+ self,
+ constellation: TaskConstellation
+) -> tuple[bool, List[str]]
+```
+
+**Returns**: `(is_valid, errors)` tuple
+
+**Example**:
+```python
+is_valid, errors = manager.validate_constellation_assignments(constellation)
+if not is_valid:
+    for error in errors:
+        print(f"Error: {error}")
+```
+
+### Lifecycle Methods
+
+#### register_constellation()
+
+Register a constellation for management tracking.
+
+```python
+def register_constellation(
+ self,
+ constellation: TaskConstellation,
+ metadata: Optional[Dict[str, Any]] = None,
+) -> str
+```
+
+**Returns**: Constellation ID
+
+#### unregister_constellation()
+
+Unregister and clean up a constellation.
+
+```python
+def unregister_constellation(
+ self,
+ constellation_id: str
+) -> bool
+```
+
+**Returns**: `True` if unregistered, `False` if not found
+
+#### get_constellation()
+
+Get a managed constellation by ID.
+
+```python
+def get_constellation(
+ self,
+ constellation_id: str
+) -> Optional[TaskConstellation]
+```
+
+#### list_constellations()
+
+List all managed constellations.
+
+```python
+def list_constellations(self) -> List[Dict[str, Any]]
+```
+
+**Returns**: List of constellation info dictionaries
+
+### Status Methods
+
+#### get_constellation_status()
+
+Get detailed status of a constellation.
+
+```python
+async def get_constellation_status(
+ self,
+ constellation_id: str
+) -> Optional[Dict[str, Any]]
+```
+
+**Returns**: Status dictionary with keys:
+```python
+{
+ "constellation_id": str,
+ "name": str,
+ "state": str,
+ "statistics": dict,
+ "ready_tasks": List[str],
+ "running_tasks": List[str],
+ "completed_tasks": List[str],
+ "failed_tasks": List[str],
+ "metadata": dict
+}
+```
+
+#### get_available_devices()
+
+Get list of available devices.
+
+```python
+async def get_available_devices(self) -> List[Dict[str, Any]]
+```
+
+**Returns**: List of device info dictionaries:
+```python
+[
+ {
+ "device_id": str,
+ "device_type": str,
+ "capabilities": List[str],
+ "status": str,
+ "metadata": dict
+ },
+ ...
+]
+```
+
+#### get_device_utilization()
+
+Get device utilization statistics for a constellation.
+
+```python
+def get_device_utilization(
+ self,
+ constellation: TaskConstellation
+) -> Dict[str, int]
+```
+
+**Returns**: `Dict[device_id, task_count]`
+
+#### get_task_device_info()
+
+Get device information for a specific task.
+
+```python
+def get_task_device_info(
+ self,
+ constellation: TaskConstellation,
+ task_id: str
+) -> Optional[Dict[str, Any]]
+```
+
+**Returns**: Device info dictionary or `None`
+
+---
+
+## ConstellationModificationSynchronizer
+
+Synchronizes constellation modifications with orchestrator execution to prevent race conditions.
+
+**Module**: `galaxy.session.observers.constellation_sync_observer`
+
+### Constructor
+
+```python
+ConstellationModificationSynchronizer(
+ orchestrator: TaskConstellationOrchestrator,
+ logger: Optional[logging.Logger] = None
+)
+```
+
+**Parameters**:
+
+| Parameter | Type | Description | Required |
+|-----------|------|-------------|----------|
+| `orchestrator` | `TaskConstellationOrchestrator` | Orchestrator instance | Yes |
+| `logger` | `logging.Logger` or `None` | Custom logger | No |
+
+**Example**:
+```python
+synchronizer = ConstellationModificationSynchronizer(
+ orchestrator=orchestrator,
+ logger=logging.getLogger(__name__)
+)
+```
+
+### Core Methods
+
+#### on_event()
+
+Handle orchestration events (implements `IEventObserver`).
+
+```python
+async def on_event(self, event: Event) -> None
+```
+
+**Parameters**:
+
+| Parameter | Type | Description | Required |
+|-----------|------|-------------|----------|
+| `event` | `Event` | Event to process | Yes |
+
+**Events handled**:
+- `TASK_COMPLETED`: Register pending modification
+- `TASK_FAILED`: Register pending modification
+- `CONSTELLATION_MODIFIED`: Complete pending modifications
+
+#### wait_for_pending_modifications()
+
+Wait for all pending modifications to complete.
+
+```python
+async def wait_for_pending_modifications(
+ self,
+ timeout: Optional[float] = None
+) -> bool
+```
+
+**Parameters**:
+
+| Parameter | Type | Description | Default |
+|-----------|------|-------------|---------|
+| `timeout` | `float` or `None` | Timeout in seconds | `None` (uses default: 600s) |
+
+**Returns**: `True` if all completed, `False` if timeout
+
+**Example**:
+```python
+# In orchestration loop
+completed = await synchronizer.wait_for_pending_modifications(timeout=300.0)
+if not completed:
+ logger.warning("Modifications timed out")
+```
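+
+The `True`/`False` return convention can be reproduced with standard `asyncio` primitives. The following is a sketch of the general pattern only; `wait_with_timeout` and the `asyncio.Event` it waits on are assumptions for illustration, not the synchronizer's internals:
+
+```python
+import asyncio
+
+
+async def wait_with_timeout(done: asyncio.Event, timeout: float) -> bool:
+    """Return True if `done` is set within `timeout` seconds, else False."""
+    try:
+        await asyncio.wait_for(done.wait(), timeout=timeout)
+        return True
+    except asyncio.TimeoutError:
+        return False
+
+
+async def demo() -> tuple:
+    finished = asyncio.Event()
+    never = asyncio.Event()
+    # Simulate an edit that completes after 10 ms.
+    asyncio.get_running_loop().call_later(0.01, finished.set)
+    return (
+        await wait_with_timeout(finished, timeout=1.0),
+        await wait_with_timeout(never, timeout=0.05),
+    )
+
+
+print(asyncio.run(demo()))
+```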
+
+#### merge_and_sync_constellation_states()
+
+Merge agent's structural changes with orchestrator's execution state.
+
+```python
+def merge_and_sync_constellation_states(
+ self,
+ orchestrator_constellation: TaskConstellation,
+) -> TaskConstellation
+```
+
+**Parameters**:
+
+| Parameter | Type | Description | Required |
+|-----------|------|-------------|----------|
+| `orchestrator_constellation` | `TaskConstellation` | Orchestrator's constellation | Yes |
+
+**Returns**: Merged constellation with consistent state
+
+**Example**:
+```python
+merged = synchronizer.merge_and_sync_constellation_states(
+ orchestrator_constellation=current_constellation
+)
+```
+
+### Configuration Methods
+
+#### set_modification_timeout()
+
+Set the timeout for modifications.
+
+```python
+def set_modification_timeout(self, timeout: float) -> None
+```
+
+**Parameters**:
+
+| Parameter | Type | Description | Required |
+|-----------|------|-------------|----------|
+| `timeout` | `float` | Timeout in seconds (must be > 0) | Yes |
+
+**Raises**: `ValueError` if timeout ≤ 0
+
+**Example**:
+```python
+# Increase timeout for slow LLM responses
+synchronizer.set_modification_timeout(1800.0) # 30 minutes
+```
+
+### Query Methods
+
+#### has_pending_modifications()
+
+Check if any modifications are pending.
+
+```python
+def has_pending_modifications(self) -> bool
+```
+
+**Returns**: `True` if modifications pending
+
+#### get_pending_count()
+
+Get number of pending modifications.
+
+```python
+def get_pending_count(self) -> int
+```
+
+#### get_pending_task_ids()
+
+Get list of task IDs with pending modifications.
+
+```python
+def get_pending_task_ids(self) -> list
+```
+
+#### get_current_constellation()
+
+Get the constellation currently being modified.
+
+```python
+def get_current_constellation(self) -> Optional[TaskConstellation]
+```
+
+#### get_statistics()
+
+Get synchronization statistics.
+
+```python
+def get_statistics(self) -> Dict[str, int]
+```
+
+**Returns**:
+```python
+{
+ "total_modifications": int,
+ "completed_modifications": int,
+ "timeout_modifications": int
+}
+```
+
+### Utility Methods
+
+#### clear_pending_modifications()
+
+⚠️ **Emergency use only**: Forcefully clear all pending modifications.
+
+```python
+def clear_pending_modifications(self) -> None
+```
+
+---
+
+## Common Usage Patterns
+
+### Basic Orchestration
+
+```python
+from galaxy.constellation.orchestrator import TaskConstellationOrchestrator
+from galaxy.client.device_manager import ConstellationDeviceManager
+
+# Setup
+device_manager = ConstellationDeviceManager()
+orchestrator = TaskConstellationOrchestrator(device_manager)
+
+# Create constellation
+constellation = TaskConstellation(name="MyWorkflow")
+# ... add tasks and dependencies ...
+
+# Orchestrate
+results = await orchestrator.orchestrate_constellation(
+ constellation,
+ assignment_strategy="round_robin"
+)
+
+print(f"Status: {results['status']}")
+print(f"Total tasks: {results['total_tasks']}")
+```
+
+### With Synchronization
+
+```python
+from galaxy.session.observers.constellation_sync_observer import (
+ ConstellationModificationSynchronizer
+)
+from galaxy.core.events import get_event_bus
+
+# Setup orchestrator
+orchestrator = TaskConstellationOrchestrator(device_manager)
+
+# Attach synchronizer
+synchronizer = ConstellationModificationSynchronizer(orchestrator)
+orchestrator.set_modification_synchronizer(synchronizer)
+
+# Subscribe to events
+event_bus = get_event_bus()
+event_bus.subscribe(synchronizer)
+
+# Orchestrate with automatic synchronization
+results = await orchestrator.orchestrate_constellation(constellation)
+```
+
+For details on the synchronization protocol, see [Safe Assignment Locking](safe_assignment_locking.md).
+
+### Custom Event Handling
+
+```python
+from galaxy.core.events import IEventObserver, Event, EventType
+
+class ProgressTracker(IEventObserver):
+    async def on_event(self, event: Event):
+        if event.event_type == EventType.TASK_COMPLETED:
+            print(f"✓ {event.task_id} completed")
+        elif event.event_type == EventType.TASK_FAILED:
+            print(f"✗ {event.task_id} failed")
+
+# Subscribe
+tracker = ProgressTracker()
+event_bus.subscribe(tracker, {
+ EventType.TASK_COMPLETED,
+ EventType.TASK_FAILED
+})
+
+# Orchestrate with tracking
+results = await orchestrator.orchestrate_constellation(constellation)
+```
+
+For more details on event handling, see [Event-Driven Coordination](event_driven_coordination.md).
+
+### Manual Device Assignment
+
+```python
+# Method 1: Pre-assign in tasks
+for task in constellation.get_all_tasks():
+    if "windows" in task.description.lower():
+        task.target_device_id = "windows_main"
+    elif "android" in task.description.lower():
+        task.target_device_id = "android_device"
+
+# Method 2: Manual assignment dict
+# (determine_device is a placeholder for your own task -> device_id selection logic)
+device_assignments = {
+    task.task_id: determine_device(task)
+    for task in constellation.get_all_tasks()
+}
+
+results = await orchestrator.orchestrate_constellation(
+ constellation,
+ device_assignments=device_assignments
+)
+```
+
+## Type Definitions
+
+### TaskConstellation
+
+See [TaskConstellation documentation](../constellation/task_constellation.md)
+
+### TaskStar
+
+See [TaskStar documentation](../constellation/task_star.md)
+
+### Event Types
+
+```python
+from galaxy.core.events import EventType
+
+EventType.TASK_STARTED # Task execution begins
+EventType.TASK_COMPLETED # Task completes successfully
+EventType.TASK_FAILED # Task fails
+EventType.CONSTELLATION_STARTED # Orchestration begins
+EventType.CONSTELLATION_COMPLETED # All tasks finished
+EventType.CONSTELLATION_FAILED # Orchestration failed
+EventType.CONSTELLATION_MODIFIED # DAG structure updated
+```
+
+## Error Handling
+
+### Common Exceptions
+
+| Exception | Cause | Handling |
+|-----------|-------|----------|
+| `ValueError` | Invalid DAG, missing assignments | Validate before orchestration |
+| `RuntimeError` | Execution error | Check device connectivity |
+| `asyncio.TimeoutError` | Task timeout | Increase task timeout |
+| `asyncio.CancelledError` | Orchestration cancelled | Cleanup resources |
+
+### Example Error Handling
+
+```python
+import asyncio
+import logging
+
+logger = logging.getLogger(__name__)
+
+try:
+    results = await orchestrator.orchestrate_constellation(
+        constellation,
+        assignment_strategy="capability_match"
+    )
+except ValueError as e:
+    logger.error(f"Invalid constellation: {e}")
+    # Fix validation errors
+except RuntimeError as e:
+    logger.error(f"Execution failed: {e}")
+    # Retry or alert
+except asyncio.CancelledError:
+    logger.warning("Orchestration cancelled")
+    raise  # Re-raise after logging so the cancellation propagates
+finally:
+    # Always cleanup
+    await device_manager.disconnect_all()
+```
+
+## Related Documentation
+
+- **[Overview](overview.md)** - System architecture and design
+- **[Event-Driven Coordination](event_driven_coordination.md)** - Event system details
+- **[Asynchronous Scheduling](asynchronous_scheduling.md)** - Execution model
+- **[Safe Assignment Locking](safe_assignment_locking.md)** - Synchronization protocol
+- **[Consistency Guarantees](consistency_guarantees.md)** - Invariants and validation
+- **[Batched Editing](batched_editing.md)** - Efficiency optimizations
+- **[Constellation Manager](constellation_manager.md)** - Resource management
+
+---
+
+## Getting Help
+
+Check the examples directory for complete code samples or see [GitHub issues](https://github.com/microsoft/UFO/issues) for known problems.
diff --git a/documents/docs/galaxy/constellation_orchestrator/asynchronous_scheduling.md b/documents/docs/galaxy/constellation_orchestrator/asynchronous_scheduling.md
new file mode 100644
index 000000000..238c78484
--- /dev/null
+++ b/documents/docs/galaxy/constellation_orchestrator/asynchronous_scheduling.md
@@ -0,0 +1,619 @@
+# Asynchronous Scheduling
+
+## Overview
+
+At the core of the Constellation Orchestrator lies a fully **asynchronous scheduling loop** that maximizes parallelism across heterogeneous devices. Unlike traditional schedulers that alternate between discrete planning and execution phases, the orchestrator continuously monitors the evolving DAG to identify ready tasks and dispatches them concurrently.
+
+Most critically, **task execution and constellation editing can proceed concurrently**, allowing the system to adapt in real-time as results stream in while computation continues uninterrupted.
+
+For more on the DAG structure being scheduled, see the [TaskConstellation documentation](../constellation/task_constellation.md).
+
+
+
+*Illustration of asynchronous scheduling and concurrent constellation editing. Task execution overlaps with DAG modifications, reducing end-to-end latency.*
+
+## Core Scheduling Loop
+
+The orchestration workflow is driven by a continuous asynchronous loop that coordinates task execution, constellation synchronization, and event handling:
+
+```python
+async def _run_execution_loop(self, constellation: TaskConstellation) -> None:
+    """Main execution loop for processing constellation tasks."""
+
+    while not constellation.is_complete():
+        # 1. Wait for pending modifications and refresh constellation
+        constellation = await self._sync_constellation_modifications(constellation)
+
+        # 2. Validate device assignments
+        self._validate_existing_device_assignments(constellation)
+
+        # 3. Get ready tasks and schedule them
+        ready_tasks = constellation.get_ready_tasks()
+        await self._schedule_ready_tasks(ready_tasks, constellation)
+
+        # 4. Wait for task completion
+        await self._wait_for_task_completion()
+
+    # Wait for all remaining tasks
+    await self._wait_for_all_tasks()
+```
+
+This loop embodies several key design principles:
+
+### 1. Continuous Monitoring
+
+The loop runs continuously until all tasks reach terminal states (`COMPLETED`, `FAILED`, or `CANCELLED`). Each iteration:
+
+- Checks for constellation modifications from the agent
+- Identifies newly ready tasks (dependencies satisfied)
+- Dispatches tasks to devices
+- Waits for at least one task to complete before repeating
+
+### 2. Non-Blocking Execution
+
+All operations use `async/await` to avoid blocking:
+
+```python
+# Schedule tasks without waiting for completion
+await self._schedule_ready_tasks(ready_tasks, constellation)
+
+# Wait for ANY task to complete (not all)
+await self._wait_for_task_completion()
+```
+
+This maximizes concurrency: new tasks can be scheduled while others are still executing.
+
+### 3. Dynamic Adaptation
+
+The constellation can be modified during execution:
+
+```python
+# Synchronization point: merge agent's edits with runtime progress
+constellation = await self._sync_constellation_modifications(constellation)
+```
+
+After synchronization, the orchestrator immediately identifies and schedules newly ready tasks based on the updated DAG structure.
+
+The orchestrator treats the TaskConstellation as a **living data structure** that evolves during execution, not a static plan fixed at the start.
+
+## Task Scheduling Mechanism
+
+### Ready Task Identification
+
+Tasks become "ready" when all their dependencies are satisfied:
+
+```python
+ready_tasks = constellation.get_ready_tasks()
+```
+
+The `TaskConstellation` determines readiness by checking:
+
+1. **Status**: Task must be in `PENDING` state
+2. **Dependencies**: All prerequisite tasks must be completed
+3. **Conditions**: Any conditional dependencies must evaluate to `True`
+
+**Implementation in TaskConstellation:**
+
+```python
+def get_ready_tasks(self) -> List[TaskStar]:
+    """Get all tasks ready to execute."""
+    ready_tasks = []
+    for task in self._tasks.values():
+        if task.is_ready_to_execute:
+            # Double-check dependencies satisfied
+            if self._are_dependencies_satisfied(task.task_id):
+                ready_tasks.append(task)
+
+    # Sort by priority (higher first)
+    ready_tasks.sort(key=lambda t: t.priority.value, reverse=True)
+    return ready_tasks
+```
+
+!!!tip "Priority Scheduling"
+    Ready tasks are sorted by priority before dispatching, ensuring critical tasks execute first when multiple tasks are ready simultaneously.
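+
+The readiness rules and the priority ordering can be tried out on plain data. This standalone sketch uses a hypothetical `ToyTask` dataclass with string statuses instead of `TaskStar`, and it leaves out conditional dependencies:
+
+```python
+from dataclasses import dataclass, field
+from typing import Dict, List, Set
+
+
+@dataclass
+class ToyTask:
+    task_id: str
+    priority: int = 1  # larger value = more urgent
+    status: str = "PENDING"
+    depends_on: Set[str] = field(default_factory=set)
+
+
+def ready_tasks(tasks: Dict[str, ToyTask]) -> List[ToyTask]:
+    """Pending tasks whose prerequisites are all COMPLETED, highest priority first."""
+    ready = [
+        t for t in tasks.values()
+        if t.status == "PENDING"
+        and all(tasks[d].status == "COMPLETED" for d in t.depends_on)
+    ]
+    return sorted(ready, key=lambda t: t.priority, reverse=True)
+
+
+tasks = {
+    "a": ToyTask("a", status="COMPLETED"),
+    "b": ToyTask("b", priority=1, depends_on={"a"}),
+    "c": ToyTask("c", priority=3, depends_on={"a"}),
+    "d": ToyTask("d", priority=5, depends_on={"b"}),
+}
+print([t.task_id for t in ready_tasks(tasks)])
+```
+
+Task `d` has the highest priority but is not returned, because its prerequisite `b` has not completed yet.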
+
+### Asynchronous Task Dispatch
+
+Once ready tasks are identified, they're dispatched concurrently:
+
+```python
+async def _schedule_ready_tasks(
+    self, ready_tasks: List[TaskStar], constellation: TaskConstellation
+) -> None:
+    """Schedule ready tasks for execution."""
+
+    for task in ready_tasks:
+        if task.task_id not in self._execution_tasks:
+            # Create async task (non-blocking)
+            task_future = asyncio.create_task(
+                self._execute_task_with_events(task, constellation)
+            )
+            self._execution_tasks[task.task_id] = task_future
+```
+
+**Key aspects:**
+
+- **Non-blocking dispatch**: `asyncio.create_task()` schedules the task without waiting
+- **Deduplication**: Only schedule if not already in `_execution_tasks` dict
+- **Tracking**: Store task futures for later completion detection
+
+### Task Execution Lifecycle
+
+Each task executes within its own coroutine that encapsulates the full lifecycle:
+
+```mermaid
+stateDiagram-v2
+ [*] --> PENDING
+ PENDING --> RUNNING: start_execution()
+ RUNNING --> COMPLETED: Success
+ RUNNING --> FAILED: Error
+ COMPLETED --> [*]: Publish event
+ FAILED --> [*]: Publish event
+
+ note right of RUNNING
+ Task executes on device
+ via device_manager
+ end note
+
+ note right of COMPLETED
+ Mark in constellation
+ Identify newly ready tasks
+ Publish TASK_COMPLETED
+ end note
+```
+
+**Execution implementation:**
+
+```python
+async def _execute_task_with_events(
+    self, task: TaskStar, constellation: TaskConstellation
+) -> None:
+    """Execute a single task and publish events."""
+
+    try:
+        # Publish TASK_STARTED event
+        start_event = TaskEvent(
+            event_type=EventType.TASK_STARTED,
+            source_id=f"orchestrator_{id(self)}",
+            timestamp=time.time(),
+            data={"constellation_id": constellation.constellation_id},
+            task_id=task.task_id,
+            status=TaskStatus.RUNNING.value,
+        )
+        await self._event_bus.publish_event(start_event)
+
+        # Mark task as started
+        task.start_execution()
+
+        # Execute on device
+        result = await task.execute(self._device_manager)
+
+        is_success = result.status == TaskStatus.COMPLETED.value
+
+        # Mark task as completed in constellation
+        newly_ready = constellation.mark_task_completed(
+            task.task_id, success=is_success, result=result
+        )
+
+        # Publish TASK_COMPLETED or TASK_FAILED event
+        completed_event = TaskEvent(
+            event_type=(
+                EventType.TASK_COMPLETED if is_success
+                else EventType.TASK_FAILED
+            ),
+            source_id=f"orchestrator_{id(self)}",
+            timestamp=time.time(),
+            data={
+                "constellation_id": constellation.constellation_id,
+                "newly_ready_tasks": [t.task_id for t in newly_ready],
+                "constellation": constellation,
+            },
+            task_id=task.task_id,
+            status=result.status,
+            result=result,
+        )
+        await self._event_bus.publish_event(completed_event)
+
+    except Exception as e:
+        # Handle failure (mark task failed, publish event)
+        newly_ready = constellation.mark_task_completed(
+            task.task_id, success=False, error=e
+        )
+
+        failed_event = TaskEvent(
+            event_type=EventType.TASK_FAILED,
+            source_id=f"orchestrator_{id(self)}",
+            timestamp=time.time(),
+            data={
+                "constellation_id": constellation.constellation_id,
+                "newly_ready_tasks": [t.task_id for t in newly_ready],
+            },
+            task_id=task.task_id,
+            status=TaskStatus.FAILED.value,
+            error=e,
+        )
+        await self._event_bus.publish_event(failed_event)
+        raise
+```
+
+## Concurrent Execution Model
+
+### Parallel Task Execution
+
+Multiple tasks execute concurrently across devices:
+
+```python
+# Track active execution tasks
+self._execution_tasks: Dict[str, asyncio.Task] = {}
+
+# Schedule multiple ready tasks at once
+for task in ready_tasks:
+ task_future = asyncio.create_task(
+ self._execute_task_with_events(task, constellation)
+ )
+ self._execution_tasks[task.task_id] = task_future
+```
+
+**Concurrency characteristics:**
+
+| Aspect | Behavior | Benefit |
+|--------|----------|---------|
+| **Device parallelism** | Independent devices execute tasks simultaneously | Maximize resource utilization |
+| **Dependency-based** | Only independent tasks (no dependency path) run concurrently | Maintain correctness |
+| **Heterogeneous** | Different device types (Windows, Android, iOS, etc.) in parallel | Cross-platform orchestration |
+| **Unbounded** | No artificial limit on concurrent tasks | Scale with available devices |
+
+### Completion Detection
+
+The orchestrator waits for at least one task to complete before continuing:
+
+```python
+async def _wait_for_task_completion(self) -> None:
+    """Wait for at least one task to complete and clean up."""
+
+    if self._execution_tasks:
+        # Wait for first completion
+        done, _ = await asyncio.wait(
+            self._execution_tasks.values(),
+            return_when=asyncio.FIRST_COMPLETED
+        )
+
+        # Clean up completed tasks
+        await self._cleanup_completed_tasks(done)
+    else:
+        # No running tasks, wait briefly
+        await asyncio.sleep(0.1)
+```
+
+**Why wait for first completion?**
+
+1. **Responsiveness**: React immediately to any task completion
+2. **Event publishing**: Trigger constellation modifications as soon as possible
+3. **Resource efficiency**: Avoid busy-waiting when no tasks are running
+4. **Fairness**: Give equal opportunity for any task to trigger next iteration
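+
+The effect of waking on the first completion can be seen in a self-contained toy scheduler. This is a simplified sketch, not the orchestrator's code: `run_dag`, the `deps` mapping, and the `delays` that stand in for device execution are all illustrative assumptions.
+
+```python
+import asyncio
+from typing import Dict, List, Set
+
+
+async def run_dag(deps: Dict[str, Set[str]], delays: Dict[str, float]) -> List[str]:
+    """Run a toy DAG and return task IDs in completion order."""
+    completed: List[str] = []
+    running: Dict[str, asyncio.Task] = {}
+
+    async def work(task_id: str) -> str:
+        await asyncio.sleep(delays[task_id])  # stand-in for device execution
+        return task_id
+
+    while len(completed) < len(deps):
+        # Dispatch every task that is not started yet and whose prerequisites are done.
+        for task_id, prereqs in deps.items():
+            not_started = task_id not in completed and task_id not in running
+            if not_started and prereqs <= set(completed):
+                running[task_id] = asyncio.create_task(work(task_id))
+        if not running:
+            raise ValueError("No runnable tasks: cycle or unknown prerequisite")
+        # Wake up as soon as ANY running task finishes.
+        done, _ = await asyncio.wait(
+            running.values(), return_when=asyncio.FIRST_COMPLETED
+        )
+        for finished in done:
+            task_id = finished.result()
+            completed.append(task_id)
+            del running[task_id]
+    return completed
+
+
+order = asyncio.run(run_dag(
+    deps={"a": set(), "b": set(), "c": {"a"}},
+    delays={"a": 0.01, "b": 0.2, "c": 0.01},
+))
+print(order)
+```
+
+Because the loop resumes when `a` finishes, `c` is dispatched and completes while the slower `b` is still running. Waiting for all running tasks instead would hold `c` back until `b` was done.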
+
+### Task Cleanup
+
+Completed tasks are removed from tracking:
+
+```python
+async def _cleanup_completed_tasks(self, done_futures: set) -> None:
+    """Clean up completed task futures from tracking."""
+
+    completed_task_ids = []
+    for task_future in done_futures:
+        for task_id, future in self._execution_tasks.items():
+            if future == task_future:
+                completed_task_ids.append(task_id)
+                break
+
+    for task_id in completed_task_ids:
+        del self._execution_tasks[task_id]
+```
+
+This prevents memory leaks and ensures `_execution_tasks` reflects only actively running tasks.
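+
+The same bookkeeping can be expressed in a single pass over the tracking dict by testing set membership. The sketch below is an alternative formulation for illustration (`prune_finished` is a hypothetical name), not the orchestrator's method:
+
+```python
+import asyncio
+from typing import Dict, Set
+
+
+def prune_finished(
+    execution_tasks: Dict[str, asyncio.Task], done: Set[asyncio.Task]
+) -> None:
+    """Remove, in place, every entry whose task object is in `done`."""
+    finished_ids = [tid for tid, fut in execution_tasks.items() if fut in done]
+    for tid in finished_ids:
+        del execution_tasks[tid]
+
+
+async def demo() -> list:
+    tracking = {
+        "fast": asyncio.create_task(asyncio.sleep(0)),
+        "slow": asyncio.create_task(asyncio.sleep(0.05)),
+    }
+    done, _ = await asyncio.wait(
+        tracking.values(), return_when=asyncio.FIRST_COMPLETED
+    )
+    prune_finished(tracking, done)
+    remaining = list(tracking)
+    for task in tracking.values():
+        task.cancel()  # tidy up the task that is still running
+    return remaining
+
+
+print(asyncio.run(demo()))
+```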
+
+## Concurrent Constellation Editing
+
+### The Challenge
+
+Traditional schedulers treat DAG structure as **immutable** during execution. But in UFO, the LLM-based Constellation Agent can modify the DAG based on task results:
+
+- Add new tasks when decomposition is needed
+- Remove unnecessary tasks when shortcuts are found
+- Modify dependencies when task relationships change
+- Update task descriptions or parameters
+
+This creates a **race condition**: tasks may be executing while the agent modifies the constellation.
+
+### The Solution: Overlapping Execution and Editing
+
+The orchestrator allows task execution and constellation editing to **proceed concurrently**:
+
+```mermaid
+gantt
+ title Concurrent Execution and Editing Timeline
+ dateFormat X
+ axisFormat %L
+
+ section Tasks
+ Task A executes :a1, 0, 100
+ Task B executes :b1, 50, 150
+ Task C executes :c1, 135, 200
+
+ section Editing
+ Edit on A completion :e1, 100, 130
+ Edit on B completion :e2, 150, 180
+
+ section Sync
+ Sync after Edit A :s1, 130, 135
+ Sync after Edit B :s2, 180, 185
+```
+
+In the diagram:
+
+- **Task A** completes at t=100, triggering an edit
+- **Task B** continues executing during the edit (100-130)
+- Edit completes and syncs at t=135
+- **Task C** starts at t=135 based on updated constellation
+- **Task B** completes at t=150, triggering another edit
+- **Task C** continues executing during this second edit
+
+By overlapping execution and editing, end-to-end latency is reduced by up to 30% compared to sequential edit-then-execute approaches.
+
+### Synchronization Points
+
+The orchestrator synchronizes constellation state at the start of each scheduling iteration:
+
+```python
+async def _sync_constellation_modifications(
+    self, constellation: TaskConstellation
+) -> TaskConstellation:
+    """Synchronize pending constellation modifications."""
+
+    if self._modification_synchronizer:
+        # Wait for agent to finish any pending edits
+        await self._modification_synchronizer.wait_for_pending_modifications()
+
+        # Merge agent's structural changes with orchestrator's execution state
+        constellation = self._modification_synchronizer \
+            .merge_and_sync_constellation_states(
+                orchestrator_constellation=constellation,
+            )
+
+    return constellation
+```
+
+**What gets synchronized:**
+
+1. **Structural changes** from agent (new tasks, dependencies, modifications)
+2. **Execution state** from orchestrator (task statuses, results, errors)
+3. **Consistency validation** (check invariants I1-I3)
+
+The `merge_and_sync_constellation_states` method ensures:
+
+- Agent's constellation has latest structural modifications
+- Orchestrator's execution progress is preserved
+- More advanced task states (e.g., COMPLETED) take precedence over stale states (e.g., RUNNING)
+
+[Learn more about synchronization →](safe_assignment_locking.md#constellation-state-merging)
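+
+The precedence rule in the last bullet can be illustrated with a small sketch. It is not the body of `merge_and_sync_constellation_states`; the `TaskStatus` enum here is a stand-in, and the ranking table and the `merge_statuses` helper are assumptions made for the example:
+
+```python
+from enum import Enum
+from typing import Dict
+
+
+class TaskStatus(Enum):
+    """Stand-in for the real status enum (values are assumptions)."""
+
+    PENDING = "pending"
+    WAITING_DEPENDENCY = "waiting_dependency"
+    RUNNING = "running"
+    COMPLETED = "completed"
+    FAILED = "failed"
+
+
+# Assumed progress ranking: a higher rank means the task is further along.
+_PROGRESS_RANK = {
+    TaskStatus.PENDING: 0,
+    TaskStatus.WAITING_DEPENDENCY: 0,
+    TaskStatus.RUNNING: 1,
+    TaskStatus.COMPLETED: 2,
+    TaskStatus.FAILED: 2,
+}
+
+
+def merge_statuses(
+    agent_view: Dict[str, TaskStatus],
+    orchestrator_view: Dict[str, TaskStatus],
+) -> Dict[str, TaskStatus]:
+    """Keep the agent's task set, but never roll back execution progress."""
+    merged = dict(agent_view)  # structure (which tasks exist) comes from the agent
+    for task_id, runtime_status in orchestrator_view.items():
+        if task_id not in merged:
+            continue  # the agent's copy no longer contains this task
+        if _PROGRESS_RANK[runtime_status] > _PROGRESS_RANK[merged[task_id]]:
+            merged[task_id] = runtime_status  # orchestrator state is fresher
+    return merged
+```
+
+For example, if the agent's copy still lists Task A as `RUNNING` while the orchestrator has already recorded `COMPLETED`, the merged view reports `COMPLETED`, and a task the agent just added stays `PENDING`.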
+
+## Performance Optimizations
+
+### 1. Lazy Evaluation
+
+Ready tasks are computed only when needed:
+
+```python
+# Only compute when scheduling
+ready_tasks = constellation.get_ready_tasks()
+```
+
+Avoids repeated expensive graph traversals when no tasks complete.
+
+### 2. Priority-Based Scheduling
+
+Higher priority tasks execute first:
+
+```python
+# Sort by priority before dispatching
+ready_tasks.sort(key=lambda t: t.priority.value, reverse=True)
+```
+
+Ensures critical-path tasks don't wait behind low-priority tasks.
+
+### 3. Incremental Completion Detection
+
+Use `asyncio.wait(..., return_when=FIRST_COMPLETED)` instead of waiting for all:
+
+```python
+done, pending = await asyncio.wait(
+    self._execution_tasks.values(),
+    return_when=asyncio.FIRST_COMPLETED
+)
+```
+
+Minimizes latency between task completion and next scheduling iteration.
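+
+A self-contained sketch of the same pattern outside the orchestrator. The function name `run_until_all_done` and the use of `asyncio.sleep` as a stand-in for device work are assumptions for illustration:
+
+```python
+import asyncio
+from typing import Dict, List
+
+
+async def run_until_all_done(durations: Dict[str, float]) -> List[str]:
+    """Return task names in the order they finish."""
+    # asyncio.sleep(d, result=name) stands in for a task running on a device.
+    running = {
+        asyncio.create_task(asyncio.sleep(delay, result=name)): name
+        for name, delay in durations.items()
+    }
+    finished: List[str] = []
+    while running:
+        # Wake up as soon as ANY task finishes instead of waiting for all.
+        done, _pending = await asyncio.wait(
+            set(running), return_when=asyncio.FIRST_COMPLETED
+        )
+        for task in done:
+            finished.append(task.result())
+            del running[task]
+        # A real loop would sync modifications and schedule new tasks here.
+    return finished
+
+
+if __name__ == "__main__":
+    print(asyncio.run(run_until_all_done({"A": 0.15, "B": 0.05, "C": 0.10})))
+```
+
+Each pass through the loop corresponds to one scheduling iteration, so newly ready tasks can be dispatched while slower ones are still running.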
+
+### 4. Batched Synchronization
+
+Modifications are batched during agent editing:
+
+```python
+# Agent may modify multiple tasks before publishing CONSTELLATION_MODIFIED
+# Orchestrator waits once for all modifications
+await self._modification_synchronizer.wait_for_pending_modifications()
+```
+
+Reduces synchronization overhead from O(N) to O(1) per editing cycle.
+
+[Learn more about batching →](batched_editing.md)
+
+## Execution Timeline Example
+
+Here's a concrete example showing how asynchronous scheduling works:
+
+```mermaid
+sequenceDiagram
+ participant O as Orchestrator Loop
+ participant C as Constellation
+ participant T1 as Task A (Device 1)
+ participant T2 as Task B (Device 2)
+ participant T3 as Task C (Device 1)
+ participant A as Agent
+
+ Note over O: Iteration 1
+ O->>C: get_ready_tasks()
+ C-->>O: [Task A, Task B]
+ O->>T1: Schedule Task A (async)
+ O->>T2: Schedule Task B (async)
+ O->>O: wait_for_task_completion()
+
+ Note over T1,T2: Both execute concurrently
+
+ T1-->>O: Task A completes
+ O->>A: Trigger editing (async)
+
+ Note over O: Iteration 2
+ O->>O: sync_constellation_modifications()
+
+ par Agent editing
+ A->>A: Modify constellation
+ A-->>O: Publish CONSTELLATION_MODIFIED
+ and Task B continues
+ Note over T2: Still executing
+ end
+
+ O->>C: get_ready_tasks()
+ C-->>O: [Task C]
+ O->>T3: Schedule Task C (async)
+ O->>O: wait_for_task_completion()
+
+ T2-->>O: Task B completes
+
+ Note over O: Iteration 3
+ O->>O: sync_constellation_modifications()
+ O->>C: get_ready_tasks()
+ C-->>O: []
+
+ T3-->>O: Task C completes
+
+ Note over O: Constellation complete
+```
+
+**Key observations:**
+
+1. **Iteration 1**: Tasks A and B scheduled concurrently
+2. **Concurrent editing**: Agent modifies constellation while Task B executes
+3. **Iteration 2**: Task C scheduled immediately after sync, Task B still running
+4. **No blocking**: Orchestrator never waits idle; always scheduling or executing
+
+## Error Handling
+
+### Task Failure
+
+When a task fails, the orchestrator:
+
+1. Marks the task as failed in the constellation
+2. Identifies newly ready tasks (if any dependencies allow failure)
+3. Publishes a `TASK_FAILED` event
+4. Continues scheduling the remaining tasks
+
+```python
+except Exception as e:
+    newly_ready = constellation.mark_task_completed(
+        task.task_id, success=False, error=e
+    )
+
+    failed_event = TaskEvent(
+        event_type=EventType.TASK_FAILED,
+        ...
+        error=e,
+    )
+    await self._event_bus.publish_event(failed_event)
+```
+
+### Cancellation
+
+If orchestration is cancelled:
+
+```python
+except asyncio.CancelledError:
+    if self._logger:
+        self._logger.info(
+            f"Orchestration cancelled for constellation {constellation.constellation_id}"
+        )
+    raise
+```
+
+All running tasks are automatically cancelled via `asyncio` cancellation propagation.
+
+### Cleanup
+
+Cleanup always happens, even on error:
+
+```python
+finally:
+    await self._cleanup_constellation(constellation)
+```
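+
+The body of `_cleanup_constellation` is not shown in this section. As a general `asyncio` pattern, and only as a sketch under that caveat, a coroutine that is cancelled while blocked in `asyncio.wait` can cancel and drain its child tasks in a `finally` block. The function name `orchestrate` and its `tasks` parameter are hypothetical:
+
+```python
+import asyncio
+from typing import List
+
+
+async def orchestrate(tasks: List[asyncio.Task]) -> None:
+    """Wait for child tasks and make sure none outlive this coroutine."""
+    try:
+        await asyncio.wait(tasks)
+    finally:
+        # asyncio.wait() does not cancel the tasks it was waiting on,
+        # so cancel whatever is still running.
+        for task in tasks:
+            if not task.done():
+                task.cancel()
+        # Let the cancellations finish; collect errors instead of raising.
+        await asyncio.gather(*tasks, return_exceptions=True)
+```
+
+Because the cleanup lives in `finally`, it runs on normal completion, on error, and on cancellation alike.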
+
+## Usage Patterns
+
+### Basic Orchestration
+
+```python
+orchestrator = TaskConstellationOrchestrator(device_manager)
+
+results = await orchestrator.orchestrate_constellation(
+    constellation=my_constellation,
+    assignment_strategy="round_robin"
+)
+```
+
+### With Custom Event Handlers
+
+```python
+class ProgressTracker(IEventObserver):
+    async def on_event(self, event: Event):
+        if event.event_type == EventType.TASK_COMPLETED:
+            print(f"✓ Task {event.task_id} completed")
+
+event_bus.subscribe(ProgressTracker())
+
+results = await orchestrator.orchestrate_constellation(constellation)
+```
+
+### With Modification Synchronizer
+
+```python
+synchronizer = ConstellationModificationSynchronizer(orchestrator)
+orchestrator.set_modification_synchronizer(synchronizer)
+event_bus.subscribe(synchronizer)
+
+# Now edits are synchronized automatically
+results = await orchestrator.orchestrate_constellation(constellation)
+```
+
+## Performance Characteristics
+
+| Metric | Typical Value | Notes |
+|--------|--------------|-------|
+| **Scheduling latency** | < 10ms | Time from task ready to dispatch |
+| **Completion detection** | < 5ms | Time from task done to next iteration |
+| **Sync overhead** | 10-50ms | Per constellation modification |
+| **Max concurrent tasks** | Limited by devices | No artificial orchestrator limit |
+| **Throughput** | 10-100 tasks/sec | Depends on task duration |
+
+*Performance measured on: Intel i7, 16GB RAM, 5 connected devices, tasks averaging 2-5 seconds each*
+
+## Related Documentation
+
+- **[Event-Driven Coordination](event_driven_coordination.md)** - Event system enabling async scheduling
+- **[Safe Assignment Locking](safe_assignment_locking.md)** - How editing synchronizes with execution
+- **[Consistency Guarantees](consistency_guarantees.md)** - Invariants preserved during async execution
+- **[API Reference](api_reference.md)** - Orchestrator API details
+
+---
+
+!!!tip "Next Steps"
+    To understand how concurrent editing is made safe, continue to [Safe Assignment Locking](safe_assignment_locking.md).
diff --git a/documents/docs/galaxy/constellation_orchestrator/batched_editing.md b/documents/docs/galaxy/constellation_orchestrator/batched_editing.md
new file mode 100644
index 000000000..90a420d01
--- /dev/null
+++ b/documents/docs/galaxy/constellation_orchestrator/batched_editing.md
@@ -0,0 +1,533 @@
+# Batched Constellation Editing
+
+## Overview
+
+Frequent LLM-driven edits can introduce significant overhead if processed individually. Each modification requires:
+
+- LLM invocation (100-1000ms latency)
+- Lock acquisition and release
+- Validation of invariants I1-I3
+- Constellation state synchronization
+- Event publishing and notification
+
+To balance **responsiveness** with **efficiency**, the orchestrator supports **batched constellation editing**: during a reasoning round, multiple task completion events are aggregated and their resulting modifications applied atomically in a single cycle.
+
+For more on the synchronization mechanism, see [Safe Assignment Locking](safe_assignment_locking.md).
+
+## The Batching Problem
+
+### Without Batching
+
+Consider three tasks completing nearly simultaneously:
+
+```mermaid
+gantt
+ title Sequential Editing (No Batching)
+ dateFormat X
+ axisFormat %L
+
+ section Events
+ Task A completes :e1, 0, 5
+ Task B completes :e2, 10, 15
+ Task C completes :e3, 20, 25
+
+ section Editing
+ Lock + Edit A :l1, 5, 155
+ Lock + Edit B :l2, 155, 305
+ Lock + Edit C :l3, 305, 455
+
+ section Overhead
+ Total overhead :o1, 5, 455
+```
+
+**Overhead**: 3 × 150ms = **450ms total**
+
+- 3 lock acquisitions
+- 3 LLM invocations
+- 3 validations
+- 3 synchronizations
+
+### With Batching
+
+Same scenario with batched editing:
+
+```mermaid
+gantt
+ title Batched Editing
+ dateFormat X
+ axisFormat %L
+
+ section Events
+ Task A completes :e1, 0, 5
+ Task B completes :e2, 10, 15
+ Task C completes :e3, 20, 25
+
+ section Editing
+ Lock + Edit A,B,C :l1, 25, 175
+
+ section Overhead
+ Total overhead :o1, 25, 175
+```
+
+**Overhead**: 1 × 150ms = **150ms total**
+
+- 1 lock acquisition
+- 1 LLM invocation (potentially processing multiple tasks)
+- 1 validation
+- 1 synchronization
+
+**Improvement**: **3× reduction** in overhead!
+
+Batching reduces orchestration overhead from O(N) to O(1) per reasoning round, where N = number of completed tasks.
+
+## Batching Mechanism
+
+### Event Queuing
+
+When tasks complete, their IDs are queued for batch processing:
+
+```text
+# In safe assignment lock algorithm
+while system is running:
+    foreach event e ∈ E do
+        if e is TASK_COMPLETED or TASK_FAILED then
+            async enqueue(e)  # ← Queue instead of immediate processing
+        end
+    end
+
+    acquire(assign_lock)
+
+    # Process ALL queued events in one batch
+    while queue not empty do
+        e ← dequeue()
+        Δ ← invoke(ConstellationAgent, edit(C, e))
+        C ← apply(C, Δ)
+    end
+
+    validate(C)  # ← Single validation for entire batch
+    publish(CONSTELLATION_MODIFIED, all_task_ids)
+    C ← synchronize(C, T_C)
+
+    release(assign_lock)
+```
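+
+The queue-then-drain step of the pseudocode can be sketched in Python as follows. This is an illustration, not the orchestrator's code: `edit_in_one_batch` is a hypothetical name, and `edit_batch` stands in for the agent invocation, validation and publishing steps:
+
+```python
+import asyncio
+from typing import Callable, List
+
+
+async def edit_in_one_batch(
+    queue: "asyncio.Queue[str]",
+    assign_lock: asyncio.Lock,
+    edit_batch: Callable[[List[str]], None],
+) -> List[str]:
+    """Drain every queued task ID and edit once while holding the lock."""
+    async with assign_lock:
+        batch: List[str] = []
+        # Take everything that accumulated since the previous edit cycle.
+        while not queue.empty():
+            batch.append(queue.get_nowait())
+        if batch:
+            # One edit cycle for the whole batch instead of one per event.
+            edit_batch(batch)
+        return batch
+```
+
+Completion handlers only call `queue.put_nowait(task_id)`, so they never block on the lock; the cost of locking and validation is paid once per drained batch.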
+
+### Implementation in Synchronizer
+
+The `ConstellationModificationSynchronizer` batches pending modifications:
+
+```python
+async def wait_for_pending_modifications(
+    self, timeout: Optional[float] = None
+) -> bool:
+    """Wait for all pending modifications to complete."""
+
+    if not self._pending_modifications:
+        return True
+
+    try:
+        while self._pending_modifications:
+            # Get current pending tasks (snapshot)
+            pending_tasks = list(self._pending_modifications.keys())
+            pending_futures = list(self._pending_modifications.values())
+
+            self.logger.info(
+                f"⏳ Waiting for {len(pending_tasks)} pending modification(s): "
+                f"{pending_tasks}"
+            )
+
+            # Wait for ALL current pending modifications (batching).
+            # `remaining_timeout` is the time left out of `timeout`; its
+            # computation is omitted from this excerpt.
+            await asyncio.wait_for(
+                asyncio.gather(*pending_futures, return_exceptions=True),
+                timeout=remaining_timeout,
+            )
+
+            # Check if new modifications were added during wait
+            if not self._pending_modifications:
+                break
+
+            # Small delay to allow new registrations to settle
+            await asyncio.sleep(0.01)
+
+        self.logger.info("✅ All pending modifications completed")
+        return True
+
+    except asyncio.TimeoutError:
+        ...
+```
+
+**Key aspects:**
+
+1. **Snapshot pending modifications** - Capture current batch
+2. **Wait for all in batch** - Use `asyncio.gather()` for parallel completion
+3. **Check for new arrivals** - Handle dynamic additions during wait
+4. **Iterate until empty** - Process all batches
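+
+The four aspects above can be exercised in isolation with a toy version of the loop. This sketch is not the synchronizer's code: `wait_for_pending` is a hypothetical function, and unlike the real class it removes finished entries itself rather than relying on an event handler:
+
+```python
+import asyncio
+from typing import Dict, List
+
+
+async def wait_for_pending(pending: Dict[str, asyncio.Future]) -> List[List[str]]:
+    """Wait until `pending` is empty and report which IDs formed each batch."""
+    batches: List[List[str]] = []
+    while pending:
+        # 1. Snapshot: entries registered later belong to the next batch.
+        snapshot = dict(pending)
+        batches.append(sorted(snapshot))
+        # 2. Wait for every future in the snapshot.
+        await asyncio.gather(*snapshot.values(), return_exceptions=True)
+        # 3. Remove what was just awaited; late arrivals stay in `pending`.
+        for task_id in snapshot:
+            pending.pop(task_id, None)
+        # 4. The loop condition re-checks for entries added during the wait.
+    return batches
+```
+
+If a third modification is registered while the first two are still in flight, it is picked up by the second pass of the loop rather than being missed.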
+
+### Agent-Side Batching
+
+The Constellation Agent receives multiple task IDs and processes them together:
+
+```python
+async def process_editing(
+    self,
+    context: Context = None,
+    task_ids: Optional[List[str]] = None,  # ← Multiple task IDs
+    before_constellation: Optional[TaskConstellation] = None,
+) -> TaskConstellation:
+    """Process task completion events and update constellation."""
+
+    task_ids = task_ids or []
+
+    # Agent can see multiple completed tasks at once
+    self.logger.debug(
+        f"Tasks {task_ids} marked as completed, processing modifications..."
+    )
+
+    # Potentially make decisions based on multiple task outcomes
+    # e.g., "Task A and B both succeeded, skip Task C"
+    after_constellation = await self._create_and_process(context)
+
+    # Publish single CONSTELLATION_MODIFIED event for entire batch
+    await self._publish_constellation_modified_event(
+        before_constellation,
+        after_constellation,
+        task_ids,  # ← All modified tasks
+        self._create_timing_info(start_time, end_time, duration),
+    )
+
+    return after_constellation
+```
+
+## Batching Timeline Example
+
+Here's a detailed timeline showing how batching works:
+
+```
+t=100ms: Task A completes
+ → Synchronizer registers pending modification for A
+ → Task A's event added to queue
+
+t=150ms: Task B completes (during A's queueing)
+ → Synchronizer registers pending modification for B
+ → Task B's event added to queue
+
+t=200ms: Task C completes
+ → Synchronizer registers pending modification for C
+ → Task C's event added to queue
+
+t=205ms: Orchestrator reaches synchronization point
+ → Calls wait_for_pending_modifications()
+ → Sees pending: [A, B, C]
+ → Waits for all three futures
+
+t=210ms: Agent starts processing (lock acquired)
+ → Receives task_ids = ['A', 'B', 'C']
+ → Makes unified editing decision
+
+t=350ms: Agent completes editing
+ → Publishes CONSTELLATION_MODIFIED with on_task_id = ['A', 'B', 'C']
+
+t=355ms: Synchronizer receives event
+ → Completes futures for A, B, C
+ → wait_for_pending_modifications() returns
+
+t=360ms: Orchestrator merges states and continues
+ → Single validation
+ → Single synchronization
+ → Resume scheduling
+```
+
+**Total overhead**: 360ms - 100ms = **260ms** for 3 tasks
+
+Compare to sequential: 3 × 150ms = **450ms** (ignoring event queueing)
+
+## Efficiency Analysis
+
+### Overhead Breakdown
+
+Per-task overhead without batching:
+
+| Operation | Cost (ms) | Frequency |
+|-----------|-----------|-----------|
+| Lock acquisition | 1-2 | Per task |
+| LLM invocation | 100-1000 | Per task |
+| Validation (I1-I3) | 5-10 | Per task |
+| State synchronization | 10-20 | Per task |
+| Event publishing | 1-2 | Per task |
+| **Total** | **117-1034** | **Per task** |
+
+Per-batch overhead with batching:
+
+| Operation | Cost (ms) | Frequency |
+|-----------|-----------|-----------|
+| Lock acquisition | 1-2 | Per batch |
+| LLM invocation | 100-1000 | Per batch |
+| Validation (I1-I3) | 5-10 | Per batch |
+| State synchronization | 10-20 | Per batch |
+| Event publishing | 1-2 | Per batch |
+| **Total** | **117-1034** | **Per batch** |
+
+**Savings with batch size N**: (N - 1) × overhead
+
+!!!example "Concrete Example"
+    With N=5 tasks completing simultaneously and 200ms average overhead:
+
+    - **Without batching**: 5 × 200ms = 1000ms
+    - **With batching**: 1 × 200ms = 200ms
+    - **Savings**: 800ms (80% reduction)
+
+### Throughput Improvement
+
+Batching improves task throughput:
+
+$$\text{Throughput}_{\text{batched}} = \frac{N \times \text{Throughput}_{\text{unbatched}}}{1 + (N-1) \times \frac{\text{overhead}}{\text{task\_duration}}}$$
+
+For tasks averaging 5 seconds with 200ms overhead:
+
+- N=1: 0.20 tasks/sec
+- N=3: 0.56 tasks/sec (**2.78× improvement**)
+- N=5: 0.86 tasks/sec (**4.31× improvement**)
+- N=10: 1.47 tasks/sec (**7.35× improvement**)
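+
+To evaluate the formula for other workloads, a small helper can be used. It is a sketch that assumes the unbatched throughput is `1 / task_duration`, which matches the N=1 figure above; the function name `batched_throughput` is made up for this example:
+
+```python
+def batched_throughput(
+    batch_size: int, task_duration_s: float, overhead_s: float
+) -> float:
+    """Evaluate the throughput formula above, in tasks per second."""
+    # Assumption: one task per `task_duration_s` when nothing is batched.
+    unbatched = 1.0 / task_duration_s
+    penalty = 1 + (batch_size - 1) * overhead_s / task_duration_s
+    return batch_size * unbatched / penalty
+
+
+if __name__ == "__main__":
+    for n in (1, 3, 5, 10):
+        rate = batched_throughput(n, task_duration_s=5.0, overhead_s=0.2)
+        print(f"N={n}: {rate:.2f} tasks/sec")
+```
+
+Because the denominator grows with N, the gain is sublinear: doubling the batch size yields less than double the throughput.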
+
+### Latency Trade-Off
+
+Batching may slightly increase latency for individual tasks:
+
+- **Best case**: Task completes, is first in batch → minimal additional latency
+- **Average case**: Task waits for 1-2 other tasks to complete → ~50-200ms additional latency
+- **Worst case**: Task waits for full batch to accumulate → ~500ms additional latency
+
+**Acceptable trade-off** for significantly improved overall throughput.
+
+## Dynamic Batch Size
+
+The orchestrator uses **dynamic batching** - batch size adapts to task completion patterns:
+
+### Natural Batching
+
+Tasks completing within a short window are naturally batched:
+
+```python
+# In wait_for_pending_modifications()
+while self._pending_modifications:
+    # Snapshot current pending tasks and their futures
+    pending_tasks = list(self._pending_modifications.keys())
+    pending_futures = list(self._pending_modifications.values())
+
+    # Wait for all of them
+    await asyncio.gather(*pending_futures)
+
+    # Check for new arrivals during processing
+    if not self._pending_modifications:
+        break
+
+    # If new tasks arrived, include them in next iteration
+
+**Batch size**: Determined by task completion timing, not by a fixed parameter
+
+### Adaptive Grouping
+
+The synchronizer automatically groups tasks:
+
+- **Slow periods**: Small batches (1-2 tasks)
+- **Burst periods**: Large batches (5-10+ tasks)
+- **Mixed patterns**: Variable batch sizes
+
+This keeps overhead low across these patterns without manual tuning.
+
+## Atomicity of Batched Edits
+
+### Single Edit Cycle
+
+All modifications in a batch are applied in a single atomic edit cycle:
+
+```text
+acquire(assign_lock)
+
+# Apply all modifications together
+foreach event in batch:
+    Δ ← invoke(ConstellationAgent, edit(C, event))
+    C ← apply(C, Δ)
+end
+
+validate(C)  # ← Validates combined result
+publish(CONSTELLATION_MODIFIED, batch_task_ids)
+
+release(assign_lock)
+```
+
+**Atomicity guarantee**: Either all modifications in the batch are applied, or none are.
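+
+One common way to obtain all-or-nothing behaviour is to edit a copy and commit it only if validation passes. The sketch below illustrates that idea with a plain dictionary; it is not how the orchestrator is implemented, and `apply_batch_atomically`, `edits` and `validate` are hypothetical names:
+
+```python
+import copy
+from typing import Callable, Iterable
+
+
+def apply_batch_atomically(
+    state: dict,
+    edits: Iterable[Callable[[dict], None]],
+    validate: Callable[[dict], bool],
+) -> dict:
+    """Return the edited state if the whole batch is valid, else the original."""
+    candidate = copy.deepcopy(state)  # never touch the live state directly
+    try:
+        for edit in edits:
+            edit(candidate)
+    except Exception:
+        return state  # one failing edit discards the entire batch
+    # Validate the combined result once, after all edits were applied.
+    return candidate if validate(candidate) else state
+```
+
+The caller swaps in the returned object while still holding the lock, so no observer ever sees a partially edited state.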
+
+### Confluence Property
+
+The paper proves an **Edit-Sync Confluence Lemma**:
+
+**Lemma**: Folding runtime events commutes with lock-bounded edits within the same window.
+
+**Formally**: Given events $e_1, e_2, \ldots, e_n$ arriving within a lock window:
+
+$$\text{apply}(C, \Delta_{e_1} \circ \Delta_{e_2} \circ \cdots \circ \Delta_{e_n}) \equiv \text{apply}(\cdots\text{apply}(\text{apply}(C, \Delta_{e_1}), \Delta_{e_2})\cdots, \Delta_{e_n})$$
+
+Batched application produces the same result as sequential application.
+
+**Proof sketch**:
+
+1. Each $\Delta_i$ is a pure function of $C$ and $e_i$
+2. Lock ensures no intermediate states are visible
+3. Validation enforces invariants on final state
+4. Synchronization merges all runtime progress atomically
+
+[See Appendix A.4 in paper for complete proof]
+
+Batching is a pure **performance optimization** - it doesn't change the semantics of constellation evolution.
+
+## Implementation Patterns
+
+### Enabling Batching
+
+Batching is enabled automatically when using the synchronizer:
+
+```python
+from galaxy.session.observers.constellation_sync_observer import (
+    ConstellationModificationSynchronizer
+)
+
+# Create and attach synchronizer
+synchronizer = ConstellationModificationSynchronizer(orchestrator)
+orchestrator.set_modification_synchronizer(synchronizer)
+
+# Subscribe to events
+event_bus.subscribe(synchronizer)
+
+# Batching happens automatically
+results = await orchestrator.orchestrate_constellation(constellation)
+```
+
+### Monitoring Batch Sizes
+
+Track batching statistics:
+
+```python
+# After orchestration
+stats = synchronizer.get_statistics()
+
+print(f"Total modifications: {stats['total_modifications']}")
+print(f"Completed: {stats['completed_modifications']}")
+
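+# Infer average batch size.
+# `number_of_edit_cycles` is not part of `stats`; track it yourself, for
+# example by counting CONSTELLATION_MODIFIED events in an observer.
+avg_batch_size = stats['total_modifications'] / number_of_edit_cycles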
+print(f"Average batch size: {avg_batch_size:.2f}")
+```
+
+### Tuning Batch Timeout
+
+Adjust timeout for slower LLM responses:
+
+```python
+# Increase timeout for complex reasoning
+synchronizer.set_modification_timeout(1800.0) # 30 minutes
+
+# Or decrease for simple tasks
+synchronizer.set_modification_timeout(120.0) # 2 minutes
+```
+
+## Performance Best Practices
+
+### 1. Group Related Tasks
+
+Design constellations with tasks that complete around the same time:
+
+```text
+# Good: Tasks with similar durations
+Task A: 5 seconds
+Task B: 6 seconds # ← Likely completes near Task A
+Task C: 5 seconds # ← Likely completes near Task A
+
+# Bad: Widely varying durations
+Task X: 1 second
+Task Y: 30 seconds # ← Won't batch with X
+Task Z: 2 seconds # ← Won't batch with Y
+```
+
+### 2. Minimize LLM Overhead
+
+Reduce individual modification latency:
+
+- Use efficient prompts
+- Cache common editing patterns
+- Pre-compute possible modifications
+
+### 3. Balance Batch Size
+
+- **Too small**: frequent overhead
+- **Too large**: increased latency
+
+**Sweet spot**: 3-7 tasks per batch for most workloads
+
+### 4. Monitor and Adjust
+
+Track metrics:
+
+```python
+class BatchMetricsObserver(IEventObserver):
+    def __init__(self):
+        self.batch_sizes = []
+
+    async def on_event(self, event: Event):
+        if event.event_type == EventType.CONSTELLATION_MODIFIED:
+            task_ids = event.data.get("on_task_id", [])
+            batch_size = len(task_ids)
+            self.batch_sizes.append(batch_size)
+
+            if batch_size > 1:
+                print(f"✓ Batched {batch_size} modifications")
+```
+
+## Comparison with Alternatives
+
+### Micro-Batching
+
+**Alternative**: Fixed small batches (e.g., always wait for 2-3 tasks)
+
+**Drawbacks**:
+
+- Adds artificial delay even when single task completes
+- May miss larger natural batches
+
+**UFO's approach**: Dynamic batching with no artificial delays
+
+### Window-Based Batching
+
+**Alternative**: Fixed time window (e.g., batch every 1 second)
+
+**Drawbacks**:
+
+- Adds latency even when editing is fast
+- May split natural batches across windows
+
+**UFO's approach**: Event-driven batching without fixed windows
+
+### No Batching
+
+**Alternative**: Process each modification immediately
+
+**Drawbacks**:
+
+- High overhead for concurrent completions
+- Redundant LLM invocations
+
+**UFO's approach**: Automatic batching when beneficial
+
+| Approach | Latency | Throughput | Complexity |
+|----------|---------|------------|------------|
+| **No batching** | Low (best) | Low | Low |
+| **Fixed window** | Medium | Medium | Medium |
+| **Fixed size** | High | Medium | Medium |
+| **Dynamic (UFO)** | Low-Medium | High (best) | Low |
+
+## Related Documentation
+
+- **[Safe Assignment Locking](safe_assignment_locking.md)** - How batching integrates with locking
+- **[Asynchronous Scheduling](asynchronous_scheduling.md)** - Concurrent execution enabling batching
+- **[Event-Driven Coordination](event_driven_coordination.md)** - Event system for batching
+
+---
+
+!!!tip "Next Steps"
+    To understand device assignment and resource management, continue to [Constellation Manager](constellation_manager.md).
diff --git a/documents/docs/galaxy/constellation_orchestrator/consistency_guarantees.md b/documents/docs/galaxy/constellation_orchestrator/consistency_guarantees.md
new file mode 100644
index 000000000..4ce62bef1
--- /dev/null
+++ b/documents/docs/galaxy/constellation_orchestrator/consistency_guarantees.md
@@ -0,0 +1,557 @@
+# Consistency and Safety Guarantees
+
+## Overview
+
+Since the TaskConstellation may be dynamically rewritten by an LLM-based agent, the orchestrator must enforce runtime invariants to preserve correctness even under partial or invalid updates. Without these guarantees, the system could execute invalid DAGs, violate dependencies, or enter inconsistent states.
+
+The Constellation Orchestrator enforces three critical invariants (I1-I3) that together ensure safety, consistency, and semantic validity throughout execution.
+
+## The Three Invariants
+
+### I1: Single Assignment
+
+**Invariant**: Each TaskStar has at most one active device assignment at any time.
+
+**Rationale**: A task cannot execute on multiple devices simultaneously - this would lead to duplicate execution, wasted resources, inconsistent results, and ambiguous state (which device's result is authoritative?).
+
+**Enforcement**:
+
+```python
+# In TaskStar
+@property
+def target_device_id(self) -> Optional[str]:
+    """Get the target device ID."""
+    return self._target_device_id
+
+@target_device_id.setter
+def target_device_id(self, value: Optional[str]) -> None:
+    """Set the target device ID."""
+    if self._status == TaskStatus.RUNNING:
+        raise ValueError(
+            f"Cannot modify device assignment of running task {self._task_id}"
+        )
+    self._target_device_id = value
+```
+
+**Validation**:
+
+```python
+def validate_constellation_assignments(
+    self, constellation: TaskConstellation
+) -> tuple[bool, List[str]]:
+    """Validate that all tasks have valid device assignments."""
+
+    errors = []
+    for task_id, task in constellation.tasks.items():
+        if not task.target_device_id:
+            errors.append(f"Task '{task_id}' has no device assignment")
+
+    return len(errors) == 0, errors
+```
+
+**Warning:** The setter explicitly prevents reassignment of running tasks, ensuring I1 cannot be violated during execution.
+
+### I2: Acyclic Consistency
+
+**Invariant**: Edits must preserve DAG acyclicity - the constellation remains a valid directed acyclic graph after all modifications.
+
+**Rationale**: Cycles in the task graph create deadlocks (tasks wait for each other indefinitely), undefined execution order, and inability to determine ready tasks.
+
+**Enforcement**:
+
+```python
+def add_dependency(self, dependency: TaskStarLine) -> None:
+    """Add a dependency to the constellation."""
+
+    # Validate tasks exist
+    if dependency.from_task_id not in self._tasks:
+        raise ValueError(f"Source task {dependency.from_task_id} not found")
+    if dependency.to_task_id not in self._tasks:
+        raise ValueError(f"Target task {dependency.to_task_id} not found")
+
+    # Check for cycle BEFORE adding
+    if self._would_create_cycle(dependency.from_task_id, dependency.to_task_id):
+        raise ValueError(
+            f"Adding dependency {dependency.from_task_id} -> {dependency.to_task_id} would create a cycle"
+        )
+
+    # Safe to add
+    self._dependencies[dependency.line_id] = dependency
+```
+
+**Cycle Detection Algorithm**:
+
+```python
+def _would_create_cycle(self, from_task_id: str, to_task_id: str) -> bool:
+    """Check if adding a dependency would create a cycle."""
+
+    # Use DFS to check if path exists from to_task_id to from_task_id
+    visited = set()
+
+    def has_path(current: str, target: str) -> bool:
+        if current == target:
+            return True
+        if current in visited:
+            return False
+
+        visited.add(current)
+
+        # Check all dependencies where current is the source
+        for dependency in self._dependencies.values():
+            if dependency.from_task_id == current:
+                if has_path(dependency.to_task_id, target):
+                    return True
+
+        return False
+
+    # If path exists from to_task → from_task, adding from_task → to_task creates cycle
+    return has_path(to_task_id, from_task_id)
+```
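+
+For experimenting with the check outside the class, here is a standalone variant. It is a sketch with assumed inputs (dependencies as `(from, to)` tuples) and a hypothetical function name; it builds an adjacency map once and walks it iteratively, which avoids rescanning every dependency at each step and sidesteps recursion limits on deep graphs:
+
+```python
+from collections import defaultdict
+from typing import Dict, Iterable, List, Set, Tuple
+
+
+def would_create_cycle(
+    edges: Iterable[Tuple[str, str]], from_task: str, to_task: str
+) -> bool:
+    """True if adding from_task -> to_task would close a cycle."""
+    successors: Dict[str, List[str]] = defaultdict(list)
+    for src, dst in edges:
+        successors[src].append(dst)
+
+    # A cycle appears exactly when from_task is already reachable from to_task.
+    stack: List[str] = [to_task]
+    visited: Set[str] = set()
+    while stack:
+        current = stack.pop()
+        if current == from_task:
+            return True
+        if current in visited:
+            continue
+        visited.add(current)
+        stack.extend(successors.get(current, []))
+    return False
+```
+
+With dependencies `A -> B` and `B -> C`, proposing `C -> A` is rejected because `C` is already reachable from `A`, while proposing `A -> C` is accepted.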
+
+**Validation**:
+
+```python
+def validate_dag(self) -> Tuple[bool, List[str]]:
+    """Validate the DAG structure."""
+
+    errors = []
+
+    # Check for cycles
+    if self.has_cycle():
+        errors.append("DAG contains cycles")
+
+    # Check for invalid dependencies
+    for dependency in self._dependencies.values():
+        if dependency.from_task_id not in self._tasks:
+            errors.append(
+                f"Dependency references non-existent source task "
+                f"{dependency.from_task_id}"
+            )
+        if dependency.to_task_id not in self._tasks:
+            errors.append(
+                f"Dependency references non-existent target task "
+                f"{dependency.to_task_id}"
+            )
+
+    return len(errors) == 0, errors
+```
+
+The orchestrator uses `get_topological_order()` which raises `ValueError` if cycles exist, providing an additional check.
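+
+The source does not show how `get_topological_order()` is implemented. One standard way to get an ordering that fails on cycles is Kahn's algorithm, sketched here with assumed inputs (task IDs as a list, dependencies as `(from, to)` tuples) and a hypothetical function name:
+
+```python
+from collections import deque
+from typing import Dict, Iterable, List, Tuple
+
+
+def topological_order(
+    tasks: List[str], edges: Iterable[Tuple[str, str]]
+) -> List[str]:
+    """Return tasks in dependency order, or raise ValueError on a cycle."""
+    indegree: Dict[str, int] = {task: 0 for task in tasks}
+    successors: Dict[str, List[str]] = {task: [] for task in tasks}
+    for src, dst in edges:
+        successors[src].append(dst)
+        indegree[dst] += 1
+
+    # Start with tasks that have no unmet dependencies.
+    ready = deque(task for task in tasks if indegree[task] == 0)
+    order: List[str] = []
+    while ready:
+        current = ready.popleft()
+        order.append(current)
+        for nxt in successors[current]:
+            indegree[nxt] -= 1
+            if indegree[nxt] == 0:
+                ready.append(nxt)
+
+    # Tasks on a cycle never reach indegree 0, so they are missing here.
+    if len(order) != len(tasks):
+        raise ValueError("DAG contains cycles")
+    return order
+```
+
+The tasks with indegree zero at any point are the ones whose dependencies are all satisfied, which is the same notion of readiness the scheduler relies on.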
+
+### I3: Valid Update
+
+**Invariant**: Only `PENDING` and `WAITING_DEPENDENCY` tasks may be modified; `RUNNING`, `COMPLETED`, and `FAILED` tasks are immutable.
+
+**Rationale**: Modifying tasks that have started or finished execution could invalidate already-collected results, create inconsistencies between device state and constellation state, or violate causal dependencies.
+
+**Enforcement**:
+
+```python
+def get_modifiable_tasks(self) -> List[TaskStar]:
+    """Get all tasks that can be modified."""
+
+    modifiable_statuses = {TaskStatus.PENDING, TaskStatus.WAITING_DEPENDENCY}
+    return [
+        task for task in self._tasks.values()
+        if task.status in modifiable_statuses
+    ]
+
+def is_task_modifiable(self, task_id: str) -> bool:
+    """Check if a specific task can be modified."""
+
+    task = self._tasks.get(task_id)
+    if not task:
+        return False
+    return task.status in {TaskStatus.PENDING, TaskStatus.WAITING_DEPENDENCY}
+```
+
+**Task-Level Protection**:
+
+```python
+# In TaskStar
+@name.setter
+def name(self, value: str) -> None:
+    if self._status == TaskStatus.RUNNING:
+        raise ValueError(f"Cannot modify name of running task {self._task_id}")
+    self._name = value
+
+@description.setter
+def description(self, value: str) -> None:
+    if self._status == TaskStatus.RUNNING:
+        raise ValueError(
+            f"Cannot modify description of running task {self._task_id}"
+        )
+    self._description = value
+```
+
+**Dependency Validation**:
+
+```python
+def get_modifiable_dependencies(self) -> List[TaskStarLine]:
+    """Get all dependencies that can be modified."""
+
+    modifiable_deps = []
+    modifiable_statuses = {TaskStatus.PENDING, TaskStatus.WAITING_DEPENDENCY}
+
+    for dep in self._dependencies.values():
+        target_task = self._tasks.get(dep.to_task_id)
+        if target_task and target_task.status in modifiable_statuses:
+            modifiable_deps.append(dep)
+
+    return modifiable_deps
+```
+
+Once a task starts execution, its core properties (description, dependencies, device assignment) become immutable, ensuring execution integrity.
+
+## Invariant Verification
+
+### Pre-Execution Validation
+
+Before orchestration begins, the orchestrator validates all invariants:
+
+```python
+async def _validate_and_prepare_constellation(
+    self, constellation: TaskConstellation,
+    device_assignments: Optional[Dict[str, str]],
+    assignment_strategy: Optional[str] = None,
+) -> None:
+    """Validate DAG structure and prepare device assignments."""
+
+    # Validate I2: Acyclic Consistency
+    is_valid, errors = constellation.validate_dag()
+    if not is_valid:
+        raise ValueError(f"Invalid DAG: {errors}")
+
+    # Handle device assignments
+    await self._assign_devices_to_tasks(
+        constellation, device_assignments, assignment_strategy
+    )
+
+    # Validate I1: Single Assignment
+    is_valid, errors = self._constellation_manager \
+        .validate_constellation_assignments(constellation)
+    if not is_valid:
+        raise ValueError(f"Device assignment validation failed: {errors}")
+```
+
+### Runtime Validation
+
+During execution, the orchestrator validates before each scheduling iteration:
+
+```python
+async def _run_execution_loop(self, constellation: TaskConstellation) -> None:
+    """Main execution loop with validation."""
+
+    while not constellation.is_complete():
+        # Sync and validate after modifications
+        constellation = await self._sync_constellation_modifications(constellation)
+
+        # Validate I1: Single Assignment
+        self._validate_existing_device_assignments(constellation)
+
+        # I2 checked implicitly by TaskConstellation.add_dependency()
+        # I3 checked by TaskStar property setters
+
+        # Schedule ready tasks
+        ready_tasks = constellation.get_ready_tasks()
+        await self._schedule_ready_tasks(ready_tasks, constellation)
+        await self._wait_for_task_completion()
+```
+
+### Post-Modification Validation
+
+After the agent modifies the constellation, validation occurs before releasing the lock:
+
+```text
+# In safe assignment lock algorithm
+while queue not empty:
+    e ← dequeue()
+    Δ ← invoke(ConstellationAgent, edit(C, e))
+    C ← apply(C, Δ)
+    validate(C)  # ← Verify I1, I2, I3
+    publish(CONSTELLATION_MODIFIED, t)
+    C ← synchronize(C, T_C)
+end
+```
+
+## Consistency Under Concurrent Modification
+
+### The Challenge
+
+Concurrent task execution and constellation editing create multiple consistency challenges:
+
+| Challenge | Without Invariants | With Invariants |
+|-----------|-------------------|-----------------|
+| **Duplicate execution** | Task assigned to multiple devices | I1 prevents multiple assignments |
+| **Cyclic dependencies** | Deadlocked tasks | I2 prevents cycle introduction |
+| **Stale modifications** | Running task gets edited | I3 prevents editing running tasks |
+| **Lost results** | Completed task gets removed | I3 makes completed tasks immutable |
+
+### Consistency Model
+
+The orchestrator maintains eventual consistency with strong isolation:
+
+```mermaid
+graph TD
+ A[Task Completes] -->|Event| B[Agent Starts Editing]
+ B -->|Lock| C[Modification Phase]
+ C -->|Validate I1-I3| D{Valid?}
+ D -->|Yes| E[Apply Changes]
+ D -->|No| F[Reject Changes]
+ E -->|Sync| G[Merge States]
+ F -->|Error| H[Log & Continue]
+ G -->|Release Lock| I[Orchestrator Sees Update]
+```
+
+**Properties:**
+
+1. **Isolation**: Modifications occur atomically within the lock
+2. **Validation**: All changes checked against I1-I3 before commit
+3. **Rejection**: Invalid modifications are discarded with error logging
+4. **Consistency**: Orchestrator only sees valid constellation states
+
+During the lock period (modification + validation + sync), the orchestrator has a strongly consistent view of the constellation. Learn more about [safe assignment locking](safe_assignment_locking.md).
+
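The four properties above can be illustrated with a small, self-contained sketch. It is not the orchestrator's implementation: `GuardedConstellation`, `is_acyclic`, and the dict-of-sets graph are hypothetical names introduced here, and only I2 is validated to keep the example short.

```python
import asyncio
import copy
from typing import Callable, Dict, List, Set

# Hypothetical, simplified constellation: task_id -> set of prerequisite task_ids.
Graph = Dict[str, Set[str]]


def is_acyclic(graph: Graph) -> bool:
    """Return True if the dependency graph has no cycle (DFS with three colours)."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for dep in graph.get(node, ()):
            state = colour.get(dep, WHITE)
            if state == GREY:
                return False  # back edge: a cycle exists
            if state == WHITE and not visit(dep):
                return False
        colour[node] = BLACK
        return True

    return all(colour[n] != WHITE or visit(n) for n in list(graph))


class GuardedConstellation:
    """Applies edits under a lock and commits only states that pass validation."""

    def __init__(self, graph: Graph) -> None:
        self._graph = graph
        self._lock = asyncio.Lock()
        self.rejected: List[str] = []

    async def modify(self, edit: Callable[[Graph], None]) -> bool:
        async with self._lock:                      # isolation
            candidate = copy.deepcopy(self._graph)  # edit a private copy
            edit(candidate)
            if not is_acyclic(candidate):           # validation (I2 only here)
                self.rejected.append("cycle")       # rejection: record and continue
                return False
            self._graph = candidate                 # commit a valid state
            return True

    def snapshot(self) -> Graph:
        """Readers only ever see the last committed, valid state."""
        return copy.deepcopy(self._graph)
```

Because the edit is applied to a copy, a rejected modification leaves the committed state untouched, which matches the "Log & Continue" branch of the diagram.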
+## Formal Invariant Definitions
+
+### Mathematical Formulation
+
+Let C = (T, D) be a constellation with tasks T and dependencies D, where:
+- τ ∈ T is a task with status σ(τ) and device assignment δ(τ)
+- d = (t₁ → t₂) ∈ D is a dependency from task t₁ to task t₂
+
+**I1 (Single Assignment)**: Each task has at most one device assignment.
+
+**I2 (Acyclic Consistency)**: No cyclic paths exist in the dependency graph.
+
+**I3 (Valid Update)**: If σ(τ) ∈ {RUNNING, COMPLETED, FAILED} then τ is immutable.
+
+### State Transition Rules
+
+Valid state transitions preserve invariants:
+
+```mermaid
+stateDiagram-v2
+ [*] --> PENDING: Create task
+ PENDING --> WAITING_DEPENDENCY: Has dependencies
+ WAITING_DEPENDENCY --> PENDING: Dependencies satisfied
+ PENDING --> RUNNING: Execute (I1, I2 checked)
+ RUNNING --> COMPLETED: Success
+ RUNNING --> FAILED: Error
+ RUNNING --> CANCELLED: Cancel
+```
+
+**Note:** Tasks become immutable (I3) upon entering RUNNING state. PENDING and WAITING_DEPENDENCY tasks remain modifiable.
+
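The diagram can also be read as a lookup table. The sketch below is an illustration under stated assumptions: the enum values, `TRANSITIONS`, `can_transition`, and `is_modifiable` are hypothetical names, and treating `CANCELLED` as immutable is an assumption drawn from the note above rather than from the formal definition of I3.

```python
from enum import Enum
from typing import Dict, FrozenSet


class TaskStatus(Enum):
    # String values are assumptions made for this example.
    PENDING = "pending"
    WAITING_DEPENDENCY = "waiting_dependency"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Edges of the state diagram; terminal states have no outgoing edges.
TRANSITIONS: Dict[TaskStatus, FrozenSet[TaskStatus]] = {
    TaskStatus.PENDING: frozenset(
        {TaskStatus.WAITING_DEPENDENCY, TaskStatus.RUNNING}
    ),
    TaskStatus.WAITING_DEPENDENCY: frozenset({TaskStatus.PENDING}),
    TaskStatus.RUNNING: frozenset(
        {TaskStatus.COMPLETED, TaskStatus.FAILED, TaskStatus.CANCELLED}
    ),
}

# Only tasks that have not started running may still be edited.
MODIFIABLE: FrozenSet[TaskStatus] = frozenset(
    {TaskStatus.PENDING, TaskStatus.WAITING_DEPENDENCY}
)


def can_transition(current: TaskStatus, target: TaskStatus) -> bool:
    """Return True if the diagram contains an edge current -> target."""
    return target in TRANSITIONS.get(current, frozenset())


def is_modifiable(status: TaskStatus) -> bool:
    """Return True if a task in this status may still be edited."""
    return status in MODIFIABLE
```

A property setter could call `is_modifiable(self._status)` before accepting a change, which is one way to enforce I3 at the point of modification.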
+## Error Handling
+
+### Invariant Violation Responses
+
+When a modification would violate an invariant, the orchestrator rejects it, logs the error, and keeps the last valid state:
+
+#### I1 Violation: Multiple Assignments
+
+```python
+# Missing assignment, detected during validation
+if not task.target_device_id:
+    errors.append(f"Task '{task_id}' has no device assignment")
+
+# Reassignment attempted while the task is running
+if self._status == TaskStatus.RUNNING:
+    raise ValueError(
+        f"Cannot modify device assignment of running task {self._task_id}"
+    )
+```
+
+**Response**: Reject modification, log error, continue with existing assignment
+
+#### I2 Violation: Cycle Detected
+
+```python
+if self._would_create_cycle(dependency.from_task_id, dependency.to_task_id):
+ raise ValueError(
+ f"Adding dependency {dependency.from_task_id} -> {dependency.to_task_id} would create a cycle"
+ )
+```
+
+**Response**: Reject dependency addition, log error, constellation remains acyclic
+
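The check behind `_would_create_cycle` can be sketched as a reachability search: adding `from → to` closes a cycle exactly when `from` is already reachable from `to`. The function below is a minimal illustration, not the library's code; the `successors` adjacency map and the standalone function name are assumptions made for this example.

```python
from typing import Dict, List, Set


def would_create_cycle(
    successors: Dict[str, Set[str]], from_task_id: str, to_task_id: str
) -> bool:
    """Return True if adding from_task_id -> to_task_id would close a cycle.

    Only the part of the graph reachable from to_task_id is searched.
    """
    if from_task_id == to_task_id:
        return True  # a task cannot depend on itself
    stack: List[str] = [to_task_id]
    seen: Set[str] = set()
    while stack:
        current = stack.pop()
        if current == from_task_id:
            return True  # from_task is reachable, so the new edge closes a loop
        if current in seen:
            continue
        seen.add(current)
        stack.extend(successors.get(current, ()))
    return False
```

With edges `A → B` and `B → C` already present, the candidate edge `C → A` is rejected because `C` is reachable from `A`, while `A → C` is accepted.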
+#### I3 Violation: Modifying Running Task
+
+```python
+if self._status == TaskStatus.RUNNING:
+ raise ValueError(f"Cannot modify name of running task {self._task_id}")
+```
+
+**Response**: Reject modification, log error, task properties unchanged
+
+### Graceful Degradation
+
+If the agent produces invalid modifications:
+
+```python
+try:
+ constellation.add_dependency(new_dependency)
+except ValueError as e:
+ self.logger.error(f"Invalid dependency rejected: {e}")
+ # Continue with existing constellation structure
+ # Don't block orchestration on agent errors
+```
+
+The orchestrator continues execution with the last valid constellation state.
+
+**Warning:** The orchestrator prioritizes safety (correctness) over liveness (progress). If the agent produces invalid modifications, orchestration may slow or stall, but will never execute an invalid DAG.
+
+## Performance Impact
+
+### Validation Overhead
+
+| Invariant | Check Complexity | Per-Operation Cost | When Checked |
+|-----------|-----------------|-------------------|--------------|
+| I1 | O(1) | < 1ms | Per task assignment |
+| I2 | O(V + E) (DFS) | 1-10ms | Per dependency add |
+| I3 | O(1) | < 1ms | Per task modification |
+
+Where V = number of tasks, E = number of dependencies.
+
+### Optimization Strategies
+
+**1. Lazy Validation**:
+```python
+# Only validate when needed
+def validate_dag(self) -> Tuple[bool, List[str]]:
+ # Cache validation results if DAG hasn't changed
+ if self._last_validation_time == self._updated_at:
+ return self._cached_validation
+```
+
+**2. Incremental Checking**:
+```python
+# Check only affected subgraph for cycles
+def _would_create_cycle(self, from_task_id: str, to_task_id: str) -> bool:
+ # Only traverse from to_task → from_task
+ # Don't re-check entire graph
+```
+
+**3. Batch Validation**:
+```python
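+# Validate once after applying all modifications in a batch.
+# `queue`, `apply`, `validate`, and `C` are placeholders for the pending edits,
+# the edit function, the invariant check, and the constellation.
+while queue:
+    C = apply(C, queue.pop(0))  # apply every queued modification first
+validate(C)  # single validation for the entire batch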
+```
+
+Learn more about [batched editing strategies](batched_editing.md).
+
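As a rough illustration of lazy validation, the sketch below caches the last result and re-runs the full check only after the graph has changed. It swaps the timestamp comparison shown above for a version counter, and `CachedDagValidator`, `add_edge`, and `full_checks` are hypothetical names used only in this example.

```python
from typing import Dict, List, Optional, Set, Tuple


class CachedDagValidator:
    """Re-validates the DAG only when its structure has changed."""

    def __init__(self) -> None:
        self._successors: Dict[str, Set[str]] = {}
        self._version = 0  # bumped on every structural change
        self._validated_version: Optional[int] = None
        self._cached: Tuple[bool, List[str]] = (True, [])
        self.full_checks = 0  # counts full validations, for demonstration

    def add_edge(self, src: str, dst: str) -> None:
        self._successors.setdefault(src, set()).add(dst)
        self._successors.setdefault(dst, set())
        self._version += 1

    def validate_dag(self) -> Tuple[bool, List[str]]:
        if self._validated_version == self._version:
            return self._cached  # nothing changed since the last check

        self.full_checks += 1
        # Kahn's algorithm: a graph is acyclic iff every node can be removed
        # in topological order.
        indegree = {node: 0 for node in self._successors}
        for targets in self._successors.values():
            for target in targets:
                indegree[target] += 1
        ready = [node for node, degree in indegree.items() if degree == 0]
        visited = 0
        while ready:
            node = ready.pop()
            visited += 1
            for target in self._successors[node]:
                indegree[target] -= 1
                if indegree[target] == 0:
                    ready.append(target)

        ok = visited == len(self._successors)
        errors = [] if ok else ["Dependency graph contains a cycle"]
        self._cached = (ok, errors)
        self._validated_version = self._version
        return self._cached
```

A version counter avoids relying on clock resolution, which could make two quick edits look like one when timestamps are compared.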
+## Testing Invariants
+
+### Unit Tests
+
+Each invariant has dedicated test coverage:
+
+```python
+def test_single_assignment_invariant():
+ """Test I1: Single Assignment."""
+ task = TaskStar(task_id="test_task")
+ task.target_device_id = "device_1"
+
+ # Assignment succeeds
+ assert task.target_device_id == "device_1"
+
+ # Reassignment before execution succeeds
+ task.target_device_id = "device_2"
+ assert task.target_device_id == "device_2"
+
+ # Start execution
+ task.start_execution()
+
+    # Reassignment after execution has started must fail
+ with pytest.raises(ValueError):
+ task.target_device_id = "device_3"
+
+def test_acyclic_consistency_invariant():
+ """Test I2: Acyclic Consistency."""
+ constellation = TaskConstellation()
+ task_a = TaskStar(task_id="A")
+ task_b = TaskStar(task_id="B")
+ task_c = TaskStar(task_id="C")
+
+ constellation.add_task(task_a)
+ constellation.add_task(task_b)
+ constellation.add_task(task_c)
+
+ # Add A → B
+ dep_ab = TaskStarLine("dep1", "A", "B")
+ constellation.add_dependency(dep_ab)
+
+ # Add B → C
+ dep_bc = TaskStarLine("dep2", "B", "C")
+ constellation.add_dependency(dep_bc)
+
+ # Try to add C → A (creates cycle)
+ dep_ca = TaskStarLine("dep3", "C", "A")
+ with pytest.raises(ValueError, match="would create a cycle"):
+ constellation.add_dependency(dep_ca)
+
+def test_valid_update_invariant():
+ """Test I3: Valid Update."""
+ task = TaskStar(task_id="test_task", description="Original")
+
+ # Modification before execution succeeds
+ task.description = "Modified"
+ assert task.description == "Modified"
+
+ # Start execution
+ task.start_execution()
+
+    # Modification after execution has started must fail
+ with pytest.raises(ValueError):
+ task.description = "Invalid modification"
+```
+
+### Integration Tests
+
+Test invariants during full orchestration:
+
+```python
+async def test_invariants_during_orchestration():
+ """Test that invariants hold during concurrent orchestration."""
+
+ # Create constellation with potential for violations
+ constellation = create_complex_constellation()
+
+ # Attach synchronizer and validators
+ orchestrator = TaskConstellationOrchestrator(device_manager)
+ synchronizer = ConstellationModificationSynchronizer(orchestrator)
+ orchestrator.set_modification_synchronizer(synchronizer)
+
+ # Run orchestration
+ results = await orchestrator.orchestrate_constellation(constellation)
+
+ # Verify invariants held throughout
+ assert results["status"] == "completed"
+
+    # Check I1: every task has exactly one device assignment.
+    # Several tasks may share a device, so device IDs need not be unique.
+    for task in constellation.get_all_tasks():
+        assert isinstance(task.target_device_id, str) and task.target_device_id
+
+ # Check I2: No cycles
+ is_valid, errors = constellation.validate_dag()
+ assert is_valid
+
+ # Check I3: Terminal tasks are immutable
+ for task in constellation.get_completed_tasks() + constellation.get_failed_tasks():
+ with pytest.raises(ValueError):
+ task.description = "Should fail"
+```
+
+## Related Documentation
+
+- [Safe Assignment Locking](safe_assignment_locking.md) - How invariants are enforced during locking
+- [Asynchronous Scheduling](asynchronous_scheduling.md) - Concurrent execution preserving invariants
+- [Batched Editing](batched_editing.md) - Efficient modification batching while maintaining invariants
+- [API Reference](api_reference.md) - API methods for validation
diff --git a/documents/docs/galaxy/constellation_orchestrator/constellation_manager.md b/documents/docs/galaxy/constellation_orchestrator/constellation_manager.md
new file mode 100644
index 000000000..8c5b2a629
--- /dev/null
+++ b/documents/docs/galaxy/constellation_orchestrator/constellation_manager.md
@@ -0,0 +1,730 @@
+# Constellation Manager
+
+## Overview
+
+The `ConstellationManager` is a companion component to the `TaskConstellationOrchestrator` that handles device assignment, resource management, and constellation lifecycle tracking. While the orchestrator focuses on execution flow and coordination, the manager provides the infrastructure for device operations and state management.
+
+This separation of concerns follows the Single Responsibility Principle: orchestration logic remains independent of device management details.
+
+## Architecture
+
+```mermaid
+graph TB
+ O[TaskConstellationOrchestrator] -->|uses| CM[ConstellationManager]
+ CM -->|communicates| DM[ConstellationDeviceManager]
+ DM -->|manages| D1[Device 1]
+ DM -->|manages| D2[Device 2]
+ DM -->|manages| D3[Device N]
+
+ CM -->|tracks| MD[(Constellation Metadata)]
+ CM -->|validates| AS[Device Assignments]
+```
+
+Learn more about the [orchestrator architecture](overview.md) and [asynchronous scheduling](asynchronous_scheduling.md).
+
+## Core Responsibilities
+
+The ConstellationManager handles four primary responsibilities:
+
+### 1. Device Assignment
+
+Assigns tasks to appropriate devices using configurable strategies:
+
+| Strategy | Description | Use Case |
+|----------|-------------|----------|
+| Round Robin | Distributes tasks evenly across devices | Load balancing for homogeneous devices |
+| Capability Match | Matches task requirements to device capabilities | Heterogeneous device types (Windows, Android, iOS) |
+| Load Balance | Assigns to device with lowest current load | Dynamic workload distribution |
+
+### 2. Resource Management
+
+Tracks and manages constellation resources:
+
+- Device availability and status
+- Constellation registration and metadata
+- Device utilization statistics
+- Assignment validation
+
+### 3. Lifecycle Management
+
+Manages constellation lifecycle:
+
+- Registration when orchestration begins
+- Metadata tracking during execution
+- Unregistration after completion
+- Status querying
+
+### 4. Validation
+
+Validates device assignments against constraints:
+
+- All tasks have assigned devices
+- Assigned devices exist and are connected
+- Device capabilities match task requirements
+
+## Device Assignment Strategies
+
+### Round Robin
+
+Distributes tasks cyclically across available devices:
+
+```python
+async def _assign_round_robin(
+ self,
+ constellation: TaskConstellation,
+ available_devices: List[Dict[str, Any]],
+ preferences: Optional[Dict[str, str]] = None,
+) -> Dict[str, str]:
+ """Round robin device assignment strategy."""
+
+ assignments = {}
+ device_index = 0
+
+ for task_id, task in constellation.tasks.items():
+ # Check preferences first
+ if preferences and task_id in preferences:
+ preferred_device = preferences[task_id]
+ if any(d["device_id"] == preferred_device for d in available_devices):
+ assignments[task_id] = preferred_device
+ continue
+
+ # Round robin assignment
+ device = available_devices[device_index % len(available_devices)]
+ assignments[task_id] = device["device_id"]
+ device_index += 1
+
+ return assignments
+```
+
+**Characteristics:**
+
+- Fairness: Each device gets an approximately equal number of tasks
+- Simplicity: No complex decision-making
+- Overhead: O(N) where N = number of tasks
+- Best for: Homogeneous devices with similar capabilities
+
+**Example**:
+```text
+# 3 devices, 7 tasks
+Task 1 → Device A
+Task 2 → Device B
+Task 3 → Device C
+Task 4 → Device A
+Task 5 → Device B
+Task 6 → Device C
+Task 7 → Device A
+```
+
+### Capability Match
+
+Matches tasks to devices based on device type and capabilities:
+
+```python
+async def _assign_capability_match(
+ self,
+ constellation: TaskConstellation,
+ available_devices: List[Dict[str, Any]],
+ preferences: Optional[Dict[str, str]] = None,
+) -> Dict[str, str]:
+ """Capability-based device assignment strategy."""
+
+ assignments = {}
+
+ for task_id, task in constellation.tasks.items():
+ # Check preferences first
+ if preferences and task_id in preferences:
+ preferred_device = preferences[task_id]
+ if any(d["device_id"] == preferred_device for d in available_devices):
+ assignments[task_id] = preferred_device
+ continue
+
+ # Find devices matching task requirements
+ matching_devices = []
+
+ if task.device_type:
+ matching_devices = [
+ d for d in available_devices
+ if d.get("device_type") == task.device_type.value
+ ]
+
+ # Fall back to any available device if no matches
+ if not matching_devices:
+ matching_devices = available_devices
+
+ # Choose first matching device
+ if matching_devices:
+ assignments[task_id] = matching_devices[0]["device_id"]
+
+ return assignments
+```
+
+**Characteristics:**
+
+- Type-aware: Respects task's `device_type` requirement
+- Fallback: Uses any device if no type match found
+- Overhead: O(N × D) where N = tasks, D = devices
+- Best for: Heterogeneous device ecosystems
+
+**Example**:
+```text
+# Mixed device types
+Task A (requires Windows) → Windows Device 1
+Task B (requires Android) → Android Device 1
+Task C (requires Windows) → Windows Device 1
+Task D (no requirement) → Any available device
+```
+
+### Load Balance
+
+Assigns tasks to minimize device load:
+
+```python
+async def _assign_load_balance(
+ self,
+ constellation: TaskConstellation,
+ available_devices: List[Dict[str, Any]],
+ preferences: Optional[Dict[str, str]] = None,
+) -> Dict[str, str]:
+ """Load-balanced device assignment strategy."""
+
+ assignments = {}
+ device_load = {d["device_id"]: 0 for d in available_devices}
+
+ for task_id, task in constellation.tasks.items():
+ # Check preferences first
+ if preferences and task_id in preferences:
+ preferred_device = preferences[task_id]
+ if any(d["device_id"] == preferred_device for d in available_devices):
+ assignments[task_id] = preferred_device
+ device_load[preferred_device] += 1
+ continue
+
+ # Find device with lowest load
+ min_load_device = min(device_load.keys(), key=lambda d: device_load[d])
+ assignments[task_id] = min_load_device
+ device_load[min_load_device] += 1
+
+ return assignments
+```
+
+**Characteristics:**
+
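+- Balanced: Minimizes the maximum number of tasks per device
+- Dynamic: Adapts to varying task counts
+- Overhead: O(N × D) as shown, because each task scans all devices; O(N × log D) with a priority queue
+- Best for: Constellations with varying task complexity, with the caveat that load is counted in tasks and complexity is not weighted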
+
+**Example**:
+```text
+# 2 devices, 5 tasks with varying complexity
+Task 1 (simple) → Device A [load: 1]
+Task 2 (complex) → Device B [load: 1]
+Task 3 (simple) → Device A [load: 2]
+Task 4 (simple) → Device B [load: 2]
+Task 5 (complex) → Device A [load: 3]
+```
+
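The characteristics above mention a priority queue. One possible heap-based variant is sketched below. It is an illustration only: `assign_load_balance_heap` and its plain list arguments are hypothetical, and ties are broken by device ID, which the original code does not specify.

```python
import heapq
from typing import Dict, List, Optional, Tuple


def assign_load_balance_heap(
    task_ids: List[str],
    device_ids: List[str],
    preferences: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Assign each task to the device with the fewest tasks so far."""
    if not device_ids:
        raise ValueError("No available devices for assignment")

    preferences = preferences or {}
    load: Dict[str, int] = {device: 0 for device in device_ids}
    heap: List[Tuple[int, str]] = [(0, device) for device in device_ids]
    heapq.heapify(heap)
    assignments: Dict[str, str] = {}

    for task_id in task_ids:
        preferred = preferences.get(task_id)
        if preferred in load:
            device = preferred  # honour a valid preference
        else:
            # Pop until the entry matches the device's current load;
            # older entries are stale and are simply discarded.
            while True:
                count, device = heapq.heappop(heap)
                if count == load[device]:
                    break
        assignments[task_id] = device
        load[device] += 1
        heapq.heappush(heap, (load[device], device))

    return assignments
```

Each heap operation is logarithmic in the heap size, so the per-task cost no longer grows linearly with the number of devices. Stale entries left behind by preferred assignments are skipped lazily instead of being removed eagerly.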
+## Constellation Lifecycle Management
+
+### Registration
+
+Register a constellation for management:
+
+```python
+def register_constellation(
+ self,
+ constellation: TaskConstellation,
+ metadata: Optional[Dict[str, Any]] = None,
+) -> str:
+ """Register a constellation for management."""
+
+ constellation_id = constellation.constellation_id
+ self._managed_constellations[constellation_id] = constellation
+ self._constellation_metadata[constellation_id] = metadata or {}
+
+ if self._logger:
+ self._logger.info(
+ f"Registered constellation '{constellation.name}' ({constellation_id})"
+ )
+
+ return constellation_id
+```
+
+**Purpose**: Track active constellations and their metadata
+
+**Metadata examples**:
+```python
+metadata = {
+ "user_id": "user123",
+ "session_id": "session_456",
+ "priority": "high",
+ "created_by": "automation_pipeline",
+}
+```
+
+### Status Querying
+
+Get detailed status of a managed constellation:
+
+```python
+async def get_constellation_status(
+ self, constellation_id: str
+) -> Optional[Dict[str, Any]]:
+ """Get detailed status of a managed constellation."""
+
+ constellation = self._managed_constellations.get(constellation_id)
+ if not constellation:
+ return None
+
+ metadata = self._constellation_metadata.get(constellation_id, {})
+
+ return {
+ "constellation_id": constellation_id,
+ "name": constellation.name,
+ "state": constellation.state.value,
+ "statistics": constellation.get_statistics(),
+ "ready_tasks": [task.task_id for task in constellation.get_ready_tasks()],
+ "running_tasks": [task.task_id for task in constellation.get_running_tasks()],
+ "completed_tasks": [task.task_id for task in constellation.get_completed_tasks()],
+ "failed_tasks": [task.task_id for task in constellation.get_failed_tasks()],
+ "metadata": metadata,
+ }
+```
+
+**Returns**:
+```json
+{
+ "constellation_id": "constellation_20251106_143052_a1b2c3d4",
+ "name": "Multi-Device Data Collection",
+ "state": "executing",
+ "statistics": {
+ "total_tasks": 10,
+ "task_status_counts": {
+ "completed": 3,
+ "running": 2,
+ "pending": 5
+ },
+ "parallelism_ratio": 2.5
+ },
+ "ready_tasks": ["task_6", "task_7"],
+ "running_tasks": ["task_4", "task_5"],
+ "completed_tasks": ["task_1", "task_2", "task_3"],
+ "failed_tasks": [],
+ "metadata": {
+ "user_id": "user123",
+ "priority": "high"
+ }
+}
+```
+
+### Unregistration
+
+Remove a constellation from management:
+
+```python
+def unregister_constellation(self, constellation_id: str) -> bool:
+ """Unregister a constellation from management."""
+
+ if constellation_id in self._managed_constellations:
+ constellation = self._managed_constellations[constellation_id]
+ del self._managed_constellations[constellation_id]
+ del self._constellation_metadata[constellation_id]
+
+ if self._logger:
+ self._logger.info(
+ f"Unregistered constellation '{constellation.name}' ({constellation_id})"
+ )
+ return True
+
+ return False
+```
+
+**Purpose**: Clean up resources after orchestration completes
+
+## Device Operations
+
+### Getting Available Devices
+
+Retrieve the list of connected devices:
+
+```python
+async def get_available_devices(self) -> List[Dict[str, Any]]:
+ """Get list of available devices from device manager."""
+
+ if not self._device_manager:
+ return []
+
+ try:
+ connected_device_ids = self._device_manager.get_connected_devices()
+ devices = []
+
+ for device_id in connected_device_ids:
+ device_info = self._device_manager.device_registry.get_device_info(
+ device_id
+ )
+ if device_info:
+ devices.append({
+ "device_id": device_id,
+ "device_type": getattr(device_info, "device_type", "unknown"),
+ "capabilities": getattr(device_info, "capabilities", []),
+ "status": "connected",
+ "metadata": getattr(device_info, "metadata", {}),
+ })
+
+ return devices
+ except Exception as e:
+ if self._logger:
+ self._logger.error(f"Failed to get available devices: {e}")
+ return []
+```
+
+**Returns**:
+```python
+[
+ {
+ "device_id": "windows_main",
+ "device_type": "windows",
+ "capabilities": ["file_ops", "browser", "office"],
+ "status": "connected",
+ "metadata": {"os_version": "Windows 11"}
+ },
+ {
+ "device_id": "android_pixel",
+ "device_type": "android",
+ "capabilities": ["touch", "camera", "gps"],
+ "status": "connected",
+ "metadata": {"android_version": "14"}
+ }
+]
+```
+
+### Device Assignment
+
+Automatically assign devices to all tasks:
+
+```python
+async def assign_devices_automatically(
+ self,
+ constellation: TaskConstellation,
+ strategy: str = "round_robin",
+ device_preferences: Optional[Dict[str, str]] = None,
+) -> Dict[str, str]:
+ """Automatically assign devices to tasks in a constellation."""
+
+ if not self._device_manager:
+ raise ValueError("Device manager not available for device assignment")
+
+ available_devices = await self._get_available_devices()
+ if not available_devices:
+ raise ValueError("No available devices for assignment")
+
+ if self._logger:
+ self._logger.info(
+ f"Assigning devices to constellation '{constellation.name}' "
+ f"using strategy '{strategy}'"
+ )
+
+ # Select strategy
+ if strategy == "round_robin":
+ assignments = await self._assign_round_robin(
+ constellation, available_devices, device_preferences
+ )
+ elif strategy == "capability_match":
+ assignments = await self._assign_capability_match(
+ constellation, available_devices, device_preferences
+ )
+ elif strategy == "load_balance":
+ assignments = await self._assign_load_balance(
+ constellation, available_devices, device_preferences
+ )
+ else:
+ raise ValueError(f"Unknown assignment strategy: {strategy}")
+
+ # Apply assignments to tasks
+ for task_id, device_id in assignments.items():
+ task = constellation.get_task(task_id)
+ if task:
+ task.target_device_id = device_id
+
+ if self._logger:
+ self._logger.info(f"Assigned {len(assignments)} tasks to devices")
+
+ return assignments
+```
+
+### Manual Reassignment
+
+Reassign a single task to a different device:
+
+```python
+def reassign_task_device(
+ self,
+ constellation: TaskConstellation,
+ task_id: str,
+ new_device_id: str,
+) -> bool:
+ """Reassign a task to a different device."""
+
+ task = constellation.get_task(task_id)
+ if not task:
+ return False
+
+ old_device_id = task.target_device_id
+ task.target_device_id = new_device_id
+
+ if self._logger:
+ self._logger.info(
+ f"Reassigned task '{task_id}' from device '{old_device_id}' "
+ f"to '{new_device_id}'"
+ )
+
+ return True
+```
+
+## Validation
+
+### Assignment Validation
+
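+Validate that every task has a device assignment. As shown, the check only confirms that `target_device_id` is set; it does not verify that the device exists or is connected: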
+
+```python
+def validate_constellation_assignments(
+ self, constellation: TaskConstellation
+) -> tuple[bool, List[str]]:
+ """Validate that all tasks have valid device assignments."""
+
+ errors = []
+
+ for task_id, task in constellation.tasks.items():
+ if not task.target_device_id:
+ errors.append(f"Task '{task_id}' has no device assignment")
+
+ is_valid = len(errors) == 0
+
+ if self._logger:
+ if is_valid:
+ self._logger.info(
+ f"All tasks in constellation '{constellation.name}' have "
+ f"valid assignments"
+ )
+ else:
+ self._logger.warning(
+ f"Constellation '{constellation.name}' has {len(errors)} "
+ f"assignment errors"
+ )
+
+ return is_valid, errors
+```
+
+### Device Information
+
+Get device information for a specific task:
+
+```python
+def get_task_device_info(
+ self, constellation: TaskConstellation, task_id: str
+) -> Optional[Dict[str, Any]]:
+ """Get device information for a specific task."""
+
+ task = constellation.get_task(task_id)
+ if not task or not task.target_device_id:
+ return None
+
+ # Get device info from device manager
+ if self._device_manager:
+ try:
+ device_info = self._device_manager.device_registry.get_device_info(
+ task.target_device_id
+ )
+ if device_info:
+ return {
+ "device_id": task.target_device_id,
+ "device_type": getattr(device_info, "device_type", "unknown"),
+ "capabilities": getattr(device_info, "capabilities", []),
+ "metadata": getattr(device_info, "metadata", {}),
+ }
+ except Exception as e:
+ if self._logger:
+ self._logger.error(
+ f"Failed to get device info for task '{task_id}': {e}"
+ )
+
+ return None
+```
+
+## Utilization Tracking
+
+### Device Utilization Statistics
+
+Get device utilization across a constellation:
+
+```python
+def get_device_utilization(
+ self, constellation: TaskConstellation
+) -> Dict[str, int]:
+ """Get device utilization statistics for a constellation."""
+
+ utilization = {}
+
+ for task in constellation.tasks.values():
+ if task.target_device_id:
+ utilization[task.target_device_id] = (
+ utilization.get(task.target_device_id, 0) + 1
+ )
+
+ return utilization
+```
+
+**Example output**:
+```python
+{
+ "windows_main": 5,
+ "android_pixel": 3,
+ "ios_iphone": 2
+}
+```
+
+### Listing All Constellations
+
+List all managed constellations:
+
+```python
+def list_constellations(self) -> List[Dict[str, Any]]:
+ """List all managed constellations with basic information."""
+
+ result = []
+ for constellation_id, constellation in self._managed_constellations.items():
+ metadata = self._constellation_metadata.get(constellation_id, {})
+ result.append({
+ "constellation_id": constellation_id,
+ "name": constellation.name,
+ "state": constellation.state.value,
+ "task_count": constellation.task_count,
+ "dependency_count": constellation.dependency_count,
+ "metadata": metadata,
+ })
+
+ return result
+```
+
+## Usage Patterns
+
+### Basic Setup
+
+```python
+from galaxy.constellation.orchestrator import ConstellationManager
+from galaxy.client.device_manager import ConstellationDeviceManager
+
+# Create device manager
+device_manager = ConstellationDeviceManager()
+
+# Create constellation manager
+manager = ConstellationManager(device_manager, enable_logging=True)
+
+# Register constellation
+constellation_id = manager.register_constellation(
+ constellation,
+ metadata={"priority": "high"}
+)
+```
+
+### Automatic Assignment
+
+```python
+# Assign devices using capability matching
+assignments = await manager.assign_devices_automatically(
+ constellation,
+ strategy="capability_match"
+)
+
+print(f"Assigned {len(assignments)} tasks")
+```
+
+### With Preferences
+
+```python
+# Specify preferred devices for specific tasks
+preferences = {
+ "critical_task_1": "windows_main",
+ "gpu_task_2": "windows_gpu",
+}
+
+assignments = await manager.assign_devices_automatically(
+ constellation,
+ strategy="load_balance",
+ device_preferences=preferences
+)
+```
+
+### Manual Override
+
+```python
+# Reassign specific task
+manager.reassign_task_device(
+ constellation,
+ task_id="task_5",
+ new_device_id="android_backup"
+)
+```
+
+### Validation
+
+```python
+# Validate assignments before orchestration
+is_valid, errors = manager.validate_constellation_assignments(constellation)
+
+if not is_valid:
+ print(f"Validation errors: {errors}")
+ # Fix assignments...
+```
+
+### Monitoring
+
+```python
+# Check constellation status during execution
+status = await manager.get_constellation_status(constellation_id)
+
+print(f"State: {status['state']}")
+print(f"Running tasks: {len(status['running_tasks'])}")
+print(f"Completed tasks: {len(status['completed_tasks'])}")
+
+# Get device utilization
+utilization = manager.get_device_utilization(constellation)
+for device_id, task_count in utilization.items():
+ print(f"{device_id}: {task_count} tasks")
+```
+
+## Integration with Orchestrator
+
+The orchestrator uses the manager internally:
+
+```python
+class TaskConstellationOrchestrator:
+ def __init__(self, device_manager, enable_logging=True):
+ self._device_manager = device_manager
+ self._constellation_manager = ConstellationManager(
+ device_manager, enable_logging
+ )
+
+ async def orchestrate_constellation(self, constellation, ...):
+ # Use manager for assignment
+ await self._constellation_manager.assign_devices_automatically(
+ constellation, assignment_strategy
+ )
+
+ # Use manager for validation
+ is_valid, errors = self._constellation_manager \
+ .validate_constellation_assignments(constellation)
+
+ if not is_valid:
+ raise ValueError(f"Device assignment validation failed: {errors}")
+
+ # Continue orchestration...
+```
+
+## Related Documentation
+
+- [Overview](overview.md) - Orchestrator architecture and design
+- [Asynchronous Scheduling](asynchronous_scheduling.md) - Task execution model
+- [Consistency Guarantees](consistency_guarantees.md) - Device assignment validation
+- [API Reference](api_reference.md) - Complete API documentation
diff --git a/documents/docs/galaxy/constellation_orchestrator/event_driven_coordination.md b/documents/docs/galaxy/constellation_orchestrator/event_driven_coordination.md
new file mode 100644
index 000000000..23b18a33b
--- /dev/null
+++ b/documents/docs/galaxy/constellation_orchestrator/event_driven_coordination.md
@@ -0,0 +1,491 @@
+# Event-Driven Coordination
+
+## Overview
+
+Traditional DAG schedulers rely on **polling** or **global checkpoints** to detect task completion, introducing latency and synchronization overhead. In contrast, the Constellation Orchestrator operates as a fully **event-driven** system built on an internal event bus and observer design pattern.
+
+This architecture enables immediate, fine-grained reactions to runtime signals without centralized coordination delays, providing the foundation for adaptive orchestration in UFO. For an overview of how events drive the orchestrator, see the [Orchestrator Overview](overview.md).
+
+## Event System Architecture
+
+The event-driven coordination system consists of three core components:
+
+```mermaid
+graph LR
+ A[Event Publishers] -->|publish| B[Event Bus]
+ B -->|notify| C[Event Observers]
+
+ D[Orchestrator] -.->|implements| A
+ D -.->|implements| C
+
+ E[Synchronizer] -.->|implements| C
+ F[Agent] -.->|implements| A
+
+ style B fill:#4a90e2,stroke:#333,stroke-width:3px,color:#fff
+ style D fill:#ffa726,stroke:#333,stroke-width:2px
+ style E fill:#66bb6a,stroke:#333,stroke-width:2px
+ style F fill:#ab47bc,stroke:#333,stroke-width:2px
+```
+
+### Event Bus
+
+The `EventBus` class serves as the central message broker, managing subscriptions and distributing events throughout the system.
+
+**Key Features:**
+
+- **Type-based subscription**: Observers subscribe to specific event types
+- **Wildcard subscription**: Observers can subscribe to all events
+- **Concurrent notification**: All observers are notified asynchronously in parallel
+- **Error isolation**: Exceptions in one observer don't affect others
+
+**Implementation** (`galaxy/core/events.py`):
+
+```python
+class EventBus(IEventPublisher):
+ """Central event bus for Galaxy framework."""
+
+ def subscribe(self, observer: IEventObserver,
+ event_types: Set[EventType] = None) -> None:
+ """Subscribe observer to specific events or all events."""
+ if event_types is None:
+ self._all_observers.add(observer)
+ else:
+ for event_type in event_types:
+ if event_type not in self._observers:
+ self._observers[event_type] = set()
+ self._observers[event_type].add(observer)
+
+ async def publish_event(self, event: Event) -> None:
+ """Publish event to all relevant subscribers."""
+ observers_to_notify = set()
+
+ # Add type-specific observers
+ if event.event_type in self._observers:
+ observers_to_notify.update(self._observers[event.event_type])
+
+ # Add wildcard observers
+ observers_to_notify.update(self._all_observers)
+
+ # Notify all concurrently
+ if observers_to_notify:
+ tasks = [observer.on_event(event) for observer in observers_to_notify]
+ await asyncio.gather(*tasks, return_exceptions=True)
+```
+
+!!!tip "Design Pattern"
+ The event bus implements the **Observer** (or Publish-Subscribe) pattern, decoupling event producers from consumers and enabling extensible system behavior.
+
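To make the subscription and error-isolation behaviour concrete, here is a stripped-down, runnable sketch. It mirrors the structure of the excerpt above but is not the Galaxy implementation: `MiniEventBus`, `Recorder`, and `Faulty` are hypothetical, and events are reduced to a type string plus a payload dict.

```python
import asyncio
from typing import Any, Dict, List, Optional, Set


class MiniEventBus:
    """Minimal event bus with typed and wildcard subscriptions."""

    def __init__(self) -> None:
        self._observers: Dict[str, Set[Any]] = {}
        self._all_observers: Set[Any] = set()

    def subscribe(self, observer: Any, event_types: Optional[Set[str]] = None) -> None:
        if event_types is None:
            self._all_observers.add(observer)  # wildcard subscription
            return
        for event_type in event_types:
            self._observers.setdefault(event_type, set()).add(observer)

    async def publish_event(self, event_type: str, payload: Dict[str, Any]) -> List[Any]:
        targets = set(self._all_observers) | self._observers.get(event_type, set())
        if not targets:
            return []
        # return_exceptions=True keeps one failing observer from
        # cancelling or hiding the results of the others.
        return await asyncio.gather(
            *(observer.on_event(event_type, payload) for observer in targets),
            return_exceptions=True,
        )


class Recorder:
    """Observer that remembers which event types it received."""

    def __init__(self) -> None:
        self.seen: List[str] = []

    async def on_event(self, event_type: str, payload: Dict[str, Any]) -> None:
        self.seen.append(event_type)


class Faulty:
    """Observer that always fails, to demonstrate error isolation."""

    async def on_event(self, event_type: str, payload: Dict[str, Any]) -> None:
        raise RuntimeError("observer failed")
```

Publishing `TASK_COMPLETED` to a bus with a typed `Recorder`, a wildcard `Recorder`, and a `Faulty` observer delivers the event to both recorders, and the `RuntimeError` appears in the returned list instead of propagating to the publisher.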
+## Event Types
+
+The orchestrator uses six primary event types, three at the task level and three at the constellation level, that capture the complete lifecycle of tasks and constellations:
+
+### Task-Level Events
+
+These events track individual task state transitions during execution:
+
+| Event Type | Trigger | Published By | Data Payload |
+|------------|---------|--------------|--------------|
+| `TASK_STARTED` | Task assigned to device and execution begins | Orchestrator | `task_id`, `status`, `constellation_id` |
+| `TASK_COMPLETED` | Task finishes successfully | Orchestrator | `task_id`, `status`, `result`, `newly_ready_tasks`, `constellation` |
+| `TASK_FAILED` | Task execution fails | Orchestrator | `task_id`, `status`, `error`, `newly_ready_tasks` |
+
+**Event Structure:**
+
+```python
+@dataclass
+class TaskEvent(Event):
+    """Task-specific event."""
+    task_id: str
+    status: str
+    result: Any = None
+    error: Optional[Exception] = None
+```
+
+### Constellation-Level Events
+
+These events track macro-level constellation lifecycle and structural changes:
+
+| Event Type | Trigger | Published By | Data Payload |
+|------------|---------|--------------|--------------|
+| `CONSTELLATION_STARTED` | Orchestration begins | Orchestrator | `total_tasks`, `assignment_strategy`, `constellation` |
+| `CONSTELLATION_COMPLETED` | All tasks finished | Orchestrator | `total_tasks`, `statistics`, `execution_duration` |
+| `CONSTELLATION_MODIFIED` | DAG structure updated by agent | Agent | `on_task_id`, `new_constellation`, `modifications` |
+
+**Event Structure:**
+
+```python
+@dataclass
+class ConstellationEvent(Event):
+    """Constellation-specific event."""
+    constellation_id: str
+    constellation_state: str
+    new_ready_tasks: Optional[List[str]] = None
+```
+
+All events inherit from the base `Event` class which provides common fields: `event_type`, `source_id`, `timestamp`, and `data`.
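+
+The hierarchy above can be sketched end to end as follows. This is a minimal illustration, not the code in `galaxy/core/events.py`: the field types and the string values of `EventType` are assumptions, since only the names appear in this document. The base-class fields are declared without defaults here because a dataclass subclass cannot add required fields after inherited fields that have defaults.
+
+```python
+import time
+from dataclasses import dataclass
+from enum import Enum
+from typing import Any, Dict, Optional
+
+
+class EventType(Enum):
+    # Assumed member values; only the member names come from this document
+    TASK_STARTED = "task_started"
+    TASK_COMPLETED = "task_completed"
+    TASK_FAILED = "task_failed"
+
+
+@dataclass
+class Event:
+    """Common fields shared by all events (types are assumptions)."""
+    event_type: EventType
+    source_id: str
+    timestamp: float
+    data: Dict[str, Any]
+
+
+@dataclass
+class TaskEvent(Event):
+    """Task-specific event."""
+    task_id: str
+    status: str
+    result: Any = None
+    error: Optional[Exception] = None
+
+
+# Build an event the same way the orchestrator does: base fields first,
+# then the task-specific fields as keyword arguments.
+event = TaskEvent(
+    event_type=EventType.TASK_COMPLETED,
+    source_id="orchestrator_demo",
+    timestamp=time.time(),
+    data={"constellation_id": "c1", "newly_ready_tasks": ["t2"]},
+    task_id="t1",
+    status="completed",
+)
+print(event.task_id, event.event_type.name, event.data["newly_ready_tasks"])
+```
+
+Observers can then read shared metadata from `event.data` and typed fields such as `task_id` directly from the event object.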
+
+## Observer Pattern Implementation
+
+The orchestrator and related components implement the `IEventObserver` interface to react to events:
+
+```python
+class IEventObserver(ABC):
+ """Interface for event observers."""
+
+ @abstractmethod
+ async def on_event(self, event: Event) -> None:
+ """Handle an event."""
+ pass
+```
+
+### Key Observers in the System
+
+#### 1. ConstellationModificationSynchronizer
+
+Ensures proper synchronization between task completion and constellation modifications:
+
+```python
+class ConstellationModificationSynchronizer(IEventObserver):
+ """Synchronizes constellation modifications with orchestrator execution."""
+
+ async def on_event(self, event: Event) -> None:
+ if isinstance(event, TaskEvent):
+ await self._handle_task_event(event)
+ elif isinstance(event, ConstellationEvent):
+ await self._handle_constellation_event(event)
+```
+
+**Responsibilities:**
+
+- Register pending modifications when tasks complete
+- Mark modifications as complete when agent finishes editing
+- Provide synchronization point for orchestrator
+
+[Learn more →](safe_assignment_locking.md#modification-synchronizer)
+
+#### 2. Visualization Observers
+
+Handle real-time visualization updates as the constellation evolves:
+
+- `DAGVisualizationObserver` - Updates DAG topology visualization
+- `TaskVisualizationHandler` - Updates task status displays
+- `ConstellationVisualizationHandler` - Updates overall constellation state
+
+!!!example "Observer Subscription"
+ ```python
+ # Subscribe synchronizer to task and constellation events
+ event_bus = get_event_bus()
+ synchronizer = ConstellationModificationSynchronizer(orchestrator)
+ event_bus.subscribe(synchronizer, {
+ EventType.TASK_COMPLETED,
+ EventType.TASK_FAILED,
+ EventType.CONSTELLATION_MODIFIED
+ })
+ ```
+
+## Event Flow in Orchestration
+
+The following sequence diagram illustrates how events flow through the system during orchestration:
+
+```mermaid
+sequenceDiagram
+ participant O as Orchestrator
+ participant EB as Event Bus
+ participant S as Synchronizer
+ participant V as Visualizer
+ participant A as Agent
+
+ Note over O: Task A execution starts
+ O->>EB: Publish TASK_STARTED(A)
+ EB-->>V: Notify observers
+ V->>V: Update visualization
+
+ Note over O: Task A completes
+ O->>EB: Publish TASK_COMPLETED(A)
+ EB-->>S: Notify observers
+ EB-->>V: Notify observers
+
+ S->>S: Register pending modification
+ V->>V: Update task status
+
+ Note over A: Agent processes completion
+ A->>A: Edit constellation
+ A->>EB: Publish CONSTELLATION_MODIFIED
+
+ EB-->>S: Notify observers
+ EB-->>V: Notify observers
+
+ S->>S: Complete modification future
+ S->>O: Sync constellation state
+ V->>V: Update constellation view
+```
+
+This flow demonstrates several key aspects:
+
+1. **Immediate notification**: Events are published as soon as state changes occur
+2. **Parallel processing**: Multiple observers react concurrently
+3. **Decoupled components**: Publishers don't know about subscribers
+4. **Asynchronous coordination**: No blocking waits or polling
+
+## Event Publishing in Orchestrator
+
+The orchestrator publishes events at critical execution points:
+
+### Task Execution Events
+
+When executing a task, the orchestrator wraps execution in event publishing:
+
+```python
+async def _execute_task_with_events(
+ self, task: TaskStar, constellation: TaskConstellation
+) -> None:
+ """Execute a single task and publish events."""
+
+ try:
+ # Publish task started event
+ start_event = TaskEvent(
+ event_type=EventType.TASK_STARTED,
+ source_id=f"orchestrator_{id(self)}",
+ timestamp=time.time(),
+ data={"constellation_id": constellation.constellation_id},
+ task_id=task.task_id,
+ status=TaskStatus.RUNNING.value,
+ )
+ await self._event_bus.publish_event(start_event)
+
+ # Execute task
+ task.start_execution()
+ result = await task.execute(self._device_manager)
+
+ is_success = result.status == TaskStatus.COMPLETED.value
+
+ # Mark as completed and get newly ready tasks
+ newly_ready = constellation.mark_task_completed(
+ task.task_id, success=is_success, result=result
+ )
+
+ # Publish task completed or failed event
+ completed_event = TaskEvent(
+ event_type=(
+ EventType.TASK_COMPLETED if is_success
+ else EventType.TASK_FAILED
+ ),
+ source_id=f"orchestrator_{id(self)}",
+ timestamp=time.time(),
+ data={
+ "constellation_id": constellation.constellation_id,
+ "newly_ready_tasks": [t.task_id for t in newly_ready],
+ "constellation": constellation,
+ },
+ task_id=task.task_id,
+ status=result.status,
+ result=result,
+ )
+ await self._event_bus.publish_event(completed_event)
+
+ except Exception as e:
+ # Mark task as failed and get newly ready tasks
+ newly_ready = constellation.mark_task_completed(
+ task.task_id, success=False, error=e
+ )
+
+ # Publish task failed event
+ failed_event = TaskEvent(
+ event_type=EventType.TASK_FAILED,
+ source_id=f"orchestrator_{id(self)}",
+ timestamp=time.time(),
+ data={
+ "constellation_id": constellation.constellation_id,
+ "newly_ready_tasks": [t.task_id for t in newly_ready],
+ },
+ task_id=task.task_id,
+ status=TaskStatus.FAILED.value,
+ error=e,
+ )
+ await self._event_bus.publish_event(failed_event)
+ raise
+```
+
+!!!warning "Critical Section"
+ Event publishing happens **immediately** after state transitions but **before** any dependent operations, ensuring observers have the latest state.
+
+### Constellation Lifecycle Events
+
+The orchestrator also publishes constellation-level events:
+
+```python
+# At orchestration start
+start_event = ConstellationEvent(
+ event_type=EventType.CONSTELLATION_STARTED,
+ source_id=f"orchestrator_{id(self)}",
+ timestamp=time.time(),
+ data={
+ "total_tasks": len(constellation.tasks),
+ "assignment_strategy": assignment_strategy,
+ "constellation": constellation,
+ },
+ constellation_id=constellation.constellation_id,
+ constellation_state="executing",
+)
+await self._event_bus.publish_event(start_event)
+
+# At orchestration completion
+completion_event = ConstellationEvent(
+ event_type=EventType.CONSTELLATION_COMPLETED,
+ source_id=f"orchestrator_{id(self)}",
+ timestamp=time.time(),
+ data={
+ "total_tasks": len(constellation.tasks),
+ "statistics": constellation.get_statistics(),
+ "execution_duration": time.time() - start_event.timestamp,
+ },
+ constellation_id=constellation.constellation_id,
+ constellation_state="completed",
+)
+await self._event_bus.publish_event(completion_event)
+```
+
+## Benefits of Event-Driven Architecture
+
+The event-driven design provides several critical advantages:
+
+### 1. High Responsiveness
+
+Events are processed **immediately** upon publication with no polling delay:
+
+- Task completion → Agent notified instantly
+- Constellation modified → Orchestrator syncs immediately
+- Failure detected → Recovery triggered without delay
+
+### 2. Loose Coupling
+
+Components interact through events rather than direct calls:
+
+- Orchestrator doesn't know about visualization
+- Agent doesn't know about synchronizer
+- New observers can be added without modifying publishers
+
+### 3. Extensibility
+
+New functionality can be added by creating new observers:
+
+```python
+class MetricsCollector(IEventObserver):
+ """Collect orchestration metrics."""
+
+ async def on_event(self, event: Event) -> None:
+ if event.event_type == EventType.TASK_COMPLETED:
+ self._record_task_completion(event)
+ elif event.event_type == EventType.CONSTELLATION_COMPLETED:
+ self._record_constellation_metrics(event)
+
+# Subscribe to event bus
+event_bus.subscribe(MetricsCollector())
+```
+
+### 4. Concurrent Processing
+
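+Multiple observers handle each event concurrently on the `asyncio` event loop, so a slow observer that awaits properly does not hold up the others: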
+
+- Visualization updates don't block synchronization
+- Logging doesn't delay task scheduling
+- Metrics collection happens asynchronously
+
+### 5. Error Isolation
+
+Exceptions in one observer don't affect others:
+
+```python
+# In EventBus.publish_event()
+await asyncio.gather(*tasks, return_exceptions=True)
+```
+
+If a visualization observer crashes, the synchronizer still processes the event correctly.
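+
+The behaviour can be reproduced in isolation. The sketch below is a stand-alone demonstration, not the Galaxy `EventBus`: `GoodObserver`, `CrashingObserver`, and `publish` are hypothetical names used only for illustration. With `return_exceptions=True`, `asyncio.gather` returns exceptions as ordinary results, so a publisher that wants failures to be visible has to inspect and log them itself.
+
+```python
+import asyncio
+
+
+class GoodObserver:
+    """Records every event it receives."""
+
+    def __init__(self):
+        self.seen = []
+
+    async def on_event(self, event):
+        self.seen.append(event)
+
+
+class CrashingObserver:
+    """Always fails, standing in for a broken visualization observer."""
+
+    async def on_event(self, event):
+        raise RuntimeError("visualization failed")
+
+
+async def publish(observers, event):
+    """Notify observers concurrently and return the exceptions they raised."""
+    results = await asyncio.gather(
+        *(obs.on_event(event) for obs in observers),
+        return_exceptions=True,
+    )
+    return [r for r in results if isinstance(r, Exception)]
+
+
+good = GoodObserver()
+errors = asyncio.run(publish([CrashingObserver(), good], "TASK_COMPLETED"))
+print(good.seen, errors)
+```
+
+The healthy observer still receives the event, and the failure is available to the caller as a value rather than propagating.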
+
+## Performance Characteristics
+
+| Aspect | Measurement | Impact |
+|--------|-------------|--------|
+| **Event Latency** | < 1ms (in-memory) | Negligible overhead |
+| **Notification Overhead** | O(N) where N = observers | Scales linearly with observers |
+| **Concurrency** | Unlimited parallel observers | No bottleneck from sequential processing |
+| **Memory** | Event objects garbage collected | No long-term accumulation |
+
+The event system has been battle-tested in production with 50+ concurrent observers, 1000+ events per second, complex multi-device constellations, and long-running orchestration sessions.
+
+## Usage Patterns
+
+### Creating Custom Observers
+
+To create a custom observer for orchestration events:
+
+```python
+from galaxy.core.events import Event, EventType, IEventObserver, get_event_bus
+
+
+class CustomOrchestrationObserver(IEventObserver):
+    """Custom observer for orchestration events."""
+
+    def __init__(self):
+        self.task_count = 0
+        self.completion_times = []
+
+    async def on_event(self, event: Event) -> None:
+        """Handle events of interest."""
+
+        if event.event_type == EventType.TASK_COMPLETED:
+            self.task_count += 1
+            # The orchestrator passes the result as a TaskEvent field, not
+            # inside event.data. This assumes the result exposes datetime
+            # start_time and end_time attributes.
+            result = event.result
+            duration = result.end_time - result.start_time
+            self.completion_times.append(duration.total_seconds())
+
+            print(f"Task {event.task_id} completed in "
+                  f"{duration.total_seconds():.2f}s")
+
+        elif event.event_type == EventType.CONSTELLATION_COMPLETED:
+            # Guard against a run in which no task completed successfully
+            if self.completion_times:
+                avg_time = sum(self.completion_times) / len(self.completion_times)
+                print(f"Constellation completed! "
+                      f"Average task time: {avg_time:.2f}s")
+
+
+# Register observer
+observer = CustomOrchestrationObserver()
+event_bus = get_event_bus()
+event_bus.subscribe(observer, {
+    EventType.TASK_COMPLETED,
+    EventType.CONSTELLATION_COMPLETED
+})
+```
+
+### Event Filtering
+
+Observers can filter events based on custom criteria:
+
+```python
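+import logging
+
+
+class FailureMonitor(IEventObserver):
+    """Monitor and log only failure events."""
+
+    def __init__(self):
+        self.logger = logging.getLogger(__name__)
+
+    async def on_event(self, event: Event) -> None:
+        # Only process failure events
+        if event.event_type != EventType.TASK_FAILED:
+            return
+
+        # Log failure details
+        self.logger.error(
+            f"Task {event.task_id} failed: {event.error}"
+        )
+
+        # Optionally trigger alerts or recovery
+        await self._handle_task_failure(event)
+
+    async def _handle_task_failure(self, event: Event) -> None:
+        """Placeholder hook: add alerting or recovery logic here."""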
+```
+
+## Related Documentation
+
+- **[Asynchronous Scheduling](asynchronous_scheduling.md)** - How events trigger task scheduling
+- **[Safe Assignment Locking](safe_assignment_locking.md)** - Event-driven synchronization
+- **[API Reference](api_reference.md)** - Event classes and interfaces
+
+---
+
+!!!tip "Next Steps"
+ To understand how events drive concurrent task execution, continue to [Asynchronous Scheduling](asynchronous_scheduling.md).
diff --git a/documents/docs/galaxy/constellation_orchestrator/overview.md b/documents/docs/galaxy/constellation_orchestrator/overview.md
new file mode 100644
index 000000000..95c61a91d
--- /dev/null
+++ b/documents/docs/galaxy/constellation_orchestrator/overview.md
@@ -0,0 +1,243 @@
+# Constellation Orchestrator Overview
+
+## Introduction
+
+The **Constellation Orchestrator** is the execution engine at the heart of UFO's multi-device orchestration system. While the Constellation Agent handles reasoning and task graph evolution, the orchestrator transforms these declarative plans into concrete execution across heterogeneous devices.
+
+Unlike traditional DAG schedulers that execute static task graphs, the Constellation Orchestrator operates as a **living execution fabric** where tasks evolve concurrently, react to runtime signals, and adapt to new decisions from the reasoning agent in real-time.
+
+
+
+*The Constellation Orchestrator bridges TaskConstellation and execution, enabling asynchronous, adaptive task orchestration across devices.*
+
+## Key Capabilities
+
+The orchestrator achieves three critical goals that traditional schedulers struggle to balance:
+
+| Capability | Description | Benefit |
+|------------|-------------|---------|
+| **Asynchronous Parallelism** | Execute independent tasks concurrently across heterogeneous devices | Maximize device utilization and minimize idle time |
+| **Safety & Consistency** | Maintain correctness under concurrent DAG updates from LLM reasoning | Prevent race conditions and invalid execution states |
+| **Runtime Adaptivity** | React to feedback from both devices and LLM reasoning dynamically | Enable intelligent re-planning and error recovery |
+
+## Architecture Overview
+
+The Constellation Orchestrator is built on five fundamental design pillars:
+
+```mermaid
+graph TB
+ A[Event-Driven Coordination] --> B[Orchestrator Core]
+ C[Asynchronous Scheduling] --> B
+ D[Safe Assignment Locking] --> B
+ E[Consistency Enforcement] --> B
+ F[Batched Editing] --> B
+
+ B --> G[Device Execution]
+ B --> H[Constellation Evolution]
+
+ style B fill:#4a90e2,stroke:#333,stroke-width:3px,color:#fff
+ style A fill:#7cb342,stroke:#333,stroke-width:2px
+ style C fill:#7cb342,stroke:#333,stroke-width:2px
+ style D fill:#7cb342,stroke:#333,stroke-width:2px
+ style E fill:#7cb342,stroke:#333,stroke-width:2px
+ style F fill:#7cb342,stroke:#333,stroke-width:2px
+```
+
+### 1. Event-Driven Coordination
+
+The orchestrator operates as a fully event-driven system using an observer pattern and internal event bus. Instead of polling or global checkpoints, it reacts immediately to four primary event types:
+
+- `TASK_STARTED` - Task assigned and execution begins
+- `TASK_COMPLETED` - Task finishes successfully
+- `TASK_FAILED` - Task execution fails
+- `CONSTELLATION_MODIFIED` - DAG structure updated by agent
+
+This design provides **high responsiveness** and eliminates synchronization overhead.
+
+[Learn more →](event_driven_coordination.md)
+
+### 2. Asynchronous Scheduling
+
+A continuous scheduling loop monitors the evolving TaskConstellation, identifies ready tasks (dependencies satisfied), and dispatches them concurrently to available devices. Critically, **task execution and constellation editing proceed in parallel**, overlapping computation with orchestration.
+
+This enables **maximum parallelism** and **minimal latency** in cross-device workflows.
+
+[Learn more →](asynchronous_scheduling.md)
+
+### 3. Safe Assignment Locking
+
+To prevent race conditions when LLM-driven edits overlap with task execution, the orchestrator employs a safe assignment lock protocol. During edit cycles, new task assignments are suspended while modifications are applied atomically and synchronized with runtime progress.
+
+This guarantees **atomicity** and **prevents conflicts** between execution and modification.
+
+[Learn more →](safe_assignment_locking.md)
+
+### 4. Consistency Enforcement
+
+The orchestrator enforces three runtime invariants to preserve correctness even under partial or invalid LLM updates:
+
+- **I1 (Single Assignment)**: Each task has at most one active device assignment
+- **I2 (Acyclic Consistency)**: Edits preserve DAG acyclicity (no cycles)
+- **I3 (Valid Update)**: Only PENDING tasks and their dependents can be modified
+
+These invariants ensure **structural integrity** and **semantic validity** of the constellation.
+
+[Learn more →](consistency_guarantees.md)
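+
+Invariant I2 amounts to a cycle check after each edit. The sketch below shows one common way to perform it, Kahn's algorithm; it is an illustration only and does not claim to be the check the orchestrator uses. The function name `is_acyclic` and the dictionary-based graph representation are assumptions made for this example.
+
+```python
+from collections import deque
+from typing import Dict, List, Set
+
+
+def is_acyclic(dependencies: Dict[str, Set[str]]) -> bool:
+    """Return True if the task graph has no cycles (Kahn's algorithm).
+
+    dependencies maps each task ID to the set of task IDs it depends on.
+    """
+    # Collect every task mentioned as a key or as a dependency
+    tasks = set(dependencies)
+    for deps in dependencies.values():
+        tasks |= deps
+
+    in_degree = {t: len(dependencies.get(t, ())) for t in tasks}
+    dependents: Dict[str, List[str]] = {t: [] for t in tasks}
+    for task, deps in dependencies.items():
+        for dep in deps:
+            dependents[dep].append(task)
+
+    # Repeatedly remove tasks whose dependencies are all satisfied
+    queue = deque(t for t, degree in in_degree.items() if degree == 0)
+    visited = 0
+    while queue:
+        current = queue.popleft()
+        visited += 1
+        for nxt in dependents[current]:
+            in_degree[nxt] -= 1
+            if in_degree[nxt] == 0:
+                queue.append(nxt)
+
+    # Every task is visited only if no cycle blocks the ordering
+    return visited == len(tasks)
+
+
+print(is_acyclic({"b": {"a"}, "c": {"a", "b"}}))
+print(is_acyclic({"a": {"b"}, "b": {"a"}}))
+```
+
+An edit that makes a check like this fail would be rejected before the modified constellation is handed back to the scheduler.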
+
+### 5. Batched Constellation Editing
+
+To balance responsiveness with efficiency, the orchestrator batches multiple task completion events and applies their resulting modifications atomically. This amortizes LLM invocation overhead while preserving atomicity and consistency.
+
+This achieves both **efficiency** and **adaptivity** without excessive micro-edits.
+
+[Learn more →](batched_editing.md)
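+
+One way to picture batching is to wait for the first completion event and then drain whatever else has already arrived, so that a single edit pass covers all of them. The sketch below assumes an `asyncio.Queue` of task IDs; `drain_batch` and `demo` are hypothetical helpers, not part of the Galaxy API.
+
+```python
+import asyncio
+from typing import List
+
+
+def drain_batch(queue: asyncio.Queue) -> List[str]:
+    """Collect every event currently queued, without waiting for more."""
+    batch = []
+    while True:
+        try:
+            batch.append(queue.get_nowait())
+        except asyncio.QueueEmpty:
+            return batch
+
+
+async def demo() -> List[List[str]]:
+    queue: asyncio.Queue = asyncio.Queue()
+    # Three tasks complete before the editor gets a chance to run
+    for task_id in ("task_a", "task_b", "task_c"):
+        queue.put_nowait(task_id)
+
+    batches = []
+    first = await queue.get()                      # wait for at least one completion
+    batches.append([first] + drain_batch(queue))   # then take whatever else arrived
+    return batches
+
+
+batches = asyncio.run(demo())
+print(batches)
+```
+
+All three completions end up in one batch, so the agent would be invoked once rather than three times.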
+
+## System Components
+
+The orchestrator consists of two primary components working in tandem:
+
+### TaskConstellationOrchestrator
+
+The main execution orchestrator focused on flow control and coordination. It manages:
+
+- Event-driven task lifecycle (start, complete, fail)
+- Asynchronous scheduling loop
+- Safe assignment locking protocol
+- Integration with modification synchronizer
+
+[API Reference →](api_reference.md)
+
+### ConstellationManager
+
+Handles device assignment, resource management, and constellation lifecycle. It provides:
+
+- Multiple assignment strategies (round-robin, capability-match, load-balance)
+- Device validation and status tracking
+- Constellation registration and metadata management
+
+[Learn more →](constellation_manager.md)
+
+## Execution Flow
+
+The orchestration workflow follows this sequence:
+
+```mermaid
+sequenceDiagram
+ participant U as User Request
+ participant O as Orchestrator
+ participant C as Constellation
+ participant S as Synchronizer
+ participant D as Devices
+ participant A as Agent
+
+ U->>O: orchestrate_constellation()
+ O->>C: Validate DAG
+ O->>C: Assign devices
+
+ loop Execution Loop
+ O->>C: Get ready tasks
+ O->>D: Dispatch tasks (async)
+ D-->>O: Task completed event
+ O->>S: Wait for modifications
+ S->>A: Trigger editing
+ A->>C: Update constellation
+ A-->>S: Modification complete
+ S->>O: Sync constellation state
+ end
+
+ O->>U: Return results
+```
+
+The orchestrator treats task execution as an **open-world process** - continuously evolving, reacting, and converging toward user intent rather than executing a fixed plan.
+
+## Design Highlights
+
+### Asynchronous by Default
+
+Every operation runs asynchronously using Python's `asyncio`, enabling:
+
+- Concurrent task execution across devices
+- Non-blocking event handling
+- Parallel constellation editing
+
+### LLM-Aware Orchestration
+
+Unlike traditional schedulers, the orchestrator is designed for **reasoning-aware execution**:
+
+- Expects and handles dynamic graph modifications
+- Synchronizes LLM reasoning with runtime execution
+- Validates and enforces safety under AI-driven changes
+
+### Production-Ready Safeguards
+
+- Timeout protection for modifications (default: 600s)
+- Automatic validation before every execution cycle
+- Device assignment verification
+- Cycle detection on every edit
+- Comprehensive error handling and logging
+
+## Performance Characteristics
+
+| Metric | Description | Implementation |
+|--------|-------------|----------------|
+| **Latency** | Time from task ready to execution start | Minimized via event-driven dispatch |
+| **Throughput** | Tasks completed per unit time | Maximized via async parallelism |
+| **Overhead** | Orchestration cost per task | Reduced via batched editing |
+| **Scalability** | Performance with increasing tasks/devices | Linear with async coordination |
+
+## Getting Started
+
+### Basic Usage
+
+```python
+from galaxy.constellation import TaskConstellationOrchestrator
+from galaxy.client.device_manager import ConstellationDeviceManager
+
+# Create orchestrator
+device_manager = ConstellationDeviceManager()
+orchestrator = TaskConstellationOrchestrator(device_manager)
+
+# Orchestrate constellation. Run this inside an async function, since it
+# awaits; my_constellation is a TaskConstellation built beforehand.
+results = await orchestrator.orchestrate_constellation(
+    constellation=my_constellation,
+    assignment_strategy="capability_match"
+)
+```
+
+### With Modification Synchronizer
+
+```python
+from galaxy.core.events import get_event_bus
+from galaxy.session.observers.constellation_sync_observer import (
+    ConstellationModificationSynchronizer
+)
+
+# Create synchronizer (orchestrator is the instance created in Basic Usage)
+synchronizer = ConstellationModificationSynchronizer(orchestrator)
+orchestrator.set_modification_synchronizer(synchronizer)
+
+# Subscribe to events
+event_bus = get_event_bus()
+event_bus.subscribe(synchronizer)
+
+# Now orchestrator will wait for LLM edits to complete
+```
+
+## Related Documentation
+
+- **[Event-Driven Coordination](event_driven_coordination.md)** - Event system and observer pattern
+- **[Asynchronous Scheduling](asynchronous_scheduling.md)** - Concurrent task execution
+- **[Safe Assignment Locking](safe_assignment_locking.md)** - Race condition prevention
+- **[Consistency Guarantees](consistency_guarantees.md)** - Runtime invariants
+- **[Batched Editing](batched_editing.md)** - Efficient constellation updates
+- **[Constellation Manager](constellation_manager.md)** - Device and resource management
+- **[API Reference](api_reference.md)** - Complete API documentation
+
+## Further Reading
+
+- [TaskConstellation Documentation](../constellation/overview.md) - Understand the DAG structure
+- [Constellation Agent](../constellation_agent/overview.md) - LLM-based reasoning
+- [Device Manager](../client/device_manager.md) - Device communication layer
+
+---
+
+!!!tip "Next Steps"
+ To understand how events drive orchestration, continue to [Event-Driven Coordination](event_driven_coordination.md).
diff --git a/documents/docs/galaxy/constellation_orchestrator/safe_assignment_locking.md b/documents/docs/galaxy/constellation_orchestrator/safe_assignment_locking.md
new file mode 100644
index 000000000..5c6907fa9
--- /dev/null
+++ b/documents/docs/galaxy/constellation_orchestrator/safe_assignment_locking.md
@@ -0,0 +1,534 @@
+# Safe Assignment Locking
+
+## Overview
+
+While asynchronous execution maximizes efficiency, it introduces correctness challenges when task execution overlaps with DAG updates. The orchestrator must prevent race conditions where the Constellation Agent dynamically adds, removes, or rewires tasks during execution.
+
+Without safeguards, a task could be dispatched based on a stale DAG, leading to duplicated execution, missed dependencies, or invalid state transitions.
+
+To ensure atomicity, the orchestrator employs a safe assignment lock protocol combined with constellation state synchronization.
+
+
+
+*An example of the safe assignment locking and event synchronization workflow. When multiple tasks complete simultaneously, the orchestrator locks assignments, batches modifications, and releases after synchronization.*
+
+Learn more about how this integrates with [asynchronous scheduling](asynchronous_scheduling.md) and [batched editing](batched_editing.md).
+
+## The Race Condition Problem
+
+### Scenario Without Locking
+
+Consider this problematic sequence:
+
+```mermaid
+sequenceDiagram
+ participant O as Orchestrator
+ participant C as Constellation
+ participant A as Agent
+
+ Note over O: Task A completes
+ O->>C: mark_task_completed(A)
+ O->>A: Trigger editing
+
+ par Agent modifies DAG
+ A->>C: Remove Task B
+ A->>C: Add Task B'
+ and Orchestrator continues
+ O->>C: get_ready_tasks()
+ C-->>O: [Task B, Task C]
+ Note over O: Task B dispatched (but removed!)
+ end
+
+ Note over O,A: Task B' never executed
+```
+
+**Problems:**
+
+1. **Stale dispatch**: Task B dispatched after being removed
+2. **Missing tasks**: Task B' never identified as ready
+3. **Inconsistent state**: Constellation doesn't reflect actual execution
+
+### Root Cause
+
+The orchestrator's scheduling loop and agent's editing process are **concurrent and unsynchronized**:
+
+```python
+# Orchestrator loop (simplified)
+while not constellation.is_complete():
+ ready_tasks = constellation.get_ready_tasks() # ← May see stale state
+ await schedule_ready_tasks(ready_tasks) # ← Dispatch based on stale view
+ await wait_for_task_completion()
+```
+
+Meanwhile, the agent modifies the same constellation object concurrently.
+
+## Safe Assignment Lock Protocol
+
+### The Solution
+
+The orchestrator uses a lock-bounded editing regime: during edit cycles, new task assignments are suspended until modifications are complete and synchronized.
+
+```python
+async def wait_for_pending_modifications(
+ self, timeout: Optional[float] = None
+) -> bool:
+ """Wait for all pending modifications to complete."""
+
+ if not self._pending_modifications:
+ return True
+
+ timeout = timeout or self._modification_timeout
+ start_time = asyncio.get_event_loop().time()
+
+ try:
+ while self._pending_modifications:
+ # Get current pending tasks
+ pending_tasks = list(self._pending_modifications.keys())
+ pending_futures = list(self._pending_modifications.values())
+
+ self.logger.info(
+ f"⏳ Waiting for {len(pending_tasks)} pending modification(s): {pending_tasks}"
+ )
+
+ # Calculate remaining timeout
+ elapsed = asyncio.get_event_loop().time() - start_time
+ remaining_timeout = timeout - elapsed
+
+ if remaining_timeout <= 0:
+ raise asyncio.TimeoutError()
+
+ # Wait for all current pending modifications
+ await asyncio.wait_for(
+ asyncio.gather(*pending_futures, return_exceptions=True),
+ timeout=remaining_timeout,
+ )
+
+ # Check if new modifications were added during the wait
+ if not self._pending_modifications:
+ break
+
+ # Small delay to allow new registrations to settle
+ await asyncio.sleep(0.01)
+
+ self.logger.info("✅ All pending modifications completed")
+ return True
+
+ except asyncio.TimeoutError:
+ pending = list(self._pending_modifications.keys())
+ self.logger.warning(
+ f"⚠️ Timeout waiting for modifications after {timeout}s. "
+ f"Proceeding anyway. Pending: {pending}"
+ )
+ # Clear all pending modifications to prevent permanent deadlock
+ self._pending_modifications.clear()
+ return False
+```
+
+### Edit Cycle Lifecycle
+
+An edit cycle is bounded by two events:
+
+1. **Start**: `TASK_COMPLETED` or `TASK_FAILED` event published
+2. **End**: `CONSTELLATION_MODIFIED` event published
+
+```mermaid
+stateDiagram-v2
+ [*] --> Normal_Execution
+ Normal_Execution --> Edit_Cycle_Started: TASK_COMPLETED/FAILED
+ Edit_Cycle_Started --> Editing_In_Progress: Register pending
+ Editing_In_Progress --> Edit_Cycle_Complete: CONSTELLATION_MODIFIED
+ Edit_Cycle_Complete --> Normal_Execution: Merge & release
+ Normal_Execution --> [*]
+```
+
+During the editing phase, new task assignments are suspended to prevent race conditions.
+
+### Safe Locking Protocol
+
+The complete protocol ensures atomic constellation updates:
+
+```
+Algorithm: Safe Assignment Locking and Asynchronous Rescheduling Protocol
+
+Input: Event stream E, current TaskConstellation C
+Output: Consistent and updated C with newly scheduled ready tasks
+
+while system is running do
+    foreach event e ∈ E do
+        if e is TASK_COMPLETED or TASK_FAILED then
+            async enqueue(e)                            // Record for processing
+        end
+    end
+
+    acquire(assign_lock)                                // Suspend new assignments
+
+    while queue not empty do
+        e ← dequeue()
+        Δ ← invoke(ConstellationAgent, edit(C, e))      // Propose DAG edits
+        C ← apply(C, Δ)                                 // Update structure
+        validate(C)                                     // Ensure invariants I1-I3
+        publish(CONSTELLATION_MODIFIED, e.task_id)      // Close the edit cycle opened by e
+        C ← synchronize(C, T_C)                         // Merge completed tasks T_C
+    end
+
+    release(assign_lock)                                // Resume orchestration
+
+    // Rescheduling Phase (outside lock)
+    T_R ← get_ready_tasks(C)
+    foreach t ∈ T_R do
+        async dispatch(t)
+        async publish(TASK_STARTED, t)
+    end
+end
+```
+
+**Key properties:**
+
+- **Atomicity**: All edits within a queue batch are applied together
+- **Validation**: Constellation consistency checked before releasing
+- **Synchronization**: Runtime progress merged before rescheduling
+- **Non-blocking**: Lock only held during modification, not execution
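+
+The locking half of the protocol can be sketched with a plain `asyncio.Lock`. This is a simplified illustration under stated assumptions: `AssignmentGate` and its methods are hypothetical, and the real orchestrator coordinates through the modification synchronizer's futures, described in the next section, rather than through this class.
+
+```python
+import asyncio
+
+
+class AssignmentGate:
+    """Suspends new task assignments while an edit cycle is in progress."""
+
+    def __init__(self):
+        self._assign_lock = asyncio.Lock()
+        self.log = []
+
+    async def apply_edits(self, completed_task_ids):
+        # Hold the lock for the whole batch so edits look atomic to the scheduler
+        async with self._assign_lock:
+            for task_id in completed_task_ids:
+                self.log.append(f"edit:{task_id}")
+                await asyncio.sleep(0)  # yield control, as a real agent call would
+
+    async def dispatch_ready(self, ready_task_ids):
+        # Wait for any in-progress edit cycle before reading the ready set
+        async with self._assign_lock:
+            ready = list(ready_task_ids)
+        # Dispatch outside the lock: task execution never holds it
+        for task_id in ready:
+            self.log.append(f"dispatch:{task_id}")
+
+
+async def demo():
+    gate = AssignmentGate()
+    await asyncio.gather(
+        gate.apply_edits(["a", "b"]),
+        gate.dispatch_ready(["c"]),
+    )
+    return gate.log
+
+
+log = asyncio.run(demo())
+print(log)
+```
+
+Even though the editor yields control between edits, the dispatcher cannot read the ready set until the whole batch has been applied.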
+
+## Modification Synchronizer
+
+The `ConstellationModificationSynchronizer` component implements the locking protocol by coordinating between the orchestrator and agent.
+
+### Tracking Pending Modifications
+
+When a task completes, the synchronizer registers a pending modification:
+
+```python
+async def _handle_task_event(self, event: TaskEvent) -> None:
+ """Handle task completion/failure events."""
+
+ if event.event_type not in [EventType.TASK_COMPLETED, EventType.TASK_FAILED]:
+ return
+
+ constellation_id = event.data.get("constellation_id")
+ if not constellation_id:
+ return
+
+ # Register pending modification
+ if event.task_id not in self._pending_modifications:
+ modification_future = asyncio.Future()
+ self._pending_modifications[event.task_id] = modification_future
+ self._stats["total_modifications"] += 1
+
+ self.logger.info(
+ f"🔒 Registered pending modification for task '{event.task_id}'"
+ )
+
+ # Set timeout to auto-complete if modification takes too long
+ asyncio.create_task(
+ self._auto_complete_on_timeout(event.task_id, modification_future)
+ )
+```
+
+**Data structure:**
+
+```python
+# task_id -> Future mapping
+self._pending_modifications: Dict[str, asyncio.Future] = {}
+```
+
+Each future represents an edit cycle that will be completed when `CONSTELLATION_MODIFIED` is received.
+
+### Completing Modifications
+
+When the agent publishes `CONSTELLATION_MODIFIED`, the synchronizer completes the future:
+
+```python
+async def _handle_constellation_event(self, event: ConstellationEvent) -> None:
+ """Handle constellation modification events."""
+
+ if event.event_type != EventType.CONSTELLATION_MODIFIED:
+ return
+
+ task_ids = event.data.get("on_task_id")
+ if not task_ids:
+ return
+
+ new_constellation = event.data.get("new_constellation")
+ if new_constellation:
+ self._current_constellation = new_constellation
+
+ # Mark modifications as complete
+ for task_id in task_ids:
+ if task_id in self._pending_modifications:
+ future = self._pending_modifications[task_id]
+ if not future.done():
+ future.set_result(True) # Unblocks wait_for_pending_modifications
+ self._stats["completed_modifications"] += 1
+ self.logger.info(
+ f"✅ Completed modification for task '{task_id}'"
+ )
+ del self._pending_modifications[task_id]
+```
+
+### Timeout Protection
+
+To prevent deadlocks if the agent fails to publish `CONSTELLATION_MODIFIED`:
+
+```python
+async def _auto_complete_on_timeout(
+ self, task_id: str, future: asyncio.Future
+) -> None:
+ """Auto-complete a pending modification if it times out."""
+
+ try:
+ await asyncio.sleep(self._modification_timeout) # Default: 600s
+
+ if not future.done():
+ self._stats["timeout_modifications"] += 1
+ self.logger.warning(
+ f"⚠️ Modification for task '{task_id}' timed out. "
+ f"Auto-completing to prevent deadlock."
+ )
+ future.set_result(False)
+ if task_id in self._pending_modifications:
+ del self._pending_modifications[task_id]
+ except asyncio.CancelledError:
+ raise
+```
+
+**Warning:** Timeout protection ensures the orchestrator never permanently blocks, even if the agent encounters an error.
+
+## Constellation State Merging
+
+After modifications complete, the synchronizer must merge two potentially conflicting views:
+
+1. **Agent's constellation**: Has latest structural changes (new tasks, modified dependencies)
+2. **Orchestrator's constellation**: Has latest execution state (task statuses, results)
+
+### The Challenge
+
+During editing, tasks may complete:
+
+```
+t0: Task A completes → Agent starts editing
+t1: Agent modifies constellation (Task B still RUNNING in agent's copy)
+t2: Task B completes (orchestrator marks as COMPLETED)
+t3: Agent publishes CONSTELLATION_MODIFIED
+t4: Orchestrator syncs...
+```
+
+**Problem**: Direct replacement would lose Task B's COMPLETED status!
+
+### State Merging Algorithm
+
+The synchronizer preserves the most advanced state for each task:
+
+```python
+def merge_and_sync_constellation_states(
+    self, orchestrator_constellation: TaskConstellation
+) -> TaskConstellation:
+    """Merge constellation states: structural changes + execution state."""
+
+    if not self._current_constellation:
+        return orchestrator_constellation
+
+    # Use agent's constellation as base (has structural modifications)
+    merged = self._current_constellation
+
+    # Preserve execution state from orchestrator for existing tasks
+    for task_id, orchestrator_task in orchestrator_constellation.tasks.items():
+        if task_id in merged.tasks:
+            agent_task = merged.tasks[task_id]
+
+            # If orchestrator's state is more advanced, preserve it
+            if self._is_state_more_advanced(
+                orchestrator_task.status, agent_task.status
+            ):
+                # Preserve orchestrator's state and results
+                agent_task._status = orchestrator_task.status
+                agent_task._result = orchestrator_task.result
+                agent_task._error = orchestrator_task.error
+                agent_task._execution_start_time = orchestrator_task.execution_start_time
+                agent_task._execution_end_time = orchestrator_task.execution_end_time
+
+    # Update constellation state
+    merged.update_state()
+
+    return merged
+```
+
+### State Advancement Hierarchy
+
+States are ordered by execution progression:
+
+```python
+def _is_state_more_advanced(self, state1, state2) -> bool:
+    """Check if state1 is more advanced than state2."""
+
+    state_levels = {
+        TaskStatus.PENDING: 0,
+        TaskStatus.WAITING_DEPENDENCY: 1,
+        TaskStatus.RUNNING: 2,
+        TaskStatus.COMPLETED: 3,
+        TaskStatus.FAILED: 3,  # Terminal states equally advanced
+        TaskStatus.CANCELLED: 3,
+    }
+
+    level1 = state_levels.get(state1, 0)
+    level2 = state_levels.get(state2, 0)
+
+    return level1 > level2
+```
+
+**Examples:**
+
+- `COMPLETED > RUNNING`: Preserve orchestrator's COMPLETED status
+- `FAILED > PENDING`: Preserve orchestrator's FAILED status
+- `RUNNING > PENDING`: Preserve orchestrator's RUNNING status
+- `COMPLETED = FAILED`: Both terminal, don't override
+
+State merging ensures no execution progress is lost during concurrent editing.
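+
+The merge rule can be tried on the t0 to t4 scenario from "The Challenge". This is a simplified sketch that works on plain dictionaries of statuses; `Status`, `LEVELS`, and `merge_statuses` are hypothetical stand-ins, not the `TaskStatus` enum or the synchronizer method.
+
+```python
+from enum import Enum
+
+
+class Status(Enum):
+    """Hypothetical stand-in for the task status enum."""
+    PENDING = "pending"
+    RUNNING = "running"
+    COMPLETED = "completed"
+    FAILED = "failed"
+
+
+# Terminal states share the highest level, so they never override each other.
+LEVELS = {Status.PENDING: 0, Status.RUNNING: 2, Status.COMPLETED: 3, Status.FAILED: 3}
+
+
+def merge_statuses(agent, orchestrator):
+    """Keep the agent's task set, but never regress a task's status."""
+    merged = dict(agent)
+    for task_id, orch_status in orchestrator.items():
+        if task_id in merged and LEVELS[orch_status] > LEVELS[merged[task_id]]:
+            merged[task_id] = orch_status
+    return merged
+
+
+# Agent's copy: B still RUNNING, and C was newly added by the agent.
+agent_view = {"A": Status.COMPLETED, "B": Status.RUNNING, "C": Status.PENDING}
+# Orchestrator's copy: B finished while the agent was editing.
+orchestrator_view = {"A": Status.COMPLETED, "B": Status.COMPLETED}
+
+merged = merge_statuses(agent_view, orchestrator_view)
+print({task_id: status.value for task_id, status in merged.items()})
+# {'A': 'completed', 'B': 'completed', 'C': 'pending'}
+```
+
+Task B keeps its COMPLETED status and the agent's new task C survives, which is the behavior a direct replacement in either direction would lose.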
+
+## Synchronization in Orchestration Loop
+
+The orchestrator syncs at the start of each iteration:
+
+```python
+async def _sync_constellation_modifications(
+    self, constellation: TaskConstellation
+) -> TaskConstellation:
+    """Synchronize pending constellation modifications."""
+
+    if self._modification_synchronizer:
+        # Wait for agent to finish any pending edits
+        await self._modification_synchronizer.wait_for_pending_modifications()
+
+        # Merge agent's structural changes with orchestrator's execution state
+        constellation = self._modification_synchronizer \
+            .merge_and_sync_constellation_states(
+                orchestrator_constellation=constellation,
+            )
+
+    return constellation
+```
+
+The synchronization flow ensures the orchestrator always works with the latest merged state that includes both structural changes from the agent and execution progress from the orchestrator.
+
+## Batched Event Processing
+
+When multiple tasks complete simultaneously, their modifications are batched:
+
+```python
+# Process ALL pending modifications in one cycle
+while self._pending_modifications:
+    # Wait for all to complete
+    ...
+```
+
+**Timeline with batching:**
+
+```
+t0: Task A completes → enqueue(A)
+t3: Task B completes → enqueue(B)
+t4: Task C completes → enqueue(C)
+
+t5: acquire(lock)
+t6: Process A → Δ_A
+t7: Process B → Δ_B
+t8: Process C → Δ_C
+t9: Apply all Δs atomically
+t10: release(lock)
+```
+
+**Benefits:**
+
+- **Reduced overhead**: One lock acquisition for multiple edits
+- **Atomicity**: All modifications visible together
+- **Efficiency**: Amortize validation and synchronization costs
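+
+The batching idea can be sketched with a queue that is drained under a single lock acquisition. This is an illustration only: `drain_and_apply`, the `delta_` strings, and the in-memory `applied` list are hypothetical and do not reflect the orchestrator's internals.
+
+```python
+import asyncio
+
+
+async def drain_and_apply(queue, lock, applied):
+    """Apply every queued completion under one lock acquisition."""
+    async with lock:
+        batch = []
+        while not queue.empty():
+            batch.append(queue.get_nowait())
+        # Compute one delta per completed task, then publish them together.
+        applied.extend(f"delta_{task_id}" for task_id in batch)
+        return len(batch)
+
+
+async def demo():
+    queue, lock, applied = asyncio.Queue(), asyncio.Lock(), []
+    for task_id in ("A", "B", "C"):
+        queue.put_nowait(task_id)
+    batch_size = await drain_and_apply(queue, lock, applied)
+    return batch_size, applied
+
+
+batch_size, applied = asyncio.run(demo())
+print(batch_size, applied)  # 3 ['delta_A', 'delta_B', 'delta_C']
+```
+
+Three completions cost one lock round trip instead of three, which is where the reduced overhead comes from.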
+
+Learn more about [batched editing strategies](batched_editing.md).
+
+## Correctness Properties
+
+The safe assignment lock protocol guarantees:
+
+**1. Atomicity**: Edit cycles are atomic: either all modifications in a batch are applied, or none are. The lock is held during the entire edit-validate-sync sequence.
+
+**2. Consistency**: The constellation always satisfies its invariants after edits, because validation runs before the lock is released. See [consistency guarantees](consistency_guarantees.md) for details.
+
+**3. Progress**: The system never permanently blocks (liveness). Ensured by timeout protection (600s default), auto-completion on timeout, and exception handling in observers.
+
+## Performance Impact
+
+### Lock Overhead
+
+| Scenario | Lock Duration | Impact |
+|----------|--------------|---------|
+| Single task completion | 10-50ms | Negligible - concurrent tasks unaffected |
+| Batched completions | 50-200ms | Amortized over multiple edits |
+| Complex editing | 200-500ms | Depends on LLM response time |
+
+### Throughput Analysis
+
+The lock does not block task execution - while the lock is held for constellation modification, already-dispatched tasks continue executing concurrently.
+
+**Impact on throughput**: Minimal - only affects scheduling of new tasks, not execution of running tasks.
+
+### Latency Analysis
+
+Additional latency per task completion:
+
+- Without synchronizer: ~5ms (direct scheduling)
+- With synchronizer: ~10-50ms (wait for edit + merge)
+
+This is an acceptable tradeoff for correctness in dynamic orchestration.
+
+## Usage Patterns
+
+### Setting Up Synchronization
+
+```python
+from galaxy.constellation.orchestrator import TaskConstellationOrchestrator
+from galaxy.session.observers.constellation_sync_observer import (
+ ConstellationModificationSynchronizer
+)
+
+# Create orchestrator
+orchestrator = TaskConstellationOrchestrator(device_manager)
+
+# Create and attach synchronizer
+synchronizer = ConstellationModificationSynchronizer(orchestrator)
+orchestrator.set_modification_synchronizer(synchronizer)
+
+# Subscribe to events
+from galaxy.core.events import get_event_bus
+event_bus = get_event_bus()
+event_bus.subscribe(synchronizer)
+
+# Orchestrate with synchronization
+results = await orchestrator.orchestrate_constellation(constellation)
+```
+
+### Custom Timeout
+
+```python
+# Increase timeout for slow LLM responses
+synchronizer.set_modification_timeout(1200.0) # 20 minutes
+```
+
+### Monitoring Synchronization
+
+```python
+# Check pending modifications
+if synchronizer.has_pending_modifications():
+ pending = synchronizer.get_pending_task_ids()
+ print(f"Waiting for modifications: {pending}")
+
+# Get statistics
+stats = synchronizer.get_statistics()
+print(f"Total: {stats['total_modifications']}")
+print(f"Completed: {stats['completed_modifications']}")
+print(f"Timeouts: {stats['timeout_modifications']}")
+```
+
+## Related Documentation
+
+- [Asynchronous Scheduling](asynchronous_scheduling.md) - Concurrent execution model
+- [Consistency Guarantees](consistency_guarantees.md) - Invariants enforced by locking
+- [Batched Editing](batched_editing.md) - Efficient modification batching
+- [Event-Driven Coordination](event_driven_coordination.md) - Event system foundation
diff --git a/documents/docs/galaxy/evaluation/performance_metrics.md b/documents/docs/galaxy/evaluation/performance_metrics.md
new file mode 100644
index 000000000..1ee1f1913
--- /dev/null
+++ b/documents/docs/galaxy/evaluation/performance_metrics.md
@@ -0,0 +1,618 @@
+# Performance Metrics and Logging
+
+Galaxy provides comprehensive performance monitoring and metrics collection throughout multi-device workflow execution. The system tracks task execution times, constellation modifications, and overall session metrics to enable analysis and optimization of distributed workflows.
+
+## Overview
+
+Galaxy uses an **event-driven observer pattern** to collect real-time performance metrics without impacting execution flow. The `SessionMetricsObserver` automatically captures timing data, task statistics, constellation modifications, and parallelism metrics.
+
+### Key Metrics Categories
+
+| Category | Description | Use Cases |
+|----------|-------------|-----------|
+| **Task Metrics** | Individual task execution times and outcomes | Identify slow tasks, success rates |
+| **Constellation Metrics** | DAG-level statistics and parallelism analysis | Optimize workflow structure |
+| **Modification Metrics** | Dynamic constellation editing during execution | Understand adaptability patterns |
+| **Session Metrics** | Overall session duration and resource usage | End-to-end performance analysis |
+
+## Metrics Collection System
+
+### SessionMetricsObserver
+
+The `SessionMetricsObserver` is automatically initialized for every Galaxy session and listens to events from the orchestration system.
+
+**Architecture:**
+
+```mermaid
+graph LR
+ A[Task Execution] -->|Task Events| B[SessionMetricsObserver]
+ C[Constellation Operations] -->|Constellation Events| B
+ B -->|Collect & Aggregate| D[Metrics Dictionary]
+ D -->|Save on Completion| E[result.json]
+
+ style B fill:#e1f5ff
+ style D fill:#fff4e1
+ style E fill:#c8e6c9
+```
+
+**Event Types Tracked:**
+
+| Event Type | Trigger | Metrics Captured |
+|-----------|---------|------------------|
+| `TASK_STARTED` | Task begins execution | Start timestamp, task count |
+| `TASK_COMPLETED` | Task finishes successfully | Duration, end timestamp |
+| `TASK_FAILED` | Task encounters error | Duration, failure count |
+| `CONSTELLATION_STARTED` | New DAG created | Initial statistics, task count |
+| `CONSTELLATION_COMPLETED` | DAG fully executed | Final statistics, total duration |
+| `CONSTELLATION_MODIFIED` | DAG edited during execution | Changes, modification type |
+
+---
+
+## Collected Metrics
+
+### 1. Task Metrics
+
+**Raw Task Data:**
+
+```python
+{
+ "task_timings": {
+ "t1": {
+ "start": 1761388508.9484463,
+ "duration": 11.852121591567993,
+ "end": 1761388520.8005679
+ },
+ "t2": {
+ "start": 1761388508.9494512,
+ "duration": 12.128723621368408,
+ "end": 1761388521.0781748
+ },
+ # ... more tasks
+ }
+}
+```
+
+**Computed Task Statistics:**
+
+| Field | Type | Description | Example |
+|-------|------|-------------|---------|
+| `total_tasks` | int | Total number of tasks created | `5` |
+| `completed_tasks` | int | Successfully completed tasks | `5` |
+| `failed_tasks` | int | Failed tasks | `0` |
+| `success_rate` | float | Completion rate (0.0-1.0) | `1.0` |
+| `failure_rate` | float | Failure rate (0.0-1.0) | `0.0` |
+| `average_task_duration` | float | Mean task execution time (seconds) | `134.91` |
+| `min_task_duration` | float | Fastest task duration | `11.85` |
+| `max_task_duration` | float | Slowest task duration | `369.05` |
+| `total_task_execution_time` | float | Sum of all task durations | `674.55` |
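+
+The duration fields in this table can be derived from the raw `task_timings` entries. The sketch below assumes that a task without a `duration` key is still running; `summarize_durations` is a hypothetical helper, not part of `SessionMetricsObserver`.
+
+```python
+def summarize_durations(task_timings):
+    """Aggregate finished-task durations into summary statistics."""
+    durations = [t["duration"] for t in task_timings.values() if "duration" in t]
+    if not durations:
+        return {}
+    total = sum(durations)
+    return {
+        "average_task_duration": total / len(durations),
+        "min_task_duration": min(durations),
+        "max_task_duration": max(durations),
+        "total_task_execution_time": total,
+    }
+
+
+timings = {
+    "t1": {"start": 0.0, "duration": 10.0, "end": 10.0},
+    "t2": {"start": 0.0, "duration": 30.0, "end": 30.0},
+    "t3": {"start": 30.0},  # still running, so it is skipped
+}
+stats = summarize_durations(timings)
+print(stats["average_task_duration"], stats["total_task_execution_time"])  # 20.0 40.0
+```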
+
+### 2. Constellation Metrics
+
+**Raw Constellation Data:**
+
+```python
+{
+ "constellation_timings": {
+ "constellation_b0864385_20251025_183508": {
+ "start_time": 1761388508.9061587,
+ "initial_statistics": {
+ "total_tasks": 5,
+ "total_dependencies": 4,
+ "longest_path_length": 2,
+ "max_width": 4,
+ "parallelism_ratio": 2.5
+ },
+ "processing_start_time": 1761388493.1049807,
+ "processing_end_time": 1761388508.9061587,
+ "processing_duration": 15.801177978515625,
+ "end_time": 1761389168.8877504,
+ "duration": 659.9815917015076,
+ "final_statistics": {
+ "total_tasks": 5,
+ "task_status_counts": {
+ "completed": 5
+ },
+ "critical_path_length": 638.134632,
+ "total_work": 674.4709760000001,
+ "parallelism_ratio": 1.0569415013350976
+ }
+ }
+ }
+}
+```
+
+**Computed Constellation Statistics:**
+
+| Field | Type | Description | Example |
+|-------|------|-------------|---------|
+| `total_constellations` | int | Number of DAGs created | `1` |
+| `completed_constellations` | int | Successfully completed DAGs | `1` |
+| `failed_constellations` | int | Failed DAGs | `0` |
+| `success_rate` | float | Completion rate | `1.0` |
+| `average_constellation_duration` | float | Mean DAG execution time | `659.98` |
+| `min_constellation_duration` | float | Fastest DAG completion | `659.98` |
+| `max_constellation_duration` | float | Slowest DAG completion | `659.98` |
+| `average_tasks_per_constellation` | float | Mean tasks per DAG | `5.0` |
+
+**Key Constellation Metrics:**
+
+| Metric | Description | Formula | Interpretation |
+|--------|-------------|---------|----------------|
+| **Critical Path Length** | Duration of longest task chain | `max(path_durations)` | Minimum possible execution time |
+| **Total Work** | Sum of all task durations | `Σ task_durations` | Total computational effort |
+| **Parallelism Ratio** | Efficiency of parallel execution | `total_work / critical_path_length` | >1.0 indicates parallelism benefit |
+| **Max Width** | Maximum concurrent tasks | `max(concurrent_tasks_at_time_t)` | Peak resource utilization |
+
+!!! note "Parallelism Calculation Modes"
+    The system uses two calculation modes:
+
+    - **`node_count`**: Used when tasks are incomplete. Uses task count and path length.
+    - **`actual_time`**: Used when all tasks are completed. Uses real execution times for accurate parallelism analysis.
+
+**Example from result.json:**
+
+```json
+{
+ "critical_path_length": 638.134632,
+ "total_work": 674.4709760000001,
+ "parallelism_ratio": 1.0569415013350976
+}
+```
+
+**Analysis:** A parallelism ratio of `1.057` indicates minimal parallelism benefit: the critical path is only about 5.4% shorter than the total work (`1 - critical_path_length / total_work`). This suggests most tasks executed sequentially due to dependencies.
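+
+To make the formulas concrete, the sketch below computes the critical path length, total work, and parallelism ratio for a small DAG using measured durations (the `actual_time` idea). `parallelism_metrics` and the sample tasks are hypothetical; Galaxy's own implementation may differ.
+
+```python
+from graphlib import TopologicalSorter  # Python 3.9+
+
+
+def parallelism_metrics(durations, deps):
+    """Return (critical_path_length, total_work, parallelism_ratio).
+
+    deps maps each task to the set of tasks it depends on.
+    """
+    graph = {task: deps.get(task, set()) for task in durations}
+    finish = {}
+    for task in TopologicalSorter(graph).static_order():
+        # A task starts once its slowest predecessor has finished.
+        start = max((finish[p] for p in graph[task]), default=0.0)
+        finish[task] = start + durations[task]
+    critical = max(finish.values())
+    total = sum(durations.values())
+    return critical, total, total / critical
+
+
+durations = {"t1": 10.0, "t2": 12.0, "t3": 8.0, "t4": 20.0}
+deps = {"t4": {"t1", "t2"}}  # t4 waits for t1 and t2; t3 is independent
+critical, total, ratio = parallelism_metrics(durations, deps)
+print(critical, total, ratio)  # 32.0 50.0 1.5625
+```
+
+Here the critical path is `t2 → t4` (32 s) while the total work is 50 s, so ideal parallel execution saves `1 - 32/50 = 36%` compared with running everything back to back.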
+
+### 3. Constellation Modification Metrics
+
+**Modification Records:**
+
+```python
+{
+ "constellation_modifications": {
+ "constellation_b0864385_20251025_183508": [
+ {
+ "timestamp": 1761388539.3350308,
+ "modification_type": "Edited by constellation_agent",
+ "on_task_id": ["t1"],
+ "changes": {
+ "modification_type": "task_properties_updated",
+ "added_tasks": [],
+ "removed_tasks": [],
+ "modified_tasks": ["t5", "t3"],
+ "added_dependencies": [],
+ "removed_dependencies": []
+ },
+ "new_statistics": {
+ "total_tasks": 5,
+ "task_status_counts": {
+ "completed": 2,
+ "running": 2,
+ "pending": 1
+ }
+ },
+ "processing_start_time": 1761388521.482895,
+ "processing_end_time": 1761388537.9989598,
+ "processing_duration": 16.516064882278442
+ }
+ # ... more modifications
+ ]
+ }
+}
+```
+
+**Computed Modification Statistics:**
+
+| Field | Type | Description | Example |
+|-------|------|-------------|---------|
+| `total_modifications` | int | Total constellation edits | `4` |
+| `constellations_modified` | int | Number of DAGs modified | `1` |
+| `average_modifications_per_constellation` | float | Mean edits per DAG | `4.0` |
+| `max_modifications_for_single_constellation` | int | Most edits to one DAG | `4` |
+| `most_modified_constellation` | str | Constellation ID with most edits | `constellation_...` |
+| `modifications_per_constellation` | dict | Edit count per DAG | `{"constellation_...": 4}` |
+| `modification_types_breakdown` | dict | Count by modification type | `{"Edited by constellation_agent": 4}` |
+
+**Modification Types:**
+
+| Type | Description | Trigger |
+|------|-------------|---------|
+| `Edited by constellation_agent` | ConstellationAgent refined DAG | Task completion, feedback |
+| `task_properties_updated` | Task details modified | Result refinement |
+| `constellation_updated` | DAG structure changed | Dependency updates |
+| `tasks_added` | New tasks inserted | Workflow expansion |
+| `tasks_removed` | Tasks deleted | Optimization |
+
+## Session Results Structure
+
+The complete session results are saved to `logs/galaxy//result.json` with the following structure:
+
+```json
+{
+ "session_name": "galaxy_session_20251025_183449",
+ "request": "User's original request text",
+ "task_name": "task_32",
+ "status": "completed",
+ "execution_time": 684.864645,
+ "rounds": 1,
+ "start_time": "2025-10-25T18:34:52.641877",
+ "end_time": "2025-10-25T18:46:17.506522",
+ "trajectory_path": "logs/galaxy/task_32/",
+
+ "session_results": {
+ "total_execution_time": 684.8532314300537,
+ "final_constellation_stats": { /* ... */ },
+ "status": "FINISH",
+ "final_results": [ /* ... */ ],
+ "metrics": { /* ... */ }
+ },
+
+ "constellation": {
+ "id": "constellation_b0864385_20251025_183508",
+ "name": "constellation_b0864385_20251025_183508",
+ "task_count": 5,
+ "dependency_count": 4,
+ "state": "completed"
+ }
+}
+```
+
+**Top-Level Fields:**
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `session_name` | str | Unique session identifier |
+| `request` | str | Original user request |
+| `task_name` | str | Task identifier |
+| `status` | str | Session outcome: `"completed"`, `"failed"`, `"timeout"`, `"cancelled"` |
+| `execution_time` | float | Total session duration (seconds) |
+| `rounds` | int | Number of orchestration rounds |
+| `start_time` | str | ISO 8601 session start timestamp |
+| `end_time` | str | ISO 8601 session end timestamp |
+| `trajectory_path` | str | Path to session logs |
+
+## Performance Analysis
+
+### Reading Metrics Programmatically
+
+```python
+import json
+from pathlib import Path
+
+def analyze_session_performance(result_path: str):
+    """
+    Analyze Galaxy session performance from result.json.
+
+    :param result_path: Path to result.json file
+    """
+    with open(result_path, 'r', encoding='utf-8') as f:
+        result = json.load(f)
+
+    metrics = result["session_results"]["metrics"]
+
+    # Task performance
+    task_stats = metrics["task_statistics"]
+    print(f"✅ Tasks completed: {task_stats['completed_tasks']}/{task_stats['total_tasks']}")
+    print(f"⏱️ Average task duration: {task_stats['average_task_duration']:.2f}s")
+    print(f"📊 Success rate: {task_stats['success_rate'] * 100:.1f}%")
+
+    # Constellation performance
+    const_stats = metrics["constellation_statistics"]
+    print(f"\n🌌 Constellations: {const_stats['completed_constellations']}/{const_stats['total_constellations']}")
+    print(f"⏱️ Average constellation duration: {const_stats['average_constellation_duration']:.2f}s")
+
+    # Parallelism analysis
+    final_stats = result["session_results"]["final_constellation_stats"]
+    parallelism = final_stats.get("parallelism_ratio", 1.0)
+    print(f"\n🔀 Parallelism ratio: {parallelism:.2f}")
+
+    if parallelism > 1.5:
+        print("   → High parallelism: tasks executed concurrently")
+    elif parallelism > 1.0:
+        print("   → Moderate parallelism: some concurrent execution")
+    else:
+        print("   → Sequential execution: limited parallelism")
+
+    # Modification analysis
+    mod_stats = metrics["modification_statistics"]
+    print(f"\n✏️ Total modifications: {mod_stats['total_modifications']}")
+    print(f"   Average per constellation: {mod_stats['average_modifications_per_constellation']:.1f}")
+
+    return metrics
+
+# Example usage
+metrics = analyze_session_performance("logs/galaxy/task_32/result.json")
+```
+
+**Expected Output:**
+
+```
+✅ Tasks completed: 5/5
+⏱️ Average task duration: 134.91s
+📊 Success rate: 100.0%
+
+🌌 Constellations: 1/1
+⏱️ Average constellation duration: 659.98s
+
+🔀 Parallelism ratio: 1.06
+   → Moderate parallelism: some concurrent execution
+
+✏️ Total modifications: 4
+ Average per constellation: 4.0
+```
+
+### Identifying Performance Bottlenecks
+
+```python
+import json
+
+def identify_bottlenecks(result_path: str):
+    """
+    Identify performance bottlenecks from session metrics.
+
+    :param result_path: Path to result.json file
+    """
+    with open(result_path, 'r', encoding='utf-8') as f:
+        result = json.load(f)
+
+    metrics = result["session_results"]["metrics"]
+    task_timings = metrics["task_timings"]
+    task_stats = metrics["task_statistics"]
+
+    # Find slowest tasks: anything slower than 2x the average is a bottleneck
+    avg_duration = task_stats["average_task_duration"]
+    threshold = avg_duration * 2
+
+    bottlenecks = []
+    for task_id, timing in task_timings.items():
+        if "duration" in timing and timing["duration"] > threshold:
+            bottlenecks.append({
+                "task_id": task_id,
+                "duration": timing["duration"],
+                "factor": timing["duration"] / avg_duration
+            })
+
+    if bottlenecks:
+        print("⚠️ Performance Bottlenecks Detected:")
+        for task in sorted(bottlenecks, key=lambda x: x["duration"], reverse=True):
+            print(f"   • {task['task_id']}: {task['duration']:.2f}s ({task['factor']:.1f}x average)")
+    else:
+        print("✅ No significant bottlenecks detected")
+
+    return bottlenecks
+
+# Example usage
+bottlenecks = identify_bottlenecks("logs/galaxy/task_32/result.json")
+```
+
+**Example Output:**
+
+```
+⚠️ Performance Bottlenecks Detected:
+   • t5: 369.05s (2.7x average)
+```
+
+### Visualizing Task Timeline
+
+```python
+import json
+
+import matplotlib.pyplot as plt
+
+def visualize_task_timeline(result_path: str):
+    """
+    Visualize task execution timeline.
+
+    :param result_path: Path to result.json file
+    """
+    with open(result_path, 'r', encoding='utf-8') as f:
+        result = json.load(f)
+
+    metrics = result["session_results"]["metrics"]
+    task_timings = metrics["task_timings"]
+
+    # Prepare data: keep only tasks that have both a start and an end
+    tasks = []
+    for task_id, timing in task_timings.items():
+        if "start" in timing and "end" in timing:
+            tasks.append({
+                "task_id": task_id,
+                "start": timing["start"],
+                "end": timing["end"],
+                "duration": timing["duration"]
+            })
+
+    # Sort by start time
+    tasks.sort(key=lambda x: x["start"])
+
+    # Create Gantt chart
+    fig, ax = plt.subplots(figsize=(12, 6))
+
+    for i, task in enumerate(tasks):
+        # Offsets are relative to the earliest task start
+        start_offset = task["start"] - tasks[0]["start"]
+        ax.barh(i, task["duration"], left=start_offset, height=0.5)
+        ax.text(start_offset + task["duration"] / 2, i,
+                f"{task['task_id']} ({task['duration']:.1f}s)",
+                ha='center', va='center')
+
+    ax.set_yticks(range(len(tasks)))
+    ax.set_yticklabels([t["task_id"] for t in tasks])
+    ax.set_xlabel("Time (seconds)")
+    ax.set_title("Task Execution Timeline")
+    ax.grid(axis='x', alpha=0.3)
+
+    plt.tight_layout()
+    plt.savefig("task_timeline.png")
+    print("📊 Timeline saved to task_timeline.png")
+
+# Example usage
+visualize_task_timeline("logs/galaxy/task_32/result.json")
+```
+
+---
+
+## Optimization Strategies
+
+### 1. Improve Parallelism
+
+**Goal:** Increase parallelism ratio by reducing dependencies
+
+```python
+import json
+
+# Analyze dependency structure
+def analyze_dependencies(result_path: str):
+    with open(result_path, 'r', encoding='utf-8') as f:
+        result = json.load(f)
+
+    final_stats = result["session_results"]["final_constellation_stats"]
+
+    max_width = final_stats["max_width"]
+    total_tasks = final_stats["total_tasks"]
+    parallelism = final_stats["parallelism_ratio"]
+
+    print(f"Current parallelism: {parallelism:.2f}")
+    print(f"Max concurrent tasks: {max_width}/{total_tasks}")
+
+    if parallelism < 1.5:
+        print("\n💡 Recommendations:")
+        print("   • Reduce task dependencies where possible")
+        print("   • Break large sequential tasks into parallel subtasks")
+        print("   • Use more device agents for concurrent execution")
+
+# Example usage
+analyze_dependencies("logs/galaxy/task_32/result.json")
+```
+
+### 2. Reduce Task Duration
+
+**Goal:** Optimize slow tasks identified as bottlenecks
+
+```python
+import json
+
+# Generate optimization report
+def generate_optimization_report(result_path: str):
+    with open(result_path, 'r', encoding='utf-8') as f:
+        result = json.load(f)
+
+    metrics = result["session_results"]["metrics"]
+    task_stats = metrics["task_statistics"]
+
+    avg_duration = task_stats["average_task_duration"]
+    max_duration = task_stats["max_task_duration"]
+
+    # Savings if the slowest task were brought down to the average duration
+    potential_savings = max_duration - avg_duration
+
+    print("📈 Optimization Potential:")
+    print(f"   Current slowest task: {max_duration:.2f}s")
+    print(f"   Average task duration: {avg_duration:.2f}s")
+    print(f"   Potential time savings: {potential_savings:.2f}s ({potential_savings/max_duration*100:.1f}%)")
+
+# Example usage
+generate_optimization_report("logs/galaxy/task_32/result.json")
+```
+
+### 3. Reduce Constellation Modifications
+
+**Goal:** Minimize dynamic editing overhead
+
+```python
+import json
+
+# Analyze modification overhead
+def analyze_modification_overhead(result_path: str):
+    with open(result_path, 'r', encoding='utf-8') as f:
+        result = json.load(f)
+
+    metrics = result["session_results"]["metrics"]
+    modifications = metrics["constellation_modifications"]
+
+    total_processing_time = 0
+    modification_count = 0
+
+    # Sum the processing time of every recorded modification
+    for const_mods in modifications.values():
+        for mod in const_mods:
+            if "processing_duration" in mod:
+                total_processing_time += mod["processing_duration"]
+                modification_count += 1
+
+    if modification_count > 0:
+        avg_overhead = total_processing_time / modification_count
+        print("✏️ Modification Overhead:")
+        print(f"   Total modifications: {modification_count}")
+        print(f"   Total overhead: {total_processing_time:.2f}s")
+        print(f"   Average per modification: {avg_overhead:.2f}s")
+
+        if modification_count > 10:
+            print("\n💡 Recommendations:")
+            print("   • Provide more detailed initial request")
+            print("   • Use device capabilities metadata for better planning")
+
+# Example usage
+analyze_modification_overhead("logs/galaxy/task_32/result.json")
+```
+
+## Best Practices
+
+### 1. Regular Analysis
+
+Analyze every session to identify trends:
+
+```python
+from pathlib import Path
+
+# Run the single-session analysis on every saved result
+for session_dir in Path("logs/galaxy").iterdir():
+    result_file = session_dir / "result.json"
+    if result_file.exists():
+        analyze_session_performance(str(result_file))
+```
+
+### 2. Baseline Metrics
+
+Establish baseline performance for common task types:
+
+| Task Type | Baseline Duration | Acceptable Range |
+|-----------|-------------------|------------------|
+| Simple data query | 10-30s | <60s |
+| Document generation | 30-60s | <120s |
+| Multi-device workflow | 60-180s | <300s |
+
+### 3. Track Trends
+
+Monitor performance over time to detect degradation:
+
+```python
+import json
+from pathlib import Path
+
+import pandas as pd
+
+def track_performance_trends(log_dir: str):
+    """Track performance metrics over time."""
+    results = []
+    for session_dir in Path(log_dir).iterdir():
+        result_file = session_dir / "result.json"
+        if result_file.exists():
+            with open(result_file, 'r', encoding='utf-8') as f:
+                data = json.load(f)
+            results.append({
+                "session_name": data["session_name"],
+                "execution_time": data["execution_time"],
+                "task_count": data["session_results"]["metrics"]["task_count"],
+                "parallelism": data["session_results"]["final_constellation_stats"].get("parallelism_ratio", 1.0)
+            })
+
+    df = pd.DataFrame(results)
+    print(df.describe())
+
+# Example usage
+track_performance_trends("logs/galaxy")
+```
+
+## Related Documentation
+
+- **[Result JSON Format](./result_json.md)** - Complete result.json schema reference
+- **[Galaxy Overview](../overview.md)** - Main Galaxy framework documentation
+- **[Task Constellation](../constellation/task_constellation.md)** - DAG-based task planning and parallelism metrics
+- **[Constellation Orchestrator](../constellation_orchestrator/overview.md)** - Execution coordination and event handling
+
+## Summary
+
+Galaxy's performance metrics system provides comprehensive monitoring capabilities:
+
+- **Real-time monitoring** - Event-driven metrics collection through `SessionMetricsObserver`
+- **Comprehensive coverage** - Tasks, constellations, and modifications tracking
+- **Parallelism analysis** - Critical path and efficiency metrics with two calculation modes
+- **Bottleneck identification** - Statistical analysis to find performance outliers
+- **Optimization insights** - Data-driven improvement recommendations
+- **Programmatic access** - Structured JSON format for automated analysis
+
+Use these metrics to optimize workflow design, analyze task dependencies, and enhance overall system performance.
diff --git a/documents/docs/galaxy/evaluation/result_json.md b/documents/docs/galaxy/evaluation/result_json.md
new file mode 100644
index 000000000..0eb2d8028
--- /dev/null
+++ b/documents/docs/galaxy/evaluation/result_json.md
@@ -0,0 +1,842 @@
+# Result JSON Format Reference
+
+Galaxy automatically saves comprehensive execution results to `result.json` after each session completes. This file contains the complete execution history, performance metrics, constellation statistics, and final outcomes of multi-device workflows.
+
+## Overview
+
+The `result.json` file provides a **complete audit trail** and **performance analysis** of Galaxy session execution. It combines session metadata, execution metrics, constellation statistics, and final results into a single structured document.
+
+### File Location
+
+```
+logs/galaxy//result.json
+```
+
+**Example:**
+
+```
+logs/galaxy/request_20251111_140216_1/result.json
+```
+
+## File Structure
+
+### Top-Level Schema
+
+```json
+{
+ "session_name": "string", // Unique session identifier
+ "request": "string", // Original user request
+ "task_name": "string", // Task identifier
+ "status": "string", // Session outcome
+ "execution_time": "float", // Total duration (seconds)
+ "rounds": "integer", // Number of orchestration rounds
+ "start_time": "string", // ISO 8601 start timestamp
+ "end_time": "string", // ISO 8601 end timestamp
+ "trajectory_path": "string", // Path to session logs
+ "session_results": { /* ... */ }, // Detailed execution results
+ "constellation": { /* ... */ } // Final constellation summary
+}
+```
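+
+A consumer can check that a loaded file carries every top-level field before reading nested values. The sketch below is illustrative: `REQUIRED_KEYS` simply mirrors the schema above, and `missing_top_level_keys` is a hypothetical helper, not a Galaxy API.
+
+```python
+import json
+
+# Top-level fields taken from the schema above.
+REQUIRED_KEYS = {
+    "session_name", "request", "task_name", "status", "execution_time",
+    "rounds", "start_time", "end_time", "trajectory_path",
+    "session_results", "constellation",
+}
+
+
+def missing_top_level_keys(result):
+    """Return the schema fields absent from a parsed result.json."""
+    return sorted(REQUIRED_KEYS - set(result))
+
+
+sample = json.loads(
+    '{"session_name": "galaxy_session_20251025_183449", "status": "completed"}'
+)
+missing = missing_top_level_keys(sample)
+print(missing)
+```
+
+An empty list means all eleven top-level fields are present; for the truncated sample above, nine fields are reported as missing.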
+
+---
+
+## Field Reference
+
+### Session Metadata
+
+#### `session_name` (string)
+
+Unique identifier for the Galaxy session, generated automatically.
+
+**Format:** `galaxy_session_YYYYMMDD_HHMMSS`
+
+**Example:**
+
+```json
+{
+ "session_name": "galaxy_session_20251025_183449"
+}
+```
+
+#### `request` (string)
+
+The original natural language request provided by the user.
+
+**Example:**
+
+```json
+{
+ "request": "For all linux, get their disk usage statistics. Then, from Windows browser, search for the top 3 recommended ways to reduce high disk usage for Linux systems and document these in a report on notepad."
+}
+```
+
+#### `task_name` (string)
+
+Internal task identifier assigned to the session.
+
+**Format:** `task_` or custom name
+
+**Example:**
+
+```json
+{
+ "task_name": "task_32"
+}
+```
+
+#### `status` (string)
+
+Final session outcome status.
+
+**Possible Values:**
+
+| Status | Description | Meaning |
+|--------|-------------|---------|
+| `"completed"` | Session finished successfully | All tasks completed |
+| `"failed"` | Session encountered unrecoverable error | Task failure or system error |
+| `"timeout"` | Session exceeded time limit | Max execution time reached |
+| `"cancelled"` | Session manually stopped by user | User interruption |
+
+**Example:**
+
+```json
+{
+ "status": "completed"
+}
+```
+
+#### `execution_time` (float)
+
+Total session duration in seconds, from start to completion.
+
+**Example:**
+
+```json
+{
+ "execution_time": 684.864645
+}
+```
+
+#### `rounds` (integer)
+
+Number of orchestration rounds executed during the session. Each round represents a full constellation creation or modification cycle.
+
+**Example:**
+
+```json
+{
+ "rounds": 1
+}
+```
+
+!!! tip "Understanding Rounds"
+    Multiple rounds indicate a complex request requiring iterative refinement. Most sessions complete in 1-2 rounds.
+
+#### `start_time` (string)
+
+ISO 8601 formatted timestamp when the session started.
+
+**Format:** `YYYY-MM-DDTHH:MM:SS.ssssss`
+
+**Example:**
+
+```json
+{
+ "start_time": "2025-10-25T18:34:52.641877"
+}
+```
+
+#### `end_time` (string)
+
+ISO 8601 formatted timestamp when the session completed.
+
+**Example:**
+
+```json
+{
+ "end_time": "2025-10-25T18:46:17.506522"
+}
+```
+
+#### `trajectory_path` (string)
+
+File system path to the directory containing all session logs and artifacts.
+
+**Example:**
+
+```json
+{
+ "trajectory_path": "logs/galaxy/request_20251111_140216_1/"
+}
+```
+
+**Directory Contents:**
+
+```
+logs/galaxy/request_20251111_140216_1/
+├── result.json # This file
+├── output.md # Trajectory report
+├── response.log # JSONL execution log
+├── request.log # Request details
+├── evaluation.log # Optional evaluation
+└── topology_images/ # DAG visualizations
+ └── *.png
+```
+
+### Session Results
+
+The `session_results` object contains detailed execution information and metrics.
+
+```json
+{
+ "session_results": {
+ "total_execution_time": "float",
+ "final_constellation_stats": { /* ... */ },
+ "status": "string",
+ "final_results": [ /* ... */ ],
+ "metrics": { /* ... */ }
+ }
+}
+```
+
+#### `total_execution_time` (float)
+
+Total time spent executing tasks (excludes planning/overhead).
+
+**Example:**
+
+```json
+{
+ "total_execution_time": 684.8532314300537
+}
+```
+
+#### `final_constellation_stats` (object)
+
+Statistics for the final constellation after all tasks completed.
+
+**Schema:**
+
+```json
+{
+ "constellation_id": "string", // Unique constellation ID
+ "name": "string", // Constellation name
+ "state": "string", // "completed", "failed", "executing"
+ "total_tasks": "integer", // Total task count
+ "total_dependencies": "integer", // Dependency count
+ "task_status_counts": { // Task states
+ "completed": "integer",
+ "failed": "integer",
+ "pending": "integer",
+ "running": "integer"
+ },
+ "longest_path_length": "integer", // Max depth (levels)
+ "longest_path_tasks": ["string"], // Task IDs in longest path
+ "max_width": "integer", // Max concurrent tasks
+ "critical_path_length": "float", // Critical path duration (seconds)
+ "total_work": "float", // Sum of all task durations
+ "parallelism_ratio": "float", // total_work / critical_path_length
+ "parallelism_calculation_mode": "string", // "actual_time" or "node_count"
+ "critical_path_tasks": ["string"], // Task IDs in critical path
+ "execution_duration": "float", // Constellation total duration
+ "created_at": "string", // ISO 8601 creation timestamp
+ "updated_at": "string" // ISO 8601 last update timestamp
+}
+```
+
+**Example:**
+
+```json
+{
+ "final_constellation_stats": {
+ "constellation_id": "constellation_b0864385_20251025_183508",
+ "name": "constellation_b0864385_20251025_183508",
+ "state": "completed",
+ "total_tasks": 5,
+ "total_dependencies": 4,
+ "task_status_counts": {
+ "completed": 5
+ },
+ "longest_path_length": 2,
+ "longest_path_tasks": ["t1", "t5"],
+ "max_width": 4,
+ "critical_path_length": 638.134632,
+ "total_work": 674.4709760000001,
+ "parallelism_ratio": 1.0569415013350976,
+ "parallelism_calculation_mode": "actual_time",
+ "critical_path_tasks": ["t4", "t5"],
+ "execution_duration": null,
+ "created_at": "2025-10-25T10:35:08.777663+00:00",
+ "updated_at": "2025-10-25T10:46:08.625716+00:00"
+ }
+}
+```
+
+**Key Metrics:**
+
+| Field | Description | Use Case |
+|-------|-------------|----------|
+| `critical_path_length` | Minimum possible execution time | Theoretical performance limit |
+| `total_work` | Total computational effort | Resource utilization |
+| `parallelism_ratio` | Efficiency of parallel execution | Optimization target |
+| `max_width` | Peak concurrent tasks | Capacity planning |
+
+!!! note "Parallelism Ratio Interpretation"
+ - **1.0**: Sequential execution (no parallelism)
+ - **1.5**: 1.5x speedup (about 33% less wall-clock time than sequential execution)
+ - **2.0**: 2x speedup from parallel execution
+ - **>2.0**: High parallelism efficiency
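+
+The ratio can be recomputed from the other fields. A small sketch using the example values above; the `time_saved` figure is derived here for illustration and is not a field in `result.json`:
+
+```python
+# Values from the final_constellation_stats example
+total_work = 674.4709760000001     # sum of task durations (seconds)
+critical_path_length = 638.134632  # longest dependency chain (seconds)
+
+# parallelism_ratio = total_work / critical_path_length
+parallelism_ratio = total_work / critical_path_length
+
+# Fraction of sequential time saved if the run took exactly the critical path
+time_saved = 1 - 1 / parallelism_ratio
+
+print(f"Ratio: {parallelism_ratio:.4f}")
+print(f"Time saved vs. sequential: {time_saved:.1%}")
+```
+
+A ratio close to 1.0, as in this session, suggests that one long dependency chain dominates the run.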
+
+#### `status` (string)
+
+Final status from ConstellationAgent.
+
+**Possible Values:**
+
+- `"FINISH"`: Successful completion
+- `"FAIL"`: Execution failure
+- `"PENDING"`: Incomplete (should not appear in final result)
+
+**Example:**
+
+```json
+{
+ "status": "FINISH"
+}
+```
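+
+This field uses a different vocabulary from the top-level `status`. A hedged sketch of a combined check; `session_succeeded` is a hypothetical helper, and treating `"completed"` together with `"FINISH"` as the success condition is an assumption based on the value tables in this document:
+
+```python
+def session_succeeded(result: dict) -> bool:
+    """Return True when both the session and the agent report success."""
+    session_status = result.get("status")
+    agent_status = result.get("session_results", {}).get("status")
+    return session_status == "completed" and agent_status == "FINISH"
+
+
+# Trimmed-down result.json content for illustration
+sample = {"status": "completed", "session_results": {"status": "FINISH"}}
+print(session_succeeded(sample))
+```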
+
+#### `final_results` (array)
+
+Array of result objects containing request-result pairs.
+
+**Schema:**
+
+```json
+{
+ "final_results": [
+ {
+ "request": "string", // User request (may be same as top-level)
+ "result": "string" // Final outcome description
+ }
+ ]
+}
+```
+
+**Example:**
+
+```json
+{
+ "final_results": [
+ {
+ "request": "For all linux, get their disk usage statistics. Then, from Windows browser, search for the top 3 recommended ways to reduce high disk usage for Linux systems and document these in a report on notepad.",
+ "result": "User request fully completed. Final artifact: 'Documents\\\\Linux_Disk_Usage_Report.txt' on windows_agent, containing full disk usage summaries for linux_agent_1, linux_agent_2, and linux_agent_3, and top 3 recommendations for reducing high disk usage (from Tecmint). All tasks completed successfully; no further constellation updates required."
+ }
+ ]
+}
+```
+
+#### `metrics` (object)
+
+Comprehensive performance metrics collected during execution. See **[Performance Metrics](./performance_metrics.md)** for detailed documentation.
+
+**Schema:**
+
+```json
+{
+ "metrics": {
+ "session_id": "string",
+ "task_count": "integer",
+ "completed_tasks": "integer",
+ "failed_tasks": "integer",
+ "total_execution_time": "float",
+ "task_timings": { /* ... */ },
+ "constellation_count": "integer",
+ "completed_constellations": "integer",
+ "failed_constellations": "integer",
+ "total_constellation_time": "float",
+ "constellation_timings": { /* ... */ },
+ "constellation_modifications": { /* ... */ },
+ "task_statistics": { /* ... */ },
+ "constellation_statistics": { /* ... */ },
+ "modification_statistics": { /* ... */ }
+ }
+}
+```
+
+**See:** [Performance Metrics Documentation](./performance_metrics.md)
+
+### Constellation Summary
+
+The `constellation` object provides a high-level summary of the final constellation.
+
+**Schema:**
+
+```json
+{
+ "constellation": {
+ "id": "string", // Constellation ID
+ "name": "string", // Constellation name
+ "task_count": "integer", // Total tasks
+ "dependency_count": "integer", // Total dependencies
+ "state": "string" // Final state
+ }
+}
+```
+
+**Example:**
+
+```json
+{
+ "constellation": {
+ "id": "constellation_b0864385_20251025_183508",
+ "name": "constellation_b0864385_20251025_183508",
+ "task_count": 5,
+ "dependency_count": 4,
+ "state": "completed"
+ }
+}
+```
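+
+In the examples shown here, the summary repeats values from `final_constellation_stats`. The sketch below checks that the two agree; `summary_matches_stats` is a hypothetical helper, and the field mapping is inferred from the examples rather than guaranteed by the schema:
+
+```python
+def summary_matches_stats(result: dict) -> bool:
+    """Compare the constellation summary with final_constellation_stats."""
+    summary = result["constellation"]
+    stats = result["session_results"]["final_constellation_stats"]
+    return (
+        summary["id"] == stats["constellation_id"]
+        and summary["task_count"] == stats["total_tasks"]
+        and summary["dependency_count"] == stats["total_dependencies"]
+        and summary["state"] == stats["state"]
+    )
+
+
+# Reduced copy of the example values above
+sample = {
+    "constellation": {
+        "id": "constellation_b0864385_20251025_183508",
+        "task_count": 5,
+        "dependency_count": 4,
+        "state": "completed",
+    },
+    "session_results": {
+        "final_constellation_stats": {
+            "constellation_id": "constellation_b0864385_20251025_183508",
+            "total_tasks": 5,
+            "total_dependencies": 4,
+            "state": "completed",
+        }
+    },
+}
+print(summary_matches_stats(sample))
+```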
+
+---
+
+## Complete Example
+
+Here's a complete `result.json` file from an actual Galaxy session:
+
+```json
+{
+ "session_name": "galaxy_session_20251025_183449",
+ "request": "For all linux, get their disk usage statistics. Then, from Windows browser, search for the top 3 recommended ways to reduce high disk usage for Linux systems and document these in a report on notepad.",
+ "task_name": "task_32",
+ "status": "completed",
+ "execution_time": 684.864645,
+ "rounds": 1,
+ "start_time": "2025-10-25T18:34:52.641877",
+ "end_time": "2025-10-25T18:46:17.506522",
+ "trajectory_path": "logs/galaxy/task_32/",
+
+ "session_results": {
+ "total_execution_time": 684.8532314300537,
+
+ "final_constellation_stats": {
+ "constellation_id": "constellation_b0864385_20251025_183508",
+ "name": "constellation_b0864385_20251025_183508",
+ "state": "completed",
+ "total_tasks": 5,
+ "total_dependencies": 4,
+ "task_status_counts": {
+ "completed": 5
+ },
+ "longest_path_length": 2,
+ "longest_path_tasks": ["t1", "t5"],
+ "max_width": 4,
+ "critical_path_length": 638.134632,
+ "total_work": 674.4709760000001,
+ "parallelism_ratio": 1.0569415013350976,
+ "parallelism_calculation_mode": "actual_time",
+ "critical_path_tasks": ["t4", "t5"],
+ "execution_duration": null,
+ "created_at": "2025-10-25T10:35:08.777663+00:00",
+ "updated_at": "2025-10-25T10:46:08.625716+00:00"
+ },
+
+ "status": "FINISH",
+
+ "final_results": [
+ {
+ "request": "For all linux, get their disk usage statistics. Then, from Windows browser, search for the top 3 recommended ways to reduce high disk usage for Linux systems and document these in a report on notepad.",
+ "result": "User request fully completed. Final artifact: 'Documents\\\\Linux_Disk_Usage_Report.txt' on windows_agent, containing full disk usage summaries for linux_agent_1, linux_agent_2, and linux_agent_3, and top 3 recommendations for reducing high disk usage (from Tecmint). All tasks completed successfully; no further constellation updates required."
+ }
+ ],
+
+ "metrics": {
+ "session_id": "galaxy_session_galaxy_session_20251025_183449_task_32",
+ "task_count": 5,
+ "completed_tasks": 5,
+ "failed_tasks": 0,
+ "total_execution_time": 674.547759771347,
+
+ "task_timings": {
+ "t1": {
+ "start": 1761388508.9484463,
+ "duration": 11.852121591567993,
+ "end": 1761388520.8005679
+ },
+ "t2": {
+ "start": 1761388508.9494512,
+ "duration": 12.128723621368408,
+ "end": 1761388521.0781748
+ },
+ "t3": {
+ "start": 1761388508.9494512,
+ "duration": 12.409801721572876,
+ "end": 1761388521.359253
+ },
+ "t4": {
+ "start": 1761388508.9494512,
+ "duration": 269.1103162765503,
+ "end": 1761388778.0597675
+ },
+ "t5": {
+ "start": 1761388799.57892,
+ "duration": 369.0467965602875,
+ "end": 1761389168.6257164
+ }
+ },
+
+ "constellation_count": 1,
+ "completed_constellations": 1,
+ "failed_constellations": 0,
+ "total_constellation_time": 0.0,
+
+ "task_statistics": {
+ "total_tasks": 5,
+ "completed_tasks": 5,
+ "failed_tasks": 0,
+ "success_rate": 1.0,
+ "failure_rate": 0.0,
+ "average_task_duration": 134.9095519542694,
+ "min_task_duration": 11.852121591567993,
+ "max_task_duration": 369.0467965602875,
+ "total_task_execution_time": 674.547759771347
+ },
+
+ "constellation_statistics": {
+ "total_constellations": 1,
+ "completed_constellations": 1,
+ "failed_constellations": 0,
+ "success_rate": 1.0,
+ "average_constellation_duration": 659.9815917015076,
+ "min_constellation_duration": 659.9815917015076,
+ "max_constellation_duration": 659.9815917015076,
+ "total_constellation_time": 0.0,
+ "average_tasks_per_constellation": 5.0
+ },
+
+ "modification_statistics": {
+ "total_modifications": 4,
+ "constellations_modified": 1,
+ "average_modifications_per_constellation": 4.0,
+ "max_modifications_for_single_constellation": 4,
+ "most_modified_constellation": "constellation_b0864385_20251025_183508",
+ "modifications_per_constellation": {
+ "constellation_b0864385_20251025_183508": 4
+ },
+ "modification_types_breakdown": {
+ "Edited by constellation_agent": 4
+ }
+ }
+ }
+ },
+
+ "constellation": {
+ "id": "constellation_b0864385_20251025_183508",
+ "name": "constellation_b0864385_20251025_183508",
+ "task_count": 5,
+ "dependency_count": 4,
+ "state": "completed"
+ }
+}
+```
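+
+The `task_timings` entries are enough to recompute several aggregate figures. A minimal sketch using the values from the example above; the variable names are illustrative only:
+
+```python
+# task_timings copied from the example above (Unix epoch seconds)
+task_timings = {
+    "t1": {"start": 1761388508.9484463, "duration": 11.852121591567993, "end": 1761388520.8005679},
+    "t2": {"start": 1761388508.9494512, "duration": 12.128723621368408, "end": 1761388521.0781748},
+    "t3": {"start": 1761388508.9494512, "duration": 12.409801721572876, "end": 1761388521.359253},
+    "t4": {"start": 1761388508.9494512, "duration": 269.1103162765503, "end": 1761388778.0597675},
+    "t5": {"start": 1761388799.57892, "duration": 369.0467965602875, "end": 1761389168.6257164},
+}
+
+# Sum of all task durations (compare with total_task_execution_time)
+total_duration = sum(t["duration"] for t in task_timings.values())
+
+# Wall-clock span from the first task start to the last task end
+wall_clock = max(t["end"] for t in task_timings.values()) - min(
+    t["start"] for t in task_timings.values()
+)
+
+# Task with the longest duration
+slowest = max(task_timings, key=lambda tid: task_timings[tid]["duration"])
+
+print(f"Total task time: {total_duration:.2f}s")
+print(f"Wall-clock span: {wall_clock:.2f}s")
+print(f"Slowest task: {slowest}")
+```
+
+The wall-clock span is shorter than the summed durations because `t1` through `t4` start at almost the same time and run concurrently.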
+
+---
+
+## Programmatic Access
+
+### Reading Result JSON
+
+```python
+import json
+from pathlib import Path
+
+def load_session_result(task_name: str) -> dict:
+    """
+    Load Galaxy session result.
+
+    :param task_name: Task identifier (e.g., "task_32")
+    :return: Result dictionary
+    """
+    result_path = Path("logs/galaxy") / task_name / "result.json"
+
+    with open(result_path, 'r', encoding='utf-8') as f:
+        return json.load(f)
+
+# Example usage
+result = load_session_result("task_32")
+print(f"Session: {result['session_name']}")
+print(f"Status: {result['status']}")
+print(f"Duration: {result['execution_time']:.2f}s")
+```
+
+### Extracting Key Information
+
+```python
+def extract_summary(result: dict) -> dict:
+    """
+    Extract key summary information from result.json.
+
+    :param result: Result dictionary from load_session_result()
+    :return: Summary dictionary
+    """
+    metrics = result["session_results"]["metrics"]
+    task_stats = metrics["task_statistics"]
+    const_stats = result["session_results"]["final_constellation_stats"]
+
+    return {
+        "session_name": result["session_name"],
+        "request": result["request"],
+        "status": result["status"],
+        "total_duration": result["execution_time"],
+        "task_count": task_stats["total_tasks"],
+        "success_rate": task_stats["success_rate"],
+        "parallelism_ratio": const_stats.get("parallelism_ratio", 1.0),
+        "final_result": result["session_results"]["final_results"][0]["result"]
+        if result["session_results"]["final_results"] else None
+    }
+
+# Example usage
+result = load_session_result("task_32")
+summary = extract_summary(result)
+
+print(f"✅ Success Rate: {summary['success_rate'] * 100:.1f}%")
+print(f"⏱️ Duration: {summary['total_duration']:.2f}s")
+print(f"🔀 Parallelism: {summary['parallelism_ratio']:.2f}")
+```
+
+**Expected Output:**
+
+```
+✅ Success Rate: 100.0%
+⏱️ Duration: 684.86s
+🔀 Parallelism: 1.06
+```
+
+### Batch Analysis
+
+```python
+def analyze_multiple_sessions(log_dir: str = "logs/galaxy"):
+    """
+    Analyze multiple Galaxy sessions from log directory.
+
+    :param log_dir: Path to Galaxy log directory
+    :return: DataFrame with session analysis
+    """
+    import pandas as pd
+
+    sessions = []
+
+    for task_dir in Path(log_dir).iterdir():
+        result_file = task_dir / "result.json"
+
+        if result_file.exists():
+            with open(result_file, 'r', encoding='utf-8') as f:
+                result = json.load(f)
+                summary = extract_summary(result)
+                sessions.append(summary)
+
+    df = pd.DataFrame(sessions)
+
+    # An empty DataFrame has no columns, so skip the statistics
+    if df.empty:
+        print("No sessions found")
+        return df
+
+    print("📊 Session Analysis Summary:")
+    print(f"  Total sessions: {len(df)}")
+    print(f"  Average duration: {df['total_duration'].mean():.2f}s")
+    print(f"  Average success rate: {df['success_rate'].mean() * 100:.1f}%")
+    print(f"  Average parallelism: {df['parallelism_ratio'].mean():.2f}")
+
+    return df
+
+# Example usage
+df = analyze_multiple_sessions()
+```
+
+### Generating Reports
+
+```python
+def generate_performance_report(task_name: str, output_file: str = "report.md"):
+    """
+    Generate Markdown performance report from result.json.
+
+    :param task_name: Task identifier
+    :param output_file: Output Markdown file path
+    """
+    result = load_session_result(task_name)
+    metrics = result["session_results"]["metrics"]
+    task_stats = metrics["task_statistics"]
+    const_stats = result["session_results"]["final_constellation_stats"]
+
+    # Generate Markdown report
+    report = f"""# Galaxy Session Performance Report
+
+## Session Information
+
+- **Session Name:** {result['session_name']}
+- **Task Name:** {result['task_name']}
+- **Status:** {result['status']}
+- **Start Time:** {result['start_time']}
+- **End Time:** {result['end_time']}
+- **Total Duration:** {result['execution_time']:.2f}s
+
+## Task Performance
+
+| Metric | Value |
+|--------|-------|
+| Total Tasks | `{metrics['task_count']}` |
+| Completed Tasks | `{metrics['completed_tasks']}` |
+| Failed Tasks | `{metrics['failed_tasks']}` |
+| Success Rate | `{task_stats['success_rate'] * 100:.1f}%` |
+| Average Task Duration | `{task_stats['average_task_duration']:.2f}s` |
+| Min Task Duration | `{task_stats['min_task_duration']:.2f}s` |
+| Max Task Duration | `{task_stats['max_task_duration']:.2f}s` |
+
+## Constellation Performance
+
+| Metric | Value |
+|--------|-------|
+| Parallelism Ratio | `{const_stats['parallelism_ratio']:.2f}` |
+| Critical Path Length | `{const_stats['critical_path_length']:.2f}s` |
+| Total Work | `{const_stats['total_work']:.2f}s` |
+| Max Width | `{const_stats['max_width']}` |
+"""
+
+    # Write the rendered report to the requested path
+    with open(output_file, 'w', encoding='utf-8') as f:
+        f.write(report)
+
+# Example usage
+generate_performance_report("task_32", "task_32_report.md")
+```
+
+## Use Cases
+
+### 1. Debugging Failed Sessions
+
+```python
+def debug_failed_session(task_name: str):
+    """
+    Analyze failed session for debugging.
+
+    :param task_name: Task identifier
+    """
+    result = load_session_result(task_name)
+
+    if result["status"] != "completed":
+        print(f"⚠️ Session Failed: {result['status']}")
+
+        metrics = result["session_results"]["metrics"]
+        const_stats = result["session_results"]["final_constellation_stats"]
+
+        # result.json records failure counts, not per-task outcomes
+        print(f"\n❌ Failed Tasks: {metrics['failed_tasks']} of {metrics['task_count']}")
+        for state, count in const_stats.get("task_status_counts", {}).items():
+            print(f"  • {state}: {count}")
+
+        # Per-task status and error messages are in the trajectory logs
+        log_dir = Path(result["trajectory_path"])
+        print(f"\n📁 Check logs in: {log_dir}")
+```
+
+### 2. Comparing Session Performance
+
+```python
+def compare_sessions(task_name_1: str, task_name_2: str):
+    """
+    Compare performance of two Galaxy sessions.
+
+    :param task_name_1: First task identifier
+    :param task_name_2: Second task identifier
+    """
+    result1 = load_session_result(task_name_1)
+    result2 = load_session_result(task_name_2)
+
+    summary1 = extract_summary(result1)
+    summary2 = extract_summary(result2)
+
+    print(f"📊 Session Comparison:")
+    print(f"\n{'Metric':<30} {task_name_1:<20} {task_name_2:<20}")
+    print("-" * 70)
+    print(f"{'Duration (s)':<30} {summary1['total_duration']:<20.2f} {summary2['total_duration']:<20.2f}")
+    print(f"{'Task Count':<30} {summary1['task_count']:<20} {summary2['task_count']:<20}")
+    # The % format type multiplies by 100 and keeps the sign inside the padded column
+    print(f"{'Success Rate':<30} {summary1['success_rate']:<20.1%} {summary2['success_rate']:<20.1%}")
+    print(f"{'Parallelism Ratio':<30} {summary1['parallelism_ratio']:<20.2f} {summary2['parallelism_ratio']:<20.2f}")
+```
+
+### 3. Performance Trend Analysis
+
+```python
+import matplotlib.pyplot as plt
+from datetime import datetime
+
+def plot_performance_trend(log_dir: str = "logs/galaxy"):
+    """
+    Plot performance trends across sessions.
+
+    :param log_dir: Path to Galaxy log directory
+    """
+    sessions = []
+
+    for task_dir in sorted(Path(log_dir).iterdir()):
+        result_file = task_dir / "result.json"
+
+        if result_file.exists():
+            with open(result_file, 'r') as f:
+                result = json.load(f)
+                sessions.append({
+                    "timestamp": datetime.fromisoformat(result["start_time"]),
+                    "duration": result["execution_time"],
+                    "task_count": result["session_results"]["metrics"]["task_count"],
+                    "parallelism": result["session_results"]["final_constellation_stats"].get("parallelism_ratio", 1.0)
+                })
+
+    if not sessions:
+        print("No sessions found")
+        return
+
+    # Plot duration trend
+    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
+
+    timestamps = [s["timestamp"] for s in sessions]
+    durations = [s["duration"] for s in sessions]
+    parallelism = [s["parallelism"] for s in sessions]
+
+    ax1.plot(timestamps, durations, marker='o')
+    ax1.set_xlabel("Session Timestamp")
+    ax1.set_ylabel("Duration (seconds)")
+    ax1.set_title("Session Duration Trend")
+    ax1.grid(True, alpha=0.3)
+
+    ax2.plot(timestamps, parallelism, marker='o', color='green')
+    ax2.set_xlabel("Session Timestamp")
+    ax2.set_ylabel("Parallelism Ratio")
+    ax2.set_title("Parallelism Efficiency Trend")
+    ax2.axhline(y=1.0, color='red', linestyle='--', label='Sequential (no parallelism)')
+    ax2.grid(True, alpha=0.3)
+    ax2.legend()
+
+    plt.tight_layout()
+    plt.savefig("performance_trend.png")
+    print("📈 Trend plot saved to performance_trend.png")
+
+# Example usage
+plot_performance_trend()
+```
+
+## Related Documentation
+
+- **[Performance Metrics](./performance_metrics.md)** - Detailed metrics documentation and analysis
+- **[Trajectory Report](./trajectory_report.md)** - Human-readable execution log with DAG visualizations
+- **[Galaxy Overview](../overview.md)** - Main Galaxy framework documentation
+- **[Task Constellation](../constellation/task_constellation.md)** - DAG structure and parallelism metrics
+- **[Constellation Orchestrator](../constellation_orchestrator/overview.md)** - Execution coordination
+
+## Summary
+
+The `result.json` file provides comprehensive session analysis:
+
+- **Complete execution history** - All session details in structured format
+- **Performance metrics** - Comprehensive timing and statistics via `SessionMetricsObserver`
+- **Constellation analysis** - DAG structure and parallelism data
+- **Programmatic access** - JSON format for automated analysis and reporting
+- **Debugging support** - Failed task identification and detailed execution logs
+- **Trend analysis** - Compare sessions over time for performance monitoring
+
+Use `result.json` for debugging, performance optimization, reporting, and automated analysis of Galaxy workflows.
diff --git a/documents/docs/galaxy/evaluation/trajectory_report.md b/documents/docs/galaxy/evaluation/trajectory_report.md
new file mode 100644
index 000000000..96160992c
--- /dev/null
+++ b/documents/docs/galaxy/evaluation/trajectory_report.md
@@ -0,0 +1,906 @@
+# Galaxy Trajectory Report
+
+## Overview
+
+The **Galaxy Trajectory Report** (`output.md`) is an automatically generated comprehensive execution log that documents the complete lifecycle of a multi-device task execution session in Galaxy. This human-readable Markdown report provides step-by-step visualization of constellation evolution, task execution, and device coordination.
+
+### Report Location
+
+After each Galaxy session completes, the trajectory report is automatically generated:
+
+```
+logs/galaxy/<task_name>/output.md
+logs/galaxy/<task_name>/topology_images/ # DAG visualizations
+```
+
+**Example:**
+```
+logs/galaxy/request_20251111_140216_1/
+├── output.md # Main trajectory report
+├── response.log # Raw JSONL execution log
+├── request.log # Request details
+├── evaluation.log # Optional evaluation
+├── result.json # Performance metrics
+└── topology_images/ # Generated DAG topology graphs
+ ├── step1_after_constellation_xxx.png
+ ├── step2_after_constellation_xxx.png
+ └── step999_final_constellation_xxx.png
+```
+
+## Report Structure
+
+### 1. Executive Summary
+
+High-level session overview:
+
+```markdown
+## Executive Summary
+
+- **User Request**: type hi on all linux and write results to windows notepad
+- **Total Steps**: 4
+- **Total Time**: 31.54s
+```
+
+**Components:**
+- **User Request**: Original natural language task description
+- **Total Steps**: Number of orchestration steps (DAG creation + execution rounds)
+- **Total Time**: End-to-end session duration in seconds
+
+### 2. Step-by-Step Execution
+
+Detailed breakdown of each orchestration step with:
+
+#### Step Metadata
+
+```markdown
+### Step 2
+
+- **Agent**: constellation_agent (ConstellationAgent)
+- **Status**: CONTINUE
+- **Round**: 0 | **Round Step**: 0
+- **Execution Time**: 9.27s
+- **Time Breakdown**:
+ - LLM_INTERACTION: 8.96s
+ - ACTION_EXECUTION: 0.29s
+ - MEMORY_UPDATE: 0.00s
+```
+
+**Fields:**
+- **Agent**: Agent name and type (ConstellationAgent for orchestration)
+- **Status**: Step outcome (`CONTINUE`, `FINISH`, `ERROR`)
+- **Round/Round Step**: ReAct iteration counters
+- **Execution Time**: Total step duration
+- **Time Breakdown**: Profiling data for LLM calls, action execution, memory updates
+
+#### Actions Performed
+
+Documents agent actions with collapsible argument details:
+
+````markdown
+#### Actions Performed
+
+**Function**: `build_constellation`
+
+<details>
+<summary>Arguments (click to expand)</summary>
+
+```json
+{
+  "config": {
+    "constellation_id": "constellation_xxx",
+    "tasks": { ... },
+    "dependencies": { ... }
+  }
+}
+```
+
+</details>
+````
+
+**Common Functions:**
+- `build_constellation`: Initial DAG creation
+- `edit_constellation`: Dynamic DAG modification
+- `execute_constellation`: Trigger task execution
+
+#### Constellation Evolution
+
+Visualizes DAG state changes with interactive topology graphs:
+
+```markdown
+#### Constellation Evolution
+
+
+Constellation AFTER (click to expand)
+
+**Constellation ID**: constellation_bcd1726e_20251105_134526
+**State**: created
+
+##### Dependency Graph (Topology)
+
+
+
+##### Task Summary Table
+
+| Task ID | Name | Status | Device | Duration |
+|---------|------|--------|--------|----------|
+| task-1 | Type hi on linux_agent_1 | pending | linux_agent_1 | N/A |
+| task-2 | Type hi on linux_agent_2 | pending | linux_agent_2 | N/A |
+| task-3 | Type hi on linux_agent_3 | pending | linux_agent_3 | N/A |
+```
+
+**Topology Visualization Features:**
+- **Color-coded nodes** by task status:
+ - 🟢 Green: Completed
+ - 🔵 Cyan: Running
+ - ⚫ Gray: Pending
+ - 🔴 Red: Failed/Error
+- **Edge styles** for dependencies:
+ - Solid green: Satisfied dependencies
+ - Dashed orange: Pending dependencies
+- **Automatic layout** using a spring (force-directed) algorithm
+- **Legend** showing node/edge meanings
+
+##### Detailed Task Information
+
+Comprehensive task metadata with execution details:
+
+```markdown
+#### Task task-1: Type hi on linux_agent_1
+
+- **Status**: completed
+- **Target Device**: linux_agent_1
+- **Priority**: 2
+- **Description**: On device linux_agent_1 (Linux), open a terminal and execute the command: echo 'hi'. Return the output text.
+- **Tips**:
+ - Ensure CLI access is available.
+ - Expected textual result: Return the exact output of the command, which should be 'hi'.
+- **Result**:
+ ```
+ hi
+ ```
+- **Started**: 2025-11-05T05:45:26.395208+00:00
+- **Ended**: 2025-11-05T05:45:42.981859+00:00
+- **Duration**: 16.59s
+```
+
+**Task Fields:**
+- **Status**: Current execution state (`pending`, `running`, `completed`, `failed`, `cancelled`)
+- **Target Device**: Assigned device agent ID
+- **Priority**: Task scheduling priority (1=HIGH, 2=MEDIUM, 3=LOW)
+- **Description**: Natural language task specification for device agent
+- **Tips**: Execution hints and expected output guidance
+- **Result**: Task execution output (truncated if large)
+- **Error**: Error message if task failed
+- **Timing**: Start/end timestamps and duration
+
+##### Dependency Details
+
+Shows task relationships and satisfaction status:
+
+```markdown
+| Line ID | From Task | To Task | Type | Satisfied | Condition |
+|---------|-----------|---------|------|-----------|----------|
+| l1 | t1 | t4 | unconditional | [PENDING] | Output from linux_agent_1 collected successfully. |
+| l2 | t2 | t4 | unconditional | [OK] | Output from linux_agent_2 collected successfully. |
+```
+
+**Dependency Types:**
+- `unconditional`: Always active when source task completes
+- `conditional`: Activated based on result evaluation
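+
+The dependency table can also be read back from the report. A minimal sketch, assuming the six-column layout shown above and that no cell contains a literal `|`; `parse_dependency_row` is a hypothetical helper, not part of Galaxy:
+
+```python
+def parse_dependency_row(row: str) -> dict:
+    """Parse one Markdown row of the dependency table into a dict."""
+    # Drop the outer pipes, then split the remaining cells
+    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
+    line_id, from_task, to_task, dep_type, satisfied, condition = cells
+    return {
+        "line_id": line_id,
+        "from_task": from_task,
+        "to_task": to_task,
+        "type": dep_type,
+        "satisfied": satisfied == "[OK]",
+        "condition": condition,
+    }
+
+
+rows = [
+    "| l1 | t1 | t4 | unconditional | [PENDING] | Output from linux_agent_1 collected successfully. |",
+    "| l2 | t2 | t4 | unconditional | [OK] | Output from linux_agent_2 collected successfully. |",
+]
+dependencies = [parse_dependency_row(row) for row in rows]
+
+# Tasks that still wait on at least one unsatisfied dependency
+blocked = sorted({d["to_task"] for d in dependencies if not d["satisfied"]})
+print(blocked)
+```
+
+Here `t4` stays blocked because the dependency from `t1` is still pending, even though the one from `t2` is satisfied.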
+
+#### Connected Devices
+
+Device registry snapshot at step completion:
+
+```markdown
+
+Connected Devices
+
+| Device ID | OS | Status | Last Heartbeat |
+|-----------|----|---------|--------------|
+| windowsagent | windows | idle | 2025-11-05T05:45:43 |
+| linux_agent_1 | linux | idle | 2025-11-05T05:45:43 |
+| linux_agent_2 | linux | idle | 2025-11-05T05:45:43 |
+| linux_agent_3 | linux | idle | 2025-11-05T05:45:43 |
+```
+
+**Device Statuses:**
+- `idle`: Connected and available
+- `busy`: Executing task
+- `disconnected`: WebSocket connection lost
+
+### 3. Final Constellation State
+
+Complete final DAG with all task results:
+
+```markdown
+## Final Constellation State
+
+**ID**: constellation_bcd1726e_20251105_134526
+**State**: completed
+**Created**: 2025-11-05T05:45:26.230930+00:00
+**Updated**: 2025-11-05T05:45:42.981859+00:00
+
+### Task Details
+[Full task information with results]
+
+### Task Summary Table
+[Aggregated task status table]
+
+### Final Dependency Graph
+[Final topology visualization]
+```
+
+## Generation Process
+
+### Automatic Generation
+
+The trajectory report is generated automatically by `GalaxySession` upon completion:
+
+```python
+# galaxy/session/galaxy_session.py
+async def close_session(self):
+    """Generate trajectory report on session close"""
+    trajectory = GalaxyTrajectory(self.log_path)
+    trajectory.to_markdown(self.log_path + "output.md")
+```
+
+**Trigger Points:**
+1. Normal session completion (`GalaxyClient.shutdown()`)
+2. User termination (Ctrl+C in interactive mode)
+3. Error-induced session end
+
+### Manual Generation
+
+You can regenerate reports manually using the CLI tool:
+
+```bash
+# Generate report for specific session
+python -m galaxy.trajectory.generate_report logs/galaxy/test1
+
+# Custom output path
+python -m galaxy.trajectory.generate_report logs/galaxy/test1 -o custom_report.md
+
+# Minimal report (exclude details)
+python -m galaxy.trajectory.generate_report logs/galaxy/test1 \
+ --no-constellation --no-tasks --no-devices
+```
+
+**CLI Options:**
+- `--no-constellation`: Exclude constellation evolution details
+- `--no-tasks`: Exclude detailed task information
+- `--no-devices`: Exclude device connection information
+- `-o, --output`: Custom output file path
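+
+One way to picture how these flags relate to the `to_markdown()` keyword arguments used later in this document is the `argparse` sketch below. It is an illustration only: `build_parser` and `to_markdown_kwargs` are hypothetical names, and the real `generate_report` module may be structured differently.
+
+```python
+import argparse
+
+
+def build_parser() -> argparse.ArgumentParser:
+    """Build a parser that mirrors the documented CLI options."""
+    parser = argparse.ArgumentParser(description="Generate a Galaxy trajectory report")
+    parser.add_argument("log_dir", help="Session log directory, e.g. logs/galaxy/test1")
+    parser.add_argument("-o", "--output", default=None, help="Custom output file path")
+    parser.add_argument("--no-constellation", action="store_true",
+                        help="Exclude constellation evolution details")
+    parser.add_argument("--no-tasks", action="store_true",
+                        help="Exclude detailed task information")
+    parser.add_argument("--no-devices", action="store_true",
+                        help="Exclude device connection information")
+    return parser
+
+
+def to_markdown_kwargs(args: argparse.Namespace) -> dict:
+    """Translate the negative flags into include_* keyword arguments."""
+    return {
+        "include_constellation_details": not args.no_constellation,
+        "include_task_details": not args.no_tasks,
+        "include_device_info": not args.no_devices,
+    }
+
+
+# Parse a sample command line instead of sys.argv
+args = build_parser().parse_args(["logs/galaxy/test1", "--no-devices"])
+print(to_markdown_kwargs(args))
+```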
+
+### Batch Generation
+
+Process multiple sessions at once:
+
+```python
+# galaxy/trajectory/galaxy_parser.py
+if __name__ == "__main__":
+    """Process all Galaxy task logs and generate markdown reports."""
+
+    galaxy_logs_dir = Path("logs/galaxy")
+    task_dirs = sorted([d for d in galaxy_logs_dir.iterdir() if d.is_dir()])
+
+    for task_dir in task_dirs:
+        trajectory = GalaxyTrajectory(str(task_dir))
+        output_path = task_dir / "trajectory_report.md"
+        trajectory.to_markdown(str(output_path))
+```
+
+Run batch processing:
+
+```bash
+# Run from the repository root
+python -m galaxy.trajectory.galaxy_parser
+```
+
+**Output:**
+```
+[BOLD BLUE] Galaxy Trajectory Parser - Batch Mode
+Found 42 task directories
+
+Processing task_1... [OK]
+Processing task_2... [OK]
+Processing test1... [OK]
+...
+
+=====================================================
+Summary:
+ Total: 42
+ Success: 40
+ Skipped: 2
+ Failed: 0
+=====================================================
+```
+
+## Programmatic Access
+
+### Loading Trajectory Data
+
+```python
+from galaxy.trajectory import GalaxyTrajectory
+
+# Load trajectory from log directory
+trajectory = GalaxyTrajectory("logs/galaxy/test1")
+
+# Access metadata
+print(f"Request: {trajectory.request}")
+print(f"Steps: {trajectory.total_steps}")
+print(f"Cost: ${trajectory.total_cost:.4f}")
+print(f"Time: {trajectory.total_time:.2f}s")
+
+# Iterate through steps
+for idx, step in enumerate(trajectory.step_log, 1):
+    agent = step.get("agent_name")
+    status = step.get("status")
+    time = step.get("total_time", 0)
+    print(f"Step {idx}: {agent} - {status} ({time:.2f}s)")
+```
+
+### Extracting Constellation Data
+
+```python
+# Get final constellation state
+last_step = trajectory.step_log[-1]
+final_constellation = trajectory._parse_constellation(
+    last_step.get("constellation_after")
+)
+
+if final_constellation:
+    constellation_id = final_constellation.get("constellation_id")
+    state = final_constellation.get("state")
+    tasks = final_constellation.get("tasks", {})
+
+    print(f"Constellation {constellation_id}: {state}")
+    print(f"Tasks: {len(tasks)}")
+
+    # Analyze task outcomes
+    completed = sum(1 for t in tasks.values() if t.get("status") == "completed")
+    failed = sum(1 for t in tasks.values() if t.get("status") == "failed")
+
+    print(f"Completed: {completed}/{len(tasks)}")
+    print(f"Failed: {failed}/{len(tasks)}")
+```
+
+### Custom Report Generation
+
+```python
+# Generate custom report with specific options
+trajectory.to_markdown(
+ output_path="custom_report.md",
+ include_constellation_details=True, # Show DAG evolution
+ include_task_details=True, # Show task results
+ include_device_info=False # Hide device info
+)
+```
+
+## Visualization Features
+
+### Topology Graph Generation
+
+The trajectory report includes dynamically generated DAG topology images:
+
+**Implementation:**
+```python
+def _generate_topology_image(
+    self,
+    dependencies: Dict[str, Any],
+    tasks: Dict[str, Any],
+    constellation_id: str,
+    step_number: int,
+    state: str = "before"
+) -> Optional[str]:
+    """Generate beautiful topology graph using networkx and matplotlib"""
+
+    # Create directed graph
+    G = nx.DiGraph()
+
+    # Add all tasks as nodes
+    for task_id in tasks.keys():
+        G.add_node(task_id)
+
+    # Add dependency edges
+    for dep in dependencies.values():
+        from_task = dep["from_task_id"]
+        to_task = dep["to_task_id"]
+        G.add_edge(from_task, to_task)
+
+    # Color nodes by status
+    status_colors = {
+        "completed": "#28A745",  # Green
+        "running": "#17A2B8",    # Cyan
+        "pending": "#6C757D",    # Gray
+        "failed": "#DC3545",     # Red
+    }
+
+    # Generate layout and save image
+    pos = nx.spring_layout(G, k=1.5, iterations=100)
+    # ... [matplotlib rendering code]
+```
+
+**Graph Features:**
+- **Spring Layout**: Force-directed `nx.spring_layout` with wider node spacing (`k=1.5`)
+- **Adaptive Node Size**: Ellipses scale with task ID length
+- **Color-Coded Status**: Bootstrap-inspired color scheme
+- **Edge Differentiation**: Solid (satisfied) vs dashed (pending)
+- **Legend**: Automatic status and dependency type legend
+- **High Quality**: 120 DPI PNG with antialiasing
+
+### Image Organization
+
+```
+topology_images/
+├── step1_after_constellation_7b3c0f47_20251104_182305.png
+├── step2_before_constellation_bcd1726e_20251105_134526.png
+├── step2_after_constellation_bcd1726e_20251105_134526.png
+├── step3_before_constellation_bcd1726e_20251105_134526.png
+├── step3_after_constellation_bcd1726e_20251105_134526.png
+└── step999_final_constellation_bcd1726e_20251105_134526.png
+```
+
+**Naming Convention:**
+- `step{N}_{state}_{constellation_id}.png`
+- `state`: `before`, `after`, or `final`
+- `step999`: Reserved for final summary graph
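+
+As a rough illustration of this convention, the sketch below parses a topology image file name back into its parts. The regular expression and the `parse_topology_filename` helper are assumptions inferred from the example names above; they are not part of the Galaxy codebase.
+
+```python
+import re
+from typing import NamedTuple, Optional
+
+# Assumed pattern, inferred from the example file names in this section.
+_TOPOLOGY_NAME = re.compile(
+    r"^step(?P<step>\d+)_(?P<state>before|after|final)_(?P<constellation_id>.+)\.png$"
+)
+
+
+class TopologyImage(NamedTuple):
+    step: int
+    state: str
+    constellation_id: str
+
+
+def parse_topology_filename(name: str) -> Optional[TopologyImage]:
+    """Return the parsed parts of a topology image name, or None if it does not match."""
+    match = _TOPOLOGY_NAME.match(name)
+    if match is None:
+        return None
+    return TopologyImage(int(match["step"]), match["state"], match["constellation_id"])
+
+
+image = parse_topology_filename("step2_before_constellation_bcd1726e_20251105_134526.png")
+print(image)
+```
+
+Sorting the parsed tuples by `step` puts the images in execution order, with the `step999` summary graph last.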
+
+## Use Cases
+
+### 1. Debugging Failed Sessions
+
+Identify which task failed and why:
+
+```python
+trajectory = GalaxyTrajectory("logs/galaxy/failed_session")
+
+for step in trajectory.step_log:
+ constellation = trajectory._parse_constellation(step.get("constellation_after"))
+ if not constellation:
+ continue
+
+ tasks = constellation.get("tasks", {})
+ for task_id, task in tasks.items():
+ if task.get("status") == "failed":
+ print(f"❌ Task {task_id}: {task.get('name')}")
+ print(f" Device: {task.get('target_device_id')}")
+ print(f" Error: {task.get('error')}")
+```
+
+### 2. Performance Analysis
+
+Correlate with `result.json` for bottleneck identification:
+
+```python
+import json
+
+# Load trajectory for execution timeline
+trajectory = GalaxyTrajectory("logs/galaxy/task_32")
+
+# Load metrics for performance data
+with open("logs/galaxy/task_32/result.json") as f:
+ result = json.load(f)
+
+metrics = result["session_results"]["metrics"]
+task_stats = metrics["task_statistics"]
+
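+# Collect the latest recorded duration per task
+# (the same task can appear in several step snapshots)
+durations = {}
+for step in trajectory.step_log:
+    # _parse_constellation() may return None for unparseable snapshots
+    constellation = trajectory._parse_constellation(step.get("constellation_after")) or {}
+    for tid, task in constellation.get("tasks", {}).items():
+        durations[tid] = task.get("execution_duration") or 0
+
+# Find slowest tasks
+slow_tasks = sorted(durations.items(), key=lambda x: x[1], reverse=True)
+print("Top 5 slowest tasks:")
+for tid, duration in slow_tasks[:5]:
+    print(f"  {tid}: {duration:.2f}s")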
+```
+
+### 3. Constellation Evolution Analysis
+
+Track DAG modifications across steps:
+
+```python
+trajectory = GalaxyTrajectory("logs/galaxy/adaptive_session")
+
+for idx, step in enumerate(trajectory.step_log, 1):
+ before = trajectory._parse_constellation(step.get("constellation_before"))
+ after = trajectory._parse_constellation(step.get("constellation_after"))
+
+ if before and after:
+ tasks_before = len(before.get("tasks", {}))
+ tasks_after = len(after.get("tasks", {}))
+
+ if tasks_after > tasks_before:
+ print(f"Step {idx}: Added {tasks_after - tasks_before} tasks")
+ elif tasks_after < tasks_before:
+ print(f"Step {idx}: Removed {tasks_before - tasks_after} tasks")
+```
+
+### 4. Device Utilization Tracking
+
+Analyze device workload distribution:
+
+```python
+trajectory = GalaxyTrajectory("logs/galaxy/multi_device")
+
+# Count tasks per device
+device_tasks = {}
+for step in trajectory.step_log:
+ constellation = trajectory._parse_constellation(step.get("constellation_after"))
+ if not constellation:
+ continue
+
+ for task in constellation.get("tasks", {}).values():
+ device = task.get("target_device_id")
+ device_tasks[device] = device_tasks.get(device, 0) + 1
+
+print("Task distribution:")
+for device, count in sorted(device_tasks.items(), key=lambda x: x[1], reverse=True):
+ print(f" {device}: {count} tasks")
+```
+
+### 5. Session Comparison
+
+Compare multiple sessions for regression testing:
+
+```python
+def compare_sessions(session1_path, session2_path):
+ t1 = GalaxyTrajectory(session1_path)
+ t2 = GalaxyTrajectory(session2_path)
+
+ print(f"Session 1 vs Session 2:")
+ print(f" Steps: {t1.total_steps} vs {t2.total_steps}")
+ print(f" Time: {t1.total_time:.2f}s vs {t2.total_time:.2f}s")
+ print(f" Cost: ${t1.total_cost:.4f} vs ${t2.total_cost:.4f}")
+
+ speedup = (t1.total_time - t2.total_time) / t1.total_time * 100
+ print(f" Performance: {speedup:+.1f}%")
+
+compare_sessions("logs/galaxy/test_v1", "logs/galaxy/test_v2")
+```
+
+## Data Sources
+
+The trajectory report aggregates data from multiple log sources:
+
+### 1. response.log (Primary Source)
+
+JSONL file with per-step execution records:
+
+```json
+{
+ "request": "type hi on all linux devices",
+ "agent_name": "constellation_agent",
+ "agent_type": "ConstellationAgent",
+ "status": "CONTINUE",
+ "round_num": 0,
+ "round_step": 0,
+ "total_time": 9.27,
+ "cost": 0.0042,
+ "execution_times": {
+ "LLM_INTERACTION": 8.96,
+ "ACTION_EXECUTION": 0.29,
+ "MEMORY_UPDATE": 0.00
+ },
+ "action": [
+ {
+ "function": "build_constellation",
+ "arguments": { ... }
+ }
+ ],
+ "constellation_before": "{...}",
+ "constellation_after": "{...}",
+ "device_info": { ... }
+}
+```
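+
+Because each line is an independent JSON record, the file can be read incrementally. The sketch below is a minimal reader that assumes only the `total_time` and `cost` fields shown above. `load_step_records` and `summarize` are hypothetical helper names, not Galaxy APIs; the `GalaxyTrajectory` class remains the documented way to read these logs.
+
+```python
+import json
+from typing import Any, Dict, List
+
+
+def load_step_records(path: str) -> List[Dict[str, Any]]:
+    """Read a JSONL file, skipping blank lines and lines that are not JSON objects."""
+    records = []
+    with open(path, encoding="utf-8") as f:
+        for line in f:
+            line = line.strip()
+            if not line:
+                continue
+            try:
+                record = json.loads(line)
+            except json.JSONDecodeError:
+                continue  # tolerate a truncated or corrupted line
+            if isinstance(record, dict):
+                records.append(record)
+    return records
+
+
+def summarize(records: List[Dict[str, Any]]) -> Dict[str, float]:
+    """Sum per-step time and cost, treating missing or null values as zero."""
+    return {
+        "steps": len(records),
+        "total_time": sum(r.get("total_time") or 0.0 for r in records),
+        "total_cost": sum(r.get("cost") or 0.0 for r in records),
+    }
+```
+
+Calling `summarize(load_step_records("logs/galaxy/task_32/response.log"))` would give totals comparable to the session-level figures in `result.json`.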
+
+### 2. result.json (Performance Metrics)
+
+Aggregated session-level metrics:
+
+```json
+{
+ "session_results": {
+ "request": "type hi on all linux devices",
+ "status": "completed",
+ "total_cost": 0.0156,
+ "total_rounds": 1,
+ "total_steps": 4,
+ "total_time": 31.54,
+ "metrics": {
+ "task_statistics": { ... },
+ "constellation_statistics": { ... }
+ }
+ }
+}
+```
+
+### 3. evaluation.log (Optional)
+
+User-provided evaluation results:
+
+```json
+{
+ "task_success": true,
+ "evaluation_score": 5,
+ "comments": "All tasks completed successfully"
+}
+```
+
+## Configuration
+
+### Customizing Report Content
+
+Control report verbosity via generation parameters:
+
+```python
+trajectory.to_markdown(
+ output_path="output.md",
+ include_constellation_details=True, # DAG evolution (default: True)
+ include_task_details=True, # Task execution logs (default: True)
+ include_device_info=True # Device status (default: True)
+)
+```
+
+**Report Size Impact:**
+- Full report (all options enabled): ~200KB for 10-task session
+- Minimal report (all options disabled): ~20KB
+- Topology images: ~50KB each
+
+### Topology Graph Styling
+
+Customize graph appearance by modifying `_generate_topology_image()`:
+
+```python
+# Adjust node colors
+status_colors = {
+ "completed": "#28A745", # Change to custom color
+ "running": "#17A2B8",
+ # ...
+}
+
+# Adjust layout parameters
+pos = nx.spring_layout(
+ G,
+ k=1.5, # Node spacing (higher = more spread)
+ iterations=100, # Layout quality (higher = better but slower)
+ seed=42 # Deterministic layout
+)
+
+# Adjust image quality
+plt.savefig(
+ image_path,
+ dpi=120, # Resolution (higher = larger files)
+ bbox_inches="tight",
+ facecolor="white"
+)
+```
+
+## Best Practices
+
+### 1. Regular Report Review
+
+Monitor trajectory reports to catch issues early:
+
+```bash
+# Generate reports for recent sessions
+for dir in logs/galaxy/*/; do
+ python -m galaxy.trajectory.generate_report "$dir"
+done
+
+# Open a report with the default application (`start` is Windows-specific; use `open` on macOS or `xdg-open` on Linux)
+start logs/galaxy/test1/output.md
+```
+
+### 2. Archive Trajectory Reports
+
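+Copy reports into a dated archive (which can then be committed to version control) for reproducibility:
+
+```bash
+# Create timestamped archive; keep one subdirectory per session so that
+# identically named files (output.md, result.json) do not overwrite each other
+archive="trajectory_archives/$(date +%Y-%m-%d)"
+for dir in logs/galaxy/*/; do
+    session="$(basename "$dir")"
+    mkdir -p "$archive/$session"
+    cp "$dir"output.md "$dir"result.json "$archive/$session/"
+done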
+```
+
+### 3. Automated Analysis
+
+Integrate trajectory parsing into CI/CD pipelines:
+
+```python
+# test/analyze_trajectory.py
+def validate_trajectory(log_dir):
+ trajectory = GalaxyTrajectory(log_dir)
+
+ # Check for failures
+ for step in trajectory.step_log:
+ if step.get("status") == "ERROR":
+ raise AssertionError(f"Session failed at step {step.get('_line_number')}")
+
+ # Check performance thresholds
+ if trajectory.total_time > 60.0:
+ print(f"WARNING: Session took {trajectory.total_time:.2f}s (>60s threshold)")
+
+ return True
+```
+
+### 4. Compare Before/After States
+
+Use constellation evolution to verify correctness:
+
+```python
+# Verify DAG grows monotonically (no premature task deletion)
+trajectory = GalaxyTrajectory("logs/galaxy/session")
+
+prev_task_count = 0
+for step in trajectory.step_log:
+ constellation = trajectory._parse_constellation(step.get("constellation_after"))
+ if constellation:
+ task_count = len(constellation.get("tasks", {}))
+ if task_count < prev_task_count:
+ print(f"WARNING: Task count decreased from {prev_task_count} to {task_count}")
+ prev_task_count = task_count
+```
+
+## Related Documentation
+
+- **[Performance Metrics](./performance_metrics.md)** - Quantitative session analysis with `result.json`
+- **[Result JSON Reference](./result_json.md)** - Complete `result.json` schema documentation
+- **[Galaxy Overview](../overview.md)** - Main Galaxy framework documentation
+- **[Constellation Orchestrator](../constellation_orchestrator/overview.md)** - DAG execution engine
+- **[Task Constellation](../constellation/overview.md)** - DAG data structure and validation
+
+## Troubleshooting
+
+### Empty or Missing Report
+
+**Problem:** `output.md` not generated after session
+
+**Solutions:**
+
+1. Check for `response.log` existence:
+ ```bash
+ ls logs/galaxy//response.log
+ ```
+
+2. Manually trigger generation:
+ ```bash
+ python -m galaxy.trajectory.generate_report logs/galaxy/
+ ```
+
+3. Verify session closed properly (check for exception in terminal)
+
+### Parse Errors in Report
+
+**Problem:** `⚠️ Parse Error` warnings in report
+
+**Cause:** Legacy log format with serialization bugs (tasks as Python strings instead of JSON)
+
+**Solution:** This is a known issue that is fixed in current versions. For logs affected by it, the report displays:
+```markdown
+##### ⚠️ Parse Error
+
+**Error Type**: `legacy_serialization_bug`
+**Message**: Tasks field contains Python object representations (not pure JSON).
+This is due to a serialization bug in older versions.
+```
+
+**Workaround:** Re-run session with updated codebase to generate proper logs.
+
+### Missing Topology Images
+
+**Problem:** Broken image links in report
+
+**Solutions:**
+
+1. Check `topology_images/` directory exists:
+ ```bash
+ ls logs/galaxy//topology_images/
+ ```
+
+2. Verify matplotlib backend:
+ ```python
+ import matplotlib
+ matplotlib.use("Agg") # Non-interactive backend required
+ ```
+
+3. Regenerate report to recreate images:
+ ```bash
+ python -m galaxy.trajectory.generate_report logs/galaxy/
+ ```
+
+### Large Report Files
+
+**Problem:** `output.md` exceeds 10MB
+
+**Solutions:**
+
+1. Generate minimal report:
+ ```bash
+ python -m galaxy.trajectory.generate_report logs/galaxy/ \
+ --no-constellation --no-tasks
+ ```
+
+2. Reduce topology image quality (edit `galaxy_parser.py`):
+ ```python
+ plt.savefig(image_path, dpi=80) # Lower DPI
+ ```
+
+3. Archive and compress:
+ ```bash
+ gzip logs/galaxy//output.md
+ ```
+
+## API Reference
+
+### GalaxyTrajectory Class
+
+```python
+class GalaxyTrajectory:
+ """Parser for Galaxy agent logs with constellation visualization"""
+
+ def __init__(self, folder_path: str) -> None:
+ """
+ Initialize trajectory parser.
+
+ Args:
+ folder_path: Path to Galaxy log directory (e.g., logs/galaxy/task_1)
+
+ Raises:
+ ValueError: If response.log file not found
+ """
+
+ @property
+ def step_log(self) -> List[Dict[str, Any]]:
+ """Get all step logs from response.log"""
+
+ @property
+ def evaluation_log(self) -> Dict[str, Any]:
+ """Get evaluation results from evaluation.log"""
+
+ @property
+ def request(self) -> Optional[str]:
+ """Get original user request"""
+
+ @property
+ def total_steps(self) -> int:
+ """Get total number of steps"""
+
+ @property
+ def total_cost(self) -> float:
+ """Calculate total LLM cost"""
+
+ @property
+ def total_time(self) -> float:
+ """Calculate total execution time"""
+
+ def to_markdown(
+ self,
+ output_path: str,
+ include_constellation_details: bool = True,
+ include_task_details: bool = True,
+ include_device_info: bool = True
+ ) -> None:
+ """
+ Export trajectory to Markdown file.
+
+ Args:
+ output_path: Path to save markdown file
+ include_constellation_details: Include DAG evolution details
+ include_task_details: Include task execution logs
+ include_device_info: Include device status information
+ """
+```
+
+---
+
+**Next Steps:**
+- Combine trajectory reports with `result.json` metrics for comprehensive analysis
+- Automate report generation in CI/CD pipelines
+- Visualize execution timelines with custom scripts
+- Compare session trajectories for performance regression testing
diff --git a/documents/docs/galaxy/observer/agent_output_observer.md b/documents/docs/galaxy/observer/agent_output_observer.md
new file mode 100644
index 000000000..7f713365f
--- /dev/null
+++ b/documents/docs/galaxy/observer/agent_output_observer.md
@@ -0,0 +1,536 @@
+# Agent Output Observer
+
+The **AgentOutputObserver** handles real-time display of agent responses and actions. It listens for agent interaction events and delegates the actual presentation logic to specialized presenters, providing a clean separation between event handling and output formatting.
+
+**Location:** `galaxy/session/observers/agent_output_observer.py`
+
+## Purpose
+
+The Agent Output Observer enables:
+
+- **Real-time Feedback** — Display agent thinking and decision-making process
+- **Action Visibility** — Show what actions the agent is taking
+- **Debugging** — Understand agent behavior during constellation execution
+- **User Engagement** — Keep users informed of progress and decisions
+
+## Architecture
+
+The observer uses a **presenter pattern** for flexible output formatting:
+
+```mermaid
+graph TB
+ subgraph "Agent Layer"
+ A[ConstellationAgent]
+ end
+
+ subgraph "Event System"
+ EB[EventBus]
+ end
+
+ subgraph "Observer Layer"
+ AOO[AgentOutputObserver]
+ ER[Event Router]
+ end
+
+ subgraph "Presenter Layer"
+ P[Presenter Factory]
+ RP[RichPresenter]
+ TP[TextPresenter]
+ end
+
+ subgraph "Output"
+ O[Terminal/Console]
+ end
+
+ A -->|publish| EB
+ EB -->|notify| AOO
+ AOO --> ER
+ ER -->|agent_response| RP
+ ER -->|agent_action| RP
+
+ P --> RP
+ P --> TP
+
+ RP --> O
+ TP --> O
+
+ style AOO fill:#66bb6a,stroke:#333,stroke-width:3px
+ style P fill:#ffa726,stroke:#333,stroke-width:2px
+ style EB fill:#4a90e2,stroke:#333,stroke-width:2px,color:#fff
+```
+
+**Component Responsibilities:**
+
+| Component | Role | Description |
+|-----------|------|-------------|
+| **Agent** | Event publisher | Publishes AGENT_RESPONSE and AGENT_ACTION events |
+| **AgentOutputObserver** | Event handler | Receives and routes agent events |
+| **Presenter** | Output formatter | Formats and displays agent output |
+| **PresenterFactory** | Creator | Creates appropriate presenter based on type |
+
+## Handled Events
+
+The observer handles two types of agent events:
+
+### 1. AGENT_RESPONSE
+
+Triggered when the agent generates a response (thoughts, plans, reasoning):
+
+**Event Data Structure:**
+
+```python
+{
+    "agent_name": "constellation_agent",
+    "agent_type": "constellation",
+    "output_type": "response",
+    "output_data": {
+        # ConstellationAgentResponse fields
+        "thought": "Task 1 completed successfully...",
+        "plan": "Next, I will process the results...",
+        "operation": "EDIT",
+        "observation": "Task result shows...",
+        # ... other fields
+        "print_action": False  # Whether to print action details (the handler reads this from output_data)
+    }
+}
+```
+
+### 2. AGENT_ACTION
+
+Triggered when the agent executes actions (constellation editing):
+
+**Event Data Structure:**
+
+```python
+{
+ "agent_name": "constellation_agent",
+ "agent_type": "constellation",
+ "output_type": "action",
+ "output_data": {
+ "action_type": "constellation_editing",
+ "actions": [
+ {
+ "name": "add_task",
+ "arguments": {
+ "task_id": "new_task_1",
+ "description": "Process attachment",
+ # ...
+ }
+ },
+ # ... more actions
+ ]
+ }
+}
+```
+
+## Implementation
+
+### Initialization
+
+```python
+from galaxy.core.events import get_event_bus
+from galaxy.session.observers import AgentOutputObserver
+
+# Create agent output observer with default Rich presenter
+agent_output_observer = AgentOutputObserver(presenter_type="rich")
+
+# Subscribe to event bus
+event_bus = get_event_bus()
+event_bus.subscribe(agent_output_observer)
+```
+
+**Constructor Parameters:**
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `presenter_type` | `str` | `"rich"` | Type of presenter ("rich", "text", etc.) |
+
+### Presenter Types
+
+The observer supports different presenter types for various output formats:
+
+| Presenter Type | Description | Use Case |
+|----------------|-------------|----------|
+| `"rich"` | Rich terminal formatting with colors and boxes | Interactive terminal use |
+| `"text"` | Plain text output | Log files, CI/CD, simple terminals |
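+
+The idea behind the presenter pattern can be sketched as follows. This is a simplified stand-in, not Galaxy's implementation: `TextPresenter`, `BoxedPresenter`, `create_presenter`, and the `present_response` method are hypothetical names chosen for the example.
+
+```python
+from typing import Callable, Dict, Protocol
+
+
+class Presenter(Protocol):
+    def present_response(self, thought: str) -> str: ...
+
+
+class TextPresenter:
+    """Plain text output, suitable for log files."""
+
+    def present_response(self, thought: str) -> str:
+        return f"Thought: {thought}"
+
+
+class BoxedPresenter:
+    """Decorated output, standing in for a rich terminal presenter."""
+
+    def present_response(self, thought: str) -> str:
+        body = f"| Thought: {thought} |"
+        border = "+" + "-" * (len(body) - 2) + "+"
+        return "\n".join([border, body, border])
+
+
+# Registry mapping a presenter type name to its class.
+_PRESENTERS: Dict[str, Callable[[], Presenter]] = {
+    "text": TextPresenter,
+    "rich": BoxedPresenter,
+}
+
+
+def create_presenter(presenter_type: str) -> Presenter:
+    """Look up a presenter by name; unknown names raise ValueError."""
+    try:
+        return _PRESENTERS[presenter_type]()
+    except KeyError:
+        raise ValueError(f"Unknown presenter type: {presenter_type!r}") from None
+
+
+print(create_presenter("text").present_response("Task 1 completed"))
+```
+
+Because the caller depends only on the presenter interface, supporting another output format in this sketch means registering one more class, with no change to the event-handling code.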
+
+## Output Examples
+
+### Agent Response Display
+
+When the agent generates a response, the Rich presenter displays:
+
+```
+╭─────────────────────────────────────────────────────────────╮
+│ 🤖 Agent Response │
+├─────────────────────────────────────────────────────────────┤
+│ Thought: │
+│ Task 'fetch_emails' has completed successfully. I need to │
+│ analyze the results and determine next steps. │
+│ │
+│ Plan: │
+│ I will extract the email count from the result and create │
+│ parallel parsing tasks for each email. │
+│ │
+│ Operation: EDIT │
+│ │
+│ Observation: │
+│ Result shows 3 emails were fetched. I will create 3 │
+│ parsing tasks with dependencies on the fetch task. │
+╰─────────────────────────────────────────────────────────────╯
+```
+
+### Agent Action Display
+
+When the agent performs constellation editing:
+
+```
+╭─────────────────────────────────────────────────────────────╮
+│ 🛠️ Agent Actions: Constellation Editing │
+├─────────────────────────────────────────────────────────────┤
+│ Action 1: add_task │
+│ ├─ task_id: parse_email_1 │
+│ ├─ description: Parse the first email │
+│ ├─ target_device_id: windows_pc_001 │
+│ └─ priority: MEDIUM │
+│ │
+│ Action 2: add_task │
+│ ├─ task_id: parse_email_2 │
+│ ├─ description: Parse the second email │
+│ ├─ target_device_id: windows_pc_001 │
+│ └─ priority: MEDIUM │
+│ │
+│ Action 3: add_dependency │
+│ ├─ from_task_id: fetch_emails │
+│ ├─ to_task_id: parse_email_1 │
+│ └─ dependency_type: SUCCESS_ONLY │
+│ │
+│ Action 4: add_dependency │
+│ ├─ from_task_id: fetch_emails │
+│ ├─ to_task_id: parse_email_2 │
+│ └─ dependency_type: SUCCESS_ONLY │
+╰─────────────────────────────────────────────────────────────╯
+```
+
+## Event Processing Flow
+
+```mermaid
+sequenceDiagram
+ participant A as ConstellationAgent
+ participant EB as EventBus
+ participant AOO as AgentOutputObserver
+ participant P as Presenter
+ participant C as Console
+
+ Note over A: Agent generates response
+ A->>EB: publish(AGENT_RESPONSE)
+ EB->>AOO: on_event(event)
+ AOO->>AOO: _handle_agent_response()
+ AOO->>AOO: Reconstruct ConstellationAgentResponse
+ AOO->>P: present_constellation_agent_response()
+ P->>C: Display formatted response
+
+ Note over A: Agent performs actions
+ A->>EB: publish(AGENT_ACTION)
+ EB->>AOO: on_event(event)
+ AOO->>AOO: _handle_agent_action()
+ AOO->>AOO: Reconstruct ActionCommandInfo
+ AOO->>P: present_constellation_editing_actions()
+ P->>C: Display formatted actions
+```
+
+## API Reference
+
+### Constructor
+
+```python
+def __init__(self, presenter_type: str = "rich")
+```
+
+Initialize the agent output observer with specified presenter type.
+
+**Parameters:**
+
+- `presenter_type` — Type of presenter to use ("rich", "text", etc.)
+
+**Example:**
+
+```python
+# Use Rich presenter (default)
+rich_observer = AgentOutputObserver(presenter_type="rich")
+
+# Use plain text presenter
+text_observer = AgentOutputObserver(presenter_type="text")
+```
+
+### Event Handler
+
+```python
+async def on_event(self, event: Event) -> None
+```
+
+Handle agent output events.
+
+**Parameters:**
+
+- `event` — Event instance; only `AgentEvent` instances are processed, and other events are ignored
+
+**Behavior:**
+
+- Filters for `AgentEvent` instances
+- Routes to appropriate handler based on event type
+- Reconstructs response/action objects from event data
+- Delegates display to presenter
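+
+The behavior listed above can be approximated with the following sketch. It uses simplified stand-ins for `Event`, `AgentEvent`, and `EventType` rather than the real classes from `galaxy.core.events`, and `SketchObserver` with its `shown` list is a hypothetical substitute for the observer and its presenter.
+
+```python
+import asyncio
+from dataclasses import dataclass, field
+from enum import Enum
+from typing import Any, Dict, List
+
+
+class EventType(Enum):
+    TASK_STARTED = "task_started"
+    AGENT_RESPONSE = "agent_response"
+    AGENT_ACTION = "agent_action"
+
+
+@dataclass
+class Event:
+    event_type: EventType
+
+
+@dataclass
+class AgentEvent(Event):
+    output_data: Dict[str, Any] = field(default_factory=dict)
+
+
+class SketchObserver:
+    def __init__(self) -> None:
+        self.shown: List[str] = []  # stands in for presenter output
+
+    async def on_event(self, event: Event) -> None:
+        if not isinstance(event, AgentEvent):
+            return  # ignore task, constellation, and device events
+        try:
+            if event.event_type == EventType.AGENT_RESPONSE:
+                self.shown.append(f"response: {event.output_data['thought']}")
+            elif event.event_type == EventType.AGENT_ACTION:
+                names = [a["name"] for a in event.output_data["actions"]]
+                self.shown.append(f"actions: {', '.join(names)}")
+        except Exception as exc:  # record the failure and keep observing
+            self.shown.append(f"error: {exc!r}")
+
+
+async def demo() -> List[str]:
+    observer = SketchObserver()
+    await observer.on_event(Event(EventType.TASK_STARTED))
+    await observer.on_event(AgentEvent(EventType.AGENT_RESPONSE, {"thought": "plan next step"}))
+    await observer.on_event(AgentEvent(EventType.AGENT_ACTION, {"actions": [{"name": "add_task"}]}))
+    return observer.shown
+
+
+print(asyncio.run(demo()))
+```
+
+The non-agent event produces no output, while the two agent events are routed to different branches based on their event type.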
+
+## Usage Examples
+
+### Example 1: Basic Setup
+
+```python
+from galaxy.core.events import get_event_bus
+from galaxy.session.observers import AgentOutputObserver
+
+# Create and subscribe agent output observer
+agent_output_observer = AgentOutputObserver(presenter_type="rich")
+event_bus = get_event_bus()
+event_bus.subscribe(agent_output_observer)
+
+# Agent events will now be displayed automatically
+# (run this inside an async function; `orchestrator` and `constellation` are created elsewhere)
+await orchestrator.execute_constellation(constellation)
+
+# Clean up
+event_bus.unsubscribe(agent_output_observer)
+```
+
+### Example 2: Conditional Display
+
+```python
+async def execute_with_agent_feedback(show_agent_output: bool = True):
+ """Execute constellation with optional agent output display."""
+
+ event_bus = get_event_bus()
+
+ if show_agent_output:
+ agent_output_observer = AgentOutputObserver(presenter_type="rich")
+ event_bus.subscribe(agent_output_observer)
+
+ try:
+ await orchestrator.execute_constellation(constellation)
+ finally:
+ if show_agent_output:
+ event_bus.unsubscribe(agent_output_observer)
+```
+
+### Example 3: Different Presenters for Different Modes
+
+```python
+import sys
+
+from galaxy.core.events import get_event_bus
+from galaxy.session.observers import AgentOutputObserver
+
+def create_agent_observer():
+    """Create appropriate agent observer based on environment."""
+
+    # Use Rich presenter for interactive terminal
+    if sys.stdout.isatty():
+        return AgentOutputObserver(presenter_type="rich")
+
+    # Use text presenter for logs/CI
+    return AgentOutputObserver(presenter_type="text")
+
+# Usage
+agent_output_observer = create_agent_observer()
+event_bus = get_event_bus()
+event_bus.subscribe(agent_output_observer)
+```
+
+### Example 4: Custom Filtering
+
+```python
+from galaxy.core.events import EventType
+
+# Subscribe only to specific agent events
+event_bus.subscribe(
+ agent_output_observer,
+ {EventType.AGENT_ACTION} # Only show actions, not responses
+)
+```
+
+## Implementation Details
+
+### Response Handling
+
+The observer reconstructs `ConstellationAgentResponse` from event data:
+
+```python
+async def _handle_agent_response(self, event: AgentEvent) -> None:
+ """Handle agent response event."""
+
+ try:
+ output_data = event.output_data
+
+ if event.agent_type == "constellation":
+ # Reconstruct ConstellationAgentResponse from output data
+ response = ConstellationAgentResponse.model_validate(output_data)
+ print_action = output_data.get("print_action", False)
+
+ # Use presenter to display the response
+ self.presenter.present_constellation_agent_response(
+ response,
+ print_action=print_action
+ )
+
+ except Exception as e:
+ self.logger.error(f"Error handling agent response: {e}")
+```
+
+### Action Handling
+
+The observer reconstructs action command objects:
+
+```python
+async def _handle_agent_action(self, event: AgentEvent) -> None:
+ """Handle agent action event."""
+
+ try:
+ output_data = event.output_data
+
+ if output_data.get("action_type") == "constellation_editing":
+ actions_data = output_data.get("actions", [])
+
+ # Convert each action dict to ActionCommandInfo
+ action_objects = []
+ for action_dict in actions_data:
+ action_obj = ActionCommandInfo.model_validate(action_dict)
+ action_objects.append(action_obj)
+
+ # Create ListActionCommandInfo with reconstructed actions
+ actions = ListActionCommandInfo(actions=action_objects)
+
+ # Use presenter to display the actions
+ self.presenter.present_constellation_editing_actions(actions)
+
+ except Exception as e:
+ self.logger.error(f"Error handling agent action: {e}")
+```
+
+## Best Practices
+
+### 1. Match Presenter to Environment
+
+```python
+# ✅ Good: Choose presenter based on context
+if running_in_jupyter:
+ presenter_type = "rich" # Good for notebooks
+elif running_in_ci:
+ presenter_type = "text" # Good for logs
+elif is_interactive_terminal:
+ presenter_type = "rich" # Good for terminal
+else:
+ presenter_type = "text" # Safe default
+```
+
+### 2. Selective Event Subscription
+
+```python
+# Only show actions (skip verbose responses)
+event_bus.subscribe(
+ agent_output_observer,
+ {EventType.AGENT_ACTION}
+)
+
+# Show everything (responses + actions)
+event_bus.subscribe(agent_output_observer)
+```
+
+### 3. Handle Errors Gracefully
+
+The observer catches and logs exceptions raised while handling an event, so a display failure does not stop it from observing later events:
+
+```python
+try:
+ # Process agent event
+ await self._handle_agent_response(event)
+except Exception as e:
+ self.logger.error(f"Error handling agent output event: {e}")
+ # Don't re-raise - continue observing other events
+```
+
+## Integration with Agent
+
+The observer integrates with the ConstellationAgent's state machine:
+
+### Agent Publishes Events
+
+The agent publishes events at key points:
+
+```python
+class ConstellationAgent:
+ async def generate_response(self):
+ """Generate agent response and publish event."""
+
+ # Generate response using LLM
+ response = await self._llm_call(...)
+
+ # Publish AGENT_RESPONSE event
+ await self._publish_agent_response_event(response)
+
+ return response
+
+ async def execute_actions(self, actions):
+ """Execute actions and publish event."""
+
+ # Publish AGENT_ACTION event
+ await self._publish_agent_action_event(actions)
+
+ # Actually execute the actions
+ result = await self._execute_constellation_editing(actions)
+
+ return result
+```
+
+## Performance Considerations
+
+### Display Overhead
+
+The observer adds minimal overhead:
+
+- **Event processing**: < 1ms per event
+- **Rich rendering**: 5-10ms per display
+- **Text rendering**: < 1ms per display
+
+### Optimization for Large Outputs
+
+```python
+# For very verbose agents, consider:
+
+# 1. Use text presenter instead of rich
+agent_output_observer = AgentOutputObserver(presenter_type="text")
+
+# 2. Subscribe only to actions
+event_bus.subscribe(
+ agent_output_observer,
+ {EventType.AGENT_ACTION}
+)
+
+# 3. Disable in production
+if not debug_mode:
+ # Don't create or subscribe observer
+ pass
+```
+
+## Related Documentation
+
+- **[Observer System Overview](overview.md)** — Architecture and design
+- **[Progress Observer](progress_observer.md)** — Task completion coordination
+- **[Constellation Agent](../constellation_agent/overview.md)** — Agent implementation and state machine
+
+## Summary
+
+The Agent Output Observer:
+
+- **Displays** agent responses and actions in real-time
+- **Delegates** to presenters for flexible formatting
+- **Supports** multiple output formats (Rich, text)
+- **Provides** transparency into agent decision-making
+- **Enables** debugging and user engagement
+
+This observer is essential for understanding agent behavior during constellation execution, providing visibility into the AI's thought process and actions.
diff --git a/documents/docs/galaxy/observer/event_system.md b/documents/docs/galaxy/observer/event_system.md
new file mode 100644
index 000000000..dfcf721ff
--- /dev/null
+++ b/documents/docs/galaxy/observer/event_system.md
@@ -0,0 +1,609 @@
+# Event System Core
+
+The Event System Core provides the foundational infrastructure for event-driven communication in the Galaxy framework. It implements the Observer pattern through a central event bus, type-safe event classes, and well-defined interfaces.
+
+**Location:** `galaxy/core/events.py`
+
+---
+
+## 📦 Core Components
+
+### EventBus — Central Message Broker
+
+The `EventBus` class is the heart of the event system, managing subscriptions and distributing events to all registered observers.
+
+```mermaid
+graph LR
+ A[Publisher 1] -->|publish| B[EventBus]
+ C[Publisher 2] -->|publish| B
+ D[Publisher 3] -->|publish| B
+
+ B -->|notify| E[Observer 1]
+ B -->|notify| F[Observer 2]
+ B -->|notify| G[Observer 3]
+ B -->|notify| H[Observer 4]
+
+ style B fill:#4a90e2,stroke:#333,stroke-width:3px,color:#fff
+ style E fill:#66bb6a,stroke:#333,stroke-width:2px
+ style F fill:#66bb6a,stroke:#333,stroke-width:2px
+ style G fill:#66bb6a,stroke:#333,stroke-width:2px
+ style H fill:#66bb6a,stroke:#333,stroke-width:2px
+```
+
+**Key Features:**
+
+- **Singleton Pattern**: Single global instance accessed via `get_event_bus()`
+- **Type-based Filtering**: Observers can subscribe to specific event types or all events
+- **Concurrent Notification**: All observers are notified concurrently using `asyncio.gather()`
+- **Error Isolation**: Exceptions in one observer don't affect others
+
+### Event Types
+
+`EventType` enumeration defines all possible events in the system:
+
+```python
+class EventType(Enum):
+ # Task-level events
+ TASK_STARTED = "task_started"
+ TASK_COMPLETED = "task_completed"
+ TASK_FAILED = "task_failed"
+
+ # Constellation lifecycle events
+ CONSTELLATION_STARTED = "constellation_started"
+ CONSTELLATION_COMPLETED = "constellation_completed"
+ CONSTELLATION_FAILED = "constellation_failed"
+
+ # Structure modification events
+ CONSTELLATION_MODIFIED = "constellation_modified"
+
+ # Agent output events
+ AGENT_RESPONSE = "agent_response"
+ AGENT_ACTION = "agent_action"
+
+ # Device events
+ DEVICE_CONNECTED = "device_connected"
+ DEVICE_DISCONNECTED = "device_disconnected"
+ DEVICE_STATUS_CHANGED = "device_status_changed"
+```
+
+### Event Classes
+
+Five event classes (the base `Event` plus four specialized subclasses) provide type-safe event handling:
+
+| Event Class | Extends | Additional Fields | Use Case |
+|-------------|---------|-------------------|----------|
+| `Event` | (base) | `event_type`, `source_id`, `timestamp`, `data` | Generic events |
+| `TaskEvent` | `Event` | `task_id`, `status`, `result`, `error` | Task execution events |
+| `ConstellationEvent` | `Event` | `constellation_id`, `constellation_state`, `new_ready_tasks` | Constellation lifecycle events |
+| `AgentEvent` | `Event` | `agent_name`, `agent_type`, `output_type`, `output_data` | Agent interaction events |
+| `DeviceEvent` | `Event` | `device_id`, `device_status`, `device_info`, `all_devices` | Device management events |
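+
+One plausible way to model this hierarchy is with dataclasses, as in the sketch below. The field names follow the table, but the types, defaults, and use of `@dataclass` are assumptions; consult `galaxy/core/events.py` for the actual definitions.
+
+```python
+import time
+from dataclasses import dataclass
+from enum import Enum
+from typing import Any, Dict, Optional
+
+
+class EventType(Enum):
+    TASK_COMPLETED = "task_completed"
+    TASK_FAILED = "task_failed"
+
+
+@dataclass
+class Event:
+    event_type: EventType
+    source_id: str
+    timestamp: float
+    data: Dict[str, Any]
+
+
+@dataclass
+class TaskEvent(Event):
+    task_id: str
+    status: str
+    result: Any = None                 # assumed optional
+    error: Optional[Exception] = None  # assumed optional
+
+
+event = TaskEvent(
+    event_type=EventType.TASK_FAILED,
+    source_id="orchestrator",
+    timestamp=time.time(),
+    data={},
+    task_id="task_1",
+    status="FAILED",
+    error=RuntimeError("device offline"),
+)
+
+# Subclass instances are still base events, so generic observers can handle them.
+print(isinstance(event, Event), event.task_id, event.result)
+```
+
+An observer can then branch on `isinstance(event, TaskEvent)` to reach the task-specific fields while still accepting any `Event`.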
+
+---
+
+## 🔌 Interfaces
+
+### IEventObserver
+
+Defines the contract for all observer implementations:
+
+```python
+from abc import ABC, abstractmethod
+from galaxy.core.events import Event
+
+class IEventObserver(ABC):
+ """Interface for event observers."""
+
+ @abstractmethod
+ async def on_event(self, event: Event) -> None:
+ """
+ Handle an event.
+
+ :param event: The event object containing type, source, timestamp and data
+ """
+ pass
+```
+
+**Implementation Pattern:**
+
+```python
+class MyCustomObserver(IEventObserver):
+ """Custom observer implementation."""
+
+ async def on_event(self, event: Event) -> None:
+ """Handle events of interest."""
+
+ # Type-safe handling using isinstance
+ if isinstance(event, TaskEvent):
+ await self._handle_task_event(event)
+ elif isinstance(event, ConstellationEvent):
+ await self._handle_constellation_event(event)
+
+ async def _handle_task_event(self, event: TaskEvent) -> None:
+ """Process task events."""
+ if event.event_type == EventType.TASK_COMPLETED:
+ print(f"Task {event.task_id} completed with status: {event.status}")
+
+ async def _handle_constellation_event(self, event: ConstellationEvent) -> None:
+ """Process constellation events."""
+ if event.event_type == EventType.CONSTELLATION_STARTED:
+ print(f"Constellation {event.constellation_id} started")
+```
+
+### IEventPublisher
+
+Defines the contract for event publishing:
+
+```python
+class IEventPublisher(ABC):
+ """Interface for event publishers."""
+
+ @abstractmethod
+ def subscribe(self, observer: IEventObserver,
+ event_types: Set[EventType] = None) -> None:
+ """Subscribe an observer to events."""
+ pass
+
+ @abstractmethod
+ def unsubscribe(self, observer: IEventObserver) -> None:
+ """Unsubscribe an observer."""
+ pass
+
+ @abstractmethod
+ async def publish_event(self, event: Event) -> None:
+ """Publish an event to subscribers."""
+ pass
+```
+
+---
+
+## 📖 EventBus API Reference
+
+### Subscription Management
+
+#### subscribe()
+
+Subscribe an observer to receive event notifications:
+
+```python
+def subscribe(
+ self,
+ observer: IEventObserver,
+ event_types: Set[EventType] = None
+) -> None
+```
+
+**Parameters:**
+
+- `observer`: The observer object implementing `IEventObserver`
+- `event_types`: Optional set of event types to subscribe to (None = all events)
+
+**Examples:**
+
+```python
+from galaxy.core.events import get_event_bus, EventType
+
+event_bus = get_event_bus()
+
+# Subscribe to all events
+event_bus.subscribe(my_observer)
+
+# Subscribe to specific event types
+event_bus.subscribe(my_observer, {
+ EventType.TASK_COMPLETED,
+ EventType.TASK_FAILED
+})
+
+# Subscribe to constellation events only
+event_bus.subscribe(constellation_observer, {
+ EventType.CONSTELLATION_STARTED,
+ EventType.CONSTELLATION_COMPLETED,
+ EventType.CONSTELLATION_MODIFIED
+})
+```
+
+#### unsubscribe()
+
+Remove an observer from all event subscriptions:
+
+```python
+def unsubscribe(self, observer: IEventObserver) -> None
+```
+
+**Parameters:**
+
+- `observer`: The observer object to unsubscribe
+
+**Example:**
+
+```python
+# Clean up observer when done
+event_bus.unsubscribe(my_observer)
+```
+
+### Event Publishing
+
+#### publish_event()
+
+Publish an event to all subscribed observers:
+
+```python
+async def publish_event(self, event: Event) -> None
+```
+
+**Parameters:**
+
+- `event`: The event object to publish
+
+**Example:**
+
+```python
+from galaxy.core.events import TaskEvent, EventType
+import time
+
+# Create and publish a task event
+event = TaskEvent(
+ event_type=EventType.TASK_COMPLETED,
+ source_id="orchestrator",
+ timestamp=time.time(),
+ data={
+ "execution_time": 2.5,
+ "newly_ready_tasks": ["task_2", "task_3"]
+ },
+ task_id="task_1",
+ status="COMPLETED",
+ result={"output": "success"}
+)
+
+await event_bus.publish_event(event)
+```
+
+**Concurrent Notification**: The event bus notifies all observers concurrently using `asyncio.gather()` with `return_exceptions=True`. This means:
+
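+- All observers are scheduled concurrently on the same event loop
+- A slow observer that awaits (for example on I/O) doesn't delay delivery to the others
+- An exception raised by one observer is captured and doesn't affect the others
+- The `publish_event()` call returns only after every observer has finished processing the event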
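
The isolation behavior can be seen in a small standalone sketch. It uses plain coroutines in place of real observers (the names `fast_observer`, `slow_observer`, `failing_observer`, and `notify_all` are hypothetical and not part of Galaxy) and shows that `asyncio.gather(..., return_exceptions=True)` hands exceptions back as result values instead of raising them:

```python
import asyncio


async def fast_observer(event: str) -> str:
    # Completes immediately.
    return f"fast handled {event}"


async def slow_observer(event: str) -> str:
    # Awaits, so the other observers keep running in the meantime.
    await asyncio.sleep(0.05)
    return f"slow handled {event}"


async def failing_observer(event: str) -> str:
    raise ValueError(f"cannot handle {event}")


async def notify_all(event: str) -> list:
    # Same call shape as the bus: one coroutine per observer, gathered together.
    return await asyncio.gather(
        fast_observer(event),
        slow_observer(event),
        failing_observer(event),
        return_exceptions=True,
    )


results = asyncio.run(notify_all("TASK_COMPLETED"))
for result in results:
    print(type(result).__name__, result)
```

Because the exception comes back as an ordinary list element, a caller that discards the result of `gather()` never sees observer errors. That is one reason to log failures inside each observer.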
+
+---
+
+## 🔄 Event Flow Patterns
+
+### Pattern 1: Task Execution Flow
+
+This pattern shows how task events flow through the system:
+
+```mermaid
+sequenceDiagram
+ participant O as Orchestrator
+ participant EB as EventBus
+ participant PO as ProgressObserver
+ participant MO as MetricsObserver
+ participant VO as VizObserver
+
+ Note over O: Start task execution
+ O->>EB: publish(TASK_STARTED)
+
+ par Concurrent Notification
+ EB->>PO: on_event(event)
+ EB->>MO: on_event(event)
+ EB->>VO: on_event(event)
+ end
+
+ Note over PO: Track progress
+ Note over MO: Record start time
+ Note over VO: Display task start
+
+ Note over O: Task completes
+ O->>EB: publish(TASK_COMPLETED)
+
+ par Concurrent Notification
+ EB->>PO: on_event(event)
+ EB->>MO: on_event(event)
+ EB->>VO: on_event(event)
+ end
+
+ Note over PO: Queue for agent
+ Note over MO: Calculate duration
+ Note over VO: Update display
+```
+
+### Pattern 2: Constellation Modification Flow
+
+This pattern shows how modification events coordinate agent and orchestrator:
+
+```mermaid
+sequenceDiagram
+ participant A as Agent
+ participant EB as EventBus
+ participant S as Synchronizer
+ participant M as MetricsObserver
+ participant V as VizObserver
+
+ Note over A: Modify constellation
+ A->>EB: publish(CONSTELLATION_MODIFIED)
+
+ par Concurrent Notification
+ EB->>S: on_event(event)
+ EB->>M: on_event(event)
+ EB->>V: on_event(event)
+ end
+
+ Note over S: Complete pending modification
+ Note over M: Track modification
+ Note over V: Display changes
+```
+
+---
+
+## 💻 Usage Examples
+
+### Example 1: Basic Event Publishing
+
+```python
+import asyncio
+import time
+from galaxy.core.events import (
+ get_event_bus, Event, EventType, IEventObserver
+)
+
+class SimpleLogger(IEventObserver):
+ """Simple observer that logs all events."""
+
+ async def on_event(self, event: Event) -> None:
+ print(f"[{event.timestamp}] {event.event_type.value} from {event.source_id}")
+
+async def main():
+ # Get event bus and subscribe observer
+ event_bus = get_event_bus()
+ logger = SimpleLogger()
+ event_bus.subscribe(logger)
+
+ # Publish some events
+ for i in range(3):
+ event = Event(
+ event_type=EventType.TASK_STARTED,
+ source_id="test_publisher",
+ timestamp=time.time(),
+ data={"iteration": i}
+ )
+ await event_bus.publish_event(event)
+ await asyncio.sleep(0.1)
+
+ # Clean up
+ event_bus.unsubscribe(logger)
+
+asyncio.run(main())
+```
+
+### Example 2: Type-Specific Subscription
+
+```python
+from galaxy.core.events import (
+    get_event_bus, Event, TaskEvent, ConstellationEvent,
+    EventType, IEventObserver
+)
+
+class TaskOnlyObserver(IEventObserver):
+ """Observer that only handles task events."""
+
+ def __init__(self):
+ self.task_count = 0
+ self.completed_tasks = []
+
+ async def on_event(self, event: Event) -> None:
+ if isinstance(event, TaskEvent):
+ self.task_count += 1
+
+ if event.event_type == EventType.TASK_COMPLETED:
+ self.completed_tasks.append(event.task_id)
+ print(f"Task {event.task_id} completed. "
+ f"Total: {len(self.completed_tasks)}")
+
+# Subscribe only to task events
+observer = TaskOnlyObserver()
+event_bus = get_event_bus()
+event_bus.subscribe(observer, {
+ EventType.TASK_STARTED,
+ EventType.TASK_COMPLETED,
+ EventType.TASK_FAILED
+})
+```
+
+### Example 3: Custom Metrics Collection
+
+```python
+from typing import Dict, List
+from galaxy.core.events import (
+    Event, TaskEvent, ConstellationEvent, EventType, IEventObserver
+)
+
+class CustomMetricsCollector(IEventObserver):
+ """Collect custom domain-specific metrics."""
+
+ def __init__(self):
+ self.task_durations: Dict[str, float] = {}
+ self.task_start_times: Dict[str, float] = {}
+ self.constellation_tasks: Dict[str, List[str]] = {}
+
+ async def on_event(self, event: Event) -> None:
+ if isinstance(event, TaskEvent):
+ await self._handle_task_event(event)
+ elif isinstance(event, ConstellationEvent):
+ await self._handle_constellation_event(event)
+
+ async def _handle_task_event(self, event: TaskEvent) -> None:
+ if event.event_type == EventType.TASK_STARTED:
+ self.task_start_times[event.task_id] = event.timestamp
+
+ elif event.event_type == EventType.TASK_COMPLETED:
+ if event.task_id in self.task_start_times:
+ duration = event.timestamp - self.task_start_times[event.task_id]
+ self.task_durations[event.task_id] = duration
+
+ async def _handle_constellation_event(self, event: ConstellationEvent) -> None:
+ if event.event_type == EventType.CONSTELLATION_STARTED:
+ const_id = event.constellation_id
+ self.constellation_tasks[const_id] = []
+
+ def get_average_duration(self) -> float:
+ """Calculate average task duration."""
+ if not self.task_durations:
+ return 0.0
+ return sum(self.task_durations.values()) / len(self.task_durations)
+
+ def get_slowest_tasks(self, n: int = 5) -> List[tuple]:
+ """Get the n slowest tasks."""
+ sorted_tasks = sorted(
+ self.task_durations.items(),
+ key=lambda x: x[1],
+ reverse=True
+ )
+ return sorted_tasks[:n]
+```
+
+---
+
+## ⚙️ Implementation Details
+
+### Internal Observer Storage
+
+The EventBus maintains two internal data structures:
+
+```python
+class EventBus(IEventPublisher):
+ def __init__(self):
+ # Type-specific observers: EventType -> Set[IEventObserver]
+ self._observers: Dict[EventType, Set[IEventObserver]] = {}
+
+ # Observers subscribed to all events
+ self._all_observers: Set[IEventObserver] = set()
+```
+
+**Storage Strategy:**
+
+| Subscription Type | Storage | Lookup Time | Use Case |
+|-------------------|---------|-------------|----------|
+| All events | `_all_observers` set | O(1) | General monitoring |
+| Specific types | `_observers` dict | O(1) | Targeted handling |
+
+### Concurrent Notification Logic
+
+When an event is published, the bus:
+
+1. **Collects relevant observers**: Combines type-specific and all-event observers
+2. **Creates async tasks**: One task per observer
+3. **Executes concurrently**: Uses `asyncio.gather()` with `return_exceptions=True`
+4. **Isolates errors**: Exceptions don't propagate to other observers
+
+```python
+async def publish_event(self, event: Event) -> None:
+ observers_to_notify: Set[IEventObserver] = set()
+
+ # Add type-specific observers
+ if event.event_type in self._observers:
+ observers_to_notify.update(self._observers[event.event_type])
+
+ # Add wildcard observers
+ observers_to_notify.update(self._all_observers)
+
+ # Notify concurrently
+ if observers_to_notify:
+ tasks = [observer.on_event(event) for observer in observers_to_notify]
+ await asyncio.gather(*tasks, return_exceptions=True)
+```
+
+---
+
+## 🎯 Best Practices
+
+### 1. Use Type-Specific Subscriptions
+
+Subscribe only to events you care about:
+
+```python
+# ❌ Bad: Receives all events, must filter manually
+event_bus.subscribe(observer)
+
+# ✅ Good: Receives only relevant events
+event_bus.subscribe(observer, {
+ EventType.TASK_COMPLETED,
+ EventType.CONSTELLATION_MODIFIED
+})
+```
+
+### 2. Handle Errors Gracefully
+
+Always catch exceptions in observer implementations:
+
+```python
+class RobustObserver(IEventObserver):
+ async def on_event(self, event: Event) -> None:
+ try:
+ await self._process_event(event)
+ except Exception as e:
+ self.logger.error(f"Error processing event: {e}")
+ # Don't re-raise - other observers should continue
+```
+
+### 3. Clean Up Subscriptions
+
+Unsubscribe observers when done to prevent memory leaks:
+
+```python
+class SessionManager:
+    def __init__(self):
+        self.observers = []
+
+    def setup_observers(self):
+        # Create and subscribe observers
+        event_bus = get_event_bus()
+        observer = MyObserver()
+        event_bus.subscribe(observer)
+        self.observers.append(observer)
+
+    def cleanup(self):
+        # Unsubscribe all observers
+        event_bus = get_event_bus()
+        for observer in self.observers:
+            event_bus.unsubscribe(observer)
+        self.observers.clear()
+```
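
One way to make the cleanup hard to forget is a context manager that pairs `subscribe()` with `unsubscribe()`. The sketch below is an assumption-laden illustration: `StubBus` and `subscribed` are hypothetical names, and the stub only mimics the two calls documented above.

```python
from contextlib import contextmanager
from typing import Iterator, Set


class StubBus:
    """Hypothetical stand-in exposing the same subscribe/unsubscribe calls."""

    def __init__(self) -> None:
        self.observers: Set[object] = set()

    def subscribe(self, observer: object) -> None:
        self.observers.add(observer)

    def unsubscribe(self, observer: object) -> None:
        self.observers.discard(observer)


@contextmanager
def subscribed(bus: StubBus, observer: object) -> Iterator[object]:
    bus.subscribe(observer)
    try:
        yield observer
    finally:
        # Runs on normal exit and on exceptions alike.
        bus.unsubscribe(observer)


bus = StubBus()
observer = object()
inside = None
try:
    with subscribed(bus, observer):
        inside = observer in bus.observers
        raise RuntimeError("simulated failure during the session")
except RuntimeError:
    pass

print(inside, observer in bus.observers)
```

The observer is registered inside the `with` block and removed afterwards, even though the block raised.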
+
+### 4. Use Type Guards
+
+Leverage Python's type system for safer event handling:
+
+```python
+async def on_event(self, event: Event) -> None:
+    if isinstance(event, TaskEvent):
+        # isinstance() narrows the type, so no cast() is needed here
+        task_id = event.task_id  # Type-safe access
+        status = event.status
+```
+
+---
+
+## 🔗 Related Documentation
+
+- **[Observer System Overview](overview.md)** — High-level architecture and design
+- **[Session Metrics Observer](metrics_observer.md)** — Performance metrics collection
+
+!!! note "Additional Observer Documentation"
+ For documentation on `ConstellationProgressObserver`, `DAGVisualizationObserver`, `ConstellationModificationSynchronizer`, and `AgentOutputObserver`, refer to their implementation in `galaxy/session/observers/`.
+
+---
+
+## 📋 Summary
+
+The Event System Core provides:
+
+- **EventBus**: Singleton message broker for system-wide communication
+- **EventType**: Enumeration of all system events
+- **Event Classes**: Type-safe event data structures
+- **Interfaces**: Clear contracts for observers and publishers
+- **Concurrent Execution**: Efficient parallel event processing
+- **Error Isolation**: Robust error handling
+
+This foundation enables the Galaxy framework to implement a loosely coupled, extensible event-driven architecture.
diff --git a/documents/docs/galaxy/observer/metrics_observer.md b/documents/docs/galaxy/observer/metrics_observer.md
new file mode 100644
index 000000000..e9d5771f5
--- /dev/null
+++ b/documents/docs/galaxy/observer/metrics_observer.md
@@ -0,0 +1,614 @@
+# Session Metrics Observer
+
+The **SessionMetricsObserver** collects comprehensive performance metrics and statistics during constellation execution. It tracks task execution times, constellation lifecycle, modifications, and computes detailed statistics for performance analysis.
+
+**Location:** `galaxy/session/observers/base_observer.py`
+
+The metrics observer is essential for evaluating Galaxy performance, identifying bottlenecks, and analyzing constellation modification patterns for research and optimization.
+
+---
+
+## 🎯 Purpose
+
+The Metrics Observer provides:
+
+1. **Performance Tracking** — Measure task and constellation execution times
+2. **Success Rate Monitoring** — Track completion and failure rates
+3. **Modification Analytics** — Monitor constellation structural changes
+4. **Statistical Summaries** — Compute aggregated metrics for analysis
+
+---
+
+## 🏗️ Architecture
+
+```mermaid
+graph TB
+ subgraph "Event Sources"
+ O[Orchestrator]
+ A[Agent]
+ end
+
+ subgraph "Event System"
+ EB[EventBus]
+ end
+
+ subgraph "Metrics Observer"
+ SMO[SessionMetricsObserver]
+ TE[Task Events Handler]
+ CE[Constellation Events Handler]
+ MS[Metrics Storage]
+ SC[Statistics Computer]
+ end
+
+ subgraph "Outputs"
+ R[result.json]
+ L[Logs]
+ end
+
+ O -->|task events| EB
+ A -->|constellation events| EB
+ EB -->|notify| SMO
+
+ SMO --> TE
+ SMO --> CE
+ TE --> MS
+ CE --> MS
+ MS --> SC
+ SC --> R
+ SC --> L
+
+ style SMO fill:#66bb6a,stroke:#333,stroke-width:3px
+ style MS fill:#fff4e1,stroke:#333,stroke-width:2px
+ style SC fill:#ffa726,stroke:#333,stroke-width:2px
+ style EB fill:#4a90e2,stroke:#333,stroke-width:2px,color:#fff
+```
+
+---
+
+## 📊 Metrics Collected
+
+The observer collects metrics across three categories:
+
+### Task Metrics
+
+Track individual task execution:
+
+| Metric | Description | Update Mode |
+|--------|-------------|----------|
+| **task_count** | Total number of tasks started | Real-time |
+| **completed_tasks** | Number of successfully completed tasks | Real-time |
+| **failed_tasks** | Number of failed tasks | Real-time |
+| **total_execution_time** | Sum of all task execution times | Real-time |
+| **task_timings** | Dict mapping task_id → {start, end, duration} | Real-time |
+| **success_rate** | completed / total tasks | Computed |
+| **failure_rate** | failed / total tasks | Computed |
+| **average_task_duration** | Average execution time per task | Computed |
+| **min_task_duration** | Fastest task execution time | Computed |
+| **max_task_duration** | Slowest task execution time | Computed |
+
+### Constellation Metrics
+
+Monitor constellation lifecycle:
+
+| Metric | Description | Update Mode |
+|--------|-------------|----------|
+| **constellation_count** | Total constellations processed | Real-time |
+| **completed_constellations** | Successfully completed constellations | Real-time |
+| **failed_constellations** | Failed constellations | Real-time |
+| **total_constellation_time** | Total constellation execution time | Real-time |
+| **constellation_timings** | Dict mapping constellation_id → timing data | Real-time |
+| **constellation_success_rate** | completed / total constellations | Computed |
+| **average_constellation_duration** | Average constellation execution time | Computed |
+| **min_constellation_duration** | Fastest constellation | Computed |
+| **max_constellation_duration** | Slowest constellation | Computed |
+| **average_tasks_per_constellation** | Average number of tasks | Computed |
+
+### Modification Metrics
+
+Track constellation structural changes:
+
+| Metric | Description | Update Mode |
+|--------|-------------|----------|
+| **constellation_modifications** | Dict mapping constellation_id → modification list | Real-time |
+| **total_modifications** | Total number of modifications | Computed |
+| **constellations_modified** | Number of constellations with modifications | Computed |
+| **average_modifications_per_constellation** | Average modifications per constellation | Computed |
+| **max_modifications_for_single_constellation** | Highest modification count for any single constellation | Computed |
+| **most_modified_constellation** | ID of most-modified constellation | Computed |
+| **modification_types_breakdown** | Count by modification type | Computed |
+
+---
+
+## 💻 Implementation
+
+### Initialization
+
+```python
+from galaxy.session.observers import SessionMetricsObserver
+import logging
+
+# Create metrics observer
+metrics_observer = SessionMetricsObserver(
+ session_id="galaxy_session_20231113",
+ logger=logging.getLogger(__name__)
+)
+
+# Subscribe to event bus
+from galaxy.core.events import get_event_bus
+event_bus = get_event_bus()
+event_bus.subscribe(metrics_observer)
+```
+
+**Constructor Parameters:**
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `session_id` | `str` | Yes | Unique identifier for the session |
+| `logger` | `logging.Logger` | No | Logger instance (creates default if None) |
+
+### Internal Metrics Structure
+
+The observer maintains a comprehensive metrics dictionary:
+
+```python
+self.metrics: Dict[str, Any] = {
+ "session_id": session_id,
+
+ # Task metrics
+ "task_count": 0,
+ "completed_tasks": 0,
+ "failed_tasks": 0,
+ "total_execution_time": 0.0,
+ "task_timings": {}, # task_id -> {start, end, duration}
+
+ # Constellation metrics
+ "constellation_count": 0,
+ "completed_constellations": 0,
+ "failed_constellations": 0,
+ "total_constellation_time": 0.0,
+ "constellation_timings": {}, # constellation_id -> timing data
+
+ # Modification tracking
+ "constellation_modifications": {} # constellation_id -> [modifications]
+}
+```
+
+---
+
+## 🔄 Event Processing
+
+### Task Event Handling
+
+The observer tracks task lifecycle events:
+
+```mermaid
+sequenceDiagram
+ participant O as Orchestrator
+ participant EB as EventBus
+ participant MO as MetricsObserver
+ participant MS as Metrics Storage
+
+ O->>EB: TASK_STARTED
+ EB->>MO: on_event(event)
+    MO->>MS: Increment task_count<br/>Record start_time
+
+ Note over O: Task executes
+
+ O->>EB: TASK_COMPLETED
+ EB->>MO: on_event(event)
+    MO->>MS: Increment completed_tasks<br/>Calculate duration<br/>Update total_execution_time
+```
+
+**Processing Logic:**
+
+```python
+def _handle_task_started(self, event: TaskEvent) -> None:
+ """Handle TASK_STARTED event."""
+ self.metrics["task_count"] += 1
+ self.metrics["task_timings"][event.task_id] = {
+ "start": event.timestamp
+ }
+
+def _handle_task_completed(self, event: TaskEvent) -> None:
+ """Handle TASK_COMPLETED event."""
+ self.metrics["completed_tasks"] += 1
+
+ if event.task_id in self.metrics["task_timings"]:
+ duration = (
+ event.timestamp -
+ self.metrics["task_timings"][event.task_id]["start"]
+ )
+ self.metrics["task_timings"][event.task_id]["duration"] = duration
+ self.metrics["task_timings"][event.task_id]["end"] = event.timestamp
+ self.metrics["total_execution_time"] += duration
+
+def _handle_task_failed(self, event: TaskEvent) -> None:
+ """Handle TASK_FAILED event."""
+ self.metrics["failed_tasks"] += 1
+ # Also calculate duration for failed tasks
+ if event.task_id in self.metrics["task_timings"]:
+ duration = (
+ event.timestamp -
+ self.metrics["task_timings"][event.task_id]["start"]
+ )
+ self.metrics["task_timings"][event.task_id]["duration"] = duration
+ self.metrics["total_execution_time"] += duration
+```
+
+### Constellation Event Handling
+
+Tracks constellation lifecycle and modifications:
+
+```python
+def _handle_constellation_started(self, event: ConstellationEvent) -> None:
+ """Handle CONSTELLATION_STARTED event."""
+ self.metrics["constellation_count"] += 1
+ constellation_id = event.constellation_id
+ constellation = event.data.get("constellation")
+
+ # Store initial statistics
+ self.metrics["constellation_timings"][constellation_id] = {
+ "start_time": event.timestamp,
+ "initial_statistics": (
+ constellation.get_statistics() if constellation else {}
+ ),
+ "processing_start_time": event.data.get("processing_start_time"),
+ "processing_end_time": event.data.get("processing_end_time"),
+ "processing_duration": event.data.get("processing_duration"),
+ }
+
+def _handle_constellation_completed(self, event: ConstellationEvent) -> None:
+ """Handle CONSTELLATION_COMPLETED event."""
+ self.metrics["completed_constellations"] += 1
+ constellation_id = event.constellation_id
+ constellation = event.data.get("constellation")
+
+ # Calculate duration and store final statistics
+ duration = (
+ event.timestamp -
+ self.metrics["constellation_timings"][constellation_id]["start_time"]
+ if constellation_id in self.metrics["constellation_timings"]
+ else None
+ )
+
+ if constellation_id in self.metrics["constellation_timings"]:
+ self.metrics["constellation_timings"][constellation_id].update({
+ "end_time": event.timestamp,
+ "duration": duration,
+ "final_statistics": (
+ constellation.get_statistics() if constellation else {}
+ ),
+ })
+```
+
+### Modification Tracking
+
+Tracks constellation structural changes with detailed change detection:
+
+```python
+def _handle_constellation_modified(self, event: ConstellationEvent) -> None:
+ """Handle CONSTELLATION_MODIFIED event."""
+ constellation_id = event.constellation_id
+
+ # Initialize modifications list if needed
+ if constellation_id not in self.metrics["constellation_modifications"]:
+ self.metrics["constellation_modifications"][constellation_id] = []
+
+ if hasattr(event, "data") and event.data:
+ old_constellation = event.data.get("old_constellation")
+ new_constellation = event.data.get("new_constellation")
+
+ # Calculate changes using VisualizationChangeDetector
+ changes = None
+ if old_constellation and new_constellation:
+ changes = VisualizationChangeDetector.calculate_constellation_changes(
+ old_constellation, new_constellation
+ )
+
+ # Store modification record
+ modification_record = {
+ "timestamp": event.timestamp,
+ "modification_type": event.data.get("modification_type", "unknown"),
+ "on_task_id": event.data.get("on_task_id", []),
+ "changes": changes,
+ "new_statistics": (
+ new_constellation.get_statistics() if new_constellation else {}
+ ),
+ "processing_start_time": event.data.get("processing_start_time"),
+ "processing_end_time": event.data.get("processing_end_time"),
+ "processing_duration": event.data.get("processing_duration"),
+ }
+
+ self.metrics["constellation_modifications"][constellation_id].append(
+ modification_record
+ )
+```
+
+---
+
+## 📖 API Reference
+
+### Constructor
+
+```python
+def __init__(self, session_id: str, logger: Optional[logging.Logger] = None)
+```
+
+Initialize the metrics observer.
+
+**Parameters:**
+
+- `session_id` — Unique identifier for the session
+- `logger` — Optional logger instance (creates default if None)
+
+### get_metrics()
+
+```python
+def get_metrics(self) -> Dict[str, Any]
+```
+
+Get collected metrics with computed statistics.
+
+**Returns:**
+
+Dictionary containing:
+- All raw metrics (counts, timings, etc.)
+- `task_statistics` — Computed task metrics
+- `constellation_statistics` — Computed constellation metrics
+- `modification_statistics` — Computed modification metrics
+
+**Example:**
+
+```python
+# After constellation execution
+metrics = metrics_observer.get_metrics()
+
+# Access task statistics
+print(f"Total tasks: {metrics['task_statistics']['total_tasks']}")
+print(f"Success rate: {metrics['task_statistics']['success_rate']:.2%}")
+print(f"Avg duration: {metrics['task_statistics']['average_task_duration']:.2f}s")
+
+# Access constellation statistics
+print(f"Total constellations: {metrics['constellation_statistics']['total_constellations']}")
+print(f"Avg tasks per constellation: {metrics['constellation_statistics']['average_tasks_per_constellation']:.1f}")
+
+# Access modification statistics
+print(f"Total modifications: {metrics['modification_statistics']['total_modifications']}")
+print(f"Modification types: {metrics['modification_statistics']['modification_types_breakdown']}")
+```
+
+---
+
+## 📊 Computed Statistics
+
+The observer computes three categories of statistics:
+
+### Task Statistics
+
+```python
+{
+ "total_tasks": 10,
+ "completed_tasks": 8,
+ "failed_tasks": 2,
+ "success_rate": 0.8,
+ "failure_rate": 0.2,
+ "average_task_duration": 2.5,
+ "min_task_duration": 0.5,
+ "max_task_duration": 5.2,
+ "total_task_execution_time": 25.0
+}
+```
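
As a rough illustration of how such values could be derived from the raw counters, consider the sketch below. `compute_task_statistics` is a hypothetical helper, and the zero fallbacks for empty inputs are assumptions. The observer's own computation may handle edge cases differently.

```python
from typing import Any, Dict


def compute_task_statistics(metrics: Dict[str, Any]) -> Dict[str, Any]:
    """Derive summary values from raw counters (hypothetical helper)."""
    total = metrics["task_count"]
    durations = [
        timing["duration"]
        for timing in metrics["task_timings"].values()
        if "duration" in timing  # tasks still running have no duration yet
    ]
    return {
        "total_tasks": total,
        "completed_tasks": metrics["completed_tasks"],
        "failed_tasks": metrics["failed_tasks"],
        # Guard against division by zero when no task has started.
        "success_rate": metrics["completed_tasks"] / total if total else 0.0,
        "failure_rate": metrics["failed_tasks"] / total if total else 0.0,
        "average_task_duration": sum(durations) / len(durations) if durations else 0.0,
        "min_task_duration": min(durations) if durations else 0.0,
        "max_task_duration": max(durations) if durations else 0.0,
        "total_task_execution_time": metrics["total_execution_time"],
    }


raw = {
    "task_count": 4,
    "completed_tasks": 3,
    "failed_tasks": 1,
    "total_execution_time": 8.0,
    "task_timings": {
        "t1": {"start": 0.0, "end": 1.0, "duration": 1.0},
        "t2": {"start": 0.0, "end": 2.0, "duration": 2.0},
        "t3": {"start": 1.0, "end": 4.0, "duration": 3.0},
        "t4": {"start": 2.0, "duration": 2.0},  # failed task: no "end" recorded
    },
}
stats = compute_task_statistics(raw)
print(stats["success_rate"], stats["average_task_duration"])
```

Note that the failed task still contributes a duration, matching the `_handle_task_failed` logic shown earlier.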
+
+### Constellation Statistics
+
+```python
+{
+ "total_constellations": 1,
+ "completed_constellations": 1,
+ "failed_constellations": 0,
+ "success_rate": 1.0,
+ "average_constellation_duration": 30.5,
+ "min_constellation_duration": 30.5,
+ "max_constellation_duration": 30.5,
+ "total_constellation_time": 30.5,
+ "average_tasks_per_constellation": 10.0
+}
+```
+
+### Modification Statistics
+
+```python
+{
+ "total_modifications": 3,
+ "constellations_modified": 1,
+ "average_modifications_per_constellation": 3.0,
+ "max_modifications_for_single_constellation": 3,
+ "most_modified_constellation": "const_123",
+ "modifications_per_constellation": {
+ "const_123": 3
+ },
+ "modification_types_breakdown": {
+ "add_tasks": 2,
+ "modify_dependencies": 1
+ }
+}
+```
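
A summary of this shape could be produced from the `constellation_modifications` mapping roughly as follows. This is a sketch under assumptions: `compute_modification_statistics` is a hypothetical helper, and averaging over modified constellations only is a guess at the intended definition.

```python
from collections import Counter
from typing import Any, Dict, List


def compute_modification_statistics(
    modifications: Dict[str, List[Dict[str, Any]]]
) -> Dict[str, Any]:
    """Summarize modification records per constellation (hypothetical helper)."""
    per_constellation = {cid: len(mods) for cid, mods in modifications.items()}
    total = sum(per_constellation.values())
    modified = sum(1 for count in per_constellation.values() if count > 0)
    most_modified = (
        max(per_constellation, key=per_constellation.get) if modified else None
    )
    # Count records by their "modification_type" field.
    type_counts = Counter(
        mod.get("modification_type", "unknown")
        for mods in modifications.values()
        for mod in mods
    )
    return {
        "total_modifications": total,
        "constellations_modified": modified,
        "average_modifications_per_constellation": (
            total / modified if modified else 0.0
        ),
        "max_modifications_for_single_constellation": max(
            per_constellation.values(), default=0
        ),
        "most_modified_constellation": most_modified,
        "modifications_per_constellation": per_constellation,
        "modification_types_breakdown": dict(type_counts),
    }


sample = {
    "const_123": [
        {"modification_type": "add_tasks"},
        {"modification_type": "add_tasks"},
        {"modification_type": "modify_dependencies"},
    ]
}
stats = compute_modification_statistics(sample)
print(stats)
```

With the sample input, the result reproduces the figures in the example output above.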
+
+---
+
+## 🔍 Usage Examples
+
+### Example 1: Basic Metrics Collection
+
+```python
+import asyncio
+from galaxy.core.events import get_event_bus
+from galaxy.session.observers import SessionMetricsObserver
+
+async def collect_metrics():
+ """Collect and display metrics for constellation execution."""
+
+ # Create and subscribe metrics observer
+ metrics_observer = SessionMetricsObserver(session_id="demo_session")
+ event_bus = get_event_bus()
+ event_bus.subscribe(metrics_observer)
+
+ # Execute constellation (orchestrator will publish events)
+ await orchestrator.execute_constellation(constellation)
+
+ # Retrieve metrics
+ metrics = metrics_observer.get_metrics()
+
+ # Display summary
+ print("\n=== Execution Summary ===")
+ print(f"Session: {metrics['session_id']}")
+ print(f"Tasks: {metrics['task_count']} total, "
+ f"{metrics['completed_tasks']} completed, "
+ f"{metrics['failed_tasks']} failed")
+ print(f"Total execution time: {metrics['total_execution_time']:.2f}s")
+
+ # Display task statistics
+ task_stats = metrics['task_statistics']
+ print(f"\nTask Success Rate: {task_stats['success_rate']:.1%}")
+ print(f"Average Task Duration: {task_stats['average_task_duration']:.2f}s")
+ print(f"Fastest Task: {task_stats['min_task_duration']:.2f}s")
+ print(f"Slowest Task: {task_stats['max_task_duration']:.2f}s")
+
+ # Clean up
+ event_bus.unsubscribe(metrics_observer)
+
+asyncio.run(collect_metrics())
+```
+
+### Example 2: Performance Analysis
+
+```python
+def analyze_performance(metrics_observer: SessionMetricsObserver):
+ """Analyze performance metrics and identify bottlenecks."""
+
+ metrics = metrics_observer.get_metrics()
+ task_timings = metrics['task_timings']
+
+ # Find slowest tasks
+ sorted_tasks = sorted(
+ task_timings.items(),
+ key=lambda x: x[1].get('duration', 0),
+ reverse=True
+ )
+
+ print("\n=== Top 5 Slowest Tasks ===")
+ for task_id, timing in sorted_tasks[:5]:
+ duration = timing.get('duration', 0)
+ print(f"{task_id}: {duration:.2f}s")
+
+ # Analyze modification patterns
+ mod_stats = metrics['modification_statistics']
+ if mod_stats['total_modifications'] > 0:
+ print(f"\n=== Modification Analysis ===")
+ print(f"Total Modifications: {mod_stats['total_modifications']}")
+ print(f"Average per Constellation: "
+ f"{mod_stats['average_modifications_per_constellation']:.1f}")
+ print(f"Most Modified: {mod_stats['most_modified_constellation']}")
+ print("\nModification Types:")
+ for mod_type, count in mod_stats['modification_types_breakdown'].items():
+ print(f" {mod_type}: {count}")
+```
+
+### Example 3: Export Metrics to JSON
+
+```python
+import json
+from pathlib import Path
+
+def export_metrics(metrics_observer: SessionMetricsObserver, output_path: str):
+ """Export metrics to JSON file for analysis."""
+
+ metrics = metrics_observer.get_metrics()
+
+ # Convert to JSON-serializable format
+ output_data = {
+ "session_id": metrics["session_id"],
+ "task_statistics": metrics["task_statistics"],
+ "constellation_statistics": metrics["constellation_statistics"],
+ "modification_statistics": metrics["modification_statistics"],
+ "raw_metrics": {
+ "task_count": metrics["task_count"],
+ "completed_tasks": metrics["completed_tasks"],
+ "failed_tasks": metrics["failed_tasks"],
+ "total_execution_time": metrics["total_execution_time"],
+ "constellation_count": metrics["constellation_count"],
+ }
+ }
+
+ # Write to file
+ output_file = Path(output_path)
+ output_file.parent.mkdir(parents=True, exist_ok=True)
+
+ with open(output_file, 'w') as f:
+ json.dump(output_data, f, indent=2)
+
+ print(f"Metrics exported to: {output_file}")
+```
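
If the exported dictionary ever contains values that `json` cannot encode natively, `json.dump` raises `TypeError`. Whether the real metrics contain such values is an assumption here. One hedge is a `default=` fallback, sketched below with a hypothetical `to_jsonable` helper and made-up sample data:

```python
import json
from datetime import datetime, timezone
from typing import Any


def to_jsonable(value: Any) -> Any:
    """Fallback used by json.dumps for values it cannot encode natively."""
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, (set, frozenset)):
        return sorted(value)
    # Last resort: keep a readable representation instead of failing.
    return repr(value)


record = {
    "timestamp": datetime(2025, 1, 1, tzinfo=timezone.utc),
    "on_task_id": {"task_2", "task_1"},
    "duration": 2.5,
}
encoded = json.dumps(record, default=to_jsonable, indent=2)
print(encoded)
```

The same `default=to_jsonable` argument can be passed to `json.dump` when writing to a file.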
+
+---
+
+## 🎓 Best Practices
+
+### 1. Session ID Naming
+
+Use descriptive session IDs for easier analysis:
+
+```python
+# ✅ Good: Descriptive session ID
+session_id = f"galaxy_session_{task_type}_{timestamp}"
+
+# ❌ Bad: Generic session ID
+session_id = "session_1"
+```
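
The snippet above leaves `task_type` and `timestamp` undefined. A minimal sketch of one way to build them follows. `make_session_id` is a hypothetical helper, and the timestamp layout is just one sortable choice:

```python
from datetime import datetime, timezone


def make_session_id(task_type: str, now: datetime) -> str:
    """Build a sortable, descriptive session ID (hypothetical helper)."""
    timestamp = now.strftime("%Y%m%dT%H%M%SZ")
    # Replace spaces so the ID stays safe to use in file names.
    safe_task_type = task_type.strip().lower().replace(" ", "_")
    return f"galaxy_session_{safe_task_type}_{timestamp}"


session_id = make_session_id(
    "Log Analysis", datetime(2025, 4, 19, 9, 30, 0, tzinfo=timezone.utc)
)
print(session_id)
```

In real use, `datetime.now(timezone.utc)` would replace the fixed date used here for reproducibility.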
+
+### 2. Metrics Export
+
+Export metrics immediately after execution:
+
+```python
+try:
+    await orchestrator.execute_constellation(constellation)
+finally:
+    # Always export metrics, even if execution failed
+    export_metrics(metrics_observer, "results/metrics.json")
+```
+
+### 3. Memory Management
+
+Clear large timing dictionaries for long-running sessions:
+
+```python
+# After processing metrics
+metrics_observer.metrics["task_timings"].clear()
+metrics_observer.metrics["constellation_timings"].clear()
+```
+
+---
+
+## 🔗 Related Documentation
+
+- **[Observer System Overview](overview.md)** — Architecture and design
+- **[Event System Core](event_system.md)** — Event types and EventBus
+
+!!! note "Additional Resources"
+ For information on constellation execution and orchestration, see the constellation orchestrator documentation in `galaxy/constellation/orchestrator/`.
+
+---
+
+## 📋 Summary
+
+The Session Metrics Observer:
+
+- **Collects** comprehensive performance metrics
+- **Tracks** task and constellation execution times
+- **Monitors** modification patterns
+- **Computes** statistical summaries
+- **Exports** data for analysis
+
+This observer is essential for performance evaluation, bottleneck identification, and research analysis of Galaxy's constellation execution.
diff --git a/documents/docs/galaxy/observer/overview.md b/documents/docs/galaxy/observer/overview.md
new file mode 100644
index 000000000..2d14c0d68
--- /dev/null
+++ b/documents/docs/galaxy/observer/overview.md
@@ -0,0 +1,405 @@
+# Observer System — Overview
+
+The **Observer System** in UFO Galaxy implements an event-driven architecture for real-time monitoring, visualization, and coordination of constellation execution. Components react to system events without depending directly on the components that emit them.
+
+The system follows the classic **Observer Pattern** in its publish-subscribe form: producers publish to a central event bus and never reference their consumers. New observers can therefore be added without modifying existing code.
+
+---
+
+## 🎯 Purpose and Design Goals
+
+The observer system serves several critical functions in the Galaxy framework:
+
+1. **Real-time Monitoring** — Track task execution, constellation lifecycle, and system events
+2. **Visualization** — Provide live updates for DAG topology and execution progress
+3. **Metrics Collection** — Gather performance statistics and execution data
+4. **Synchronization** — Coordinate between agent modifications and orchestrator execution
+5. **Agent Output Handling** — Display agent responses and actions in real-time
+
+---
+
+## 🏗️ Architecture Overview
+
+The observer system consists of four main layers:
+
+```mermaid
+graph TB
+ subgraph "Event Publishers"
+ A1[Orchestrator]
+ A2[Agent]
+ A3[Device Manager]
+ end
+
+ subgraph "Event Bus Layer"
+ B[EventBus Singleton]
+ end
+
+ subgraph "Observer Layer"
+ C1[ConstellationProgressObserver]
+ C2[SessionMetricsObserver]
+ C3[DAGVisualizationObserver]
+ C4[ConstellationModificationSynchronizer]
+ C5[AgentOutputObserver]
+ end
+
+ subgraph "Handler Layer"
+ D1[TaskVisualizationHandler]
+ D2[ConstellationVisualizationHandler]
+ end
+
+ A1 -->|publish events| B
+ A2 -->|publish events| B
+ A3 -->|publish events| B
+
+ B -->|notify| C1
+ B -->|notify| C2
+ B -->|notify| C3
+ B -->|notify| C4
+ B -->|notify| C5
+
+ C3 -->|delegate| D1
+ C3 -->|delegate| D2
+
+ style B fill:#4a90e2,stroke:#333,stroke-width:3px,color:#fff
+ style C1 fill:#66bb6a,stroke:#333,stroke-width:2px
+ style C2 fill:#66bb6a,stroke:#333,stroke-width:2px
+ style C3 fill:#66bb6a,stroke:#333,stroke-width:2px
+ style C4 fill:#ffa726,stroke:#333,stroke-width:2px
+ style C5 fill:#66bb6a,stroke:#333,stroke-width:2px
+```
+
+**Architecture Layers:**
+
+| Layer | Component | Responsibility |
+|-------|-----------|----------------|
+| **Event Publishers** | Orchestrator, Agent, Device Manager | Generate events during system operation |
+| **Event Bus** | `EventBus` singleton | Central message broker, manages subscriptions and routing |
+| **Observers** | 5 specialized observers | React to specific event types and perform actions |
+| **Handlers** | Task & Constellation handlers | Delegate visualization logic for specific components |
+
+---
+
+## 📊 Core Components
+
+### Event System Core
+
+The foundation of the observer system consists of:
+
+| Component | Location | Description |
+|-----------|----------|-------------|
+| **EventBus** | `galaxy/core/events.py` | Central message broker managing subscriptions |
+| **EventType** | `galaxy/core/events.py` | Enumeration of all system event types |
+| **Event Classes** | `galaxy/core/events.py` | Base (`Event`) and specialized (`TaskEvent`, `ConstellationEvent`, `AgentEvent`, `DeviceEvent`) event data structures |
+| **Interfaces** | `galaxy/core/events.py` | `IEventObserver`, `IEventPublisher` contracts |
+
+For detailed documentation of the event system core components, see the **[Event System Core](event_system.md)** documentation.
+
+### Observer Implementations
+
+Five specialized observers handle different aspects of system monitoring:
+
+| Observer | File Location | Primary Role | Key Features |
+|----------|---------------|--------------|--------------|
+| **ConstellationProgressObserver** | `galaxy/session/observers/base_observer.py` | Task progress tracking | Queues completion events for agent, coordinates task lifecycle |
+| **SessionMetricsObserver** | `galaxy/session/observers/base_observer.py` | Performance metrics | Collects timing, success rates, modification statistics |
+| **DAGVisualizationObserver** | `galaxy/session/observers/dag_visualization_observer.py` | Real-time visualization | Displays constellation topology and execution flow |
+| **ConstellationModificationSynchronizer** | `galaxy/session/observers/constellation_sync_observer.py` | Modification coordination | Prevents race conditions between agent and orchestrator |
+| **AgentOutputObserver** | `galaxy/session/observers/agent_output_observer.py` | Agent interaction display | Shows agent responses and actions in real-time |
+
+---
+
+## 🔄 Event Flow
+
+The following diagram illustrates how events flow through the system:
+
+```mermaid
+sequenceDiagram
+ participant O as Orchestrator
+ participant EB as EventBus
+ participant CPO as ProgressObserver
+ participant SMO as MetricsObserver
+ participant DVO as VisualizationObserver
+ participant A as Agent
+
+ O->>EB: publish(TASK_STARTED)
+ EB->>CPO: on_event(event)
+ EB->>SMO: on_event(event)
+ EB->>DVO: on_event(event)
+
+ Note over DVO: Display task start
+ Note over SMO: Increment task count
+
+ O->>EB: publish(TASK_COMPLETED)
+ EB->>CPO: on_event(event)
+ EB->>SMO: on_event(event)
+ EB->>DVO: on_event(event)
+
+ CPO->>A: add_task_completion_event()
+ Note over A: Process result, modify constellation
+
+ A->>EB: publish(CONSTELLATION_MODIFIED)
+ EB->>SMO: on_event(event)
+ EB->>DVO: on_event(event)
+
+ Note over DVO: Display updated DAG
+ Note over SMO: Track modification
+```
+
+The event flow shows how a single action (task completion) triggers multiple observers, each performing its specialized function without interfering with the others.
+
+---
+
+## 📋 Event Types
+
+The system defines the following event types:
+
+### Task Events
+
+Track individual task execution lifecycle:
+
+| Event Type | Trigger | Data Includes |
+|------------|---------|---------------|
+| `TASK_STARTED` | Task begins execution | task_id, status, constellation_id |
+| `TASK_COMPLETED` | Task finishes successfully | task_id, result, execution_time, newly_ready_tasks |
+| `TASK_FAILED` | Task encounters error | task_id, error, retry_info |
+
+### Constellation Events
+
+Monitor constellation-level operations:
+
+| Event Type | Trigger | Data Includes |
+|------------|---------|---------------|
+| `CONSTELLATION_STARTED` | Constellation begins processing | constellation, initial_statistics, processing_time |
+| `CONSTELLATION_COMPLETED` | All tasks finished | constellation, final_statistics, execution_time |
+| `CONSTELLATION_FAILED` | Constellation encounters error | constellation, error |
+| `CONSTELLATION_MODIFIED` | Structure changed by agent | old_constellation, new_constellation, on_task_id, modification_type, changes |
+
+### Agent Events
+
+Display agent interactions:
+
+| Event Type | Trigger | Data Includes |
+|------------|---------|---------------|
+| `AGENT_RESPONSE` | Agent generates response | agent_name, agent_type, response_data |
+| `AGENT_ACTION` | Agent executes action | agent_name, action_type, actions |
+
+### Device Events
+
+Monitor device status (used by client):
+
+| Event Type | Trigger | Data Includes |
+|------------|---------|---------------|
+| `DEVICE_CONNECTED` | Device joins pool | device_id, device_status, device_info |
+| `DEVICE_DISCONNECTED` | Device leaves pool | device_id, device_status |
+| `DEVICE_STATUS_CHANGED` | Device state changes | device_id, device_status, all_devices |
+
+---
+
+## 🚀 Usage Example
+
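+The following example shows how observers are initialized and used in a Galaxy session. It assumes that `constellation_agent`, `orchestrator`, `logger`, and `constellation` already exist and that the code runs inside an `async` function: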
+
+```python
+from galaxy.core.events import get_event_bus, EventType
+from galaxy.session.observers import (
+    ConstellationProgressObserver,
+    SessionMetricsObserver,
+    DAGVisualizationObserver,
+    ConstellationModificationSynchronizer,
+    AgentOutputObserver,
+)
+
+# Get the global event bus
+event_bus = get_event_bus()
+
+# 1. Create progress observer for agent coordination
+progress_observer = ConstellationProgressObserver(agent=constellation_agent)
+event_bus.subscribe(progress_observer)
+
+# 2. Create metrics observer for performance tracking
+metrics_observer = SessionMetricsObserver(
+    session_id="my_session",
+    logger=logger,
+)
+event_bus.subscribe(metrics_observer)
+
+# 3. Create visualization observer for real-time display
+viz_observer = DAGVisualizationObserver(enable_visualization=True)
+event_bus.subscribe(viz_observer)
+
+# 4. Create synchronizer to prevent race conditions
+synchronizer = ConstellationModificationSynchronizer(
+    orchestrator=orchestrator,
+    logger=logger,
+)
+event_bus.subscribe(synchronizer)
+
+# 5. Create agent output observer for displaying interactions
+agent_output_observer = AgentOutputObserver(presenter_type="rich")
+event_bus.subscribe(agent_output_observer)
+
+# Execute constellation
+await orchestrator.execute_constellation(constellation)
+
+# Retrieve collected metrics
+metrics = metrics_observer.get_metrics()
+print(f"Tasks completed: {metrics['completed_tasks']}")
+print(f"Total execution time: {metrics['total_execution_time']:.2f}s")
+print(f"Modifications: {metrics['constellation_modifications']}")
+```
+
+---
+
+## 🔑 Key Benefits
+
+### 1. Decoupling
+
+Events decouple components — publishers don't need to know about observers:
+
+- **Orchestrator** publishes task events without knowing who's listening
+- **Agent** publishes modification events without coordinating with orchestrator
+- **New observers** can be added without changing existing code
+
+### 2. Extensibility
+
+Add custom observers for new functionality:
+
+```python
+from galaxy.core.events import Event, EventType, IEventObserver, get_event_bus
+
+class CustomMetricsObserver(IEventObserver):
+    """Custom observer for domain-specific metrics."""
+
+    def __init__(self):
+        # task_type -> list of collected metric dicts
+        self.custom_metrics = {}
+
+    async def on_event(self, event: Event) -> None:
+        if event.event_type == EventType.TASK_COMPLETED:
+            # Collect custom metrics, grouped by task type
+            task_type = event.data.get("task_type")
+            if task_type not in self.custom_metrics:
+                self.custom_metrics[task_type] = []
+
+            self.custom_metrics[task_type].append({
+                "duration": event.data.get("execution_time"),
+                "result": event.result
+            })
+
+# Subscribe to specific events
+event_bus = get_event_bus()
+custom_observer = CustomMetricsObserver()
+event_bus.subscribe(custom_observer, {EventType.TASK_COMPLETED})
+```
+
+### 3. Concurrent Execution
+
+All observers are notified concurrently using `asyncio.gather()`:
+
+- An observer that awaits I/O does not hold up the others
+- Exceptions in one observer don't affect others
+- Event handling is interleaved on a single event loop (concurrent, not multi-threaded)
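+
+A minimal sketch of this fan-out, assuming a dispatcher built on `asyncio.gather(..., return_exceptions=True)`. The names `notify_all`, `good_observer`, and `bad_observer` are hypothetical and exist only for illustration; the real `EventBus` may be implemented differently.
+
+```python
+import asyncio
+from typing import Any, Awaitable, Callable, List
+
+# Hypothetical observer type: any async callable that accepts an event.
+Observer = Callable[[Any], Awaitable[Any]]
+
+
+async def notify_all(observers: List[Observer], event: Any) -> List[Any]:
+    """Notify every observer concurrently and collect results or exceptions."""
+    # return_exceptions=True returns exceptions as results instead of
+    # raising the first one to the caller.
+    return await asyncio.gather(
+        *(observer(event) for observer in observers),
+        return_exceptions=True,
+    )
+
+
+async def good_observer(event: Any) -> str:
+    await asyncio.sleep(0)  # yield to the event loop like a real async handler
+    return f"handled {event}"
+
+
+async def bad_observer(event: Any) -> None:
+    raise RuntimeError("observer failed")
+
+
+results = asyncio.run(notify_all([bad_observer, good_observer], "TASK_COMPLETED"))
+```
+
+In this sketch the failing observer shows up as a `RuntimeError` entry in `results`, while the other observer's return value is still delivered.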
+
+### 4. Type-Safe Event Handling
+
+Specialized event classes provide type safety:
+
+```python
+async def on_event(self, event: Event) -> None:
+    if isinstance(event, TaskEvent):
+        # TaskEvent-specific handling
+        task_id = event.task_id  # Type-safe access
+        status = event.status
+
+    elif isinstance(event, ConstellationEvent):
+        # ConstellationEvent-specific handling
+        constellation_id = event.constellation_id
+        state = event.constellation_state
+```
+
+---
+
+## 📚 Component Documentation
+
+Explore detailed documentation for each observer:
+
+- **[Session Metrics Observer](metrics_observer.md)** — Performance metrics and statistics collection
+- **[Event System Core](event_system.md)** — Event bus, event types, and interfaces
+
+!!! note "Additional Observers"
+    `ConstellationProgressObserver` is covered in [Constellation Progress Observer](progress_observer.md), `ConstellationModificationSynchronizer` in [Constellation Modification Synchronizer](synchronizer.md), and `AgentOutputObserver` in [Agent Output Observer](agent_output_observer.md). Documentation for `DAGVisualizationObserver` is available in its source file.
+
+---
+
+## 🔗 Related Documentation
+
+- **[Constellation Orchestrator](../constellation_orchestrator/overview.md)** — Event publishers for task execution
+- **[Constellation Agent](../constellation_agent/overview.md)** — Event publishers for agent operations
+- **[Performance Metrics](../evaluation/performance_metrics.md)** — How metrics are collected and analyzed
+- **[Event-Driven Coordination](../constellation_orchestrator/event_driven_coordination.md)** — Deep dive into event system architecture
+
+---
+
+## 💡 Best Practices
+
+### Observer Lifecycle Management
+
+Properly manage observer subscriptions to prevent memory leaks:
+
+```python
+# Subscribe observers
+observers = [progress_observer, metrics_observer, viz_observer]
+for observer in observers:
+    event_bus.subscribe(observer)
+
+try:
+    # Execute constellation
+    await orchestrator.execute_constellation(constellation)
+finally:
+    # Clean up observers
+    for observer in observers:
+        event_bus.unsubscribe(observer)
+```
+
+### Event-Specific Subscription
+
+Subscribe only to relevant events for efficiency:
+
+```python
+# Instead of subscribing to all events
+event_bus.subscribe(observer) # Receives ALL events
+
+# Subscribe to specific event types
+event_bus.subscribe(observer, {
+    EventType.TASK_COMPLETED,
+    EventType.TASK_FAILED,
+    EventType.CONSTELLATION_MODIFIED,
+})
+```
+
+### Error Handling in Observers
+
+Always handle exceptions gracefully:
+
+```python
+async def on_event(self, event: Event) -> None:
+    try:
+        # Process event
+        await self._handle_event(event)
+    except Exception as e:
+        self.logger.error(f"Error processing event: {e}")
+        # Don't re-raise - let other observers continue
+```
+
+---
+
+## 🎓 Summary
+
+The Observer System provides a robust, event-driven foundation for monitoring and coordinating Galaxy's constellation execution:
+
+- **Event Bus** acts as central message broker
+- **5 specialized observers** handle different aspects of monitoring
+- **Loose coupling** enables extensibility and maintainability
+- **Concurrent execution** ensures efficient event processing
+- **Type-safe events** provide clear contracts and error prevention
+
+For implementation details of specific observers, refer to the individual component documentation pages linked above.
diff --git a/documents/docs/galaxy/observer/progress_observer.md b/documents/docs/galaxy/observer/progress_observer.md
new file mode 100644
index 000000000..13df76860
--- /dev/null
+++ b/documents/docs/galaxy/observer/progress_observer.md
@@ -0,0 +1,483 @@
+# Constellation Progress Observer
+
+The **ConstellationProgressObserver** is responsible for tracking task execution progress and coordinating between the orchestrator and the agent. It acts as the bridge that enables the agent to react to task completion events and make necessary constellation modifications.
+
+**Location:** `galaxy/session/observers/base_observer.py`
+
+## Purpose
+
+The Progress Observer serves two critical functions:
+
+- **Task Completion Coordination** — Queues task completion events for the agent to process
+- **Constellation Event Handling** — Notifies the agent when constellation execution completes
+
+## Architecture
+
+```mermaid
+graph TB
+ subgraph "Orchestrator Layer"
+ O[TaskConstellationOrchestrator]
+ end
+
+ subgraph "Event System"
+ EB[EventBus]
+ end
+
+ subgraph "Observer Layer"
+ CPO[ConstellationProgressObserver]
+ end
+
+ subgraph "Agent Layer"
+ A[ConstellationAgent]
+ Q[Task Completion Queue]
+ end
+
+ O -->|publish events| EB
+ EB -->|notify| CPO
+ CPO -->|queue events| Q
+ A -->|process from| Q
+
+ style CPO fill:#66bb6a,stroke:#333,stroke-width:3px
+ style Q fill:#fff4e1,stroke:#333,stroke-width:2px
+ style EB fill:#4a90e2,stroke:#333,stroke-width:2px,color:#fff
+```
+
+**Component Interaction:**
+
+| Component | Role | Communication |
+|-----------|------|---------------|
+| **Orchestrator** | Executes tasks, publishes events | → EventBus |
+| **EventBus** | Distributes events | → Progress Observer |
+| **Progress Observer** | Filters & queues relevant events | → Agent Queue |
+| **Agent** | Processes completions, modifies constellation | ← Agent Queue |
+
+## Event Handling
+
+The Progress Observer handles two types of events:
+
+### Task Events
+
+Monitors task execution lifecycle and queues completion events:
+
+```mermaid
+sequenceDiagram
+ participant O as Orchestrator
+ participant EB as EventBus
+ participant PO as ProgressObserver
+ participant Q as Agent Queue
+ participant A as Agent
+
+ O->>EB: TASK_STARTED
+ EB->>PO: on_event(event)
+ Note over PO: Store task result<br/>Log progress
+
+ O->>EB: TASK_COMPLETED
+ EB->>PO: on_event(event)
+ Note over PO: Store result<br/>Queue for agent
+ PO->>Q: add_task_completion_event()
+
+ Note over A: Agent in Continue state<br/>waiting for events
+ A->>Q: get event
+ Note over A: Process result<br/>Modify constellation
+```
+
+**Handled Event Types:**
+
+| Event Type | Action | Data Stored |
+|------------|--------|-------------|
+| `TASK_STARTED` | Store task result placeholder | task_id, status, timestamp |
+| `TASK_COMPLETED` | Store result, queue for agent | task_id, status, result, timestamp |
+| `TASK_FAILED` | Store error, queue for agent | task_id, status, error, timestamp |
+
+### Constellation Events
+
+Handles constellation lifecycle events:
+
+| Event Type | Action | Effect |
+|------------|--------|--------|
+| `CONSTELLATION_COMPLETED` | Queue completion event for agent | Wakes up agent's Continue state to process final results |
+
+## Implementation
+
+### Initialization
+
+```python
+from galaxy.session.observers import ConstellationProgressObserver
+from galaxy.agents import ConstellationAgent
+
+# Create progress observer with agent reference
+agent = ConstellationAgent(orchestrator=orchestrator)
+progress_observer = ConstellationProgressObserver(agent=agent)
+
+# Subscribe to event bus
+from galaxy.core.events import get_event_bus
+event_bus = get_event_bus()
+event_bus.subscribe(progress_observer)
+```
+
+**Constructor Parameters:**
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `agent` | `ConstellationAgent` | The agent that will process queued events |
+
+### Internal Data Structures
+
+The observer maintains:
+
+```python
+class ConstellationProgressObserver(IEventObserver):
+    def __init__(self, agent: ConstellationAgent):
+        self.agent = agent
+
+        # Task results storage: task_id -> result dict
+        self.task_results: Dict[str, Dict[str, Any]] = {}
+
+        self.logger = logging.getLogger(__name__)
+```
+
+**Task Result Structure:**
+
+```python
+{
+    "task_id": "task_123",
+    "status": "COMPLETED",       # or "FAILED"
+    "result": {...},             # Task execution result
+    "error": None,               # Exception if failed
+    "timestamp": 1234567890.123
+}
+```
+
+## Event Processing Flow
+
+### Task Event Processing
+
+```python
+async def _handle_task_event(self, event: TaskEvent) -> None:
+    """Handle task progress events and queue them for agent processing."""
+
+    try:
+        self.logger.info(
+            f"Task progress: {event.task_id} -> {event.status}. "
+            f"Event Type: {event.event_type}"
+        )
+
+        # 1. Store task result for tracking
+        self.task_results[event.task_id] = {
+            "task_id": event.task_id,
+            "status": event.status,
+            "result": event.result,
+            "error": event.error,
+            "timestamp": event.timestamp,
+        }
+
+        # 2. Queue completion/failure events for agent
+        if event.event_type in [EventType.TASK_COMPLETED, EventType.TASK_FAILED]:
+            await self.agent.add_task_completion_event(event)
+
+    except Exception as e:
+        self.logger.error(f"Error handling task event: {e}", exc_info=True)
+```
+
+**Processing Steps:**
+
+1. **Log Progress**: Record task status change
+2. **Store Result**: Update internal task_results dictionary
+3. **Queue for Agent**: If completion/failure, add to agent's queue
+4. **Error Handling**: Catch and log any exceptions
+
+### Constellation Event Processing
+
+```python
+async def _handle_constellation_event(self, event: ConstellationEvent) -> None:
+    """Handle constellation update events."""
+
+    try:
+        if event.event_type == EventType.CONSTELLATION_COMPLETED:
+            # Queue completion event for agent
+            await self.agent.add_constellation_completion_event(event)
+
+    except Exception as e:
+        self.logger.error(
+            f"Error handling constellation event: {e}",
+            exc_info=True
+        )
+```
+
+## API Reference
+
+### Constructor
+
+```python
+def __init__(self, agent: ConstellationAgent)
+```
+
+Initialize the progress observer with a reference to the agent.
+
+**Parameters:**
+
+- `agent` — `ConstellationAgent` instance that will process queued events
+
+**Example:**
+
+```python
+from galaxy.agents import ConstellationAgent
+from galaxy.session.observers import ConstellationProgressObserver
+
+agent = ConstellationAgent(orchestrator=orchestrator)
+progress_observer = ConstellationProgressObserver(agent=agent)
+```
+
+### Event Handler
+
+```python
+async def on_event(self, event: Event) -> None
+```
+
+Handle constellation-related events (TaskEvent or ConstellationEvent).
+
+**Parameters:**
+
+- `event` — Event instance (TaskEvent or ConstellationEvent)
+
+**Behavior:**
+
+- Filters events by type (TaskEvent vs ConstellationEvent)
+- Delegates to appropriate handler method
+- Logs progress and stores results
+- Queues completion events for agent
+
+## Usage Examples
+
+### Example 1: Basic Setup
+
+```python
+import asyncio
+from galaxy.core.events import get_event_bus
+from galaxy.agents import ConstellationAgent
+from galaxy.constellation import TaskConstellationOrchestrator
+from galaxy.session.observers import ConstellationProgressObserver
+
+async def setup_progress_tracking():
+    """Set up progress tracking for constellation execution."""
+
+    # Create orchestrator and agent
+    orchestrator = TaskConstellationOrchestrator()
+    agent = ConstellationAgent(orchestrator=orchestrator)
+
+    # Create and subscribe progress observer
+    progress_observer = ConstellationProgressObserver(agent=agent)
+    event_bus = get_event_bus()
+    event_bus.subscribe(progress_observer)
+
+    # Now orchestrator events will be tracked and queued for agent
+    return agent, orchestrator, progress_observer
+```
+
+### Example 2: Monitoring Task Results
+
+```python
+import asyncio
+
+async def monitor_task_progress(observer: ConstellationProgressObserver):
+    """Monitor task execution progress."""
+
+    # Wait for some tasks to complete
+    await asyncio.sleep(5)
+
+    # Access stored results
+    for task_id, result in observer.task_results.items():
+        status = result["status"]
+        timestamp = result["timestamp"]
+
+        if status == "COMPLETED":
+            print(f"✅ Task {task_id} completed at {timestamp}")
+            print(f"   Result: {result['result']}")
+        elif status == "FAILED":
+            print(f"❌ Task {task_id} failed at {timestamp}")
+            print(f"   Error: {result['error']}")
+```
+
+### Example 3: Custom Progress Observer
+
+```python
+from galaxy.core.events import Event, EventType, IEventObserver, TaskEvent
+
+class CustomProgressObserver(IEventObserver):
+    """Custom observer with additional progress tracking."""
+
+    def __init__(self, agent, on_progress_callback=None):
+        self.agent = agent
+        self.on_progress_callback = on_progress_callback
+
+        # Track progress statistics
+        self.total_tasks = 0
+        self.completed_tasks = 0
+        self.failed_tasks = 0
+
+    async def on_event(self, event: Event) -> None:
+        if isinstance(event, TaskEvent):
+            # Update statistics
+            if event.event_type == EventType.TASK_STARTED:
+                self.total_tasks += 1
+            elif event.event_type == EventType.TASK_COMPLETED:
+                self.completed_tasks += 1
+            elif event.event_type == EventType.TASK_FAILED:
+                self.failed_tasks += 1
+
+            # Call custom callback with the fraction of started tasks that completed
+            if self.on_progress_callback:
+                progress = self.completed_tasks / self.total_tasks if self.total_tasks > 0 else 0
+                self.on_progress_callback(progress, event)
+
+            # Queue for agent
+            if event.event_type in [EventType.TASK_COMPLETED, EventType.TASK_FAILED]:
+                await self.agent.add_task_completion_event(event)
+
+# Usage (assumes `agent` and `event_bus` already exist)
+def progress_callback(progress, event):
+    print(f"Progress: {progress*100:.1f}% - {event.task_id} {event.status}")
+
+custom_observer = CustomProgressObserver(
+    agent=agent,
+    on_progress_callback=progress_callback
+)
+event_bus.subscribe(custom_observer)
+```
+
+## Integration with Agent
+
+The Progress Observer integrates tightly with the ConstellationAgent's state machine:
+
+### Agent Queue Interface
+
+The observer calls these agent methods:
+
+```python
+# Queue task completion event
+await self.agent.add_task_completion_event(task_event)
+
+# Queue constellation completion event
+await self.agent.add_constellation_completion_event(constellation_event)
+```
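+
+One way the agent side of this interface could look, sketched with an `asyncio.Queue`. `SketchAgent` and its `next_event` method are hypothetical stand-ins, not the real `ConstellationAgent`; only the two `add_*_event` method names come from the calls above, and using a single shared queue is an assumption made for brevity.
+
+```python
+import asyncio
+from typing import Any, List, Tuple
+
+
+class SketchAgent:
+    """Hypothetical agent exposing the two queue methods the observer calls."""
+
+    def __init__(self) -> None:
+        # Unbounded queue: put() never suspends the caller.
+        self._events: asyncio.Queue = asyncio.Queue()
+
+    async def add_task_completion_event(self, event: Any) -> None:
+        await self._events.put(("task", event))
+
+    async def add_constellation_completion_event(self, event: Any) -> None:
+        await self._events.put(("constellation", event))
+
+    async def next_event(self) -> Tuple[str, Any]:
+        # Suspends until the observer has queued something.
+        return await self._events.get()
+
+
+async def demo() -> List[Tuple[str, Any]]:
+    agent = SketchAgent()
+    await agent.add_task_completion_event("task_1 done")
+    await agent.add_constellation_completion_event("all done")
+    return [await agent.next_event(), await agent.next_event()]
+
+
+received = asyncio.run(demo())
+```
+
+Events come out in the order they were queued, which is what lets the agent process completions sequentially.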
+
+### Agent Processing
+
+The agent processes queued events in its `Continue` state:
+
+```mermaid
+stateDiagram-v2
+ [*] --> Continue: Task completes
+ Continue --> ProcessEvent: Get event from queue
+ ProcessEvent --> UpdateConstellation: Event is TASK_COMPLETED
+ ProcessEvent --> HandleFailure: Event is TASK_FAILED
+ UpdateConstellation --> Continue: More tasks pending
+ UpdateConstellation --> Finish: All tasks done
+ HandleFailure --> Continue: Retry task
+ HandleFailure --> Finish: Max retries exceeded
+ Finish --> [*]
+```
+
+**Agent State Machine States:**
+
+| State | Description | Trigger |
+|-------|-------------|---------|
+| **Continue** | Wait for task completion events | Events queued by Progress Observer |
+| **ProcessEvent** | Extract event from queue | Event available |
+| **UpdateConstellation** | Modify constellation based on result | Task completed successfully |
+| **HandleFailure** | Handle task failure, retry if needed | Task failed |
+| **Finish** | Complete constellation execution | All tasks done or unrecoverable error |
+
+## Performance Considerations
+
+### Memory Management
+
+The observer stores all task results in memory:
+
+```python
+self.task_results: Dict[str, Dict[str, Any]] = {}
+```
+
+**Best Practices:**
+
+- **Clear results** after constellation completion to free memory
+- **Limit result size** by storing only essential data
+- **Use weak references** for large result objects if needed
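+
+If clearing everything at once is too coarse, one option is to cap how many results are retained. The sketch below is an assumption, not part of `ConstellationProgressObserver`: `BoundedResults` is a hypothetical helper that evicts the oldest entry once a limit is reached.
+
+```python
+from collections import OrderedDict
+from typing import Any, Dict, List
+
+
+class BoundedResults:
+    """Hypothetical task-result store that keeps only the newest `max_items`."""
+
+    def __init__(self, max_items: int) -> None:
+        self.max_items = max_items
+        self._items: "OrderedDict[str, Dict[str, Any]]" = OrderedDict()
+
+    def store(self, task_id: str, result: Dict[str, Any]) -> None:
+        self._items[task_id] = result
+        self._items.move_to_end(task_id)  # re-stored tasks count as newest
+        while len(self._items) > self.max_items:
+            self._items.popitem(last=False)  # drop the oldest entry
+
+    def task_ids(self) -> List[str]:
+        return list(self._items)
+
+
+store = BoundedResults(max_items=2)
+for task_id in ("task_1", "task_2", "task_3"):
+    store.store(task_id, {"task_id": task_id, "status": "COMPLETED"})
+remaining = store.task_ids()
+```
+
+With a limit of two, storing a third result evicts `task_1` and leaves `task_2` and `task_3`.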
+
+### Queue Management
+
+Events are queued to the agent's asyncio queue:
+
+```python
+await self.agent.add_task_completion_event(event)
+```
+
+**Considerations:**
+
+- **Queue size** is unbounded by default
+- **No back pressure**: with an unbounded queue, producers are never slowed down when the agent processes events slowly
+- **Memory growth** is therefore possible when many tasks complete in quick succession
+
+!!! warning "Memory Usage"
+ For long-running sessions with many tasks, consider periodically clearing the `task_results` dictionary to prevent memory growth.
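+
+For comparison, a bounded `asyncio.Queue` does apply back pressure: `put()` suspends the producer once `maxsize` items are waiting. Whether the agent's queue can be bounded is an assumption here; the sketch uses only the standard library and an illustrative `maxsize` of 1.
+
+```python
+import asyncio
+from typing import List
+
+
+async def demo() -> List[str]:
+    queue: asyncio.Queue = asyncio.Queue(maxsize=1)
+    log: List[str] = []
+
+    async def producer() -> None:
+        for i in range(3):
+            await queue.put(i)  # suspends while the queue is full
+            log.append(f"put {i}")
+
+    async def consumer() -> None:
+        for _ in range(3):
+            item = await queue.get()
+            log.append(f"got {item}")
+
+    await asyncio.gather(producer(), consumer())
+    return log
+
+
+log = asyncio.run(demo())
+```
+
+Because the queue holds at most one item, the producer cannot finish its later `put()` calls until the consumer has taken earlier items out.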
+
+## Best Practices
+
+### 1. Clean Up After Completion
+
+Clear task results after constellation execution:
+
+```python
+async def execute_with_cleanup(orchestrator, constellation, progress_observer):
+    """Execute constellation and clean up observer."""
+
+    try:
+        await orchestrator.execute_constellation(constellation)
+    finally:
+        # Clear stored results
+        progress_observer.task_results.clear()
+```
+
+### 2. Handle Errors Gracefully
+
+Catch specific exception types before the generic fallback so failures are easier to diagnose:
+
+```python
+try:
+    # Process event
+    await self._handle_task_event(event)
+except AttributeError as e:
+    self.logger.error(f"Attribute error: {e}", exc_info=True)
+except KeyError as e:
+    self.logger.error(f"Missing key: {e}", exc_info=True)
+except Exception as e:
+    self.logger.error(f"Unexpected error: {e}", exc_info=True)
+```
+
+### 3. Monitor Queue Size
+
+Check agent queue size periodically:
+
+```python
+# Access agent's internal queue
+queue_size = self.agent.task_completion_queue.qsize()
+if queue_size > 100:
+    self.logger.warning(f"Task completion queue growing large: {queue_size}")
+```
+
+## Related Documentation
+
+- **[Observer System Overview](overview.md)** — Architecture and design principles
+- **[Agent Output Observer](agent_output_observer.md)** — Agent response and action display
+- **[Constellation Agent](../constellation_agent/overview.md)** — Agent state machine and event processing
+- **[Constellation Modification Synchronizer](synchronizer.md)** — Coordination between agent and orchestrator
+
+## Summary
+
+The Constellation Progress Observer:
+
+- **Tracks** task execution progress
+- **Stores** task results for historical reference
+- **Queues** completion events for agent processing
+- **Coordinates** between orchestrator and agent
+- **Enables** event-driven constellation modification
+
+This observer is essential for the agent-orchestrator coordination pattern in Galaxy, replacing complex callback mechanisms with a clean event-driven interface.
diff --git a/documents/docs/galaxy/observer/synchronizer.md b/documents/docs/galaxy/observer/synchronizer.md
new file mode 100644
index 000000000..ae3b8ef39
--- /dev/null
+++ b/documents/docs/galaxy/observer/synchronizer.md
@@ -0,0 +1,553 @@
+# Constellation Modification Synchronizer
+
+The **ConstellationModificationSynchronizer** prevents race conditions between constellation modifications by the agent and task execution by the orchestrator. It ensures proper synchronization so the orchestrator doesn't execute newly ready tasks before the agent finishes updating the constellation structure.
+
+**Location:** `galaxy/session/observers/constellation_sync_observer.py`
+
+## Problem Statement
+
+Without synchronization, the following race condition can occur:
+
+```mermaid
+sequenceDiagram
+ participant O as Orchestrator
+ participant T as Task A
+ participant A as Agent
+ participant C as Constellation
+
+ T->>O: Task A completes
+ O->>A: Publish TASK_COMPLETED
+ O->>C: Get ready tasks
+ Note over O: Task B appears ready!
+ O->>T: Execute Task B
+
+ Note over A: Slow: Processing Task A completion...
+ A->>C: Modify Task B (changes dependencies!)
+
+ Note over T: ERROR: Task B executing with outdated state!
+```
+
+**The Race Condition:**
+
+- **Task A completes** → triggers constellation update
+- **Orchestrator immediately** gets ready tasks → might execute Task B
+- **Agent is still** modifying Task B or its dependencies
+- **Result**: Task B executes with outdated/incorrect configuration
+
+!!! danger "Critical Issue"
+ Executing tasks with outdated constellation state can lead to incorrect task parameters, wrong dependency chains, data inconsistencies, and unpredictable workflow behavior.
+
+## Solution: Synchronization Pattern
+
+The Synchronizer implements a **wait-before-execute** pattern:
+
+```mermaid
+sequenceDiagram
+ participant O as Orchestrator
+ participant T as Task A
+ participant S as Synchronizer
+ participant A as Agent
+ participant C as Constellation
+
+ T->>O: Task A completes
+ O->>S: Publish TASK_COMPLETED
+ S->>S: Register pending modification
+ O->>A: Forward to Agent
+
+ Note over O: Before getting ready tasks
+ O->>S: wait_for_pending_modifications()
+ Note over S: Block until agent done
+
+ A->>C: Modify constellation
+ A->>S: Publish CONSTELLATION_MODIFIED
+ S->>S: Mark modification complete
+
+ Note over S: Unblock orchestrator
+ O->>C: Get ready tasks
+ Note over C: Now safe to execute!
+ O->>T: Execute Task B
+```
+
+## Architecture
+
+```mermaid
+graph TB
+ subgraph "Orchestrator Loop"
+ OL[Execute Task Loop]
+ WF[Wait for Modifications]
+ GT[Get Ready Tasks]
+ ET[Execute Tasks]
+ end
+
+ subgraph "Synchronizer"
+ PM[Pending Modifications Dict]
+ TC[Task Completion Handler]
+ MC[Modification Complete Handler]
+ WP[Wait Point]
+ end
+
+ subgraph "Agent"
+ A[Agent Process Results]
+ M[Modify Constellation]
+ end
+
+ OL --> WF
+ WF --> WP
+ WP -->|all modifications complete| GT
+ GT --> ET
+ ET --> OL
+
+ TC --> PM
+ MC --> PM
+ PM --> WP
+
+ A --> M
+ M -->|CONSTELLATION_MODIFIED| MC
+
+ style WP fill:#ffa726,stroke:#333,stroke-width:3px
+ style PM fill:#fff4e1,stroke:#333,stroke-width:2px
+ style WF fill:#4a90e2,stroke:#333,stroke-width:2px,color:#fff
+```
+
+## Synchronization Flow
+
+### Step-by-Step Process
+
+1. **Task Completes** → `TASK_COMPLETED` event published
+2. **Synchronizer Registers** → Creates pending modification Future
+3. **Orchestrator Waits** → Calls `wait_for_pending_modifications()`
+4. **Agent Processes** → Modifies constellation structure
+5. **Agent Publishes** → `CONSTELLATION_MODIFIED` event published
+6. **Synchronizer Completes** → Sets Future result, unblocks orchestrator
+7. **Orchestrator Continues** → Gets ready tasks with updated constellation
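+
+The steps above can be sketched with one `asyncio.Future` per completed task. This is a simplified illustration under assumptions, not the actual `ConstellationModificationSynchronizer`: `SketchSynchronizer` and its `on_task_completed` and `on_constellation_modified` methods are hypothetical names.
+
+```python
+import asyncio
+from typing import Dict, List
+
+
+class SketchSynchronizer:
+    """Hypothetical wait-before-execute gate keyed by task ID."""
+
+    def __init__(self) -> None:
+        self._pending: Dict[str, asyncio.Future] = {}
+
+    def on_task_completed(self, task_id: str) -> None:
+        # Step 2: register a pending modification for this task.
+        self._pending[task_id] = asyncio.get_running_loop().create_future()
+
+    def on_constellation_modified(self, task_id: str) -> None:
+        # Step 6: resolve the Future so waiters can continue.
+        future = self._pending.pop(task_id, None)
+        if future is not None and not future.done():
+            future.set_result(True)
+
+    async def wait_for_pending_modifications(self) -> None:
+        # Step 3: block until every registered modification has finished.
+        while self._pending:
+            await asyncio.gather(*list(self._pending.values()))
+
+
+async def demo() -> List[str]:
+    order: List[str] = []
+    sync = SketchSynchronizer()
+    sync.on_task_completed("task_a")
+
+    async def agent() -> None:
+        await asyncio.sleep(0.01)  # simulate slow constellation editing
+        order.append("agent modified constellation")
+        sync.on_constellation_modified("task_a")
+
+    async def orchestrator() -> None:
+        await sync.wait_for_pending_modifications()
+        order.append("orchestrator reads ready tasks")
+
+    await asyncio.gather(orchestrator(), agent())
+    return order
+
+
+order = asyncio.run(demo())
+```
+
+Even though the orchestrator coroutine starts first, it only proceeds after the agent has reported its modification.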
+
+### Event Flow
+
+```mermaid
+stateDiagram-v2
+ [*] --> WaitingForCompletion: Task executing
+ WaitingForCompletion --> PendingModification: TASK_COMPLETED event
+ PendingModification --> AgentProcessing: Registered in synchronizer
+ AgentProcessing --> ModificationComplete: CONSTELLATION_MODIFIED event
+ ModificationComplete --> Ready: Future completed
+ Ready --> WaitingForCompletion: Next task
+
+ note right of PendingModification
+ Orchestrator blocks here
+ until modification completes
+ end note
+```
+
+## Implementation
+
+### Initialization
+
+```python
+from galaxy.session.observers import ConstellationModificationSynchronizer
+from galaxy.constellation import TaskConstellationOrchestrator
+
+# Create synchronizer with orchestrator reference
+synchronizer = ConstellationModificationSynchronizer(
+    orchestrator=orchestrator,
+    logger=logger,
+)
+
+# Subscribe to event bus
+from galaxy.core.events import get_event_bus
+event_bus = get_event_bus()
+event_bus.subscribe(synchronizer)
+
+# Attach to orchestrator (for easy access)
+orchestrator.set_modification_synchronizer(synchronizer)
+```
+
+### Constructor Parameters
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `orchestrator` | `TaskConstellationOrchestrator` | Orchestrator to synchronize with |
+| `logger` | `logging.Logger` | Optional logger instance |
+
+### Internal State
+
+The synchronizer maintains:
+
+```python
+class ConstellationModificationSynchronizer(IEventObserver):
+    def __init__(self, orchestrator, logger=None):
+        self.orchestrator = orchestrator
+
+        # Pending modifications: task_id -> asyncio.Future
+        self._pending_modifications: Dict[str, asyncio.Future] = {}
+
+        # Current constellation being modified
+        self._current_constellation_id: Optional[str] = None
+        self._current_constellation: Optional[TaskConstellation] = None
+
+        # Timeout for modifications (safety measure)
+        self._modification_timeout = 600.0  # 10 minutes
+
+        # Statistics
+        self._stats = {
+            "total_modifications": 0,
+            "completed_modifications": 0,
+            "timeout_modifications": 0,
+        }
+```
+
+## API Reference
+
+### Main Wait Point
+
+#### wait_for_pending_modifications()
+
+Wait for all pending modifications to complete before proceeding.
+
+```python
+async def wait_for_pending_modifications(
+    self,
+    timeout: Optional[float] = None
+) -> bool
+```
+
+**Parameters:**
+
+- `timeout` — Optional timeout in seconds (uses default 600s if None)
+
+**Returns:**
+
+- `True` if all modifications completed successfully
+- `False` if timeout occurred
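+
+A minimal sketch of how a timeout could map to this boolean return value, using `asyncio.wait`. `wait_with_timeout` is a hypothetical helper; how the real method handles a timeout internally is not shown in this document.
+
+```python
+import asyncio
+from typing import Iterable, Tuple
+
+
+async def wait_with_timeout(futures: Iterable[asyncio.Future], timeout: float) -> bool:
+    """Return True if all futures finish within `timeout` seconds, else False."""
+    waiting = list(futures)
+    if not waiting:
+        return True  # nothing pending, nothing to wait for
+    # asyncio.wait neither raises nor cancels on timeout; it reports
+    # which futures are still pending.
+    _, still_pending = await asyncio.wait(waiting, timeout=timeout)
+    return not still_pending
+
+
+async def demo() -> Tuple[bool, bool]:
+    loop = asyncio.get_running_loop()
+    done = loop.create_future()
+    done.set_result(True)
+    stuck = loop.create_future()  # never resolved, so waiting on it times out
+    finished = await wait_with_timeout([done], timeout=0.05)
+    timed_out = await wait_with_timeout([stuck], timeout=0.05)
+    return finished, timed_out
+
+
+finished, timed_out = asyncio.run(demo())
+```
+
+Callers that need to react to a stalled agent can check the return value instead of assuming the wait always succeeds.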
+
+**Usage in Orchestrator:**
+
+```python
+async def execute_constellation(self, constellation):
+    """Execute constellation with synchronized modifications."""
+
+    while True:
+        # Wait for any pending modifications
+        await self.synchronizer.wait_for_pending_modifications()
+
+        # Now safe to get ready tasks
+        ready_tasks = constellation.get_ready_tasks()
+
+        if not ready_tasks:
+            break  # All tasks complete
+
+        # Execute ready tasks
+        await self._execute_tasks(ready_tasks)
+```
+
+### State Management Methods
+
+#### get_current_constellation()
+
+Get the most recent constellation state after modifications.
+
+```python
+def get_current_constellation(self) -> Optional[TaskConstellation]
+```
+
+**Returns:** Latest constellation instance or None
+
+#### has_pending_modifications()
+
+Check if any modifications are pending.
+
+```python
+def has_pending_modifications(self) -> bool
+```
+
+**Returns:** `True` if modifications pending, `False` otherwise
+
+#### get_pending_count()
+
+Get number of pending modifications.
+
+```python
+def get_pending_count(self) -> int
+```
+
+**Returns:** Count of pending modifications
+
+### Constellation State Merging
+
+#### merge_and_sync_constellation_states()
+
+Merge constellation states to preserve both structural changes and execution state.
+
+```python
+def merge_and_sync_constellation_states(
+ self,
+ orchestrator_constellation: TaskConstellation
+) -> TaskConstellation
+```
+
+**Purpose:** Prevents loss of execution state when agent modifies constellation structure.
+
+**Merge Strategy:**
+
+1. **Use agent's constellation as base** (has structural modifications)
+2. **Preserve orchestrator's execution state** for existing tasks
+3. **Priority rule**: More advanced state wins (COMPLETED > RUNNING > PENDING)
+4. **Update constellation state** after merging
+
+**Example Scenario:**
+
+```
+Before Merge:
+- Orchestrator's Task A: COMPLETED (execution state)
+- Agent's Task A: RUNNING (structural changes applied)
+
+After Merge:
+- Task A: COMPLETED (preserved from orchestrator)
+ + structural changes from agent
+```
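
The priority rule can be illustrated with a small sketch. The `TaskStatus` enum, the `merge_task_states` helper, and the plain dictionaries standing in for constellations are hypothetical names used only for illustration; the real `merge_and_sync_constellation_states()` operates on `TaskConstellation` objects and may handle further states (failures, for example) differently. The sketch also assumes that tasks the agent removed are not restored.

```python
from enum import IntEnum
from typing import Dict


class TaskStatus(IntEnum):
    """Hypothetical status enum; a higher value means a more advanced state."""
    PENDING = 0
    RUNNING = 1
    COMPLETED = 2


def merge_task_states(
    agent_tasks: Dict[str, TaskStatus],
    orchestrator_tasks: Dict[str, TaskStatus],
) -> Dict[str, TaskStatus]:
    """Merge per-task states, using the agent's task set as the structural base."""
    merged = dict(agent_tasks)  # the agent's view carries the structural changes
    for task_id, orch_status in orchestrator_tasks.items():
        # Assumption: tasks the agent removed are not brought back
        if task_id in merged:
            # The more advanced state wins
            merged[task_id] = max(merged[task_id], orch_status)
    return merged


agent_view = {"task_a": TaskStatus.RUNNING, "task_new": TaskStatus.PENDING}
orchestrator_view = {"task_a": TaskStatus.COMPLETED}
merged = merge_task_states(agent_view, orchestrator_view)
print({task_id: status.name for task_id, status in merged.items()})
```

Encoding the states as an ordered enum keeps the rule to a single `max()` call, which is one simple way to express "more advanced state wins".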
+
+## Usage Examples
+
+### Example 1: Basic Integration
+
+```python
+import logging
+
+from galaxy.constellation import TaskConstellationOrchestrator
+from galaxy.core.events import get_event_bus
+from galaxy.session.observers import ConstellationModificationSynchronizer
+
+logger = logging.getLogger(__name__)
+
+async def setup_synchronized_execution(constellation):
+    """Set up synchronized constellation execution."""
+
+    # Create orchestrator
+    orchestrator = TaskConstellationOrchestrator()
+
+    # Create and attach synchronizer
+    synchronizer = ConstellationModificationSynchronizer(
+        orchestrator=orchestrator,
+        logger=logger
+    )
+
+    # Subscribe to events
+    event_bus = get_event_bus()
+    event_bus.subscribe(synchronizer)
+
+    # Attach to orchestrator
+    orchestrator.set_modification_synchronizer(synchronizer)
+
+    # Execute constellation (now synchronized)
+    await orchestrator.execute_constellation(constellation)
+```
+
+### Example 2: Monitor Synchronization
+
+```python
+import asyncio
+
+async def monitor_synchronization(synchronizer):
+    """Monitor synchronization status during execution."""
+
+    while True:
+        await asyncio.sleep(1)
+
+        if synchronizer.has_pending_modifications():
+            count = synchronizer.get_pending_count()
+            pending = synchronizer.get_pending_task_ids()
+            print(f"⏳ Waiting for {count} modifications: {pending}")
+        else:
+            print("✅ No pending modifications")
+
+        # Check statistics
+        stats = synchronizer.get_statistics()
+        print(f"Stats: {stats['completed_modifications']} completed, "
+              f"{stats['timeout_modifications']} timeouts")
+```
+
+### Example 3: Custom Timeout Handling
+
+```python
+# Set custom timeout (default is 600 seconds)
+synchronizer.set_modification_timeout(300.0) # 5 minutes
+
+# Wait with custom timeout
+success = await synchronizer.wait_for_pending_modifications(timeout=120.0)
+
+if not success:
+    print("⚠️ Modifications timed out, proceeding anyway")
+    # Handle timeout scenario
+    synchronizer.clear_pending_modifications()  # Emergency cleanup
+```
+
+## Advanced Features
+
+### Automatic Timeout Handling
+
+The synchronizer automatically times out stuck modifications:
+
+```python
+async def _auto_complete_on_timeout(
+    self,
+    task_id: str,
+    future: asyncio.Future
+) -> None:
+    """Auto-complete a pending modification if it times out."""
+
+    await asyncio.sleep(self._modification_timeout)
+
+    if not future.done():
+        self._stats["timeout_modifications"] += 1
+        self.logger.warning(
+            f"⚠️ Modification for task '{task_id}' timed out after "
+            f"{self._modification_timeout}s. Auto-completing to prevent deadlock."
+        )
+        future.set_result(False)
+        del self._pending_modifications[task_id]
+```
+
+**Timeout Benefits:**
+
+- Prevents deadlocks if agent fails
+- Allows execution to continue
+- Logs timeout for debugging
+- Tracks timeout statistics
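
The watchdog idea can be tried in isolation with plain `asyncio`. This is a minimal sketch, not the synchronizer's code: the module-level `pending` and `stats` dictionaries and the `auto_complete_on_timeout` function are hypothetical stand-ins, and the timeout is shortened so the demo finishes quickly.

```python
import asyncio
from typing import Dict

# Hypothetical stand-ins for the synchronizer's internal state
pending: Dict[str, asyncio.Future] = {}
stats = {"timeout_modifications": 0}


async def auto_complete_on_timeout(task_id: str, future: asyncio.Future, timeout: float) -> None:
    """Resolve the future with False if nothing completed it within `timeout` seconds."""
    await asyncio.sleep(timeout)
    if not future.done():
        stats["timeout_modifications"] += 1
        future.set_result(False)
        # pop() tolerates the entry having been removed already
        pending.pop(task_id, None)


async def demo() -> bool:
    future = asyncio.get_running_loop().create_future()
    pending["task_a"] = future
    # Short timeout for the demo; the documented default is 600 seconds
    watchdog = asyncio.create_task(auto_complete_on_timeout("task_a", future, 0.05))
    result = await future  # nobody completes the modification, so the watchdog does
    await watchdog
    return result


timed_out_result = asyncio.run(demo())
print(timed_out_result, stats, pending)
```

Resolving the future with `False` instead of raising lets every waiter resume normally and decide for itself how to treat the timeout.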
+
+### Dynamic Modification Tracking
+
+Handles new modifications registered during wait:
+
+```python
+async def wait_for_pending_modifications(self, timeout=None) -> bool:
+    """Wait for all pending modifications, including those added during wait."""
+
+    # Simplified: fall back to the default timeout when none is given
+    remaining_timeout = timeout if timeout is not None else self._modification_timeout
+
+    while self._pending_modifications:
+        # Snapshot the futures that are pending right now
+        pending_futures = list(self._pending_modifications.values())
+
+        try:
+            # Wait for current batch
+            await asyncio.wait_for(
+                asyncio.gather(*pending_futures, return_exceptions=True),
+                timeout=remaining_timeout
+            )
+        except asyncio.TimeoutError:
+            return False  # Timed out before all modifications completed
+
+        # Modifications registered during the wait keep the dict non-empty,
+        # so the loop condition picks them up on the next pass
+
+    return True
+```
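
A runnable variant of this loop is sketched below under a few assumptions: `wait_for_all` is a hypothetical free function, it shares a single deadline across batches (the excerpt above does not show how the remaining time is tracked), and it removes finished entries itself, whereas in the synchronizer that cleanup presumably happens in the event handlers.

```python
import asyncio
import time
from typing import Dict


async def wait_for_all(pending: Dict[str, asyncio.Future], timeout: float) -> bool:
    """Wait until `pending` is empty, sharing one deadline across all batches."""
    deadline = time.monotonic() + timeout
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        batch = list(pending.values())  # snapshot of the current futures
        # asyncio.wait() does not cancel the futures when the timeout elapses
        _, not_done = await asyncio.wait(batch, timeout=remaining)
        if not_done:
            return False
        # In this sketch the waiter removes finished entries itself
        for task_id in [t for t, f in pending.items() if f.done()]:
            del pending[task_id]
    return True


async def demo() -> bool:
    loop = asyncio.get_running_loop()
    pending: Dict[str, asyncio.Future] = {"task_a": loop.create_future()}

    async def agent() -> None:
        await asyncio.sleep(0.01)
        # A second modification is registered while the first is still awaited
        pending["task_b"] = loop.create_future()
        pending["task_a"].set_result(True)
        await asyncio.sleep(0.01)
        pending["task_b"].set_result(True)

    agent_task = asyncio.create_task(agent())
    ok = await wait_for_all(pending, timeout=1.0)
    await agent_task
    return ok and not pending


async def demo_timeout() -> bool:
    pending = {"stuck": asyncio.get_running_loop().create_future()}
    return await wait_for_all(pending, timeout=0.02)


result_ok = asyncio.run(demo())
result_timeout = asyncio.run(demo_timeout())
print(result_ok, result_timeout)
```

The first demo returns only after `task_b`, which was registered mid-wait, has also completed; the second shows the `False` path when a modification never finishes.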
+
+## Statistics and Monitoring
+
+### Available Statistics
+
+```python
+stats = synchronizer.get_statistics()
+
+{
+ "total_modifications": 10, # Total registered
+ "completed_modifications": 9, # Successfully completed
+ "timeout_modifications": 1 # Timed out
+}
+```
+
+### Monitoring Points
+
+| Metric | Method | Description |
+|--------|--------|-------------|
+| Pending count | `get_pending_count()` | Number of pending modifications |
+| Pending tasks | `get_pending_task_ids()` | List of task IDs with pending modifications |
+| Has pending | `has_pending_modifications()` | Boolean check |
+| Statistics | `get_statistics()` | Complete stats dictionary |
+
+## Performance Considerations
+
+### Memory Usage
+
+The synchronizer stores futures for each pending modification:
+
+```python
+self._pending_modifications: Dict[str, asyncio.Future] = {}
+```
+
+**Memory Impact:**
+
+- **Low overhead**: Only stores Future objects (small)
+- **Temporary**: Cleared after completion
+- **Bounded**: Limited by concurrent task completions
+
+### Timeout Configuration
+
+Choose appropriate timeout based on constellation complexity:
+
+```python
+# Simple constellations
+synchronizer.set_modification_timeout(60.0) # 1 minute
+
+# Complex constellations with slow LLM
+synchronizer.set_modification_timeout(600.0) # 10 minutes
+
+# Very complex multi-device scenarios
+synchronizer.set_modification_timeout(1800.0) # 30 minutes
+```
+
+## Best Practices
+
+### 1. Always Attach to Orchestrator
+
+The orchestrator needs to call `wait_for_pending_modifications()`:
+
+```python
+# ✅ Good: Orchestrator can access synchronizer
+orchestrator.set_modification_synchronizer(synchronizer)
+
+# ❌ Bad: No way for orchestrator to wait
+# synchronizer exists but orchestrator doesn't use it
+```
+
+### 2. Handle Timeouts Gracefully
+
+```python
+success = await synchronizer.wait_for_pending_modifications()
+
+if not success:
+    # Log timeout
+    logger.warning("Modifications timed out")
+
+    # Get current state anyway (may be partially updated)
+    constellation = synchronizer.get_current_constellation()
+
+    # Continue execution (with caution)
+```
+
+### 3. Monitor Statistics
+
+Track synchronization health:
+
+```python
+stats = synchronizer.get_statistics()
+
+timeout_rate = (
+    stats["timeout_modifications"] / stats["total_modifications"]
+    if stats["total_modifications"] > 0
+    else 0
+)
+
+if timeout_rate > 0.1:  # More than 10% timing out
+    logger.warning(f"High timeout rate: {timeout_rate:.1%}")
+    # Consider increasing timeout or investigating agent performance
+```
+
+## Related Documentation
+
+- **[Observer System Overview](overview.md)** — Architecture and design
+- **[Constellation Progress Observer](progress_observer.md)** — Task completion events
+- **[Constellation Agent](../constellation_agent/overview.md)** — Agent modification process
+
+## Summary
+
+The Constellation Modification Synchronizer:
+
+- **Prevents** race conditions between agent and orchestrator
+- **Synchronizes** constellation modifications with task execution
+- **Blocks** orchestrator until modifications complete
+- **Handles** timeouts to prevent deadlocks
+- **Merges** constellation states to preserve execution data
+
+This observer is critical for ensuring correct constellation execution when the agent dynamically modifies workflow structure during execution.
diff --git a/documents/docs/galaxy/observer/visualization_observer.md b/documents/docs/galaxy/observer/visualization_observer.md
new file mode 100644
index 000000000..cd4f8c176
--- /dev/null
+++ b/documents/docs/galaxy/observer/visualization_observer.md
@@ -0,0 +1,534 @@
+# DAG Visualization Observer
+
+The **DAGVisualizationObserver** provides real-time visual feedback during constellation execution. It displays DAG topology, task progress, and constellation modifications using rich terminal graphics.
+
+**Location:** `galaxy/session/observers/dag_visualization_observer.py`
+
+## Purpose
+
+The Visualization Observer enables developers and users to:
+
+- **See DAG Structure** — View constellation topology and task dependencies
+- **Monitor Progress** — Track task execution in real-time
+- **Observe Modifications** — Visualize how the constellation changes
+- **Debug Issues** — Identify bottlenecks and failed tasks visually
+
+## Architecture
+
+The observer uses a **delegation pattern** with specialized handlers:
+
+```mermaid
+graph TB
+ subgraph "Main Observer"
+ DVO[DAGVisualizationObserver]
+ CE[Constellation Events]
+ TE[Task Events]
+ end
+
+ subgraph "Specialized Handlers"
+ CVH[ConstellationVisualizationHandler]
+ TVH[TaskVisualizationHandler]
+ end
+
+ subgraph "Display Components"
+ CD[ConstellationDisplay]
+ TD[TaskDisplay]
+ DV[DAGVisualizer]
+ end
+
+ DVO --> CE
+ DVO --> TE
+ CE --> CVH
+ TE --> TVH
+
+ CVH --> CD
+ CVH --> DV
+ TVH --> TD
+ TVH --> DV
+
+ style DVO fill:#66bb6a,stroke:#333,stroke-width:3px
+ style CVH fill:#ffa726,stroke:#333,stroke-width:2px
+ style TVH fill:#ffa726,stroke:#333,stroke-width:2px
+```
+
+**Component Responsibilities:**
+
+| Component | Role | Handled Events |
+|-----------|------|----------------|
+| **DAGVisualizationObserver** | Main coordinator, routes events | All constellation and task events |
+| **ConstellationVisualizationHandler** | Handles constellation-level displays | CONSTELLATION_STARTED, CONSTELLATION_COMPLETED, CONSTELLATION_MODIFIED |
+| **TaskVisualizationHandler** | Handles task-level displays | TASK_STARTED, TASK_COMPLETED, TASK_FAILED |
+| **DAGVisualizer** | Renders complex DAG visualizations | Used by handlers for topology |
+| **ConstellationDisplay** | Renders constellation information | Used by handler for constellation events |
+| **TaskDisplay** | Renders task information | Used by handler for task events |
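
The delegation pattern in the table can be sketched in a few lines. All names here (`Event`, `RecordingHandler`, `MiniVisualizationObserver`, and the `DEVICE_CONNECTED` event) are hypothetical, and routing by event-name prefix is an assumption made for brevity; the real observer may dispatch on `EventType` values instead.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """Hypothetical event carrying only a type name."""
    event_type: str


@dataclass
class RecordingHandler:
    """Hypothetical handler that records the events it receives."""
    name: str
    seen: List[str] = field(default_factory=list)

    def handle(self, event: Event) -> None:
        self.seen.append(event.event_type)


class MiniVisualizationObserver:
    """Routes events to a constellation-level or task-level handler by prefix."""

    def __init__(self) -> None:
        self.constellation_handler = RecordingHandler("constellation")
        self.task_handler = RecordingHandler("task")
        self.enabled = True

    def on_event(self, event: Event) -> None:
        if not self.enabled:
            return  # visualization disabled: drop the event
        if event.event_type.startswith("CONSTELLATION_"):
            self.constellation_handler.handle(event)
        elif event.event_type.startswith("TASK_"):
            self.task_handler.handle(event)
        # any other event type is ignored


observer = MiniVisualizationObserver()
for name in ["CONSTELLATION_STARTED", "TASK_STARTED", "TASK_FAILED", "DEVICE_CONNECTED"]:
    observer.on_event(Event(name))
print(observer.constellation_handler.seen, observer.task_handler.seen)
```

Keeping the observer as a thin router means each handler can change how it renders without touching the event subscription logic.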
+
+## Implementation
+
+### Initialization
+
+```python
+from galaxy.session.observers import DAGVisualizationObserver
+from rich.console import Console
+
+# Create visualization observer
+viz_observer = DAGVisualizationObserver(
+ enable_visualization=True,
+ console=Console() # Optional: provide custom console
+)
+
+# Subscribe to event bus
+from galaxy.core.events import get_event_bus
+event_bus = get_event_bus()
+event_bus.subscribe(viz_observer)
+```
+
+**Constructor Parameters:**
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `enable_visualization` | `bool` | `True` | Whether to enable visualization |
+| `console` | `rich.console.Console` | `None` | Optional rich console for output |
+
+### Disabling Visualization
+
+Visualization can be toggled at runtime:
+
+```python
+# Disable visualization temporarily
+viz_observer.set_visualization_enabled(False)
+
+# Re-enable
+viz_observer.set_visualization_enabled(True)
+```
+
+## Visualization Types
+
+The observer produces several types of visualizations:
+
+### 1. Constellation Started
+
+Displays when a constellation begins execution:
+
+```
+╭──────────────────────────────────────────────────────────────╮
+│ 🌟 Constellation Started: email_batch_constellation │
+├──────────────────────────────────────────────────────────────┤
+│ ID: const_abc123 │
+│ Total Tasks: 8 │
+│ Status: ACTIVE │
+│ Parallel Capacity: 3 │
+╰──────────────────────────────────────────────────────────────╯
+```
+
+Followed by DAG topology:
+
+```mermaid
+graph TD
+ fetch_emails[Fetch Emails]
+ parse_1[Parse Email 1]
+ parse_2[Parse Email 2]
+ parse_3[Parse Email 3]
+ reply_1[Reply Email 1]
+ reply_2[Reply Email 2]
+ reply_3[Reply Email 3]
+ summarize[Summarize Results]
+
+ fetch_emails --> parse_1
+ fetch_emails --> parse_2
+ fetch_emails --> parse_3
+ parse_1 --> reply_1
+ parse_2 --> reply_2
+ parse_3 --> reply_3
+ reply_1 --> summarize
+ reply_2 --> summarize
+ reply_3 --> summarize
+```
+
+### 2. Task Progress
+
+Displays task execution events:
+
+**Task Started:**
+```
+▶ Task Started: parse_email_1
+ └─ Type: parse_email
+ └─ Device: windows_pc_001
+ └─ Priority: MEDIUM
+```
+
+**Task Completed:**
+```
+✅ Task Completed: parse_email_1
+ Duration: 2.3s
+ Result: Parsed 1 email with 2 attachments
+ Newly Ready: [reply_email_1]
+```
+
+**Task Failed:**
+```
+❌ Task Failed: parse_email_2
+ Duration: 1.8s
+ Error: NetworkTimeout: Failed to connect to email server
+ Retry: 1/3
+ Newly Ready: []
+```
+
+### 3. Constellation Modified
+
+Shows structural changes to the constellation:
+
+```
+🔄 Constellation Modified: email_batch_constellation
+   Modification Type: add_tasks
+   On Task: parse_email_1
+
+   Changes:
+   ├─ Tasks Added: 2
+   │  └─ extract_attachment_1
+   │  └─ extract_attachment_2
+   ├─ Dependencies Added: 2
+   │  └─ parse_email_1 → extract_attachment_1
+   │  └─ parse_email_1 → extract_attachment_2
+   └─ Tasks Modified: 1
+      └─ reply_email_1 (dependencies updated)
+```
+
+Followed by updated DAG topology showing new tasks.
+
+### 4. Execution Flow
+
+Shows current execution state (for smaller constellations):
+
+```
+Execution Flow:
+┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┓
+┃ Task ID         ┃ Status    ┃ Device  ┃ Duration ┃
+┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━┩
+│ fetch_emails    │ COMPLETED │ win_001 │ 1.2s     │
+│ parse_email_1   │ RUNNING   │ win_001 │ 0.8s...  │
+│ parse_email_2   │ RUNNING   │ mac_002 │ 0.5s...  │
+│ parse_email_3   │ PENDING   │ -       │ -        │
+│ reply_email_1   │ PENDING   │ -       │ -        │
+└─────────────────┴───────────┴─────────┴──────────┘
+```
+
+## Event Handling Flow
+
+```mermaid
+sequenceDiagram
+ participant O as Orchestrator
+ participant EB as EventBus
+ participant DVO as DAGVisualizationObserver
+ participant CVH as ConstellationHandler
+ participant TVH as TaskHandler
+ participant D as Display Components
+
+ O->>EB: CONSTELLATION_STARTED
+ EB->>DVO: on_event(event)
+ DVO->>CVH: handle_constellation_event()
+ CVH->>D: Display constellation start
+ CVH->>D: Display DAG topology
+
+ O->>EB: TASK_STARTED
+ EB->>DVO: on_event(event)
+ DVO->>TVH: handle_task_event()
+ TVH->>D: Display task start
+
+ O->>EB: TASK_COMPLETED
+ EB->>DVO: on_event(event)
+ DVO->>TVH: handle_task_event()
+ TVH->>D: Display task completion
+ TVH->>D: Display execution flow
+
+ Note over O: Agent modifies constellation
+
+ O->>EB: CONSTELLATION_MODIFIED
+ EB->>DVO: on_event(event)
+ DVO->>CVH: handle_constellation_event()
+ CVH->>D: Display modifications
+ CVH->>D: Display updated topology
+```
+
+## API Reference
+
+### Main Observer Methods
+
+#### Constructor
+
+```python
+def __init__(
+ self,
+ enable_visualization: bool = True,
+ console=None
+)
+```
+
+**Parameters:**
+
+- `enable_visualization` — Enable/disable visualization output
+- `console` — Optional `rich.console.Console` for output control
+
+#### set_visualization_enabled()
+
+Toggle visualization at runtime:
+
+```python
+def set_visualization_enabled(self, enabled: bool) -> None
+```
+
+**Example:**
+
+```python
+# Disable during bulk operations
+viz_observer.set_visualization_enabled(False)
+await orchestrator.execute_constellation(constellation)
+
+# Re-enable for interactive use
+viz_observer.set_visualization_enabled(True)
+```
+
+### Constellation Management
+
+#### register_constellation()
+
+Manually register a constellation for visualization:
+
+```python
+def register_constellation(
+ self,
+ constellation_id: str,
+ constellation: TaskConstellation
+) -> None
+```
+
+**Use Case:** Pre-register constellations before execution starts.
+
+#### get_constellation()
+
+Retrieve stored constellation reference:
+
+```python
+def get_constellation(self, constellation_id: str) -> Optional[TaskConstellation]
+```
+
+#### clear_constellations()
+
+Clear all stored constellation references:
+
+```python
+def clear_constellations(self) -> None
+```
+
+## Customization
+
+### Custom Console
+
+Provide custom Rich console for output control:
+
+```python
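+from rich.console import Console
+from rich.theme import Theme
+
+# Example theme; the style names are illustrative, not required by the observer
+my_custom_theme = Theme({"info": "cyan", "warning": "yellow", "error": "bold red"})
+
+# Console with custom width and theme
+custom_console = Console(
+    width=120,
+    theme=my_custom_theme,
+    record=True  # Enable recording for export
+)
+
+viz_observer = DAGVisualizationObserver(
+    enable_visualization=True,
+    console=custom_console
+)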
+```
+
+### Selective Visualization
+
+Visualize only specific event types:
+
+```python
+from galaxy.core.events import EventType
+
+# Subscribe to specific events only
+event_bus.subscribe(viz_observer, {
+ EventType.CONSTELLATION_STARTED,
+ EventType.CONSTELLATION_MODIFIED,
+ EventType.TASK_FAILED # Only show failures
+})
+```
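
How a type-filtered subscription might work internally can be sketched as follows. `MiniEventBus` and this three-member `EventType` enum are hypothetical stand-ins and are not the `galaxy.core.events` implementation; the only behavior taken from the text above is that `subscribe()` accepts an optional set of event types.

```python
from collections import defaultdict
from enum import Enum, auto
from typing import Callable, DefaultDict, Iterable, List, Optional


class EventType(Enum):
    """Hypothetical subset of event types."""
    CONSTELLATION_STARTED = auto()
    TASK_COMPLETED = auto()
    TASK_FAILED = auto()


Callback = Callable[[EventType], None]


class MiniEventBus:
    """Keeps one subscriber list per event type."""

    def __init__(self) -> None:
        self._by_type: DefaultDict[EventType, List[Callback]] = defaultdict(list)

    def subscribe(
        self, callback: Callback, event_types: Optional[Iterable[EventType]] = None
    ) -> None:
        # No filter means "all event types"
        for event_type in (event_types if event_types is not None else EventType):
            self._by_type[event_type].append(callback)

    def publish(self, event_type: EventType) -> None:
        for callback in self._by_type[event_type]:
            callback(event_type)


received: List[EventType] = []
bus = MiniEventBus()
bus.subscribe(received.append, {EventType.CONSTELLATION_STARTED, EventType.TASK_FAILED})
for event_type in EventType:
    bus.publish(event_type)
print([e.name for e in received])
```

Filtering at subscription time means unwanted events never reach the observer, so no rendering work is done for them.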
+
+## Usage Examples
+
+### Example 1: Basic Visualization
+
+```python
+from galaxy.session.observers import DAGVisualizationObserver
+from galaxy.core.events import get_event_bus
+
+async def visualize_execution():
+    """Execute constellation with visualization."""
+
+    # Create and subscribe visualization observer
+    viz_observer = DAGVisualizationObserver(enable_visualization=True)
+    event_bus = get_event_bus()
+    event_bus.subscribe(viz_observer)
+
+    # Execute constellation (visualization happens automatically)
+    await orchestrator.execute_constellation(constellation)
+
+    # Clean up
+    event_bus.unsubscribe(viz_observer)
+```
+
+### Example 2: Conditional Visualization
+
+```python
+async def execute_with_conditional_viz(constellation, verbose: bool = False):
+    """Execute with visualization only if verbose mode enabled."""
+
+    viz_observer = DAGVisualizationObserver(enable_visualization=verbose)
+    event_bus = get_event_bus()
+
+    if verbose:
+        event_bus.subscribe(viz_observer)
+
+    try:
+        await orchestrator.execute_constellation(constellation)
+    finally:
+        if verbose:
+            event_bus.unsubscribe(viz_observer)
+```
+
+### Example 3: Export Visualization
+
+```python
+from rich.console import Console
+
+async def execute_and_export_visualization():
+    """Execute constellation and export visualization to HTML."""
+
+    # Create console with recording enabled
+    console = Console(record=True, width=120)
+    viz_observer = DAGVisualizationObserver(
+        enable_visualization=True,
+        console=console
+    )
+
+    event_bus = get_event_bus()
+    event_bus.subscribe(viz_observer)
+
+    try:
+        await orchestrator.execute_constellation(constellation)
+    finally:
+        event_bus.unsubscribe(viz_observer)
+
+    # Export recorded output to HTML
+    console.save_html("execution_visualization.html")
+    print("Visualization saved to execution_visualization.html")
+```
+
+### Example 4: Multiple Constellations
+
+```python
+async def visualize_multiple_constellations():
+    """Visualize multiple constellation executions."""
+
+    viz_observer = DAGVisualizationObserver(enable_visualization=True)
+    event_bus = get_event_bus()
+    event_bus.subscribe(viz_observer)
+
+    try:
+        for constellation in constellations:
+            print(f"\n{'='*60}")
+            print(f"Executing: {constellation.name}")
+            print(f"{'='*60}\n")
+
+            await orchestrator.execute_constellation(constellation)
+
+            # Clear constellation references between executions
+            viz_observer.clear_constellations()
+    finally:
+        event_bus.unsubscribe(viz_observer)
+```
+
+## Performance Considerations
+
+### Visualization Overhead
+
+Visualization adds minimal overhead:
+
+- **Small DAGs** (< 10 tasks): Negligible impact
+- **Medium DAGs** (10-50 tasks): < 1% overhead
+- **Large DAGs** (> 50 tasks): Topology rendering may be slow
+
+### Optimization Strategies
+
+```python
+# Strategy 1: Disable for large constellations
+if constellation.task_count > 50:
+    viz_observer.set_visualization_enabled(False)
+
+# Strategy 2: Subscribe to fewer events
+event_bus.subscribe(viz_observer, {
+    EventType.CONSTELLATION_STARTED,
+    EventType.CONSTELLATION_COMPLETED,
+    EventType.TASK_FAILED  # Only show problems
+})
+
+# Strategy 3: Conditional topology display
+# (Handler automatically skips topology for constellations > 10 tasks)
+```
+
+## Best Practices
+
+### 1. Enable for Interactive Sessions
+
+```python
+# ✅ Good: Enable for interactive development/debugging
+if __name__ == "__main__":
+    viz_observer = DAGVisualizationObserver(enable_visualization=True)
+    # ...
+
+# ✅ Good: Disable for batch processing
+if running_in_batch_mode:
+    viz_observer = DAGVisualizationObserver(enable_visualization=False)
+```
+
+### 2. Clean Up Constellation References
+
+```python
+# After processing many constellations
+for constellation in constellation_list:
+    await orchestrator.execute_constellation(constellation)
+    viz_observer.clear_constellations()  # Free memory
+```
+
+### 3. Export for Documentation
+
+```python
+# Record visualization for documentation/reports
+console = Console(record=True)
+viz_observer = DAGVisualizationObserver(console=console)
+
+# ... execute constellation ...
+
+# Export (clear=False keeps the recording so it can be saved a second time)
+console.save_html("docs/execution_example.html", clear=False)
+console.save_text("logs/execution.txt")
+```
+
+## Related Documentation
+
+- **[Observer System Overview](overview.md)** — Architecture and design
+- **[Progress Observer](progress_observer.md)** — Task completion tracking
+
+## Summary
+
+The DAG Visualization Observer:
+
+- **Displays** constellation structure and execution progress
+- **Delegates** to specialized handlers for clean separation
+- **Uses** Rich terminal graphics for beautiful output
+- **Supports** conditional enabling/disabling
+- **Exports** visualization for documentation
+
+This observer is essential for understanding and debugging constellation execution, providing intuitive visual feedback for complex DAG workflows.
diff --git a/documents/docs/galaxy/overview.md b/documents/docs/galaxy/overview.md
new file mode 100644
index 000000000..4eb3674ec
--- /dev/null
+++ b/documents/docs/galaxy/overview.md
@@ -0,0 +1,604 @@
+# UFO³ — Weaving the Digital Agent Galaxy
+
+
+
+
From isolated device agents to interconnected constellations — Building the Digital Agent Galaxy
+
+
+---
+
+## 🚀 What is UFO³ Galaxy?
+
+**UFO³ Galaxy** is a revolutionary **cross-device orchestration framework** that transforms isolated device agents into a unified digital ecosystem. It models complex user requests as **Task Constellations** (星座) — dynamic distributed DAGs where nodes represent executable subtasks and edges capture dependencies across heterogeneous devices.
+
+### 🎯 The Vision
+
+Building truly ubiquitous intelligent agents requires moving beyond single-device automation. UFO³ Galaxy addresses four fundamental challenges in cross-device agent orchestration:
+
+**🔄 Asynchronous Parallelism**
+Enabling concurrent task execution across multiple devices while maintaining correctness through event-driven coordination and safe concurrency control
+
+**⚡ Dynamic Adaptation**
+Real-time workflow evolution in response to intermediate results, transient failures, and runtime observations without workflow abortion
+
+**🌐 Distributed Coordination**
+Reliable, low-latency communication across heterogeneous devices via WebSocket-based Agent Interaction Protocol with fault tolerance
+
+**🛡️ Safety Guarantees**
+Formal invariants ensuring DAG consistency during concurrent modifications and parallel execution, verified through rigorous proofs
+
+---
+
+## 🏗️ Architecture
+
+
+
+
UFO³ Galaxy Layered Architecture — From natural language to distributed execution
+
+
+
+### Layered Design
+
+UFO³ Galaxy follows a **hierarchical orchestration model** that separates global coordination from local execution. This architecture enables scalable cross-device orchestration while maintaining consistent control and responsiveness across diverse operating systems and network environments.
+
+#### 🎛️ Hierarchical Control Plane
+
+**ConstellationClient** serves as the **global control plane**, maintaining a live registry of all connected device agents with their:
+- Capability profiles and system specifications
+- Runtime health metrics and availability status
+- Current load and resource utilization
+
+This registry enables intelligent task placement based on device capabilities, avoiding mismatches between task requirements and device capacity.
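
As a rough illustration of capability-based placement, the sketch below picks the least-loaded healthy device that covers a task's requirements. The `DeviceProfile` fields, the `pick_device` helper, and the device IDs are hypothetical; the actual registry schema and placement policy are not specified in this overview.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical registry entry; field names are illustrative."""
    device_id: str
    capabilities: FrozenSet[str]
    healthy: bool
    load: float  # 0.0 (idle) to 1.0 (saturated)


def pick_device(
    required: FrozenSet[str], registry: List[DeviceProfile]
) -> Optional[DeviceProfile]:
    """Return the least-loaded healthy device that covers every required capability."""
    candidates = [d for d in registry if d.healthy and required <= d.capabilities]
    return min(candidates, key=lambda d: d.load, default=None)


registry = [
    DeviceProfile("windows_pc_001", frozenset({"office", "gui"}), True, 0.7),
    DeviceProfile("linux_gpu_001", frozenset({"gpu", "shell"}), True, 0.2),
    DeviceProfile("linux_gpu_002", frozenset({"gpu", "shell"}), False, 0.0),
]
chosen = pick_device(frozenset({"gpu"}), registry)
unplaceable = pick_device(frozenset({"mobile"}), registry)
print(chosen.device_id if chosen else None, unplaceable)
```

Returning `None` when no device qualifies leaves the caller free to queue the task, wait for a device to reconnect, or ask for the plan to be revised.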
+
+Each device hosts a **device agent server** that manages local orchestration through persistent WebSocket sessions with ConstellationClient. The server:
+- Maintains execution contexts on the host
+- Provides unified interface to underlying tools via MCP servers
+- Handles task execution, telemetry streaming, and resource monitoring
+
+**Clean separation**: Global orchestration policies are decoupled from device-specific heterogeneity, providing consistent abstraction across endpoints with different OS, hardware, or network conditions.
+
+#### 🔄 Orchestration Flow
+
+1. **DAG Synthesis**: ConstellationClient invokes the **Constellation Agent** to construct a TaskConstellation—a dynamic DAG encoding task decomposition, dependencies, and device mappings
+2. **Device Assignment**: Each TaskStar (DAG node) is assigned to suitable device agents based on capability profiles and system load
+3. **Asynchronous Execution**: The **Constellation Orchestrator** executes the DAG in an event-driven manner:
+ - Task completions trigger dependent nodes
+ - Failures prompt retry, migration, or partial DAG rewrites
+ - Workflows adapt to real-time system dynamics (device churn, network variability)
+
+**Result**: Highly parallel and resilient execution that sustains workflow completion even as subsets of devices fail or reconnect.
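
The "task completions trigger dependent nodes" step can be illustrated with the standard-library `graphlib` module. This is a sketch of readiness-driven scheduling in general, not the Constellation Orchestrator's code; the task names are hypothetical and each wave is completed in one step instead of being dispatched to devices.

```python
from graphlib import TopologicalSorter

# Maps each task to the set of tasks it depends on (hypothetical task names)
dependencies = {
    "parse_1": {"fetch_emails"},
    "parse_2": {"fetch_emails"},
    "reply_1": {"parse_1"},
    "reply_2": {"parse_2"},
    "summarize": {"reply_1", "reply_2"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
    waves.append(ready)
    # A real orchestrator would dispatch these concurrently and call done()
    # as each completion event arrives; here the whole wave completes at once.
    sorter.done(*ready)

print(waves)
```

Each inner list is a set of tasks that could run in parallel, which is where the latency gain over sequential execution comes from.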
+
+#### 🔌 Cross-Agent Communication
+
+The **Agent Interaction Protocol (AIP)** handles all cross-agent interactions:
+- Agent registration and capability synchronization
+- Task dispatch and progress reporting
+- Result aggregation and telemetry streaming
+
+Built on persistent WebSocket channels, AIP provides:
+- **Lightweight**: Minimal overhead for control messages
+- **Bidirectional**: Full-duplex communication between client and agents
+- **Multiplexed**: Concurrent message streams over single connection
+- **Low-latency**: Fast propagation of control signals and state updates
+- **Resilient**: Maintains global consistency despite intermittent connectivity
+
+Together, these design elements form a cohesive foundation for orchestrating large-scale, heterogeneous, and adaptive workflows across a resilient multi-device execution fabric.
+
+---
+
+## ✨ Core Design Principles
+
+UFO³ Galaxy realizes cross-device orchestration through **five tightly integrated design principles**:
+
+### 1. 🌟 Declarative Decomposition into Dynamic DAG (Task Constellation)
+
+Natural-language or programmatic requests are decomposed by the **Constellation Agent** into a structured DAG of **TaskStars** (nodes) and **TaskStarLines** (edges) that encode workflow logic, dependencies, and device assignments. This declarative structure is amenable to automated scheduling, introspection, and dynamic modification throughout execution.
+
+**Key Benefits:**
+- 📋 **Declarative structure** for automated scheduling
+- 🔍 **Runtime introspection** for workflow visibility
+- ✏️ **Dynamic rewriting** throughout execution
+- 🔄 **Automated orchestration** across heterogeneous devices
+
+```mermaid
+graph LR
+ A[User Intent] --> B[Constellation Agent]
+ B --> C[Task Constellation DAG]
+ C --> D[TaskStar 1 Windows]
+ C --> E[TaskStar 2 Linux GPU]
+ C --> F[TaskStar 3 Linux CPU]
+ C --> G[TaskStar 4 Mobile]
+ E --> H[TaskStar 5]
+ F --> H
+ G --> H
+```
+
+[Learn more →](constellation/overview.md)
+
+### 2. 🔄 Continuous, Result-Driven Graph Evolution
+
+The **Task Constellation** is a **living data structure** that evolves in response to execution feedback. Intermediate outputs, transient failures, and new observations trigger controlled rewrites—adding diagnostic TaskStars, creating fallbacks, rewiring dependencies, or pruning completed nodes—so the system adapts dynamically instead of aborting on errors.
+
+**Adaptation Mechanisms:**
+- 🩺 **Diagnostic TaskStars** added for debugging
+- 🛡️ **Fallback creation** for error recovery
+- 🔗 **Dependency rewiring** for workflow optimization
+- ✂️ **Node pruning** after completion
+
+The **Constellation Agent** operates in two modes:
+- **Creation Mode**: Synthesizes initial DAG from user request with device-aware task decomposition
+- **Editing Mode**: Incrementally refines constellation based on task completion events and runtime feedback
+
+[Learn more →](constellation_agent/overview.md)
+
+### 3. 🎯 Heterogeneous, Asynchronous, and Safe Orchestration
+
+Each **TaskStar** is matched to the most suitable device agent via rich **Agent Profiles** reflecting OS, hardware capabilities, and installed tools. The **Constellation Orchestrator** executes tasks asynchronously, allowing multiple TaskStars to progress in parallel.
+
+**Safety Guarantees:**
+- 🔒 **Safe assignment locking** prevents race conditions
+- 📅 **Event-driven scheduling** monitors DAG readiness
+- ✅ **DAG consistency checks** maintain structural integrity
+- 🔄 **Batched edits** ensure atomicity
+- 📐 **Formal verification** reinforces correctness
+- ⏱️ **Timeout protection** prevents deadlocks
+
+These mechanisms collectively ensure **high efficiency without compromising reliability**.
+
+[Learn more →](constellation_orchestrator/overview.md)
+
+### 4. 🔌 Unified Agent Interaction Protocol (AIP)
+
+Built atop persistent **WebSocket channels**, AIP provides a unified, secure, and fault-tolerant layer for the entire agent ecosystem.
+
+**Core Capabilities:**
+- 📝 **Agent registry** with capability profiles
+- 🔐 **Session management** for secure communication
+- 📤 **Task dispatch** with intelligent routing
+- 🎯 **Coordination primitives** for distributed workflows
+- 💓 **Heartbeat monitoring** for health tracking
+- 🔌 **Automatic reconnection** under network fluctuations
+- 🔄 **Retry mechanisms** for reliability
+
+**Architecture Benefits:**
+- 🪶 **Lightweight interface** for easy integration
+- 🧩 **Extensible design** supports new agent types
+- 🛡️ **Fault tolerance** ensures continuous operation
+
+This protocol **abstracts OS and network heterogeneity**, enabling agents across desktops, servers, and edge devices to collaborate while allowing new agents to integrate into the UFO³ ecosystem.
+
+[Learn more →](../aip/overview.md)
+
+### 5. 🛠️ Template-Driven Framework for Device Agents
+
+To **democratize agent creation**, UFO³ provides a **lightweight development template and toolkit** for rapidly building new device agents.
+
+**Development Framework:**
+- 📄 **Capability declaration** defines agent profiles
+- 🔗 **Environment binding** connects to local systems
+- 🧩 **MCP server integration** for tool augmentation
+- 🔧 **Modular design** accelerates development
+
+**Model Context Protocol (MCP) Integration:**
+- 🎁 **Tool packages** via MCP servers
+- 🔌 **Plug-and-play** capability extension
+- 🌐 **Cross-platform** tool standardization
+- 🚀 **Rapid prototyping** of new agents
+
+This modular architecture maintains consistency across the constellation while enabling developers to extend UFO³ to new platforms (mobile, web, IoT, embedded systems, etc.) with minimal effort.
+
+**🔌 Extensibility:** UFO³ is designed as a **universal framework** for building device agents on new platforms and applications. Through the **Agent Interaction Protocol (AIP)**, custom device agents can integrate into UFO³ Galaxy for coordinated multi-device automation. **Want to build your own device agent?** See our [Creating Custom Device Agents tutorial](../tutorials/creating_device_agent/overview.md).
+
+[Learn more →](agent_registration/overview.md) | [MCP Integration →](../mcp/overview.md)
+
+---
+
+## 🎯 Key Capabilities
+
+### 🌐 Cross-Device Collaboration
+Execute workflows that span Windows desktops, Linux servers, GPU clusters, mobile devices, and edge nodes—all from a single natural language request.
+
+### ⚡ Asynchronous Parallelism
+Automatically identify parallelizable subtasks and execute them concurrently across devices through:
+- **Event-driven scheduling** that continuously monitors DAG topology for ready tasks
+- **Non-blocking execution** with Python `asyncio` for maximum concurrency
+- **Dynamic adaptation** that integrates new tasks without interrupting running execution
+
+Result: independent subtasks overlap instead of queuing, so end-to-end latency can approach the length of the DAG's critical path instead of the sum of all task durations.
+
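The scheduling idea described above (launch every task whose dependencies are done, and wake up on each completion) can be sketched in a few lines of `asyncio`. This is a simplified illustration, not the orchestrator's code: `run_dag`, `deps`, and `run_task` are hypothetical names, and device assignment, timeouts, and dynamic edits are left out.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Set


async def run_dag(
    deps: Dict[str, Set[str]],
    run_task: Callable[[str], Awaitable[None]],
) -> List[str]:
    """Run each task as soon as all of its dependencies have completed.

    `deps` maps a task id to the ids it depends on. Returns completion order.
    """
    done: List[str] = []
    running: Dict[asyncio.Task, str] = {}
    pending = {task: set(d) for task, d in deps.items()}

    def launch_ready() -> None:
        # A task is ready when every dependency appears in `done`.
        for task_id in [t for t, d in pending.items() if d <= set(done)]:
            del pending[task_id]
            running[asyncio.create_task(run_task(task_id))] = task_id

    launch_ready()
    while running:
        # Sleep until at least one task finishes instead of polling.
        finished, _ = await asyncio.wait(
            running, return_when=asyncio.FIRST_COMPLETED
        )
        for fut in finished:
            fut.result()  # re-raise any task error
            done.append(running.pop(fut))
        launch_ready()
    if pending:
        raise ValueError(f"unsatisfiable dependencies: {sorted(pending)}")
    return done
```

For a diamond-shaped DAG such as `{"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}`, tasks `B` and `C` run concurrently once `A` completes. A production scheduler would also cap concurrency, which the `MAX_CONCURRENT_TASKS` setting suggests Galaxy does.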
+### 🛡️ Safety & Consistency
+- **Three formal invariants** (I1-I3) enforced at runtime for DAG correctness
+- **Safe assignment locking** prevents race conditions during concurrent modifications
+- **Acyclicity validation** ensures no circular dependencies
+- **State merging** algorithm preserves execution progress during dynamic edits
+- **Timeout protection** prevents deadlocks from agent failures
+
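Acyclicity validation is commonly done with a topological sort. The sketch below uses Kahn's algorithm; it is one standard way to perform the check and is not claimed to be the method Galaxy uses. The function name and the `deps` layout (task id mapped to the ids it depends on) are assumptions.

```python
from collections import deque
from typing import Dict, Iterable, List


def topological_order(deps: Dict[str, Iterable[str]]) -> List[str]:
    """Return a valid execution order, or raise ValueError on a cycle.

    `deps` maps each task id to the task ids it depends on.
    """
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    indegree = {n: 0 for n in nodes}
    dependents: Dict[str, List[str]] = {n: [] for n in nodes}
    for task, ds in deps.items():
        for dep in set(ds):
            indegree[task] += 1
            dependents[dep].append(task)

    # Start from tasks with no unmet dependencies (sorted for determinism).
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order: List[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in sorted(dependents[node]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Any node never reaching indegree 0 sits on (or behind) a cycle.
    if len(order) != len(nodes):
        raise ValueError("constellation contains a circular dependency")
    return order
```

Running the check on a candidate edit before committing it means an invalid dependency is rejected before it can reach the live DAG.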
+### 🔄 Dynamic Workflow Evolution
+- **Dual-mode operation**: Separate creation and editing phases with controlled transitions
+- **Feedback-driven adaptation**: Task completion events trigger intelligent constellation refinement
+- **LLM-powered reasoning**: ReAct architecture for context-aware DAG modifications
+- **Undo/redo support**: ConstellationEditor with command pattern for safe interactive editing
+
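The command pattern behind undo/redo can be shown with a toy editor. This sketch does not reproduce the `ConstellationEditor` API: `AddTask`, `Editor`, and the dictionary-based DAG are hypothetical stand-ins chosen to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

Dag = Dict[str, Set[str]]  # task id -> ids of the tasks it depends on


@dataclass
class AddTask:
    """One reversible edit: add a task together with its dependencies."""
    task_id: str
    depends_on: Set[str] = field(default_factory=set)

    def execute(self, dag: Dag) -> None:
        if self.task_id in dag:
            raise ValueError(f"task {self.task_id!r} already exists")
        dag[self.task_id] = set(self.depends_on)

    def undo(self, dag: Dag) -> None:
        del dag[self.task_id]


class Editor:
    """Applies commands and keeps undo/redo stacks."""

    def __init__(self) -> None:
        self.dag: Dag = {}
        self._done: List[AddTask] = []
        self._undone: List[AddTask] = []

    def execute(self, command: AddTask) -> None:
        command.execute(self.dag)
        self._done.append(command)
        self._undone.clear()  # a new edit invalidates the redo history

    def undo(self) -> bool:
        if not self._done:
            return False
        command = self._done.pop()
        command.undo(self.dag)
        self._undone.append(command)
        return True

    def redo(self) -> bool:
        if not self._undone:
            return False
        command = self._undone.pop()
        command.execute(self.dag)
        self._done.append(command)
        return True
```

Because each command knows how to reverse itself, the editor never needs to snapshot the whole DAG to support undo.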
+### 👁️ Rich Observability
+- Real-time constellation visualization with DAG topology updates
+- Event bus with publish-subscribe pattern for monitoring task progress
+- Detailed execution logs with markdown trajectory support
+- Task status tracking (pending, running, completed, failed, cancelled)
+- Dependency graph inspection and validation tools
+
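A publish-subscribe event bus of the kind mentioned above can be reduced to a dictionary of handler lists. The class below is a minimal, synchronous sketch under that assumption; `EventBus`, `subscribe`, and `publish` are illustrative names and may differ from Galaxy's event system, which may also be asynchronous.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class EventBus:
    """Routes events to the handlers subscribed to their `event_type`."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: Dict[str, Any]) -> int:
        """Deliver `event` and return the number of handlers notified."""
        # Copy the list so handlers may subscribe during delivery.
        handlers = list(self._subscribers.get(event["event_type"], []))
        for handler in handlers:
            handler(event)
        return len(handlers)
```

Publishers and observers only share the event type string, which is what lets a visualizer or logger attach without the orchestrator knowing about it.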
+---
+
+## 🎨 Use Cases
+
+### 🖥️ Software Development & Deployment
+*"Clone the repo on my laptop, build the Docker image on the GPU server, deploy to staging, and run the test suite on the CI cluster."*
+
+**Workflow DAG:**
+```mermaid
+graph LR
+ A[Clone Windows] --> B[Build Linux GPU]
+ B --> C[Deploy Linux Server]
+ C --> D[Test Linux CI]
+```
+
+### 📊 Data Science Workflows
+*"Fetch the dataset from cloud storage, preprocess on the Linux workstation, train the model on the A100 node, and generate a visualization dashboard on my Windows machine."*
+
+**Workflow DAG:**
+```mermaid
+graph LR
+ A[Fetch Any] --> B[Preprocess Linux]
+ B --> C[Train Linux GPU]
+ C --> D[Visualize Windows]
+```
+
+### 📝 Cross-Platform Document Processing
+*"Extract data from Excel on Windows, process with Python scripts on Linux, generate PDF reports, and send summary emails."*
+
+**Workflow DAG:**
+```mermaid
+graph LR
+ A[Extract Windows] --> B[Process Linux]
+ B --> C[Generate PDF Windows]
+ B --> D[Send Email Windows]
+```
+
+### 🔬 Distributed System Monitoring
+*"Collect server logs from all Linux machines, analyze for errors, generate alerts, and create a consolidated report."*
+
+**Workflow DAG:**
+```mermaid
+graph LR
+ A[Collect Logs Linux 1] --> D[Analyze Errors Any]
+ B[Collect Logs Linux 2] --> D
+ C[Collect Logs Linux 3] --> D
+ D --> E[Generate Report Windows]
+```
+
+### 🏢 Enterprise Automation
+*"Query the database on the server, process the results, update Excel spreadsheets on Windows, and generate PowerPoint presentations."*
+
+**Workflow DAG:**
+```mermaid
+graph LR
+ A[Query DB Linux] --> B[Process Data Any]
+ B --> C[Update Excel Windows]
+ B --> D[Create PPT Windows]
+```
+
+---
+
+## 🗺️ Documentation Structure
+
+### 🚀 [Quick Start](../getting_started/quick_start_galaxy.md)
+Get UFO³ Galaxy up and running in minutes with our step-by-step guide
+
+### 👥 [Galaxy Client](client/overview.md)
+Device coordination, connection management, and ConstellationClient API
+
+### 🧠 [Constellation Agent](constellation_agent/overview.md)
+LLM-driven task decomposition, DAG creation, and dynamic workflow evolution
+
+### ⚙️ [Constellation Orchestrator](constellation_orchestrator/overview.md)
+Asynchronous execution engine, event-driven coordination, and safety guarantees
+
+### 📊 [Task Constellation](constellation/overview.md)
+DAG structure, TaskStar nodes, TaskStarLine edges, and constellation editor
+
+### 🆔 [Agent Registration](agent_registration/overview.md)
+Device registry, agent profiles, and registration flow
+
+### 🌐 [Agent Interaction Protocol](../aip/overview.md)
+WebSocket messaging, protocol specification, and communication patterns
+
+### ⚙️ [Configuration](../configuration/system/galaxy_devices.md)
+Device pools, capabilities, and orchestration policies
+
+---
+
+## 🚦 Getting Started
+
+Ready to build your Digital Agent Galaxy? Follow these steps:
+
+### 1. Install UFO³
+```bash
+# Clone the repository
+git clone https://github.com/microsoft/UFO.git
+cd UFO
+
+# Install dependencies
+pip install -r requirements.txt
+```
+
+### 2. Configure Device Pool
+
+Create configuration files in `config/galaxy/`:
+
+**`config/galaxy/devices.yaml`** - Define your devices:
+
+```yaml
+devices:
+  - device_id: "windowsagent"
+    server_url: "ws://localhost:5005/ws"
+    os: "windows"
+    capabilities:
+      - "web_browsing"
+      - "office_applications"
+      - "file_management"
+    metadata:
+      location: "home_office"
+      os: "windows"
+      performance: "medium"
+    max_retries: 5
+
+  - device_id: "linux_agent_1"
+    server_url: "ws://localhost:5001/ws"
+    os: "linux"
+    capabilities:
+      - "server"
+      - "python"
+      - "docker"
+    metadata:
+      os: "linux"
+      performance: "high"
+      logs_file_path: "/root/log/log1.txt"
+    auto_connect: true
+    max_retries: 5
+
+  - device_id: "mobile_agent_1"
+    server_url: "ws://localhost:5002/ws"
+    os: "android"
+    capabilities:
+      - "mobile"
+      - "adb"
+      - "ui_automation"
+    metadata:
+      os: "android"
+      performance: "medium"
+      device_type: "smartphone"
+    auto_connect: true
+    max_retries: 5
+```
+
+**`config/galaxy/constellation.yaml`** - Configure runtime settings:
+
+```yaml
+# Constellation Runtime Settings
+CONSTELLATION_ID: "my_constellation"
+HEARTBEAT_INTERVAL: 30.0 # Heartbeat interval in seconds
+RECONNECT_DELAY: 5.0 # Delay before reconnecting in seconds
+MAX_CONCURRENT_TASKS: 6 # Maximum concurrent tasks
+MAX_STEP: 15 # Maximum steps per session
+
+# Device Configuration
+DEVICE_INFO: "config/galaxy/devices.yaml"
+
+# Logging Configuration
+LOG_TO_MARKDOWN: true
+```
+
+See [Galaxy Configuration](../configuration/system/galaxy_devices.md) for complete documentation.
+
+### 3. Start Device Agents
+
+On each device, launch the Agent Server. For detailed setup instructions, see the respective quick start guides:
+
+**On Windows:**
+
+See [Windows Agent (UFO²) Quick Start →](../getting_started/quick_start_ufo2.md)
+
+**On Linux:**
+
+See [Linux Agent Quick Start →](../getting_started/quick_start_linux.md)
+
+**On Mobile (Android):**
+
+See [Mobile Agent Quick Start →](../getting_started/quick_start_mobile.md)
+
+### 4. Launch Galaxy Client
+
+**Interactive Mode:**
+```bash
+python -m galaxy --interactive
+```
+
+**Direct Request:**
+```bash
+python -m galaxy "Your cross-device task here"
+```
+
+**Programmatic API:**
+```python
+import asyncio
+
+from galaxy.galaxy_client import GalaxyClient
+
+async def main():
+    client = GalaxyClient(session_name="my_session")
+    await client.initialize()
+    result = await client.process_request("Your task request")
+    print(result)
+    await client.shutdown()
+
+asyncio.run(main())
+```
+
+For detailed instructions, see the [Quick Start Guide](../getting_started/quick_start_galaxy.md).
+
+---
+
+## 🔧 System Components
+
+UFO³ Galaxy consists of several integrated components working together:
+
+### Core Components
+
+| Component | Location | Responsibility |
+|-----------|----------|----------------|
+| **GalaxyClient** | `galaxy/galaxy_client.py` | Session management, user interaction, orchestration coordination |
+| **ConstellationClient** | `galaxy/client/constellation_client.py` | Device management, connection lifecycle, task assignment |
+| **ConstellationAgent** | `galaxy/agents/constellation_agent.py` | LLM-driven DAG synthesis and evolution, state machine control |
+| **TaskConstellationOrchestrator** | `galaxy/constellation/orchestrator/` | Asynchronous execution, event coordination, safety enforcement |
+| **TaskConstellation** | `galaxy/constellation/task_constellation.py` | DAG data structure, validation, and modification APIs |
+| **DeviceManager** | `galaxy/client/device_manager.py` | WebSocket connections, heartbeat monitoring, message routing |
+
+### Supporting Infrastructure
+
+| Component | Purpose |
+|-----------|---------|
+| **Event Bus** | Publish-subscribe system for constellation events |
+| **Observer Pattern** | Event listeners for visualization and synchronization |
+| **Device Registry** | Centralized device information and capability tracking |
+| **Agent Profile** | Device metadata and capability declarations |
+| **MCP Servers** | Tool augmentation via Model Context Protocol |
+
+For detailed component documentation, see the respective sections in [Documentation Structure](#documentation-structure).
+
+### Technology Stack
+
+| Layer | Technologies |
+|-------|-------------|
+| **Programming** | Python 3.10+, asyncio, dataclasses |
+| **Communication** | WebSockets, JSON-RPC |
+| **LLM Integration** | OpenAI API, Azure OpenAI, Gemini, Claude, Custom Models |
+| **Tool Augmentation** | Model Context Protocol (MCP) |
+| **Configuration** | YAML, Pydantic models |
+| **Logging** | Python logging, Rich console, Markdown trajectory |
+| **Testing** | pytest, mock agents |
+
+---
+
+## 🌟 From Devices to Constellations to Galaxy
+
+UFO³ represents a paradigm shift in intelligent automation:
+
+- **Single Device** → Isolated agents operating within one OS
+- **Task Constellation** → Coordinated multi-device workflows for one task
+- **Digital Agent Galaxy** → Interconnected constellations spanning your entire digital estate
+
+Over time, multiple constellations can interconnect, weaving together agents, devices, and capabilities into a self-organizing **Digital Agent Galaxy**. This design elevates cross-device automation from a brittle engineering challenge to a unified orchestration paradigm, where multi-device workflows become naturally expressive, paving the way for large-scale, adaptive, and resilient intelligent ubiquitous computing systems.
+
+---
+
+## 📊 Performance Monitoring & Evaluation
+
+UFO³ Galaxy provides comprehensive performance monitoring and evaluation tools to analyze multi-device workflow execution:
+
+### Automated Metrics Collection
+
+Galaxy automatically collects detailed performance metrics during execution through an event-driven observer pattern:
+
+- **Task Metrics**: Execution times, success rates, bottleneck identification
+- **Constellation Metrics**: DAG statistics, parallelism analysis, critical path computation
+- **Modification Metrics**: Dynamic editing patterns and adaptation frequency
+- **Device Metrics**: Per-device performance and resource utilization
+
+All metrics are captured in real-time without impacting execution performance and saved to structured JSON files for programmatic analysis.
+
+### Trajectory Report
+
+Galaxy automatically generates a comprehensive Markdown trajectory report (`output.md`) that documents the complete execution lifecycle. It is written to:
+
+```
+logs/galaxy//output.md
+```
+
+This human-readable report includes:
+- Step-by-step execution timeline with agent actions
+- Interactive DAG topology visualizations showing constellation evolution
+- Detailed task execution logs with results and errors
+- Device connection status and coordination events
+- Complete before/after constellation states at each step
+
+The trajectory report provides visual debugging and workflow understanding, complementing the quantitative `result.json` metrics.
+
+### Result JSON Format
+
+After each session, Galaxy also generates a comprehensive `result.json` file at:
+
+```
+logs/galaxy//result.json
+```
+
+This file includes:
+- Complete session metadata and execution timeline
+- Task-by-task performance breakdown
+- Constellation statistics (parallelism ratio, critical path, max concurrency)
+- Modification history showing DAG evolution
+- Final results and outcomes
+
+**Example Key Metrics:**
+
+| Metric | Description | Use Case |
+|--------|-------------|----------|
+| `parallelism_ratio` | Efficiency of parallel execution (total_work / critical_path) | Optimization target |
+| `critical_path_length` | Minimum possible execution time | Theoretical performance limit |
+| `average_task_duration` | Mean task execution time | Baseline performance |
+| `modification_count` | Number of dynamic DAG edits | Adaptability analysis |
+
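The table defines `parallelism_ratio` as `total_work / critical_path`. The worked sketch below computes both quantities for a small DAG. It assumes work is measured in task duration (seconds) and that durations are positive; whether Galaxy uses durations or node counts is not stated here, and `constellation_stats` is a hypothetical helper.

```python
from functools import lru_cache
from typing import Dict, Iterable


def constellation_stats(
    durations: Dict[str, float], deps: Dict[str, Iterable[str]]
) -> Dict[str, float]:
    """Compute total work, critical path length, and their ratio for a DAG.

    Assumes `deps` is acyclic and every referenced task has a duration.
    """

    @lru_cache(maxsize=None)
    def finish_time(task: str) -> float:
        # Earliest finish = own duration + latest finish among dependencies.
        start = max((finish_time(d) for d in deps.get(task, ())), default=0.0)
        return start + durations[task]

    total_work = sum(durations.values())
    critical_path = max(finish_time(task) for task in durations)
    return {
        "total_work": total_work,
        "critical_path_length": critical_path,
        "parallelism_ratio": total_work / critical_path,
    }
```

For three 4-second log collectors feeding a 2-second analysis and a 1-second report, total work is 15 s and the critical path is 4 + 2 + 1 = 7 s, giving a ratio of about 2.14. A purely sequential chain always yields 1.0.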
+### Performance Analysis Tools
+
+```python
+import json
+
+# Load session results
+with open("logs/galaxy/task_32/result.json", "r") as f:
+    result = json.load(f)
+
+# Extract key metrics
+metrics = result["session_results"]["metrics"]
+task_stats = metrics["task_statistics"]
+const_stats = result["session_results"]["final_constellation_stats"]
+
+print(f"✅ Success Rate: {task_stats['success_rate'] * 100:.1f}%")
+print(f"⏱️ Avg Task Duration: {task_stats['average_task_duration']:.2f}s")
+print(f"🔀 Parallelism Ratio: {const_stats['parallelism_ratio']:.2f}")
+```
+
+**Documentation:**
+
+- **[Trajectory Report Guide](./evaluation/trajectory_report.md)** - Complete guide to the human-readable execution log with DAG visualizations
+- **[Performance Metrics Guide](./evaluation/performance_metrics.md)** - Comprehensive metrics documentation with analysis examples
+- **[Result JSON Reference](./evaluation/result_json.md)** - Complete schema reference and programmatic access guide
+
+---
+
+## 📚 Learn More
+
+- **Research Paper**: [UFO³: Weaving the Digital Agent Galaxy](https://arxiv.org/) *(Coming Soon)*
+- **UFO² (Desktop AgentOS)**: [Documentation](../ufo2/overview.md)
+- **UFO (Original)**: [GitHub Repository](https://github.com/microsoft/UFO)
+
+---
+
+## 🤝 Contributing
+
+We welcome contributions! Whether you're building new device agents, improving orchestration algorithms, or enhancing the protocol, check out our Contributing Guide on GitHub.
+
+---
+
+## 📄 License
+
+UFO³ Galaxy is released under the MIT License.
+
+---
+
+
+
+Transform your distributed devices into a unified digital collective.
+
+UFO³ Galaxy: where every device is a star, and every task is a constellation.
+
diff --git a/documents/docs/galaxy/webui.md b/documents/docs/galaxy/webui.md
new file mode 100644
index 000000000..345910cf2
--- /dev/null
+++ b/documents/docs/galaxy/webui.md
@@ -0,0 +1,1696 @@
+# Galaxy WebUI
+
+The **Galaxy WebUI** is a modern, interactive web interface for the UFO³ Galaxy Framework. It provides real-time visualization of task constellations, device status, agent interactions, and execution flow through an elegant, space-themed interface.
+
+
+*Galaxy WebUI: interactive constellation visualization and real-time monitoring*
+
+---
+
+## 🌟 Overview
+
+The Galaxy WebUI transforms the command-line Galaxy experience into a rich, visual interface where you can:
+
+- **🗣️ Chat with Galaxy**: Submit natural language requests through an intuitive chat interface
+- **📊 Visualize Constellations**: Watch task constellations form and execute as interactive DAG graphs
+- **🎯 Monitor Execution**: Track task status, device assignments, and real-time progress
+- **🔄 See Agent Reasoning**: Observe agent thoughts, plans, and decision-making processes
+- **🖥️ Manage Devices**: View, monitor, and **add new devices** through the UI
+- **➕ Add Device Agents**: Register new device agents dynamically without restarting
+- **📡 Stream Events**: Follow the event log to understand system behavior in real time
+
+---
+
+## 🚀 Quick Start
+
+### Starting the WebUI
+
+```powershell
+# Launch Galaxy with WebUI
+python -m galaxy --webui
+```
+
+The WebUI will automatically:
+1. Start the backend server on `http://localhost:8000` (or the next available port)
+2. Open your default browser to the interface
+3. Establish WebSocket connection for real-time updates
+
+!!! tip "Custom Session Name"
+    ```powershell
+    python -m galaxy --webui --session-name "data_pipeline_demo"
+    ```
+
+### First Request
+
+1. **Enter your request** in the chat input at the bottom
+2. **Press Enter** or click Send
+3. **Watch the constellation form** in the DAG visualization panel
+4. **Monitor task execution** as devices process their assigned tasks
+5. **See results** displayed in the chat window
+
+---
+
+## 🏗️ Architecture
+
+### Design Principles
+
+The Galaxy WebUI backend follows **software engineering best practices**:
+
+**Separation of Concerns:**
+- **Models Layer**: Pydantic models ensure type safety and validation
+- **Services Layer**: Business logic isolated from presentation
+- **Handlers Layer**: WebSocket message processing logic
+- **Routers Layer**: HTTP endpoint definitions
+
+**Dependency Injection:**
+- `AppState` class provides centralized state management
+- `get_app_state()` dependency injection function
+- Replaces global variables with type-safe properties
+
+**Type Safety:**
+- Pydantic models for all API requests/responses
+- Enums for constants (`WebSocketMessageType`, `RequestStatus`)
+- `TYPE_CHECKING` pattern for forward references
+- Comprehensive type annotations throughout
+
+**Modularity:**
+- Clear module boundaries
+- Easy to test individual components
+- Simple to extend with new features
+- Better code organization and maintainability
+
+### System Architecture
+
+The Galaxy WebUI follows a modern client-server architecture with real-time event streaming:
+
+```mermaid
+graph TB
+ subgraph "Galaxy WebUI Stack"
+ subgraph Frontend["Frontend (React + TypeScript + Vite)"]
+ F1[Chat Interface]
+ F2[DAG Visualization ReactFlow]
+ F3[Device Management]
+ F4[Event Log]
+ F5[State Management Zustand]
+ end
+
+ subgraph Backend["Backend (FastAPI + WebSocket)"]
+ subgraph Presentation["Presentation Layer"]
+ B1[FastAPI App server.py]
+ B2[Routers health/devices/websocket]
+ end
+
+ subgraph Business["Business Logic Layer"]
+ B3[Services Config/Device/Galaxy]
+ B4[Handlers WebSocket Message Handler]
+ end
+
+ subgraph Data["Data & Models Layer"]
+ B5[Models Requests/Responses]
+ B6[Enums MessageType/Status]
+ B7[Dependencies AppState]
+ end
+
+ subgraph Events["Event Processing"]
+ B8[WebSocketObserver]
+ B9[EventSerializer]
+ end
+ end
+
+ subgraph Core["Galaxy Core"]
+ C1[ConstellationAgent]
+ C2[Task Orchestrator]
+ C3[Device Manager]
+ C4[Event System]
+ end
+
+ Frontend <-->|WebSocket| B2
+ B2 --> B4
+ B4 --> B3
+ B3 --> B7
+ B2 --> B5
+ B8 --> B9
+ B8 -->|Broadcast| Frontend
+ C4 -->|Publish Events| B8
+ B3 <-->|State Access| B7
+ Backend <-->|Event Bus| Core
+ end
+
+ style Frontend fill:#1a1a2e,stroke:#00d4ff,stroke-width:2px,color:#fff
+ style Presentation fill:#16213e,stroke:#7b2cbf,stroke-width:2px,color:#fff
+ style Business fill:#1a1a2e,stroke:#00d4ff,stroke-width:2px,color:#fff
+ style Data fill:#0f1419,stroke:#10b981,stroke-width:2px,color:#fff
+ style Events fill:#16213e,stroke:#ff006e,stroke-width:2px,color:#fff
+ style Core fill:#0a0e27,stroke:#ff006e,stroke-width:2px,color:#fff
+```
+
+### Component Overview
+
+#### Backend Components
+
+The Galaxy WebUI backend follows a **modular architecture** with clear separation of concerns:
+
+| Component | File/Directory | Responsibility |
+|-----------|----------------|----------------|
+| **FastAPI Server** | `galaxy/webui/server.py` | Application initialization, middleware, router registration, lifespan management |
+| **Models** | `galaxy/webui/models/` | Pydantic models for requests/responses, enums for type safety |
+| **Services** | `galaxy/webui/services/` | Business logic layer (config, device, galaxy operations) |
+| **Handlers** | `galaxy/webui/handlers/` | WebSocket message processing and routing |
+| **Routers** | `galaxy/webui/routers/` | FastAPI endpoint definitions organized by feature |
+| **Dependencies** | `galaxy/webui/dependencies.py` | Dependency injection for state management (AppState) |
+| **WebSocket Observer** | `galaxy/webui/websocket_observer.py` | Event subscription and broadcasting to WebSocket clients |
+| **Event Serializer** | Built into observer | Converts Python objects to JSON-compatible format |
+
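The Event Serializer row says Python objects are converted to a JSON-compatible format. A recursive converter is one plausible way to do that; the sketch below is an assumption about the approach, and `to_jsonable` is a hypothetical name that does not appear in the codebase described here.

```python
import dataclasses
import enum
import json
from datetime import datetime
from typing import Any


def to_jsonable(value: Any) -> Any:
    """Recursively convert common Python objects into JSON-compatible values."""
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        return {
            f.name: to_jsonable(getattr(value, f.name))
            for f in dataclasses.fields(value)
        }
    if isinstance(value, enum.Enum):
        return to_jsonable(value.value)
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, dict):
        return {str(k): to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set, frozenset)):
        return [to_jsonable(v) for v in value]
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    return str(value)  # last resort: keep the event deliverable
```

Falling back to `str()` for unknown types trades fidelity for robustness: a single unserializable field does not prevent the rest of the event from reaching the browser.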
+**Detailed Backend Structure:**
+
+```
+galaxy/webui/
+├── server.py # Main FastAPI application
+├── dependencies.py # AppState and dependency injection
+├── websocket_observer.py # EventSerializer + WebSocketObserver
+├── models/
+│ ├── __init__.py # Export all models
+│ ├── enums.py # WebSocketMessageType, RequestStatus enums
+│ ├── requests.py # Pydantic request models
+│ └── responses.py # Pydantic response models
+├── services/
+│ ├── __init__.py
+│ ├── config_service.py # Configuration management
+│ ├── device_service.py # Device operations and snapshots
+│ └── galaxy_service.py # Galaxy client interactions
+├── handlers/
+│ ├── __init__.py
+│ └── websocket_handlers.py # WebSocket message handler
+├── routers/
+│ ├── __init__.py
+│ ├── health.py # Health check endpoint
+│ ├── devices.py # Device management endpoints
+│ └── websocket.py # WebSocket endpoint
+└── templates/
+ └── index.html # Fallback HTML page
+```
+
+**Architecture Benefits:**
+
+- ✅ **Maintainability**: Each module has a single, clear responsibility
+- ✅ **Testability**: Services and handlers can be unit tested independently
+- ✅ **Type Safety**: Pydantic models validate all inputs/outputs
+- ✅ **Extensibility**: Easy to add new endpoints, message types, or services
+- ✅ **Readability**: Clear module boundaries improve code comprehension
+- ✅ **Reusability**: Services can be shared across multiple endpoints
+
+#### Frontend Components
+
+| Component | Location | Purpose |
+|-----------|----------|---------|
+| **App** | `src/App.tsx` | Main layout, connection status, theme management |
+| **ChatWindow** | `src/components/chat/ChatWindow.tsx` | Message display and input interface |
+| **DagPreview** | `src/components/constellation/DagPreview.tsx` | Interactive constellation graph visualization |
+| **DevicePanel** | `src/components/devices/DevicePanel.tsx` | Device status cards, search, and add button |
+| **DeviceCard** | `src/components/devices/DeviceCard.tsx` | Individual device status display |
+| **AddDeviceModal** | `src/components/devices/AddDeviceModal.tsx` | Modal dialog for adding new devices |
+| **RightPanel** | `src/components/layout/RightPanel.tsx` | Tabbed panel for constellation, tasks, details |
+| **EventLog** | `src/components/EventLog.tsx` | Real-time event stream display |
+| **GalaxyStore** | `src/store/galaxyStore.ts` | Zustand state management |
+| **WebSocket Client** | `src/services/websocket.ts` | WebSocket connection with auto-reconnect |
+
+---
+
+## 🔌 Communication Protocol
+
+### HTTP API Endpoints
+
+#### Health Check
+
+```http
+GET /health
+```
+
+**Response:**
+```json
+{
+ "status": "healthy",
+ "connections": 3,
+ "events_sent": 1247
+}
+```
+
+#### Add Device
+
+```http
+POST /api/devices
+Content-Type: application/json
+```
+
+**Request Body:**
+```json
+{
+ "device_id": "windows-laptop-1",
+ "server_url": "ws://192.168.1.100:8080",
+ "os": "Windows",
+ "capabilities": ["excel", "outlook", "browser"],
+ "metadata": {
+ "region": "us-west-2",
+ "owner": "data-team"
+ },
+ "auto_connect": true,
+ "max_retries": 5
+}
+```
+
+**Success Response (200):**
+```json
+{
+ "status": "success",
+ "message": "Device 'windows-laptop-1' added successfully",
+ "device": {
+ "device_id": "windows-laptop-1",
+ "server_url": "ws://192.168.1.100:8080",
+ "os": "Windows",
+ "capabilities": ["excel", "outlook", "browser"],
+ "auto_connect": true,
+ "max_retries": 5,
+ "metadata": {
+ "region": "us-west-2",
+ "owner": "data-team"
+ }
+ }
+}
+```
+
+**Error Responses:**
+
+- **404 Not Found**: `devices.yaml` configuration file not found
+  ```json
+  {
+    "detail": "devices.yaml not found"
+  }
+  ```
+
+- **409 Conflict**: Device ID already exists
+  ```json
+  {
+    "detail": "Device ID 'windows-laptop-1' already exists"
+  }
+  ```
+
+- **500 Internal Server Error**: Failed to add device
+  ```json
+  {
+    "detail": "Failed to add device: "
+  }
+  ```
+
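A client for the Add Device endpoint needs only the standard library. The sketch below builds the request shown above and maps the documented error statuses to their JSON bodies. The helper names are hypothetical, and the base URL `http://localhost:8000` is taken from the Quick Start default and may differ in your setup.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict, Tuple


def build_add_device_request(
    base_url: str, device: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) the POST /api/devices request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/devices",
        data=json.dumps(device).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def add_device(base_url: str, device: Dict[str, Any]) -> Tuple[int, Dict[str, Any]]:
    """Send the request and return (status code, parsed JSON body)."""
    request = build_add_device_request(base_url, device)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # 404, 409, and 500 responses carry a JSON body with a "detail" field.
        return err.code, json.load(err)
```

Calling `add_device("http://localhost:8000", {...})` with a duplicate `device_id` should, per the error list above, return status 409 and a `detail` message rather than raising.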
+### WebSocket Connection
+
+The WebUI maintains a persistent WebSocket connection to the Galaxy backend for bidirectional real-time communication.
+
+**Connection URL:** `ws://localhost:8000/ws`
+
+### Message Types
+
+#### Client → Server
+
+**1. User Request**
+```json
+{
+ "type": "request",
+ "text": "Extract sales data and create an Excel report",
+ "timestamp": 1234567890
+}
+```
+
+**2. Session Reset**
+```json
+{
+ "type": "reset",
+ "timestamp": 1234567890
+}
+```
+
+**3. Ping (Keepalive)**
+```json
+{
+ "type": "ping",
+ "timestamp": 1234567890
+}
+```
+
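On the server side, the three client message types can be validated against an enum before dispatch. The enum name `WebSocketMessageType` appears in this document, but its members, the `parse_client_message` helper, and the rule that `request` needs non-empty `text` are assumptions inferred from the JSON examples above.

```python
import json
from enum import Enum
from typing import Any, Dict


class WebSocketMessageType(str, Enum):
    # Values mirror the "type" field of the client messages shown above.
    REQUEST = "request"
    RESET = "reset"
    PING = "ping"


def parse_client_message(raw: str) -> Dict[str, Any]:
    """Validate a client frame and normalize its type to the enum."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("message must be a JSON object")
    try:
        msg_type = WebSocketMessageType(message.get("type"))
    except ValueError:
        raise ValueError(
            f"unknown message type: {message.get('type')!r}"
        ) from None
    if (
        msg_type is WebSocketMessageType.REQUEST
        and not str(message.get("text", "")).strip()
    ):
        raise ValueError("request messages need a non-empty 'text'")
    return {**message, "type": msg_type}
```

Normalizing to the enum at the boundary means handler code can branch on `WebSocketMessageType.PING` instead of comparing raw strings.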
+#### Server → Client
+
+**1. Welcome Message**
+```json
+{
+ "type": "welcome",
+ "message": "Connected to Galaxy Web UI",
+ "timestamp": 1234567890
+}
+```
+
+**2. Device Snapshot (on connect)**
+```json
+{
+ "event_type": "device_snapshot",
+ "source_id": "webui.server",
+ "timestamp": 1234567890,
+ "data": {
+ "event_name": "device_snapshot",
+ "device_count": 2
+ },
+ "all_devices": {
+ "windows_device_1": {
+ "device_id": "windows_device_1",
+ "status": "connected",
+ "os": "windows",
+ "capabilities": ["desktop_automation", "excel"],
+ "metadata": {},
+ "last_heartbeat": "2025-11-09T10:30:00",
+ "current_task_id": null
+ }
+ }
+}
+```
+
+**3. Galaxy Events**
+
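+All Galaxy events are forwarded to the WebUI in real time. The examples below show an agent response, a constellation creation, a task status change, and a device status change: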
+
+```json
+{
+ "event_type": "agent_response",
+ "source_id": "ConstellationAgent",
+ "timestamp": 1234567890,
+ "agent_name": "ConstellationAgent",
+ "agent_type": "constellation",
+ "output_type": "response",
+ "output_data": {
+ "thought": "I need to decompose this task...",
+ "plan": ["Analyze requirements", "Create DAG", "Assign devices"],
+ "response": "Creating constellation with 3 tasks"
+ }
+}
+```
+
+```json
+{
+ "event_type": "constellation_created",
+ "source_id": "TaskConstellation",
+ "timestamp": 1234567890,
+ "constellation_id": "constellation_123",
+ "constellation_state": "planning",
+ "data": {
+ "constellation": {
+ "constellation_id": "constellation_123",
+ "name": "Sales Report Pipeline",
+ "state": "planning",
+ "tasks": {
+ "task_1": {
+ "task_id": "task_1",
+ "name": "Extract Data",
+ "status": "pending",
+ "target_device_id": "linux_device_1"
+ }
+ },
+ "dependencies": {
+ "task_2": ["task_1"]
+ }
+ }
+ }
+}
+```
+
+```json
+{
+ "event_type": "task_status_changed",
+ "source_id": "TaskOrchestrator",
+ "timestamp": 1234567890,
+ "task_id": "task_1",
+ "status": "running",
+ "result": null,
+ "error": null
+}
+```
+
+```json
+{
+ "event_type": "device_status_changed",
+ "source_id": "DeviceManager",
+ "timestamp": 1234567890,
+ "device_id": "windows_device_1",
+ "device_status": "busy",
+ "device_info": {
+ "current_task_id": "task_2"
+ }
+}
+```
+
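A client consuming these events typically folds them into a local view of tasks and devices. The reducer below is a sketch of that idea in Python, using only the field names shown in the JSON examples above; `apply_event` and the `state` layout are hypothetical, and the real frontend keeps its state in a Zustand store.

```python
from typing import Any, Dict


def apply_event(state: Dict[str, Dict[str, Any]], event: Dict[str, Any]) -> None:
    """Fold one server event into `state` ({"tasks": {...}, "devices": {...}})."""
    kind = event.get("event_type")
    if kind == "device_snapshot":
        # Replace the device view with the full snapshot sent on connect.
        state["devices"] = {
            k: dict(v) for k, v in event.get("all_devices", {}).items()
        }
    elif kind == "constellation_created":
        tasks = event["data"]["constellation"].get("tasks", {})
        state["tasks"] = {k: dict(v) for k, v in tasks.items()}
    elif kind == "task_status_changed":
        task_id = event["task_id"]
        task = state["tasks"].setdefault(task_id, {"task_id": task_id})
        task.update(
            status=event["status"],
            result=event.get("result"),
            error=event.get("error"),
        )
    elif kind == "device_status_changed":
        device_id = event["device_id"]
        device = state["devices"].setdefault(device_id, {"device_id": device_id})
        device["status"] = event["device_status"]
        device.update(event.get("device_info", {}))
    # Other event types (for example agent_response) leave this view unchanged.
```

Start with `state = {"tasks": {}, "devices": {}}` and call `apply_event` for every frame received; the snapshot on connect seeds the device view, and later events patch it incrementally.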
+---
+
+## 🎨 User Interface
+
+### Main Layout
+
+The WebUI uses a three-panel layout:
+
+```mermaid
+graph LR
+ subgraph UI["Galaxy WebUI Layout"]
+ subgraph Header["🌌 Header Bar"]
+ H1[Galaxy Logo]
+ H2[Connection Status]
+ H3[Settings]
+ end
+
+ subgraph Left["📱 Left Panel: Devices"]
+ L1[Device Card 1 Windows 🟢 Connected]
+ L2[Device Card 2 Linux 🔵 Busy]
+ L3[Device Card 3 macOS 🟢 Idle]
+ end
+
+ subgraph Center["💬 Center Panel: Chat"]
+ C1[Message History User/Agent/Actions]
+ C2[Action Trees Collapsible]
+ C3[Input Box Type request...]
+ end
+
+ subgraph Right["📊 Right Panel: Tabs"]
+ R1[🌟 Constellation DAG Graph]
+ R2[📋 Tasks Task List]
+ R3[📝 Details Selected Info]
+ end
+
+ Header -.-> Left
+ Header -.-> Center
+ Header -.-> Right
+ Left -.-> Center
+ Center -.-> Right
+ end
+
+ style Header fill:#1a1a2e,stroke:#00d4ff,stroke-width:2px,color:#fff
+ style Left fill:#0f1419,stroke:#10b981,stroke-width:2px,color:#fff
+ style Center fill:#16213e,stroke:#7b2cbf,stroke-width:2px,color:#fff
+ style Right fill:#1a1a2e,stroke:#ff006e,stroke-width:2px,color:#fff
+```
+
+### Key Features
+
+#### 🗣️ Chat Interface
+
+**Location:** Center panel
+
+**Features:**
+- Natural language input for requests
+- Message history with agent responses
+- Collapsible action trees showing execution details
+- Thought, plan, and response display
+- Status indicators (pending, running, completed, failed)
+- Markdown rendering for rich text
+- Code block syntax highlighting
+
+**Message Types:**
+- **User Messages**: Your requests to Galaxy
+- **Agent Responses**: ConstellationAgent thoughts, plans, and responses
+- **Action Messages**: Individual constellation operations (add_task, build_constellation, etc.)
+- **System Messages**: Status updates and notifications
+
+#### 📊 DAG Visualization
+
+**Location:** Right panel → Constellation tab
+
+**Features:**
+- Interactive node-and-edge graph
+- Real-time task status updates
+- Color-coded status indicators:
+  - 🔵 Pending: Gray
+  - 🟡 Running: Blue (animated)
+  - 🟢 Completed: Green
+  - 🔴 Failed: Red
+  - ⚫ Skipped: Orange
+- Dependency edges showing task relationships
+- Pan and zoom controls
+- Automatic layout optimization
+- Node click to view task details
+
+**Interaction:**
+- **Click node**: Select task and show details
+- **Pan**: Click and drag background
+- **Zoom**: Mouse wheel or pinch gesture
+- **Fit view**: Click fit-to-screen button
+
+#### 🖥️ Device Management
+
+**Location:** Left sidebar
+
+**Features:**
+- Device status cards with real-time updates
+- Color-coded status indicators:
+  - 🟢 Connected/Idle: Green
+  - 🔵 Busy: Blue
+  - 🟡 Connecting: Yellow
+  - 🔴 Disconnected/Failed: Red
+- Capability badges
+- Current task assignment
+- Last heartbeat timestamp
+- Connection metrics
+- Click to view device details
+- **➕ Add Device Button**: Manually add new devices through UI
+
+**Device Information:**
+- OS type and version
+- Server URL
+- Installed applications
+- Performance tier
+- Custom metadata
+
+**Adding a New Device:**
+
+Click the **"+"** button in the Device Panel header to open the Add Device Modal:
+
+
+*Add Device Modal: register new device agents through the UI*
+
+1. **Basic Information:**
+   - **Device ID**: Unique identifier for the device (required)
+   - **Server URL**: WebSocket endpoint URL (must start with `ws://` or `wss://`)
+   - **Operating System**: Select from Windows, Linux, macOS, or enter custom OS
+
+2. **Capabilities:**
+   - Add capabilities one by one (e.g., `excel`, `outlook`, `browser`)
+   - Remove capabilities by clicking the ✕ icon
+   - At least one capability is required
+
+3. **Advanced Options:**
+   - **Auto-connect**: Automatically connect to device after registration (default: enabled)
+   - **Max Retries**: Maximum connection retry attempts (default: 5)
+
+4. **Metadata (Optional):**
+   - Add custom key-value pairs for additional device information
+   - Examples: `region: us-east-1`, `tier: premium`, `owner: team-a`
+
+**API Endpoint:**
+
+```http
+POST /api/devices
+Content-Type: application/json
+
+{
+ "device_id": "my-device-1",
+ "server_url": "ws://192.168.1.100:8080",
+ "os": "Windows",
+ "capabilities": ["excel", "outlook", "powerpoint"],
+ "metadata": {
+ "region": "us-east-1",
+ "tier": "standard"
+ },
+ "auto_connect": true,
+ "max_retries": 5
+}
+```
+
+**Response:**
+
+```json
+{
+ "status": "success",
+ "message": "Device 'my-device-1' added successfully",
+ "device": {
+ "device_id": "my-device-1",
+ "server_url": "ws://192.168.1.100:8080",
+ "os": "Windows",
+ "capabilities": ["excel", "outlook", "powerpoint"],
+ "auto_connect": true,
+ "max_retries": 5,
+ "metadata": {
+ "region": "us-east-1",
+ "tier": "standard"
+ }
+ }
+}
+```
+
+**Device Registration Process:**
+
+When a device is added through the UI:
+
+1. **Validation**: Form data is validated (required fields, URL format, duplicate device_id)
+2. **Configuration**: Device is saved to `config/galaxy/devices.yaml`
+3. **Registration**: Device is registered with the Galaxy Device Manager
+4. **Connection**: If `auto_connect` is enabled, connection is initiated automatically
+5. **Event Broadcast**: Device status updates are broadcast to all WebSocket clients
+6. **UI Update**: Device card appears in the Device Panel with real-time status
+
+#### 📋 Task Details
+
+**Location:** Right panel → Tasks tab / Details tab
+
+**Features:**
+- Task name and description
+- Current status with icon
+- Assigned device
+- Dependencies and dependents
+- Input and output data
+- Execution results
+- Error messages (if failed)
+- Execution timeline
+- Retry information
+
+#### 📡 Event Log
+
+**Location:** Right panel (optional view)
+
+**Features:**
+- Real-time event stream
+- Event type filtering
+- Timestamp display
+- JSON payload viewer
+- Search and filter
+- Auto-scroll option
+- Export to JSON
+
+---
+
+## 🎨 Theme and Styling
+
+### Design System
+
+The Galaxy WebUI uses a **space-themed design** with a dark color palette and vibrant accents.
+
+#### Color Palette
+
+```typescript
+// Primary Colors
+galaxy-dark: #0a0e27 // Deep space background
+galaxy-blue: #00d4ff // Cyan accent (primary actions)
+galaxy-purple: #7b2cbf // Purple accent (secondary)
+galaxy-pink: #ff006e // Pink accent (tertiary)
+
+// Status Colors
+emerald: #10b981 // Success/Completed
+cyan: #06b6d4 // Running/Active
+amber: #f59e0b // Warning/Pending
+rose: #f43f5e // Error/Failed
+slate: #64748b // Neutral/Disabled
+```
+
+#### Visual Effects
+
+- **Starfield Background**: Animated particle system with depth layers
+- **Glassmorphism**: Frosted glass panels with backdrop blur
+- **Glow Effects**: Neon-style glows on interactive elements
+- **Smooth Animations**: Framer Motion for transitions
+- **Gradient Accents**: Multi-color gradients on headers and buttons
+
+#### Accessibility
+
+- **High Contrast Mode**: Toggle for improved readability
+- **Keyboard Navigation**: Full keyboard support
+- **Screen Reader**: ARIA labels and semantic HTML
+- **Focus Indicators**: Clear focus rings on interactive elements
+
+---
+
+## 🔧 Configuration
+
+### Server Configuration
+
+The WebUI server is configured through command-line arguments:
+
+```powershell
+python -m galaxy --webui [OPTIONS]
+```
+
+**Options:**
+
+| Flag | Description | Default |
+|------|-------------|---------|
+| `--webui` | Enable WebUI mode | `False` |
+| `--session-name` | Session display name | `"Galaxy Session"` |
+| `--log-level` | Logging level | `INFO` |
+| `--port` | Server port (if implemented) | `8000` |
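+
+As an illustration only, the option table above could be mirrored with `argparse` roughly as follows. The function name `build_parser` is hypothetical and the real `galaxy` entry point may define these flags differently; the defaults are copied from the table.
+
+```python
+import argparse
+
+
+def build_parser() -> argparse.ArgumentParser:
+    """Hypothetical parser mirroring the documented WebUI flags."""
+    parser = argparse.ArgumentParser(prog="galaxy")
+    parser.add_argument("--webui", action="store_true", help="Enable WebUI mode")
+    parser.add_argument("--session-name", default="Galaxy Session", help="Session display name")
+    parser.add_argument("--log-level", default="INFO", help="Logging level")
+    parser.add_argument("--port", type=int, default=8000, help="Server port")
+    return parser
+
+
+# Parse an explicit argument list instead of sys.argv so the sketch is reproducible.
+args = build_parser().parse_args(["--webui", "--log-level", "DEBUG"])
+print(args.webui, args.session_name, args.log_level, args.port)
+```
+
+Note that `argparse` exposes `--session-name` and `--log-level` as `args.session_name` and `args.log_level`.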
+
+### Frontend Configuration
+
+**Development Mode:**
+
+```bash
+cd galaxy/webui/frontend
+npm run dev
+```
+
+Access at: `http://localhost:5173` (Vite dev server with HMR)
+
+**Environment Variables:**
+
+```bash
+# .env.development
+VITE_WS_URL=ws://localhost:8000/ws
+VITE_API_URL=http://localhost:8000
+```
+
+**Build Configuration:**
+
+```bash
+cd galaxy/webui/frontend
+npm run build
+```
+
+Builds production-ready frontend to `galaxy/webui/frontend/dist/`
+
+---
+
+## 🔍 Event Handling
+
+### Event Flow
+
+```mermaid
+flowchart TD
+ A[Galaxy Core Event] --> B[Event Bus publish]
+ B --> C[WebSocketObserver on_event]
+ C --> D[EventSerializer serialize_event]
+ D --> D1[Type-specific field extraction]
+ D --> D2[Recursive value serialization]
+ D2 --> D3[Python → JSON]
+ D3 --> E[WebSocket Broadcast to all clients]
+ E --> F[Frontend Clients receive message]
+ F --> G[Store Update Zustand]
+ G --> H[UI Re-render React Components]
+
+ style A fill:#0a0e27,stroke:#ff006e,stroke-width:2px,color:#fff
+ style C fill:#16213e,stroke:#7b2cbf,stroke-width:2px,color:#fff
+ style D fill:#1a1a2e,stroke:#f59e0b,stroke-width:2px,color:#fff
+ style E fill:#1a1a2e,stroke:#00d4ff,stroke-width:2px,color:#fff
+ style G fill:#0f1419,stroke:#10b981,stroke-width:2px,color:#fff
+ style H fill:#1a1a2e,stroke:#f59e0b,stroke-width:2px,color:#fff
+```
+
+### Event Serialization
+
+The `EventSerializer` class converts complex Python objects into a JSON-compatible format:
+
+**Features:**
+- **Type Handler Registry**: Pre-registered handlers for Galaxy-specific types (TaskStarLine, TaskConstellation)
+- **Type Caching**: Cached imports to avoid repeated import attempts
+- **Recursive Serialization**: Handles nested structures (dicts, lists, dataclasses, Pydantic models)
+- **Polymorphic Event Handling**: Different serialization logic for TaskEvent, ConstellationEvent, AgentEvent, DeviceEvent
+- **Fallback Strategies**: Multiple serialization attempts with graceful fallback to string representation
+
+**Serialization Chain:**
+1. Handle primitives (str, int, float, bool, None)
+2. Handle datetime objects → ISO format
+3. Handle collections (dict, list, tuple) → recursive serialization
+4. Check registered type handlers (TaskStarLine, TaskConstellation)
+5. Try dataclass serialization (`asdict()`)
+6. Try Pydantic model serialization (`model_dump()`)
+7. Try generic `to_dict()` method
+8. Fallback to `str()` representation
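+
+The chain above can be sketched as a single recursive function. This is a simplified stand-in, not the actual `EventSerializer`: the function name `serialize_value` is borrowed from the tests later in this page, the registered type handlers (step 4) are omitted, and `Point` is a made-up example type.
+
+```python
+from dataclasses import asdict, dataclass, is_dataclass
+from datetime import datetime
+from typing import Any
+
+
+def serialize_value(value: Any) -> Any:
+    """Simplified sketch of the serialization chain (steps 1-3 and 5-8)."""
+    # 1. Primitives pass through unchanged.
+    if value is None or isinstance(value, (str, int, float, bool)):
+        return value
+    # 2. Datetimes become ISO 8601 strings.
+    if isinstance(value, datetime):
+        return value.isoformat()
+    # 3. Collections are serialized recursively.
+    if isinstance(value, dict):
+        return {str(k): serialize_value(v) for k, v in value.items()}
+    if isinstance(value, (list, tuple)):
+        return [serialize_value(v) for v in value]
+    # 5. Dataclass instances are converted with asdict().
+    if is_dataclass(value) and not isinstance(value, type):
+        return serialize_value(asdict(value))
+    # 6./7. Pydantic-style model_dump() or a generic to_dict(), if present.
+    for method_name in ("model_dump", "to_dict"):
+        method = getattr(value, method_name, None)
+        if callable(method):
+            try:
+                return serialize_value(method())
+            except Exception:
+                continue
+    # 8. Last resort: string representation.
+    return str(value)
+
+
+@dataclass
+class Point:  # hypothetical example type
+    x: int
+    when: datetime
+
+
+print(serialize_value({"p": Point(1, datetime(2025, 1, 1)), "tags": ("a", "b")}))
+```
+
+Ordering matters here: primitives and collections are checked first because they are the most common values, and the `str()` fallback guarantees that every event can be sent, at the cost of losing structure for unknown types.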
+
+### Event Types
+
+The WebUI subscribes to all Galaxy event types:
+
+| Event Type | Source | Description |
+|------------|--------|-------------|
+| `agent_request` | ConstellationAgent | User request received |
+| `agent_response` | ConstellationAgent | Agent thought/plan/response |
+| `constellation_created` | TaskConstellation | New constellation formed |
+| `constellation_updated` | TaskConstellation | Constellation modified |
+| `constellation_completed` | TaskConstellation | All tasks finished |
+| `task_created` | TaskOrchestrator | New task added |
+| `task_assigned` | TaskOrchestrator | Task assigned to device |
+| `task_started` | TaskOrchestrator | Task execution started |
+| `task_status_changed` | TaskOrchestrator | Task status updated |
+| `task_completed` | TaskOrchestrator | Task finished successfully |
+| `task_failed` | TaskOrchestrator | Task encountered error |
+| `device_connected` | DeviceManager | Device came online |
+| `device_disconnected` | DeviceManager | Device went offline |
+| `device_status_changed` | DeviceManager | Device status updated |
+| `device_heartbeat` | DeviceManager | Device health check |
+
+### State Management
+
+The frontend uses **Zustand** for centralized state management:
+
+```typescript
+// Store Structure
+interface GalaxyStore {
+ // Connection
+ connectionStatus: ConnectionStatus;
+ connected: boolean;
+
+ // Session
+ session: {
+ id: string | null;
+ displayName: string;
+ startedAt: number | null;
+ };
+
+ // Data
+ messages: Message[];
+ constellations: Record;
+ tasks: Record;
+ devices: Record;
+ notifications: NotificationItem[];
+
+ // UI State
+ ui: {
+ activeConstellationId: string | null;
+ activeTaskId: string | null;
+ activeDeviceId: string | null;
+ rightPanelTab: 'constellation' | 'tasks' | 'details';
+ showDeviceDrawer: boolean;
+ };
+}
+```
+
+---
+
+## 📱 Responsive Design
+
+The WebUI is designed to work on various screen sizes:
+
+### Desktop (1920px+)
+- Three-panel layout with full sidebar
+- Large DAG visualization
+- Expanded device cards
+
+### Laptop (1280px - 1919px)
+- Standard three-panel layout
+- Medium DAG visualization
+- Compact device cards
+
+### Tablet (768px - 1279px)
+- Collapsible sidebar
+- Simplified DAG view
+- Stacked layout option
+
+### Mobile (< 768px)
+- Single-panel navigation
+- Tab-based interface
+- Touch-optimized controls
+
+!!!warning "Recommended Resolution"
+    For the best experience, use a desktop or laptop with at least **1280px width**. The DAG visualization requires adequate screen space for clear readability.
+
+---
+
+## 🐛 Troubleshooting
+
+### Connection Issues
+
+**Problem:** WebSocket connection fails
+
+**Solutions:**
+
+1. **Verify backend is running:**
+ ```powershell
+ # Check health endpoint
+ curl http://localhost:8000/health
+ ```
+
+2. **Check firewall settings:**
+ - Allow incoming connections on port 8000
+ - Check corporate firewall/proxy settings
+
+3. **Verify WebSocket URL:**
+ - Browser console should show: `WebSocket connection established`
+ - Check for CORS errors in console
+
+4. **Try a different port** (if your build supports the `--port` flag):
+ ```powershell
+ python -m galaxy --webui --port 8080
+ ```
+
+### Frontend Not Loading
+
+**Problem:** Blank page or "Server is running" placeholder
+
+**Solutions:**
+
+1. **Build the frontend:**
+ ```bash
+ cd galaxy/webui/frontend
+ npm install
+ npm run build
+ ```
+
+2. **Check build output:**
+ - Verify `galaxy/webui/frontend/dist/` exists
+ - Check for TypeScript errors: `npm run build`
+
+3. **Clear browser cache:**
+ - Hard refresh: `Ctrl+Shift+R` (Windows) or `Cmd+Shift+R` (Mac)
+ - Clear site data in DevTools
+
+### Events Not Appearing
+
+**Problem:** No events shown in UI, DAG not updating
+
+**Solutions:**
+
+1. **Check event system:**
+ - Look for "WebSocket observer registered" in backend logs
+ - Verify connection count: `curl http://localhost:8000/health`
+
+2. **Check browser console:**
+ - Look for WebSocket message logs
+ - Check for JavaScript errors
+
+3. **Enable debug mode:**
+ ```powershell
+ python -m galaxy --webui --log-level DEBUG
+ ```
+
+### Performance Issues
+
+**Problem:** UI slow or unresponsive
+
+**Solutions:**
+
+1. **Limit event log size:**
+ - Event log keeps last 200 events
+ - Messages limited to 500
+
+2. **Reduce DAG complexity:**
+ - Large constellations (>50 tasks) may be slow
+ - Consider viewport culling for very large graphs
+
+3. **Check browser performance:**
+ - Close unnecessary tabs
+ - Use Chrome/Edge for best performance
+ - Disable browser extensions
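+
+The caps mentioned in step 1 (last 200 events, 500 messages) amount to a bounded buffer. The frontend itself is written in TypeScript; the Python sketch below only illustrates the idea, and `BoundedLog` is a hypothetical name.
+
+```python
+from collections import deque
+
+MAX_EVENTS = 200  # cap quoted above for the event log
+
+
+class BoundedLog:
+    """Keeps only the most recent `maxlen` entries."""
+
+    def __init__(self, maxlen: int = MAX_EVENTS) -> None:
+        self._items = deque(maxlen=maxlen)
+
+    def append(self, item: dict) -> None:
+        # When the deque is full, the oldest entry is dropped automatically.
+        self._items.append(item)
+
+    def snapshot(self) -> list:
+        return list(self._items)
+
+
+log = BoundedLog()
+for i in range(250):
+    log.append({"seq": i})
+print(len(log.snapshot()), log.snapshot()[0]["seq"])
+```
+
+Because old entries are discarded on insert, memory use stays constant no matter how long a session runs.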
+
+### Device Addition Issues
+
+**Problem:** Cannot add device through UI
+
+**Solutions:**
+
+1. **Check `devices.yaml` exists:**
+ ```powershell
+ # Verify configuration file
+ Test-Path config/galaxy/devices.yaml
+ ```
+
+2. **Verify device ID uniqueness:**
+ - Device ID must be unique across all devices
+ - Check existing devices in the Device Panel
+
+3. **Validate server URL format:**
+ - Must start with `ws://` or `wss://`
+ - Example: `ws://192.168.1.100:8080` or `wss://device.example.com`
+ - Ensure device server is actually running at that URL
+
+4. **Check backend logs:**
+ ```powershell
+ # Look for error messages
+ python -m galaxy --webui --log-level DEBUG
+ ```
+
+**Problem:** Device added but not connecting
+
+**Solutions:**
+
+1. **Verify device server is running:**
+ - Check that the device agent is running at the specified URL
+ - Test the connection with a WebSocket client such as `wscat -c ws://your-device-url/` (most `curl` builds do not support the `ws://` scheme)
+
+2. **Check firewall/network:**
+ - Ensure WebSocket port is open
+ - Verify no proxy/firewall blocking connection
+
+3. **Check device logs:**
+ - Look at the device agent logs for connection errors
+ - Verify device can reach the Galaxy server
+
+4. **Check retry behavior:**
+ - If the initial `auto_connect` attempt fails, the device retries automatically
+ - Check `connection_attempts` in device details
+ - Increase `max_retries` if needed
+
+**Problem:** Validation errors when adding device
+
+**Common Validation Issues:**
+
+| Error | Cause | Solution |
+|-------|-------|----------|
+| "Device ID is required" | Empty device_id field | Provide a unique identifier |
+| "Device ID already exists" | Duplicate device_id | Choose a different ID |
+| "Server URL is required" | Empty server_url | Provide WebSocket URL |
+| "Invalid WebSocket URL" | Wrong URL format | Use `ws://` or `wss://` prefix |
+| "OS is required" | No OS selected | Select or enter OS type |
+| "At least one capability required" | No capabilities added | Add at least one capability |
+
+---
+
+## 🧪 Development
+
+### Prerequisites
+
+- **Node.js** >= 18
+- **npm** >= 9
+- **Python** >= 3.10
+- **Galaxy** installed and configured
+
+### Development Setup
+
+```bash
+# 1. Install frontend dependencies
+cd galaxy/webui/frontend
+npm install
+
+# 2. Start development server
+npm run dev
+
+# 3. In another terminal, start Galaxy backend
+cd ../../..
+python -m galaxy --webui
+```
+
+**Development URL:** `http://localhost:5173`
+
+### Project Structure
+
+```
+galaxy/webui/
+├── server.py                   # FastAPI application entry point
+├── dependencies.py             # AppState and dependency injection
+├── websocket_observer.py       # EventSerializer + WebSocketObserver
+├── __init__.py
+├── models/                     # Data models and validation
+│   ├── __init__.py             # Export all models
+│   ├── enums.py                # WebSocketMessageType, RequestStatus
+│   ├── requests.py             # WebSocketMessage, DeviceAddRequest, etc.
+│   └── responses.py            # WelcomeMessage, DeviceSnapshot, etc.
+├── services/                   # Business logic layer
+│   ├── __init__.py
+│   ├── config_service.py       # Configuration management
+│   ├── device_service.py       # Device operations and snapshots
+│   └── galaxy_service.py       # Galaxy client interaction
+├── handlers/                   # Request/message processing
+│   ├── __init__.py
+│   └── websocket_handlers.py   # WebSocketMessageHandler class
+├── routers/                    # API endpoint definitions
+│   ├── __init__.py
+│   ├── health.py               # GET /health
+│   ├── devices.py              # POST /api/devices
+│   └── websocket.py            # WebSocket /ws
+├── templates/                  # HTML templates
+│   └── index.html              # Fallback page when frontend not built
+└── frontend/                   # React frontend application
+    ├── src/
+    │   ├── main.tsx            # Entry point
+    │   ├── App.tsx             # Main layout
+    │   ├── components/         # React components
+    │   │   ├── chat/           # Chat interface
+    │   │   ├── constellation/  # DAG visualization
+    │   │   ├── devices/        # Device management
+    │   │   ├── layout/         # Layout components
+    │   │   ├── session/        # Session management
+    │   │   └── tasks/          # Task details
+    │   ├── services/           # WebSocket client
+    │   └── store/              # Zustand store
+    ├── public/                 # Static assets
+    ├── dist/                   # Build output (gitignored)
+    ├── package.json            # Dependencies
+    ├── vite.config.ts          # Vite configuration
+    ├── tailwind.config.js      # Tailwind CSS
+    └── tsconfig.json           # TypeScript config
+```
+
+### Building for Production
+
+```bash
+cd galaxy/webui/frontend
+npm run build
+```
+
+Output: `galaxy/webui/frontend/dist/`
+
+### Code Quality
+
+**Frontend:**
+
+```bash
+# Lint
+npm run lint
+
+# Type check
+npm run type-check
+
+# Format
+npm run format
+```
+
+**Backend:**
+
+The modular architecture improves testability. Example unit tests:
+
+```python
+# tests/webui/test_event_serializer.py
+import pytest
+from galaxy.webui.websocket_observer import EventSerializer
+# EventType is assumed to be defined alongside TaskEvent in galaxy.core.events.
+from galaxy.core.events import EventType, TaskEvent
+
+
+def test_serialize_task_event():
+    """Test serialization of TaskEvent."""
+    serializer = EventSerializer()
+
+    event = TaskEvent(
+        event_type=EventType.TASK_STARTED,
+        source_id="test",
+        timestamp=1234567890,
+        task_id="task_1",
+        status="running",
+        result=None,
+        error=None,
+    )
+
+    result = serializer.serialize_event(event)
+
+    assert result["event_type"] == "task_started"
+    assert result["task_id"] == "task_1"
+    assert result["status"] == "running"
+
+
+def test_serialize_nested_dict():
+    """Test recursive serialization of nested structures."""
+    serializer = EventSerializer()
+
+    data = {
+        "level1": {
+            "level2": {
+                "value": 42
+            }
+        }
+    }
+
+    result = serializer.serialize_value(data)
+    assert result["level1"]["level2"]["value"] == 42
+```
+
+```python
+# tests/webui/test_services.py
+import pytest
+from galaxy.webui.services.device_service import DeviceService
+from galaxy.webui.dependencies import AppState
+
+
+def test_build_device_snapshot():
+    """Test device snapshot building."""
+    app_state = AppState()
+    # Setup mock galaxy_client with devices
+
+    service = DeviceService(app_state)
+    snapshot = service.build_device_snapshot()
+
+    assert "device_count" in snapshot
+    assert "all_devices" in snapshot
+```
+
+```python
+# tests/webui/test_handlers.py
+import pytest
+from unittest.mock import AsyncMock, MagicMock
+from galaxy.webui.handlers.websocket_handlers import WebSocketMessageHandler
+from galaxy.webui.models.enums import WebSocketMessageType
+
+
+@pytest.mark.asyncio
+async def test_handle_ping():
+    """Test ping message handling."""
+    websocket = AsyncMock()
+    app_state = MagicMock()
+
+    handler = WebSocketMessageHandler(websocket, app_state)
+
+    response = await handler.handle_message({
+        "type": WebSocketMessageType.PING,
+        "timestamp": 1234567890
+    })
+
+    assert response["type"] == "pong"
+```
+
+---
+
+## 🚀 Advanced Usage
+
+### Extending the Backend
+
+The modular architecture makes it easy to extend the Galaxy WebUI backend:
+
+#### Adding a New API Endpoint
+
+**1. Define Pydantic models:**
+
+```python
+# galaxy/webui/models/requests.py
+from pydantic import BaseModel, Field
+
+
+class TaskQueryRequest(BaseModel):
+    """Request to query task status."""
+    task_id: str = Field(..., description="The task ID to query")
+    include_history: bool = Field(default=False)
+```
+
+```python
+# galaxy/webui/models/responses.py
+from pydantic import BaseModel
+
+
+class TaskQueryResponse(BaseModel):
+    """Response with task details."""
+    task_id: str
+    status: str
+    result: dict | None = None
+```
+
+**2. Create a service method:**
+
+```python
+# galaxy/webui/services/task_service.py
+from typing import Dict, Any
+from galaxy.webui.dependencies import AppState
+
+
+class TaskService:
+    """Service for task-related operations."""
+
+    def __init__(self, app_state: AppState):
+        self.app_state = app_state
+
+    def get_task_details(self, task_id: str, include_history: bool) -> Dict[str, Any]:
+        """Get details for a specific task."""
+        galaxy_session = self.app_state.galaxy_session
+        if not galaxy_session:
+            raise ValueError("No active Galaxy session")
+
+        # Your business logic here
+        task = galaxy_session.get_task(task_id)
+        return {
+            "task_id": task.task_id,
+            "status": task.status.value,
+            "result": task.result if include_history else None
+        }
+```
+
+**3. Add a router endpoint:**
+
+```python
+# galaxy/webui/routers/tasks.py
+from fastapi import APIRouter, Depends
+from galaxy.webui.dependencies import get_app_state
+from galaxy.webui.models.requests import TaskQueryRequest
+from galaxy.webui.models.responses import TaskQueryResponse
+from galaxy.webui.services.task_service import TaskService
+
+router = APIRouter(prefix="/api/tasks", tags=["tasks"])
+
+
+@router.post("/query", response_model=TaskQueryResponse)
+async def query_task(
+    request: TaskQueryRequest,
+    app_state = Depends(get_app_state)
+):
+    """Query task status and details."""
+    service = TaskService(app_state)
+    result = service.get_task_details(request.task_id, request.include_history)
+    return TaskQueryResponse(**result)
+```
+
+**4. Register the router:**
+
+```python
+# galaxy/webui/server.py
+# Import the router object defined in step 3 (galaxy/webui/routers/tasks.py).
+from galaxy.webui.routers.tasks import router as tasks_router
+
+app.include_router(tasks_router)
+```
+
+#### Adding a New WebSocket Message Type
+
+**1. Add enum value:**
+
+```python
+# galaxy/webui/models/enums.py
+from enum import Enum
+
+
+class WebSocketMessageType(str, Enum):
+    """Types of messages exchanged via WebSocket."""
+    # ... existing types ...
+    CUSTOM_ACTION = "custom_action"
+```
+
+**2. Add request model:**
+
+```python
+# galaxy/webui/models/requests.py
+from typing import Any, Dict
+
+from pydantic import BaseModel, Field
+
+
+class CustomActionMessage(BaseModel):
+    """Custom action message."""
+    action_name: str
+    parameters: Dict[str, Any] = Field(default_factory=dict)
+```
+
+**3. Add handler method:**
+
+```python
+# galaxy/webui/handlers/websocket_handlers.py
+async def _handle_custom_action(self, data: Dict[str, Any]) -> Dict[str, Any]:
+    """Handle custom action messages."""
+    message = CustomActionMessage(**data)
+
+    # Your logic here
+    result = await self.service.perform_custom_action(
+        message.action_name,
+        message.parameters
+    )
+
+    return {
+        "type": "custom_action_completed",
+        "result": result
+    }
+```
+
+**4. Register handler:**
+
+```python
+# galaxy/webui/handlers/websocket_handlers.py
+def __init__(self, websocket: WebSocket, app_state: AppState):
+    # ... existing code ...
+    self._handlers[WebSocketMessageType.CUSTOM_ACTION] = self._handle_custom_action
+```
+
+#### Customizing Event Serialization
+
+Add custom serialization for new types:
+
+```python
+# galaxy/webui/websocket_observer.py
+
+class EventSerializer:
+    def _register_handlers(self) -> None:
+        """Register type-specific serialization handlers."""
+        # ... existing handlers ...
+
+        # Add custom type handler
+        try:
+            from your_module import CustomType
+            self._cached_types["CustomType"] = CustomType
+            self._type_handlers[CustomType] = self._serialize_custom_type
+        except ImportError:
+            self._cached_types["CustomType"] = None
+
+    def _serialize_custom_type(self, value: Any) -> Any:
+        """Serialize a CustomType object; fall back to its string form on failure."""
+        try:
+            return {
+                "id": value.id,
+                "data": self.serialize_value(value.data),
+                "metadata": value.get_metadata()
+            }
+        except Exception as e:
+            self.logger.warning(f"Failed to serialize CustomType: {e}")
+            return str(value)
+```
+
+### Custom Event Handlers
+
+You can extend the WebUI with custom event handlers:
+
+```typescript
+// src/services/customHandlers.ts
+import { GalaxyEvent } from './websocket';
+
+export function handleCustomEvent(event: GalaxyEvent) {
+ if (event.event_type === 'custom_event') {
+ // Your custom logic
+ console.log('Custom event:', event);
+ }
+}
+```
+
+### Programmatic Device Management
+
+Add devices programmatically using the API:
+
+```typescript
+// Add a device via API
+async function addDevice(deviceConfig: {
+ device_id: string;
+ server_url: string;
+ os: string;
+ capabilities: string[];
+ metadata?: Record<string, unknown>;
+ auto_connect?: boolean;
+ max_retries?: number;
+}) {
+ const response = await fetch('http://localhost:8000/api/devices', {
+ method: 'POST',
+ headers: {
+ 'Content-Type': 'application/json',
+ },
+ body: JSON.stringify(deviceConfig),
+ });
+
+ if (!response.ok) {
+ const error = await response.json();
+ throw new Error(error.detail || 'Failed to add device');
+ }
+
+ return await response.json();
+}
+
+// Usage example
+try {
+ const result = await addDevice({
+ device_id: 'production-server-1',
+ server_url: 'wss://prod-device.company.com',
+ os: 'Linux',
+ capabilities: ['docker', 'kubernetes', 'python'],
+ metadata: {
+ region: 'us-east-1',
+ environment: 'production',
+ tier: 'premium',
+ },
+ auto_connect: true,
+ max_retries: 10,
+ });
+
+ console.log('Device added:', result.device);
+} catch (error) {
+ console.error('Failed to add device:', error);
+}
+```
+
+**Batch Device Addition:**
+
+```python
+# Python script to add multiple devices
+import requests
+
+devices = [
+    {
+        "device_id": "win-desktop-1",
+        "server_url": "ws://192.168.1.10:8080",
+        "os": "Windows",
+        "capabilities": ["office", "excel", "outlook"],
+    },
+    {
+        "device_id": "linux-server-1",
+        "server_url": "ws://192.168.1.20:8080",
+        "os": "Linux",
+        "capabilities": ["python", "docker", "git"],
+    },
+    {
+        "device_id": "mac-laptop-1",
+        "server_url": "ws://192.168.1.30:8080",
+        "os": "macOS",
+        "capabilities": ["safari", "xcode", "python"],
+    },
+]
+
+for device in devices:
+    # json= serializes the payload and sets the Content-Type header.
+    response = requests.post(
+        "http://localhost:8000/api/devices",
+        json=device,
+        timeout=10,
+    )
+
+    if response.ok:  # any 2xx status
+        result = response.json()
+        print(f"✅ Added: {result['device']['device_id']}")
+    else:
+        # The error body is expected to be JSON with a "detail" field.
+        error = response.json()
+        print(f"❌ Failed: {device['device_id']} - {error.get('detail')}")
+```
+
+**Checking Device Status:**
+
+After adding devices, monitor their connection status through WebSocket events:
+
+```typescript
+// Listen for device connection events
+websocket.onmessage = (event) => {
+ const data = JSON.parse(event.data);
+
+ if (data.event_type === 'device_status_changed') {
+ console.log(`Device ${data.device_id} status: ${data.device_status}`);
+
+ if (data.device_status === 'connected') {
+ console.log('✅ Device connected successfully');
+ } else if (data.device_status === 'failed') {
+ console.log('❌ Device connection failed');
+ }
+ }
+};
+```
+
+### Custom Components
+
+Add custom visualization components:
+
+```tsx
+// src/components/custom/MyVisualization.tsx
+import React from 'react';
+import { useGalaxyStore } from '../../store/galaxyStore';
+
+export const MyVisualization: React.FC = () => {
+  const constellation = useGalaxyStore(s =>
+    s.constellations[s.ui.activeConstellationId || '']
+  );
+
+  return (
+    <div>
+      {/* Your custom visualization */}
+    </div>
+  );
+};
+```
+
+### Theming
+
+Create custom themes by extending Tailwind configuration:
+
+```javascript
+// tailwind.config.js
+module.exports = {
+ theme: {
+ extend: {
+ colors: {
+ 'custom-primary': '#your-color',
+ 'custom-secondary': '#your-color',
+ },
+ },
+ },
+};
+```
+
+---
+
+## 📊 Monitoring and Analytics
+
+### Health Check
+
+**Endpoint:** `GET /health`
+
+```json
+{
+ "status": "healthy",
+ "connections": 3,
+ "events_sent": 1247
+}
+```
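+
+A small sketch of how a monitoring script might interpret this payload. The field names come from the sample response above; the function name `summarize_health` and the wording of the summaries are illustrative assumptions.
+
+```python
+import json
+from typing import Any, Dict
+
+
+def summarize_health(body: str) -> str:
+    """Turn a /health response body into a one-line summary."""
+    payload: Dict[str, Any] = json.loads(body)
+    status = payload.get("status", "unknown")
+    connections = int(payload.get("connections", 0))
+    events_sent = int(payload.get("events_sent", 0))
+    if status != "healthy":
+        return f"UNHEALTHY (status={status})"
+    if connections == 0:
+        # A healthy server with no clients often explains an idle-looking UI.
+        return "healthy, but no WebSocket clients are connected"
+    return f"healthy: {connections} client(s), {events_sent} event(s) sent"
+
+
+sample = '{"status": "healthy", "connections": 3, "events_sent": 1247}'
+print(summarize_health(sample))
+```
+
+In practice the `body` argument would be the text returned by a request to `http://localhost:8000/health`.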
+
+### Metrics
+
+The WebUI tracks:
+- Active WebSocket connections
+- Total events broadcast
+- Device online/offline status
+- Task execution statistics
+- Session duration
+
+### Logging
+
+**Backend Logs:**
+```
+INFO - WebSocket connection established from ('127.0.0.1', 54321)
+INFO - Broadcasting event #42: agent_response to 2 clients
+INFO - WebSocket client disconnected. Total connections: 1
+```
+
+**Frontend Console:**
+```
+🌌 Connected to Galaxy WebSocket
+📨 Raw WebSocket message received
+📦 Parsed event data: {event_type: 'constellation_created', ...}
+```
+
+---
+
+## 🔒 Security Considerations
+
+### Production Deployment
+
+When deploying to production:
+
+1. **Use HTTPS/WSS:**
+ ```text
+ # Use secure WebSocket
+ wss://your-domain.com/ws
+ ```
+
+2. **Configure CORS:**
+ ```python
+ # server.py
+ from fastapi.middleware.cors import CORSMiddleware
+
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["https://your-domain.com"],  # Specific origins
+     allow_credentials=True,
+     allow_methods=["GET", "POST"],
+     allow_headers=["*"],
+ )
+ ```
+
+3. **Add Authentication:**
+ - Implement JWT tokens
+ - Validate WebSocket connections
+ - Secure API endpoints
+
+4. **Rate Limiting:**
+ - Limit request frequency
+ - Throttle WebSocket messages
+ - Prevent DoS attacks
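+
+One common way to throttle WebSocket messages (item 4) is a token bucket per connection. The sketch below is a generic illustration, not something the Galaxy WebUI is documented to ship; `TokenBucket` and its parameters are hypothetical names, and the injectable `clock` exists only to make the behavior easy to test.
+
+```python
+import time
+from typing import Callable
+
+
+class TokenBucket:
+    """Allows bursts of up to `capacity` messages, refilled at `rate` tokens per second."""
+
+    def __init__(self, rate: float, capacity: int,
+                 clock: Callable[[], float] = time.monotonic) -> None:
+        self.rate = rate
+        self.capacity = capacity
+        self._clock = clock
+        self._tokens = float(capacity)
+        self._last = clock()
+
+    def allow(self) -> bool:
+        """Return True if one more message may be processed right now."""
+        now = self._clock()
+        # Refill in proportion to elapsed time, never above capacity.
+        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
+        self._last = now
+        if self._tokens >= 1.0:
+            self._tokens -= 1.0
+            return True
+        return False
+
+
+bucket = TokenBucket(rate=5.0, capacity=10)
+accepted = sum(bucket.allow() for _ in range(25))
+print(f"accepted {accepted} of 25 messages sent in a burst")
+```
+
+A handler would call `allow()` before processing each incoming message and drop or reject the message when it returns `False`.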
+
+---
+
+## 📚 Additional Resources
+
+### Documentation
+- [FastAPI WebSocket Documentation](https://fastapi.tiangolo.com/advanced/websockets/)
+- [React Documentation](https://react.dev/)
+- [ReactFlow Documentation](https://reactflow.dev/)
+- [Zustand Documentation](https://github.com/pmndrs/zustand)
+- [Tailwind CSS Documentation](https://tailwindcss.com/)
+- [Vite Documentation](https://vitejs.dev/)
+
+### Galaxy Framework
+- [Galaxy Overview](overview.md)
+- [Constellation Agent](constellation_agent/overview.md)
+- [Task Orchestrator](constellation_orchestrator/overview.md)
+- [Device Manager](client/device_manager.md)
+
+### Community
+- [GitHub Issues](https://github.com/microsoft/UFO/issues)
+- [GitHub Discussions](https://github.com/microsoft/UFO/discussions)
+- [Email Support](mailto:ufo-agent@microsoft.com)
+
+---
+
+## 🎯 Next Steps
+
+Now that you understand the Galaxy WebUI:
+
+1. **[Quick Start Guide](../getting_started/quick_start_galaxy.md)** - Set up your first Galaxy session
+2. **[Constellation Agent](constellation_agent/overview.md)** - Learn about task decomposition
+3. **[Task Orchestrator](constellation_orchestrator/overview.md)** - Understand task execution
+4. **[Device Manager](client/device_manager.md)** - Configure and monitor devices
+
+Happy orchestrating with Galaxy WebUI! 🌌✨
diff --git a/documents/docs/getting_started/migration_ufo2_to_galaxy.md b/documents/docs/getting_started/migration_ufo2_to_galaxy.md
new file mode 100644
index 000000000..d218e9141
--- /dev/null
+++ b/documents/docs/getting_started/migration_ufo2_to_galaxy.md
@@ -0,0 +1,741 @@
+# Migration Guide: UFO² to UFO³ Galaxy
+
+This guide helps you understand the evolution from **UFO²** (Desktop AgentOS) to **UFO³ Galaxy** (Multi-Device AgentOS), and provides practical steps for migrating your workflows to leverage Galaxy's cross-device orchestration capabilities.
+
+---
+
+## 🌟 Understanding the UFO Evolution
+
+### The UFO Journey
+
+The UFO project has evolved through three major iterations, each addressing increasingly complex automation challenges:
+
+```mermaid
+graph LR
+ A[UFO v1 2024-02] -->|Desktop Agent| B[UFO² 2025-04]
+ B -->|Multi-Device| C[UFO³ Galaxy 2025-11]
+
+ style A fill:#e3f2fd
+ style B fill:#c8e6c9
+ style C fill:#fff9c4
+```
+
+#### **UFO (v1.0)** — The Beginning
+📅 *Released: February 2024*
+
+- **Vision**: Screenshot-based Windows automation
+- **Architecture**: Multi-agent (HostAgent + AppAgents)
+- **Approach**: GPT-4V + pure GUI automation (click/type)
+- **Scope**: Single Windows desktop, cross-app workflows
+- **Limitation**: No deep OS integration
+
+**Key Innovation:** First LLM-powered multi-agent GUI automation framework
+
+---
+
+#### **UFO² (v2.0)** — Desktop AgentOS
+📅 *Released: April 2025*
+📄 *Paper:* [UFO²: The Desktop AgentOS](https://arxiv.org/abs/2504.14603)
+
+- **Vision**: Deep OS integration for robust automation
+- **Architecture**: Two-tier hierarchy (HostAgent + AppAgents)
+- **Innovations**:
+ - ✅ **Hybrid GUI–API execution** (native APIs when available, GUI actions as fallback)
+ - ✅ **Speculative multi-action** (up to 51% fewer LLM calls)
+ - ✅ **Windows UIA + Win32 + WinCOM APIs**
+ - ✅ **Continuous knowledge learning** from docs & experience
+ - ✅ **Picture-in-Picture desktop** (non-disruptive automation)
+ - ✅ **MCP server integration** for tool augmentation
+- **Scope**: Single Windows desktop
+- **Success**: Reported to outperform state-of-the-art computer-use agents (CUAs) by more than 10%
+
+**Key Innovation:** First agent to deeply integrate with Windows OS internals
+
+---
+
+#### **UFO³ Galaxy** — Multi-Device AgentOS
+📅 *Released: November 2025*
+📄 *Paper:* UFO³: Weaving the Digital Agent Galaxy *(Coming Soon)*
+
+- **Vision**: Cross-device orchestration at scale
+- **Architecture**: Constellation-based distributed DAG orchestration
+- **Innovations**:
+ - ✅ **Task Constellation** (dynamic DAG decomposition)
+ - ✅ **Asynchronous parallel execution** across devices
+ - ✅ **Event-driven coordination** with formal safety guarantees
+ - ✅ **Dual-mode DAG evolution** (creation + editing)
+ - ✅ **Agent Interaction Protocol** (persistent WebSocket)
+ - ✅ **Heterogeneous device support** (Windows, Linux, macOS)
+- **Scope**: Multi-device workflows across platforms
+- **Capability**: Orchestrate 10+ devices simultaneously
+
+**Key Innovation:** First LLM-powered multi-device orchestration framework with provable correctness
+
+---
+
+### Architecture Evolution
+
+#### UFO v1 Architecture
+
+**Multi-Agent (GUI-Only)**
+
+```
+User Request
+ ↓
+HostAgent
+ ↓
+AppAgent 1, 2, 3...
+ ↓
+Windows Apps (GUI)
+```
+
+**Capabilities:**
+
+- Multi-app workflows
+- Pure screenshot + click/type
+- No API integration
+- Single device
+
+#### UFO² Architecture
+
+**Two-Tier Hierarchy (Hybrid)**
+
+```
+User Request
+ ↓
+HostAgent
+ ↓
+AppAgent 1, 2, 3...
+ ↓
+Windows Apps (GUI + API)
+```
+
+**Capabilities:**
+
+- Multi-app workflows
+- Desktop orchestration
+- Hybrid GUI–API execution
+- Deep OS integration
+- Single device
+
+#### UFO³ Galaxy Architecture
+
+**Constellation Model (Distributed)**
+
+```
+User Request
+ ↓
+ConstellationAgent
+ ↓
+Task Constellation (DAG)
+ ↓
+Device 1, 2, 3... (UFO² instances)
+ ↓
+Cross-Platform Apps
+```
+
+**Capabilities:**
+
+- Multi-device workflows
+- Parallel execution
+- Dynamic adaptation
+- Heterogeneous platforms
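+
+To make the "DAG plus parallel execution" idea concrete, the sketch below groups tasks into waves that could run concurrently, using Python's standard `graphlib`. It is not Galaxy's scheduler: the function `parallel_waves` and the task names are hypothetical, and real orchestration also has to assign devices and handle failures.
+
+```python
+from graphlib import TopologicalSorter
+from typing import Dict, List, Set
+
+
+def parallel_waves(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
+    """Group tasks into waves; every task in a wave has all its dependencies done."""
+    sorter = TopologicalSorter(dependencies)
+    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
+    waves: List[List[str]] = []
+    while sorter.is_active():
+        ready = sorted(sorter.get_ready())  # sorted only for stable output
+        waves.append(ready)
+        sorter.done(*ready)
+    return waves
+
+
+# Hypothetical constellation: task -> set of tasks it depends on.
+constellation = {
+    "collect_logs_a": set(),
+    "collect_logs_b": set(),
+    "analyze": {"collect_logs_a", "collect_logs_b"},
+    "report": {"analyze"},
+}
+print(parallel_waves(constellation))
+```
+
+Here the two log-collection tasks have no dependencies, so they land in the same wave and could run on different devices at the same time, while `analyze` and `report` wait for their predecessors.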
+
+---
+
+## 🎯 When to Use Which?
+
+### Use **UFO²** (Desktop AgentOS) When:
+
+✅ You're automating tasks on a **single Windows desktop**
+✅ You need **deep Windows integration** (Office, File Explorer, etc.)
+✅ You want **fast, simple execution** without network overhead
+✅ You're learning agent automation basics
+✅ Your workflow is entirely **local** (no cross-device dependencies)
+
+**Examples:**
+- "Create a PowerPoint presentation from this Excel data"
+- "Organize my Downloads folder by file type"
+- "Send emails to all contacts in this spreadsheet"
+
+---
+
+### Use **UFO³ Galaxy** When:
+
+- ✅ Your workflow spans **multiple devices** (Windows, Linux, servers)
+- ✅ You need **parallel task execution** for performance
+- ✅ You have **complex dependencies** between subtasks
+- ✅ You want **dynamic workflow adaptation** based on results
+- ✅ You need **fault tolerance** and automatic recovery
+- ✅ You're orchestrating **heterogeneous systems** (desktop + server + cloud)
+
+**Examples:**
+- "Clone repo on my laptop, build Docker image on GPU server, deploy to staging, run tests on CI cluster"
+- "Fetch data from cloud storage, preprocess on Linux workstation, train model on A100 node, visualize on my Windows machine"
+- "Collect logs from all Linux servers, analyze for errors, generate report on Windows"
+
+---
+
+### Can You Use Both?
+
+**Yes!** UFO² can run as a **device agent** in the Galaxy:
+
+```
+Galaxy (Orchestrator)
+ ├── Windows Device (UFO² instance)
+ ├── Linux Device (UFO² instance)
+ └── Server Device (UFO² instance)
+```
+
+This is the **recommended hybrid approach** for complex workflows.
+
+---
+
+## 🔄 Key Concept Mapping
+
+The table below shows how UFO² concepts map to their Galaxy counterparts:
+
+| UFO² Concept | Galaxy Equivalent | Relationship |
+|--------------|-------------------|--------------|
+| **HostAgent** | **ConstellationAgent** | Global orchestrator (but across devices) |
+| **AppAgent** | **Device Agent (HostAgent)** | Local executor on each device |
+| **Session** | **GalaxySession** | Workflow execution context |
+| **Round** | **Constellation Round** | Orchestration iteration |
+| **Action** | **TaskStar** | Executable unit (but on specific device) |
+| **Blackboard** | **Task Results** | Inter-task communication |
+| **Config File** | `config/ufo/` → `config/galaxy/` | Configuration location |
+| **Execution Mode** | `python -m ufo.server.app --port ` | Device runs as WebSocket server |
+
+### Architecture Translation
+
+**UFO² (Single Device):**
+```
+# UFO² executes locally
+python -m ufo --task "Create report from data.xlsx"
+
+# HostAgent coordinates AppAgents on one desktop
+HostAgent
+ ├── ExcelAgent (data.xlsx)
+ ├── WordAgent (report.docx)
+ └── OutlookAgent (send email)
+```
+
+**Galaxy (Multi-Device):**
+```
+# Galaxy orchestrates across devices
+python -m galaxy --request "Create report from data on Server, generate PDF on Windows"
+
+# ConstellationAgent creates DAG, assigns to devices
+ConstellationAgent
+ └── TaskConstellation (DAG)
+ ├── TaskStar-1: Fetch data → Linux Server
+ ├── TaskStar-2: Process → GPU Workstation
+ └── TaskStar-3: Generate PDF → Windows Desktop
+```
+
+---
+
+## ⚙️ Configuration Migration
+
+### Step 1: Preserve UFO² Configuration
+
+**Keep your existing UFO² config** — you'll use it for device agents:
+
+```
+config/ufo/
+├── agents.yaml # LLM config for device agents
+├── app_agent.yaml # AppAgent settings
+├── host_agent.yaml # HostAgent settings
+└── ...
+```
+
+**No changes needed** — each Galaxy device will use its own UFO² config.
+
+---
+
+### Step 2: Create Galaxy Configuration
+
+Galaxy adds **new orchestration-level config**:
+
+#### A. ConstellationAgent LLM Config
+
+```powershell
+# Copy template
+copy config\galaxy\agent.yaml.template config\galaxy\agent.yaml
+```
+
+Edit `config/galaxy/agent.yaml`:
+
+```yaml
+# ConstellationAgent LLM (orchestrator)
+CONSTELLATION_AGENT:
+ API_TYPE: "openai" # or "azure", "qwen", etc.
+ API_BASE: "https://api.openai.com/v1"
+ API_KEY: "sk-your-api-key-here"
+ API_MODEL: "gpt-4o"
+ API_VERSION: null
+
+# Optional: Use different model for orchestration
+# Recommended: Use GPT-4o or Claude for complex DAG reasoning
+```
+
+---
+
+#### B. Device Pool Configuration
+
+**New in Galaxy:** Define all available devices in a device registry.
+
+```powershell
+# Create device registry
+notepad config\galaxy\devices.yaml
+```
+
+```yaml
+devices:
+ # Your Windows desktop (existing UFO² instance)
+ - device_id: "my_windows_desktop"
+ server_url: "ws://localhost:5005/ws"
+ os: "windows"
+ capabilities:
+ - "office_applications" # Excel, Word, PowerPoint
+ - "web_browsing"
+ - "file_management"
+ metadata:
+ location: "local"
+ os: "windows"
+ performance: "high"
+ auto_connect: true
+ max_retries: 5
+
+ # Linux workstation
+ - device_id: "linux_workstation"
+ server_url: "ws://192.168.1.100:5001/ws"
+ os: "linux"
+ capabilities:
+ - "python"
+ - "docker"
+ - "server"
+ metadata:
+ location: "office"
+ os: "ubuntu_22.04"
+ performance: "high"
+ gpu: "nvidia_a100"
+ auto_connect: true
+
+ # GPU server
+ - device_id: "gpu_server"
+ server_url: "ws://192.168.1.200:5002/ws"
+ os: "linux"
+ capabilities:
+ - "machine_learning"
+ - "cuda"
+ - "docker"
+ metadata:
+ os: "centos_7"
+ gpu: "nvidia_v100"
+ performance: "ultra"
+```
+
+**Capability Matching:** ConstellationAgent uses these capabilities to assign tasks intelligently.
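+
+As a rough illustration of capability-based placement, the sketch below filters the three example devices by a required capability set. The `Device` class and the `match_devices` helper are hypothetical names invented for this example, and the subset test is an assumed matching rule. Galaxy's actual assignment is made by the ConstellationAgent and may weigh metadata and other signals.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class Device:
+    """Hypothetical in-memory view of one entry in devices.yaml."""
+    device_id: str
+    os: str
+    capabilities: set[str] = field(default_factory=set)
+
+
+def match_devices(devices, required):
+    """Return the IDs of devices whose capabilities cover every required one."""
+    needed = set(required)
+    return [d.device_id for d in devices if needed <= d.capabilities]
+
+
+# Mirrors the capability lists from the example devices.yaml above.
+DEVICES = [
+    Device("my_windows_desktop", "windows",
+           {"office_applications", "web_browsing", "file_management"}),
+    Device("linux_workstation", "linux", {"python", "docker", "server"}),
+    Device("gpu_server", "linux", {"machine_learning", "cuda", "docker"}),
+]
+
+print(match_devices(DEVICES, ["docker"]))
+print(match_devices(DEVICES, ["cuda", "docker"]))
+```
+
+Under this assumed rule, a task that needs both `cuda` and `docker` can only land on `gpu_server`, while a plain `docker` task could go to either Linux device.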
+
+---
+
+#### C. Constellation Runtime Config
+
+```powershell
+notepad config\galaxy\constellation.yaml
+```
+
+```yaml
+# Constellation Orchestration Settings
+CONSTELLATION_ID: "my_constellation"
+HEARTBEAT_INTERVAL: 30.0 # Device health check (seconds)
+RECONNECT_DELAY: 5.0 # Auto-reconnect delay
+MAX_CONCURRENT_TASKS: 6 # Parallel task limit
+MAX_STEP: 15 # Max orchestration rounds
+
+# Device Configuration
+DEVICE_INFO: "config/galaxy/devices.yaml"
+
+# Logging
+LOG_TO_MARKDOWN: true # Generate trajectory reports
+```
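+
+To make the effect of a parallel task limit such as `MAX_CONCURRENT_TASKS` concrete, here is a minimal sketch that caps in-flight work with an `asyncio.Semaphore`. The `run_limited` function and the dummy sleep are assumptions made for illustration; how Galaxy enforces the limit internally is not described in this guide.
+
+```python
+import asyncio
+
+
+async def run_limited(task_names, limit):
+    """Run dummy tasks with at most `limit` in flight; return the observed peak."""
+    semaphore = asyncio.Semaphore(limit)
+    running = 0
+    peak = 0
+
+    async def run_one(name):
+        nonlocal running, peak
+        async with semaphore:
+            running += 1
+            peak = max(peak, running)
+            await asyncio.sleep(0.01)  # stand-in for real device work
+            running -= 1
+
+    await asyncio.gather(*(run_one(name) for name in task_names))
+    return peak
+
+
+# 20 queued tasks, but never more than 6 running at the same time.
+peak = asyncio.run(run_limited([f"task-{i}" for i in range(20)], limit=6))
+print(peak)
+```
+
+Raising the limit trades more parallelism for more simultaneous load on the device pool, so it is worth tuning against the number of devices you actually register.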
+
+---
+
+## 🚀 Migration Steps
+
+### Option 1: Keep UFO² for Local, Add Galaxy for Multi-Device
+
+**Best for:** Gradual adoption, maintaining existing workflows
+
+1. **Continue using UFO² for single-device tasks**
+ ```bash
+ python -m ufo --task "Your local task"
+ ```
+
+2. **Use Galaxy only when you need multi-device orchestration**
+ ```bash
+ python -m galaxy --request "Your cross-device task"
+ ```
+
+3. **No migration required** — both coexist independently
+
+---
+
+### Option 2: Convert UFO² Instance to Galaxy Device
+
+**Best for:** Leveraging Galaxy's orchestration for all workflows
+
+#### Step 1: Start UFO² as Agent Server
+
+**On each device** (Windows, Linux, etc.), run UFO² server:
+
+```bash
+# Windows Desktop
+python -m ufo.server.app --port 5005
+
+# Linux Workstation
+python -m ufo.server.app --port 5001
+
+# GPU Server
+python -m ufo.server.app --port 5002
+```
+
+**What this does:**
+- Starts WebSocket server on the device
+- Listens for task assignments from Galaxy
+- Uses existing UFO² agents (HostAgent/AppAgent) for local execution
+- Reports results back to ConstellationClient
+
+---
+
+#### Step 2: Configure Galaxy Client
+
+Create `config/galaxy/devices.yaml` with all your devices (see Configuration section above).
+
+---
+
+#### Step 3: Launch Galaxy Client
+
+```bash
+# Interactive mode
+python -m galaxy --interactive
+
+# Direct request
+python -m galaxy --request "Clone repo on laptop, build on server, test on Windows"
+```
+
+**What happens:**
+1. ConstellationAgent decomposes the request into a DAG
+2. TaskStars are assigned to devices based on capabilities
+3. Devices execute tasks using their local UFO² agents
+4. Results are aggregated and presented to the user
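+
+The DAG step can be pictured with a small sketch built on the standard-library `graphlib` module. The task names and the `execution_waves` helper are hypothetical, and the DAG is a made-up example with one fork. The point is only to show how dependency order determines which tasks may run in parallel, not how Galaxy schedules TaskStars internally.
+
+```python
+from graphlib import TopologicalSorter
+
+# Hypothetical DAG: each task maps to the set of tasks it depends on.
+DAG = {
+    "clone_repo": set(),
+    "build_image": {"clone_repo"},
+    "run_unit_tests": {"clone_repo"},
+    "deploy": {"build_image", "run_unit_tests"},
+}
+
+
+def execution_waves(dag):
+    """Group tasks into waves; every task in a wave has all dependencies met."""
+    sorter = TopologicalSorter(dag)
+    sorter.prepare()
+    waves = []
+    while sorter.is_active():
+        ready = sorted(sorter.get_ready())  # sorted only for stable output
+        waves.append(ready)
+        sorter.done(*ready)
+    return waves
+
+
+for number, wave in enumerate(execution_waves(DAG), start=1):
+    print(number, wave)
+```
+
+Tasks that share a wave, here `build_image` and `run_unit_tests`, have no dependency on each other and are the candidates for parallel execution on different devices.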
+
+---
+
+### Option 3: Programmatic Migration
+
+**Best for:** Custom workflows, CI/CD integration
+
+#### UFO² API (Before):
+
+```python
+from ufo.module.session_pool import SessionFactory, SessionPool
+import asyncio
+
+async def main():
+ # Create UFO² session on local device
+ sessions = SessionFactory().create_session(
+ task="my_task",
+ mode="normal",
+ plan="",
+ request="Create a presentation from data.xlsx"
+ )
+
+ # Run session
+ pool = SessionPool(sessions)
+ await pool.run_all()
+
+asyncio.run(main())
+```
+
+#### Galaxy API (After):
+
+```python
+from galaxy import GalaxyClient
+import asyncio
+
+async def main():
+ # Galaxy session coordinating multiple devices
+ client = GalaxyClient(session_name="my_workflow")
+ await client.initialize()
+
+ result = await client.process_request(
+ "Clone repo on laptop, build on server, test on Windows"
+ )
+
+ print(f"Workflow completed: {result}")
+ await client.shutdown()
+
+asyncio.run(main())
+```
+
+**Key Differences:**
+- Both are **async** (UFO² v2.0+ uses asyncio)
+- UFO²: Uses `SessionFactory` + `SessionPool` pattern
+- Galaxy: Uses `GalaxyClient` for multi-device orchestration
+- Galaxy returns **constellation results** (multi-device)
+- Galaxy requires **device registration** first
+
+---
+
+## 📊 Feature Comparison
+
+### Preserved UFO² Features in Galaxy
+
+When running UFO² as a Galaxy device, you **keep all UFO² capabilities**:
+
+| UFO² Feature | Available in Galaxy Device? | Notes |
+|--------------|----------------------------|-------|
+| ✅ Hybrid GUI–API execution | ✅ Yes | Each device uses its native UFO² agent |
+| ✅ Windows UIA/Win32/COM | ✅ Yes | Full OS integration preserved |
+| ✅ MCP server integration | ✅ Yes | Devices can use custom MCP servers |
+| ✅ Continuous learning | ✅ Yes | Each device maintains its own RAG |
+| ✅ Picture-in-Picture | ✅ Yes | Non-disruptive execution on each device |
+| ✅ AppAgent specialization | ✅ Yes | HostAgent manages local AppAgents |
+
+---
+
+### New Galaxy-Only Features
+
+| Feature | Description | Benefit |
+|---------|-------------|---------|
+| **Task Constellation** | DAG-based task decomposition | Complex workflow planning |
+| **Parallel Execution** | Asynchronous multi-device tasks | 3-5x faster for parallelizable work |
+| **Dynamic Adaptation** | Runtime DAG modification | Self-healing workflows |
+| **Device Assignment** | Capability-based task placement | Optimal resource utilization |
+| **Cross-Platform** | Windows + Linux + macOS support | Heterogeneous orchestration |
+| **Event-Driven Coordination** | Observer pattern for task events | Reactive workflow control |
+| **Formal Safety Guarantees** | I1-I3 invariants | Provably correct concurrent execution |
+
+---
+
+## 🛠️ Practical Examples
+
+### Example 1: Simple Local Task
+
+**UFO² (Before):**
+```bash
+python -m ufo --task "Create a presentation from data.xlsx"
+```
+
+**Galaxy (After) — Option A: Keep UFO²**
+```bash
+# No change needed — continue using UFO² for local tasks
+python -m ufo --task "Create a presentation from data.xlsx"
+```
+
+**Galaxy (After) — Option B: Use Galaxy**
+```bash
+# Galaxy will assign to local Windows device automatically
+python -m galaxy --request "Create a presentation from data.xlsx on my desktop"
+```
+
+**When to use which?**
+- Use UFO² if you only have one Windows desktop (simpler)
+- Use Galaxy if you also want its orchestration-level logging, such as the Markdown trajectory reports
+
+---
+
+### Example 2: Cross-Device Workflow
+
+**UFO² (Before):**
+```bash
+# ❌ Not possible — UFO² is single-device only
+# You'd need to manually:
+# 1. SSH to server
+# 2. Run build command
+# 3. Copy results back
+# 4. Open locally
+```
+
+**Galaxy (After):**
+```bash
+python -m galaxy --request \
+ "Clone https://github.com/myrepo on laptop, \
+ build Docker image on gpu_server, \
+ deploy to staging server, \
+ open logs on my Windows desktop"
+```
+
+**Galaxy automatically:**
+1. Creates a 4-task DAG
+2. Assigns tasks to capable devices
+3. Executes in parallel where possible
+4. Streams results back
+
+---
+
+### Example 3: Data Pipeline
+
+**UFO² (Before):**
+```python
+# UFO² requires manual orchestration across multiple steps
+from ufo.module.session_pool import SessionFactory, SessionPool
+import asyncio
+
+async def main():
+ # Step 1: Fetch data (local)
+ sessions_1 = SessionFactory().create_session(
+ task="fetch_data",
+ mode="normal",
+ plan="",
+ request="Download dataset from cloud storage"
+ )
+ pool_1 = SessionPool(sessions_1)
+ await pool_1.run_all()
+
+ # Step 2: Manually transfer to server
+ # scp data.csv user@server:/data/
+
+ # Step 3: SSH and run processing
+ # ssh server "python process.py"
+
+ # Step 4: Manually copy results back
+ # scp server:/output/results.csv .
+
+ # Step 5: Visualize locally
+ sessions_2 = SessionFactory().create_session(
+ task="visualize",
+ mode="normal",
+ plan="",
+ request="Create charts from results.csv"
+ )
+ pool_2 = SessionPool(sessions_2)
+ await pool_2.run_all()
+
+asyncio.run(main())
+```
+
+**Galaxy (After):**
+```python
+import asyncio
+from galaxy import GalaxyClient
+
+async def main():
+ client = GalaxyClient(session_name="data_pipeline")
+ await client.initialize()
+
+ # Single request — Galaxy handles orchestration
+ await client.process_request(
+ "Fetch dataset from cloud to laptop, "
+ "preprocess on linux_workstation, "
+ "train model on gpu_server, "
+ "visualize results on my Windows desktop"
+ )
+
+ await client.shutdown()
+
+asyncio.run(main())
+```
+
+**Galaxy automatically:**
+- Creates dependency chain
+- Transfers data between devices
+- Executes pipeline stages in order
+- Handles failures with retries
+
+---
+
+## 🎓 Learning Path
+
+### For UFO² Users
+
+1. **Week 1: Understand Concepts**
+ - Read [Galaxy Overview](../galaxy/overview.md)
+ - Understand Task Constellation and DAG model
+ - Compare with UFO² two-tier hierarchy
+
+2. **Week 2: Hands-On**
+ - Set up one Windows device as Galaxy agent
+ - Run simple multi-step workflow
+ - Compare logs: UFO² vs Galaxy
+
+3. **Week 3: Multi-Device**
+ - Add Linux device to pool
+ - Create cross-platform workflow
+ - Monitor with trajectory reports
+
+4. **Week 4: Advanced**
+ - Build custom device capabilities
+ - Integrate MCP servers across devices
+ - Optimize task assignment logic
+
+---
+
+## 📚 Related Documentation
+
+### Migration Resources
+
+- **[Galaxy Quick Start](./quick_start_galaxy.md)** — Step-by-step Galaxy setup
+- **[UFO² Quick Start](./quick_start_ufo2.md)** — UFO² reference
+- **[Device Configuration](../configuration/system/galaxy_devices.md)** — Device pool setup
+- **[Agent Registration](../galaxy/agent_registration/overview.md)** — How devices join Galaxy
+
+### Architecture Deep Dives
+
+- **[Galaxy Overview](../galaxy/overview.md)** — Constellation architecture
+- **[UFO² Overview](../ufo2/overview.md)** — Desktop AgentOS design
+- **[Constellation Agent](../galaxy/constellation_agent/overview.md)** — DAG orchestration
+- **[Task Constellation](../galaxy/constellation/overview.md)** — DAG structure
+
+### Operational Guides
+
+- **[Trajectory Report](../galaxy/evaluation/trajectory_report.md)** — Execution logs
+- **[Performance Metrics](../galaxy/evaluation/performance_metrics.md)** — Monitoring
+- **[AIP Protocol](../aip/overview.md)** — Device communication
+
+---
+
+## 🤝 Getting Help
+
+### Common Questions
+
+**Q: Can I still use UFO² after migrating to Galaxy?**
+A: Yes! They coexist. Use UFO² for simple local tasks, Galaxy for multi-device workflows.
+
+**Q: Do I need to rewrite my custom agents?**
+A: No. Existing UFO² agents work as-is when running as Galaxy devices.
+
+**Q: Is Galaxy production-ready?**
+A: Galaxy is in active development. UFO² is more mature for mission-critical single-device workflows.
+
+**Q: Can I mix Windows and Linux devices?**
+A: Yes! That's Galaxy's key feature. Each device uses its native UFO² implementation.
+
+**Q: How do I debug failed cross-device workflows?**
+A: Check `logs/galaxy//output.md` for step-by-step execution details and DAG visualizations.
+
+---
+
+## 🚦 Migration Checklist
+
+Use this checklist to track your migration progress:
+
+- [ ] **Understand UFO evolution** (v1 → UFO² → Galaxy)
+- [ ] **Decide migration strategy** (hybrid vs full Galaxy)
+- [ ] **Preserve UFO² config** (`config/ufo/` untouched)
+- [ ] **Create Galaxy config** (`config/galaxy/agent.yaml`, `devices.yaml`)
+- [ ] **Start devices as servers** (each device runs `python -m ufo.server.app --port `)
+- [ ] **Test single-device workflow** (verify connectivity)
+- [ ] **Test multi-device workflow** (cross-platform task)
+- [ ] **Review trajectory reports** (`logs/galaxy/*/output.md`)
+- [ ] **Compare performance** (UFO² vs Galaxy for your use cases)
+- [ ] **Update automation scripts** (if using programmatic API)
+- [ ] **Train team** (share this guide!)
+
+---
+
+**🎉 Congratulations!** You're now ready to leverage the full power of UFO³ Galaxy's multi-device orchestration while preserving your existing UFO² workflows.
+
+For questions or issues, please open an issue on [GitHub](https://github.com/microsoft/UFO) or check the [documentation](https://microsoft.github.io/UFO/).
diff --git a/documents/docs/getting_started/more_guidance.md b/documents/docs/getting_started/more_guidance.md
index 9c733f63b..ca5db3f34 100644
--- a/documents/docs/getting_started/more_guidance.md
+++ b/documents/docs/getting_started/more_guidance.md
@@ -1,13 +1,391 @@
# More Guidance
-## For Users
-If you are a user of UFO, and want to use it to automate your tasks on Windows, you can refer to [User Configuration](../configurations/user_configuration.md) to set up your environment and start using UFO.
-For instance, except for configuring the `HOST_AGENT` and `APP_AGENT`, you can also configure the LLM parameters and RAG parameters in the `config.yaml` file to enhance the UFO agent with additional knowledge sources.
+This page provides additional guidance and resources for different user types and use cases.
+---
-## For Developers
-If you are a developer who wants to contribute to UFO, you can take a look at the [Developer Configuration](../configurations/developer_configuration.md) to explore the development environment setup and the development workflow.
+## 🎯 For End Users
-You can also refer to the [Project Structure](../project_directory_structure.md) to understand the project structure and the role of each component in UFO, and use the rest of the documentation to understand the architecture and design of UFO. Taking a look at the [Session](../modules/session.md) and [Round](../modules/round.md) can help you understand the core logic of UFO.
+If you want to use UFO³ to automate your tasks on Windows, Linux, or across multiple devices, here's your learning path:
-For debugging and testing, it is recommended to check the log files in the `ufo/logs` directory to track the execution of UFO and identify any issues that may arise.
\ No newline at end of file
+### 1. Getting Started (5-10 minutes)
+
+Choose your path based on your needs:
+
+| Your Goal | Start Here | Time |
+|-----------|-----------|------|
+| **Automate Windows desktop tasks** | [UFO² Quick Start](quick_start_ufo2.md) | 5 min |
+| **Manage Linux servers** | [Linux Quick Start](quick_start_linux.md) | 10 min |
+| **Orchestrate multiple devices** | [Galaxy Quick Start](quick_start_galaxy.md) | 10 min |
+
+### 2. Configure Your Environment (10-20 minutes)
+
+After installation, customize UFO³ to your needs:
+
+**Essential Configuration:**
+
+- **[Agent Configuration](../configuration/system/agents_config.md)** - Set up LLM API keys (OpenAI, Azure, Gemini, Claude, etc.)
+- **[System Configuration](../configuration/system/system_config.md)** - Adjust runtime settings (step limits, timeouts, logging)
+
+**Optional Enhancements:**
+
+- **[RAG Configuration](../configuration/system/rag_config.md)** - Add external knowledge sources:
+ - Offline help documents
+ - Bing search integration
+ - Experience learning from past tasks
+ - User demonstrations
+- **[MCP Configuration](../configuration/system/mcp_reference.md)** - Enable tool servers for:
+ - Better Office automation
+ - Linux command execution
+ - Custom tool integration
+
+> **💡 Configuration Tip:** Start with default settings and adjust only what you need. See [Configuration Overview](../configuration/system/overview.md) for the big picture.
+
+### 3. Learn Core Features (20-30 minutes)
+
+**For UFO² Users (Windows Desktop Automation):**
+
+| Feature | Documentation | What It Does |
+|---------|---------------|--------------|
+| **Hybrid GUI-API Execution** | [Hybrid Actions](../ufo2/core_features/hybrid_actions.md) | Combines UI automation with native API calls for faster, more reliable execution |
+| **Knowledge Substrate** | [Knowledge Overview](../ufo2/core_features/knowledge_substrate/overview.md) | Augments agents with external knowledge (docs, search, experience) |
+| **MCP Integration** | [MCP Overview](../mcp/overview.md) | Extends capabilities with custom tools and Office APIs |
+
+**For Galaxy Users (Multi-Device Orchestration):**
+
+| Feature | Documentation | What It Does |
+|---------|---------------|--------------|
+| **Task Constellation** | [Constellation Overview](../galaxy/constellation_orchestrator/overview.md) | Decomposes tasks into parallel DAGs across devices |
+| **Device Capabilities** | [Galaxy Devices Config](../configuration/system/galaxy_devices.md) | Routes tasks based on device capabilities and metadata |
+| **Asynchronous Execution** | [Constellation Overview](../galaxy/constellation/overview.md) | Executes subtasks in parallel for faster completion |
+| **Agent Interaction Protocol** | [AIP Overview](../aip/overview.md) | Enables persistent WebSocket communication between devices |
+
+### 4. Troubleshooting & Support
+
+**When Things Go Wrong:**
+
+1. **Check the [FAQ](../faq.md)** - Common issues and solutions
+2. **Review logs** - Located in `logs//`:
+ ```
+ logs/my-task-2025-11-11/
+ ├── request.log # Request logs
+ ├── response.log # Response logs
+ ├── action_step*.png # Screenshots at each step
+ └── action_step*_annotated.png # Annotated screenshots
+ ```
+3. **Validate configuration:**
+ ```bash
+ python -m ufo.tools.validate_config ufo --show-config
+ ```
+4. **Enable debug logging:**
+ ```yaml
+ # config/ufo/system.yaml
+ LOG_LEVEL: "DEBUG"
+ ```
+
+**Get Help:**
+
+- **[GitHub Discussions](https://github.com/microsoft/UFO/discussions)** - Ask questions, share tips
+- **[GitHub Issues](https://github.com/microsoft/UFO/issues)** - Report bugs, request features
+- **Email:** ufo-agent@microsoft.com
+
+---
+
+## 👨‍💻 For Developers
+
+If you want to contribute to UFO³ or build extensions, here's your development guide:
+
+### 1. Understand the Architecture (30-60 minutes)
+
+**Start with the big picture:**
+
+- **[Project Structure](../project_directory_structure.md)** - Codebase organization and component roles
+- **[Configuration Architecture](../configuration/system/overview.md)** - New modular config system design
+
+**Deep dive into core components:**
+
+| Component | Documentation | What to Learn |
+|-----------|---------------|---------------|
+| **Session** | [Session Module](../infrastructure/modules/session.md) | Task lifecycle management, state tracking |
+| **Round** | [Round Module](../infrastructure/modules/round.md) | Single agent reasoning cycle |
+| **HostAgent** | [HostAgent](../ufo2/host_agent/overview.md) | High-level task planning and app selection |
+| **AppAgent** | [AppAgent](../ufo2/app_agent/overview.md) | Low-level action execution |
+| **ConstellationAgent** | [ConstellationAgent](../galaxy/constellation_agent/overview.md) | Multi-device task orchestration |
+
+### 2. Set Up Development Environment (15-30 minutes)
+
+**Installation:**
+
+```bash
+# Clone the repository
+git clone https://github.com/microsoft/UFO.git
+cd UFO
+
+# Create development environment
+conda create -n ufo-dev python=3.10
+conda activate ufo-dev
+
+# Install dependencies (including dev tools)
+pip install -r requirements.txt
+pip install pytest pytest-cov black flake8 # Testing & linting
+```
+
+**Configuration:**
+
+```bash
+# Create config files from templates
+cp config/ufo/agents.yaml.template config/ufo/agents.yaml
+cp config/galaxy/agent.yaml.template config/galaxy/agent.yaml
+
+# Edit with your development API keys
+# (Consider using lower-cost models for testing)
+```
+
+### 3. Explore the Codebase (1-2 hours)
+
+**Key Directories:**
+
+```
+UFO/
+├── ufo/ # Core UFO² implementation
+│ ├── agents/ # HostAgent, AppAgent
+│ ├── automator/ # UI automation engines
+│ ├── prompter/ # Prompt management
+│ └── module/ # Core modules (Session, Round)
+├── galaxy/ # Galaxy orchestration framework
+│ ├── agents/ # ConstellationAgent
+│ ├── constellation/ # DAG orchestration
+│ └── core/ # Core Galaxy infrastructure
+├── aip/ # Agent Interaction Protocol
+│ ├── protocol/ # Message definitions
+│ └── transport/ # WebSocket transport
+├── ufo/client/ # Device agents (Windows, Linux)
+│ ├── client.py # Generic client
+│ └── mcp/ # MCP integration
+├── ufo/server/ # Device agent server
+│ └── app.py # FastAPI server
+└── config/ # Configuration system
+ ├── ufo/ # UFO² configs
+ └── galaxy/ # Galaxy configs
+```
+
+**Entry Points:**
+
+- **UFO² Main:** `ufo/__main__.py`
+- **Galaxy Main:** `galaxy/__main__.py`
+- **Server:** `ufo/server/app.py`
+- **Client:** `ufo/client/client.py`
+
+### 4. Development Workflows
+
+#### Adding a New Feature
+
+1. **Identify the component** to modify (Agent, Module, Automator, etc.)
+2. **Read existing code** in that component
+3. **Check related tests** in `tests/` directory
+4. **Implement your feature** following existing patterns
+5. **Add tests** for your feature
+6. **Update documentation** if needed
+
+#### Extending Configuration
+
+See **[Extending Configuration](../configuration/system/extending.md)** for:
+- Adding custom fields
+- Creating new config modules
+- Environment-specific overrides
+- Plugin configuration patterns
+
+#### Creating Custom MCP Servers
+
+See **[Creating MCP Servers Tutorial](../tutorials/creating_mcp_servers.md)** for:
+- MCP server architecture
+- Tool definition and registration
+- HTTP vs. local vs. stdio servers
+- Integration with UFO³
+
+### 5. Testing & Debugging
+
+**Run Tests:**
+
+```bash
+# Run all tests
+pytest
+
+# Run specific test file
+pytest tests/config/test_config_system.py
+
+# Run with coverage
+pytest --cov=ufo --cov-report=html
+```
+
+**Debug Logging:**
+
+```python
+# Add debug logs to your code
+import logging
+logger = logging.getLogger(__name__)
+
+logger.debug("Debug message with context: %s", variable)
+logger.info("Informational message")
+logger.warning("Warning message")
+logger.error("Error message")
+```
+
+**Interactive Debugging:**
+
+```python
+# Add breakpoint in code
+import pdb; pdb.set_trace()
+
+# Or use VS Code debugger with launch.json
+```
+
+### 6. Code Style & Best Practices
+
+**Formatting:**
+
+```bash
+# Auto-format with black
+black ufo/ galaxy/
+
+# Check style with flake8
+flake8 ufo/ galaxy/
+```
+
+**Best Practices:**
+
+- ✅ Use type hints: `def process(data: Dict[str, Any]) -> Optional[str]:`
+- ✅ Write docstrings for public functions
+- ✅ Follow existing code patterns
+- ✅ Add comments for complex logic
+- ✅ Keep functions focused and modular
+- ✅ Handle errors gracefully
+- ✅ Write tests for new features
+
+**Configuration Best Practices:**
+
+- ✅ Use typed config access: `config.system.max_step`
+- ✅ Provide `.template` files for sensitive configs
+- ✅ Document custom fields in YAML comments
+- ✅ Use environment variables for secrets: `${OPENAI_API_KEY}`
+- ✅ Validate configurations early: `ConfigValidator.validate()`
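+
+The environment-variable practice above can be sketched as follows. This is a minimal illustration using `os.path.expandvars`, under the assumption that `${NAME}` placeholders are resolved before the value is used. The `expand_secret` helper is hypothetical, and whether UFO³'s own config loader expands such placeholders is not confirmed here.
+
+```python
+import os
+
+
+def expand_secret(value: str) -> str:
+    """Expand ${VAR} references and fail loudly if a variable is not set."""
+    expanded = os.path.expandvars(value)
+    if "${" in expanded:  # expandvars leaves unknown variables untouched
+        raise KeyError(f"unset environment variable in {value!r}")
+    return expanded
+
+
+os.environ["OPENAI_API_KEY"] = "sk-example"  # demo value only, never commit real keys
+print(expand_secret("${OPENAI_API_KEY}"))
+```
+
+Failing early on an unset variable keeps a literal `${...}` string from being sent to an LLM provider as if it were a key.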
+
+### 7. Contributing Guidelines
+
+**Before Submitting a PR:**
+
+1. **Test your changes** thoroughly
+2. **Update documentation** if needed
+3. **Follow code style** (black + flake8)
+4. **Write clear commit messages**
+5. **Reference related issues** in PR description
+
+**PR Template:**
+
+```markdown
+## Description
+Brief description of changes
+
+## Type of Change
+- [ ] Bug fix
+- [ ] New feature
+- [ ] Documentation update
+- [ ] Refactoring
+
+## Testing
+- [ ] Added tests for new functionality
+- [ ] All tests pass locally
+- [ ] Manual testing completed
+
+## Checklist
+- [ ] Code follows project style
+- [ ] Documentation updated
+- [ ] No breaking changes (or documented)
+```
+
+### 8. Advanced Topics
+
+**For Deep Customization:**
+
+- **[Prompt Engineering](../ufo2/prompts/overview.md)** - Customize agent prompts
+- **[State Management](../galaxy/constellation/overview.md)** - Constellation state machine internals
+- **[Protocol Extensions](../aip/messages.md)** - Extend AIP message types
+- **[Custom Automators](../ufo2/core_features/control_detection/overview.md)** - Implement new automation backends
+
+---
+
+## 🎓 Learning Paths
+
+### Path 1: Basic User → Power User
+
+1. ✅ Complete quick start for your platform
+2. ✅ Run 5-10 simple automation tasks
+3. ✅ Configure RAG for your organization's docs
+4. ✅ Enable MCP for better Office automation
+5. ✅ Set up experience learning for common tasks
+6. ✅ Create custom device configurations (Galaxy)
+
+**Time Investment:** 2-4 hours
+**Outcome:** Efficient automation of daily tasks
+
+### Path 2: Power User → Developer
+
+1. ✅ Understand project structure and architecture
+2. ✅ Read Session and Round module code
+3. ✅ Create a custom MCP server
+4. ✅ Add custom metadata to device configs
+5. ✅ Contribute documentation improvements
+6. ✅ Submit your first bug fix PR
+
+**Time Investment:** 10-20 hours
+**Outcome:** Ability to extend and customize UFO³
+
+### Path 3: Developer → Core Contributor
+
+1. ✅ Deep dive into agent implementations
+2. ✅ Understand Galaxy orchestration internals
+3. ✅ Study AIP protocol and transport layer
+4. ✅ Implement a new agent capability
+5. ✅ Add support for a new LLM provider
+6. ✅ Contribute major features or refactorings
+
+**Time Investment:** 40+ hours
+**Outcome:** Core contributor to UFO³ project
+
+---
+
+## 📚 Additional Resources
+
+### Documentation Hubs
+
+| Topic | Link | Description |
+|-------|------|-------------|
+| **Getting Started** | [Getting Started Index](../index.md#getting-started) | All quick start guides |
+| **Configuration** | [Configuration Overview](../configuration/system/overview.md) | Complete config system documentation |
+| **Architecture** | [Galaxy Overview](../galaxy/overview.md), [UFO² Overview](../ufo2/overview.md) | System architecture and design |
+| **API Reference** | [Agent APIs](../infrastructure/agents/overview.md) | Agent interfaces and APIs |
+| **Tutorials** | [Creating Device Agents](../tutorials/creating_device_agent/index.md) | Step-by-step guides |
+
+### Community Resources
+
+- **[GitHub Repository](https://github.com/microsoft/UFO)** - Source code and releases
+- **[GitHub Discussions](https://github.com/microsoft/UFO/discussions)** - Q&A and community
+- **[GitHub Issues](https://github.com/microsoft/UFO/issues)** - Bug reports and features
+- **[Project Website](https://microsoft.github.io/UFO/)** - Official website
+
+### Research Papers
+
+- **UFO v1** (Feb 2024): [A UI-Focused Agent for Windows OS Interaction](https://arxiv.org/abs/2402.07939)
+- **UFO²** (Apr 2025): [The Desktop AgentOS](https://arxiv.org/abs/2504.14603)
+- **UFO³ Galaxy** (Nov 2025): UFO³: Weaving the Digital Agent Galaxy *(Coming Soon)*
+
+---
+
+## 🆘 Need More Help?
+
+- **Can't find what you're looking for?** Check the [FAQ](../faq.md)
+- **Still stuck?** Ask on [GitHub Discussions](https://github.com/microsoft/UFO/discussions)
+- **Found a bug?** Open an issue on [GitHub Issues](https://github.com/microsoft/UFO/issues)
+- **Want to contribute?** Read the [Contributing Guidelines](https://github.com/microsoft/UFO/blob/main/CONTRIBUTING.md)
+
+**Happy automating!** 🚀
diff --git a/documents/docs/getting_started/quick_start.md b/documents/docs/getting_started/quick_start.md
deleted file mode 100644
index 62e8305a0..000000000
--- a/documents/docs/getting_started/quick_start.md
+++ /dev/null
@@ -1,118 +0,0 @@
-# Quick Start
-
-### 🛠️ Step 1: Installation
-UFO requires **Python >= 3.10** running on **Windows OS >= 10**. It can be installed by running the following command:
-```powershell
-# [optional to create conda environment]
-# conda create -n ufo python=3.10
-# conda activate ufo
-
-# clone the repository
-git clone https://github.com/microsoft/UFO.git
-cd UFO
-# install the requirements
-pip install -r requirements.txt
-# If you want to use the Qwen as your LLMs, uncomment the related libs.
-```
-
-### ⚙️ Step 2: Configure the LLMs
-Before running UFO, you need to provide your LLM configurations **individually for HostAgent and AppAgent**. You can create your own config file `ufo/config/config.yaml`, by copying the `ufo/config/config.yaml.template` and editing config for **HOST_AGENT** and **APP_AGENT** as follows:
-
-```powershell
-copy ufo\config\config.yaml.template ufo\config\config.yaml
-notepad ufo\config\config.yaml # paste your key & endpoint
-```
-
-#### OpenAI
-```bash
-VISUAL_MODE: True, # Whether to use the visual mode
-API_TYPE: "openai" , # The API type, "openai" for the OpenAI API.
-API_BASE: "https://api.openai.com/v1/chat/completions", # The OpenAI API endpoint.
-API_KEY: "sk-", # The OpenAI API key, begin with sk-
-API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
-API_MODEL: "gpt-4-vision-preview", # The OpenAI model
-```
-
-
-#### Azure OpenAI (AOAI)
-```bash
-VISUAL_MODE: True, # Whether to use the visual mode
-API_TYPE: "aoai" , # The API type, "aoai" for the Azure OpenAI.
-API_BASE: "YOUR_ENDPOINT", # The AOAI API address. Format: https://{your-resource-name}.openai.azure.com
-API_KEY: "YOUR_KEY", # The aoai API key
-API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
-API_MODEL: "gpt-4-vision-preview", # The OpenAI model
-API_DEPLOYMENT_ID: "YOUR_AOAI_DEPLOYMENT", # The deployment id for the AOAI API
-```
-You can also use a non-visual model (e.g., GPT-4) for each agent by setting `VISUAL_MODE: False` and a suitable `API_MODEL` (OpenAI) and `API_DEPLOYMENT_ID` (AOAI). You can optionally set a backup LLM engine in the `BACKUP_AGENT` field in case the engines above fail during inference. In visual mode, `API_MODEL` can be any GPT model that accepts images as input.
-
-
-
-#### Non-Visual Model Configuration
-You can utilize non-visual models (e.g., GPT-4) for each agent by configuring the following settings in the `config.yaml` file:
-
-!!! info
-    - `VISUAL_MODE: False`
-    - Specify the appropriate `API_MODEL` (OpenAI) and `API_DEPLOYMENT_ID` (AOAI) for each agent.
-
-Optionally, you can set a backup language model (LLM) engine in the `BACKUP_AGENT` field to handle cases where the primary engines fail during inference. Ensure you configure these settings accurately to leverage non-visual models effectively.
-
-!!! note
-    UFO also supports other LLMs and advanced configurations, such as customizing your own model; see the [documentation](../supported_models/overview.md) for details. Because of model input limitations, a lite version of the prompt is provided so that users can try it out; it is configured in `config_dev.yaml`.
-
-### 📔 Step 3: Additional Settings for RAG (Optional)
-If you want to enhance UFO's ability with external knowledge, you can optionally configure it with an external database for retrieval augmented generation (RAG) in the `ufo/config/config.yaml` file.
-
-We provide the following options for RAG to enhance UFO's capabilities:
-
-- **[Offline Help Document](../advanced_usage/reinforce_appagent/learning_from_help_document.md)**: Enable UFO to retrieve information from offline help documents.
-
-- **[Online Bing Search Engine](../advanced_usage/reinforce_appagent/learning_from_bing_search.md)**: Enhance UFO's capabilities by utilizing the most up-to-date online search results.
-
-- **[Self-Experience](../advanced_usage/reinforce_appagent/experience_learning.md)**: Save task completion trajectories into UFO's memory for future reference.
-
-- **[User-Demonstration](../advanced_usage/reinforce_appagent/learning_from_demonstration.md)**: Boost UFO's capabilities through user demonstration.
-
-!!!tip
- Consult their respective documentation for more information on how to configure these settings.
-
-### 🎉 Step 4: Start UFO
-
-#### ⌨️ You can execute the following on your Windows command line (CLI):
-
-```bash
-# assume you are in the cloned UFO folder
-python -m ufo --task
-```
-
-This will start the UFO process and you can interact with it through the command line interface.
-If everything goes well, you will see the following message:
-
-```bash
-Welcome to use UFO🛸, A UI-focused Agent for Windows OS Interaction.
- _ _ _____ ___
-| | | || ___| / _ \
-| | | || |_ | | | |
-| |_| || _| | |_| |
- \___/ |_| \___/
-Please enter your request to be completed🛸:
-```
-
-Alternatively, you can also directly invoke UFO with a specific task and request by using the following command:
-
-```powershell
-python -m ufo --task -r ""
-```
-
-
-### 🎥 Step 5: Execution Logs
-
-You can find the screenshots taken and request & response logs in the following folder:
-```
-./ufo/logs//
-```
-You may use them to debug, replay, or analyze the agent output.
-
-
-!!! note
- The LLM accepts screenshots of your desktop and application GUI as input. Please ensure that no sensitive or confidential information is visible or captured during the execution process. For further information, refer to [DISCLAIMER.md](https://github.com/microsoft/UFO/blob/vyokky/dev/DISCLAIMER.md).
\ No newline at end of file
diff --git a/documents/docs/getting_started/quick_start_galaxy.md b/documents/docs/getting_started/quick_start_galaxy.md
new file mode 100644
index 000000000..66bc19755
--- /dev/null
+++ b/documents/docs/getting_started/quick_start_galaxy.md
@@ -0,0 +1,711 @@
+# Quick Start Guide - UFO³ Galaxy
+
+Welcome to **UFO³ Galaxy** – the Multi-Device AgentOS! This guide will help you orchestrate complex cross-platform workflows across multiple devices in just a few steps.
+
+**What is UFO³ Galaxy?**
+
+UFO³ Galaxy is a multi-tier orchestration framework that coordinates distributed agents across Windows and Linux devices. It enables complex workflows that span multiple machines, combining desktop automation, server operations, and heterogeneous device capabilities into unified task execution.
+
+---
+
+## 🛠️ Step 1: Installation
+
+### Requirements
+
+- **Python** >= 3.10
+- **Windows OS** >= 10 (for Windows agents)
+- **Linux** (for Linux agents)
+- **Git** (for cloning the repository)
+- **Network connectivity** between all devices
+
+### Installation Steps
+
+```powershell
+# [Optional] Create conda environment
+conda create -n ufo python=3.10
+conda activate ufo
+
+# Clone the repository
+git clone https://github.com/microsoft/UFO.git
+cd UFO
+
+# Install dependencies
+pip install -r requirements.txt
+```
+
+> **💡 Tip:** If you want to use Qwen as your LLM, uncomment the related libraries in `requirements.txt` before installing.
+
+---
+
+## ⚙️ Step 2: Configure ConstellationAgent LLM
+
+UFO³ Galaxy uses a **ConstellationAgent** that orchestrates all device agents. You need to configure its LLM settings.
+
+### Configure Constellation Agent
+
+```powershell
+# Copy template to create constellation agent config
+copy config\galaxy\agent.yaml.template config\galaxy\agent.yaml
+notepad config\galaxy\agent.yaml # Edit your LLM API credentials
+```
+
+**Configuration File Location:**
+```
+config/galaxy/
+├── agent.yaml.template # Template - COPY THIS
+├── agent.yaml # Your config with API keys (DO NOT commit)
+└── devices.yaml # Device pool configuration (Step 4)
+```
+
+### LLM Configuration Examples
+
+#### Azure OpenAI Configuration
+
+**Edit `config/galaxy/agent.yaml`:**
+
+```yaml
+CONSTELLATION_AGENT:
+ REASONING_MODEL: false
+ API_TYPE: "aoai"
+ API_BASE: "https://YOUR_RESOURCE.openai.azure.com"
+ API_KEY: "YOUR_AOAI_KEY"
+ API_VERSION: "2024-02-15-preview"
+ API_MODEL: "gpt-4o"
+ API_DEPLOYMENT_ID: "YOUR_DEPLOYMENT_ID"
+
+ # Prompt configurations (use defaults)
+ CONSTELLATION_CREATION_PROMPT: "galaxy/prompts/constellation/share/constellation_creation.yaml"
+ CONSTELLATION_EDITING_PROMPT: "galaxy/prompts/constellation/share/constellation_editing.yaml"
+ CONSTELLATION_CREATION_EXAMPLE_PROMPT: "galaxy/prompts/constellation/examples/constellation_creation_example.yaml"
+ CONSTELLATION_EDITING_EXAMPLE_PROMPT: "galaxy/prompts/constellation/examples/constellation_editing_example.yaml"
+```
+
+#### OpenAI Configuration
+
+```yaml
+CONSTELLATION_AGENT:
+ REASONING_MODEL: false
+ API_TYPE: "openai"
+ API_BASE: "https://api.openai.com/v1/chat/completions"
+ API_KEY: "sk-YOUR_KEY_HERE"
+ API_VERSION: "2025-02-01-preview"
+ API_MODEL: "gpt-4o"
+
+ # Prompt configurations (use defaults)
+ CONSTELLATION_CREATION_PROMPT: "galaxy/prompts/constellation/share/constellation_creation.yaml"
+ CONSTELLATION_EDITING_PROMPT: "galaxy/prompts/constellation/share/constellation_editing.yaml"
+ CONSTELLATION_CREATION_EXAMPLE_PROMPT: "galaxy/prompts/constellation/examples/constellation_creation_example.yaml"
+ CONSTELLATION_EDITING_EXAMPLE_PROMPT: "galaxy/prompts/constellation/examples/constellation_editing_example.yaml"
+```
+
+!!!info "More LLM Options"
+    Galaxy supports various LLM providers including **Qwen**, **Gemini**, **Claude**, **DeepSeek**, and more. See the **[Model Configuration Guide](../configuration/models/overview.md)** for complete details.
+
+---
+
+## 🖥️ Step 3: Set Up Device Agents
+
+Galaxy orchestrates **device agents** that execute tasks on individual machines. You need to start the appropriate device agents based on your needs.
+
+### Supported Device Agents
+
+| Device Agent | Platform | Documentation | Use Cases |
+|--------------|----------|---------------|-----------|
+| **WindowsAgent (UFO²)** | Windows 10/11 | [UFO² as Galaxy Device](../ufo2/as_galaxy_device.md) | Desktop automation, Office apps, GUI operations |
+| **LinuxAgent** | Linux | [Linux as Galaxy Device](../linux/as_galaxy_device.md) | Server management, CLI operations, log analysis |
+| **MobileAgent** | Android | [Mobile as Galaxy Device](../mobile/as_galaxy_device.md) | Mobile app automation, UI testing, device control |
+
+> **💡 Choose Your Devices:** You can use any combination of Windows, Linux, and Mobile agents. Galaxy will intelligently route tasks based on device capabilities.
+
+### Quick Setup Overview
+
+For each device agent you want to use, you need to:
+
+1. **Start the Device Agent Server** (manages tasks)
+2. **Start the Device Agent Client** (executes commands)
+3. **Start MCP Services** (provides automation tools, if needed)
+
+**Detailed Setup Instructions:**
+
+- **For Windows devices (UFO²):** See [UFO² as Galaxy Device](../ufo2/as_galaxy_device.md) for complete step-by-step instructions.
+- **For Linux devices:** See [Linux as Galaxy Device](../linux/as_galaxy_device.md) for complete step-by-step instructions.
+- **For Mobile devices:** See [Mobile as Galaxy Device](../mobile/as_galaxy_device.md) for complete step-by-step instructions.
+
+### Example: Quick Windows Device Setup
+
+**On your Windows machine:**
+
+```powershell
+# Terminal 1: Start UFO² Server
+python -m ufo.server.app --port 5000
+
+# Terminal 2: Start UFO² Client (connect to server)
+python -m ufo.client.client `
+ --ws `
+ --ws-server ws://localhost:5000/ws `
+ --client-id windows_device_1 `
+ --platform windows
+```
+
+> **💡 Important:** Always include `--platform windows` for Windows devices and `--platform linux` for Linux devices!
+
+### Example: Quick Linux Device Setup
+
+**On your Linux machine:**
+
+```bash
+# Terminal 1: Start Device Agent Server
+python -m ufo.server.app --port 5001
+
+# Terminal 2: Start Linux Client (connect to server)
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://localhost:5001/ws \
+ --client-id linux_device_1 \
+ --platform linux
+
+# Terminal 3: Start HTTP MCP Server (for Linux tools)
+python -m ufo.client.mcp.http_servers.linux_mcp_server
+```
+
+> **💡 Note:** For detailed Mobile Agent setup with ADB and Android device configuration, see [Mobile Quick Start](quick_start_mobile.md).
+
+---
+
+## 🔌 Step 4: Configure Device Pool
+
+After starting your device agents, register them in Galaxy's device pool configuration.
+
+### Option 1: Add Devices via Configuration File
+
+#### Edit Device Configuration
+
+```powershell
+notepad config\galaxy\devices.yaml
+```
+
+#### Example Device Pool Configuration
+
+```yaml
+# Device Configuration for Galaxy
+# Each device agent must be registered here
+
+devices:
+  # Windows Device (UFO²)
+  - device_id: "windows_device_1"  # Must match --client-id
+    server_url: "ws://localhost:5000/ws"  # Must match server WebSocket URL
+    os: "windows"
+    capabilities:
+      - "desktop_automation"
+      - "office_applications"
+      - "excel"
+      - "word"
+      - "outlook"
+      - "email"
+      - "web_browsing"
+    metadata:
+      os: "windows"
+      version: "11"
+      performance: "high"
+      installed_apps:
+        - "Microsoft Excel"
+        - "Microsoft Word"
+        - "Microsoft Outlook"
+        - "Google Chrome"
+      description: "Primary Windows desktop for office automation"
+    auto_connect: true
+    max_retries: 5
+
+  # Linux Device
+  - device_id: "linux_device_1"  # Must match --client-id
+    server_url: "ws://localhost:5001/ws"  # Must match server WebSocket URL
+    os: "linux"
+    capabilities:
+      - "server_management"
+      - "log_analysis"
+      - "file_operations"
+      - "database_operations"
+    metadata:
+      os: "linux"
+      performance: "medium"
+      logs_file_path: "/var/log/myapp/app.log"
+      dev_path: "/home/user/projects/"
+      warning_log_pattern: "WARN"
+      error_log_pattern: "ERROR|FATAL"
+      description: "Development server for backend operations"
+    auto_connect: true
+    max_retries: 5
+
+  # Mobile Device (Android)
+  - device_id: "mobile_phone_1"  # Must match --client-id
+    server_url: "ws://localhost:5001/ws"  # Must match server WebSocket URL
+    os: "mobile"
+    capabilities:
+      - "mobile"
+      - "android"
+      - "ui_automation"
+      - "messaging"
+      - "camera"
+      - "location"
+    metadata:
+      os: "mobile"
+      device_type: "phone"
+      android_version: "13"
+      screen_size: "1080x2400"
+      installed_apps:
+        - "com.android.chrome"
+        - "com.google.android.apps.maps"
+        - "com.whatsapp"
+      description: "Android phone for mobile automation and testing"
+    auto_connect: true
+    max_retries: 5
+```
+
+> **⚠️ Critical:** IDs and URLs must match exactly:
+>
+> - `device_id` must exactly match the `--client-id` flag
+> - `server_url` must exactly match the server WebSocket URL
+> - Otherwise, Galaxy cannot control the device!
+
+**Complete Configuration Guide:** For detailed information about all configuration options, capabilities, and metadata, see [Galaxy Devices Configuration](../configuration/system/galaxy_devices.md).
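+
+The matching rules above lend themselves to a quick pre-flight check. The following is a minimal sketch and not part of Galaxy: `check_device_pool` is a hypothetical helper. It assumes the device entries have already been loaded from `devices.yaml` into plain dictionaries, and that you know the `--client-id` and `--ws-server` values each client was started with.
+
+```python
+from urllib.parse import urlparse
+
+
+def check_device_pool(devices, running_clients):
+    """Return a list of problems; an empty list means no mismatch was found.
+
+    devices: list of dicts with "device_id" and "server_url" keys (as in devices.yaml).
+    running_clients: dict mapping each --client-id to its --ws-server URL.
+    """
+    problems = []
+    seen = set()
+    for device in devices:
+        device_id = device.get("device_id")
+        url = device.get("server_url", "")
+        if device_id in seen:
+            problems.append(f"duplicate device_id: {device_id}")
+        seen.add(device_id)
+        # Assumption: accept wss:// as well, although this guide only shows ws:// URLs.
+        if urlparse(url).scheme not in ("ws", "wss"):
+            problems.append(f"{device_id}: server_url is not a WebSocket URL")
+        if device_id not in running_clients:
+            problems.append(f"{device_id}: no client started with this --client-id")
+        elif running_clients[device_id] != url:
+            problems.append(f"{device_id}: server_url differs from the client's --ws-server")
+    return problems
+
+
+devices = [
+    {"device_id": "windows_device_1", "server_url": "ws://localhost:5000/ws"},
+    {"device_id": "linux_device_1", "server_url": "ws://localhost:5001/ws"},
+]
+clients = {
+    "windows_device_1": "ws://localhost:5000/ws",
+    "linux_device_1": "ws://localhost:5002/ws",  # deliberate port mismatch
+}
+for problem in check_device_pool(devices, clients):
+    print(problem)
+```
+
+With the sample data, only the Linux entry is reported, because its `server_url` port differs from the one the client was started with.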
+
+### Option 2: Add Devices via WebUI (When Using --webui Mode)
+
+If you start Galaxy with the `--webui` flag (see Step 5), you can add new device agents directly through the web interface without editing configuration files.
+
+**Steps to Add Device via WebUI:**
+
+1. **Launch Galaxy with WebUI** (as shown in Step 5):
+    ```powershell
+    python -m galaxy --webui
+    ```
+
+2. **Click the "+" button** in the top-right corner of the Device Agent panel (left sidebar)
+
+3. **Fill in the device information** in the Add Device Modal:
+
+*Figure: ➕ Add Device Modal - Register new device agents through the WebUI*
+
+**Required Fields:**
+- **Device ID**: Unique identifier (must match `--client-id` in device agent)
+- **Server URL**: WebSocket endpoint (e.g., `ws://localhost:5000/ws`)
+- **Operating System**: Select Windows, Linux, macOS, or enter custom OS
+- **Capabilities**: Add at least one capability (e.g., `excel`, `outlook`, `log_analysis`)
+
+**Optional Fields:**
+- **Auto-connect**: Enable to automatically connect after registration (default: enabled)
+- **Max Retries**: Maximum connection attempts (default: 5)
+- **Metadata**: Add custom key-value pairs (e.g., `region: us-east-1`)
+
+**Benefits of WebUI Device Management:**
+- ✅ No need to manually edit YAML files
+- ✅ Real-time validation of device ID uniqueness
+- ✅ Automatic connection after registration
+- ✅ Immediate visual feedback on device status
+- ✅ Form validation prevents configuration errors
+
+**After Adding:**
+The device will be:
+1. Saved to `config/galaxy/devices.yaml` automatically
+2. Registered with Galaxy's Device Manager
+3. Connected automatically (if auto-connect is enabled)
+4. Displayed in the Device Agent panel with real-time status
+
+> **💡 Tip:** You can add devices while Galaxy is running! No need to restart the server.
+
+---
+
+## 🎉 Step 5: Start UFO³ Galaxy
+
+With all device agents running and configured, you can now launch Galaxy!
+
+### Pre-Launch Checklist
+
+Before starting Galaxy, ensure:
+
+1. ✅ All Device Agent Servers are running
+2. ✅ All Device Agent Clients are connected
+3. ✅ MCP Services are running (for Linux devices)
+4. ✅ LLM configured in `config/galaxy/agent.yaml`
+5. ✅ Devices configured in `config/galaxy/devices.yaml`
+6. ✅ Network connectivity between all components
+
+### 🎨 Launch Galaxy - WebUI Mode (Recommended)
+
+Start Galaxy with an interactive web interface for real-time constellation visualization and monitoring:
+
+```powershell
+# Assume you are in the cloned UFO folder
+python -m galaxy --webui
+```
+
+This will start the Galaxy server with WebUI and automatically open your browser to the interactive interface:
+
+*Figure: 🎨 Galaxy WebUI - Interactive constellation visualization and chat interface*
+
+**WebUI Features:**
+
+- 🗣️ **Chat Interface**: Submit requests and interact with ConstellationAgent in real-time
+- 📊 **Live DAG Visualization**: Watch task constellation formation and execution
+- 🎯 **Task Status Tracking**: Monitor each TaskStar's progress and completion
+- 🔄 **Dynamic Updates**: See constellation evolution as tasks complete
+- 📱 **Responsive Design**: Works on desktop and tablet devices
+
+**Default URL:** `http://localhost:8000` (automatically finds next available port if 8000 is occupied)
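+
+The port fallback mentioned above can be pictured with the sketch below. It is a generic illustration using Python's `socket` module, not Galaxy's actual startup code; `find_available_port` and its parameters are hypothetical.
+
+```python
+import socket
+
+
+def find_available_port(start=8000, max_tries=50, host="127.0.0.1"):
+    """Return the first port at or above `start` that can be bound on `host`."""
+    for port in range(start, min(start + max_tries, 65536)):
+        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
+            try:
+                probe.bind((host, port))
+            except OSError:
+                continue  # Port is occupied (or not permitted); try the next one
+            return port
+    raise RuntimeError(f"no free port found in {max_tries} attempts starting at {start}")
+
+
+print(find_available_port(8000))
+```
+
+The probe socket is closed before the port is returned, so another process could still claim the port in between; a real server would bind and keep the socket open.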
+
+---
+
+### 💬 Launch Galaxy - Interactive Terminal Mode
+
+Start Galaxy in interactive mode where you can enter requests dynamically:
+
+```powershell
+# Assume you are in the cloned UFO folder
+python -m galaxy --interactive
+```
+
+**Expected Output:**
+
+```
+🌌 Welcome to UFO³ Galaxy Framework
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+Multi-Device AI Orchestration System
+
+📡 Initializing Galaxy...
+✅ ConstellationAgent initialized
+✅ Connected to device: windows_device_1 (windows)
+✅ Connected to device: linux_device_1 (linux)
+
+🌟 Galaxy Ready - 2 devices online
+
+Please enter your request 🛸:
+```
+
+---
+
+### ⚡ Launch Galaxy - Direct Request Mode
+
+Invoke Galaxy with a specific request directly:
+
+```powershell
+python -m galaxy --request "Your task description here"
+```
+
+**Example:**
+
+```powershell
+python -m galaxy --request "Generate a sales report from the database and create an Excel dashboard"
+```
+
+---
+
+### 🎬 Launch Galaxy - Demo Mode
+
+Run Galaxy in demo mode to see example workflows:
+
+```powershell
+python -m galaxy --demo
+```
+
+---
+
+## 🎯 Step 6: Try Your First Multi-Device Workflow
+
+### Example 1: Simple Cross-Platform Task
+
+**User Request:**
+> "Check the server logs for errors and email me a summary"
+
+**Galaxy orchestrates:**
+
+1. **Linux Device**: Analyze server logs for error patterns
+2. **Windows Device**: Open Outlook, create email with log summary
+3. **Windows Device**: Send email
+
+**How to run:**
+
+```powershell
+python -m galaxy --request "Check the server logs for errors and email me a summary"
+```
+
+### Example 2: Data Processing Pipeline
+
+**User Request:**
+> "Export sales data from the database, create an Excel report with charts, and email it to the team"
+
+**Galaxy orchestrates:**
+
+1. **Linux Device**: Query database, export CSV
+2. **Windows Device**: Open Excel, import CSV, create charts
+3. **Windows Device**: Open Outlook, attach Excel file, send email
+
+**How to run:**
+
+```powershell
+python -m galaxy --request "Export sales data from the database, create an Excel report with charts, and email it to the team"
+```
+
+### Example 3: Multi-Server Monitoring
+
+**User Request:**
+> "Check all servers for disk usage and alert if any are above 80%"
+
+**Galaxy orchestrates:**
+
+1. **Linux Device 1**: Check disk usage on server 1
+2. **Linux Device 2**: Check disk usage on server 2
+3. **Galaxy**: Aggregate results, check thresholds
+4. **Windows Device**: Send alert email if needed
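+
+The four steps above form a small dependency graph: the two disk checks are independent, the aggregation needs both, and the alert needs the aggregation. As a rough sketch of that ordering (the task names are hypothetical and this is not Galaxy's scheduler), Python's standard `graphlib` can group the tasks into waves that could run in parallel:
+
+```python
+from graphlib import TopologicalSorter
+
+# Each task maps to the set of tasks it depends on.
+dependencies = {
+    "check_disk_server_1": set(),
+    "check_disk_server_2": set(),
+    "aggregate_results": {"check_disk_server_1", "check_disk_server_2"},
+    "send_alert_email": {"aggregate_results"},
+}
+
+sorter = TopologicalSorter(dependencies)
+sorter.prepare()
+
+waves = []
+while sorter.is_active():
+    ready = sorted(sorter.get_ready())  # Tasks whose dependencies are all done
+    waves.append(ready)
+    sorter.done(*ready)
+
+for number, wave in enumerate(waves, start=1):
+    print(number, wave)
+```
+
+The first wave contains both disk checks, so they can run on the two Linux devices at the same time; the aggregation and the alert follow one after the other.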
+
+---
+
+## 📔 Step 7: Understanding Device Routing
+
+Galaxy uses **capability-based routing** to intelligently assign tasks to appropriate devices.
+
+### How Galaxy Selects Devices
+
+| Factor | Description | Example |
+|--------|-------------|---------|
+| **Capabilities** | Matches task requirements | `"excel"` → Windows device with Excel |
+| **OS Requirement** | Platform-specific tasks | Linux commands → Linux device |
+| **Metadata** | Device-specific context | Email task → device with Outlook |
+| **Status** | Online and healthy devices only | Skips offline devices |
+
+### Example Task Decomposition
+
+**User Request:**
+> "Prepare monthly reports and distribute to team"
+
+**Galaxy Decomposition:**
+
+```yaml
+Subtask 1:
+  Description: "Extract monthly data from database"
+  Target Device: linux_device_1
+  Reason: Has "database_operations" capability
+
+Subtask 2:
+  Description: "Create Excel report with visualizations"
+  Target Device: windows_device_1
+  Reason: Has "excel" capability
+
+Subtask 3:
+  Description: "Email reports to distribution list"
+  Target Device: windows_device_1
+  Reason: Has "email" and "outlook" capabilities
+```
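+
+To make the selection factors concrete, here is a simplified model of capability-based routing. It is an illustration only, not Galaxy's implementation: the `Device` class and `select_device` function are hypothetical, and the real routing may weigh metadata and other context that this sketch ignores.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class Device:
+    device_id: str
+    os: str
+    capabilities: set = field(default_factory=set)
+    online: bool = True
+
+
+def select_device(devices, required_capabilities, required_os=None):
+    """Return the first online device offering every required capability, or None."""
+    required = set(required_capabilities)
+    for device in devices:
+        if not device.online:
+            continue  # Status: skip offline devices
+        if required_os is not None and device.os != required_os:
+            continue  # OS requirement: platform-specific tasks
+        if required <= device.capabilities:
+            return device  # Capabilities: every requirement is covered
+    return None
+
+
+pool = [
+    Device("linux_device_1", "linux", {"database_operations", "log_analysis"}),
+    Device("windows_device_1", "windows", {"excel", "outlook", "email"}),
+]
+print(select_device(pool, {"database_operations"}).device_id)
+print(select_device(pool, {"email", "outlook"}).device_id)
+```
+
+With this pool, the database subtask lands on `linux_device_1` and the email subtask on `windows_device_1`, mirroring the decomposition above.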
+
+---
+
+## 🔄 Step 8: Execution Logs
+
+Galaxy automatically saves execution logs, task graphs, and device traces for debugging and analysis.
+
+**Log Location:**
+
+```
+./logs//
+```
+
+**Log Contents:**
+
+| File/Folder | Description |
+|-------------|-------------|
+| `constellation/` | DAG visualization and task decomposition |
+| `device_logs/` | Individual device execution logs |
+| `screenshots/` | Screenshots from Windows devices (if enabled) |
+| `task_results/` | Task execution results |
+| `request_response.log` | Complete LLM request/response logs |
+
+> **Analyzing Logs:** Use the logs to debug task routing, identify bottlenecks, replay execution flow, and analyze orchestration decisions.
+
+---
+
+## 🔧 Advanced Configuration
+
+### Custom Session Name
+
+```powershell
+python -m galaxy --request "Your task" --session-name "my_project"
+```
+
+### Custom Output Directory
+
+```powershell
+python -m galaxy --request "Your task" --output-dir "./custom_results"
+```
+
+### Debug Mode
+
+```powershell
+python -m galaxy --interactive --log-level DEBUG
+```
+
+### Limit Maximum Rounds
+
+```powershell
+python -m galaxy --interactive --max-rounds 20
+```
+
+---
+
+## ❓ Troubleshooting
+
+### Issue 1: Device Not Appearing in Galaxy
+
+**Error:** Device not found in configuration
+
+```log
+ERROR - Device 'windows_device_1' not found in configuration
+```
+
+**Solutions:**
+
+1. Verify `devices.yaml` configuration:
+    ```powershell
+    notepad config\galaxy\devices.yaml
+    ```
+
+2. Check device ID matches:
+    - In `devices.yaml`: `device_id: "windows_device_1"`
+    - In client command: `--client-id windows_device_1`
+
+3. Check server URL matches:
+    - In `devices.yaml`: `server_url: "ws://localhost:5000/ws"`
+    - In client command: `--ws-server ws://localhost:5000/ws`
+
+### Issue 2: Device Agent Not Connecting
+
+**Error:** Connection refused
+
+```log
+ERROR - [WS] Failed to connect to ws://localhost:5000/ws
+Connection refused
+```
+
+**Solutions:**
+
+1. Verify server is running:
+    ```powershell
+    curl http://localhost:5000/api/health
+    ```
+
+2. Check port number is correct:
+    - Server: `--port 5000`
+    - Client: `ws://localhost:5000/ws`
+
+3. Ensure platform flag is set:
+    ```powershell
+    # For Windows devices
+    --platform windows
+
+    # For Linux devices
+    --platform linux
+    ```
+
+### Issue 3: Galaxy Cannot Find Constellation Agent Config
+
+**Error:** Configuration file not found
+
+```log
+ERROR - Cannot find config/galaxy/agent.yaml
+```
+
+**Solution:**
+```powershell
+# Copy template to create configuration file
+copy config\galaxy\agent.yaml.template config\galaxy\agent.yaml
+
+# Edit with your LLM credentials
+notepad config\galaxy\agent.yaml
+```
+
+### Issue 4: Task Not Routed to Expected Device
+
+**Issue:** Wrong device selected for task
+
+**Diagnosis:** Check device capabilities in `devices.yaml`:
+
+```yaml
+capabilities:
+ - "desktop_automation"
+ - "office_applications"
+ - "excel" # Required for Excel tasks
+ - "outlook" # Required for email tasks
+```
+
+**Solution:** Add appropriate capabilities to your device configuration.
+
+---
+
+## 📚 Additional Resources
+
+### Core Documentation
+
+**Architecture & Concepts:**
+
+- [Galaxy Overview](../galaxy/overview.md) - System architecture and design principles
+- [Constellation Orchestrator](../galaxy/constellation_orchestrator/overview.md) - Task orchestration and DAG management
+- [Agent Interaction Protocol (AIP)](../aip/overview.md) - Communication substrate
+
+### Device Agent Setup
+
+**Device Agent Guides:**
+
+- [UFO² as Galaxy Device](../ufo2/as_galaxy_device.md) - Complete Windows device setup
+- [Linux as Galaxy Device](../linux/as_galaxy_device.md) - Complete Linux device setup
+- [Mobile as Galaxy Device](../mobile/as_galaxy_device.md) - Complete Android device setup
+- [UFO² Overview](../ufo2/overview.md) - Windows desktop automation capabilities
+- [Linux Agent Overview](../linux/overview.md) - Linux server automation capabilities
+- [Mobile Agent Overview](../mobile/overview.md) - Android mobile automation capabilities
+
+### Configuration
+
+**Configuration Guides:**
+
+- [Galaxy Devices Configuration](../configuration/system/galaxy_devices.md) - Complete device pool configuration
+- [Galaxy Constellation Configuration](../configuration/system/galaxy_constellation.md) - Runtime settings
+- [Agents Configuration](../configuration/system/agents_config.md) - LLM settings for all agents
+- [Model Configuration](../configuration/models/overview.md) - Supported LLM providers
+
+### Advanced Features
+
+**Advanced Topics:**
+
+- [Task Constellation](../galaxy/constellation/task_constellation.md) - DAG-based task planning
+- [Constellation Orchestrator](../galaxy/constellation_orchestrator/overview.md) - Multi-device orchestration
+- [Device Registry](../galaxy/agent_registration/device_registry.md) - Device management
+- [Agent Profiles](../galaxy/agent_registration/agent_profile.md) - Multi-source profiling
+
+---
+
+## ❓ Getting Help
+
+- 📖 **Documentation**: [https://microsoft.github.io/UFO/](https://microsoft.github.io/UFO/)
+- 🐛 **GitHub Issues**: [https://github.com/microsoft/UFO/issues](https://github.com/microsoft/UFO/issues) (preferred)
+- 📧 **Email**: [ufo-agent@microsoft.com](mailto:ufo-agent@microsoft.com)
+
+---
+
+## 🎯 Next Steps
+
+Now that Galaxy is set up, explore these guides to unlock its full potential:
+
+1. **[Add More Devices](../configuration/system/galaxy_devices.md)** - Expand your device pool
+2. **[Configure Capabilities](../configuration/system/galaxy_devices.md)** - Optimize task routing
+3. **[Constellation Agent](../galaxy/constellation_agent/overview.md)** - Deep dive into the orchestration agent
+4. **[Advanced Orchestration](../galaxy/constellation_orchestrator/overview.md)** - Deep dive into DAG planning
+
+Happy orchestrating with UFO³ Galaxy! 🌌🚀
diff --git a/documents/docs/getting_started/quick_start_linux.md b/documents/docs/getting_started/quick_start_linux.md
new file mode 100644
index 000000000..7b48ef55f
--- /dev/null
+++ b/documents/docs/getting_started/quick_start_linux.md
@@ -0,0 +1,1013 @@
+# ⚡ Quick Start: Linux Agent
+
+Get your Linux device running as a UFO³ device agent in 5 minutes. This guide walks you through server/client configuration and MCP service initialization.
+
+---
+
+## 📋 Prerequisites
+
+Before you begin, ensure you have:
+
+- **Python 3.10+** installed on both server and client machines
+- **UFO repository** cloned
+- **Network connectivity** between server and client machines
+- **Linux machine** for task execution (client)
+- **Terminal access** (bash, ssh, etc.)
+- **LLM configured** in `config/ufo/agents.yaml` (same as AppAgent)
+
+| Component | Minimum Version | Verification Command |
+|-----------|----------------|---------------------|
+| Python | 3.10 | `python3 --version` |
+| Git | 2.0+ | `git --version` |
+| Network | N/A | `ping ` |
+| LLM API Key | N/A | Check `config/ufo/agents.yaml` |
+
+> **⚠️ LLM Configuration Required:** The Linux Agent uses the same LLM configuration as the AppAgent. Before starting, ensure you have configured your LLM provider (OpenAI, Azure OpenAI, Gemini, Claude, etc.) and added your API keys to `config/ufo/agents.yaml`. See [Model Setup Guide](../configuration/models/overview.md) for detailed instructions.
+
+---
+
+## 📦 Step 1: Install Dependencies
+
+Install all dependencies from the requirements file:
+
+```bash
+pip install -r requirements.txt
+```
+
+**Verify installation:**
+
+```bash
+python3 -c "import ufo; print('✅ UFO² installed successfully')"
+```
+
+> **Tip:** For production deployments, use a virtual environment to isolate dependencies:
+>
+> ```bash
+> python3 -m venv venv
+> source venv/bin/activate # Linux/macOS
+> pip install -r requirements.txt
+> ```
+
+---
+
+## 🖥️ Step 2: Start Device Agent Server
+
+**Server Component:** The Device Agent Server is the central hub that manages connections from client devices and dispatches tasks. It can run on any machine (Linux, Windows, or remote server).
+
+### Server Machine Setup
+
+You can run the server on:
+
+- ✅ Same machine as the client (localhost setup for testing)
+- ✅ Different machine on the same network
+- ✅ Remote server (requires proper network routing/SSH tunneling)
+
+### Basic Server Startup
+
+On the server machine, run:
+
+```bash
+python -m ufo.server.app --port 5001
+```
+
+**Expected Output:**
+
+```console
+2024-11-06 10:30:22 - ufo.server.app - INFO - Starting UFO Server on 0.0.0.0:5001
+INFO: Started server process [12345]
+INFO: Waiting for application startup.
+INFO: Application startup complete.
+INFO: Uvicorn running on http://0.0.0.0:5001 (Press CTRL+C to quit)
+```
+
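+Once you see "Uvicorn running", the server is listening on port 5001 on all interfaces. Clients connect to its WebSocket endpoint at the `/ws` path (for example, `ws://localhost:5001/ws` from the same machine).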
+
+### Server Configuration Options
+
+| Argument | Default | Description | Example |
+|----------|---------|-------------|---------|
+| `--port` | `5000` | Server listening port | `--port 5001` |
+| `--host` | `0.0.0.0` | Bind address (0.0.0.0 = all interfaces) | `--host 127.0.0.1` |
+| `--log-level` | `INFO` | Logging verbosity | `--log-level DEBUG` |
+
+**Custom Server Configuration:**
+
+**Custom Port:**
+```bash
+python -m ufo.server.app --port 8080
+```
+
+**Specific IP Binding:**
+```bash
+python -m ufo.server.app --host 192.168.1.100 --port 5001
+```
+
+**Debug Mode:**
+```bash
+python -m ufo.server.app --port 5001 --log-level DEBUG
+```
+
+### Verify Server is Running
+
+```bash
+# Test server health endpoint
+curl http://localhost:5001/api/health
+```
+
+**Expected Response:**
+
+```json
+{
+  "status": "healthy",
+  "online_clients": []
+}
+```
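+
+If you prefer to script this check, the sketch below reads the same endpoint with Python's standard library. The response shape is taken from the expected response above; `parse_health` and `check_server` are hypothetical helper names, not part of UFO.
+
+```python
+import json
+from urllib.request import urlopen
+
+
+def parse_health(body):
+    """Return (is_healthy, online_clients) from the JSON text of /api/health."""
+    payload = json.loads(body)
+    return payload.get("status") == "healthy", payload.get("online_clients", [])
+
+
+def check_server(base_url, timeout=5):
+    """Query <base_url>/api/health; raises OSError if the server is unreachable."""
+    with urlopen(f"{base_url}/api/health", timeout=timeout) as response:
+        return parse_health(response.read().decode("utf-8"))
+
+
+# Offline demonstration with the sample response shown above.
+# Against a live server, call check_server("http://localhost:5001") instead.
+healthy, clients = parse_health('{"status": "healthy", "online_clients": []}')
+print(healthy, clients)
+```
+
+An empty `online_clients` list is expected at this point, because no device client has connected yet.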
+
+> **Documentation Reference:** For detailed server configuration and advanced features, see [Server Quick Start Guide](../server/quick_start.md).
+
+---
+
+## 🐧 Step 3: Start Device Agent Client (Linux Machine)
+
+**Client Component:** The Device Agent Client runs on the Linux machine where you want to execute tasks. It connects to the server via WebSocket and receives task commands.
+
+### Basic Client Startup
+
+On the Linux machine where you want to execute tasks:
+
+```bash
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://172.23.48.1:5001/ws \
+ --client-id linux_agent_1 \
+ --platform linux
+```
+
+### Client Parameters Explained
+
+| Parameter | Required | Description | Example |
+|-----------|----------|-------------|---------|
+| `--ws` | ✅ Yes | Enable WebSocket mode | `--ws` |
+| `--ws-server` | ✅ Yes | Server WebSocket URL | `ws://172.23.48.1:5001/ws` |
+| `--client-id` | ✅ Yes | **Unique** device identifier | `linux_agent_1` |
+| `--platform` | ✅ Yes (Linux) | Platform type (must be `linux` for Linux Agent) | `--platform linux` |
+
+> **⚠️ Critical Requirements:**
+>
+> 1. `--client-id` must be globally unique: no two devices can share the same ID.
+> 2. `--platform linux` is mandatory: without this flag, the Linux Agent won't work correctly.
+> 3. The server address must be correct: replace `172.23.48.1:5001` with your actual server IP and port.
+
+### Understanding the WebSocket URL
+
+The `--ws-server` parameter format is:
+
+```
+ws://<server-ip>:<port>/ws
+```
+
+Examples:
+
+| Scenario | WebSocket URL | Description |
+|----------|---------------|-------------|
+| **Localhost** | `ws://localhost:5001/ws` | Server and client on same machine |
+| **Same Network** | `ws://192.168.1.100:5001/ws` | Server on local network |
+| **Remote Server** | `ws://203.0.113.50:5001/ws` | Server on internet (public IP) |
+| **SSH Tunnel** | `ws://localhost:5001/ws` | After SSH tunnel setup |
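+
+A malformed `--ws-server` value is a common cause of connection failures, so it can help to validate the URL before starting the client. This is a rough sketch based on the URL shape used in this guide (plain `ws://`, explicit port, `/ws` path); `validate_ws_url` is a hypothetical helper, not a UFO function.
+
+```python
+from typing import List
+from urllib.parse import urlparse
+
+
+def validate_ws_url(url: str) -> List[str]:
+    """Return a list of problems found in a --ws-server value (empty if none)."""
+    problems = []
+    parsed = urlparse(url)
+    if parsed.scheme != "ws":
+        problems.append(f"expected scheme 'ws', got '{parsed.scheme}'")
+    if not parsed.hostname:
+        problems.append("missing server host")
+    try:
+        port = parsed.port  # raises ValueError for a non-numeric port
+    except ValueError:
+        port = None
+    if port is None:
+        problems.append("missing or invalid port")
+    if parsed.path != "/ws":
+        problems.append(f"expected path '/ws', got '{parsed.path}'")
+    return problems
+```
+
+An empty list means the value matches the format above; each string otherwise names one thing to fix.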
+
+### Connection Success Indicators
+
+**Client Logs:**
+
+```log
+INFO - Platform detected/specified: linux
+INFO - UFO Client initialized for platform: linux
+INFO - [WS] Connecting to ws://172.23.48.1:5001/ws (attempt 1/5)
+INFO - [WS] [AIP] Successfully registered as linux_agent_1
+INFO - [WS] Heartbeat loop started (interval: 30s)
+```
+
+**Server Logs:**
+
+```log
+INFO - [WS] ✅ Registered device client: linux_agent_1
+INFO - [WS] Device linux_agent_1 platform: linux
+```
+
+The client is connected and ready to receive tasks once "Successfully registered" appears in the client log.
+
+### Verify Connection
+
+```bash
+# Check connected clients on server
+curl http://172.23.48.1:5001/api/clients
+```
+
+**Expected Response:**
+
+```json
+{
+ "clients": [
+ {
+ "client_id": "linux_agent_1",
+ "type": "device",
+ "platform": "linux",
+ "connected_at": 1730899822.0,
+ "uptime_seconds": 45
+ }
+ ]
+}
+```
+
+> **Documentation Reference:** For detailed client configuration, see [Client Quick Start Guide](../client/quick_start.md).
+
+---
+
+## 🔌 Step 4: Start MCP Service (Linux Machine)
+
+**MCP Service Component:** The MCP (Model Context Protocol) Service provides the execution layer for CLI commands. It must be running on the same Linux machine as the client to handle command execution requests.
+
+### Start the MCP Server
+
+On the Linux machine (same machine as the client):
+
+```bash
+python -m ufo.client.mcp.http_servers.linux_mcp_server
+```
+
+**Expected Output:**
+
+```console
+INFO: Started server process [23456]
+INFO: Waiting for application startup.
+INFO: Application startup complete.
+INFO: Uvicorn running on http://127.0.0.1:8010 (Press CTRL+C to quit)
+```
+
+The MCP service is now ready to execute CLI commands at `http://127.0.0.1:8010`.
+
+### What is the MCP Service?
+
+The **Linux MCP Server** provides two main commands:
+
+| Command | Purpose | Example Use Case |
+|---------|---------|------------------|
+| `EXEC_CLI` | Execute shell commands | `ls -la`, `grep pattern file.txt`, `ps aux` |
+| `SYS_INFO` | Retrieve system information | CPU usage, memory stats, disk space |
+
+**Architecture:**
+
+```mermaid
+sequenceDiagram
+ participant Agent as Linux Agent
+ participant MCP as Linux MCP Server
+ participant Shell as Bash Shell
+
+ Agent->>MCP: EXEC_CLI {command: "ls -la"}
+ MCP->>Shell: Execute command
+ Shell-->>MCP: stdout, stderr, exit_code
+ MCP-->>Agent: {result, output}
+```
+
+### MCP Service Configuration
+
+The MCP server listens on `127.0.0.1:8010` by default. The client connects to it automatically when configured properly.
+
+> **⚠️ MCP Service Must Be Running:** If the MCP service is not running, the Linux Agent cannot execute commands and will fail with:
+> ```
+> ERROR: Cannot connect to MCP server at http://127.0.0.1:8010
+> ```
+
+> **Documentation Reference:** For detailed MCP command specifications, see [MCP Overview](../mcp/overview.md), [Linux MCP Commands](../linux/commands.md), and [BashExecutor Server](../mcp/servers/bash_executor.md).
+
+---
+
+## 🎯 Step 5: Dispatch Tasks via HTTP API
+
+Once the server, client, and MCP service are all running, you can dispatch tasks to the Linux agent through the server's HTTP API.
+
+### API Endpoint
+
+```
+POST http://<server-ip>:<port>/api/dispatch
+```
+
+### Request Format
+
+```json
+{
+ "client_id": "linux_agent_1",
+ "request": "Your natural language task description",
+ "task_name": "optional_task_identifier"
+}
+```
+
+### Example: Simple File Listing
+
+**Using cURL:**
+```bash
+curl -X POST http://172.23.48.1:5001/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{
+ "client_id": "linux_agent_1",
+ "request": "List all files in the /tmp directory",
+ "task_name": "list_tmp_files"
+ }'
+```
+
+**Using Python:**
+```python
+import requests
+
+response = requests.post(
+ "http://172.23.48.1:5001/api/dispatch",
+ json={
+ "client_id": "linux_agent_1",
+ "request": "List all files in the /tmp directory",
+ "task_name": "list_tmp_files"
+ }
+)
+print(response.json())
+```
+
+**Using HTTPie:**
+```bash
+http POST http://172.23.48.1:5001/api/dispatch \
+ client_id=linux_agent_1 \
+ request="List all files in the /tmp directory" \
+ task_name=list_tmp_files
+```
+
+**Successful Response:**
+
+```json
+{
+ "status": "dispatched",
+ "task_name": "list_tmp_files",
+ "client_id": "linux_agent_1",
+ "session_id": "550e8400-e29b-41d4-a716-446655440000"
+}
+```
+
+### Example: System Information Query
+
+```bash
+curl -X POST http://172.23.48.1:5001/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{
+ "client_id": "linux_agent_1",
+ "request": "Show disk usage for all mounted filesystems",
+ "task_name": "check_disk_usage"
+ }'
+```
+
+### Example: Log File Analysis
+
+```bash
+curl -X POST http://172.23.48.1:5001/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{
+ "client_id": "linux_agent_1",
+ "request": "Find all ERROR or FATAL entries in /var/log/app.log from the last hour",
+ "task_name": "analyze_error_logs"
+ }'
+```
+
+### Task Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant API as HTTP Client
+ participant Server as Agent Server
+ participant Client as Linux Client
+ participant MCP as MCP Service
+ participant Shell as Bash
+
+ Note over API,Server: 1. Task Submission
+ API->>Server: POST /api/dispatch {client_id, request}
+ Server->>Server: Generate session_id
+ Server-->>API: {status: dispatched, session_id}
+
+ Note over Server,Client: 2. Task Assignment
+ Server->>Client: TASK_ASSIGNMENT (via WebSocket)
+ Client->>Client: Parse request, plan actions
+
+ Note over Client,MCP: 3. Command Execution
+ Client->>MCP: EXEC_CLI {command: "ls -la /tmp"}
+ MCP->>Shell: Execute command
+ Shell-->>MCP: stdout, stderr, exit_code
+ MCP-->>Client: {result, output}
+
+ Note over Client,Server: 4. Result Reporting
+ Client->>Server: TASK_RESULT {status, result}
+```
+
+### Request Parameters
+
+| Field | Required | Type | Description | Example |
+|-------|----------|------|-------------|---------|
+| `client_id` | ✅ Yes | string | Target Linux agent ID (must match `--client-id`) | `"linux_agent_1"` |
+| `request` | ✅ Yes | string | Natural language task description | `"List files in /var/log"` |
+| `task_name` | ❌ Optional | string | Unique task identifier (auto-generated if omitted) | `"task_001"` |
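+
+The field rules in the table can be checked on the caller's side before the request is sent. A minimal sketch, assuming the endpoint and fields documented above; `build_dispatch_payload` and `dispatch` are hypothetical helper names, and the standard library is used here in place of `requests`.
+
+```python
+import json
+import urllib.request
+from typing import Optional
+
+
+def build_dispatch_payload(client_id: str, request: str,
+                           task_name: Optional[str] = None) -> dict:
+    """Build the JSON body for POST /api/dispatch."""
+    if not client_id.strip():
+        raise ValueError("client_id is required")
+    if not request.strip():
+        raise ValueError("request is required")
+    payload = {"client_id": client_id, "request": request}
+    # task_name is optional; per the table, it is auto-generated if omitted.
+    if task_name:
+        payload["task_name"] = task_name
+    return payload
+
+
+def dispatch(base_url: str, payload: dict, timeout: float = 10.0) -> dict:
+    """POST the payload to <base_url>/api/dispatch (needs a running server)."""
+    req = urllib.request.Request(
+        f"{base_url}/api/dispatch",
+        data=json.dumps(payload).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+    with urllib.request.urlopen(req, timeout=timeout) as resp:
+        return json.loads(resp.read().decode("utf-8"))
+```
+
+For example, `dispatch("http://172.23.48.1:5001", build_dispatch_payload("linux_agent_1", "List files in /var/log"))` sends the same request as the cURL examples above.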
+
+> **⚠️ Client Must Be Online:** If the `client_id` is not connected, you'll receive:
+> ```json
+> {
+> "detail": "Client not online"
+> }
+> ```
+>
+> Verify the client is connected:
+> ```bash
+> curl http://172.23.48.1:5001/api/clients
+> ```
+
+---
+
+## 🌉 Network Connectivity & SSH Tunneling
+
+When the server and client are on different networks or behind firewalls, you may need SSH tunneling to establish connectivity.
+
+### Scenario 1: Same Network (No Tunnel Needed)
+
+**Setup:**
+- Server: `192.168.1.100:5001`
+- Client: `192.168.1.50` (same LAN)
+
+**Client Command:**
+```bash
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://192.168.1.100:5001/ws \
+ --client-id linux_agent_1 \
+ --platform linux
+```
+
+**No additional configuration needed** ✅
+
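+### Scenario 2: Client Behind Firewall (SSH Tunnel to the Server)
+
+**Problem:**
+- Server: `203.0.113.50:5001` (public IP, reachable over SSH)
+- Client: `192.168.1.50` (private network, behind NAT/firewall)
+- **Client cannot reach port 5001 on the server directly** (for example, only outbound SSH is allowed)
+
+**Solution: SSH Local Port Forward**
+
+On the **client machine**, open an SSH tunnel that forwards a local port to the server. A reverse tunnel (`-R`) would not help here: it would try to bind port 5001 on the server, where the UFO server is already listening.
+
+```bash
+ssh -N -L 5001:localhost:5001 user@203.0.113.50
+```
+
+**Parameters:**
+- `-N`: No remote command execution (tunnel only)
+- `-L 5001:localhost:5001`: Forward local port 5001 on the client to port 5001 on the server (`localhost` is resolved on the server side)
+- `user@203.0.113.50`: SSH server address (where the UFO server runs)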
+
+**What This Does:**
+
+```mermaid
+graph LR
+ Client[Client Machine 192.168.1.50]
+ SSH[SSH Tunnel]
+ Server[Server Machine 203.0.113.50]
+
+ Client -->|SSH Tunnel| SSH
+ SSH -->|Port 5001| Server
+
+ style Client fill:#e1f5ff
+ style Server fill:#ffe1e1
+ style SSH fill:#fffacd
+```
+
+**After tunnel is established:**
+
+```bash
+# Client can now connect to localhost:5001
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://localhost:5001/ws \
+ --client-id linux_agent_1 \
+ --platform linux
+```
+
+### Scenario 3: Server Behind Firewall (Forward SSH Tunnel)
+
+**Problem:**
+- Server: `192.168.1.100:5001` (private network)
+- Client: `203.0.113.75` (public network)
+- **Client cannot directly reach server**
+
+**Solution: SSH Forward Tunnel**
+
+On the **client machine**, create an SSH forward tunnel to the server's network:
+
+```bash
+ssh -N -L 5001:192.168.1.100:5001 gateway-user@vpn.company.com
+```
+
+**Parameters:**
+- `-N`: No remote command execution
+- `-L 5001:192.168.1.100:5001`: Forward local port 5001 to remote 192.168.1.100:5001
+- `gateway-user@vpn.company.com`: SSH gateway that can access the server
+
+**After tunnel is established:**
+
+```bash
+# Client connects to localhost, which forwards to server
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://localhost:5001/ws \
+ --client-id linux_agent_1 \
+ --platform linux
+```
+
+### Example: Complex Tunnel Setup
+
+**Situation:**
+- Server IP: `10.0.0.50:5001` (corporate network)
+- Client IP: `192.168.1.75` (home network)
+- SSH Gateway: `vpn.company.com` (accessible from internet)
+
+**Step 1: Create SSH Tunnel**
+```bash
+# On client machine
+ssh -N -L 5001:10.0.0.50:5001 myuser@vpn.company.com
+```
+
+**Step 2: Start Client (in another terminal)**
+```bash
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://localhost:5001/ws \
+ --client-id linux_agent_home_1 \
+ --platform linux
+```
+
+### SSH Tunnel Best Practices
+
+For production use, add these flags to your SSH tunnel:
+
+```bash
+ssh -N \
+ -L 5001:server:5001 \
+ -o ServerAliveInterval=60 \
+ -o ServerAliveCountMax=3 \
+ -o ExitOnForwardFailure=yes \
+ user@gateway
+```
+
+**Flags explained:**
+- `ServerAliveInterval=60`: Send keep-alive every 60 seconds
+- `ServerAliveCountMax=3`: Disconnect after 3 failed keep-alives
+- `ExitOnForwardFailure=yes`: Exit if port forwarding fails
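+
+When the tunnel is started from a script, the flags above can be assembled programmatically. The sketch below only builds the argument list; `build_tunnel_command` is an illustrative name, and the host names are the same placeholders used in the command above.
+
+```python
+import shlex
+from typing import List
+
+
+def build_tunnel_command(gateway: str, target_host: str,
+                         target_port: int = 5001, local_port: int = 5001,
+                         keepalive_seconds: int = 60,
+                         max_missed: int = 3) -> List[str]:
+    """Return the ssh argument list for a forward tunnel with keep-alives."""
+    return [
+        "ssh", "-N",
+        "-L", f"{local_port}:{target_host}:{target_port}",
+        "-o", f"ServerAliveInterval={keepalive_seconds}",
+        "-o", f"ServerAliveCountMax={max_missed}",
+        "-o", "ExitOnForwardFailure=yes",
+        gateway,
+    ]
+
+
+# Print the command for inspection; pass the list to subprocess.Popen to run it.
+print(shlex.join(build_tunnel_command("user@gateway", "server")))
+```
+
+Passing the list form to `subprocess.Popen` avoids shell quoting issues, since no shell is involved.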
+
+### Persistent SSH Tunnel with Autossh
+
+For production, use `autossh` to automatically restart the tunnel if it fails:
+
+```bash
+# Install autossh
+sudo apt-get install autossh # Debian/Ubuntu
+
+# Start persistent tunnel
+autossh -M 0 \
+ -N \
+ -L 5001:server:5001 \
+ -o ServerAliveInterval=60 \
+ -o ServerAliveCountMax=3 \
+ user@gateway
+```
+
+> **ℹ️ Network Configuration:** For more network configuration details, see [Server Quick Start - Troubleshooting](../server/quick_start.md#common-issues-troubleshooting).
+
+---
+
+## 🌌 Step 6: Configure as UFO³ Galaxy Device
+
+To use the Linux Agent as a managed device within the **UFO³ Galaxy** multi-tier framework, you need to register it in the `devices.yaml` configuration file.
+
+### Device Configuration File
+
+The Galaxy configuration is located at:
+
+```
+config/galaxy/devices.yaml
+```
+
+### Add Linux Agent Configuration
+
+Edit `config/galaxy/devices.yaml` and add your Linux agent under the `devices` section:
+
+```yaml
+devices:
+ - device_id: "linux_agent_1"
+ server_url: "ws://172.23.48.1:5001/ws"
+ os: "linux"
+ capabilities:
+ - "server"
+ - "log_analysis"
+ - "file_operations"
+ metadata:
+ os: "linux"
+ performance: "medium"
+ logs_file_path: "/var/log/myapp/app.log"
+ dev_path: "/home/user/development/"
+ warning_log_pattern: "WARN"
+ error_log_pattern: "ERROR|FATAL"
+ auto_connect: true
+ max_retries: 5
+```
+
+### Configuration Fields Explained
+
+| Field | Required | Type | Description | Example |
+|-------|----------|------|-------------|---------|
+| `device_id` | ✅ Yes | string | **Must match client `--client-id`** | `"linux_agent_1"` |
+| `server_url` | ✅ Yes | string | **Must match server WebSocket URL** | `"ws://172.23.48.1:5001/ws"` |
+| `os` | ✅ Yes | string | Operating system | `"linux"` |
+| `capabilities` | ❌ Optional | list | Device capabilities (for task routing) | `["server", "log_analysis"]` |
+| `metadata` | ❌ Optional | dict | Custom metadata for task context | See below |
+| `auto_connect` | ❌ Optional | boolean | Auto-connect on Galaxy startup | `true` |
+| `max_retries` | ❌ Optional | integer | Connection retry attempts | `5` |
+
+### Metadata Fields (Custom)
+
+The `metadata` section can contain any custom fields relevant to your Linux agent:
+
+| Field | Purpose | Example |
+|-------|---------|---------|
+| `logs_file_path` | Path to application logs | `"/var/log/app.log"` |
+| `dev_path` | Development directory | `"/home/user/dev/"` |
+| `warning_log_pattern` | Regex pattern for warnings | `"WARN"` |
+| `error_log_pattern` | Regex pattern for errors | `"ERROR\|FATAL"` |
+| `performance` | Performance tier | `"high"`, `"medium"`, `"low"` |
+| `description` | Human-readable description | `"Production database server"` |
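+
+The two pattern fields are ordinary regular expressions (the `\|` in the table is only Markdown escaping; the YAML value is `ERROR|FATAL`). How the agent applies them is not specified here, so the following is only an illustration of the regex semantics, with `classify_lines` as a hypothetical helper.
+
+```python
+import re
+from typing import Dict, Iterable
+
+
+def classify_lines(lines: Iterable[str],
+                   warning_pattern: str = "WARN",
+                   error_pattern: str = "ERROR|FATAL") -> Dict[str, int]:
+    """Count log lines matching the error and warning patterns."""
+    warn_re = re.compile(warning_pattern)
+    err_re = re.compile(error_pattern)
+    counts = {"warning": 0, "error": 0}
+    for line in lines:
+        # Errors take precedence when a line matches both patterns.
+        if err_re.search(line):
+            counts["error"] += 1
+        elif warn_re.search(line):
+            counts["warning"] += 1
+    return counts
+```
+
+Because `search` matches anywhere in the line, the pattern `WARN` also matches lines containing `WARNING`.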
+
+### Multiple Linux Agents Example
+
+```yaml
+devices:
+ - device_id: "linux_agent_1"
+ server_url: "ws://172.23.48.1:5001/ws"
+ os: "linux"
+ capabilities:
+ - "web_server"
+ metadata:
+ logs_file_path: "/var/log/nginx/access.log"
+ dev_path: "/var/www/html/"
+ warning_log_pattern: "WARN"
+ error_log_pattern: "ERROR|FATAL"
+ auto_connect: true
+ max_retries: 5
+
+ - device_id: "linux_agent_2"
+ server_url: "ws://172.23.48.2:5002/ws"
+ os: "linux"
+ capabilities:
+ - "database_server"
+ metadata:
+ logs_file_path: "/var/log/postgresql/postgresql.log"
+ dev_path: "/var/lib/postgresql/"
+ warning_log_pattern: "WARNING"
+ error_log_pattern: "ERROR|FATAL|PANIC"
+ auto_connect: true
+ max_retries: 5
+
+ - device_id: "linux_agent_3"
+ server_url: "ws://172.23.48.3:5003/ws"
+ os: "linux"
+ capabilities:
+ - "monitoring"
+ metadata:
+ logs_file_path: "/var/log/prometheus/prometheus.log"
+ dev_path: "/opt/prometheus/"
+ warning_log_pattern: "level=warn"
+ error_log_pattern: "level=error"
+ auto_connect: true
+ max_retries: 5
+```
+
+### Critical Requirements
+
+> **⚠️ Configuration Validation - These fields MUST match exactly:**
+>
+> 1. **`device_id` in YAML** ↔ **`--client-id` in client command**
+> ```yaml
+> device_id: "linux_agent_1" # In devices.yaml
+> ```
+> ```bash
+> --client-id linux_agent_1 # In client command
+> ```
+>
+> 2. **`server_url` in YAML** ↔ **`--ws-server` in client command**
+> ```yaml
+> server_url: "ws://172.23.48.1:5001/ws" # In devices.yaml
+> ```
+> ```bash
+> --ws-server ws://172.23.48.1:5001/ws # In client command
+> ```
+>
+> **If these don't match, Galaxy cannot control the device!**
+
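+The two matching rules above can be checked mechanically before launching Galaxy. A minimal sketch, assuming a device entry has already been loaded from `devices.yaml` into a plain dictionary (loading YAML would need a parser such as PyYAML, which is not shown); `find_mismatches` is a hypothetical helper.
+
+```python
+from typing import Dict, List
+
+
+def find_mismatches(device: Dict[str, object], client_id: str,
+                    ws_server: str) -> List[str]:
+    """Compare one devices.yaml entry with the client's command-line values."""
+    problems = []
+    if device.get("device_id") != client_id:
+        problems.append(
+            f"device_id {device.get('device_id')!r} != --client-id {client_id!r}"
+        )
+    if device.get("server_url") != ws_server:
+        problems.append(
+            f"server_url {device.get('server_url')!r} != --ws-server {ws_server!r}"
+        )
+    return problems
+```
+
+An empty result means both pairs match exactly; the comparison is deliberately strict, so even a different port or a trailing slash is reported.
+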
+### Using Galaxy to Control Linux Agents
+
+Once configured, you can launch Galaxy and it will automatically manage the Linux agents:
+
+```bash
+python -m galaxy --interactive
+```
+
+**Galaxy will:**
+1. ✅ Automatically load device configuration from `config/galaxy/devices.yaml`
+2. ✅ Connect to all configured devices
+3. ✅ Orchestrate multi-device tasks
+4. ✅ Route tasks based on capabilities
+5. ✅ Monitor device health
+
+> **ℹ️ Galaxy Documentation:** For detailed Galaxy configuration and usage, see:
+>
+> - [Galaxy Overview](../galaxy/overview.md)
+> - [Galaxy Quick Start](quick_start_galaxy.md)
+> - [Constellation Orchestrator](../galaxy/constellation_orchestrator/overview.md)
+
+---
+
+## 🐛 Common Issues & Troubleshooting
+
+### Issue 1: Client Cannot Connect to Server
+
+**Error: Connection Refused**
+
+Symptoms:
+```log
+ERROR - [WS] Failed to connect to ws://172.23.48.1:5001/ws
+Connection refused
+```
+
+**Diagnosis Checklist:**
+
+- [ ] Is the server running? (`curl http://172.23.48.1:5001/api/health`)
+- [ ] Is the port correct? (Check server startup logs)
+- [ ] Can client reach server IP? (`ping 172.23.48.1`)
+- [ ] Is firewall blocking port 5001?
+- [ ] Is SSH tunnel established (if needed)?
+
+**Solutions:**
+
+Verify Server:
+```bash
+# On server machine
+curl http://localhost:5001/api/health
+
+# From client machine
+curl http://172.23.48.1:5001/api/health
+```
+
+Check Network:
+```bash
+# Test connectivity
+ping 172.23.48.1
+
+# Test port accessibility
+nc -zv 172.23.48.1 5001
+telnet 172.23.48.1 5001
+```
+
+Check Firewall:
+```bash
+# On server machine (Ubuntu/Debian)
+sudo ufw status
+sudo ufw allow 5001/tcp
+
+# On server machine (RHEL/CentOS)
+sudo firewall-cmd --list-ports
+sudo firewall-cmd --add-port=5001/tcp --permanent
+sudo firewall-cmd --reload
+```
+
+### Issue 2: MCP Service Not Responding
+
+**Error: Cannot Execute Commands**
+
+Symptoms:
+```log
+ERROR - Cannot connect to MCP server at http://127.0.0.1:8010
+ERROR - Command execution failed
+```
+
+**Diagnosis:**
+
+- [ ] Is the MCP service running?
+- [ ] Is it running on the correct port?
+- [ ] Are there any startup errors in MCP logs?
+
+**Solutions:**
+
+Verify MCP Service:
+```bash
+# Check if MCP service is running
+curl http://localhost:8010/health
+
+# Or check process
+ps aux | grep linux_mcp_server
+```
+
+Restart MCP Service:
+```bash
+# Kill existing process (if hung)
+pkill -f linux_mcp_server
+
+# Start fresh
+python -m ufo.client.mcp.http_servers.linux_mcp_server
+```
+
+Check Port Conflict:
+```bash
+# See if something else is using port 8010
+lsof -i :8010
+netstat -tuln | grep 8010
+
+# If port is taken, start MCP on different port
+python -m ufo.client.mcp.http_servers.linux_mcp_server --port 8011
+```
+
+### Issue 3: Missing `--platform linux` Flag
+
+**Error: Incorrect Agent Type**
+
+Symptoms:
+- Client connects but cannot execute Linux commands
+- Server logs show wrong platform type
+- Tasks fail with "unsupported operation" errors
+
+**Cause:** The `--platform linux` flag was omitted when starting the client.
+
+**Solution:**
+```bash
+# Wrong (missing platform)
+python -m ufo.client.client --ws --ws-server ws://172.23.48.1:5001/ws --client-id linux_agent_1
+
+# Correct
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://172.23.48.1:5001/ws \
+ --client-id linux_agent_1 \
+ --platform linux
+```
+
+### Issue 4: Duplicate Client ID
+
+**Error: Registration Failed**
+
+Symptoms:
+```log
+ERROR - [WS] Registration failed: client_id already exists
+ERROR - Another device is using ID 'linux_agent_1'
+```
+
+**Cause:** Multiple clients trying to use the same `client_id`.
+
+**Solutions:**
+
+1. **Use unique client IDs:**
+ ```bash
+ # Device 1
+ --client-id linux_agent_1
+
+ # Device 2
+ --client-id linux_agent_2
+
+ # Device 3
+ --client-id linux_agent_3
+ ```
+
+2. **Check currently connected clients:**
+ ```bash
+ curl http://172.23.48.1:5001/api/clients
+ ```
+
+### Issue 5: Galaxy Cannot Find Device
+
+**Error: Device Not Configured**
+
+Symptoms:
+```log
+ERROR - Device 'linux_agent_1' not found in configuration
+WARNING - Cannot dispatch task to unknown device
+```
+
+**Cause:** Mismatch between `devices.yaml` configuration and actual client setup.
+
+**Diagnosis:**
+
+Check that these match **exactly**:
+
+| Location | Field | Example |
+|----------|-------|---------|
+| `devices.yaml` | `device_id` | `"linux_agent_1"` |
+| Client command | `--client-id` | `linux_agent_1` |
+| `devices.yaml` | `server_url` | `"ws://172.23.48.1:5001/ws"` |
+| Client command | `--ws-server` | `ws://172.23.48.1:5001/ws` |
+
+**Solution:** Update `devices.yaml` to match your client configuration, or vice versa.
+
+### Issue 6: SSH Tunnel Keeps Disconnecting
+
+**Error: Tunnel Connection Lost**
+
+Symptoms:
+- Client disconnects after a few minutes
+- SSH tunnel closes unexpectedly
+- "Connection reset by peer" errors
+
+**Solutions:**
+
+Use ServerAliveInterval:
+```bash
+ssh -N \
+ -L 5001:server:5001 \
+ -o ServerAliveInterval=60 \
+ -o ServerAliveCountMax=3 \
+ user@gateway
+```
+
+Use Autossh:
+```bash
+autossh -M 0 \
+ -N \
+ -L 5001:server:5001 \
+ -o ServerAliveInterval=60 \
+ user@gateway
+```
+
+Run in Screen/Tmux:
+```bash
+# Start screen session
+screen -S ssh-tunnel
+
+# Run SSH tunnel
+ssh -N -L 5001:server:5001 user@gateway
+
+# Detach: Ctrl+A, then D
+# Reattach: screen -r ssh-tunnel
+```
+
+---
+
+## 📚 Next Steps
+
+You've successfully set up a Linux Agent! Explore these topics to deepen your understanding:
+
+### Immediate Next Steps
+
+| Priority | Topic | Time | Link |
+|----------|-------|------|------|
+| 🥇 | **Linux Agent Architecture** | 10 min | [Overview](../linux/overview.md) |
+| 🥈 | **State Machine & Processing** | 15 min | [State Machine](../linux/state.md) |
+| 🥉 | **MCP Commands Reference** | 10 min | [Commands](../linux/commands.md) |
+
+### Advanced Topics
+
+| Topic | Description | Link |
+|-------|-------------|------|
+| **Processing Strategy** | 3-phase pipeline (LLM, Action, Memory) | [Strategy](../linux/strategy.md) |
+| **Galaxy Integration** | Multi-device orchestration | [Galaxy Overview](../galaxy/overview.md) |
+| **MCP Protocol** | Deep dive into command execution | [MCP Overview](../mcp/overview.md) |
+| **Server Architecture** | Understanding the server internals | [Server Overview](../server/overview.md) |
+
+### Production Deployment
+
+| Best Practice | Description | Link |
+|---------------|-------------|------|
+| **Systemd Service** | Run client as Linux service | [Client Guide](../client/quick_start.md#running-as-background-service) |
+| **Log Management** | Structured logging and rotation | [Server Monitoring](../server/monitoring.md) |
+| **Security Hardening** | SSL/TLS, authentication, firewalls | [Server Guide](../server/quick_start.md#production-deployment) |
+
+---
+
+## ✅ Summary
+
+### ✅ What You've Accomplished
+
+Congratulations! You've successfully:
+
+- ✅ Switched to the `linux-client` branch
+- ✅ Installed all dependencies
+- ✅ Started the Device Agent Server
+- ✅ Connected a Linux Device Agent Client
+- ✅ Launched the MCP service for command execution
+- ✅ Dispatched tasks via HTTP API
+- ✅ (Optional) Configured SSH tunneling for remote access
+- ✅ (Optional) Registered the device in Galaxy configuration
+
+**Your Linux Agent is Ready**
+
+You can now:
+
+- 🎯 Execute CLI commands on Linux machines remotely
+- 📊 Analyze log files across multiple servers
+- 🔧 Manage development environments
+- 🌌 Integrate with UFO³ Galaxy for multi-device workflows
+
+**Start exploring and automating your Linux infrastructure!** 🚀
diff --git a/documents/docs/getting_started/quick_start_mobile.md b/documents/docs/getting_started/quick_start_mobile.md
new file mode 100644
index 000000000..7b0f643bb
--- /dev/null
+++ b/documents/docs/getting_started/quick_start_mobile.md
@@ -0,0 +1,1478 @@
+# ⚡ Quick Start: Mobile Agent
+
+Get your Android device running as a UFO³ device agent in 10 minutes. This guide walks you through ADB setup, server/client configuration, and MCP service initialization for Android automation.
+
+> **📚 Documentation Navigation:**
+>
+> - **Architecture & Concepts:** [Mobile Agent Overview](../mobile/overview.md)
+> - **State Management:** [State Machine](../mobile/state.md)
+> - **Processing Pipeline:** [Processing Strategy](../mobile/strategy.md)
+> - **Available Commands:** [MCP Commands Reference](../mobile/commands.md)
+> - **Galaxy Integration:** [As Galaxy Device](../mobile/as_galaxy_device.md)
+
+---
+
+## 📋 Prerequisites
+
+Before you begin, ensure you have:
+
+- **Python 3.10+** installed on your computer
+- **UFO repository** cloned from [GitHub](https://github.com/microsoft/UFO)
+- **Android device** (physical device or emulator) with Android 5.0+ (API 21+)
+- **ADB (Android Debug Bridge)** installed and accessible
+- **USB debugging enabled** on your Android device (for physical devices)
+- **Network connectivity** between server and client machines
+- **LLM configured** in `config/ufo/agents.yaml` (see [Model Configuration](../configuration/models/overview.md))
+
+| Component | Minimum Version | Verification Command |
+|-----------|----------------|---------------------|
+| Python | 3.10 | `python --version` |
+| Android OS | 5.0 (API 21) | Check device settings |
+| ADB | Latest | `adb --version` |
+| LLM API Key | N/A | Check `config/ufo/agents.yaml` |
+
+> **⚠️ LLM Configuration Required:** The Mobile Agent shares the same LLM configuration with the AppAgent. Before starting, ensure you have configured your LLM provider (OpenAI, Azure OpenAI, Gemini, Claude, etc.) and added your API keys to `config/ufo/agents.yaml`. See [Model Setup Guide](../configuration/models/overview.md) for detailed instructions.
+
+---
+
+## 📱 Step 0: Android Device Setup
+
+You can use either a **physical Android device** or an **Android emulator**. Choose the setup method that fits your needs.
+
+### Option A: Physical Android Device Setup
+
+#### 1. Enable Developer Options
+
+On your Android device:
+
+1. Open **Settings** → **About phone**
+2. Tap **Build number** 7 times
+3. You'll see "You are now a developer!"
+
+#### 2. Enable USB Debugging
+
+1. Go to **Settings** → **System** → **Developer options**
+2. Turn on **USB debugging**
+3. (Optional) Turn on **Stay awake** (device won't sleep while charging)
+
+#### 3. Connect Device to Computer
+
+**Via USB Cable:**
+
+```bash
+# Connect device via USB
+# On device, allow USB debugging when prompted
+
+# Verify connection
+adb devices
+```
+
+**Expected Output:**
+```
+List of devices attached
+XXXXXXXXXXXXXX device
+```
+
+**Via Wireless ADB (Android 11+):**
+
+```bash
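+# On device: Settings → Developer options → Wireless debugging
+# Tap "Pair device with pairing code" and note the pairing address, port, and code
+
+# On computer: Pair once (replace <pairing-port>; enter the code when prompted)
+adb pair 192.168.1.100:<pairing-port>
+
+# Connect using the IP address and port shown on the Wireless debugging screen
+# (e.g., 192.168.1.100:5555)
+adb connect 192.168.1.100:5555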
+
+# Verify connection
+adb devices
+```
+
+**Expected Output:**
+```
+List of devices attached
+192.168.1.100:5555 device
+```
+
+### Option B: Android Emulator Setup
+
+#### Option B1: Using Android Studio Emulator (Recommended)
+
+**Step 1: Install Android Studio**
+
+Download from: https://developer.android.com/studio
+
+**Windows:**
+```powershell
+# Download Android Studio installer
+# Run: android-studio-xxx.exe
+# Follow installation wizard
+```
+
+**macOS:**
+```bash
+# Download Android Studio DMG
+# Drag to Applications folder
+# Open Android Studio
+```
+
+**Linux:**
+```bash
+# Download Android Studio tarball
+tar -xzf android-studio-*.tar.gz
+cd android-studio/bin
+./studio.sh
+```
+
+**Step 2: Install Android SDK Components**
+
+1. Open Android Studio
+2. Go to **Tools** → **SDK Manager**
+3. Install:
+ - ✅ Android SDK Platform (API 33 or higher)
+ - ✅ Android SDK Platform-Tools
+ - ✅ Android SDK Build-Tools
+ - ✅ Android Emulator
+
+**Step 3: Create Virtual Device**
+
+1. In Android Studio, click **Device Manager** (phone icon)
+2. Click **Create Device**
+3. Select hardware:
+ - **Phone** category
+ - Choose **Pixel 6** or **Pixel 7** (recommended)
+ - Click **Next**
+
+4. Select system image:
+ - Choose **Release Name**: **Tiramisu** (Android 13, API 33) or newer
+ - Click **Download** if not installed
+ - Click **Next**
+
+5. Configure AVD:
+ - **AVD Name**: `Pixel_6_API_33` (or your choice)
+ - **Startup orientation**: Portrait
+ - **Graphics**: Automatic or Hardware
+ - Click **Finish**
+
+**Step 4: Start Emulator**
+
+**From Android Studio:**
+1. Open **Device Manager**
+2. Click ▶️ (Play button) next to your AVD
+
+**From Command Line:**
+```bash
+# List available emulators
+emulator -list-avds
+
+# Start emulator
+emulator -avd Pixel_6_API_33 &
+```
+
+**Step 5: Verify ADB Connection**
+
+```bash
+# Wait for emulator to fully boot (~1-2 minutes)
+adb devices
+```
+
+**Expected Output:**
+```
+List of devices attached
+emulator-5554 device
+```
+
+#### Option B2: Using Genymotion (Alternative)
+
+**Step 1: Install Genymotion**
+
+Download from: https://www.genymotion.com/download/
+
+```bash
+# Free personal edition available
+# Requires VirtualBox (auto-installed)
+```
+
+**Step 2: Create Virtual Device**
+
+1. Open Genymotion
+2. Click **+** (Add new device)
+3. Sign in with Genymotion account (free)
+4. Select device:
+ - **Google Pixel 6** or similar
+ - **Android 13.0** or newer
+5. Click **Install**
+6. Click **Start**
+
+**Step 3: Verify ADB Connection**
+
+```bash
+adb devices
+```
+
+**Expected Output:**
+```
+List of devices attached
+192.168.56.101:5555 device
+```
+
+### Verify Device is Ready
+
+Run this test to ensure device is accessible:
+
+```bash
+# Get device model
+adb shell getprop ro.product.model
+
+# Get Android version
+adb shell getprop ro.build.version.release
+
+# Test screenshot capability
+adb shell screencap -p /sdcard/test.png
+adb pull /sdcard/test.png .
+```
+
+If all commands succeed, your device is ready! ✅
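+
+For scripted setups, the `adb devices` output can be parsed to confirm that a device is in the `device` state rather than `offline` or `unauthorized`. This is a small sketch under the output format shown earlier in this step; `parse_adb_devices` and `list_devices` are illustrative names, not part of UFO.
+
+```python
+import subprocess
+from typing import Dict
+
+
+def parse_adb_devices(output: str) -> Dict[str, str]:
+    """Map each serial in `adb devices` output to its state."""
+    devices = {}
+    for line in output.splitlines():
+        line = line.strip()
+        # Skip blanks, the header, and "* daemon ..." startup messages.
+        if not line or line.startswith("List of devices") or line.startswith("*"):
+            continue
+        parts = line.split()
+        if len(parts) >= 2:
+            devices[parts[0]] = parts[1]
+    return devices
+
+
+def list_devices() -> Dict[str, str]:
+    """Run `adb devices` and parse the result (requires adb on PATH)."""
+    result = subprocess.run(["adb", "devices"], capture_output=True,
+                            text=True, check=True)
+    return parse_adb_devices(result.stdout)
+```
+
+A device is usable only when its state is `device`; `unauthorized` usually means the USB debugging prompt on the phone has not been accepted yet.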
+
+---
+
+## 🔧 Step 1: Install ADB (Android Debug Bridge)
+
+ADB is essential for communicating with Android devices. Choose your platform:
+
+### Windows
+
+**Option 1: Install via Android Studio (Recommended)**
+
+ADB is included with Android Studio (see Step 0 Option B1).
+
+After installation, add to PATH:
+
+```powershell
+# Add Android SDK platform-tools to PATH
+# Default location:
+$env:PATH += ";C:\Users\<username>\AppData\Local\Android\Sdk\platform-tools"
+
+# Test
+adb --version
+```
+
+**Option 2: Standalone ADB Installation**
+
+```powershell
+# Download platform-tools
+# https://developer.android.com/studio/releases/platform-tools
+
+# Extract to C:\adb
+# Add to PATH:
+$env:PATH += ";C:\adb"
+
+# Test
+adb --version
+```
+
+**Make PATH Permanent (Optional):**
+
+1. Open **System Properties** → **Environment Variables**
+2. Under **User variables**, edit **Path**
+3. Add: `C:\Users\<username>\AppData\Local\Android\Sdk\platform-tools`
+4. Click **OK**
+
+### macOS
+
+**Option 1: Via Homebrew (Recommended)**
+
+```bash
+# Install Homebrew (if not installed)
+/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
+
+# Install ADB
+brew install android-platform-tools
+
+# Verify
+adb --version
+```
+
+**Option 2: Via Android Studio**
+
+ADB is included with Android Studio. Add to PATH:
+
+```bash
+# Add to ~/.zshrc or ~/.bash_profile
+export PATH="$PATH:$HOME/Library/Android/sdk/platform-tools"
+
+# Reload
+source ~/.zshrc
+
+# Test
+adb --version
+```
+
+### Linux
+
+**Ubuntu/Debian:**
+
+```bash
+sudo apt update
+sudo apt install -y adb
+
+# Verify
+adb --version
+```
+
+**Fedora/RHEL:**
+
+```bash
+sudo dnf install android-tools
+
+# Verify
+adb --version
+```
+
+**Arch Linux:**
+
+```bash
+sudo pacman -S android-tools
+
+# Verify
+adb --version
+```
+
+### Verify ADB Installation
+
+```bash
+adb version
+```
+
+**Expected Output:**
+```
+Android Debug Bridge version 1.0.41
+Version 34.0.5-10900879
+```
+
+---
+
+## 📦 Step 2: Install Python Dependencies
+
+Install all UFO dependencies:
+
+```bash
+cd /path/to/UFO
+pip install -r requirements.txt
+```
+
+**Verify installation:**
+
+```bash
+python -c "import ufo; print('✅ UFO installed successfully')"
+```
+
+> **Tip:** For production deployments, use a virtual environment:
+>
+> ```bash
+> python -m venv venv
+>
+> # Windows
+> venv\Scripts\activate
+>
+> # macOS/Linux
+> source venv/bin/activate
+>
+> pip install -r requirements.txt
+> ```
+
+---
+
+## 🖥️ Step 3: Start Device Agent Server
+
+**Server Component:** The Device Agent Server manages connections from Android devices and dispatches tasks.
+
+### Basic Server Startup
+
+On your computer (where Python is installed):
+
+```bash
+python -m ufo.server.app --port 5001 --platform mobile
+```
+
+**Expected Output:**
+
+```console
+INFO - Starting UFO Server on 0.0.0.0:5001
+INFO - Platform: mobile
+INFO - Log level: WARNING
+INFO: Started server process [12345]
+INFO: Waiting for application startup.
+INFO: Application startup complete.
+INFO: Uvicorn running on http://0.0.0.0:5001 (Press CTRL+C to quit)
+```
+
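+Once you see "Uvicorn running", the server is listening on all interfaces (`0.0.0.0`) at port 5001. Clients connect to it at `ws://localhost:5001/ws`, or at the server's IP address when running on a different machine.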
+
+### Server Configuration Options
+
+| Argument | Default | Description | Example |
+|----------|---------|-------------|---------|
+| `--port` | `5000` | Server listening port | `--port 5001` |
+| `--host` | `0.0.0.0` | Bind address | `--host 127.0.0.1` |
+| `--platform` | Auto | Platform override | `--platform mobile` |
+| `--log-level` | `WARNING` | Logging verbosity | `--log-level DEBUG` |
+
+**Custom Configuration Examples:**
+
+```bash
+# Different port
+python -m ufo.server.app --port 8080 --platform mobile
+
+# Localhost only
+python -m ufo.server.app --host 127.0.0.1 --port 5001 --platform mobile
+
+# Debug mode
+python -m ufo.server.app --port 5001 --platform mobile --log-level DEBUG
+```
+
+### Verify Server is Running
+
+```bash
+curl http://localhost:5001/api/health
+```
+
+**Expected Response (when no clients connected):**
+
+```json
+{
+ "status": "healthy",
+ "online_clients": []
+}
+```
+
+> **💡 Tip:** The `online_clients` list will be empty until you start and connect the Mobile Client in Step 5.
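+
+The health check above can also be scripted, for example to hold a startup script until the server responds. The following is a minimal sketch, assuming `/api/health` returns the JSON shape shown above. `is_healthy`, `fetch_health` and `wait_for_server` are hypothetical helper names, not part of UFO, and the standard library is used instead of `requests` to avoid an extra dependency.
+
+```python
+import json
+import time
+import urllib.request
+
+
+def is_healthy(payload: dict) -> bool:
+    """Return True if a health payload reports the 'healthy' status."""
+    return payload.get("status") == "healthy"
+
+
+def fetch_health(url: str, timeout: float = 2.0) -> dict:
+    """GET the health endpoint and decode its JSON body."""
+    with urllib.request.urlopen(url, timeout=timeout) as resp:
+        return json.loads(resp.read().decode("utf-8"))
+
+
+def wait_for_server(url, attempts=10, delay=1.0, fetch=fetch_health):
+    """Poll the health endpoint until it is healthy or attempts run out."""
+    for attempt in range(attempts):
+        try:
+            if is_healthy(fetch(url)):
+                return True
+        except (OSError, ValueError):
+            pass  # Connection refused or a non-JSON body; try again.
+        if attempt < attempts - 1:
+            time.sleep(delay)
+    return False
+```
+
+Calling `wait_for_server("http://localhost:5001/api/health")` would return `True` once the server started in this step answers, and `False` if it never does within the attempt budget.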
+
+---
+
+## 🔌 Step 4: Start MCP Services (Android Machine)
+
+**MCP Service Component:** Two MCP servers provide Android device interaction capabilities. They must be running before starting the client.
+
+> **💡 Learn More:** For detailed documentation on all available MCP commands and their usage, see the [MCP Commands Reference](../mobile/commands.md).
+
+### Understanding the Two MCP Servers
+
+MobileAgent uses **two separate MCP servers** for different responsibilities:
+
+| Server | Port | Purpose | Tools |
+|--------|------|---------|-------|
+| **Data Collection** | 8020 | Screenshot, UI tree, device info, apps list | 5 read-only tools |
+| **Action** | 8021 | Touch actions, typing, app launching | 8 control tools |
+
+### Start Both MCP Servers
+
+**Recommended: Start Both Servers Together**
+
+On the machine with ADB access to your Android device:
+
+```bash
+python -m ufo.client.mcp.http_servers.mobile_mcp_server \
+ --host localhost \
+ --data-port 8020 \
+ --action-port 8021 \
+ --server both
+```
+
+**Expected Output:**
+
+```console
+====================================================================
+UFO Mobile MCP Servers (Android)
+Android device control via ADB and Model Context Protocol
+====================================================================
+
+Using ADB: adb
+Checking ADB connection...
+
+List of devices attached
+emulator-5554 device
+
+✅ Found 1 connected device(s)
+====================================================================
+
+🚀 Starting both servers on localhost (shared state)
+ - Data Collection Server: localhost:8020
+ - Action Server: localhost:8021
+
+Note: Both servers share the same MobileServerState for caching
+
+✅ Starting both servers in same process (shared MobileServerState)
+ - Data Collection Server: localhost:8020
+ - Action Server: localhost:8021
+
+======================================================================
+Both servers share MobileServerState cache. Press Ctrl+C to stop.
+======================================================================
+
+✅ Data Collection Server thread started
+✅ Action Server thread started
+
+======================================================================
+Both servers are running. Press Ctrl+C to stop.
+======================================================================
+```
+
+**Alternative: Start Servers Separately**
+
+If needed, you can start each server in separate terminals:
+
+**Terminal 1: Data Collection Server**
+```bash
+python -m ufo.client.mcp.http_servers.mobile_mcp_server \
+ --host localhost \
+ --data-port 8020 \
+ --server data
+```
+
+**Terminal 2: Action Server**
+```bash
+python -m ufo.client.mcp.http_servers.mobile_mcp_server \
+ --host localhost \
+ --action-port 8021 \
+ --server action
+```
+
+> **⚠️ Important:** When running servers separately, they won't share cached state, which may impact performance. Running both together is recommended.
+
+### MCP Server Configuration Options
+
+| Argument | Default | Description | Example |
+|----------|---------|-------------|---------|
+| `--host` | `localhost` | Server host | `--host 127.0.0.1` |
+| `--data-port` | `8020` | Data collection server port | `--data-port 8020` |
+| `--action-port` | `8021` | Action server port | `--action-port 8021` |
+| `--server` | `both` | Which server(s) to start | `--server both` |
+| `--adb-path` | `adb` | Path to ADB executable | `--adb-path /path/to/adb` |
+
+### Verify MCP Servers are Running
+
+**Check Data Collection Server:**
+```bash
+curl http://localhost:8020/health
+```
+
+**Check Action Server:**
+```bash
+curl http://localhost:8021/health
+```
+
+Both should return a health status response indicating the server is operational.
+
+### What if ADB is not in PATH?
+
+If ADB is not in your system PATH, specify the full path:
+
+**Windows:**
+```powershell
+python -m ufo.client.mcp.http_servers.mobile_mcp_server `
+ --adb-path "C:\Users\YourUsername\AppData\Local\Android\Sdk\platform-tools\adb.exe" `
+ --server both
+```
+
+**macOS:**
+```bash
+python -m ufo.client.mcp.http_servers.mobile_mcp_server \
+ --adb-path "$HOME/Library/Android/sdk/platform-tools/adb" \
+ --server both
+```
+
+**Linux:**
+```bash
+python -m ufo.client.mcp.http_servers.mobile_mcp_server \
+ --adb-path /usr/bin/adb \
+ --server both
+```
+
+---
+
+## 📱 Step 5: Start Device Agent Client
+
+**Client Component:** The Device Agent Client connects your Android device to the server and executes mobile automation tasks.
+
+### Basic Client Startup
+
+On your computer (same machine as MCP servers):
+
+```bash
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://localhost:5001/ws \
+ --client-id mobile_phone_1 \
+ --platform mobile
+```
+
+### Client Parameters Explained
+
+| Parameter | Required | Description | Example |
+|-----------|----------|-------------|---------|
+| `--ws` | ✅ Yes | Enable WebSocket mode | `--ws` |
+| `--ws-server` | ✅ Yes | Server WebSocket URL | `ws://localhost:5001/ws` |
+| `--client-id` | ✅ Yes | **Unique** device identifier | `mobile_phone_1` |
+| `--platform` | ✅ Yes | Platform type (must be `mobile`) | `--platform mobile` |
+
+> **⚠️ Critical Requirements:**
+>
+> 1. `--client-id` must be globally unique: no two devices can share the same ID.
+> 2. `--platform mobile` is mandatory: without this flag, the Mobile Agent won't work correctly.
+> 3. The server address must be correct: use the server's actual IP address if it is not on localhost.
+
+### Understanding the WebSocket URL
+
+The `--ws-server` parameter format is:
+
+```
+ws://<server-host>:<port>/ws
+```
+
+Examples:
+
+| Scenario | WebSocket URL | Description |
+|----------|---------------|-------------|
+| **Same Machine** | `ws://localhost:5001/ws` | Server and client on same computer |
+| **Same Network** | `ws://192.168.1.100:5001/ws` | Server on local network |
+| **Remote Server** | `ws://203.0.113.50:5001/ws` | Server on internet (public IP) |
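+
+All three URLs in the table share one shape: a `ws` scheme, a host, a port, and the `/ws` path. As a rough sketch, a script that launches clients could compose and sanity-check the value before passing it to `--ws-server`. `build_ws_url` and `check_ws_url` are hypothetical helpers written for this guide, not UFO functions, and accepting `wss` is an assumption for deployments that put TLS in front of the server.
+
+```python
+from urllib.parse import urlsplit
+
+
+def build_ws_url(host: str, port: int) -> str:
+    """Compose the URL passed to --ws-server from a host and a port."""
+    if not host:
+        raise ValueError("host must not be empty")
+    if not 0 < port < 65536:
+        raise ValueError(f"port out of range: {port}")
+    return f"ws://{host}:{port}/ws"
+
+
+def check_ws_url(url: str) -> bool:
+    """Check for a ws/wss scheme, a host, an explicit port and the /ws path."""
+    parts = urlsplit(url)
+    try:
+        port = parts.port  # Raises ValueError if the port is not numeric.
+    except ValueError:
+        return False
+    return (
+        parts.scheme in ("ws", "wss")
+        and bool(parts.hostname)
+        and port is not None
+        and parts.path == "/ws"
+    )
+```
+
+The check is deliberately strict about the explicit port, because every example in this guide names one.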
+
+### Connection Success Indicators
+
+**Client Logs:**
+
+```log
+INFO - Platform detected/specified: mobile
+INFO - UFO Client initialized for platform: mobile
+INFO - [WS] Connecting to ws://localhost:5001/ws (attempt 1/5)
+INFO - [WS] [AIP] Successfully registered as mobile_phone_1
+INFO - [WS] Heartbeat loop started (interval: 30s)
+```
+
+**Server Logs:**
+
+```log
+INFO - [WS] ✅ Registered device client: mobile_phone_1
+INFO - [WS] Device mobile_phone_1 platform: mobile
+```
+
+When you see "Successfully registered", the client is connected and ready to receive tasks. ✅
+
+### Verify Connection
+
+```bash
+# Check connected clients on server
+curl http://localhost:5001/api/clients
+```
+
+**Expected Response:**
+
+```json
+{
+ "online_clients": ["mobile_phone_1"]
+}
+```
+
+> **Note:** The response shows only client IDs. For detailed information about each client, check the server logs.
+
+---
+
+## 🎯 Step 6: Dispatch Tasks via HTTP API
+
+Once the server, client, and MCP services are all running, you can dispatch tasks to your Android device through the server's HTTP API.
+
+### API Endpoint
+
+```
+POST http://<server-host>:<port>/api/dispatch
+```
+
+### Request Format
+
+```json
+{
+ "client_id": "mobile_phone_1",
+ "request": "Your natural language task description",
+ "task_name": "optional_task_identifier"
+}
+```
+
+### Example 1: Launch an App
+
+**Using cURL:**
+```bash
+curl -X POST http://localhost:5001/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{
+ "client_id": "mobile_phone_1",
+ "request": "Open Google Chrome browser",
+ "task_name": "launch_chrome"
+ }'
+```
+
+**Using Python:**
+```python
+import requests
+
+response = requests.post(
+ "http://localhost:5001/api/dispatch",
+ json={
+ "client_id": "mobile_phone_1",
+ "request": "Open Google Chrome browser",
+ "task_name": "launch_chrome"
+ }
+)
+print(response.json())
+```
+
+**Successful Response:**
+
+```json
+{
+ "status": "dispatched",
+ "task_name": "launch_chrome",
+ "client_id": "mobile_phone_1",
+ "session_id": "550e8400-e29b-41d4-a716-446655440000"
+}
+```
+
+### Example 2: Search on Maps
+
+```bash
+curl -X POST http://localhost:5001/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{
+ "client_id": "mobile_phone_1",
+ "request": "Open Google Maps and search for coffee shops nearby",
+ "task_name": "search_coffee"
+ }'
+```
+
+### Example 3: Type and Submit Text
+
+```bash
+curl -X POST http://localhost:5001/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{
+ "client_id": "mobile_phone_1",
+ "request": "Open Chrome, search for weather forecast, and show me the results",
+ "task_name": "check_weather"
+ }'
+```
+
+### Example 4: Take Screenshot
+
+```bash
+curl -X POST http://localhost:5001/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{
+ "client_id": "mobile_phone_1",
+ "request": "Take a screenshot of the current screen",
+ "task_name": "capture_screen"
+ }'
+```
+
+### Task Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant API as HTTP Client
+ participant Server as Agent Server
+ participant Client as Mobile Client
+ participant MCP as MCP Services
+ participant Device as Android Device
+
+ Note over API,Server: 1. Task Submission
+ API->>Server: POST /api/dispatch {client_id, request}
+ Server->>Server: Generate session_id
+ Server-->>API: {status: dispatched, session_id}
+
+ Note over Server,Client: 2. Task Assignment
+ Server->>Client: TASK_ASSIGNMENT (via WebSocket)
+ Client->>Client: Initialize Mobile Agent
+
+ Note over Client,MCP: 3. Data Collection
+ Client->>MCP: Capture screenshot
+ Client->>MCP: Get installed apps
+ Client->>MCP: Get UI controls
+ MCP->>Device: ADB commands
+ Device-->>MCP: Screenshot + Apps + Controls
+ MCP-->>Client: Visual context
+
+ Note over Client: 4. LLM Decision
+ Client->>Client: Construct prompt with screenshots
+ Client->>Client: Get action from LLM
+
+ Note over Client,MCP: 5. Action Execution
+ Client->>MCP: Execute mobile action (tap, swipe, launch_app, etc.)
+ MCP->>Device: ADB input commands
+ Device-->>MCP: Action result
+ MCP-->>Client: Success/Failure
+
+ Note over Client,Server: 6. Result Reporting
+ Client->>Server: TASK_RESULT {status, screenshots, actions}
+ Server-->>API: Task completed
+```
+
+### Request Parameters
+
+| Field | Required | Type | Description | Example |
+|-------|----------|------|-------------|---------|
+| `client_id` | ✅ Yes | string | Target mobile device ID (must match `--client-id`) | `"mobile_phone_1"` |
+| `request` | ✅ Yes | string | Natural language task description | `"Open Chrome"` |
+| `task_name` | ❌ Optional | string | Unique task identifier (auto-generated if omitted) | `"task_001"` |
+
+> **⚠️ Client Must Be Online:** If the `client_id` is not connected, you'll receive:
+> ```json
+> {
+> "detail": "Client not online"
+> }
+> ```
+>
+> Verify the client is connected:
+> ```bash
+> curl http://localhost:5001/api/clients
+> ```
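+
+The request fields and the offline-client error can be handled together in a small wrapper. This is a sketch under two assumptions that the guide does not confirm: that the "Client not online" response arrives with a non-2xx HTTP status, and that its body is JSON. `build_dispatch_payload` and `dispatch_task` are hypothetical names, and only the standard library is used.
+
+```python
+import json
+import urllib.error
+import urllib.request
+
+
+def build_dispatch_payload(client_id, request, task_name=None):
+    """Build the JSON body for POST /api/dispatch."""
+    if not client_id or not request:
+        raise ValueError("client_id and request are required")
+    payload = {"client_id": client_id, "request": request}
+    if task_name is not None:
+        payload["task_name"] = task_name  # Optional field; left out when not given.
+    return payload
+
+
+def dispatch_task(base_url, payload, timeout=10.0):
+    """POST the payload and return (ok, decoded_json_body)."""
+    req = urllib.request.Request(
+        f"{base_url}/api/dispatch",
+        data=json.dumps(payload).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+    try:
+        with urllib.request.urlopen(req, timeout=timeout) as resp:
+            return True, json.loads(resp.read().decode("utf-8"))
+    except urllib.error.HTTPError as err:
+        # Assumed: error bodies such as {"detail": "Client not online"} are JSON.
+        return False, json.loads(err.read().decode("utf-8"))
+```
+
+A caller would pass `"http://localhost:5001"` as `base_url` and inspect the returned body for `session_id` on success or `detail` on failure.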
+
+---
+
+## 🌌 Step 7: Configure as UFO³ Galaxy Device
+
+To use the Mobile Agent as a managed device within the **UFO³ Galaxy** multi-tier framework, you need to register it in the `devices.yaml` configuration file.
+
+> **📖 Detailed Guide:** For comprehensive information on using Mobile Agent in Galaxy, including multi-device workflows and advanced configuration, see [Using Mobile Agent as Galaxy Device](../mobile/as_galaxy_device.md).
+
+### Device Configuration File
+
+The Galaxy configuration is located at:
+
+```
+config/galaxy/devices.yaml
+```
+
+### Add Mobile Agent Configuration
+
+Edit `config/galaxy/devices.yaml` and add your Mobile agent:
+
+```yaml
+devices:
+ - device_id: "mobile_phone_1"
+ server_url: "ws://localhost:5001/ws"
+ os: "mobile"
+ capabilities:
+ - "mobile"
+ - "android"
+ - "messaging"
+ - "maps"
+ - "camera"
+ metadata:
+ os: "mobile"
+ device_type: "phone"
+ android_version: "13"
+ screen_size: "1080x2400"
+ installed_apps:
+ - "com.android.chrome"
+ - "com.google.android.apps.maps"
+ - "com.whatsapp"
+ description: "Android phone for mobile automation"
+ auto_connect: true
+ max_retries: 5
+```
+
+### Configuration Fields Explained
+
+| Field | Required | Type | Description | Example |
+|-------|----------|------|-------------|---------|
+| `device_id` | ✅ Yes | string | **Must match client `--client-id`** | `"mobile_phone_1"` |
+| `server_url` | ✅ Yes | string | **Must match server WebSocket URL** | `"ws://localhost:5001/ws"` |
+| `os` | ✅ Yes | string | Operating system | `"mobile"` |
+| `capabilities` | ❌ Optional | list | Device capabilities | `["mobile", "android"]` |
+| `metadata` | ❌ Optional | dict | Custom metadata | See below |
+| `auto_connect` | ❌ Optional | boolean | Auto-connect on Galaxy startup | `true` |
+| `max_retries` | ❌ Optional | integer | Connection retry attempts | `5` |
+
+### Metadata Fields (Custom)
+
+The `metadata` section provides context to the LLM:
+
+| Field | Purpose | Example |
+|-------|---------|---------|
+| `device_type` | Phone, tablet, emulator | `"phone"` |
+| `android_version` | OS version | `"13"` |
+| `screen_size` | Resolution | `"1080x2400"` |
+| `installed_apps` | Available apps | `["com.android.chrome", ...]` |
+| `description` | Human-readable description | `"Personal phone"` |
+
+### Multiple Mobile Devices Example
+
+```yaml
+devices:
+ # Personal Phone
+ - device_id: "mobile_phone_personal"
+ server_url: "ws://192.168.1.100:5001/ws"
+ os: "mobile"
+ capabilities:
+ - "mobile"
+ - "android"
+ - "messaging"
+ - "whatsapp"
+ - "maps"
+ metadata:
+ os: "mobile"
+ device_type: "phone"
+ android_version: "13"
+ installed_apps:
+ - "com.whatsapp"
+ - "com.google.android.apps.maps"
+ description: "Personal Android phone"
+ auto_connect: true
+ max_retries: 5
+
+ # Work Phone
+ - device_id: "mobile_phone_work"
+ server_url: "ws://192.168.1.101:5002/ws"
+ os: "mobile"
+ capabilities:
+ - "mobile"
+ - "android"
+ - "email"
+ - "teams"
+ metadata:
+ os: "mobile"
+ device_type: "phone"
+ android_version: "12"
+ installed_apps:
+ - "com.microsoft.office.outlook"
+ - "com.microsoft.teams"
+ description: "Work Android phone"
+ auto_connect: true
+ max_retries: 5
+
+ # Tablet
+ - device_id: "mobile_tablet_home"
+ server_url: "ws://192.168.1.102:5003/ws"
+ os: "mobile"
+ capabilities:
+ - "mobile"
+ - "android"
+ - "tablet"
+ - "media"
+ metadata:
+ os: "mobile"
+ device_type: "tablet"
+ android_version: "13"
+ screen_size: "2560x1600"
+ installed_apps:
+ - "com.netflix.mediaclient"
+ description: "Home tablet for media"
+ auto_connect: true
+ max_retries: 5
+```
+
+### Critical Requirements
+
+> **⚠️ Configuration Validation - These fields MUST match exactly:**
+>
+> 1. **`device_id` in YAML** ↔ **`--client-id` in client command**
+> 2. **`server_url` in YAML** ↔ **`--ws-server` in client command**
+>
+> **If these don't match, Galaxy cannot control the device!**
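+
+Both matching rules are easy to check before launching Galaxy. The sketch below assumes the YAML file has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`), so it needs no third-party import itself. `find_mismatches` is a hypothetical helper and does not reflect how Galaxy validates its configuration.
+
+```python
+def find_mismatches(devices_config, client_id, ws_server):
+    """Return a list of problems between devices.yaml data and the client flags."""
+    entries = devices_config.get("devices", [])
+    match = next((d for d in entries if d.get("device_id") == client_id), None)
+    if match is None:
+        return [f"no device entry with device_id={client_id!r}"]
+    problems = []
+    if match.get("server_url") != ws_server:
+        problems.append(
+            f"server_url {match.get('server_url')!r} != --ws-server {ws_server!r}"
+        )
+    return problems
+```
+
+An empty list means the `device_id` and `server_url` of the entry line up with the `--client-id` and `--ws-server` values used to start the client.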
+
+### Using Galaxy to Control Mobile Agents
+
+Once configured, launch Galaxy:
+
+```bash
+python -m galaxy --interactive
+```
+
+**Galaxy will:**
+1. ✅ Load device configuration from `config/galaxy/devices.yaml`
+2. ✅ Connect to all configured Android devices
+3. ✅ Orchestrate multi-device tasks
+4. ✅ Route tasks based on capabilities
+
+> **ℹ️ Galaxy Documentation:** For detailed Galaxy usage, see:
+>
+> - [Galaxy Overview](../galaxy/overview.md)
+> - [Galaxy Quick Start](quick_start_galaxy.md)
+> - [Mobile Agent as Galaxy Device](../mobile/as_galaxy_device.md)
+
+---
+
+## 🔍 Understanding Mobile Agent Internals
+
+Now that you have Mobile Agent running, you may want to understand how it works under the hood:
+
+### State Machine
+
+Mobile Agent uses a **3-state finite state machine** to manage task execution:
+
+- **CONTINUE** - Active execution, processing user requests
+- **FINISH** - Task completed successfully
+- **FAIL** - Unrecoverable error occurred
+
+Learn more: [State Machine Documentation](../mobile/state.md)
+
+### Processing Pipeline
+
+During the CONTINUE state, Mobile Agent executes a **4-phase pipeline**:
+
+1. **Data Collection** - Capture screenshots, get apps, collect UI controls
+2. **LLM Interaction** - Send visual context to LLM for decision making
+3. **Action Execution** - Execute mobile actions (tap, swipe, type, etc.)
+4. **Memory Update** - Record actions and results for context
+
+Learn more: [Processing Strategy Documentation](../mobile/strategy.md)
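+
+To make the relationship between the three states and the four phases concrete, here is an illustrative loop. It is not UFO's implementation: `AgentState`, `run_task` and the three callables are stand-ins invented for this sketch, and treating an exhausted step budget as `FAIL` is an assumption.
+
+```python
+from enum import Enum
+
+
+class AgentState(Enum):
+    CONTINUE = "CONTINUE"
+    FINISH = "FINISH"
+    FAIL = "FAIL"
+
+
+def run_task(collect, decide, execute, max_steps=20):
+    """Run the four phases repeatedly while the state is CONTINUE.
+
+    collect() -> observation
+    decide(observation, memory) -> (action, next_state)
+    execute(action) -> result
+    """
+    memory = []
+    state = AgentState.CONTINUE
+    for _ in range(max_steps):
+        if state is not AgentState.CONTINUE:
+            break
+        try:
+            observation = collect()                      # 1. Data collection
+            action, state = decide(observation, memory)  # 2. LLM interaction
+            result = execute(action)                     # 3. Action execution
+        except Exception:
+            return AgentState.FAIL, memory               # Error treated as unrecoverable
+        memory.append((action, result))                  # 4. Memory update
+    if state is AgentState.CONTINUE:
+        return AgentState.FAIL, memory  # Assumed: running out of steps counts as failure.
+    return state, memory
+```
+
+The loop only leaves `CONTINUE` when the decision step says so or when a phase raises, which mirrors the description of `FINISH` and `FAIL` above.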
+
+### Available Commands
+
+Mobile Agent uses **13 MCP commands** across two servers:
+
+- **Data Collection Server (8020)**: 5 read-only commands
+- **Action Server (8021)**: 8 control commands
+
+Learn more: [MCP Commands Reference](../mobile/commands.md)
+
+---
+
+## 🐛 Common Issues & Troubleshooting
+
+### Issue 1: ADB Device Not Found
+
+**Error: No Devices Detected**
+
+Symptoms:
+```bash
+$ adb devices
+List of devices attached
+# Empty list
+```
+
+**Solutions:**
+
+**For Physical Devices:**
+
+1. **Check USB connection:**
+ - Use a different USB cable (some cables are charge-only)
+ - Try a different USB port on your computer
+ - Ensure USB debugging is enabled on device
+
+2. **Authorize computer on device:**
+ - Disconnect and reconnect USB
+ - On device, tap "Allow USB debugging" when prompted
+ - Check "Always allow from this computer"
+
+3. **Restart ADB server:**
+ ```bash
+ adb kill-server
+ adb start-server
+ adb devices
+ ```
+
+4. **Check USB driver (Windows):**
+ - Install Google USB Driver via Android Studio SDK Manager
+ - Or install device-specific driver from manufacturer
+
+**For Emulators:**
+
+1. **Wait for emulator to fully boot** (can take 1-2 minutes)
+
+2. **Restart emulator:**
+ - Close emulator completely
+ - Start emulator again from Android Studio or command line
+
+3. **Check emulator is running:**
+ ```bash
+ emulator -list-avds
+ emulator -avd Pixel_6_API_33
+ ```
+
+### Issue 2: MCP Server Cannot Connect to Device
+
+**Error: ADB Connection Failed**
+
+Symptoms:
+```log
+ERROR - Failed to execute ADB command
+ERROR - Device not accessible
+```
+
+**Solutions:**
+
+1. **Verify ADB connection first:**
+ ```bash
+ adb devices
+ ```
+ Device should show "device" status (not "offline" or "unauthorized")
+
+2. **Test ADB commands manually:**
+ ```bash
+ adb shell getprop ro.product.model
+ adb shell screencap -p /sdcard/test.png
+ ```
+
+3. **Restart MCP servers with an explicit ADB path:**
+ ```bash
+ # Kill existing servers
+ pkill -f mobile_mcp_server
+
+ # Start with explicit ADB path
+ python -m ufo.client.mcp.http_servers.mobile_mcp_server \
+ --adb-path "$(which adb)" \
+ --server both
+ ```
+
+4. **Check device permissions:**
+ - Ensure USB debugging is still authorized
+ - Revoke and re-grant USB debugging authorization on device
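+
+The status check from solution 1 can be automated by parsing the text that `adb devices` prints. This is a small sketch, not part of UFO. `parse_adb_devices` and `ready_devices` are hypothetical helpers, and the parsing assumes the usual layout of a header line followed by one whitespace-separated `serial state` pair per line.
+
+```python
+def parse_adb_devices(output):
+    """Map each serial in `adb devices` output to its reported state."""
+    devices = {}
+    for line in output.splitlines():
+        line = line.strip()
+        if not line or line.startswith("List of devices") or line.startswith("*"):
+            continue  # Skip the header, blank lines and daemon start-up notices.
+        parts = line.split()
+        if len(parts) >= 2:
+            devices[parts[0]] = parts[1]
+    return devices
+
+
+def ready_devices(output):
+    """Return the serials whose state is exactly 'device'."""
+    return [s for s, state in parse_adb_devices(output).items() if state == "device"]
+```
+
+The input text can come from `subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout`. An empty result from `ready_devices` means no device is usable yet, which covers the "offline" and "unauthorized" cases mentioned above.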
+
+### Issue 3: Client Cannot Connect to Server
+
+**Error: Connection Refused or Failed**
+
+Symptoms:
+```log
+ERROR - [WS] Failed to connect to ws://localhost:5001/ws
+Connection refused
+```
+
+**Solutions:**
+
+1. **Verify server is running:**
+ ```bash
+ curl http://localhost:5001/api/health
+ ```
+
+ Should return:
+ ```json
+ {
+ "status": "healthy",
+ "online_clients": []
+ }
+ ```
+
+2. **Check server address:**
+ - If server and client are on different machines, use server's IP address
+ - Replace `localhost` with actual IP address (e.g., `ws://192.168.1.100:5001/ws`)
+ - Ensure the port number matches the server's `--port` argument
+
+3. **Check firewall settings:**
+ ```bash
+ # Windows: Allow port 5001
+ netsh advfirewall firewall add rule name="UFO Server" dir=in action=allow protocol=TCP localport=5001
+
+ # macOS: System Preferences → Security & Privacy → Firewall → Firewall Options
+
+ # Linux (Ubuntu):
+ sudo ufw allow 5001/tcp
+ ```
+
+### Issue 4: Missing `--platform mobile` Flag
+
+**Error: Incorrect Agent Type**
+
+Symptoms:
+- Client connects but cannot execute mobile commands
+- Server logs show wrong platform type
+- Tasks fail with "unsupported operation" errors
+
+**Solution:**
+
+Always include `--platform mobile` when starting the client:
+
+```bash
+# Wrong (missing platform)
+python -m ufo.client.client --ws --ws-server ws://localhost:5001/ws --client-id mobile_phone_1
+
+# Correct
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://localhost:5001/ws \
+ --client-id mobile_phone_1 \
+ --platform mobile
+```
+
+### Issue 5: Screenshot Capture Fails
+
+**Error: Cannot Capture Screenshot**
+
+Symptoms:
+```log
+ERROR - Failed to capture screenshot
+ERROR - screencap command failed
+```
+
+**Solutions:**
+
+1. **Test screenshot manually:**
+ ```bash
+ adb shell screencap -p /sdcard/test.png
+ adb pull /sdcard/test.png .
+ ```
+
+2. **Check device storage:**
+ ```bash
+ adb shell df -h /sdcard
+ ```
+ Ensure sufficient space on device
+
+3. **Check permissions:**
+ ```bash
+ adb shell ls -l /sdcard
+ ```
+
+4. **Try alternative screenshot method:**
+ ```bash
+ adb exec-out screencap -p > screenshot.png
+ ```
+
+### Issue 6: UI Controls Not Found
+
+**Error: Control Information Missing**
+
+Symptoms:
+```log
+WARNING - Failed to get UI controls
+WARNING - UI tree dump failed
+```
+
+**Solutions:**
+
+1. **Test UI dump manually:**
+ ```bash
+ adb shell uiautomator dump /sdcard/window_dump.xml
+ adb shell cat /sdcard/window_dump.xml
+ ```
+
+2. **Enable accessibility services:**
+ - Some apps require accessibility services for UI automation
+ - Settings → Accessibility → Enable required services
+
+3. **Update Android WebView:**
+ - Old WebView versions may cause UI dump issues
+ - Update via Play Store: Android System WebView
+
+4. **Restart device:**
+ ```bash
+ adb reboot
+ # Wait for device to restart
+ adb wait-for-device
+ ```
+
+### Issue 7: Emulator Too Slow
+
+**Error: Performance Issues**
+
+Symptoms:
+- Emulator lags or freezes
+- Actions take very long to execute
+- Timeouts occur frequently
+
+**Solutions:**
+
+1. **Enable Hardware Acceleration:**
+ - **Windows:** Ensure Hyper-V or Intel HAXM is enabled
+ - **macOS:** Hypervisor.framework is used automatically
+ - **Linux:** Install KVM
+
+2. **Allocate More Resources:**
+ - In Android Studio AVD Manager, edit AVD
+ - Increase RAM to 2048 MB or higher
+ - Increase VM heap to 512 MB
+ - Set Graphics to "Hardware - GLES 2.0"
+
+3. **Use x86_64 System Image:**
+ - Faster than ARM images
+ - Download x86_64 image in SDK Manager
+
+4. **Reduce Screen Resolution:**
+ - Edit AVD settings
+ - Choose lower resolution (e.g., 720x1280 instead of 1080x2400)
+
+### Issue 8: Multiple Devices Connected
+
+**Error: More Than One Device**
+
+Symptoms:
+```bash
+$ adb devices
+List of devices attached
+emulator-5554 device
+192.168.1.100:5555 device
+```
+
+**Solutions:**
+
+1. **Specify device for ADB:**
+ ```bash
+ # Use emulator
+ export ANDROID_SERIAL=emulator-5554
+
+ # Use physical device
+ export ANDROID_SERIAL=192.168.1.100:5555
+ ```
+
+2. **Disconnect other devices:**
+ ```bash
+ # Disconnect wireless device
+ adb disconnect 192.168.1.100:5555
+ ```
+
+3. **Run separate MCP servers:**
+ ```bash
+ # Server for emulator
+ ANDROID_SERIAL=emulator-5554 python -m ufo.client.mcp.http_servers.mobile_mcp_server --data-port 8020 --action-port 8021 --server both
+
+ # Server for physical device
+ ANDROID_SERIAL=192.168.1.100:5555 python -m ufo.client.mcp.http_servers.mobile_mcp_server --data-port 8022 --action-port 8023 --server both
+ ```
+
+---
+
+## 📚 Next Steps
+
+You've successfully set up a Mobile Agent! Explore these topics to deepen your understanding:
+
+### Immediate Next Steps
+
+| Priority | Topic | Time | Link |
+|----------|-------|------|------|
+| 🥇 | **Mobile Agent Architecture** | 10 min | [Overview](../mobile/overview.md) |
+| 🥈 | **State Machine & Processing** | 15 min | [State Machine](../mobile/state.md) |
+| 🥉 | **MCP Commands Reference** | 15 min | [Commands](../mobile/commands.md) |
+
+### Advanced Topics
+
+| Topic | Description | Link |
+|-------|-------------|------|
+| **Processing Strategy** | 4-phase pipeline (Data, LLM, Action, Memory) | [Strategy](../mobile/strategy.md) |
+| **Galaxy Integration** | Multi-device orchestration with UFO³ | [As Galaxy Device](../mobile/as_galaxy_device.md) |
+| **MCP Protocol Details** | Deep dive into mobile interaction protocol | [Commands](../mobile/commands.md) |
+
+### Production Deployment
+
+| Best Practice | Description |
+|---------------|-------------|
+| **Persistent ADB** | Keep ADB connection stable for physical devices |
+| **Emulator Management** | Automate emulator lifecycle (start/stop/reset) |
+| **Screenshot Storage** | Configure log paths and cleanup policies in `config/ufo/system.yaml` |
+| **Security** | Use secure WebSocket (wss://) for remote deployments |
+
+> **💡 Learn More:** For comprehensive understanding of the Mobile Agent architecture and processing flow, see the [Mobile Agent Overview](../mobile/overview.md).
+
+---
+
+## ✅ Summary
+
+Congratulations! You've successfully:
+
+✅ Set up Android device (physical or emulator)
+✅ Installed ADB (Android Debug Bridge)
+✅ Installed Python dependencies
+✅ Started the Device Agent Server
+✅ Launched MCP services (data collection + action)
+✅ Connected Mobile Device Agent Client
+✅ Dispatched mobile automation tasks via HTTP API
+✅ (Optional) Configured device in Galaxy
+
+**Your Mobile Agent is Ready**
+
+You can now:
+
+- 📱 Automate Android apps remotely
+- 🖼️ Capture and analyze screenshots
+- 🎯 Interact with UI controls precisely
+- 🌌 Integrate with UFO³ Galaxy for cross-platform workflows
+
+**Start exploring mobile automation!** 🚀
+
+---
+
+## 💡 Pro Tips
+
+### Quick Start Command Summary
+
+**Start everything in order:**
+
+```bash
+# Terminal 1: Start server
+python -m ufo.server.app --port 5001 --platform mobile
+
+# Terminal 2: Start MCP services
+python -m ufo.client.mcp.http_servers.mobile_mcp_server --server both
+
+# Terminal 3: Start client
+python -m ufo.client.client --ws --ws-server ws://localhost:5001/ws --client-id mobile_phone_1 --platform mobile
+
+# Terminal 4: Dispatch task
+curl -X POST http://localhost:5001/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{"client_id": "mobile_phone_1", "request": "Open Chrome browser"}'
+```
+
+### Development Shortcuts
+
+**Create shell scripts for common operations:**
+
+**Windows (PowerShell):**
+```powershell
+# start-mobile-agent.ps1
+Start-Process powershell -ArgumentList "-NoExit", "-Command", "python -m ufo.server.app --port 5001 --platform mobile"
+Start-Sleep 2
+Start-Process powershell -ArgumentList "-NoExit", "-Command", "python -m ufo.client.mcp.http_servers.mobile_mcp_server --server both"
+Start-Sleep 2
+Start-Process powershell -ArgumentList "-NoExit", "-Command", "python -m ufo.client.client --ws --ws-server ws://localhost:5001/ws --client-id mobile_phone_1 --platform mobile"
+```
+
+**macOS/Linux (Bash):**
+```bash
+#!/bin/bash
+# start-mobile-agent.sh
+
+# Stop the background processes when this script exits
+trap 'kill "$SERVER_PID" "$MCP_PID" 2>/dev/null' EXIT
+
+# Start server in background
+python -m ufo.server.app --port 5001 --platform mobile &
+SERVER_PID=$!
+sleep 2
+
+# Start MCP services in background
+python -m ufo.client.mcp.http_servers.mobile_mcp_server --server both &
+MCP_PID=$!
+sleep 2
+
+# Start client in foreground
+python -m ufo.client.client --ws --ws-server ws://localhost:5001/ws --client-id mobile_phone_1 --platform mobile
+
+Make executable:
+```bash
+chmod +x start-mobile-agent.sh
+./start-mobile-agent.sh
+```
+
+### Testing Your Setup
+
+**Quick test to verify everything works:**
+
+```bash
+# Test 1: Check ADB
+adb devices
+# Should show your device
+
+# Test 2: Check Server
+curl http://localhost:5001/api/health
+# Should return a JSON body containing "status": "healthy"
+
+# Test 3: Check MCP
+curl http://localhost:8020/health
+curl http://localhost:8021/health
+# Should return health status
+
+# Test 4: Check Client
+curl http://localhost:5001/api/clients
+# Should show mobile_phone_1
+
+# Test 5: Dispatch simple task
+curl -X POST http://localhost:5001/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{"client_id": "mobile_phone_1", "request": "Take a screenshot"}'
+# Should return dispatched status
+```
+
+**Happy Mobile Automation! 🎉**
diff --git a/documents/docs/getting_started/quick_start_ufo2.md b/documents/docs/getting_started/quick_start_ufo2.md
new file mode 100644
index 000000000..f91412bed
--- /dev/null
+++ b/documents/docs/getting_started/quick_start_ufo2.md
@@ -0,0 +1,343 @@
+# Quick Start Guide
+
+Welcome to **UFO²** – the Desktop AgentOS! This guide will help you get started with UFO² in just a few minutes.
+
+**What is UFO²?**
+
+UFO² is a Desktop AgentOS that turns natural-language requests into automatic, reliable, multi-application workflows on Windows. It goes beyond UI-focused automation by combining GUI actions with native API calls for faster and more robust execution.
+
+---
+
+## 🛠️ Step 1: Installation
+
+### Requirements
+
+- **Python** >= 3.10
+- **Windows OS** >= 10
+- **Git** (for cloning the repository)
+
+### Installation Steps
+
+```powershell
+# [Optional] Create conda environment
+conda create -n ufo python=3.10
+conda activate ufo
+
+# Clone the repository
+git clone https://github.com/microsoft/UFO.git
+cd UFO
+
+# Install dependencies
+pip install -r requirements.txt
+```
+
+> **💡 Tip:** If you want to use Qwen as your LLM, uncomment the related libraries in `requirements.txt` before installing.
+
+---
+
+## ⚙️ Step 2: Configure LLMs
+
+> **📢 New Configuration System (Recommended)**
+> UFO² now uses a **new modular config system** located in `config/ufo/` with auto-discovery and type validation. While the legacy `ufo/config/config.yaml` is still supported for backward compatibility, we strongly recommend migrating to the new system for better maintainability.
+
+### Option 1: New Config System (Recommended)
+
+The new config files are organized in `config/ufo/` with separate YAML files for different components:
+
+```powershell
+# Copy template to create your agent config file (contains API keys)
+copy config\ufo\agents.yaml.template config\ufo\agents.yaml
+notepad config\ufo\agents.yaml # Edit your LLM API credentials
+```
+
+**Directory Structure:**
+```
+config/ufo/
+├── agents.yaml.template # Template: Agent configs (HOST_AGENT, APP_AGENT) - COPY & EDIT THIS
+├── agents.yaml # Your agent configs with API keys (DO NOT commit to git)
+├── rag.yaml # RAG and knowledge settings (default values, edit if needed)
+├── system.yaml # System settings (default values, edit if needed)
+├── mcp.yaml # MCP integration settings (default values, edit if needed)
+└── ... # Other modular configs with defaults
+```
+
+> **Configuration Files:** `agents.yaml` contains sensitive information (API keys) and must be configured. Other config files have default values and only need editing for customization.
+
+**Migration Benefits:**
+
+- ✅ **Type Safety**: Automatic validation with Pydantic schemas
+- ✅ **Auto-Discovery**: No manual config loading needed
+- ✅ **Modular**: Separate concerns into individual files
+- ✅ **IDE Support**: Better autocomplete and error detection
+
+### Option 2: Legacy Config (Backward Compatible)
+
+For existing users, the old config path still works:
+
+```powershell
+copy ufo\config\config.yaml.template ufo\config\config.yaml
+notepad ufo\config\config.yaml # Paste your key & endpoint
+```
+
+> **Config Precedence:** If both old and new configs exist, the new config in `config/ufo/` takes precedence. A warning will be displayed during startup.
+
+---
+
+### LLM Configuration Examples
+
+#### OpenAI Configuration
+
+**New Config (`config/ufo/agents.yaml`):**
+```yaml
+HOST_AGENT:
+ VISUAL_MODE: true
+ API_TYPE: "openai"
+ API_BASE: "https://api.openai.com/v1/chat/completions"
+ API_KEY: "sk-YOUR_KEY_HERE" # Replace with your actual API key
+ API_VERSION: "2025-02-01-preview"
+ API_MODEL: "gpt-4o"
+
+APP_AGENT:
+ VISUAL_MODE: true
+ API_TYPE: "openai"
+ API_BASE: "https://api.openai.com/v1/chat/completions"
+ API_KEY: "sk-YOUR_KEY_HERE" # Replace with your actual API key
+ API_VERSION: "2025-02-01-preview"
+ API_MODEL: "gpt-4o"
+```
+
+**Legacy Config (`ufo/config/config.yaml`):**
+```yaml
+HOST_AGENT:
+ VISUAL_MODE: True
+ API_TYPE: "openai"
+ API_BASE: "https://api.openai.com/v1/chat/completions"
+ API_KEY: "sk-YOUR_KEY_HERE"
+ API_VERSION: "2024-02-15-preview"
+ API_MODEL: "gpt-4o"
+```
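+
+The legacy example above shows only `HOST_AGENT`. As a minimal sketch, assuming the legacy file accepts an `APP_AGENT` block with the same keys as the new-format example, the matching entry might look like this:
+
+```yaml
+# Assumption: mirrors the HOST_AGENT keys shown above.
+# Check ufo/config/config.yaml.template for the authoritative layout.
+APP_AGENT:
+  VISUAL_MODE: True
+  API_TYPE: "openai"
+  API_BASE: "https://api.openai.com/v1/chat/completions"
+  API_KEY: "sk-YOUR_KEY_HERE"  # Replace with your actual API key
+  API_VERSION: "2024-02-15-preview"
+  API_MODEL: "gpt-4o"
+```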
+
+#### Azure OpenAI (AOAI) Configuration
+
+**New Config (`config/ufo/agents.yaml`):**
+```yaml
+HOST_AGENT:
+ VISUAL_MODE: true
+ API_TYPE: "aoai"
+ API_BASE: "https://YOUR_RESOURCE.openai.azure.com"
+ API_KEY: "YOUR_AOAI_KEY"
+ API_VERSION: "2024-02-15-preview"
+ API_MODEL: "gpt-4o"
+ API_DEPLOYMENT_ID: "YOUR_DEPLOYMENT_ID"
+
+APP_AGENT:
+ VISUAL_MODE: true
+ API_TYPE: "aoai"
+ API_BASE: "https://YOUR_RESOURCE.openai.azure.com"
+ API_KEY: "YOUR_AOAI_KEY"
+ API_VERSION: "2024-02-15-preview"
+ API_MODEL: "gpt-4o"
+ API_DEPLOYMENT_ID: "YOUR_DEPLOYMENT_ID"
+```
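+
+Because `HOST_AGENT` and `APP_AGENT` are separate blocks, the two provider examples above can presumably be mixed, for instance Azure OpenAI for the host agent and OpenAI for the app agents. The fragment below is a sketch under that assumption and reuses only keys that already appear in this guide:
+
+```yaml
+# Sketch: one provider per agent (assumes each block is read independently)
+HOST_AGENT:
+  VISUAL_MODE: true
+  API_TYPE: "aoai"
+  API_BASE: "https://YOUR_RESOURCE.openai.azure.com"
+  API_KEY: "YOUR_AOAI_KEY"
+  API_VERSION: "2024-02-15-preview"
+  API_MODEL: "gpt-4o"
+  API_DEPLOYMENT_ID: "YOUR_DEPLOYMENT_ID"
+
+APP_AGENT:
+  VISUAL_MODE: true
+  API_TYPE: "openai"
+  API_BASE: "https://api.openai.com/v1/chat/completions"
+  API_KEY: "sk-YOUR_KEY_HERE"
+  API_VERSION: "2025-02-01-preview"
+  API_MODEL: "gpt-4o"
+```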
+
+> **ℹ️ More LLM Options:** UFO² supports various LLM providers including Qwen, Gemini, Claude, DeepSeek, and more. See the [Model Configuration Guide](../configuration/models/overview.md) for complete details.
+
+---
+
+## 📔 Step 3: Additional Settings (Optional)
+
+### RAG Configuration
+
+Enhance UFO²'s capabilities with external knowledge through Retrieval-Augmented Generation (RAG):
+
+- **New config**: Edit `config/ufo/rag.yaml` (already exists with default values)
+- **Legacy config**: Edit `ufo/config/config.yaml`
+
+**Available RAG Options:**
+
+| Feature | Documentation | Description |
+|---------|--------------|-------------|
+| **Offline Help Documents** | [Learning from Help Documents](../ufo2/core_features/knowledge_substrate/learning_from_help_document.md) | Retrieve information from offline help documentation |
+| **Online Bing Search** | [Learning from Bing Search](../ufo2/core_features/knowledge_substrate/learning_from_bing_search.md) | Utilize up-to-date online search results |
+| **Self-Experience** | [Experience Learning](../ufo2/core_features/knowledge_substrate/experience_learning.md) | Save task trajectories into memory for future reference |
+| **User Demonstrations** | [Learning from Demonstrations](../ufo2/core_features/knowledge_substrate/learning_from_demonstration.md) | Learn from user-provided demonstrations |
+
+**Example RAG Config (`config/ufo/rag.yaml`):**
+```yaml
+# Enable Bing search
+RAG_ONLINE_SEARCH: true
+BING_API_KEY: "YOUR_BING_API_KEY" # Get from https://www.microsoft.com/en-us/bing/apis
+
+# Enable experience learning
+RAG_EXPERIENCE: true
+```
+
+> **ℹ️ RAG Resources:** See [Knowledge Substrate Overview](../ufo2/core_features/knowledge_substrate/overview.md) for complete RAG configuration and best practices.
+
+---
+
+## 🎉 Step 4: Start UFO²
+
+### Interactive Mode
+
+Start UFO² in interactive mode where you can enter requests dynamically:
+
+```powershell
+# Assume you are in the cloned UFO folder
+python -m ufo --task <your_task_name>
+```
+
+**Expected Output:**
+```
+Welcome to use UFO🛸, A UI-focused Agent for Windows OS Interaction.
+ _ _ _____ ___
+| | | || ___| / _ \
+| | | || |_ | | | |
+| |_| || _| | |_| |
+ \___/ |_| \___/
+Please enter your request to be completed🛸:
+```
+
+### Direct Request Mode
+
+Invoke UFO² with a specific task and request directly:
+
+```powershell
+python -m ufo --task <your_task_name> -r "<your_request>"
+```
+
+**Example:**
+```powershell
+python -m ufo --task email_demo -r "Send an email to john@example.com with subject 'Meeting Reminder'"
+```
+
+---
+
+## 🎥 Step 5: Execution Logs
+
+UFO² automatically saves execution logs, screenshots, and traces for debugging and analysis.
+
+**Log Location:**
+```
+./logs/<your_task_name>/
+```
+
+**Log Contents:**
+
+| File/Folder | Description |
+|-------------|-------------|
+| `screenshots/` | Screenshots captured during execution |
+| `action_*.json` | Agent actions and responses |
+| `ui_trees/` | UI control tree snapshots (if enabled) |
+| `request_response.log` | Complete LLM request/response logs |
+
+> **Analyzing Logs:** Use the logs to debug agent behavior, replay execution flow, and analyze agent decision-making patterns.
+
+> **Privacy Notice:** Screenshots may contain sensitive or confidential information. Ensure no private data is visible during execution. See [DISCLAIMER.md](https://github.com/microsoft/UFO/blob/main/DISCLAIMER.md) for details.
+
+---
+
+## 🔄 Migrating from Legacy Config
+
+If you're upgrading from an older version that used `ufo/config/config.yaml`, UFO² provides an **automated conversion tool**.
+
+### Automatic Conversion (Recommended)
+
+```powershell
+# Interactive conversion with automatic backup
+python -m ufo.tools.convert_config
+
+# Preview changes first (dry run)
+python -m ufo.tools.convert_config --dry-run
+
+# Force conversion without confirmation
+python -m ufo.tools.convert_config --force
+```
+
+**What the tool does:**
+
+- ✅ Splits monolithic `config.yaml` into modular files
+- ✅ Converts flow-style YAML (with braces) to block-style YAML
+- ✅ Maps legacy file names to new structure
+- ✅ Preserves all configuration values
+- ✅ Creates timestamped backup for rollback
+- ✅ Validates output files
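+
+To make the flow-to-block conversion concrete, here is an illustrative before/after pair. The key names are borrowed from the examples earlier in this guide, and the tool's exact output may differ:
+
+```yaml
+# Before: flow-style mapping (with braces), as a legacy file might contain
+HOST_AGENT: {VISUAL_MODE: true, API_TYPE: "openai", API_MODEL: "gpt-4o"}
+---
+# After: block-style mapping; both documents parse to the same data
+HOST_AGENT:
+  VISUAL_MODE: true
+  API_TYPE: "openai"
+  API_MODEL: "gpt-4o"
+```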
+
+**Conversion Mapping:**
+
+| Legacy File | → | New File(s) | Transformation |
+|-------------|---|-------------|----------------|
+| `config.yaml` (monolithic) | → | `agents.yaml` + `rag.yaml` + `system.yaml` | Smart field splitting |
+| `agent_mcp.yaml` | → | `mcp.yaml` | Rename + format conversion |
+| `config_prices.yaml` | → | `prices.yaml` | Rename + format conversion |
+
+> **Migration Guide:** For detailed migration instructions, rollback procedures, and troubleshooting, see the [Configuration Migration Guide](../configuration/system/migration.md).
+
+---
+
+## 📚 Additional Resources
+
+### Core Documentation
+
+**Architecture & Concepts:**
+
+- [UFO² Overview](../ufo2/overview.md) - System architecture and design principles
+- [HostAgent](../ufo2/host_agent/overview.md) - Desktop-level coordination agent
+- [AppAgent](../ufo2/app_agent/overview.md) - Application-level execution agent
+
+### Configuration
+
+**Configuration Guides:**
+
+- [Configuration Overview](../configuration/system/overview.md) - Configuration system architecture
+- [Agents Configuration](../configuration/system/agents_config.md) - LLM and agent settings
+- [System Configuration](../configuration/system/system_config.md) - Runtime and execution settings
+- [MCP Configuration](../configuration/system/mcp_reference.md) - MCP server settings
+- [Model Configuration](../configuration/models/overview.md) - Supported LLM providers
+
+### Advanced Features
+
+**Advanced Topics:**
+
+- [Hybrid Actions](../ufo2/core_features/hybrid_actions.md) - GUI + API automation
+- [Control Detection](../ufo2/core_features/control_detection/overview.md) - UIA + Vision detection
+- [Knowledge Substrate](../ufo2/core_features/knowledge_substrate/overview.md) - RAG and learning
+- [Multi-Action Execution](../ufo2/core_features/multi_action.md) - Speculative action batching
+
+### Evaluation & Benchmarks
+
+**Benchmarking:**
+
+- [Benchmark Overview](../ufo2/evaluation/benchmark/overview.md) - Evaluation framework and datasets
+- [Windows Agent Arena](../ufo2/evaluation/benchmark/windows_agent_arena.md) - 154 real Windows tasks
+- [OSWorld](../ufo2/evaluation/benchmark/osworld.md) - Cross-application benchmarks
+
+---
+
+## ❓ Getting Help
+
+- 📖 **Documentation**: [https://microsoft.github.io/UFO/](https://microsoft.github.io/UFO/)
+- 🐛 **GitHub Issues**: [https://github.com/microsoft/UFO/issues](https://github.com/microsoft/UFO/issues) (preferred)
+- 📧 **Email**: [ufo-agent@microsoft.com](mailto:ufo-agent@microsoft.com)
+
+---
+
+## 🎯 Next Steps
+
+Now that UFO² is set up, explore these guides to unlock its full potential:
+
+1. **[Configuration Customization](../configuration/system/overview.md)** - Fine-tune UFO² behavior
+2. **[Knowledge Substrate Setup](../ufo2/core_features/knowledge_substrate/overview.md)** - Enable RAG capabilities
+3. **[Creating Custom Agents](../tutorials/creating_app_agent/overview.md)** - Build specialized agents
+4. **[MCP Integration](../mcp/overview.md)** - Extend with custom MCP servers
+
+Happy automating with UFO²! 🛸
\ No newline at end of file
diff --git a/documents/docs/img/add_device.png b/documents/docs/img/add_device.png
new file mode 100644
index 000000000..ab4beec34
Binary files /dev/null and b/documents/docs/img/add_device.png differ
diff --git a/documents/docs/img/agent_registry.png b/documents/docs/img/agent_registry.png
new file mode 100644
index 000000000..a9792043d
Binary files /dev/null and b/documents/docs/img/agent_registry.png differ
diff --git a/documents/docs/img/agent_state.png b/documents/docs/img/agent_state.png
new file mode 100644
index 000000000..7e028dd3b
Binary files /dev/null and b/documents/docs/img/agent_state.png differ
diff --git a/documents/docs/img/aip.png b/documents/docs/img/aip.png
new file mode 100644
index 000000000..7e7f0dcdf
Binary files /dev/null and b/documents/docs/img/aip.png differ
diff --git a/documents/docs/img/aip_new.png b/documents/docs/img/aip_new.png
new file mode 100644
index 000000000..7b74dd3c1
Binary files /dev/null and b/documents/docs/img/aip_new.png differ
diff --git a/documents/docs/img/async_timeline.png b/documents/docs/img/async_timeline.png
new file mode 100644
index 000000000..dbea9663e
Binary files /dev/null and b/documents/docs/img/async_timeline.png differ
diff --git a/documents/docs/img/case_excel.png b/documents/docs/img/case_excel.png
new file mode 100644
index 000000000..7d4223d1f
Binary files /dev/null and b/documents/docs/img/case_excel.png differ
diff --git a/documents/docs/img/case_log.png b/documents/docs/img/case_log.png
new file mode 100644
index 000000000..515b6aa27
Binary files /dev/null and b/documents/docs/img/case_log.png differ
diff --git a/documents/docs/img/case_resource.png b/documents/docs/img/case_resource.png
new file mode 100644
index 000000000..7ae271e9e
Binary files /dev/null and b/documents/docs/img/case_resource.png differ
diff --git a/documents/docs/img/constellation_agent.png b/documents/docs/img/constellation_agent.png
new file mode 100644
index 000000000..69e8d302e
Binary files /dev/null and b/documents/docs/img/constellation_agent.png differ
diff --git a/documents/docs/img/desomposition.png b/documents/docs/img/decomposition.png
similarity index 100%
rename from documents/docs/img/desomposition.png
rename to documents/docs/img/decomposition.png
diff --git a/documents/docs/img/device_agent.png b/documents/docs/img/device_agent.png
new file mode 100644
index 000000000..b807988bb
Binary files /dev/null and b/documents/docs/img/device_agent.png differ
diff --git a/documents/docs/img/device_cs.png b/documents/docs/img/device_cs.png
new file mode 100644
index 000000000..85d6247c0
Binary files /dev/null and b/documents/docs/img/device_cs.png differ
diff --git a/documents/docs/img/failed_cased.png b/documents/docs/img/failed_cased.png
new file mode 100644
index 000000000..cec3b3cf9
Binary files /dev/null and b/documents/docs/img/failed_cased.png differ
diff --git a/documents/docs/img/first.png b/documents/docs/img/first.png
new file mode 100644
index 000000000..c9732cb30
Binary files /dev/null and b/documents/docs/img/first.png differ
diff --git a/documents/docs/img/linux_agent_state.png b/documents/docs/img/linux_agent_state.png
new file mode 100644
index 000000000..d0f798626
Binary files /dev/null and b/documents/docs/img/linux_agent_state.png differ
diff --git a/documents/docs/img/logo3.png b/documents/docs/img/logo3.png
new file mode 100644
index 000000000..c0caa51f4
Binary files /dev/null and b/documents/docs/img/logo3.png differ
diff --git a/documents/docs/img/mcp.png b/documents/docs/img/mcp.png
new file mode 100644
index 000000000..b6cfdbeb0
Binary files /dev/null and b/documents/docs/img/mcp.png differ
diff --git a/documents/docs/img/orchestrator.png b/documents/docs/img/orchestrator.png
new file mode 100644
index 000000000..14a34bbed
Binary files /dev/null and b/documents/docs/img/orchestrator.png differ
diff --git a/documents/docs/img/overview2.png b/documents/docs/img/overview2.png
new file mode 100644
index 000000000..5312c73f9
Binary files /dev/null and b/documents/docs/img/overview2.png differ
diff --git a/documents/docs/img/poster.png b/documents/docs/img/poster.png
new file mode 100644
index 000000000..92bb74b74
Binary files /dev/null and b/documents/docs/img/poster.png differ
diff --git a/documents/docs/img/safe_assignment.png b/documents/docs/img/safe_assignment.png
new file mode 100644
index 000000000..feb6aaba0
Binary files /dev/null and b/documents/docs/img/safe_assignment.png differ
diff --git a/documents/docs/img/task_constellation.png b/documents/docs/img/task_constellation.png
new file mode 100644
index 000000000..127d8ceda
Binary files /dev/null and b/documents/docs/img/task_constellation.png differ
diff --git a/documents/docs/img/task_logging.png b/documents/docs/img/task_logging.png
new file mode 100644
index 000000000..5fe1f65fb
Binary files /dev/null and b/documents/docs/img/task_logging.png differ
diff --git a/documents/docs/img/webui.png b/documents/docs/img/webui.png
new file mode 100644
index 000000000..d2a1d606f
Binary files /dev/null and b/documents/docs/img/webui.png differ
diff --git a/documents/docs/index.md b/documents/docs/index.md
index cd4307b28..51b4c6d80 100644
--- a/documents/docs/index.md
+++ b/documents/docs/index.md
@@ -1,91 +1,451 @@
-# Welcome to UFO²'s Document!
+# Welcome to UFO³ Documentation
-[](https://arxiv.org/abs/2504.14603)
-
-[](https://opensource.org/licenses/MIT)
-[](https://github.com/microsoft/UFO)
-[](https://www.youtube.com/watch?v=QT_OhygMVXU)
+
+
+ UFO³: Weaving the Digital Agent Galaxy
+
+
+ A Multi-Device Orchestration Framework for Cross-Platform Intelligent Automation
+
+[](https://arxiv.org/abs/2511.11332)
+[](https://arxiv.org/abs/2504.14603)
+
+[](https://opensource.org/licenses/MIT)
+[](https://github.com/microsoft/UFO)
+[](https://www.youtube.com/watch?v=QT_OhygMVXU)
-## Introduction
-UFO now evolves into **UFO²** (Desktop AgentOS), a new generation of agent framework that can run on Windows desktop OS. It is designed to **automate** and **orchestrate** tasks across multiple applications, enabling users to seamlessly interact with their operating system using natural language commands beyond just **UI automation**.
-
-
-
+---
+
+
+
+
-## ✨ Key Capabilities
+## 📖 About This Documentation
-| Feature | Description |
-|----------------------------------|-------------|
-| **Deep OS Integration** | Combines Windows UIA, Win32 and WinCOM for first‑class control detection and native commands. |
-| **Picture‑in‑Picture Desktop** *(coming soon)* | Automation runs in a sandboxed virtual desktop so you can keep using your main screen. |
-| [**Hybrid GUI + API Actions**](./automator/overview.md) | Chooses native APIs when available, falls back to clicks/keystrokes when not—fast *and* robust. |
-| [**Speculative Multi‑Action**](./advanced_usage/multi_action.md) | Bundles several predicted steps into one LLM call, validated live—up to **51 % fewer** queries. |
-| [**Continuous Knowledge Substrate**](./advanced_usage/reinforce_appagent/overview.md) | Mixes docs, Bing search, user demos and execution traces via RAG for agents that learn over time. |
-| [**UIA + Visual Control Detection**](./advanced_usage/control_detection/hybrid_detection.md) | Detects standard *and* custom controls with a hybrid UIA + vision pipeline. |
+Welcome to the official documentation for **UFO³**, Microsoft's open-source framework for intelligent automation across devices and platforms. Whether you're looking to automate Windows applications or orchestrate complex workflows across multiple devices, this documentation will guide you through every step.
-Please refer to the [UFO² paper](https://arxiv.org/abs/2504.14603) and the hyperlinked sections for more details on each capability.
+**What you'll find here:**
+- 🚀 **[Quick Start Guides](getting_started/quick_start_galaxy.md)** – Get up and running in minutes
+- 📚 **[Core Concepts](galaxy/overview.md)** – Understand the architecture and key components
+- ⚙️ **[Configuration](configuration/system/agents_config.md)** – Set up your agents and models
+- 🔧 **[Advanced Features](ufo2/core_features/multi_action.md)** – Deep dive into powerful capabilities
+- 💡 **[FAQ](faq.md)** – Common questions and troubleshooting
---
+## 🎯 Choose Your Path
-## 🏗️ Architecture overview
-
-
-
+UFO³ consists of two complementary frameworks. Choose the one that best fits your needs, or use both together!
+
+| Framework | Best For | Key Strength | Get Started |
+|-----------|----------|--------------|-------------|
+| **🌌 Galaxy** (✨ NEW & RECOMMENDED) | Cross-device workflows, complex automation, parallel execution | Multi-device orchestration, DAG-based planning, real-time monitoring | [Quick Start →](getting_started/quick_start_galaxy.md) |
+| **🪟 UFO²** (⚡ STABLE & LTS) | Windows automation, quick tasks, learning the basics | Deep Windows integration, hybrid GUI + API, stable and reliable | [Quick Start →](getting_started/quick_start_ufo2.md) |
+
+### 🤔 Decision Guide
+
+| Question | Galaxy | UFO² |
+|----------|:------:|:----:|
+| Need cross-device collaboration? | ✅ | ❌ |
+| Complex multi-step workflows? | ✅ | ⚠️ Limited |
+| Windows-only automation? | ✅ | ✅ Optimized |
+| Quick setup & learning? | ⚠️ Moderate | ✅ Easy |
+| Stable & reliable? | 🚧 Active Dev | ✅ LTS |
+
+---
+
+## 🌟 What's New in UFO³?
+
+**UFO³ is a scalable, universal cross-device agent framework** that enables you to develop new device agents for different platforms and applications. Through the **Agent Interaction Protocol (AIP)**, custom device agents can seamlessly integrate into UFO³ Galaxy for coordinated multi-device orchestration.
+
+### Evolution Timeline
+
+```mermaid
+%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#E8F4F8','primaryTextColor':'#1A1A1A','primaryBorderColor':'#7CB9E8','lineColor':'#A8D5E2','secondaryColor':'#B8E6F0','tertiaryColor':'#D4F1F4','fontSize':'16px','fontFamily':'Segoe UI, Arial, sans-serif'}}}%%
+graph LR
+ A["🎈 UFO<br/>February 2024<br/>GUI Agent for Windows"]
+ B["🖥️ UFO²<br/>April 2025<br/>Desktop AgentOS"]
+ C["🌌 UFO³ Galaxy<br/>November 2025<br/>Multi-Device Orchestration"]
+
+ A -->|Evolve| B
+ B -->|Scale| C
+
+ style A fill:#E8F4F8,stroke:#7CB9E8,stroke-width:2.5px,color:#1A1A1A,rx:15,ry:15
+ style B fill:#C5E8F5,stroke:#5BA8D0,stroke-width:2.5px,color:#1A1A1A,rx:15,ry:15
+ style C fill:#A4DBF0,stroke:#3D96BE,stroke-width:2.5px,color:#1A1A1A,rx:15,ry:15
+```
+
+### 🚀 UFO³ = **Galaxy** (Multi-Device Orchestration) + **UFO²** (Device Agent)
+
+UFO³ introduces **Galaxy**, a revolutionary multi-device orchestration framework that coordinates intelligent agents across heterogeneous platforms. It is built on five tightly integrated design principles:
+
+1. **🌟 Declarative Decomposition into Dynamic DAG** - Requests are decomposed into a structured DAG of TaskStars and their dependencies, which enables automated scheduling and runtime rewriting
+
+2. **🔄 Continuous Result-Driven Graph Evolution** - The constellation is a living graph that adapts to execution feedback through controlled rewrites and dynamic adjustments
+
+3. **⚡ Heterogeneous, Asynchronous & Safe Orchestration** - Devices are matched to tasks by capability, with async execution, safe locking, and formally verified correctness
+
+4. **🔌 Unified Agent Interaction Protocol (AIP)** - A secure WebSocket-based coordination layer with fault tolerance and automatic reconnection
+5. **🛠️ Template-Driven MCP-Empowered Device Agents** - A lightweight toolkit for rapid agent development, with MCP integration for tool augmentation
-UFO² operates as a **Desktop AgentOS**, encompassing a multi-agent framework that includes:
+| Aspect | UFO² | UFO³ Galaxy |
+|--------|------|-------------|
+| **Architecture** | Single Windows Agent | Multi-Device Orchestration |
+| **Task Model** | Sequential ReAct Loop | DAG-based Constellation Workflows |
+| **Scope** | Single device, multi-app | Multi-device, cross-platform |
+| **Coordination** | HostAgent + AppAgents | ConstellationAgent + TaskOrchestrator |
+| **Device Support** | Windows Desktop | Windows, Linux, macOS, Android, Web |
+| **Task Planning** | Application-level | Device-level with dependencies |
+| **Execution** | Sequential | Parallel DAG execution |
+| **Device Agent Role** | Standalone | Can serve as Galaxy device agent |
+| **Complexity** | Simple to Moderate | Simple to Very Complex |
+| **Learning Curve** | Low | Moderate |
+| **Cross-Device Collaboration** | ❌ Not Supported | ✅ Core Feature |
+| **Setup Difficulty** | ✅ Easy | ⚠️ Moderate |
+| **Status** | ✅ LTS (Long-Term Support) | ⚡ Active Development |
-1. **HostAgent** – Parses the natural‑language goal, launches the necessary applications, spins up / coordinates AppAgents, and steers a global finite‑state machine (FSM).
-2. **AppAgents** – One per application; each runs a ReAct loop with multimodal perception, hybrid control detection, retrieval‑augmented knowledge, and the **Puppeteer** executor that chooses between GUI actions and native APIs.
-3. **Knowledge Substrate** – Blends offline documentation, online search, demonstrations, and execution traces into a vector store that is retrieved on‑the‑fly at inference.
-4. **Speculative Executor** – Slashes LLM latency by predicting batches of likely actions and validating them against live UIA state in a single shot.
-5. **Picture‑in‑Picture Desktop** *(coming soon)* – Runs the agent in an isolated virtual desktop so your main workspace and input devices remain untouched.
+### 🎓 Migration Path
-For a deep dive see our [technical report](https://arxiv.org/abs/2504.14603).
+**For UFO² Users:**
+1. ✅ **Keep using UFO²** – Fully supported, actively maintained
+2. 🔄 **Gradual adoption** – Galaxy can use UFO² as Windows device agent
+3. 📈 **Scale up** – Move to Galaxy when you need multi-device capabilities
+4. 📚 **Learning resources** – [Migration Guide](./getting_started/migration_ufo2_to_galaxy.md)
+
+---
+
+## ✨ Capabilities at a Glance
+
+### 🌌 Galaxy Framework – What's Different?
+
+#### 🌟 Constellation Planning
+
+```
+User Request
+ ↓
+ConstellationAgent
+ ↓
+ [Task DAG]
+ / | \
+Task1 Task2 Task3
+(Win) (Linux)(Mac)
+```
+
+**Benefits:**
+- Cross-device dependency tracking
+- Parallel execution optimization
+- Cross-device dataflow management
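+
+As a purely illustrative sketch of what such a task DAG carries, the fragment below lists three TaskStars with their target platforms and dependencies. The schema (`tasks`, `id`, `platform`, `depends_on`) is hypothetical and is not Galaxy's actual constellation format; it only shows that independent tasks can run in parallel while dependent ones wait:
+
+```yaml
+# Hypothetical schema for illustration only; not Galaxy's real format
+tasks:
+  - id: task1
+    platform: windows
+    depends_on: []              # no prerequisites, can start immediately
+  - id: task2
+    platform: linux
+    depends_on: []              # independent of task1, so both may run in parallel
+  - id: task3
+    platform: macos
+    depends_on: [task1, task2]  # waits for results from both devices
+```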
+
+#### 🎯 Device Assignment
+
+```
+Selection Criteria
+ • Platform
+ • Resource
+ • Task requirements
+ • Performance history
+ ↓
+ Auto-Assignment
+ ↓
+ Optimal Devices
+```
+
+**Smart Matching:**
+- Capability-based selection
+- Real-time resource monitoring
+- Dynamic reallocation
+
+#### 📊 Orchestration
+
+```
+Task1 → Running ✅
+Task2 → Pending ⏸️
+Task3 → Running 🔄
+ ↓
+ Completion
+ ↓
+ Final Report
+```
+
+**Orchestration:**
+- Real-time status updates
+- Automatic error recovery
+- Progress tracking with feedback
+
+---
+
+### 🪟 UFO² Desktop AgentOS – Core Strengths
+
+UFO² serves dual roles: **standalone Windows automation** and **Galaxy device agent** for Windows platforms.
+
+| Feature | Description | Documentation |
+|---------|-------------|---------------|
+| **Deep OS Integration** | Windows UIA, Win32, WinCOM native control | [Learn More](ufo2/overview.md) |
+| **Hybrid Actions** | GUI clicks + API calls for optimal performance | [Learn More](ufo2/core_features/hybrid_actions.md) |
+| **Speculative Multi-Action** | Batch predictions → **51% fewer LLM calls** | [Learn More](ufo2/core_features/multi_action.md) |
+| **Visual + UIA Detection** | Hybrid control detection for robustness | [Learn More](ufo2/core_features/control_detection/hybrid_detection.md) |
+| **Knowledge Substrate** | RAG with docs, demos, execution traces | [Learn More](ufo2/core_features/knowledge_substrate/overview.md) |
+| **Device Agent Role** | Can serve as Windows executor in Galaxy orchestration | [Learn More](galaxy/overview.md) |
+
+**As Galaxy Device Agent:**
+- Receives tasks from ConstellationAgent through Galaxy orchestration layer
+- Executes Windows-specific operations using proven UFO² capabilities
+- Reports status and results back to TaskOrchestrator
+- Seamlessly participates in cross-device workflows
+
+---
+
+## 🏗️ Architecture
+
+### UFO³ Galaxy – Multi-Device Orchestration
+
+
+
+
+
+| Component | Role |
+|-----------|------|
+| **ConstellationAgent** | Plans and decomposes tasks into DAG workflows |
+| **TaskConstellation** | DAG representation with TaskStar nodes and dependencies |
+| **Device Pool Manager** | Matches tasks to capable devices dynamically |
+| **TaskOrchestrator** | Coordinates parallel execution and handles data flow |
+| **Event System** | Real-time monitoring with observer pattern |
+
+[📖 Learn More →](galaxy/overview.md)
+
+### UFO² – Desktop AgentOS
+
+
+
+
+
+| Component | Role |
+|-----------|------|
+| **HostAgent** | Desktop orchestrator, application lifecycle management |
+| **AppAgents** | Per-application executors with hybrid GUI–API actions |
+| **Knowledge Substrate** | RAG-enhanced learning from docs & execution history |
+| **Speculative Executor** | Multi-action prediction for efficiency |
+
+[📖 Learn More →](ufo2/overview.md)
---
## 🚀 Quick Start
-Please follow the [Quick Start Guide](./getting_started/quick_start.md) to get started with UFO.
+Ready to dive in? Follow these guides to get started with your chosen framework:
-## 🌐 Media Coverage
+### 🌌 Galaxy Quick Start (Multi-Device Orchestration)
-Check out our official deep dive of UFO on [this Youtube Video](https://www.youtube.com/watch?v=QT_OhygMVXU).
+Perfect for complex workflows across multiple devices and platforms.
+```powershell
+# 1. Install dependencies
+pip install -r requirements.txt
-UFO sightings have garnered attention from various media outlets, including:
+# 2. Configure agents (see detailed guide for API key setup)
+copy config\galaxy\agent.yaml.template config\galaxy\agent.yaml
+copy config\ufo\agents.yaml.template config\ufo\agents.yaml
-- [微软正式开源UFO²,Windows桌面迈入「AgentOS 时代」](https://www.jiqizhixin.com/articles/2025-05-06-13)
+# 3. Start device agents
+python -m ufo.server.app --port 5000
+python -m ufo.client.client --ws --ws-server ws://localhost:5000/ws --client-id device_1 --platform windows
+
+# 4. Launch Galaxy
+python -m galaxy --interactive
+```
+
+**📖 [Complete Galaxy Quick Start Guide →](getting_started/quick_start_galaxy.md)**
+**⚙️ [Galaxy Configuration Details →](configuration/system/galaxy_devices.md)**
-- [Microsoft's UFO abducts traditional user interfaces for a smarter Windows experience](https://the-decoder.com/microsofts-ufo-abducts-traditional-user-interfaces-for-a-smarter-windows-experience/)
+### 🪟 UFO² Quick Start (Windows Automation)
-- [🚀 UFO & GPT-4-V: Sit back and relax, mientras GPT lo hace todo🌌](https://www.linkedin.com/posts/gutierrezfrancois_ai-ufo-microsoft-activity-7176819900399652865-pLoo?utm_source=share&utm_medium=member_desktop)
+Perfect for Windows-only automation tasks with quick setup.
-- [The AI PC - The Future of Computers? - Microsoft UFO](https://www.youtube.com/watch?v=1k4LcffCq3E)
+```powershell
+# 1. Install
+pip install -r requirements.txt
-- [下一代Windows系统曝光:基于GPT-4V,Agent跨应用调度,代号UFO](https://baijiahao.baidu.com/s?id=1790938358152188625&wfr=spider&for=pc)
+# 2. Configure (add your API keys)
+copy config\ufo\agents.yaml.template config\ufo\agents.yaml
-- [下一代智能版 Windows 要来了?微软推出首个 Windows Agent,命名为 UFO!](https://blog.csdn.net/csdnnews/article/details/136161570)
+# 3. Run
+python -m ufo --task <your_task_name>
+```
-- [Microsoft発のオープンソース版「UFO」登場! Windowsを自動操縦するAIエージェントを試す](https://internet.watch.impress.co.jp/docs/column/shimizu/1570581.html)
+**📖 [Complete UFO² Quick Start Guide →](getting_started/quick_start_ufo2.md)**
+**⚙️ [UFO² Configuration Details →](configuration/system/agents_config.md)**
-## ❓Get help
-* ❔GitHub Issues (prefered)
-* For other communications, please contact [ufo-agent@microsoft.com](mailto:ufo-agent@microsoft.com)
---
-## 📚 Citation
+## 📚 Documentation Navigation
+
+### 🎯 Getting Started
+
+Start here if you're new to UFO³:
+
+| Guide | Description | Framework |
+|-------|-------------|-----------|
+| [Galaxy Quick Start](getting_started/quick_start_galaxy.md) | Set up multi-device orchestration in 10 minutes | 🌌 Galaxy |
+| [UFO² Quick Start](getting_started/quick_start_ufo2.md) | Start automating Windows in 5 minutes | 🪟 UFO² |
+| [Linux Agent Quick Start](getting_started/quick_start_linux.md) | Automate Linux systems | 🐧 Linux |
+| [Mobile Agent Quick Start](getting_started/quick_start_mobile.md) | Automate Android devices via ADB | 📱 Mobile |
+| [Choosing Your Path](choose_path.md) | Decision guide for selecting the right framework | Both |
+
+### 🏗️ Core Architecture
+
+Understand how UFO³ works under the hood:
+
+| Topic | Description | Framework |
+|-------|-------------|-----------|
+| [Galaxy Overview](galaxy/overview.md) | Multi-device orchestration architecture | 🌌 Galaxy |
+| [UFO² Overview](ufo2/overview.md) | Desktop AgentOS architecture and concepts | 🪟 UFO² |
+| [Task Constellation](galaxy/constellation/overview.md) | DAG-based workflow representation | 🌌 Galaxy |
+| [ConstellationAgent](galaxy/constellation_agent/overview.md) | Intelligent task planner and decomposer | 🌌 Galaxy |
+| [Task Orchestrator](galaxy/constellation_orchestrator/overview.md) | Execution engine and coordinator | 🌌 Galaxy |
+| [AIP Protocol](aip/overview.md) | Agent communication protocol | 🌌 Galaxy |
+
+### ⚙️ Configuration & Setup
+
+Configure your agents, models, and environments:
+
+| Topic | Description | Framework |
+|-------|-------------|-----------|
+| [Agent Configuration](configuration/system/agents_config.md) | LLM and agent settings | Both |
+| [Galaxy Devices](configuration/system/galaxy_devices.md) | Device pool and capability management | 🌌 Galaxy |
+| [Model Providers](configuration/models/overview.md) | Supported LLMs (OpenAI, Azure, Qwen, etc.) | Both |
-If you build on this work, please cite our the AgentOS framework:
+### 🎓 Tutorials & Examples
-**UFO² – The Desktop AgentOS (2025)**
-
+Learn through practical examples in the documentation:
+
+| Topic | Description | Framework |
+|-------|-------------|-----------|
+| [Creating App Agents](tutorials/creating_app_agent/overview.md) | Build custom application agents | 🪟 UFO² |
+| [Multi-Action Prediction](ufo2/core_features/multi_action.md) | Efficient batch predictions | 🪟 UFO² |
+| [Knowledge Substrate](ufo2/core_features/knowledge_substrate/overview.md) | RAG-enhanced learning | 🪟 UFO² |
+
+### 🔧 Advanced Topics
+
+Deep dive into powerful features:
+
+| Topic | Description | Framework |
+|-------|-------------|-----------|
+| [Multi-Action Prediction](ufo2/core_features/multi_action.md) | Batch actions for 51% fewer LLM calls | 🪟 UFO² |
+| [Hybrid Detection](ufo2/core_features/control_detection/hybrid_detection.md) | Visual + UIA control detection | 🪟 UFO² |
+| [Knowledge Substrate](ufo2/core_features/knowledge_substrate/overview.md) | RAG-enhanced learning | 🪟 UFO² |
+| [Constellation Agent](galaxy/constellation_agent/overview.md) | Task planning and decomposition | 🌌 Galaxy |
+| [Task Orchestrator](galaxy/constellation_orchestrator/overview.md) | Execution coordination | 🌌 Galaxy |
+
+### 🛠️ Development & Extension
+
+Customize and extend UFO³:
+
+| Topic | Description |
+|-------|-------------|
+| [Project Structure](project_directory_structure.md) | Understand the codebase layout |
+| [Creating Custom Device Agents](tutorials/creating_device_agent/overview.md) | Build device agents for new platforms (mobile, web, IoT, etc.) |
+| [Creating App Agents](tutorials/creating_app_agent/overview.md) | Build custom application agents |
+| [Contributing Guide](about/CONTRIBUTING.md) | How to contribute to UFO³ |
+
+### ❓ Support & Troubleshooting
+
+Get help when you need it:
+
+| Resource | What You'll Find |
+|----------|------------------|
+| [FAQ](faq.md) | Common questions and answers |
+| [GitHub Discussions](https://github.com/microsoft/UFO/discussions) | Community Q&A |
+| [GitHub Issues](https://github.com/microsoft/UFO/issues) | Bug reports and feature requests |
+
+---
+
+## 📊 Feature Matrix
+
+| Feature | UFO² Desktop AgentOS | UFO³ Galaxy | Winner |
+|---------|:--------------------:|:-----------:|:------:|
+| **Windows Automation** | ⭐⭐⭐⭐⭐ Optimized | ⭐⭐⭐⭐ Supported | UFO² |
+| **Cross-Device Tasks** | ❌ Not supported | ⭐⭐⭐⭐⭐ Core feature | Galaxy |
+| **Setup Complexity** | ⭐⭐⭐⭐⭐ Very easy | ⭐⭐⭐ Moderate | UFO² |
+| **Learning Curve** | ⭐⭐⭐⭐⭐ Gentle | ⭐⭐⭐ Moderate | UFO² |
+| **Task Complexity** | ⭐⭐⭐ Good | ⭐⭐⭐⭐⭐ Excellent | Galaxy |
+| **Parallel Execution** | ❌ Sequential | ⭐⭐⭐⭐⭐ Native DAG | Galaxy |
+| **Stability** | ⭐⭐⭐⭐⭐ Stable | ⭐⭐⭐ Active dev | UFO² |
+| **Monitoring Tools** | ⭐⭐⭐ Logs | ⭐⭐⭐⭐⭐ Real-time viz | Galaxy |
+| **API Flexibility** | ⭐⭐⭐ Good | ⭐⭐⭐⭐⭐ Extensive | Galaxy |
+
+---
+
+## 🎯 Use Cases & Examples
+
+Explore what you can build with UFO³:
+
+### 🌌 Galaxy Use Cases (Cross-Device)
+
+Perfect for complex, multi-device workflows:
+
+- **Cross-Platform Data Pipelines**: Extract from Windows Excel → Process on Linux → Visualize on Mac
+- **Distributed Testing**: Run tests on Windows → Deploy to Linux → Update mobile app
+- **Multi-Device Monitoring**: Collect logs from multiple devices → Aggregate centrally
+- **Complex Automation**: Orchestrate workflows across heterogeneous platforms
+
+### 🪟 UFO² Use Cases (Windows)
+
+Perfect for Windows automation and rapid task execution:
+
+- **Office Automation**: Excel/Word/PowerPoint report generation and data processing
+- **Web Automation**: Browser-based research, form filling, data extraction
+- **File Management**: Organize, rename, convert files based on rules
+- **System Tasks**: Windows configuration, software installation, backups
+
+---
+
+## 🌐 Community & Resources
+
+### 📺 Media & Videos
+
+Check out our official deep dive of UFO on [YouTube](https://www.youtube.com/watch?v=QT_OhygMVXU).
+
+### Media Coverage
+- [微软正式开源UFO²,Windows桌面迈入「AgentOS 时代」](https://www.jiqizhixin.com/articles/2025-05-06-13)
+- [Microsoft's UFO: Smarter Windows Experience](https://the-decoder.com/microsofts-ufo-abducts-traditional-user-interfaces-for-a-smarter-windows-experience/)
+- [下一代Windows系统曝光:基于GPT-4V](https://baijiahao.baidu.com/s?id=1790938358152188625)
+
+### 💬 Get Help & Connect
+- **📖 Documentation**: You're here! Browse the navigation above
+- **💬 Discussions**: [GitHub Discussions](https://github.com/microsoft/UFO/discussions)
+- **🐛 Issues**: [GitHub Issues](https://github.com/microsoft/UFO/issues)
+- **📧 Email**: [ufo-agent@microsoft.com](mailto:ufo-agent@microsoft.com)
+
+### 🎨 Related Projects
+- **[TaskWeaver](https://github.com/microsoft/TaskWeaver)** – Code-first LLM agent framework
+- **[Windows Agent Arena](https://github.com/nice-mee/WindowsAgentArena)** – Evaluation benchmark
+- **[GUI Agents Survey](https://vyokky.github.io/LLM-Brained-GUI-Agents-Survey/)** – Latest research
+
+---
+
+## 📚 Research & Citation
+
+UFO³ is built on cutting-edge research in multi-agent systems and GUI automation.
+
+### Papers
+
+If you use UFO³ in your research, please cite:
+
+**UFO³ Galaxy Framework (2025)**
+```bibtex
+@article{zhang2025ufo3,
+ title={UFO$^3$: Weaving the Digital Agent Galaxy},
+ author = {Zhang, Chaoyun and Li, Liqun and Huang, He and Ni, Chiming and Qiao, Bo and Qin, Si and Kang, Yu and Ma, Minghua and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei},
+ journal = {arXiv preprint arXiv:2511.11332},
+ year = {2025},
+}
+```
+
+**UFO² Desktop AgentOS (2025)**
```bibtex
@article{zhang2025ufo2,
title = {{UFO2: The Desktop AgentOS}},
@@ -95,8 +455,7 @@ If you build on this work, please cite our the AgentOS framework:
}
```
-**UFO – A UI‑Focused Agent for Windows OS Interaction (2024)**
-
+**Original UFO (2024)**
```bibtex
@article{zhang2024ufo,
title = {{UFO: A UI-Focused Agent for Windows OS Interaction}},
@@ -106,24 +465,71 @@ If you build on this work, please cite our the AgentOS framework:
}
```
+**📖 [Read the Papers →](https://arxiv.org/abs/2504.14603)**
---
-## 📝 Roadmap
+## 🗺️ Roadmap & Future
+
+### UFO² Desktop AgentOS (Stable/LTS)
+- ✅ Long-term support and maintenance
+- ✅ Windows device agent integration
+- 🔜 Enhanced device capabilities
+- 🔜 Picture-in-Picture mode
+
+### UFO³ Galaxy (Active Development)
+- ✅ Constellation Framework
+- ✅ Multi-device coordination
+- 🔄 Mobile, Web, IoT agents
+- 🔄 Interactive visualization
+- 🔜 Advanced fault tolerance
-The UFO² team is actively working on the following features and improvements:
+**Legend:** ✅ Done | 🔄 In Progress | 🔜 Planned
-- [ ] **Picture‑in‑Picture Mode** – Completed and will be available in the next release
-- [ ] **AgentOS‑as‑a‑Service** – Completed and will be available in the next release
-- [ ] **Auto‑Debugging Toolkit** – Completed and will be available in the next release
-- [ ] **Integration with MCP and Agent2Agent Communication** – Planned; under implementation
+---
+
+## ⚖️ License & Legal
+
+- **License**: [MIT License](https://github.com/microsoft/UFO/blob/main/LICENSE)
+- **Disclaimer**: [Read our disclaimer](https://github.com/microsoft/UFO/blob/main/DISCLAIMER.md)
+- **Trademarks**: [Microsoft Trademark Guidelines](https://www.microsoft.com/legal/intellectualproperty/trademarks)
+- **Contributing**: [Contribution Guidelines](about/CONTRIBUTING.md)
---
-## 🎨 Related Projects
-- **TaskWeaver** — a code‑first LLM agent for data analytics:
-- **LLM‑Brained GUI Agents: A Survey**: • [GitHub](https://github.com/vyokky/LLM-Brained-GUI-Agents-Survey) • [Interactive site](https://vyokky.github.io/LLM-Brained-GUI-Agents-Survey/)
+
+## 🚀 Ready to Start?
+
+Choose your framework and begin your automation journey:
+
+
+### 🌌 Start with Galaxy
+**For multi-device orchestration**
+
+[](getting_started/quick_start_galaxy.md)
+
+
+### 🪟 Start with UFO²
+**For Windows automation**
+
+[](getting_started/quick_start_ufo2.md)
+
+
+### 📖 Explore the Documentation
+
+[Core Concepts](galaxy/overview.md) | [Configuration](configuration/system/agents_config.md) | [FAQ](faq.md) | [GitHub](https://github.com/microsoft/UFO)
+
+
+---
+
+
+
+
+ From Single Agent to Digital Galaxy
+
+ UFO³ - Weaving the Future of Intelligent Automation
+
---
@@ -134,4 +540,4 @@ The UFO² team is actively working on the following features and improvements:
gtag('js', new Date());
gtag('config', 'G-FX17ZGJYGC');
-
+
\ No newline at end of file
diff --git a/documents/docs/infrastructure/agents/agent_types.md b/documents/docs/infrastructure/agents/agent_types.md
new file mode 100644
index 000000000..4c444297b
--- /dev/null
+++ b/documents/docs/infrastructure/agents/agent_types.md
@@ -0,0 +1,1075 @@
+# Platform-Specific Agent Implementations
+
+This document describes how the unified three-layer Device Agent architecture is implemented across different platforms. While the core framework (State, Processor, Command layers) remains consistent, each platform implements specialized agents optimized for its native control mechanisms and hierarchies. Understanding these implementations is essential for extending UFO3 to new platforms or customizing existing agents.
+
+## Overview
+
+UFO3's Device Agent architecture achieves cross-platform compatibility through **platform-specific agent implementations** that inherit from a common abstract framework. Each platform's agents implement the same `BasicAgent` interface while adapting the three-layer architecture to their unique execution environments:
+
+```mermaid
+graph TB
+ subgraph "Unified Framework"
+ Framework[Three-Layer Architecture State + Processor + Command]
+ end
+
+ subgraph "Windows Platform"
+ HostAgent[HostAgent Application Selection]
+ AppAgent[AppAgent Application Control]
+
+ HostAgent -->|Delegates| AppAgent
+ end
+
+ subgraph "Linux Platform"
+ LinuxAgent[LinuxAgent Shell Commands]
+ end
+
+ subgraph "Future Platforms"
+ MacAgent[macOS Agent]
+ AndroidAgent[Android Agent]
+ IOSAgent[iOS Agent]
+ end
+
+ Framework -.Implements.-> HostAgent
+ Framework -.Implements.-> AppAgent
+ Framework -.Implements.-> LinuxAgent
+ Framework -.Extends to.-> MacAgent
+ Framework -.Extends to.-> AndroidAgent
+ Framework -.Extends to.-> IOSAgent
+
+ style Framework fill:#fff4e1
+ style HostAgent fill:#e1f5ff
+ style AppAgent fill:#e1f5ff
+ style LinuxAgent fill:#c8e6c9
+ style MacAgent fill:#f0f0f0
+ style AndroidAgent fill:#f0f0f0
+ style IOSAgent fill:#f0f0f0
+```
+
+**Unified Framework Benefits:**
+
+- **Code Reuse**: State management, strategy orchestration, and command dispatch logic shared across platforms
+- **Consistent Interface**: All agents implement the `BasicAgent` interface with the same lifecycle (`handle`, `next_state`, `next_agent`)
+- **Extensibility**: New platforms inherit the three-layer architecture and implement only platform-specific strategies and commands
+- **Multi-Platform Coordination**: HostAgent on Windows can coordinate with LinuxAgent on a Linux device via Blackboard
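
The shared lifecycle named above can be pictured with a small sketch. `ToyAgent`, `CountdownAgent`, and `run` are hypothetical stand-ins written for illustration only; they are not the real `BasicAgent` API, and the string statuses are an assumption.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import List, Optional


class ToyAgent(ABC):
    """Hypothetical stand-in for BasicAgent (not the real UFO class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.status = "CONTINUE"

    @abstractmethod
    def handle(self, log: List[str]) -> None:
        """Perform one step of work and update self.status."""

    def next_state(self) -> str:
        """Report the state the agent moves to after handle()."""
        return self.status

    @abstractmethod
    def next_agent(self) -> Optional[ToyAgent]:
        """Return the agent that runs next, or None when finished."""


class CountdownAgent(ToyAgent):
    """Finishes after a fixed number of steps."""

    def __init__(self, name: str, steps: int) -> None:
        super().__init__(name)
        self.remaining = steps

    def handle(self, log: List[str]) -> None:
        self.remaining -= 1
        log.append(f"{self.name}: {self.remaining} step(s) left")
        if self.remaining <= 0:
            self.status = "FINISH"

    def next_agent(self) -> Optional[ToyAgent]:
        return None if self.next_state() == "FINISH" else self


def run(agent: Optional[ToyAgent], max_steps: int = 10) -> List[str]:
    """Drive the handle -> next_agent loop until no agent is left."""
    log: List[str] = []
    steps = 0
    while agent is not None and steps < max_steps:
        agent.handle(log)
        agent = agent.next_agent()
        steps += 1
    return log
```

Because the driver loop only depends on the three lifecycle methods, a new platform agent would plug in without changes to `run`.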
+
+---
+
+## Platform Comparison
+
+| Feature | Windows (Two-Tier) | Linux (Single-Tier) | Future (macOS, Mobile) |
+|---------|-------------------|---------------------|------------------------|
+| **Agent Hierarchy** | HostAgent → AppAgent delegation | LinuxAgent (flat) | Platform-specific (TBD) |
+| **Observation Method** | UI Automation API (COM) | Shell output, accessibility tree | Platform APIs (Accessibility, Screen) |
+| **Action Mechanism** | UI element manipulation (click, type) | Shell command execution | Platform-specific controls |
+| **Application Model** | Windowed applications | Command-line tools, X11 apps | Application frameworks |
+| **State Complexity** | 7 states (CONTINUE, FINISH, CONFIRM, etc.) | Simplified state set | Platform-dependent |
+| **Multi-Agent Coordination** | HostAgent ↔ AppAgent via Blackboard | N/A (single agent per device) | Cross-device via Blackboard |
+| **Primary Use Cases** | Office automation, GUI apps | Server management, DevOps | Mobile apps, embedded systems |
+
+!!! info "Platform Selection Strategy"
+    - **Windows**: Use HostAgent + AppAgent for GUI-based applications requiring multi-step workflows (e.g., Excel data analysis, Word document editing)
+    - **Linux**: Use LinuxAgent for command-line tasks, server administration, scripting workflows
+    - **Cross-Platform**: Coordinate Windows and Linux agents via Blackboard for hybrid tasks (e.g., Windows collects data, Linux processes on server)
+
+---
+
+## Windows Platform: Two-Tier Agent Hierarchy
+
+Windows implements a **two-tier hierarchy** where HostAgent manages application selection and task decomposition, delegating execution to AppAgent instances for specific applications.
+
+### Architecture Overview
+
+```mermaid
+graph TB
+ subgraph "Windows Two-Tier Hierarchy"
+ User[User Request: 'Create chart in Excel from Word data']
+
+ HostAgent[HostAgent Task Orchestrator]
+
+ AppAgent1[AppAgent Microsoft Word]
+ AppAgent2[AppAgent Microsoft Excel]
+
+ User --> HostAgent
+ HostAgent -->|1. Extract data| AppAgent1
+ HostAgent -->|2. Create chart| AppAgent2
+
+ AppAgent1 -.Result via Blackboard.-> HostAgent
+ AppAgent2 -.Result via Blackboard.-> HostAgent
+ end
+
+ subgraph "HostAgent Responsibilities"
+ TaskDecomp[Task Decomposition]
+ AppSelect[Application Selection]
+ SubtaskDist[Subtask Distribution]
+ ResultAgg[Result Aggregation]
+ end
+
+ subgraph "AppAgent Responsibilities"
+ UIObserve[UI Observation]
+ ActionExec[Action Execution]
+ AppControl[Application Control]
+ end
+
+ HostAgent --> TaskDecomp
+ HostAgent --> AppSelect
+ HostAgent --> SubtaskDist
+ HostAgent --> ResultAgg
+
+ AppAgent1 --> UIObserve
+ AppAgent1 --> ActionExec
+ AppAgent1 --> AppControl
+
+ style HostAgent fill:#fff4e1
+ style AppAgent1 fill:#e1f5ff
+ style AppAgent2 fill:#e1f5ff
+```
+
+**Two-Tier Execution Flow Example:**
+
+**User Request**: "Extract data from sales.docx and create a bar chart in Excel"
+
+**HostAgent**:
+1. Analyzes request → Identifies need for Word + Excel
+2. Creates subtask 1: "Extract sales data from Word document"
+3. Delegates to AppAgent (Word) via `next_agent()`
+
+**AppAgent (Word)**:
+1. Observes Word UI, locates sales data table
+2. Executes `select_text` + `copy_to_clipboard` actions
+3. Writes result to Blackboard: `blackboard.add_data(data, blackboard.trajectories)`
+4. Returns to HostAgent via `next_agent(HostAgent)`
+
+**HostAgent**:
+1. Reads result from Blackboard
+2. Creates subtask 2: "Create bar chart in Excel from extracted data"
+3. Delegates to AppAgent (Excel) via `next_agent()`
+
+**AppAgent (Excel)**:
+1. Reads data from Blackboard trajectories
+2. Executes actions: `paste_data` → `select_data_range` → `insert_chart`
+3. Returns to HostAgent with `AgentStatus.FINISH`
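
The execution flow above can be simulated with a toy blackboard. This is a minimal sketch: `ToyBlackboard` and the three agent functions are hypothetical, the `requests` and `trajectories` memories are modeled as plain lists, and the extracted rows are made-up sample data.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ToyBlackboard:
    """Hypothetical stand-in for the shared Blackboard."""

    requests: List[dict] = field(default_factory=list)
    trajectories: List[dict] = field(default_factory=list)

    def add_data(self, data: dict, memory: List[dict]) -> None:
        # Store a copy so later mutations by the caller do not leak in.
        memory.append(dict(data))


def word_agent(bb: ToyBlackboard) -> None:
    """Pretend to extract a table from Word and publish it."""
    rows = [["Q1", 120], ["Q2", 150]]  # made-up sample data
    bb.add_data({"app": "Word", "result": rows}, bb.trajectories)


def excel_agent(bb: ToyBlackboard) -> None:
    """Read the latest result and pretend to chart it."""
    rows = bb.trajectories[-1]["result"]
    summary = f"bar chart with {len(rows)} bars"
    bb.add_data({"app": "Excel", "result": summary}, bb.trajectories)


def host_agent(bb: ToyBlackboard) -> str:
    """Write each subtask to the blackboard, then delegate it."""
    plan: List[Tuple[str, Callable[[ToyBlackboard], None]]] = [
        ("Extract sales data from Word document", word_agent),
        ("Create bar chart in Excel from extracted data", excel_agent),
    ]
    for subtask, agent in plan:
        bb.add_data({"subtask": subtask}, bb.requests)
        agent(bb)
    return bb.trajectories[-1]["result"]
```

The agents never call each other directly; all data passes through the blackboard, which is what lets the same pattern stretch to agents on other devices.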
+
+---
+
+## HostAgent: Application Selection and Task Orchestration
+
+The **HostAgent** is the top-level coordinator in the Windows two-tier hierarchy, responsible for **application selection**, **task decomposition**, and **subtask distribution**.
+
+### HostAgent Architecture
+
+```python
+@AgentRegistry.register(agent_name="hostagent")
+class HostAgent(BasicAgent):
+    """
+    The HostAgent class is the manager of AppAgents.
+    Coordinates multi-application workflows on Windows.
+    """
+
+    def __init__(
+        self,
+        name: str,
+        is_visual: bool,
+        main_prompt: str,
+        example_prompt: str,
+        api_prompt: str,
+    ) -> None:
+        super().__init__(name=name)
+        self.prompter = HostAgentPrompter(is_visual, main_prompt, example_prompt, api_prompt)
+        self.agent_factory = AgentFactory()
+        self.appagent_dict = {}  # Cache of created AppAgent instances
+        self._active_appagent = None
+        self._blackboard = Blackboard()  # Shared coordination space
+        self.set_state(self.default_state)
+```
+
+### Key Responsibilities
+
+| Responsibility | Implementation | Example |
+|----------------|----------------|---------|
+| **Task Decomposition** | LLM analyzes user request, breaks into subtasks | "Create report" → ["Extract data", "Generate chart", "Format document"] |
+| **Application Selection** | Identifies required applications for each subtask | Subtask "Extract data" → Microsoft Word |
+| **AppAgent Creation** | Factory pattern creates AppAgent instances on-demand | `agent_factory.create_agent("app", process="WINWORD.EXE")` |
+| **Subtask Delegation** | Routes subtasks to appropriate AppAgent | `next_agent()` → AppAgent(Word) |
+| **Result Aggregation** | Collects results from AppAgents via Blackboard | `blackboard.get_value("appagent/word/result")` |
+| **Multi-App Coordination** | Sequences actions across multiple applications | Word → Excel → PowerPoint workflow |
+
+### HostAgent Processor
+
+```python
+class HostAgentProcessor(ProcessorTemplate):
+    """
+    Processor for HostAgent with specialized strategies.
+    """
+
+    def __init__(self, agent, context):
+        super().__init__(agent, context)
+
+        # DATA_COLLECTION: Get list of running applications
+        self.register_strategy(
+            ProcessingPhase.DATA_COLLECTION,
+            HostDataCollectionStrategy(agent, context)
+        )
+
+        # LLM_INTERACTION: Application selection and task planning
+        self.register_strategy(
+            ProcessingPhase.LLM_INTERACTION,
+            HostLLMInteractionStrategy(agent, context)
+        )
+
+        # ACTION_EXECUTION: Create AppAgent, delegate subtask
+        self.register_strategy(
+            ProcessingPhase.ACTION_EXECUTION,
+            HostActionExecutionStrategy(agent, context)
+        )
+```
+
+**HostAgent Strategy Specializations:**
+
+- **DATA_COLLECTION**: Uses MCP tools to observe available Windows apps
+- **LLM_INTERACTION**: Specialized prompt template for application selection:
+  - Input: User request + list of running apps
+  - Output: Selected application + decomposed subtask
+- **ACTION_EXECUTION**: Instead of executing UI commands, creates/retrieves AppAgent instance and delegates via `next_agent()`
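
The phase-by-phase specialization above follows a strategy-registration pattern, which can be sketched as follows. `Phase`, `ToyProcessor`, and the three strategy functions are hypothetical; in particular `choose_app` replaces the LLM call with simple keyword matching, purely as an assumption for the example.

```python
from enum import Enum
from typing import Callable, Dict


class Phase(Enum):
    DATA_COLLECTION = 1
    LLM_INTERACTION = 2
    ACTION_EXECUTION = 3


Strategy = Callable[[dict], dict]


class ToyProcessor:
    """Hypothetical stand-in for ProcessorTemplate."""

    def __init__(self) -> None:
        self._strategies: Dict[Phase, Strategy] = {}

    def register_strategy(self, phase: Phase, strategy: Strategy) -> None:
        self._strategies[phase] = strategy

    def process(self, state: dict) -> dict:
        # Enum members iterate in definition order, so phases run in sequence.
        for phase in Phase:
            strategy = self._strategies.get(phase)
            if strategy is not None:
                state = strategy(state)
        return state


def collect_apps(state: dict) -> dict:
    """DATA_COLLECTION: list the applications that are available."""
    return {**state, "apps": ["WINWORD.EXE", "EXCEL.EXE"]}


def choose_app(state: dict) -> dict:
    """LLM_INTERACTION stand-in: pick an app by keyword, not by LLM."""
    keywords = {"WINWORD.EXE": "word", "EXCEL.EXE": "excel"}
    request = state["request"].lower()
    chosen = next(
        (app for app in state["apps"] if keywords[app] in request),
        state["apps"][0],
    )
    return {**state, "application": chosen}


def delegate(state: dict) -> dict:
    """ACTION_EXECUTION: record which AppAgent receives the subtask."""
    return {**state, "delegated_to": f"AppAgent/{state['application']}"}


processor = ToyProcessor()
processor.register_strategy(Phase.DATA_COLLECTION, collect_apps)
processor.register_strategy(Phase.LLM_INTERACTION, choose_app)
processor.register_strategy(Phase.ACTION_EXECUTION, delegate)
```

Swapping one registered strategy changes a single phase while the processing order stays fixed.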
+
+### HostAgent State Transitions
+
+```mermaid
+stateDiagram-v2
+ [*] --> CONTINUE: User request received
+
+ CONTINUE --> CONTINUE: Select app, delegate to AppAgent
+ CONTINUE --> FINISH: All subtasks completed
+ CONTINUE --> CONFIRM: Need user confirmation
+ CONTINUE --> ERROR: Application selection failed
+
+ CONFIRM --> CONTINUE: User confirms
+ CONFIRM --> FINISH: User rejects
+
+ ERROR --> FINISH: Unrecoverable error
+ FINISH --> [*]
+
+ note right of CONTINUE
+ HostAgent delegates to AppAgent
+ via next_agent() method
+ end note
+
+ note right of FINISH
+ HostAgent aggregates results
+ from all AppAgents
+ end note
+```
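
The diagram above can also be written as a transition table, which makes it easy to check whether a sequence of states is legal. This is a sketch derived only from the edges drawn in the diagram; `ALLOWED` and `is_valid_path` are illustrative names, and the real state classes may permit other transitions.

```python
from typing import Dict, List, Set

# Edges copied from the state diagram; FINISH is terminal.
ALLOWED: Dict[str, Set[str]] = {
    "CONTINUE": {"CONTINUE", "FINISH", "CONFIRM", "ERROR"},
    "CONFIRM": {"CONTINUE", "FINISH"},
    "ERROR": {"FINISH"},
    "FINISH": set(),
}


def is_valid_path(path: List[str]) -> bool:
    """Return True if every consecutive pair of states is an edge."""
    return all(
        nxt in ALLOWED.get(cur, set())
        for cur, nxt in zip(path, path[1:])
    )
```

For example, `["CONTINUE", "CONFIRM", "CONTINUE", "FINISH"]` is accepted, while `["ERROR", "CONTINUE"]` is rejected because the diagram only lets ERROR move to FINISH.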
+
+**HostAgent Delegation Pattern Example:**
+
+```python
+class HostAgent(BasicAgent):
+    def handle(self, context: Context) -> Tuple[AgentStatus, Optional[BasicAgent]]:
+        """
+        Handle HostAgent state: select application and delegate.
+        """
+        # Execute processor strategies
+        processor = HostAgentProcessor(self, context)
+        result = processor.process()
+
+        # Get selected application from LLM response
+        selected_app = result.parsed_response.get("application")
+        subtask = result.parsed_response.get("subtask")
+
+        # Create or retrieve AppAgent for selected application
+        appagent = self.get_or_create_appagent(selected_app)
+
+        # Write subtask to Blackboard for AppAgent to read
+        self._blackboard.add_data(
+            {"subtask": subtask, "app": selected_app},
+            self._blackboard.requests
+        )
+
+        # Delegate to AppAgent
+        return AgentStatus.CONTINUE, appagent
+
+    def get_or_create_appagent(self, app_name: str) -> AppAgent:
+        """
+        Factory method: Create AppAgent if not exists, otherwise return cached instance.
+        """
+        if app_name not in self.appagent_dict:
+            self.appagent_dict[app_name] = self.agent_factory.create_agent(
+                agent_type="app",
+                name=f"AppAgent/{app_name}",
+                process_name=app_name,
+                app_root_name=app_name
+            )
+        return self.appagent_dict[app_name]
+```
+
+---
+
+## AppAgent: Application-Specific Control
+
+The **AppAgent** is responsible for **direct control of a specific Windows application**, executing UI-based actions through Windows UI Automation APIs.
+
+### AppAgent Architecture
+
+```python
+@AgentRegistry.register(agent_name="appagent", processor_cls=AppAgentProcessor)
+class AppAgent(BasicAgent):
+    """
+    The AppAgent class manages interaction with a specific Windows application.
+    """
+
+    def __init__(
+        self,
+        name: str,
+        process_name: str,
+        app_root_name: str,
+        is_visual: bool,
+        main_prompt: str,
+        example_prompt: str,
+        mode: str = "normal",
+    ) -> None:
+        super().__init__(name=name)
+        self.prompter = AppAgentPrompter(is_visual, main_prompt, example_prompt)
+        self._process_name = process_name  # e.g., "WINWORD.EXE"
+        self._app_root_name = app_root_name  # e.g., "Microsoft Word"
+        self._mode = mode
+        self.set_state(self.default_state)
+```
+
+### Key Responsibilities
+
+| Responsibility | Implementation | Example |
+|----------------|----------------|---------|
+| **UI Observation** | Screenshot + UI Automation tree capture | `get_ui_tree` returns hierarchical element structure |
+| **Element Identification** | Parse UI tree to locate target elements | Find "Save" button by name, control type, bounding box |
+| **Action Execution** | Execute UI commands via MCP tools | `click_element(element_id="save_button")` |
+| **Application Context** | Maintain application-specific state | Current document, active window, focus element |
+| **Error Handling** | Detect and recover from UI failures | Retry on stale element, fallback to keyboard shortcuts |
+| **Result Reporting** | Write results to Blackboard for HostAgent | `blackboard.add_key_value("result", "Document saved")` |
+
+### AppAgent Processor
+
+```python
+class AppAgentProcessor(ProcessorTemplate):
+    """
+    Processor for AppAgent with UI-focused strategies.
+    """
+
+    def __init__(self, agent, context):
+        super().__init__(agent, context)
+
+        # DATA_COLLECTION: Screenshot + UI tree
+        self.register_strategy(
+            ProcessingPhase.DATA_COLLECTION,
+            ComposedStrategy([
+                ScreenshotStrategy(agent, context),
+                UITreeStrategy(agent, context)
+            ])
+        )
+
+        # LLM_INTERACTION: UI element selection
+        self.register_strategy(
+            ProcessingPhase.LLM_INTERACTION,
+            AppAgentLLMStrategy(agent, context)
+        )
+
+        # ACTION_EXECUTION: Execute UI commands
+        self.register_strategy(
+            ProcessingPhase.ACTION_EXECUTION,
+            UIActionExecutionStrategy(agent, context)
+        )
+```
+
+### Windows UI Automation Integration
+
+AppAgent leverages **Windows UI Automation (UIA)** for robust UI control:
+
+```mermaid
+graph LR
+ subgraph "AppAgent Observation"
+ AppAgent[AppAgent]
+
+ Screenshot[Screenshot Visual Context]
+ UITree[UI Automation Tree Element Hierarchy]
+
+ AppAgent --> Screenshot
+ AppAgent --> UITree
+ end
+
+ subgraph "Windows UI Automation"
+ UIA[UI Automation API]
+
+ Elements[UI Elements Button, TextBox, etc.]
+ Properties[Element Properties Name, Type, BoundingBox]
+ Patterns[Control Patterns Invoke, Value, Selection]
+
+ UIA --> Elements
+ UIA --> Properties
+ UIA --> Patterns
+ end
+
+ UITree -.Query.-> UIA
+
+ subgraph "Action Execution"
+ Commands[MCP Commands]
+
+ Click[click_element]
+ Type[type_text]
+ Select[select_item]
+
+ Commands --> Click
+ Commands --> Type
+ Commands --> Select
+ end
+
+ AppAgent --> Commands
+ Commands -.Invoke.-> Patterns
+
+ style AppAgent fill:#e1f5ff
+ style UIA fill:#fff4e1
+ style Commands fill:#ffe1e1
+```
+
+**UI Automation Capabilities:**
+
+- **Element Discovery**: Traverse UI tree to find controls by name, type, automation ID
+- **Property Access**: Read element properties (text, state, position, visibility)
+- **Pattern Invocation**: Execute control-specific actions:
+  - InvokePattern: Click buttons, menu items
+  - ValuePattern: Set text in textboxes
+  - SelectionPattern: Select items in lists, dropdowns
+  - TogglePattern: Toggle checkboxes, radio buttons
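
Element discovery, as described in the list above, amounts to a tree search over element properties. The sketch below uses a hypothetical in-memory `UINode` tree rather than the real UI Automation API, so it only illustrates the matching logic; the sample window and its automation IDs are invented.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class UINode:
    """Hypothetical, simplified view of one UI Automation element."""

    name: str
    control_type: str
    automation_id: str = ""
    children: List["UINode"] = field(default_factory=list)


def walk(node: UINode) -> Iterator[UINode]:
    """Yield the node and all descendants in depth-first order."""
    yield node
    for child in node.children:
        yield from walk(child)


def find(
    root: UINode,
    name: Optional[str] = None,
    control_type: Optional[str] = None,
    automation_id: Optional[str] = None,
) -> Optional[UINode]:
    """Return the first element matching every criterion that was given."""
    wanted = {
        "name": name,
        "control_type": control_type,
        "automation_id": automation_id,
    }
    for node in walk(root):
        if all(v is None or getattr(node, k) == v for k, v in wanted.items()):
            return node
    return None


# Invented sample tree for demonstration.
window = UINode("Document1 - Word", "Window", children=[
    UINode("Ribbon", "Pane", children=[
        UINode("Save", "Button", automation_id="FileSave"),
        UINode("Save", "MenuItem"),
    ]),
    UINode("Body", "Edit", automation_id="Body"),
])
```

Combining criteria matters when names repeat: `find(window, name="Save")` returns the button because it comes first, while adding `control_type="MenuItem"` selects the menu entry.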
+
+### AppAgent Commands
+
+| Command Category | Commands | Description |
+|-----------------|----------|-------------|
+| **Observation** | `screenshot`, `get_ui_tree`, `get_accessibility_tree` | Capture visual and structural UI information |
+| **Navigation** | `click_element`, `double_click`, `right_click` | Navigate UI through mouse interactions |
+| **Text Input** | `type_text`, `set_value`, `clear_text` | Input and modify text in UI controls |
+| **Selection** | `select_item`, `select_dropdown`, `toggle_checkbox` | Manipulate selection controls |
+| **Scrolling** | `scroll`, `scroll_to_element` | Navigate large UI areas |
+| **Window Management** | `activate_window`, `close_window`, `maximize_window` | Control window state |
+| **File Operations** | `open_file`, `save_file`, `save_as` | Application-specific file actions |
+
+**AppAgent UI Control Pattern Example:**
+
+```python
+class AppAgent(BasicAgent):
+    def handle(self, context: Context) -> Tuple[AgentStatus, Optional[BasicAgent]]:
+        """
+        Handle AppAgent state: Control application UI.
+        """
+        # Read subtask from Blackboard (written by HostAgent)
+        subtask = None  # Stays None if HostAgent has not written a request yet
+        subtask_memory = self._blackboard.requests.to_list_of_dicts()
+        if subtask_memory:
+            subtask = subtask_memory[-1].get("subtask")
+
+        # Execute processor strategies
+        processor = AppAgentProcessor(self, context)
+        context.set(ContextNames.REQUEST, subtask)
+        result = processor.process()
+
+        # Check if subtask completed
+        if result.status == AgentStatus.FINISH:
+            # Write result to Blackboard
+            self._blackboard.add_data(
+                {"result": result.parsed_response.get("result")},
+                self._blackboard.trajectories
+            )
+
+            # Return to HostAgent
+            return AgentStatus.FINISH, self.parent_agent
+
+        return result.status, None
+```
+
+---
+
+## Linux Platform: Single-Tier Agent System
+
+Linux implements a **single-tier architecture** where LinuxAgent directly executes shell commands without hierarchical delegation.
+
+### LinuxAgent Architecture
+
+```mermaid
+graph TB
+ subgraph "Linux Single-Tier System"
+ User[User Request: 'Check server logs and restart service']
+
+ LinuxAgent[LinuxAgent Shell Command Executor]
+
+ Shell[Linux Shell bash, zsh, etc.]
+
+ User --> LinuxAgent
+ LinuxAgent -->|1. Execute: tail /var/log/app.log| Shell
+ Shell -.Output.-> LinuxAgent
+ LinuxAgent -->|2. Execute: systemctl restart app| Shell
+ Shell -.Status.-> LinuxAgent
+ end
+
+ subgraph "LinuxAgent Capabilities"
+ ShellExec[Shell Command Execution]
+ OutputParse[Output Parsing]
+ ChainCmd[Command Chaining]
+ ErrorHandle[Error Detection]
+ end
+
+ LinuxAgent --> ShellExec
+ LinuxAgent --> OutputParse
+ LinuxAgent --> ChainCmd
+ LinuxAgent --> ErrorHandle
+
+ style LinuxAgent fill:#c8e6c9
+ style Shell fill:#fff4e1
+```
+
+```python
+@AgentRegistry.register(
+    agent_name="LinuxAgent",
+    third_party=True,
+    processor_cls=LinuxAgentProcessor
+)
+class LinuxAgent(CustomizedAgent):
+    """
+    LinuxAgent is a specialized agent that interacts with Linux systems.
+    Executes shell commands and parses output.
+    """
+
+    def __init__(
+        self,
+        name: str,
+        main_prompt: str,
+        example_prompt: str,
+    ) -> None:
+        super().__init__(
+            name=name,
+            main_prompt=main_prompt,
+            example_prompt=example_prompt,
+            process_name=None,
+            app_root_name=None,
+            is_visual=None  # LinuxAgent typically operates without visual mode
+        )
+        self._blackboard = Blackboard()
+        self.set_state(ContinueLinuxAgentState())
+```
+
+### Key Differences from Windows Agents
+
+| Aspect | Windows (HostAgent + AppAgent) | Linux (LinuxAgent) |
+|--------|--------------------------------|-------------------|
+| **Hierarchy** | Two-tier (delegation pattern) | Single-tier (direct execution) |
+| **Observation** | Screenshot + UI Automation tree | Shell command output (stdout/stderr) |
+| **Action Mechanism** | UI element manipulation | Shell command execution |
+| **Context Tracking** | Application windows, UI state | Command history, working directory |
+| **Error Detection** | UI element not found, timeout | Exit code ≠ 0, stderr output |
+| **Coordination** | Via Blackboard between HostAgent and AppAgent | Via Blackboard with other devices (cross-device) |
+
+### LinuxAgent Processor
+
+```python
+class LinuxAgentProcessor(ProcessorTemplate):
+    """
+    Processor for LinuxAgent with shell-focused strategies.
+    """
+
+    def __init__(self, agent, context):
+        super().__init__(agent, context)
+
+        # DATA_COLLECTION: No visual observation, use command output from previous step
+        self.register_strategy(
+            ProcessingPhase.DATA_COLLECTION,
+            LinuxDataCollectionStrategy(agent, context)  # Collects shell output
+        )
+
+        # LLM_INTERACTION: Command generation from request
+        self.register_strategy(
+            ProcessingPhase.LLM_INTERACTION,
+            LinuxLLMStrategy(agent, context)  # Generates shell commands
+        )
+
+        # ACTION_EXECUTION: Execute shell commands
+        self.register_strategy(
+            ProcessingPhase.ACTION_EXECUTION,
+            ShellExecutionStrategy(agent, context)  # Executes via shell_execute
+        )
+```
+
+### LinuxAgent Commands
+
+| Command | Function | Example |
+|---------|----------|---------|
+| `shell_execute` | Execute shell command (non-blocking) | `shell_execute(command="ls -la /home/user")` |
+| `shell_execute_read` | Execute command and capture output | `shell_execute_read(command="cat /var/log/app.log")` |
+| `get_accessibility_tree` | Get GUI app accessibility tree (X11) | `get_accessibility_tree()` for GUI apps |
+| `screenshot` | Capture screen (optional, for GUI) | `screenshot()` |
+
+**LinuxAgent Best Practices:**
+
+- **Command Chaining**: Use `&&` and `||` for robust workflows:
+  ```bash
+  cd /app && ./deploy.sh || echo "Deployment failed"
+  ```
+- **Output Parsing**: Parse stdout for structured data:
+  ```python
+  output = shell_execute_read("systemctl status app")
+  if "active (running)" in output:
+      pass  # Service is running; continue with the workflow
+  ```
+- **Error Handling**: Check exit codes and stderr:
+  ```python
+  result = shell_execute("restart_service.sh")
+  if result.status == ResultStatus.FAILURE:
+      pass  # Handle the error reported on stderr
+  ```
+- **Idempotency**: Design commands to be safely re-runnable:
+  ```bash
+  # Good: Check before creating
+  [ -d /app/backup ] || mkdir -p /app/backup
+
+  # Bad: Fails if directory exists
+  mkdir /app/backup
+  ```
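
The exit-code and stderr checks recommended above can be tried outside UFO with Python's standard `subprocess` module. This is a sketch of the general technique, not of how `shell_execute` is implemented; `run_shell` and `succeeded` are hypothetical helper names.

```python
import subprocess
from typing import Tuple


def run_shell(command: str, timeout: float = 30.0) -> Tuple[int, str, str]:
    """Run a trusted shell command and return (exit_code, stdout, stderr)."""
    completed = subprocess.run(
        command,
        shell=True,           # enables && and || chaining; pass trusted input only
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        timeout=timeout,      # raises subprocess.TimeoutExpired if exceeded
    )
    return completed.returncode, completed.stdout, completed.stderr


def succeeded(command: str) -> bool:
    """Return True on exit code 0; otherwise report stderr and return False."""
    code, _out, err = run_shell(command)
    if code != 0:
        print(f"command failed with exit code {code}: {err.strip()}")
    return code == 0
```

A non-zero exit code is the primary failure signal; stderr is kept alongside it because many tools write diagnostics there even when they exit with 0.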
+
+**LinuxAgent Cross-Device Coordination Example:**
+
+```python
+# Windows HostAgent prepares data for Linux processing
+windows_blackboard.add_data(
+    {"data_file": "C:/export/data.csv", "ready": True},
+    windows_blackboard.requests
+)
+
+
+# LinuxAgent polls Blackboard for task availability.
+# Wrapped in a coroutine because `await` is only valid inside `async def`.
+async def process_when_ready(linux_agent, linux_blackboard):
+    requests = linux_blackboard.requests.to_list_of_dicts()
+    if requests and requests[-1].get("ready"):
+        # Download data from Windows device (via network share or AIP)
+        await linux_agent.execute_command(
+            "scp user@windows-pc:/c/export/data.csv /tmp/data.csv"
+        )
+
+        # Process data
+        await linux_agent.execute_command(
+            "python3 /app/process.py /tmp/data.csv"
+        )
+
+        # Report completion
+        linux_blackboard.add_data(
+            {"status": "completed"},
+            linux_blackboard.trajectories
+        )
+```
+
+---
+
+## Multi-Agent Coordination Patterns
+
+The three-layer architecture enables seamless coordination across different agent types through **Blackboard-based communication**.
+
+### Pattern 1: Windows Multi-App Workflow
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant HostAgent
+ participant AppAgentWord
+ participant AppAgentExcel
+ participant Blackboard
+
+ User->>HostAgent: "Extract data from Word, create chart in Excel"
+
+ HostAgent->>HostAgent: Decompose task
+ HostAgent->>Blackboard: Write subtask_1: "Extract data"
+ HostAgent->>AppAgentWord: Delegate (next_agent)
+
+ AppAgentWord->>AppAgentWord: Observe Word UI
+ AppAgentWord->>AppAgentWord: Execute: select_text, copy
+ AppAgentWord->>Blackboard: Write result: extracted_data
+ AppAgentWord->>HostAgent: Return (next_agent)
+
+ HostAgent->>Blackboard: Read extracted_data
+ HostAgent->>Blackboard: Write subtask_2: "Create chart"
+ HostAgent->>AppAgentExcel: Delegate (next_agent)
+
+ AppAgentExcel->>Blackboard: Read extracted_data
+ AppAgentExcel->>AppAgentExcel: Execute: paste, select_range, insert_chart
+ AppAgentExcel->>Blackboard: Write result: chart_created
+ AppAgentExcel->>HostAgent: Return (next_agent)
+
+ HostAgent->>User: Task completed
+```
+
+### Pattern 2: Cross-Device Linux-Windows Coordination
+
+```mermaid
+sequenceDiagram
+ participant WindowsHost
+ participant WindowsApp
+ participant Blackboard
+ participant LinuxAgent
+
+ WindowsHost->>WindowsApp: "Export sales data to CSV"
+ WindowsApp->>WindowsApp: Execute: export to C:/data/sales.csv
+ WindowsApp->>Blackboard: Write: data_ready=true, path=C:/data/sales.csv
+
+ LinuxAgent->>Blackboard: Poll: data_ready?
+ Blackboard-->>LinuxAgent: data_ready=true
+
+ LinuxAgent->>LinuxAgent: Execute: scp windows-pc:/c/data/sales.csv /tmp/
+ LinuxAgent->>LinuxAgent: Execute: python3 analyze.py /tmp/sales.csv
+ LinuxAgent->>Blackboard: Write: analysis_complete=true, results=/tmp/report.pdf
+
+ WindowsHost->>Blackboard: Read: analysis_complete?
+ Blackboard-->>WindowsHost: analysis_complete=true, results=/tmp/report.pdf
+ WindowsHost->>LinuxAgent: Request: Download /tmp/report.pdf
+```
+
+### Pattern 3: Parallel Multi-Device Tasks
+
+```mermaid
+graph TB
+ subgraph "Orchestrator (HostAgent)"
+ Orchestrator[HostAgent Task Coordinator]
+ end
+
+ subgraph "Device 1: Windows Desktop"
+ AppAgent1[AppAgent PowerPoint]
+ end
+
+ subgraph "Device 2: Windows Laptop"
+ AppAgent2[AppAgent Excel]
+ end
+
+ subgraph "Device 3: Linux Server"
+ LinuxAgent1[LinuxAgent Data Processing]
+ end
+
+ subgraph "Shared Blackboard"
+ BB[Blackboard Coordination Space]
+ end
+
+ Orchestrator -->|Subtask 1: Create slides| AppAgent1
+ Orchestrator -->|Subtask 2: Generate charts| AppAgent2
+ Orchestrator -->|Subtask 3: Process data| LinuxAgent1
+
+ AppAgent1 -.Write result.-> BB
+ AppAgent2 -.Write result.-> BB
+ LinuxAgent1 -.Write result.-> BB
+
+ BB -.Aggregate results.-> Orchestrator
+
+ style Orchestrator fill:#fff4e1
+ style AppAgent1 fill:#e1f5ff
+ style AppAgent2 fill:#e1f5ff
+ style LinuxAgent1 fill:#c8e6c9
+ style BB fill:#ffe1e1
+```
+
+---
+
+## Platform Extensibility: Adding New Platforms
+
+The three-layer architecture provides a clear path for extending UFO3 to new platforms:
+
+### Extension Checklist
+
+**Steps to Add a New Platform:**
+
+1. **Define Agent Class**
+    ```python
+    @AgentRegistry.register(
+        agent_name="MacOSAgent",
+        processor_cls=MacOSAgentProcessor
+    )
+    class MacOSAgent(BasicAgent):
+        # Implement platform-specific initialization
+        ...
+    ```
+
+2. **Implement Platform-Specific Strategies**
+ - **DATA_COLLECTION**: How to observe system state (screenshots, accessibility tree, shell output)
+ - **LLM_INTERACTION**: Adapt prompt template for platform capabilities
+ - **ACTION_EXECUTION**: Map actions to platform APIs (AppKit, Accessibility API, etc.)
+ - **MEMORY_UPDATE**: Standard implementation (usually no changes needed)
+
+3. **Define Platform Commands (MCP Tools)**
+ ```python
+ # macOS-specific commands
+ commands = [
+ "applescript_execute", # Execute AppleScript
+ "accessibility_tree", # macOS Accessibility API
+ "click_element", # macOS UI control
+ "type_text" # Text input
+ ]
+ ```
+
+4. **Implement AgentState Subclasses** (if needed)
+    ```python
+    class ContinueMacOSAgentState(AgentState):
+        def handle(self, agent, context):
+            # macOS-specific state handling
+            ...
+    ```
+
+5. **Create Platform-Specific Processor**
+ ```python
+ class MacOSAgentProcessor(ProcessorTemplate):
+ def __init__(self, agent, context):
+ super().__init__(agent, context)
+ self.register_strategy(
+ ProcessingPhase.DATA_COLLECTION,
+ MacOSDataCollectionStrategy(agent, context)
+ )
+ # Register other strategies...
+ ```
+
+6. **Configure MCP Server** (on device client)
+ - Implement MCP tools for platform-specific operations
+ - Register tools with MCP server manager
+ - Ensure AIP client routes commands correctly
+
+### Platform-Specific Considerations
+
+| Platform | Key Considerations | Suggested Implementation |
+|----------|-------------------|--------------------------|
+| **macOS** | Accessibility API, AppleScript, window management | MacOSAgent (single-tier), AppleScript execution strategy |
+| **Android** | Activity lifecycle, UI Automator, touch gestures | AndroidAgent (single-tier), UI Automator integration |
+| **iOS** | Accessibility, XCTest, limited automation | iOSAgent (single-tier), XCTest framework |
+| **Embedded** | Limited resources, no GUI, command-line only | EmbeddedAgent (minimal strategies, shell-based) |
+| **Web** | Browser automation, DOM manipulation | WebAgent (Selenium/Playwright integration) |
+
+**Example: Adding macOS Support**
+
+```python
+# 1. Define macOS Agent
+@AgentRegistry.register(
+    agent_name="MacOSAgent",
+    processor_cls=MacOSAgentProcessor
+)
+class MacOSAgent(BasicAgent):
+    def __init__(self, name: str, main_prompt: str, example_prompt: str):
+        super().__init__(name=name)
+        self.prompter = MacOSAgentPrompter(main_prompt, example_prompt)
+        self.set_state(ContinueMacOSAgentState())
+
+# 2. Implement macOS-specific DATA_COLLECTION strategy
+class MacOSDataCollectionStrategy(ProcessingStrategy):
+    async def execute(self, context: ProcessingContext):
+        # Use macOS Accessibility API
+        commands = [
+            Command(tool_name="get_accessibility_tree", parameters={}, tool_type="data_collection"),
+            Command(tool_name="screenshot", parameters={}, tool_type="data_collection")
+        ]
+        # execute_commands is a coroutine, so it must be awaited
+        results = await self.dispatcher.execute_commands(commands)
+
+        context.set_local("accessibility_tree", results[0].result)
+        context.set_local("screenshot", results[1].result)
+
+# 3. Implement macOS-specific ACTION_EXECUTION strategy
+class MacOSActionExecutionStrategy(ProcessingStrategy):
+    async def execute(self, context: ProcessingContext):
+        action = context.get_global("action")
+
+        if action == "click_element":
+            # Use macOS Accessibility API via MCP tool
+            command = Command(
+                tool_name="macos_click_element",
+                parameters={"element_id": context.get_global("element_id")},
+                tool_type="action"
+            )
+        elif action == "applescript_execute":
+            # Execute AppleScript via MCP tool
+            command = Command(
+                tool_name="applescript_execute",
+                parameters={"script": context.get_global("applescript")},
+                tool_type="action"
+            )
+        else:
+            # Fail loudly instead of leaving `command` unbound
+            raise ValueError(f"Unsupported macOS action: {action}")
+
+        results = await self.dispatcher.execute_commands([command])
+        context.set_local("execution_results", results)
+
+# 4. Configure MCP tools on macOS device client
+# In device client code:
+mcp_server_manager.register_tool(
+    MCPToolInfo(
+        tool_name="macos_click_element",
+        description="Click element via macOS Accessibility API",
+        input_schema={
+            "element_id": {"type": "string", "description": "Accessibility element ID"}
+        },
+        # ... other fields
+    ),
+    handler=macos_accessibility_click_handler
+)
+```
+
+---
+
+## Agent Lifecycle Comparison
+
+### Windows HostAgent Lifecycle
+
+```mermaid
+sequenceDiagram
+ participant Session
+ participant HostAgent
+ participant AppAgent
+ participant Blackboard
+
+ Session->>HostAgent: Initialize (user request)
+ HostAgent->>HostAgent: Set state = ContinueHostAgentState
+
+ loop Until Task Complete
+ HostAgent->>HostAgent: Execute HostAgentProcessor
+ HostAgent->>HostAgent: LLM selects application
+ HostAgent->>HostAgent: Create/retrieve AppAgent
+ HostAgent->>Blackboard: Write subtask
+ HostAgent->>AppAgent: Delegate (next_agent)
+
+ AppAgent->>Blackboard: Read subtask
+ AppAgent->>AppAgent: Execute AppAgentProcessor
+ AppAgent->>Blackboard: Write result
+ AppAgent->>HostAgent: Return (next_agent)
+
+ HostAgent->>Blackboard: Read result
+ HostAgent->>HostAgent: Update task status
+ end
+
+ HostAgent->>Session: Return AgentStatus.FINISH
+```
+
+### Linux LinuxAgent Lifecycle
+
+```mermaid
+sequenceDiagram
+ participant Session
+ participant LinuxAgent
+ participant Shell
+
+ Session->>LinuxAgent: Initialize (user request)
+ LinuxAgent->>LinuxAgent: Set state = ContinueLinuxAgentState
+
+ loop Until Task Complete
+ LinuxAgent->>LinuxAgent: Execute LinuxAgentProcessor
+ LinuxAgent->>LinuxAgent: LLM generates shell command
+ LinuxAgent->>Shell: Execute command
+ Shell-->>LinuxAgent: Return output (stdout/stderr)
+ LinuxAgent->>LinuxAgent: Parse output
+ LinuxAgent->>LinuxAgent: Update context with result
+ LinuxAgent->>LinuxAgent: Check task completion
+ end
+
+ LinuxAgent->>Session: Return AgentStatus.FINISH
+```
+
+---
+
+## Performance and Scalability
+
+| Metric | Windows (Two-Tier) | Linux (Single-Tier) | Notes |
+|--------|-------------------|---------------------|-------|
+| **Agent Initialization** | ~500ms (HostAgent) + ~300ms per AppAgent | ~200ms (LinuxAgent) | AppAgent creation overhead for each application |
+| **Observation Latency** | ~1-2s (screenshot + UI tree) | ~100-500ms (shell output) | UI Automation API slower than shell |
+| **Action Execution** | ~200-500ms per UI action | ~50-200ms per shell command | UI actions require element discovery |
+| **Memory Footprint** | ~50MB (HostAgent) + ~30MB per AppAgent | ~20MB (LinuxAgent) | UI Automation increases memory usage |
+| **Scalability** | Limited by number of AppAgents | Handles many parallel commands | HostAgent manages AppAgent pool |
+| **Coordination Overhead** | Blackboard read/write per delegation | Minimal (only cross-device) | Two-tier hierarchy increases communication |
+
+**Performance Optimization:**
+
+- **Windows**: Reuse AppAgent instances across subtasks (cached in `appagent_dict`)
+- **Linux**: Batch multiple shell commands with `&&` to reduce round trips
+- **Cross-Platform**: Minimize Blackboard writes; use hierarchical keys for efficient reads
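+
+As a rough illustration of the Linux batching tip, the sketch below joins several argument lists into one `&&`-chained shell string. The helper name `batch_commands` is hypothetical and not part of UFO; it relies only on the standard-library `shlex` module.
+
+```python
+import shlex
+from typing import Sequence
+
+
+def batch_commands(commands: Sequence[Sequence[str]]) -> str:
+    """Join argument lists into a single `&&`-chained shell string.
+
+    Each argument list is quoted with shlex.join, and `&&` stops the
+    chain at the first command that exits with a non-zero status.
+    """
+    if not commands:
+        raise ValueError("at least one command is required")
+    return " && ".join(shlex.join(list(argv)) for argv in commands)
+
+
+# Hypothetical usage: two commands sent in one round trip
+batched = batch_commands([["mkdir", "-p", "/tmp/report"], ["ls", "-la", "/tmp/report"]])
+print(batched)
+```
+
+Because `&&` short-circuits, a failure in an early command skips the rest, so the exit code of the batch reflects the first failing step.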
+
+---
+
+## Best Practices
+
+### Windows Agent Best Practices
+
+**HostAgent:**
+
+- **AppAgent Caching**: Reuse AppAgent instances for the same application to avoid recreation overhead
+- **Task Decomposition**: Break complex tasks into independent subtasks for parallel execution
+- **Blackboard Namespacing**: Use clear keys within appropriate memory sections
+- **Error Propagation**: Detect AppAgent failures and retry with a different strategy
+
+**AppAgent:**
+
+- **Element Stability**: Wait for UI elements to stabilize before interaction (use `wait_for_element`)
+- **Fallback Actions**: If UI Automation fails, fall back to keyboard shortcuts (e.g., Ctrl+S instead of clicking the Save button)
+- **Context Awareness**: Track the active window and focus to ensure actions target the correct application
+- **Idempotent Actions**: Design actions to be safely retryable (e.g., check if file exists before creating)
+
+### Linux Agent Best Practices
+
+**LinuxAgent:**
+
+- **Command Validation**: Validate commands before execution to prevent injection attacks
+- **Output Parsing**: Use structured output formats (JSON, CSV) instead of parsing raw text
+- **Error Detection**: Check exit codes (`$?`) and stderr for failure detection
+- **Idempotency**: Use conditional commands (`[ -f file ] || create_file`) to safely re-run workflows
+- **Resource Cleanup**: Always clean up temporary files and processes after task completion
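+
+The error-detection and validation points can be sketched with the standard-library `subprocess` module. This is a minimal example under stated assumptions: `run_checked` is a hypothetical helper, not a UFO API, and it passes an argument list rather than a shell string so that no shell interpolation takes place.
+
+```python
+import subprocess
+import sys
+from typing import Dict, List, Union
+
+
+def run_checked(argv: List[str], timeout: float = 30.0) -> Dict[str, Union[str, int]]:
+    """Run a command without a shell and report stdout, stderr and exit code."""
+    completed = subprocess.run(
+        argv,                 # argument list, so nothing is parsed by a shell
+        capture_output=True,  # collect stdout and stderr separately
+        text=True,
+        timeout=timeout,
+    )
+    return {
+        "stdout": completed.stdout,
+        "stderr": completed.stderr,
+        "exit_code": completed.returncode,
+    }
+
+
+# The current interpreter is used so the example does not depend on any Linux tool
+ok = run_checked([sys.executable, "-c", "print('ready')"])
+failed = run_checked([sys.executable, "-c", "import sys; sys.exit(3)"])
+print(ok["exit_code"], failed["exit_code"])
+```
+
+A caller would treat any non-zero `exit_code` as a failure and inspect `stderr` for details, rather than inferring success from the text of `stdout`.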
+
+### Cross-Platform Best Practices
+
+**Multi-Agent Coordination:**
+
+- **Blackboard Keys**: Use appropriate memory sections to separate agent-specific data:
+ ```python
+ # Good - using structured memory sections
+ blackboard.add_data({"status": "ready"}, blackboard.requests)
+ blackboard.add_data({"status": "processing"}, blackboard.trajectories)
+
+ # Bad - unclear categorization
+ blackboard.add_data({"status": "ready"}, blackboard.questions)
+ ```
+
+- **Synchronization**: Use polling or event-based patterns for cross-device synchronization:
+ ```python
+ # Polling pattern
+ while not any(r.get("task_complete") for r in blackboard.requests.to_list_of_dicts()):
+ await asyncio.sleep(1)
+
+ # Event-based (via AIP custom messages)
+ # Linux device sends completion event
+ aip_client.send_event("task_complete", {...})
+ ```
+
+- **Data Transfer**: For large data, use shared storage (network drive, S3) instead of Blackboard:
+ ```python
+ # Bad: Store large data in Blackboard
+ blackboard.add_data({"dataset": rows}, blackboard.trajectories)  # `rows` holds ~1,000,000 records
+
+ # Good: Store reference to shared storage
+ blackboard.add_data({"dataset_path": "s3://bucket/data.csv"}, blackboard.requests)
+ ```
+
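+The polling pattern shown earlier loops forever if the flag is never set. A hedged variant with a deadline might look like the following; `wait_for_flag` and the in-memory `items` list are illustrative stand-ins rather than UFO APIs, and a real caller would pass something like `blackboard.requests.to_list_of_dicts` as the reader.
+
+```python
+import asyncio
+import time
+from typing import Any, Callable, Dict, List
+
+
+async def wait_for_flag(
+    read_items: Callable[[], List[Dict[str, Any]]],
+    key: str,
+    interval: float = 1.0,
+    timeout: float = 60.0,
+) -> bool:
+    """Poll until any item has a truthy `key`; return False on timeout."""
+    deadline = time.monotonic() + timeout
+    while time.monotonic() < deadline:
+        if any(item.get(key) for item in read_items()):
+            return True
+        await asyncio.sleep(interval)
+    return False
+
+
+async def demo_polling() -> bool:
+    items: List[Dict[str, Any]] = []  # in-memory stand-in for a Blackboard section
+
+    async def producer() -> None:
+        await asyncio.sleep(0.05)
+        items.append({"task_complete": True})
+
+    task = asyncio.create_task(producer())
+    found = await wait_for_flag(lambda: items, "task_complete", interval=0.01, timeout=1.0)
+    await task
+    return found
+
+
+print(asyncio.run(demo_polling()))
+```
+
+Returning `False` instead of raising leaves the retry-or-fail decision to the caller.
+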
+---
+
+## Related Documentation
+
+- [Device Agent Overview](overview.md) - Three-layer architecture and design principles
+- [Server-Client Architecture](server_client_architecture.md) - Server and client separation
+- [State Layer](design/state.md) - AgentState interface and state machine
+- [Processor and Strategy Layer](design/processor.md) - ProcessorTemplate and strategy implementations
+- [Command Layer](design/command.md) - CommandDispatcher and MCP integration
+- [Memory System](design/memory.md) - Memory and Blackboard for agent coordination
+- [Server Architecture](../../server/overview.md) - Server-side orchestration
+- [Client Architecture](../../client/overview.md) - Device client MCP execution
+- [AIP Protocol](../../aip/overview.md) - Agent Interaction Protocol for communication
+
+---
+
+## Summary
+
+**Key Takeaways:**
+
+- **Windows Two-Tier Hierarchy**: HostAgent (orchestration) + AppAgent (application control) for GUI workflows
+- **Linux Single-Tier System**: LinuxAgent executes shell commands directly for command-line tasks
+- **Unified Framework**: Both platforms leverage the same three-layer architecture (State, Processor, Command)
+- **Multi-Agent Coordination**: Blackboard enables seamless coordination across HostAgent → AppAgent and cross-device communication
+- **Platform Extensibility**: Clear extension path for macOS, Android, iOS, embedded systems
+- **HostAgent Responsibilities**: Task decomposition, application selection, AppAgent creation, subtask delegation
+- **AppAgent Capabilities**: UI observation (screenshot + UI Automation), element identification, UI action execution
+- **LinuxAgent Characteristics**: Shell command execution, output parsing, idempotent workflows
+- **Best Practices**: AppAgent caching, appropriate Blackboard usage, idempotent commands, structured output parsing
+- **Performance**: Windows UI Automation slower but more robust; Linux shell commands faster but less structured
+
+UFO3's platform-specific agent implementations demonstrate the flexibility and extensibility of the three-layer architecture, enabling cross-platform and cross-device task automation while maintaining consistent design principles and coordination mechanisms.
diff --git a/documents/docs/agents/design/blackboard.md b/documents/docs/infrastructure/agents/design/blackboard.md
similarity index 51%
rename from documents/docs/agents/design/blackboard.md
rename to documents/docs/infrastructure/agents/design/blackboard.md
index 586b11fb2..7682329cd 100644
--- a/documents/docs/agents/design/blackboard.md
+++ b/documents/docs/infrastructure/agents/design/blackboard.md
@@ -1,6 +1,6 @@
# Agent Blackboard
-The `Blackboard` is a shared memory space that is visible to all agents in the UFO framework. It stores information required for agents to interact with the user and applications at every step. The `Blackboard` is a key component of the UFO framework, enabling agents to share information and collaborate to fulfill user requests. The `Blackboard` is implemented as a class in the `ufo/agents/memory/blackboard.py` file.
+The `Blackboard` is a shared memory space visible to all agents in the UFO framework. It stores information required for agents to interact with the user and applications at every step. The `Blackboard` enables agents to share information and collaborate to fulfill user requests.
## Components
@@ -9,19 +9,17 @@ The `Blackboard` consists of the following data components:
| Component | Description |
| --- | --- |
| `questions` | A list of questions that UFO asks the user, along with their corresponding answers. |
-| `requests` | A list of historical user requests received in previous `Round`. |
+| `requests` | A list of historical user requests received in previous rounds. |
| `trajectories` | A list of step-wise trajectories that record the agent's actions and decisions at each step. |
| `screenshots` | A list of screenshots taken by the agent when it believes the current state is important for future reference. |
-!!! tip
- The keys stored in the `trajectories` are configured as `HISTORY_KEYS` in the `config_dev.yaml` file. You can customize the keys based on your requirements and the agent's logic.
+The keys stored in the `trajectories` are configured as `HISTORY_KEYS` in the `config/ufo/system.yaml` file. You can customize the keys based on your requirements and the agent's logic.
-!!! tip
- Whether to save the screenshots is determined by the `AppAgent`. You can enable or disable screenshot capture by setting the `SCREENSHOT_TO_MEMORY` flag in the `config_dev.yaml` file.
+Whether to save the screenshots is determined by the `AppAgent`. You can enable or disable screenshot capture by setting the `SCREENSHOT_TO_MEMORY` flag in the `config/ufo/system.yaml` file.
## Blackboard to Prompt
-Data in the `Blackboard` is based on the `MemoryItem` class. It has a method `blackboard_to_prompt` that converts the information stored in the `Blackboard` to a string prompt. Agents call this method to construct the prompt for the LLM's inference. The `blackboard_to_prompt` method is defined as follows:
+Data in the `Blackboard` is based on the `MemoryItem` class. It has a method `blackboard_to_prompt` that converts the information stored in the `Blackboard` to a list of prompt content objects. Agents call this method to construct the prompt for the LLM's inference. The `blackboard_to_prompt` method is defined as follows:
```python
def blackboard_to_prompt(self) -> List[str]:
@@ -40,7 +38,9 @@ def blackboard_to_prompt(self) -> List[str]:
prefix
+ self.texts_to_prompt(self.questions, "[Questions & Answers:]")
+ self.texts_to_prompt(self.requests, "[Request History:]")
- + self.texts_to_prompt(self.trajectories, "[Step Trajectories Completed Previously:]")
+ + self.texts_to_prompt(
+ self.trajectories, "[Step Trajectories Completed Previously:]"
+ )
+ self.screenshots_to_prompt()
)
@@ -51,5 +51,4 @@ def blackboard_to_prompt(self) -> List[str]:
:::agents.memory.blackboard.Blackboard
-!!!note
- You can customize the class to tailor the `Blackboard` to your requirements.
\ No newline at end of file
+You can customize the class to tailor the `Blackboard` to your requirements.
\ No newline at end of file
diff --git a/documents/docs/infrastructure/agents/design/command.md b/documents/docs/infrastructure/agents/design/command.md
new file mode 100644
index 000000000..a7b52f8ab
--- /dev/null
+++ b/documents/docs/infrastructure/agents/design/command.md
@@ -0,0 +1,942 @@
+# Command Layer (Level-3 System Interface)
+
+The **Command Layer** provides atomic, deterministic system operations that bridge device agents with underlying platform capabilities. Each command encapsulates a **tool** and its **parameters**, mapping directly to MCP tools on the device client. This layer ensures reliable, auditable, and extensible execution across heterogeneous devices.
+
+## Overview
+
+The Command Layer implements **Level-3** of the [three-layer device agent architecture](../overview.md#three-layer-architecture). It provides:
+
+- **Atomic Commands**: Self-contained execution units with tool + parameters
+- **MCP Integration**: Commands map to Model Context Protocol tools on device client
+- **CommandDispatcher**: Routes commands from agent server to device client via AIP
+- **Deterministic Execution**: Same inputs → same outputs, fully auditable
+- **Dynamic Discovery**: LLM queries available tools and selects appropriate commands
+
+```mermaid
+graph TB
+ subgraph "Command Layer Architecture"
+ Strategy[ProcessingStrategy Level-2] -->|creates| Commands[List of Commands tool_name + parameters]
+ Commands -->|executes via| Dispatcher[CommandDispatcher]
+
+ Dispatcher -->|routes| AIP[AIP Protocol WebSocket]
+ AIP -->|sends| Client[Device Client]
+
+ Client -->|dispatches to| MCP[MCP Server Manager]
+ MCP -->|invokes| Tool1[MCP Tool 1 click_element]
+ MCP -->|invokes| Tool2[MCP Tool 2 type_text]
+ MCP -->|invokes| Tool3[MCP Tool 3 run_command]
+
+ Tool1 -->|result| MCP
+ Tool2 -->|result| MCP
+ Tool3 -->|result| MCP
+
+ MCP -->|aggregates| Client
+ Client -->|returns| AIP
+ AIP -->|results| Dispatcher
+ Dispatcher -->|List of Results| Strategy
+ end
+```
+
+## Design Philosophy
+
+The Command Layer follows the **Command Pattern**:
+
+- **Encapsulation**: Each command encapsulates request as object
+- **Decoupling**: Invoker (strategy) decoupled from executor (MCP tool)
+- **Extensibility**: New commands added without changing invoker code
+- **Auditability**: Command history provides complete execution trace
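+
+The four properties above can be sketched in a few lines. This is a simplified, self-contained illustration of the Command Pattern, not UFO's implementation: `SimpleCommand` and `Invoker` are hypothetical names, and a plain dataclass stands in for the real model.
+
+```python
+from dataclasses import dataclass, field
+from typing import Any, Callable, Dict, List
+
+
+@dataclass
+class SimpleCommand:
+    """A request captured as an object (encapsulation)."""
+    tool_name: str
+    parameters: Dict[str, Any] = field(default_factory=dict)
+
+
+class Invoker:
+    """Dispatches commands to registered handlers and records a history."""
+
+    def __init__(self) -> None:
+        self._handlers: Dict[str, Callable[..., Any]] = {}
+        self.history: List[SimpleCommand] = []
+
+    def register(self, tool_name: str, handler: Callable[..., Any]) -> None:
+        # Extensibility: new tools are added here, invoke() never changes
+        self._handlers[tool_name] = handler
+
+    def invoke(self, command: SimpleCommand) -> Any:
+        self.history.append(command)  # auditability: every request is recorded
+        # Decoupling: the invoker only knows the tool name, not the handler
+        return self._handlers[command.tool_name](**command.parameters)
+
+
+invoker = Invoker()
+invoker.register("type_text", lambda text: f"typed {text!r}")
+outcome = invoker.invoke(SimpleCommand("type_text", {"text": "Hello"}))
+print(outcome, len(invoker.history))
+```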
+
+## Command Structure
+
+Each command is represented by the `Command` Pydantic model:
+
+```python
+class Command(BaseModel):
+ """
+ Represents a command to be executed by an agent.
+ Commands are atomic units of work dispatched by the orchestrator.
+ """
+
+ tool_name: str = Field(..., description="Name of the tool to execute")
+ parameters: Optional[Dict[str, Any]] = Field(
+ default=None, description="Parameters for the tool"
+ )
+ tool_type: Literal["data_collection", "action"] = Field(
+ ..., description="Type of tool: data_collection or action"
+ )
+ call_id: Optional[str] = Field(
+ default=None, description="Unique identifier for this command call"
+ )
+```
+
+### Command Properties
+
+| Property | Type | Purpose | Example |
+|----------|------|---------|---------|
+| **tool_name** | `str` | MCP tool name to invoke | `"click_element"`, `"type_text"`, `"shell_execute"` |
+| **parameters** | `Optional[Dict[str, Any]]` | Tool parameters | `{"control_id": "Button_123"}`, `{"text": "Hello"}` |
+| **tool_type** | `Literal["data_collection", "action"]` | Tool category | `"data_collection"` (observation), `"action"` (modification) |
+| **call_id** | `Optional[str]` | Unique execution identifier | `"uuid-1234-5678"` (auto-generated) |
+
+### Command Examples
+
+**Windows UI Automation Command**:
+
+```python
+Command(
+ tool_name="click_element",
+ parameters={
+ "control_id": "Button_InsertChart",
+ "process_name": "EXCEL.EXE",
+ "app_root_name": "Microsoft Excel"
+ },
+ tool_type="action"
+)
+```
+
+**Linux Shell Command**:
+
+```python
+Command(
+ tool_name="shell_execute",
+ parameters={
+ "command": "ls -la /home/user/documents",
+ "timeout": 30
+ },
+ tool_type="action"
+)
+```
+
+**File Operation Command**:
+
+```python
+Command(
+ tool_name="read_file",
+ parameters={
+ "file_path": "/home/user/data.csv",
+ "encoding": "utf-8"
+ },
+ tool_type="data_collection"
+)
+```
+
+## Result Structure
+
+Each command execution returns a `Result` object:
+
+```python
+class Result(BaseModel):
+ """
+ Represents the result of a command execution.
+ Contains status, error information, and the actual result payload.
+ """
+
+ status: ResultStatus = Field(..., description="Execution status")
+ error: Optional[str] = Field(default=None, description="Error message if failed")
+ result: Any = Field(default=None, description="Result payload")
+ namespace: Optional[str] = Field(
+ default=None, description="Namespace of the executed tool"
+ )
+ call_id: Optional[str] = Field(
+ default=None, description="ID matching the Command.call_id"
+ )
+```
+
+### ResultStatus Enum
+
+```python
+class ResultStatus(str, Enum):
+ """Represents the status of a command execution result."""
+ SUCCESS = "success" # Command executed successfully
+ FAILURE = "failure" # Command failed with error
+ SKIPPED = "skipped" # Command was skipped
+ NONE = "none" # No result available
+```
+
+### Result Examples
+
+**Successful Click**:
+
+```python
+Result(
+ status=ResultStatus.SUCCESS,
+ result={"clicked": True, "control_name": "Insert Chart"},
+ call_id="uuid-1234-5678"
+)
+```
+
+**Failed Command**:
+
+```python
+Result(
+ status=ResultStatus.FAILURE,
+ result=None,
+ error="Control 'Button_123' not found in UI tree",
+ call_id="uuid-1234-5678"
+)
+```
+
+**Shell Execution**:
+
+```python
+Result(
+ status=ResultStatus.SUCCESS,
+ result={
+ "stdout": "total 24\ndrwxr-xr-x 5 user user 4096...",
+ "stderr": "",
+ "exit_code": 0
+ },
+ call_id="uuid-1234-5678"
+)
+```
+
+## CommandDispatcher Interface
+
+The `BasicCommandDispatcher` provides the abstract interface for command execution:
+
+```python
+class BasicCommandDispatcher(ABC):
+ """
+ Abstract base class for command dispatcher.
+
+ Responsibilities:
+ - Send commands to device client
+ - Wait for execution results
+ - Handle errors and timeouts
+ """
+
+ @abstractmethod
+ async def execute_commands(
+ self, commands: List[Command], timeout: float = 6000
+ ) -> Optional[List[Result]]:
+ """
+ Execute commands and return results.
+
+ :param commands: List of commands to execute
+ :param timeout: Execution timeout in seconds
+ :return: List of results, or None if timeout
+ """
+ pass
+
+ def generate_error_results(
+ self, commands: List[Command], error: Exception
+ ) -> List[Result]:
+ """
+ Generate error results for failed commands.
+
+ :param commands: Commands that failed
+ :param error: The error that occurred
+ :return: List of error results
+ """
+ result_list = []
+ for command in commands:
+ error_msg = f"Error executing {command}: {error}"
+ result = Result(
+ status=ResultStatus.FAILURE,
+ error=error_msg,
+ result=error_msg,
+ call_id=command.call_id
+ )
+ result_list.append(result)
+ return result_list
+```
+
+## Command Execution Flow
+
+The following sequence diagram shows the complete command execution flow:
+
+```mermaid
+sequenceDiagram
+ participant Strategy as ProcessingStrategy (ACTION_EXECUTION)
+ participant Dispatcher as CommandDispatcher
+ participant AIP as AIP Protocol
+ participant Client as Device Client
+ participant MCP as MCP Server Manager
+ participant Tool as MCP Tool
+
+ Note over Strategy: Step 1: Create Commands
+ Strategy->>Strategy: Build Command objects from LLM response
+
+ Note over Strategy: Step 2: Execute via Dispatcher
+ Strategy->>Dispatcher: execute_commands([cmd1, cmd2])
+
+ Note over Dispatcher: Step 3: Add Call IDs
+ Dispatcher->>Dispatcher: Assign unique call_id to each command
+
+ Note over Dispatcher: Step 4: Send via AIP
+ Dispatcher->>AIP: Send ServerMessage (COMMAND type)
+ AIP->>Client: WebSocket message (serialized commands)
+
+ Note over Client: Step 5: Route to MCP
+ Client->>MCP: Route commands to appropriate MCP server
+
+ Note over MCP: Step 6: Execute Tools
+ loop For each command
+ MCP->>Tool: Invoke tool function with arguments
+ Tool->>Tool: Execute operation (click, type, shell, etc.)
+ Tool->>MCP: Return result
+ end
+
+ Note over Client: Step 7: Aggregate Results
+ MCP->>Client: List[Result]
+ Client->>AIP: Send ClientMessage (RESULT type)
+ AIP->>Dispatcher: WebSocket message (serialized results)
+
+ Note over Dispatcher: Step 8: Return Results
+ Dispatcher->>Strategy: List[Result]
+
+ Note over Strategy: Step 9: Process Results
+ Strategy->>Strategy: Update context with execution results
+```
+
+### Execution Phases
+
+1. **Command Creation**: Strategy builds `Command` objects from LLM response or predefined logic
+2. **Dispatcher Invocation**: Strategy calls `dispatcher.execute_commands()`
+3. **Call ID Assignment**: Dispatcher assigns unique `call_id` to each command
+4. **AIP Transmission**: Commands serialized and sent via WebSocket to device client
+5. **MCP Routing**: Client routes commands to appropriate MCP server
+6. **Tool Execution**: MCP server invokes tool functions with arguments
+7. **Result Aggregation**: Results collected and sent back via AIP
+8. **Result Handling**: Dispatcher returns results to strategy
+9. **Context Update**: Strategy updates processing context with results
+
+!!! warning "Timeout Handling"
+ If execution exceeds timeout:
+
+ 1. Dispatcher raises `asyncio.TimeoutError`
+ 2. Error results generated via `generate_error_results()`
+ 3. Strategy receives error results (status = FAILURE)
+ 4. Processor can retry or fail based on `fail_fast` setting
+
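+The timeout path can be sketched as follows. This is an illustrative stand-alone example, not the UFO dispatcher: `SketchResult`, `slow_tool`, and `execute_with_timeout` are hypothetical names, and plain strings replace the `ResultStatus` enum.
+
+```python
+import asyncio
+from dataclasses import dataclass
+from typing import Any, List, Optional
+
+
+@dataclass
+class SketchResult:
+    """Stand-in for Result with only the fields needed here."""
+    status: str
+    error: Optional[str] = None
+    result: Any = None
+
+
+async def slow_tool(name: str) -> SketchResult:
+    await asyncio.sleep(1.0)  # simulates a tool that takes too long
+    return SketchResult(status="success", result=name)
+
+
+async def execute_with_timeout(names: List[str], timeout: float) -> List[SketchResult]:
+    """Run all tools, converting a timeout into one failure result per command."""
+    try:
+        return await asyncio.wait_for(
+            asyncio.gather(*(slow_tool(n) for n in names)), timeout=timeout
+        )
+    except asyncio.TimeoutError:
+        return [
+            SketchResult(status="failure", error=f"Timed out executing {n}")
+            for n in names
+        ]
+
+
+results = asyncio.run(execute_with_timeout(["click_element", "type_text"], timeout=0.05))
+print([r.status for r in results])
+```
+
+Returning one failure entry per command keeps the result list aligned with the command list, so the caller can still match results to requests by position.
+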
+---
+
+## Dispatcher Implementations
+
+UFO provides two dispatcher implementations for different deployment scenarios:
+
+### 1. LocalCommandDispatcher
+
+**Purpose**: Direct local execution (server and client on same machine)
+
+**Use Case**: Development, testing, single-device deployments
+
+```python
+class LocalCommandDispatcher(BasicCommandDispatcher):
+ """
+ Local command dispatcher - executes commands directly.
+
+ No network communication - calls MCP tools locally.
+ """
+
+ def __init__(self, session: BaseSession, mcp_server_manager: MCPServerManager):
+ self.session = session
+ self.mcp_server_manager = mcp_server_manager
+
+ # Direct local execution
+ self.computer_manager = ComputerManager(configs, mcp_server_manager)
+ self.command_router = CommandRouter(self.computer_manager)
+
+ async def execute_commands(
+ self, commands: List[Command], timeout=6000
+ ) -> Optional[List[Result]]:
+ """Execute commands locally via CommandRouter"""
+ try:
+ # Direct invocation (no network)
+ action_results = await asyncio.wait_for(
+ self.command_router.execute(
+ agent_name=self.session.current_agent_class,
+ root_name=self.session.context.get(ContextNames.APPLICATION_ROOT_NAME),
+ process_name=self.session.context.get(ContextNames.APPLICATION_PROCESS_NAME),
+ commands=commands
+ ),
+ timeout=timeout
+ )
+ return action_results
+ except Exception as e:
+ return self.generate_error_results(commands, e)
+```
+
+### 2. WebSocketCommandDispatcher
+
+**Purpose**: Remote execution via AIP (server and client on different machines)
+
+**Use Case**: Production, multi-device deployments, distributed systems
+
+```python
+class WebSocketCommandDispatcher(BasicCommandDispatcher):
+ """
+ WebSocket command dispatcher - executes commands remotely via AIP.
+
+ Uses AIP's TaskExecutionProtocol for structured messaging.
+ """
+
+ def __init__(self, session: BaseSession, protocol: TaskExecutionProtocol):
+ self.session = session
+ self.protocol = protocol # AIP protocol instance
+ self.pending: Dict[str, asyncio.Future] = {}
+ self.logger = logging.getLogger(__name__)
+
+ async def execute_commands(
+ self, commands: List[Command], timeout=6000
+ ) -> Optional[List[Result]]:
+ """Execute commands remotely via AIP WebSocket"""
+ try:
+ # Build ServerMessage
+ server_msg = self.make_server_response(commands)
+
+ # Send via AIP
+ await self.protocol.send_command(server_msg)
+
+ # Wait for results
+ results = await asyncio.wait_for(
+ self._wait_for_results(server_msg.response_id),
+ timeout=timeout
+ )
+
+ return results
+ except Exception as e:
+ return self.generate_error_results(commands, e)
+
+ def make_server_response(self, commands: List[Command]) -> ServerMessage:
+ """Create ServerMessage for commands"""
+ # Assign call_ids
+ for command in commands:
+ command.call_id = str(uuid.uuid4())
+
+ return ServerMessage(
+ type=ServerMessageType.COMMAND,
+ status=TaskStatus.CONTINUE,
+ agent_name=self.session.current_agent_class,
+ process_name=self.session.context.get(ContextNames.APPLICATION_PROCESS_NAME),
+ root_name=self.session.context.get(ContextNames.APPLICATION_ROOT_NAME),
+ actions=commands,
+ session_id=self.session.id,
+ task_name=self.session.task,
+ response_id=str(uuid.uuid4()),
+ timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat()
+ )
+```
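+
+The excerpt above does not show `_wait_for_results`. One common way to correlate asynchronous replies with requests is a dictionary of futures keyed by `response_id`; the sketch below illustrates that idea only. `PendingResults` and its methods are hypothetical and may differ from the actual implementation.
+
+```python
+import asyncio
+from typing import Any, Dict, List
+
+
+class PendingResults:
+    """Correlates incoming results with the request that is awaiting them."""
+
+    def __init__(self) -> None:
+        self._pending: Dict[str, asyncio.Future] = {}
+
+    async def wait_for(self, response_id: str, timeout: float) -> List[Any]:
+        future = asyncio.get_running_loop().create_future()
+        self._pending[response_id] = future
+        try:
+            return await asyncio.wait_for(future, timeout=timeout)
+        finally:
+            self._pending.pop(response_id, None)  # always drop the entry
+
+    def resolve(self, response_id: str, results: List[Any]) -> bool:
+        """Called by the receive loop; returns False for unknown IDs."""
+        future = self._pending.get(response_id)
+        if future is None or future.done():
+            return False
+        future.set_result(results)
+        return True
+
+
+async def demo_pending() -> List[Any]:
+    pending = PendingResults()
+    waiter = asyncio.create_task(pending.wait_for("resp-1", timeout=1.0))
+    await asyncio.sleep(0)  # let the waiter register its future
+    pending.resolve("resp-1", ["ok"])
+    return await waiter
+
+
+print(asyncio.run(demo_pending()))
+```
+
+Removing the entry in `finally` prevents stale futures from accumulating when a request times out.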
+
+### Dispatcher Selection
+
+The dispatcher is selected at session initialization:
+
+- **Local Mode**: `LocalCommandDispatcher` (no AIP client connection)
+- **Remote Mode**: `WebSocketCommandDispatcher` (AIP client connected)
+
+This is transparent to strategies - they call `dispatcher.execute_commands()` regardless.
+
+## MCP Integration
+
+Commands map to **Model Context Protocol (MCP)** tools on the device client:
+
+```mermaid
+graph TB
+ subgraph "Device Client"
+ Router[Command Router]
+ Manager[MCP Server Manager]
+
+ Router -->|routes by agent/app| Manager
+ end
+
+ subgraph "MCP Servers"
+ Win[Windows MCP Server ufo_windows]
+ Linux[Linux MCP Server ufo_linux]
+ Custom[Custom MCP Server user_defined]
+ end
+
+ Manager -->|manages| Win
+ Manager -->|manages| Linux
+ Manager -->|manages| Custom
+
+ subgraph "MCP Tools (Windows)"
+ Win -->|provides| T1[click_element]
+ Win -->|provides| T2[type_text]
+ Win -->|provides| T3[get_ui_tree]
+ Win -->|provides| T4[screenshot]
+ end
+
+ subgraph "MCP Tools (Linux)"
+ Linux -->|provides| T5[shell_execute]
+ Linux -->|provides| T6[read_file]
+ Linux -->|provides| T7[write_file]
+ end
+
+ Command[Command tool_name + parameters] -->|routed to| Router
+```
+
+### MCP Tool Registration
+
+MCP tools are registered on the device client at initialization:
+
+```python
+# Windows MCP Server registration
+mcp_server_manager.register_server(
+ server_name="ufo_windows",
+ tools=[
+ MCPToolInfo(
+ name="click_element",
+ description="Click UI element by control ID",
+ arguments_schema={
+ "control_id": {"type": "string", "required": True},
+ "process_name": {"type": "string", "required": True}
+ }
+ ),
+ MCPToolInfo(
+ name="type_text",
+ description="Type text into focused element",
+ arguments_schema={
+ "text": {"type": "string", "required": True}
+ }
+ ),
+ # ... more tools
+ ]
+)
+```
+
+### Tool Discovery
+
+The LLM can query available tools via the `get_mcp_tools` command:
+
+```python
+# Strategy requests available tools
+tools_cmd = Command(
+    tool_name="get_mcp_tools",
+    parameters={"server_name": "ufo_windows"},
+    tool_type="data_collection"
+)
+
+results = await dispatcher.execute_commands([tools_cmd])
+
+# LLM receives tool registry
+available_tools = results[0].result # List[MCPToolInfo]
+```
+
+### Dynamic Tool Selection
+
+The LLM dynamically selects appropriate tools based on:
+
+1. **Tool Descriptions**: Natural language descriptions of tool capabilities
+2. **Input Schemas**: Required/optional parameters with types
+3. **Context**: Current task requirements and device state
+
+This enables **adaptive behavior** without hardcoded command sequences.
+
+See [MCP Documentation](../../../mcp/overview.md) for complete MCP integration details.
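+
+The descriptions and input schemas listed above have to reach the LLM as text. The following is a minimal sketch of one way to render them, not UFO's prompt builder: `ToolInfo` is a hypothetical stand-in for `MCPToolInfo` that carries only the three fields used in the registration example, and `render_tools_for_prompt` is an assumed helper name.
+
+```python
+from dataclasses import dataclass, field
+from typing import Any, Dict, List
+
+
+@dataclass
+class ToolInfo:
+    """Hypothetical stand-in for MCPToolInfo (name, description, arguments_schema)."""
+    name: str
+    description: str
+    arguments_schema: Dict[str, Dict[str, Any]] = field(default_factory=dict)
+
+
+def render_tools_for_prompt(tools: List[ToolInfo]) -> str:
+    """Format tool names, descriptions, and argument schemas as prompt text."""
+    lines = []
+    for tool in tools:
+        lines.append(f"- {tool.name}: {tool.description}")
+        for arg_name, spec in tool.arguments_schema.items():
+            kind = "required" if spec.get("required") else "optional"
+            lines.append(f"    {arg_name} ({spec.get('type', 'any')}, {kind})")
+    return "\n".join(lines)
+
+
+tools = [
+    ToolInfo(
+        name="type_text",
+        description="Type text into focused element",
+        arguments_schema={"text": {"type": "string", "required": True}},
+    )
+]
+print(render_tools_for_prompt(tools))
+```
+
+Keeping the rendering separate from the registry means the same tool list can be filtered by platform or task before it is placed in the prompt.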
+
+## Command Categories
+
+Commands can be categorized by their purpose:
+
+### 1. Observation Commands (DATA_COLLECTION)
+
+**Purpose**: Gather information from the device without modifying state
+
+| Command | Platform | Description | Arguments | Result |
+|---------|----------|-------------|-----------|--------|
+| `screenshot` | All | Capture screen image | `{"region": "fullscreen"}` | Base64 image |
+| `get_ui_tree` | Windows | Extract UI Automation tree | `{"process_name": "..."}` | XML/JSON tree |
+| `shell_execute_read` | Linux | Execute shell command (read-only) | `{"command": "ls -la"}` | stdout/stderr |
+| `get_accessibility_tree` | macOS | Extract Accessibility tree | `{"app_name": "..."}` | Tree structure |
+
+### 2. Action Commands (ACTION_EXECUTION)
+
+**Purpose**: Modify device state through interactions
+
+| Command | Platform | Description | Arguments | Result |
+|---------|----------|-------------|-----------|--------|
+| `click_element` | Windows | Click UI element | `{"control_id": "..."}` | Click success |
+| `type_text` | Windows | Type text into element | `{"text": "..."}` | Typing success |
+| `scroll` | Windows | Scroll element | `{"direction": "down", "amount": 3}` | Scroll success |
+| `shell_execute` | Linux | Execute shell command | `{"command": "...", "timeout": 30}` | stdout/stderr/exit_code |
+| `press_key` | All | Press keyboard key | `{"key": "Enter"}` | Key press success |
+
+### 3. System Commands
+
+**Purpose**: Interact with OS or hardware
+
+| Command | Platform | Description | Arguments | Result |
+|---------|----------|-------------|-----------|--------|
+| `launch_application` | All | Start application | `{"app_name": "...", "args": [...]}` | PID |
+| `close_application` | All | Terminate application | `{"process_name": "..."}` | Close success |
+| `read_file` | All | Read file contents | `{"file_path": "...", "encoding": "utf-8"}` | File contents |
+| `write_file` | All | Write file contents | `{"file_path": "...", "content": "..."}` | Write success |
+| `get_system_info` | All | Query system status | `{"info_type": "cpu"}` | CPU/memory/disk stats |
+
+### Command Naming Convention
+
+- Use **snake_case** for function names
+- Use **verb_noun** pattern (e.g., `click_element`, `read_file`)
+- Keep names **concise** but **descriptive**
+- Prefix with platform if platform-specific (e.g., `windows_get_registry`)
+
+## Command Validation
+
+Commands are validated at multiple stages:
+
+### 1. Client-Side Validation
+
+The device client validates commands before execution:
+
+```python
+from typing import List, Optional
+
+
+class CommandRouter:
+    """Routes and validates commands on device client"""
+
+    async def execute(
+        self,
+        agent_name: str,
+        root_name: str,
+        process_name: str,
+        commands: List[Command]
+    ) -> List[Result]:
+        """Execute commands with validation"""
+        results = []
+
+        for command in commands:
+            # Validate command
+            validation_error = self._validate_command(command)
+            if validation_error:
+                results.append(Result(
+                    status=ResultStatus.FAILURE,
+                    error=validation_error,
+                    call_id=command.call_id
+                ))
+                continue
+
+            # Execute command
+            try:
+                result = await self._execute_single_command(command)
+                results.append(result)
+            except Exception as e:
+                results.append(Result(
+                    status=ResultStatus.FAILURE,
+                    error=str(e),
+                    call_id=command.call_id
+                ))
+
+        return results
+
+    def _validate_command(self, command: Command) -> Optional[str]:
+        """Validate command structure and arguments"""
+        # Check function exists
+        tool_info = self.mcp_server_manager.get_tool(command.function)
+        if not tool_info:
+            return f"Unknown command: {command.function}"
+
+        # Check required arguments
+        schema = tool_info.arguments_schema
+        for arg_name, arg_spec in schema.items():
+            if arg_spec.get("required") and arg_name not in command.arguments:
+                return f"Missing required argument: {arg_name}"
+
+        # Check argument types
+        for arg_name, arg_value in command.arguments.items():
+            expected_type = schema.get(arg_name, {}).get("type")
+            if expected_type and not self._check_type(arg_value, expected_type):
+                return f"Argument '{arg_name}' has wrong type"
+
+        return None  # Valid
+```
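+
+The `_check_type` helper called by `_validate_command` is not shown in the source. A minimal sketch of what such a check could look like follows; the mapping from schema type names to Python types is an assumption modelled on the `"string"` and `"boolean"` names used in the schemas, and `check_type` is written as a free function for brevity.
+
+```python
+from typing import Any
+
+# Assumed mapping from schema type names to Python types (not taken from UFO).
+_TYPE_MAP = {
+    "string": str,
+    "integer": int,
+    "number": (int, float),
+    "boolean": bool,
+    "array": list,
+    "object": dict,
+}
+
+
+def check_type(value: Any, expected_type: str) -> bool:
+    """Return True if value matches the schema type name."""
+    python_type = _TYPE_MAP.get(expected_type)
+    if python_type is None:
+        return True  # Unknown type names are not rejected in this sketch
+    # bool is a subclass of int in Python, so exclude it from the numeric types.
+    if isinstance(value, bool) and expected_type in ("integer", "number"):
+        return False
+    return isinstance(value, python_type)
+```
+
+The explicit `bool` guard matters because `isinstance(True, int)` is true in Python, which would otherwise let a boolean pass as an integer argument.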
+
+### 2. MCP Schema Validation
+
+MCP tools define argument schemas that are enforced:
+
+```python
+{
+ "name": "click_element",
+ "description": "Click UI element by control ID",
+ "arguments_schema": {
+ "control_id": {
+ "type": "string",
+ "required": True,
+ "description": "Unique identifier of control to click"
+ },
+ "process_name": {
+ "type": "string",
+ "required": True,
+ "description": "Process name of application"
+ },
+ "double_click": {
+ "type": "boolean",
+ "required": False,
+ "default": False,
+ "description": "Whether to double-click"
+ }
+ }
+}
+```
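+
+The schema above declares a `default` for `double_click`, but the source does not show where defaults are applied. One plausible approach, sketched here under that assumption, is to fill them in after validation and before the tool is invoked; `apply_defaults` is a hypothetical helper name.
+
+```python
+from typing import Any, Dict
+
+
+def apply_defaults(
+    arguments: Dict[str, Any], schema: Dict[str, Dict[str, Any]]
+) -> Dict[str, Any]:
+    """Return a copy of arguments with schema defaults filled in for omitted arguments."""
+    resolved = dict(arguments)  # Copy so the caller's dict is not mutated
+    for arg_name, spec in schema.items():
+        if arg_name not in resolved and "default" in spec:
+            resolved[arg_name] = spec["default"]
+    return resolved
+
+
+schema = {
+    "control_id": {"type": "string", "required": True},
+    "process_name": {"type": "string", "required": True},
+    "double_click": {"type": "boolean", "required": False, "default": False},
+}
+resolved = apply_defaults(
+    {"control_id": "Button_123", "process_name": "EXCEL.EXE"}, schema
+)
+print(resolved)
+```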
+
+### Validation Benefits
+
+- **Early Error Detection**: Invalid commands caught before execution
+- **Clear Error Messages**: Specific validation failures reported
+- **Type Safety**: Argument types validated against schema
+- **Security**: Rejects malformed requests before they reach a tool; shell and file arguments still need the sanitization described under Security Considerations
+
+## Command Execution Patterns
+
+Common patterns for command execution in strategies:
+
+### 1. Single Command Execution
+
+```python
+# Execute single command
+async def execute(self, agent, context):
+    command = Command(
+        function="screenshot",
+        arguments={"region": "fullscreen"}
+    )
+
+    results = await context.command_dispatcher.execute_commands([command])
+
+    if results[0].status == ResultStatus.SUCCESS:
+        screenshot = results[0].result
+        context.update_local({"screenshot": screenshot})
+        return ProcessingResult(success=True, data={"screenshot": screenshot})
+    else:
+        return ProcessingResult(success=False, error=results[0].error)
+```
+
+### 2. Batch Command Execution
+
+```python
+# Execute multiple commands in a single dispatcher call
+async def execute(self, agent, context):
+    commands = [
+        Command(function="screenshot", arguments={}),
+        Command(function="get_ui_tree", arguments={"process_name": "EXCEL.EXE"}),
+        Command(function="get_system_info", arguments={"info_type": "memory"})
+    ]
+
+    results = await context.command_dispatcher.execute_commands(commands)
+
+    # Process results (results are returned in the same order as the commands)
+    data = {
+        "screenshot": results[0].result if results[0].status == ResultStatus.SUCCESS else None,
+        "ui_tree": results[1].result if results[1].status == ResultStatus.SUCCESS else None,
+        "memory_info": results[2].result if results[2].status == ResultStatus.SUCCESS else None
+    }
+
+    return ProcessingResult(success=True, data=data)
+```
+
+### 3. Conditional Command Execution
+
+```python
+# Execute command based on LLM decision
+async def execute(self, agent, context):
+    parsed_response = context.require_local("parsed_response")
+    action = parsed_response.get("ControlText")
+
+    if action == "click":
+        command = Command(
+            function="click_element",
+            arguments={"control_id": parsed_response.get("ControlID")}
+        )
+    elif action == "type":
+        command = Command(
+            function="type_text",
+            arguments={"text": parsed_response.get("InputText")}
+        )
+    else:
+        return ProcessingResult(success=False, error=f"Unknown action: {action}")
+
+    results = await context.command_dispatcher.execute_commands([command])
+    return ProcessingResult(success=results[0].status == ResultStatus.SUCCESS)
+```
+
+### 4. Retry Pattern
+
+```python
+import asyncio
+
+# Retry command on failure
+async def execute(self, agent, context):
+    command = Command(
+        function="click_element",
+        arguments={"control_id": "Button_123"}
+    )
+
+    max_retries = 3
+    for attempt in range(max_retries):
+        results = await context.command_dispatcher.execute_commands([command])
+
+        if results[0].status == ResultStatus.SUCCESS:
+            return ProcessingResult(success=True, data=results[0].result)
+
+        # Back off exponentially (1s, then 2s); skip the wait after the final attempt
+        if attempt < max_retries - 1:
+            await asyncio.sleep(2 ** attempt)
+
+    return ProcessingResult(success=False, error="Max retries exceeded")
+```
+
+## Best Practices
+
+### Command Design Guidelines
+
+**1. Atomic Operations**: Each command should perform one well-defined operation
+
+- ✅ Good: `click_element(control_id="Button_123")`
+- ❌ Bad: `click_and_wait_and_validate(...)` (too many responsibilities)
+
+**2. Idempotency**: Commands should be safe to retry
+
+- ✅ Good: `read_file(file_path="/data.csv")` (idempotent)
+- ⚠️ Caution: `append_to_file(file_path="/log.txt", text="...")` (not idempotent)
+
+**3. Clear Arguments**: Use descriptive argument names
+
+- ✅ Good: `{"file_path": "...", "encoding": "utf-8"}`
+- ❌ Bad: `{"p": "...", "e": "utf-8"}` (unclear)
+
+**4. Structured Results**: Return structured data, not just strings
+
+- ✅ Good: `{"stdout": "...", "stderr": "...", "exit_code": 0}`
+- ❌ Bad: `"output: ... error: ... code: 0"` (unstructured)
+
+### Security Considerations
+
+!!! warning "Security Best Practices"
+ **Validate All Inputs**: Never trust command arguments from LLM without validation
+
+ **Limit Command Scope**: Restrict commands to necessary operations only
+
+ - Use MCP tool permissions to limit file access
+ - Sandbox shell command execution
+ - Validate file paths against allowed directories
+
+ **Audit Command History**: Log all commands for compliance
+
+ ```python
+ self.logger.info(f"Executing command: {command.tool_name} with args: {command.parameters}")
+ ```
+
+ **Timeout All Commands**: Prevent runaway execution
+
+ ```python
+ results = await dispatcher.execute_commands(commands, timeout=30)
+ ```
+
+!!! danger "Dangerous Commands"
+ Some commands require extra caution:
+
+ **Shell Execution**: Risk of command injection
+
+ - Use argument escaping/sanitization
+ - Whitelist allowed commands
+ - Never concatenate user input directly
+
+ **File Operations**: Risk of unauthorized access
+
+ - Validate paths against allowed directories
+ - Check file permissions before access
+ - Never allow arbitrary path traversal
+
+ **System Modification**: Risk of breaking system state
+
+ - Require explicit user confirmation
+ - Implement undo/rollback mechanisms
+ - Never allow destructive ops without safeguards
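+
+The path and shell safeguards listed above can be sketched with the standard library alone. This is an illustrative sketch, not UFO's implementation: `is_path_allowed`, `parse_allowed_command`, and the `ALLOWED_EXECUTABLES` set are hypothetical names chosen for the example.
+
+```python
+import shlex
+from pathlib import Path
+from typing import Iterable, List
+
+# Hypothetical allowlist, for illustration only.
+ALLOWED_EXECUTABLES = {"ls", "cat", "grep"}
+
+
+def is_path_allowed(file_path: str, allowed_dirs: Iterable[str]) -> bool:
+    """Return True if file_path resolves to a location inside one of allowed_dirs."""
+    resolved = Path(file_path).resolve()  # Collapses ".." and follows symlinks
+    for allowed in allowed_dirs:
+        root = Path(allowed).resolve()
+        if resolved == root or root in resolved.parents:
+            return True
+    return False
+
+
+def parse_allowed_command(command: str) -> List[str]:
+    """Split a command into argv and reject executables outside the allowlist."""
+    argv = shlex.split(command)
+    if not argv or argv[0] not in ALLOWED_EXECUTABLES:
+        raise ValueError(f"Command not allowed: {command!r}")
+    return argv
+```
+
+Passing the returned `argv` list to `subprocess.run` without `shell=True` keeps shell metacharacters such as `;` from being interpreted, which addresses the "never concatenate user input" point directly.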
+
+## Integration with Other Layers
+
+The Command Layer integrates with other components:
+
+```mermaid
+graph TB
+ subgraph "Strategy Layer (Level-2)"
+ AE[ACTION_EXECUTION Strategy]
+ DC[DATA_COLLECTION Strategy]
+ end
+
+ subgraph "Command Layer (Level-3)"
+ Dispatcher[CommandDispatcher]
+ Commands[Commands]
+ Results[Results]
+ end
+
+ subgraph "Communication"
+ AIP[AIP Protocol]
+ end
+
+ subgraph "Device Client"
+ MCP[MCP Servers]
+ Tools[MCP Tools]
+ end
+
+ AE -->|creates| Commands
+ DC -->|creates| Commands
+ Commands -->|via| Dispatcher
+ Dispatcher -->|via| AIP
+ AIP -->|routes to| MCP
+ MCP -->|invokes| Tools
+ Tools -->|results via| MCP
+ MCP -->|via| AIP
+ AIP -->|via| Dispatcher
+ Dispatcher -->|returns| Results
+ Results -->|used by| AE
+ Results -->|used by| DC
+```
+
+| Integration Point | Layer/Component | Relationship |
+|-------------------|-----------------|--------------|
+| **ProcessingStrategy** | Level-2 Strategy | Strategies create and execute commands via dispatcher |
+| **AIP Protocol** | Communication | Dispatcher uses AIP to send commands to client |
+| **Device Client** | Execution | Client receives commands, routes to MCP servers |
+| **MCP Servers** | Tool Registry | MCP servers execute tool functions, return results |
+| **Global Context** | Module System | Command dispatcher accessed via processing context |
+
+See [Strategy Layer](processor.md), [AIP Protocol](../../../aip/overview.md), and [MCP Integration](../../../mcp/overview.md) for integration details.
+
+## API Reference
+
+Below is a condensed API reference for the Command Layer dispatchers:
+
+**BasicCommandDispatcher** (Abstract Base Class)
+```python
+# Location: ufo/module/dispatcher.py
+class BasicCommandDispatcher(ABC):
+    """Abstract base class for command dispatcher handling."""
+
+    @abstractmethod
+    async def execute_commands(
+        self, commands: List[Command], timeout: float = 6000
+    ) -> Optional[List[Result]]:
+        """Execute commands and return results."""
+        pass
+```
+
+**LocalCommandDispatcher** (Local Execution)
+```python
+# Location: ufo/module/dispatcher.py
+class LocalCommandDispatcher(BasicCommandDispatcher):
+    """Command dispatcher for local execution (testing/development)."""
+    pass
+```
+
+**WebSocketCommandDispatcher** (Server-Client Communication)
+```python
+# Location: ufo/module/dispatcher.py
+class WebSocketCommandDispatcher(BasicCommandDispatcher):
+    """Command dispatcher using WebSocket/AIP protocol."""
+    pass
+```
+
+## Summary
+
+**Key Takeaways**:
+
+- **Atomic Execution**: Commands are self-contained units with tool_name + parameters
+- **MCP Integration**: Commands map to Model Context Protocol tools on device client
+- **CommandDispatcher**: Routes commands from server to client via AIP
+- **Deterministic**: Same inputs → same outputs, fully auditable
+- **Dynamic Discovery**: LLM queries and selects appropriate tools at runtime
+- **Validation**: Multi-stage validation (client + MCP schema) ensures safety
+- **Extensibility**: New commands are added by registering MCP tools, without changes to strategy or dispatcher code
+
+The Command Layer completes the three-layer device agent architecture, providing **reliable, auditable, and extensible system interfaces** that bridge high-level reasoning with low-level device operations across heterogeneous platforms.
diff --git a/documents/docs/infrastructure/agents/design/memory.md b/documents/docs/infrastructure/agents/design/memory.md
new file mode 100644
index 000000000..539687c37
--- /dev/null
+++ b/documents/docs/infrastructure/agents/design/memory.md
@@ -0,0 +1,672 @@
+# Memory System
+
+The Memory System provides both short-term and long-term memory capabilities for Device Agents in UFO3. The system consists of two primary components: **Memory** (agent-specific execution history) and **Blackboard** (shared multi-agent communication). This dual-memory architecture enables agents to maintain their own execution context while coordinating seamlessly across devices and sessions.
+
+## Overview
+
+The Memory System supports the Device Agent architecture through two distinct but complementary mechanisms:
+
+```mermaid
+graph TB
+ subgraph "Memory System Architecture"
+ Agent1[Agent Instance]
+ Agent2[Agent Instance]
+ AgentN[Agent Instance N]
+
+ Memory1[Memory Short-term]
+ Memory2[Memory Short-term]
+ MemoryN[Memory Short-term]
+
+ Blackboard[Blackboard Long-term Shared]
+
+ Agent1 --> Memory1
+ Agent2 --> Memory2
+ AgentN --> MemoryN
+
+ Agent1 -.Share.-> Blackboard
+ Agent2 -.Share.-> Blackboard
+ AgentN -.Share.-> Blackboard
+
+ Blackboard -.Read.-> Agent1
+ Blackboard -.Read.-> Agent2
+ Blackboard -.Read.-> AgentN
+ end
+
+ style Memory1 fill:#e1f5ff
+ style Memory2 fill:#e1f5ff
+ style MemoryN fill:#e1f5ff
+ style Blackboard fill:#fff4e1
+```
+
+| Component | Scope | Persistence | Primary Use Case |
+|-----------|-------|-------------|------------------|
+| **Memory** | Agent-specific | Session lifetime | Execution history, context tracking |
+| **Blackboard** | Multi-agent shared | Configurable (file-backed) | Cross-agent coordination, information sharing |
+
+**Design Benefits:**
+- **Separation of Concerns**: Agent-specific history isolated from shared state
+- **Scalability**: Each agent manages own memory independently
+- **Coordination**: Blackboard enables multi-agent communication without tight coupling
+- **Persistence**: Blackboard can survive session restarts (file-backed storage)
+
+---
+
+## Memory (Short-term Agent Memory)
+
+The `Memory` class manages the **short-term execution history** of a single agent. Each agent instance has its own `Memory` that records every interaction step, forming a chronological execution trace.
+
+### Memory Architecture
+
+```mermaid
+graph LR
+ subgraph "Memory Lifecycle"
+ Step1[Step 1 MemoryItem]
+ Step2[Step 2 MemoryItem]
+ Step3[Step 3 MemoryItem]
+ StepN[Step N MemoryItem]
+
+ Step1 --> Step2
+ Step2 --> Step3
+ Step3 --> StepN
+ end
+
+ subgraph "MemoryItem Contents"
+ Screenshot[Screenshot]
+ Action[Action Taken]
+ Result[Execution Result]
+ Observation[UI Observation]
+ Cost[LLM Cost]
+ end
+
+ StepN --> Screenshot
+ StepN --> Action
+ StepN --> Result
+ StepN --> Observation
+ StepN --> Cost
+
+ style Step1 fill:#e1f5ff
+ style Step2 fill:#e1f5ff
+ style Step3 fill:#e1f5ff
+ style StepN fill:#e1f5ff
+```
+
+### MemoryItem Structure
+
+A `MemoryItem` is a flexible dataclass that represents a **single execution step** in the agent's history. The structure is customizable to accommodate different agent types and platforms.
+
+::: agents.memory.memory.MemoryItem
+
+#### Common MemoryItem Fields
+
+| Field | Type | Description | Usage in Strategies |
+|-------|------|-------------|---------------------|
+| `step` | `int` | Execution step number | Tracking execution progress |
+| `screenshot` | `str` (path) | Screenshot file path | Visual context for LLM reasoning |
+| `action` | `str` | Action function name | Execution history, replay |
+| `arguments` | `Dict[str, Any]` | Action arguments | Debugging, audit logging |
+| `results` | `List[Result]` | Command execution results | Success/failure tracking |
+| `observation` | `str` | UI element descriptions | LLM prompt context |
+| `control_text` | `str` | UI text content | Element identification |
+| `request` | `str` | User request at this step | Task context |
+| `response` | `str` | LLM raw response | Debugging LLM decisions |
+| `parsed_response` | `Dict` | Parsed LLM output | Structured action extraction |
+| `cost` | `float` | LLM API cost | Budget tracking |
+| `error` | `Optional[str]` | Error message if failed | Error recovery |
+
+**Example: Creating a MemoryItem**
+ ```python
+ from ufo.agents.memory.memory import MemoryItem
+
+ # After executing a step, create memory item
+ memory_item = MemoryItem(
+ step=3,
+ screenshot="screenshots/step_3.png",
+ action="click_element",
+ arguments={"element_id": "submit_button"},
+ results=[Result(status=ResultStatus.SUCCESS, result="Button clicked")],
+ observation="Submit button located at (500, 300)",
+ request="Submit the form",
+ response='{"action": "click_element", "element": "submit_button"}',
+ parsed_response={"action": "click_element", "element": "submit_button"},
+ cost=0.0023
+ )
+ ```
+
+**Note on Flexible Schema:**
+`MemoryItem` uses a flexible dataclass structure. Agent implementations can add custom fields based on their specific requirements. For example, Windows agents might add `ui_automation_info`, while Linux agents might add `shell_output`.
+
+### Memory Class
+
+The `Memory` class manages a **list of MemoryItem instances**, providing methods to add, retrieve, and filter execution history.
+
+::: agents.memory.memory.Memory
+
+#### Key Methods
+
+| Method | Purpose | Usage |
+|--------|---------|-------|
+| `add_memory_item(item)` | Append new execution step | Called by `MEMORY_UPDATE` strategy after each step |
+| `get_latest_item()` | Retrieve the most recent item | Get the last execution step |
+| `filter_memory_from_keys(keys)` | Filter items by specific keys | Build LLM prompt with selected fields |
+| `filter_memory_from_steps(steps)` | Filter items by step numbers | Retrieve specific execution steps |
+| `clear()` | Reset memory | New session initialization |
+| `is_empty()` | Check if memory is empty | Validate memory state |
+
+**Example: Using Memory in Processor**
+```python
+from ufo.agents.processors.strategies.memory_strategies import MemoryUpdateStrategy
+from ufo.agents.memory.memory import Memory, MemoryItem
+
+class AppAgentProcessor(ProcessorTemplate):
+    def __init__(self, agent, context):
+        super().__init__(agent, context)
+        self.memory = Memory()  # Agent-specific memory
+
+        # MEMORY_UPDATE strategy adds items to memory
+        self.register_strategy(
+            ProcessingPhase.MEMORY_UPDATE,
+            MemoryUpdateStrategy(agent, context, self.memory)
+        )
+
+    def get_prompt_context(self) -> str:
+        """Build LLM prompt with recent execution history."""
+        # A negative slice returns at most the last five items, even for short histories
+        recent_steps = self.memory.content[-5:]
+
+        context = "## Recent Execution History:\n"
+        for item in recent_steps:
+            context += f"Step {item.get_value('step')}: {item.get_value('action')}"
+            context += f"({item.get_value('arguments')}) -> {item.get_value('results')}\n"
+
+        return context
+```
+
+#### Memory Lifecycle
+
+```mermaid
+sequenceDiagram
+ participant Processor
+ participant Memory
+ participant MemoryUpdateStrategy
+
+ Note over Processor: Agent starts session
+ Processor->>Memory: Initialize Memory()
+
+ loop Each Execution Step
+ Note over Processor: Execute strategies
+ Processor->>MemoryUpdateStrategy: execute()
+ MemoryUpdateStrategy->>MemoryUpdateStrategy: Create MemoryItem from context
+ MemoryUpdateStrategy->>Memory: add_memory_item(item)
+ Memory->>Memory: Append to internal list
+ end
+
+ Note over Processor: Need prompt context
+ Processor->>Memory: content property (get all)
+ Memory-->>Processor: List[MemoryItem]
+
+ Note over Processor: Session ends
+ Processor->>Memory: clear()
+```
+
+**Memory Management Best Practices:**
+- **Limited Context**: When building LLM prompts, use the `content` property and slice for recent items to avoid token limits
+- **Selective Fields**: Only include relevant MemoryItem fields in prompts (e.g., action + results, not raw screenshots)
+- **Error Analysis**: Use `filter_memory_from_keys()` to extract specific information patterns
+
+---
+
+## Blackboard (Long-term Shared Memory)
+
+The `Blackboard` class implements the **Blackboard Pattern** for multi-agent coordination. It provides a shared memory space that all agents can read from and write to, and its contents can be serialized so that they persist across sessions.
+
+### Blackboard Pattern
+
+The Blackboard Pattern is a well-known architectural pattern for multi-agent systems:
+
+```mermaid
+graph TB
+ subgraph "Blackboard Pattern"
+ BB[Blackboard Shared Knowledge Space]
+
+ HostAgent[HostAgent Windows]
+ AppAgent[AppAgent Windows]
+
+ HostAgent -->|Write: questions| BB
+ HostAgent -->|Write: requests| BB
+ AppAgent -->|Read: requests| BB
+ AppAgent -->|Write: trajectories| BB
+
+ BB -.Persist.-> FileStorage[(JSON/JSONL)]
+ end
+
+ style BB fill:#fff4e1
+ style HostAgent fill:#e1f5ff
+ style AppAgent fill:#e1f5ff
+```
+
+**Blackboard Pattern Characteristics:**
+- **Centralized Knowledge**: All agents read/write from a single shared space
+- **Loose Coupling**: Agents don't directly communicate; they interact via blackboard
+- **Opportunistic Scheduling**: Agents can act when relevant information appears on blackboard
+- **Persistence**: Knowledge survives agent restarts and session boundaries
+
+### Blackboard Architecture
+
+The Blackboard is organized with four main memory components, each storing a list of `MemoryItem` objects:
+
+```python
+# Blackboard internal structure
+class Blackboard:
+    _questions: Memory      # Q&A pairs with user
+    _requests: Memory       # Historical user requests
+    _trajectories: Memory   # Step-wise execution history
+    _screenshots: Memory    # Important screenshots
+```
+
+Each component is a `Memory` object that stores `MemoryItem` instances with flexible key-value pairs.
+
+### Blackboard Class
+
+::: agents.memory.blackboard.Blackboard
+
+#### Key Methods
+
+| Method | Purpose | Example Usage |
+|--------|---------|---------------|
+| `add_questions(item)` | Add Q&A with user | Store user clarification dialogs |
+| `add_requests(item)` | Add user request | Track historical user requests |
+| `add_trajectories(item)` | Add execution step | Record agent actions and decisions |
+| `add_image(screenshot_path, metadata)` | Add screenshot | Save important UI states |
+| `blackboard_to_prompt()` | Convert to LLM prompt | Build context for agent inference |
+| `blackboard_to_dict()` | Export as dictionary | Serialize for persistence |
+| `blackboard_from_dict(data)` | Import from dictionary | Restore from persistence |
+| `clear()` | Reset blackboard | New session initialization |
+| `is_empty()` | Check if empty | Validate blackboard state |
+
+**Example: Multi-Agent Coordination via Blackboard**
+```python
+from ufo.agents.memory.blackboard import Blackboard
+
+# Initialize shared blackboard
+blackboard = Blackboard()
+
+# HostAgent adds user request to blackboard
+class HostAgent:
+    def handle(self, context):
+        # ... process user request ...
+        user_request = "Create a presentation about AI"
+
+        # Write to blackboard for AppAgent to read
+        blackboard.add_requests({"request": user_request, "timestamp": "2025-11-12"})
+        blackboard.add_trajectories({
+            "step": 1,
+            "agent": "HostAgent",
+            "action": "delegate_task",
+            "app": "PowerPoint"
+        })
+
+        # Delegate to AppAgent
+        return AgentStatus.CONTINUE, AppAgent
+
+# AppAgent reads from blackboard and performs task
+class AppAgent:
+    def handle(self, context):
+        # Read from blackboard
+        recent_requests = blackboard.requests.content
+        if recent_requests:
+            last_request = recent_requests[-1]
+            print(f"AppAgent working on: {last_request.get_value('request')}")
+
+        # ... perform actions ...
+
+        # Write task result back to blackboard
+        blackboard.add_trajectories({
+            "step": 2,
+            "agent": "AppAgent",
+            "action": "create_presentation",
+            "status": "completed"
+        })
+
+        return AgentStatus.FINISH, None
+```
+
+### Blackboard Persistence
+
+The Blackboard supports serialization for session recovery:
+
+```mermaid
+graph LR
+ subgraph "Session Lifecycle"
+ Start[Session Start]
+ Execute[Agent Execution]
+ End[Session End]
+
+ Start --> Execute
+ Execute --> End
+ end
+
+ subgraph "Serialization"
+ Dict[blackboard_to_dict]
+ JSON[JSON Format]
+ end
+
+ Execute --> Dict
+ Dict --> JSON
+
+ style Execute fill:#ffe1e1
+```
+
+**Example: Blackboard Serialization**
+```python
+from ufo.agents.memory.blackboard import Blackboard
+import json
+
+# Create and use blackboard
+blackboard = Blackboard()
+blackboard.add_requests({"request": "Create chart", "priority": "high"})
+blackboard.add_trajectories({"step": 1, "action": "open_excel"})
+
+# Serialize to dictionary
+blackboard_dict = blackboard.blackboard_to_dict()
+
+# Save to file
+with open("blackboard_state.json", "w") as f:
+    json.dump(blackboard_dict, f)
+
+# Later, restore from file
+new_blackboard = Blackboard()
+with open("blackboard_state.json", "r") as f:
+    loaded_dict = json.load(f)
+    new_blackboard.blackboard_from_dict(loaded_dict)
+```
+
+---
+
+## Memory Types and Usage Patterns
+
+The Memory System supports different types of information storage based on use cases:
+
+| Memory Type | Storage Location | Persistence | Access Pattern | Primary Use Case |
+|-------------|------------------|-------------|----------------|------------------|
+| **Execution History** | Memory (agent-specific) | Session lifetime | Sequential, recent-first | LLM context, debugging |
+| **Shared State** | Blackboard | File-backed | Key-value lookup | Multi-agent coordination |
+| **Session Context** | Blackboard | File-backed | Hierarchical access | Session recovery, checkpoints |
+| **Global Trajectories** | Blackboard | JSONL append | Sequential log | Audit trail, analytics |
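+
+The table lists "JSONL append" for global trajectories, but the source does not show how that log is written. A minimal sketch of an append-only JSONL log, assuming one JSON object per line, is given below; `append_trajectory` and `read_trajectories` are hypothetical helpers and are not part of the `Blackboard` API.
+
+```python
+import json
+from pathlib import Path
+from typing import Any, Dict, List
+
+
+def append_trajectory(log_path: Path, entry: Dict[str, Any]) -> None:
+    """Append one trajectory entry as a single JSON line."""
+    with log_path.open("a", encoding="utf-8") as f:
+        f.write(json.dumps(entry) + "\n")
+
+
+def read_trajectories(log_path: Path) -> List[Dict[str, Any]]:
+    """Read all entries back in the order they were written."""
+    with log_path.open("r", encoding="utf-8") as f:
+        return [json.loads(line) for line in f if line.strip()]
+```
+
+Appending one line per step avoids rewriting the whole file on every update, which suits an audit trail that only grows.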
+
+### Common Memory Patterns
+
+#### Pattern 1: Recent Context for LLM Prompts
+
+```python
+# Use the Memory.content property to get recent execution context;
+# a negative slice returns at most the last five items
+recent_steps = agent.memory.content[-5:]
+prompt_context = "\n".join([
+ f"Step {item.get_value('step')}: {item.get_value('action')}"
+ for item in recent_steps
+])
+```
+
+#### Pattern 2: Multi-Agent Information Sharing
+
+```python
+# HostAgent writes to Blackboard
+blackboard.add_requests({"request": "Create Excel chart", "app": "Excel"})
+
+# AppAgent reads from Blackboard
+requests = blackboard.requests.content
+if requests:
+    latest_request = requests[-1]
+    app = latest_request.get_value("app")
+```
+
+#### Pattern 3: Execution History Tracking
+
+```python
+# Record each step in trajectories
+blackboard.add_trajectories({
+ "step": 1,
+ "agent": "AppAgent",
+ "action": "click_button",
+ "target": "Save",
+ "status": "success"
+})
+
+# Later, review execution history
+history = blackboard.trajectories.content
+for item in history:
+    print(f"Step {item.get_value('step')}: {item.get_value('action')}")
+```
+
+#### Pattern 4: Screenshot Memory
+
+```python
+# Save important UI state with metadata
+blackboard.add_image(
+ screenshot_path="screenshots/step_5.png",
+ metadata={"step": 5, "description": "Before form submission"}
+)
+
+# Access screenshots for review
+screenshots = blackboard.screenshots.content
+for screenshot in screenshots:
+    metadata = screenshot.get_value("metadata")
+    path = screenshot.get_value("image_path")
+```
+
+---
+
+## Integration with Agent Architecture
+
+The Memory System integrates with all three architectural layers:
+
+```mermaid
+graph TB
+ subgraph "Level-1: State Layer"
+ State[AgentState.handle]
+ end
+
+ subgraph "Level-2: Strategy Layer"
+ DataCollection[DATA_COLLECTION Strategy]
+ LLMInteraction[LLM_INTERACTION Strategy]
+ ActionExecution[ACTION_EXECUTION Strategy]
+ MemoryUpdate[MEMORY_UPDATE Strategy]
+ end
+
+ subgraph "Memory System"
+ Memory[Memory Short-term]
+ Blackboard[Blackboard Long-term]
+ end
+
+ State --> DataCollection
+ DataCollection --> LLMInteraction
+ LLMInteraction --> ActionExecution
+ ActionExecution --> MemoryUpdate
+
+ MemoryUpdate -->|Write| Memory
+ LLMInteraction -.Read Context.-> Memory
+
+ State -.Read/Write.-> Blackboard
+ MemoryUpdate -.Write Trajectories.-> Blackboard
+
+ style Memory fill:#e1f5ff
+ style Blackboard fill:#fff4e1
+ style MemoryUpdate fill:#ffe1e1
+```
+
+### Integration Points
+
+| Component | Interaction with Memory | Interaction with Blackboard |
+|-----------|-------------------------|----------------------------|
+| **AgentState.handle()** | - | Read shared state, write delegation info |
+| **DATA_COLLECTION Strategy** | Read recent steps for context | - |
+| **LLM_INTERACTION Strategy** | Read history for prompt building | - |
+| **ACTION_EXECUTION Strategy** | - | - |
+| **MEMORY_UPDATE Strategy** | Write MemoryItem after each step | Write execution trajectories |
+| **ProcessorTemplate** | Maintain agent-specific Memory instance | Access shared Blackboard instance |
+
+**Memory vs Blackboard Decision Guide:**
+
+Use Memory when:
+- Information is agent-specific (execution history)
+- Data is only needed during current session
+- Building LLM prompts with recent context
+- Tracking agent's own performance
+
+Use Blackboard when:
+- Information needs to be shared across agents
+- Data should persist across session restarts
+- Coordinating multi-agent workflows
+- Implementing handoffs between agents
+- Storing global task state
+
+---
+
+## Best Practices
+
+### Memory Management
+
+**Limit Memory Size:**
+```python
+# Prevent unbounded memory growth
+class Memory:
+    MAX_ITEMS = 100
+
+    def add_memory_item(self, item):
+        self._content.append(item)
+        if len(self._content) > self.MAX_ITEMS:
+            self._content = self._content[-self.MAX_ITEMS:]  # Keep latest 100
+```
+
+**Selective Context for LLM:**
+```python
+# Don't send full MemoryItem objects to LLM
+def build_prompt_context(memory):
+    recent = memory.content[-5:]  # At most the last five items
+    return "\n".join([
+        f"Step {item.get_value('step')}: {item.get_value('action')} -> "
+        f"{item.get_value('status')}"
+        for item in recent
+    ])
+```
+
+**Avoid Storing Large Binary Data:**
+Store file paths instead of file contents in MemoryItem:
+```python
+# Good: Store path
+memory_item.set_value("screenshot", "screenshots/step_3.png")
+
+# Bad: Store binary data
+# memory_item.set_value("screenshot", raw_image_bytes)  # raw bytes bloat every MemoryItem
+```
+
+### Blackboard Management
+
+**Organize with Descriptive Keys:**
+```python
+# Use descriptive keys in MemoryItem dictionaries
+blackboard.add_trajectories({
+ "step": 1,
+ "agent": "HostAgent",
+ "action": "select_app",
+ "app_name": "Word",
+ "timestamp": "2025-11-12T10:00:00"
+})
+```
+
+**Regular Serialization:**
+```python
+import json
+
+# Periodically save blackboard state
+class Session:
+    def __init__(self):
+        self.blackboard = Blackboard()
+        self.save_interval = 10  # Every 10 steps
+
+    def execute_step(self, step_num):
+        # ... execute step ...
+
+        if step_num % self.save_interval == 0:
+            state = self.blackboard.blackboard_to_dict()
+            with open("blackboard_backup.json", "w") as f:
+                json.dump(state, f)
+```
+
+**Clean Up Appropriately:**
+```python
+# Clear blackboard when starting new session
+if new_session:
+    blackboard.clear()
+```
+
+---
+
+## Common Pitfalls
+
+**Pitfall 1: Confusing Memory and Blackboard Scope**
+
+Problem: Storing agent-specific data in Blackboard or shared data in Memory.
+
+Solution: Follow the scope principle:
+- Memory = agent-specific, session-lifetime
+- Blackboard = multi-agent shared, persistent
+
+```python
+# Correct
+agent.memory.add_memory_item(...) # Agent's own history
+blackboard.add_trajectories({...}) # Shared execution history
+```
+
+**Pitfall 2: Memory Leaks in Long Sessions**
+
+Problem: Memory grows unbounded in long-running sessions.
+
+Solution: Implement memory size limits or periodic cleanup:
+```python
+# Add size limit
+if len(memory.content) > 1000:
+    memory._content = memory.content[-500:]  # Keep the most recent 500 items
+```
+
+**Pitfall 3: Not Preserving Important State**
+
+Problem: Losing important state during crashes.
+
+Solution: Periodically serialize critical Blackboard state:
+```python
+import json
+
+# After critical operations
+state = blackboard.blackboard_to_dict()
+with open("checkpoint.json", "w") as f:
+    json.dump(state, f)
+```
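+
+A crash in the middle of a write can leave a truncated checkpoint behind. One common way to guard against that is to write to a temporary file and rename it into place. The sketch below assumes the state is a plain JSON-serializable dictionary (such as the one returned by `blackboard_to_dict()`); `save_checkpoint` and `load_checkpoint` are hypothetical helper names, not part of the framework.
+
+```python
+import json
+import os
+import tempfile
+
+
+def save_checkpoint(state: dict, path: str) -> None:
+    """Write state to path atomically (hypothetical helper)."""
+    directory = os.path.dirname(os.path.abspath(path))
+    # Create the temp file next to the target so the rename stays on one filesystem
+    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
+    try:
+        with os.fdopen(fd, "w", encoding="utf-8") as f:
+            json.dump(state, f)
+            f.flush()
+            os.fsync(f.fileno())
+        # os.replace swaps the file in one step, so readers never see a partial file
+        os.replace(tmp_path, path)
+    except BaseException:
+        if os.path.exists(tmp_path):
+            os.remove(tmp_path)
+        raise
+
+
+def load_checkpoint(path: str) -> dict:
+    """Read a checkpoint written by save_checkpoint."""
+    with open(path, "r", encoding="utf-8") as f:
+        return json.load(f)
+```
+
+If the process dies before `os.replace` runs, the previous checkpoint is still intact and only the temporary file is lost.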
+
+## Related Documentation
+
+- [Device Agent Overview](../overview.md) - Memory system in overall architecture
+- [Strategy Layer](processor.md) - `MEMORY_UPDATE` strategy implementation
+- [State Layer](state.md) - States reading/writing Blackboard for coordination
+- [Module System - Round](../../modules/round.md) - Round-level memory management
+- [Module System - Context](../../modules/context.md) - Context data vs Memory data separation
+
+## API Reference
+
+For complete API documentation, see:
+
+::: agents.memory.memory.Memory
+::: agents.memory.memory.MemoryItem
+::: agents.memory.blackboard.Blackboard
+
+## Summary
+
+**Key Takeaways:**
+- **Dual-Memory Architecture**: Memory (short-term, agent-specific) + Blackboard (long-term, shared)
+- **Memory for Execution History**: Stores chronological MemoryItem instances for LLM context and debugging
+- **Blackboard for Coordination**: Implements Blackboard Pattern for multi-agent communication
+- **Flexible Schema**: MemoryItem supports custom fields for platform-specific requirements
+- **Persistence Support**: Blackboard can serialize/deserialize via dictionaries for session recovery
+- **Integration**: MEMORY_UPDATE strategy writes to Memory, states coordinate via Blackboard
+- **Best Practices**: Limit memory size, organize Blackboard with descriptive keys, periodically serialize state
+- **Scope Awareness**: Use Memory for agent-specific data, Blackboard for shared coordination
+
+The Memory System provides the foundation for both individual agent intelligence (through execution history) and collective multi-agent coordination (through shared knowledge space), enabling UFO3 to orchestrate complex cross-device tasks effectively.
\ No newline at end of file
diff --git a/documents/docs/infrastructure/agents/design/processor.md b/documents/docs/infrastructure/agents/design/processor.md
new file mode 100644
index 000000000..a8cb70d33
--- /dev/null
+++ b/documents/docs/infrastructure/agents/design/processor.md
@@ -0,0 +1,874 @@
+# Strategy Layer: Processor (Level-2)
+
+The **Processor** is the core component of the **Strategy Layer (Level-2)**, providing a configurable framework that orchestrates **ProcessingStrategies** through defined phases. Each agent state encapsulates a **ProcessorTemplate** that manages strategy registration, middleware chains, dependency validation, and context management. Together with modular strategies, the processor enables agents to compose complex execution workflows from reusable components.
+
+## Overview
+
+The Processor implements the orchestration framework within **Level-2: Strategy Layer** of the [three-layer device agent architecture](../overview.md#three-layer-architecture). The Strategy Layer handles:
+
+- **Processor Framework** (This Document): Orchestrates strategy execution workflow
+- **Processing Strategies** (See [strategy.md](strategy.md)): Modular execution units
+- **Middleware System**: Cross-cutting concerns (logging, metrics, error handling)
+- **Dependency Validation**: Ensure strategies execute in correct order
+- **Context Management**: Unified data access across strategies
+
+```mermaid
+graph TB
+ subgraph "Level-2: Strategy Layer"
+ State[AgentState Level-1 FSM] -->|encapsulates| Processor[ProcessorTemplate Strategy Orchestrator]
+
+ Processor -->|registers| Registry[Strategy Registry Phase → Strategy mapping]
+ Processor -->|configures| Middleware[Middleware Chain Logging, Metrics, etc.]
+
+ Processor -->|Phase 1| DC[DATA_COLLECTION Strategy/Strategies]
+ Processor -->|Phase 2| LLM[LLM_INTERACTION Strategy/Strategies]
+ Processor -->|Phase 3| AE[ACTION_EXECUTION Strategy/Strategies]
+ Processor -->|Phase 4| MU[MEMORY_UPDATE Strategy/Strategies]
+
+ DC -->|provides data| Context[ProcessingContext]
+ Context -->|consumed by| LLM
+ LLM -->|provides actions| Context
+ Context -->|consumed by| AE
+ AE -->|provides results| Context
+ Context -->|consumed by| MU
+ end
+
+ Strategies[ProcessingStrategy Implementations] -.registered by.-> Processor
+ Middleware -.wraps.-> DC
+ Middleware -.wraps.-> LLM
+ Middleware -.wraps.-> AE
+ Middleware -.wraps.-> MU
+```
+
+**Design Philosophy:** The Processor framework follows the **Template Method Pattern** where `ProcessorTemplate.process()` defines the workflow skeleton, subclasses configure phase-specific strategies, and middleware applies cross-cutting concerns uniformly. Strategies and middleware are injected at initialization, enabling extensibility without modifying the core framework.
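+
+To make the Template Method idea concrete, here is a deliberately simplified, synchronous sketch. `MiniProcessor`, `EchoProcessor`, and the string phase names are illustrative assumptions; the real `ProcessorTemplate` is asynchronous and considerably richer.
+
+```python
+from abc import ABC, abstractmethod
+from typing import Callable, Dict
+
+# Fixed phase order owned by the base class (the "template")
+PHASES = ["data_collection", "llm_interaction", "action_execution", "memory_update"]
+
+
+class MiniProcessor(ABC):
+    """Toy processor: the base class fixes the workflow, subclasses fill in phases."""
+
+    def __init__(self) -> None:
+        self.strategies: Dict[str, Callable[[dict], dict]] = {}
+        self._setup_strategies()
+
+    @abstractmethod
+    def _setup_strategies(self) -> None:
+        """Subclasses register one callable per phase they need."""
+
+    def process(self) -> dict:
+        context: dict = {}
+        for phase in PHASES:
+            strategy = self.strategies.get(phase)
+            if strategy is None:
+                continue  # Phases without a registered strategy are skipped
+            context.update(strategy(context))
+        return context
+
+
+class EchoProcessor(MiniProcessor):
+    def _setup_strategies(self) -> None:
+        self.strategies["data_collection"] = lambda ctx: {"observation": "button:OK"}
+        self.strategies["llm_interaction"] = lambda ctx: {
+            "action": f"click {ctx['observation']}"
+        }
+
+
+result = EchoProcessor().process()
+print(result)  # {'observation': 'button:OK', 'action': 'click button:OK'}
+```
+
+The subclass never touches `process()`; it only decides which strategies exist, which is the property the framework relies on.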
+
+---
+
+## ProcessorTemplate Framework
+
+The `ProcessorTemplate` is an **abstract base class** that defines the execution workflow. Platform-specific processors (AppAgentProcessor, HostAgentProcessor, LinuxAgentProcessor) subclass it to configure platform-specific strategies and middleware.
+
+### ProcessorTemplate Structure
+
+```python
+import logging
+from abc import ABC, abstractmethod
+from enum import Enum
+from typing import Dict, List, Optional, Type
+
+class ProcessingPhase(Enum):
+ """Enumeration of processor execution phases"""
+ SETUP = "setup" # Initialization (optional)
+ DATA_COLLECTION = "data_collection" # Gather context from device
+ LLM_INTERACTION = "llm_interaction" # LLM reasoning and decision
+ ACTION_EXECUTION = "action_execution" # Execute commands on device
+ MEMORY_UPDATE = "memory_update" # Update memory and blackboard
+ CLEANUP = "cleanup" # Cleanup (optional)
+
+
+class ProcessorTemplate(ABC):
+ """
+ Abstract processor template defining workflow orchestration framework.
+
+ Responsibilities:
+ 1. Strategy Registration: Configure strategies for each phase
+ 2. Middleware Management: Setup cross-cutting concern handlers
+ 3. Dependency Validation: Ensure strategy data flow is valid
+ 4. Workflow Execution: Orchestrate strategy execution in phase order
+ 5. Context Management: Create and manage ProcessingContext
+
+ Subclasses must implement:
+ - _setup_strategies(): Register strategies for processing phases
+ - _setup_middleware(): Register middleware (optional)
+ """
+
+ # Subclasses can override to use custom context class
+ processor_context_class: Type[BasicProcessorContext] = BasicProcessorContext
+
+ def __init__(self, agent: BasicAgent, global_context: Context):
+ """
+ Initialize processor.
+
+ :param agent: The agent instance
+ :param global_context: Shared global context (session-wide)
+ """
+ self.agent = agent
+ self.global_context = global_context
+
+ # Strategy registry: phase -> strategy mapping
+ self.strategies: Dict[ProcessingPhase, ProcessingStrategy] = {}
+
+ # Middleware chain (executed in order)
+ self.middleware_chain: List[ProcessorMiddleware] = []
+
+ # Logging
+ self.logger = logging.getLogger(self.__class__.__name__)
+
+ # Dependency validator
+ self.dependency_validator = StrategyDependencyValidator()
+
+ # Lifecycle
+ self._setup_strategies() # Subclass configures strategies
+ self._setup_middleware() # Subclass configures middleware
+ self._validate_strategy_chain() # Validate dependencies
+
+ # Create processing context (local data store)
+ self.processing_context = self._create_processing_context()
+
+ @abstractmethod
+ def _setup_strategies(self) -> None:
+ """
+ Setup strategies for each processing phase.
+
+ Subclasses must implement this method to configure their strategy workflow.
+
+ Example:
+ self.strategies[ProcessingPhase.DATA_COLLECTION] = ComposedStrategy([
+ ScreenshotStrategy(),
+ UITreeStrategy()
+ ])
+ self.strategies[ProcessingPhase.LLM_INTERACTION] = LLMStrategy()
+ self.strategies[ProcessingPhase.ACTION_EXECUTION] = ActionStrategy()
+ self.strategies[ProcessingPhase.MEMORY_UPDATE] = MemoryStrategy()
+ """
+ pass
+
+ def _setup_middleware(self) -> None:
+ """
+ Setup middleware for cross-cutting concerns.
+
+ Subclasses can override to add middleware (logging, metrics, etc.).
+ Default: no middleware.
+
+ Example:
+ self.middleware_chain = [
+ LoggingMiddleware(),
+ MetricsMiddleware(),
+ ErrorHandlingMiddleware()
+ ]
+ """
+ pass
+
+ def _validate_strategy_chain(self) -> None:
+ """
+ Validate that strategy dependencies are satisfied.
+
+ Raises ProcessingException if validation fails.
+ """
+ errors = self.dependency_validator.validate_chain(self.strategies)
+ if errors:
+ error_msg = "Strategy chain validation failed:\n" + "\n".join(errors)
+ self.logger.error(error_msg)
+ raise ProcessingException(error_msg)
+
+ def _create_processing_context(self) -> ProcessingContext:
+ """
+ Create processing context with local and global data separation.
+
+ :return: ProcessingContext instance
+ """
+ local_context = self.processor_context_class()
+ return ProcessingContext(
+ global_context=self.global_context,
+ local_context=local_context
+ )
+
+ async def process(self) -> None:
+ """
+ Main execution method - orchestrates workflow execution.
+
+ Workflow:
+ 1. Execute strategies in phase order (DATA_COLLECTION → LLM → ACTION → MEMORY)
+ 2. Apply middleware before/after each strategy
+ 3. Validate dependencies before each strategy execution
+ 4. Update context with strategy outputs
+ 5. Handle errors according to strategy fail_fast setting
+
+ :raises ProcessingException: If critical error occurs
+ """
+ try:
+ self.logger.info(f"Starting processor execution: {self.__class__.__name__}")
+
+ # Define execution order
+ execution_order = [
+ ProcessingPhase.SETUP,
+ ProcessingPhase.DATA_COLLECTION,
+ ProcessingPhase.LLM_INTERACTION,
+ ProcessingPhase.ACTION_EXECUTION,
+ ProcessingPhase.MEMORY_UPDATE,
+ ProcessingPhase.CLEANUP
+ ]
+
+ # Execute each phase
+ for phase in execution_order:
+ strategy = self.strategies.get(phase)
+ if not strategy:
+ self.logger.debug(f"No strategy registered for phase {phase.value}, skipping")
+ continue
+
+ self.logger.info(f"Executing phase: {phase.value} with strategy: {strategy.name}")
+
+ # Validate dependencies
+ missing_deps = strategy.validate_dependencies(self.processing_context)
+ if missing_deps:
+ raise ProcessingException(
+ f"Strategy {strategy.name} missing required dependencies: {missing_deps}"
+ )
+
+ # Apply middleware (before)
+ await self._apply_middleware_before(phase, strategy)
+
+ # Execute strategy
+ result = await strategy.execute(self.agent, self.processing_context)
+
+ # Handle result
+ if result.success:
+ self.logger.info(f"Strategy {strategy.name} succeeded")
+ # Update context with strategy outputs
+ self.processing_context.update_local(result.data)
+ else:
+ self.logger.error(f"Strategy {strategy.name} failed: {result.error}")
+ if strategy.fail_fast:
+ raise ProcessingException(
+ f"Strategy {strategy.name} failed in phase {phase.value}: {result.error}"
+ )
+ else:
+ self.logger.warning(f"Continuing despite failure in {strategy.name}")
+
+ # Apply middleware (after)
+ await self._apply_middleware_after(phase, strategy, result)
+
+ # Finalize context (promote local data to global if needed)
+ self._finalize_processing_context()
+
+ self.logger.info("Processor execution completed successfully")
+
+ except Exception as e:
+ self.logger.error(f"Processor execution failed: {e}", exc_info=True)
+ raise
+
+ async def _apply_middleware_before(
+ self,
+ phase: ProcessingPhase,
+ strategy: ProcessingStrategy
+ ) -> None:
+ """
+ Apply middleware before strategy execution.
+
+ :param phase: Current processing phase
+ :param strategy: Strategy about to execute
+ """
+ for middleware in self.middleware_chain:
+ await middleware.before_execute(phase, strategy, self.processing_context)
+
+ async def _apply_middleware_after(
+ self,
+ phase: ProcessingPhase,
+ strategy: ProcessingStrategy,
+ result: ProcessingResult
+ ) -> None:
+ """
+ Apply middleware after strategy execution.
+
+ :param phase: Current processing phase
+ :param strategy: Strategy that just executed
+ :param result: Strategy execution result
+ """
+ for middleware in self.middleware_chain:
+ await middleware.after_execute(phase, strategy, result, self.processing_context)
+
+ def _finalize_processing_context(self) -> None:
+ """
+ Finalize processing context after workflow completion.
+
+ Subclasses can override to customize context finalization.
+ Default: Promote selected local data to global context.
+ """
+ # Example: Promote final action status to global context
+ if self.processing_context.get_local("action_success") is not None:
+ self.global_context.set(
+ "last_action_success",
+ self.processing_context.get_local("action_success")
+ )
+```
+
+### ProcessorTemplate Benefits
+
+**Consistent Workflow:** All processors follow the same execution pattern, ensuring predictable behavior across platforms.
+
+**Platform Customization:** Subclasses configure platform-specific strategies without modifying the core framework.
+
+**Reusable Framework:** Core orchestration logic is shared across all processors, reducing code duplication.
+
+**Middleware Support:** Cross-cutting concerns (logging, metrics, error handling) are applied uniformly to all strategy executions.
+
+**Testable:** Each phase can be tested independently with mock strategies and contexts.
+
+---
+
+## Strategy Registration
+
+Processors configure their workflow by **registering strategies** for each processing phase:
+
+```mermaid
+graph TB
+ subgraph "Strategy Registration"
+ Processor[ProcessorTemplate Subclass]
+
+ Processor -->|_setup_strategies()| Registry[Strategy Registry]
+
+ Registry -->|ProcessingPhase.DATA_COLLECTION| DC[ScreenshotStrategy + UITreeStrategy]
+ Registry -->|ProcessingPhase.LLM_INTERACTION| LLM[LLMStrategy]
+ Registry -->|ProcessingPhase.ACTION_EXECUTION| AE[ActionStrategy]
+ Registry -->|ProcessingPhase.MEMORY_UPDATE| MU[MemoryStrategy]
+ end
+```
+
+### Example: Windows AppAgent Processor
+
+```python
+from ufo.agents.processors.core.processor_framework import ProcessorTemplate, ProcessingPhase
+from ufo.agents.processors.strategies.processing_strategy import ComposedStrategy
+
+class AppAgentProcessor(ProcessorTemplate):
+ """Processor for Windows AppAgent (UI Automation)"""
+
+ processor_context_class = AppAgentProcessorContext # Custom context type
+
+ def _setup_strategies(self):
+ """Configure strategies for Windows UI automation workflow"""
+
+ # Phase 1: DATA_COLLECTION - Compose multiple strategies
+ self.strategies[ProcessingPhase.DATA_COLLECTION] = ComposedStrategy(
+ strategies=[
+ AppScreenshotCaptureStrategy(), # Capture application screenshot
+ AppControlInfoStrategy() # Extract UI Automation tree
+ ],
+ name="AppDataCollection",
+ phase=ProcessingPhase.DATA_COLLECTION
+ )
+
+ # Phase 2: LLM_INTERACTION - Single strategy
+ self.strategies[ProcessingPhase.LLM_INTERACTION] = AppLLMInteractionStrategy()
+
+ # Phase 3: ACTION_EXECUTION - Execute UI commands
+ self.strategies[ProcessingPhase.ACTION_EXECUTION] = AppActionExecutionStrategy()
+
+ # Phase 4: MEMORY_UPDATE - Update memory and blackboard
+ self.strategies[ProcessingPhase.MEMORY_UPDATE] = AppMemoryUpdateStrategy()
+
+ def _setup_middleware(self):
+ """Configure middleware for logging and metrics"""
+ self.middleware_chain = [
+ EnhancedLoggingMiddleware()
+ ]
+```
+
+### Example: Linux Agent Processor
+
+```python
+class LinuxAgentProcessor(ProcessorTemplate):
+ """Processor for Linux Agent (Shell Commands)"""
+
+ processor_context_class = LinuxAgentProcessorContext
+
+ def _setup_strategies(self):
+ """Configure strategies for Linux shell workflow"""
+
+ # Phase 1: DATA_COLLECTION - Screenshot + shell output
+ self.strategies[ProcessingPhase.DATA_COLLECTION] = ComposedStrategy([
+ CustomizedScreenshotCaptureStrategy(),
+ ShellOutputStrategy()
+ ])
+
+ # Phase 2: LLM_INTERACTION - Generate shell commands
+ self.strategies[ProcessingPhase.LLM_INTERACTION] = CustomizedLLMInteractionStrategy()
+
+ # Phase 3: ACTION_EXECUTION - Execute shell commands
+ self.strategies[ProcessingPhase.ACTION_EXECUTION] = LinuxActionExecutionStrategy()
+
+ # Phase 4: MEMORY_UPDATE - Record command history
+ self.strategies[ProcessingPhase.MEMORY_UPDATE] = LinuxMemoryUpdateStrategy()
+```
+
+**Registration Best Practices:**
+
+- Use ComposedStrategy for phases requiring multiple data sources (e.g., DATA_COLLECTION)
+- Use a single strategy for phases with a focused responsibility (e.g., LLM_INTERACTION)
+- Don't register SETUP/CLEANUP phases unless needed for initialization/cleanup
+- Override `processor_context_class` for platform-specific data structures
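+
+The composition idea behind the first guideline can be sketched as follows. This is an illustrative assumption about how a composed strategy might merge its children's outputs, not the actual `ComposedStrategy` implementation; `Result`, `Screenshot`, `ControlInfo`, and `Composed` are hypothetical names.
+
+```python
+import asyncio
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Optional
+
+
+@dataclass
+class Result:
+    success: bool
+    data: Dict[str, Any] = field(default_factory=dict)
+    error: Optional[str] = None
+
+
+class Screenshot:
+    name = "screenshot"
+
+    async def execute(self, context: Dict[str, Any]) -> Result:
+        return Result(True, {"screenshot_path": "screenshots/step_1.png"})
+
+
+class ControlInfo:
+    name = "control_info"
+
+    async def execute(self, context: Dict[str, Any]) -> Result:
+        return Result(True, {"control_info": ["Button: OK"]})
+
+
+class Composed:
+    """Runs child strategies in order and merges their data into one result."""
+
+    def __init__(self, strategies: List[Any], name: str = "composed") -> None:
+        self.strategies = strategies
+        self.name = name
+
+    async def execute(self, context: Dict[str, Any]) -> Result:
+        merged: Dict[str, Any] = {}
+        for strategy in self.strategies:
+            # Later children can see what earlier children produced
+            result = await strategy.execute({**context, **merged})
+            if not result.success:
+                return Result(False, merged, f"{strategy.name}: {result.error}")
+            merged.update(result.data)
+        return Result(True, merged)
+
+
+composed = Composed([Screenshot(), ControlInfo()], name="AppDataCollection")
+outcome = asyncio.run(composed.execute({}))
+print(sorted(outcome.data))  # ['control_info', 'screenshot_path']
+```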
+
+---
+
+## Middleware System
+
+Middleware provides cross-cutting concerns that apply uniformly across all strategy executions. Each middleware in the chain is called before processing starts, after it completes, and whenever an error occurs.
+
+```mermaid
+graph LR
+ subgraph "Middleware Chain"
+ MW1[EnhancedLogging Middleware]
+ end
+
+ Processor[ProcessorTemplate]
+
+ MW1 -.before_process.-> Processor
+ Processor -.after_process.-> MW1
+ Processor -.on_error.-> MW1
+```
+
+### ProcessorMiddleware Interface
+
+```python
+from abc import ABC, abstractmethod
+from typing import Optional
+
+class ProcessorMiddleware(ABC):
+ """
+ Abstract base for processor middleware.
+
+ Middleware wraps strategy execution to provide cross-cutting concerns
+ such as logging, metrics collection, error handling, caching, etc.
+ """
+
+ @abstractmethod
+ async def before_process(
+ self,
+ processor: ProcessorTemplate,
+ context: ProcessingContext
+ ) -> None:
+ """
+ Called before processing starts.
+
+ :param processor: The processor instance
+ :param context: Processing context
+ """
+ pass
+
+ @abstractmethod
+ async def after_process(
+ self,
+ processor: ProcessorTemplate,
+ result: ProcessingResult
+ ) -> None:
+ """
+ Called after processing completes.
+
+ :param processor: The processor instance
+ :param result: Processing execution result
+ """
+ pass
+
+ @abstractmethod
+ async def on_error(
+ self,
+ processor: ProcessorTemplate,
+ error: Exception
+ ) -> None:
+ """
+ Called when an error occurs during processing.
+
+ :param processor: The processor instance
+ :param error: The error that occurred
+ """
+ pass
+```
+
+### Built-in Middleware: EnhancedLoggingMiddleware
+
+The framework provides `EnhancedLoggingMiddleware` for comprehensive logging during processor execution:
+
+```python
+class EnhancedLoggingMiddleware(ProcessorMiddleware):
+ """Enhanced logging middleware that handles different types of errors appropriately"""
+
+ def __init__(self, log_level: int = logging.INFO, name: Optional[str] = None):
+ super().__init__(name)
+ self.logger = logging.getLogger(f"{self.__class__.__name__}.{self.name}")
+ self.log_level = log_level
+
+ async def before_process(self, processor, context):
+ """Log processing start with context information"""
+ round_num = context.get("round_num", 0)
+ round_step = context.get("round_step", 0)
+
+ self.logger.log(
+ self.log_level,
+ f"Starting processing: Round {round_num + 1}, Step {round_step + 1}, "
+ f"Processor: {processor.__class__.__name__}"
+ )
+
+ async def after_process(self, processor, result):
+ """Log processing completion with result summary and save to file"""
+ if result.success:
+ self.logger.log(
+ self.log_level,
+ f"Processing completed successfully in {result.execution_time:.2f}s"
+ )
+
+ # Log which data keys the result carries, if any
+ data_keys = list(result.data.keys())
+ if data_keys:
+ self.logger.debug(f"Result data keys: {data_keys}")
+ else:
+ self.logger.warning(f"Processing completed with failure: {result.error}")
+
+ # Save local context to log file
+ local_logger = processor.processing_context.global_context.get(ContextNames.LOGGER)
+ local_context = processor.processing_context.local_context
+
+ local_context.total_time = result.execution_time
+
+ # Record phase time costs
+ phrase_time_cost = {}
+ for phrase, phrase_result in processor.processing_context.phase_results.items():
+ phrase_time_cost[phrase.name] = phrase_result.execution_time
+
+ local_context.execution_times = phrase_time_cost
+
+ # Write to log file
+ safe_obj = to_jsonable_python(local_context.to_dict(selective=True))
+ local_context_string = json.dumps(safe_obj, ensure_ascii=False)
+ local_logger.write(local_context_string)
+
+ self.logger.info("Log saved successfully.")
+
+ async def on_error(self, processor, error):
+ """Enhanced error logging with context information"""
+ if isinstance(error, ProcessingException):
+ self.logger.error(
+ f"ProcessingException in {processor.__class__.__name__}:\n"
+ f" Phase: {error.phase}\n"
+ f" Message: {str(error)}\n"
+ f" Context: {error.context_data}\n"
+ f" Original Exception: {error.original_exception}"
+ )
+
+ if error.original_exception:
+ self.logger.info(
+ f"Original traceback:\n{''.join(traceback.format_exception(error.original_exception))}"
+ )
+ else:
+ self.logger.error(
+ f"Unexpected error in {processor.__class__.__name__}: {str(error)}\n"
+ f"Error type: {type(error).__name__}\n"
+ f"Traceback:\n{''.join(traceback.format_exception(error))}"
+ )
+```
+
+**Key Features:**
+
+- **Context-Aware Logging**: Logs round/step information for traceability
+- **Result Summary**: Logs execution time and phase breakdown
+- **Persistent Logging**: Saves structured context data to log files
+- **Enhanced Error Handling**: Distinguishes ProcessingException from general errors
+- **Traceback Capture**: Full stack traces for debugging
+
+### Configuring Middleware
+
+Processors configure middleware in `_setup_middleware()`:
+
+```python
+class AppAgentProcessor(ProcessorTemplate):
+    def _setup_middleware(self):
+        """Setup middleware chain"""
+        self.middleware_chain = [
+            EnhancedLoggingMiddleware(log_level=logging.INFO, name="AppAgent")
+        ]
+```
+
+**Middleware Execution Order:**
+
+1. **Before Processing**: `before_process()` called for each middleware in order
+2. **Strategy Execution**: Strategies execute through phases
+3. **After Processing**: `after_process()` called for each middleware in reverse order
+4. **On Error**: `on_error()` called for all middleware if an exception occurs
+
+**Middleware Benefits:**
+
+- **Separation of Concerns**: Cross-cutting logic separated from strategy logic
+- **Reusability**: Same middleware can be used across different processors
+- **Non-invasive**: Add/remove middleware without modifying strategies
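+
+The execution order described above can be checked with a small, self-contained sketch. `RecordingMiddleware` and `run_with_middleware` are hypothetical helpers written for this illustration, and the `"metrics"` entry is an invented second middleware; only the three hook names come from the `ProcessorMiddleware` interface.
+
+```python
+import asyncio
+from typing import Any, Awaitable, Callable, List
+
+
+class RecordingMiddleware:
+    """Hypothetical middleware that records the order of its hook calls."""
+
+    def __init__(self, name: str, log: List[str]) -> None:
+        self.name = name
+        self.log = log
+
+    async def before_process(self, processor: Any, context: Any) -> None:
+        self.log.append(f"before:{self.name}")
+
+    async def after_process(self, processor: Any, result: Any) -> None:
+        self.log.append(f"after:{self.name}")
+
+    async def on_error(self, processor: Any, error: Exception) -> None:
+        self.log.append(f"error:{self.name}")
+
+
+async def run_with_middleware(
+    chain: List[RecordingMiddleware],
+    work: Callable[[], Awaitable[Any]],
+) -> Any:
+    try:
+        for middleware in chain:  # before hooks: registration order
+            await middleware.before_process(None, {})
+        result = await work()
+        for middleware in reversed(chain):  # after hooks: reverse order
+            await middleware.after_process(None, result)
+        return result
+    except Exception as error:
+        for middleware in chain:  # every middleware is told about the error
+            await middleware.on_error(None, error)
+        raise
+
+
+log: List[str] = []
+chain = [RecordingMiddleware("logging", log), RecordingMiddleware("metrics", log)]
+
+
+async def run_strategies() -> str:
+    log.append("strategies")
+    return "ok"
+
+
+asyncio.run(run_with_middleware(chain, run_strategies))
+print(log)
+# ['before:logging', 'before:metrics', 'strategies', 'after:metrics', 'after:logging']
+```
+
+Running the after hooks in reverse gives the usual "onion" nesting: the first middleware registered is the outermost layer.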
+
+---
+
+## Workflow Execution
+
+The processor executes the workflow by orchestrating strategies through defined phases:
+
+```mermaid
+sequenceDiagram
+ participant State as AgentState
+ participant Processor as ProcessorTemplate
+ participant MW as Middleware Chain
+ participant Strategy as ProcessingStrategy
+ participant Context as ProcessingContext
+
+ State->>Processor: process()
+
+ Processor->>MW: before_process(processor, context)
+ MW-->>Processor: Ready
+
+ loop For each Phase
+ Processor->>Processor: Get strategy for phase
+ Processor->>Strategy: validate_dependencies(context)
+ Strategy-->>Processor: [] (no missing deps)
+
+ Processor->>Strategy: execute(agent, context)
+ Strategy->>Context: get_local("screenshot")
+ Context-->>Strategy: screenshot data
+ Strategy->>Strategy: Process data
+ Strategy->>Context: update_local({"parsed_response": ...})
+ Strategy-->>Processor: ProcessingResult(success=True, data={...})
+
+ Processor->>Context: Update with strategy outputs
+ end
+
+ Processor->>MW: after_process(processor, result)
+ MW-->>Processor: (middleware processing)
+
+ Processor->>Processor: _finalize_processing_context()
+ Processor-->>State: ProcessingResult
+```
+
+### Execution Order
+
+```python
+# Defined in ProcessorTemplate.process()
+execution_order = [
+ ProcessingPhase.SETUP, # Optional: Initialize resources
+ ProcessingPhase.DATA_COLLECTION, # Gather device/environment context
+ ProcessingPhase.LLM_INTERACTION, # LLM reasoning and decision-making
+ ProcessingPhase.ACTION_EXECUTION, # Execute actions on device
+ ProcessingPhase.MEMORY_UPDATE, # Update memory and blackboard
+ ProcessingPhase.CLEANUP # Optional: Cleanup resources
+]
+```
+
+**Phase Execution Rules:**
+
+- **Optional Phases**: SETUP and CLEANUP are optional (skipped if no strategy registered)
+- **Sequential Execution**: Phases execute in fixed order (no parallelization)
+- **Dependency Validation**: Validated before each strategy execution using `StrategyDependencyValidator`
+- **Fail-Fast vs. Continue**: Strategy `fail_fast` setting determines error handling
+- **Context Updates**: Each strategy's outputs immediately available to next strategy via `ProcessingContext`
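+
+The interplay of dependency validation and the fail-fast setting can be illustrated with a reduced, synchronous model. Everything here (`Step`, `run_phases`, the use of exceptions to signal failure) is an assumption made for the sketch; the framework itself works with `ProcessingResult` objects and `ProcessingException`.
+
+```python
+from dataclasses import dataclass, field
+from typing import Callable, Dict, List
+
+
+@dataclass
+class Step:
+    name: str
+    run: Callable[[dict], dict]
+    requires: List[str] = field(default_factory=list)
+    fail_fast: bool = True
+
+
+def run_phases(steps: List[Step]) -> Dict[str, object]:
+    context: Dict[str, object] = {}
+    for step in steps:
+        # Dependency validation: every required key must already be in context
+        missing = [key for key in step.requires if key not in context]
+        if missing:
+            raise RuntimeError(f"{step.name} missing required dependencies: {missing}")
+        try:
+            context.update(step.run(context))
+        except Exception as error:
+            if step.fail_fast:
+                raise RuntimeError(f"{step.name} failed: {error}") from error
+            # Non-critical step: record the problem and keep going
+            context.setdefault("warnings", []).append(f"{step.name}: {error}")
+    return context
+
+
+def flaky_screenshot(ctx: dict) -> dict:
+    raise ValueError("no screenshot backend")
+
+
+steps = [
+    Step("collect", lambda ctx: {"control_info": ["Button: OK"]}),
+    Step("screenshot", flaky_screenshot, fail_fast=False),
+    Step("decide", lambda ctx: {"action": "click"}, requires=["control_info"]),
+]
+
+context = run_phases(steps)
+print(context["warnings"])  # ['screenshot: no screenshot backend']
+```
+
+Marking only optional data sources as `fail_fast=False` keeps a missing screenshot from aborting a run, while a missing hard dependency still stops execution before the dependent step starts.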
+
+---
+
+## ProcessingContext
+
+The `ProcessingContext` provides unified data access across strategies, separating local (processor-specific) and global (session-wide) data:
+
+```python
+@dataclass
+class ProcessingContext:
+ """
+ Processing context with local and global data separation.
+
+ :param global_context: Global context (shared across all components)
+ :param local_context: Local context (processor-specific data)
+ """
+ global_context: Context # Module system global context
+ local_context: BasicProcessorContext # Processor local data
+
+ def get_local(self, key: str, default=None) -> Any:
+ """
+ Get value from local context.
+
+ :param key: Field name
+ :param default: Default value if not found
+ :return: Field value or default
+ """
+ return getattr(self.local_context, key, default)
+
+ def get_global(self, key: str, default=None) -> Any:
+ """
+ Get value from global context.
+
+ :param key: Context key
+ :param default: Default value if not found
+ :return: Context value or default
+ """
+ return self.global_context.get(key, default)
+
+ def update_local(self, data: Dict[str, Any]) -> None:
+ """
+ Update local context with strategy outputs.
+
+ :param data: Dictionary of field name -> value pairs
+ """
+ self.local_context.update_from_dict(data)
+
+ def require_local(self, field_name: str, expected_type: Type = None) -> Any:
+ """
+ Get required field from local context.
+
+ :param field_name: Required field name
+ :param expected_type: Expected Python type (optional)
+ :return: Field value
+ :raises ProcessingException: If field missing or wrong type
+ """
+ value = self.get_local(field_name)
+ if value is None:
+ raise ProcessingException(f"Required field '{field_name}' not found in local context")
+ if expected_type and not isinstance(value, expected_type):
+ raise ProcessingException(
+ f"Field '{field_name}' has type {type(value).__name__}, "
+ f"expected {expected_type.__name__}"
+ )
+ return value
+```
+
+**Context Separation Rationale:**
+
+**Global Context** (session-wide, shared across all components):
+
+- User request (`REQUEST`)
+- Session ID, round number, step number
+- Configuration settings
+- Command dispatcher reference
+- Blackboard reference
+
+**Local Context** (processor-specific, temporary):
+
+- Screenshot data (`screenshot`, `screenshot_path`)
+- UI control information (`control_info`)
+- LLM parsed response (`parsed_response`)
+- Action execution results (`results`)
+- Temporary processing data
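+
+A compact way to see the local/global split in action is the stand-alone sketch below. `LocalData` and `MiniContext` are simplified, hypothetical stand-ins for `BasicProcessorContext` and `ProcessingContext`, and they raise built-in exceptions instead of `ProcessingException`.
+
+```python
+from dataclasses import dataclass, field
+from typing import Any, Dict, Optional, Type
+
+
+@dataclass
+class LocalData:
+    """Per-processor scratch data (hypothetical stand-in)."""
+    screenshot_path: Optional[str] = None
+    parsed_response: Dict[str, Any] = field(default_factory=dict)
+
+
+@dataclass
+class MiniContext:
+    global_data: Dict[str, Any]  # session-wide, shared
+    local_data: LocalData        # temporary, processor-specific
+
+    def get_local(self, key: str, default: Any = None) -> Any:
+        return getattr(self.local_data, key, default)
+
+    def get_global(self, key: str, default: Any = None) -> Any:
+        return self.global_data.get(key, default)
+
+    def update_local(self, data: Dict[str, Any]) -> None:
+        for key, value in data.items():
+            setattr(self.local_data, key, value)
+
+    def require_local(self, key: str, expected_type: Optional[Type] = None) -> Any:
+        value = self.get_local(key)
+        if value is None:
+            raise KeyError(f"Required field '{key}' not found in local context")
+        if expected_type is not None and not isinstance(value, expected_type):
+            raise TypeError(f"Field '{key}' is not of type {expected_type.__name__}")
+        return value
+
+
+ctx = MiniContext(global_data={"request": "Open Word"}, local_data=LocalData())
+ctx.update_local({"screenshot_path": "screenshots/step_1.png"})
+
+path = ctx.require_local("screenshot_path", str)
+request = ctx.get_global("request")
+print(path, "|", request)  # screenshots/step_1.png | Open Word
+```
+
+Strategies write only to the local side; anything that must outlive the current step is promoted to the global side explicitly.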
+
+---
+
+## Platform-Specific Processors
+
+Different agent types implement platform-specific processors:
+
+| Platform | Processor Class | DATA_COLLECTION | LLM_INTERACTION | ACTION_EXECUTION | MEMORY_UPDATE |
+|----------|----------------|-----------------|-----------------|------------------|---------------|
+| **Windows AppAgent** | `AppAgentProcessor` | Screenshot + UI tree | UI element selection | UI Automation commands | UI interaction history |
+| **Windows HostAgent** | `HostAgentProcessor` | Desktop screenshot + app list | Application selection | Launch app, create AppAgent | App selection history |
+| **Linux** | `LinuxAgentProcessor` | Screenshot + shell output | Shell command generation | Shell command execution | Command history |
+
+See the [Agent Types documentation](../agent_types.md) for platform-specific processor implementations.
+
+---
+
+## Best Practices
+
+### Processor Design Guidelines
+
+**1. Clear Phase Separation**: Each phase should have a distinct responsibility
+
+- DATA_COLLECTION gathers raw data
+- LLM_INTERACTION performs reasoning
+- ACTION_EXECUTION executes commands
+- MEMORY_UPDATE persists state
+
+**2. Appropriate Strategy Composition**: Use `ComposedStrategy` for multi-source data collection
+
+```python
+self.strategies[ProcessingPhase.DATA_COLLECTION] = ComposedStrategy([
+ AppScreenshotCaptureStrategy(),
+ AppControlInfoStrategy()
+])
+```
+
+**3. Middleware for Cross-Cutting Concerns**: Don't implement logging/metrics in strategies
+
+**4. Dependency Validation**: Leverage automatic validation via `StrategyDependencyValidator`
+
+**5. Custom Context Classes**: Define platform-specific context classes when needed
+
+```python
+from dataclasses import dataclass, field
+from typing import Any, Dict, List
+
+@dataclass
+class AppAgentProcessorContext(BasicProcessorContext):
+    """Extended context for Windows AppAgent"""
+    agent_type: str = "AppAgent"
+    screenshot: str = ""
+    screenshot_path: str = ""
+    control_info: str = ""
+    control_elements: List[Dict] = field(default_factory=list)
+    parsed_response: Dict = field(default_factory=dict)
+    action: List[Dict[str, Any]] = field(default_factory=list)
+    arguments: Dict = field(default_factory=dict)
+    results: str = ""
+```
+
+!!! warning "Common Pitfalls"
+ **Skipping Phases**: Don't skip required phases (DATA_COLLECTION → LLM → ACTION → MEMORY)
+
+ **Phase Order Changes**: Don't reorder phases (breaks dependency chain)
+
+ **Strategy State**: Don't store state in strategy instances (use context instead)
+
+ **Direct Agent Modification**: Don't modify agent attributes in processor (use proper channels like memory system)
+
+---
+
+## Integration with Other Layers
+
+```mermaid
+graph TB
+ subgraph "State Layer (Level-1)"
+ State[AgentState]
+ end
+
+ subgraph "Strategy Layer (Level-2)"
+ Processor[ProcessorTemplate]
+ Strategies[ProcessingStrategies]
+ Middleware[Middleware Chain]
+ end
+
+ subgraph "Command Layer (Level-3)"
+ Dispatcher[CommandDispatcher]
+ end
+
+ subgraph "Supporting Systems"
+ Memory[Memory System]
+ Context[Global Context]
+ end
+
+ State -->|calls process()| Processor
+ Processor -->|orchestrates| Strategies
+ Processor -->|applies| Middleware
+ Strategies -->|uses| Dispatcher
+ Strategies -->|updates| Memory
+ Strategies -->|reads/writes| Context
+```
+
+| Integration Point | Layer/Component | Relationship |
+|-------------------|-----------------|--------------|
+| **AgentState** | Level-1 State | State calls `processor.process()` to execute workflow |
+| **ProcessingStrategy** | Level-2 Strategy | Processor registers and executes strategies |
+| **CommandDispatcher** | Level-3 Command | ACTION_EXECUTION strategies use dispatcher |
+| **Memory/Blackboard** | Memory System | MEMORY_UPDATE strategies update agent memory |
+| **Global Context** | Module System | Processor reads request, writes results via context |
+
+See [State Layer](state.md), [Strategy Layer](strategy.md), and [Command Layer](command.md) for integration details.
+
+---
+
+## API Reference
+
+The following classes are documented via docstrings:
+
+- `ProcessorTemplate`: Abstract processor framework base class
+- `ProcessingPhase`: Enum defining processor execution phases
+- `ProcessingContext`: Unified context with local/global data separation
+- `ProcessorMiddleware`: Abstract middleware base class
+
+---
+
+## Summary
+
+**Key Takeaways:**
+
+- **ProcessorTemplate**: Abstract framework for workflow orchestration
+- **Strategy Registration**: Configure phase-specific strategies via `_setup_strategies()`
+- **Middleware System**: Cross-cutting concerns (logging, error handling) applied uniformly
+- **Workflow Execution**: Orchestrates DATA_COLLECTION → LLM → ACTION → MEMORY phases
+- **Dependency Validation**: Ensures strategies execute with required data available via `StrategyDependencyValidator`
+- **Context Management**: Separates local (processor) and global (session) data
+- **Platform Extensibility**: Subclass to create platform-specific processors
+- **Template Method Pattern**: Defines workflow skeleton, subclasses customize details
+
+The Processor is the orchestration framework of the Strategy Layer: it coordinates strategy execution, middleware application, and context management, so every platform-specific agent runs the same phased workflow.
diff --git a/documents/docs/infrastructure/agents/design/prompter.md b/documents/docs/infrastructure/agents/design/prompter.md
new file mode 100644
index 000000000..b86d897f2
--- /dev/null
+++ b/documents/docs/infrastructure/agents/design/prompter.md
@@ -0,0 +1,482 @@
+# Agent Prompter
+
+The `Prompter` is a key component of the UFO framework, responsible for constructing prompts for the LLM to generate responses. Each agent has its own `Prompter` class that defines the structure of the prompt and the information to be fed to the LLM.
+
+## Overview
+
+The Prompter system follows a hierarchical design pattern:
+
+```
+BasicPrompter (Abstract Base Class)
+├── HostAgentPrompter
+├── AppAgentPrompter
+├── EvaluationAgentPrompter
+├── ExperiencePrompter
+├── DemonstrationPrompter
+└── customized/
+    └── LinuxAgentPrompter (extends AppAgentPrompter)
+```
+
+Each prompter is responsible for:
+
+1. **Loading templates** from YAML configuration files
+2. **Constructing system prompts** with instructions, APIs, and examples
+3. **Building user prompts** from agent observations and context
+4. **Formatting multimodal content** (text + images for visual models)
+
+You can find all prompter implementations in the `ufo/prompter` folder.
+
+## Prompt Message Structure
+
+A prompt fed to the LLM is a list of dictionaries, where each dictionary represents a message with the following structure:
+
+| Key | Description | Example Values |
+| --- | --- | --- |
+| `role` | The role of the message | `system`, `user`, `assistant` |
+| `content` | The message content | String or list of content objects |
+
+For **visual models**, the `content` field can contain multiple elements:
+
+```python
+[
+    {"type": "text", "text": "Current Screenshots:"},
+    {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
+]
+```
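+
+To make the structure concrete, the following is a minimal sketch of assembling such a message list by hand. The helper names `encode_image` and `build_messages` are illustrative assumptions and are not part of the UFO API; only the dictionary layout follows the table above.
+
+```python
+import base64
+from typing import Any, Dict, List
+
+
+def encode_image(png_bytes: bytes) -> str:
+    """Return a base64 data URL for raw PNG bytes (hypothetical helper)."""
+    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
+
+
+def build_messages(
+    system_prompt: str, user_text: str, png_images: List[bytes]
+) -> List[Dict[str, Any]]:
+    """Build a system + user message list with one image entry per screenshot."""
+    user_content: List[Dict[str, Any]] = []
+    for i, png in enumerate(png_images, start=1):
+        # Each screenshot is announced by a text element, then attached as an image
+        user_content.append({"type": "text", "text": f"Screenshot {i}:"})
+        user_content.append(
+            {"type": "image_url", "image_url": {"url": encode_image(png)}}
+        )
+    # The textual request goes last
+    user_content.append({"type": "text", "text": user_text})
+    return [
+        {"role": "system", "content": system_prompt},
+        {"role": "user", "content": user_content},
+    ]
+
+
+# Placeholder bytes stand in for a real screenshot
+messages = build_messages("You are a desktop agent.", "Open Notepad.", [b"\x89PNG"])
+```
+
+For a non-visual model, the same sketch can be called with an empty image list, which leaves a single text element in the user content.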
+
+## Prompt Construction Workflow
+
+The final prompt is constructed through a multi-step process:
+
+```mermaid
+graph LR
+ A[Load Templates] --> B[Build System Prompt]
+ B --> C[Build User Prompt]
+ C --> D[Combine into Message List]
+ D --> E[Send to LLM]
+
+ B --> B1[Base Instructions]
+ B --> B2[API Documentation]
+ B --> B3[Examples]
+
+ C --> C1[Observation]
+ C --> C2[Retrieved Knowledge]
+ C --> C3[Blackboard State]
+```
+
+### Step 1: Template Loading
+
+Templates are loaded from YAML files during initialization:
+
+```python
+def __init__(self, is_visual: bool, prompt_template: str, example_prompt_template: str):
+    self.is_visual = is_visual
+    self.prompt_template = self.load_prompt_template(prompt_template, is_visual)
+    self.example_prompt_template = self.load_prompt_template(example_prompt_template, is_visual)
+```
+
+The `is_visual` parameter determines which template variant to load:
+- **Visual models**: Use templates with screenshot handling
+- **Non-visual models**: Use text-only templates
+
+### Step 2: System Prompt Construction
+
+The system prompt is built using the `system_prompt_construction()` method, which combines:
+
+1. **Base instructions** from the template
+2. **API documentation** via `api_prompt_helper()`
+3. **Demonstration examples** via `examples_prompt_helper()`
+4. **Third-party agent instructions** (for HostAgent)
+
+Example for HostAgent:
+
+```python
+def system_prompt_construction(self) -> str:
+    apis = self.api_prompt_helper(verbose=0)
+    examples = self.examples_prompt_helper()
+    third_party_instructions = self.third_party_agent_instruction()
+
+    system_key = "system" if self.is_visual else "system_nonvisual"
+
+    return self.prompt_template[system_key].format(
+        apis=apis,
+        examples=examples,
+        third_party_instructions=third_party_instructions,
+    )
+```
+
+### Step 3: User Prompt Construction
+
+The user prompt is constructed using the `user_prompt_construction()` method with agent-specific parameters:
+
+**HostAgent Parameters:**
+```python
+def user_prompt_construction(
+    self,
+    control_item: List[str],       # Available applications/windows
+    prev_subtask: List[Dict],      # Previous subtask history
+    prev_plan: List[str],          # Previous plan steps
+    user_request: str,             # Original user request
+    retrieved_docs: str = "",      # Retrieved knowledge
+) -> str
+```
+
+**AppAgent Parameters:**
+```python
+def user_prompt_construction(
+    self,
+    control_item: List[str],          # Available UI controls
+    prev_subtask: List[Dict],         # Previous subtask history
+    prev_plan: List[str],             # Previous plan steps
+    user_request: str,                # Original user request
+    subtask: str,                     # Current subtask
+    current_application: str,         # Current app name
+    host_message: List[str],          # Messages from HostAgent
+    retrieved_docs: str = "",         # Retrieved knowledge
+    last_success_actions: List = [],  # Last successful actions
+) -> str
+```
+
+### Step 4: User Content Construction
+
+For multimodal models, the `user_content_construction()` method builds a list of content objects:
+
+```python
+def user_content_construction(self, image_list: List[str], ...) -> List[Dict]:
+    user_content = []
+
+    if self.is_visual:
+        # Add screenshots
+        for i, image in enumerate(image_list):
+            user_content.append({"type": "text", "text": f"Screenshot {i+1}:"})
+            user_content.append({"type": "image_url", "image_url": {"url": image}})
+
+    # Add text prompt
+    user_content.append({
+        "type": "text",
+        "text": self.user_prompt_construction(...)
+    })
+
+    return user_content
+```
+
+### Step 5: Final Assembly
+
+The `prompt_construction()` static method combines system and user prompts:
+
+```python
+@staticmethod
+def prompt_construction(system_prompt: str, user_content: List[Dict]) -> List:
+    return [
+        {"role": "system", "content": system_prompt},
+        {"role": "user", "content": user_content}
+    ]
+```
+
+## Prompt Components
+
+### System Prompt
+
+The system prompt defines the agent's role, capabilities, and output format. It is loaded from YAML templates configured in the system configuration.
+
+**Template Locations:**
+- HostAgent: `ufo/prompts/share/base/host_agent.yaml`
+- AppAgent: `ufo/prompts/share/base/app_agent.yaml`
+- EvaluationAgent: `ufo/prompts/evaluation/evaluate.yaml`
+
+The system prompt is constructed by the `system_prompt_construction()` method and typically includes:
+
+| Component | Description | Method |
+| --- | --- | --- |
+| **Base Instructions** | Role definition, action guidelines, output format | Loaded from YAML template |
+| **API Documentation** | Available tools and their usage | `api_prompt_helper()` |
+| **Examples** | Demonstration examples for in-context learning | `examples_prompt_helper()` |
+| **Special Instructions** | Third-party agent integration (HostAgent only) | `third_party_agent_instruction()` |
+
+#### API Documentation
+
+The `api_prompt_helper()` method formats tool information for the LLM:
+
+```python
+def api_prompt_helper(self, verbose: int = 1) -> str:
+    """Construct formatted API documentation."""
+    return self.api_prompt_template
+```
+
+Tools are converted to LLM-readable format using `tool_to_llm_prompt()`:
+
+```
+Tool name: click_input
+Description: Click on a control item
+
+Parameters:
+- id (string, required): The ID of the control item
+- button (string, optional): Mouse button to click. Default: left
+- double (boolean, optional): Whether to double-click. Default: false
+
+Returns: Result of the click action
+
+Example usage:
+click_input(id="42", button="left", double=false)
+```
+
+#### Demonstration Examples
+
+The `examples_prompt_helper()` method constructs in-context learning examples:
+
+```python
+def examples_prompt_helper(
+    self,
+    header: str = "## Response Examples",
+    separator: str = "Example",
+    additional_examples: List[str] = []
+) -> str:
+    """Construct examples from YAML template."""
+    template = """
+    [User Request]:
+        {request}
+    [Response]:
+        {response}"""
+
+    example_list = []
+    for key, values in self.example_prompt_template.items():
+        if key.startswith("example"):
+            example = template.format(
+                request=values.get("Request"),
+                response=json.dumps(values.get("Response"))
+            )
+            example_list.append(example)
+
+    return self.retrieved_documents_prompt_helper(header, separator, example_list)
+```
+
+Examples are loaded from:
+- `ufo/prompts/examples/visual/` - For visual models
+- `ufo/prompts/examples/nonvisual/` - For text-only models
+
+### User Prompt
+
+The user prompt is constructed from the agent's current context and observations. It is built by the `user_prompt_construction()` method using information from:
+
+| Component | Description | Method |
+| --- | --- | --- |
+| **Observation** | Current state (UI controls, screenshots) | Passed as parameters |
+| **Retrieved Knowledge** | Documents from RAG system | `retrieved_documents_prompt_helper()` |
+| **Blackboard State** | Shared memory across agents | `blackboard_to_prompt()` |
+| **Task Context** | User request, subtask, plans | Passed as parameters |
+
+#### Retrieved Documents
+
+External knowledge is formatted using the `retrieved_documents_prompt_helper()` method:
+
+```python
+@staticmethod
+def retrieved_documents_prompt_helper(
+    header: str,           # Section header
+    separator: str,        # Document separator
+    documents: List[str]   # Retrieved documents
+) -> str:
+    """Format retrieved documents for the prompt."""
+    if header:
+        prompt = f"\n<{header}:>\n"
+    else:
+        prompt = ""
+
+    for i, document in enumerate(documents):
+        if separator:
+            prompt += f"[{separator} {i+1}:]\n"
+        prompt += document + "\n\n"
+
+    return prompt
+```
+
+**Example Output** (for an empty `header` and `separator="Document"`):
+```
+[Document 1:]
+To create a new email in Outlook, click the "New Email" button...
+
+[Document 2:]
+The email composition window has three main fields: To, Subject, and Body...
+```
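+
+As a quick check of the formatting logic, the sketch below copies the helper shown above into a standalone function and calls it with a non-empty header. The header text and the two document strings are placeholders chosen for this example, not values taken from UFO.
+
+```python
+from typing import List
+
+
+def retrieved_documents_prompt_helper(
+    header: str, separator: str, documents: List[str]
+) -> str:
+    """Standalone copy of the formatting logic shown above."""
+    prompt = f"\n<{header}:>\n" if header else ""
+    for i, document in enumerate(documents):
+        if separator:
+            # Number each document, starting from 1
+            prompt += f"[{separator} {i+1}:]\n"
+        prompt += document + "\n\n"
+    return prompt
+
+
+formatted = retrieved_documents_prompt_helper(
+    "Retrieved Documents",  # illustrative header
+    "Document",
+    ["First placeholder document.", "Second placeholder document."],
+)
+print(formatted)
+```
+
+With a non-empty header, the result starts with a `<Retrieved Documents:>` line, followed by the numbered documents.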
+
+#### Blackboard Integration
+
+The Blackboard system allows agents to share information. Prompters can access this through:
+
+```python
+def blackboard_to_prompt(self) -> str:
+    """Convert Blackboard state to prompt text."""
+    # Implementation depends on specific agent needs
+    pass
+```
+
+## Specialized Prompters
+
+### HostAgentPrompter
+
+Specialized for desktop-level orchestration:
+
+**Key Features:**
+- Application selection and window management
+- Third-party agent integration support
+- Desktop-wide task planning
+
+**Unique Method:**
+```python
+def third_party_agent_instruction(self) -> str:
+    """Generate instructions for enabled third-party agents."""
+    enabled_agents = config.system.enabled_third_party_agents
+    instructions = []
+
+    for agent_name in enabled_agents:
+        # Use a distinct name so the module-level `config` is not shadowed
+        agent_config = get_third_party_config(agent_name)
+        instructions.append(f"{agent_name}: {agent_config['INTRODUCTION']}")
+
+    return "\n".join(instructions)
+```
+
+### AppAgentPrompter
+
+Specialized for application-level interactions:
+
+**Key Features:**
+- UI control interaction
+- Multi-action sequence support
+- Application-specific API integration
+
+**Template Variants:**
+- `system`: Standard single-action mode
+- `system_as`: Action sequence mode (multi-action)
+- `system_nonvisual`: Text-only mode
+
+**Usage:**
+```python
+def system_prompt_construction(self, additional_examples: List[str] = []) -> str:
+    apis = self.api_prompt_helper(verbose=1)
+    examples = self.examples_prompt_helper(additional_examples=additional_examples)
+
+    # Select template based on configuration
+    if config.system.action_sequence:
+        system_key = "system_as"
+    else:
+        system_key = "system"
+
+    if not self.is_visual:
+        system_key += "_nonvisual"
+
+    return self.prompt_template[system_key].format(apis=apis, examples=examples)
+```
+
+### EvaluationAgentPrompter
+
+Specialized for task evaluation:
+
+**Purpose:** Assesses whether a Session or Round was successfully completed
+
+**Configuration:** Uses `ufo/prompts/evaluation/evaluate.yaml`
+
+### ExperiencePrompter
+
+Specialized for learning from execution traces:
+
+**Purpose:** Summarizes task completion trajectories for future reference
+
+**Use Case:** Self-experience learning in the Knowledge Substrate
+
+### DemonstrationPrompter
+
+Specialized for learning from human demonstrations:
+
+**Purpose:** Processes Step Recorder outputs into learnable examples
+
+**Use Case:** User demonstration learning in the Knowledge Substrate
+
+## Configuration
+
+Prompter behavior is controlled through system configuration:
+
+```yaml
+# config/ufo/system.yaml
+# Prompt template paths
+HOSTAGENT_PROMPT: "./ufo/prompts/share/base/host_agent.yaml"
+APPAGENT_PROMPT: "./ufo/prompts/share/base/app_agent.yaml"
+EVALUATION_PROMPT: "./ufo/prompts/evaluation/evaluate.yaml"
+
+# Example prompt paths (visual vs. non-visual)
+HOSTAGENT_EXAMPLE_PROMPT: "./ufo/prompts/examples/{mode}/host_agent_example.yaml"
+APPAGENT_EXAMPLE_PROMPT: "./ufo/prompts/examples/{mode}/app_agent_example.yaml"
+
+# Feature flags
+ACTION_SEQUENCE: False # Enable multi-action mode for AppAgent
+```
+
+The `{mode}` placeholder is automatically replaced with `visual` or `nonvisual` based on the LLM's capabilities.
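+
+The substitution can be pictured with the short sketch below. The function name `resolve_example_path` is a hypothetical helper introduced for illustration; how UFO performs the replacement internally is not shown in this document.
+
+```python
+def resolve_example_path(template: str, is_visual: bool) -> str:
+    """Fill the {mode} placeholder in an example prompt path (illustrative only)."""
+    mode = "visual" if is_visual else "nonvisual"
+    return template.format(mode=mode)
+
+
+template = "./ufo/prompts/examples/{mode}/host_agent_example.yaml"
+visual_path = resolve_example_path(template, is_visual=True)
+nonvisual_path = resolve_example_path(template, is_visual=False)
+```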
+
+## Custom Prompters
+
+You can create custom prompters by extending `BasicPrompter` or existing specialized prompters:
+
+```python
+from ufo.prompter.agent_prompter import AppAgentPrompter
+
+class CustomAppPrompter(AppAgentPrompter):
+    """Custom prompter for specialized application."""
+
+    def system_prompt_construction(self, **kwargs) -> str:
+        # Add custom logic
+        base_prompt = super().system_prompt_construction(**kwargs)
+        custom_instructions = self.load_custom_instructions()
+        return base_prompt + "\n" + custom_instructions
+
+    def load_custom_instructions(self) -> str:
+        """Load application-specific instructions."""
+        return "Custom instructions for specialized app..."
+```
+
+
+# Reference
+
+## Class Hierarchy
+
+The `Prompter` system is implemented in the `ufo/prompter` folder with the following structure:
+
+```
+ufo/prompter/
+├── basic.py # BasicPrompter abstract base class
+├── agent_prompter.py # HostAgentPrompter, AppAgentPrompter
+├── eva_prompter.py # EvaluationAgentPrompter
+├── experience_prompter.py # ExperiencePrompter
+├── demonstration_prompter.py # DemonstrationPrompter
+└── customized/
+    └── linux_agent_prompter.py    # LinuxAgentPrompter (custom)
+```
+
+## BasicPrompter API
+
+Below is the complete API reference for the `BasicPrompter` class:
+
+:::prompter.basic.BasicPrompter
+
+## Key Methods
+
+| Method | Purpose | Return Type |
+| --- | --- | --- |
+| `load_prompt_template()` | Load YAML template file | `Dict[str, str]` |
+| `system_prompt_construction()` | Build system prompt | `str` |
+| `user_prompt_construction()` | Build user text prompt | `str` |
+| `user_content_construction()` | Build full user content (text + images) | `List[Dict]` |
+| `prompt_construction()` | Combine system and user into message list | `List[Dict]` |
+| `api_prompt_helper()` | Format API documentation | `str` |
+| `examples_prompt_helper()` | Format demonstration examples | `str` |
+| `retrieved_documents_prompt_helper()` | Format retrieved knowledge | `str` |
+| `tool_to_llm_prompt()` | Convert single tool to LLM format | `str` |
+| `tools_to_llm_prompt()` | Convert multiple tools to LLM format | `str` |
+
+## See Also
+
+- [Prompts Overview](../../../ufo2/prompts/overview.md) - Prompt template structure
+- [Basic Template](../../../ufo2/prompts/basic_template.md) - YAML template format
+- [Example Prompts](../../../ufo2/prompts/examples_prompts.md) - Demonstration examples
+
+You can customize the `Prompter` class to tailor the prompt to your requirements. Start by extending `BasicPrompter` or one of the specialized prompters.
\ No newline at end of file
diff --git a/documents/docs/infrastructure/agents/design/state.md b/documents/docs/infrastructure/agents/design/state.md
new file mode 100644
index 000000000..524094151
--- /dev/null
+++ b/documents/docs/infrastructure/agents/design/state.md
@@ -0,0 +1,745 @@
+# State Layer (Level-1 FSM)
+
+The **State Layer** is the top-level control structure governing the device agent lifecycle. It implements a Finite State Machine (FSM) that determines **when** and **what** to execute, delegating the **how** to the Strategy Layer. Each state encapsulates transition logic, processor binding, and multi-agent coordination.
+
+## Overview
+
+The State Layer implements **Level-1** of the [three-layer device agent architecture](../overview.md#three-layer-architecture). It provides:
+
+- **Finite State Machine (FSM)**: Governs agent execution lifecycle through state transitions
+- **State Management**: Singleton registry for state classes with lazy loading
+- **Transition Logic**: Rule-based and LLM-driven state transitions
+- **Multi-Agent Coordination**: State-level agent handoff for hierarchical workflows
+
+```mermaid
+graph TB
+ subgraph "State Layer Components"
+ Status[AgentStatus Enum 7 possible states]
+ Manager[AgentStateManager Singleton Registry]
+ State[AgentState Interface handle, next_state, next_agent]
+ Concrete[Concrete States ContinueState, FinishState, etc.]
+
+ Status --> Manager
+ Manager -->|lazy loads| Concrete
+ Concrete -.->|implements| State
+ end
+
+ Agent[BasicAgent] -->|current_state| State
+ State -->|delegates to| Processor[ProcessorTemplate Level-2 Strategy Layer]
+ State -->|transitions to| State
+ State -->|hands off to| Agent2[Next Agent]
+```
+
+## Design Philosophy
+
+The State Layer follows the **State Pattern** from the Gang of Four design patterns:
+
+- **Encapsulation**: Each state encapsulates state-specific behavior
+- **Polymorphism**: States share a common `AgentState` interface
+- **Dynamic Behavior**: Agent behavior changes dynamically as state changes
+- **Open/Closed Principle**: New states can be added via registration without modifying existing code
+
+## AgentStatus Enum
+
+The `AgentStatus` enum defines the **seven possible states** that a device agent can be in:
+
+```python
+class AgentStatus(Enum):
+    """Enumeration of agent states"""
+    ERROR = "ERROR"            # Critical error occurred
+    FINISH = "FINISH"          # Task completed successfully
+    CONTINUE = "CONTINUE"      # Normal execution, continue processing
+    FAIL = "FAIL"              # Task failed, cannot proceed
+    PENDING = "PENDING"        # Waiting for external event (user input, async operation)
+    CONFIRM = "CONFIRM"        # Awaiting user confirmation before proceeding
+    SCREENSHOT = "SCREENSHOT"  # Capture observation data (screenshot, UI tree)
+```
+
+### State Characteristics
+
+| State | Type | Description | Typical Next States | Processor Executed |
+|-------|------|-------------|---------------------|-------------------|
+| **CONTINUE** | Active | Normal execution flow, agent processes next step | CONTINUE, FINISH, FAIL, ERROR, PENDING, CONFIRM, SCREENSHOT | Yes ✅ |
+| **FINISH** | Terminal | Task completed successfully, agent stops | (none - end state) | No ❌ |
+| **FAIL** | Terminal | Task failed, agent stops with error | (none - end state) | No ❌ |
+| **ERROR** | Terminal | Critical error, agent stops immediately | (none - end state) | No ❌ |
+| **PENDING** | Waiting | Waiting for external event (user input, callback) | CONTINUE, FAIL | No ❌ |
+| **CONFIRM** | Waiting | Awaiting user confirmation (safety check) | CONTINUE, FAIL | Yes ✅ (collect confirmation) |
+| **SCREENSHOT** | Data Collection | Capture observation without action | CONTINUE | Yes ✅ (capture only) |
+
+### State Categories
+
+States can be categorized into four groups, matching the **Type** column above:
+
+- **Active States** (CONTINUE): Agent actively executing tasks
+- **Waiting States** (PENDING, CONFIRM): Agent waiting for external input
+- **Data Collection States** (SCREENSHOT): Agent captures an observation without acting
+- **Terminal States** (FINISH, FAIL, ERROR): Agent execution completed (success or failure)
+
+## State Machine Diagram
+
+The following diagram shows the state machine transitions for a typical device agent:
+
+```mermaid
+stateDiagram-v2
+ [*] --> CONTINUE: Agent initialized
+
+ CONTINUE --> CONTINUE: Step executed successfully (LLM decides to continue)
+ CONTINUE --> PENDING: Waiting for external event (async operation, callback)
+ CONTINUE --> CONFIRM: User confirmation needed (safety check)
+ CONTINUE --> SCREENSHOT: Capture observation (data collection only)
+ CONTINUE --> FINISH: Task complete (LLM determines completion)
+ CONTINUE --> FAIL: Action failed (error handling)
+ CONTINUE --> ERROR: Critical failure (unrecoverable error)
+
+ PENDING --> CONTINUE: Event received (user input, callback returned)
+ PENDING --> FAIL: Timeout or error (event never received)
+
+ CONFIRM --> CONTINUE: User confirmed (approved to proceed)
+ CONFIRM --> FAIL: User rejected (operation cancelled)
+
+ SCREENSHOT --> CONTINUE: Screenshot captured (observation complete)
+
+ FINISH --> [*]: Success
+ FAIL --> [*]: Failure
+ ERROR --> [*]: Critical Error
+
+ note right of CONTINUE
+ Active state
+ Processor executes all strategies
+ end note
+
+ note right of PENDING
+ Waiting state
+ No processor execution
+ end note
+
+ note right of FINISH
+ Terminal state
+ Agent lifecycle ends
+ end note
+```
+
+### Transition Determination
+
+State transitions are determined by:
+
+1. **LLM Reasoning**: Agent analyzes results and decides next status (e.g., CONTINUE vs FINISH)
+2. **Rule-Based Logic**: Predefined rules trigger transitions (e.g., error → ERROR)
+3. **User Input**: In CONFIRM, the user's approval or rejection moves the agent to CONTINUE or FAIL
+4. **External Events**: In PENDING, a received async callback moves the agent to CONTINUE
+
+## AgentStateManager (Singleton Registry)
+
+The `AgentStateManager` is a **singleton** that manages the registry of state classes. It provides:
+
+- **State Registration**: `@AgentStateManager.register` decorator to register state classes
+- **Lazy Loading**: State instances created only when first accessed
+- **Centralized Management**: Single source of truth for all agent states
+
+```mermaid
+graph TB
+ subgraph "AgentStateManager (Singleton)"
+        Registry["_state_mapping Dict[str, Type[AgentState]]"]
+        Instances["_state_instance_mapping Dict[str, AgentState]"]
+
+        Registry -->|lazy load on first access| Instances
+    end
+
+    Register["@register decorator"] -->|adds class| Registry
+    GetState["get_state(status)"] -->|creates/retrieves| Instances
+
+ Agent1[AppAgent] -->|requests| GetState
+ Agent2[HostAgent] -->|requests| GetState
+ Agent3[LinuxAgent] -->|requests| GetState
+
+ GetState -->|returns| State[AgentState instance]
+```
+
+### AgentStateManager Implementation
+
+```python
+class AgentStateManager(ABC, metaclass=SingletonABCMeta):
+    """
+    Singleton state manager for agent states.
+
+    Responsibilities:
+    - Register state classes via decorator
+    - Lazy load state instances on demand
+    - Provide centralized state access
+    """
+
+    _state_mapping: Dict[str, Type[AgentState]] = {}  # Class registry
+
+    def __init__(self):
+        self._state_instance_mapping: Dict[str, AgentState] = {}  # Instance cache
+
+    def get_state(self, status: str) -> AgentState:
+        """
+        Get state instance for the given status string.
+
+        :param status: The status string (e.g., "CONTINUE")
+        :return: The state instance
+
+        Note: Uses lazy loading - instances created on first access
+        """
+        # Lazy load: create instance only when first requested
+        if status not in self._state_instance_mapping:
+            state_class = self._state_mapping.get(status)
+            if state_class:
+                self._state_instance_mapping[status] = state_class()
+            else:
+                # Fallback to none_state if status not registered
+                self._state_instance_mapping[status] = self.none_state
+
+        return self._state_instance_mapping.get(status, self.none_state)
+
+    def add_state(self, status: str, state: AgentState) -> None:
+        """
+        Add a state instance at runtime (advanced usage).
+
+        :param status: The status string
+        :param state: The state instance
+        """
+        self._state_instance_mapping[status] = state
+
+    @property
+    def state_map(self) -> Dict[str, AgentState]:
+        """
+        The state mapping of status to state.
+        :return: The state mapping.
+        """
+        return self._state_instance_mapping
+
+    @classmethod
+    def register(cls, state_class: Type[AgentState]) -> Type[AgentState]:
+        """
+        Decorator to register state class.
+
+        Usage:
+            @AgentStateManager.register
+            class ContinueAppAgentState(AgentState):
+                @staticmethod
+                def name() -> str:
+                    return AgentStatus.CONTINUE.value
+
+        :param state_class: The state class to register
+        :return: The state class (unchanged)
+        """
+        cls._state_mapping[state_class.name()] = state_class
+        return state_class
+
+    @property
+    @abstractmethod
+    def none_state(self) -> AgentState:
+        """
+        Fallback state when requested state not found.
+
+        :return: Default/fallback state instance
+        """
+        pass
+```
+
+### State Registration Pattern
+
+Each agent type (AppAgent, HostAgent, LinuxAgent) has its own `StateManager` subclass:
+
+```python
+# AppAgent states
+class AppAgentStateManager(AgentStateManager):
+    # Own registry, so AppAgent states do not overwrite those of other managers
+    _state_mapping: Dict[str, Type[AgentState]] = {}
+
+    @property
+    def none_state(self):
+        return NoneAppAgentState()
+
+@AppAgentStateManager.register
+class ContinueAppAgentState(AgentState):
+    @classmethod
+    def name(cls):
+        return AgentStatus.CONTINUE.value
+
+**Benefits of Singleton + Lazy Loading**:
+
+- **Memory Efficiency**: State instances created only when needed
+- **Single Source of Truth**: All agents share same state instances
+- **Thread-Safe**: Singleton metaclass ensures thread-safe instantiation
+- **Extensibility**: New states registered without modifying existing code
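+
+The registration and lazy-loading behavior can be tried in isolation with the simplified sketch below. It deliberately omits the singleton metaclass and the UFO base classes; `State`, `NoneState`, and `StateManager` are stand-in names invented for this example.
+
+```python
+from typing import Dict, Type
+
+
+class State:
+    """Stand-in for AgentState; only the name() hook is modeled."""
+
+    @classmethod
+    def name(cls) -> str:
+        return ""
+
+
+class NoneState(State):
+    """Fallback used when a status has no registered state class."""
+
+
+class StateManager:
+    _state_mapping: Dict[str, Type[State]] = {}  # class registry
+
+    def __init__(self) -> None:
+        self._instances: Dict[str, State] = {}  # instance cache
+
+    @classmethod
+    def register(cls, state_class: Type[State]) -> Type[State]:
+        """Decorator: record the class under its status name."""
+        cls._state_mapping[state_class.name()] = state_class
+        return state_class
+
+    def get_state(self, status: str) -> State:
+        # Lazy loading: instantiate on first request, then reuse the cached object
+        if status not in self._instances:
+            state_class = self._state_mapping.get(status, NoneState)
+            self._instances[status] = state_class()
+        return self._instances[status]
+
+
+@StateManager.register
+class ContinueState(State):
+    @classmethod
+    def name(cls) -> str:
+        return "CONTINUE"
+
+
+manager = StateManager()
+first = manager.get_state("CONTINUE")
+second = manager.get_state("CONTINUE")  # same cached instance as `first`
+fallback = manager.get_state("UNKNOWN")  # unregistered status falls back
+```
+
+Because the registry is a class attribute, a subclass that should keep its own set of states needs its own `_state_mapping` dictionary.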
+
+## AgentState Interface
+
+All state classes implement the `AgentState` abstract interface:
+
+```python
+class AgentState(ABC):
+    """
+    Abstract base class for agent states.
+    """
+
+    @abstractmethod
+    async def handle(
+        self, agent: BasicAgent, context: Optional[Context] = None
+    ) -> None:
+        """
+        Handle the agent for the current step.
+        :param agent: The agent to handle.
+        :param context: The context for the agent and session.
+        """
+        pass
+
+    @abstractmethod
+    def next_agent(self, agent: BasicAgent) -> BasicAgent:
+        """
+        Get the agent for the next step.
+        :param agent: The agent for the current step.
+        :return: The agent for the next step.
+        """
+        return agent
+
+    @abstractmethod
+    def next_state(self, agent: BasicAgent) -> AgentState:
+        """
+        Get the state for the next step.
+        :param agent: The agent for the current step.
+        :return: The state for the next step.
+        """
+        pass
+
+    @abstractmethod
+    def is_round_end(self) -> bool:
+        """
+        Check if the round ends.
+        :return: True if the round ends, False otherwise.
+        """
+        pass
+
+    @abstractmethod
+    def is_subtask_end(self) -> bool:
+        """
+        Check if the subtask ends.
+        :return: True if the subtask ends, False otherwise.
+        """
+        pass
+
+    @classmethod
+    @abstractmethod
+    def agent_class(cls) -> Type[BasicAgent]:
+        """
+        The class of the agent.
+        :return: The class of the agent.
+        """
+        pass
+
+    @classmethod
+    @abstractmethod
+    def name(cls) -> str:
+        """
+        The class name of the state.
+        :return: The class name of the state.
+        """
+        return ""
+```
+
+### Method Responsibilities
+
+| Method | Purpose | Called By | Returns | Side Effects |
+|--------|---------|-----------|---------|--------------|
+| **handle()** | Execute state-specific logic | Round manager | None | Updates agent status, context, memory |
+| **next_state()** | FSM state transition | Round manager | Next `AgentState` instance | None (pure function) |
+| **next_agent()** | Multi-agent coordination | Round manager | Next `BasicAgent` instance | May create new agent instances |
+| **is_round_end()** | Check if round ends | Round manager | Boolean | None (pure function) |
+| **is_subtask_end()** | Check if subtask ends | Round manager | Boolean | None (pure function) |
+| **agent_class()** | Get agent class | State manager | Agent class type | None (class method) |
+| **name()** | State identifier | State manager registration | State name string | None (class method) |
+
+### Concrete State Example
+
+Here's an example of a concrete state for AppAgent's CONTINUE status:
+
+```python
+@AppAgentStateManager.register
+class ContinueAppAgentState(AgentState):
+    """
+    Continue state for AppAgent - normal execution flow.
+    """
+
+    async def handle(self, agent: AppAgent, context: Context):
+        """Execute AppAgent processor strategies."""
+        # Get processor (Level-2 Strategy Layer)
+        processor = agent.processor
+
+        # Execute all strategies in sequence
+        await processor.process(agent, context)
+
+        # Processor updates agent.status based on LLM response
+        # Possible status: CONTINUE, FINISH, FAIL, ERROR, CONFIRM, etc.
+
+    def next_state(self, agent: AppAgent) -> AgentState:
+        """Transition to next state based on agent status."""
+        state_manager = AppAgentStateManager()
+        return state_manager.get_state(agent.status)
+
+    def next_agent(self, agent: AppAgent) -> BasicAgent:
+        """For AppAgent, typically stays on same agent."""
+        # AppAgent continues executing unless delegating back to HostAgent
+        # agent.status holds the status string, so compare against the enum value
+        if agent.status == AgentStatus.FINISH.value:
+            return agent.host  # Return to HostAgent
+        return agent  # Continue with current agent
+
+    @classmethod
+    def name(cls) -> str:
+        """State name for registration"""
+        return AgentStatus.CONTINUE.value  # "CONTINUE"
+```
+
+## State Lifecycle
+
+The following sequence diagram shows how states orchestrate agent execution:
+
+```mermaid
+sequenceDiagram
+ participant Round as Round Manager
+ participant Agent as BasicAgent
+ participant State as AgentState
+ participant Processor as ProcessorTemplate
+ participant LLM
+ participant Context
+
+ Round->>Agent: Get current_state
+ Agent-->>Round: Return state instance
+
+ Round->>State: handle(agent, context)
+ activate State
+
+ State->>Processor: process(agent, context)
+ activate Processor
+
+ Note over Processor: DATA_COLLECTION strategy
+ Processor->>Context: Store screenshot, UI info
+
+ Note over Processor: LLM_INTERACTION strategy
+ Processor->>LLM: Send prompt with context
+ LLM-->>Processor: Return action decision
+
+ Note over Processor: ACTION_EXECUTION strategy
+ Processor->>Processor: Execute commands
+
+ Note over Processor: MEMORY_UPDATE strategy
+ Processor->>Agent: Update memory, blackboard
+
+ Processor->>Agent: Set status (CONTINUE/FINISH/FAIL/etc)
+ deactivate Processor
+
+ deactivate State
+
+ Round->>State: next_state(agent)
+ State-->>Round: Return next state instance
+
+ Round->>State: next_agent(agent)
+ State-->>Round: Return next agent (may be same or different)
+
+ Round->>Round: Update current state, current agent
+ Round->>Round: Repeat until terminal state
+```
+
+### Execution Flow
+
+1. **Round Manager** calls `state.handle(agent, context)`
+2. **State** delegates to `processor.process(agent, context)` (Level-2 Strategy Layer)
+3. **Processor** executes strategies (DATA_COLLECTION → LLM_INTERACTION → ACTION_EXECUTION → MEMORY_UPDATE)
+4. **Processor** sets `agent.status` based on LLM response or error handling
+5. **Round Manager** calls `state.next_state(agent)` to get next state
+6. **Round Manager** calls `state.next_agent(agent)` to check for agent handoff
+7. **Round Manager** updates `agent.current_state` and repeats until terminal state
+
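The seven steps above can be sketched as a small driver loop. This is a minimal illustration, not the actual Round Manager: `StubAgent`, `STATES`, `run_round`, and the `max_steps` guard are hypothetical names introduced here, and `handle()` stands in for the real processor call.

```python
import asyncio
from enum import Enum


class AgentStatus(Enum):
    CONTINUE = "CONTINUE"
    FINISH = "FINISH"


class StubAgent:
    """Hypothetical stand-in for BasicAgent."""

    def __init__(self, steps_needed: int):
        self.status = AgentStatus.CONTINUE
        self.steps_needed = steps_needed
        self.steps_done = 0
        self.current_state = None


class ContinueState:
    async def handle(self, agent, context):
        # Stands in for processor.process(); a real processor would run the
        # four strategy phases and set agent.status from the LLM response.
        agent.steps_done += 1
        context.setdefault("trace", []).append(f"step {agent.steps_done}")
        if agent.steps_done >= agent.steps_needed:
            agent.status = AgentStatus.FINISH

    def next_state(self, agent):
        return STATES[agent.status]  # deterministic status -> state lookup

    def next_agent(self, agent):
        return agent  # no handoff in this sketch

    def is_round_end(self):
        return False


class FinishState(ContinueState):
    async def handle(self, agent, context):
        pass  # terminal state: nothing to execute

    def is_round_end(self):
        return True


STATES = {AgentStatus.CONTINUE: ContinueState(), AgentStatus.FINISH: FinishState()}


async def run_round(agent, context, max_steps=20):
    state = STATES[agent.status]
    for _ in range(max_steps):  # guard against runaway loops (an assumption)
        if state.is_round_end():
            break
        await state.handle(agent, context)      # steps 1-4
        next_state = state.next_state(agent)    # step 5
        agent = state.next_agent(agent)         # step 6
        state = next_state
        agent.current_state = state             # step 7
    return agent


def demo():
    context = {}
    agent = asyncio.run(run_round(StubAgent(steps_needed=3), context))
    return agent.status, context["trace"]
```

Calling `demo()` runs three CONTINUE iterations and then stops at the terminal state. The loop itself never inspects the LLM output; it only follows whatever status the processor left on the agent.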
+## State-Specific Behaviors
+
+Different state types implement different behaviors in their `handle()` method:
+
+### Active State (CONTINUE)
+
+```python
+async def handle(self, agent, context):
+    """Execute full processor workflow"""
+    # Run all four strategy phases
+    await agent.processor.process(agent, context)
+    # Status updated by LLM response parsing
+```
+
+### Waiting State (PENDING)
+
+```python
+async def handle(self, agent, context):
+    """Wait for external event"""
+    # Do not execute processor
+    # Wait for callback, user input, or timeout
+    event = await wait_for_event(timeout=60)
+    if event:
+        agent.status = AgentStatus.CONTINUE
+    else:
+        agent.status = AgentStatus.FAIL
+```
+
+### Confirmation State (CONFIRM)
+
+```python
+async def handle(self, agent, context):
+    """Request user confirmation"""
+    # Execute DATA_COLLECTION to show current state
+    await agent.processor.execute_phase(ProcessingPhase.DATA_COLLECTION)
+
+    # Prompt user for confirmation
+    confirmed = await prompt_user_confirmation()
+
+    if confirmed:
+        agent.status = AgentStatus.CONTINUE
+    else:
+        agent.status = AgentStatus.FAIL
+```
+
+### Terminal State (FINISH/FAIL/ERROR)
+
+```python
+async def handle(self, agent, context):
+    """No action - state is terminal"""
+    # Terminal states do not execute processor
+    # Round manager will detect terminal status and end execution
+    pass
+```
+
+### Processor Execution by State Type
+
+| State Type | Executes Processor? | Which Phases? | Purpose |
+|------------|---------------------|---------------|---------|
+| CONTINUE | ✅ Yes | All phases | Full execution cycle |
+| SCREENSHOT | ✅ Yes | DATA_COLLECTION only | Observation without action |
+| CONFIRM | ✅ Yes | DATA_COLLECTION + custom | Show state, request confirmation |
+| PENDING | ❌ No | None | Wait for external event |
+| FINISH/FAIL/ERROR | ❌ No | None | Terminal states |
+
+## Multi-Agent Coordination
+
+The State Layer enables **multi-agent coordination** through the `next_agent()` method. This is critical for Windows agents (HostAgent → AppAgent hierarchy).
+
+```mermaid
+graph TB
+ subgraph "Multi-Agent State Transitions"
+ HS1[HostAgent: CONTINUE]
+ HS2[HostAgent: DELEGATE_TO_APP]
+ AS1[AppAgent: CONTINUE]
+ AS2[AppAgent: FINISH]
+ HS3[HostAgent: CONTINUE]
+
+ HS1 -->|next_state| HS2
+ HS2 -->|next_agent| AS1
+ AS1 -->|next_state| AS1
+ AS1 -->|next_state| AS2
+ AS2 -->|next_agent| HS3
+ end
+```
+
+### HostAgent → AppAgent Delegation
+
+```python
+class ContinueHostAgentState(AgentState):
+    def next_agent(self, agent: HostAgent) -> BasicAgent:
+        """Delegate to AppAgent when task decomposed"""
+        if agent.status == "DELEGATE_TO_APP":
+            # Create AppAgent for selected application
+            app_agent = AgentFactory.create_agent(
+                agent_type="app",
+                name=f"AppAgent/{agent.selected_app}",
+                process_name=agent.selected_process,
+                app_root_name=agent.selected_app,
+                is_visual=True,
+                main_prompt=config.appagent_prompt,
+                example_prompt=config.appagent_example_prompt
+            )
+
+            # Set HostAgent as host (for returning)
+            app_agent.host = agent
+
+            # Transfer context via blackboard
+            app_agent.blackboard = agent.blackboard
+
+            return app_agent
+
+        # No delegation, continue with HostAgent
+        return agent
+```
+
+### AppAgent → HostAgent Return
+
+```python
+class FinishAppAgentState(AgentState):
+    def next_agent(self, agent: AppAgent) -> BasicAgent:
+        """Return to HostAgent when app task complete"""
+        if agent.host:
+            # Update HostAgent's blackboard with results
+            agent.host.blackboard = agent.blackboard
+
+            # Set HostAgent status to continue
+            agent.host.status = AgentStatus.CONTINUE
+
+            return agent.host
+
+        # No host, AppAgent finishes independently
+        return agent
+```
+
+## Best Practices
+
+### State Design Guidelines
+
+**1. Single Responsibility**: Each state should have one clear purpose
+
+- ✅ Good: `ContinueState` (normal execution), `ErrorState` (error handling)
+- ❌ Bad: `ContinueOrErrorState` (mixed responsibilities)
+
+**2. Minimal State Logic**: Keep `handle()` simple, delegate to processor
+
+- ✅ Good: `await processor.process(agent, context)`
+- ❌ Bad: Implementing strategy logic directly in state
+
+**3. Predictable Transitions**: Make `next_state()` deterministic when possible
+
+- ✅ Good: Map status string to state instance
+- ❌ Bad: Complex conditional logic in `next_state()`
+
+**4. Document Invariants**: Clearly state which conditions trigger each state
+
+- Example: "The PENDING state is entered when an async operation starts"
+
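Guideline 3 (mapping a status string to a state instance) can be illustrated with a small registry. This is a sketch under assumptions: `StateRegistry` is a hypothetical class written for this example and is not the `AgentStateManager` implementation; it only shows how a decorator-based registry keeps `next_state()` down to a single lookup.

```python
from typing import Dict, Type


class AgentState:
    @classmethod
    def name(cls) -> str:
        raise NotImplementedError


class StateRegistry:
    """Hypothetical registry; illustrates the idea only."""

    def __init__(self) -> None:
        self._classes: Dict[str, Type[AgentState]] = {}
        self._instances: Dict[str, AgentState] = {}

    def register(self, state_cls: Type[AgentState]) -> Type[AgentState]:
        # Used as a class decorator: index the class by its status name.
        self._classes[state_cls.name()] = state_cls
        return state_cls

    def get_state(self, status: str) -> AgentState:
        if status not in self._classes:
            raise KeyError(f"No state registered for status {status!r}")
        if status not in self._instances:
            # Lazy instantiation: create each state object on first use.
            self._instances[status] = self._classes[status]()
        return self._instances[status]


registry = StateRegistry()


@registry.register
class ContinueState(AgentState):
    @classmethod
    def name(cls) -> str:
        return "CONTINUE"


@registry.register
class FinishState(AgentState):
    @classmethod
    def name(cls) -> str:
        return "FINISH"
```

Because every lookup for the same status returns the same instance, this design only works if states hold no per-run data, which is the stateless requirement discussed under Common Pitfalls.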
+### Common Pitfalls
+
+!!! warning "Stateful State Classes"
+    States should be stateless: keep data in the agent or context, not in the state object.
+
+    ```python
+    # BAD: Storing state-specific data in state class
+    class ContinueState(AgentState):
+        def __init__(self):
+            self.step_count = 0  # ❌ Don't do this
+
+    # GOOD: Store data in agent or context
+    class ContinueState(AgentState):
+        async def handle(self, agent, context):
+            step_count = context.get("step_count", 0)  # ✅ Store in context
+    ```
+
+!!! warning "Tight Coupling"
+    States should not depend on specific processor implementations.
+
+    ```python
+    # BAD: Directly calling strategy methods
+    async def handle(self, agent, context):
+        strategy = AppScreenshotCaptureStrategy()  # ❌
+        await strategy.execute(agent, context)
+
+    # GOOD: Use processor abstraction
+    async def handle(self, agent, context):
+        await agent.processor.process(agent, context)  # ✅
+    ```
+
+## Platform-Specific States
+
+Different agent types may define platform-specific states:
+
+### Windows AppAgent States
+
+```python
+@AppAgentStateManager.register
+class ContinueAppAgentState(AgentState):
+    """Continue state for Windows AppAgent"""
+    # Implements UI Automation-specific logic
+
+@AppAgentStateManager.register
+class ScreenshotAppAgentState(AgentState):
+    """Screenshot state for Windows AppAgent"""
+    # Captures Windows UI tree + screenshot
+```
+
+### Linux Agent States
+
+```python
+@LinuxAgentStateManager.register
+class ContinueLinuxAgentState(AgentState):
+    """Continue state for Linux Agent"""
+    # Implements shell command execution logic
+
+@LinuxAgentStateManager.register
+class FinishLinuxAgentState(AgentState):
+    """Finish state for Linux Agent"""
+    # Terminal state for Linux workflow
+```
+
+While state **names** (CONTINUE, FINISH, etc.) are consistent across platforms, state **implementations** (`handle()` method) differ based on:
+
+- Platform-specific processors (Windows UI Automation vs Linux shell)
+- Platform-specific strategies (screenshot+UI tree vs shell output)
+- Platform-specific MCP tools (Win32 API vs shell commands)
+
+## Integration with Other Layers
+
+The State Layer integrates with other components:
+
+```mermaid
+graph TB
+ subgraph "State Layer (Level-1)"
+ State[AgentState]
+ Manager[AgentStateManager]
+ end
+
+ subgraph "Strategy Layer (Level-2)"
+ Processor[ProcessorTemplate]
+ Strategies[ProcessingStrategies]
+ end
+
+ subgraph "Command Layer (Level-3)"
+ Dispatcher[CommandDispatcher]
+ MCP[MCP Tools]
+ end
+
+ subgraph "Module System"
+ Round[Round Manager]
+ Context[Global Context]
+ end
+
+ State -->|delegates to| Processor
+ Processor -->|executes| Strategies
+ Strategies -->|uses| Dispatcher
+ Dispatcher -->|calls| MCP
+
+ Round -->|orchestrates| State
+ Round -->|provides| Context
+ State -->|reads/writes| Context
+```
+
+| Integration Point | Layer/Component | Relationship |
+|-------------------|-----------------|--------------|
+| **Round Manager** | Module System | Round calls `handle()`, `next_state()`, `next_agent()` |
+| **ProcessorTemplate** | Level-2 Strategy | State delegates execution to processor |
+| **Global Context** | Module System | State reads request, writes results, shares data |
+| **Agent** | Agent System | State accesses agent properties (memory, blackboard, host) |
+
+See [Strategy Layer](processor.md), [Command Layer](command.md), and [Round Documentation](../../modules/round.md) for integration details.
+
+## API Reference
+
+Below is the complete API reference for the State Layer classes:
+
+::: agents.states.basic.AgentState
+::: agents.states.basic.AgentStateManager
+::: agents.states.basic.AgentStatus
+
+## Summary
+
+**Key Takeaways**:
+
+- **Finite State Machine**: State Layer implements FSM with 7 states (CONTINUE, FINISH, FAIL, ERROR, PENDING, CONFIRM, SCREENSHOT)
+- **Singleton Registry**: AgentStateManager provides centralized, lazy-loaded state management
+- **Core Methods**: `handle()` (execute), `next_state()` (FSM transition), `next_agent()` (multi-agent), `is_round_end()`, `is_subtask_end()`, `agent_class()`, `name()`
+- **State Pattern**: Encapsulates state-specific behavior, enables dynamic transitions
+- **Multi-Agent Coordination**: `next_agent()` enables HostAgent ↔ AppAgent delegation
+- **Platform Extensibility**: Same state names, different implementations per platform
+- **Clean Separation**: State controls **when/what**, Processor controls **how**
+
+The State Layer provides the **control structure** for device agent execution: it orchestrates transitions between behavioral modes and delegates the actual execution to the Strategy Layer.
diff --git a/documents/docs/infrastructure/agents/design/strategy.md b/documents/docs/infrastructure/agents/design/strategy.md
new file mode 100644
index 000000000..b1b5f432c
--- /dev/null
+++ b/documents/docs/infrastructure/agents/design/strategy.md
@@ -0,0 +1,992 @@
+# Processing Strategies
+
+**ProcessingStrategy** classes are the fundamental building blocks of agent execution logic. Each strategy encapsulates a specific unit of work (data collection, LLM reasoning, action execution, memory update) with explicit dependencies and outputs. Strategies are composed by Processors to form complete execution workflows.
+
+## Overview
+
+Processing Strategies implement the **Strategy Pattern**, providing interchangeable algorithms for different aspects of agent behavior. Each strategy:
+
+- Implements a **unified `execute()` interface**
+- Declares **explicit dependencies** (required inputs)
+- Declares **explicit outputs** (provided data)
+- Can be **composed** with other strategies
+- Operates on a **shared ProcessingContext**
+
+```mermaid
+graph TB
+ subgraph "Strategy Ecosystem"
+ Interface[ProcessingStrategy Protocol]
+
+ Base[BaseProcessingStrategy Abstract Base]
+ Composed[ComposedStrategy Multiple Strategies]
+
+ Interface -.implements.-> Base
+ Interface -.implements.-> Composed
+
+ subgraph "Concrete Strategies"
+ DC[DATA_COLLECTION Strategies]
+ LLM[LLM_INTERACTION Strategies]
+ AE[ACTION_EXECUTION Strategies]
+ MU[MEMORY_UPDATE Strategies]
+ end
+
+ Base -.extends.-> DC
+ Base -.extends.-> LLM
+ Base -.extends.-> AE
+ Base -.extends.-> MU
+ end
+
+ Processor[ProcessorTemplate] -->|registers & executes| Interface
+ Context[ProcessingContext] <-->|read/write data| Interface
+```
+
+**Strategy Benefits:**
+
+- **Modularity**: Each strategy does one thing well, can be tested independently
+- **Reusability**: Same strategy can be used across different processors
+- **Composability**: Combine multiple strategies within a phase via `ComposedStrategy`
+- **Extensibility**: Add new strategies without modifying processor framework
+- **Type Safety**: Explicit dependency declarations let the processor detect missing inputs before a strategy runs, instead of failing partway through execution
+
+---
+
+## ProcessingStrategy Interface
+
+All strategies implement the `ProcessingStrategy` protocol:
+
+```python
+from typing import Protocol
+from ufo.agents.agent.basic import BasicAgent
+from ufo.agents.processors.context.processing_context import ProcessingContext, ProcessingResult
+
+class ProcessingStrategy(Protocol):
+    """
+    Protocol for processing strategies.
+
+    All strategies must implement the execute() method and provide
+    a name attribute for logging/debugging.
+    """
+
+    name: str  # Strategy identifier for logging
+
+    async def execute(
+        self,
+        agent: BasicAgent,
+        context: ProcessingContext
+    ) -> ProcessingResult:
+        """
+        Execute strategy logic.
+
+        :param agent: The agent instance (access to memory, blackboard, prompter)
+        :param context: Processing context with local/global data
+        :return: ProcessingResult with success status and output data
+        """
+        ...
+```
+
+**Minimal Interface:** The protocol defines only what is essential: a `name` for logging and debugging, and an `execute()` method as the single entry point.
+
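Because the interface is a `Protocol`, a class can satisfy it structurally without inheriting from anything. The sketch below illustrates this with simplified stand-ins: `Result`, `Strategy`, and `EchoStrategy` are hypothetical names for this example, and a plain `dict` replaces `ProcessingContext`.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, Protocol, runtime_checkable


@dataclass
class Result:
    """Hypothetical stand-in for ProcessingResult."""
    success: bool
    data: Dict[str, Any] = field(default_factory=dict)


@runtime_checkable
class Strategy(Protocol):
    """Simplified version of the protocol above."""
    name: str

    async def execute(self, agent: Any, context: Dict[str, Any]) -> Result: ...


class EchoStrategy:
    """Satisfies the protocol structurally; note there is no base class."""
    name = "Echo"

    async def execute(self, agent: Any, context: Dict[str, Any]) -> Result:
        return Result(success=True, data={"echo": context.get("request")})


async def run(strategy: Strategy, context: Dict[str, Any]) -> Result:
    # The caller depends only on the protocol, not on a concrete class.
    return await strategy.execute(agent=None, context=context)


result = asyncio.run(run(EchoStrategy(), {"request": "open Notepad"}))
```

`runtime_checkable` is optional and only enables `isinstance()` checks on attribute presence; static type checkers verify the protocol without it.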
+---
+
+## BaseProcessingStrategy
+
+Most concrete strategies extend `BaseProcessingStrategy`, which provides:
+
+- Dependency declaration and validation
+- Output declaration
+- Error handling infrastructure
+- Logging utilities
+
+```python
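+import logging
+from abc import ABC, abstractmethod
+from typing import List, Optional
+
+from ufo.agents.agent.basic import BasicAgent
+from ufo.agents.processors.context.processing_context import (
+    ProcessingContext,
+    ProcessingPhase,
+    ProcessingResult,
+)
+from ufo.agents.processors.strategies.dependency import StrategyDependency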
+
+class BaseProcessingStrategy(ABC):
+    """
+    Abstract base class for processing strategies.
+
+    Features:
+    - Dependency declaration via get_dependencies()
+    - Output declaration via get_provides()
+    - Dependency validation
+    - Standardized error handling
+    - Logging integration
+    """
+
+    def __init__(
+        self,
+        name: Optional[str] = None,
+        fail_fast: bool = True
+    ):
+        """
+        Initialize strategy.
+
+        :param name: Strategy name (defaults to class name)
+        :param fail_fast: Raise exception immediately on error vs. return error result
+        """
+        self.name = name or self.__class__.__name__
+        self.fail_fast = fail_fast
+        self.logger = logging.getLogger(f"Strategy.{self.name}")
+
+    def get_dependencies(self) -> List[StrategyDependency]:
+        """
+        Declare required dependencies.
+
+        Override to specify what data this strategy needs from context.
+
+        Example:
+            return [
+                StrategyDependency("screenshot", required=True, expected_type=str),
+                StrategyDependency("control_info", required=False, expected_type=str)
+            ]
+
+        :return: List of dependency declarations
+        """
+        return []
+
+    def get_provides(self) -> List[str]:
+        """
+        Declare provided outputs.
+
+        Override to specify what data this strategy writes to context.
+
+        Example:
+            return ["parsed_response", "action", "arguments"]
+
+        :return: List of output field names
+        """
+        return []
+
+    def validate_dependencies(self, context: ProcessingContext) -> List[str]:
+        """
+        Validate that all required dependencies are available in context.
+
+        :param context: Processing context to validate against
+        :return: List of missing required dependency names
+        """
+        missing = []
+        for dependency in self.get_dependencies():
+            value = context.get_local(dependency.field_name)
+            if dependency.required and value is None:
+                missing.append(dependency.field_name)
+        return missing
+
+    def handle_error(
+        self,
+        error: Exception,
+        phase: ProcessingPhase,
+        context: ProcessingContext
+    ) -> ProcessingResult:
+        """
+        Standardized error handling.
+
+        :param error: The exception that occurred
+        :param phase: Processing phase where error occurred
+        :param context: Current processing context
+        :return: ProcessingResult with error information
+        """
+        self.logger.error(f"Strategy {self.name} failed: {error}", exc_info=True)
+
+        if self.fail_fast:
+            raise error
+        else:
+            return ProcessingResult(
+                success=False,
+                data={},
+                error=str(error),
+                phase=phase
+            )
+
+    @abstractmethod
+    async def execute(
+        self,
+        agent: BasicAgent,
+        context: ProcessingContext
+    ) -> ProcessingResult:
+        """
+        Execute strategy logic.
+
+        Subclasses must implement this method.
+
+        :param agent: Agent instance
+        :param context: Processing context
+        :return: ProcessingResult with outputs
+        """
+        pass
+```
+
+**Example: Creating a Concrete Strategy**
+
+```python
+from typing import List
+
+from ufo.agents.processors.strategies.processing_strategy import BaseProcessingStrategy
+from ufo.agents.processors.strategies.strategy_dependency import StrategyDependency
+from ufo.agents.processors.context.processing_context import (
+    ProcessingContext,
+    ProcessingPhase,
+    ProcessingResult,
+)
+
+class AppScreenshotCaptureStrategy(BaseProcessingStrategy):
+    """Capture screenshot of Windows application"""
+
+    def __init__(self):
+        super().__init__(name="AppScreenshotCapture")
+
+    def get_dependencies(self) -> List[StrategyDependency]:
+        # No dependencies - runs first in DATA_COLLECTION phase
+        return []
+
+    def get_provides(self) -> List[str]:
+        return ["screenshot", "screenshot_path"]
+
+    async def execute(
+        self,
+        agent,
+        context: ProcessingContext
+    ) -> ProcessingResult:
+        try:
+            # Capture screenshot
+            screenshot_path = await self._capture_screenshot(agent)
+            screenshot_str = self._encode_image(screenshot_path)
+
+            # Return result with provided data
+            return ProcessingResult(
+                success=True,
+                data={
+                    "screenshot": screenshot_str,
+                    "screenshot_path": screenshot_path
+                },
+                phase=ProcessingPhase.DATA_COLLECTION
+            )
+        except Exception as e:
+            return self.handle_error(e, ProcessingPhase.DATA_COLLECTION, context)
+
+    async def _capture_screenshot(self, agent):
+        # Platform-specific screenshot logic
+        ...
+
+    def _encode_image(self, path):
+        # Base64 encoding for LLM
+        ...
+```
+
+---
+
+## Strategy Dependency System
+
+The dependency system ensures that strategies execute in the correct order and that the data each one requires is available.
+
+### StrategyDependency
+
+```python
+from dataclasses import dataclass
+from typing import Optional, Type
+
+@dataclass
+class StrategyDependency:
+    """
+    Represents a data dependency for a strategy.
+
+    :param field_name: Name of required field in ProcessingContext
+    :param required: Whether dependency is mandatory (vs. optional)
+    :param expected_type: Expected Python type (for validation)
+    :param description: Human-readable description
+    """
+    field_name: str
+    required: bool = True
+    expected_type: Optional[Type] = None
+    description: str = ""
+```
+
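The `validate_dependencies()` method shown earlier only checks for missing values, while `expected_type` is described as being "for validation". The following is a minimal sketch of how a type-aware check could look. `check_dependencies` is a hypothetical helper, and a plain `dict` stands in for the local data of `ProcessingContext`; whether the real validator performs this check is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Type


@dataclass
class StrategyDependency:
    field_name: str
    required: bool = True
    expected_type: Optional[Type] = None
    description: str = ""


def check_dependencies(
    dependencies: List[StrategyDependency], local_data: Dict[str, Any]
) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    for dep in dependencies:
        value = local_data.get(dep.field_name)
        if value is None:
            if dep.required:
                problems.append(f"missing required field '{dep.field_name}'")
            continue  # optional and absent: nothing more to check
        if dep.expected_type is not None and not isinstance(value, dep.expected_type):
            problems.append(
                f"field '{dep.field_name}' expected {dep.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


deps = [
    StrategyDependency("screenshot", expected_type=str),
    StrategyDependency("arguments", expected_type=dict),
    StrategyDependency("system_status", required=False),
]
problems = check_dependencies(deps, {"screenshot": "aGVsbG8=", "arguments": "click"})
```

Here `problems` contains a single entry reporting that `arguments` is a `str` rather than a `dict`; the optional `system_status` field is absent and therefore skipped.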
+### Dependency Declaration
+
+Strategies can declare dependencies in either of two ways:
+
+#### Method 1: Override `get_dependencies()`
+
+```python
+class LLMInteractionStrategy(BaseProcessingStrategy):
+    def get_dependencies(self) -> List[StrategyDependency]:
+        return [
+            StrategyDependency(
+                field_name="screenshot",
+                required=True,
+                expected_type=str,
+                description="Base64-encoded screenshot for LLM visual input"
+            ),
+            StrategyDependency(
+                field_name="control_info",
+                required=True,
+                expected_type=str,
+                description="UI control information from UI Automation"
+            ),
+            StrategyDependency(
+                field_name="request",
+                required=True,
+                expected_type=str,
+                description="User's task request"
+            )
+        ]
+
+    def get_provides(self) -> List[str]:
+        return ["parsed_response", "action", "arguments"]
+```
+
+#### Method 2: Use Decorators
+
+```python
+from ufo.agents.processors.strategies.strategy_dependency import depends_on, provides
+
+@depends_on("screenshot", "control_info", "request")
+@provides("parsed_response", "action", "arguments")
+class LLMInteractionStrategy(BaseProcessingStrategy):
+    async def execute(self, agent, context):
+        # Dependencies are validated automatically by StrategyDependencyValidator
+        screenshot = context.require_local("screenshot")
+        control_info = context.require_local("control_info")
+        request = context.get_global("REQUEST")
+
+        # ... LLM interaction logic ...
+
+        return ProcessingResult(
+            success=True,
+            data={
+                "parsed_response": parsed,
+                "action": action,
+                "arguments": arguments
+            }
+        )
+```
+
+**Dependency Validation:** The processor validates dependencies before executing each strategy using `StrategyDependencyValidator`:
+
+```python
+# In ProcessorTemplate.process()
+for phase in execution_order:
+    strategy = self.strategies.get(phase)
+    if strategy:
+        # Validate dependencies at runtime
+        self._validate_strategy_dependencies_runtime(strategy, self.processing_context)
+
+        # Execute strategy
+        result = await strategy.execute(agent, self.processing_context)
+```
+
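To make the decorator form (Method 2) less opaque, here is one way such class decorators could be written. This is a sketch under assumptions, not the library's implementation: the attribute names `_declared_dependencies` and `_declared_provides` and the helper `missing_fields` are invented for this example.

```python
from typing import Callable, List, Type, TypeVar

T = TypeVar("T")


def depends_on(*field_names: str) -> Callable[[Type[T]], Type[T]]:
    """Hypothetical class decorator recording required context fields."""
    def decorator(cls: Type[T]) -> Type[T]:
        existing = list(getattr(cls, "_declared_dependencies", []))
        # Append new names, keeping declaration order and skipping duplicates.
        cls._declared_dependencies = existing + [
            n for n in field_names if n not in existing
        ]
        return cls
    return decorator


def provides(*field_names: str) -> Callable[[Type[T]], Type[T]]:
    """Hypothetical class decorator recording the fields a strategy writes."""
    def decorator(cls: Type[T]) -> Type[T]:
        existing = list(getattr(cls, "_declared_provides", []))
        cls._declared_provides = existing + [
            n for n in field_names if n not in existing
        ]
        return cls
    return decorator


def missing_fields(strategy: object, local_data: dict) -> List[str]:
    """Return declared dependencies that are absent from local_data."""
    declared = getattr(type(strategy), "_declared_dependencies", [])
    return [name for name in declared if local_data.get(name) is None]


@depends_on("screenshot", "control_info", "request")
@provides("parsed_response", "action", "arguments")
class LLMInteractionStrategy:
    pass
```

A validator can then read the class-level metadata before calling `execute()`, for example `missing_fields(LLMInteractionStrategy(), {"screenshot": "..."})`, without the strategy overriding any method.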
+---
+
+## Four Core Strategy Types
+
+Strategies are organized by **ProcessingPhase**, with four core types:
+
+```mermaid
+graph LR
+ subgraph "Strategy Types by Phase"
+ DC[DATA_COLLECTION Strategies]
+ LLM[LLM_INTERACTION Strategies]
+ AE[ACTION_EXECUTION Strategies]
+ MU[MEMORY_UPDATE Strategies]
+
+ DC -->|provides data| LLM
+ LLM -->|provides decisions| AE
+ AE -->|provides results| MU
+ end
+```
+
+### 1. DATA_COLLECTION Strategies
+
+**Purpose**: Gather contextual information from the device/environment
+
+**Common Implementations**:
+- `AppScreenshotCaptureStrategy`: Capture application screenshot (Windows)
+- `AppControlInfoStrategy`: Extract UI Automation tree (Windows)
+- `LinuxShellOutputStrategy`: Capture shell command output (Linux)
+- `SystemStatusStrategy`: Gather system metrics (CPU, memory, disk)
+
+**Dependencies**: None (typically first in execution chain)
+
+**Provides**: `screenshot`, `control_info`, `observation`, `system_status`
+
+```python
+class AppControlInfoStrategy(BaseProcessingStrategy):
+    """Extract UI Automation tree from Windows application"""
+
+    def get_dependencies(self) -> List[StrategyDependency]:
+        return []  # No dependencies
+
+    def get_provides(self) -> List[str]:
+        return ["control_info", "control_elements"]
+
+    async def execute(self, agent, context):
+        # Get UI Automation tree via MCP tool
+        command = Command(function="get_ui_tree", arguments={})
+        # execute_commands() is awaited, as in the ACTION_EXECUTION example
+        results = await agent.dispatcher.execute_commands([command])
+
+        control_tree = results[0].result
+
+        return ProcessingResult(
+            success=True,
+            data={
+                "control_info": control_tree,
+                "control_elements": self._parse_tree(control_tree)
+            },
+            phase=ProcessingPhase.DATA_COLLECTION
+        )
+```
+
+**Platform Differences:**
+
+- **Windows**: Screenshot + UI Automation tree
+- **Linux**: Screenshot + shell output + accessibility tree (X11/Wayland)
+- **macOS**: Screenshot + Accessibility API tree (future)
+
+### 2. LLM_INTERACTION Strategies
+
+**Purpose**: Construct prompts, call LLM, parse responses
+
+**Common Implementations**:
+- `AppLLMInteractionStrategy`: UI element selection for AppAgent (Windows)
+- `HostLLMInteractionStrategy`: Application selection for HostAgent (Windows)
+- `LinuxLLMInteractionStrategy`: Shell command generation for LinuxAgent
+
+**Dependencies**: `screenshot`, `control_info`, `request`, `memory`
+
+**Provides**: `parsed_response`, `action`, `arguments`, `function_call`
+
+```python
+class AppLLMInteractionStrategy(BaseProcessingStrategy):
+    """LLM reasoning for Windows AppAgent"""
+
+    def get_dependencies(self) -> List[StrategyDependency]:
+        return [
+            StrategyDependency("screenshot", required=True, expected_type=str),
+            StrategyDependency("control_info", required=True, expected_type=str),
+            StrategyDependency("request", required=True, expected_type=str)
+        ]
+
+    def get_provides(self) -> List[str]:
+        return ["parsed_response", "action", "arguments", "function_call"]
+
+    async def execute(self, agent, context):
+        # 1. Build prompt with screenshot + UI elements
+        prompt = agent.prompter.construct_prompt(
+            screenshot=context.get_local("screenshot"),
+            control_info=context.get_local("control_info"),
+            request=context.get_global("request"),
+            memory=agent.memory.get_latest(5)
+        )
+
+        # 2. Call LLM
+        response = await agent.llm_client.get_response(prompt)
+
+        # 3. Parse JSON response
+        parsed = agent.prompter.parse_response(response)
+
+        # 4. Extract action details
+        return ProcessingResult(
+            success=True,
+            data={
+                "parsed_response": parsed,
+                "action": parsed.get("ControlText"),
+                "arguments": parsed.get("Plan"),
+                "function_call": parsed.get("Function")
+            },
+            phase=ProcessingPhase.LLM_INTERACTION
+        )
+```
+
+!!! warning "LLM Response Validation"
+    Always validate and sanitize LLM outputs to prevent errors and security issues:
+
+    ```python
+    # Validate required fields
+    if "Function" not in parsed:
+        raise ProcessingException("LLM response missing 'Function' field")
+
+    # Sanitize dangerous operations
+    if parsed["Function"] == "shell_execute":
+        command = parsed.get("Plan", "")
+        if any(danger in command for danger in ["rm -rf", "del /f /q"]):
+            raise ProcessingException("Dangerous command detected")
+    ```
+
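A substring deny-list like the one above is easy to bypass (extra whitespace, different flags, other destructive commands). A stricter alternative is to accept only function names from an explicit allow-list. The sketch below assumes that approach; `ALLOWED_FUNCTIONS`, its example entries, `ResponseValidationError`, and `validate_llm_response` are hypothetical names for this illustration.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical allow-list; a real deployment would derive this from the
# tools actually registered for the agent.
ALLOWED_FUNCTIONS: FrozenSet[str] = frozenset(
    {"click_input", "set_edit_text", "get_ui_tree"}
)


class ResponseValidationError(ValueError):
    """Raised when a parsed LLM response fails validation."""


def validate_llm_response(parsed: Dict[str, Any]) -> str:
    """Return the validated function name or raise ResponseValidationError."""
    function = parsed.get("Function")
    if not isinstance(function, str) or not function:
        raise ResponseValidationError("LLM response missing 'Function' field")
    if function not in ALLOWED_FUNCTIONS:
        # Unknown functions are rejected outright instead of being
        # pattern-matched against known-bad strings.
        raise ResponseValidationError(f"Function {function!r} is not allow-listed")
    return function
```

With this shape, a response requesting `shell_execute` is rejected regardless of what command text accompanies it, because the function was never allow-listed.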
+### 3. ACTION_EXECUTION Strategies
+
+**Purpose**: Execute commands via CommandDispatcher
+
+**Common Implementations**:
+- `AppActionExecutionStrategy`: Execute UI Automation commands (Windows)
+- `HostActionExecutionStrategy`: Launch applications, create AppAgents (Windows)
+- `LinuxActionExecutionStrategy`: Execute shell commands (Linux)
+
+**Dependencies**: `action`, `arguments`, `function_call`, `command_dispatcher`
+
+**Provides**: `results`, `execution_status`, `action_success`
+
+```python
+class AppActionExecutionStrategy(BaseProcessingStrategy):
+    """Execute UI Automation commands for Windows AppAgent"""
+
+    def get_dependencies(self) -> List[StrategyDependency]:
+        return [
+            StrategyDependency("action", required=True, expected_type=str),
+            StrategyDependency("arguments", required=True, expected_type=dict),
+            StrategyDependency("function_call", required=True, expected_type=str)
+        ]
+
+    def get_provides(self) -> List[str]:
+        return ["results", "execution_status", "action_success"]
+
+    async def execute(self, agent, context):
+        # 1. Build command from LLM output
+        command = Command(
+            function=context.get_local("function_call"),
+            arguments=context.get_local("arguments")
+        )
+
+        # 2. Execute via dispatcher (routes to device client)
+        dispatcher = context.get_global("command_dispatcher")
+        results = await dispatcher.execute_commands([command])
+
+        # 3. Check execution success
+        success = all(r.status == ResultStatus.SUCCESS for r in results)
+
+        return ProcessingResult(
+            success=True,
+            data={
+                "results": results,
+                "execution_status": results[0].status,
+                "action_success": success
+            },
+            phase=ProcessingPhase.ACTION_EXECUTION
+        )
+```
+
+See the [Command Layer documentation](command.md) for details on command execution.
+
+### 4. MEMORY_UPDATE Strategies
+
+**Purpose**: Update agent memory and shared blackboard
+
+**Common Implementations**:
+- `AppMemoryUpdateStrategy`: Record UI interactions (Windows AppAgent)
+- `HostMemoryUpdateStrategy`: Record application selections (Windows HostAgent)
+- `LinuxMemoryUpdateStrategy`: Record shell command history (Linux)
+
+**Dependencies**: `action`, `results`, `observation`, `screenshot`
+
+**Provides**: `memory_item`, `updated_blackboard`
+
+```python
+class AppMemoryUpdateStrategy(BaseProcessingStrategy):
+    """Update memory for Windows AppAgent"""
+
+    def get_dependencies(self) -> List[StrategyDependency]:
+        return [
+            StrategyDependency("action", required=True),
+            StrategyDependency("results", required=True),
+            StrategyDependency("screenshot_path", required=False)
+        ]
+
+    def get_provides(self) -> List[str]:
+        return ["memory_item", "updated_blackboard"]
+
+    async def execute(self, agent, context):
+        # 1. Create memory item for agent's short-term memory
+        memory_item = MemoryItem()
+        memory_item.add_values_from_dict({
+            "step": context.get_global("session_step"),
+            "action": context.get_local("action"),
+            "results": context.get_local("results"),
+            "screenshot": context.get_local("screenshot_path"),
+            "observation": context.get_local("control_info")
+        })
+
+        # 2. Add to agent memory
+        agent.memory.add_memory_item(memory_item)
+
+        # 3. Update blackboard (shared multi-agent memory)
+        if context.get_local("action_success"):
+            agent.blackboard.add_trajectories({
+                "step": context.get_global("session_step"),
+                "action": context.get_local("action"),
+                "status": "success"
+            })
+
+        return ProcessingResult(
+            success=True,
+            data={
+                "memory_item": memory_item,
+                "updated_blackboard": True
+            },
+            phase=ProcessingPhase.MEMORY_UPDATE
+        )
+```
+
+See the [Memory System documentation](memory.md) for details on Memory and Blackboard.
+
+---
+
+## ComposedStrategy
+
+The `ComposedStrategy` class enables **combining multiple strategies** within a single processing phase:
+
+```python
+class ComposedStrategy(BaseProcessingStrategy):
+    """
+    Compose multiple strategies into a single execution flow.
+
+    Features:
+    - Sequential execution of component strategies
+    - Aggregated dependency/provides metadata
+    - Flexible error handling (fail-fast or continue-on-error)
+    - Shared processing context across components
+    """
+
+    def __init__(
+        self,
+        strategies: List[BaseProcessingStrategy],
+        name: str = "",
+        fail_fast: bool = True,
+        phase: ProcessingPhase = ProcessingPhase.DATA_COLLECTION
+    ):
+        """
+        Initialize composed strategy.
+
+        :param strategies: List of strategies to execute sequentially
+        :param name: Composed strategy name
+        :param fail_fast: Stop on first error vs. continue execution
+        :param phase: Processing phase this composition belongs to
+        """
+        super().__init__(name=name or "ComposedStrategy", fail_fast=fail_fast)
+
+        if not strategies:
+            raise ValueError("ComposedStrategy requires at least one strategy")
+
+        self.strategies = strategies
+        self.execution_phase = phase
+
+        # Collect metadata from component strategies
+        self._collect_metadata()
+
+    def _collect_metadata(self):
+        """Aggregate dependencies and provides from component strategies"""
+        all_deps = []
+        all_provides = set()
+
+        for strategy in self.strategies:
+            all_deps.extend(strategy.get_dependencies())
+            all_provides.update(strategy.get_provides())
+
+        # Remove internal dependencies: anything provided by a component of
+        # this composition is filtered out, regardless of execution order
+        external_deps = [
+            dep for dep in all_deps
+            if dep.field_name not in all_provides
+        ]
+
+        self._dependencies = external_deps
+        self._provides = list(all_provides)
+
+    def get_dependencies(self) -> List[StrategyDependency]:
+        return self._dependencies
+
+    def get_provides(self) -> List[str]:
+        return self._provides
+
+    async def execute(
+        self,
+        agent: BasicAgent,
+        context: ProcessingContext
+    ) -> ProcessingResult:
+        """Execute all component strategies sequentially"""
+        combined_data = {}
+
+        for i, strategy in enumerate(self.strategies):
+            self.logger.debug(
+                f"Executing component strategy {i+1}/{len(self.strategies)}: {strategy.name}"
+            )
+
+            # Execute component strategy
+            result = await strategy.execute(agent, context)
+
+            if result.success:
+                # Update context for next strategy
+                context.update_local(result.data)
+                combined_data.update(result.data)
+            else:
+                # Handle failure
+                self.logger.error(f"Component strategy {strategy.name} failed: {result.error}")
+
+                if self.fail_fast:
+                    return result  # Propagate failure immediately
+                else:
+                    # Continue with remaining strategies
+                    self.logger.warning(f"Continuing despite failure in {strategy.name}")
+
+        return ProcessingResult(
+            success=True,
+            data=combined_data,
+            phase=self.execution_phase
+        )
+```
+
+### Using ComposedStrategy
+
+```mermaid
+graph TB
+ subgraph "DATA_COLLECTION Phase"
+ Composed[ComposedStrategy]
+
+ S1[ScreenshotStrategy]
+ S2[UITreeStrategy]
+ S3[SystemStatusStrategy]
+
+ Composed -->|1. execute| S1
+ Composed -->|2. execute| S2
+ Composed -->|3. execute| S3
+
+ S1 -->|provides: screenshot| Context[ProcessingContext]
+ S2 -->|provides: control_info| Context
+ S3 -->|provides: system_status| Context
+ end
+```
+
+**Example: Composing DATA_COLLECTION Strategies**
+
+```python
+# In AppAgentProcessor._setup_strategies()
+
+# Compose multiple data collection strategies
+data_collection = ComposedStrategy(
+    strategies=[
+        AppScreenshotCaptureStrategy(),
+        AppControlInfoStrategy(),
+        SystemStatusStrategy()
+    ],
+    name="AppDataCollection",
+    fail_fast=False,  # Continue even if SystemStatus fails (optional)
+    phase=ProcessingPhase.DATA_COLLECTION
+)
+
+# Register composed strategy
+self.strategies[ProcessingPhase.DATA_COLLECTION] = data_collection
+```
+
+**Composition Benefits:**
+
+- **Modularity**: Build complex workflows from simple, testable components
+- **Reusability**: Mix and match strategies across different processors
+- **Flexibility**: Easily reorder or replace component strategies
+- **Error Handling**: Choose fail-fast or continue-on-error per composition
+- **Metadata Aggregation**: Dependencies and provides automatically computed
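+
+The metadata aggregation listed above can be illustrated with a minimal, self-contained sketch. The `aggregate_metadata` helper and the `(dependencies, provides)` tuple format are assumptions made for illustration, and the real `ComposedStrategy` may compute these values differently. The idea is that a dependency produced by an earlier component is satisfied inside the composition and need not be exposed to the processor:
+
+```python
+from typing import List, Set, Tuple
+
+# Hypothetical stand-in: each component is described as (dependencies, provides).
+Component = Tuple[List[str], List[str]]
+
+
+def aggregate_metadata(components: List[Component]) -> Tuple[List[str], List[str]]:
+    """Return (external dependencies, combined provides) for an ordered composition."""
+    external_deps: List[str] = []
+    provided: List[str] = []
+    seen: Set[str] = set()
+
+    for deps, provides in components:
+        for dep in deps:
+            # A dependency produced by an earlier component is satisfied internally.
+            if dep not in seen and dep not in external_deps:
+                external_deps.append(dep)
+        for name in provides:
+            if name not in seen:
+                seen.add(name)
+                provided.append(name)
+
+    return external_deps, provided
+
+
+components = [
+    ([], ["screenshot"]),                # screenshot capture
+    (["screenshot"], ["control_info"]),  # needs the screenshot from the first component
+    (["request"], ["system_status"]),    # needs data from outside the composition
+]
+deps, provides = aggregate_metadata(components)
+print(deps)      # ['request']
+print(provides)  # ['screenshot', 'control_info', 'system_status']
+```
+
+Because ordering matters in this sketch, reordering the component list can change which dependencies are reported as external.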
+
+---
+
+## Best Practices
+
+### Strategy Design Guidelines
+
+**1. Single Responsibility:** Each strategy should do one thing well.
+
+- ✅ Good: `ScreenshotCaptureStrategy` (captures screenshot)
+- ❌ Bad: `ScreenshotAndLLMStrategy` (mixed concerns)
+
+**2. Explicit Dependencies:** Always declare what you need.
+
+```python
+def get_dependencies(self) -> List[StrategyDependency]:
+    return [
+        StrategyDependency("screenshot", required=True),
+        StrategyDependency("system_status", required=False)  # Optional
+    ]
+```
+
+**3. Clear Outputs:** Document what you provide.
+
+```python
+def get_provides(self) -> List[str]:
+    return ["parsed_response", "action", "arguments"]
+```
+
+**4. Appropriate Error Handling:**
+
+- Use `fail_fast=True` for critical strategies (LLM_INTERACTION, ACTION_EXECUTION)
+- Use `fail_fast=False` for optional strategies (system metrics, logging)
+
+**5. Platform Agnostic:** Strategies shouldn't assume specific agent types.
+
+```python
+# ❌ BAD: Type-checking agent
+async def execute(self, agent, context):
+    if isinstance(agent, AppAgent):  # Tight coupling
+        ...
+
+# ✅ GOOD: Use context data
+async def execute(self, agent, context):
+    control_info = context.require_local("control_info")  # Generic
+    ...
+```
+
+### Common Pitfalls
+
+!!! warning "Pitfall 1: Stateful Strategies"
+    Strategies should be stateless (no instance variables modified during execution):
+
+    ```python
+    # ❌ BAD: Stateful
+    class BadStrategy(BaseProcessingStrategy):
+        def __init__(self):
+            super().__init__()
+            self.counter = 0  # State
+
+        async def execute(self, agent, context):
+            self.counter += 1  # Modifying state
+            ...
+
+    # ✅ GOOD: Stateless
+    class GoodStrategy(BaseProcessingStrategy):
+        async def execute(self, agent, context):
+            counter = context.get_local("counter", 0)  # Read from context
+            context.update_local({"counter": counter + 1})  # Write to context
+            ...
+    ```
+
+!!! warning "Pitfall 2: Hidden Dependencies"
+    Don't access context data without declaring dependencies:
+
+    ```python
+    # ❌ BAD: Hidden dependency
+    class BadStrategy(BaseProcessingStrategy):
+        def get_dependencies(self):
+            return []  # Claims no dependencies
+
+        async def execute(self, agent, context):
+            screenshot = context.get_local("screenshot")  # But uses screenshot!
+            ...
+
+    # ✅ GOOD: Explicit dependency
+    class GoodStrategy(BaseProcessingStrategy):
+        def get_dependencies(self):
+            return [StrategyDependency("screenshot", required=True)]
+
+        async def execute(self, agent, context):
+            screenshot = context.require_local("screenshot")
+            ...
+    ```
+
+!!! warning "Pitfall 3: Side Effects"
+    Strategies shouldn't modify global state or agent attributes directly:
+
+    ```python
+    # ❌ BAD: Side effects
+    async def execute(self, agent, context):
+        agent.custom_attribute = "value"  # Modifying agent
+        global_config["setting"] = "new"  # Modifying global
+        ...
+
+    # ✅ GOOD: Update through proper channels
+    async def execute(self, agent, context):
+        context.update_local({"custom_value": "value"})  # Context
+        agent.memory.add_memory_item(...)  # Memory system
+        ...
+    ```
+
+---
+
+## Integration with Processor
+
+Strategies are **registered and executed** by ProcessorTemplate:
+
+```mermaid
+sequenceDiagram
+ participant Processor as ProcessorTemplate
+ participant Strategy as ProcessingStrategy
+ participant Context as ProcessingContext
+
+ Note over Processor: Initialization
+ Processor->>Processor: _setup_strategies()
+ Processor->>Processor: Register strategies by phase
+
+ Note over Processor: Execution
+ Processor->>Strategy: validate_dependencies(context)
+ Strategy-->>Processor: [] (no missing deps)
+
+ Processor->>Strategy: execute(agent, context)
+ Strategy->>Context: get_local("screenshot")
+ Context-->>Strategy: screenshot data
+
+ Strategy->>Strategy: Process data
+
+ Strategy->>Context: update_local({"parsed_response": ...})
+ Strategy-->>Processor: ProcessingResult(success=True, data={...})
+
+ Processor->>Context: Update with strategy outputs
+```
+
+**See [Processor Documentation](processor.md) for details on how processors orchestrate strategies.**
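+
+The `validate_dependencies` step in the diagram can be sketched as follows. This is a simplified illustration, not the framework's implementation: the `field_name` attribute and the plain-dict context are assumptions, and the real `StrategyDependency` and `ProcessingContext` may carry more information.
+
+```python
+from dataclasses import dataclass
+from typing import Any, Dict, List
+
+
+@dataclass
+class StrategyDependency:
+    # Simplified stand-in for the real dataclass; only two fields are assumed.
+    field_name: str
+    required: bool = True
+
+
+def validate_dependencies(
+    dependencies: List[StrategyDependency], local_context: Dict[str, Any]
+) -> List[str]:
+    """Return the names of required dependencies missing from the context."""
+    return [
+        dep.field_name
+        for dep in dependencies
+        if dep.required and dep.field_name not in local_context
+    ]
+
+
+deps = [
+    StrategyDependency("screenshot", required=True),
+    StrategyDependency("system_status", required=False),
+]
+print(validate_dependencies(deps, {"screenshot": b"..."}))  # []
+print(validate_dependencies(deps, {}))                      # ['screenshot']
+```
+
+An empty list corresponds to the "no missing deps" reply in the sequence diagram; optional dependencies never block execution in this sketch.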
+
+---
+
+## Platform-Specific Strategies
+
+Different platforms implement platform-specific strategies while following the same interface:
+
+| Platform | DATA_COLLECTION | LLM_INTERACTION | ACTION_EXECUTION | MEMORY_UPDATE |
+|----------|-----------------|-----------------|------------------|---------------|
+| **Windows AppAgent** | Screenshot + UI tree | UI element selection | UI Automation commands | UI interaction history |
+| **Windows HostAgent** | Desktop screenshot + app list | Application selection | Launch app, create AppAgent | App selection history |
+| **Linux** | Screenshot + shell output | Shell command generation | Shell command execution | Command history |
+| **macOS** (future) | Screenshot + Accessibility tree | Accessibility element selection | Accessibility API commands | Interaction history |
+
+!!! example "Platform-Specific Implementation"
+    ```python
+    # Windows AppAgent
+    class AppAgentProcessor(ProcessorTemplate):
+        def _setup_strategies(self):
+            self.strategies[ProcessingPhase.DATA_COLLECTION] = ComposedStrategy([
+                AppScreenshotCaptureStrategy(),  # Windows-specific
+                AppControlInfoStrategy()         # UI Automation specific
+            ])
+
+    # Linux Agent
+    class LinuxAgentProcessor(ProcessorTemplate):
+        def _setup_strategies(self):
+            self.strategies[ProcessingPhase.DATA_COLLECTION] = ComposedStrategy([
+                CustomizedScreenshotCaptureStrategy(),  # Linux-specific
+                ShellOutputStrategy()                   # Shell-specific
+            ])
+    ```
+
+---
+
+## Related Documentation
+
+- [Strategy Layer - Processor](processor.md): How ProcessorTemplate orchestrates strategies
+- [Command Layer](command.md): How ACTION_EXECUTION strategies dispatch commands
+- [Memory System](memory.md): How MEMORY_UPDATE strategies use Memory and Blackboard
+- [State Layer](state.md): How AgentState delegates to Processor
+- [Agent Types](../agent_types.md): Platform-specific strategy implementations
+
+---
+
+## API Reference
+
+The following classes are documented via docstrings:
+
+- `ProcessingStrategy`: Protocol defining strategy interface
+- `BaseProcessingStrategy`: Abstract base class for strategies
+- `ComposedStrategy`: Compose multiple strategies within a phase
+- `StrategyDependency`: Dependency declaration dataclass
+
+---
+
+## Summary
+
+**Key Takeaways:**
+
+- **ProcessingStrategy**: Unified interface with `execute()` method
+- **BaseProcessingStrategy**: Abstract base with dependency management and error handling
+- **Four Strategy Types**: DATA_COLLECTION, LLM_INTERACTION, ACTION_EXECUTION, MEMORY_UPDATE
+- **Dependency System**: Explicit declarations ensure correct execution order via `StrategyDependency`
+- **ComposedStrategy**: Combine multiple strategies within a phase
+- **Platform Agnostic**: Same interface, platform-specific implementations
+- **Modular & Reusable**: Strategies can be mixed, matched, and tested independently
+- **Processor Integration**: Strategies are registered and orchestrated by ProcessorTemplate
+
+Processing Strategies are the fundamental building blocks of agent execution logic, providing modularity, reusability, and extensibility across diverse platforms and task requirements.
diff --git a/documents/docs/infrastructure/agents/overview.md b/documents/docs/infrastructure/agents/overview.md
new file mode 100644
index 000000000..7d2f0035f
--- /dev/null
+++ b/documents/docs/infrastructure/agents/overview.md
@@ -0,0 +1,812 @@
+# Device Agent Architecture
+
+Device Agents are the execution engines of UFO3's multi-device orchestration system. Each device agent operates as an autonomous, intelligent controller that translates high-level user intentions into low-level system commands. The architecture is designed for **extensibility**, **safety**, and **scalability** across heterogeneous computing environments.
+
+## Overview
+
+UFO3 orchestrates tasks across multiple devices through a network of **Device Agents**. Originally designed as a Windows automation framework (UFO2), the architecture has evolved to support diverse platforms including Linux, macOS, and embedded systems. This document describes the abstract design principles and interfaces that enable this multi-platform capability.
+
+**Key Capabilities:**
+
+- **Multi-Platform**: Windows agents (HostAgent, AppAgent), Linux agent, extensible to macOS and embedded systems
+- **Safe Execution**: Server-client separation isolates reasoning from system-level operations
+- **Scalable Architecture**: Hierarchical agent coordination supports complex cross-device workflows
+- **LLM-Driven Reasoning**: Dynamic decision-making using large language models
+- **Modular Design**: Three-layer architecture (State, Strategy, Command) enables customization
+
+---
+
+## Three-Layer Architecture
+
+Device agents implement a **three-layer framework** that separates concerns, promotes modularity, and enables extensibility:
+
+```mermaid
+graph TB
+ subgraph "Device Agent Architecture"
+ subgraph "Level-1: State Layer (FSM)"
+ S1[AgentState]
+ S2[State Machine]
+ S3[State Transitions]
+ S1 --> S2 --> S3
+ end
+
+ subgraph "Level-2: Strategy Layer (Execution Logic)"
+ P1[ProcessorTemplate Strategy Orchestrator]
+ P2[DATA_COLLECTION Strategies]
+ P3[LLM_INTERACTION Strategies]
+ P4[ACTION_EXECUTION Strategies]
+ P5[MEMORY_UPDATE Strategies]
+ P1 -->|manages & executes| P2
+ P2 --> P3 --> P4 --> P5
+ end
+
+ subgraph "Level-3: Command Layer (System Interface)"
+ C1[CommandDispatcher]
+ C2[MCP Tools]
+ C3[Atomic Commands]
+ C1 --> C2 --> C3
+ end
+
+ S3 -->|delegates to| P1
+ P4 -->|executes via| C1
+ end
+
+ LLM[Large Language Model]
+ P3 -.->|reasoning| LLM
+ LLM -.->|decisions| P4
+```
+
+### Layer Responsibilities
+
+| Layer | Level | Responsibility | Key Components | Extensibility |
+|-------|-------|----------------|----------------|---------------|
+| **State** | Level-1 | Finite State Machine governing agent lifecycle | `AgentState`, `AgentStateManager`, `AgentStatus` | Register new states via `@AgentStateManager.register` |
+| **Strategy** | Level-2 | Execution logic layer: processor manages sequence of modular strategies | `ProcessorTemplate`, `ProcessingStrategy`, `ProcessingPhase`, `Middleware` | Compose custom strategies via `ComposedStrategy`, add middleware |
+| **Command** | Level-3 | Atomic system operations mapped to MCP tools | `BasicCommandDispatcher`, `Command`, MCP integration | Add new tools via client-side MCP server registration |
+
+**Design Rationale:**
+
+The three-layer separation ensures:
+
+- **State Layer (Level-1)**: Controls *when* and *what* to execute (state transitions, agent handoff)
+- **Strategy Layer (Level-2)**: Defines *how* to execute (processor orchestrates modular strategies)
+- **Command Layer (Level-3)**: Performs *actual* execution (deterministic system operations)
+
+This separation allows replacing individual layers without affecting others.
+
+---
+
+## Level-1: State Layer (FSM)
+
+The **State Layer** implements a Finite State Machine (FSM) that governs the agent's execution lifecycle. Each state encapsulates:
+
+- A **processor** (strategy execution logic)
+- **Transition rules** (to next state)
+- **Agent handoff logic** (for multi-agent workflows)
+
+```mermaid
+stateDiagram-v2
+ [*] --> CONTINUE
+ CONTINUE --> CONTINUE: Success
+ CONTINUE --> PENDING: Wait for external event
+ CONTINUE --> CONFIRM: User confirmation needed
+ CONTINUE --> SCREENSHOT: Capture observation
+ CONTINUE --> FINISH: Task complete
+ CONTINUE --> FAIL: Error occurred
+ CONTINUE --> ERROR: Critical failure
+
+ PENDING --> CONTINUE: Event received
+ CONFIRM --> CONTINUE: User confirmed
+ CONFIRM --> FAIL: User rejected
+ SCREENSHOT --> CONTINUE: Screenshot captured
+
+ FINISH --> [*]
+ FAIL --> [*]
+ ERROR --> [*]
+```
+
+### AgentStatus Enum
+
+```python
+class AgentStatus(Enum):
+    """Agent status enumeration"""
+    ERROR = "ERROR"            # Critical error occurred
+    FINISH = "FINISH"          # Task completed successfully
+    CONTINUE = "CONTINUE"      # Normal execution
+    FAIL = "FAIL"              # Task failed
+    PENDING = "PENDING"        # Waiting for external event
+    CONFIRM = "CONFIRM"        # Awaiting user confirmation
+    SCREENSHOT = "SCREENSHOT"  # Screenshot capture needed
+```
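+
+As a hedged illustration, the transitions in the state diagram above can be written down as a lookup table. The `TRANSITIONS` table and the two helper functions are hypothetical names used only for this sketch; the framework itself decides transitions inside each state's `next_state()` method.
+
+```python
+from enum import Enum
+from typing import Dict, Set
+
+
+class AgentStatus(Enum):
+    ERROR = "ERROR"
+    FINISH = "FINISH"
+    CONTINUE = "CONTINUE"
+    FAIL = "FAIL"
+    PENDING = "PENDING"
+    CONFIRM = "CONFIRM"
+    SCREENSHOT = "SCREENSHOT"
+
+
+S = AgentStatus
+
+# Allowed transitions, transcribed from the state diagram.
+TRANSITIONS: Dict[AgentStatus, Set[AgentStatus]] = {
+    S.CONTINUE: {S.CONTINUE, S.PENDING, S.CONFIRM, S.SCREENSHOT, S.FINISH, S.FAIL, S.ERROR},
+    S.PENDING: {S.CONTINUE},
+    S.CONFIRM: {S.CONTINUE, S.FAIL},
+    S.SCREENSHOT: {S.CONTINUE},
+    S.FINISH: set(),
+    S.FAIL: set(),
+    S.ERROR: set(),
+}
+
+
+def is_terminal(status: AgentStatus) -> bool:
+    """A status is terminal when it has no outgoing transitions."""
+    return not TRANSITIONS[status]
+
+
+def can_transition(src: AgentStatus, dst: AgentStatus) -> bool:
+    """Check whether the diagram allows moving from src to dst."""
+    return dst in TRANSITIONS[src]
+
+
+print(can_transition(S.CONFIRM, S.FAIL))    # True
+print(can_transition(S.PENDING, S.FINISH))  # False
+print(is_terminal(S.FINISH))                # True
+```
+
+A table like this can be handy in tests that assert a custom state never requests a transition the diagram does not allow.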
+
+**State Registration:**
+
+New states can be registered dynamically using the `@AgentStateManager.register` decorator:
+
+```python
+@AgentStateManager.register
+class CustomState(AgentState):
+    async def handle(self, agent, context):
+        # Custom state logic
+        pass
+
+    def next_state(self, agent):
+        return AgentStateManager.get_state("CONTINUE")
+```
+
+**See [State Layer Documentation](design/state.md) for complete details.**
+
+---
+
+## Level-2: Strategy Layer (Execution Logic)
+
+The **Strategy Layer** implements the execution logic within each state. Each state encapsulates a **processor** that manages a sequence of **strategies** to implement step-level workflow. This layer consists of two key components:
+
+### Processor: Strategy Orchestrator
+
+The **ProcessorTemplate** orchestrates the execution of strategies:
+
+- **Registers Strategies**: Configures which strategies execute in each phase
+- **Manages Middleware**: Wraps strategy execution with logging, metrics, error handling
+- **Validates Dependencies**: Ensures strategies have required data before execution
+- **Controls Execution**: Sequences strategies through fixed workflow phases
+
+```mermaid
+graph TB
+ State[AgentState] -->|encapsulates| Processor[ProcessorTemplate Strategy Orchestrator]
+
+ Processor -->|1. Register| Strategies[ProcessingStrategies]
+ Processor -->|2. Wrap| Middleware[Middleware Chain]
+ Processor -->|3. Validate| Dependencies[Strategy Dependencies]
+ Processor -->|4. Execute| Workflow[Workflow Phases]
+
+ Workflow --> DC[DATA_COLLECTION]
+ Workflow --> LLM[LLM_INTERACTION]
+ Workflow --> AE[ACTION_EXECUTION]
+ Workflow --> MU[MEMORY_UPDATE]
+
+ DC & LLM & AE & MU -.->|implements| Strategies
+```
+
+**Processor and Strategy Relationship:**
+
+- **Processor**: Framework that manages the sequence of strategies
+- **Strategy**: Modular, reusable execution units
+
+Together they form **Level-2: Strategy Layer**, which handles:
+
+- Data collection and environment inspection
+- Prompt construction and LLM reasoning
+- Action planning and tool invocation
+- Memory updates and context synchronization
+
+### Strategy: Modular Execution Units
+
+**ProcessingStrategies** are modular execution units with a unified `execute()` interface:
+
+```mermaid
+graph LR
+ A[DATA_COLLECTION] --> B[LLM_INTERACTION]
+ B --> C[ACTION_EXECUTION]
+ C --> D[MEMORY_UPDATE]
+
+ A1[Screenshots UI Info System Status] --> A
+ B1[Prompt Construction LLM Call Response Parsing] --> B
+ C1[Command Dispatch MCP Execution Result Handling] --> C
+ D1[Memory Items Blackboard Update Context Sync] --> D
+```
+
+### Four Core Strategy Types
+
+| Strategy Type | ProcessingPhase | Purpose | Examples |
+|---------------|-----------------|---------|----------|
+| **DATA_COLLECTION** | `data_collection` | Gather contextual information | Screenshot capture, UI tree extraction, system info |
+| **LLM_INTERACTION** | `llm_interaction` | Construct prompts, interact with LLM, parse responses | Prompt building, LLM reasoning, JSON parsing |
+| **ACTION_EXECUTION** | `action_execution` | Execute commands from LLM/toolkits | Click, type, scroll, API calls |
+| **MEMORY_UPDATE** | `memory_update` | Update short-term/long-term memory | Add memory items, update blackboard, sync context |
+
+**Strategy Layer Configuration Example:**
+
+Each state configures its processor with strategies and middleware:
+
+```python
+class AppAgentProcessor(ProcessorTemplate):
+    def _setup_strategies(self):
+        # Register strategies for each phase
+        self.strategies[ProcessingPhase.DATA_COLLECTION] = ComposedStrategy([
+            AppScreenshotCaptureStrategy(),
+            AppControlInfoStrategy()
+        ])
+        self.strategies[ProcessingPhase.LLM_INTERACTION] = AppLLMInteractionStrategy()
+        self.strategies[ProcessingPhase.ACTION_EXECUTION] = AppActionExecutionStrategy()
+        self.strategies[ProcessingPhase.MEMORY_UPDATE] = AppMemoryUpdateStrategy()
+
+    def _setup_middleware(self):
+        # Add middleware for logging, metrics, error handling
+        self.middleware_chain = [
+            LoggingMiddleware(),
+            PerformanceMetricsMiddleware(),
+            ErrorHandlingMiddleware()
+        ]
+```
+
+**See [Processor Documentation](design/processor.md) and [Strategy Documentation](design/strategy.md) for complete details.**
+
+---
+
+## Level-3: Command Layer (System Interface)
+
+The **Command Layer** provides atomic, deterministic system operations. Each command maps to an **MCP tool** that executes on the device client.
+
+```mermaid
+sequenceDiagram
+ participant Agent as Device Agent (Server)
+ participant Dispatcher as CommandDispatcher
+ participant Protocol as AIP Protocol
+ participant Client as Device Client
+ participant MCP as MCP Tool
+
+ Agent->>Dispatcher: execute_commands([command1, command2])
+ Dispatcher->>Protocol: Send ServerMessage (COMMAND)
+ Protocol->>Client: WebSocket (AIP)
+ Client->>MCP: Route to MCP server
+ MCP->>MCP: Execute tool function
+ MCP->>Client: Return result
+ Client->>Protocol: Send ClientMessage (RESULT)
+ Protocol->>Dispatcher: Receive results
+ Dispatcher->>Agent: Return List[Result]
+```
+
+### Command Structure
+
+```python
+@dataclass
+class Command:
+    """Atomic command to be executed on device client"""
+    tool_name: str              # MCP tool name (e.g., "click_element")
+    parameters: Dict[str, Any]  # Tool arguments
+    tool_type: str              # "data_collection" or "action"
+    call_id: str                # Unique identifier
+```
+
+!!! warning "Deterministic Execution"
+    Commands are designed to be:
+
+    - **Atomic**: Single, indivisible operation
+    - **Deterministic**: Same inputs → same outputs
+    - **Auditable**: Full command history logged
+    - **Reversible**: Where possible, support undo operations
+
+**Extensibility:**
+
+New commands are added entirely on the client side:
+
+1. Register the MCP tool on the device client.
+2. The LLM dynamically selects the tool from the available MCP registry.
+3. No server-side code changes are required.
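+
+The routing idea can be sketched without any MCP dependency. In the sketch below, a plain dictionary stands in for the MCP registry, and `register_tool`, `execute`, and the `echo_text` tool are hypothetical names invented for illustration; a real client would register tools through an MCP server instead.
+
+```python
+from dataclasses import dataclass, field
+from typing import Any, Callable, Dict
+
+
+@dataclass
+class Command:
+    tool_name: str
+    parameters: Dict[str, Any] = field(default_factory=dict)
+    tool_type: str = "action"
+    call_id: str = ""
+
+
+# Hypothetical client-side registry: tool name -> callable.
+TOOLS: Dict[str, Callable[..., Any]] = {}
+
+
+def register_tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
+    """Decorator that makes a function available under the given tool name."""
+    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
+        TOOLS[name] = func
+        return func
+    return decorator
+
+
+@register_tool("echo_text")
+def echo_text(text: str) -> str:
+    return text.upper()
+
+
+def execute(command: Command) -> Dict[str, Any]:
+    """Route a command to its tool and return a structured result."""
+    tool = TOOLS.get(command.tool_name)
+    if tool is None:
+        return {"call_id": command.call_id, "success": False,
+                "error": f"unknown tool: {command.tool_name}"}
+    try:
+        return {"call_id": command.call_id, "success": True,
+                "result": tool(**command.parameters)}
+    except Exception as exc:  # report tool failures instead of crashing the client
+        return {"call_id": command.call_id, "success": False, "error": str(exc)}
+
+
+print(execute(Command("echo_text", {"text": "hello"}, call_id="c1")))
+# {'call_id': 'c1', 'success': True, 'result': 'HELLO'}
+```
+
+Registering a second tool requires no change to `execute`, which mirrors the claim that new commands need no changes on the dispatching side.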
+
+**See [Command Layer Documentation](design/command.md) for complete details.**
+
+---
+
+## Server-Client Architecture
+
+Device agents use a **server-client separation** to balance safety, scalability, and functionality:
+
+```mermaid
+graph TB
+ subgraph "Server Side (UFO3 Orchestrator)"
+ Server[Device Agent Server]
+ State[State Machine]
+ Processor[Strategy Processor]
+ LLM[LLM Service]
+ Memory[Memory & Context]
+
+ Server --> State
+ Server --> Processor
+ Server --> Memory
+ Processor -.-> LLM
+ end
+
+ subgraph "Communication Layer"
+ AIP[AIP Protocol WebSocket]
+ end
+
+ subgraph "Client Side (Device)"
+ Client[Device Client]
+ Dispatcher[Command Dispatcher]
+ MCP[MCP Server Manager]
+ Tools[MCP Tools]
+ OS[Operating System]
+
+ Client --> Dispatcher
+ Dispatcher --> MCP
+ MCP --> Tools
+ Tools --> OS
+ end
+
+ Server <-->|Commands/Results| AIP
+ AIP <-->|Commands/Results| Client
+```
+
+### Separation of Concerns
+
+| Component | Location | Responsibilities | Security Boundary |
+|-----------|----------|------------------|-------------------|
+| **Agent Server** | Orchestrator | State management, reasoning, planning, memory | Untrusted (LLM-driven decisions) |
+| **Device Client** | Device | Command execution, MCP tool calls, resource access | Trusted (validated operations) |
+| **AIP Protocol** | Communication | Message serialization, WebSocket transport, error handling | Secure channel (authentication, encryption) |
+
+**Why Server-Client Separation?**
+
+**Safety**: Isolates potentially unsafe LLM-generated decisions from direct system access. Clients validate all commands before execution.
+
+**Scalability**: Single orchestrator server manages multiple device clients. Reduces per-device resource requirements.
+
+**Flexibility**: Device clients can run on resource-constrained devices (embedded systems, mobile) while heavy reasoning occurs on server.
+
+**See [Server-Client Architecture](server_client_architecture.md) for complete details.**
+
+---
+
+## Supported Device Platforms
+
+UFO3 currently supports **Windows** and **Linux** device agents, with architecture designed for extensibility to other platforms.
+
+### Windows Agents
+
+```mermaid
+graph TB
+ subgraph "Windows Device (Two-Tier Hierarchy)"
+ Host[HostAgent Application Selection]
+ App1[AppAgent Word]
+ App2[AppAgent Excel]
+ App3[AppAgent Browser]
+
+ Host -->|delegates| App1
+ Host -->|delegates| App2
+ Host -->|delegates| App3
+ end
+
+ User[User Request] --> Host
+```
+
+**HostAgent** (Application-Level Coordinator):
+
+- Selects appropriate application(s) for user request
+- Decomposes tasks into application-specific subtasks
+- Coordinates multiple AppAgents
+- Manages application switching and data transfer
+
+**AppAgent** (Application-Level Executor):
+
+- Controls specific Windows application (Word, Excel, browser, etc.)
+- Uses UI Automation for control element discovery
+- Executes application-specific actions (type, click, scroll)
+- Maintains application context and memory
+
+!!! example "Windows Agent Example"
+    **User Request**: "Create a chart from sales.xlsx and insert into report.docx"
+
+    1. **HostAgent** decomposes:
+        - Open Excel → Create chart → Copy chart
+        - Open Word → Paste chart
+    2. **AppAgent (Excel)**: Opens `sales.xlsx`, creates chart, copies to clipboard
+    3. **AppAgent (Word)**: Opens `report.docx`, pastes chart at cursor
+
+### Linux Agent
+
+```mermaid
+graph TB
+ subgraph "Linux Device (Single-Tier Architecture)"
+ Linux[LinuxAgent Direct System Control]
+ Shell[Shell Commands]
+ Files[File Operations]
+ Apps[Application Launch]
+
+ Linux --> Shell
+ Linux --> Files
+ Linux --> Apps
+ end
+
+ User[User Request] --> Linux
+```
+
+**LinuxAgent** (System-Level Executor):
+
+- Direct shell command execution
+- File system operations
+- Application launch and management
+- Single-tier architecture (no application-level hierarchy)
+
+!!! info "Architecture Difference"
+    **Windows** uses a two-tier hierarchy (HostAgent → AppAgent) due to:
+
+    - UI Automation framework's application-centric model
+    - Distinct application contexts requiring specialized agents
+
+    **Linux** uses a single-tier architecture because:
+
+    - Shell provides a unified interface to all system operations
+    - Application control occurs through the same command-line interface
+
+### Platform Comparison
+
+| Feature | Windows (UFO2) | Linux | macOS (Future) | Embedded (Future) |
+|---------|----------------|-------|----------------|-------------------|
+| **Agent Hierarchy** | Two-tier (Host → App) | Single-tier | TBD | Single-tier |
+| **UI Control** | UI Automation | X11/Wayland | Accessibility API | Platform-specific |
+| **Command Interface** | MCP tools (Win32 API) | MCP tools (Shell) | MCP tools (AppleScript) | MCP tools (Custom) |
+| **Observation** | Screenshot + UI tree | Screenshot + Shell output | Screenshot + UI tree | Sensor data |
+| **State Management** | Shared FSM | Shared FSM | Shared FSM | Shared FSM |
+| **Strategy Layer** | Processor framework | Processor framework | Processor framework | Processor framework |
+| **Current Status** | ✅ Production | ✅ Production | 🔜 Planned | 🔜 Planned |
+
+**Extensibility Path:**
+
+Adding a new platform requires:
+
+1. **Implement Agent Class**: Extend `BasicAgent` (inherit State layer, Processor framework)
+2. **Create Processor**: Subclass `ProcessorTemplate`, implement platform-specific strategies
+3. **Define MCP Tools**: Register platform-specific MCP tools on device client
+4. **Register Agent**: Use `@AgentRegistry.register` decorator
+
+No changes to core State layer, Processor framework, or AIP protocol required.
+
+**See [Agent Types Documentation](agent_types.md) for complete implementation details.**
+
+---
+
+## Agent Lifecycle
+
+A typical device agent execution follows this lifecycle:
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant Orchestrator
+ participant Agent
+ participant State
+ participant Processor
+ participant LLM
+ participant Dispatcher
+ participant Client
+
+ User->>Orchestrator: Submit task
+ Orchestrator->>Agent: Initialize agent (CONTINUE state)
+
+ loop Until FINISH/FAIL/ERROR
+ Agent->>State: handle(agent, context)
+ State->>Processor: execute strategies
+
+ Processor->>Processor: DATA_COLLECTION
+ Note over Processor: Screenshot, UI info
+
+ Processor->>LLM: LLM_INTERACTION
+ LLM-->>Processor: Action decision
+
+ Processor->>Dispatcher: ACTION_EXECUTION
+ Dispatcher->>Client: Execute commands
+ Client-->>Dispatcher: Results
+ Dispatcher-->>Processor: Results
+
+ Processor->>Processor: MEMORY_UPDATE
+ Note over Processor: Update memory, blackboard
+
+ State->>State: next_state(agent)
+ State->>Agent: Update agent status
+ end
+
+ Agent->>Orchestrator: Task complete/failed
+ Orchestrator->>User: Return result
+```
+
+### Execution Phases
+
+1. **Initialization**: Agent created with default state (`CONTINUE`), processor, memory
+2. **State Handling**: Current state's `handle()` method invoked with agent and context
+3. **Strategy Execution**: Processor runs strategies in sequence (DATA_COLLECTION → LLM_INTERACTION → ACTION_EXECUTION → MEMORY_UPDATE)
+4. **State Transition**: State's `next_state()` determines next FSM state
+5. **Repeat/Terminate**: Loop continues until terminal state (`FINISH`, `FAIL`, `ERROR`)
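+
+The loop described above can be sketched with plain `asyncio`. Everything here is a stand-in: `run_step` and `run_agent` are hypothetical names, the phases only record a trace instead of running real strategies, and the transition rule is a simple step counter rather than LLM-driven logic.
+
+```python
+import asyncio
+from typing import Any, Dict
+
+PHASES = ["DATA_COLLECTION", "LLM_INTERACTION", "ACTION_EXECUTION", "MEMORY_UPDATE"]
+TERMINAL = {"FINISH", "FAIL", "ERROR"}
+
+
+async def run_step(context: Dict[str, Any]) -> str:
+    """Run the four phases once and return the next status."""
+    for phase in PHASES:
+        context["trace"].append(phase)  # a real processor would run strategies here
+    context["steps"] += 1
+    # Stand-in for the state's next_state() decision.
+    return "FINISH" if context["steps"] >= context["max_steps"] else "CONTINUE"
+
+
+async def run_agent(max_steps: int = 2) -> Dict[str, Any]:
+    context: Dict[str, Any] = {"trace": [], "steps": 0, "max_steps": max_steps}
+    status = "CONTINUE"  # default initial state
+    while status not in TERMINAL:
+        status = await run_step(context)
+    context["status"] = status
+    return context
+
+
+result = asyncio.run(run_agent())
+print(result["status"], result["steps"])  # FINISH 2
+```
+
+Each loop iteration corresponds to one pass through phases 2 to 4 of the list above, and the loop exits only on a terminal status.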
+
+!!! tip "Multi-Agent Handoff"
+    For multi-agent scenarios (e.g., Windows HostAgent → AppAgent), states implement `next_agent()`:
+
+    ```python
+    def next_agent(self, agent: BasicAgent) -> BasicAgent:
+        # HostAgent delegates to AppAgent
+        # "DELEGATE_TO_APP" is an illustrative value; it is not part of the AgentStatus enum above
+        if agent.status == "DELEGATE_TO_APP":
+            return agent.create_app_agent(...)
+        return agent
+    ```
+
+---
+
+## Memory and Context Management
+
+Device agents maintain two types of memory:
+
+### Short-Term Memory (Agent Memory)
+
+**Purpose**: Track agent's execution history within a session
+
+**Implementation**: `Memory` class with `MemoryItem` entries
+
+```python
+class Memory:
+    """Agent's short-term memory"""
+    _content: List[MemoryItem]
+
+    def add_memory_item(self, memory_item: MemoryItem):
+        """Add new memory entry"""
+        self._content.append(memory_item)
+```
+
+**Content**: Actions taken, observations made, results received
+
+**Lifetime**: Single session (cleared between tasks)
+
+### Long-Term Memory (Blackboard)
+
+**Purpose**: Share information across agents and sessions
+
+**Implementation**: `Blackboard` class with multiple memory types
+
+```python
+class Blackboard:
+    """Multi-agent shared memory"""
+    _questions: Memory     # Q&A history
+    _requests: Memory      # Request history
+    _trajectories: Memory  # Action trajectories
+    _screenshots: Memory   # Visual observations
+```
+
+**Content**: Common knowledge, successful action patterns, user preferences
+
+**Lifetime**: Persistent across sessions (can be saved/loaded)
+
+**Blackboard Usage Example:**
+
+**Scenario**: HostAgent delegates to AppAgent (Excel)
+
+1. HostAgent adds to blackboard:
+    - Request: "Create sales chart"
+    - Context: Previous analysis results
+2. AppAgent reads from blackboard:
+    - Retrieves request and context
+    - Adds action trajectories as executed
+    - Adds screenshot after chart creation
+3. HostAgent reads updated blackboard:
+    - Verifies chart creation
+    - Continues to next step (insert to Word)
+
+**See [Memory System Documentation](design/memory.md) for complete details.**
+
+---
+
+## Integration with UFO3 Components
+
+Device agents integrate with other UFO3 components:
+
+```mermaid
+graph TB
+ subgraph "UFO3 Architecture"
+ Session[Session/Round Manager]
+ Context[Global Context]
+ Agent[Device Agent]
+ Dispatcher[Command Dispatcher]
+ AIP[AIP Protocol]
+ Client[Device Client]
+ MCP[MCP Servers]
+ end
+
+ Session -->|manages lifecycle| Agent
+ Session -->|provides| Context
+ Agent -->|reads/writes| Context
+ Agent -->|sends commands| Dispatcher
+ Dispatcher -->|uses| AIP
+ AIP <-->|WebSocket| Client
+ Client -->|calls| MCP
+```
+
+### Integration Points
+
+| Component | Relationship | Description |
+|-----------|--------------|-------------|
+| **Session Manager** | Parent | Creates agents, manages agent lifecycle, coordinates multi-agent workflows |
+| **Round Manager** | Sibling | Manages round-based execution, tracks round state, synchronizes with agent steps |
+| **Global Context** | Shared State | Agent reads request/config, writes results/status, shares data across components |
+| **Command Dispatcher** | Execution Interface | Agent sends commands, dispatcher routes to client, returns results |
+| **AIP Protocol** | Communication | Serializes commands/results, manages WebSocket, handles errors/timeouts |
+| **Device Client** | Executor | Receives commands, invokes MCP tools, returns results |
+| **MCP Servers** | Tool Registry | Provides available tools, executes tool functions, returns structured results |
+
+**See [Session Documentation](../modules/session.md), [Context Documentation](../modules/context.md), and [AIP Protocol](../../aip/overview.md) for integration details.**
+
+---
+
+## Design Patterns
+
+Device agent architecture leverages several design patterns:
+
+### 1. State Pattern (FSM Layer)
+
+**Purpose**: Encapsulate state-specific behavior, enable dynamic state transitions
+
+**Implementation**: `AgentState` abstract class, concrete state classes
+
+```python
+class AgentState(ABC):
+    @abstractmethod
+    async def handle(self, agent, context):
+        """Execute state-specific logic"""
+        pass
+
+    @abstractmethod
+    def next_state(self, agent):
+        """Determine next state"""
+        pass
+```
+
+### 2. Strategy Pattern (Strategy Layer)
+
+**Purpose**: Define family of algorithms (strategies), make them interchangeable
+
+**Implementation**: `ProcessingStrategy` protocol, concrete strategy classes
+
+```python
+class ProcessingStrategy(Protocol):
+    async def execute(self, agent, context) -> ProcessingResult:
+        """Execute strategy logic"""
+        pass
+```
+
+### 3. Template Method Pattern (Processor Framework)
+
+**Purpose**: Define skeleton of algorithm, let subclasses override specific steps
+
+**Implementation**: `ProcessorTemplate` abstract class
+
+```python
+class ProcessorTemplate(ABC):
+    @abstractmethod
+    def _setup_strategies(self):
+        """Subclass defines which strategies to use"""
+        pass
+
+    async def process(self, agent, context):
+        """Template method - runs strategies in sequence"""
+        for phase, strategy in self.strategies.items():
+            result = await strategy.execute(agent, context)
+            # Handle result, update context
+```
+
+### 4. Singleton Pattern (State Manager)
+
+**Purpose**: Ensure single instance of state registry
+
+**Implementation**: `AgentStateManager` with metaclass
+
+```python
+class AgentStateManager(ABC, metaclass=SingletonABCMeta):
+    _state_mapping: Dict[str, Type[AgentState]] = {}
+
+    def get_state(self, status: str) -> AgentState:
+        """Lazy load and return state instance"""
+        pass
+```
+
+### 5. Registry Pattern (Agent Registration)
+
+**Purpose**: Register agent types, enable dynamic agent creation
+
+**Implementation**: `AgentRegistry` decorator
+
+```python
+@AgentRegistry.register(agent_name="appagent", processor_cls=AppAgentProcessor)
+class AppAgent(BasicAgent):
+    pass
+```
+
+### 6. Blackboard Pattern (Multi-Agent Coordination)
+
+**Purpose**: Share data across multiple agents
+
+**Implementation**: `Blackboard` class
+
+```python
+class Blackboard:
+    _questions: Memory
+    _requests: Memory
+    _trajectories: Memory
+    _screenshots: Memory
+```
+
+---
+
+## Best Practices
+
+### State Design
+
+- Keep states **focused**: Each state should have single, clear responsibility
+- Use **rule-based transitions** for deterministic flows, **LLM-driven transitions** for adaptive behavior
+- Implement **error states** for graceful degradation
+- Document **state invariants** and **transition conditions**
+
+### Strategy Design
+
+- Keep strategies **atomic**: Each strategy should perform one cohesive task
+- Declare **dependencies explicitly** using `get_dependencies()`
+- Use **ComposedStrategy** to combine multiple strategies within a phase
+- Implement **fail-fast** for critical errors, **continue-on-error** for optional operations
+
+### Command Design
+
+- Keep commands **atomic**: Each command should be a single, indivisible operation
+- Design commands to be **idempotent** where possible
+- Validate **arguments** on the client side before execution
+- Return **structured results** with success/failure status
+
+### Memory Management
+
+- Use **short-term memory** for agent-specific execution history
+- Use **blackboard** for multi-agent coordination and persistent knowledge
+- **Clear memory** between sessions to avoid context pollution
+- Implement **memory pruning** for long-running sessions
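
One simple way to approach pruning is a bounded buffer that discards the oldest entries. The sketch below assumes memory items are plain dicts; `ShortTermMemory` is a hypothetical class and does not mirror the framework's `Memory` API.

```python
from collections import deque
from typing import Deque, Dict, List


class ShortTermMemory:
    """Keeps only the most recent `max_items` entries."""

    def __init__(self, max_items: int = 100) -> None:
        self._items: Deque[Dict[str, str]] = deque(maxlen=max_items)

    def add(self, item: Dict[str, str]) -> None:
        self._items.append(item)  # the oldest entry is dropped when full

    def clear(self) -> None:
        self._items.clear()  # intended to be called between sessions

    def items(self) -> List[Dict[str, str]]:
        return list(self._items)
```

A fixed-size window is the simplest policy; summarizing or scoring old entries before dropping them is an alternative when early context matters.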
+
+!!! warning "Security Considerations"
+    - **Validate all commands** on client side before execution
+    - **Sanitize LLM outputs** before converting to commands
+    - **Limit command scope** via MCP tool permissions
+    - **Audit all actions** for compliance and debugging
+    - **Isolate agents** to prevent unauthorized cross-agent access
+
+---
+
+## Related Documentation
+
+**Deep Dive Into Layers:**
+
+- [State Layer Documentation](design/state.md): FSM, AgentState, transitions, state registration
+- [Processor and Strategy Documentation](design/processor.md): ProcessorTemplate, strategies, dependency management
+- [Command Layer Documentation](design/command.md): CommandDispatcher, MCP integration, atomic commands
+
+**Supporting Systems:**
+
+- [Memory System Documentation](design/memory.md): Memory, MemoryItem, Blackboard patterns
+- [Agent Types Documentation](agent_types.md): Windows agents, Linux agent, platform-specific implementations
+
+**Integration Points:**
+
+- [Server-Client Architecture](server_client_architecture.md): Server and client separation, communication patterns
+- [Server Architecture](../../server/overview.md): Agent server, WebSocket manager, orchestration
+- [Client Architecture](../../client/overview.md): Device client, MCP servers, command execution
+- [AIP Protocol](../../aip/overview.md): Agent Interaction Protocol for server-client communication
+- [MCP Integration](../../mcp/overview.md): Model Context Protocol for tool execution
+
+---
+
+## Summary
+
+**Key Takeaways:**
+
+✅ **Three-Layer Architecture**: State (FSM) → Strategy (Execution Logic) → Command (System Interface)
+
+✅ **Server-Client Separation**: Safe isolation of reasoning (server) from execution (client)
+
+✅ **Multi-Platform Support**: Windows (two-tier), Linux (single-tier), extensible to macOS and embedded
+
+✅ **LLM-Driven Reasoning**: Dynamic decision-making with structured command output
+
+✅ **Modular & Extensible**: Register new states, compose strategies, add MCP tools without core changes
+
+✅ **Memory Systems**: Short-term (agent memory) and long-term (blackboard) for coordination
+
+✅ **Design Patterns**: State, Strategy, Template Method, Singleton, Registry, Blackboard
+
+The Device Agent architecture provides a **robust, extensible foundation** for multi-device automation. By separating concerns across three layers and isolating reasoning from execution, UFO3 achieves both **safety** and **flexibility** for orchestrating complex cross-device workflows.
+
+---
+
+## Reference
+
+Below is the reference for the `BasicAgent` class. All device agents inherit from `BasicAgent` and implement platform-specific processors and states:
+
+::: agents.agent.basic.BasicAgent
+
diff --git a/documents/docs/infrastructure/agents/server_client_architecture.md b/documents/docs/infrastructure/agents/server_client_architecture.md
new file mode 100644
index 000000000..e405c67ed
--- /dev/null
+++ b/documents/docs/infrastructure/agents/server_client_architecture.md
@@ -0,0 +1,791 @@
+# Server-Client Architecture
+
+Device agents in UFO are partitioned into **server** and **client** components, separating high-level orchestration from low-level execution. This architecture enables safe, scalable, and flexible task execution across heterogeneous devices through the Agent Interaction Protocol (AIP).
+
+---
+
+## Overview
+
+To support safe, scalable, and flexible execution across heterogeneous devices, each **device agent** is partitioned into two distinct components: a **server** and a **client**. This separation of responsibilities aligns naturally with the [layered FSM architecture](./overview.md#three-layer-architecture) and leverages [AIP](../../aip/overview.md) for reliable, low-latency communication.
+
+
+ 
+ The server-client architecture of a device agent. The server handles orchestration, state management, and LLM-driven decision-making, while the client executes commands through MCP tools and reports results back.
+
+
+### Architecture Benefits
+
+| Benefit | Description |
+|---------|-------------|
+| **🔒 Safe Execution** | Separates reasoning (server) from system operations (client), reducing risk |
+| **📈 Scalable Orchestration** | Single server can manage multiple clients concurrently |
+| **🔧 Independent Updates** | Server logic and client tools can be updated independently |
+| **🌐 Multi-Device Support** | Clients can be rapidly deployed on new devices with minimal configuration |
+| **🛡️ Fault Isolation** | Client failures don't crash the server's reasoning logic |
+| **📡 Real-Time Communication** | Persistent WebSocket connections enable low-latency bidirectional messaging |
+
+**Design Philosophy:**
+
+The server-client architecture embodies the **separation of concerns** principle: the server focuses on **what** to do (strategy), while the client focuses on **how** to do it (execution). This clear division enhances maintainability, security, and scalability.
+
+---
+
+## Server: Orchestration and State Management
+
+The **agent server** is responsible for managing the agent's state machine lifecycle, executing high-level strategies, and interacting with the Constellation Agent or orchestrator. It handles task decomposition, prompt construction, decision-making, and command sequencing.
+
+**Server Responsibilities:**
+
+- 🧠 **State Machine Management**: Controls agent lifecycle through the [FSM](./overview.md#level-1-state-layer-fsm)
+- 🎯 **Strategy Execution**: Implements the [Strategy Layer](./overview.md#level-2-strategy-layer-execution-logic)
+- 🤖 **LLM Interaction**: Constructs prompts, parses responses, makes decisions
+- 📋 **Task Decomposition**: Breaks down high-level tasks into executable commands
+- 🔀 **Command Sequencing**: Determines execution order and dependencies
+- 👥 **Multi-Client Coordination**: Manages multiple device clients concurrently
+
+### Server Architecture
+
+```mermaid
+graph TB
+ subgraph "Agent Server"
+ subgraph "State Layer"
+ FSM[Finite State Machine]
+ SM[State Manager]
+ end
+
+ subgraph "Strategy Layer"
+ PROC[ProcessorTemplate]
+ LLM[LLM Interaction]
+ CMD[Command Generation]
+ end
+
+ subgraph "Communication Layer"
+ WS[WebSocket Handler]
+ AIP_S[AIP Protocol]
+ end
+
+ subgraph "Metadata"
+ PROFILE[AgentProfile]
+ CAP[Capabilities]
+ STATUS[Runtime Status]
+ end
+
+ FSM --> SM
+ SM --> PROC
+ PROC --> LLM
+ PROC --> CMD
+ CMD --> WS
+ WS --> AIP_S
+
+ PROFILE --> CAP
+ PROFILE --> STATUS
+ end
+
+ subgraph "External Interfaces"
+ ORCHESTRATOR[Constellation Agent / Orchestrator]
+ CLIENTS[Multiple Device Clients]
+ end
+
+ ORCHESTRATOR <-->|Task Assignment| FSM
+ ORCHESTRATOR <-->|Profile Query| PROFILE
+ AIP_S <-->|Commands/Results| CLIENTS
+
+ style FSM fill:#e1f5ff
+ style PROC fill:#fff4e1
+ style WS fill:#ffe1f5
+ style PROFILE fill:#f0ffe1
+```
+
+### AgentProfile
+
+Each server instance exposes its capabilities and status through metadata. This information allows the orchestrator to dynamically select suitable agents for specific subtasks, improving task distribution efficiency.
+
+Note: The AgentProfile concept is part of the design for multi-agent coordination in Galaxy (constellation-level orchestration). In UFO3's current implementation, agent metadata is managed through the session context and WebSocket handler registration.
+
+### Multi-Client Management
+
+A **single server instance** can manage **multiple agent clients concurrently**, maintaining isolation across devices while supporting centralized supervision and coordination.
+
+```mermaid
+sequenceDiagram
+ participant O as Orchestrator
+ participant S as Agent Server
+ participant C1 as Client 1 (Desktop)
+ participant C2 as Client 2 (Laptop)
+ participant C3 as Client 3 (Server)
+
+ Note over S: Server manages multiple clients
+
+ C1->>S: Connect & Register (Device Info)
+ C2->>S: Connect & Register (Device Info)
+ C3->>S: Connect & Register (Device Info)
+
+ S->>S: Update AgentProfile (3 clients available)
+
+ O->>S: Query AgentProfile
+ S->>O: Profile (3 devices, capabilities)
+
+ O->>S: Assign Task (requires GPU)
+ S->>S: Select Client 1 (has GPU)
+ S->>C1: Execute Command
+ C1->>S: Result
+ S->>O: Task Complete
+
+ Note over S,C1: Client 2 & 3 remain available
+```
+
+**Benefits of centralized server management:**
+
+- **Session Isolation**: Each client maintains independent state
+- **Load Balancing**: Server distributes tasks across available clients
+- **Fault Tolerance**: Client failures don't affect other clients
+- **Unified Monitoring**: Centralized view of all client statuses
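
The client-selection step in the sequence above (picking the device that has a GPU) could look roughly like this. `ClientInfo`, its fields, and `select_client` are assumptions made for illustration; the actual server may track richer profiles and use a different policy.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ClientInfo:
    """Hypothetical per-client record kept by the server."""
    client_id: str
    capabilities: Set[str] = field(default_factory=set)
    busy: bool = False


def select_client(clients: Dict[str, ClientInfo], required: Set[str]) -> Optional[str]:
    """Return the first idle client whose capabilities cover `required`."""
    for client_id, info in clients.items():
        if not info.busy and required <= info.capabilities:
            return client_id
    return None  # no suitable client is currently available
```

Returning `None` rather than raising lets the caller decide whether to queue the task or report that no device qualifies.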
+
+### Server Flexibility
+
+Crucially, the server maintains **full control** over the agent's workflow logic, enabling **updates to decision strategies** without impacting low-level execution on the device.
+
+**Update Scenarios:**
+
+- **Prompt Engineering**: Modify LLM prompts to improve decision quality
+- **Strategy Changes**: Switch between different processing strategies
+- **State Transitions**: Adjust FSM logic for new workflows
+- **API Integration**: Add new orchestrator interfaces
+
+All these updates happen **server-side only**, without redeploying clients.
+
+For detailed server implementation, see the [Server Documentation](../../server/overview.md).
+
+---
+
+## Client: Command Execution and Resource Access
+
+The **agent client** runs on the target device and manages a collection of MCP servers or tool interfaces. These MCP servers can operate locally (via direct invocation) or remotely (through HTTP requests), and each client may register multiple MCP servers to access diverse tool sources.
+
+**Client Responsibilities:**
+
+- ⚙️ **Command Execution**: Translates server commands into MCP tool calls
+- 🛠️ **Tool Management**: Registers and orchestrates local/remote MCP servers
+- 📊 **Device Profiling**: Reports hardware and software configuration
+- 📡 **Result Reporting**: Returns structured execution results via AIP
+- 🔍 **Self-Checks**: Performs diagnostics (disk, CPU, memory, GPU, network)
+- 🚫 **Stateless Operation**: Executes directives without high-level reasoning
+
+### Client Architecture
+
+```mermaid
+graph TB
+ subgraph "Agent Client"
+ subgraph "Communication Layer"
+ WS_C[WebSocket Client]
+ AIP_C[AIP Protocol Handler]
+ end
+
+ subgraph "Orchestration Layer"
+ UFC[UFO Client]
+ CM[Computer Manager]
+ end
+
+ subgraph "Execution Layer"
+ COMP[Computer Instance]
+ DISP[Command Dispatcher]
+ end
+
+ subgraph "Tool Layer"
+ MCP_MGR[MCP Server Manager]
+ LOCAL_MCP[Local MCP Servers]
+ REMOTE_MCP[Remote MCP Servers]
+ end
+
+ subgraph "Device Layer"
+ TOOLS[System Tools]
+ HW[Hardware Access]
+ FS[File System]
+ UI[UI Automation]
+ end
+
+ WS_C --> AIP_C
+ AIP_C --> UFC
+ UFC --> CM
+ CM --> COMP
+ COMP --> DISP
+ DISP --> MCP_MGR
+ MCP_MGR --> LOCAL_MCP
+ MCP_MGR --> REMOTE_MCP
+ LOCAL_MCP --> TOOLS
+ REMOTE_MCP -.->|HTTP| TOOLS
+ TOOLS --> HW
+ TOOLS --> FS
+ TOOLS --> UI
+ end
+
+ subgraph "Agent Server"
+ SERVER[Server Process]
+ end
+
+ SERVER <-->|AIP over WebSocket| WS_C
+
+ style WS_C fill:#e1f5ff
+ style COMP fill:#fff4e1
+ style MCP_MGR fill:#ffe1f5
+ style TOOLS fill:#f0ffe1
+```
+
+### Command Execution Pipeline
+
+Upon receiving commands from the agent server—such as collecting telemetry, invoking system utilities, or interacting with hardware components—the client follows this execution pipeline:
+
+```mermaid
+sequenceDiagram
+ participant S as Agent Server
+ participant C as Agent Client
+ participant D as Dispatcher
+ participant M as MCP Manager
+ participant T as MCP Tool
+
+ S->>C: Command via AIP (function, parameters)
+ C->>D: Parse command
+ D->>M: Resolve MCP tool
+ M->>M: Select server (local/remote)
+ M->>T: Invoke tool
+ T->>T: Execute operation
+ T->>M: Raw result
+ M->>D: Structured output
+ D->>C: Aggregate results
+ C->>S: Result via AIP (status, data)
+```
+
+**Pipeline stages:**
+
+1. **Command Reception**: Client receives AIP message with command metadata
+2. **Parsing**: Extract function name and parameters
+3. **Tool Resolution**: Map command to registered MCP tool
+4. **Server Selection**: Choose local or remote MCP server
+5. **Execution**: Invoke tool deterministically
+6. **Result Aggregation**: Structure output according to schema
+7. **Response Transmission**: Return results via AIP
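
The seven stages can be sketched in a few lines. This is a simplified model under stated assumptions: `SERVERS` is a hypothetical in-memory table standing in for registered MCP servers, the tools are toy lambdas, and the message shape follows the command and result examples used in this document.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: server name -> {tool name -> callable}
SERVERS: Dict[str, Dict[str, Callable[..., Any]]] = {
    "ui_automation": {"click": lambda x, y: f"clicked ({x}, {y})"},
    "file_operations": {"list_dir": lambda path: [f"{path}/a.txt"]},
}


def handle(message: Dict[str, Any]) -> Dict[str, Any]:
    # Stages 1-2: receive the message and parse function + arguments.
    function = message["payload"]["function"]
    arguments = message["payload"].get("arguments", {})
    # Stages 3-4: resolve the tool and pick the server that owns it.
    server = next((n for n, tools in SERVERS.items() if function in tools), None)
    if server is None:
        payload = {"status": "error", "error": f"no tool named {function!r}"}
    else:
        try:
            # Stage 5: invoke the tool.
            data = SERVERS[server][function](**arguments)
            # Stage 6: structure the output.
            payload = {"status": "success", "data": data, "server": server}
        except Exception as exc:
            payload = {"status": "error", "error": str(exc)}
    # Stage 7: echo request_id so the server can match the result.
    return {"type": "result", "request_id": message["request_id"], "payload": payload}
```

Errors are returned as structured payloads rather than raised, so a failing tool never interrupts the message loop.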
+
+### MCP Server Management
+
+Each client may **register multiple MCP servers** to access diverse tool sources. MCP servers provide standardized interfaces for:
+
+| Tool Category | Examples | Local/Remote |
+|---------------|----------|--------------|
+| **UI Automation** | Click, type, screenshot, select controls | Local |
+| **File Operations** | Read, write, copy, delete files | Local |
+| **System Utilities** | Process management, network config | Local |
+| **Application APIs** | Excel, Word, Browser automation | Local |
+| **Remote Services** | Cloud APIs, external databases | Remote (HTTP) |
+| **Hardware Control** | Camera, sensors, GPIO | Local |
+
+```python
+# Example: Client registers multiple MCP servers
+client.register_mcp_server(
+    name="ui_automation",
+    type="local",
+    tools=["click", "type", "screenshot"]
+)
+
+client.register_mcp_server(
+    name="file_operations",
+    type="local",
+    tools=["read_file", "write_file", "list_dir"]
+)
+
+client.register_mcp_server(
+    name="cloud_api",
+    type="remote",
+    endpoint="https://api.example.com/mcp",
+    tools=["query_database", "send_notification"]
+)
+```
+
+For detailed MCP integration, see [MCP Integration](../../client/mcp_integration.md).
+
+### Device Initialization and Registration
+
+During initialization, each client connects to the agent server through the AIP endpoint, performs **self-checks**, and **registers its hardware-software profile**.
+
+```mermaid
+sequenceDiagram
+ participant C as Agent Client
+ participant S as Agent Server
+
+ Note over C: Client Startup
+
+ C->>C: Load configuration
+ C->>C: Initialize MCP servers
+
+ C->>C: Self-Check:<br/>- Disk space<br/>- CPU info<br/>- Memory<br/>- GPU availability<br/>- Network config
+
+ C->>S: Connect (WebSocket)
+ S->>C: Connection Acknowledged
+
+ C->>S: Register Device Info (hardware profile)
+ S->>S: Update AgentProfile
+ S->>C: Registration Confirmed
+
+ Note over C,S: Ready for task execution
+```
+
+**Self-checks performed during initialization:**
+
+```python
+device_info = {
+    # Hardware
+    "cpu": {
+        "model": "Intel Core i7-12700K",
+        "cores": 12,
+        "threads": 20,
+        "frequency_mhz": 3600
+    },
+    "memory": {
+        "total_gb": 32,
+        "available_gb": 24
+    },
+    "disk": {
+        "total_gb": 1024,
+        "free_gb": 512
+    },
+    "gpu": {
+        "available": True,
+        "model": "NVIDIA RTX 4090",
+        "vram_gb": 24
+    },
+
+    # Network
+    "network": {
+        "hostname": "desktop-001",
+        "ip_address": "192.168.1.100",
+        "bandwidth_mbps": 1000
+    },
+
+    # Software
+    "os": {
+        "platform": "windows",
+        "version": "11",
+        "build": "22621"
+    },
+    "installed_apps": [
+        "Microsoft Excel",
+        "Google Chrome",
+        "Visual Studio Code"
+    ],
+    "mcp_servers": [
+        "ui_automation",
+        "file_operations",
+        "system_utilities"
+    ]
+}
+```
+
+This profile is integrated into the server's **AgentProfile**, giving the orchestrator **complete visibility** into system topology and resource availability for informed task assignment and scheduling.
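
Part of such a profile can be gathered with the Python standard library alone. The sketch below is illustrative and not the client's actual self-check code: `collect_device_info` is a hypothetical helper, and memory and GPU details are omitted because they usually need third-party packages or vendor tools.

```python
import os
import platform
import shutil
import socket
from typing import Any, Dict


def collect_device_info(path: str = ".") -> Dict[str, Any]:
    """Collect a small hardware/software profile using only the stdlib."""
    usage = shutil.disk_usage(path)  # total/used/free in bytes
    gib = 1024 ** 3
    return {
        "cpu": {"model": platform.processor(), "threads": os.cpu_count()},
        "disk": {
            "total_gb": round(usage.total / gib, 1),
            "free_gb": round(usage.free / gib, 1),
        },
        "network": {"hostname": socket.gethostname()},
        "os": {
            "platform": platform.system().lower(),
            "version": platform.release(),
        },
    }
```

The returned dict mirrors a subset of the keys in the example profile, so it could be merged with other probes before registration.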
+
+For client implementation details, see the [Client Documentation](../../client/overview.md).
+
+### Stateless Design
+
+The client remains **stateless with respect to reasoning**: it faithfully executes directives without engaging in high-level decision-making.
+
+**Client Does NOT:**
+
+- ❌ Construct prompts for LLMs
+- ❌ Make strategic decisions
+- ❌ Manage state transitions
+- ❌ Decompose tasks into subtasks
+- ❌ Coordinate with other agents
+
+**Client DOES:**
+
+- ✅ Execute commands deterministically
+- ✅ Manage MCP tool lifecycle
+- ✅ Report execution results
+- ✅ Monitor device health
+- ✅ Handle tool failures gracefully
+
+This separation ensures that **updates to one layer do not interfere with the other**, enhancing maintainability and reducing risk of disruption.
+
+---
+
+## Server-Client Communication
+
+All communication between the server and client is routed through the **Agent Interaction Protocol (AIP)**, leveraging **persistent WebSocket connections**. This allows bidirectional, low-latency messaging that supports both synchronous command execution and asynchronous event reporting.
+
+**Why AIP over WebSocket?**
+
+- **Low Latency**: Real-time command dispatch and result streaming
+- **Bidirectional**: Server sends commands, client sends results/events
+- **Persistent**: Maintains connection across multiple commands
+- **Event-Driven**: Supports async notifications (progress updates, errors)
+- **Protocol Abstraction**: Hides network complexity from application logic
+
+### Communication Patterns
+
+#### 1. Synchronous Command Execution
+
+```mermaid
+sequenceDiagram
+ participant S as Server
+ participant C as Client
+
+ S->>C: Command (request_id=123)<br/>function: screenshot
+
+ Note over C: Execute tool
+
+ C->>S: Result (request_id=123)<br/>status: success<br/>data: image_base64
+```
+
+**Flow:**
+
+1. Server sends command with unique `request_id`
+2. Client executes MCP tool synchronously
+3. Client returns result with matching `request_id`
+4. Server matches result to pending request
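
On the server side, matching results to pending requests is often done with one future per `request_id`. The sketch below assumes an asyncio-based server; `PendingRequests` and `demo` are hypothetical, and the client reply is simulated with a timer instead of a real WebSocket.

```python
import asyncio
import uuid
from typing import Any, Dict


class PendingRequests:
    """Tracks in-flight commands by request_id (illustrative only)."""

    def __init__(self) -> None:
        self._pending: Dict[str, asyncio.Future] = {}

    def create(self) -> str:
        request_id = str(uuid.uuid4())
        self._pending[request_id] = asyncio.get_running_loop().create_future()
        return request_id

    async def wait(self, request_id: str, timeout: float = 5.0) -> Any:
        try:
            return await asyncio.wait_for(self._pending[request_id], timeout)
        finally:
            self._pending.pop(request_id, None)  # always clean up the entry

    def resolve(self, message: Dict[str, Any]) -> bool:
        future = self._pending.get(message["request_id"])
        if future is None or future.done():
            return False  # unknown or duplicate result
        future.set_result(message["payload"])
        return True


async def demo() -> Any:
    pending = PendingRequests()
    request_id = pending.create()
    # Simulate the client replying shortly after the command is sent.
    asyncio.get_running_loop().call_later(
        0.01, pending.resolve,
        {"request_id": request_id, "payload": {"status": "success"}},
    )
    return await pending.wait(request_id)
```

The timeout keeps a lost reply from blocking the caller forever, and unknown or repeated `request_id` values are ignored instead of raising.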
+
+#### 2. Asynchronous Event Reporting
+
+```mermaid
+sequenceDiagram
+ participant S as Server
+ participant C as Client
+
+ S->>C: Command: long_running_task
+
+ C->>S: Event: progress (25%)
+ C->>S: Event: progress (50%)
+ C->>S: Event: progress (75%)
+ C->>S: Result: complete (100%)
+```
+
+**Use cases:**
+
+- Progress updates for long-running operations
+- Error notifications during execution
+- Resource utilization alerts
+- Device state changes
+
+#### 3. Multi-Command Pipeline
+
+```mermaid
+sequenceDiagram
+ participant S as Server
+ participant C as Client
+
+ S->>C: Command 1: screenshot
+ S->>C: Command 2: click(x, y)
+ S->>C: Command 3: screenshot
+
+ Note over C: Execute in order
+
+ C->>S: Result 1: image_before
+ C->>S: Result 2: click_success
+ C->>S: Result 3: image_after
+```
+
+**Benefits:**
+
+- Reduces round-trip latency
+- Enables atomic operation sequences
+- Supports transaction-like semantics
+
+### AIP Message Format
+
+Commands and results follow the AIP message schema:
+
+```json
+{
+    "type": "command",
+    "request_id": "abc-123",
+    "timestamp": "2025-11-06T10:30:00Z",
+    "payload": {
+        "function": "screenshot",
+        "arguments": {
+            "region": "active_window"
+        }
+    }
+}
+```
+
+```json
+{
+    "type": "result",
+    "request_id": "abc-123",
+    "timestamp": "2025-11-06T10:30:01Z",
+    "payload": {
+        "status": "success",
+        "data": {
+            "image": "base64_encoded_data",
+            "dimensions": {"width": 1920, "height": 1080}
+        }
+    }
+}
+```
+
+For complete AIP specification, see [AIP Documentation](../../aip/overview.md).
+
+### Connection Management
+
+The server and client maintain persistent connections with automatic reconnection logic:
+
+```mermaid
+stateDiagram-v2
+ [*] --> Disconnected
+ Disconnected --> Connecting: Client Start
+ Connecting --> Connected: Handshake Success
+ Connected --> Disconnected: Network Error
+ Connected --> Reconnecting: Connection Lost
+ Reconnecting --> Connected: Reconnect Success
+ Reconnecting --> Disconnected: Max Retries Exceeded
+ Connected --> [*]: Shutdown
+```
+
+**Connection lifecycle:**
+
+1. **Initial Connection**: Client initiates WebSocket connection to server
+2. **Registration**: Client sends device info, receives confirmation
+3. **Active Communication**: Bidirectional message exchange
+4. **Heartbeat**: Periodic pings to detect connection loss
+5. **Reconnection**: Automatic retry with exponential backoff
+6. **Graceful Shutdown**: Clean disconnection on exit
+
+**Resilience features:**
+
+- **Heartbeat Monitoring**: Detects silent connection failures
+- **Automatic Reconnection**: Exponential backoff with jitter
+- **Message Queuing**: Buffers messages during disconnection
+- **Session Recovery**: Restores context after reconnection
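
Exponential backoff with jitter can be sketched as a delay generator. The version below uses the "full jitter" variant, where each delay is drawn uniformly between zero and a capped exponential ceiling; the defaults and the `backoff_delays` helper are illustrative assumptions, not the client's configured values.

```python
import random
from typing import Iterator, Optional


def backoff_delays(
    base: float = 1.0,
    cap: float = 30.0,
    max_retries: int = 5,
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Yield one delay (in seconds) per retry using full-jitter backoff."""
    rng = rng or random.Random()
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))  # 1, 2, 4, ... up to cap
        yield rng.uniform(0, ceiling)  # jitter spreads out reconnect storms
```

A reconnect loop would sleep for each yielded delay before the next attempt and move to the disconnected state once the generator is exhausted.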
+
+---
+
+## Design Considerations
+
+This server-client architecture offers several key advantages:
+
+### 1. Rapid Device Deployment
+
+Device clients can be **rapidly deployed** on new devices with minimal configuration, immediately becoming execution endpoints within UFO.
+
+```bash
+# Deploy client on new device (example)
+# 1. Install client package
+pip install ufo-client
+
+# 2. Configure server endpoint
+cat > client_config.yaml < Dict[str, Any]:
+ """Execute command on remote client."""
+
+ # Create command message
+ message = {
+ "type": "command",
+ "request_id": generate_request_id(),
+ "payload": {
+ "function": command,
+ "arguments": arguments
+ }
+ }
+
+ # Send via AIP
+ result = await server.send_command(client_id, message)
+
+ return result
+```
+
+### Client: Executing Commands
+
+```python
+# Client receives and executes command
+async def handle_command(
+    client: AgentClient,
+    command_message: Dict[str, Any]
+) -> Dict[str, Any]:
+    """Handle incoming command from server."""
+
+    # Extract command details
+    function = command_message["payload"]["function"]
+    arguments = command_message["payload"]["arguments"]
+    request_id = command_message["request_id"]
+
+    try:
+        # Execute via MCP tool
+        result = await client.computer.execute_tool(
+            tool_name=function,
+            parameters=arguments
+        )
+
+        # Return success result
+        return {
+            "type": "result",
+            "request_id": request_id,
+            "payload": {
+                "status": "success",
+                "data": result
+            }
+        }
+
+    except Exception as e:
+        # Return error result
+        return {
+            "type": "result",
+            "request_id": request_id,
+            "payload": {
+                "status": "error",
+                "error": str(e)
+            }
+        }
+```
+
+---
+
+## Summary
+
+The server-client architecture is a foundational design pattern in UFO's distributed agent system:
+
+**Key Takeaways:**
+
+- 🏗️ **Separation of Concerns**: Server handles reasoning, client handles execution
+- 📡 **AIP Communication**: Persistent WebSocket connections enable real-time bidirectional messaging
+- 🔧 **Independent Updates**: Server logic and client tools evolve independently
+- 📈 **Scalable Management**: Single server orchestrates multiple clients
+- 🛡️ **Fault Isolation**: Client failures don't crash server reasoning
+- 🌐 **Multi-Device Ready**: Supports heterogeneous device orchestration
+
+**Related Documentation:**
+
+- [Device Agent Overview](overview.md) - Three-layer FSM framework
+- [Agent Types](agent_types.md) - Platform-specific implementations
+- [Server Overview](../../server/overview.md) - Detailed server architecture and APIs
+- [Client Overview](../../client/overview.md) - Detailed client architecture and tools
+- [AIP Protocol](../../aip/overview.md) - Communication protocol specification
+- [MCP Integration](../../mcp/overview.md) - Tool management and execution
+
+By decoupling high-level reasoning from low-level execution, the server-client architecture enables UFO to safely orchestrate complex workflows across diverse computing environments while maintaining flexibility, reliability, and ease of maintenance.
diff --git a/documents/docs/infrastructure/modules/context.md b/documents/docs/infrastructure/modules/context.md
new file mode 100644
index 000000000..e8bca09ce
--- /dev/null
+++ b/documents/docs/infrastructure/modules/context.md
@@ -0,0 +1,551 @@
+# Context
+
+The **Context** object is a type-safe shared state container that persists conversation state across all Rounds within a Session, providing centralized access to logs, costs, application state, and execution metadata.
+
+**Quick Reference:**
+
+- Get value? `context.get(ContextNames.REQUEST)`
+- Set value? `context.set(ContextNames.REQUEST, "new value")`
+- Auto-sync? See [Auto-Syncing Properties](#auto-syncing-properties)
+- All attributes? See [Complete Attribute Reference](#complete-attribute-reference)
+
+---
+
+## Overview
+
+The `Context` object serves as the central state store for sessions:
+
+1. **Type Safety**: Enum-based attribute names with type definitions
+2. **Default Values**: Automatic initialization with sensible defaults
+3. **Auto-Syncing**: Current round values sync automatically
+4. **Serialization**: Convert to/from dict for persistence
+5. **Dispatcher Attachment**: Command execution integration
+
+### Architecture
+
+```mermaid
+graph TB
+ subgraph "Context Container"
+ CTX[Context Dataclass]
+ VALUES[Attribute Values Dict]
+ NAMES[ContextNames Enum]
+ end
+
+ subgraph "Access Patterns"
+ GET[get method]
+ SET[set method]
+ UPDATE[update_dict method]
+ TO_DICT[to_dict method]
+ FROM_DICT[from_dict method]
+ end
+
+ subgraph "Auto-Sync Properties"
+ PROP_STEP[current_round_step]
+ PROP_COST[current_round_cost]
+ PROP_SUBTASK[current_round_subtask_amount]
+ end
+
+ subgraph "Shared Across"
+ SESS[Session]
+ R1[Round 1]
+ R2[Round 2]
+ R3[Round 3]
+ AGENTHost[HostAgent]
+ AGENTApp[AppAgent]
+ end
+
+ CTX --> VALUES
+ NAMES --> VALUES
+
+ GET --> VALUES
+ SET --> VALUES
+ UPDATE --> VALUES
+
+ PROP_STEP -.auto-updates.-> VALUES
+ PROP_COST -.auto-updates.-> VALUES
+ PROP_SUBTASK -.auto-updates.-> VALUES
+
+ SESS -.shares.-> CTX
+ R1 -.shares.-> CTX
+ R2 -.shares.-> CTX
+ R3 -.shares.-> CTX
+ AGENTHost -.reads/writes.-> CTX
+ AGENTApp -.reads/writes.-> CTX
+
+ style CTX fill:#e1f5ff
+ style VALUES fill:#fff4e1
+ style PROP_STEP fill:#f0ffe1
+ style SESS fill:#ffe1f5
+```
+
+---
+
+## ContextNames Enum
+
+All context attributes are defined in the `ContextNames` enum for type safety:
+
+```python
+from ufo.module.context import ContextNames
+
+# Type-safe attribute names
+request = context.get(ContextNames.REQUEST)
+context.set(ContextNames.SESSION_COST, 0.42)
+```
+
+### Attribute Categories
+
+!!! info "30+ Context Attributes"
+    Context attributes are organized into 7 logical categories.
+
+#### 1. Identifiers & Mode
+
+Context attributes for session and mode identification.
+
+| Attribute | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `ID` | `int` | `0` | Session ID |
+| `MODE` | `str` | `""` | Execution mode (normal, service, etc.) |
+| `CURRENT_ROUND_ID` | `int` | `0` | Current round number |
+
+#### 2. Execution State
+
+| Attribute | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `REQUEST` | `str` | `""` | Current user request |
+| `SUBTASK` | `str` | `""` | Current subtask for AppAgent |
+| `PREVIOUS_SUBTASKS` | `List` | `[]` | Previous subtasks history |
+| `HOST_MESSAGE` | `List` | `[]` | HostAgent → AppAgent messages |
+| `ROUND_RESULT` | `str` | `""` | Current round result |
+
+#### 3. Cost Tracking
+
+| Attribute | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `SESSION_COST` | `float` | `0.0` | Total session cost ($) |
+| `ROUND_COST` | `Dict[int, float]` | `{}` | Cost per round |
+| `CURRENT_ROUND_COST` | `float` | `0.0` | Current round cost (auto-sync) |
+
+#### 4. Step Counting
+
+| Attribute | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `SESSION_STEP` | `int` | `0` | Total steps in session |
+| `ROUND_STEP` | `Dict[int, int]` | `{}` | Steps per round |
+| `CURRENT_ROUND_STEP` | `int` | `0` | Current round steps (auto-sync) |
+| `ROUND_SUBTASK_AMOUNT` | `Dict[int, int]` | `{}` | Subtasks per round |
+| `CURRENT_ROUND_SUBTASK_AMOUNT` | `int` | `0` | Current subtasks (auto-sync) |
+
+#### 5. Application Context
+
+| Attribute | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `APPLICATION_WINDOW` | `UIAWrapper` | `None` | Current application window |
+| `APPLICATION_WINDOW_INFO` | `Any` | - | Window metadata |
+| `APPLICATION_PROCESS_NAME` | `str` | `""` | Process name (e.g., "WINWORD.EXE") |
+| `APPLICATION_ROOT_NAME` | `str` | `""` | Root UI element name |
+| `CONTROL_REANNOTATION` | `List` | `[]` | Control re-annotations |
+
+#### 6. Logging
+
+| Attribute | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `LOG_PATH` | `str` | `""` | Log directory path |
+| `LOGGER` | `Logger` | `None` | Session logger |
+| `REQUEST_LOGGER` | `Logger` | `None` | LLM request logger |
+| `EVALUATION_LOGGER` | `Logger` | `None` | Evaluation logger |
+| `STRUCTURAL_LOGS` | `defaultdict` | `defaultdict(...)` | Structured logs |
+
+#### 7. Tools & Communication
+
+| Attribute | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `TOOL_INFO` | `Dict` | `{}` | Available tools metadata |
+| `DEVICE_INFO` | `List` | `[]` | Connected device information (Galaxy) |
+| `CONSTELLATION` | `TaskConstellation` | `None` | Task constellation (Galaxy) |
+| `WEAVING_MODE` | `WeavingMode` | `CREATION` | Weaving mode (Galaxy) |
+
+---
+
+## Complete Attribute Reference
+
+All 30+ attributes with types and defaults.
+
+```python
+class ContextNames(Enum):
+    # Identifiers
+    ID = "ID"  # int, default: 0
+    MODE = "MODE"  # str, default: ""
+    CURRENT_ROUND_ID = "CURRENT_ROUND_ID"  # int, default: 0
+
+    # Requests & Tasks
+    REQUEST = "REQUEST"  # str, default: ""
+    SUBTASK = "SUBTASK"  # str, default: ""
+    PREVIOUS_SUBTASKS = "PREVIOUS_SUBTASKS"  # List, default: []
+    HOST_MESSAGE = "HOST_MESSAGE"  # List, default: []
+    ROUND_RESULT = "ROUND_RESULT"  # str, default: ""
+
+    # Costs
+    SESSION_COST = "SESSION_COST"  # float, default: 0.0
+    ROUND_COST = "ROUND_COST"  # Dict, default: {}
+    CURRENT_ROUND_COST = "CURRENT_ROUND_COST"  # float, default: 0.0
+
+    # Steps
+    SESSION_STEP = "SESSION_STEP"  # int, default: 0
+    ROUND_STEP = "ROUND_STEP"  # Dict, default: {}
+    CURRENT_ROUND_STEP = "CURRENT_ROUND_STEP"  # int, default: 0
+    ROUND_SUBTASK_AMOUNT = "ROUND_SUBTASK_AMOUNT"  # Dict, default: {}
+    CURRENT_ROUND_SUBTASK_AMOUNT = "CURRENT_ROUND_SUBTASK_AMOUNT"  # int, default: 0
+
+    # Application
+    APPLICATION_WINDOW = "APPLICATION_WINDOW"  # UIAWrapper, default: None
+    APPLICATION_WINDOW_INFO = "APPLICATION_WINDOW_INFO"  # Any
+    APPLICATION_PROCESS_NAME = "APPLICATION_PROCESS_NAME"  # str, default: ""
+    APPLICATION_ROOT_NAME = "APPLICATION_ROOT_NAME"  # str, default: ""
+    CONTROL_REANNOTATION = "CONTROL_REANNOTATION"  # List, default: []
+
+    # Logging
+    LOG_PATH = "LOG_PATH"  # str, default: ""
+    LOGGER = "LOGGER"  # Logger, default: None
+    REQUEST_LOGGER = "REQUEST_LOGGER"  # Logger, default: None
+    EVALUATION_LOGGER = "EVALUATION_LOGGER"  # Logger, default: None
+    STRUCTURAL_LOGS = "STRUCTURAL_LOGS"  # defaultdict
+
+    # Tools & Devices
+    TOOL_INFO = "TOOL_INFO"  # Dict, default: {}
+    DEVICE_INFO = "DEVICE_INFO"  # List, default: []
+    CONSTELLATION = "CONSTELLATION"  # TaskConstellation, default: None
+    WEAVING_MODE = "WEAVING_MODE"  # WeavingMode, default: CREATION
+```
+
+---
+
+## Context Methods
+
+### get()
+
+Retrieve a value from context:
+
+```python
+def get(self, name: ContextNames, default: Any = None) -> Any
+```
+
+**Example:**
+
+```python
+request = context.get(ContextNames.REQUEST)
+# Returns "" (the attribute's default) if REQUEST has not been set
+
+cost = context.get(ContextNames.SESSION_COST, 0.0)
+# Returns the stored cost, or the supplied default (0.0) if none is set
+```
+
+### set()
+
+Set a context value:
+
+```python
+def set(self, name: ContextNames, value: Any) -> None
+```
+
+**Example:**
+
+```python
+context.set(ContextNames.REQUEST, "Send an email to John")
+context.set(ContextNames.SESSION_COST, 0.42)
+context.set(ContextNames.APPLICATION_PROCESS_NAME, "WINWORD.EXE")
+```
+
+### update_dict()
+
+Batch update multiple values:
+
+```python
+def update_dict(self, updates: Dict[ContextNames, Any]) -> None
+```
+
+**Example:**
+
+```python
+context.update_dict({
+ ContextNames.REQUEST: "New task",
+ ContextNames.MODE: "normal",
+ ContextNames.SESSION_STEP: 10
+})
+```
+
+### to_dict()
+
+Serialize context to dictionary:
+
+```python
+def to_dict(self) -> Dict[str, Any]
+```
+
+**Returns**: Dictionary with only JSON-serializable values
+
+**Example:**
+
+```python
+import json
+
+context_dict = context.to_dict()
+# Save to file
+with open("context.json", "w") as f:
+    json.dump(context_dict, f)
+```
+
+**Excluded from serialization:**
+- Loggers (`LOGGER`, `REQUEST_LOGGER`, `EVALUATION_LOGGER`)
+- Window objects (`APPLICATION_WINDOW`)
+- Non-serializable objects
+
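One way such filtering could work is sketched below, assuming a value counts as serializable exactly when `json.dumps` accepts it. The helper name `to_jsonable_dict` and the plain string keys are hypothetical and are not taken from the UFO source.

```python
import json
import logging
from typing import Any, Dict


def to_jsonable_dict(attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the entries whose values json.dumps can encode (hypothetical helper)."""
    result: Dict[str, Any] = {}
    for key, value in attributes.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            continue  # Skip loggers, window handles, and other live objects.
        result[key] = value
    return result


attributes = {
    "REQUEST": "Send an email to John",
    "SESSION_COST": 0.42,
    "LOGGER": logging.getLogger("session"),  # not JSON-serializable
}
snapshot = to_jsonable_dict(attributes)
```

Probing each value keeps the filter independent of any fixed exclusion list, at the cost of encoding every value once.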
+### from_dict()
+
+Restore context from dictionary:
+
+```python
+@staticmethod
+def from_dict(data: Dict[str, Any]) -> "Context"
+```
+
+**Example:**
+
+```python
+import json
+
+# Load from file
+with open("context.json") as f:
+    data = json.load(f)
+context = Context.from_dict(data)
+```
+
+### attach_command_dispatcher()
+
+Attach dispatcher for command execution:
+
+```python
+def attach_command_dispatcher(self, dispatcher: BasicCommandDispatcher) -> None
+```
+
+**Example:**
+
+```python
+from ufo.module.dispatcher import LocalCommandDispatcher
+
+dispatcher = LocalCommandDispatcher(session, mcp_manager)
+context.attach_command_dispatcher(dispatcher)
+
+# Now rounds can execute commands via context
+```
+
+---
+
+## Auto-Syncing Properties
+
+These properties keep each per-round dictionary (for example `ROUND_STEP`) in sync with its matching `CURRENT_ROUND_*` value for the current round.
+
+### current_round_step
+
+```python
+@property
+def current_round_step(self) -> int:
+ """Get current round step."""
+ return self.attributes.get(ContextNames.ROUND_STEP, {}).get(
+ self.attributes.get(ContextNames.CURRENT_ROUND_ID, 0), 0
+ )
+
+@current_round_step.setter
+def current_round_step(self, value: int) -> None:
+ """Set current round step and update dict."""
+ round_id = self.attributes.get(ContextNames.CURRENT_ROUND_ID, 0)
+ self.attributes[ContextNames.ROUND_STEP][round_id] = value
+ self.attributes[ContextNames.CURRENT_ROUND_STEP] = value
+```
+
+**Usage:**
+
+```python
+# Reading
+steps = context.current_round_step
+
+# Writing (updates both ROUND_STEP dict and CURRENT_ROUND_STEP)
+context.current_round_step = 5
+```
+
+### current_round_cost
+
+Auto-syncs cost tracking:
+
+```python
+# Reading
+cost = context.current_round_cost
+
+# Writing (updates both ROUND_COST dict and CURRENT_ROUND_COST)
+context.current_round_cost += 0.01
+```
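The implementation of `current_round_cost` is not shown in this section. A minimal sketch, assuming it mirrors the `current_round_step` property above; `MiniContext` and its string keys are hypothetical stand-ins, not the real `Context` class:

```python
from typing import Any, Dict


class MiniContext:
    """Hypothetical stand-in for Context, keyed by plain strings."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Any] = {
            "CURRENT_ROUND_ID": 0,
            "ROUND_COST": {},
            "CURRENT_ROUND_COST": 0.0,
        }

    @property
    def current_round_cost(self) -> float:
        """Read the cost stored for the current round (0.0 if none yet)."""
        round_id = self.attributes.get("CURRENT_ROUND_ID", 0)
        return self.attributes.get("ROUND_COST", {}).get(round_id, 0.0)

    @current_round_cost.setter
    def current_round_cost(self, value: float) -> None:
        """Write the cost to both the per-round dict and the scalar."""
        round_id = self.attributes.get("CURRENT_ROUND_ID", 0)
        self.attributes["ROUND_COST"][round_id] = value
        self.attributes["CURRENT_ROUND_COST"] = value


ctx = MiniContext()
ctx.attributes["CURRENT_ROUND_ID"] = 2
ctx.current_round_cost += 0.25
ctx.current_round_cost += 0.5
```

Because `+=` calls the getter and then the setter, the per-round entry is created on first write and both stored values stay equal.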
+
+### current_round_subtask_amount
+
+Auto-syncs subtask counting:
+
+```python
+# Reading
+subtasks = context.current_round_subtask_amount
+
+# Writing
+context.current_round_subtask_amount += 1
+```
+
+---
+
+## Usage Patterns
+
+### Pattern 1: Session Initialization
+
+```python
+from ufo.module.context import Context, ContextNames
+
+# Create context
+context = Context()
+
+# Initialize session metadata
+context.set(ContextNames.ID, 0)
+context.set(ContextNames.MODE, "normal")
+context.set(ContextNames.LOG_PATH, "./logs/task_001/")
+context.set(ContextNames.REQUEST, "Send an email")
+```
+
+### Pattern 2: Round Execution
+
+```python
+# At round start
+context.set(ContextNames.CURRENT_ROUND_ID, round_id)
+
+# During round
+context.current_round_step += 1
+context.current_round_cost += agent_cost
+
+# Agent reads state
+request = context.get(ContextNames.REQUEST)
+process_name = context.get(ContextNames.APPLICATION_PROCESS_NAME)
+```
+
+### Pattern 3: Cost Tracking
+
+```python
+# Agent incurs cost
+agent_cost = llm_call_cost()
+context.current_round_cost += agent_cost
+
+# Add the same cost to the session total (this step is explicit, not automatic)
+context.set(
+ ContextNames.SESSION_COST,
+ context.get(ContextNames.SESSION_COST, 0.0) + agent_cost
+)
+
+# Print summary
+print(f"Round cost: ${context.current_round_cost:.4f}")
+print(f"Session total: ${context.get(ContextNames.SESSION_COST):.4f}")
+```
+
+### Pattern 4: Application Tracking
+
+```python
+# Agent selects application
+context.set(ContextNames.APPLICATION_PROCESS_NAME, "WINWORD.EXE")
+context.set(ContextNames.APPLICATION_ROOT_NAME, "Document1 - Word")
+context.set(ContextNames.APPLICATION_WINDOW, word_window)
+
+# Later rounds access same app
+app_window = context.get(ContextNames.APPLICATION_WINDOW)
+if app_window:
+ app_window.set_focus()
+```
+
+### Pattern 5: Logging
+
+```python
+# Setup loggers
+context.set(ContextNames.LOGGER, session_logger)
+context.set(ContextNames.REQUEST_LOGGER, request_logger)
+
+# Use throughout session
+logger = context.get(ContextNames.LOGGER)
+logger.info("Round started")
+
+request_logger = context.get(ContextNames.REQUEST_LOGGER)
+request_logger.log_request(prompt, response)
+```
+
+### Pattern 6: Persistence
+
+```python
+# Save context state
+context_dict = context.to_dict()
+with open("checkpoint.json", "w") as f:
+ json.dump(context_dict, f, indent=2)
+
+# Resume from checkpoint
+with open("checkpoint.json") as f:
+ data = json.load(f)
+restored_context = Context.from_dict(data)
+```
+
+---
+
+## Best Practices
+
+### Type Safety
+
+!!!tip "Use Enum Names"
+ Always use `ContextNames` enum instead of strings:
+
+ ```python
+ # ✅ Good
+ context.get(ContextNames.REQUEST)
+
+ # ❌ Bad
+ context.attributes["REQUEST"]
+ ```
+
+### Default Values
+
+!!!success "Leverage Defaults"
+ ContextNames provides sensible defaults:
+
+ ```python
+ # No need to check for None
+ cost = context.get(ContextNames.SESSION_COST) # Returns 0.0 if unset
+
+ # Explicit default
+ steps = context.get(ContextNames.SESSION_STEP, 0)
+ ```
+
+### Auto-Sync
+
+!!!warning "Use Auto-Sync Properties"
+    For current round values, use auto-sync properties:
+
+    ```python
+    # ✅ Good - updates both the ROUND_COST dict and CURRENT_ROUND_COST
+    context.current_round_cost += 0.01
+
+    # ❌ Manual - must update both
+    round_id = context.get(ContextNames.CURRENT_ROUND_ID)
+    context.attributes[ContextNames.ROUND_COST][round_id] += 0.01
+    context.attributes[ContextNames.CURRENT_ROUND_COST] += 0.01
+    ```
+
+---
+
+## Reference
+
+### Context Dataclass
+
+::: module.context.Context
+
+### ContextNames Enum
+
+::: module.context.ContextNames
+
+---
+
+## See Also
+
+- [Session](./session.md) - Session lifecycle and context usage
+- [Round](./round.md) - Round execution with context
+- [Overview](./overview.md) - Module system architecture
\ No newline at end of file
diff --git a/documents/docs/infrastructure/modules/dispatcher.md b/documents/docs/infrastructure/modules/dispatcher.md
new file mode 100644
index 000000000..4f4dad7dc
--- /dev/null
+++ b/documents/docs/infrastructure/modules/dispatcher.md
@@ -0,0 +1,1080 @@
+# Command Dispatcher
+
+The **Command Dispatcher** is the bridge between agent decisions and actual execution, routing commands to the appropriate execution environment (local MCP tools or remote WebSocket clients) and managing result delivery with timeout and error handling.
+
+**Quick Reference:**
+
+- Local execution? Use [LocalCommandDispatcher](#localcommanddispatcher)
+- Remote control? Use [WebSocketCommandDispatcher](#websocketcommanddispatcher)
+- Error handling? See [Error Handling](#error-handling)
+- Custom dispatcher? Extend [BasicCommandDispatcher](#basiccommanddispatcher-abstract-base)
+
+---
+
+## Architecture Overview
+
+The dispatcher system implements the **Command Pattern** with async execution and comprehensive error handling:
+
+```mermaid
+graph TB
+ subgraph "Agent Layer"
+ A[Agent Decision Engine]
+ CMD[Generate Command Objects]
+ end
+
+ subgraph "Dispatcher Interface"
+ BD[BasicCommandDispatcher Abstract Base]
+ EXEC[execute_commands async method]
+ ERR[generate_error_results error handler]
+ end
+
+ subgraph "Local Execution Path"
+ LCD[LocalCommandDispatcher]
+ CR[CommandRouter]
+ CM[ComputerManager]
+ MCP[MCP Server Manager]
+ TOOLS[Local Tool Execution]
+ end
+
+ subgraph "Remote Execution Path"
+ WSD[WebSocketCommandDispatcher]
+ AIP[AIP Protocol]
+ WS[WebSocket Transport]
+ CLIENT[Remote Client]
+ end
+
+ subgraph "Result Handling"
+ RES[Result Objects List~Result~]
+ SUCCESS[ResultStatus.SUCCESS]
+ FAILURE[ResultStatus.FAILURE]
+ end
+
+ A --> CMD
+ CMD --> EXEC
+ EXEC -.inherits.-> BD
+
+ BD --> LCD
+ BD --> WSD
+
+ LCD --> CR
+ CR --> CM
+ CM --> MCP
+ MCP --> TOOLS
+ TOOLS --> RES
+
+ WSD --> AIP
+ AIP --> WS
+ WS --> CLIENT
+ CLIENT --> RES
+
+ ERR --> FAILURE
+ RES --> SUCCESS
+ RES --> FAILURE
+
+ style A fill:#e1f5ff
+ style BD fill:#fff4e1
+ style LCD fill:#f0ffe1
+ style WSD fill:#ffe1f5
+ style RES fill:#e1ffe1
+ style ERR fill:#ffe1e1
+```
+
+---
+
+## BasicCommandDispatcher (Abstract Base)
+
+`BasicCommandDispatcher` defines the interface that all concrete dispatchers must implement.
+
+### Core Methods
+
+#### `execute_commands()` (Abstract)
+
+```python
+async def execute_commands(
+ self,
+ commands: List[Command],
+ timeout: float = 6000
+) -> Optional[List[Result]]
+```
+
+**Purpose**: Execute a list of commands and return results.
+
+**Parameters:**
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `commands` | `List[Command]` | Required | Commands to execute |
+| `timeout` | `float` | `6000` | Timeout in seconds |
+
+**Returns:**
+- `List[Result]`: Results from command execution
+- `None`: If execution timed out
+
+!!!warning "Must Override"
+ Concrete dispatchers **must** implement this method with platform-specific logic.
+
+#### `generate_error_results()`
+
+```python
+def generate_error_results(
+ self,
+ commands: List[Command],
+ error: Exception
+) -> Optional[List[Result]]
+```
+
+**Purpose**: Convert exceptions into structured error Results.
+
+**Error Handling Logic:**
+
+```mermaid
+sequenceDiagram
+ participant D as Dispatcher
+ participant E as Exception Handler
+ participant R as Result Factory
+
+ D->>D: execute_commands()
+ D-xD: Exception raised
+ D->>E: generate_error_results(commands, error)
+
+ loop For each command
+ E->>R: Create Result object
+ R->>R: status = FAILURE
+ R->>R: error = error message
+ R->>R: result = error description
+ R->>R: call_id = command.call_id
+ R-->>E: Error Result
+ end
+
+ E-->>D: List[Result] (all failures)
+ D-->>Agent: Return error results
+```
+
+**Generated Error Result:**
+
+```python
+Result(
+ status=ResultStatus.FAILURE,
+ error=f"Error occurred while executing command {command}: {error}",
+ result=f"Error occurred while executing command {command}: {error}, "
+ f"please retry or execute a different command.",
+ call_id=command.call_id
+)
+```
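The loop in the diagram can be sketched in a self-contained form. The `Command`, `Result`, and `ResultStatus` classes below are simplified stand-ins for the `aip.messages` types, and their fields and enum values are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ResultStatus(str, Enum):  # stand-in for aip.messages.ResultStatus
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class Command:  # stand-in for aip.messages.Command
    tool_name: str
    call_id: str = ""


@dataclass
class Result:  # stand-in for aip.messages.Result
    status: ResultStatus
    error: Optional[str] = None
    result: Optional[str] = None
    call_id: str = ""


def generate_error_results(commands: List[Command], error: Exception) -> List[Result]:
    """Build one FAILURE result per command, preserving each call_id."""
    results = []
    for command in commands:
        message = f"Error occurred while executing command {command}: {error}"
        results.append(
            Result(
                status=ResultStatus.FAILURE,
                error=message,
                result=f"{message}, please retry or execute a different command.",
                call_id=command.call_id,
            )
        )
    return results


commands = [Command("click_element", "id-1"), Command("type_text", "id-2")]
failures = generate_error_results(commands, TimeoutError("timed out"))
```

Returning one failure per command, in order, lets callers pair results with commands using `zip` even when the whole batch failed.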
+
+!!!example "Error Result Structure"
+    ```python
+    from aip.messages import Result, ResultStatus
+
+    # Example error result
+    error_result = Result(
+        status=ResultStatus.FAILURE,
+        error="ConnectionRefusedError: [WinError 10061]",
+        result="Error occurred while executing command click_element: "
+               "ConnectionRefusedError, please retry or execute a different command.",
+        call_id="cmd_12345"
+    )
+
+    # Check in agent code
+    if error_result.status == ResultStatus.FAILURE:
+        print(f"Action failed: {error_result.error}")
+        # Agent can retry or use alternative approach
+    ```
+
+---
+
+## LocalCommandDispatcher
+
+`LocalCommandDispatcher` routes commands to local MCP tool servers for direct execution on the current machine. It is used for interactive and standalone sessions.
+
+### Architecture
+
+```mermaid
+graph TB
+ subgraph "LocalCommandDispatcher"
+ LCD[LocalCommandDispatcher]
+ SESSION[session: BaseSession]
+ PENDING[pending: Dict~str, Future~]
+ MCP_MGR[mcp_server_manager: MCPServerManager]
+ CM[computer_manager: ComputerManager]
+ CR[command_router: CommandRouter]
+ end
+
+ subgraph "Execution Flow"
+ CMD[Receive Commands]
+ ID[Assign call_id to each]
+ ROUTE[CommandRouter.execute]
+ EXEC[ComputerManager → MCP]
+ WAIT[asyncio.wait_for]
+ RES[Return Results]
+ end
+
+ subgraph "Error Paths"
+ TIMEOUT[asyncio.TimeoutError]
+ EXCEPTION[Exception]
+ ERR_RES[generate_error_results]
+ end
+
+ LCD --> SESSION
+ LCD --> MCP_MGR
+ LCD --> CM
+ LCD --> CR
+
+ CMD --> ID
+ ID --> ROUTE
+ ROUTE --> EXEC
+ EXEC --> WAIT
+ WAIT --> RES
+
+ WAIT -.timeout.-> TIMEOUT
+ EXEC -.exception.-> EXCEPTION
+ TIMEOUT --> ERR_RES
+ EXCEPTION --> ERR_RES
+ ERR_RES --> RES
+
+ style LCD fill:#e1f5ff
+ style CMD fill:#fff4e1
+ style RES fill:#e1ffe1
+ style ERR_RES fill:#ffe1e1
+```
+
+### Initialization
+
+```python
+from ufo.module.dispatcher import LocalCommandDispatcher
+from ufo.client.mcp.mcp_server_manager import MCPServerManager
+
+def _init_context(self) -> None:
+ """Initialize context with local dispatcher."""
+ super()._init_context()
+
+ # Create MCP server manager
+ mcp_server_manager = MCPServerManager()
+
+ # Create local dispatcher
+ command_dispatcher = LocalCommandDispatcher(
+ session=self,
+ mcp_server_manager=mcp_server_manager
+ )
+
+ # Attach to context
+ self.context.attach_command_dispatcher(command_dispatcher)
+```
+
+**Initialization Parameters:**
+
+| Parameter | Type | Purpose |
+|-----------|------|---------|
+| `session` | `BaseSession` | Current session instance |
+| `mcp_server_manager` | `MCPServerManager` | MCP server lifecycle manager |
+
+**Internal Components Created:**
+
+- `ComputerManager`: Manages computer-level operations
+- `CommandRouter`: Routes commands to appropriate MCP tools
+
+### Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant Agent
+ participant Dispatcher as LocalCommandDispatcher
+ participant Router as CommandRouter
+ participant Computer as ComputerManager
+ participant MCP as MCP Servers
+
+ Agent->>Dispatcher: execute_commands([cmd1, cmd2])
+ Dispatcher->>Dispatcher: Assign call_id to each command
+
+ Dispatcher->>Router: execute(agent_name, root_name, process_name, commands)
+
+ Router->>Computer: Route based on tool_type
+
+ par Execute cmd1
+ Computer->>MCP: Tool server 1
+ MCP-->>Computer: Result 1
+ and Execute cmd2
+ Computer->>MCP: Tool server 2
+ MCP-->>Computer: Result 2
+ end
+
+ Computer-->>Router: Results [res1, res2]
+ Router-->>Dispatcher: Results
+ Dispatcher-->>Agent: Results
+
+ alt Timeout
+ Dispatcher-xDispatcher: asyncio.TimeoutError
+ Dispatcher->>Dispatcher: generate_error_results()
+ Dispatcher-->>Agent: Error Results
+ end
+
+ alt Exception
+ Router-xRouter: Exception
+ Dispatcher->>Dispatcher: generate_error_results()
+ Dispatcher-->>Agent: Error Results
+ end
+```
+
+### Command Routing Context
+
+The dispatcher provides execution context to the CommandRouter:
+
+| Context | Source | Purpose |
+|---------|--------|---------|
+| `agent_name` | `session.current_agent_class` | Track which agent issued command |
+| `root_name` | `context.APPLICATION_ROOT_NAME` | Application root for UI operations |
+| `process_name` | `context.APPLICATION_PROCESS_NAME` | Process name for targeting |
+| `commands` | Command list | Actions to execute |
+
+!!!example "Local Execution Example"
+    ```python
+    from aip.messages import Command, ResultStatus
+
+    # Commands for local execution
+    commands = [
+        Command(
+            tool_name="click_element",
+            parameters={"control_label": "1", "button": "left"},
+            tool_type="windows",  # Routed to Windows MCP server
+            call_id=""  # Will be auto-assigned
+        ),
+        Command(
+            tool_name="type_text",
+            parameters={"text": "Hello World"},
+            tool_type="windows",
+            call_id=""
+        )
+    ]
+
+    # Execute locally
+    results = await context.command_dispatcher.execute_commands(
+        commands=commands,
+        timeout=30.0
+    )
+
+    # Process results
+    for i, result in enumerate(results):
+        if result.status == ResultStatus.SUCCESS:
+            print(f"Command {i+1} succeeded: {result.result}")
+        else:
+            print(f"Command {i+1} failed: {result.error}")
+    ```
+
+### Error Scenarios
+
+| Error Type | Trigger | Handling | Result |
+|------------|---------|----------|--------|
+| **TimeoutError** | Execution exceeds `timeout` | `generate_error_results()` | Error Results with timeout message |
+| **ConnectionError** | MCP server unreachable | `generate_error_results()` | Error Results with connection error |
+| **ValidationError** | Invalid command parameters | `generate_error_results()` | Error Results with validation error |
+| **RuntimeError** | Tool execution failure | `generate_error_results()` | Error Results with execution error |
+
+!!!warning "Timeout Considerations"
+ - Default timeout: **6000 seconds** (100 minutes)
+ - For UI operations: Consider **30-60 seconds**
+ - For network operations: May need longer timeouts
+ - Always handle timeout gracefully in agent code
+
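The timeout path can be illustrated with plain `asyncio`. This is a sketch under stated assumptions, not the dispatcher's actual code: `run_with_timeout`, `slow_execute`, and `fast_execute` are hypothetical names, and commands and results are reduced to strings and dictionaries.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def run_with_timeout(
    execute: Callable[[List[str]], Awaitable[List[Dict[str, Any]]]],
    commands: List[str],
    timeout: float,
) -> List[Dict[str, Any]]:
    """Await execute(commands); on timeout return one failure entry per command."""
    try:
        return await asyncio.wait_for(execute(commands), timeout=timeout)
    except asyncio.TimeoutError:
        return [
            {"status": "failure", "error": f"timed out after {timeout}s", "command": c}
            for c in commands
        ]


async def slow_execute(commands: List[str]) -> List[Dict[str, Any]]:
    await asyncio.sleep(1.0)  # simulates a tool that is slower than the timeout
    return [{"status": "success", "command": c} for c in commands]


async def fast_execute(commands: List[str]) -> List[Dict[str, Any]]:
    return [{"status": "success", "command": c} for c in commands]


timed_out = asyncio.run(run_with_timeout(slow_execute, ["click_element"], timeout=0.05))
completed = asyncio.run(run_with_timeout(fast_execute, ["type_text"], timeout=0.05))
```

`asyncio.wait_for` cancels the pending call when the timeout expires, so the slow tool does not keep running in the background.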
+---
+
+## WebSocketCommandDispatcher
+
+`WebSocketCommandDispatcher` uses the AIP protocol to send commands to remote clients over WebSocket connections. It is used for service sessions and remote control.
+
+### Architecture
+
+```mermaid
+graph TB
+ subgraph "WebSocketCommandDispatcher"
+ WSD[WebSocketCommandDispatcher]
+ SESSION[session: BaseSession]
+ PROTOCOL[protocol: TaskExecutionProtocol]
+ PENDING[pending: Dict~str, Future~]
+ QUEUE[send_queue: asyncio.Queue]
+ end
+
+ subgraph "AIP Protocol Layer"
+ MSG[ServerMessage Factory]
+ SEND[protocol.send_command]
+ RECV[protocol.receive_result]
+ end
+
+ subgraph "WebSocket Transport"
+ WS[WebSocket Connection]
+ CLIENT[Remote Client]
+ end
+
+ subgraph "Result Management"
+ FUT[asyncio.Future]
+ WAIT[await with timeout]
+ RES[Results]
+ end
+
+ WSD --> SESSION
+ WSD --> PROTOCOL
+ WSD --> PENDING
+
+ WSD --> MSG
+ MSG --> SEND
+ SEND --> WS
+ WS --> CLIENT
+
+ CLIENT --> RECV
+ RECV --> FUT
+ FUT --> WAIT
+ WAIT --> RES
+
+ style WSD fill:#e1f5ff
+ style PROTOCOL fill:#fff4e1
+ style WS fill:#f0ffe1
+ style RES fill:#e1ffe1
+```
+
+### Initialization
+
+```python
+from ufo.module.dispatcher import WebSocketCommandDispatcher
+from aip.protocol.task_execution import TaskExecutionProtocol
+
+def _init_context(self) -> None:
+ """Initialize context with WebSocket dispatcher."""
+ super()._init_context()
+
+ # Create WebSocket dispatcher with AIP protocol
+ command_dispatcher = WebSocketCommandDispatcher(
+ session=self,
+ protocol=self.task_protocol # TaskExecutionProtocol instance
+ )
+
+ # Attach to context
+ self.context.attach_command_dispatcher(command_dispatcher)
+```
+
+**Initialization Parameters:**
+
+| Parameter | Type | Purpose |
+|-----------|------|---------|
+| `session` | `BaseSession` | Current service session |
+| `protocol` | `TaskExecutionProtocol` | AIP protocol handler |
+
+!!!danger "Protocol Required"
+ WebSocketCommandDispatcher **requires** a `TaskExecutionProtocol` instance. It will raise `ValueError` if protocol is `None`.
+
+### Message Construction
+
+The dispatcher creates structured AIP ServerMessages:
+
+```python
+def make_server_response(self, commands: List[Command]) -> ServerMessage:
+    """
+    Create a server response message for the given commands.
+    """
+    # Assign unique IDs
+    for command in commands:
+        command.call_id = str(uuid.uuid4())
+
+    # Extract context
+    agent_name = self.session.current_agent_class
+    process_name = self.session.context.get(ContextNames.APPLICATION_PROCESS_NAME)
+    root_name = self.session.context.get(ContextNames.APPLICATION_ROOT_NAME)
+    session_id = self.session.id
+    response_id = str(uuid.uuid4())
+
+    # Build AIP message
+    return ServerMessage(
+        type=ServerMessageType.COMMAND,
+        status=TaskStatus.CONTINUE,
+        agent_name=agent_name,
+        process_name=process_name,
+        root_name=root_name,
+        actions=commands,
+        session_id=session_id,
+        task_name=self.session.task,
+        timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
+        response_id=response_id
+    )
+```
+
+**ServerMessage Fields:**
+
+| Field | Source | Purpose |
+|-------|--------|---------|
+| `type` | `ServerMessageType.COMMAND` | Indicates command message |
+| `status` | `TaskStatus.CONTINUE` | Task in progress |
+| `agent_name` | Current agent class | Track agent issuing command |
+| `process_name` | Context | Target process |
+| `root_name` | Context | Application root |
+| `actions` | Command list | Commands to execute |
+| `session_id` | Session ID | Session tracking |
+| `task_name` | Session task | Task identification |
+| `timestamp` | Current UTC time | Message timing |
+| `response_id` | UUID | Correlate request/response |
+
+### Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant Agent
+ participant Dispatcher as WebSocketCommandDispatcher
+ participant Protocol as TaskExecutionProtocol
+ participant WS as WebSocket
+ participant Client as Remote Client
+
+ Agent->>Dispatcher: execute_commands([cmd1, cmd2])
+ Dispatcher->>Dispatcher: Assign call_id to each
+ Dispatcher->>Dispatcher: make_server_response()
+ Dispatcher->>Dispatcher: Create Future for response_id
+
+ Dispatcher->>Protocol: send_command(ServerMessage)
+ Protocol->>WS: Send via WebSocket
+ WS->>Client: Transmit message
+
+ Note over Dispatcher: await Future with timeout
+
+ Client->>Client: Execute commands locally
+ Client->>WS: Send ClientMessage with results
+ WS->>Protocol: Receive message
+ Protocol->>Dispatcher: set_result(response_id, ClientMessage)
+ Dispatcher->>Dispatcher: Resolve Future
+
+ Dispatcher-->>Agent: Return action_results
+
+ alt Timeout
+ Dispatcher-xDispatcher: asyncio.TimeoutError
+ Dispatcher->>Dispatcher: generate_error_results()
+ Dispatcher-->>Agent: Error Results
+ end
+
+ alt Send Error
+ Protocol-xProtocol: Exception
+ Dispatcher->>Dispatcher: generate_error_results()
+ Dispatcher-->>Agent: Error Results
+ end
+```
+
+### Result Handling
+
+The `set_result()` method is called by the WebSocket handler when a client response arrives:
+
+```python
+async def set_result(self, response_id: str, result: ClientMessage) -> None:
+    """
+    Called by WebSocket handler when client returns a message.
+    :param response_id: The ID of the response.
+    :param result: The result from the client.
+    """
+    fut = self.pending.get(response_id)
+    if fut and not fut.done():
+        fut.set_result(result.action_results)
+```
+
+**Pending Future Management:**
+
+```mermaid
+graph LR
+ subgraph "Request Side"
+ REQ[execute_commands]
+ FUT[Create Future]
+ PEND[Store in pending dict]
+ WAIT[Await Future]
+ end
+
+ subgraph "Response Side"
+ RECV[WebSocket receives result]
+ LOOKUP[Lookup Future by response_id]
+ RESOLVE[set_result on Future]
+ end
+
+ REQ --> FUT
+ FUT --> PEND
+ PEND --> WAIT
+
+ RECV --> LOOKUP
+ LOOKUP --> RESOLVE
+ RESOLVE -.resolves.-> WAIT
+
+ style REQ fill:#e1f5ff
+ style RECV fill:#fff4e1
+ style WAIT fill:#e1ffe1
+```
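The request and response sides of this diagram can be sketched with a dictionary of futures. `PendingResponses` is a hypothetical helper written for illustration; the real dispatcher keeps its own `pending` dict and its internals may differ.

```python
import asyncio
import uuid
from typing import Any, Dict


class PendingResponses:
    """Hypothetical helper that correlates responses with requests by ID."""

    def __init__(self) -> None:
        self.pending: Dict[str, asyncio.Future] = {}

    def register(self) -> str:
        """Create a Future for a new request and return its response_id."""
        response_id = str(uuid.uuid4())
        self.pending[response_id] = asyncio.get_running_loop().create_future()
        return response_id

    def set_result(self, response_id: str, payload: Any) -> None:
        """Resolve the matching Future; ignore unknown or finished IDs."""
        fut = self.pending.get(response_id)
        if fut is not None and not fut.done():
            fut.set_result(payload)

    async def wait(self, response_id: str, timeout: float) -> Any:
        """Await the Future and always drop it from the pending dict."""
        try:
            return await asyncio.wait_for(self.pending[response_id], timeout)
        finally:
            self.pending.pop(response_id, None)


async def demo() -> Any:
    responses = PendingResponses()
    response_id = responses.register()
    # Simulate the client replying shortly after the command is sent.
    asyncio.get_running_loop().call_later(
        0.01, responses.set_result, response_id, ["ok"]
    )
    result = await responses.wait(response_id, timeout=1.0)
    return result, len(responses.pending)


outcome = asyncio.run(demo())
```

Removing the entry in a `finally` block keeps the dictionary from growing when a request times out before its response arrives.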
+
+!!!example "WebSocket Execution Example"
+    ```python
+    from aip.messages import Command
+
+    # Session is ServiceSession with WebSocketCommandDispatcher
+    commands = [
+        Command(
+            tool_name="capture_window_screenshot",
+            parameters={},
+            tool_type="data_collection"
+        )
+    ]
+
+    # Execute remotely via WebSocket
+    results = await context.command_dispatcher.execute_commands(
+        commands=commands,
+        timeout=60.0  # Screenshot may take time
+    )
+
+    # Results came from remote client
+    if results:
+        screenshot_base64 = results[0].result
+        # Process screenshot...
+    ```
+
+### Error Scenarios
+
+| Error Type | Trigger | Handling | Result |
+|------------|---------|----------|--------|
+| **TimeoutError** | Client doesn't respond in time | `generate_error_results()` | Error Results |
+| **ProtocolError** | AIP protocol violation | `generate_error_results()` | Error Results |
+| **ConnectionError** | WebSocket disconnected | `generate_error_results()` | Error Results |
+| **ClientError** | Client reports execution failure | Return client's error Result | Propagate client error |
+
+!!!warning "WebSocket-Specific Considerations"
+ - **Network latency**: Add buffer to timeouts
+ - **Client state**: Client may be busy with other tasks
+ - **Connection loss**: Implement reconnection logic
+ - **Message ordering**: AIP ensures ordered delivery
+
+---
+
+## Error Handling
+
+All dispatchers convert exceptions into structured `Result` objects to maintain consistent error handling.
+
+### Error Flow
+
+```mermaid
+graph TB
+ START[Command Execution Starts]
+
+ TRY{Try Block}
+ SUCCESS[Commands Execute Successfully]
+ RETURN_OK[Return Results]
+
+ TIMEOUT{Timeout?}
+ EXCEPTION{Other Exception?}
+
+ GEN_ERR[generate_error_results]
+ CREATE_RESULTS[Create Result for each command]
+ SET_FAILURE[Set status = FAILURE]
+ ADD_ERROR[Add error message]
+ RETURN_ERR[Return Error Results]
+
+ START --> TRY
+ TRY -->|Success| SUCCESS
+ TRY -->|Failure| TIMEOUT
+ SUCCESS --> RETURN_OK
+
+ TIMEOUT -->|Yes| GEN_ERR
+ TIMEOUT -->|No| EXCEPTION
+
+ EXCEPTION -->|Yes| GEN_ERR
+
+ GEN_ERR --> CREATE_RESULTS
+ CREATE_RESULTS --> SET_FAILURE
+ SET_FAILURE --> ADD_ERROR
+ ADD_ERROR --> RETURN_ERR
+
+ style START fill:#e1f5ff
+ style SUCCESS fill:#e1ffe1
+ style GEN_ERR fill:#ffe1e1
+ style RETURN_OK fill:#f0ffe1
+ style RETURN_ERR fill:#fff4e1
+```
+
+### Error Result Format
+
+```python
+{
+ "status": "failure", # ResultStatus.FAILURE
+ "error": "asyncio.TimeoutError: Command execution timed out",
+ "result": "Error occurred while executing command : TimeoutError, "
+ "please retry or execute a different command.",
+ "call_id": "cmd_abc123"
+}
+```
+
+### Agent Error Handling
+
+Agents should handle error results appropriately:
+
+```python
+async def execute_action(self, context: Context) -> None:
+    """Execute action with error handling."""
+    commands = self.generate_commands()
+
+    results = await context.command_dispatcher.execute_commands(
+        commands=commands,
+        timeout=30.0
+    )
+
+    # execute_commands() is typed Optional, so treat None as "no results"
+    for command, result in zip(commands, results or []):
+        if result.status == ResultStatus.FAILURE:
+            # Log error
+            self.logger.error(f"Command {command.tool_name} failed: {result.error}")
+
+            # Decision logic
+            error_text = (result.error or "").lower()
+            if "timeout" in error_text and self.retry_count < 3:
+                # Transient failure: retry the whole action, at most three times
+                self.retry_count += 1
+                return await self.execute_action(context)
+
+            elif "connection" in error_text:
+                # Switch to alternative approach
+                return self.fallback_strategy()
+
+            else:
+                # Escalate to error state (also reached when retries are exhausted)
+                self.transition_to_error_state(result.error)
+        else:
+            # Process successful result
+            self.process_result(result.result)
+
+!!!tip "Error Handling Best Practices"
+ - ✅ Always check `result.status` before using `result.result`
+ - ✅ Log errors with context (command, parameters, error message)
+ - ✅ Implement retry logic for transient errors
+ - ✅ Provide fallback strategies for permanent failures
+ - ✅ Include helpful error messages for users
+ - ❌ Don't ignore error results
+ - ❌ Don't assume all commands succeed
+ - ❌ Don't retry indefinitely without backoff
+
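The retry, fallback, and escalate decisions can be separated from the agent into a small pure function. This sketch assumes errors are classified by keyword only; `Recovery` and `choose_recovery` are hypothetical names.

```python
from enum import Enum


class Recovery(Enum):
    RETRY = "retry"
    FALLBACK = "fallback"
    ESCALATE = "escalate"


def choose_recovery(error: str, attempts: int, max_attempts: int = 3) -> Recovery:
    """Map an error message to a recovery action using simple keyword checks."""
    text = error.lower()
    if "timeout" in text or "timed out" in text:
        # Transient: retry until the attempt budget is used up.
        return Recovery.RETRY if attempts < max_attempts else Recovery.ESCALATE
    if "connection" in text:
        return Recovery.FALLBACK
    return Recovery.ESCALATE


first = choose_recovery("asyncio.TimeoutError: Command execution timed out", attempts=0)
exhausted = choose_recovery("asyncio.TimeoutError: timed out", attempts=3)
refused = choose_recovery("ConnectionRefusedError: [WinError 10061]", attempts=0)
unknown = choose_recovery("ValueError: bad parameter", attempts=0)
```

A pure function like this can be unit-tested without a dispatcher or a running session.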
+---
+
+## Usage Patterns
+
+### Pattern 1: Sequential Execution
+
+Execute commands one at a time:
+
+```python
+for command in command_list:
+    results = await context.command_dispatcher.execute_commands(
+        commands=[command],
+        timeout=30.0
+    )
+
+    # results may be None, so check it before indexing
+    if results and results[0].status == ResultStatus.SUCCESS:
+        # Process result and decide next command
+        next_command = self.decide_next_action(results[0])
+    else:
+        # Handle error and possibly abort
+        break
+```
+
+### Pattern 2: Batch Execution
+
+Execute multiple related commands together:
+
+```python
+# All commands for a subtask
+commands = [
+ Command(tool_name="click_element", ...),
+ Command(tool_name="type_text", ...),
+ Command(tool_name="press_key", ...)
+]
+
+results = await context.command_dispatcher.execute_commands(
+ commands=commands,
+ timeout=60.0
+)
+
+# Process all results
+for command, result in zip(commands, results):
+    if result.status == ResultStatus.FAILURE:
+        # One failure might invalidate the whole subtask
+        self.handle_subtask_failure(command, result)
+```
+
+### Pattern 3: Conditional Execution
+
+Execute commands based on previous results:
+
+```python
+# Check state first
+check_cmd = Command(tool_name="get_ui_tree", ...)
+check_results = await dispatcher.execute_commands([check_cmd])
+
+if check_results[0].status == ResultStatus.SUCCESS:
+    ui_tree = check_results[0].result
+
+    # Decide action based on UI state
+    if "Login" in ui_tree:
+        action_cmd = Command(tool_name="click_element", parameters={"label": "Login"})
+    else:
+        action_cmd = Command(tool_name="type_text", parameters={"text": "username"})
+
+    # Execute decided action
+    await dispatcher.execute_commands([action_cmd])
+```
+
+### Pattern 4: Retry with Backoff
+
+Retry failed commands with exponential backoff:
+
+```python
+import asyncio
+import logging
+
+from aip.messages import ResultStatus
+
+logger = logging.getLogger(__name__)
+
+async def execute_with_retry(
+    dispatcher,
+    commands,
+    max_retries=3,
+    base_delay=1.0
+):
+    """Execute commands with exponential backoff retry."""
+    results = None
+
+    for attempt in range(max_retries):
+        results = await dispatcher.execute_commands(commands, timeout=30.0)
+
+        # Check if all succeeded (results may be None)
+        if results and all(r.status == ResultStatus.SUCCESS for r in results):
+            return results
+
+        # Not last attempt - retry with backoff
+        if attempt < max_retries - 1:
+            delay = base_delay * (2 ** attempt)
+            logger.warning(f"Retry attempt {attempt + 1} after {delay}s")
+            await asyncio.sleep(delay)
+
+    # All retries exhausted
+    return results  # Return last attempt results
+```
+
+---
+
+## Performance Considerations
+
+### Timeout Configuration
+
+Choose timeouts based on operation type:
+
+| Operation Type | Recommended Timeout | Reason |
+|----------------|---------------------|--------|
+| **UI clicks** | 10-30s | Fast but may wait for animations |
+| **Text input** | 5-15s | Usually fast |
+| **Screenshots** | 30-60s | May need rendering time |
+| **File operations** | 60-120s | I/O dependent |
+| **Network calls** | 120-300s | Network latency + processing |
+| **Batch operations** | Sum of individual + 20% | Account for overhead |
+
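The batch rule in the last row can be written as a one-line calculation. `batch_timeout` is a hypothetical helper, and the sample values are the upper bounds of the ranges in the table, chosen only for illustration.

```python
from typing import Iterable


def batch_timeout(per_command_timeouts: Iterable[float], overhead: float = 0.20) -> float:
    """Sum the individual timeouts and add a fractional overhead."""
    total = sum(per_command_timeouts)
    return total * (1.0 + overhead)


# Upper bounds from the table: click (30s), text input (15s), screenshot (60s).
timeout = batch_timeout([30.0, 15.0, 60.0])
```

The three timeouts sum to 105 seconds, and the 20% overhead brings the batch timeout to roughly 126 seconds.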
+### Command Batching
+
+**When to batch:**
+- ✅ Related actions in same context (e.g., fill form fields)
+- ✅ Commands with no dependencies between them
+- ✅ All commands target same application
+
+**When not to batch:**
+- ❌ Commands with dependencies (need sequential execution)
+- ❌ Mix of fast and slow operations (one timeout for all)
+- ❌ Need intermediate results to decide next action
+
+### Resource Management
+
+```python
+# Good: Reuse dispatcher attached to context
+results1 = await context.command_dispatcher.execute_commands(commands1)
+results2 = await context.command_dispatcher.execute_commands(commands2)
+
+# Bad: Creating new dispatchers
+dispatcher1 = LocalCommandDispatcher(session, mcp_manager)
+dispatcher2 = LocalCommandDispatcher(session, mcp_manager)
+```
+
+---
+
+## Advanced Topics
+
+### Custom Dispatcher Implementation
+
+Extend `BasicCommandDispatcher` for custom execution logic:
+
+```python
+from ufo.module.dispatcher import BasicCommandDispatcher
+from aip.messages import Command, Result, ResultStatus
+from typing import List, Optional
+
+class CustomCommandDispatcher(BasicCommandDispatcher):
+    """
+    Custom dispatcher that logs all commands and results.
+    """
+
+    def __init__(self, session, log_file: str):
+        self.session = session
+        self.log_file = log_file
+
+    async def execute_commands(
+        self,
+        commands: List[Command],
+        timeout: float = 6000
+    ) -> Optional[List[Result]]:
+        """Execute with logging."""
+
+        # Log commands
+        with open(self.log_file, 'a') as f:
+            f.write(f"Executing {len(commands)} commands\n")
+            for cmd in commands:
+                f.write(f" {cmd.tool_name}: {cmd.parameters}\n")
+
+        try:
+            # Your custom execution logic here
+            results = await self.custom_execute(commands, timeout)
+
+            # Log results
+            with open(self.log_file, 'a') as f:
+                for result in results:
+                    f.write(f" Result: {result.status}\n")
+
+            return results
+
+        except Exception as e:
+            # Log error
+            with open(self.log_file, 'a') as f:
+                f.write(f" ERROR: {e}\n")
+
+            return self.generate_error_results(commands, e)
+
+    async def custom_execute(
+        self,
+        commands: List[Command],
+        timeout: float
+    ) -> List[Result]:
+        """Implement custom execution logic."""
+        # Replace with your implementation; return one Result per command
+        raise NotImplementedError
+```
+
+### Dispatcher Selection Logic
+
+Choose the dispatcher based on the session type:
+
+```python
+from ufo.module.dispatcher import LocalCommandDispatcher, WebSocketCommandDispatcher
+
+def attach_appropriate_dispatcher(session, context):
+    """Attach correct dispatcher based on session type."""
+
+    if isinstance(session, ServiceSession):
+        # Service session uses WebSocket
+        dispatcher = WebSocketCommandDispatcher(
+            session=session,
+            protocol=session.task_protocol
+        )
+    else:
+        # Interactive session uses local execution
+        mcp_manager = MCPServerManager()
+        dispatcher = LocalCommandDispatcher(
+            session=session,
+            mcp_server_manager=mcp_manager
+        )
+
+    context.attach_command_dispatcher(dispatcher)
+```
+
+---
+
+## Troubleshooting
+
+### Issue: Commands Timeout
+
+**Symptoms:**
+- Commands consistently time out
+- `asyncio.TimeoutError` in logs
+- Error results with timeout messages
+
+**Diagnosis:**
+```python
+# Check timeout value
+results = await dispatcher.execute_commands(commands, timeout=30.0)
+
+# Enable debug logging
+logging.getLogger('ufo.module.dispatcher').setLevel(logging.DEBUG)
+```
+
+**Solutions:**
+1. Increase timeout for slow operations
+2. Check MCP server health (local dispatcher)
+3. Verify WebSocket connection (WebSocket dispatcher)
+4. Split batch into smaller groups
+
+### Issue: Connection Errors
+
+**Symptoms:**
+- Connection refused errors
+- WebSocket disconnection
+- MCP server not responding
+
+**Diagnosis:**
+```python
+# For LocalCommandDispatcher
+# Check MCP server status
+mcp_manager.check_server_health()
+
+# For WebSocketCommandDispatcher
+# Check WebSocket connection
+if protocol.is_connected():
+    print("WebSocket connected")
+else:
+    print("WebSocket disconnected")
+```
+
+**Solutions:**
+1. Restart MCP servers
+2. Reconnect WebSocket
+3. Check firewall/network settings
+4. Verify client is running
+
+### Issue: Wrong Dispatcher Used
+
+**Symptoms:**
+- Commands routed incorrectly
+- MCP tools called in service session
+- WebSocket messages in local session
+
+**Diagnosis:**
+```python
+# Check dispatcher type
+print(type(context.command_dispatcher))
+# Should be LocalCommandDispatcher or WebSocketCommandDispatcher
+
+# Check session type
+print(type(session))
+```
+
+**Solution:**
+Ensure that the session's `_init_context()` attaches the correct dispatcher.
+
+---
+
+## Reference
+
+### BasicCommandDispatcher
+
+::: module.dispatcher.BasicCommandDispatcher
+
+### LocalCommandDispatcher
+
+::: module.dispatcher.LocalCommandDispatcher
+
+### WebSocketCommandDispatcher
+
+::: module.dispatcher.WebSocketCommandDispatcher
+
+---
+
+
+## See Also
+
+- [Context](./context.md) - State management and dispatcher attachment
+- [Session](./session.md) - Session lifecycle and dispatcher initialization
+- [AIP Protocol](../../aip/overview.md) - WebSocket message protocol
+- [MCP Integration](../../mcp/overview.md) - Local tool execution
+
diff --git a/documents/docs/infrastructure/modules/overview.md b/documents/docs/infrastructure/modules/overview.md
new file mode 100644
index 000000000..48c82d590
--- /dev/null
+++ b/documents/docs/infrastructure/modules/overview.md
@@ -0,0 +1,769 @@
+# Module System Overview
+
+The **Module System** is the core execution engine of UFO, orchestrating the complete lifecycle of user interactions from initial request to final completion. It manages sessions, rounds, context state, and command dispatch across both Windows and Linux platforms.
+
+**Quick Navigation:**
+
+- New to modules? Start with [Session](./session.md) and [Round](./round.md) basics
+- Understanding state? See [Context](./context.md) management
+- Command execution? Check [Dispatcher](./dispatcher.md) patterns
+
+---
+
+## Architecture Overview
+
+The module system implements a **hierarchical execution model** with clear separation of concerns:
+
+```mermaid
+graph TB
+ subgraph "User Interaction Layer"
+ UI[Interactor User I/O]
+ end
+
+ subgraph "Session Management Layer"
+ SF[SessionFactory Creates sessions]
+ SP[SessionPool Manages multiple sessions]
+ S[Session Conversation lifecycle]
+ end
+
+ subgraph "Execution Layer"
+ R[Round Single request handler]
+ C[Context Shared state]
+ end
+
+ subgraph "Command Layer"
+ D[Dispatcher Command routing]
+ LCD[LocalCommandDispatcher]
+ WSD[WebSocketCommandDispatcher]
+ end
+
+ subgraph "Platform Layer"
+ WS[WindowsBaseSession]
+ LS[LinuxBaseSession]
+ SS[ServiceSession]
+ end
+
+ UI -.Request.-> SF
+ SF --> SP
+ SP --> S
+ S --> R
+ R --> C
+ R --> D
+ D --> LCD
+ D --> WSD
+
+ S -.inherits.-> WS
+ S -.inherits.-> LS
+ S -.inherits.-> SS
+
+ style UI fill:#e1f5ff
+ style SF fill:#fff4e1
+ style SP fill:#f0ffe1
+ style S fill:#ffe1f5
+ style R fill:#e1ffe1
+ style C fill:#ffe1e1
+ style D fill:#f5e1ff
+```
+
+---
+
+## Core Components
+
+### 1. Session Management
+
+A **Session** represents a complete conversation between the user and UFO, potentially spanning multiple requests and rounds.
+
+**Session Hierarchy:**
+
+```mermaid
+classDiagram
+ class BaseSession {
+ <<abstract>>
+ +task: str
+ +context: Context
+ +rounds: Dict[int, BaseRound]
+ +run()
+ +create_new_round()
+ +is_finished()
+ }
+
+ class WindowsBaseSession {
+ +host_agent: HostAgent
+ +_init_agents()
+ }
+
+ class LinuxBaseSession {
+ +agent: LinuxAgent
+ +_init_agents()
+ }
+
+ class Session {
+ +mode: str
+ +next_request()
+ }
+
+ class ServiceSession {
+ +task_protocol: TaskExecutionProtocol
+ +_init_context()
+ }
+
+ class LinuxSession {
+ +next_request()
+ }
+
+ class FollowerSession {
+ +plan_reader: PlanReader
+ }
+
+ BaseSession <|-- WindowsBaseSession
+ BaseSession <|-- LinuxBaseSession
+ WindowsBaseSession <|-- Session
+ WindowsBaseSession <|-- ServiceSession
+ LinuxBaseSession <|-- LinuxSession
+ Session <|-- FollowerSession
+```
+
+**Session Types:**
+
+| Session Type | Platform | Use Case | Communication |
+|--------------|----------|----------|---------------|
+| **Session** | Windows | Interactive mode | Local |
+| **ServiceSession** | Windows | Server-controlled | WebSocket (AIP) |
+| **LinuxSession** | Linux | Interactive mode | Local |
+| **LinuxServiceSession** | Linux | Server-controlled | WebSocket (AIP) |
+| **FollowerSession** | Windows | Plan execution | Local |
+| **FromFileSession** | Windows | Batch processing | Local |
+| **OpenAIOperatorSession** | Windows | Operator mode | Local |
+
+!!!example "Session Creation"
+    ```python
+    from ufo.module.session_pool import SessionFactory
+
+    # Create interactive Windows session
+    factory = SessionFactory()
+    sessions = factory.create_session(
+        task="email_task",
+        mode="normal",
+        plan="",
+        request="Open Outlook and send an email"
+    )
+
+    # Create Linux service session
+    linux_session = factory.create_service_session(
+        task="data_task",
+        should_evaluate=True,
+        id="session_001",
+        request="Process CSV files",
+        platform_override="linux"
+    )
+    ```
+
+---
+
+### 2. Round Execution
+
+A **Round** handles a single user request by orchestrating agents through a state machine, executing actions until completion.
+
+**Round Lifecycle:**
+
+```mermaid
+stateDiagram-v2
+ [*] --> Created: Initialize Round
+ Created --> AgentHandle: agent.handle(context)
+ AgentHandle --> StateTransition: Determine next state
+ StateTransition --> AgentSwitch: Switch agent if needed
+ AgentSwitch --> SubtaskCheck: Check if subtask ends
+
+ SubtaskCheck --> CaptureSnapshot: Subtask complete
+ SubtaskCheck --> AgentHandle: Continue
+
+ CaptureSnapshot --> AgentHandle: Next subtask
+
+ AgentHandle --> RoundComplete: is_finished() = True
+ RoundComplete --> Evaluation: should_evaluate = True
+ RoundComplete --> [*]: should_evaluate = False
+ Evaluation --> [*]
+
+ note right of AgentHandle
+ Agent processes current state
+ Updates context
+ Executes actions
+ end note
+
+ note right of StateTransition
+ State pattern determines:
+ - Next state
+ - Next agent
+ - Round completion
+ end note
+```
+
+**Key Round Operations:**
+
+| Operation | Purpose | Trigger |
+|-----------|---------|---------|
+| `agent.handle(context)` | Process current state | Each iteration |
+| `state.next_state(agent)` | Determine next state | After handle |
+| `state.next_agent(agent)` | Switch agent if needed | After state transition |
+| `capture_last_snapshot()` | Save UI state | Subtask/Round end |
+| `evaluation()` | Assess completion | Round end (if enabled) |
+
+!!!warning "Round Termination Conditions"
+    A round finishes when any of the following occurs:
+    - `state.is_round_end()` returns `True`
+    - The session step count exceeds `ufo_config.system.max_step`
+    - The agent enters the ERROR state
+
+---
+
+### 3. Context State Management
+
+**Context** is a type-safe key-value store that maintains state across all rounds in a session.
+
+**Context Architecture:**
+
+```mermaid
+graph LR
+ subgraph "Context Storage"
+ CN[ContextNames Enum]
+ CV[Context Values Dict]
+ end
+
+ subgraph "Tracked Data"
+ ID[Session/Round IDs]
+ ST[Steps & Costs]
+ LOG[Loggers]
+ APP[Application State]
+ CMD[Command Dispatcher]
+ end
+
+ subgraph "Access Patterns"
+ GET["context.get(key)"]
+ SET["context.set(key, value)"]
+ UPD["context.update_dict(key, dict)"]
+ end
+
+ CN -.defines.-> CV
+ CV --> ID
+ CV --> ST
+ CV --> LOG
+ CV --> APP
+ CV --> CMD
+
+ GET -.reads.-> CV
+ SET -.writes.-> CV
+ UPD -.merges.-> CV
+
+ style CN fill:#e1f5ff
+ style CV fill:#fff4e1
+ style GET fill:#f0ffe1
+ style SET fill:#ffe1f5
+ style UPD fill:#f5e1ff
+```
+
+**Context Categories:**
+
+| Category | Context Names | Type | Purpose |
+|----------|---------------|------|---------|
+| **Identifiers** | `ID`, `CURRENT_ROUND_ID` | `int` | Session/round tracking |
+| **Execution State** | `SESSION_STEP`, `ROUND_STEP` | `int/dict` | Progress tracking |
+| **Cost Tracking** | `SESSION_COST`, `ROUND_COST` | `float/dict` | LLM API costs |
+| **Requests** | `REQUEST`, `SUBTASK`, `PREVIOUS_SUBTASKS` | `str/list` | Task information |
+| **Application** | `APPLICATION_WINDOW`, `APPLICATION_PROCESS_NAME` | `UIAWrapper/str` | UI automation |
+| **Logging** | `LOGGER`, `REQUEST_LOGGER`, `EVALUATION_LOGGER` | `FileWriter` | Log outputs |
+| **Communication** | `HOST_MESSAGE`, `CONTROL_REANNOTATION` | `list` | Agent messages |
+| **Infrastructure** | `command_dispatcher` | `BasicCommandDispatcher` | Command execution |
+
+!!!example "Context Usage Patterns"
+    ```python
+    from ufo.module.context import Context, ContextNames
+
+    # Initialize context
+    context = Context()
+
+    # Set values
+    context.set(ContextNames.REQUEST, "Open Notepad")
+    context.set(ContextNames.SESSION_STEP, 0)
+
+    # Get values
+    request = context.get(ContextNames.REQUEST)  # "Open Notepad"
+    step = context.get(ContextNames.SESSION_STEP)  # 0
+
+    # Update dictionaries (for round-specific tracking)
+    round_costs = {1: 0.05, 2: 0.03}
+    context.update_dict(ContextNames.ROUND_COST, round_costs)
+
+    # Auto-sync current round values
+    current_cost = context.current_round_cost  # Auto-synced
+    ```
+
+---
+
+### 4. Command Dispatching
+
+**Dispatchers** route commands to execution environments (local MCP tools or remote WebSocket clients) and handle result delivery.
+
+**Dispatcher Architecture:**
+
+```mermaid
+graph TB
+ subgraph "Agent Layer"
+ AG[Agent generates commands]
+ end
+
+ subgraph "Dispatcher Layer"
+ BD[BasicCommandDispatcher Abstract base]
+ LCD[LocalCommandDispatcher MCP tools]
+ WSD[WebSocketCommandDispatcher AIP protocol]
+ end
+
+ subgraph "Execution Layer"
+ CR[CommandRouter]
+ CM[ComputerManager]
+ MCP[MCP Servers]
+ WS[WebSocket Client]
+ end
+
+ subgraph "Result Handling"
+ RES[Results: List~Result~]
+ ERR[Error Results]
+ end
+
+ AG --> BD
+ BD -.implements.-> LCD
+ BD -.implements.-> WSD
+
+ LCD --> CR
+ CR --> CM
+ CM --> MCP
+
+ WSD --> WS
+
+ MCP --> RES
+ WS --> RES
+ LCD --> ERR
+ WSD --> ERR
+
+ style AG fill:#e1f5ff
+ style BD fill:#fff4e1
+ style LCD fill:#f0ffe1
+ style WSD fill:#ffe1f5
+ style RES fill:#e1ffe1
+```
+
+**Dispatcher Comparison:**
+
+| Dispatcher | Use Case | Communication | Error Handling | Timeout |
+|------------|----------|---------------|----------------|---------|
+| **LocalCommandDispatcher** | Interactive sessions | Direct MCP calls | Generates error Results | 6000s |
+| **WebSocketCommandDispatcher** | Service sessions | AIP protocol messages | Generates error Results | 6000s |
+
+!!!example "Command Dispatch Flow"
+    ```python
+    from aip.messages import Command, ResultStatus
+
+    # Create commands
+    commands = [
+        Command(
+            tool_name="click_element",
+            parameters={"control_label": "1", "button": "left"},
+            tool_type="windows"
+        )
+    ]
+
+    # Execute via dispatcher (attached to context)
+    results = await context.command_dispatcher.execute_commands(
+        commands=commands,
+        timeout=30.0
+    )
+
+    # Process results
+    for result in results:
+        if result.status == ResultStatus.SUCCESS:
+            print(f"Action succeeded: {result.result}")
+        else:
+            print(f"Action failed: {result.error}")
+    ```
+
+---
+
+### 5. User Interaction
+
+**Interactor** provides rich CLI experiences for user input with styled prompts, panels, and confirmations.
+
+**Interaction Flows:**
+
+```mermaid
+sequenceDiagram
+ participant U as User
+ participant I as Interactor
+ participant S as Session
+ participant R as Round
+
+ U->>I: Start UFO
+ I->>I: first_request()
+ I-->>U: 🛸 Welcome Panel
+ U->>I: "Open Notepad"
+ I->>S: Initial request
+
+ S->>R: Create Round 1
+ R->>R: Execute...
+ R-->>S: Round complete
+
+ S->>I: new_request()
+ I-->>U: 🛸 Next Request Panel
+ U->>I: "Type hello"
+ I->>S: Next request
+
+ S->>R: Create Round 2
+ R->>R: Execute...
+ R-->>S: Round complete
+
+ S->>I: new_request()
+ I-->>U: 🛸 Next Request Panel
+ U->>I: "N"
+ I-->>U: 👋 Goodbye Panel
+ I->>S: complete=True
+ S->>S: Terminate
+
+ S->>I: experience_asker()
+ I-->>U: 💾 Save Experience Panel
+ U->>I: Yes
+ I->>S: Save experience
+```
+
+**Interactor Functions:**
+
+| Function | Purpose | Returns | Example UI |
+|----------|---------|---------|-----------|
+| `first_request()` | Initial request prompt | `str` | 🛸 Welcome Panel with examples |
+| `new_request()` | Subsequent requests | `Tuple[str, bool]` | 🛸 Next Request Panel |
+| `experience_asker()` | Save experience prompt | `bool` | 💾 Learning & Memory Panel |
+| `question_asker()` | Collect information | `str` | 🤔 Numbered Question Panel |
+| `sensitive_step_asker()` | Security confirmation | `bool` | 🔒 Security Check Panel |
+
+!!!example "Styled User Prompts"
+    ```python
+    from ufo.module import interactor
+
+    # First interaction with rich welcome
+    request = interactor.first_request()
+    # Shows:
+    # ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+    # ┃ 🛸 UFO Assistant                      ┃
+    # ┃ 🚀 Welcome to UFO - Your AI Assistant ┃
+    # ┃ ...examples...                        ┃
+    # ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
+
+    # Get next request
+    request, complete = interactor.new_request()
+    if complete:
+        print("User exited")
+
+    # Ask for permission on sensitive actions
+    proceed = interactor.sensitive_step_asker(
+        action="Delete file",
+        control_text="important.docx"
+    )
+    if not proceed:
+        print("Action cancelled by user")
+    ```
+
+---
+
+### 6. Session Factory & Pool
+
+**SessionFactory** creates platform-specific sessions based on mode and configuration, while **SessionPool** manages batch execution.
+
+**Factory Creation Logic:**
+
+```mermaid
+graph TB
+ START[SessionFactory.create_session]
+
+ PLATFORM{Platform?}
+ MODE{Mode?}
+
+ WNORMAL[Windows Session]
+ WSERVICE[Windows ServiceSession]
+ WFOLLOWER[Windows FollowerSession]
+ WBATCH[Windows FromFileSession]
+ WOPERATOR[Windows OpenAIOperatorSession]
+
+ LNORMAL[Linux Session]
+ LSERVICE[Linux ServiceSession]
+
+ START --> PLATFORM
+ PLATFORM -->|Windows| MODE
+ PLATFORM -->|Linux| MODE
+
+ MODE -->|normal| WNORMAL
+ MODE -->|service| WSERVICE
+ MODE -->|follower| WFOLLOWER
+ MODE -->|batch_normal| WBATCH
+ MODE -->|operator| WOPERATOR
+
+ MODE -->|normal Linux| LNORMAL
+ MODE -->|service Linux| LSERVICE
+
+ style START fill:#e1f5ff
+ style PLATFORM fill:#fff4e1
+ style MODE fill:#f0ffe1
+ style WNORMAL fill:#ffe1f5
+ style LNORMAL fill:#ffe1f5
+```
+
+**Session Modes:**
+
+| Mode | Platform | Description | Input | Evaluation |
+|------|----------|-------------|-------|------------|
+| **normal** | Both | Interactive single-task | User input | Optional |
+| **service** | Both | WebSocket-controlled | Remote request | Optional |
+| **follower** | Windows | Replay recorded plan | Plan JSON file | Optional |
+| **batch_normal** | Windows | Multiple tasks from files | JSON folder | Per-task |
+| **operator** | Windows | OpenAI Operator API | User input | Optional |
+| **normal_operator** | Both | Interactive with operator | User input | Optional |
+
+!!!example "SessionFactory Usage"
+    ```python
+    from ufo.module.session_pool import SessionFactory, SessionPool
+
+    factory = SessionFactory()
+
+    # Interactive Windows session
+    sessions = factory.create_session(
+        task="task1",
+        mode="normal",
+        plan="",
+        request="Open calculator"
+    )
+
+    # Batch Windows sessions from folder
+    batch_sessions = factory.create_session(
+        task="batch_task",
+        mode="batch_normal",
+        plan="./plans/",  # Folder with multiple .json files
+        request=""
+    )
+
+    # Run all sessions
+    pool = SessionPool(batch_sessions)
+    await pool.run_all()
+    ```
+
+---
+
+## Cross-Platform Support
+
+The module system provides a unified API while allowing platform-specific behavior through inheritance.
+
+**Platform Differences:**
+
+| Aspect | Windows | Linux |
+|--------|---------|-------|
+| **Agent Architecture** | HostAgent → AppAgent (two-tier) | LinuxAgent (single-tier) |
+| **HostAgent** | ✅ Used for planning | ❌ Not used |
+| **Session Base** | `WindowsBaseSession` | `LinuxBaseSession` |
+| **UI Automation** | UIA (pywinauto) | Custom automation |
+| **Service Mode** | `ServiceSession` | `LinuxServiceSession` |
+| **Evaluation** | ✅ Full support | ⚠️ Limited |
+| **Markdown Logs** | ✅ Supported | ⚠️ Planned |
+
+!!!example "Platform Detection"
+    ```python
+    import platform
+
+    # Auto-detect platform
+    current_platform = platform.system().lower()  # 'windows' or 'linux'
+
+    # Override platform (factory is a SessionFactory instance)
+    sessions = factory.create_session(
+        task="cross_platform_task",
+        mode="normal",
+        plan="",
+        request="List files",
+        platform_override="linux"  # Force Linux session
+    )
+    ```
+
+---
+
+## Execution Flow
+
+The following diagram shows how the components interact during a complete user request:
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant Interactor
+ participant SessionFactory
+ participant Session
+ participant Round
+ participant Context
+ participant Agent
+ participant Dispatcher
+ participant MCP
+
+ User->>Interactor: Start UFO
+ Interactor->>User: Show welcome, ask request
+ User->>Interactor: "Open Notepad and type Hello"
+
+ Interactor->>SessionFactory: create_session(request)
+ SessionFactory->>Session: __init__(task, request)
+ Session->>Context: Initialize context
+ Session->>Agent: Initialize agents
+
+ Session->>Session: run()
+ loop Until is_finished()
+ Session->>Round: create_new_round()
+ Round->>Context: Initialize round context
+
+ loop Until round.is_finished()
+ Round->>Agent: handle(context)
+ Agent->>Agent: Process current state
+ Agent->>Dispatcher: execute_commands([cmd1, cmd2])
+ Dispatcher->>MCP: Route to MCP tools
+ MCP-->>Dispatcher: Results
+ Dispatcher-->>Agent: Results
+ Agent->>Context: Update state
+
+ Round->>Agent: Transition to next state
+ Round->>Agent: Switch agent if needed
+ end
+
+ Round->>Round: capture_last_snapshot()
+ Round-->>Session: Round complete
+
+ Session->>Interactor: new_request()
+ Interactor->>User: Continue or exit?
+ User->>Interactor: "N"
+ end
+
+ Session->>Session: evaluation()
+ Session->>Interactor: experience_asker()
+ Interactor->>User: Save experience?
+ User->>Interactor: Yes
+ Session->>Session: experience_saver()
+ Session-->>User: Session complete
+```
+
+---
+
+## File Structure
+
+```
+ufo/module/
+├── __init__.py
+├── basic.py # BaseSession, BaseRound, FileWriter
+├── context.py # Context, ContextNames
+├── dispatcher.py # Command dispatchers
+├── interactor.py # User interaction functions
+├── session_pool.py # SessionFactory, SessionPool
+└── sessions/
+    ├── __init__.py
+    ├── platform_session.py # WindowsBaseSession, LinuxBaseSession
+    ├── session.py # Session, FollowerSession, FromFileSession
+    ├── service_session.py # ServiceSession
+    ├── linux_session.py # LinuxSession, LinuxServiceSession
+    └── plan_reader.py # PlanReader for follower mode
+```
+
+---
+
+## Key Design Patterns
+
+### 1. State Pattern
+
+Agents use the State pattern to manage transitions and determine control flow.
+
+```python
+# Agent state determines:
+next_state = agent.state.next_state(agent)
+next_agent = agent.state.next_agent(agent)
+is_done = agent.state.is_round_end()
+```
+
+### 2. Factory Pattern
+
+SessionFactory creates appropriate session types based on platform and mode.
+
+### 3. Command Pattern
+
+Commands encapsulate actions with parameters, enabling async execution and result tracking.
+
+### 4. Observer Pattern
+
+Dependent components see `Context` changes implicitly through shared state rather than through explicit notifications.
+
+---
+
+## Best Practices
+
+!!!tip "Session Management"
+    - ✅ Always initialize context before creating rounds
+    - ✅ Use `SessionFactory` for session creation (handles platform differences)
+    - ✅ Attach command dispatcher to context early
+    - ✅ Call `context._sync_round_values()` before accessing round-specific data
+    - ❌ Don't access round context before round initialization
+
+!!!tip "Round Execution"
+    - ✅ Let the state machine control agent transitions
+    - ✅ Capture snapshots at subtask boundaries
+    - ✅ Check `is_finished()` before each iteration
+    - ❌ Don't bypass state transitions
+    - ❌ Don't manually manipulate agent states
+
+!!!tip "Context Usage"
+    - ✅ Use `ContextNames` enum for type-safe access
+    - ✅ Update dictionaries with `update_dict()` for merging
+    - ✅ Use properties (`current_round_cost`) for auto-synced values
+    - ❌ Don't directly access `_context` dictionary
+    - ❌ Don't store non-serializable objects without marking them
+
+!!!tip "Command Dispatch"
+    - ✅ Always await `execute_commands()` (async)
+    - ✅ Handle timeout exceptions gracefully
+    - ✅ Check `ResultStatus` before using results
+    - ❌ Don't ignore error results
+    - ❌ Don't assume commands succeed
+
+---
+
+## Configuration
+
+Key configuration options from `ufo_config`:
+
+| Setting | Location | Default | Purpose |
+|---------|----------|---------|---------|
+| `max_step` | `system.max_step` | 50 | Max steps per session |
+| `max_round` | `system.max_round` | 10 | Max rounds per session |
+| `eva_session` | `system.eva_session` | `True` | Evaluate session |
+| `eva_round` | `system.eva_round` | `False` | Evaluate each round |
+| `save_experience` | `system.save_experience` | `"ask"` | When to save experience |
+| `log_to_markdown` | `system.log_to_markdown` | `True` | Generate markdown logs |
+| `save_ui_tree` | `system.save_ui_tree` | `True` | Save UI tree snapshots |
+
+---
+
+## Documentation Index
+
+| Document | Description |
+|----------|-------------|
+| [Session](./session.md) | Session lifecycle and management |
+| [Round](./round.md) | Round execution and orchestration |
+| [Context](./context.md) | State management and context names |
+| [Dispatcher](./dispatcher.md) | Command routing and execution |
+| [Session Pool](./session_pool.md) | Factory and batch management |
+| [Platform Sessions](./platform_sessions.md) | Windows/Linux implementations |
+
+---
+
+## Next Steps
+
+**Learning Path:**
+
+1. **Understand Sessions**: Read [Session](./session.md) to grasp the conversation model
+2. **Learn Rounds**: Study [Round](./round.md) to understand action execution
+3. **Master Context**: Review [Context](./context.md) for state management
+4. **Explore Dispatch**: Check [Dispatcher](./dispatcher.md) for command execution
+5. **Platform Specifics**: See [Platform Sessions](./platform_sessions.md) for Windows/Linux differences
diff --git a/documents/docs/infrastructure/modules/platform_sessions.md b/documents/docs/infrastructure/modules/platform_sessions.md
new file mode 100644
index 000000000..0d4d12e7b
--- /dev/null
+++ b/documents/docs/infrastructure/modules/platform_sessions.md
@@ -0,0 +1,500 @@
+# Platform-Specific Sessions
+
+**WindowsBaseSession** and **LinuxBaseSession** provide platform-specific base classes with fundamentally different agent architectures: Windows uses two-tier (HostAgent + AppAgent), while Linux uses single-tier (LinuxAgent only).
+
+**Quick Reference:**
+
+- Windows sessions? See [WindowsBaseSession](#windowsbasesession)
+- Linux sessions? See [LinuxBaseSession](#linuxbasesession)
+- Differences? See [Architecture Comparison](#architecture-comparison)
+- Choosing platform? See [Platform Selection](#platform-selection)
+
+---
+
+## Overview
+
+Platform-specific base classes abstract OS-level differences:
+
+- **WindowsBaseSession**: Two-tier agent architecture with HostAgent coordination
+- **LinuxBaseSession**: Single-tier architecture with direct LinuxAgent control
+
+### Inheritance Hierarchy
+
+```mermaid
+graph TB
+ BASE[BaseSession Abstract Base]
+
+ WIN_BASE[WindowsBaseSession Windows Platform]
+ LINUX_BASE[LinuxBaseSession Linux Platform]
+
+ SESSION[Session]
+ SERVICE[ServiceSession]
+ FOLLOWER[FollowerSession]
+ FROMFILE[FromFileSession]
+ OPERATOR[OpenAIOperatorSession]
+
+ LINUX_SESS[LinuxSession]
+ LINUX_SERVICE[LinuxServiceSession]
+
+ BASE --> WIN_BASE
+ BASE --> LINUX_BASE
+
+ WIN_BASE --> SESSION
+ WIN_BASE --> SERVICE
+ WIN_BASE --> FOLLOWER
+ WIN_BASE --> FROMFILE
+ WIN_BASE --> OPERATOR
+
+ LINUX_BASE --> LINUX_SESS
+ LINUX_BASE --> LINUX_SERVICE
+
+ style BASE fill:#e1f5ff
+ style WIN_BASE fill:#fff4e1
+ style LINUX_BASE fill:#f0ffe1
+ style SESSION fill:#e1ffe1
+ style LINUX_SESS fill:#e1ffe1
+```
+
+---
+
+## WindowsBaseSession
+
+Windows sessions use **HostAgent** for application selection and task planning, then **AppAgent** for in-application execution. This provides a two-tier agent architecture.
+
+### Agent Initialization
+
+```python
+def _init_agents(self) -> None:
+    """Initialize Windows-specific agents, including the HostAgent."""
+
+    self._host_agent: HostAgent = AgentFactory.create_agent(
+        "host",
+        "HostAgent",
+        ufo_config.host_agent.visual_mode,
+        ufo_config.system.HOSTAGENT_PROMPT,
+        ufo_config.system.HOSTAGENT_EXAMPLE_PROMPT,
+        ufo_config.system.API_PROMPT,
+    )
+```
+
+**What's Created:**
+
+| Component | Type | Purpose |
+|-----------|------|---------|
+| `_host_agent` | `HostAgent` | Application selection and task coordination |
+| Visual Mode | `bool` | Enable screenshot-based reasoning |
+| Prompts | `str` | HostAgent behavior templates |
+
+### Two-Tier Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant U as User
+ participant S as WindowsBaseSession
+ participant H as HostAgent
+ participant A as AppAgent
+ participant UI as Windows UI
+
+ U->>S: Request: "Send email to John"
+ S->>H: Initialize HostAgent
+ H->>H: Observe desktop
+ H->>UI: Screenshot desktop
+ UI-->>H: Desktop image
+
+ H->>H: LLM Decision
+ Note over H: "Best app: Outlook"
+
+ H->>S: Select application: Outlook
+ S->>A: Create AppAgent for Outlook
+
+ A->>UI: Observe Outlook window
+ UI-->>A: Outlook screenshot + controls
+
+ A->>A: LLM Planning
+ Note over A: Plan:<br/>Click "New Email"<br/>Type recipient<br/>Type subject<br/>Click "Send"
+
+ loop Execute plan steps
+ A->>UI: Execute command
+ UI-->>A: Result
+ end
+
+ A->>S: Task complete
+ S->>U: Email sent
+```
+
+### Agent Switching Logic
+
+**HostAgent selects applications:**
+
+```python
+# HostAgent decision
+selected_app = host_agent.handle(context)
+# Result: "Outlook"
+
+# Session switches to AppAgent
+app_agent = create_app_agent("Outlook")
+context.set(ContextNames.APPLICATION_PROCESS_NAME, "OUTLOOK.EXE")
+```
+
+**AppAgent may request HostAgent:**
+
+```python
+# AppAgent determines that a different app is needed
+if need_different_app:
+    # Switch back to HostAgent
+    agent = host_agent
+    # HostAgent selects new app
+```
+
+### Reset Behavior
+
+```python
+def reset(self):
+    """Reset the session state for a new session."""
+    self._host_agent.set_state(self._host_agent.default_state)
+```
+
+**After reset:**
+- HostAgent returns to its initial state
+- Previous application selections are cleared
+- The session is ready for a new task
+
+---
+
+## LinuxBaseSession
+
+Linux sessions use **LinuxAgent** directly, without a HostAgent intermediary, providing a simpler but less flexible architecture. This is a single-tier model.
+
+### Agent Initialization
+
+```python
+def _init_agents(self) -> None:
+    """Initialize Linux-specific agents."""
+
+    # No host agent for Linux
+    self._host_agent = None
+
+    # Create LinuxAgent directly
+    self._agent: LinuxAgent = AgentFactory.create_agent(
+        "LinuxAgent",
+        "LinuxAgent",
+        ufo_config.system.third_party_agent_config["LinuxAgent"]["APPAGENT_PROMPT"],
+        ufo_config.system.third_party_agent_config["LinuxAgent"]["APPAGENT_EXAMPLE_PROMPT"],
+    )
+```
+
+**What's Created:**
+
+| Component | Type | Purpose |
+|-----------|------|---------|
+| `_host_agent` | `None` | **Not used in Linux** |
+| `_agent` | `LinuxAgent` | Direct application control |
+| Prompts | `str` | LinuxAgent behavior templates |
+
+### Single-Tier Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant U as User
+ participant S as LinuxBaseSession
+ participant L as LinuxAgent
+ participant UI as Linux UI
+
+ U->>S: Request: "Open gedit and type Hello"
+ S->>L: Initialize LinuxAgent
+
+ L->>UI: Observe desktop
+ UI-->>L: Desktop state
+
+ L->>L: LLM Decision
+ Note over L: "Launch gedit<br/>Type text"
+
+ L->>UI: Execute: launch gedit
+ UI-->>L: gedit opened
+
+ L->>UI: Execute: type "Hello"
+ UI-->>L: Text typed
+
+ L->>S: Task complete
+ S->>U: Done
+```
+
+**No Agent Switching:**
+
+- LinuxAgent handles entire workflow
+- Application specified upfront or agent decides
+- Simpler execution model
+
+### Feature Limitations
+
+Some methods are not yet implemented:
+
+```python
+def evaluation(self) -> None:
+    """Evaluation logic for Linux sessions."""
+    self.logger.warning("Evaluation not yet implemented for Linux sessions.")
+    pass
+
+def save_log_to_markdown(self) -> None:
+    """Save the log of the session to markdown file."""
+    self.logger.warning("Markdown logging not yet implemented for Linux sessions.")
+    pass
+```
+
+!!!warning "Coming Soon"
+    Full evaluation and markdown logging support for Linux sessions is planned for future releases.
+
+### Reset Behavior
+
+```python
+def reset(self) -> None:
+    """Reset the session state for a new session."""
+    self._agent.set_state(self._agent.default_state)
+```
+
+**After reset:**
+- LinuxAgent returns to its initial state
+- The session is ready for a new task
+
+---
+
+## Architecture Comparison
+
+### High-Level Differences
+
+```mermaid
+graph TB
+ subgraph "Windows Architecture (Two-Tier)"
+ WIN_USER[User Request]
+ WIN_HOST[HostAgent Application Selector]
+ WIN_APP1[AppAgent Word]
+ WIN_APP2[AppAgent Excel]
+ WIN_APP3[AppAgent Outlook]
+
+ WIN_USER --> WIN_HOST
+ WIN_HOST -->|Select app| WIN_APP1
+ WIN_HOST -->|Switch app| WIN_APP2
+ WIN_HOST -->|Switch app| WIN_APP3
+ end
+
+ subgraph "Linux Architecture (Single-Tier)"
+ LINUX_USER[User Request]
+ LINUX_AGENT[LinuxAgent Direct Control]
+ LINUX_APP[gedit/firefox/etc]
+
+ LINUX_USER --> LINUX_AGENT
+ LINUX_AGENT --> LINUX_APP
+ end
+
+ style WIN_HOST fill:#fff4e1
+ style WIN_APP1 fill:#e1ffe1
+ style LINUX_AGENT fill:#f0ffe1
+```
+
+### Feature Matrix
+
+| Feature | Windows | Linux | Notes |
+|---------|---------|-------|-------|
+| **HostAgent** | ✅ Yes | ❌ No | Windows uses HostAgent for app selection |
+| **AppAgent** | ✅ Yes | ❌ No | Windows creates AppAgent per application |
+| **LinuxAgent** | ❌ No | ✅ Yes | Linux uses LinuxAgent directly |
+| **Agent Switching** | ✅ Yes | ❌ No | Windows can switch between apps mid-task |
+| **Multi-App Tasks** | ✅ Native | ⚠️ Limited | Windows handles multi-app naturally |
+| **Execution Modes** | ✅ All 7 | ⚠️ 3 modes | Windows supports all modes |
+| **Evaluation** | ✅ Yes | 🚧 Planned | Linux evaluation in development |
+| **Markdown Logs** | ✅ Yes | 🚧 Planned | Linux markdown logging in development |
+| **UI Automation** | UIA | Platform tools | Different automation backends |
+
+### Execution Comparison
+
+**Windows Multi-Application Task:**
+
+```
+# Request: "Copy data from Excel to Word"
+
+# Round 1
+HostAgent: Select Excel → AppAgent(Excel): Copy data
+# Round 2
+HostAgent: Select Word → AppAgent(Word): Paste data
+
+# Agent switching handled automatically
+```
+
+**Linux Single-Application Task:**
+
+```
+# Request: "Open gedit and type text"
+
+# Single round
+LinuxAgent: Launch gedit → Type text
+
+# No agent switching, direct execution
+```
+
+---
+
+## Platform Selection
+
+### Automatic Detection
+
+`SessionFactory` detects the platform automatically:
+
+```python
+from ufo.module.session_pool import SessionFactory
+
+factory = SessionFactory()
+
+# Auto-detects the platform: "windows" or "linux"
+sessions = factory.create_session(
+    task="cross_platform_task",
+    mode="normal",
+    plan="",
+    request="Open text editor"
+)
+
+# The matching base class is selected automatically:
+# - Windows: Session extends WindowsBaseSession
+# - Linux: LinuxSession extends LinuxBaseSession
+```
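+
+As a rough illustration of the detection step, the sketch below normalizes `platform.system()` to the `"windows"`/`"linux"` strings mentioned above. The helper name `detect_platform` and its `override` parameter are hypothetical and are not taken from the `SessionFactory` source; the real factory may detect the platform differently.
+
+```python
+import platform
+from typing import Optional
+
+
+def detect_platform(override: Optional[str] = None) -> str:
+    """Return "windows" or "linux" (hypothetical helper, not UFO's API)."""
+    # An explicit value wins over detection; otherwise ask the standard library.
+    name = (override or platform.system()).lower()
+    if name not in ("windows", "linux"):
+        raise ValueError(f"Unsupported platform: {name}")
+    return name
+
+
+# Explicit value, normalized to lowercase before validation.
+print(detect_platform("Linux"))
+```
+
+Raising on unknown platforms (for example macOS, where `platform.system()` returns `"Darwin"`) is a design choice of this sketch, made so that an unsupported OS fails early rather than silently picking a session class.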
+
+### Manual Override
+
+For testing or special cases:
+
+```python
+# Force Windows session on Linux machine
+sessions = factory.create_session(
+    task="test_task",
+    mode="normal",
+    plan="",
+    request="Test request",
+    platform_override="windows"
+)
+
+# Force Linux session on Windows machine
+sessions = factory.create_session(
+    task="test_task",
+    mode="normal",
+    plan="",
+    request="Test request",
+    platform_override="linux"
+)
+```
+
+!!!warning "Override Use Cases"
+    Only use `platform_override` for:
+
+    - Testing cross-platform code
+    - Developing without access to the target OS
+    - Generating plans for other platforms
+
+    Never use it in production!
+
+---
+
+## Migration Guide
+
+### Porting Tasks Windows → Linux
+
+**Considerations:**
+
+1. **No HostAgent**: Specify application upfront or in request
+2. **Single-tier**: Cannot switch applications mid-task
+3. **Limited modes**: Only `normal`, `normal_operator`, `service`
+
+**Example:**
+
+**Windows Request:**
+```python
+"Send an email to John and create a calendar event"
+# HostAgent selects Outlook → AppAgent sends email
+# HostAgent switches to Calendar → AppAgent creates event
+```
+
+**Linux Request (Split):**
+```python
+# Request 1: Email only
+"Send an email to John using Thunderbird"
+# LinuxAgent(Thunderbird): Send email
+
+# Request 2: Calendar separately
+"Create a calendar event in GNOME Calendar"
+# LinuxAgent(Calendar): Create event
+```
+
+### Configuration Differences
+
+**Windows Configuration:**
+
+```yaml
+# config/ufo/config.yaml
+host_agent:
+  visual_mode: true
+system:
+  HOSTAGENT_PROMPT: "prompts/host_agent.yaml"
+  APPAGENT_PROMPT: "prompts/app_agent.yaml"
+```
+
+**Linux Configuration:**
+
+```yaml
+# config/ufo/config.yaml
+system:
+  third_party_agent_config:
+    LinuxAgent:
+      APPAGENT_PROMPT: "prompts/linux_agent.yaml"
+      APPAGENT_EXAMPLE_PROMPT: "prompts/linux_examples.yaml"
+```
+
+---
+
+## Best Practices
+
+### Windows Sessions
+
+!!!tip "Leverage Two-Tier Architecture"
+    - ✅ Use HostAgent for complex multi-app workflows
+    - ✅ Let HostAgent decide application selection
+    - ✅ Design tasks that benefit from app switching
+    - ❌ Don't micromanage app selection
+    - ❌ Don't bypass HostAgent for multi-app tasks
+
+### Linux Sessions
+
+!!!success "Work Within Single-Tier Model"
+    - ✅ Specify the application in the request if known
+    - ✅ Keep tasks focused on a single application
+    - ✅ Split multi-app workflows into multiple sessions
+    - ❌ Don't expect automatic app switching
+    - ❌ Don't assume HostAgent features are available
+
+### Cross-Platform Development
+
+!!!warning "Platform Awareness"
+    - ✅ Test on both platforms if deploying cross-platform
+    - ✅ Use platform detection, not hardcoded assumptions
+    - ✅ Handle platform-specific features gracefully
+    - ✅ Document platform limitations
+    - ❌ Don't assume identical behavior
+    - ❌ Don't use `platform_override` in production
+
+---
+
+## Reference
+
+### WindowsBaseSession
+
+::: module.sessions.platform_session.WindowsBaseSession
+
+### LinuxBaseSession
+
+::: module.sessions.platform_session.LinuxBaseSession
+
+---
+
+## See Also
+
+- [Session](./session.md) - Session lifecycle and types
+- [Session Factory](./session_pool.md) - Platform-aware session creation
+- [Overview](./overview.md) - Module system architecture
+- [Round](./round.md) - Agent orchestration in rounds
diff --git a/documents/docs/infrastructure/modules/round.md b/documents/docs/infrastructure/modules/round.md
new file mode 100644
index 000000000..10c9c46c2
--- /dev/null
+++ b/documents/docs/infrastructure/modules/round.md
@@ -0,0 +1,698 @@
+# Round
+
+A **Round** is a single request-response cycle within a Session, orchestrating agents through a state machine to execute commands until the user's request is fulfilled.
+
+**Quick Reference:**
+
+- Lifecycle? See [Round Lifecycle](#round-lifecycle)
+- State machine? See [State Machine](#state-machine)
+- Agent switching? See [Agent Orchestration](#agent-orchestration)
+- Snapshots? See [Snapshot Capture](#snapshot-capture)
+
+---
+
+## Overview
+
+A `Round` represents one complete request-response interaction:
+
+- **Input**: User request (e.g., "Send an email to John")
+- **Processing**: Agent state machine execution
+- **Output**: Request fulfilled or error state
+
+### Round in Context
+
+```mermaid
+graph TB
+ subgraph "Session Scope"
+ SESS[Session]
+ REQ1[Request 1]
+ REQ2[Request 2]
+ REQ3[Request 3]
+ end
+
+ subgraph "Round Scope (One Request)"
+ ROUND[Round Instance]
+ CTX[Shared Context]
+ INIT[Initialize]
+ LOOP[Execution Loop]
+ FINISH[Finish Condition]
+ end
+
+ subgraph "Execution Loop Detail"
+ HANDLE[agent.handle Generate & Execute]
+ NEXT_STATE[next_state State Transition]
+ NEXT_AGENT[next_agent Agent Switching]
+ SUBTASK{Subtask End?}
+ SNAPSHOT[capture_last_snapshot]
+ end
+
+ SESS --> REQ1
+ SESS --> REQ2
+ SESS --> REQ3
+
+ REQ1 --> ROUND
+ ROUND --> CTX
+ ROUND --> INIT
+ INIT --> LOOP
+
+ LOOP --> HANDLE
+ HANDLE --> NEXT_STATE
+ NEXT_STATE --> NEXT_AGENT
+ NEXT_AGENT --> SUBTASK
+
+ SUBTASK -->|Yes| SNAPSHOT
+ SNAPSHOT --> FINISH
+ SUBTASK -->|No| FINISH
+
+ FINISH -->|Not finished| HANDLE
+ FINISH -->|Finished| REQ2
+
+ style ROUND fill:#e1f5ff
+ style HANDLE fill:#f0ffe1
+ style SNAPSHOT fill:#fff4e1
+ style FINISH fill:#ffe1f5
+```
+
+---
+
+## Round Lifecycle
+
+### State Machine Overview
+
+```mermaid
+stateDiagram-v2
+ [*] --> Initialized: create_new_round()
+ Initialized --> Running: run()
+
+ Running --> AgentHandle: agent.handle(context)
+ AgentHandle --> StateTransition: generate actions
+ StateTransition --> AgentSwitch: determine next
+ AgentSwitch --> SubtaskCheck: update agent
+
+ SubtaskCheck --> CaptureSnapshot: if subtask_end
+ SubtaskCheck --> FinishCheck: if not subtask_end
+ CaptureSnapshot --> FinishCheck: snapshot saved
+
+ FinishCheck --> AgentHandle: not finished
+ FinishCheck --> FinalSnapshot: finished
+
+ FinalSnapshot --> Evaluation: if enabled
+ Evaluation --> [*]: round complete
+ FinalSnapshot --> [*]: skip evaluation
+```
+
+### Core Execution Loop
+
+```python
+async def run(self) -> None:
+    """
+    Run the round asynchronously.
+    """
+
+    while not self.is_finished():
+        # 1. Agent processes current state
+        await self.agent.handle(self.context)
+
+        # 2. State machine transitions
+        self.state = self.agent.state.next_state(self.agent)
+
+        # 3. Agent switching (HostAgent ↔ AppAgent)
+        self.agent = self.agent.state.next_agent(self.agent)
+        self.agent.set_state(self.state)
+
+        # 4. Snapshot capture at subtask boundaries
+        if self.state.is_subtask_end():
+            time.sleep(configs["SLEEP_TIME"])
+            await self.capture_last_snapshot(sub_round_id=self.subtask_amount)
+            self.subtask_amount += 1
+
+    # 5. Add request to blackboard
+    self.agent.blackboard.add_requests(
+        {f"request_{self.id}": self.request}
+    )
+
+    # 6. Final snapshot
+    if self.application_window is not None:
+        await self.capture_last_snapshot()
+
+    # 7. Evaluation (optional)
+    if self._should_evaluate:
+        await self.evaluation()
+```
+
+---
+
+## Lifecycle Stages
+
+### 1. Initialization
+
+Created by session's `create_new_round()`:
+
+```python
+round = Round(
+    task="email_task",
+    context=session.context,
+    request="Send an email to John",
+    id=0  # Round number
+)
+```
+
+**Initialization sets:**
+
+| Property | Source | Description |
+|----------|--------|-------------|
+| `task` | Session | Task name for logging |
+| `context` | Session | Shared context object |
+| `request` | User input | Natural language request |
+| `id` | Round counter | Sequential round number |
+| `agent` | Initial agent | Usually HostAgent (Windows) or LinuxAgent |
+| `state` | Initial state | Usually START state |
+
+### 2. Agent Handle
+
+Each loop iteration calls `agent.handle(context)`:
+
+```python
+await self.agent.handle(self.context)
+```
+
+**What happens:**
+
+1. **Observation**: Agent observes UI state
+2. **Reasoning**: LLM generates plan and actions
+3. **Action**: Commands sent to dispatcher
+4. **Execution**: Commands executed locally or remotely
+5. **Results**: Results stored in context
+
+**Example Flow:**
+
+```mermaid
+sequenceDiagram
+ participant R as Round
+ participant A as Agent (HostAgent)
+ participant LLM as Language Model
+ participant D as Dispatcher
+ participant UI as UI System
+
+ R->>A: handle(context)
+ A->>UI: Observe desktop
+ UI-->>A: Screenshot + control tree
+
+ A->>LLM: Generate plan
+ Note over LLM: Request: "Send email to John"<br/>Observation: Desktop with Outlook icon
+ LLM-->>A: Action: open_application("Outlook")
+
+ A->>D: execute_commands([open_app_cmd])
+ D->>UI: Click Outlook icon
+ UI-->>D: Result: Outlook opened
+ D-->>A: ResultStatus.SUCCESS
+
+ A->>R: Update context with results
+```
+
+### 3. State Transition
+
+After agent handling, state machine transitions:
+
+```python
+self.state = self.agent.state.next_state(self.agent)
+```
+
+**State Transitions:**
+
+| Current State | Condition | Next State |
+|---------------|-----------|------------|
+| **START** | Initial | **CONTINUE** |
+| **CONTINUE** | More actions needed | **CONTINUE** |
+| **CONTINUE** | Task complete | **FINISH** |
+| **CONTINUE** | Error occurred | **ERROR** |
+| **FINISH** | Always | Round ends |
+| **ERROR** | Always | Round ends |
+
+**State Diagram:**
+
+```mermaid
+stateDiagram-v2
+ [*] --> START
+ START --> CONTINUE: First action
+ CONTINUE --> CONTINUE: More actions
+ CONTINUE --> FINISH: Task complete
+ CONTINUE --> ERROR: Error occurred
+ FINISH --> [*]
+ ERROR --> [*]
+```
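+
+The transition table above can be written down as a small lookup structure. The following is a minimal sketch, not the framework's implementation: `TRANSITIONS`, `can_transition` and `is_terminal` are illustrative names, and the `AgentState` enum is redefined locally so the snippet runs on its own.
+
+```python
+from enum import Enum
+
+
+class AgentState(Enum):
+    START = "START"
+    CONTINUE = "CONTINUE"
+    FINISH = "FINISH"
+    ERROR = "ERROR"
+
+
+# Allowed transitions, copied from the table above.
+TRANSITIONS = {
+    AgentState.START: {AgentState.CONTINUE},
+    AgentState.CONTINUE: {AgentState.CONTINUE, AgentState.FINISH, AgentState.ERROR},
+    AgentState.FINISH: set(),  # terminal: the round ends
+    AgentState.ERROR: set(),  # terminal: the round ends
+}
+
+
+def can_transition(current: AgentState, target: AgentState) -> bool:
+    """Return True if the table allows moving from `current` to `target`."""
+    return target in TRANSITIONS[current]
+
+
+def is_terminal(state: AgentState) -> bool:
+    """A state with no outgoing transitions ends the round."""
+    return not TRANSITIONS[state]
+```
+
+Keeping the table as data makes it easy to check that `FINISH` and `ERROR` have no outgoing edges, which is the property the round loop relies on to terminate.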
+
+### 4. Agent Switching
+
+Determine which agent handles next step:
+
+```python
+self.agent = self.agent.state.next_agent(self.agent)
+self.agent.set_state(self.state)
+```
+
+**Agent Switching Logic (Windows):**
+
+| Current Agent | Condition | Next Agent |
+|---------------|-----------|------------|
+| **HostAgent** | Application selected | **AppAgent** |
+| **AppAgent** | Need different app | **HostAgent** |
+| **AppAgent** | Same app continues | **AppAgent** |
+| **HostAgent** | Task complete | **HostAgent** (finish) |
+
+**Agent Switching Logic (Linux):**
+
+| Current Agent | Condition | Next Agent |
+|---------------|-----------|------------|
+| **LinuxAgent** | Always | **LinuxAgent** (no switching) |
+
+**Switching Example:**
+
+```mermaid
+sequenceDiagram
+ participant R as Round
+ participant H as HostAgent
+ participant A as AppAgent
+
+ R->>H: handle() - Select app
+ H-->>R: Application: Outlook
+
+ Note over R: Agent switch: HostAgent → AppAgent
+
+ R->>A: handle() - Compose email
+ A-->>R: Commands executed
+
+ R->>A: handle() - Send email
+ A-->>R: Task complete
+
+ Note over R: State: FINISH
+```
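+
+The two switching tables can be condensed into one small function. This is a sketch only: the string agent names and the `app_selected` / `needs_other_app` flags are illustrative assumptions, while the real decision is made by each state's `next_agent()` method.
+
+```python
+def next_agent_name(
+    current: str,
+    app_selected: bool = False,
+    needs_other_app: bool = False,
+) -> str:
+    """Return the name of the agent that handles the next step (hypothetical helper)."""
+    if current == "LinuxAgent":
+        # Linux is single-tier: the same agent always continues.
+        return "LinuxAgent"
+    if current == "HostAgent":
+        # HostAgent hands over once an application has been selected.
+        return "AppAgent" if app_selected else "HostAgent"
+    if current == "AppAgent":
+        # AppAgent returns control only when a different app is needed.
+        return "HostAgent" if needs_other_app else "AppAgent"
+    raise ValueError(f"Unknown agent: {current}")
+```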
+
+### 5. Subtask Boundary Capture
+
+Capture snapshot when subtask ends:
+
+```python
+if self.state.is_subtask_end():
+    time.sleep(configs["SLEEP_TIME"])  # Let UI settle
+    await self.capture_last_snapshot(sub_round_id=self.subtask_amount)
+    self.subtask_amount += 1
+```
+
+**Subtask End Conditions:**
+
+- Agent switched (HostAgent ↔ AppAgent)
+- Major UI change detected
+- Explicit subtask boundary in plan
+
+**Captured Data:**
+
+1. **Window screenshot**: `action_round_{id}_sub_round_{sub_id}_final.png`
+2. **UI tree** (if enabled): `ui_tree_round_{id}_sub_round_{sub_id}_final.json`
+3. **Desktop screenshot** (if enabled): `desktop_round_{id}_sub_round_{sub_id}_final.png`
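+
+The naming scheme for these files can be sketched as a small helper. `snapshot_names` is a hypothetical function written for illustration; it only reproduces the file patterns listed in this document and does not show how the framework builds its paths.
+
+```python
+from typing import Dict, Optional
+
+
+def snapshot_names(round_id: int, sub_round_id: Optional[int] = None) -> Dict[str, str]:
+    """Build snapshot file names for a round, or for one subtask within it."""
+    suffix = f"round_{round_id}"
+    if sub_round_id is not None:
+        # Subtask snapshots carry the sub-round index as well.
+        suffix += f"_sub_round_{sub_round_id}"
+    return {
+        "window": f"action_{suffix}_final.png",
+        "ui_tree": f"ui_tree_{suffix}_final.json",
+        "desktop": f"desktop_{suffix}_final.png",
+    }
+
+
+print(snapshot_names(0, sub_round_id=1)["window"])
+```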
+
+### 6. Finish Check
+
+```python
+def is_finished(self) -> bool:
+    """Check if round is complete."""
+    return self.state in [AgentState.FINISH, AgentState.ERROR]
+```
+
+Loop continues until state is `FINISH` or `ERROR`.
+
+### 7. Final Snapshot
+
+After loop exits:
+
+```python
+if self.application_window is not None:
+    await self.capture_last_snapshot()
+```
+
+**Final snapshot** captures the end state of the application for logging and evaluation.
+
+### 8. Evaluation
+
+Optional evaluation of round success:
+
+```python
+if self._should_evaluate:
+    await self.evaluation()
+```
+
+**Evaluation checks:**
+- Was the request fulfilled?
+- Quality of actions taken
+- Efficiency metrics
+
+---
+
+## State Machine
+
+### AgentState Enum
+
+```python
+class AgentState(Enum):
+    START = "START"
+    CONTINUE = "CONTINUE"
+    FINISH = "FINISH"
+    ERROR = "ERROR"
+```
+
+### State Behaviors
+
+| State | Meaning | Transitions To |
+|-------|---------|----------------|
+| **START** | Initial state | CONTINUE |
+| **CONTINUE** | Actively processing | CONTINUE, FINISH, ERROR |
+| **FINISH** | Successfully complete | Round ends |
+| **ERROR** | Fatal error occurred | Round ends |
+
+### State Methods
+
+Each state implements:
+
+```python
+class StateInterface:
+    def next_state(self, agent) -> AgentState:
+        """Determine next state based on agent's decision."""
+        pass
+
+    def next_agent(self, agent) -> Agent:
+        """Determine next agent to handle the request."""
+        pass
+
+    def is_subtask_end(self) -> bool:
+        """Check if current state marks subtask boundary."""
+        pass
+```
+
+---
+
+## Agent Orchestration
+
+### Windows Two-Tier Architecture
+
+```mermaid
+sequenceDiagram
+ participant U as User Request
+ participant R as Round
+ participant H as HostAgent
+ participant A as AppAgent
+ participant UI as UI System
+
+ U->>R: "Send email to John"
+ R->>H: handle() - Select application
+ H->>UI: Observe desktop
+ UI-->>H: Screenshot of desktop
+ H->>H: Decide: Outlook
+ H-->>R: Switch to AppAgent for Outlook
+
+ R->>A: handle() - Compose email
+ A->>UI: Observe Outlook window
+ UI-->>A: Screenshot + control tree
+ A->>A: Plan: Click "New Email"
+ A->>UI: Click command
+ UI-->>A: New email window opened
+ A-->>R: Continue
+
+ R->>A: handle() - Fill recipient
+ A->>UI: Type "john@example.com"
+ UI-->>A: Recipient filled
+ A-->>R: Continue
+
+ R->>A: handle() - Click Send
+ A->>UI: Click "Send" button
+ UI-->>A: Email sent
+ A-->>R: Finish
+
+ R-->>U: Request complete
+```
+
+### Linux Single-Tier Architecture
+
+```mermaid
+sequenceDiagram
+ participant U as User Request
+ participant R as Round
+ participant L as LinuxAgent
+ participant UI as UI System
+
+ U->>R: "Open gedit and type Hello"
+ R->>L: handle() - Open application
+ L->>UI: Observe desktop
+ UI-->>L: Desktop state
+ L->>L: Plan: Open gedit
+ L->>UI: Launch gedit command
+ UI-->>L: gedit opened
+ L-->>R: Continue
+
+ R->>L: handle() - Type text
+ L->>UI: Type "Hello"
+ UI-->>L: Text typed
+ L-->>R: Finish
+
+ R-->>U: Request complete
+```
+
+---
+
+## Snapshot Capture
+
+### capture_last_snapshot()
+
+```python
+async def capture_last_snapshot(self, sub_round_id: Optional[int] = None) -> None
+```
+
+**Purpose**: Capture UI state for logging, debugging, and evaluation.
+
+**Captured Artifacts:**
+
+| Artifact | File Pattern | Purpose |
+|----------|--------------|---------|
+| **Window Screenshot** | `action_round_{id}_final.png` | Visual state |
+| **Subtask Screenshot** | `action_round_{id}_sub_round_{sub_id}_final.png` | Subtask boundary |
+| **UI Tree** | `ui_tree_round_{id}_final.json` | Control structure |
+| **Desktop Screenshot** | `desktop_round_{id}_final.png` | Full desktop (if enabled) |
+
+**Example Output:**
+
+```
+logs/task_name/
+├── action_round_0_sub_round_0_final.png ← After HostAgent selects Outlook
+├── action_round_0_sub_round_1_final.png ← After AppAgent composes email
+├── action_round_0_final.png ← Final state after sending
+├── ui_trees/
+│ ├── ui_tree_round_0_sub_round_0_final.json
+│ ├── ui_tree_round_0_sub_round_1_final.json
+│ └── ui_tree_round_0_final.json
+└── desktop_round_0_final.png
+```
+
+### save_ui_tree()
+
+```python
+async def save_ui_tree(self, save_path: str)
+```
+
+Saves the control tree as JSON for analysis:
+
+```json
+{
+  "root": {
+    "control_type": "Window",
+    "name": "Outlook",
+    "children": [
+      {
+        "control_type": "Button",
+        "name": "New Email",
+        "automation_id": "btn_new_email",
+        "bounding_box": [100, 50, 150, 30]
+      }
+    ]
+  }
+}
+```
+
+---
+
+## Properties
+
+### Auto-Syncing Properties
+
+Properties that sync with context automatically:
+
+```python
+@property
+def step(self) -> int:
+    """Current step number in this round."""
+    return self._context.get(ContextNames.ROUND_STEP).get(self.id, 0)
+
+@property
+def cost(self) -> float:
+    """Total cost for this round."""
+    return self._context.get(ContextNames.ROUND_COST).get(self.id, 0)
+
+@property
+def subtask_amount(self) -> int:
+    """Number of subtasks completed."""
+    return self._context.get(ContextNames.ROUND_SUBTASK_AMOUNT).get(self.id, 0)
+
+@subtask_amount.setter
+def subtask_amount(self, value: int) -> None:
+    """Set subtask amount in context."""
+    self._context.current_round_subtask_amount = value
+```
+
+### Static Properties
+
+```python
+@property
+def request(self) -> str:
+    """User request for this round."""
+    return self._request
+
+@property
+def id(self) -> int:
+    """Round number (sequential)."""
+    return self._id
+
+@property
+def context(self) -> Context:
+    """Shared context object."""
+    return self._context
+```
+
+---
+
+## Cost Tracking
+
+### print_cost()
+
+Display round cost after completion:
+
+```python
+def print_cost(self) -> None:
+    """Print the total cost of the round."""
+
+    total_cost = self.cost
+    if isinstance(total_cost, float):
+        formatted_cost = "${:.2f}".format(total_cost)
+        console.print(
+            f"💰 Request total cost for current round is {formatted_cost}",
+            style="yellow",
+        )
+```
+
+**Output Example:**
+
+```
+💰 Request total cost for current round is $0.42
+```
+
+**Cost Components:**
+
+- LLM API calls (HostAgent + AppAgent)
+- Vision model calls (screenshot analysis)
+- Embedding model calls (if used)
+
+---
+
+## Error Handling
+
+### Error States
+
+Rounds can end in error state:
+
+```python
+if agent_fails:
+    self.state = AgentState.ERROR
+    # Round exits loop with ERROR state
+```
+
+### Common Error Scenarios
+
+| Error Type | Trigger | Handling |
+|------------|---------|----------|
+| **Timeout** | Command execution timeout | Set ERROR state |
+| **Agent Failure** | LLM returns invalid plan | Set ERROR state |
+| **UI Not Found** | Element doesn't exist | Retry or ERROR |
+| **Connection Lost** | Dispatcher disconnected | Set ERROR state |
+
+### Error Recovery
+
+```python
+try:
+    await self.agent.handle(self.context)
+except AgentError as e:
+    logger.error(f"Agent handle failed: {e}")
+    self.state = AgentState.ERROR
+    # Loop exits
+```
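+
+To make the recovery pattern concrete, here is a self-contained sketch that can be run without the framework. `AgentError` is defined locally as a stand-in exception, and `run_once`, `failing_handle` and `ok_handle` are hypothetical names; plain strings replace the real state objects.
+
+```python
+import asyncio
+import logging
+
+logger = logging.getLogger("round_sketch")
+
+
+class AgentError(Exception):
+    """Stand-in for the framework's agent failure exception."""
+
+
+async def failing_handle(context: dict) -> None:
+    # Simulates an agent whose LLM call returned an unusable plan.
+    raise AgentError("LLM returned an invalid plan")
+
+
+async def ok_handle(context: dict) -> None:
+    # Simulates a step that completed without problems.
+    return None
+
+
+async def run_once(handle, context: dict) -> str:
+    """Run one handle step and map a failure to the ERROR state."""
+    try:
+        await handle(context)
+    except AgentError as exc:
+        logger.error("Agent handle failed: %s", exc)
+        return "ERROR"
+    return "CONTINUE"
+
+
+print(asyncio.run(run_once(failing_handle, {})))
+```
+
+Catching only `AgentError` is deliberate in this sketch: unexpected exceptions still propagate, so programming errors are not silently converted into an `ERROR` state.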
+
+---
+
+## Configuration
+
+### Round Behavior Settings
+
+| Setting | Type | Purpose |
+|---------|------|---------|
+| `eva_round` | `bool` | Enable round evaluation |
+| `SLEEP_TIME` | `float` | Wait time before snapshot (seconds) |
+| `save_ui_tree` | `bool` | Save UI trees |
+| `save_full_screen` | `bool` | Save desktop screenshots |
+
+**Example Configuration:**
+
+```yaml
+# config/ufo/config.yaml
+system:
+  eva_round: true
+  SLEEP_TIME: 0.5
+  save_ui_tree: true
+  save_full_screen: false
+```
+
+---
+
+## Best Practices
+
+### Efficient Round Execution
+
+!!!tip "Performance Tips"
+    - ✅ Keep agent prompts concise
+    - ✅ Use appropriate timeouts for commands
+    - ✅ Disable full desktop screenshots unless needed
+    - ✅ Capture UI trees only for debugging
+    - ❌ Don't set `SLEEP_TIME` too high
+    - ❌ Don't enable all logging in production
+
+### State Machine Design
+
+!!!success "Clean State Management"
+    - ✅ Each state should have a clear purpose
+    - ✅ Transitions should be deterministic
+    - ✅ Error states should be terminal
+    - ✅ Subtask boundaries should be meaningful
+    - ❌ Don't create circular state loops
+    - ❌ Don't mix state logic with business logic
+
+---
+
+## Reference
+
+### BaseRound
+
+::: module.basic.BaseRound
+
+---
+
+## See Also
+
+- [Session](./session.md) - Multi-round conversation management
+- [Context](./context.md) - Shared state across rounds
+- [Dispatcher](./dispatcher.md) - Command execution
+- [Overview](./overview.md) - Module system architecture
\ No newline at end of file
diff --git a/documents/docs/infrastructure/modules/session.md b/documents/docs/infrastructure/modules/session.md
new file mode 100644
index 000000000..0934d7e07
--- /dev/null
+++ b/documents/docs/infrastructure/modules/session.md
@@ -0,0 +1,885 @@
+# Session
+
+A **Session** is a continuous conversation instance between the user and UFO, managing multiple rounds of interaction from initial request to task completion across different execution modes and platforms.
+
+**Quick Reference:**
+
+- Session types? See [Session Types](#session-types)
+- Lifecycle? See [Session Lifecycle](#session-lifecycle)
+- Mode differences? See [Execution Modes](#execution-modes)
+- Platform differences? See [Platform-Specific Sessions](#platform-specific-sessions)
+
+---
+
+## Overview
+
+A `Session` represents a complete conversation workflow, containing one or more `Rounds` of agent execution. Sessions manage:
+
+1. **Context**: Shared state across all rounds
+2. **Agents**: HostAgent and AppAgent (or LinuxAgent)
+3. **Rounds**: Individual request-response cycles
+4. **Evaluation**: Optional task completion assessment
+5. **Experience**: Learning from successful workflows
+
+### Relationship: Session vs Round
+
+```mermaid
+graph TB
+ subgraph "Session (Conversation)"
+ S[Session Instance]
+ CTX[Context Shared State]
+ R1[Round 1 Request 1]
+ R2[Round 2 Request 2]
+ R3[Round 3 Request 3]
+ EVAL[Evaluation Optional]
+ end
+
+ subgraph "Round 1 Details"
+ HOST1[HostAgent]
+ APP1[AppAgent]
+ CMD1[Commands]
+ end
+
+ subgraph "Round 2 Details"
+ HOST2[HostAgent]
+ APP2[AppAgent]
+ CMD2[Commands]
+ end
+
+ S --> CTX
+ S --> R1
+ S --> R2
+ S --> R3
+ S --> EVAL
+
+ R1 -.shares.-> CTX
+ R2 -.shares.-> CTX
+ R3 -.shares.-> CTX
+
+ R1 --> HOST1
+ HOST1 --> APP1
+ APP1 --> CMD1
+
+ R2 --> HOST2
+ HOST2 --> APP2
+ APP2 --> CMD2
+
+ style S fill:#e1f5ff
+ style CTX fill:#fff4e1
+ style R1 fill:#f0ffe1
+ style R2 fill:#f0ffe1
+ style R3 fill:#f0ffe1
+ style EVAL fill:#ffe1f5
+```
+
+---
+
+## Session Types
+
+UFO supports **7 session types** across Windows and Linux platforms:
+
+| Session Type | Platform | Mode | Description |
+|--------------|----------|------|-------------|
+| **Session** | Windows | `normal`, `normal_operator` | Interactive with HostAgent |
+| **ServiceSession** | Windows | `service` | WebSocket-controlled via AIP |
+| **FollowerSession** | Windows | `follower` | Replays saved plans |
+| **FromFileSession** | Windows | `batch_normal` | Executes from request files |
+| **OpenAIOperatorSession** | Windows | `operator` | Pure operator mode |
+| **LinuxSession** | Linux | `normal`, `normal_operator` | Interactive without HostAgent |
+| **LinuxServiceSession** | Linux | `service` | WebSocket-controlled on Linux |
+
+### Class Hierarchy
+
+```mermaid
+graph TB
+ BASE[BaseSession Abstract]
+
+ WIN_BASE[WindowsBaseSession with HostAgent]
+ LINUX_BASE[LinuxBaseSession without HostAgent]
+
+ SESSION[Session Interactive]
+ SERVICE[ServiceSession WebSocket]
+ FOLLOWER[FollowerSession Plan Replay]
+ FROMFILE[FromFileSession Batch]
+ OPERATOR[OpenAIOperatorSession Operator]
+
+ LINUX_SESS[LinuxSession Interactive]
+ LINUX_SERVICE[LinuxServiceSession WebSocket]
+
+ BASE --> WIN_BASE
+ BASE --> LINUX_BASE
+
+ WIN_BASE --> SESSION
+ WIN_BASE --> SERVICE
+ WIN_BASE --> FOLLOWER
+ WIN_BASE --> FROMFILE
+ WIN_BASE --> OPERATOR
+
+ LINUX_BASE --> LINUX_SESS
+ LINUX_BASE --> LINUX_SERVICE
+
+ style BASE fill:#e1f5ff
+ style WIN_BASE fill:#fff4e1
+ style LINUX_BASE fill:#f0ffe1
+ style SESSION fill:#e1ffe1
+ style LINUX_SESS fill:#e1ffe1
+```
+
+!!!note "Platform Base Classes"
+    - `WindowsBaseSession`: Creates HostAgent, supports two-tier architecture
+    - `LinuxBaseSession`: Single-tier architecture with LinuxAgent only
+
+---
+
+## Session Lifecycle
+
+### Standard Lifecycle
+
+```mermaid
+stateDiagram-v2
+ [*] --> Initialized: __init__
+ Initialized --> ContextReady: _init_context
+ ContextReady --> Running: run()
+
+ Running --> RoundCreate: create_new_round
+ RoundCreate --> RoundExecute: round.run()
+ RoundExecute --> RoundComplete: Round finishes
+
+ RoundComplete --> CheckMore: is_finished?
+ CheckMore --> RoundCreate: More requests
+ CheckMore --> Snapshot: No more requests
+
+ Snapshot --> Evaluation: capture_last_snapshot
+ Evaluation --> CostPrint: evaluation() if enabled
+ CostPrint --> [*]: Session complete
+```
+
+### Core Execution Loop
+
+The main session logic:
+
+```python
+async def run(self) -> None:
+    """
+    Run the session.
+    """
+
+    while not self.is_finished():
+        # Create new round for each request
+        round = self.create_new_round()
+        if round is None:
+            break
+
+        # Execute the round
+        await round.run()
+
+    # Capture final state
+    if self.application_window is not None:
+        await self.capture_last_snapshot()
+
+    # Evaluate if configured
+    if self._should_evaluate and not self.is_error():
+        await self.evaluation()
+
+    # Print cost summary
+    self.print_cost()
+```
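+
+The control flow of this loop can be tried in isolation with stub classes. The sketch below is an assumption-laden miniature: `StubSession` and `StubRound` are hypothetical stand-ins that keep only the loop structure (create a round, stop on `None`, run the round) and omit agents, snapshots, evaluation and cost reporting.
+
+```python
+import asyncio
+from typing import List, Optional
+
+
+class StubRound:
+    """Stand-in for Round: records that it ran instead of driving agents."""
+
+    def __init__(self, request: str, log: List[str]) -> None:
+        self.request = request
+        self._log = log
+
+    async def run(self) -> None:
+        self._log.append(f"ran: {self.request}")
+
+
+class StubSession:
+    """Stand-in for Session that keeps only the loop structure."""
+
+    def __init__(self, requests: List[str]) -> None:
+        self._pending = list(requests)
+        self.log: List[str] = []
+
+    def is_finished(self) -> bool:
+        # Finished once every queued request has been turned into a round.
+        return not self._pending
+
+    def create_new_round(self) -> Optional[StubRound]:
+        if not self._pending:
+            return None
+        return StubRound(self._pending.pop(0), self.log)
+
+    async def run(self) -> None:
+        while not self.is_finished():
+            round_ = self.create_new_round()
+            if round_ is None:
+                break
+            await round_.run()
+
+
+session = StubSession(["Open Word", "Save document"])
+asyncio.run(session.run())
+print(session.log)
+```
+
+The stub names its local variable `round_` rather than `round` to avoid shadowing Python's built-in `round()`; that is a choice of this sketch, not a statement about the framework's code.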
+
+### Lifecycle Stages
+
+#### 1. Initialization
+
+```python
+session = Session(
+    task="email_task",
+    should_evaluate=True,
+    id=0,
+    request="Send an email to John",
+    mode="normal"
+)
+```
+
+**What happens:**
+- Task name assigned
+- Session ID set
+- Initial request stored
+- Mode configured
+
+#### 2. Context Initialization
+
+```python
+def _init_context(self) -> None:
+    """Initialize the session context."""
+    super()._init_context()
+
+    # Create MCP server manager
+    mcp_server_manager = MCPServerManager()
+
+    # Create local dispatcher
+    command_dispatcher = LocalCommandDispatcher(
+        session=self,
+        mcp_server_manager=mcp_server_manager
+    )
+
+    # Attach to context
+    self.context.attach_command_dispatcher(command_dispatcher)
+```
+
+**What happens:**
+- Context object created
+- Command dispatcher attached (Local or WebSocket)
+- MCP servers initialized (if applicable)
+- Application window tracked
+
+#### 3. Round Creation
+
+```python
+def create_new_round(self):
+    """Create a new round."""
+
+    # Get request (first or new)
+    if not self.context.get(ContextNames.REQUEST):
+        request = first_request()
+    else:
+        request, complete = new_request()
+        if complete:
+            return None
+
+    # Create round with request
+    round = Round(
+        task=self.task,
+        context=self.context,
+        request=request,
+        id=self._round_num
+    )
+
+    self._round_num += 1
+    return round
+```
+
+**What happens:**
+- User prompted for request (interactive modes)
+- Or request read from file/plan (non-interactive)
+- Round object created with shared context
+- Round counter incremented
+
+#### 4. Round Execution
+
+```python
+await round.run()
+```
+
+**What happens:**
+- HostAgent selects application (Windows)
+- AppAgent executes in application (or LinuxAgent directly)
+- Commands dispatched and executed
+- Results captured in context
+- Experience logged
+
+#### 5. Continuation Check
+
+```python
+def is_finished(self) -> bool:
+    """Check if session is complete."""
+    return self.context.get(ContextNames.SESSION_FINISH, False)
+```
+
+**What happens:**
+- Check if user wants another request
+- Check if error occurred
+- Check if plan is complete (follower/batch modes)
+
+#### 6. Final Snapshot
+
+```python
+async def capture_last_snapshot(self) -> None:
+    """Capture the last snapshot of the application."""
+
+    last_round = self.context.get(ContextNames.ROUND_STEP)
+    subtask_amount = self.context.get(ContextNames.SUBTASK_AMOUNT)
+
+    # Capture screenshot
+    screenshot = self.application_window.capture_screenshot_infor()
+
+    # Save to logs
+    self.file_writer.save_screenshot(
+        screenshot,
+        last_round,
+        subtask_amount,
+        "last"
+    )
+```
+
+**What happens:**
+- Screenshot captured
+- Control tree logged
+- Final state preserved
+
+#### 7. Evaluation
+
+```python
+async def evaluation(self) -> None:
+    """Evaluate the session."""
+
+    evaluator = EvaluationAgent(
+        name="evaluation",
+        process_name=self.context.get(ContextNames.APPLICATION_PROCESS_NAME),
+        app_root_name=self.context.get(ContextNames.APPLICATION_ROOT_NAME),
+        is_visual=self.configs["EVA_SESSION"]["VIS_EVAL"],
+        main_prompt=self.configs["EVA_SESSION"]["MAIN_PROMPT"],
+        api_prompt=self.configs["EVA_SESSION"]["API_PROMPT"]
+    )
+
+    score = await evaluator.evaluate(
+        request=self.context.get(ContextNames.REQUEST),
+        trajectory=self.context.get(ContextNames.TRAJECTORY)
+    )
+
+    self.file_writer.save_evaluation(score)
+```
+
+**What happens:**
+- EvaluationAgent created
+- Task completion assessed
+- Score logged
+- Feedback saved
+
+#### 8. Cost Summary
+
+```python
+def print_cost(self) -> None:
+    """Print the session cost."""
+
+    total_cost = self.context.get(ContextNames.TOTAL_COST, 0.0)
+    total_tokens = self.context.get(ContextNames.TOTAL_TOKENS, 0)
+
+    # No placeholders here, so a plain string is enough (no f-prefix needed)
+    console.print("[bold green]Session Complete[/bold green]")
+    console.print(f"Total Cost: ${total_cost:.4f}")
+    console.print(f"Total Tokens: {total_tokens}")
+```
+
+---
+
+## Execution Modes
+
+### Normal Mode
+
+**Interactive execution with user in the loop:**
+
+```python
+session = Session(
+    task="document_edit",
+    should_evaluate=True,
+    id=0,
+    request="",  # Will prompt user
+    mode="normal"
+)
+
+await session.run()
+```
+
+**Features:**
+- User prompted for initial request via `first_request()`
+- User prompted for each new request via `new_request()`
+- Commands executed locally via `LocalCommandDispatcher`
+- User can exit anytime by typing "N"
+
+**Flow:**
+```
+1. Display welcome panel
+2. User enters: "Open Word and type a short note"
+3. HostAgent selects Word application
+4. AppAgent types content
+5. User asked: "What next?"
+6. User enters: "Save document"
+7. AppAgent saves file
+8. User asked: "What next?"
+9. User enters: "N" (exit)
+10. Session ends
+```
+
+### Normal_Operator Mode
+
+**Normal mode with operator capabilities:**
+
+```python
+session = Session(
+    task="complex_workflow",
+    should_evaluate=True,
+    id=0,
+    request="Organize my files by date",
+    mode="normal_operator"
+)
+```
+
+**Differences from Normal:**
+- Agent can use operator-level actions
+- More powerful command set
+- Same interactive workflow
+
+### Service Mode
+
+**WebSocket-controlled remote execution:**
+
+```python
+from aip.protocol.task_execution import TaskExecutionProtocol
+
+protocol = TaskExecutionProtocol(websocket_connection)
+
+session = ServiceSession(
+    task="remote_automation",
+    should_evaluate=True,
+    id="session_abc123",
+    request="Click Submit button",
+    task_protocol=protocol
+)
+
+await session.run()
+```
+
+**Features:**
+- No user interaction prompts
+- Single request per session
+- Commands sent via WebSocket
+- Results returned to server
+- Uses `WebSocketCommandDispatcher`
+
+**Flow:**
+```
+1. Server sends request via WebSocket
+2. ServiceSession created
+3. Agent generates commands
+4. Commands sent to client via WebSocket
+5. Client executes locally
+6. Results sent back
+7. Session finishes immediately
+```
+
+**Key Difference:**
+
+```python
+def is_finished(self) -> bool:
+    """Service session finishes after one round."""
+    return self._round_num > 0
+```
+
+### Follower Mode
+
+**Replay saved action plans:**
+
+```python
+session = FollowerSession(
+ task="email_replay",
+ plan_file="/plans/send_email.json",
+ should_evaluate=True,
+ id=0
+)
+
+await session.run()
+```
+
+**Features:**
+- No user prompts
+- Reads actions from plan file
+- Deterministic execution
+- Good for testing/demos
+
+**Plan File Format:**
+
+```json
+{
+ "request": "Send an email to John",
+ "actions": [
+ {
+ "agent": "HostAgent",
+ "action": "select_application",
+ "parameters": {"app_name": "Outlook"}
+ },
+ {
+ "agent": "AppAgent",
+ "action": "click_element",
+ "parameters": {"label": "New Email"}
+ }
+ ]
+}
+```
+
+### Batch_Normal Mode
+
+**Execute multiple requests from files:**
+
+```python
+session = FromFileSession(
+ task="batch_task",
+ plan_file="/requests/task1.json",
+ should_evaluate=True,
+ id=0
+)
+
+await session.run()
+```
+
+**Features:**
+- Request loaded from file
+- No user interaction
+- Can batch multiple files with SessionPool
+- Task status tracking available
+
+**Request File:**
+
+```json
+{
+ "request": "Create a spreadsheet with sales data"
+}
+```
+
+### Operator Mode
+
+**Pure operator-level execution:**
+
+```python
+session = OpenAIOperatorSession(
+ task="system_automation",
+ should_evaluate=True,
+ id=0,
+ request="Install and configure software"
+)
+
+await session.run()
+```
+
+**Features:**
+- Operator-level permissions
+- Can modify system settings
+- More powerful than AppAgent
+- Same interactive prompts as normal mode
+
+---
+
+## Platform-Specific Sessions
+
+### Windows Sessions
+
+**Characteristics:**
+- **Two-tier architecture**: HostAgent → AppAgent
+- **Base class**: `WindowsBaseSession`
+- **Agent flow**: HostAgent selects app, AppAgent controls it
+- **Automation**: Uses UIA (UI Automation)
+
+**Example:**
+
+```python
+class Session(WindowsBaseSession):
+    """Windows interactive session."""
+
+    def _init_context(self):
+        """Initialize with HostAgent."""
+        super()._init_context()
+
+        # HostAgent created by WindowsBaseSession
+        self.host_agent = self.create_host_agent()
+
+        # MCP and LocalCommandDispatcher
+        self.setup_command_dispatcher()
+```
+
+### Linux Sessions
+
+**Characteristics:**
+- **Single-tier architecture**: LinuxAgent only (no HostAgent)
+- **Base class**: `LinuxBaseSession`
+- **Agent flow**: LinuxAgent controls application directly
+- **Automation**: Platform-specific tools
+
+**Example:**
+
+```python
+class LinuxSession(LinuxBaseSession):
+    """Linux interactive session."""
+
+    def _init_context(self):
+        """Initialize without HostAgent."""
+        super()._init_context()
+
+        # No HostAgent - direct LinuxAgent usage
+        self.linux_agent = self.create_linux_agent(
+            application_name=self.application_name
+        )
+```
+
+**Comparison:**
+
+| Aspect | Windows | Linux |
+|--------|---------|-------|
+| **Architecture** | Two-tier (HostAgent + AppAgent) | Single-tier (LinuxAgent) |
+| **Application Selection** | HostAgent decides | Pre-specified or LinuxAgent decides |
+| **Agent Switching** | Yes (HostAgent ↔ AppAgent) | No |
+| **Modes Supported** | All 7 modes | normal, normal_operator, service |
+| **UI Automation** | UIA (UIAutomation) | Platform tools |
+
+See [Platform Sessions](./platform_sessions.md) for detailed comparison.
+
+---
+
+## Experience Saving
+
+Sessions can save successful workflows for future learning:
+
+```python
+# After successful task completion
+if self.configs["SAVE_EXPERIENCE"] == "ask":
+    save = experience_asker()
+
+    if save:
+        self.save_experience()
+```
+
+**Save Modes:**
+
+| Mode | Behavior |
+|------|----------|
+| `always` | Auto-save every successful session |
+| `ask` | Prompt user after each session |
+| `auto` | Save if evaluation score > threshold |
+| `always_not` | Never save |
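+
+The save modes in the table can be read as a small decision function. The sketch below is illustrative only: `should_save_experience`, its parameters, and the default `threshold` of `0.8` are assumptions made for this example (the threshold value is not stated in this document), not part of the UFO API.
+
+```python
+from typing import Callable, Optional
+
+
+def should_save_experience(
+    mode: str,
+    evaluation_score: Optional[float] = None,
+    threshold: float = 0.8,  # assumed value; the real threshold is not given here
+    ask_user: Callable[[], bool] = lambda: False,  # hypothetical stand-in for a user prompt
+) -> bool:
+    """Decide whether a successful session should be saved under `mode`."""
+    if mode == "always":
+        return True
+    if mode == "ask":
+        return ask_user()
+    if mode == "auto":
+        # Save only when an evaluation score exists and exceeds the threshold
+        return evaluation_score is not None and evaluation_score > threshold
+    if mode == "always_not":
+        return False
+    raise ValueError(f"Unknown save mode: {mode!r}")
+```
+
+Raising on an unknown mode, rather than silently skipping the save, is a design choice of this sketch so that configuration typos surface early.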
+
+**Saved Experience Structure:**
+
+```json
+{
+ "task": "Send email",
+ "request": "Send an email to John about the meeting",
+ "trajectory": [
+ {
+ "round": 0,
+ "agent": "HostAgent",
+ "observation": "Desktop with Outlook icon",
+ "action": "select_application",
+ "parameters": {"app_name": "Outlook"}
+ },
+ {
+ "round": 0,
+ "agent": "AppAgent",
+ "observation": "Outlook main window",
+ "action": "click_element",
+ "parameters": {"label": "New Email"}
+ }
+ ],
+ "outcome": "success",
+ "evaluation_score": 0.95,
+ "cost": 0.0234,
+ "tokens": 1542
+}
+```
+
+---
+
+## Error Handling
+
+### Error States
+
+Sessions track errors through context:
+
+```python
+def is_error(self) -> bool:
+    """Check if session encountered error."""
+    return self.context.get(ContextNames.ERROR, False)
+
+def set_error(self, error_message: str):
+    """Set error state."""
+    self.context.set(ContextNames.ERROR, True)
+    self.context.set(ContextNames.ERROR_MESSAGE, error_message)
+```
+
+### Error Recovery
+
+```python
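+# Runs inside the session's round loop, so `continue` and `break` apply to that loop.
+# `current_round` is the Round being executed (how it is created is omitted here).
+while not self.is_finished():
+    try:
+        await current_round.run()
+    except AgentError as e:
+        self.set_error(str(e))
+        logger.error(f"Round {self._round_num} failed: {e}")
+
+        # Decide whether to continue or abort
+        if self.can_recover(e):
+            continue  # Try the next round
+        break  # Abort the session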
+```
+
+### Common Errors
+
+| Error Type | Cause | Handling |
+|------------|-------|----------|
+| **TimeoutError** | Command execution timeout | Retry or skip |
+| **ConnectionError** | WebSocket/MCP disconnection | Reconnect or abort |
+| **AgentError** | Agent decision failure | Log and retry |
+| **ValidationError** | Invalid command parameters | Skip command |
+
+---
+
+## Best Practices
+
+### Session Creation
+
+!!!tip "Efficient Sessions"
+    - ✅ Use `SessionFactory.create_session()` for platform-aware creation
+    - ✅ Enable evaluation for quality tracking
+    - ✅ Choose appropriate mode for use case
+    - ✅ Set meaningful task names for logging
+    - ❌ Don't create sessions directly (use factory)
+    - ❌ Don't mix modes (each session has one mode)
+
+### Interactive Sessions
+
+!!!success "User Experience"
+    - ✅ Provide clear initial requests
+    - ✅ Allow users to exit gracefully ("N" option)
+    - ✅ Show progress and confirmations
+    - ✅ Handle sensitive actions with confirmation
+    - ❌ Don't prompt excessively
+    - ❌ Don't hide errors from users
+
+### Service Sessions
+
+!!!warning "WebSocket Considerations"
+    - ✅ Always provide `task_protocol`
+    - ✅ Handle connection loss gracefully
+    - ✅ Set appropriate timeouts
+    - ✅ Validate requests before execution
+    - ❌ Don't assume connection is stable
+    - ❌ Don't block waiting for results indefinitely
+
+### Batch Sessions
+
+!!!tip "Batch Processing"
+    - ✅ Enable task status tracking
+    - ✅ Use descriptive file names
+    - ✅ Group similar tasks
+    - ✅ Log failures for retry
+    - ❌ Don't stop batch on first failure
+    - ❌ Don't run too many sessions in parallel
+
+---
+
+## Examples
+
+### Example 1: Basic Interactive Session
+
+```python
+from ufo.module.sessions.session import Session
+
+# Create session
+session = Session(
+ task="word_editing",
+ should_evaluate=True,
+ id=0,
+ request="", # Will prompt user
+ mode="normal"
+)
+
+# Run session
+await session.run()
+
+# User interaction:
+# 1. Welcome panel shown
+# 2. User enters: "Open Word and type Hello World"
+# 3. HostAgent selects Word
+# 4. AppAgent types text
+# 5. User asked for next request
+# 6. User enters: "N" to exit
+# 7. Session evaluates and ends
+```
+
+### Example 2: Service Session
+
+```python
+from ufo.module.sessions.service_session import ServiceSession
+from aip.protocol.task_execution import TaskExecutionProtocol
+
+# WebSocket established
+protocol = TaskExecutionProtocol(websocket)
+
+# Create service session
+session = ServiceSession(
+ task="remote_click",
+ should_evaluate=False, # Server evaluates
+ id="sess_12345",
+ request="Click the Submit button",
+ task_protocol=protocol
+)
+
+# Run (non-blocking for client)
+await session.run()
+
+# Session finishes after one request
+```
+
+### Example 3: Follower Session
+
+```python
+from ufo.module.sessions.session import FollowerSession
+
+# Replay saved plan
+session = FollowerSession(
+ task="email_demo",
+ plan_file="./plans/send_email.json",
+ should_evaluate=True,
+ id=0
+)
+
+await session.run()
+
+# Executes exactly as recorded in plan file
+# No user prompts
+# Deterministic execution
+```
+
+### Example 4: Linux Session
+
+```python
+from ufo.module.sessions.linux_session import LinuxSession
+
+# Linux interactive session
+session = LinuxSession(
+ task="linux_task",
+ should_evaluate=True,
+ id=0,
+ request="Open gedit and type Hello Linux",
+ mode="normal",
+ application_name="gedit"
+)
+
+await session.run()
+
+# Single-tier architecture
+# No HostAgent
+# LinuxAgent controls gedit directly
+```
+
+---
+
+## Reference
+
+### BaseSession
+
+::: module.basic.BaseSession
+
+### Session (Windows)
+
+::: module.sessions.session.Session
+
+### LinuxSession
+
+::: module.sessions.linux_session.LinuxSession
+
+---
+
+## See Also
+
+- [Round](./round.md) - Individual request-response cycles
+- [Context](./context.md) - Shared state management
+- [Session Factory](./session_pool.md) - Session creation
+- [Platform Sessions](./platform_sessions.md) - Windows vs Linux
\ No newline at end of file
diff --git a/documents/docs/infrastructure/modules/session_pool.md b/documents/docs/infrastructure/modules/session_pool.md
new file mode 100644
index 000000000..1601afb06
--- /dev/null
+++ b/documents/docs/infrastructure/modules/session_pool.md
@@ -0,0 +1,1151 @@
+# Session Factory & Pool
+
+The **SessionFactory** and **SessionPool** classes provide platform-aware session creation and batch execution management, supporting 7 different session modes across Windows and Linux platforms.
+
+**Quick Reference:**
+
+- Create single session? Use [SessionFactory.create_session()](#create_session)
+- Create service session? Use [SessionFactory.create_service_session()](#create_service_session)
+- Batch execution? Use [SessionPool](#sessionpool)
+- Platform detection? Automatic or override with `platform_override`
+
+---
+
+## Overview
+
+The session factory and pool system provides:
+
+1. **Platform Abstraction**: Automatically creates the correct session type for Windows or Linux
+2. **Mode Support**: Handles 7 different execution modes with appropriate session classes
+3. **Batch Management**: Executes multiple sessions sequentially with status tracking
+4. **Service Integration**: Creates WebSocket-controlled sessions with AIP protocol
+
+### Architecture
+
+```mermaid
+graph TB
+ subgraph "Client Code"
+ REQ[User Request]
+ MODE[Execution Mode]
+ PLATFORM[Platform Detection]
+ end
+
+ subgraph "SessionFactory"
+ FACTORY[SessionFactory]
+ DETECT[Platform Detection]
+ WINDOWS[_create_windows_session]
+ LINUX[_create_linux_session]
+ SERVICE[create_service_session]
+ end
+
+ subgraph "Session Types"
+ S1[Session Windows Normal]
+ S2[ServiceSession Windows Service]
+ S3[FollowerSession Windows Follower]
+ S4[FromFileSession Windows Batch]
+ S5[OpenAIOperatorSession Windows Operator]
+ S6[LinuxSession Linux Normal]
+ S7[LinuxServiceSession Linux Service]
+ end
+
+ subgraph "SessionPool"
+ POOL[SessionPool]
+ LIST[session_list]
+ RUN[run_all]
+ NEXT[next_session]
+ end
+
+ REQ --> FACTORY
+ MODE --> FACTORY
+ PLATFORM --> DETECT
+
+ FACTORY --> DETECT
+ DETECT -->|Windows| WINDOWS
+ DETECT -->|Linux| LINUX
+ DETECT -->|Service| SERVICE
+
+ WINDOWS --> S1
+ WINDOWS --> S2
+ WINDOWS --> S3
+ WINDOWS --> S4
+ WINDOWS --> S5
+
+ LINUX --> S6
+ LINUX --> S7
+
+ SERVICE -->|Windows| S2
+ SERVICE -->|Linux| S7
+
+ S1 --> POOL
+ S2 --> POOL
+ S3 --> POOL
+ S4 --> POOL
+ S5 --> POOL
+ S6 --> POOL
+ S7 --> POOL
+
+ POOL --> LIST
+ POOL --> RUN
+ POOL --> NEXT
+
+ style FACTORY fill:#e1f5ff
+ style POOL fill:#f0ffe1
+ style DETECT fill:#fff4e1
+ style SERVICE fill:#ffe1f5
+```
+
+---
+
+## SessionFactory
+
+`SessionFactory` is the central factory for creating all session types with automatic platform detection.
+
+### Class Overview
+
+```python
+from ufo.module.session_pool import SessionFactory
+
+factory = SessionFactory()
+
+# Automatically detects platform and creates appropriate session
+sessions = factory.create_session(
+ task="email_task",
+ mode="normal",
+ plan="",
+ request="Send an email to John"
+)
+```
+
+### Supported Modes
+
+| Mode | Platform | Session Type | Use Case |
+|------|----------|--------------|----------|
+| `normal` | Windows | `Session` | Interactive with HostAgent |
+| `normal` | Linux | `LinuxSession` | Interactive without HostAgent |
+| `normal_operator` | Windows | `Session` | Normal with operator mode |
+| `normal_operator` | Linux | `LinuxSession` | Normal with operator mode |
+| `service` | Windows | `ServiceSession` | WebSocket-controlled |
+| `service` | Linux | `LinuxServiceSession` | WebSocket-controlled |
+| `follower` | Windows | `FollowerSession` | Replay saved plans |
+| `batch_normal` | Windows | `FromFileSession` | Batch execution from files |
+| `operator` | Windows | `OpenAIOperatorSession` | Pure operator mode |
+
+!!!note "Linux Mode Limitations"
+    Linux currently supports only the `normal`, `normal_operator`, and `service` modes. Follower and batch modes are planned for future releases.
+
+---
+
+### create_session()
+
+Creates one or more sessions based on platform, mode, and plan configuration.
+
+#### Signature
+
+```python
+def create_session(
+ self,
+ task: str,
+ mode: str,
+ plan: str,
+ request: str = "",
+ platform_override: Optional[str] = None,
+ **kwargs,
+) -> List[BaseSession]
+```
+
+#### Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `task` | `str` | Required | Task name for logging/identification |
+| `mode` | `str` | Required | Execution mode (see table above) |
+| `plan` | `str` | Required | Plan file/folder path (for follower/batch modes) |
+| `request` | `str` | `""` | User's natural language request |
+| `platform_override` | `Optional[str]` | `None` | Force platform: `"windows"` or `"linux"` |
+| `**kwargs` | Various | - | Additional parameters (see below) |
+
+**Additional kwargs:**
+
+| Key | Type | Used By | Description |
+|-----|------|---------|-------------|
+| `id` | `int` or `str` | All modes | Session ID for tracking (service sessions use string IDs) |
+| `task_protocol` | `TaskExecutionProtocol` | Service modes | WebSocket protocol instance |
+| `application_name` | `str` | Linux modes | Target application |
+
+#### Return Value
+
+`List[BaseSession]` - List of created sessions
+
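+- **Single session modes** (normal, normal_operator, service, operator): Returns a one-element list
+- **Batch modes** (follower or batch_normal with a folder as `plan`): Returns one session per plan file in the folder; with a single file as `plan`, these modes also return a one-element list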
+
+#### Platform Detection
+
+```mermaid
+graph TB
+ START[create_session called]
+ CHECK{platform_override?}
+ AUTO["platform.system().lower()"]
+ OVERRIDE[Use override value]
+
+ WINDOWS{Platform == 'windows'?}
+ LINUX{Platform == 'linux'?}
+ ERROR[NotImplementedError]
+
+ WIN_METHOD[_create_windows_session]
+ LINUX_METHOD[_create_linux_session]
+
+ RETURN[Return session list]
+
+ START --> CHECK
+ CHECK -->|None| AUTO
+ CHECK -->|Set| OVERRIDE
+
+ AUTO --> WINDOWS
+ OVERRIDE --> WINDOWS
+
+ WINDOWS -->|Yes| WIN_METHOD
+ WINDOWS -->|No| LINUX
+
+ LINUX -->|Yes| LINUX_METHOD
+ LINUX -->|No| ERROR
+
+ WIN_METHOD --> RETURN
+ LINUX_METHOD --> RETURN
+
+ style START fill:#e1f5ff
+ style WIN_METHOD fill:#f0ffe1
+ style LINUX_METHOD fill:#fff4e1
+ style ERROR fill:#ffe1e1
+```
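+
+The detection flow in the diagram can be approximated in a few lines. This is a minimal sketch under assumptions, not the factory's actual code: `resolve_platform` and `SUPPORTED_PLATFORMS` are names invented for this example, and lower-casing the override value is an assumption as well. The error message mirrors the one shown in the Error Handling section.
+
+```python
+import platform
+from typing import Optional
+
+# Platforms for which this document describes session classes
+SUPPORTED_PLATFORMS = ("windows", "linux")
+
+
+def resolve_platform(platform_override: Optional[str] = None) -> str:
+    """Return the platform name to use, preferring an explicit override."""
+    # Fall back to the standard-library detection when no override is given
+    current = (platform_override or platform.system()).lower()
+    if current not in SUPPORTED_PLATFORMS:
+        raise NotImplementedError(f"Platform {current} is not supported yet.")
+    return current
+```
+
+Calling `resolve_platform("linux")` on a Windows machine returns `"linux"`, which is the behaviour `platform_override` is described as providing for testing.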
+
+#### Examples
+
+**Example 1: Normal Windows Session**
+
+```python
+factory = SessionFactory()
+
+sessions = factory.create_session(
+ task="browse_web",
+ mode="normal",
+ plan="",
+ request="Open Chrome and navigate to google.com"
+)
+
+# Returns: [Session(task="browse_web", ...)]
+session = sessions[0]
+await session.run()
+```
+
+**Example 2: Service Session (Auto-detected Platform)**
+
+```python
+from aip.protocol.task_execution import TaskExecutionProtocol
+
+protocol = TaskExecutionProtocol(websocket_connection)
+
+sessions = factory.create_session(
+ task="remote_control",
+ mode="service",
+ plan="",
+ request="Click the Start button",
+ task_protocol=protocol
+)
+
+# On Windows: Returns [ServiceSession(...)]
+# On Linux: Returns [LinuxServiceSession(...)]
+```
+
+**Example 3: Batch Follower Sessions**
+
+```python
+sessions = factory.create_session(
+ task="batch_email",
+ mode="follower",
+ plan="/path/to/plan_folder", # Folder with multiple .json plan files
+ request=""
+)
+
+# Returns: [
+# FollowerSession(task="batch_email/plan1", ...),
+# FollowerSession(task="batch_email/plan2", ...),
+# FollowerSession(task="batch_email/plan3", ...)
+# ]
+
+# Execute with SessionPool
+pool = SessionPool(sessions)
+await pool.run_all()
+```
+
+**Example 4: Linux Session with Application**
+
+```python
+sessions = factory.create_session(
+ task="edit_document",
+ mode="normal",
+ plan="",
+ request="Type 'Hello World'",
+ platform_override="linux",
+ application_name="gedit"
+)
+
+# Returns: [LinuxSession(task="edit_document", application_name="gedit")]
+```
+
+**Example 5: Operator Mode**
+
+```python
+sessions = factory.create_session(
+ task="complex_workflow",
+ mode="operator",
+ plan="",
+ request="Organize my desktop files by date"
+)
+
+# Returns: [OpenAIOperatorSession(task="complex_workflow", ...)]
+```
+
+---
+
+### create_service_session()
+
+Simplified method specifically for creating service sessions on any platform.
+
+#### Signature
+
+```python
+def create_service_session(
+ self,
+ task: str,
+ should_evaluate: bool,
+ id: str,
+ request: str,
+ task_protocol: Optional["TaskExecutionProtocol"] = None,
+ platform_override: Optional[str] = None,
+) -> BaseSession
+```
+
+#### Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `task` | `str` | Required | Task name |
+| `should_evaluate` | `bool` | Required | Enable evaluation |
+| `id` | `str` | Required | Session ID |
+| `request` | `str` | Required | User request |
+| `task_protocol` | `Optional[TaskExecutionProtocol]` | `None` | AIP protocol instance |
+| `platform_override` | `Optional[str]` | `None` | Force platform |
+
+#### Return Value
+
+`BaseSession` - Single service session instance
+
+- **Windows**: Returns `ServiceSession`
+- **Linux**: Returns `LinuxServiceSession`
+
+#### Example
+
+```python
+factory = SessionFactory()
+protocol = TaskExecutionProtocol(websocket)
+
+session = factory.create_service_session(
+ task="remote_task",
+ should_evaluate=True,
+ id="session_001",
+ request="Open Notepad",
+ task_protocol=protocol
+)
+
+# Type varies by platform
+if isinstance(session, ServiceSession):
+ print("Windows service session")
+elif isinstance(session, LinuxServiceSession):
+ print("Linux service session")
+
+await session.run()
+```
+
+---
+
+### _create_windows_session() (Internal)
+
+!!!warning "Internal Method"
+    Called by `create_session()` when platform is Windows. Not meant for direct use.
+
+#### Mode Routing
+
+```mermaid
+graph TB
+ START[_create_windows_session]
+ MODE{mode value}
+
+ NORMAL[normal/normal_operator]
+ SERVICE[service]
+ FOLLOWER[follower]
+ BATCH[batch_normal]
+ OPERATOR[operator]
+ ERROR[ValueError]
+
+ S1[Session]
+ S2[ServiceSession]
+ S3_CHECK{plan is folder?}
+ S3_BATCH[create_follower_session_in_batch]
+ S3_SINGLE[FollowerSession single]
+ S4_CHECK{plan is folder?}
+ S4_BATCH[create_sessions_in_batch]
+ S4_SINGLE[FromFileSession single]
+ S5[OpenAIOperatorSession]
+
+ START --> MODE
+
+ MODE -->|normal| NORMAL
+ MODE -->|normal_operator| NORMAL
+ MODE -->|service| SERVICE
+ MODE -->|follower| FOLLOWER
+ MODE -->|batch_normal| BATCH
+ MODE -->|operator| OPERATOR
+ MODE -->|other| ERROR
+
+ NORMAL --> S1
+ SERVICE --> S2
+
+ FOLLOWER --> S3_CHECK
+ S3_CHECK -->|Yes| S3_BATCH
+ S3_CHECK -->|No| S3_SINGLE
+
+ BATCH --> S4_CHECK
+ S4_CHECK -->|Yes| S4_BATCH
+ S4_CHECK -->|No| S4_SINGLE
+
+ OPERATOR --> S5
+
+ style START fill:#e1f5ff
+ style S1 fill:#f0ffe1
+ style S2 fill:#fff4e1
+ style ERROR fill:#ffe1e1
+```
+
+#### Created Session Types
+
+| Mode | Condition | Session Type | Notes |
+|------|-----------|--------------|-------|
+| `normal` | - | `Session` | Standard interactive |
+| `normal_operator` | - | `Session` | With operator mode flag |
+| `service` | - | `ServiceSession` | Requires `task_protocol` |
+| `follower` | Plan is file | `FollowerSession` | Single plan replay |
+| `follower` | Plan is folder | `List[FollowerSession]` | Batch plan replay |
+| `batch_normal` | Plan is file | `FromFileSession` | Single file execution |
+| `batch_normal` | Plan is folder | `List[FromFileSession]` | Batch file execution |
+| `operator` | - | `OpenAIOperatorSession` | Pure operator mode |
+
+---
+
+### _create_linux_session() (Internal)
+
+!!!warning "Internal Method"
+    Called by `create_session()` when platform is Linux. Not meant for direct use.
+
+#### Mode Routing
+
+```mermaid
+graph TB
+ START[_create_linux_session]
+ MODE{mode value}
+
+ NORMAL[normal/normal_operator]
+ SERVICE[service]
+ ERROR[ValueError]
+
+ S1[LinuxSession]
+ S2[LinuxServiceSession]
+
+ START --> MODE
+
+ MODE -->|normal| NORMAL
+ MODE -->|normal_operator| NORMAL
+ MODE -->|service| SERVICE
+ MODE -->|other| ERROR
+
+ NORMAL --> S1
+ SERVICE --> S2
+
+ style START fill:#e1f5ff
+ style S1 fill:#f0ffe1
+ style S2 fill:#fff4e1
+ style ERROR fill:#ffe1e1
+```
+
+#### Supported Modes
+
+| Mode | Session Type | Notes |
+|------|--------------|-------|
+| `normal` | `LinuxSession` | Standard Linux interactive |
+| `normal_operator` | `LinuxSession` | With operator mode flag |
+| `service` | `LinuxServiceSession` | Requires `task_protocol` |
+
+!!!note "Upcoming Features"
+    The `follower` and `batch_normal` modes for Linux are planned for future releases.
+
+---
+
+### Batch Session Creation
+
+#### create_follower_session_in_batch()
+
+Creates multiple follower sessions from a folder of plan files:
+
+```python
+def create_follower_session_in_batch(
+ self,
+ task: str,
+ plan: str
+) -> List[BaseSession]
+```
+
+**Process:**
+
+1. Scan folder for `.json` files
+2. Extract file names (without extension)
+3. Create `FollowerSession` for each plan file
+4. Assign sequential IDs
+5. Append the file name to the task name: `{task}/{filename}`
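+
+The steps above can be sketched without any UFO classes. The helper below only computes the per-session arguments (task name, plan path, ID); `plan_follower_batch` is a hypothetical name, and sorting the file names is an assumption of this sketch, so the real factory may order sessions differently.
+
+```python
+import os
+from typing import Dict, List
+
+
+def plan_follower_batch(task: str, plan_folder: str) -> List[Dict[str, object]]:
+    """Build one argument set per .json plan file found in `plan_folder`."""
+    # Step 1: scan the folder for .json files (sorted here for determinism)
+    json_files = sorted(f for f in os.listdir(plan_folder) if f.endswith(".json"))
+
+    specs = []
+    # Step 4: enumerate() assigns sequential IDs starting at 0
+    for session_id, file_name in enumerate(json_files):
+        # Step 2: file name without its extension
+        stem = os.path.splitext(file_name)[0]
+        specs.append(
+            {
+                "task": f"{task}/{stem}",  # Step 5: {task}/{filename}
+                "plan_file": os.path.join(plan_folder, file_name),
+                "id": session_id,
+            }
+        )
+    return specs
+```
+
+Each returned dictionary holds the values that step 3 would pass to a `FollowerSession`.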
+
+**Example:**
+
+```python
+# Folder structure:
+# /plans/
+# ├── email_john.json
+# ├── email_jane.json
+# └── email_bob.json
+
+sessions = factory.create_follower_session_in_batch(
+ task="send_emails",
+ plan="/plans/"
+)
+
+# Returns:
+# [
+# FollowerSession(task="send_emails/email_john", plan_file="/plans/email_john.json", id=0),
+# FollowerSession(task="send_emails/email_jane", plan_file="/plans/email_jane.json", id=1),
+# FollowerSession(task="send_emails/email_bob", plan_file="/plans/email_bob.json", id=2)
+# ]
+```
+
+#### create_sessions_in_batch()
+
+Creates multiple `FromFileSession` instances with task status tracking:
+
+```python
+def create_sessions_in_batch(
+ self,
+ task: str,
+ plan: str
+) -> List[BaseSession]
+```
+
+**Features:**
+
+- Tracks completed tasks in `tasks_status.json`
+- Skips already-completed tasks
+- Resumes from last incomplete task
+
+**Task Status File:**
+
+```json
+{
+ "email_john": true,
+ "email_jane": false,
+ "email_bob": false
+}
+```
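+
+The skip-and-resume behaviour amounts to filtering task names against the status file. The following is a minimal sketch of that idea; `load_task_status` and `pending_tasks` are hypothetical helpers written for this example, and treating a missing status file as "nothing completed yet" is an assumption.
+
+```python
+import json
+import os
+from typing import Dict, List
+
+
+def load_task_status(status_file: str) -> Dict[str, bool]:
+    """Read the status mapping; a missing file means no task has completed."""
+    if not os.path.exists(status_file):
+        return {}
+    with open(status_file, "r", encoding="utf-8") as f:
+        return json.load(f)
+
+
+def pending_tasks(task_names: List[str], status: Dict[str, bool]) -> List[str]:
+    """Keep only tasks that are not marked as completed, preserving order."""
+    return [name for name in task_names if not status.get(name, False)]
+```
+
+With the status file shown above, `pending_tasks(["email_john", "email_jane", "email_bob"], status)` leaves `email_jane` and `email_bob`, matching the second run in the example that follows.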
+
+**Example:**
+
+```python
+# First run
+sessions = factory.create_sessions_in_batch(
+ task="batch_emails",
+ plan="/requests/"
+)
+# Returns 3 sessions: email_john, email_jane, email_bob
+
+# email_john completes successfully
+# tasks_status.json updated: {"email_john": true, "email_jane": false, "email_bob": false}
+
+# Second run (after restart)
+sessions = factory.create_sessions_in_batch(
+ task="batch_emails",
+ plan="/requests/"
+)
+# Returns 2 sessions: email_jane, email_bob (skips completed email_john)
+```
+
+**Configuration:**
+
+```python
+# Enable task status tracking
+ufo_config.system.task_status = True
+
+# Custom status file location
+ufo_config.system.task_status_file = "/path/to/status.json"
+```
+
+---
+
+## SessionPool
+
+`SessionPool` manages multiple sessions and executes them sequentially.
+
+### Class Overview
+
+```python
+from ufo.module.session_pool import SessionPool
+
+# Create sessions
+sessions = factory.create_session(
+ task="batch_task",
+ mode="follower",
+ plan="/plans_folder/"
+)
+
+# Create pool
+pool = SessionPool(session_list=sessions)
+
+# Execute all
+await pool.run_all()
+```
+
+### Constructor
+
+```python
+def __init__(self, session_list: List[BaseSession]) -> None
+```
+
+**Parameters:**
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `session_list` | `List[BaseSession]` | Initial list of sessions |
+
+### Methods
+
+#### run_all()
+
+Execute all sessions in the pool sequentially:
+
+```python
+async def run_all(self) -> None
+```
+
+**Execution Flow:**
+
+```mermaid
+sequenceDiagram
+ participant Pool as SessionPool
+ participant S1 as Session 1
+ participant S2 as Session 2
+ participant S3 as Session 3
+
+ Pool->>S1: await session.run()
+ S1->>S1: Execute task
+ S1-->>Pool: Complete
+
+ Pool->>S2: await session.run()
+ S2->>S2: Execute task
+ S2-->>Pool: Complete
+
+ Pool->>S3: await session.run()
+ S3->>S3: Execute task
+ S3-->>Pool: Complete
+
+ Pool-->>Pool: All sessions complete
+```
+
+**Example:**
+
+```python
+pool = SessionPool(sessions)
+
+# Execute all sequentially
+await pool.run_all()
+
+# All sessions have completed
+print("Batch execution complete")
+```
+
+#### add_session()
+
+Add a session to the pool:
+
+```python
+def add_session(self, session: BaseSession) -> None
+```
+
+**Example:**
+
+```python
+pool = SessionPool([session1, session2])
+
+# Add another session
+pool.add_session(session3)
+
+# Now pool has 3 sessions
+```
+
+#### next_session()
+
+Get and remove the next session from the pool:
+
+```python
+def next_session(self) -> BaseSession
+```
+
+**Example:**
+
+```python
+pool = SessionPool([session1, session2, session3])
+
+# Get next session (FIFO)
+next_sess = pool.next_session()
+# next_sess == session1
+# Pool now has [session2, session3]
+
+await next_sess.run()
+```
+
+#### session_list (Property)
+
+Get the current session list:
+
+```python
+@property
+def session_list(self) -> List[BaseSession]
+```
+
+**Example:**
+
+```python
+pool = SessionPool(sessions)
+
+print(f"Pool has {len(pool.session_list)} sessions")
+
+for session in pool.session_list:
+ print(f"Task: {session.task}")
+```
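+
+To make the pool semantics concrete, here is a self-contained stand-in that mirrors the documented interface (`run_all()`, `add_session()`, `next_session()`, `session_list`). `MiniSessionPool` and `StubSession` are invented for this sketch and are not the UFO classes; whether the real `run_all()` removes sessions from the list as it runs is not stated here, so this version leaves the list untouched.
+
+```python
+import asyncio
+from typing import List
+
+
+class StubSession:
+    """Stand-in session that records its task name when run."""
+
+    def __init__(self, task: str, log: List[str]) -> None:
+        self.task = task
+        self._log = log
+
+    async def run(self) -> None:
+        self._log.append(self.task)
+
+
+class MiniSessionPool:
+    """Sequential pool with the same method names as the documented SessionPool."""
+
+    def __init__(self, session_list: List[StubSession]) -> None:
+        self._session_list = list(session_list)
+
+    @property
+    def session_list(self) -> List[StubSession]:
+        return self._session_list
+
+    def add_session(self, session: StubSession) -> None:
+        self._session_list.append(session)
+
+    def next_session(self) -> StubSession:
+        # FIFO: remove and return the oldest session
+        return self._session_list.pop(0)
+
+    async def run_all(self) -> None:
+        # One session at a time, in insertion order
+        for session in self._session_list:
+            await session.run()
+
+
+log: List[str] = []
+pool = MiniSessionPool([StubSession("a", log), StubSession("b", log)])
+pool.add_session(StubSession("c", log))
+asyncio.run(pool.run_all())
+```
+
+After the last line, `log` holds the task names in the order the sessions were added, which is the sequential behaviour shown in the diagram for `run_all()`.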
+
+---
+
+## Usage Patterns
+
+### Pattern 1: Single Interactive Session
+
+```python
+factory = SessionFactory()
+
+sessions = factory.create_session(
+ task="user_task",
+ mode="normal",
+ plan="",
+ request="Open Word and create a document"
+)
+
+session = sessions[0]
+await session.run()
+```
+
+### Pattern 2: Service Session with WebSocket
+
+```python
+from aip.protocol.task_execution import TaskExecutionProtocol
+
+# WebSocket connection established
+protocol = TaskExecutionProtocol(websocket)
+
+factory = SessionFactory()
+
+session = factory.create_service_session(
+ task="remote_automation",
+ should_evaluate=True,
+ id="session_123",
+ request="Click the Submit button",
+ task_protocol=protocol
+)
+
+await session.run()
+```
+
+### Pattern 3: Batch Execution
+
+```python
+# Create batch sessions
+factory = SessionFactory()
+
+sessions = factory.create_session(
+ task="daily_reports",
+ mode="batch_normal",
+ plan="/request_files/", # Folder with .json request files
+ request=""
+)
+
+# Execute with pool
+pool = SessionPool(sessions)
+await pool.run_all()
+
+print(f"Completed {len(sessions)} tasks")
+```
+
+### Pattern 4: Cross-Platform Application
+
+```python
+import platform
+
+factory = SessionFactory()
+
+# Detect current platform
+current_os = platform.system().lower()
+
+sessions = factory.create_session(
+ task="cross_platform_task",
+ mode="normal",
+ plan="",
+ request="Open text editor",
+ application_name="gedit" if current_os == "linux" else None
+)
+
+# Correct session type automatically created
+await sessions[0].run()
+```
+
+### Pattern 5: Dynamic Session Pool
+
+```python
+factory = SessionFactory()
+pool = SessionPool([])
+
+# Add sessions dynamically (user_requests is a list of request strings collected elsewhere)
+for user_request in user_requests:
+    sessions = factory.create_session(
+        task=f"request_{len(pool.session_list)}",
+        mode="normal",
+        plan="",
+        request=user_request
+    )
+    pool.add_session(sessions[0])
+
+# Execute all
+await pool.run_all()
+```
+
+### Pattern 6: Resumable Batch Processing
+
+```python
+# Enable task status tracking
+ufo_config.system.task_status = True
+ufo_config.system.task_status_file = "progress.json"
+
+factory = SessionFactory()
+
+# First run
+sessions = factory.create_sessions_in_batch(
+ task="large_batch",
+ plan="/tasks/"
+)
+
+pool = SessionPool(sessions)
+
+try:
+ await pool.run_all()
+except KeyboardInterrupt:
+ print("Interrupted - progress saved")
+
+# Second run (resumes from last incomplete)
+sessions = factory.create_sessions_in_batch(
+ task="large_batch",
+ plan="/tasks/"
+)
+# Only uncompleted tasks loaded
+
+pool = SessionPool(sessions)
+await pool.run_all()
+```
+
+---
+
+## Configuration Integration
+
+### UFO Config Settings
+
+| Setting | Type | Purpose |
+|---------|------|---------|
+| `ufo_config.system.eva_session` | `bool` | Enable session evaluation |
+| `ufo_config.system.task_status` | `bool` | Enable task status tracking |
+| `ufo_config.system.task_status_file` | `str` | Custom status file path |
+
+### Example Configuration
+
+```yaml
+# config/ufo/config.yaml
+system:
+ eva_session: true
+ task_status: true
+ task_status_file: "./logs/task_status.json"
+```
+
+**Usage:**
+
+```python
+from config.config_loader import get_ufo_config
+
+ufo_config = get_ufo_config()
+
+# These settings affect SessionFactory behavior
+factory = SessionFactory()
+
+# Uses ufo_config.system.eva_session for should_evaluate
+sessions = factory.create_session(
+ task="configured_task",
+ mode="normal",
+ plan="",
+ request="Do something"
+)
+```
+
+---
+
+## Platform Detection
+
+### Automatic Detection
+
+```python
+import platform
+
+current_platform = platform.system().lower()
+# Returns: "windows", "linux", "darwin" (macOS)
+```
+
+**Supported Platforms:**
+
+- `"windows"` → Windows-specific sessions
+- `"linux"` → Linux-specific sessions
+- Others → `NotImplementedError`
+
+### Manual Override
+
+Force platform selection:
+
+```python
+# Force Windows session on Linux machine (for testing)
+sessions = factory.create_session(
+ task="test",
+ mode="normal",
+ plan="",
+ request="Test request",
+ platform_override="windows"
+)
+
+# Creates Session instead of LinuxSession
+```
+
+!!!warning "Override Use Cases"
+    - **Testing**: Test Windows sessions on Linux
+    - **Development**: Test platform-specific code
+    - **Cross-compilation**: Generate plans for other platforms
+    - **Not for production**: Always use auto-detection in production
+
+---
+
+## Error Handling
+
+### NotImplementedError
+
+**Trigger:** Unsupported platform (an unsupported mode on a supported platform raises `ValueError` instead)
+
+```python
+try:
+ sessions = factory.create_session(
+ task="task",
+ mode="follower",
+ plan="",
+ request="",
+ platform_override="darwin" # macOS not supported
+ )
+except NotImplementedError as e:
+ print(f"Error: {e}")
+ # Error: Platform darwin is not supported yet.
+```
+
+### ValueError
+
+**Trigger:** Invalid mode for platform
+
+```python
+try:
+ sessions = factory.create_session(
+ task="task",
+ mode="follower",
+ plan="",
+ request="",
+ platform_override="linux"
+ )
+except ValueError as e:
+ print(f"Error: {e}")
+ # Error: The follower mode is not supported on Linux yet.
+ # Supported modes: normal, normal_operator, service
+```
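+
+Callers who prefer to fail before constructing anything can pre-check the combination against the Supported Modes table. The sketch below is one way to do that; `SUPPORTED_MODES` and `check_mode` are assumptions made for this example and would need updating if the factory gains modes.
+
+```python
+# Mode sets copied from the Supported Modes table in this document
+SUPPORTED_MODES = {
+    "windows": {
+        "normal",
+        "normal_operator",
+        "service",
+        "follower",
+        "batch_normal",
+        "operator",
+    },
+    "linux": {"normal", "normal_operator", "service"},
+}
+
+
+def check_mode(platform_name: str, mode: str) -> None:
+    """Raise the same exception types the factory is documented to raise."""
+    if platform_name not in SUPPORTED_MODES:
+        raise NotImplementedError(f"Platform {platform_name} is not supported yet.")
+    if mode not in SUPPORTED_MODES[platform_name]:
+        supported = ", ".join(sorted(SUPPORTED_MODES[platform_name]))
+        raise ValueError(
+            f"The {mode} mode is not supported on {platform_name}. "
+            f"Supported modes: {supported}"
+        )
+```
+
+Keeping the exception types aligned with the factory means existing `except NotImplementedError` and `except ValueError` handlers keep working when the check runs first.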
+
+### Graceful Handling
+
+```python
+import logging
+
+logger = logging.getLogger(__name__)
+
+
+def create_session_safely(task, mode, plan, request):
+    """Create session with error handling."""
+    factory = SessionFactory()
+
+    try:
+        sessions = factory.create_session(
+            task=task,
+            mode=mode,
+            plan=plan,
+            request=request,
+        )
+        return sessions
+
+    except NotImplementedError as e:
+        logger.error(f"Platform not supported: {e}")
+        return []
+
+    except ValueError as e:
+        logger.error(f"Invalid mode: {e}")
+        # Fallback to normal mode
+        return factory.create_session(
+            task=task,
+            mode="normal",
+            plan="",
+            request=request,
+        )
+```
+
+---
+
+## Best Practices
+
+### Session Creation
+
+!!!tip "Efficient Session Management"
+    - ✅ Use `create_service_session()` for service sessions (cleaner API)
+    - ✅ Let platform auto-detect unless testing
+    - ✅ Use batch modes for multiple similar tasks
+    - ✅ Enable task status tracking for long-running batches
+    - ❌ Don't create sessions in tight loops (use batch modes)
+    - ❌ Don't mix session types in same pool without reason
+
+### Batch Processing
+
+!!!success "Optimal Batch Execution"
+ 1. **Group similar tasks** in same folder
+ 2. **Enable task status** tracking for resumability
+ 3. **Use descriptive filenames** for task identification
+ 4. **Handle failures** gracefully (don't stop entire batch)
+ 5. **Monitor progress** with logging
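
The failure-handling and progress-monitoring points above can be sketched in a few lines. This is a minimal illustration, not the session pool's own batch runner: `run_batch` and the `run_task` callable are hypothetical names standing in for whatever executes a single task.

```python
import logging
from typing import Callable, Dict, Iterable

logger = logging.getLogger("batch")


def run_batch(task_names: Iterable[str], run_task: Callable[[str], None]) -> Dict[str, str]:
    """Run every task, recording failures instead of aborting the batch."""
    status: Dict[str, str] = {}
    for name in task_names:
        try:
            run_task(name)  # hypothetical callable that executes one task
            status[name] = "done"
        except Exception as exc:  # one bad task must not stop the rest
            logger.error("Task %s failed: %s", name, exc)
            status[name] = f"failed: {exc}"
        logger.info("Progress: %d task(s) processed", len(status))
    return status
```

The returned mapping can be written to disk after each task if resumability is needed.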
+
+### Platform Handling
+
+!!!warning "Cross-Platform Considerations"
+ - Always check platform before platform-specific operations
+ - Use `application_name` parameter for Linux sessions
+ - Test on both platforms if deploying cross-platform
+ - Document platform-specific features clearly
+
+---
+
+## Troubleshooting
+
+### Issue: Wrong Session Type Created
+
+**Symptoms:**
+- Expected `LinuxSession` but got `Session`
+- Mode not working as expected
+
+**Diagnosis:**
+```python
+import platform
+
+session = sessions[0]
+print(f"Session type: {type(session).__name__}")
+print(f"Platform: {platform.system().lower()}")
+```
+
+**Solutions:**
+1. Check platform detection: `platform.system().lower()`
+2. Verify mode spelling and case
+3. Use `platform_override` if needed for testing
+
+### Issue: Batch Sessions Not Found
+
+**Symptoms:**
+- Empty session list from batch creation
+- `create_sessions_in_batch()` returns `[]`
+
+**Diagnosis:**
+```python
+plan_files = factory.get_plan_files("/path/to/folder")
+print(f"Found {len(plan_files)} plan files")
+print(f"Files: {plan_files}")
+```
+
+**Solutions:**
+1. Ensure folder exists: `os.path.isdir(plan_folder)`
+2. Check files have `.json` extension
+3. Verify file permissions
+4. Check task status file hasn't marked all as done
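
The first three checks can be automated with the standard library. The helper below is a sketch under the assumption that plan files are plain `.json` files directly inside the folder; `diagnose_plan_folder` is a hypothetical name, and the task status file is not inspected here.

```python
import os
from pathlib import Path
from typing import List


def diagnose_plan_folder(plan_folder: str) -> List[str]:
    """Return human-readable problems found with a batch plan folder."""
    folder = Path(plan_folder)
    if not folder.is_dir():
        return [f"Folder does not exist: {plan_folder}"]

    problems: List[str] = []
    json_files = sorted(p for p in folder.iterdir() if p.suffix == ".json")
    if not json_files:
        problems.append("No .json plan files found")
    for path in json_files:
        # os.access reports whether the current user may read the file
        if not os.access(path, os.R_OK):
            problems.append(f"File is not readable: {path.name}")
    return problems
```

An empty list means the folder passed these basic checks, so the task status file is the next thing to inspect.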
+
+### Issue: Service Session Missing Protocol
+
+**Symptoms:**
+- `ValueError` about missing protocol
+- Service session fails to initialize
+
+**Diagnosis:**
+```python
+protocol = kwargs.get("task_protocol")
+print(f"Protocol: {protocol}")
+print(f"Type: {type(protocol)}")
+```
+
+**Solution:**
+Always provide `task_protocol` for service sessions:
+
+```python
+from aip.protocol.task_execution import TaskExecutionProtocol
+
+protocol = TaskExecutionProtocol(websocket)
+
+session = factory.create_service_session(
+ task="service_task",
+ should_evaluate=True,
+ id="sess_001",
+ request="Do something",
+ task_protocol=protocol # ← Required!
+)
+```
+
+---
+
+## Reference
+
+### SessionFactory Methods
+
+::: module.session_pool.SessionFactory
+
+### SessionPool Methods
+
+::: module.session_pool.SessionPool
+
+---
+
+## See Also
+
+- [Session](./session.md) - Session lifecycle and execution
+- [Platform Sessions](./platform_sessions.md) - Windows vs Linux differences
+- [Overview](./overview.md) - Module system architecture
+- [AIP Protocol](../../aip/overview.md) - Service session WebSocket protocol
+
diff --git a/documents/docs/javascripts/mermaid-init.js b/documents/docs/javascripts/mermaid-init.js
new file mode 100644
index 000000000..dd00fce19
--- /dev/null
+++ b/documents/docs/javascripts/mermaid-init.js
@@ -0,0 +1,33 @@
+// Initialize Mermaid for ReadTheDocs theme
+(function() {
+ // Wait for DOM to be ready
+ if (document.readyState === 'loading') {
+ document.addEventListener('DOMContentLoaded', initMermaid);
+ } else {
+ initMermaid();
+ }
+
+ function initMermaid() {
+ // Initialize Mermaid
+ if (typeof mermaid !== 'undefined') {
+ mermaid.initialize({
+ startOnLoad: true,
+ theme: 'default',
+ securityLevel: 'loose',
+ flowchart: {
+ useMaxWidth: true,
+ htmlLabels: true,
+ curve: 'basis'
+ },
+ sequence: {
+ useMaxWidth: true,
+ wrap: true
+ },
+ gantt: {
+ useMaxWidth: true
+ }
+ });
+ }
+ }
+})();
+
diff --git a/documents/docs/linux/as_galaxy_device.md b/documents/docs/linux/as_galaxy_device.md
new file mode 100644
index 000000000..1c9beb7f8
--- /dev/null
+++ b/documents/docs/linux/as_galaxy_device.md
@@ -0,0 +1,515 @@
+# Using Linux Agent as Galaxy Device
+
+Configure Linux Agent as a sub-agent in UFO's Galaxy framework to enable cross-platform, multi-device task orchestration. Galaxy can coordinate Linux agents alongside Windows devices to execute complex workflows spanning multiple systems.
+
+## Overview
+
+The **Galaxy framework** provides multi-tier orchestration capabilities, allowing you to manage multiple device agents (Windows, Linux, etc.) from a central ConstellationAgent. When configured as a Galaxy device, LinuxAgent becomes a **sub-agent** that can:
+
+- Execute Linux-specific subtasks assigned by Galaxy
+- Participate in cross-platform workflows (e.g., Windows + Linux collaboration)
+- Report execution status back to the orchestrator
+- Be dynamically selected based on capabilities and metadata
+
+For detailed information about LinuxAgent's design and capabilities, see [Linux Agent Overview](overview.md).
+
+## Galaxy Architecture with Linux Agent
+
+```mermaid
+graph TB
+ User[User Request]
+ Galaxy[Galaxy ConstellationAgent Orchestrator]
+
+ subgraph "Device Pool"
+ Win1[Windows Device 1 HostAgent]
+ Win2[Windows Device 2 HostAgent]
+ Linux1[Linux Agent 1 CLI Executor]
+ Linux2[Linux Agent 2 CLI Executor]
+ Linux3[Linux Agent 3 CLI Executor]
+ end
+
+ User -->|Complex Task| Galaxy
+ Galaxy -->|Windows Subtask| Win1
+ Galaxy -->|Windows Subtask| Win2
+ Galaxy -->|Linux Subtask| Linux1
+ Galaxy -->|Linux Subtask| Linux2
+ Galaxy -->|Linux Subtask| Linux3
+
+ style Galaxy fill:#ffe1e1
+ style Linux1 fill:#e1f5ff
+ style Linux2 fill:#e1f5ff
+ style Linux3 fill:#e1f5ff
+```
+
+Galaxy orchestrates task decomposition, device selection based on capabilities, parallel execution, and result aggregation across all devices.
+
+## Configuration Guide
+
+### Step 1: Configure Device in `devices.yaml`
+
+Add your Linux agent(s) to the device list in `config/galaxy/devices.yaml`:
+
+**Example Configuration:**
+
+```yaml
+devices:
+  - device_id: "linux_agent_1"
+    server_url: "ws://172.23.48.1:5001/ws"
+    os: "linux"
+    capabilities:
+      - "server"
+      - "log_analysis"
+      - "file_operations"
+      - "database_management"
+    metadata:
+      os: "linux"
+      performance: "high"
+      logs_file_path: "/var/log/myapp/app.log"
+      dev_path: "/home/user/development/"
+      warning_log_pattern: "WARN"
+      error_log_pattern: "ERROR|FATAL"
+      description: "Production web server"
+    auto_connect: true
+    max_retries: 5
+```
+
+### Step 2: Understanding Configuration Fields
+
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `device_id` | ✅ Yes | string | **Unique identifier** - must match client `--client-id` |
+| `server_url` | ✅ Yes | string | WebSocket URL - must match server endpoint |
+| `os` | ✅ Yes | string | Operating system - set to `"linux"` |
+| `capabilities` | ❌ Optional | list | Skills/capabilities for task routing |
+| `metadata` | ❌ Optional | dict | Custom context for LLM-based task execution |
+| `auto_connect` | ❌ Optional | boolean | Auto-connect on Galaxy startup (default: `true`) |
+| `max_retries` | ❌ Optional | integer | Connection retry attempts (default: `5`) |
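
A device entry can be checked against this table before Galaxy starts. The sketch below works on an already-parsed dictionary and is not Galaxy's own loader; `normalize_device` is a hypothetical helper. The `auto_connect` and `max_retries` defaults come from the table, while the empty defaults for `capabilities` and `metadata` are assumptions.

```python
import copy
from typing import Any, Dict

REQUIRED_FIELDS = ("device_id", "server_url", "os")

# auto_connect and max_retries defaults come from the field table; the empty
# capabilities/metadata defaults are an assumption made for this sketch.
OPTIONAL_DEFAULTS: Dict[str, Any] = {
    "capabilities": [],
    "metadata": {},
    "auto_connect": True,
    "max_retries": 5,
}


def normalize_device(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Check required fields and fill in defaults for optional ones."""
    missing = [name for name in REQUIRED_FIELDS if not entry.get(name)]
    if missing:
        raise ValueError(f"Missing required field(s): {', '.join(missing)}")
    # Deep-copy the defaults so devices never share the same list/dict objects.
    normalized = copy.deepcopy(OPTIONAL_DEFAULTS)
    normalized.update(entry)
    return normalized
```

Failing fast on a missing `device_id`, `server_url`, or `os` is usually easier to debug than a device that silently never connects.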
+
+### Step 3: Capabilities-Based Task Routing
+
+Galaxy uses the `capabilities` field to intelligently route subtasks to appropriate devices. Define capabilities based on server roles, task types, installed software, or data access requirements.
+
+**Example Capability Configurations:**
+
+**Web Server:**
+```yaml
+capabilities:
+ - "web_server"
+ - "nginx"
+ - "ssl_management"
+ - "log_analysis"
+```
+
+**Database Server:**
+```yaml
+capabilities:
+ - "database_server"
+ - "postgresql"
+ - "backup_management"
+ - "query_optimization"
+```
+
+**CI/CD Server:**
+```yaml
+capabilities:
+ - "ci_cd"
+ - "docker"
+ - "kubernetes"
+ - "deployment"
+```
+
+**Monitoring Server:**
+```yaml
+capabilities:
+ - "monitoring"
+ - "prometheus"
+ - "grafana"
+ - "alerting"
+```
+
+### Step 4: Metadata for Contextual Execution
+
+The `metadata` field provides contextual information that the LLM uses when generating commands for the Linux agent.
+
+**Metadata Examples:**
+
+**Web Server Metadata:**
+```yaml
+metadata:
+ os: "linux"
+ logs_file_path: "/var/log/nginx/access.log"
+ error_log_path: "/var/log/nginx/error.log"
+ web_root: "/var/www/html"
+ ssl_cert_path: "/etc/letsencrypt/live/example.com/"
+ warning_log_pattern: "WARN"
+ error_log_pattern: "ERROR|FATAL"
+ performance: "high"
+ description: "Production nginx web server"
+```
+
+**Database Server Metadata:**
+```yaml
+metadata:
+ os: "linux"
+ logs_file_path: "/var/log/postgresql/postgresql.log"
+ data_path: "/var/lib/postgresql/14/main"
+ backup_path: "/mnt/backups/postgresql"
+ warning_log_pattern: "WARNING"
+ error_log_pattern: "ERROR|FATAL|PANIC"
+ performance: "high"
+ description: "Production PostgreSQL 14 database"
+```
+
+**Development Server Metadata:**
+```yaml
+metadata:
+ os: "linux"
+ dev_path: "/home/developer/projects"
+ logs_file_path: "/var/log/app/dev.log"
+ git_repo_path: "/home/developer/repos"
+ warning_log_pattern: "WARN"
+ error_log_pattern: "ERROR"
+ performance: "medium"
+ description: "Development and testing environment"
+```
+
+**How Metadata is Used:**
+
+The LLM receives metadata in the system prompt, enabling context-aware command generation. For example, with the web server metadata above, when the user requests "Find all 500 errors in the last hour", the LLM can generate the appropriate command using the correct log path.
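
How the metadata reaches the prompt is not shown in this guide. As a rough sketch only, a flat metadata mapping could be rendered into a text block like this; `render_device_context` and the output layout are hypothetical, not the actual prompt template.

```python
from typing import Dict


def render_device_context(metadata: Dict[str, str]) -> str:
    """Render device metadata as a bullet list suitable for a system prompt."""
    lines = ["Device context:"]
    for key in sorted(metadata):  # sorted for a stable, reproducible prompt
        lines.append(f"- {key}: {metadata[key]}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_device_context({
        "os": "linux",
        "logs_file_path": "/var/log/nginx/access.log",
        "error_log_pattern": "ERROR|FATAL",
    }))
```

With `logs_file_path` present in the prompt, the LLM can target the correct log file instead of guessing a default location.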
+
+## Multi-Device Configuration Example
+
+**Complete Galaxy Setup:**
+
+```yaml
+devices:
+ # Windows Desktop Agent
+ - device_id: "windows_desktop_1"
+ server_url: "ws://192.168.1.100:5000/ws"
+ os: "windows"
+ capabilities:
+ - "office_applications"
+ - "email"
+ - "web_browsing"
+ metadata:
+ os: "windows"
+ description: "Office productivity workstation"
+ auto_connect: true
+ max_retries: 5
+
+ # Linux Web Server
+ - device_id: "linux_web_server"
+ server_url: "ws://192.168.1.101:5001/ws"
+ os: "linux"
+ capabilities:
+ - "web_server"
+ - "nginx"
+ - "log_analysis"
+ metadata:
+ os: "linux"
+ logs_file_path: "/var/log/nginx/access.log"
+ web_root: "/var/www/html"
+ description: "Production web server"
+ auto_connect: true
+ max_retries: 5
+
+ # Linux Database Server
+ - device_id: "linux_db_server"
+ server_url: "ws://192.168.1.102:5002/ws"
+ os: "linux"
+ capabilities:
+ - "database_server"
+ - "postgresql"
+ - "backup_management"
+ metadata:
+ os: "linux"
+ logs_file_path: "/var/log/postgresql/postgresql.log"
+ data_path: "/var/lib/postgresql/14/main"
+ description: "Production database server"
+ auto_connect: true
+ max_retries: 5
+
+ # Linux Monitoring Server
+ - device_id: "linux_monitoring"
+ server_url: "ws://192.168.1.103:5003/ws"
+ os: "linux"
+ capabilities:
+ - "monitoring"
+ - "prometheus"
+ - "alerting"
+ metadata:
+ os: "linux"
+ logs_file_path: "/var/log/prometheus/prometheus.log"
+ metrics_path: "/var/lib/prometheus"
+ description: "System monitoring server"
+ auto_connect: true
+ max_retries: 5
+```
+
+## Starting Galaxy with Linux Agents
+
+### Prerequisites
+
+Ensure all components are running before starting Galaxy:
+
+1. Device Agent Servers running on all machines
+2. Device Agent Clients connected to their respective servers
+3. MCP Services running on all Linux agents
+4. LLM configured in `config/ufo/agents.yaml` (for UFO) or `config/galaxy/agent.yaml` (for Galaxy)
+
+### Launch Sequence
+
+**Step 1: Start all Device Agent Servers**
+
+```bash
+# On web server machine (192.168.1.101)
+python -m ufo.server.app --port 5001
+
+# On database server machine (192.168.1.102)
+python -m ufo.server.app --port 5002
+
+# On monitoring server machine (192.168.1.103)
+python -m ufo.server.app --port 5003
+```
+
+**Step 2: Start all Linux Clients**
+
+```bash
+# On web server
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://192.168.1.101:5001/ws \
+ --client-id linux_web_server \
+ --platform linux
+
+# On database server
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://192.168.1.102:5002/ws \
+ --client-id linux_db_server \
+ --platform linux
+
+# On monitoring server
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://192.168.1.103:5003/ws \
+ --client-id linux_monitoring \
+ --platform linux
+```
+
+**Step 3: Start all MCP Services**
+
+```bash
+# On each Linux machine
+python -m ufo.client.mcp.http_servers.linux_mcp_server
+```
+
+**Step 4: Launch Galaxy**
+
+```bash
+# On your control machine (interactive mode)
+python -m galaxy --interactive
+```
+
+**Or launch with a specific request:**
+
+```bash
+python -m galaxy "Your task description here"
+```
+
+Galaxy will automatically connect to all configured devices and display the orchestration interface.
+
+## Example Multi-Device Workflows
+
+### Workflow 1: Cross-Platform Data Processing
+
+**User Request:**
+> "Generate a sales report in Excel from the database, then email it to the team"
+
+**Galaxy Orchestration:**
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant Galaxy
+ participant LinuxDB as Linux DB Server
+ participant WinDesktop as Windows Desktop
+
+ User->>Galaxy: Request sales report
+ Galaxy->>Galaxy: Decompose task
+
+ Note over Galaxy,LinuxDB: Subtask 1: Extract data
+ Galaxy->>LinuxDB: "Export sales data from PostgreSQL to CSV"
+ LinuxDB->>LinuxDB: Execute SQL query
+ LinuxDB->>LinuxDB: Generate CSV file
+ LinuxDB-->>Galaxy: CSV file location
+
+ Note over Galaxy,WinDesktop: Subtask 2: Create Excel report
+ Galaxy->>WinDesktop: "Create Excel report from CSV"
+ WinDesktop->>WinDesktop: Open Excel
+ WinDesktop->>WinDesktop: Import CSV
+ WinDesktop->>WinDesktop: Format report
+ WinDesktop-->>Galaxy: Excel file created
+
+ Note over Galaxy,WinDesktop: Subtask 3: Send email
+ Galaxy->>WinDesktop: "Email report to team"
+ WinDesktop->>WinDesktop: Open Outlook
+ WinDesktop->>WinDesktop: Attach file
+ WinDesktop->>WinDesktop: Send email
+ WinDesktop-->>Galaxy: Email sent
+
+ Galaxy-->>User: Task completed
+```
+
+### Workflow 2: Multi-Server Log Analysis
+
+**User Request:**
+> "Check all servers for error patterns in the last hour and summarize findings"
+
+**Galaxy Orchestration:**
+
+1. **Linux Web Server**: Analyze nginx logs for HTTP 500 errors
+2. **Linux DB Server**: Check PostgreSQL logs for query failures
+3. **Linux Monitoring**: Review Prometheus alerts
+4. **Galaxy**: Aggregate results and generate summary report
+
+### Workflow 3: Deployment Pipeline
+
+**User Request:**
+> "Deploy the new application version to production"
+
+**Galaxy Orchestration:**
+
+1. **Linux CI/CD Server**: Build Docker image from Git repository
+2. **Linux Web Server**: Stop current service, pull new image, restart
+3. **Linux DB Server**: Run database migrations
+4. **Linux Monitoring**: Verify health checks and metrics
+5. **Windows Desktop**: Send deployment notification email
+
+---
+
+## Task Assignment Behavior
+
+### How Galaxy Routes Tasks to Linux Agents
+
+Galaxy's ConstellationAgent uses several factors to select the appropriate device for each subtask:
+
+| Factor | Description | Example |
+|--------|-------------|---------|
+| **Capabilities** | Match subtask requirements to device capabilities | `"database_server"` → DB server agent |
+| **OS Requirement** | Platform-specific tasks routed to correct OS | Linux commands → Linux agents |
+| **Metadata Context** | Use device-specific paths and configurations | Log analysis → agent with correct log path |
+| **Device Status** | Only assign to online, healthy devices | Skip offline or failing devices |
+| **Load Balancing** | Distribute tasks across similar devices | Round-robin across web servers |
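
Several of these factors can be illustrated with a simple filter-then-rotate selector. This is a sketch of the idea, not the ConstellationAgent's actual routing logic (which is LLM-driven); `DeviceSelector` and the `status` key are hypothetical, and metadata context is not modeled.

```python
from itertools import count
from typing import Any, Dict, List, Optional


class DeviceSelector:
    """Pick a device by status, OS, and capability, rotating among matches."""

    def __init__(self, devices: List[Dict[str, Any]]) -> None:
        self._devices = devices
        self._turn = count()  # shared counter used for a simple rotation

    def select(self, required_os: str, capability: str) -> Optional[Dict[str, Any]]:
        candidates = [
            d for d in self._devices
            if d.get("status") == "online"
            and d.get("os") == required_os
            and capability in d.get("capabilities", [])
        ]
        if not candidates:
            return None  # no online device can take this subtask
        return candidates[next(self._turn) % len(candidates)]
```

Offline devices are filtered out before the rotation, so a failing web server never receives a subtask.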
+
+### Example Task Decomposition
+
+**User Request:**
+> "Monitor system health across all servers and alert if any issues found"
+
+**Galaxy Decomposition:**
+
+```yaml
+Task 1:
+ Description: "Check web server health"
+ Target: linux_web_server
+ Reason: Has "web_server" capability
+
+Task 2:
+ Description: "Check database health"
+ Target: linux_db_server
+ Reason: Has "database_server" capability
+
+Task 3:
+ Description: "Review monitoring alerts"
+ Target: linux_monitoring
+ Reason: Has "monitoring" capability
+
+Task 4:
+ Description: "Aggregate results and send alert email"
+ Target: windows_desktop_1
+ Reason: Has "email" capability
+```
+
+## Critical Configuration Requirements
+
+!!!danger "Configuration Validation"
+ Ensure these match exactly or Galaxy cannot control the device:
+
+ - **Device ID**: `device_id` in `devices.yaml` must match `--client-id` in client command
+ - **Server URL**: `server_url` in `devices.yaml` must match `--ws-server` in client command
+ - **Platform**: Must include `--platform linux` in client command
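
One way to avoid mismatches is to derive the client command from the same device entry that Galaxy reads. The helper below is a hypothetical sketch; it only uses the module path and flags shown in the launch commands of this guide.

```python
import shlex
from typing import Dict, List


def build_client_command(device: Dict[str, str]) -> List[str]:
    """Build the client launch command from a devices.yaml entry."""
    return [
        "python", "-m", "ufo.client.client",
        "--ws",
        "--ws-server", device["server_url"],  # must equal server_url
        "--client-id", device["device_id"],   # must equal device_id
        "--platform", device["os"],           # "linux" for Linux agents
    ]


if __name__ == "__main__":
    entry = {
        "device_id": "linux_web_server",
        "server_url": "ws://192.168.1.101:5001/ws",
        "os": "linux",
    }
    print(shlex.join(build_client_command(entry)))
```

Because both sides come from one source, the `device_id` and `server_url` cannot drift apart.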
+
+## Monitoring & Debugging
+
+### Verify Device Registration
+
+**Check Galaxy device pool:**
+
+```bash
+# List all connected devices
+curl http://<galaxy-host>:5000/api/devices
+```
+
+**Expected response:**
+
+```json
+{
+ "devices": [
+ {
+ "device_id": "linux_web_server",
+ "os": "linux",
+ "status": "online",
+ "capabilities": ["web_server", "nginx", "log_analysis"]
+ },
+ {
+ "device_id": "linux_db_server",
+ "os": "linux",
+ "status": "online",
+ "capabilities": ["database_server", "postgresql"]
+ }
+ ]
+}
+```
+
+### View Task Assignments
+
+Galaxy logs show task routing decisions:
+
+```log
+INFO - [Galaxy] Task decomposition: 3 subtasks created
+INFO - [Galaxy] Subtask 1 → linux_web_server (capability match: web_server)
+INFO - [Galaxy] Subtask 2 → linux_db_server (capability match: database_server)
+INFO - [Galaxy] Subtask 3 → windows_desktop_1 (capability match: email)
+```
+
+### Troubleshooting Device Connection
+
+**Issue**: Linux agent not appearing in Galaxy device pool
+
+**Diagnosis:**
+
+1. Check if client is connected to server:
+ ```bash
+ curl http://192.168.1.101:5001/api/clients
+ ```
+
+2. Verify `devices.yaml` configuration matches client parameters
+
+3. Check Galaxy logs for connection errors
+
+4. Ensure `auto_connect: true` in `devices.yaml`
+
+## Related Documentation
+
+- [Linux Agent Overview](overview.md) - Architecture and design principles
+- [Quick Start Guide](../getting_started/quick_start_linux.md) - Step-by-step setup
+- [Galaxy Overview](../galaxy/overview.md) - Multi-device orchestration framework
+- [Galaxy Quick Start](../getting_started/quick_start_galaxy.md) - Galaxy deployment guide
+- [Constellation Orchestrator](../galaxy/constellation_orchestrator/overview.md) - Task orchestration
+- [Galaxy Devices Configuration](../configuration/system/galaxy_devices.md) - Complete device configuration reference
+
+## Summary
+
+Using Linux Agent as a Galaxy device enables multi-device orchestration with capability-based routing, metadata context for LLM-aware command generation, parallel execution, and seamless cross-platform workflows between Linux and Windows agents.
+
diff --git a/documents/docs/linux/commands.md b/documents/docs/linux/commands.md
new file mode 100644
index 000000000..fd856f86f
--- /dev/null
+++ b/documents/docs/linux/commands.md
@@ -0,0 +1,415 @@
+# LinuxAgent MCP Commands
+
+LinuxAgent interacts with Linux systems through MCP (Model Context Protocol) tools provided by the Linux MCP Server. These tools provide atomic building blocks for CLI task execution, isolating system-specific operations within the MCP server layer.
+
+## Command Architecture
+
+### MCP Server Integration
+
+LinuxAgent commands are executed through the MCP server infrastructure:
+
+```mermaid
+graph LR
+ A[LinuxAgent] --> B[Command Dispatcher]
+ B --> C[MCP Server]
+ C --> D[Linux Shell]
+ D --> E[stdout/stderr]
+ E --> C
+ C --> B
+ B --> A
+```
+
+### Command Dispatcher
+
+The command dispatcher routes commands to the appropriate MCP server:
+
+```python
+from aip.messages import Command
+
+# Create command
+command = Command(
+ tool_name="execute_command",
+ parameters={"command": "df -h", "timeout": 30},
+ tool_type="action"
+)
+
+# Execute command via dispatcher
+results = await command_dispatcher.execute_commands([command])
+execution_result = results[0].result
+```
+
+## Primary MCP Tools
+
+### 1. execute_command - Execute Shell Commands
+
+**Purpose**: Execute arbitrary shell commands and capture structured results.
+
+#### Tool Specification
+
+```python
+tool_name = "execute_command"
+parameters = {
+ "command": "df -h", # Shell command to execute
+ "timeout": 30, # Execution timeout (seconds, default: 30)
+ "cwd": "/home/user" # Optional working directory
+}
+```
+
+#### Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant Agent
+ participant Dispatcher
+ participant MCP
+ participant Shell
+
+ Agent->>Dispatcher: execute_command: df -h
+ Dispatcher->>MCP: Forward command
+ MCP->>Shell: Execute: df -h
+ Shell->>Shell: Run command
+ Shell-->>MCP: stdout + stderr + exit_code
+ MCP->>MCP: Structure result
+ MCP-->>Dispatcher: Execution result
+ Dispatcher-->>Agent: Structured result
+```
+
+#### Result Structure
+
+```python
+{
+ "success": True, # Boolean indicating success
+ "exit_code": 0, # Process exit code
+ "stdout": "Filesystem Size Used Avail Use% Mounted on\n/dev/sda1 100G 50G 46G 52% /\n",
+ "stderr": "" # Standard error output
+}
+```
+
+#### Common Use Cases
+
+| Use Case | Command Example | Description |
+|----------|----------------|-------------|
+| **File Operations** | `ls -la /home/user` | List directory contents |
+| **Text Processing** | `grep "error" /var/log/syslog` | Search log files |
+| **System Monitoring** | `top -bn1` | Check system processes |
+| **Disk Management** | `df -h` | Check disk space |
+| **Network Operations** | `ping -c 4 example.com` | Test network connectivity |
+| **Archive Creation** | `tar -czf backup.tar.gz /data` | Create compressed archives |
+| **Package Management** | `apt list --installed` | List installed packages |
+
+#### Error Handling
+
+**Exit Code Interpretation**:
+
+- **0**: Success
+- **1-125**: Command-specific errors
+- **126**: Command not executable
+- **127**: Command not found
+- **128+n**: Terminated by signal n
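
The ranges above can be folded into a small helper. This is a minimal sketch; `describe_exit_code` is a hypothetical name, not part of the MCP server. The negative-code branch reflects how Python's `subprocess` and `asyncio` report a child that was killed directly by a signal.

```python
def describe_exit_code(code: int) -> str:
    """Map a process exit code to the categories listed above."""
    if code == 0:
        return "success"
    if code < 0:
        # Python reports -N when the child itself was terminated by signal N
        return f"terminated by signal {-code}"
    if code == 126:
        return "command not executable"
    if code == 127:
        return "command not found"
    if code > 128:
        # Shell convention: 128 + signal number
        return f"terminated by signal {code - 128}"
    if 1 <= code <= 125:
        return "command-specific error"
    return "unknown"
```

For example, a process killed with `SIGKILL` (signal 9) under a shell reports exit code 137.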
+
+**Example Error Result**:
+
+```python
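+{
+    "success": False,
+    "exit_code": 127,    # Command not found
+    "stdout": "",
+    "stderr": "/bin/sh: 1: invalid_cmd: not found\n"  # Exact wording depends on the shell
+}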
+```
+
+#### Security Considerations
+
+!!!warning "Command Safety"
+ The MCP server blocks dangerous commands including:
+
+ - `rm -rf /` - Recursive root deletion
+ - Fork bombs - `:(){ :|:& };:`
+ - `mkfs` - Filesystem formatting
+ - `dd if=/dev/zero` - Device overwriting
+ - `shutdown`, `reboot` - System shutdown
+
+    Commands execute with user permissions; there is no automatic privilege escalation. Timeout protection terminates commands that exceed their time limit.
+
+### 2. get_system_info - Collect System Information
+
+**Purpose**: Gather basic Linux system information using standard commands.
+
+#### Tool Specification
+
+```python
+tool_name = "get_system_info"
+parameters = {} # No parameters required
+```
+
+#### Information Collected
+
+The tool executes these commands and returns their output:
+
+| Info Type | Command | Data Returned |
+|-----------|---------|---------------|
+| **uname** | `uname -a` | System and kernel information |
+| **uptime** | `uptime` | System uptime and load averages |
+| **memory** | `free -h` | Memory usage statistics (human-readable) |
+| **disk** | `df -h` | Disk space for all mounted filesystems |
+
+#### Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant Agent
+ participant Dispatcher
+ participant MCP
+ participant System
+
+ Agent->>Dispatcher: get_system_info
+ Dispatcher->>MCP: Forward request
+ MCP->>System: Execute uname, uptime, free, df
+ System-->>MCP: Command outputs
+ MCP->>MCP: Aggregate results
+ MCP-->>Dispatcher: Structured info
+ Dispatcher-->>Agent: System information
+```
+
+#### Result Example
+
+```python
+{
+ "uname": "Linux hostname 5.15.0-91-generic #101-Ubuntu SMP x86_64 GNU/Linux",
+ "uptime": " 14:23:45 up 5 days, 3:12, 2 users, load average: 0.52, 0.58, 0.59",
+ "memory": " total used free shared buff/cache available\nMem: 15Gi 8.2Gi 1.5Gi 256Mi 5.8Gi 7.0Gi\nSwap: 8.0Gi 512Mi 7.5Gi",
+ "disk": "Filesystem Size Used Avail Use% Mounted on\n/dev/sda1 100G 50G 46G 52% /\n/dev/sdb1 500G 200G 276G 42% /data"
+}
+```
+
+## Command Execution Pipeline
+
+### Atomic Building Blocks
+
+The MCP tools `execute_command` and `get_system_info` serve as atomic operations:
+
+```mermaid
+graph TD
+ A[User Request] --> B[LLM Reasoning]
+ B --> C{Select Tool}
+ C -->|Execute CLI| D[execute_command]
+ C -->|Get System Info| E[get_system_info]
+
+ D --> F[Capture Result]
+ E --> F
+
+ F --> G[Update Memory]
+ G --> H{Task Complete?}
+ H -->|No| B
+ H -->|Yes| I[FINISH]
+```
+
+### Isolation of System Operations
+
+By isolating system operations in the MCP server layer, the architecture achieves clear separation: the Agent layer focuses on LLM reasoning and workflow orchestration, while the MCP layer handles system-specific command execution. This provides testability (commands can be mocked) and portability (MCP servers can be deployed remotely).
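
The testability claim can be made concrete: if the agent-side logic depends only on an async "execute" callable, a fake can replace the MCP layer in tests. This is a sketch with hypothetical names (`run_plan`, `fake_execute`); the real agent drives execution through the command dispatcher.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Executor = Callable[[str], Awaitable[Dict]]


async def run_plan(commands: List[str], execute: Executor) -> List[Dict]:
    """Run commands in order, stopping at the first failure."""
    history: List[Dict] = []
    for command in commands:
        result = await execute(command)
        history.append({"command": command, **result})
        if not result.get("success"):
            break  # later commands may depend on this one
    return history


async def fake_execute(command: str) -> Dict:
    """Stand-in for the MCP layer: only `df` succeeds, nothing touches the system."""
    if command.startswith("df"):
        return {"success": True, "exit_code": 0, "stdout": "ok", "stderr": ""}
    return {"success": False, "exit_code": 127, "stdout": "", "stderr": "not found"}


if __name__ == "__main__":
    print(asyncio.run(run_plan(["df -h", "bogus", "uptime"], fake_execute)))
```

Swapping `fake_execute` for a function that calls a remote MCP server changes nothing in `run_plan`, which is the portability half of the same argument.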
+
+## Command Composition
+
+### Sequential Execution
+
+LinuxAgent executes commands sequentially, building on previous results:
+
+```python
+# Round 1: Check disk space
+{
+ "action": {
+ "tool": "execute_command",
+ "arguments": {"command": "df -h /data"}
+ }
+}
+# Result: 276GB available
+
+# Round 2: Create backup (informed by Round 1 result)
+{
+ "action": {
+ "tool": "execute_command",
+ "arguments": {
+ "command": "tar -czf /data/backup.tar.gz /home/user"
+ }
+ }
+}
+```
+
+### Conditional Execution
+
+The LLM can make decisions based on command results:
+
+```python
+# Round 1: Check if file exists
+{
+ "action": {
+ "tool": "execute_command",
+ "arguments": {"command": "ls /data/backup.tar.gz"}
+ }
+}
+# Result: exit_code=2 (file not found)
+
+# Round 2: File doesn't exist, create it
+{
+ "thought": "Backup doesn't exist, creating new one",
+ "action": {
+ "tool": "execute_command",
+ "arguments": {"command": "tar -czf /data/backup.tar.gz /home/user"}
+ }
+}
+```
+
+### Error Recovery
+
+Commands can be retried or alternatives attempted:
+
+```python
+# Round 1: Try privileged command
+{
+ "action": {
+ "tool": "execute_command",
+ "arguments": {"command": "systemctl restart nginx"}
+ }
+}
+# Result: success=False, error="Permission denied"
+
+# Round 2: Switch to user-level alternative
+{
+ "thought": "Don't have sudo access, using alternative approach",
+ "action": {
+ "tool": "execute_command",
+ "arguments": {"command": "kill -HUP $(cat /var/run/nginx.pid)"}
+ }
+}
+```
+
+---
+
+## Command Implementation
+
+### MCP Server Location
+
+The MCP server implementation for Linux commands can be found in:
+
+```
+ufo/client/mcp/http_servers/
+└── linux_mcp_server.py
+```
+
+### Example Implementation Skeleton
+
+```python
+import asyncio
+from typing import Dict, Optional
+
+# `mcp` is assumed to be the MCP server instance that registers tools;
+# its construction is omitted from this skeleton.
+
+
+class LinuxMCPServer:
+    """MCP server for Linux CLI commands"""
+
+    @mcp.tool()
+    async def execute_command(
+        self,
+        command: str,
+        timeout: int = 30,
+        cwd: Optional[str] = None
+    ) -> Dict:
+        """Execute a shell command"""
+        # Block dangerous commands (patterns from the Security Considerations list)
+        dangerous = [
+            "rm -rf /", ":(){ :|:& };:", "mkfs",
+            "dd if=/dev/zero", "shutdown", "reboot",
+        ]
+        if any(d in command.lower() for d in dangerous):
+            return {"success": False, "error": "Blocked dangerous command."}
+
+        try:
+            proc = await asyncio.create_subprocess_shell(
+                command,
+                stdout=asyncio.subprocess.PIPE,
+                stderr=asyncio.subprocess.PIPE,
+                cwd=cwd
+            )
+            try:
+                stdout, stderr = await asyncio.wait_for(
+                    proc.communicate(),
+                    timeout=timeout
+                )
+            except asyncio.TimeoutError:
+                # Kill the hung process and reap it before reporting the timeout
+                proc.kill()
+                await proc.wait()
+                return {"success": False, "error": f"Timeout after {timeout}s."}
+
+            return {
+                "success": proc.returncode == 0,
+                "exit_code": proc.returncode,
+                "stdout": stdout.decode("utf-8", errors="replace"),
+                "stderr": stderr.decode("utf-8", errors="replace")
+            }
+        except Exception as e:
+            return {"success": False, "error": str(e)}
+
+    @mcp.tool()
+    async def get_system_info(self) -> Dict:
+        """Collect system information"""
+        info = {}
+        cmds = {
+            "uname": "uname -a",
+            "uptime": "uptime",
+            "memory": "free -h",
+            "disk": "df -h"
+        }
+
+        for k, cmd in cmds.items():
+            try:
+                proc = await asyncio.create_subprocess_shell(
+                    cmd, stdout=asyncio.subprocess.PIPE
+                )
+                out, _ = await proc.communicate()
+                info[k] = out.decode("utf-8", errors="replace").strip()
+            except Exception as e:
+                # Record the failure for this key and keep collecting the rest
+                info[k] = f"Error: {e}"
+
+        return info
+```
+
+---
+
+## Best Practices
+
+### Tool Usage
+
+- Use `get_system_info` for quick system overview
+- Use `execute_command` for custom or complex operations
+- Check `success` field and `exit_code` to detect errors
+- Parse `stdout` for structured data when possible
+- Set timeouts appropriately to prevent hung processes
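
As an example of parsing `stdout` into structured data, the `df -h` output shown earlier can be split into one dictionary per filesystem. This is a minimal sketch: `parse_df` is a hypothetical helper, and it assumes the default six-column `df -h` layout.

```python
from typing import Dict, List


def parse_df(stdout: str) -> List[Dict[str, str]]:
    """Parse `df -h` output into one dictionary per filesystem."""
    rows: List[Dict[str, str]] = []
    lines = stdout.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        # maxsplit=5 keeps mount points that contain spaces in one piece
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # ignore wrapped or malformed lines
        filesystem, size, used, avail, use_pct, mounted_on = parts
        rows.append({
            "filesystem": filesystem,
            "size": size,
            "used": used,
            "avail": avail,
            "use_pct": use_pct,
            "mounted_on": mounted_on,
        })
    return rows
```

Structured rows make follow-up checks, such as flagging filesystems above a usage threshold, straightforward for the next reasoning step.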
+
+### Security
+
+!!!warning "Security Best Practices"
+ The MCP server has built-in protections, but be cautious:
+
+ - Dangerous commands are automatically blocked
+ - Commands execute with user permissions only
+ - Avoid sudo when possible (requires user interaction)
+ - Sanitize outputs before logging (may contain sensitive data)
+
+### Error Handling
+
+- Check `success` field before considering command successful
+- Parse `stderr` for error messages
+- Implement retries for transient errors
+- Provide alternatives when primary approach fails
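
The retry advice can be sketched as a small wrapper around any coroutine that returns a result dictionary with a `success` field. `execute_with_retry` is a hypothetical helper, and deciding which errors are actually transient is left to the caller.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def execute_with_retry(
    run: Callable[[], Awaitable[Dict]],
    attempts: int = 3,
    delay: float = 0.5,
) -> Dict:
    """Call `run` until it reports success or the attempts are exhausted."""
    result: Dict = {"success": False, "error": "not attempted"}
    for attempt in range(1, attempts + 1):
        result = await run()
        if result.get("success"):
            return result
        if attempt < attempts:
            # Linear backoff: wait a little longer after each failure
            await asyncio.sleep(delay * attempt)
    return result  # last failure, so the caller can inspect stderr/error
```

When all attempts fail, the last result is returned unchanged so that an alternative approach can be chosen based on its `stderr` or `error` content.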
+
+## Comparison with Other Agent Commands
+
+| Agent | Command Types | Execution Layer | Result Format |
+|-------|--------------|-----------------|---------------|
+| **LinuxAgent** | CLI + SysInfo | MCP server | success/exit_code/stdout/stderr |
+| **AppAgent** | UI + API | Automator + MCP | UI state + API responses |
+| **HostAgent** | Desktop + Shell | Automator + MCP | Desktop state + results |
+
+LinuxAgent's command set is intentionally minimal and focused:
+
+- **execute_command**: General-purpose command execution
+- **get_system_info**: Standardized system information
+
+This simplicity reflects the CLI environment's text-based, command-driven nature.
+
+## Next Steps
+
+- [State Machine](state.md) - Understand how command execution fits into the FSM
+- [Processing Strategy](strategy.md) - See how commands are integrated into the 3-phase pipeline
+- [Overview](overview.md) - Return to LinuxAgent architecture overview
+- [MCP Overview](../mcp/overview.md) - MCP server implementation details
diff --git a/documents/docs/linux/overview.md b/documents/docs/linux/overview.md
new file mode 100644
index 000000000..fcb5669e6
--- /dev/null
+++ b/documents/docs/linux/overview.md
@@ -0,0 +1,146 @@
+# LinuxAgent: CLI Task Executor
+
+**LinuxAgent** is a specialized lightweight agent designed for executing command-line instructions on Linux systems. It demonstrates how a standalone device agent can leverage the layered FSM architecture and server-client design to perform intelligent, iterative task execution in a CLI-based environment.
+
+**Quick Links:**
+
+- New to Linux Agent? Start with the [Quick Start Guide](../getting_started/quick_start_linux.md)
+- Using as Sub-Agent in Galaxy? See [Using Linux Agent as Galaxy Device](as_galaxy_device.md)
+
+## Architecture Overview
+
+LinuxAgent operates as a single-agent instance that interacts with Linux systems through command-line interface (CLI) commands. Unlike the two-tier architecture of UFO (HostAgent + AppAgent), LinuxAgent uses a simplified single-agent model optimized for shell-based automation.
+
+## Core Responsibilities
+
+LinuxAgent provides the following capabilities for Linux CLI automation:
+
+### Command-Line Execution
+
+LinuxAgent interprets user requests and translates them into appropriate shell commands for execution on Linux systems.
+
+**Example:** User request "Check disk space and create a backup" becomes:
+
+1. Execute `df -h` to check disk space
+2. Execute `tar -czf backup.tar.gz /data` to create backup
+
+### System Information Collection
+
+The agent can proactively gather system-level information to inform decision-making:
+
+- Memory usage (`free -h`)
+- Disk space (`df -h`)
+- Process status (`ps aux`)
+- Hardware configuration (`lscpu`, `lshw`)
+
+### Iterative Task Execution
+
+LinuxAgent executes tasks iteratively, evaluating execution outcomes at each step and determining the next action based on results and LLM reasoning.
+
+### Error Handling and Recovery
+
+The agent monitors command execution results (`stdout`, `stderr`, exit codes) and can adapt its strategy when errors occur.
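+
+A minimal sketch of this kind of result capture, assuming plain `subprocess` rather than the MCP server that LinuxAgent actually uses. `CommandResult` and `run_command` are hypothetical names for illustration and are not part of the UFO codebase:
+
+```python
+import subprocess
+import sys
+from dataclasses import dataclass
+from typing import Sequence
+
+
+@dataclass
+class CommandResult:
+    """Hypothetical container for the outcome of one command."""
+    stdout: str
+    stderr: str
+    exit_code: int
+
+    @property
+    def success(self) -> bool:
+        # By convention, exit code 0 means the command succeeded.
+        return self.exit_code == 0
+
+
+def run_command(argv: Sequence[str], timeout: float = 30.0) -> CommandResult:
+    """Run a command without a shell and capture both output streams."""
+    completed = subprocess.run(
+        list(argv), capture_output=True, text=True, timeout=timeout
+    )
+    return CommandResult(completed.stdout, completed.stderr, completed.returncode)
+
+
+# The Python interpreter stands in for a shell command to keep the sketch portable.
+ok = run_command([sys.executable, "-c", "print('hello')"])
+bad = run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
+```
+
+Keeping the exit code alongside both streams lets the caller distinguish a failed command from one that merely printed warnings to `stderr`.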
+
+## Key Characteristics
+
+- **Scope**: Single Linux system (CLI-based automation)
+- **Lifecycle**: One instance per task session
+- **Hierarchy**: Standalone agent (no child agents)
+- **Communication**: Direct MCP server integration
+- **Control**: 3-state finite state machine with 3-phase processing pipeline
+
+## Execution Workflow
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant LinuxAgent
+ participant LLM
+ participant MCPServer
+ participant Linux
+
+ User->>LinuxAgent: "Check disk space and create backup"
+ LinuxAgent->>LinuxAgent: State: CONTINUE
+ LinuxAgent->>LLM: Send prompt with request & context
+ LLM-->>LinuxAgent: Return command: df -h
+ LinuxAgent->>MCPServer: execute_command: df -h
+ MCPServer->>Linux: Execute command
+ Linux-->>MCPServer: stdout + stderr + exit code
+ MCPServer-->>LinuxAgent: Execution result
+ LinuxAgent->>LinuxAgent: Update memory
+
+ LinuxAgent->>LLM: Send prompt with previous result
+ LLM-->>LinuxAgent: Return command: tar -czf ...
+ LinuxAgent->>MCPServer: execute_command: tar -czf backup.tar.gz /data
+ MCPServer->>Linux: Execute command
+ Linux-->>MCPServer: stdout + stderr + exit code
+ MCPServer-->>LinuxAgent: Execution result
+ LinuxAgent->>LinuxAgent: State: FINISH
+ LinuxAgent-->>User: Task completed
+```
+
+## Comparison with UFO Agents
+
+| Aspect | LinuxAgent | HostAgent | AppAgent |
+|--------|------------|-----------|----------|
+| **Platform** | Linux (CLI) | Windows Desktop | Windows Applications |
+| **States** | 3 (CONTINUE, FINISH, FAIL) | 7 states | 6 states |
+| **Architecture** | Single-agent | Parent orchestrator | Child executor |
+| **Interface** | Command-line | Desktop GUI + Shell | Application GUI + API |
+| **Processing Phases** | 3 phases | 4 phases | 4 phases |
+| **MCP Tools** | execute_command, get_system_info | Desktop commands | UI + API commands |
+
+## Design Principles
+
+LinuxAgent exemplifies a minimal viable design for single-agent systems:
+
+- **Small state set**: only 3 states for deterministic control flow
+- **Modular strategies**: clear separation between LLM interaction, action execution, and memory updates
+- **Well-defined commands**: atomic CLI operations isolated in the MCP server layer
+- **Proactive information gathering**: on-demand system info collection
+- **Traceable execution**: complete logging of commands, results, and state transitions
+
+## Deep Dive Topics
+
+Explore the detailed architecture and implementation:
+
+- [State Machine](state.md) - 3-state FSM lifecycle and transitions
+- [Processing Strategy](strategy.md) - 3-phase pipeline (LLM, Action, Memory)
+- [MCP Commands](commands.md) - CLI execution and system information commands
+
+## Use Cases
+
+LinuxAgent is ideal for:
+
+- **System Administration**: Automated system maintenance and monitoring
+- **DevOps Tasks**: Deployment scripts, log analysis, configuration management
+- **Data Processing**: File operations, text processing, batch jobs
+- **Monitoring & Alerts**: System health checks and automated responses
+- **Cross-Device Workflows**: As a sub-agent in Galaxy multi-device orchestration
+
+!!!tip "Galaxy Integration"
+    LinuxAgent can serve as a device agent in Galaxy's multi-device orchestration framework, executing Linux-specific tasks as part of cross-platform workflows alongside Windows and other devices.
+
+    See [Using Linux Agent as Galaxy Device](as_galaxy_device.md) for configuration details.
+
+## Implementation Location
+
+The LinuxAgent implementation can be found in:
+
+```
+ufo/
+├── agents/
+│   ├── agent/
+│   │   └── customized_agent.py                # LinuxAgent class definition
+│   ├── states/
+│   │   └── linux_agent_state.py               # State machine implementation
+│   └── processors/
+│       ├── customized/
+│       │   └── customized_agent_processor.py  # LinuxAgentProcessor
+│       └── strategies/
+│           └── linux_agent_strategy.py        # Processing strategies
+```
+
+## Next Steps
+
+To understand LinuxAgent's complete architecture:
+
+1. [State Machine](state.md) - Learn about the 3-state FSM
+2. [Processing Strategy](strategy.md) - Understand the 3-phase pipeline
+3. [MCP Commands](commands.md) - Explore CLI command execution
+
+For deployment and configuration, see the [Quick Start Guide](../getting_started/quick_start_linux.md).
diff --git a/documents/docs/linux/state.md b/documents/docs/linux/state.md
new file mode 100644
index 000000000..d1623993b
--- /dev/null
+++ b/documents/docs/linux/state.md
@@ -0,0 +1,279 @@
+# LinuxAgent State Machine
+
+LinuxAgent uses a **3-state finite state machine (FSM)** to manage CLI task execution flow. The minimal state set captures essential execution progression while maintaining simplicity and predictability. States transition based on LLM decisions and command execution results.
+
+## State Machine Architecture
+
+### State Enumeration
+
+```python
+class LinuxAgentStatus(Enum):
+    """Store the status of the linux agent"""
+    CONTINUE = "CONTINUE"  # Task is ongoing, requires further commands
+    FINISH = "FINISH"      # Task completed successfully
+    FAIL = "FAIL"          # Task cannot proceed, unrecoverable error
+```
+
+### State Management
+
+LinuxAgent states are managed by `LinuxAgentStateManager`, which implements the agent state registry pattern:
+
+```python
+class LinuxAgentStateManager(AgentStateManager):
+    """Manages the states of the linux agent"""
+    _state_mapping: Dict[str, Type[LinuxAgentState]] = {}
+
+    @property
+    def none_state(self) -> AgentState:
+        return NoneLinuxAgentState()
+```
+
+All LinuxAgent states are registered using the `@LinuxAgentStateManager.register` decorator, enabling dynamic state lookup by name.
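+
+The registry pattern can be sketched as follows. This is a simplified illustration, not the UFO implementation: `State`, `StateRegistry`, `ContinueState`, and `FinishState` are hypothetical stand-ins, and the real `AgentStateManager` may differ in detail.
+
+```python
+from typing import Dict, Type
+
+
+class State:
+    """Hypothetical minimal base class for this sketch."""
+
+    @classmethod
+    def name(cls) -> str:
+        raise NotImplementedError
+
+
+class StateRegistry:
+    """Maps state names to state classes, filled in by the decorator."""
+    _state_mapping: Dict[str, Type[State]] = {}
+
+    @classmethod
+    def register(cls, state_cls: Type[State]) -> Type[State]:
+        # Store the class under its own name and return it unchanged,
+        # so the decorator does not alter the decorated class.
+        cls._state_mapping[state_cls.name()] = state_cls
+        return state_cls
+
+    @classmethod
+    def get_state(cls, status: str) -> State:
+        # Look up the class by name and create a fresh instance.
+        return cls._state_mapping[status]()
+
+
+@StateRegistry.register
+class ContinueState(State):
+    @classmethod
+    def name(cls) -> str:
+        return "CONTINUE"
+
+
+@StateRegistry.register
+class FinishState(State):
+    @classmethod
+    def name(cls) -> str:
+        return "FINISH"
+
+
+state = StateRegistry.get_state("FINISH")
+```
+
+Because registration happens when each class is defined, adding a new state only requires defining and decorating it. The lookup code stays unchanged.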
+
+## State Transition Diagram
+
+
+ 
+ Figure: Lifecycle state transitions of the LinuxAgent. The agent starts in CONTINUE state, executes CLI commands iteratively, and transitions to FINISH upon completion or FAIL upon encountering unrecoverable errors.
+
+
+## State Definitions
+
+### 1. CONTINUE State
+
+**Purpose**: Active execution state where LinuxAgent processes the user request and executes CLI commands.
+
+```python
+@LinuxAgentStateManager.register
+class ContinueLinuxAgentState(LinuxAgentState):
+    """The class for the continue linux agent state"""
+
+    async def handle(self, agent: "LinuxAgent", context: Optional["Context"] = None):
+        """Execute the 3-phase processing pipeline"""
+        await agent.process(context)
+
+    def is_round_end(self) -> bool:
+        return False  # Round continues
+
+    def is_subtask_end(self) -> bool:
+        return False  # Subtask continues
+
+    @classmethod
+    def name(cls) -> str:
+        return LinuxAgentStatus.CONTINUE.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Active |
+| **Processor Executed** | ✓ Yes (3 phases) |
+| **Round Ends** | No |
+| **Subtask Ends** | No |
+| **Duration** | Single round |
+| **Next States** | CONTINUE, FINISH, FAIL |
+
+**Behavior**:
+
+1. Constructs prompts with previous execution results
+2. Gets next CLI command from LLM
+3. Executes command via MCP server
+4. Updates memory with execution results
+5. Determines next state based on LLM response
+
+**State Transition Logic**:
+
+- **CONTINUE → CONTINUE**: Task requires more commands to complete
+- **CONTINUE → FINISH**: LLM determines task is complete
+- **CONTINUE → FAIL**: Unrecoverable error encountered (e.g., permission denied, resource unavailable)
+
+### 2. FINISH State
+
+**Purpose**: Terminal state indicating successful task completion.
+
+```python
+@LinuxAgentStateManager.register
+class FinishLinuxAgentState(LinuxAgentState):
+    """The class for the finish linux agent state"""
+
+    def next_agent(self, agent: "LinuxAgent") -> "LinuxAgent":
+        return agent
+
+    def next_state(self, agent: "LinuxAgent") -> LinuxAgentState:
+        return FinishLinuxAgentState()  # Remains in FINISH
+
+    def is_subtask_end(self) -> bool:
+        return True  # Subtask completed
+
+    def is_round_end(self) -> bool:
+        return True  # Round ends
+
+    @classmethod
+    def name(cls) -> str:
+        return LinuxAgentStatus.FINISH.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Terminal |
+| **Processor Executed** | ✗ No |
+| **Round Ends** | Yes |
+| **Subtask Ends** | Yes |
+| **Duration** | Permanent |
+| **Next States** | FINISH (no transition) |
+
+**Behavior**:
+
+- Signals task completion to session manager
+- No further processing occurs
+- Agent instance can be terminated
+
+FINISH state is reached when all required CLI commands have been executed successfully, the LLM determines the user request has been fulfilled, and no unrecoverable errors or exceptions occurred during execution.
+
+### 3. FAIL State
+
+**Purpose**: Terminal state indicating task failure due to unrecoverable errors.
+
+```python
+@LinuxAgentStateManager.register
+class FailLinuxAgentState(LinuxAgentState):
+    """The class for the fail linux agent state"""
+
+    def next_agent(self, agent: "LinuxAgent") -> "LinuxAgent":
+        return agent
+
+    def next_state(self, agent: "LinuxAgent") -> LinuxAgentState:
+        return FinishLinuxAgentState()  # Transitions to FINISH for cleanup
+
+    def is_round_end(self) -> bool:
+        return True  # Round ends
+
+    def is_subtask_end(self) -> bool:
+        return True  # Subtask failed
+
+    @classmethod
+    def name(cls) -> str:
+        return LinuxAgentStatus.FAIL.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Terminal (Error) |
+| **Processor Executed** | ✗ No |
+| **Round Ends** | Yes |
+| **Subtask Ends** | Yes |
+| **Duration** | Transitions to FINISH |
+| **Next States** | FINISH |
+
+**Behavior**:
+
+- Logs failure reason and context
+- Transitions to FINISH state for cleanup
+- Session manager receives failure status
+
+!!!error "Failure Conditions"
+    FAIL state is reached when:
+
+    - Insufficient privileges prevent command execution
+    - Required system resources are not accessible (disk full, network unreachable)
+    - Repeated command syntax errors occur
+    - The LLM explicitly indicates the task cannot be completed
+    - Task requirements exceed current system capabilities
+
+**Error Recovery**:
+
+While FAIL is a terminal state, the error information is logged for debugging:
+
+```python
+# Example error logging in FAIL state
+agent.logger.error(f"Task failed: {error_message}")
+agent.logger.debug(f"Last command: {last_command}")
+agent.logger.debug(f"Command output: {stderr}")
+```
+
+## State Transition Rules
+
+### Transition Decision Logic
+
+State transitions are determined by the LLM's response in the **CONTINUE** state:
+
+```python
+# LLM returns status in response (same shape as the documented LLM response format)
+parsed_response = {
+    "thought": "Need to check disk space first",
+    "action": {
+        "tool": "execute_command",
+        "arguments": {"command": "df -h"},
+        "status": "CONTINUE"  # or "FINISH" or "FAIL"
+    }
+}
+
+# Agent updates its status based on LLM decision
+agent.status = parsed_response["action"]["status"]
+next_state = LinuxAgentStateManager().get_state(agent.status)
+```
+
+### Transition Matrix
+
+| Current State | Condition | Next State | Trigger |
+|---------------|-----------|------------|---------|
+| **CONTINUE** | LLM returns CONTINUE | CONTINUE | More commands needed |
+| **CONTINUE** | LLM returns FINISH | FINISH | Task completed |
+| **CONTINUE** | LLM returns FAIL | FAIL | Unrecoverable error |
+| **CONTINUE** | Exception raised | FAIL | System error |
+| **FINISH** | Any | FINISH | No transition |
+| **FAIL** | Any | FINISH | Cleanup transition |
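+
+The matrix above can be expressed as a small pure function. This is a sketch for illustration only: `Status` and `next_status` are hypothetical names, and the real transitions are spread across the state classes shown earlier.
+
+```python
+from enum import Enum
+
+
+class Status(Enum):
+    CONTINUE = "CONTINUE"
+    FINISH = "FINISH"
+    FAIL = "FAIL"
+
+
+def next_status(current: Status, llm_status: str = "CONTINUE",
+                exception_raised: bool = False) -> Status:
+    """Return the next state according to the transition matrix."""
+    if current in (Status.FINISH, Status.FAIL):
+        # FINISH stays in FINISH; FAIL moves to FINISH for cleanup.
+        return Status.FINISH
+    if exception_raised:
+        # A system error in CONTINUE overrides whatever the LLM returned.
+        return Status.FAIL
+    # Otherwise the LLM decides; an unknown status raises ValueError.
+    return Status(llm_status)
+```
+
+Encoding the rules as one function makes each row of the table a one-line test case.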
+
+## State-Specific Processing
+
+### CONTINUE State Processing Pipeline
+
+When in CONTINUE state, LinuxAgent executes the full 3-phase pipeline:
+
+```mermaid
+graph TD
+ A[CONTINUE State] --> B[Phase 1: LLM Interaction]
+ B --> C[Phase 2: Action Execution]
+ C --> D[Phase 3: Memory Update]
+ D --> E{Check Status}
+ E -->|CONTINUE| A
+ E -->|FINISH| F[FINISH State]
+ E -->|FAIL| G[FAIL State]
+```
+
+### Terminal States (FINISH / FAIL)
+
+Terminal states perform no processing:
+
+- **FINISH**: Clean termination, results available in memory
+- **FAIL**: Error termination, error details logged
+
+## Deterministic Control Flow
+
+The 3-state design ensures deterministic, traceable execution with predictable behavior (every execution path is well-defined), debuggability (state transitions are logged and traceable), testability (finite state space simplifies testing), and maintainability (simple state set reduces complexity).
+
+## Comparison with Other Agents
+
+| Agent | States | Complexity | Use Case |
+|-------|--------|------------|----------|
+| **LinuxAgent** | 3 | Minimal | CLI task execution |
+| **AppAgent** | 6 | Moderate | Windows app automation |
+| **HostAgent** | 7 | High | Desktop orchestration |
+
+LinuxAgent's minimal 3-state design reflects its focused scope: execute CLI commands to fulfill user requests. The simplified state machine eliminates unnecessary complexity while maintaining robust error handling and completion detection.
+
+## Implementation Details
+
+The state machine implementation can be found in:
+
+```
+ufo/agents/states/linux_agent_state.py
+```
+
+Key classes:
+
+- `LinuxAgentStatus`: State enumeration
+- `LinuxAgentStateManager`: State registry and lookup
+- `LinuxAgentState`: Abstract base class
+- `ContinueLinuxAgentState`: Active execution state
+- `FinishLinuxAgentState`: Successful completion state
+- `FailLinuxAgentState`: Error termination state
+- `NoneLinuxAgentState`: Initial/undefined state
+
+## Next Steps
+
+- [Processing Strategy](strategy.md) - Understand the 3-phase processing pipeline executed in CONTINUE state
+- [MCP Commands](commands.md) - Explore CLI command execution and system information retrieval
+- [Overview](overview.md) - Return to LinuxAgent architecture overview
diff --git a/documents/docs/linux/strategy.md b/documents/docs/linux/strategy.md
new file mode 100644
index 000000000..4133d904f
--- /dev/null
+++ b/documents/docs/linux/strategy.md
@@ -0,0 +1,520 @@
+# LinuxAgent Processing Strategy
+
+LinuxAgent executes a **3-phase processing pipeline** in the **CONTINUE** state. Each phase handles a specific aspect of CLI task execution: LLM decision making, action execution, and memory recording. This streamlined design separates prompt construction and LLM reasoning from command execution and state updates, enhancing modularity and traceability.
+
+## Strategy Assembly
+
+Processing strategies are assembled and orchestrated by the `LinuxAgentProcessor` class defined in `ufo/agents/processors/customized/customized_agent_processor.py`. The processor coordinates the 3-phase pipeline execution.
+
+### LinuxAgentProcessor Overview
+
+The `LinuxAgentProcessor` extends `CustomizedProcessor` and manages the Linux-specific workflow:
+
+```python
+class LinuxAgentProcessor(CustomizedProcessor):
+    """
+    Processor for Linux MCP Agent.
+    Manages CLI command execution workflow with:
+    - LLM-based command generation
+    - MCP-based command execution
+    - Memory-based result tracking
+    """
+
+    def _setup_strategies(self) -> None:
+        """Setup the 3-phase processing pipeline"""
+
+        # Phase 1: LLM Interaction (critical - fail_fast=True)
+        self.strategies[ProcessingPhase.LLM_INTERACTION] = (
+            LinuxLLMInteractionStrategy(fail_fast=True)
+        )
+
+        # Phase 2: Action Execution (graceful - fail_fast=False)
+        self.strategies[ProcessingPhase.ACTION_EXECUTION] = (
+            LinuxActionExecutionStrategy(fail_fast=False)
+        )
+
+        # Phase 3: Memory Update (graceful - fail_fast=False)
+        self.strategies[ProcessingPhase.MEMORY_UPDATE] = (
+            AppMemoryUpdateStrategy(fail_fast=False)
+        )
+```
+
+### Strategy Registration
+
+| Phase | Strategy Class | fail_fast | Rationale |
+|-------|---------------|-----------|-----------|
+| **LLM_INTERACTION** | `LinuxLLMInteractionStrategy` | ✓ True | Without a valid LLM response there is no command to run, so the pipeline stops immediately |
+| **ACTION_EXECUTION** | `LinuxActionExecutionStrategy` | ✗ False | Command failures can be handled gracefully |
+| **MEMORY_UPDATE** | `AppMemoryUpdateStrategy` | ✗ False | Memory failures shouldn't block execution |
+
+**Fail-Fast vs Graceful:**
+
+- **fail_fast=True**: Critical phases; an error stops the pipeline immediately and the agent transitions to the FAIL state
+- **fail_fast=False**: Non-critical phases; an error is logged and execution continues
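+
+A minimal sketch of how a runner might honor the `fail_fast` flag. `Phase` and `run_pipeline` are hypothetical names invented for this example, and the real processor may handle errors differently:
+
+```python
+from dataclasses import dataclass
+from typing import Callable, Dict, List, Tuple
+
+Context = Dict[str, object]
+
+
+@dataclass
+class Phase:
+    """Hypothetical stand-in for one processing strategy."""
+    name: str
+    run: Callable[[Context], None]
+    fail_fast: bool
+
+
+def run_pipeline(phases: List[Phase], context: Context) -> Tuple[bool, List[str]]:
+    """Run phases in order; stop early only if a fail_fast phase raises."""
+    errors: List[str] = []
+    for phase in phases:
+        try:
+            phase.run(context)
+        except Exception as exc:
+            errors.append(f"{phase.name}: {exc}")
+            if phase.fail_fast:
+                return False, errors  # critical phase failed: abort
+    return True, errors
+
+
+def llm_phase(context: Context) -> None:
+    context["command"] = "df -h"
+
+
+def failing_action(context: Context) -> None:
+    raise RuntimeError("command timed out")
+
+
+def memory_phase(context: Context) -> None:
+    context["memory_updated"] = True
+
+
+# A graceful failure in the action phase still lets the memory phase run.
+graceful_ctx: Context = {}
+graceful_ok, graceful_errors = run_pipeline(
+    [
+        Phase("LLM_INTERACTION", llm_phase, fail_fast=True),
+        Phase("ACTION_EXECUTION", failing_action, fail_fast=False),
+        Phase("MEMORY_UPDATE", memory_phase, fail_fast=False),
+    ],
+    graceful_ctx,
+)
+
+# A failure in a fail_fast phase aborts before later phases run.
+strict_ctx: Context = {}
+strict_ok, strict_errors = run_pipeline(
+    [
+        Phase("LLM_INTERACTION", failing_action, fail_fast=True),
+        Phase("MEMORY_UPDATE", memory_phase, fail_fast=False),
+    ],
+    strict_ctx,
+)
+```
+
+In the graceful run the error is recorded and the memory phase still executes. In the strict run nothing after the failing phase is attempted.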
+
+## Three-Phase Pipeline
+
+### Pipeline Execution Flow
+
+```mermaid
+graph LR
+ A[CONTINUE State] --> B[Phase 1: LLM Interaction]
+ B --> C[Phase 2: Action Execution]
+ C --> D[Phase 3: Memory Update]
+ D --> E[Determine Next State]
+ E --> F{Status?}
+ F -->|CONTINUE| A
+ F -->|FINISH| G[FINISH State]
+ F -->|FAIL| H[FAIL State]
+```
+
+## Phase 1: LLM Interaction Strategy
+
+**Purpose**: Construct prompts with execution context and obtain next CLI command from LLM.
+
+### Strategy Implementation
+
+```python
+@depends_on("request")
+@provides("parsed_response", "response_text", "llm_cost",
+          "prompt_message", "action", "thought", "comment")
+class LinuxLLMInteractionStrategy(AppLLMInteractionStrategy):
+    """
+    Strategy for LLM interaction with Linux Agent specific prompting.
+
+    Handles:
+    - Context-aware prompt construction with previous results
+    - LLM interaction with retry logic
+    - Response parsing and validation
+    """
+
+    async def execute(self, agent: "LinuxAgent",
+                      context: ProcessingContext) -> ProcessingResult:
+        """Execute LLM interaction for Linux Agent"""
+```
+
+### Phase 1 Workflow
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant Agent
+ participant Prompter
+ participant LLM
+
+ Strategy->>Agent: Get previous plan
+ Strategy->>Agent: Get blackboard context
+ Agent-->>Strategy: Previous execution results
+
+ Strategy->>Prompter: Construct prompt
+ Prompter->>Prompter: Build system message
+ Prompter->>Prompter: Build user message with context
+ Prompter-->>Strategy: Complete prompt
+
+ Strategy->>LLM: Send prompt
+ LLM-->>Strategy: CLI command + status
+
+ Strategy->>Strategy: Parse response
+ Strategy->>Strategy: Validate command
+ Strategy-->>Agent: Parsed response + cost
+```
+
+### Prompt Construction
+
+The strategy constructs comprehensive prompts using:
+
+1. **System Message**: Agent role and capabilities
+2. **User Request**: Original task description
+3. **Previous Results**: Command outputs from prior executions
+4. **Blackboard Context**: Shared state from other agents (if any)
+5. **Last Success Actions**: Previously successful commands
+
+```python
+prompt_message = agent.message_constructor(
+    dynamic_examples=[],                       # Few-shot examples (optional)
+    dynamic_knowledge="",                      # Retrieved knowledge (optional)
+    plan=plan,                                 # Previous execution plan
+    request=request,                           # User request
+    blackboard_prompt=blackboard_prompt,       # Shared context
+    last_success_actions=last_success_actions  # Successful commands
+)
+```
+
+### LLM Response Format
+
+The LLM returns a structured response:
+
+```json
+{
+    "thought": "Need to check disk space before creating backup",
+    "action": {
+        "tool": "execute_command",
+        "arguments": {
+            "command": "df -h"
+        },
+        "status": "CONTINUE"
+    },
+    "comment": "Checking available disk space"
+}
+```
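+
+A response of this shape can be checked before any command is dispatched. The sketch below assumes only the field names from the JSON example. `ParsedAction` and `parse_llm_response` are hypothetical helpers, not UFO APIs:
+
+```python
+import json
+from dataclasses import dataclass
+from typing import Any, Dict
+
+VALID_STATUSES = {"CONTINUE", "FINISH", "FAIL"}
+
+
+@dataclass
+class ParsedAction:
+    """Hypothetical typed view of the "action" object."""
+    tool: str
+    arguments: Dict[str, Any]
+    status: str
+
+
+def parse_llm_response(text: str) -> ParsedAction:
+    """Parse the JSON text and reject responses with an unknown status."""
+    data = json.loads(text)
+    action = data["action"]
+    status = action.get("status", "")
+    if status not in VALID_STATUSES:
+        raise ValueError(f"unexpected status: {status!r}")
+    return ParsedAction(action["tool"], dict(action.get("arguments", {})), status)
+
+
+sample = """
+{
+    "thought": "Need to check disk space before creating backup",
+    "action": {
+        "tool": "execute_command",
+        "arguments": {"command": "df -h"},
+        "status": "CONTINUE"
+    },
+    "comment": "Checking available disk space"
+}
+"""
+parsed = parse_llm_response(sample)
+```
+
+Rejecting an unknown status at parse time keeps malformed LLM output from reaching the state machine.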
+
+### Proactive Information Gathering
+
+LinuxAgent requests system information on demand, for example through the `get_system_info` MCP tool, instead of collecting it at the start of every round. This avoids unnecessary overhead and keeps each round responsive.
+
+### Error Handling
+
+```python
+try:
+    response_text, llm_cost = await self._get_llm_response(
+        agent, prompt_message
+    )
+    parsed_response = self._parse_app_response(agent, response_text)
+
+    return ProcessingResult(
+        success=True,
+        data={
+            "parsed_response": parsed_response,
+            "response_text": response_text,
+            "llm_cost": llm_cost,
+            # ... (remaining fields elided)
+        }
+    )
+except Exception as e:
+    self.logger.error(f"LLM interaction failed: {str(e)}")
+    return self.handle_error(e, ProcessingPhase.LLM_INTERACTION, context)
+```
+
+---
+
+## Phase 2: Action Execution Strategy
+
+**Purpose**: Execute CLI commands returned by LLM and capture structured results.
+
+### Strategy Implementation
+
+```python
+class LinuxActionExecutionStrategy(AppActionExecutionStrategy):
+    """
+    Strategy for executing actions in Linux Agent.
+
+    Handles:
+    - CLI command execution via MCP server
+    - Result capturing (stdout, stderr, exit code)
+    - Error handling and retry logic
+    """
+
+    async def execute(self, agent: "LinuxAgent",
+                      context: ProcessingContext) -> ProcessingResult:
+        """Execute Linux Agent actions"""
+```
+
+### Phase 2 Workflow
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant MCP
+ participant Linux
+
+ Strategy->>Strategy: Extract command from LLM response
+ Strategy->>MCP: execute_command: df -h
+
+ MCP->>Linux: Execute shell command
+ Linux-->>MCP: stdout + stderr + exit_code
+
+ MCP-->>Strategy: Execution result
+ Strategy->>Strategy: Create action info
+ Strategy->>Strategy: Format for memory
+ Strategy-->>Agent: Execution results
+```
+
+### Command Execution
+
+The strategy dispatches commands to the MCP server:
+
+```python
+# Extract parsed LLM response
+parsed_response: AppAgentResponse = context.get_local("parsed_response")
+command_dispatcher = context.global_context.command_dispatcher
+
+# Execute the command via MCP
+execution_results = await self._execute_app_action(
+    command_dispatcher,
+    parsed_response.action
+)
+```
+
+### Result Capture
+
+Execution results are structured for downstream processing:
+
+```python
+{
+    "success": True,
+    "exit_code": 0,
+    "stdout": "Filesystem Size Used Avail Use% Mounted on\n/dev/sda1 100G 50G 46G 52% /",
+    "stderr": ""
+}
+```
+
+### Action Info Creation
+
+Results are formatted into `ActionCommandInfo` objects:
+
+```python
+actions = self._create_action_info(
+    parsed_response.action,
+    execution_results,
+)
+
+action_info = ListActionCommandInfo(actions)
+action_info.color_print() # Pretty print to console
+```
+
+### Error Handling
+
+```python
+try:
+    execution_results = await self._execute_app_action(...)
+
+    return ProcessingResult(
+        success=True,
+        data={
+            "execution_result": execution_results,
+            "action_info": action_info,
+            "control_log": control_log,
+            "status": status
+        }
+    )
+except Exception as e:
+    self.logger.error(f"Action execution failed: {traceback.format_exc()}")
+    return self.handle_error(e, ProcessingPhase.ACTION_EXECUTION, context)
+```
+
+---
+
+## Phase 3: Memory Update Strategy
+
+**Purpose**: Persist execution results and commands into agent memory for future reference.
+
+### Strategy Implementation
+
+LinuxAgent reuses the `AppMemoryUpdateStrategy` from the app agent framework:
+
+```python
+self.strategies[ProcessingPhase.MEMORY_UPDATE] = AppMemoryUpdateStrategy(
+    fail_fast=False  # Memory failures shouldn't stop process
+)
+```
+
+### Phase 3 Workflow
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant Memory
+ participant Context
+
+ Strategy->>Context: Get execution results
+ Strategy->>Context: Get LLM response
+
+ Strategy->>Memory: Create memory item
+ Memory->>Memory: Store command
+ Memory->>Memory: Store stdout/stderr
+ Memory->>Memory: Store timestamp
+
+ Strategy->>Context: Update round result
+ Strategy-->>Agent: Memory updated
+```
+
+### Memory Structure
+
+Each execution round is stored as a memory item:
+
+```python
+{
+    "round": 1,
+    "request": "Check disk space and create backup",
+    "thought": "Need to check disk space first",
+    "action": {
+        "command": "EXEC_CLI",
+        "parameters": {"command": "df -h"}
+    },
+    "result": {
+        "stdout": "Filesystem Size Used...",
+        "stderr": "",
+        "exit_code": 0
+    },
+    "status": "CONTINUE",
+    "timestamp": "2025-11-06T10:30:45"
+}
+```
+
+### Iterative Refinement
+
+Memory enables iterative refinement:
+
+1. **Round 1**: Check disk space → Result: 46G available
+2. **Round 2**: Create backup (knowing 46G is available)
+3. **Round 3**: Verify backup creation
+
+Each round builds on previous results stored in memory.
+
+### Error Recovery
+
+Memory also stores errors for recovery:
+
+```python
+{
+    "round": 2,
+    "action": {"tool": "execute_command", "arguments": {"command": "invalid_cmd"}},
+    "result": {
+        "success": False,
+        "error": "Command not found: invalid_cmd"
+    },
+    "status": "FAIL"
+}
+```
+
+## Middleware Stack
+
+LinuxAgent uses specialized middleware for logging:
+
+```python
+def _setup_middleware(self) -> None:
+    """Setup middleware pipeline for Linux Agent"""
+    self.middleware_chain = [LinuxLoggingMiddleware()]
+```
+
+### LinuxLoggingMiddleware
+
+Provides enhanced logging specific to Linux operations:
+
+```python
+class LinuxLoggingMiddleware(AppAgentLoggingMiddleware):
+    """Specialized logging middleware for Linux Agent"""
+
+    def starting_message(self, context: ProcessingContext) -> str:
+        request = context.get_local("request")
+        return f"Completing the user request [{request}] on Linux."
+```
+
+**Logged Information**:
+
+- User request
+- Each CLI command executed
+- Command outputs (stdout/stderr)
+- Execution timestamps
+- State transitions
+- LLM costs
+
+---
+
+## Context Finalization
+
+After processing, the processor updates global context:
+
+```python
+def _finalize_processing_context(self, processing_context: ProcessingContext):
+    """Finalize processing context by updating ContextNames fields"""
+    super()._finalize_processing_context(processing_context)
+
+    try:
+        result = processing_context.get_local("result")
+        if result:
+            self.global_context.set(ContextNames.ROUND_RESULT, result)
+    except Exception as e:
+        self.logger.warning(f"Failed to update context: {e}")
+```
+
+This makes execution results available to:
+
+- Subsequent rounds (iterative execution)
+- Other agents (if part of multi-agent workflow)
+- Session manager (for monitoring and logging)
+
+---
+
+## Strategy Dependency Graph
+
+The three phases have clear dependencies:
+
+```mermaid
+graph TD
+ A[request] --> B[Phase 1: LLM Interaction]
+ B --> C[parsed_response]
+ B --> D[llm_cost]
+ B --> E[prompt_message]
+
+ C --> F[Phase 2: Action Execution]
+ F --> G[execution_result]
+ F --> H[action_info]
+
+ C --> I[Phase 3: Memory Update]
+ G --> I
+ H --> I
+ I --> J[Memory Updated]
+
+ J --> K[Next Round or Terminal State]
+```
+
+Dependencies are declared using decorators:
+
+```python
+@depends_on("request")
+@provides("parsed_response", "response_text", "llm_cost", ...)
+class LinuxLLMInteractionStrategy(AppLLMInteractionStrategy):
+    ...
+```
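+
+One plausible way such decorators could work is to attach metadata to the class and check it before a phase runs. The sketch below is an assumption about the mechanism, not the UFO implementation. The attribute names `dependencies` and `provided`, the `DemoLLMStrategy` class, and the `missing_dependencies` helper are all hypothetical:
+
+```python
+from typing import Dict, List
+
+
+def depends_on(*keys: str):
+    """Record which context keys a strategy needs before it can run."""
+    def decorator(cls):
+        cls.dependencies = tuple(keys)
+        return cls
+    return decorator
+
+
+def provides(*keys: str):
+    """Record which context keys a strategy promises to produce."""
+    def decorator(cls):
+        cls.provided = tuple(keys)
+        return cls
+    return decorator
+
+
+@depends_on("request")
+@provides("parsed_response", "llm_cost")
+class DemoLLMStrategy:
+    """Hypothetical strategy used only to exercise the decorators."""
+
+
+def missing_dependencies(strategy_cls, context: Dict[str, object]) -> List[str]:
+    """Return the declared dependencies that the context does not yet hold."""
+    return [key for key in strategy_cls.dependencies if key not in context]
+```
+
+With declarations like these, a processor could verify that each phase's inputs were produced by an earlier phase before executing it.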
+
+---
+
+## Modular Design Benefits
+
+The 3-phase strategy design provides:
+
+!!!success "Modularity Benefits"
+    - **Separation of Concerns**: LLM reasoning, command execution, and memory are isolated
+    - **Testability**: Each phase can be tested independently
+    - **Extensibility**: New strategies can be added without modifying existing code
+    - **Reusability**: Memory strategy is shared with AppAgent
+    - **Maintainability**: Clear boundaries between decision-making and execution
+    - **Traceability**: Each phase logs its operations independently
+
+---
+
+## Comparison with Other Agents
+
+| Agent | Phases | Data Collection | LLM | Action | Memory |
+|-------|--------|----------------|-----|--------|--------|
+| **LinuxAgent** | 3 | ✗ None | ✓ CLI commands | ✓ MCP execute_command | ✓ Results |
+| **AppAgent** | 4 | ✓ Screenshots + UI | ✓ UI actions | ✓ GUI + API | ✓ Results |
+| **HostAgent** | 4 | ✓ Desktop snapshot | ✓ App selection | ✓ Orchestration | ✓ Results |
+
+LinuxAgent omits the **DATA_COLLECTION** phase because there's no GUI to capture (CLI-based), system info is obtained on-demand via MCP tools, and previous execution results provide necessary context. This reflects the proactive information gathering principle.
+
+## Implementation Location
+
+The strategy implementations can be found in:
+
+```
+ufo/agents/processors/
+├── customized/
+│   └── customized_agent_processor.py  # LinuxAgentProcessor
+└── strategies/
+    └── linux_agent_strategy.py        # Linux-specific strategies
+```
+
+Key classes:
+
+- `LinuxAgentProcessor`: Strategy orchestrator
+- `LinuxLLMInteractionStrategy`: Prompt construction and LLM interaction
+- `LinuxActionExecutionStrategy`: CLI command execution
+- `LinuxLoggingMiddleware`: Enhanced logging
+
+## Next Steps
+
+- [MCP Commands](commands.md) - Explore the CLI execution commands used by LinuxAgent
+- [State Machine](state.md) - Understand the 3-state FSM that controls strategy execution
+- [Overview](overview.md) - Return to LinuxAgent architecture overview
diff --git a/documents/docs/logs/evaluation_logs.md b/documents/docs/logs/evaluation_logs.md
deleted file mode 100644
index 05ee396dd..000000000
--- a/documents/docs/logs/evaluation_logs.md
+++ /dev/null
@@ -1,14 +0,0 @@
-# Evaluation Logs
-
-The evaluation logs store the evaluation results from the `EvaluationAgent`. The evaluation log contains the following information:
-
-| Field | Description | Type |
-| --- | --- | --- |
-| Reason | The detailed reason for your judgment, by observing the screenshot differences and the . | String |
-| Sub-score | The sub-score of the evaluation in decomposing the evaluation into multiple sub-goals. | List of Dictionaries |
-| Complete | The completion status of the evaluation, can be `yes`, `no`, or `unsure`. | String |
-| level | The level of the evaluation. | String |
-| request | The request sent to the `EvaluationAgent`. | Dictionary |
-| id | The ID of the evaluation. | Integer |
-
-
diff --git a/documents/docs/logs/markdown_log_viewer.md b/documents/docs/logs/markdown_log_viewer.md
deleted file mode 100644
index 9d79c15b9..000000000
--- a/documents/docs/logs/markdown_log_viewer.md
+++ /dev/null
@@ -1,15 +0,0 @@
-# Markdown-Formatted Log Viewer
-
-We provide a Markdown-formatted log viewer for better readability and organization of logs for debugging and analysis. The Markdown log viewer is designed to display logs in a structured format, making it easier to identify issues and understand the flow of the application.
-
-## Configuration
-To enable the Markdown log viewer, you need to set the `LOG_TO_MARKDOWN` option in the `config_dev.yaml` file to `True`. Below is the detailed configuration in the `config_dev.yaml` file:
-
-```yaml
-LOG_TO_MARKDOWN: True # Whether to log to markdown format
-```
-
-After setting this option, the logs will be saved in a Markdown format in your `logs/` directory.
-
-!!! tip
- We strongly recommend to turn on this option. The development team uses this option to debug the agent's behavior and improve the performance of the agent.
diff --git a/documents/docs/logs/overview.md b/documents/docs/logs/overview.md
deleted file mode 100644
index b5200ca5f..000000000
--- a/documents/docs/logs/overview.md
+++ /dev/null
@@ -1,12 +0,0 @@
-# UFO Logs
-
-Logs are essential for debugging and understanding the behavior of the UFO framework. There are three types of logs generated by UFO:
-
-| Log Type | Description | Location | Level |
-| --- | --- | --- | --- |
-| [Request Log](./request_logs.md) | Contains the prompt requests to LLMs. | `logs/{task_name}/request.log` | Info |
-| [Step Log](./step_logs.md) | Contains the agent's response to the user's request and additional information at every step. | `logs/{task_name}/response.log` | Info |
-| [Evaluation Log](./evaluation_logs.md) | Contains the evaluation results from the `EvaluationAgent`. | `logs/{task_name}/evaluation.log` | Info |
-| [Screenshots](./screenshots_logs.md) | Contains the screenshots of the application UI. | `logs/{task_name}/` | - |
-
-All logs are stored in the `logs/{task_name}` directory.
\ No newline at end of file
diff --git a/documents/docs/logs/request_logs.md b/documents/docs/logs/request_logs.md
deleted file mode 100644
index 041689422..000000000
--- a/documents/docs/logs/request_logs.md
+++ /dev/null
@@ -1,20 +0,0 @@
-# Request Logs
-
-The request is the prompt requests to the LLMs. The request log is stored in the `request.log` file. The request log contains the following information for each step:
-
-| Field | Description |
-| --- | --- |
-| `step` | The step number of the session. |
-| `prompt` | The prompt message sent to the LLMs. |
-
-The request log is stored at the `debug` level. You can configure the logging level in the `LOG_LEVEL` field in the `config_dev.yaml` file.
-
-!!! tip
- You can use the following python code to read the request log:
-
- import json
-
- with open('logs/{task_name}/request.log', 'r') as f:
- for line in f:
- log = json.loads(line)
-
\ No newline at end of file
diff --git a/documents/docs/logs/screenshots_logs.md b/documents/docs/logs/screenshots_logs.md
deleted file mode 100644
index 9990e7a82..000000000
--- a/documents/docs/logs/screenshots_logs.md
+++ /dev/null
@@ -1,48 +0,0 @@
-# Screenshot Logs
-
-UFO also save desktop or application screenshots for debugging and evaluation purposes. The screenshot logs are stored in the `logs/{task_name}/`.
-
-There are 4 types of screenshot logs generated by UFO, as detailed below.
-
-
-## Clean Screenshots
-At each step, UFO saves a clean screenshot of the desktop or application. The clean screenshot is saved in the `action_step{step_number}.png` file. In addition, the clean screenshots are also saved when a sub-task, round or session is completed. The clean screenshots are saved in the `action_round_{round_id}_sub_round_{sub_task_id}_final.png`, `action_round_{round_id}_final.png` and `action_step_final.png` files, respectively. Below is an example of a clean screenshot.
-
-
-
-
-
-
-## Annotation Screenshots
-UFO also saves annotated screenshots of the application, with each control item is annotated with a number, following the [Set-of-Mark](https://arxiv.org/pdf/2310.11441) paradigm. The annotated screenshots are saved in the `action_step{step_number}_annotated.png` file. Below is an example of an annotated screenshot.
-
-
-
-
-
-!!!info
- Only selected types of controls are annotated in the screenshots. They are configured in the `config_dev.yaml` file under the `CONTROL_LIST` field.
-
-!!!tip
- Different types of controls are annotated with different colors. You can configure the colors in the `config_dev.yaml` file under the `ANNOTATION_COLORS` field.
-
-
-## Concatenated Screenshots
-UFO also saves concatenated screenshots of the application, with clean and annotated screenshots concatenated side by side. The concatenated screenshots are saved in the `action_step{step_number}_concat.png` file. Below is an example of a concatenated screenshot.
-
-
-
-
-
-!!!info
- You can configure whether to feed the concatenated screenshots to the LLMs, or separate clean and annotated screenshots, in the `config_dev.yaml` file under the `CONCAT_SCREENSHOT` field.
-
-## Selected Control Screenshots
-UFO saves screenshots of the selected control item for operation. The selected control screenshots are saved in the `action_step{step_number}_selected_controls.png` file. Below is an example of a selected control screenshot.
-
-
-
-
-
-!!!info
- You can configure whether to feed LLM with the selected control screenshots at the previous step to enhance the context, in the `config_dev.yaml` file under the `INCLUDE_LAST_SCREENSHOT` field.
\ No newline at end of file
diff --git a/documents/docs/logs/step_logs.md b/documents/docs/logs/step_logs.md
deleted file mode 100644
index 63bb84326..000000000
--- a/documents/docs/logs/step_logs.md
+++ /dev/null
@@ -1,102 +0,0 @@
-# Step Logs
-
-The step log contains the agent's response to the user's request and additional information at every step. The step log is stored in the `response.log` file. The log fields are different for `HostAgent` and `AppAgent`. The step log is at the `info` level.
-## HostAgent Logs
-
-The `HostAgent` logs contain the following fields:
-
-
-### LLM Output
-
-| Field | Description | Type |
-| --- | --- | --- |
-| Observation | The observation of current desktop screenshots. | String |
-| Thought | The logical reasoning process of the `HostAgent`. | String |
-| Current Sub-Task | The current sub-task to be executed by the `AppAgent`. | String |
-| Message | The message to be sent to the `AppAgent` for the completion of the sub-task. | String |
-| ControlLabel | The index of the selected application to execute the sub-task. | String |
-| ControlText | The name of the selected application to execute the sub-task. | String |
-| Plan | The plan for the following sub-tasks after the current sub-task. | List of Strings |
-| Status | The status of the agent, mapped to the `AgentState`. | String |
-| Comment | Additional comments or information provided to the user. | String |
-| Questions | The questions to be asked to the user for additional information. | List of Strings |
-| Bash | The bash command to be executed by the `HostAgent`. It can be used to open applications or execute system commands. | String |
-
-
-### Additional Information
-
-| Field | Description | Type |
-| --- | --- | --- |
-| Step | The step number of the session. | Integer |
-| RoundStep | The step number of the current round. | Integer |
-| AgentStep | The step number of the `HostAgent`. | Integer |
-| Round | The round number of the session. | Integer |
-| ControlLabel | The index of the selected application to execute the sub-task. | Integer |
-| ControlText | The name of the selected application to execute the sub-task. | String |
-| Request | The user request. | String |
-| Agent | The agent that executed the step, set to `HostAgent`. | String |
-| AgentName | The name of the agent. | String |
-| Application | The application process name. | String |
-| Cost | The cost of the step. | Float |
-| Results | The results of the step, set to an empty string. | String |
-| CleanScreenshot | The image path of the desktop screenshot. | String |
-| AnnotatedScreenshot | The image path of the annotated application screenshot. | String |
-| ConcatScreenshot | The image path of the concatenated application screenshot. | String |
-| SelectedControlScreenshot | The image path of the selected control screenshot. | String |
-| time_cost | The time cost of each step in the process. | Dictionary |
-
-
-
-## AppAgent Logs
-
-The `AppAgent` logs contain the following fields:
-
-### LLM Output
-
-| Field | Description | Type |
-| --- | --- | --- |
-| Observation | The observation of the current application screenshots. | String |
-| Thought | The logical reasoning process of the `AppAgent`. | String |
-| ControlLabel | The index of the selected control to interact with. | String |
-| ControlText | The name of the selected control to interact with. | String |
-| Function | The function to be executed on the selected control. | String |
-| Args | The arguments required for the function execution. | List of Strings |
-| Status | The status of the agent, mapped to the `AgentState`. | String |
-| Plan | The plan for the following steps after the current action. | List of Strings |
-| Comment | Additional comments or information provided to the user. | String |
-| SaveScreenshot | The flag to save the screenshot of the application to the `blackboard` for future reference. | Boolean |
-
-### Additional Information
-
-| Field | Description | Type |
-| --- | --- | --- |
-| Step | The step number of the session. | Integer |
-| RoundStep | The step number of the current round. | Integer |
-| AgentStep | The step number of the `AppAgent`. | Integer |
-| Round | The round number of the session. | Integer |
-| Subtask | The sub-task to be executed by the `AppAgent`. | String |
-| SubtaskIndex | The index of the sub-task in the current round. | Integer |
-| Action | The action to be executed by the `AppAgent`. | String |
-| ActionType | The type of the action to be executed. | String |
-| Request | The user request. | String |
-| Agent | The agent that executed the step, set to `AppAgent`. | String |
-| AgentName | The name of the agent. | String |
-| Application | The application process name. | String |
-| Cost | The cost of the step. | Float |
-| Results | The results of the step. | String |
-| CleanScreenshot | The image path of the desktop screenshot. | String |
-| AnnotatedScreenshot | The image path of the annotated application screenshot. | String |
-| ConcatScreenshot | The image path of the concatenated application screenshot. | String |
-| time_cost | The time cost of each step in the process. | Dictionary |
-
-!!! tip
- You can use the following python code to read the request log:
-
- import json
-
- with open('logs/{task_name}/request.log', 'r') as f:
- for line in f:
- log = json.loads(line)
-
-!!! info
- The `FollowerAgent` logs share the same fields as the `AppAgent` logs.
\ No newline at end of file
diff --git a/documents/docs/logs/ui_tree_logs.md b/documents/docs/logs/ui_tree_logs.md
deleted file mode 100644
index c83231ea8..000000000
--- a/documents/docs/logs/ui_tree_logs.md
+++ /dev/null
@@ -1,81 +0,0 @@
-# UI Tree Logs
-
-UFO can save the entire UI tree of the application window at every step for data collection purposes. The UI tree can represent the application's UI structure, including the window, controls, and their properties. The UI tree logs are saved in the `logs/{task_name}/ui_tree` folder. You have to set the `SAVE_UI_TREE` flag to `True` in the `config_dev.yaml` file to enable the UI tree logs. Below is an example of the UI tree logs for application:
-
-```json
-{
- "id": "node_0",
- "name": "Mail - Chaoyun Zhang - Outlook",
- "control_type": "Window",
- "rectangle": {
- "left": 628,
- "top": 258,
- "right": 3508,
- "bottom": 1795
- },
- "adjusted_rectangle": {
- "left": 0,
- "top": 0,
- "right": 2880,
- "bottom": 1537
- },
- "relative_rectangle": {
- "left": 0.0,
- "top": 0.0,
- "right": 1.0,
- "bottom": 1.0
- },
- "level": 0,
- "children": [
- {
- "id": "node_1",
- "name": "",
- "control_type": "Pane",
- "rectangle": {
- "left": 3282,
- "top": 258,
- "right": 3498,
- "bottom": 330
- },
- "adjusted_rectangle": {
- "left": 2654,
- "top": 0,
- "right": 2870,
- "bottom": 72
- },
- "relative_rectangle": {
- "left": 0.9215277777777777,
- "top": 0.0,
- "right": 0.9965277777777778,
- "bottom": 0.0468445022771633
- },
- "level": 1,
- "children": []
- }
- ]
-}
-```
-
-
-## Fields in the UI tree logs
-Below is a table of the fields in the UI tree logs:
-
-| Field | Description | Type |
-| --- | --- | --- |
-| id | The unique identifier of the UI tree node. | String |
-| name | The name of the UI tree node. | String |
-| control_type | The type of the UI tree node. | String |
-| rectangle | The absolute position of the UI tree node. | Dictionary |
-| adjusted_rectangle | The adjusted position of the UI tree node. | Dictionary |
-| relative_rectangle | The relative position of the UI tree node. | Dictionary |
-| level | The level of the UI tree node. | Integer |
-| children | The children of the UI tree node. | List of UI tree nodes |
-
-# Reference
-
-:::automator.ui_control.ui_tree.UITree
-
-
-
-!!!note
- Save the UI tree logs may increase the latency of the system. It is recommended to set the `SAVE_UI_TREE` flag to `False` when you do not need the UI tree logs.
\ No newline at end of file
diff --git a/documents/docs/mcp/action.md b/documents/docs/mcp/action.md
new file mode 100644
index 000000000..6d0b1adc1
--- /dev/null
+++ b/documents/docs/mcp/action.md
@@ -0,0 +1,372 @@
+# Action Servers
+
+## Overview
+
+**Action Servers** provide tools that modify system state by executing actions. These servers enable agents to interact with the environment, automate tasks, and implement decisions.
+
+**Action servers are the only servers whose tools can be selected by the LLM agent.** At each step, the agent chooses which action tool to execute based on the task and current context.
+
+- **LLM Decision**: Agent actively selects from available action tools
+- **Dynamic Selection**: Different action chosen at each step based on needs
+- **Tool Visibility**: All action tools are presented to the LLM in the prompt
+
+**[Data Collection Servers](./data_collection.md) are NOT LLM-selectable** - they are automatically invoked by the framework.
+
+### How Tool Metadata Becomes LLM Instructions
+
+**Every action tool's implementation directly affects what the LLM sees and understands.** The UFO² framework automatically extracts:
+
+- **`Annotated` type hints**: Parameter types, constraints, and descriptions
+- **Docstrings**: Tool purpose, parameter explanations, return value descriptions
+- **Function signatures**: Parameter names, defaults, required vs. optional
+
+These are automatically assembled into structured tool instructions that appear in the LLM's prompt. The LLM uses these instructions to understand what each tool does, select the appropriate tool for each step, and call the tool with correct parameters.
+
+**Therefore, developers MUST write clear, comprehensive metadata.** For examples:
+
+- See [AppUIExecutor documentation](servers/app_ui_executor.md) for well-documented UI automation tools
+- See [WordCOMExecutor documentation](servers/word_com_executor.md) for COM API tool examples
+- See [Creating Custom MCP Servers Tutorial](../tutorials/creating_mcp_servers.md) for a step-by-step guide on writing tool metadata
+
+```mermaid
+graph TB
+    LLM["LLM Agent Decision (Selects Action Tool)"]
+
+    Agent["Agent Decision 'Click OK Button'"]
+
+    MCP["MCP Server Action Server"]
+
+    subgraph Tools["Available Action Tools"]
+        Click["click()"]
+        Type["type_text()"]
+        Insert["insert_table()"]
+        Shell["run_shell()"]
+    end
+
+    System["System Modified ✅ Side Effects"]
+
+    LLM --> Agent
+    Agent --> MCP
+    MCP --> Tools
+    Tools --> System
+
+    style LLM fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
+    style Agent fill:#fff3e0,stroke:#f57c00,stroke-width:2px
+    style MCP fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
+    style Tools fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
+    style System fill:#ffebee,stroke:#c62828,stroke-width:2px
+```
+
+**Side Effects:**
+
+- **✅ Modifies State**: Can change system, files, UI
+- **⚠️ Not Idempotent**: Same action may have different results
+- **🔒 Use with Caution**: Always verify before executing
+- **📝 Audit Trail**: Log all actions for debugging
+- **🤖 LLM-Controlled**: Agent decides when and which action to execute
+
+## Tool Type Identifier
+
+All action tools use the tool type:
+
+```python
+tool_type = "action"
+```
+
+Tool keys follow the format:
+
+```python
+tool_key = "action::{tool_name}"
+
+# Examples:
+"action::click"
+"action::type_text"
+"action::run_shell"
+```
+
+## Built-in Action Servers
+
+UFO² provides several built-in action servers for different automation scenarios. Below is a summary - click each server name for detailed documentation including all tools, parameters, and usage examples.
+
+### UI Automation Servers
+
+| Server | Agent | Description | Documentation |
+|--------|-------|-------------|---------------|
+| **[HostUIExecutor](servers/host_ui_executor.md)** | HostAgent | Window selection and desktop-level UI automation | [Full Details →](servers/host_ui_executor.md) |
+| **[AppUIExecutor](servers/app_ui_executor.md)** | AppAgent | Application-level UI automation (clicks, typing, scrolling) | [Full Details →](servers/app_ui_executor.md) |
+
+### Command Execution Servers
+
+| Server | Platform | Description | Documentation |
+|--------|----------|-------------|---------------|
+| **[CommandLineExecutor](servers/command_line_executor.md)** | Windows | Execute shell commands and launch applications | [Full Details →](servers/command_line_executor.md) |
+| **[BashExecutor](servers/bash_executor.md)** | Linux | Execute Linux commands via HTTP server | [Full Details →](servers/bash_executor.md) |
+
+### Office Automation Servers (COM API)
+
+| Server | Application | Description | Documentation |
+|--------|-------------|-------------|---------------|
+| **[WordCOMExecutor](servers/word_com_executor.md)** | Microsoft Word | Word document automation (insert table, format text, etc.) | [Full Details →](servers/word_com_executor.md) |
+| **[ExcelCOMExecutor](servers/excel_com_executor.md)** | Microsoft Excel | Excel automation (insert data, create charts, etc.) | [Full Details →](servers/excel_com_executor.md) |
+| **[PowerPointCOMExecutor](servers/ppt_com_executor.md)** | Microsoft PowerPoint | PowerPoint automation (slides, formatting, etc.) | [Full Details →](servers/ppt_com_executor.md) |
+
+### Specialized Servers
+
+| Server | Purpose | Description | Documentation |
+|--------|---------|-------------|---------------|
+| **[PDFReaderExecutor](servers/pdf_reader_executor.md)** | PDF Processing | Extract text from PDFs with human simulation | [Full Details →](servers/pdf_reader_executor.md) |
+| **[ConstellationEditor](servers/constellation_editor.md)** | Multi-Device | Create and manage multi-device task workflows | [Full Details →](servers/constellation_editor.md) |
+| **[HardwareExecutor](servers/hardware_executor.md)** | Hardware Control | Control Arduino, robot arms, test fixtures, mobile devices | [Full Details →](servers/hardware_executor.md) |
+
+**Quick Reference:** Each server documentation page includes:
+
+- 📋 **Complete tool reference** with all parameters and return values
+- 💡 **Code examples** showing actual usage patterns
+- ⚙️ **Configuration examples** for different scenarios
+- ✅ **Best practices** with do's and don'ts
+- 🎯 **Use cases** with complete workflows
+
+## Configuration Examples
+
+Action servers are configured in `config/ufo/mcp.yaml`. Each server's documentation provides detailed configuration examples.
+
+### Basic Configuration
+
+```yaml
+HostAgent:
+  default:
+    action:
+      - namespace: HostUIExecutor
+        type: local
+        reset: false
+      - namespace: CommandLineExecutor
+        type: local
+        reset: false
+```
+
+### App-Specific Configuration
+
+```yaml
+AppAgent:
+  # Default configuration for all apps
+  default:
+    action:
+      - namespace: AppUIExecutor
+        type: local
+        reset: false
+
+  # Word-specific configuration
+  WINWORD.EXE:
+    action:
+      - namespace: AppUIExecutor
+        type: local
+        reset: false
+      - namespace: WordCOMExecutor
+        type: local
+        reset: true  # Reset when switching documents
+
+  # Excel-specific configuration
+  EXCEL.EXE:
+    action:
+      - namespace: AppUIExecutor
+        type: local
+        reset: false
+      - namespace: ExcelCOMExecutor
+        type: local
+        reset: true  # Reset when switching workbooks
+```
+
+### Multi-Platform Configuration
+
+```yaml
+# Windows agent
+HostAgent:
+  default:
+    action:
+      - namespace: HostUIExecutor
+        type: local
+      - namespace: CommandLineExecutor
+        type: local
+
+# Linux agent
+LinuxAgent:
+  default:
+    action:
+      - namespace: BashExecutor
+        type: http
+        host: "192.168.1.100"
+        port: 8010
+        path: "/mcp"
+```
+
+For complete configuration details, see:
+
+- [MCP Configuration Guide](configuration.md) - Complete configuration reference
+- Individual server documentation for server-specific configuration options
+
+## Best Practices
+
+### General Principles
+
+#### 1. Verify Before Acting
+
+Always observe before executing actions:
+
+```python
+# ✅ Good: Verify target exists
+control_info = await computer.run_actions([
+    MCPToolCall(tool_key="data_collection::get_control_info", ...)
+])
+
+if control_info[0].data and control_info[0].data["is_enabled"]:
+    await computer.run_actions([
+        MCPToolCall(tool_key="action::click", ...)
+    ])
+```
+
+#### 2. Handle Action Failures
+
+Actions can fail for many reasons - always implement error handling and retries.
+
+#### 3. Validate Inputs
+
+Never execute unsanitized commands, especially with `run_shell` and similar tools.
+
+#### 4. Wait for Action Completion
+
+Some actions need time to complete - add appropriate delays after launching applications or triggering UI changes.
+
+For detailed best practices including code examples, error handling patterns, and common pitfalls, see the individual server documentation:
+
+- [HostUIExecutor Best Practices](servers/host_ui_executor.md)
+- [AppUIExecutor Best Practices](servers/app_ui_executor.md)
+- [CommandLineExecutor Best Practices](servers/command_line_executor.md)
+- [WordCOMExecutor Best Practices](servers/word_com_executor.md)
+- [ExcelCOMExecutor Best Practices](servers/excel_com_executor.md)
+- [PowerPointCOMExecutor Best Practices](servers/ppt_com_executor.md)
+- [PDFReaderExecutor Best Practices](servers/pdf_reader_executor.md)
+- [ConstellationEditor Best Practices](servers/constellation_editor.md)
+- [HardwareExecutor Best Practices](servers/hardware_executor.md)
+- [BashExecutor Best Practices](servers/bash_executor.md)
+
+## Common Use Cases
+
+For complete use case examples with detailed workflows, see the individual server documentation:
+
+### UI Automation
+
+- **Form Filling**: [AppUIExecutor](servers/app_ui_executor.md)
+- **Window Management**: [HostUIExecutor](servers/host_ui_executor.md)
+
+### Document Automation
+
+- **Word Processing**: [WordCOMExecutor](servers/word_com_executor.md)
+- **Excel Data Processing**: [ExcelCOMExecutor](servers/excel_com_executor.md)
+- **PowerPoint Generation**: [PowerPointCOMExecutor](servers/ppt_com_executor.md)
+- **PDF Extraction**: [PDFReaderExecutor](servers/pdf_reader_executor.md)
+
+### System Automation
+
+- **Application Launching**: [CommandLineExecutor](servers/command_line_executor.md)
+- **Linux Command Execution**: [BashExecutor](servers/bash_executor.md)
+
+### Multi-Device Workflows
+
+- **Task Distribution**: [ConstellationEditor](servers/constellation_editor.md)
+- **Hardware Control**: [HardwareExecutor](servers/hardware_executor.md)
+
+## Error Handling
+
+Action servers implement robust error handling with timeouts and retries. For detailed error handling patterns specific to each server, see:
+
+- [HostUIExecutor](servers/host_ui_executor.md)
+- [AppUIExecutor](servers/app_ui_executor.md)
+- [CommandLineExecutor](servers/command_line_executor.md)
+- [BashExecutor](servers/bash_executor.md)
+- And other server-specific documentation
+
+### General Timeout Handling
+
+Actions are executed with a timeout (default: 6000 seconds):
+
+```python
+try:
+    result = await computer.run_actions([
+        MCPToolCall(tool_key="action::run_shell", ...)
+    ])
+except asyncio.TimeoutError:
+    logger.error("Action timed out after 6000 seconds")
+    # Cleanup or retry logic...
+```
+
+### General Retry Pattern
+
+```python
+async def retry_action(action: MCPToolCall, max_retries: int = 3):
+    """Retry an action with exponential backoff."""
+    for attempt in range(max_retries):
+        try:
+            result = await computer.run_actions([action])
+            if not result[0].is_error:
+                return result[0]
+            logger.warning(f"Attempt {attempt + 1} failed: {result[0].content}")
+            if attempt < max_retries - 1:
+                await asyncio.sleep(2 ** attempt)  # Exponential backoff
+        except Exception as e:
+            logger.error(f"Exception on attempt {attempt + 1}: {e}")
+            if attempt < max_retries - 1:
+                await asyncio.sleep(2 ** attempt)
+    raise ValueError(f"Action failed after {max_retries} attempts")
+```
+
+## Integration with Data Collection
+
+Actions should be paired with data collection for verification:
+
+```python
+# Pattern: Observe → Act → Verify
+
+# 1. Observe: Capture initial state
+before_screenshot = await computer.run_actions([
+    MCPToolCall(tool_key="data_collection::take_screenshot", ...)
+])
+
+# 2. Act: Execute action
+action_result = await computer.run_actions([
+    MCPToolCall(tool_key="action::click", ...)
+])
+
+# 3. Verify: Check result
+await asyncio.sleep(1)  # Wait for UI update
+after_screenshot = await computer.run_actions([
+    MCPToolCall(tool_key="data_collection::take_screenshot", ...)
+])
+```
+
+For more details on agent execution patterns:
+
+- [HostAgent Commands](../ufo2/host_agent/commands.md) - HostAgent command patterns
+- [AppAgent Commands](../ufo2/app_agent/commands.md) - AppAgent action patterns
+- [Agent Overview](../ufo2/overview.md) - UFO² agent architecture
+
+For more details on data collection:
+
+- [Data Collection Servers](data_collection.md) - Observation tools
+- [UICollector Documentation](servers/ui_collector.md) - Complete data collection reference
+
+## Related Documentation
+
+- [Data Collection Servers](data_collection.md) - Observation tools
+- [Configuration Guide](configuration.md) - Configure action servers
+- [Local Servers](local_servers.md) - Built-in action servers overview
+- [Remote Servers](remote_servers.md) - HTTP deployment for actions
+- [Computer](../client/computer.md) - Action execution layer
+- [MCP Overview](overview.md) - High-level MCP architecture
+
+**Safety Reminder:** Action servers can **modify system state**. Always:
+
+1. ✅ **Validate inputs** before execution
+2. ✅ **Verify targets** exist and are accessible
+3. ✅ **Log all actions** for audit trail
+4. ✅ **Handle failures** gracefully with retries
+5. ✅ **Test in safe environment** before production use
diff --git a/documents/docs/mcp/configuration.md b/documents/docs/mcp/configuration.md
new file mode 100644
index 000000000..5ed6000d7
--- /dev/null
+++ b/documents/docs/mcp/configuration.md
@@ -0,0 +1,645 @@
+# MCP Configuration Guide
+
+## Overview
+
+MCP configuration in UFO² uses a **hierarchical YAML structure** that maps agents to their MCP servers. The configuration file is located at:
+
+```
+config/ufo/mcp.yaml
+```
+
+For complete field documentation, see [MCP Reference](../configuration/system/mcp_reference.md).
+
+## Configuration Structure
+
+```yaml
+AgentName:                    # Name of the agent
+  SubType:                    # Sub-type (e.g., "default", "WINWORD.EXE")
+    data_collection:          # Data collection servers
+      - namespace: ...        # Server namespace
+        type: ...             # Server type (local/http/stdio)
+        ...                   # Additional server config
+    action:                   # Action servers
+      - namespace: ...
+        type: ...
+        ...
+```
+
+### Hierarchy Levels
+
+1. **Agent Name** - Top-level agent identifier (e.g., `HostAgent`, `AppAgent`)
+2. **Sub-Type** - Context-specific configuration (e.g., `default`, `WINWORD.EXE`)
+3. **Tool Type** - `data_collection` or `action`
+4. **Server List** - Array of MCP server configurations
+
+```
+AgentName
+  └─ SubType
+      ├─ data_collection
+      │   ├─ Server 1
+      │   ├─ Server 2
+      │   └─ ...
+      └─ action
+          ├─ Server 1
+          ├─ Server 2
+          └─ ...
+```
+
+**Default Sub-Type:** Always define a `default` sub-type as a fallback configuration. If a specific sub-type is not found, the agent will use `default`.
+
+## Server Configuration Fields
+
+### Common Fields
+
+All MCP servers share these fields:
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `namespace` | `string` | ✅ Yes | Unique identifier for the server |
+| `type` | `string` | ✅ Yes | Server type: `local`, `http`, or `stdio` |
+| `reset` | `boolean` | ❌ No | Whether to reset server state (default: `false`) |
+| `start_args` | `array` | ❌ No | Arguments passed to server initialization |
+
+### Local Server Fields
+
+For `type: local`:
+
+```yaml
+- namespace: UICollector
+  type: local
+  start_args: []
+  reset: false
+```
+
+| Field | Description |
+|-------|-------------|
+| `start_args` | Arguments passed to server factory function |
+
+Local servers are retrieved from the `MCPRegistry` and run in-process.
+
+### HTTP Server Fields
+
+For `type: http`:
+
+```yaml
+- namespace: HardwareCollector
+  type: http
+  host: "localhost"
+  port: 8006
+  path: "/mcp"
+  reset: false
+```
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `host` | `string` | ✅ Yes | Server hostname or IP |
+| `port` | `integer` | ✅ Yes | Server port number |
+| `path` | `string` | ✅ Yes | URL path to MCP endpoint |
+
+HTTP servers run as separate processes, on the local machine or a remote one, and are reached over HTTP at the configured `host`, `port`, and `path`.
+
+### Stdio Server Fields
+
+For `type: stdio`:
+
+```yaml
+- namespace: CustomProcessor
+  type: stdio
+  command: "python"
+  start_args: ["-m", "custom_mcp_server"]
+  env: {"API_KEY": "secret"}
+  cwd: "/path/to/server"
+  reset: false
+```
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `command` | `string` | ✅ Yes | Executable command |
+| `start_args` | `array` | ❌ No | Command-line arguments |
+| `env` | `object` | ❌ No | Environment variables |
+| `cwd` | `string` | ❌ No | Working directory |
+
+Stdio servers run as child processes and communicate via stdin/stdout.
+
+## Agent Configurations
+
+### HostAgent
+
+System-level agent for OS-wide automation:
+
+```yaml
+HostAgent:
+  default:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        start_args: []
+        reset: false
+    action:
+      - namespace: HostUIExecutor
+        type: local
+        start_args: []
+        reset: false
+      - namespace: CommandLineExecutor
+        type: local
+        start_args: []
+        reset: false
+```
+
+**Tools Available**:
+- **Data Collection**: UI detection, screenshots
+- **Actions**: System-wide clicks, window management, CLI execution
+
+### AppAgent
+
+Application-specific agent:
+
+#### Default Configuration
+
+```yaml
+AppAgent:
+  default:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        start_args: []
+        reset: false
+    action:
+      - namespace: AppUIExecutor
+        type: local
+        start_args: []
+        reset: false
+      - namespace: CommandLineExecutor
+        type: local
+        start_args: []
+        reset: false
+```
+
+#### Word-Specific Configuration
+
+```yaml
+AppAgent:
+  WINWORD.EXE:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        start_args: []
+        reset: false
+    action:
+      - namespace: AppUIExecutor
+        type: local
+        start_args: []
+        reset: false
+      - namespace: WordCOMExecutor
+        type: local
+        start_args: []
+        reset: true  # Reset COM state when switching documents
+```
+
+**Tools Available**:
+- **Data Collection**: Same as default
+- **Actions**: App UI automation + Word COM API (insert_table, select_text, etc.)
+
+**Reset Flag:** Set `reset: true` for stateful tools (like COM executors) to prevent state leakage between contexts (e.g., different documents).
+
+#### Excel-Specific Configuration
+
+```yaml
+AppAgent:
+  EXCEL.EXE:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        reset: false
+    action:
+      - namespace: AppUIExecutor
+        type: local
+        reset: false
+      - namespace: ExcelCOMExecutor
+        type: local
+        reset: true
+```
+
+#### PowerPoint-Specific Configuration
+
+```yaml
+AppAgent:
+  POWERPNT.EXE:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        reset: false
+    action:
+      - namespace: AppUIExecutor
+        type: local
+        reset: false
+      - namespace: PowerPointCOMExecutor
+        type: local
+        reset: true
+```
+
+#### File Explorer Configuration
+
+```yaml
+AppAgent:
+  explorer.exe:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        reset: false
+    action:
+      - namespace: AppUIExecutor
+        type: local
+        reset: false
+      - namespace: PDFReaderExecutor
+        type: local
+        reset: true
+```
+
+### ConstellationAgent
+
+Multi-device coordination agent:
+
+```yaml
+ConstellationAgent:
+  default:
+    action:
+      - namespace: ConstellationEditor
+        type: local
+        start_args: []
+        reset: false
+```
+
+**Tools Available**:
+- **Actions**: Create tasks, assign devices, check task status
+
+### HardwareAgent
+
+Remote hardware monitoring agent:
+
+```yaml
+HardwareAgent:
+  default:
+    data_collection:
+      - namespace: HardwareCollector
+        type: http
+        host: "localhost"
+        port: 8006
+        path: "/mcp"
+        reset: false
+    action:
+      - namespace: HardwareExecutor
+        type: http
+        host: "localhost"
+        port: 8006
+        path: "/mcp"
+        reset: false
+```
+
+**Tools Available**:
+- **Data Collection**: CPU info, memory info, disk info
+- **Actions**: Hardware control commands
+
+**Remote Deployment:** For remote servers, ensure the HTTP MCP server is running on the target machine. See [Remote Servers](remote_servers.md) for the deployment guide.
+
+### LinuxAgent
+
+Linux system agent:
+
+```yaml
+LinuxAgent:
+  default:
+    action:
+      - namespace: BashExecutor
+        type: http
+        host: "localhost"
+        port: 8010
+        path: "/mcp"
+        reset: false
+```
+
+**Tools Available**:
+- **Actions**: Bash command execution
+
+## Configuration Examples
+
+### Example 1: Local-Only Agent
+
+```yaml
+SimpleAgent:
+  default:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        reset: false
+    action:
+      - namespace: SimpleExecutor
+        type: local
+        reset: false
+```
+
+### Example 2: Hybrid Agent (Local + Remote)
+
+```yaml
+HybridAgent:
+  default:
+    data_collection:
+      # Local UI detection
+      - namespace: UICollector
+        type: local
+        reset: false
+
+      # Remote hardware monitoring
+      - namespace: HardwareCollector
+        type: http
+        host: "192.168.1.100"
+        port: 8006
+        path: "/mcp"
+        reset: false
+
+    action:
+      # Local UI automation
+      - namespace: UIExecutor
+        type: local
+        reset: false
+
+      # Remote command execution
+      - namespace: RemoteExecutor
+        type: http
+        host: "192.168.1.100"
+        port: 8007
+        path: "/mcp"
+        reset: false
+```
+
+### Example 3: Multi-Context Agent
+
+```yaml
+MultiContextAgent:
+  # Default configuration
+  default:
+    data_collection:
+      - namespace: BasicCollector
+        type: local
+    action:
+      - namespace: BasicExecutor
+        type: local
+
+  # Specialized for Chrome
+  chrome.exe:
+    data_collection:
+      - namespace: BasicCollector
+        type: local
+      - namespace: WebCollector
+        type: local
+    action:
+      - namespace: BasicExecutor
+        type: local
+      - namespace: BrowserExecutor
+        type: local
+        reset: true
+
+  # Specialized for VS Code
+  Code.exe:
+    data_collection:
+      - namespace: BasicCollector
+        type: local
+      - namespace: IDECollector
+        type: local
+    action:
+      - namespace: BasicExecutor
+        type: local
+      - namespace: CodeExecutor
+        type: local
+        reset: true
+```
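+
+How an agent chooses between `default` and a process-specific entry is not spelled out in this section. The sketch below assumes that an exact match on the process name wins and that `default` is the fallback, with no merging between the two. `resolve_context` is a hypothetical helper, not part of UFO².
+
+```python
+from typing import Any
+
+
+def resolve_context(agent_config: dict[str, Any], process_name: str) -> dict[str, Any]:
+    """Return the entry for process_name, else the agent's `default` entry."""
+    # Assumption: a process-specific entry replaces `default` entirely (no merging).
+    if process_name in agent_config:
+        return agent_config[process_name]
+    return agent_config.get("default", {})
+
+
+agent_config = {
+    "default": {"action": [{"namespace": "BasicExecutor", "type": "local"}]},
+    "chrome.exe": {
+        "action": [
+            {"namespace": "BasicExecutor", "type": "local"},
+            {"namespace": "BrowserExecutor", "type": "local", "reset": True},
+        ]
+    },
+}
+
+chrome = resolve_context(agent_config, "chrome.exe")
+other = resolve_context(agent_config, "notepad.exe")
+print([s["namespace"] for s in chrome["action"]])
+print([s["namespace"] for s in other["action"]])
+```
+
+Under this assumption a specialized entry must repeat any server it still needs from `default`, which matches how `chrome.exe` and `Code.exe` list `BasicCollector` and `BasicExecutor` again in Example 3.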
+
+## Best Practices
+
+### 1. Use Descriptive Namespaces
+
+```yaml
+# ✅ Good: Clear and descriptive
+namespace: WindowsUICollector
+namespace: ExcelCOMExecutor
+namespace: LinuxBashExecutor
+
+# ❌ Bad: Generic and unclear
+namespace: Collector1
+namespace: Server
+namespace: Tools
+```
+
+### 2. Group Related Servers
+
+```yaml
+# ✅ Good: Logical grouping
+HostAgent:
+  default:
+    data_collection:
+      - namespace: UICollector  # All UI-related
+      - namespace: ScreenshotTaker
+    action:
+      - namespace: UIExecutor  # All UI actions
+      - namespace: WindowManager
+
+# ❌ Bad: Mixed purposes
+HostAgent:
+  default:
+    data_collection:
+      - namespace: UICollector
+      - namespace: HardwareMonitor  # Different purpose
+```
+
+### 3. Reset Stateful Servers
+
+```yaml
+# ✅ Good: Reset COM servers
+WordCOMExecutor:
+  type: local
+  reset: true  # Prevents state leakage
+
+# ❌ Bad: Not resetting can cause issues
+WordCOMExecutor:
+  type: local
+  reset: false  # May retain state from previous document
+```
+
+### 4. Validate Remote Server Availability
+
+```yaml
+# When using remote servers, ensure they're accessible
+HardwareCollector:
+  type: http
+  host: "192.168.1.100"  # ✅ Verify this host is reachable
+  port: 8006             # ✅ Verify port is open
+  path: "/mcp"           # ✅ Verify endpoint exists
+```
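+
+The first two checks can be automated with a plain TCP probe. This is a minimal sketch using only the standard library; `is_reachable` and `build_url` are hypothetical helpers, and the `http://host:port/path` layout is an assumption about how the three fields combine. A successful TCP connection does not prove that the `/mcp` endpoint exists, so the third check still needs a real MCP request.
+
+```python
+import socket
+
+
+def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
+    """Return True if a TCP connection to host:port succeeds within timeout."""
+    try:
+        with socket.create_connection((host, port), timeout=timeout):
+            return True
+    except OSError:
+        # Covers refused connections, timeouts, and unresolvable hosts.
+        return False
+
+
+def build_url(host: str, port: int, path: str) -> str:
+    """Assumed URL layout for an HTTP MCP endpoint."""
+    return f"http://{host}:{port}{path}"
+
+
+print(build_url("192.168.1.100", 8006, "/mcp"))
+```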
+
+### 5. Use Environment Variables for Secrets
+
+```yaml
+# ✅ Good: Use environment variables
+- namespace: SecureAPI
+  type: http
+  host: "${API_HOST}"
+  port: "${API_PORT}"
+  auth:
+    token: "${API_TOKEN}"
+
+# ❌ Bad: Hardcoded secrets
+- namespace: SecureAPI
+  type: http
+  host: "api.example.com"
+  auth:
+    token: "secret123"  # Don't commit this!
+```
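+
+YAML itself does not expand `${VAR}` placeholders, and this section does not say whether the UFO² loader does. If it does not, the values can be resolved after loading. The sketch below is one way to do that with the standard library; `expand_env` is a hypothetical helper.
+
+```python
+import os
+from typing import Any
+
+
+def expand_env(value: Any) -> Any:
+    """Recursively replace ${VAR} placeholders in strings with environment values."""
+    if isinstance(value, str):
+        return os.path.expandvars(value)
+    if isinstance(value, dict):
+        return {key: expand_env(item) for key, item in value.items()}
+    if isinstance(value, list):
+        return [expand_env(item) for item in value]
+    return value
+
+
+os.environ["API_HOST"] = "api.example.com"  # set here only for the demo
+os.environ["API_TOKEN"] = "demo-token"
+
+server = {
+    "namespace": "SecureAPI",
+    "type": "http",
+    "host": "${API_HOST}",
+    "auth": {"token": "${API_TOKEN}"},
+}
+resolved = expand_env(server)
+print(resolved["host"])
+```
+
+Expanded values are always strings, so a field such as `port: "${API_PORT}"` would still need an explicit `int(...)` conversion before use.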
+
+## Loading Configuration
+
+### From File
+
+```python
+import yaml
+from pathlib import Path
+
+# Load MCP configuration
+config_path = Path("config/ufo/mcp.yaml")
+with open(config_path) as f:
+    mcp_config = yaml.safe_load(f)
+
+# Access agent configuration
+host_agent_config = mcp_config["HostAgent"]["default"]
+```
+
+### Programmatically
+
+```python
+from ufo.config import get_config
+
+# Get full configuration
+configs = get_config()
+
+# Access MCP section
+mcp_config = configs.get("mcp", {})
+
+# Get specific agent
+host_agent = mcp_config.get("HostAgent", {}).get("default", {})
+```
+
+## Validation
+
+### Schema Validation
+
+UFO² validates MCP configuration on load:
+
+```python
+from ufo.config.config_schemas import MCPConfigSchema
+
+# Validate configuration
+# Note: ValidationError must be imported from wherever the schema defines it (not shown here)
+try:
+    MCPConfigSchema.validate(mcp_config)
+    print("✅ Configuration is valid")
+except ValidationError as e:
+    print(f"❌ Configuration error: {e}")
+```
+
+### Common Validation Errors
+
+| Error | Cause | Solution |
+|-------|-------|----------|
+| `Missing required field: namespace` | Server missing namespace | Add `namespace` field |
+| `Invalid server type: unknown` | Unsupported type | Use `local`, `http`, or `stdio` |
+| `Missing host for http server` | HTTP server without host | Add `host` and `port` |
+| `Duplicate namespace` | Same namespace used twice | Use unique namespaces |
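+
+The four checks in the table can be approximated by hand when debugging a configuration. The following is a rough sketch, not the actual `MCPConfigSchema` logic: `find_config_errors` is a hypothetical helper, and it assumes that namespace uniqueness is checked within a single server list.
+
+```python
+VALID_TYPES = {"local", "http", "stdio"}
+
+
+def find_config_errors(servers: list[dict]) -> list[str]:
+    """Return one message per problem, mirroring the four error classes in the table."""
+    errors = []
+    seen = set()
+    for server in servers:
+        namespace = server.get("namespace")
+        if not namespace:
+            errors.append("Missing required field: namespace")
+            continue
+        if namespace in seen:
+            errors.append(f"Duplicate namespace: {namespace}")
+        seen.add(namespace)
+        server_type = server.get("type")
+        if server_type not in VALID_TYPES:
+            errors.append(f"Invalid server type: {server_type}")
+        elif server_type == "http" and not server.get("host"):
+            errors.append(f"Missing host for http server: {namespace}")
+    return errors
+
+
+servers = [
+    {"namespace": "UICollector", "type": "local"},
+    {"namespace": "UICollector", "type": "local"},
+    {"namespace": "HardwareCollector", "type": "http"},
+    {"type": "local"},
+    {"namespace": "Mystery", "type": "unknown"},
+]
+for message in find_config_errors(servers):
+    print(message)
+```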
+
+## Debugging Configuration
+
+### Enable Debug Logging
+
+```python
+import logging
+
+logging.basicConfig(level=logging.DEBUG)
+logger = logging.getLogger("ufo.client.mcp")
+
+# Will show server creation and registration
+# DEBUG: Creating MCP server 'UICollector' of type local
+# DEBUG: Registered MCP server 'UICollector' with 15 tools
+```
+
+### Check Loaded Servers
+
+```python
+from ufo.client.mcp.mcp_server_manager import MCPServerManager
+
+# List all registered servers
+servers = MCPServerManager._servers_mapping
+for namespace, server in servers.items():
+    print(f"Server: {namespace}, Type: {type(server).__name__}")
+```
+
+### Test Server Connectivity
+
+```python
+from ufo.client.mcp.mcp_server_manager import MCPServerManager
+
+
+async def test_server(config):
+    """Test if MCP server is accessible."""
+    try:
+        server = MCPServerManager.create_mcp_server(config)
+        print(f"✅ Server '{config['namespace']}' is accessible")
+
+        # List tools
+        if hasattr(server, 'server'):
+            from fastmcp.client import Client
+            async with Client(server.server) as client:
+                tools = await client.list_tools()
+                print(f"  Tools: {[tool.name for tool in tools]}")
+    except Exception as e:
+        print(f"❌ Server '{config['namespace']}' failed: {e}")
+```
+
+## Migration Guide
+
+### From Old Configuration Format
+
+If you're migrating from an older UFO configuration:
+
+**Old Format** (config.yaml):
+```yaml
+MCP_SERVERS:
+  - name: ui_collector
+    module: ufo.mcp.ui_server
+```
+
+**New Format** (mcp.yaml):
+```yaml
+HostAgent:
+  default:
+    data_collection:
+      - namespace: UICollector
+        type: local
+```
+
+For detailed migration instructions, see [Configuration Migration Guide](../configuration/system/migration.md).
+
+## Related Documentation
+
+- [MCP Overview](overview.md) - High-level MCP architecture
+- [Data Collection Servers](data_collection.md) - Data collection configuration
+- [Action Servers](action.md) - Action server configuration
+- [Local Servers](local_servers.md) - Built-in local MCP servers
+- [Remote Servers](remote_servers.md) - HTTP and Stdio deployment
+- [Creating Custom MCP Servers Tutorial](../tutorials/creating_mcp_servers.md) - Build your own servers
+- [MCP Reference](../configuration/system/mcp_reference.md) - Complete field reference
+- [Configuration Guide](../configuration/system/overview.md) - General configuration guide
+- [HostAgent Overview](../ufo2/host_agent/overview.md) - HostAgent configuration examples
+- [AppAgent Overview](../ufo2/app_agent/overview.md) - AppAgent configuration examples
+
+**Configuration Philosophy:**
+
+MCP configuration follows the **convention over configuration** principle:
+
+- **Sensible defaults** - Minimal configuration required
+- **Explicit when needed** - Full control when customization is necessary
+- **Type-safe** - Validated on load to catch errors early
+- **Hierarchical** - Inherit from defaults, override as needed
diff --git a/documents/docs/mcp/data_collection.md b/documents/docs/mcp/data_collection.md
new file mode 100644
index 000000000..ed5e4512a
--- /dev/null
+++ b/documents/docs/mcp/data_collection.md
@@ -0,0 +1,272 @@
+# Data Collection Servers
+
+## Overview
+
+**Data Collection Servers** provide read-only tools that observe and retrieve system state without modifying it. These servers are essential for agents to understand the current environment before taking actions.
+
+**Data Collection servers are automatically invoked by the UFO² framework** to gather context and build observation prompts for the LLM. The LLM agent **does not select these tools** - they run in the background to provide system state information.
+
+- **Framework-Driven**: Automatically called to collect screenshots, UI controls, system info
+- **Observation Purpose**: Build the prompt that the LLM uses for decision-making
+- **Not in Tool List**: These tools are NOT presented to the LLM as selectable actions
+
+**Only [Action Servers](./action.md) are LLM-selectable.**
+
+```mermaid
+graph TB
+ Framework["UFO² Framework (Automatic Invocation)"]
+
+ AgentStep["Agent Step Observation & Prompt Build"]
+
+ MCP["MCP Server UICollector"]
+
+ subgraph Tools["Data Collection Tools"]
+ Screenshot["take_screenshot()"]
+ WindowList["get_window_list()"]
+ ControlInfo["get_control_info()"]
+ end
+
+ SystemState["System State → LLM Context"]
+
+ Framework --> AgentStep
+ Framework --> MCP
+ MCP --> Tools
+ Tools --> SystemState
+ SystemState --> AgentStep
+
+ style Framework fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
+ style AgentStep fill:#fff3e0,stroke:#f57c00,stroke-width:2px
+ style MCP fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
+ style Tools fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
+ style SystemState fill:#fce4ec,stroke:#c2185b,stroke-width:2px
+```
+
+**Characteristics:**
+
+- **❌ No Side Effects**: Cannot modify system state
+- **✅ Safe to Retry**: Can be called multiple times without risk
+- **✅ Idempotent**: Repeated calls with the same input return the same result as long as the underlying system state has not changed
+- **📊 Observation Only**: Provides information for decision-making
+- **🤖 Framework-Invoked**: Not selectable by LLM agent
+
+## Tool Type Identifier
+
+All data collection tools use the tool type:
+
+```python
+tool_type = "data_collection"
+```
+
+Tool keys follow the format:
+
+```python
+tool_key = "data_collection::{tool_name}"
+
+# Examples:
+"data_collection::take_screenshot"
+"data_collection::get_window_list"
+"data_collection::get_control_info"
+```
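+
+The key format above can be built and parsed with two small helpers. This is a sketch with hypothetical function names (`make_tool_key`, `split_tool_key`); the framework's own implementation may differ.
+
+```python
+def make_tool_key(tool_type: str, tool_name: str) -> str:
+    """Join a tool type and tool name into a `type::name` key."""
+    return f"{tool_type}::{tool_name}"
+
+
+def split_tool_key(tool_key: str) -> tuple[str, str]:
+    """Split a `type::name` key back into its two parts."""
+    tool_type, separator, tool_name = tool_key.partition("::")
+    if not separator or not tool_type or not tool_name:
+        raise ValueError(f"not a valid tool key: {tool_key!r}")
+    return tool_type, tool_name
+
+
+key = make_tool_key("data_collection", "take_screenshot")
+print(key)
+print(split_tool_key(key))
+```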
+
+## Built-in Data Collection Servers
+
+### UICollector
+
+**Purpose**: Collect UI element information and screenshots
+
+**Namespace**: `UICollector`
+
+**Platform**: Windows (using pywinauto)
+
+**Tools**: 8 tools for UI observation including screenshots, window lists, control info, and annotations
+
+For complete documentation including all tool details, parameters, return types, and usage examples, see:
+
+**[→ UICollector Full Documentation](servers/ui_collector.md)**
+
+#### Quick Example
+
+```python
+from aip.messages import Command
+
+# Take a screenshot of the active window
+screenshot_cmd = Command(
+    tool_name="take_screenshot",
+    tool_type="data_collection",
+    parameters={
+        "region": "active_window",
+        "save_path": "screenshots/current.png"
+    }
+)
+
+# Get list of all windows
+windows_cmd = Command(
+    tool_name="get_window_list",
+    tool_type="data_collection",
+    parameters={}
+)
+```
+
+For detailed tool specifications, advanced usage patterns, and best practices, see the [UICollector documentation](servers/ui_collector.md).
+
+## Configuration Examples
+
+Data collection servers are configured in `config/ufo/mcp.yaml`. For detailed configuration options, see the [UICollector documentation](servers/ui_collector.md#configuration).
+
+### Basic Configuration
+
+```yaml
+HostAgent:
+  default:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        start_args: []
+        reset: false
+```
+
+### Multi-Server Configuration
+
+```yaml
+HostAgent:
+  default:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        reset: false
+```
+
+### App-Specific Configuration
+
+```yaml
+AppAgent:
+  WINWORD.EXE:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        reset: false  # Don't reset when switching between documents
+
+  EXCEL.EXE:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        reset: true  # Reset when switching between spreadsheets
+```
+
+## Best Practices
+
+For detailed best practices with complete code examples, see the [UICollector documentation](servers/ui_collector.md).
+
+### General Guidelines
+
+#### 1. Call Before Action
+
+Always collect data before executing actions to make informed decisions.
+
+#### 2. Cache Results
+
+Data collection results can be cached when state hasn't changed to improve performance.
+
+#### 3. Handle Failures Gracefully
+
+Data collection can fail if windows close or controls disappear - implement proper error handling.
+
+#### 4. Minimize Screenshot Calls
+
+Screenshots are expensive operations - take one screenshot and analyze it multiple times rather than taking multiple screenshots.
+
+#### 5. Use Appropriate Regions
+
+Choose the smallest region that contains needed information (e.g., active window vs. full screen).
+
+See the [UICollector documentation](servers/ui_collector.md) for detailed examples and anti-patterns.
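+
+Guideline 3 can be illustrated with a small retry wrapper. This is a sketch under stated assumptions: `collect_with_retry` and `flaky_window_list` are hypothetical names, and the stand-in collector simulates a window that is missing on the first call. Real code would catch the specific error types raised by the data collection server rather than `Exception`.
+
+```python
+import asyncio
+from typing import Awaitable, Callable, TypeVar
+
+T = TypeVar("T")
+
+
+async def collect_with_retry(
+    collect: Callable[[], Awaitable[T]],
+    attempts: int = 3,
+    delay: float = 0.5,
+) -> T:
+    """Call an async collector, retrying on failure with a fixed delay."""
+    last_error = None
+    for attempt in range(1, attempts + 1):
+        try:
+            return await collect()
+        except Exception as error:  # narrow this to the server's error types in real code
+            last_error = error
+            if attempt < attempts:
+                await asyncio.sleep(delay)
+    raise RuntimeError(f"collection failed after {attempts} attempts") from last_error
+
+
+calls = {"count": 0}
+
+
+async def flaky_window_list() -> list[str]:
+    # Stand-in for a data collection call that fails once, then succeeds.
+    calls["count"] += 1
+    if calls["count"] < 2:
+        raise RuntimeError("window not found")
+    return ["Notepad", "Word"]
+
+
+windows = asyncio.run(collect_with_retry(flaky_window_list, delay=0.01))
+print(windows)
+```
+
+Retrying is safe here only because data collection tools are read-only. The same wrapper should not be applied blindly to action tools, which change system state.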
+
+## Common Use Cases
+
+For complete use case examples with detailed code, see the [UICollector documentation](servers/ui_collector.md).
+
+### UI Element Detection
+
+Discover windows and controls for automation targeting.
+
+### Screen Monitoring
+
+Monitor screen changes for event-driven automation.
+
+### System Health Check
+
+Check system resources before executing heavy tasks.
+
+See the [UICollector documentation](servers/ui_collector.md) for complete workflow examples.
+
+## Error Handling
+
+For detailed error handling patterns, see the [UICollector documentation](servers/ui_collector.md).
+
+### Common Errors
+
+| Error | Cause | Solution |
+|-------|-------|----------|
+| `WindowNotFoundError` | Target window closed | Check window existence first |
+| `ControlNotFoundError` | Control not accessible | Use alternative identification method |
+| `ScreenshotFailedError` | Graphics driver issue | Retry with different region |
+| `TimeoutError` | Operation took too long | Increase timeout or simplify query |
+
+See the [UICollector documentation](servers/ui_collector.md) for complete error recovery examples.
+
+## Performance Considerations
+
+For detailed performance optimization techniques, see the [UICollector documentation](servers/ui_collector.md).
+
+### Key Optimizations
+
+- **Screenshot Optimization**: Use region parameters to capture only needed areas
+- **Parallel Data Collection**: Collect independent data in parallel when possible
+- **Caching**: Cache results when state hasn't changed
+
+See the [UICollector documentation](servers/ui_collector.md) for complete examples.
+
+## Integration with Agents
+
+Data collection servers are typically used in the **observation phase** of agent execution. See the [UICollector documentation](servers/ui_collector.md) for complete integration patterns.
+
+For more details on agent architecture and execution flow:
+
+- [HostAgent Overview](../ufo2/host_agent/overview.md) - HostAgent architecture and workflow
+- [AppAgent Overview](../ufo2/app_agent/overview.md) - AppAgent architecture and workflow
+- [Agent Overview](../ufo2/overview.md) - UFO² agent system architecture
+
+```python
+# Agent execution loop
+while not task_complete:
+    # 1. Observe: Collect current state
+    screenshot = await data_collection_server.take_screenshot()
+
+    # 2. Reason: Agent decides next action
+    next_action = agent.plan(screenshot)
+
+    # 3. Act: Execute action
+    result = await action_server.execute(next_action)
+
+    # 4. Verify: Check action result
+    new_screenshot = await data_collection_server.take_screenshot()
+```
+
+## Related Documentation
+
+- **[UICollector Full Documentation](servers/ui_collector.md)** - Complete tool reference with all parameters and examples
+- [Action Servers](action.md) - State-changing execution tools
+- [Configuration Guide](configuration.md) - How to configure data collection servers
+- [Local Servers](local_servers.md) - Built-in local MCP servers
+- [Remote Servers](remote_servers.md) - HTTP deployment for data collection
+- [Computer](../client/computer.md) - Tool execution layer
+- [MCP Overview](overview.md) - High-level MCP architecture
+
+**Key Takeaways:**
+
+- Data collection servers are **read-only** and **safe to retry**
+- Always **observe before acting** to make informed decisions
+- **Cache results** when state hasn't changed to improve performance
+- Handle **errors gracefully** with retries and fallback logic
+- Use **appropriate regions** and **parallel collection** for performance
+- See the **[UICollector documentation](servers/ui_collector.md)** for complete details
diff --git a/documents/docs/mcp/local_servers.md b/documents/docs/mcp/local_servers.md
new file mode 100644
index 000000000..5ef27020f
--- /dev/null
+++ b/documents/docs/mcp/local_servers.md
@@ -0,0 +1,163 @@
+# Local MCP Servers
+
+Local MCP servers run in-process with the UFO² agent, providing fast and efficient access to tools without network overhead. They are the most common server type for built-in functionality.
+
+**For remote HTTP servers** (BashExecutor, HardwareExecutor, MobileExecutor), see [Remote Servers](./remote_servers.md).
+
+## Overview
+
+UFO² includes several built-in local MCP servers organized by functionality. This page provides a quick reference - click each server name for complete documentation.
+
+| Server | Type | Description | Full Documentation |
+|--------|------|-------------|-------------------|
+| **UICollector** | Data Collection | Windows UI observation | **[→ Full Docs](servers/ui_collector.md)** |
+| **HostUIExecutor** | Action | Desktop-level UI automation | **[→ Full Docs](servers/host_ui_executor.md)** |
+| **AppUIExecutor** | Action | Application-level UI automation | **[→ Full Docs](servers/app_ui_executor.md)** |
+| **CommandLineExecutor** | Action | Shell command execution | **[→ Full Docs](servers/command_line_executor.md)** |
+| **WordCOMExecutor** | Action | Microsoft Word COM API | **[→ Full Docs](servers/word_com_executor.md)** |
+| **ExcelCOMExecutor** | Action | Microsoft Excel COM API | **[→ Full Docs](servers/excel_com_executor.md)** |
+| **PowerPointCOMExecutor** | Action | Microsoft PowerPoint COM API | **[→ Full Docs](servers/ppt_com_executor.md)** |
+| **PDFReaderExecutor** | Action | PDF text extraction | **[→ Full Docs](servers/pdf_reader_executor.md)** |
+| **ConstellationEditor** | Action | Multi-device task orchestration | **[→ Full Docs](servers/constellation_editor.md)** |
+
+---
+
+## Server Summaries
+
+### UICollector
+
+**Type**: Data Collection (read-only, automatically invoked)
+**Platform**: Windows
+**Tools**: 8 tools for screenshots, window lists, control info, and annotations
+
+**[→ See complete UICollector documentation](servers/ui_collector.md)** for all tool details, parameters, return values, and examples.
+
+---
+
+### HostUIExecutor
+
+**Type**: Action (LLM-selectable, state-modifying)
+**Platform**: Windows
+**Agent**: HostAgent
+**Tool**: `select_application_window` - Window selection and focus management
+
+**[→ See complete HostUIExecutor documentation](servers/host_ui_executor.md)** for tool specifications and workflow examples.
+
+---
+
+### AppUIExecutor
+
+**Type**: Action (LLM-selectable, GUI automation)
+**Platform**: Windows
+**Agent**: AppAgent
+**Tools**: 9 tools for clicks, typing, scrolling, and UI interaction
+
+**[→ See complete AppUIExecutor documentation](servers/app_ui_executor.md)** for all automation tools and usage patterns.
+
+---
+
+### CommandLineExecutor
+
+**Type**: Action (LLM-selectable, shell execution)
+**Platform**: Cross-platform
+**Agent**: HostAgent, AppAgent
+**Tool**: `run_shell` - Execute shell commands
+
+**[→ See complete CommandLineExecutor documentation](servers/command_line_executor.md)** for security guidelines and examples.
+
+---
+
+### WordCOMExecutor
+
+**Type**: Action (LLM-selectable, Word COM API)
+**Platform**: Windows
+**Agent**: AppAgent (WINWORD.EXE only)
+**Tools**: 6 tools for Word document automation
+
+**[→ See complete WordCOMExecutor documentation](servers/word_com_executor.md)** for all Word automation tools.
+
+---
+
+### ExcelCOMExecutor
+
+**Type**: Action (LLM-selectable, Excel COM API)
+**Platform**: Windows
+**Agent**: AppAgent (EXCEL.EXE only)
+**Tools**: 6 tools for Excel automation
+
+**[→ See complete ExcelCOMExecutor documentation](servers/excel_com_executor.md)** for all Excel manipulation tools.
+
+---
+
+### PowerPointCOMExecutor
+
+**Type**: Action (LLM-selectable, PowerPoint COM API)
+**Platform**: Windows
+**Agent**: AppAgent (POWERPNT.EXE only)
+**Tools**: 2 tools for PowerPoint automation
+
+**[→ See complete PowerPointCOMExecutor documentation](servers/ppt_com_executor.md)** for PowerPoint tools and examples.
+
+---
+
+### PDFReaderExecutor
+
+**Type**: Action (LLM-selectable, PDF text extraction)
+**Platform**: Windows
+**Agent**: AppAgent (explorer.exe)
+**Tools**: 3 tools for PDF text extraction with human simulation
+
+**[→ See complete PDFReaderExecutor documentation](servers/pdf_reader_executor.md)** for PDF extraction tools and workflows.
+
+---
+
+### ConstellationEditor
+
+**Type**: Action (LLM-selectable, multi-device orchestration)
+**Platform**: Cross-platform
+**Agent**: ConstellationAgent
+**Tools**: 7 tools for task and dependency management
+
+**[→ See complete ConstellationEditor documentation](servers/constellation_editor.md)** for multi-device workflow tools.
+
+---
+
+## Configuration
+
+All local servers are configured in `config/ufo/mcp.yaml`. For detailed configuration options, see:
+
+- [MCP Configuration Guide](./configuration.md) - Complete configuration reference
+- Individual server documentation for server-specific configuration
+
+**Example configuration:**
+
+```yaml
+AppAgent:
+  WINWORD.EXE:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        reset: false
+    action:
+      - namespace: AppUIExecutor  # GUI automation
+        type: local
+        reset: false
+      - namespace: WordCOMExecutor  # API automation
+        type: local
+        reset: true  # Reset when switching documents
+      - namespace: CommandLineExecutor
+        type: local
+        reset: false
+```
+
+## See Also
+
+- [MCP Overview](./overview.md) - MCP architecture and concepts
+- [Data Collection Servers](./data_collection.md) - Data collection overview
+- [Action Servers](./action.md) - Action server overview
+- [MCP Configuration](./configuration.md) - Configuration guide
+- [Remote Servers](./remote_servers.md) - HTTP/Stdio deployment
+- [Creating Custom MCP Servers Tutorial](../tutorials/creating_mcp_servers.md) - Learn to build your own servers
+- [HostAgent Overview](../ufo2/host_agent/overview.md) - HostAgent architecture
+- [AppAgent Overview](../ufo2/app_agent/overview.md) - AppAgent architecture
+- [Hybrid Actions](../ufo2/core_features/hybrid_actions.md) - GUI + API dual-mode automation
diff --git a/documents/docs/mcp/overview.md b/documents/docs/mcp/overview.md
new file mode 100644
index 000000000..a84d5a471
--- /dev/null
+++ b/documents/docs/mcp/overview.md
@@ -0,0 +1,568 @@
+# MCP (Model Context Protocol) - Overview
+
+## What is MCP?
+
+**MCP (Model Context Protocol)** is a standardized protocol that enables UFO² agents to interact with external tools and services in a unified way. It provides a **tool execution framework** where agents can:
+
+- **Collect system state** through data collection servers
+- **Execute actions** through action servers
+- **Extend capabilities** through custom MCP servers
+
+```mermaid
+graph TB
+ subgraph Agent["UFO² Agent"]
+ HostAgent[HostAgent]
+ AppAgent[AppAgent]
+ end
+
+ Computer["Computer (MCP Tool Manager)"]
+
+ subgraph DataServers["Data Collection Servers"]
+ UICollector["UICollector • Screenshots • Window Info"]
+ HWInfo["Hardware Info • CPU/Memory • System State"]
+ end
+
+ subgraph ActionServers["Action Servers"]
+ UIExecutor["UIExecutor • Click/Type • UI Automation"]
+ CLIExecutor["CLI Executor • Shell Commands"]
+ COMExecutor["COM Executor • API Calls"]
+ end
+
+ HostAgent --> Computer
+ AppAgent --> Computer
+ Computer --> DataServers
+ Computer --> ActionServers
+
+ style Agent fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
+ style Computer fill:#fff3e0,stroke:#f57c00,stroke-width:2px
+ style DataServers fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
+ style ActionServers fill:#fce4ec,stroke:#c2185b,stroke-width:2px
+```
+
+MCP serves as the **execution layer** in UFO²'s architecture. While agents make decisions about *what* to do, MCP servers handle *how* to do it by providing concrete tool implementations.
+
+## Key Concepts
+
+### 1. Two Server Types
+
+MCP servers in UFO² are categorized into two types based on their purpose:
+
+| Type | Purpose | Examples | Side Effects | LLM Selectable? |
+|------|---------|----------|--------------|-----------------|
+| **Data Collection** | Retrieve system state (read-only operations) | UI detection, Screenshot, System info | ❌ None | ❌ **No** - Auto-invoked |
+| **Action** | Modify system state (state-changing operations) | Click, Type text, Run command | ✅ Yes | ✅ **Yes** - LLM chooses |
+
+**Server Selection Model:**
+
+- **[Data Collection Servers](./data_collection.md)**: Automatically invoked by the framework to gather context and build observation prompts. Not selectable by LLM.
+- **[Action Servers](./action.md)**: LLM agent actively selects which action tool to execute at each step based on the task. Only action tools are LLM-selectable.
+
+**How Action Tools Reach the LLM**: Each action tool's `Annotated` type hints and docstring are automatically extracted and converted into structured instructions that appear in the LLM's prompt. The LLM uses these instructions to understand what each tool does, what parameters it requires, and when to use it. Therefore, developers should write clear, comprehensive docstrings and type annotations - they directly impact the LLM's ability to use the tool correctly.
+
+### 2. Server Deployment Models
+
+UFO² supports three deployment models for MCP servers:
+
+| Model | Description | Benefits | Trade-offs |
+|-------|-------------|----------|------------|
+| **Local (In-Process)** | Server runs in same process as agent | Fast (no IPC overhead), Simple setup | Shares process resources |
+| **HTTP (Remote)** | Server runs as HTTP service (e.g., Port 8006) | Process isolation, Language-agnostic | Network overhead |
+| **Stdio (Process)** | Server runs as child process using stdin/stdout | Process isolation, Bidirectional streaming | Platform-specific |
+
+### 3. Namespace Isolation
+
+Each MCP server has a **namespace** that groups related tools together:
+
+```yaml
+# Example: HostAgent configuration
+HostAgent:
+  default:
+    data_collection:
+      - namespace: UICollector  # Namespace for UI detection tools
+        type: local
+    action:
+      - namespace: HostUIExecutor  # Namespace for UI automation tools
+        type: local
+      - namespace: CommandLineExecutor  # Namespace for CLI tools
+        type: local
+```
+
+**Tool Key Format**: `{tool_type}::{tool_name}`
+
+- Example: `data_collection::take_screenshot` - Screenshot tool in data_collection
+- Example: `action::click` - Click tool in action
+- Example: `action::run_shell` - Shell command in action
+
+## Key Features
+
+### 1. GUI + API Dual-Mode Automation
+
+**UFO² supports both GUI automation and API-based automation simultaneously.** Each agent can register multiple action servers, combining:
+
+- **GUI Automation**: Windows UI Automation (UIA) - clicking, typing, scrolling when visual interaction is needed
+- **API Automation**: Direct COM API calls, shell commands, REST APIs for efficient, reliable operations
+
+**The LLM agent dynamically chooses the best action at each step** based on task requirements, reliability, speed, and availability.
+
+**Example: Word Document Automation**
+
+```yaml
+AppAgent:
+  WINWORD.EXE:
+    action:
+      - namespace: WordCOMExecutor  # API: Fast, reliable
+      - namespace: AppUIExecutor  # GUI: Visual navigation fallback
+      - namespace: CommandLineExecutor  # Shell: File operations
+```
+
+**LLM's Dynamic Selection:**
+
+```
+Task: "Create a report with a 3x2 table and bold the title"
+
+Step 1: Insert table
+  → LLM selects: WordCOMExecutor::insert_table(rows=3, cols=2)
+  → Reason: API is fast, reliable, no GUI navigation needed
+
+Step 2: Navigate to Design tab
+  → LLM selects: AppUIExecutor::click_input(id="5", name="Design")
+  → Reason: Visual navigation, COM API doesn't expose tab selection
+
+Step 3: Type table header
+  → LLM selects: AppUIExecutor::set_edit_text(id="cell_1_1", text="Product")
+  → Reason: GUI interaction needed for cell input
+
+Step 4: Bold title text
+  → LLM selects: WordCOMExecutor::select_text(text="Report Title")
+  →              WordCOMExecutor::set_font(font_size=16)
+  → Reason: API is more reliable than GUI button clicking
+
+Step 5: Save as PDF
+  → LLM selects: WordCOMExecutor::save_as(file_ext=".pdf")
+  → Reason: One API call vs. multiple GUI clicks (File → Save As → Format → PDF)
+```
+
+**Why Hybrid Automation Matters:**
+
+- **APIs**: ~10x faster, deterministic, no visual dependency
+- **GUI**: Handles visual elements, fallback when API unavailable
+- **LLM Decision**: Chooses optimal approach per step, not locked into one mode
+
+### 2. Multi-Server Per Agent
+
+Each agent can register **multiple action servers**, each providing a different set of tools:
+
+**HostAgent Example:**
+```yaml
+HostAgent:
+  default:
+    data_collection:
+      - UICollector  # Automatically invoked
+    action:
+      - HostUIExecutor  # LLM selects: Window selection
+      - CommandLineExecutor  # LLM selects: Launch apps, shell commands
+```
+
+**AppAgent Example (Word-specific):**
+```yaml
+AppAgent:
+  WINWORD.EXE:
+    data_collection:
+      - UICollector  # Automatically invoked
+    action:
+      - WordCOMExecutor  # LLM selects: insert_table, select_text, save_as
+      - AppUIExecutor  # LLM selects: click_input, set_edit_text
+      - CommandLineExecutor  # LLM selects: run_shell
+```
+
+**HardwareAgent Example (Cross-platform):**
+```yaml
+HardwareAgent:
+  default:
+    data_collection:
+      - HardwareCollector  # Auto-invoked (HTTP remote)
+    action:
+      - HardwareExecutor  # LLM selects: touch_screen, swipe, press_key (HTTP remote)
+```
+
+**At each step, the LLM sees all available action tools and selects the most appropriate one.**
+
+### 3. Process Isolation
+
+MCP servers can run:
+
+- **In-process** (local): Fast, low overhead
+- **HTTP** (remote): Process isolation, cross-platform, distributed
+- **Stdio** (child process): Sandboxed execution, clean resource management
+
+### 4. Namespace Isolation
+
+Each MCP server has a unique namespace that groups related tools together, preventing naming conflicts and enabling modular organization. See the [Namespace Isolation](#3-namespace-isolation) section above for details.
+
+## Architecture
+
+### MCP Server Lifecycle
+
+```mermaid
+graph TB
+ Start([MCP Server Lifecycle])
+
+ Config["1. Configuration Loading (mcp.yaml)"]
+ Manager["2. MCPServerManager Creates BaseMCPServer"]
+ ServerStart["3. Server.start() • Local: Get from registry • HTTP: Build URL • Stdio: Spawn process"]
+ Register["4. Computer Registration • List tools from server • Register in tool registry"]
+ Execute["5. Tool Execution • Agent sends Command • Computer routes to tool • MCP server executes"]
+ Reset["6. Server.reset() (optional) Reset server state"]
+
+ Start --> Config
+ Config --> Manager
+ Manager --> ServerStart
+ ServerStart --> Register
+ Register --> Execute
+ Execute --> Reset
+
+ style Start fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
+ style Config fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
+ style Manager fill:#fff9c4,stroke:#f57f17,stroke-width:2px
+ style ServerStart fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
+ style Register fill:#fce4ec,stroke:#c2185b,stroke-width:2px
+ style Execute fill:#e0f2f1,stroke:#00695c,stroke-width:2px
+ style Reset fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
+```
+
+### Component Relationships
+
+```mermaid
+graph TB
+ subgraph Architecture["MCP Architecture"]
+ Registry["MCPRegistry (Singleton) • Stores server factories • Lazy initialization"]
+
+ Manager["MCPServerManager (Singleton) • Creates server instances • Maps server types to classes • Manages server lifecycle"]
+
+ subgraph ServerTypes["MCP Server Types"]
+ Local["Local MCP Server (In-Process)"]
+ HTTP["HTTP MCP Server (Remote)"]
+ Stdio["Stdio MCP Server (Child Process)"]
+ end
+
+ Computer["Computer (Per Agent) • Manages multiple MCP servers • Routes commands to tools • Maintains tool registry"]
+
+ Registry --> Manager
+ Manager --> ServerTypes
+ Manager --> Computer
+ end
+
+ style Architecture fill:#fafafa,stroke:#424242,stroke-width:2px
+ style Registry fill:#e1f5fe,stroke:#01579b,stroke-width:2px
+ style Manager fill:#fff9c4,stroke:#f57f17,stroke-width:2px
+ style ServerTypes fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px
+ style Local fill:#c8e6c9,stroke:#2e7d32,stroke-width:1px
+ style HTTP fill:#c8e6c9,stroke:#2e7d32,stroke-width:1px
+ style Stdio fill:#c8e6c9,stroke:#2e7d32,stroke-width:1px
+ style Computer fill:#fce4ec,stroke:#880e4f,stroke-width:2px
+```
+
+## Built-in MCP Servers
+
+UFO² comes with several **built-in MCP servers** that cover common automation scenarios:
+
+### Data Collection Servers
+
+| Namespace | Purpose | Key Tools | Platform |
+|-----------|---------|-----------|----------|
+| **UICollector** | UI element detection | `get_control_info`, `take_screenshot`, `get_window_list` | Windows |
+| **HardwareCollector** | Hardware information | `get_cpu_info`, `get_memory_info` | Cross-platform |
+| **MobileDataCollector** | Android device observation | `capture_screenshot`, `get_ui_tree`, `get_device_info`, `get_mobile_app_target_info` | Android (ADB) |
+
+### Action Servers
+
+| Namespace | Purpose | Key Tools | Platform |
+|-----------|---------|-----------|----------|
+| **HostUIExecutor** | UI automation (host-level) | `click`, `type_text`, `scroll` | Windows |
+| **AppUIExecutor** | UI automation (app-level) | `click`, `type_text`, `set_edit_text` | Windows |
+| **CommandLineExecutor** | CLI execution | `run_shell` | Cross-platform |
+| **WordCOMExecutor** | Word automation | `insert_table`, `select_text`, `format_text` | Windows |
+| **ExcelCOMExecutor** | Excel automation | `insert_cell`, `select_range`, `format_cell` | Windows |
+| **PowerPointCOMExecutor** | PowerPoint automation | `insert_slide`, `add_text`, `format_shape` | Windows |
+| **ConstellationEditor** | Multi-device coordination | `create_task`, `assign_device` | Cross-platform |
+| **BashExecutor** | Linux commands | `execute_bash` | Linux |
+| **MobileExecutor** | Android device control | `tap`, `swipe`, `type_text`, `launch_app`, `click_control` | Android (ADB) |
+
+!!!example "Tool Examples"
+ ```python
+ # Data Collection: Take a screenshot
+ {
+ "tool_type": "data_collection",
+ "tool_name": "take_screenshot",
+ "parameters": {"region": "active_window"}
+ }
+
+ # Action: Click a button
+ {
+ "tool_type": "action",
+ "tool_name": "click",
+ "parameters": {"control_label": "Submit"}
+ }
+
+ # Action: Run a shell command
+ {
+ "tool_type": "action",
+ "tool_name": "run_shell",
+ "parameters": {"bash_command": "notepad.exe"}
+ }
+ ```
+
+## Agent-Specific Configurations
+
+Different agents can have **different MCP configurations** based on their roles:
+
+```yaml
+# HostAgent: System-level operations
+HostAgent:
+ default:
+ data_collection:
+ - namespace: UICollector
+ type: local
+ action:
+ - namespace: HostUIExecutor
+ type: local
+ - namespace: CommandLineExecutor
+ type: local
+
+# AppAgent: Application-specific operations
+AppAgent:
+ WINWORD.EXE: # Word-specific configuration
+ data_collection:
+ - namespace: UICollector
+ type: local
+ action:
+ - namespace: AppUIExecutor
+ type: local
+ - namespace: WordCOMExecutor # Word COM API
+ type: local
+ reset: true # Reset when switching documents
+
+# HardwareAgent: Remote hardware monitoring
+HardwareAgent:
+ default:
+ data_collection:
+ - namespace: HardwareCollector
+ type: http # Remote server
+ host: "localhost"
+ port: 8006
+ path: "/mcp"
+
+# MobileAgent: Android device automation
+MobileAgent:
+ default:
+ data_collection:
+ - namespace: MobileDataCollector
+ type: http # Remote server
+ host: "localhost"
+ port: 8020
+ path: "/mcp"
+ action:
+ - namespace: MobileExecutor
+ type: http
+ host: "localhost"
+ port: 8021
+ path: "/mcp"
+```
+
+**Configuration Hierarchy:**
+
+Agent configurations follow this hierarchy:
+
+1. **Agent Name** (e.g., `HostAgent`, `AppAgent`)
+2. **Sub-type** (e.g., `default`, `WINWORD.EXE`)
+3. **Tool Type** (e.g., `data_collection`, `action`)
+4. **Server List** (array of server configurations)
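+
+The four-level hierarchy can be read as a nested lookup. The following is a minimal sketch, not the project's actual loader: `MCP_CONFIG`, `resolve_servers`, and `namespaces` are hypothetical names, and the fallback from an unknown sub-type to `default` is an assumption made for illustration.
+
+```python
+from typing import Any, Dict, List
+
+# Hypothetical in-memory form of mcp.yaml after parsing; not the real loader.
+MCP_CONFIG: Dict[str, Any] = {
+    "AppAgent": {
+        "default": {
+            "action": [{"namespace": "AppUIExecutor", "type": "local"}],
+        },
+        "WINWORD.EXE": {
+            "action": [
+                {"namespace": "AppUIExecutor", "type": "local"},
+                {"namespace": "WordCOMExecutor", "type": "local", "reset": True},
+            ],
+        },
+    },
+}
+
+
+def resolve_servers(
+    config: Dict[str, Any], agent_name: str, sub_type: str, tool_type: str
+) -> List[Dict[str, Any]]:
+    """Walk agent -> sub-type -> tool type and return the server list."""
+    agent_cfg = config.get(agent_name, {})
+    # Assumption: an unknown sub-type falls back to the "default" entry.
+    sub_cfg = agent_cfg.get(sub_type, agent_cfg.get("default", {}))
+    return sub_cfg.get(tool_type, [])
+
+
+def namespaces(servers: List[Dict[str, Any]]) -> List[str]:
+    """Extract the namespace of each server entry, preserving order."""
+    return [server["namespace"] for server in servers]
+```
+
+With this sketch, `resolve_servers(MCP_CONFIG, "AppAgent", "WINWORD.EXE", "action")` yields both the generic and the Word-specific executor, while an application without its own entry receives only the `default` list.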
+
+## Key Features
+
+### 1. Process Isolation with Reset
+
+Some MCP servers support **state reset** to ensure clean execution:
+
+```yaml
+AppAgent:
+ WINWORD.EXE:
+ action:
+ - namespace: WordCOMExecutor
+ type: local
+ reset: true
+```
+
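+**When to use reset:**
+
+- Enable it for stateful tools (e.g., COM APIs) whose state should not carry over between tasks.
+- With `reset: true`, server state is cleared when the agent switches contexts.
+- Clearing state this way prevents leakage between tasks.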
+
+### 2. Thread Isolation
+
+MCP tools execute in **isolated thread pools** to prevent blocking:
+
+```python
+# From Computer class
+self._executor = ThreadPoolExecutor(max_workers=10)
+self._tool_timeout = 6000 # 100 minutes
+```
+
+**Benefits**:
+- Prevents blocking the main event loop
+- Protects WebSocket connections from timeouts
+- Enables concurrent tool execution
+
+**Timeout Protection:** If a tool runs longer than 6000 seconds (100 minutes), the call is cancelled and a timeout error is returned. Adjust `_tool_timeout` for long-running operations.
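+
+The pattern described above (a blocking tool offloaded to a thread pool, with a deadline on the awaiting side) can be sketched with the standard library alone. This is an illustration under assumptions, not the `Computer` implementation: `call_with_timeout` and `slow_tool` are hypothetical names, and the error dictionary shape is chosen for the example.
+
+```python
+import asyncio
+import time
+from concurrent.futures import ThreadPoolExecutor
+
+executor = ThreadPoolExecutor(max_workers=10)
+
+
+async def call_with_timeout(func, timeout_s, *args):
+    """Run a blocking callable in the pool; return an error dict if it overruns."""
+    loop = asyncio.get_running_loop()
+    future = loop.run_in_executor(executor, func, *args)
+    try:
+        return await asyncio.wait_for(future, timeout=timeout_s)
+    except asyncio.TimeoutError:
+        # The awaiting side gives up here. The worker thread itself keeps
+        # running until the callable returns, because threads cannot be killed.
+        return {"error": f"Timeout after {timeout_s}s."}
+
+
+def slow_tool(seconds):
+    """Hypothetical blocking tool used only for this example."""
+    time.sleep(seconds)
+    return "done"
+```
+
+Because the event loop only awaits the future, other coroutines (such as a WebSocket heartbeat) keep running while the tool blocks its worker thread.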
+
+### 3. Dynamic Server Management
+
+Add or remove MCP servers at runtime:
+
+```python
+# Add a custom server
+await computer.add_server(
+ namespace="CustomTools",
+ mcp_server=custom_server,
+ tool_type="action"
+)
+
+# Remove a server
+await computer.delete_server(
+ namespace="CustomTools",
+ tool_type="action"
+)
+```
+
+### 4. Tool Introspection
+
+Use meta tools to discover available tools:
+
+```python
+# List all action tools
+tool_call = MCPToolCall(
+ tool_key="action::list_tools",
+ tool_name="list_tools",
+ parameters={"tool_type": "action"}
+)
+
+result = await computer.run_actions([tool_call])
+# Returns: List of all available action tools
+```
+
+For more details on introspection capabilities, see [Computer - Meta Tools](../client/computer.md#meta-tools).
+
+## Configuration Files
+
+MCP configuration is located at:
+
+```
+config/ufo/mcp.yaml
+```
+
+For detailed configuration options, see:
+
+- [MCP Configuration Guide](configuration.md) - Complete configuration reference
+- [System Configuration](../configuration/system/system_config.md) - MCP-related system settings
+- [MCP Reference](../configuration/system/mcp_reference.md) - MCP-specific settings
+
+## Use Cases
+
+### 1. UI Automation
+
+```yaml
+# Agent that automates UI interactions
+HostAgent:
+ default:
+ data_collection:
+ - namespace: UICollector # Detect UI elements
+ action:
+ - namespace: HostUIExecutor # Click, type, scroll
+```
+
+### 2. Document Processing
+
+```yaml
+# Agent specialized for Word documents
+AppAgent:
+ WINWORD.EXE:
+ data_collection:
+ - namespace: UICollector # Read document structure
+ action:
+ - namespace: WordCOMExecutor # Insert tables, format text
+```
+
+### 3. Multi-Device Coordination
+
+```yaml
+# Agent that coordinates tasks across devices
+ConstellationAgent:
+ default:
+ action:
+ - namespace: ConstellationEditor # Create and assign tasks
+```
+
+### 4. Remote Hardware Monitoring
+
+```yaml
+# Agent that monitors remote hardware
+HardwareAgent:
+ default:
+ data_collection:
+ - namespace: HardwareCollector
+ type: http
+ host: "192.168.1.100"
+ port: 8006
+```
+
+### 5. Android Device Automation
+
+```yaml
+# Agent that automates Android devices via ADB
+MobileAgent:
+ default:
+ data_collection:
+ - namespace: MobileDataCollector
+ type: http
+ host: "localhost" # Or remote Android automation server
+ port: 8020
+ path: "/mcp"
+ action:
+ - namespace: MobileExecutor
+ type: http
+ host: "localhost"
+ port: 8021
+ path: "/mcp"
+```
+
+## Getting Started
+
+To start using MCP in UFO²:
+
+1. **Understand the two server types** - Read about [Data Collection](data_collection.md) and [Action](action.md) servers
+2. **Configure your agents** - See [Configuration Guide](configuration.md) for setup details
+3. **Use built-in servers** - Explore available [Local Servers](local_servers.md)
+4. **Create custom servers** - Follow the [Creating Custom MCP Servers Tutorial](../tutorials/creating_mcp_servers.md)
+5. **Deploy remotely** - Learn about [Remote Servers](remote_servers.md) deployment
+
+## Related Documentation
+
+- [Data Collection Servers](data_collection.md) - Read-only observation tools
+- [Action Servers](action.md) - State-changing execution tools
+- [Configuration Guide](configuration.md) - How to configure MCP for agents
+- [Local Servers](local_servers.md) - Built-in MCP servers
+- [Remote Servers](remote_servers.md) - HTTP and Stdio deployment
+- [Creating Custom MCP Servers Tutorial](../tutorials/creating_mcp_servers.md) - Step-by-step guide to building custom servers
+- [Computer](../client/computer.md) - MCP tool execution layer
+- [Agent Client](../client/overview.md) - Client architecture overview
+- [Agent Overview](../ufo2/overview.md) - UFO² agent system architecture
+
+**Design Philosophy:**
+
+MCP in UFO² follows the **separation of concerns** principle:
+
+- **Agents** decide *what* to do (high-level planning)
+- **MCP servers** implement *how* to do it (low-level execution)
+- **Computer** manages the routing between them (middleware)
+
+This architecture enables flexibility, extensibility, and maintainability.
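+
+A toy model of this split may help make the roles concrete. The sketch below is hypothetical throughout: `ToyComputer` is a stand-in written for this example and does not reflect the real `Computer` class beyond the `tool_type::tool_name` key format used in this documentation.
+
+```python
+from typing import Any, Callable, Dict
+
+
+class ToyComputer:
+    """Hypothetical stand-in for the routing layer; not the real Computer class."""
+
+    def __init__(self) -> None:
+        self._tools: Dict[str, Callable[..., Any]] = {}
+
+    def register(self, tool_type: str, tool_name: str, func: Callable[..., Any]) -> None:
+        # Keys follow the "tool_type::tool_name" format used in this documentation.
+        self._tools[f"{tool_type}::{tool_name}"] = func
+
+    def run(self, tool_key: str, parameters: Dict[str, Any]) -> Any:
+        if tool_key not in self._tools:
+            raise KeyError(f"Unknown tool: {tool_key}")
+        return self._tools[tool_key](**parameters)
+
+
+# The "server" side: implements how an action is carried out.
+def click(control_label: str) -> str:
+    return f"clicked {control_label}"
+
+
+computer = ToyComputer()
+computer.register("action", "click", click)
+
+# The "agent" side: only decides what to do and names the tool.
+result = computer.run("action::click", {"control_label": "Submit"})
+```
+
+The agent never references `click` directly, so a server can be swapped or removed without changing agent logic.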
diff --git a/documents/docs/mcp/remote_servers.md b/documents/docs/mcp/remote_servers.md
new file mode 100644
index 000000000..f58b288a7
--- /dev/null
+++ b/documents/docs/mcp/remote_servers.md
@@ -0,0 +1,224 @@
+# Remote MCP Servers
+
+Remote MCP servers run as separate processes or on different machines, communicating with UFO² over HTTP or stdio. This enables **cross-platform automation**, process isolation, and distributed workflows.
+
+**Cross-Platform Automation:** Remote servers enable **Windows UFO² agents to control Linux systems, mobile devices, and hardware** through HTTP MCP servers running on those platforms.
+
+## Deployment Models
+
+### HTTP Servers
+
+HTTP MCP servers run as standalone HTTP services, accessible via REST-like endpoints.
+
+**Advantages:**
+- Cross-platform communication (Windows ↔ Linux, Windows ↔ Hardware)
+- Language-agnostic (server can be in Python, Go, Rust, etc.)
+- Network-accessible (local or remote deployment)
+- Stateless design (each request is independent)
+
+**Use Cases:**
+- Linux command execution from Windows
+- Hardware device control (Arduino, robot arms, test fixtures)
+- Mobile device automation (Android, iOS via robot arm)
+- Distributed multi-machine workflows
+
+### Stdio Servers
+
+Stdio MCP servers run as child processes, communicating via stdin/stdout.
+
+**Advantages:**
+- Process isolation (sandboxed execution)
+- Clean resource management (process lifetime)
+- Standard protocol (works with any language)
+
+**Use Cases:**
+- Custom Python/Node.js tools running in separate environments
+- Third-party MCP servers
+- Sandboxed execution for security
+
+
+---
+
+## Built-in Remote Servers
+
+### HardwareExecutor
+
+**Type**: Action (HTTP deployment)
+**Purpose**: Control hardware devices (Arduino HID, BB-8 test fixture, robot arm, mobile devices)
+**Deployment**: HTTP server on hardware controller machine
+**Agent**: HardwareAgent
+**Tools**: 30+ hardware control tools
+
+**[→ See complete HardwareExecutor documentation](servers/hardware_executor.md)** for all hardware control tools, deployment instructions, and usage examples.
+
+---
+
+### BashExecutor
+
+**Type**: Action (HTTP deployment)
+**Purpose**: Execute shell commands on Linux systems
+**Deployment**: HTTP server on Linux machine
+**Agent**: LinuxAgent
+**Tools**: 2 tools for command execution and system info
+
+**[→ See complete BashExecutor documentation](servers/bash_executor.md)** for Linux command execution, security guidelines, and systemd setup.
+
+---
+
+### MobileExecutor
+
+**Type**: Action + Data Collection (HTTP deployment, dual-server)
+**Purpose**: Android device automation via ADB
+**Deployment**: HTTP servers on machine with ADB access
+**Agent**: MobileAgent
+**Ports**: 8020 (data collection), 8021 (action)
+**Tools**: 13+ tools for Android automation
+
+**Architecture**: Runs as **two HTTP servers** that share a singleton state manager for coordinated operations:
+- **Mobile Data Collection Server** (port 8020): Screenshots, UI tree, device info, app list, controls
+- **Mobile Action Server** (port 8021): Tap, swipe, type, launch apps, press keys, control clicks
+
+**[→ See complete MobileExecutor documentation](servers/mobile_executor.md)** for all Android automation tools, dual-server architecture, deployment instructions, and usage examples.
+
+---
+
+## Configuration Reference
+
+### HTTP Server Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `namespace` | String | ✅ Yes | Unique server identifier |
+| `type` | String | ✅ Yes | Must be `"http"` |
+| `host` | String | ✅ Yes | Server hostname or IP |
+| `port` | Integer | ✅ Yes | Server port number |
+| `path` | String | ✅ Yes | HTTP endpoint path |
+| `reset` | Boolean | ❌ No | Reset on context switch (default: `false`) |
+
+### Stdio Server Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `namespace` | String | ✅ Yes | Unique server identifier |
+| `type` | String | ✅ Yes | Must be `"stdio"` |
+| `command` | String | ✅ Yes | Executable command |
+| `start_args` | List[String] | ❌ No | Command-line arguments |
+| `env` | Dict | ❌ No | Environment variables |
+| `cwd` | String | ❌ No | Working directory |
+| `reset` | Boolean | ❌ No | Reset on context switch (default: `false`) |
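+
+The required fields in the two tables above lend themselves to a quick pre-flight check. The following sketch is not part of UFO²: `missing_fields` and `http_url` are hypothetical helpers, it covers only the two remote server types, and the `http://host:port/path` URL form is an assumption.
+
+```python
+from typing import Any, Dict, List
+
+# Required fields per remote server type, taken from the two tables above.
+REQUIRED_FIELDS = {
+    "http": ["namespace", "type", "host", "port", "path"],
+    "stdio": ["namespace", "type", "command"],
+}
+
+
+def missing_fields(entry: Dict[str, Any]) -> List[str]:
+    """Return the required fields that a server entry lacks."""
+    server_type = entry.get("type")
+    if server_type not in REQUIRED_FIELDS:
+        raise ValueError(f"Unsupported server type: {server_type!r}")
+    return [name for name in REQUIRED_FIELDS[server_type] if name not in entry]
+
+
+def http_url(entry: Dict[str, Any]) -> str:
+    # Assumption: the endpoint is plain http://host:port/path.
+    return f"http://{entry['host']}:{entry['port']}{entry['path']}"
+```
+
+Running a check like this before startup surfaces a missing `path` or `command` as a configuration error rather than a connection failure later on.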
+
+---
+
+## Example Configurations
+
+### HTTP: Hardware Control
+
+```yaml
+HardwareAgent:
+ default:
+ action:
+ - namespace: HardwareExecutor
+ type: http
+ host: "192.168.1.100"
+ port: 8006
+ path: "/mcp"
+```
+
+**Server Start:**
+```bash
+python -m ufo.client.mcp.http_servers.hardware_mcp_server --host 0.0.0.0 --port 8006
+```
+
+See the [HardwareExecutor documentation](servers/hardware_executor.md) for complete deployment instructions.
+
+### HTTP: Linux Command Execution
+
+```yaml
+LinuxAgent:
+ default:
+ action:
+ - namespace: BashExecutor
+ type: http
+ host: "192.168.1.50"
+ port: 8010
+ path: "/mcp"
+```
+
+**Server Start:**
+```bash
+python -m ufo.client.mcp.http_servers.linux_mcp_server --host 0.0.0.0 --port 8010
+```
+
+See the [BashExecutor documentation](servers/bash_executor.md) for systemd service setup.
+
+### HTTP: Android Device Automation
+
+```yaml
+MobileAgent:
+ default:
+ data_collection:
+ - namespace: MobileDataCollector
+ type: http
+ host: "192.168.1.60" # Android automation server
+ port: 8020
+ path: "/mcp"
+ action:
+ - namespace: MobileExecutor
+ type: http
+ host: "192.168.1.60"
+ port: 8021
+ path: "/mcp"
+```
+
+**Server Start:**
+```bash
+# Start both servers (recommended - they share state)
+python -m ufo.client.mcp.http_servers.mobile_mcp_server --server both --host 0.0.0.0 --data-port 8020 --action-port 8021
+```
+
+See the [MobileExecutor documentation](servers/mobile_executor.md) for complete deployment instructions and ADB setup.
+
+### Stdio: Custom Python Server
+
+```yaml
+CustomAgent:
+ default:
+ action:
+ - namespace: CustomProcessor
+ type: stdio
+ command: "python"
+ start_args: ["-m", "custom_mcp_server"]
+ env:
+ API_KEY: "secret_key"
+ cwd: "/path/to/server"
+```
+
+---
+
+## Best Practices
+
+**Recommended Practices:**
+
+- ✅ **Use HTTP for cross-platform automation**
+- ✅ **Use stdio for process isolation**
+- ✅ **Validate remote server connectivity** before deployment
+- ✅ **Set appropriate timeouts** for long-running commands
+- ✅ **Use environment variables** for sensitive credentials
+
+**Anti-Patterns to Avoid:**
+
+- ❌ **Don't expose HTTP servers to public internet** without authentication
+- ❌ **Don't hardcode credentials** in configuration files
+- ❌ **Don't forget to start remote servers** before client connection
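+
+For the "validate remote server connectivity" recommendation, a plain TCP probe is often enough as a first check. This is a minimal sketch using only the standard library; `is_reachable` is a hypothetical helper, and a successful connection shows only that the port is open, not that the MCP endpoint responds correctly.
+
+```python
+import socket
+
+
+def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
+    """Return True if a TCP connection to host:port can be opened."""
+    try:
+        with socket.create_connection((host, port), timeout=timeout):
+            return True
+    except OSError:
+        # Covers refused connections, timeouts, and name resolution failures.
+        return False
+```
+
+For example, `is_reachable("192.168.1.50", 8010)` could be called before launching a client that is configured to use the BashExecutor example above.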
+
+---
+
+## See Also
+
+- [MCP Overview](./overview.md) - MCP architecture and deployment models
+- [Local Servers](./local_servers.md) - In-process servers
+- [MCP Configuration](./configuration.md) - Complete configuration reference
+- [Action Servers](./action.md) - Action execution overview
+- [Creating Custom MCP Servers Tutorial](../tutorials/creating_mcp_servers.md) - Step-by-step guide for HTTP/Stdio servers
+- [HardwareExecutor](servers/hardware_executor.md) - Complete hardware control reference
+- [BashExecutor](servers/bash_executor.md) - Complete Linux command reference
diff --git a/documents/docs/mcp/servers/app_ui_executor.md b/documents/docs/mcp/servers/app_ui_executor.md
new file mode 100644
index 000000000..af4f69bc3
--- /dev/null
+++ b/documents/docs/mcp/servers/app_ui_executor.md
@@ -0,0 +1,414 @@
+# AppUIExecutor Server
+
+## Overview
+
+**AppUIExecutor** is an action server that provides application-level UI automation for the AppAgent. It enables precise interaction with UI controls within the currently selected application window.
+
+**Server Type:** Action
+**Deployment:** Local (in-process)
+**Agent:** AppAgent
+**LLM-Selectable:** ✅ Yes
+
+## Server Information
+
+| Property | Value |
+|----------|-------|
+| **Namespace** | `AppUIExecutor` |
+| **Server Name** | `UFO UI AppAgent Action MCP Server` |
+| **Platform** | Windows |
+| **Tool Type** | `action` |
+| **Tool Key Format** | `action::{tool_name}` |
+
+## Tools Summary
+
+| Tool Name | Description | Parameters |
+|-----------|-------------|------------|
+| `click_input` | Click on a UI control | `id`, `name`, `button`, `double` |
+| `click_on_coordinates` | Click at fractional coordinates | `x`, `y`, `button`, `double` |
+| `drag_on_coordinates` | Drag between two points | `start_x`, `start_y`, `end_x`, `end_y`, `button`, `duration`, `key_hold` |
+| `set_edit_text` | Set text in edit control | `id`, `name`, `text`, `clear_current_text` |
+| `keyboard_input` | Send keyboard keys | `id`, `name`, `keys`, `control_focus` |
+| `wheel_mouse_input` | Scroll with mouse wheel | `id`, `name`, `wheel_dist` |
+| `texts` | Get text from control | `id`, `name` |
+| `wait` | Wait for specified time | `seconds` |
+| `summary` | Provide observation summary | `text` |
+
+## Tool Details
+
+### click_input
+
+Click on a UI control element using the mouse.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `id` | `str` | ✅ Yes | - | Control ID from `get_app_window_controls_info` |
+| `name` | `str` | ✅ Yes | - | Control name matching the ID |
+| `button` | `str` | No | `"left"` | Mouse button: `"left"`, `"right"`, `"middle"`, `"x"` |
+| `double` | `bool` | No | `False` | Perform double-click |
+
+#### Returns
+
+`str` - Result message, or a warning if `name` does not match `id`
+
+#### Example
+
+```python
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::click_input",
+ tool_name="click_input",
+ parameters={
+ "id": "5",
+ "name": "Submit Button",
+ "button": "left",
+ "double": False
+ }
+ )
+])
+```
+
+---
+
+### click_on_coordinates
+
+Click at specific fractional coordinates within the window (0.0-1.0).
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `x` | `float` | ✅ Yes | - | Relative x-coordinate (0.0-1.0) |
+| `y` | `float` | ✅ Yes | - | Relative y-coordinate (0.0-1.0) |
+| `button` | `str` | No | `"left"` | Mouse button |
+| `double` | `bool` | No | `False` | Double-click |
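+
+Fractional coordinates are resolution-independent: the same `(x, y)` pair addresses the same relative spot regardless of window size. The sketch below shows one plausible mapping to absolute pixels. It assumes a linear mapping over the window's bounding rectangle; `to_pixels` is a hypothetical helper, not the server's code.
+
+```python
+from typing import Tuple
+
+
+def to_pixels(x: float, y: float, rect: Tuple[int, int, int, int]) -> Tuple[int, int]:
+    """Map fractional (x, y) to absolute pixels inside rect = (left, top, right, bottom)."""
+    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
+        raise ValueError("x and y must be within 0.0-1.0")
+    left, top, right, bottom = rect
+    return (left + round(x * (right - left)), top + round(y * (bottom - top)))
+```
+
+Under this assumption, `to_pixels(0.5, 0.5, (100, 200, 900, 800))` addresses the center of a window whose top-left corner is at `(100, 200)`.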
+
+#### Example
+
+```python
+# Click at center of window
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::click_on_coordinates",
+ tool_name="click_on_coordinates",
+ parameters={"x": 0.5, "y": 0.5, "button": "left"}
+ )
+])
+```
+
+---
+
+### drag_on_coordinates
+
+Drag from one fractional coordinate to another.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `start_x` | `float` | ✅ Yes | - | Start x-coordinate (0.0-1.0) |
+| `start_y` | `float` | ✅ Yes | - | Start y-coordinate (0.0-1.0) |
+| `end_x` | `float` | ✅ Yes | - | End x-coordinate (0.0-1.0) |
+| `end_y` | `float` | ✅ Yes | - | End y-coordinate (0.0-1.0) |
+| `button` | `str` | No | `"left"` | Mouse button |
+| `duration` | `float` | No | `1.0` | Drag duration in seconds |
+| `key_hold` | `str` | No | `None` | Key to hold (`"ctrl"`, `"shift"`) |
+
+#### Example
+
+```python
+# Drag from top-left to bottom-right
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::drag_on_coordinates",
+ tool_name="drag_on_coordinates",
+ parameters={
+ "start_x": 0.2, "start_y": 0.2,
+ "end_x": 0.8, "end_y": 0.8,
+ "duration": 1.5
+ }
+ )
+])
+```
+
+---
+
+### set_edit_text
+
+Set text in an edit control.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `id` | `str` | ✅ Yes | - | Control ID |
+| `name` | `str` | ✅ Yes | - | Control name |
+| `text` | `str` | ✅ Yes | - | Text to set |
+| `clear_current_text` | `bool` | No | `False` | Clear existing text first |
+
+#### Example
+
+```python
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::set_edit_text",
+ tool_name="set_edit_text",
+ parameters={
+ "id": "3",
+ "name": "Search Box",
+ "text": "Hello World",
+ "clear_current_text": True
+ }
+ )
+])
+```
+
+---
+
+### keyboard_input
+
+Send keyboard input to a control or application.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `id` | `str` | ✅ Yes | - | Control ID |
+| `name` | `str` | ✅ Yes | - | Control name |
+| `keys` | `str` | ✅ Yes | - | Key sequence (e.g., `"{VK_CONTROL}c"`, `"{TAB 2}"`) |
+| `control_focus` | `bool` | No | `True` | Focus control before sending keys |
+
+#### Example
+
+```python
+# Copy selected text (Ctrl+C)
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::keyboard_input",
+ tool_name="keyboard_input",
+ parameters={
+ "id": "1",
+ "name": "Editor",
+ "keys": "{VK_CONTROL}c",
+ "control_focus": True
+ }
+ )
+])
+
+# Press Tab twice
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::keyboard_input",
+ tool_name="keyboard_input",
+ parameters={
+ "id": "1",
+ "name": "Form",
+ "keys": "{TAB 2}"
+ }
+ )
+])
+```
+
+---
+
+### wheel_mouse_input
+
+Scroll using mouse wheel on a control.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `id` | `str` | ✅ Yes | - | Control ID |
+| `name` | `str` | ✅ Yes | - | Control name |
+| `wheel_dist` | `int` | No | `0` | Wheel notches (positive=up, negative=down) |
+
+#### Example
+
+```python
+# Scroll down 5 notches
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::wheel_mouse_input",
+ tool_name="wheel_mouse_input",
+ parameters={
+ "id": "10",
+ "name": "Content Panel",
+ "wheel_dist": -5
+ }
+ )
+])
+```
+
+---
+
+### texts
+
+Retrieve all text content from a control.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `id` | `str` | ✅ Yes | - | Control ID |
+| `name` | `str` | ✅ Yes | - | Control name |
+
+#### Returns
+
+`str` - Text content of the control
+
+#### Example
+
+```python
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::texts",
+ tool_name="texts",
+ parameters={"id": "7", "name": "Status Label"}
+ )
+])
+# result[0].data = "Operation completed successfully"
+```
+
+---
+
+### wait
+
+Wait for a specified duration (non-blocking).
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `seconds` | `float` | ✅ Yes | - | Wait duration (max 300s) |
+
+#### Example
+
+```python
+# Wait for 2 seconds
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::wait",
+ tool_name="wait",
+ parameters={"seconds": 2.0}
+ )
+])
+```
+
+---
+
+### summary
+
+Record a text summary of what was visually observed.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `text` | `str` | ✅ Yes | - | Summary text based on visual observation |
+
+#### Returns
+
+`str` - The summary text (passed through)
+
+#### Example
+
+```python
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::summary",
+ tool_name="summary",
+ parameters={
+ "text": "Window shows login form with username and password fields. Submit button is enabled."
+ }
+ )
+])
+```
+
+## Configuration
+
+```yaml
+AppAgent:
+ default:
+ action:
+ - namespace: AppUIExecutor
+ type: local
+ reset: false
+
+ # App-specific configuration
+ WINWORD.EXE:
+ action:
+ - namespace: AppUIExecutor
+ type: local
+ - namespace: WordCOMExecutor # Additional server for Word
+ type: local
+```
+
+## Best Practices
+
+### 1. Always Verify Control ID and Name
+
+```python
+# ✅ Good
+controls = await computer.run_actions([
+ MCPToolCall(tool_key="data_collection::get_app_window_controls_info", ...)
+])
+
+control = controls[0].data[0] # Get first control
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::click_input",
+ parameters={
+ "id": control["label"],
+ "name": control["control_text"]
+ }
+ )
+])
+
+# ❌ Bad: Hardcode IDs
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::click_input",
+ parameters={"id": "1", "name": "Button"} # May not exist
+ )
+])
+```
+
+### 2. Use Coordinates for Unlabeled Elements
+
+```python
+# When the control is not in the control list
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::click_on_coordinates",
+ parameters={"x": 0.75, "y": 0.25} # Top-right area
+ )
+])
+```
+
+### 3. Wait After Actions
+
+```python
+# Click button
+await computer.run_actions([
+ MCPToolCall(tool_key="action::click_input", ...)
+])
+
+# Wait for UI update
+await computer.run_actions([
+ MCPToolCall(tool_key="action::wait", parameters={"seconds": 1.0})
+])
+
+# Verify result
+screenshot = await computer.run_actions([
+ MCPToolCall(tool_key="data_collection::capture_window_screenshot", ...)
+])
+```
+
+## Related Documentation
+
+- [HostUIExecutor](./host_ui_executor.md) - Window selection
+- [UICollector](./ui_collector.md) - Control discovery
+- [Action Servers](../action.md) - Action concepts
+- [AppAgent Overview](../../ufo2/app_agent/overview.md) - AppAgent architecture
diff --git a/documents/docs/mcp/servers/bash_executor.md b/documents/docs/mcp/servers/bash_executor.md
new file mode 100644
index 000000000..aec03574a
--- /dev/null
+++ b/documents/docs/mcp/servers/bash_executor.md
@@ -0,0 +1,466 @@
+# BashExecutor Server
+
+## Overview
+
+**BashExecutor** provides Linux shell command execution with output capture and system information retrieval via an HTTP MCP server.
+
+**Server Type:** Action
+**Deployment:** HTTP (remote Linux server)
+**Default Port:** 8010
+**LLM-Selectable:** ✅ Yes
+
+## Server Information
+
+| Property | Value |
+|----------|-------|
+| **Namespace** | `BashExecutor` |
+| **Server Name** | `Linux Bash MCP Server` |
+| **Platform** | Linux |
+| **Tool Type** | `action` |
+| **Deployment** | HTTP server (stateless) |
+
+## Tools
+
+### execute_command
+
+Execute a shell command on Linux and return stdout/stderr with exit code.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `command` | `str` | ✅ Yes | - | Shell command to execute (valid bash/sh command) |
+| `timeout` | `int` | No | `30` | Maximum execution time in seconds; no upper limit is enforced |
+| `cwd` | `str` | No | `None` | Working directory path (absolute path recommended) |
+
+#### Returns
+
+**Type**: `Dict[str, Any]`
+
+```python
+{
+ "success": bool, # True if exit code == 0
+ "exit_code": int, # Process exit code
+ "stdout": str, # Standard output
+ "stderr": str, # Standard error output
+ # OR
+ "error": str # Error message if execution failed
+}
+```
+
+#### Example
+
+```python
+# Simple command
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::execute_command",
+ tool_name="execute_command",
+ parameters={
+ "command": "ls -la /home",
+ "timeout": 30
+ }
+ )
+])
+
+# Output:
+# {
+# "success": True,
+# "exit_code": 0,
+# "stdout": "total 12\ndrwxr-xr-x 3 root root 4096 ...",
+# "stderr": ""
+# }
+
+# Command with specific working directory
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::execute_command",
+ tool_name="execute_command",
+ parameters={
+ "command": "python script.py --arg value",
+ "timeout": 60,
+ "cwd": "/home/user/project"
+ }
+ )
+])
+
+# Check system info
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::execute_command",
+ tool_name="execute_command",
+ parameters={"command": "cat /etc/os-release"}
+ )
+])
+```
+
+#### Security Blocklist
+
+Dangerous commands are automatically blocked:
+
+| Blocked Command | Reason |
+|-----------------|--------|
+| `rm -rf /` | System destruction |
+| `:(){ :\|:& };:` | Fork bomb |
+| `mkfs` | Filesystem formatting |
+| `dd if=/dev/zero` | Disk overwrite |
+| `shutdown` | System shutdown |
+| `reboot` | System reboot |
+
+**Returns**: `{"success": False, "error": "Blocked dangerous command."}`
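+
+A blocklist of this kind can be approximated with a substring check. The sketch below is an assumption about the matching strategy, not the server's implementation: `check_blocklist` is a hypothetical helper, and only the patterns and the error message come from the table and return value above.
+
+```python
+from typing import Dict, Optional
+
+# Patterns taken from the table above.
+BLOCKED_PATTERNS = [
+    "rm -rf /",
+    ":(){ :|:& };:",
+    "mkfs",
+    "dd if=/dev/zero",
+    "shutdown",
+    "reboot",
+]
+
+
+def check_blocklist(command: str) -> Optional[Dict[str, object]]:
+    """Return the documented error dict if the command matches, else None."""
+    # Assumption: plain substring matching; the real server may match differently.
+    if any(pattern in command for pattern in BLOCKED_PATTERNS):
+        return {"success": False, "error": "Blocked dangerous command."}
+    return None
+```
+
+Substring matching is coarse: under this sketch `rm -rf /tmp/build` is rejected as well, because it contains `rm -rf /`. A blocklist also cannot anticipate every harmful command, so it works best alongside an allowlist and a restricted service account.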
+
+#### Timeout Handling
+
+If the command exceeds the timeout:
+
+```python
+{
+ "success": False,
+ "error": "Timeout after {timeout}s."
+}
+```
+
+#### Error Handling
+
+If execution fails:
+
+```python
+{
+ "success": False,
+ "error": "{exception_details}"
+}
+```
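+
+The three result shapes (normal completion, timeout, and execution failure) map naturally onto `subprocess.run`. The following is a minimal sketch under that assumption; `run_command` is a hypothetical function and the actual server may be implemented differently.
+
+```python
+import subprocess
+from typing import Any, Dict, Optional
+
+
+def run_command(command: str, timeout: int = 30, cwd: Optional[str] = None) -> Dict[str, Any]:
+    """Run a shell command and return one of the three documented result shapes."""
+    try:
+        proc = subprocess.run(
+            command, shell=True, capture_output=True, text=True, timeout=timeout, cwd=cwd
+        )
+    except subprocess.TimeoutExpired:
+        return {"success": False, "error": f"Timeout after {timeout}s."}
+    except Exception as exc:  # e.g. a nonexistent working directory
+        return {"success": False, "error": str(exc)}
+    return {
+        "success": proc.returncode == 0,
+        "exit_code": proc.returncode,
+        "stdout": proc.stdout,
+        "stderr": proc.stderr,
+    }
+```
+
+Callers can distinguish the cases by key: a result with `exit_code` means the command ran to completion, while a result with `error` means it did not.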
+
+---
+
+### get_system_info
+
+Get basic Linux system information (uname, uptime, memory, disk).
+
+#### Parameters
+
+None
+
+#### Returns
+
+**Type**: `Dict[str, Any]`
+
+```python
+{
+ "uname": str, # System and kernel info (uname -a)
+ "uptime": str, # System uptime and load averages
+ "memory": str, # Memory usage statistics (free -h)
+ "disk": str # Disk space usage (df -h)
+}
+```
+
+#### Example
+
+```python
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::get_system_info",
+ tool_name="get_system_info",
+ parameters={}
+ )
+])
+
+# Output:
+# {
+# "uname": "Linux server 5.15.0-91-generic #101-Ubuntu SMP x86_64 GNU/Linux",
+# "uptime": " 10:30:45 up 5 days, 2:15, 3 users, load average: 0.52, 0.58, 0.59",
+# "memory": " total used free shared buff/cache available\nMem: 15Gi 4.2Gi 7.8Gi 123Mi 3.0Gi 10Gi\nSwap: 2.0Gi 0B 2.0Gi",
+# "disk": "Filesystem Size Used Avail Use% Mounted on\n/dev/sda1 100G 45G 50G 48% /"
+# }
+```
+
+#### Error Handling
+
+If one of the underlying commands fails, its value holds the error message instead:
+
+```python
+{
+ "uname": "Linux ubuntu ...",
+ "uptime": "Error: No such file or directory",
+ "memory": "...",
+ "disk": "..."
+}
+```
+
+## Configuration
+
+### Client Configuration
+
+```yaml
+# Windows client connecting to Linux server
+HostAgent:
+ default:
+ action:
+ - namespace: BashExecutor
+ type: http
+ host: "192.168.1.100" # Linux server IP
+ port: 8010
+ path: "/mcp"
+
+# Linux client (local)
+HostAgent:
+ default:
+ action:
+ - namespace: BashExecutor
+ type: http
+ host: "localhost"
+ port: 8010
+ path: "/mcp"
+```
+
+## Deployment
+
+### Starting the Server
+
+```bash
+# Start Bash MCP server on Linux
+python -m ufo.client.mcp.http_servers.linux_mcp_server --host 0.0.0.0 --port 8010
+
+# Output:
+# ==================================================
+# UFO Linux Bash MCP Server
+# Linux command execution via Model Context Protocol
+# Running on 0.0.0.0:8010
+# ==================================================
+```
+
+### Command-Line Arguments
+
+| Argument | Default | Description |
+|----------|---------|-------------|
+| `--host` | `localhost` | Host to bind server to |
+| `--port` | `8010` | Port to run server on |
+
+### Systemd Service (Optional)
+
+```ini
+# /etc/systemd/system/ufo-bash-mcp.service
+[Unit]
+Description=UFO Bash MCP Server
+After=network.target
+
+[Service]
+Type=simple
+User=ufo
+WorkingDirectory=/home/ufo/UFO2
+ExecStart=/usr/bin/python3 -m ufo.client.mcp.http_servers.linux_mcp_server --host 0.0.0.0 --port 8010
+Restart=on-failure
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Enable and start:
+```bash
+sudo systemctl enable ufo-bash-mcp
+sudo systemctl start ufo-bash-mcp
+sudo systemctl status ufo-bash-mcp
+```
+
+## Best Practices
+
+### 1. Use Absolute Paths
+
+```python
+# ✅ Good: Absolute paths
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::execute_command",
+ parameters={
+ "command": "ls /home/user/project",
+ "cwd": "/home/user"
+ }
+ )
+])
+
+# ❌ Bad: Relative paths may fail
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::execute_command",
+ parameters={
+ "command": "ls project", # May fail if cwd unclear
+ "cwd": None
+ }
+ )
+])
+```
+
+### 2. Set Appropriate Timeouts
+
+```python
+# Quick commands: short timeout
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::execute_command",
+ parameters={"command": "ls -la", "timeout": 5}
+ )
+])
+
+# Long-running: increase timeout
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::execute_command",
+ parameters={"command": "python train_model.py", "timeout": 3600} # 1 hour
+ )
+])
+```
+
+### 3. Check Exit Codes
+
+```python
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::execute_command",
+ parameters={"command": "grep 'pattern' file.txt"}
+ )
+])
+
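+data = result[0].data
+if data["success"]:
+    logger.info(f"Found: {data['stdout']}")
+elif data["exit_code"] == 1:
+    # grep exits with 1 when no line matched
+    logger.warning("Pattern not found")
+else:
+    # grep exits with 2 or higher on errors such as a missing file
+    logger.error(f"grep failed (exit code {data['exit_code']}): {data['stderr']}")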
+```
+
+### 4. Validate Commands
+
+```python
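+import shlex
+from typing import List
+
+def safe_execute(command: str, allowed_commands: List[str]):
+    """Whitelist-based command validation (checks the executable name only)"""
+    tokens = shlex.split(command)
+    if not tokens:
+        raise ValueError("Command cannot be empty")
+
+    # Note: only the first token is checked; shell operators such as
+    # ';', '&&' or '|' later in the string are not filtered here.
+    cmd_base = tokens[0]
+    if cmd_base not in allowed_commands:
+        raise ValueError(f"Command not allowed: {cmd_base}")
+
+    return MCPToolCall(
+        tool_key="action::execute_command",
+        tool_name="execute_command",
+        parameters={"command": command}
+    )
+
+# Usage
+allowed = ["ls", "cat", "grep", "find", "python3"]
+await computer.run_actions([safe_execute("ls -la /home", allowed)])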
+```
+
+## Use Cases
+
+### 1. System Monitoring
+
+```python
+# Get system info
+info = await computer.run_actions([
+ MCPToolCall(tool_key="action::get_system_info", parameters={})
+])
+
+# Parse disk usage
+disk_info = info[0].data["disk"]
+# Naive check: only matches the literal text "98%"
+if "98%" in disk_info:
+    logger.warning("Disk almost full!")
+```
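+
+Matching the literal string `"98%"` misses every other value, so it can help to parse the `Use%` column instead. The sketch below assumes the `disk` field holds `df -h`-style output like the sample shown earlier; `parse_disk_usage` and the 90% threshold are illustrative choices, not part of the server.
+
+```python
+from typing import Dict
+
+
+def parse_disk_usage(df_output: str) -> Dict[str, int]:
+    """Map each mount point to its usage percentage from df-style output."""
+    usage: Dict[str, int] = {}
+    for line in df_output.splitlines()[1:]:  # skip the header row
+        fields = line.split()
+        # Expected columns: Filesystem Size Used Avail Use% Mounted-on
+        if len(fields) < 6 or not fields[4].endswith("%"):
+            continue
+        percent = fields[4].rstrip("%")
+        if percent.isdigit():
+            usage[" ".join(fields[5:])] = int(percent)
+    return usage
+
+
+sample = "Filesystem Size Used Avail Use% Mounted on\n/dev/sda1 100G 45G 50G 48% /"
+# Mount points at or above the (assumed) 90% threshold
+full = [mount for mount, pct in parse_disk_usage(sample).items() if pct >= 90]
+```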
+
+### 2. Log Analysis
+
+```python
+# Search logs
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::execute_command",
+ parameters={
+ "command": "grep ERROR /var/log/application.log | tail -20",
+ "timeout": 10
+ }
+ )
+])
+
+errors = result[0].data["stdout"]
+```
+
+### 3. File Operations
+
+```python
+# Create directory
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::execute_command",
+ parameters={"command": "mkdir -p /tmp/workspace/data"}
+ )
+])
+
+# Copy files
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::execute_command",
+ parameters={"command": "cp source.txt /tmp/workspace/"}
+ )
+])
+```
+
+### 4. Script Execution
+
+```python
+# Run Python script
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::execute_command",
+ parameters={
+ "command": "python3 process_data.py --input data.csv --output results.json",
+ "timeout": 300,
+ "cwd": "/home/user/scripts"
+ }
+ )
+])
+
+if result[0].data["success"]:
+    logger.info("Script completed successfully")
+else:
+    logger.error(f"Script failed: {result[0].data['stderr']}")
+```
+
+## Comparison with CommandLineExecutor
+
+| Feature | CommandLineExecutor | BashExecutor |
+|---------|---------------------|--------------|
+| **Platform** | Windows/Cross-platform | Linux only |
+| **Output Capture** | ❌ No | ✅ Yes (stdout/stderr) |
+| **Exit Code** | ❌ No | ✅ Yes |
+| **Timeout** | ❌ No (fixed 5s wait after launch) | ✅ Configurable |
+| **Working Directory** | ❌ No | ✅ Yes |
+| **Deployment** | Local | HTTP (remote) |
+| **Security** | ⚠️ No blocklist | ✅ Dangerous commands blocked |
+
+## Security Considerations
+
+!!!danger "Security Warning"
+    - **Command injection risk**: Always validate/sanitize commands
+    - **Privilege escalation**: Server runs with user permissions
+    - **Network exposure**: Use firewall rules to limit access
+    - **Sensitive data**: Stdout/stderr may contain secrets
+
+### Recommendations
+
+1. **Use firewall**: Restrict access to trusted IPs
+    ```bash
+    sudo ufw allow from 192.168.1.0/24 to any port 8010
+    ```
+
+2. **Run as limited user**: Don't run server as root
+    ```bash
+    sudo useradd -m -s /bin/bash ufo
+    sudo -u ufo python3 -m ufo.client.mcp.http_servers.linux_mcp_server
+    ```
+
+3. **Implement command whitelist**: Don't execute arbitrary commands
+
+4. **Use HTTPS**: For production, add TLS encryption
+
+## Related Documentation
+
+- [CommandLineExecutor](./command_line_executor.md) - Windows command execution
+- [HardwareExecutor](./hardware_executor.md) - Hardware control via HTTP
+- [Remote Servers](../remote_servers.md) - HTTP deployment guide
+- [Action Servers](../action.md) - Action server concepts
diff --git a/documents/docs/mcp/servers/command_line_executor.md b/documents/docs/mcp/servers/command_line_executor.md
new file mode 100644
index 000000000..f31e2ff04
--- /dev/null
+++ b/documents/docs/mcp/servers/command_line_executor.md
@@ -0,0 +1,301 @@
+# CommandLineExecutor Server
+
+## Overview
+
+**CommandLineExecutor** provides shell command execution capabilities for launching applications and running system commands.
+
+**Server Type:** Action
+**Deployment:** Local (in-process)
+**Agent:** HostAgent, AppAgent
+**LLM-Selectable:** ✅ Yes
+
+## Server Information
+
+| Property | Value |
+|----------|-------|
+| **Namespace** | `CommandLineExecutor` |
+| **Server Name** | `UFO CLI MCP Server` |
+| **Platform** | Cross-platform (Windows, Linux, macOS) |
+| **Tool Type** | `action` |
+
+## Tools
+
+### run_shell
+
+Execute a shell command to launch applications or perform system operations.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `bash_command` | `str` | ✅ Yes | Command to execute in shell |
+
+#### Returns
+
+`None` - The command is launched without waiting for it to finish; the tool returns after a fixed 5-second wait.
+
+#### Example
+
+```python
+# Launch Notepad
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::run_shell",
+ tool_name="run_shell",
+ parameters={"bash_command": "notepad.exe"}
+ )
+])
+
+# Launch application with arguments
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::run_shell",
+ tool_name="run_shell",
+ parameters={"bash_command": "python script.py --arg value"}
+ )
+])
+
+# Create directory (Windows)
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::run_shell",
+ tool_name="run_shell",
+ parameters={"bash_command": "mkdir C:\\temp\\newfolder"}
+ )
+])
+```
+
+#### Error Handling
+
+Raises `ToolError` if:
+- Command is empty
+- Execution fails
+
+```python
+# Error: Empty command
+ToolError("Bash command cannot be empty.")
+
+# Error: Execution failed
+ToolError("Failed to launch application: {error_details}")
+```
+
+#### Implementation Details
+
+- Uses `subprocess.Popen` with `shell=True`
+- Waits 5 seconds after launch to give the application time to start
+- Non-blocking with respect to the command itself: returns after that wait without waiting for the process to exit
+
+!!!danger "Security Warning"
+    **Arbitrary command execution risk!** Always validate commands before execution.
+
+    Dangerous examples:
+    - `rm -rf /` (Linux)
+    - `del /F /S /Q C:\*` (Windows)
+    - `shutdown /s /t 0`
+
+    **Best Practice**: Implement command whitelist or validation.
+
+## Configuration
+
+```yaml
+HostAgent:
+  default:
+    action:
+      - namespace: HostUIExecutor
+        type: local
+      - namespace: CommandLineExecutor
+        type: local  # Enable shell execution
+
+AppAgent:
+  default:
+    action:
+      - namespace: AppUIExecutor
+        type: local
+      - namespace: CommandLineExecutor
+        type: local  # Enable if app needs to launch child processes
+```
+
+## Best Practices
+
+### 1. Validate Commands
+
+```python
+def safe_run_shell(command: str):
+    """Whitelist-based command validation"""
+    allowed_commands = [
+        "notepad.exe",
+        "calc.exe",
+        "mspaint.exe",
+        "code",  # VS Code
+    ]
+
+    tokens = command.split()
+    if not tokens:
+        raise ValueError("Command cannot be empty")
+
+    cmd_base = tokens[0]
+    if cmd_base not in allowed_commands:
+        raise ValueError(f"Command not allowed: {cmd_base}")
+
+    return MCPToolCall(
+        tool_key="action::run_shell",
+        tool_name="run_shell",
+        parameters={"bash_command": command}
+    )
+
+# Usage
+await computer.run_actions([safe_run_shell("notepad.exe test.txt")])
+```
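+
+The check above looks only at the first token. Because `run_shell` runs with `shell=True`, a string such as `notepad.exe && del file.txt` would pass it and still run the second command. A stricter sketch, assuming that rejecting any command containing shell operators is acceptable for your workflow (`is_simple_command` is a hypothetical helper, not part of the server):
+
+```python
+import re
+from typing import Iterable
+
+# Characters that let a shell chain, redirect, or substitute commands
+# (covers common cmd.exe and POSIX shell operators; not an exhaustive list).
+_SHELL_OPERATORS = re.compile(r"[;&|<>`$^%\n]")
+
+
+def is_simple_command(command: str, allowed: Iterable[str]) -> bool:
+    """Return True only for an allowed executable with no shell operators."""
+    tokens = command.split()
+    if not tokens or _SHELL_OPERATORS.search(command):
+        return False
+    return tokens[0] in set(allowed)
+```
+
+Rejecting operators outright is deliberately blunt: it also blocks legitimate pipelines, which is usually an acceptable trade-off for a launcher that only starts applications.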
+
+### 2. Wait for Application Launch
+
+```python
+# Launch application
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::run_shell",
+ parameters={"bash_command": "notepad.exe"}
+ )
+])
+
+# Wait for launch (5 seconds built-in + extra)
+await asyncio.sleep(2)
+
+# Get window list
+windows = await computer.run_actions([
+ MCPToolCall(tool_key="data_collection::get_desktop_app_info", ...)
+])
+
+# Find Notepad window
+notepad_windows = [w for w in windows[0].data if "Notepad" in w["name"]]
+```
+
+### 3. Platform-Specific Commands
+
+```python
+import platform
+
+def get_platform_command(app_name: str) -> str:
+    """Get platform-specific command"""
+    if platform.system() == "Windows":
+        commands = {
+            "notepad": "notepad.exe",
+            "terminal": "cmd.exe",
+            "browser": "start msedge"
+        }
+    elif platform.system() == "Darwin":  # macOS
+        commands = {
+            "notepad": "open -a TextEdit",
+            "terminal": "open -a Terminal",
+            "browser": "open -a Safari"
+        }
+    else:  # Linux
+        commands = {
+            "notepad": "gedit",
+            "terminal": "gnome-terminal",
+            "browser": "firefox"
+        }
+
+    return commands.get(app_name, app_name)
+
+# Usage
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::run_shell",
+ parameters={"bash_command": get_platform_command("notepad")}
+ )
+])
+```
+
+### 4. Handle Launch Failures
+
+```python
+try:
+    result = await computer.run_actions([
+        MCPToolCall(
+            tool_key="action::run_shell",
+            parameters={"bash_command": "nonexistent.exe"}
+        )
+    ])
+
+    if result[0].is_error:
+        logger.error(f"Failed to launch: {result[0].content}")
+        # Retry with alternative command
+
+except Exception as e:
+    logger.error(f"Command execution exception: {e}")
+```
+
+## Use Cases
+
+### 1. Application Launching
+
+```python
+# Launch text editor
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::run_shell",
+ parameters={"bash_command": "notepad.exe"}
+ )
+])
+
+# Launch browser with URL
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::run_shell",
+ parameters={"bash_command": "start https://www.example.com"}
+ )
+])
+```
+
+### 2. File Operations
+
+```python
+# Create directory
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::run_shell",
+ parameters={"bash_command": "mkdir C:\\temp\\workspace"}
+ )
+])
+
+# Copy file
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::run_shell",
+ parameters={"bash_command": "copy source.txt dest.txt"}
+ )
+])
+```
+
+### 3. Script Execution
+
+```python
+# Run Python script
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::run_shell",
+ parameters={"bash_command": "python automation_script.py --mode batch"}
+ )
+])
+
+# Run PowerShell script
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::run_shell",
+ parameters={"bash_command": "powershell -File script.ps1"}
+ )
+])
+```
+
+## Limitations
+
+- **No output capture**: Command output (stdout/stderr) is not returned
+- **No exit code**: Cannot determine if command succeeded
+- **Async execution**: No way to know when command completes
+- **Security risk**: Arbitrary command execution
+
+**Tip:** For Linux systems with output capture and better control, use **BashExecutor** server instead.
+
+## Related Documentation
+
+- [BashExecutor](./bash_executor.md) - Linux command execution with output
+- [Action Servers](../action.md) - Action server concepts
+- [HostAgent](../../ufo2/host_agent/overview.md) - HostAgent architecture
+
diff --git a/documents/docs/mcp/servers/constellation_editor.md b/documents/docs/mcp/servers/constellation_editor.md
new file mode 100644
index 000000000..183ffcda2
--- /dev/null
+++ b/documents/docs/mcp/servers/constellation_editor.md
@@ -0,0 +1,447 @@
+# ConstellationEditor Server
+
+## Overview
+
+**ConstellationEditor** provides multi-device task coordination and dependency management for distributed workflows in UFO².
+
+**Server Type:** Action
+**Deployment:** Local (in-process)
+**Agent:** GalaxyAgent
+**LLM-Selectable:** ✅ Yes
+
+## Server Information
+
+| Property | Value |
+|----------|-------|
+| **Namespace** | `ConstellationEditor` |
+| **Server Name** | `UFO Constellation Editor MCP Server` |
+| **Platform** | Cross-platform |
+| **Tool Type** | `action` |
+
+## Tools Summary
+
+| Category | Tool Name | Description |
+|----------|-----------|-------------|
+| **Task Management** | `add_task` | Create new task |
+| | `remove_task` | Delete task |
+| | `update_task` | Modify task properties |
+| **Dependency Management** | `add_dependency` | Create task dependency |
+| | `remove_dependency` | Delete dependency |
+| | `update_dependency` | Modify dependency description |
+| **Bulk Operations** | `build_constellation` | Build complete constellation from config |
+
+## Task Management Tools
+
+### add_task
+
+Add a new task to the constellation.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `task_id` | `str` | ✅ Yes | - | Unique task identifier (e.g., `"open_browser"`, `"login_system"`) |
+| `name` | `str` | ✅ Yes | - | Human-readable task name (e.g., `"Open Browser"`) |
+| `description` | `str` | ✅ Yes | - | Detailed task description with steps and expected outcomes |
+| `target_device_id` | `str` | No | `None` | Device ID where task should execute (from Device Info List) |
+| `tips` | `List[str]` | No | `None` | List of tips and best practices for task execution |
+
+#### Returns
+
+`str` - JSON representation of complete TaskConstellation after adding task
+
+#### Example
+
+```python
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::add_task",
+ tool_name="add_task",
+ parameters={
+ "task_id": "extract_data",
+ "name": "Extract Data from Excel",
+ "description": "Open Excel file, extract data from Sheet1, save to CSV format",
+ "target_device_id": "device_windows_001",
+ "tips": [
+ "Ensure Excel is installed",
+ "Close Excel before running task",
+ "Verify file path exists"
+ ]
+ }
+ )
+])
+```
+
+---
+
+### remove_task
+
+Remove a task from the constellation (also removes all dependencies involving this task).
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `task_id` | `str` | ✅ Yes | Unique task identifier to remove |
+
+#### Returns
+
+`str` - JSON representation of constellation after removal
+
+#### Example
+
+```python
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::remove_task",
+ tool_name="remove_task",
+ parameters={"task_id": "extract_data"}
+ )
+])
+```
+
+---
+
+### update_task
+
+Update specific fields of an existing task.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `task_id` | `str` | ✅ Yes | - | Task to update |
+| `name` | `str` | No | `None` | New task name (leave empty to keep current) |
+| `description` | `str` | No | `None` | New description |
+| `target_device_id` | `str` | No | `None` | New target device |
+| `tips` | `List[str]` | No | `None` | New tips list |
+
+**Note:** Only provided fields are updated; others remain unchanged.
+
+#### Returns
+
+`str` - JSON representation of constellation after update
+
+#### Example
+
+```python
+# Update only description and tips
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::update_task",
+ tool_name="update_task",
+ parameters={
+ "task_id": "extract_data",
+ "description": "Extract data from Excel Sheet1 and Sheet2, merge into single CSV",
+ "tips": [
+ "Ensure Excel is installed",
+ "Handle merged cells properly",
+ "Verify output CSV encoding"
+ ]
+ }
+ )
+])
+```
+
+## Dependency Management Tools
+
+### add_dependency
+
+Create a dependency relationship between two tasks (source task must complete before target task can start).
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `dependency_id` | `str` | ✅ Yes | - | **MUST generate unique ID** (e.g., `"login->extract_data"`) |
+| `from_task_id` | `str` | ✅ Yes | - | Source/prerequisite task ID |
+| `to_task_id` | `str` | ✅ Yes | - | Target/dependent task ID |
+| `condition_description` | `str` | No | `None` | Human-readable description of dependency condition |
+
+!!!warning "dependency_id Required"
+    You **MUST** generate and provide a unique `dependency_id`. Do not omit this parameter!
+
+#### Returns
+
+`str` - JSON representation of constellation after adding dependency
+
+#### Example
+
+```python
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::add_dependency",
+ tool_name="add_dependency",
+ parameters={
+ "dependency_id": "login_system->extract_data", # MUST provide
+ "from_task_id": "login_system",
+ "to_task_id": "extract_data",
+ "condition_description": "Wait for successful user authentication before accessing user data"
+ }
+ )
+])
+```
+
+---
+
+### remove_dependency
+
+Remove a dependency relationship without affecting the tasks themselves.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `dependency_id` | `str` | ✅ Yes | Dependency ID (line_id) to remove |
+
+#### Returns
+
+`str` - JSON representation of constellation after removal
+
+#### Example
+
+```python
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::remove_dependency",
+ tool_name="remove_dependency",
+ parameters={"dependency_id": "login_system->extract_data"}
+ )
+])
+```
+
+---
+
+### update_dependency
+
+Update the condition description of an existing dependency.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `dependency_id` | `str` | ✅ Yes | Dependency to update |
+| `condition_description` | `str` | ✅ Yes | New condition description |
+
+#### Returns
+
+`str` - JSON representation of constellation after update
+
+#### Example
+
+```python
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::update_dependency",
+ tool_name="update_dependency",
+ parameters={
+ "dependency_id": "login_system->extract_data",
+ "condition_description": "Wait for successful authentication and database connection before data extraction"
+ }
+ )
+])
+```
+
+## Bulk Operations
+
+### build_constellation
+
+Build a complete constellation from configuration data (batch creation).
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `config` | `TaskConstellationSchema` | ✅ Yes | - | Complete constellation configuration |
+| `clear_existing` | `bool` | No | `True` | Clear existing tasks/dependencies before building |
+
+#### Configuration Schema
+
+```python
+{
+ "tasks": [
+ {
+ "task_id": "string (required)",
+ "name": "string (optional)",
+ "description": "string (required)",
+ "target_device_id": "string (optional)",
+ "priority": int (1-4, optional),
+ "status": "string (optional)",
+ "tips": ["string"] (optional)
+ }
+ ],
+ "dependencies": [
+ {
+ "from_task_id": "string (required)",
+ "to_task_id": "string (required)",
+ "dependency_type": "string (optional)",
+ "condition_description": "string (optional)"
+ }
+ ],
+ "metadata": dict (optional)
+}
+```
+
+#### Returns
+
+`str` - JSON representation of built constellation
+
+#### Example
+
+```python
+config = {
+ "tasks": [
+ {
+ "task_id": "open_browser",
+ "name": "Open Browser",
+ "description": "Launch Chrome and navigate to login page",
+ "target_device_id": "device_001"
+ },
+ {
+ "task_id": "login",
+ "name": "User Login",
+ "description": "Enter credentials and submit login form",
+ "target_device_id": "device_001"
+ },
+ {
+ "task_id": "extract_data",
+ "name": "Extract Data",
+ "description": "Navigate to data page and extract table",
+ "target_device_id": "device_002"
+ }
+ ],
+ "dependencies": [
+ {
+ "from_task_id": "open_browser",
+ "to_task_id": "login",
+ "condition_description": "Browser must be open before login"
+ },
+ {
+ "from_task_id": "login",
+ "to_task_id": "extract_data",
+ "condition_description": "User must be authenticated before data access"
+ }
+ ]
+}
+
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::build_constellation",
+ tool_name="build_constellation",
+ parameters={
+ "config": config,
+ "clear_existing": True
+ }
+ )
+])
+```
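+
+Before sending a larger configuration, it can be worth checking it locally. The sketch below is not part of the server and rests on two assumptions: every dependency must reference a task defined in `tasks`, and dependencies must not form a cycle. `check_constellation_config` is a hypothetical name, and `graphlib` requires Python 3.9 or later.
+
+```python
+from graphlib import CycleError, TopologicalSorter
+from typing import Any, Dict, List, Set
+
+
+def check_constellation_config(config: Dict[str, Any]) -> List[str]:
+    """Return a list of problems found in a constellation config (empty if none)."""
+    problems: List[str] = []
+    task_ids = [task["task_id"] for task in config.get("tasks", [])]
+    known = set(task_ids)
+    if len(known) != len(task_ids):
+        problems.append("duplicate task_id values")
+
+    # Map each task to the set of tasks it depends on
+    predecessors: Dict[str, Set[str]] = {task_id: set() for task_id in known}
+    for dep in config.get("dependencies", []):
+        src, dst = dep["from_task_id"], dep["to_task_id"]
+        missing = [t for t in (src, dst) if t not in known]
+        if missing:
+            problems.append(f"dependency {src}->{dst} references unknown task(s): {missing}")
+            continue
+        predecessors[dst].add(src)
+
+    try:
+        # static_order() raises CycleError if the graph is not a DAG
+        list(TopologicalSorter(predecessors).static_order())
+    except CycleError:
+        problems.append("dependencies contain a cycle")
+    return problems
+```
+
+An empty list means the configuration passed these local checks; the server may still apply its own validation.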
+
+## Configuration
+
+```yaml
+GalaxyAgent:
+  default:
+    action:
+      - namespace: ConstellationEditor
+        type: local
+```
+
+## Best Practices
+
+### 1. Use Descriptive Task IDs
+
+```python
+# ✅ Good: Clear task IDs
+"task_id": "extract_sales_data_from_excel"
+"task_id": "send_email_notification"
+"task_id": "process_user_input"
+
+# ❌ Bad: Unclear IDs
+"task_id": "task1"
+"task_id": "do_stuff"
+"task_id": "process"
+```
+
+### 2. Always Provide dependency_id
+
+```python
+# ✅ Good: Generate unique dependency_id
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::add_dependency",
+ parameters={
+ "dependency_id": f"{from_task}->{to_task}", # Generate ID
+ "from_task_id": from_task,
+ "to_task_id": to_task
+ }
+ )
+])
+
+# ❌ Bad: Omit dependency_id
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::add_dependency",
+ parameters={
+ # Missing dependency_id - will fail!
+ "from_task_id": from_task,
+ "to_task_id": to_task
+ }
+ )
+])
+```
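+
+One way to make the ID hard to forget is to build the parameters in a single helper. This sketch follows the `"<from>-><to>"` naming used in the examples on this page; `dependency_params` is a hypothetical helper, and the naming scheme is a convention rather than a server requirement.
+
+```python
+from typing import Dict, Optional
+
+
+def dependency_params(from_task: str, to_task: str,
+                      condition: Optional[str] = None) -> Dict[str, str]:
+    """Build add_dependency parameters with a generated dependency_id."""
+    params = {
+        "dependency_id": f"{from_task}->{to_task}",
+        "from_task_id": from_task,
+        "to_task_id": to_task,
+    }
+    # Only include the optional description when one was supplied
+    if condition is not None:
+        params["condition_description"] = condition
+    return params
+```
+
+The returned dictionary can be passed as the `parameters` argument of the `add_dependency` tool call.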
+
+### 3. Provide Detailed Descriptions
+
+```python
+# ✅ Good: Detailed description
+{
+ "description": "Open Chrome browser, navigate to https://example.com/login, wait for page to fully load, then take a screenshot and save it to C:\\screenshots\\login_page.png"
+}
+
+# ❌ Bad: Vague description
+{
+ "description": "Open browser"
+}
+```
+
+## Use Cases
+
+### Multi-Device Workflow
+
+```python
+# 1. Create tasks on different devices
+await computer.run_actions([
+ MCPToolCall(tool_key="action::add_task", parameters={
+ "task_id": "windows_extract",
+ "name": "Extract Data on Windows",
+ "description": "Extract Excel data",
+ "target_device_id": "device_windows_001"
+ })
+])
+
+await computer.run_actions([
+ MCPToolCall(tool_key="action::add_task", parameters={
+ "task_id": "linux_process",
+ "name": "Process Data on Linux",
+ "description": "Run Python analysis script",
+ "target_device_id": "device_linux_001"
+ })
+])
+
+# 2. Create dependency
+await computer.run_actions([
+ MCPToolCall(tool_key="action::add_dependency", parameters={
+ "dependency_id": "windows_extract->linux_process",
+ "from_task_id": "windows_extract",
+ "to_task_id": "linux_process",
+ "condition_description": "Data must be extracted before processing"
+ })
+])
+```
+
+## Related Documentation
+
+- [Action Servers](../action.md) - Action server concepts
+- [MCP Overview](../overview.md) - MCP architecture
+- [Configuration Guide](../configuration.md) - Constellation setup
+- [Local Servers](../local_servers.md) - Local server deployment
diff --git a/documents/docs/mcp/servers/excel_com_executor.md b/documents/docs/mcp/servers/excel_com_executor.md
new file mode 100644
index 000000000..214a8dcde
--- /dev/null
+++ b/documents/docs/mcp/servers/excel_com_executor.md
@@ -0,0 +1,373 @@
+# ExcelCOMExecutor Server
+
+## Overview
+
+**ExcelCOMExecutor** provides Microsoft Excel automation via COM API for efficient spreadsheet manipulation.
+
+**Server Type:** Action
+**Deployment:** Local (in-process)
+**Agent:** AppAgent
+**Target Application:** Microsoft Excel (`EXCEL.EXE`)
+**LLM-Selectable:** ✅ Yes
+
+## Server Information
+
+| Property | Value |
+|----------|-------|
+| **Namespace** | `ExcelCOMExecutor` |
+| **Platform** | Windows |
+| **Requires** | Microsoft Excel (COM interface) |
+| **Tool Type** | `action` |
+
+## Tools Summary
+
+| Tool Name | Description |
+|-----------|-------------|
+| `table2markdown` | Convert Excel sheet to Markdown table |
+| `insert_excel_table` | Insert table data into sheet |
+| `select_table_range` | Select cell range |
+| `save_as` | Save/export workbook |
+| `reorder_columns` | Reorder columns in sheet |
+| `get_range_values` | Get values from cell range |
+
+## Tool Details
+
+### table2markdown
+
+Convert an Excel sheet to Markdown format table.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `sheet_name` | `str` or `int` | ✅ Yes | Sheet name or index (1-based) |
+
+#### Returns
+
+`str` - Markdown-formatted table
+
+#### Example
+
+```python
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::table2markdown",
+ tool_name="table2markdown",
+ parameters={"sheet_name": "Sales Data"}
+ )
+])
+
+# Output:
+# | Product | Q1 | Q2 | Q3 | Q4 |
+# |---------|----|----|----|----|
+# | A | 100| 150| 120| 180|
+# | B | 200| 180| 210| 190|
+```
+
+---
+
+### insert_excel_table
+
+Insert a table (2D list) into an Excel sheet at a specified position.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `table` | `List[List[Any]]` | ✅ Yes | 2D list of values (strings/numbers) |
+| `sheet_name` | `str` | ✅ Yes | Target sheet name |
+| `start_row` | `int` | ✅ Yes | Start row (1-based) |
+| `start_col` | `int` | ✅ Yes | Start column (1-based) |
+
+#### Returns
+
+`str` - Success message
+
+#### Example
+
+```python
+# Define table data
+data = [
+ ["Name", "Age", "Gender"],
+ ["Alice", 30, "Female"],
+ ["Bob", 25, "Male"],
+ ["Charlie", 35, "Male"]
+]
+
+# Insert at A1
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::insert_excel_table",
+ tool_name="insert_excel_table",
+ parameters={
+ "table": data,
+ "sheet_name": "Sheet1",
+ "start_row": 1,
+ "start_col": 1
+ }
+ )
+])
+```
+
+---
+
+### select_table_range
+
+Select a range of cells in a sheet (faster than dragging).
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `sheet_name` | `str` | ✅ Yes | Sheet name |
+| `start_row` | `int` | ✅ Yes | Start row (1-based) |
+| `start_col` | `int` | ✅ Yes | Start column (1=A, 2=B, etc.) |
+| `end_row` | `int` | ✅ Yes | End row (`-1` = last row with content) |
+| `end_col` | `int` | ✅ Yes | End column (`-1` = last column with content) |
+
+#### Returns
+
+`str` - Selection confirmation message
+
+#### Example
+
+```python
+# Select A1:D10
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::select_table_range",
+ tool_name="select_table_range",
+ parameters={
+ "sheet_name": "Sheet1",
+ "start_row": 1,
+ "start_col": 1, # Column A
+ "end_row": 10,
+ "end_col": 4 # Column D
+ }
+ )
+])
+
+# Select all data (A1 to last used cell)
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::select_table_range",
+ tool_name="select_table_range",
+ parameters={
+ "sheet_name": "Sheet1",
+ "start_row": 1,
+ "start_col": 1,
+ "end_row": -1, # Last row with data
+ "end_col": -1 # Last column with data
+ }
+ )
+])
+```
+
+---
+
+### save_as
+
+Save or export Excel workbook to specified format.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `file_dir` | `str` | No | `""` | Directory path |
+| `file_name` | `str` | No | `""` | Filename without extension |
+| `file_ext` | `str` | No | `""` | Extension (default: `.csv`) |
+
+#### Supported Extensions
+
+- `.csv` - CSV format (default)
+- `.xlsx` - Excel workbook
+- `.xls` - Excel 97-2003 format
+- `.txt` - Tab-delimited text
+- `.pdf` - PDF format
+
+#### Example
+
+```python
+# Save as CSV
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::save_as",
+ tool_name="save_as",
+ parameters={
+ "file_dir": "C:\\Data\\Exports",
+ "file_name": "sales_report",
+ "file_ext": ".csv"
+ }
+ )
+])
+```
+
+---
+
+### reorder_columns
+
+Reorder columns in a sheet based on desired column name order.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `sheet_name` | `str` | ✅ Yes | Sheet name |
+| `desired_order` | `List[str]` | ✅ Yes | List of column names in new order |
+
+#### Returns
+
+`str` - Success/failure message
+
+#### Example
+
+```python
+# Original columns: ["Name", "Age", "Email", "Phone"]
+# Reorder to: ["Name", "Phone", "Email", "Age"]
+
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::reorder_columns",
+ tool_name="reorder_columns",
+ parameters={
+ "sheet_name": "Contacts",
+ "desired_order": ["Name", "Phone", "Email", "Age"]
+ }
+ )
+])
+```
+
+---
+
+### get_range_values
+
+Get values from a specified cell range.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `sheet_name` | `str` | ✅ Yes | Sheet name |
+| `start_row` | `int` | ✅ Yes | Start row |
+| `start_col` | `int` | ✅ Yes | Start column |
+| `end_row` | `int` | ✅ Yes | End row |
+| `end_col` | `int` | ✅ Yes | End column |
+
+#### Returns
+
+`List[List[Any]]` - 2D list of cell values
+
+#### Example
+
+```python
+# Get A1:C3
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::get_range_values",
+ tool_name="get_range_values",
+ parameters={
+ "sheet_name": "Sheet1",
+ "start_row": 1,
+ "start_col": 1,
+ "end_row": 3,
+ "end_col": 3
+ }
+ )
+])
+
+# Output: [["A1", "B1", "C1"], ["A2", "B2", "C2"], ["A3", "B3", "C3"]]
+```
+
+## Configuration
+
+```yaml
+AppAgent:
+  EXCEL.EXE:
+    action:
+      - namespace: AppUIExecutor
+        type: local
+      - namespace: ExcelCOMExecutor
+        type: local
+        reset: true  # Recommended: prevent data leakage between workbooks
+```
+
+## Best Practices
+
+### 1. Use Column Numbers for select_table_range
+
+```python
+# Column mapping: A=1, B=2, C=3, D=4, ...
+# Select A1:D10
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::select_table_range",
+ parameters={
+ "sheet_name": "Sheet1",
+ "start_row": 1,
+ "start_col": 1, # A
+ "end_row": 10,
+ "end_col": 4 # D
+ }
+ )
+])
+```
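+
+When a range is given in A1 notation, the column letters need converting to the 1-based numbers these tools expect. A minimal sketch (`column_index` is a hypothetical helper, not part of the server):
+
+```python
+def column_index(letters: str) -> int:
+    """Convert Excel column letters to a 1-based index (A=1, Z=26, AA=27)."""
+    if not letters.isalpha() or not letters.isascii():
+        raise ValueError(f"Invalid column letters: {letters!r}")
+    index = 0
+    for char in letters.upper():
+        # Treat the letters as a base-26 number with digits A=1 .. Z=26
+        index = index * 26 + (ord(char) - ord("A") + 1)
+    return index
+```
+
+For example, `column_index("D")` gives the `end_col` value used for the `A1:D10` selection above.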
+
+### 2. Insert Data Efficiently
+
+```python
+# ✅ Good: Insert entire table at once
+data = [["Header1", "Header2"], ["Val1", "Val2"]]
+await computer.run_actions([
+ MCPToolCall(tool_key="action::insert_excel_table", parameters={
+ "table": data, "sheet_name": "Sheet1", "start_row": 1, "start_col": 1
+ })
+])
+
+# ❌ Bad: Insert cell by cell
+for row in data:
+    for col in row:
+        ...  # One tool call per cell: many round trips
+```
+
+### 3. Save Frequently
+
+```python
+# After data insertion/manipulation
+await computer.run_actions([
+ MCPToolCall(tool_key="action::save_as", parameters={"file_ext": ".xlsx"})
+])
+```
+
+## Use Cases
+
+### Data Processing Workflow
+
+```python
+# 1. Get data
+data = await computer.run_actions([
+ MCPToolCall(tool_key="action::get_range_values", parameters={
+ "sheet_name": "Raw Data", "start_row": 1, "start_col": 1,
+ "end_row": -1, "end_col": -1
+ })
+])
+
+# 2. Process data (Python)
+processed = process_data(data[0].data)
+
+# 3. Insert into new sheet
+await computer.run_actions([
+ MCPToolCall(tool_key="action::insert_excel_table", parameters={
+ "table": processed, "sheet_name": "Processed", "start_row": 1, "start_col": 1
+ })
+])
+
+# 4. Export as CSV
+await computer.run_actions([
+ MCPToolCall(tool_key="action::save_as", parameters={"file_ext": ".csv"})
+])
+```
+
+## Related Documentation
+
+- [WordCOMExecutor](./word_com_executor.md) - Word COM automation
+- [PowerPointCOMExecutor](./ppt_com_executor.md) - PowerPoint COM automation
diff --git a/documents/docs/mcp/servers/hardware_executor.md b/documents/docs/mcp/servers/hardware_executor.md
new file mode 100644
index 000000000..39a2acf53
--- /dev/null
+++ b/documents/docs/mcp/servers/hardware_executor.md
@@ -0,0 +1,506 @@
+# HardwareExecutor Server
+
+## Overview
+
+**HardwareExecutor** provides hardware control capabilities including Arduino HID, BB-8 test fixture, robot arm, mouse control, and screenshot capture.
+
+**Server Type:** Action
+**Deployment:** HTTP (remote server)
+**Default Port:** 8006
+**LLM-Selectable:** ✅ Yes
+
+## Server Information
+
+| Property | Value |
+|----------|-------|
+| **Namespace** | `HardwareExecutor` |
+| **Server Name** | `Echo Base MCP Server` |
+| **Platform** | Cross-platform (requires hardware) |
+| **Tool Type** | `action` |
+| **Deployment** | HTTP server (stateless) |
+
+## Tool Categories
+
+1. Arduino HID Tools (Keyboard/Mouse Emulation)
+2. Mouse Control Tools
+3. BB-8 Test Fixture Tools
+4. Robot Arm Tools
+5. Screenshot Tool
+
+## Arduino HID Tools
+
+### arduino_hid_status
+
+Get Arduino HID device status.
+
+**Returns**: `Dict[str, Any]` with `connected`, `status`, `device`
+
+---
+
+### arduino_hid_connect
+
+Connect to Arduino HID device.
+
+**Returns**: `Dict[str, Any]` with success message
+
+---
+
+### arduino_hid_disconnect
+
+Disconnect from Arduino HID device.
+
+**Returns**: `Dict[str, Any]` with success message
+
+---
+
+### type_text
+
+Type a string of text via Arduino HID.
+
+**Parameters**:
+- `text` (`str`): Text to type
+
+**Returns**: Success message
+
+**Example**:
+```python
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::type_text",
+ tool_name="type_text",
+ parameters={"text": "Hello, World!"}
+ )
+])
+```
+
+---
+
+### press_key_sequence
+
+Press a sequence of keys.
+
+**Parameters**:
+- `keys` (`List[str]`): List of key names
+- `interval` (`float`): Interval between key presses (default: 0.1)
+
+**Example**:
+```python
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::press_key_sequence",
+ tool_name="press_key_sequence",
+ parameters={
+ "keys": ["a", "b", "c"],
+ "interval": 0.2
+ }
+ )
+])
+```
+
+---
+
+### press_hotkey
+
+Press multiple keys simultaneously (hotkey combination).
+
+**Parameters**:
+- `keys` (`List[str]`): List of keys to press together
+
+**Example**:
+```python
+# Ctrl+C
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::press_hotkey",
+ tool_name="press_hotkey",
+ parameters={"keys": ["ctrl", "c"]}
+ )
+])
+```
+
+## Mouse Control Tools
+
+### move_mouse
+
+Move the mouse pointer.
+
+**Parameters**:
+- `x` (`int`): X coordinate
+- `y` (`int`): Y coordinate
+- `absolute` (`bool`): Absolute (True) or relative (False) positioning (default: False)
+
+---
+
+### click_mouse
+
+Click mouse button.
+
+**Parameters**:
+- `button` (`str`): `"left"`, `"right"`, or `"middle"` (default: `"left"`)
+- `count` (`int`): Number of clicks (default: 1)
+- `interval` (`float`): Interval between clicks (default: 0.1)
+
+---
+
+### press_mouse_button
+
+Press and hold mouse button.
+
+**Parameters**:
+- `button` (`str`): Mouse button (default: `"left"`)
+
+---
+
+### release_mouse_button
+
+Release mouse button.
+
+**Parameters**:
+- `button` (`str`): Mouse button (default: `"left"`)
+
+---
+
+### scroll_mouse
+
+Scroll mouse wheel.
+
+**Parameters**:
+- `vertical` (`int`): Vertical scroll amount (default: 0)
+- `horizontal` (`int`): Horizontal scroll amount (default: 0)
+
+**Example**:
+```python
+# Scroll down
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::scroll_mouse",
+ tool_name="scroll_mouse",
+ parameters={"vertical": -5, "horizontal": 0}
+ )
+])
+```
+
+---
+
+### drag_mouse
+
+Drag mouse from start to end position.
+
+**Parameters**:
+- `start` (`Tuple[int, int]`): Start (x, y) coordinates
+- `end` (`Tuple[int, int]`): End (x, y) coordinates
+- `button` (`str`): Mouse button (default: `"left"`)
+- `duration` (`float`): Drag duration in seconds (default: 0.5)
+
+**Example**:
+```python
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::drag_mouse",
+ tool_name="drag_mouse",
+ parameters={
+ "start": [100, 100],
+ "end": [300, 300],
+ "duration": 1.0
+ }
+ )
+])
+```
+
+---
+
+### double_click_mouse
+
+Perform double-click.
+
+**Parameters**:
+- `button` (`str`): Mouse button (default: `"left"`)
+
+---
+
+### right_click_mouse
+
+Shortcut for right-click.
+
+---
+
+### middle_click_mouse
+
+Shortcut for middle-click.
+
+## BB-8 Test Fixture Tools
+
+Test fixture for Surface device testing.
+
+### bb8_status
+
+Get BB-8 test fixture status.
+
+---
+
+### bb8_connect / bb8_disconnect
+
+Connect to or disconnect from the BB-8 test fixture.
+
+---
+
+### bb8_usb_port_plug / bb8_usb_port_unplug
+
+Plug/unplug USB device.
+
+**Parameters**:
+- `port_name` (`str`): USB port name
+
+---
+
+### bb8_psu_charger_plug / bb8_psu_charger_unplug
+
+Plug/unplug PSU charger.
+
+---
+
+### bb8_blade_attach / bb8_blade_detach
+
+Attach/detach blade.
+
+---
+
+### bb8_lid_open / bb8_lid_close
+
+Open/close lid.
+
+---
+
+### bb8_button_press
+
+Press a physical button.
+
+**Parameters**:
+- `button_name` (`str`): Button name
+
+---
+
+### bb8_button_long_press
+
+Long press a button.
+
+**Parameters**:
+- `button_name` (`str`): Button name
+
+## Robot Arm Tools
+
+Physical robot arm for touchscreen interaction.
+
+### robot_arm_status
+
+Get robot arm status (position, connection).
+
+---
+
+### robot_arm_connect / robot_arm_disconnect
+
+Connect/disconnect robot arm.
+
+---
+
+### touch_screen
+
+Simulate touch at specific screen location.
+
+**Parameters**:
+- `location` (`Tuple[int, int]`): (x, y) coordinates
+
+**Example**:
+```python
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::touch_screen",
+ tool_name="touch_screen",
+ parameters={"location": [500, 300]}
+ )
+])
+```
+
+---
+
+### draw_on_screen
+
+Draw on screen by following coordinate path.
+
+**Parameters**:
+- `path` (`List[Tuple[int, int]]`): List of (x, y) coordinates
+
+---
+
+### tap_screen
+
+Simulate tap(s) on screen.
+
+**Parameters**:
+- `location` (`Tuple[int, int]`): Tap location
+- `count` (`int`): Number of taps (default: 1)
+- `interval` (`float`): Interval between taps (default: 0.1)
+
+---
+
+### swipe_screen
+
+Simulate swipe gesture.
+
+**Parameters**:
+- `start_location` (`Tuple[int, int]`): Start position
+- `end_location` (`Tuple[int, int]`): End position
+- `duration` (`float`): Swipe duration (default: 0.5)
+
+---
+
+### long_press_screen
+
+Simulate long press.
+
+**Parameters**:
+- `location` (`Tuple[int, int]`): Press location
+- `duration` (`float`): Press duration (default: 1.0)
+
+---
+
+### double_tap_screen
+
+Simulate double tap.
+
+**Parameters**:
+- `location` (`Tuple[int, int]`): Tap location
+
+---
+
+### press_key
+
+Simulate keyboard key press via robot arm.
+
+**Parameters**:
+- `key` (`str`): Key to press
+- `modifiers` (`List[str]`): Modifier keys (e.g., `["ctrl", "shift"]`)
+- `duration` (`float`): Press duration (default: 0.1)
+
+---
+
+### tap_trackpad / swipe_trackpad
+
+Simulate trackpad interactions.
+
+## Screenshot Tool
+
+### take_screenshot
+
+Capture a screenshot.
+
+**Returns**: `str` - Base64-encoded image data
+
+**Example**:
+```python
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::take_screenshot",
+ tool_name="take_screenshot",
+ parameters={}
+ )
+])
+# result[0].data = "iVBORw0KGgoAAAANSUhEUgAA..."
+```
+
+## Configuration
+
+```yaml
+# Client configuration (Windows agent)
+HostAgent:
+  default:
+    action:
+      - namespace: HardwareExecutor
+        type: http
+        host: "192.168.1.100"  # Hardware server IP
+        port: 8006
+        path: "/mcp"
+```
+
+## Deployment
+
+### Starting the Server
+
+```bash
+# Start hardware MCP server
+python -m ufo.client.mcp.http_servers.hardware_mcp_server --host 0.0.0.0 --port 8006
+
+# Output:
+# ==================================================
+# UFO Hardware MCP Server
+# Hardware automation via Model Context Protocol
+# Running on 0.0.0.0:8006
+# ==================================================
+```
+
+### Configuration
+
+**Default Values**:
+- Host: `localhost`
+- Port: `8006`
+- Path: `/mcp`
+
+## Best Practices
+
+### 1. Network Configuration
+
+```yaml
+# Use IP address for remote hardware
+action:
+  - namespace: HardwareExecutor
+    type: http
+    host: "192.168.1.100"  # Hardware server
+    port: 8006
+```
+
+### 2. Error Handling
+
+Tools that return a dict include a `success` key (`take_screenshot` returns a base64 string instead), so check it before continuing:
+
+```python
+result = await computer.run_actions([
+ MCPToolCall(tool_key="action::touch_screen", parameters={"location": [100, 100]})
+])
+
+if not result[0].data.get("success"):
+    logger.error(f"Touch failed: {result[0].data.get('error')}")
+```
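+
+Because some tools return a plain string rather than a dict, a single check that tolerates both shapes can be convenient. This is a minimal sketch under stated assumptions: `failure_reason` is a hypothetical helper, and the `success` and `error` keys are assumed to behave as in the example above.
+
+```python
+from typing import Any, Optional
+
+
+def failure_reason(data: Any) -> Optional[str]:
+    """Return None when a tool result looks successful, else a short reason."""
+    if isinstance(data, dict):
+        if data.get("success"):
+            return None
+        # Fall back to a generic message when no error text is provided.
+        return str(data.get("error", "unknown error"))
+    # Non-dict payloads (for example a base64 screenshot string) carry no
+    # success flag, so only an empty payload is treated as a failure.
+    return None if data else "empty result"
+```
+
+A caller can log the returned reason and decide whether to retry the action.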
+
+### 3. Physical Hardware Requirements
+
+- Arduino HID: Requires Arduino board with HID firmware
+- BB-8: Microsoft Surface test fixture
+- Robot Arm: Physical robot arm setup
+- Network: Stable network connection for HTTP communication
+
+## Use Cases
+
+### Automated Testing
+
+```python
+# 1. Connect to hardware
+await computer.run_actions([
+ MCPToolCall(tool_key="action::robot_arm_connect", parameters={})
+])
+
+# 2. Touch screen at login button
+await computer.run_actions([
+ MCPToolCall(tool_key="action::touch_screen", parameters={"location": [500, 700]})
+])
+
+# 3. Take screenshot to verify
+screenshot = await computer.run_actions([
+ MCPToolCall(tool_key="action::take_screenshot", parameters={})
+])
+```
+
+## Related Documentation
+
+- [BashExecutor](./bash_executor.md) - Linux command execution
+- [Remote Servers](../remote_servers.md) - HTTP deployment guide
+- [Action Servers](../action.md) - Action server concepts
diff --git a/documents/docs/mcp/servers/host_ui_executor.md b/documents/docs/mcp/servers/host_ui_executor.md
new file mode 100644
index 000000000..9db8c1b74
--- /dev/null
+++ b/documents/docs/mcp/servers/host_ui_executor.md
@@ -0,0 +1,417 @@
+# HostUIExecutor Server
+
+## Overview
+
+**HostUIExecutor** is an action server that provides system-level UI automation capabilities for the HostAgent. It enables window management, window switching, and cross-application interactions at the desktop level.
+
+**Server Type:** Action
+**Deployment:** Local (in-process)
+**Agent:** HostAgent
+**LLM-Selectable:** ✅ Yes (LLM chooses when to execute)
+
+## Server Information
+
+| Property | Value |
+|----------|-------|
+| **Namespace** | `HostUIExecutor` |
+| **Server Name** | `UFO UI HostAgent Action MCP Server` |
+| **Platform** | Windows |
+| **Backend** | UIAutomation (UIA) or Win32 |
+| **Tool Type** | `action` |
+| **Tool Key Format** | `action::{tool_name}` |
+
+## Tools
+
+### select_application_window
+
+Select an application window for UI automation and set it as the active window.
+
+#### Description
+
+This is the primary tool for window selection in HostAgent workflows. It:
+1. Finds the specified window by ID and name
+2. Sets focus on the window
+3. Optionally maximizes the window
+4. Optionally draws a visual outline (for debugging)
+5. Initializes UI state for subsequent AppAgent operations
+
+!!!warning "Prerequisites"
+    You must call `get_desktop_app_info` (UICollector) first to obtain valid window IDs and names.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `id` | `str` | ✅ Yes | The precise annotated ID of the application window to select. Must match an ID from `get_desktop_app_info` |
+| `name` | `str` | ✅ Yes | The precise name of the application window. Must match the name of the selected ID |
+
+#### Returns
+
+**Type**: `Dict[str, Any]`
+
+```python
+{
+ "root_name": str, # Application root name (e.g., "WINWORD.EXE")
+ "window_info": dict # WindowInfo object with window details
+}
+```
+
+#### WindowInfo Structure
+
+```python
+{
+ "annotation_id": str, # Window identifier
+ "name": str, # Window element name
+ "title": str, # Window title text
+ "handle": int, # Window handle (HWND)
+ "class_name": str, # Window class name
+ "process_id": int, # Process ID
+ "is_visible": bool, # Visibility status
+ "is_minimized": bool, # Minimized state
+ "is_maximized": bool, # Maximized state
+ "is_active": bool, # Active window status
+ "rectangle": { # Window bounding rectangle
+ "x": int,
+ "y": int,
+ "width": int,
+ "height": int
+ },
+ "text_content": str, # Window text
+ "control_type": str # Control type (usually "Window")
+}
+```
+
+#### Example
+
+```python
+# Step 1: Get available windows
+windows = await computer.run_actions([
+ MCPToolCall(
+ tool_key="data_collection::get_desktop_app_info",
+ tool_name="get_desktop_app_info",
+ parameters={"remove_empty": True}
+ )
+])
+
+# windows[0].data = [
+# {"id": "1", "name": "Calculator", "type": "Window", "kind": "window"},
+# {"id": "2", "name": "Notepad", "type": "Window", "kind": "window"}
+# ]
+
+# Step 2: Select Calculator window
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::select_application_window",
+ tool_name="select_application_window",
+ parameters={
+ "id": "1",
+ "name": "Calculator"
+ }
+ )
+])
+
+# Result:
+{
+ "root_name": "ApplicationFrameHost.exe",
+ "window_info": {
+ "annotation_id": "1",
+ "title": "Calculator",
+ "handle": 12345678,
+ "class_name": "ApplicationFrameWindow",
+ "process_id": 9876,
+ "is_visible": True,
+ "is_minimized": False,
+ "is_maximized": False,
+ "is_active": True,
+ "rectangle": {"x": 100, "y": 100, "width": 400, "height": 600}
+ }
+}
+```
+
+#### Error Handling
+
+The tool raises `ToolError` in the following cases:
+
+```python
+# Error 1: Missing ID
+ToolError("Window id is required for select_application_window")
+
+# Error 2: No windows available
+ToolError("No application windows available. Please call get_desktop_app_info first.")
+
+# Error 3: Invalid ID
+ToolError("Control with id '99' not found. Available control ids: ['1', '2', '3']")
+
+# Error 4: Failed to set focus
+ToolError("Failed to set focus on window: {error_details}")
+```
+
+#### Configuration Behavior
+
+The tool respects these configuration settings:
+
+**MAXIMIZE_WINDOW** (default: `False`)
+```yaml
+# config.yaml
+MAXIMIZE_WINDOW: true # Window is maximized after selection
+```
+
+**SHOW_VISUAL_OUTLINE_ON_SCREEN** (default: `True`)
+```yaml
+# config.yaml
+SHOW_VISUAL_OUTLINE_ON_SCREEN: true # Red outline drawn around window
+```
+
+#### Side Effects
+
+!!!warning "Side Effects"
+    - ✅ **Changes focus**: Brings target window to foreground
+    - ✅ **May maximize**: If `MAXIMIZE_WINDOW` is enabled
+    - ✅ **Visual feedback**: Red outline if `SHOW_VISUAL_OUTLINE_ON_SCREEN` is enabled
+    - ✅ **State initialization**: Sets up AppPuppeteer for the window
+
+#### Internal State Changes
+
+After `select_application_window` executes:
+1. `ui_state.selected_app_window` is set to the window object
+2. `ui_state.puppeteer` is initialized with `AppPuppeteer`
+3. Available commands are logged for debugging
+4. Subsequent UICollector and AppUIExecutor tools can operate on this window
+
+## Configuration
+
+### Basic Configuration
+
+```yaml
+HostAgent:
+  default:
+    action:
+      - namespace: HostUIExecutor
+        type: local
+        reset: false
+```
+
+### Configuration Options
+
+| Option | Type | Description |
+|--------|------|-------------|
+| `namespace` | `str` | Must be `"HostUIExecutor"` |
+| `type` | `str` | Deployment type: `"local"` |
+| `reset` | `bool` | Whether to reset server state between tasks (usually `false` for HostUIExecutor) |
+
+## Usage Patterns
+
+### Pattern 1: Basic Window Selection
+
+```python
+# 1. Discover windows
+windows = await computer.run_actions([
+ MCPToolCall(tool_key="data_collection::get_desktop_app_info", ...)
+])
+
+# 2. Select target window
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::select_application_window",
+ parameters={"id": "1", "name": "Calculator"}
+ )
+])
+
+# 3. Now AppAgent can interact with the window
+controls = await computer.run_actions([
+ MCPToolCall(tool_key="data_collection::get_app_window_controls_info", ...)
+])
+```
+
+### Pattern 2: Multi-Window Workflow
+
+```python
+# Work with first window
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::select_application_window",
+ parameters={"id": "1", "name": "Word"}
+ )
+])
+# ... perform actions on Word ...
+
+# Switch to second window
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::select_application_window",
+ parameters={"id": "2", "name": "Excel"}
+ )
+])
+# ... perform actions on Excel ...
+```
+
+### Pattern 3: Verify Before Selection
+
+```python
+# Get windows
+windows = await computer.run_actions([
+ MCPToolCall(tool_key="data_collection::get_desktop_app_info", ...)
+])
+
+# Verify target window exists
+target_windows = [w for w in windows[0].data if "Calculator" in w["name"]]
+
+if not target_windows:
+    logger.error("Calculator not found")
+else:
+    # Select window
+    await computer.run_actions([
+        MCPToolCall(
+            tool_key="action::select_application_window",
+            parameters={
+                "id": target_windows[0]["id"],
+                "name": target_windows[0]["name"]
+            }
+        )
+    ])
+```
+
+## Best Practices
+
+### 1. Always Validate ID and Name
+
+```python
+# ✅ Good: Use exact ID and name from get_desktop_app_info
+windows = await computer.run_actions([
+ MCPToolCall(tool_key="data_collection::get_desktop_app_info", ...)
+])
+
+window = windows[0].data[0] # First window
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::select_application_window",
+ parameters={
+ "id": window["id"], # Exact ID from response
+ "name": window["name"] # Exact name from response
+ }
+ )
+])
+
+# ❌ Bad: Hardcode or guess IDs
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::select_application_window",
+ parameters={"id": "1", "name": "Some Window"} # May not exist
+ )
+])
+```
+
+### 2. Handle Selection Failures
+
+```python
+try:
+    result = await computer.run_actions([
+        MCPToolCall(
+            tool_key="action::select_application_window",
+            parameters={"id": window_id, "name": window_name}
+        )
+    ])
+
+    if result[0].is_error:
+        logger.error(f"Failed to select window: {result[0].content}")
+        # Retry or select alternative window
+    else:
+        logger.info(f"Selected window: {result[0].data['root_name']}")
+
+except Exception as e:
+    logger.error(f"Window selection exception: {e}")
+```
+
+### 3. Wait After Selection
+
+```python
+# Select window
+await computer.run_actions([
+ MCPToolCall(tool_key="action::select_application_window", ...)
+])
+
+# Wait for window to become active
+await asyncio.sleep(0.5)
+
+# Now interact with window
+await computer.run_actions([
+ MCPToolCall(tool_key="data_collection::capture_window_screenshot", ...)
+])
+```
+
+### 4. Use Visual Outline for Debugging
+
+```yaml
+# config.yaml - Enable during development
+SHOW_VISUAL_OUTLINE_ON_SCREEN: true # See red outline on selected window
+
+# config.yaml - Disable in production
+SHOW_VISUAL_OUTLINE_ON_SCREEN: false
+```
+
+## Integration with AppAgent
+
+After `select_application_window` succeeds, the window becomes the target for **AppAgent** operations:
+
+```python
+# HostAgent: Select window
+host_result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::select_application_window",
+ parameters={"id": "1", "name": "Calculator"}
+ )
+])
+
+# AppAgent: Get controls in selected window
+app_controls = await computer.run_actions([
+ MCPToolCall(tool_key="data_collection::get_app_window_controls_info", ...)
+])
+
+# AppAgent: Click a button in selected window
+app_click = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::click_input",
+ tool_name="click_input",
+ parameters={"id": "5", "name": "Seven", "button": "left"}
+ )
+])
+```
+
+## Troubleshooting
+
+### Window Not Found
+
+**Problem**: `ToolError("Control with id 'X' not found")`
+
+**Solutions**:
+1. Call `get_desktop_app_info` with `refresh_app_windows=True`
+2. Verify window is not minimized or hidden
+3. Check window still exists (hasn't been closed)
+
+### Focus Failed
+
+**Problem**: `ToolError("Failed to set focus on window")`
+
+**Solutions**:
+1. Check window is not disabled or unresponsive
+2. Verify window process is running
+3. Ensure no modal dialogs are blocking focus
+4. Try again after a short delay
+
+### Wrong Window Selected
+
+**Problem**: The wrong window was selected because another window has a similar name
+
+**Solutions**:
+1. Use more specific name matching
+2. Check `process_id` or `class_name` in window info
+3. Filter windows by additional criteria before selection
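+
+One way to apply these suggestions is to insist on a single unambiguous match before calling `select_application_window`. The sketch below assumes the window list has the shape shown in the earlier examples (dicts with `id` and `name` keys); `find_unique_window` is a hypothetical helper, not part of UFO.
+
+```python
+from typing import Dict, List
+
+
+def find_unique_window(
+    windows: List[Dict[str, str]], name: str, exact: bool = True
+) -> Dict[str, str]:
+    """Return the only window matching `name`, or raise LookupError."""
+    if exact:
+        matches = [w for w in windows if w["name"] == name]
+    else:
+        # Case-insensitive substring match; more likely to be ambiguous.
+        matches = [w for w in windows if name.lower() in w["name"].lower()]
+    if len(matches) != 1:
+        found = [w["name"] for w in matches]
+        raise LookupError(
+            f"Expected exactly one window matching {name!r}, found {found}"
+        )
+    return matches[0]
+```
+
+Raising on zero or multiple matches makes an ambiguous selection visible early instead of silently picking the first candidate.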
+
+## Related Documentation
+
+
+- [UICollector](./ui_collector.md) - Window discovery server
+- [AppUIExecutor](./app_ui_executor.md) - Window interaction server
+- [Action Servers](../action.md) - Action server concepts
+- [HostAgent](../../ufo2/host_agent/overview.md) - HostAgent architecture
+
diff --git a/documents/docs/mcp/servers/mobile_executor.md b/documents/docs/mcp/servers/mobile_executor.md
new file mode 100644
index 000000000..d02d7f6fc
--- /dev/null
+++ b/documents/docs/mcp/servers/mobile_executor.md
@@ -0,0 +1,1418 @@
+# MobileExecutor Server
+
+## Overview
+
+**MobileExecutor** provides Android mobile device automation via ADB (Android Debug Bridge). It runs as **two separate HTTP servers** that share state for coordinated operations:
+
+- **Mobile Data Collection Server** (port 8020): Screenshots, UI tree, device info, app list, controls
+- **Mobile Action Server** (port 8021): Tap, swipe, type, launch apps, press keys
+
+**Server Type:** Action + Data Collection
+**Deployment:** HTTP (remote server, runs on machine with ADB)
+**Default Ports:** 8020 (data), 8021 (action)
+**LLM-Selectable:** ✅ Yes (action tools only)
+**Platform:** Android devices via ADB
+
+## Server Information
+
+| Property | Value |
+|----------|-------|
+| **Namespace** | `MobileDataCollector` (data), `MobileExecutor` (action) |
+| **Server Names** | `Mobile Data Collection MCP Server`, `Mobile Action MCP Server` |
+| **Platform** | Android (via ADB) |
+| **Tool Types** | `data_collection`, `action` |
+| **Deployment** | HTTP server (stateless with shared cache) |
+| **Architecture** | Dual-server with singleton state manager |
+
+## Architecture
+
+### Dual-Server Design
+
+The mobile MCP server uses a **dual-server architecture** similar to `linux_mcp_server.py`:
+
+```mermaid
+graph TB
+ Agent["Windows UFO² Agent"]
+
+ subgraph Process["Mobile MCP Servers (Same Process)"]
+ State["MobileServerState (Singleton Cache) • Apps cache • Controls cache • UI tree cache • Device info cache"]
+
+ DataServer["Data Collection Server Port 8020 • Screenshots • UI tree • Device info • App list • Controls"]
+
+ ActionServer["Action Server Port 8021 • Tap/Swipe • Type text • Launch app • Click control"]
+
+ State -.->|Shared Cache| DataServer
+ State -.->|Shared Cache| ActionServer
+ end
+
+ Device["Android Device (via ADB)"]
+
+ Agent -->|HTTP| DataServer
+ Agent -->|HTTP| ActionServer
+ DataServer -->|ADB Commands| Device
+ ActionServer -->|ADB Commands| Device
+
+ style Agent fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
+ style Process fill:#fafafa,stroke:#424242,stroke-width:2px
+ style State fill:#fff3e0,stroke:#f57c00,stroke-width:2px
+ style DataServer fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
+ style ActionServer fill:#fce4ec,stroke:#c2185b,stroke-width:2px
+ style Device fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
+```
+
+**Shared State Benefits:**
+
+- **Cache Coordination**: Action server can access controls cached by data server
+- **Performance**: Avoid duplicate ADB queries (UI tree, app list, etc.)
+- **State Consistency**: Both servers see same device state
+- **Resource Efficiency**: Single process, shared memory
+
+### State Management
+
+**MobileServerState** is a singleton that caches:
+
+| Cache | Duration | Purpose |
+|-------|----------|---------|
+| **Installed Apps** | 5 minutes | Package list for `get_mobile_app_target_info` |
+| **UI Controls** | 5 seconds | Control list for `get_app_window_controls_target_info` |
+| **UI Tree XML** | 5 seconds | Raw XML for `get_ui_tree` |
+| **Device Info** | 1 minute | Hardware specs for `get_device_info` |
+
+**Cache Invalidation:**
+
+- Automatically invalidated after interactions (tap, swipe, type)
+- Manually invalidated via `invalidate_cache` tool
+- Expired caches refreshed on next query
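+
+The caching rules above can be illustrated with a small time-to-live cache. This is a sketch of the general technique, not the actual `MobileServerState` implementation; the class name `TTLCache` and its methods are assumptions made for illustration.
+
+```python
+import time
+from typing import Any, Callable, Optional
+
+
+class TTLCache:
+    """Hold one value that expires after `ttl_seconds`."""
+
+    def __init__(self, ttl_seconds: float,
+                 clock: Callable[[], float] = time.monotonic) -> None:
+        self._ttl = ttl_seconds
+        self._clock = clock  # injectable so tests can control time
+        self._value: Any = None
+        self._stored_at: Optional[float] = None
+
+    def set(self, value: Any) -> None:
+        self._value = value
+        self._stored_at = self._clock()
+
+    def get(self) -> Optional[Any]:
+        """Return the cached value, or None if missing or expired."""
+        if self._stored_at is None:
+            return None
+        if self._clock() - self._stored_at > self._ttl:
+            self.invalidate()
+            return None
+        return self._value
+
+    def invalidate(self) -> None:
+        """Drop the value, for example after a tap, swipe, or type action."""
+        self._value = None
+        self._stored_at = None
+```
+
+With this shape, a controls cache would use `TTLCache(5)` and an installed-apps cache `TTLCache(300)`, matching the durations in the table.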
+
+## Data Collection Tools
+
+Data collection tools are invoked automatically by the framework and are not selectable by the LLM.
+
+### capture_screenshot
+
+Capture screenshot from Android device.
+
+#### Parameters
+
+None
+
+#### Returns
+
+**Type**: `str`
+
+A base64-encoded image data URI, returned directly as a string (format: `data:image/png;base64,...`)
+
+#### Example
+
+```python
+result = await computer.run_data_collection([
+ MCPToolCall(
+ tool_key="data_collection::capture_screenshot",
+ tool_name="capture_screenshot",
+ parameters={}
+ )
+])
+
+# result[0].data = "data:image/png;base64,iVBORw0KGgo..."
+```
+
+#### Implementation Details
+
+1. Captures screenshot on device (`screencap -p /sdcard/screen_temp.png`)
+2. Pulls image from device via ADB (`adb pull`)
+3. Encodes as base64
+4. Cleans up temporary files
+5. Returns data URI directly (matches `ui_mcp_server` format)
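+
+On the client side, the returned data URI usually needs to be turned back into image bytes. The following sketch uses only the standard library; `decode_data_uri` is a hypothetical helper name, and it assumes the `data:image/png;base64,...` format described above.
+
+```python
+import base64
+
+PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file
+
+
+def decode_data_uri(data_uri: str) -> bytes:
+    """Decode a base64 data URI into raw bytes."""
+    header, separator, payload = data_uri.partition(",")
+    if (not separator or not header.startswith("data:")
+            or not header.endswith(";base64")):
+        raise ValueError("Expected a base64-encoded data URI")
+    return base64.b64decode(payload)
+
+
+def looks_like_png(image_bytes: bytes) -> bool:
+    """Cheap sanity check before writing the screenshot to disk."""
+    return image_bytes.startswith(PNG_SIGNATURE)
+```
+
+The decoded bytes can be written to a `.png` file or handed to an image library.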
+
+---
+
+### get_ui_tree
+
+Get the UI hierarchy tree in XML format.
+
+#### Parameters
+
+None
+
+#### Returns
+
+**Type**: `Dict[str, Any]`
+
+```python
+{
+ "success": bool,
+ "ui_tree": str, # XML content
+ "format": "xml",
+ # OR
+ "error": str # Error message if failed
+}
+```
+
+#### Example
+
+```python
+result = await computer.run_data_collection([
+ MCPToolCall(
+ tool_key="data_collection::get_ui_tree",
+ tool_name="get_ui_tree",
+ parameters={}
+ )
+])
+
+# Parse XML to find elements
+import xml.etree.ElementTree as ET
+tree = ET.fromstring(result[0].data["ui_tree"])
+```
+
+#### Cache Behavior
+
+- Cached for 5 seconds
+- Automatically invalidated after interactions
+- Shared with `get_app_window_controls_target_info`
+
+---
+
+### get_device_info
+
+Get comprehensive Android device information.
+
+#### Parameters
+
+None
+
+#### Returns
+
+**Type**: `Dict[str, Any]`
+
+```python
+{
+ "success": bool,
+ "device_info": {
+ "model": str, # Device model
+ "android_version": str, # Android version (e.g., "13")
+ "sdk_version": str, # SDK version (e.g., "33")
+ "screen_size": str, # Screen resolution (e.g., "Physical size: 1080x2400")
+ "screen_density": str, # Screen density (e.g., "Physical density: 440")
+ "battery_level": str, # Battery percentage
+ "battery_status": str # Charging status
+ },
+ "from_cache": bool, # True if returned from cache
+ # OR
+ "error": str # Error message if failed
+}
+```
+
+#### Example
+
+```python
+result = await computer.run_data_collection([
+ MCPToolCall(
+ tool_key="data_collection::get_device_info",
+ tool_name="get_device_info",
+ parameters={}
+ )
+])
+
+device = result[0].data["device_info"]
+print(f"Device: {device['model']}")
+print(f"Android: {device['android_version']}")
+print(f"Battery: {device['battery_level']}%")
+```
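+
+The `screen_size` string can be used to derive gesture coordinates instead of hardcoding them. The sketch below is an assumption-laden example: `parse_screen_size` and `swipe_up_parameters` are hypothetical helpers, the input format follows the `"Physical size: 1080x2400"` example above, and the output keys follow the `swipe` tool's documented parameters.
+
+```python
+import re
+from typing import Dict, Tuple
+
+
+def parse_screen_size(screen_size: str) -> Tuple[int, int]:
+    """Extract (width, height) from a string like "Physical size: 1080x2400"."""
+    match = re.search(r"(\d+)x(\d+)", screen_size)
+    if match is None:
+        raise ValueError(f"Cannot parse screen size: {screen_size!r}")
+    return int(match.group(1)), int(match.group(2))
+
+
+def swipe_up_parameters(screen_size: str, duration: int = 300) -> Dict[str, int]:
+    """Build swipe parameters that move from 3/4 to 1/4 of the screen height."""
+    width, height = parse_screen_size(screen_size)
+    x = width // 2  # swipe along the vertical center line
+    return {
+        "start_x": x,
+        "start_y": height * 3 // 4,
+        "end_x": x,
+        "end_y": height // 4,
+        "duration": duration,
+    }
+```
+
+Deriving coordinates this way keeps the same gesture usable across devices with different resolutions.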
+
+#### Cache Behavior
+
+- Cached for 1 minute
+- Returns `from_cache: true` when using cached data
+
+---
+
+### get_mobile_app_target_info
+
+Get information about installed application packages as `TargetInfo` list.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `filter` | `str` | No | `""` | Filter pattern for package names (e.g., `"com.android"`) |
+| `include_system_apps` | `bool` | No | `False` | Whether to include system apps (default: only user apps) |
+| `force_refresh` | `bool` | No | `False` | Force refresh from device, ignoring cache |
+
+#### Returns
+
+**Type**: `List[TargetInfo]`
+
+```python
+[
+ TargetInfo(
+ kind=TargetKind.THIRD_PARTY_AGENT,
+ id="1", # Sequential ID
+ name="com.example.app", # Package name (displayed)
+ type="com.example.app" # Package name (stored)
+ ),
+ ...
+]
+```
+
+#### Example
+
+```python
+# Get all user-installed apps
+result = await computer.run_data_collection([
+ MCPToolCall(
+ tool_key="data_collection::get_mobile_app_target_info",
+ tool_name="get_mobile_app_target_info",
+ parameters={"include_system_apps": False}
+ )
+])
+
+apps = result[0].data
+for app in apps:
+ print(f"ID: {app.id}, Package: {app.name}")
+
+# Filter by package name
+result = await computer.run_data_collection([
+ MCPToolCall(
+ tool_key="data_collection::get_mobile_app_target_info",
+ tool_name="get_mobile_app_target_info",
+ parameters={"filter": "com.android", "include_system_apps": True}
+ )
+])
+```
+
+#### Cache Behavior
+
+- Cached for 5 minutes (only when no filter and `include_system_apps=False`)
+- Use `force_refresh=True` to bypass cache
+
+---
+
+### get_app_window_controls_target_info
+
+Get UI controls information as `TargetInfo` list.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `force_refresh` | `bool` | No | `False` | Force refresh from device, ignoring cache |
+
+#### Returns
+
+**Type**: `List[TargetInfo]`
+
+```python
+[
+ TargetInfo(
+ kind=TargetKind.CONTROL,
+ id="1", # Sequential ID
+ name="Button Name", # Control text or content-desc
+ type="Button", # Control class (short name)
+ rect=[x1, y1, x2, y2] # Bounding box [left, top, right, bottom]
+ ),
+ ...
+]
+```
+
+#### Example
+
+```python
+result = await computer.run_data_collection([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_controls_target_info",
+ tool_name="get_app_window_controls_target_info",
+ parameters={}
+ )
+])
+
+controls = result[0].data
+for ctrl in controls:
+ print(f"ID: {ctrl.id}, Name: {ctrl.name}, Type: {ctrl.type}")
+ print(f" Rect: {ctrl.rect}")
+```
+
+#### Control Selection Criteria
+
+Only **meaningful controls** are included:
+
+- Clickable controls (`clickable="true"`)
+- Long-clickable controls (`long-clickable="true"`)
+- Checkable controls (`checkable="true"`)
+- Scrollable controls (`scrollable="true"`)
+- Controls with text or content-desc
+- EditText and Button controls
+
+**Rect format**: `[left, top, right, bottom]` in pixels (matches `ui_mcp_server.py` bbox format)
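+
+Since `rect` is given as `[left, top, right, bottom]`, the center point is a natural target for a coordinate-based `tap`. A minimal sketch, assuming integer pixel coordinates; `rect_center` and `tap_parameters` are hypothetical helper names.
+
+```python
+from typing import Dict, Sequence, Tuple
+
+
+def rect_center(rect: Sequence[int]) -> Tuple[int, int]:
+    """Return the (x, y) center of a [left, top, right, bottom] rectangle."""
+    left, top, right, bottom = rect
+    if right < left or bottom < top:
+        raise ValueError(f"Malformed rect: {list(rect)!r}")
+    return (left + right) // 2, (top + bottom) // 2
+
+
+def tap_parameters(rect: Sequence[int]) -> Dict[str, int]:
+    """Build the x/y parameters for a tap at the center of a control."""
+    x, y = rect_center(rect)
+    return {"x": x, "y": y}
+```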
+
+#### Cache Behavior
+
+- Cached for 5 seconds
+- Automatically invalidated after interactions (tap, swipe, type)
+- Shared with action server for `click_control` and `type_text`
+
+---
+
+## Action Tools
+
+Action tools are LLM-selectable, state-modifying operations.
+
+### tap
+
+Tap/click at specified coordinates on the screen.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `x` | `int` | ✅ Yes | X coordinate in pixels (from left) |
+| `y` | `int` | ✅ Yes | Y coordinate in pixels (from top) |
+
+#### Returns
+
+**Type**: `Dict[str, Any]`
+
+```python
+{
+ "success": bool,
+ "action": str, # "tap(x, y)"
+ "output": str, # Command output
+ "error": str # Error message if failed
+}
+```
+
+#### Example
+
+```python
+# Tap at specific coordinates
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::tap",
+ tool_name="tap",
+ parameters={"x": 500, "y": 1200}
+ )
+])
+```
+
+#### Side Effects
+
+- Invalidates controls cache (UI likely changed)
+
+---
+
+### swipe
+
+Perform swipe gesture from start to end coordinates.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `start_x` | `int` | ✅ Yes | - | Starting X coordinate |
+| `start_y` | `int` | ✅ Yes | - | Starting Y coordinate |
+| `end_x` | `int` | ✅ Yes | - | Ending X coordinate |
+| `end_y` | `int` | ✅ Yes | - | Ending Y coordinate |
+| `duration` | `int` | No | `300` | Duration in milliseconds |
+
+#### Returns
+
+**Type**: `Dict[str, Any]`
+
+```python
+{
+ "success": bool,
+ "action": str, # "swipe(x1,y1)->(x2,y2) in Nms"
+ "output": str,
+ "error": str
+}
+```
+
+#### Example
+
+```python
+# Swipe up (scroll down content)
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::swipe",
+ tool_name="swipe",
+ parameters={
+ "start_x": 500,
+ "start_y": 1500,
+ "end_x": 500,
+ "end_y": 500,
+ "duration": 300
+ }
+ )
+])
+
+# Swipe left (next page)
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::swipe",
+ tool_name="swipe",
+ parameters={
+ "start_x": 800,
+ "start_y": 1000,
+ "end_x": 200,
+ "end_y": 1000,
+ "duration": 200
+ }
+ )
+])
+```
+
+#### Side Effects
+
+- Invalidates controls cache (UI changed)
+
+---
+
+### type_text
+
+Type text into a specific input field control.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `text` | `str` | ✅ Yes | - | Text to input (spaces/special chars auto-escaped) |
+| `control_id` | `str` | ✅ Yes | - | Precise annotated ID from `get_app_window_controls_target_info` |
+| `control_name` | `str` | ✅ Yes | - | Precise name of control (must match `control_id`) |
+| `clear_current_text` | `bool` | No | `False` | Clear existing text before typing |
+
+#### Returns
+
+**Type**: `Dict[str, Any]`
+
+```python
+{
+ "success": bool,
+ "action": str, # Full action description
+ "message": str, # Step-by-step messages
+ "control_info": {
+ "id": str,
+ "name": str,
+ "type": str
+ },
+ # OR
+ "error": str # Error message
+}
+```
+
+#### Example
+
+```python
+# 1. Get controls first
+controls = await computer.run_data_collection([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_controls_target_info",
+ tool_name="get_app_window_controls_target_info",
+ parameters={}
+ )
+])
+
+# 2. Find search input field
+search_field = next(c for c in controls[0].data if "Search" in c.name)
+
+# 3. Type text
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::type_text",
+ tool_name="type_text",
+ parameters={
+ "text": "hello world",
+ "control_id": search_field.id,
+ "control_name": search_field.name,
+ "clear_current_text": True
+ }
+ )
+])
+```
+
+#### Workflow
+
+1. Verifies control exists in cache (requires prior `get_app_window_controls_target_info` call)
+2. Clicks control to focus it
+3. Optionally clears existing text (deletes up to 50 characters)
+4. Types text (spaces replaced with `%s`, `&` escaped)
+5. Invalidates controls cache
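+
+The escaping in step 4 might look like the sketch below. `escape_for_adb_input` is a hypothetical helper name, and escaping `&` with a backslash is an assumption about the implementation; the `%s`-for-space convention is how Android's `input text` command represents spaces.
+
+```python
+def escape_for_adb_input(text: str) -> str:
+    """Encode text for `adb shell input text` (assumed escaping rules)."""
+    # Escape '&' first so the device shell does not treat it as an operator.
+    escaped = text.replace("&", "\\&")
+    # `input text` interprets the token %s as a single space.
+    return escaped.replace(" ", "%s")
+
+
+print(escape_for_adb_input("hello world"))
+```
+
+Text that already contains a literal `%s` would be ambiguous under this scheme, so callers may want to avoid it.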
+
+#### Side Effects
+
+- Clicks the control (may trigger navigation)
+- Modifies input field content
+- Invalidates controls cache
+
+---
+
+### launch_app
+
+Launch an application by package name or app ID.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `package_name` | `str` | ✅ Yes | - | Package name (e.g., `"com.android.settings"`) or app name |
+| `id` | `str` | No | `None` | Optional: Precise annotated ID from `get_mobile_app_target_info` |
+
+#### Returns
+
+**Type**: `Dict[str, Any]`
+
+```python
+{
+ "success": bool,
+ "message": str,
+ "package_name": str, # Actual package launched
+ "output": str, # ADB monkey output
+ "error": str,
+ "warning": str, # Optional: name resolution warning
+ "app_info": { # Optional: if id provided
+ "id": str,
+ "name": str,
+ "package": str
+ }
+}
+```
+
+#### Example
+
+```python
+# Launch by package name
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::launch_app",
+ tool_name="launch_app",
+ parameters={"package_name": "com.android.settings"}
+ )
+])
+
+# Launch by app ID (from cache)
+apps = await computer.run_data_collection([
+ MCPToolCall(
+ tool_key="data_collection::get_mobile_app_target_info",
+ tool_name="get_mobile_app_target_info",
+ parameters={}
+ )
+])
+
+settings_app = next(a for a in apps[0].data if "settings" in a.name.lower())
+
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::launch_app",
+ tool_name="launch_app",
+ parameters={
+ "package_name": settings_app.type, # Package from cache
+ "id": settings_app.id
+ }
+ )
+])
+
+# Launch by app name (auto-resolves package)
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::launch_app",
+ tool_name="launch_app",
+ parameters={"package_name": "Settings"} # Resolves to com.android.settings
+ )
+])
+```
+
+#### Name Resolution
+
+If `package_name` doesn't contain `.` (not a package format):
+
+1. Searches installed packages for matching display name
+2. Returns resolved package with warning
+3. Fails if no match found
+
+#### Implementation
+
+Uses `adb shell monkey -p <package_name> -c android.intent.category.LAUNCHER 1`, where `<package_name>` is the resolved package.
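+
+The name resolution and launch steps above can be sketched roughly as follows. This is an illustration under assumptions, not the server's code: `resolve_package` and `build_launch_command` are hypothetical names, the `installed` mapping stands in for whatever package list the server queries, and the exact, case-insensitive match is only one plausible matching rule.
+
+```python
+from typing import Dict, List, Optional, Tuple
+
+
+def resolve_package(name: str, installed: Dict[str, str]) -> Tuple[Optional[str], Optional[str]]:
+    """Return (package, warning). `installed` maps package name -> display name."""
+    if "." in name:
+        # Already looks like a package name; use it unchanged.
+        return name, None
+    wanted = name.strip().lower()
+    for package, label in installed.items():
+        if label.lower() == wanted:
+            return package, f"Resolved app name '{name}' to package '{package}'"
+    # No match: the caller should report a failure.
+    return None, None
+
+
+def build_launch_command(package: str) -> List[str]:
+    """Build the monkey command that starts the package's launcher activity."""
+    return ["adb", "shell", "monkey", "-p", package,
+            "-c", "android.intent.category.LAUNCHER", "1"]
+
+
+installed_apps = {"com.android.settings": "Settings"}  # hypothetical data
+package, warning = resolve_package("Settings", installed_apps)
+print(package, warning)
+```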
+
+---
+
+### press_key
+
+Press a hardware or software key.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `key_code` | `str` | ✅ Yes | Key code (e.g., `"KEYCODE_HOME"`, `"KEYCODE_BACK"`) |
+
+#### Returns
+
+**Type**: `Dict[str, Any]`
+
+```python
+{
+ "success": bool,
+ "action": str, # "press_key(KEYCODE_X)"
+ "output": str,
+ "error": str
+}
+```
+
+#### Example
+
+```python
+# Press back button
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::press_key",
+ tool_name="press_key",
+ parameters={"key_code": "KEYCODE_BACK"}
+ )
+])
+
+# Press home button
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::press_key",
+ tool_name="press_key",
+ parameters={"key_code": "KEYCODE_HOME"}
+ )
+])
+
+# Press enter
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::press_key",
+ tool_name="press_key",
+ parameters={"key_code": "KEYCODE_ENTER"}
+ )
+])
+```
+
+#### Common Key Codes
+
+| Key Code | Description |
+|----------|-------------|
+| `KEYCODE_HOME` | Home button |
+| `KEYCODE_BACK` | Back button |
+| `KEYCODE_ENTER` | Enter/Return |
+| `KEYCODE_MENU` | Menu button |
+| `KEYCODE_POWER` | Power button |
+| `KEYCODE_VOLUME_UP` | Volume up |
+| `KEYCODE_VOLUME_DOWN` | Volume down |
+
+Full list: [Android KeyEvent](https://developer.android.com/reference/android/view/KeyEvent)
+
+---
+
+### click_control
+
+Click a UI control by its ID and name.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `control_id` | `str` | ✅ Yes | Precise annotated ID from `get_app_window_controls_target_info` |
+| `control_name` | `str` | ✅ Yes | Precise name of control (must match `control_id`) |
+
+#### Returns
+
+**Type**: `Dict[str, Any]`
+
+```python
+{
+ "success": bool,
+ "action": str, # Full action description
+ "message": str, # Success message with coordinates
+ "control_info": {
+ "id": str,
+ "name": str,
+ "type": str,
+ "rect": [int, int, int, int]
+ },
+ "warning": str, # Optional: name mismatch warning
+ # OR
+ "error": str # Error message
+}
+```
+
+#### Example
+
+```python
+# 1. Get controls
+controls = await computer.run_data_collection([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_controls_target_info",
+ tool_name="get_app_window_controls_target_info",
+ parameters={}
+ )
+])
+
+# 2. Find OK button
+ok_button = next(c for c in controls[0].data if c.name == "OK")
+
+# 3. Click it
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::click_control",
+ tool_name="click_control",
+ parameters={
+ "control_id": ok_button.id,
+ "control_name": ok_button.name
+ }
+ )
+])
+```
+
+#### Workflow
+
+1. Retrieves control from cache by `control_id`
+2. Verifies name matches (warns if different)
+3. Calculates center position from bounding box
+4. Taps at center coordinates
+5. Invalidates controls cache
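+
+The center calculation in step 3 can be sketched as follows. This is a minimal illustration rather than the server's implementation: it assumes `rect` is ordered `[left, top, right, bottom]` in pixels, which this page does not state, and `rect_center` is a hypothetical helper name.
+
+```python
+from typing import Sequence, Tuple
+
+
+def rect_center(rect: Sequence[int]) -> Tuple[int, int]:
+    """Return the integer center of a [left, top, right, bottom] box."""
+    left, top, right, bottom = rect
+    if right < left or bottom < top:
+        raise ValueError(f"Malformed rect: {list(rect)}")
+    return (left + right) // 2, (top + bottom) // 2
+
+
+# A button spanning x 100..300 and y 1000..1100.
+tap_x, tap_y = rect_center([100, 1000, 300, 1100])
+print(tap_x, tap_y)
+```
+
+If the server reports `[left, top, width, height]` instead, the arithmetic would need to change accordingly.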
+
+#### Side Effects
+
+- Taps the control (may trigger navigation)
+- Invalidates controls cache
+
+---
+
+### wait
+
+Wait for a specified number of seconds.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `seconds` | `float` | No | `1.0` | Number of seconds to wait (0-60 range) |
+
+#### Returns
+
+**Type**: `Dict[str, Any]`
+
+```python
+{
+ "success": bool,
+ "action": str, # "wait(Ns)"
+ "message": str, # "Waited for N seconds"
+ # OR
+ "error": str # Error if invalid seconds
+}
+```
+
+#### Example
+
+```python
+# Wait 1 second
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::wait",
+ tool_name="wait",
+ parameters={"seconds": 1.0}
+ )
+])
+
+# Wait 500ms
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::wait",
+ tool_name="wait",
+ parameters={"seconds": 0.5}
+ )
+])
+
+# Wait 2.5 seconds
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::wait",
+ tool_name="wait",
+ parameters={"seconds": 2.5}
+ )
+])
+```
+
+#### Constraints
+
+- Minimum: 0 seconds
+- Maximum: 60 seconds
+- Use for UI transitions, animations, app loading
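+
+A range check consistent with these constraints could look like the sketch below. The return shape follows the **Returns** section above, but the error text and the constant names are assumptions made for illustration.
+
+```python
+import time
+from typing import Any, Dict
+
+MIN_WAIT_SECONDS = 0.0
+MAX_WAIT_SECONDS = 60.0
+
+
+def wait(seconds: float = 1.0) -> Dict[str, Any]:
+    """Sleep for `seconds` if it lies in the allowed range, else report an error."""
+    # The chained comparison is also False for NaN, so NaN is rejected too.
+    if not (MIN_WAIT_SECONDS <= seconds <= MAX_WAIT_SECONDS):
+        return {
+            "success": False,
+            "error": f"seconds must be between 0 and 60, got {seconds}",
+        }
+    time.sleep(seconds)
+    return {
+        "success": True,
+        "action": f"wait({seconds}s)",
+        "message": f"Waited for {seconds} seconds",
+    }
+```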
+
+---
+
+### invalidate_cache
+
+Manually invalidate cached data to force refresh on next query.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `cache_type` | `str` | No | `"all"` | Type of cache: `"controls"`, `"apps"`, `"ui_tree"`, `"device_info"`, `"all"` |
+
+#### Returns
+
+**Type**: `Dict[str, Any]`
+
+```python
+{
+ "success": bool,
+ "message": str, # Confirmation message
+ # OR
+ "error": str # Invalid cache_type
+}
+```
+
+#### Example
+
+```python
+# Invalidate all caches
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::invalidate_cache",
+ tool_name="invalidate_cache",
+ parameters={"cache_type": "all"}
+ )
+])
+
+# Invalidate only controls cache
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::invalidate_cache",
+ tool_name="invalidate_cache",
+ parameters={"cache_type": "controls"}
+ )
+])
+```
+
+#### Cache Types
+
+| Type | Description |
+|------|-------------|
+| `"controls"` | UI controls list |
+| `"apps"` | Installed apps list |
+| `"ui_tree"` | UI hierarchy XML |
+| `"device_info"` | Device information |
+| `"all"` | All caches |
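+
+To make the cache types concrete, here is a small sketch of a per-type cache with time-based expiry and selective invalidation. It is a simplified stand-in for `MobileServerState`, whose real structure is not shown on this page; the class name `TTLCache` and the message strings are hypothetical.
+
+```python
+import time
+from typing import Any, Dict, Optional, Tuple
+
+CACHE_TYPES = ("controls", "apps", "ui_tree", "device_info")
+
+
+class TTLCache:
+    """Stores one value per cache type and expires it after a per-type TTL."""
+
+    def __init__(self, ttl_seconds: Dict[str, float]) -> None:
+        self._ttl = ttl_seconds
+        self._entries: Dict[str, Tuple[float, Any]] = {}
+
+    def put(self, cache_type: str, value: Any) -> None:
+        self._entries[cache_type] = (time.monotonic(), value)
+
+    def get(self, cache_type: str) -> Optional[Any]:
+        entry = self._entries.get(cache_type)
+        if entry is None:
+            return None
+        stored_at, value = entry
+        if time.monotonic() - stored_at > self._ttl[cache_type]:
+            # Expired: drop the entry so the next query hits the device.
+            del self._entries[cache_type]
+            return None
+        return value
+
+    def invalidate(self, cache_type: str = "all") -> Dict[str, Any]:
+        if cache_type == "all":
+            self._entries.clear()
+        elif cache_type in CACHE_TYPES:
+            self._entries.pop(cache_type, None)
+        else:
+            return {"success": False, "error": f"Invalid cache_type: {cache_type}"}
+        return {"success": True, "message": f"Invalidated cache: {cache_type}"}
+```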
+
+#### Use Cases
+
+- After manual device interaction (outside automation)
+- After app installation/uninstallation
+- When device state significantly changed
+- Before critical operations requiring fresh data
+
+---
+
+## Configuration
+
+### Client Configuration (UFO² Agent)
+
+```yaml
+# Option 1: Windows agent controlling an Android device through servers on the same machine
+MobileAgent:
+  default:
+    data_collection:
+      - namespace: MobileDataCollector
+        type: http
+        host: "localhost"  # Or remote machine IP
+        port: 8020
+        path: "/mcp"
+    action:
+      - namespace: MobileExecutor
+        type: http
+        host: "localhost"
+        port: 8021
+        path: "/mcp"
+
+# Option 2: servers running on a remote machine
+# (alternative to Option 1; keep only one MobileAgent block per config file)
+MobileAgent:
+  default:
+    data_collection:
+      - namespace: MobileDataCollector
+        type: http
+        host: "192.168.1.150"  # Android automation server
+        port: 8020
+        path: "/mcp"
+    action:
+      - namespace: MobileExecutor
+        type: http
+        host: "192.168.1.150"
+        port: 8021
+        path: "/mcp"
+```
+
+## Deployment
+
+### Prerequisites
+
+1. **ADB Installation**
+
+```bash
+# Windows (via Android SDK or standalone)
+# Download from: https://developer.android.com/studio/releases/platform-tools
+
+# Linux
+sudo apt-get install android-tools-adb
+
+# macOS
+brew install android-platform-tools
+```
+
+2. **Android Device Setup**
+
+- Enable USB debugging in Developer Options
+- Connect device via USB or Wi-Fi
+- Verify connection: `adb devices`
+
+```bash
+# Check connected devices
+adb devices
+
+# Output:
+# List of devices attached
+# R5CR20XXXXX device
+```
+
+### Starting the Servers
+
+```bash
+# Start both servers (recommended)
+python -m ufo.client.mcp.http_servers.mobile_mcp_server --server both --host 0.0.0.0 --data-port 8020 --action-port 8021
+
+# Output:
+# ==================================================
+# UFO Mobile MCP Servers (Android)
+# Android device control via ADB and Model Context Protocol
+# ==================================================
+# Using ADB: C:\...\adb.exe
+# Found 1 connected device(s)
+# ✅ Starting both servers in same process (shared MobileServerState)
+# - Data Collection Server: 0.0.0.0:8020
+# - Action Server: 0.0.0.0:8021
+# Both servers share MobileServerState cache. Press Ctrl+C to stop.
+
+# Start only data collection server
+python -m ufo.client.mcp.http_servers.mobile_mcp_server --server data --host 0.0.0.0 --data-port 8020
+
+# Start only action server
+python -m ufo.client.mcp.http_servers.mobile_mcp_server --server action --host 0.0.0.0 --action-port 8021
+```
+
+### Command-Line Arguments
+
+| Argument | Default | Description |
+|----------|---------|-------------|
+| `--server` | `both` | Which server(s): `data`, `action`, or `both` |
+| `--host` | `localhost` | Host to bind servers to |
+| `--data-port` | `8020` | Port for Data Collection Server |
+| `--action-port` | `8021` | Port for Action Server |
+| `--adb-path` | Auto-detect | Path to ADB executable |
+
+### ADB Path Detection
+
+The server auto-detects ADB from:
+
+1. Common installation paths:
+ - Windows: `C:\Users\{USER}\AppData\Local\Android\Sdk\platform-tools\adb.exe`
+ - Linux: `/usr/bin/adb`, `/usr/local/bin/adb`
+2. System PATH environment variable
+3. Fallback to `adb` command
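+
+The detection order above could be implemented along these lines. This is a sketch under assumptions: `find_adb` and `candidate_adb_paths` are hypothetical names, and the real server may check more locations.
+
+```python
+import os
+import shutil
+from typing import Iterable, List, Optional
+
+
+def candidate_adb_paths() -> List[str]:
+    """Common install locations, mirroring the list above."""
+    home = os.path.expanduser("~")
+    return [
+        os.path.join(home, "AppData", "Local", "Android", "Sdk",
+                     "platform-tools", "adb.exe"),
+        "/usr/bin/adb",
+        "/usr/local/bin/adb",
+    ]
+
+
+def find_adb(explicit_path: Optional[str] = None,
+             candidates: Optional[Iterable[str]] = None) -> str:
+    """Resolve the ADB executable: override, known paths, PATH, then bare `adb`."""
+    if explicit_path:
+        return explicit_path  # Equivalent of --adb-path
+    paths = candidate_adb_paths() if candidates is None else candidates
+    for path in paths:
+        if os.path.isfile(path):
+            return path
+    # shutil.which searches PATH; fall back to the bare command name.
+    return shutil.which("adb") or "adb"
+
+
+print(find_adb())
+```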
+
+Override with `--adb-path`:
+
+```bash
+python -m ufo.client.mcp.http_servers.mobile_mcp_server --adb-path "C:\custom\path\adb.exe"
+```
+
+### Network Configuration
+
+**Local Development:**
+```bash
+# Servers on same machine as client
+--host localhost
+```
+
+**Remote Access:**
+```bash
+# Servers accessible from network
+--host 0.0.0.0
+```
+
+**Security:** Use firewall rules to restrict access to trusted IPs.
+
+---
+
+## Best Practices
+
+### 1. Always Run Both Servers Together
+
+```bash
+# ✅ Good: Both servers in same process (shared state)
+python -m ufo.client.mcp.http_servers.mobile_mcp_server --server both
+
+# ❌ Bad: Separate processes (no shared state)
+python -m ufo.client.mcp.http_servers.mobile_mcp_server --server data &
+python -m ufo.client.mcp.http_servers.mobile_mcp_server --server action &
+```
+
+**Why:** A shared `MobileServerState` lets the action server access the controls cached by the data collection server.
+
+### 2. Get Controls Before Interaction
+
+```python
+# ✅ Good: Get controls first
+controls = await computer.run_data_collection([
+ MCPToolCall(tool_key="data_collection::get_app_window_controls_target_info", ...)
+])
+
+# Then click/type
+await computer.run_actions([
+ MCPToolCall(tool_key="action::click_control", parameters={"control_id": "5", ...})
+])
+
+# ❌ Bad: Click without getting controls
+await computer.run_actions([
+ MCPToolCall(tool_key="action::click_control", parameters={"control_id": "5", ...})
+])
+# Error: Control not found in cache
+```
+
+### 3. Use Control IDs, Not Coordinates
+
+```python
+# ✅ Good: Use click_control (reliable)
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::click_control",
+ parameters={"control_id": "3", "control_name": "Submit"}
+ )
+])
+
+# ⚠️ OK: Use tap only when no matching control is available
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::tap",
+ parameters={"x": 500, "y": 1200}
+ )
+])
+```
+
+### 4. Handle Cache Expiration
+
+```python
+# Routine reads: reuse cached controls if the cache is still valid
+controls = await computer.run_data_collection([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_controls_target_info",
+ parameters={"force_refresh": False} # Use cache if available
+ )
+])
+
+# For critical operations, force refresh
+controls = await computer.run_data_collection([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_controls_target_info",
+ parameters={"force_refresh": True} # Always query device
+ )
+])
+```
+
+### 5. Wait After Actions
+
+```python
+# ✅ Good: Wait for UI to settle
+await computer.run_actions([
+ MCPToolCall(tool_key="action::tap", parameters={"x": 500, "y": 1200})
+])
+await computer.run_actions([
+ MCPToolCall(tool_key="action::wait", parameters={"seconds": 1.0})
+])
+
+# Get updated controls
+controls = await computer.run_data_collection([
+ MCPToolCall(tool_key="data_collection::get_app_window_controls_target_info", ...)
+])
+```
+
+### 6. Validate ADB Connection
+
+```python
+# Check device info before operations
+device_info = await computer.run_data_collection([
+ MCPToolCall(tool_key="data_collection::get_device_info", parameters={})
+])
+
+if device_info[0].is_error:
+ raise RuntimeError("No Android device connected")
+```
+
+---
+
+## Use Cases
+
+### 1. App Automation
+
+```python
+# Launch app
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::launch_app",
+ tool_name="launch_app",
+ parameters={"package_name": "com.example.app"}
+ )
+])
+
+# Wait for app to load
+await computer.run_actions([
+ MCPToolCall(tool_key="action::wait", parameters={"seconds": 2.0})
+])
+
+# Get controls
+controls = await computer.run_data_collection([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_controls_target_info",
+ parameters={}
+ )
+])
+
+# Find and click button
+login_btn = next(c for c in controls[0].data if "Login" in c.name)
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::click_control",
+ parameters={
+ "control_id": login_btn.id,
+ "control_name": login_btn.name
+ }
+ )
+])
+```
+
+### 2. Form Filling
+
+```python
+# Get controls
+controls = await computer.run_data_collection([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_controls_target_info",
+ parameters={}
+ )
+])
+
+# Type username
+username_field = next(c for c in controls[0].data if "username" in c.name.lower())
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::type_text",
+ tool_name="type_text",
+ parameters={
+ "text": "john.doe@example.com",
+ "control_id": username_field.id,
+ "control_name": username_field.name,
+ "clear_current_text": True
+ }
+ )
+])
+
+# Get updated controls (after typing)
+await computer.run_actions([
+ MCPToolCall(tool_key="action::wait", parameters={"seconds": 0.5})
+])
+controls = await computer.run_data_collection([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_controls_target_info",
+ parameters={"force_refresh": True}
+ )
+])
+
+# Type password
+password_field = next(c for c in controls[0].data if "password" in c.name.lower())
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::type_text",
+ parameters={
+ "text": "SecureP@ssw0rd",
+ "control_id": password_field.id,
+ "control_name": password_field.name
+ }
+ )
+])
+
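+# Typing invalidated the controls cache, so refresh before looking up the button
+controls = await computer.run_data_collection([
+    MCPToolCall(
+        tool_key="data_collection::get_app_window_controls_target_info",
+        parameters={"force_refresh": True}
+    )
+])
+
+# Submit
+submit_btn = next(c for c in controls[0].data if "Submit" in c.name)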
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::click_control",
+ parameters={
+ "control_id": submit_btn.id,
+ "control_name": submit_btn.name
+ }
+ )
+])
+```
+
+### 3. Scrolling and Navigation
+
+```python
+# Swipe up to scroll down content
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::swipe",
+ tool_name="swipe",
+ parameters={
+ "start_x": 500,
+ "start_y": 1500,
+ "end_x": 500,
+ "end_y": 500,
+ "duration": 300
+ }
+ )
+])
+
+# Wait for scrolling to complete
+await computer.run_actions([
+ MCPToolCall(tool_key="action::wait", parameters={"seconds": 0.5})
+])
+
+# Get updated controls
+controls = await computer.run_data_collection([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_controls_target_info",
+ parameters={"force_refresh": True}
+ )
+])
+```
+
+### 4. Device Testing
+
+```python
+# Get device info
+device_info = await computer.run_data_collection([
+ MCPToolCall(tool_key="data_collection::get_device_info", parameters={})
+])
+
+print(f"Testing on: {device_info[0].data['device_info']['model']}")
+print(f"Android: {device_info[0].data['device_info']['android_version']}")
+
+# Take screenshot before test
+screenshot_before = await computer.run_data_collection([
+ MCPToolCall(tool_key="data_collection::capture_screenshot", parameters={})
+])
+
+# Perform test actions
+# ...
+
+# Take screenshot after test
+screenshot_after = await computer.run_data_collection([
+ MCPToolCall(tool_key="data_collection::capture_screenshot", parameters={})
+])
+
+# Compare screenshots (external comparison logic)
+```
+
+---
+
+## Comparison with Other Servers
+
+| Feature | MobileExecutor | HardwareExecutor (Robot Arm) | AppUIExecutor (Windows) |
+|---------|----------------|------------------------------|-------------------------|
+| **Platform** | Android (ADB) | Cross-platform (Hardware) | Windows (UIA) |
+| **Controls** | ✅ XML-based | ❌ Coordinate-based | ✅ UIA-based |
+| **Screenshots** | ✅ ADB screencap | ✅ Hardware camera | ✅ Windows API |
+| **Deployment** | HTTP (dual-server) | HTTP (single-server) | Local (in-process) |
+| **State Management** | ✅ Shared singleton | ❌ Stateless | ❌ No caching |
+| **App Launch** | ✅ Package manager | ❌ Manual | ✅ Process spawn |
+| **Text Input** | ✅ ADB input | ✅ HID keyboard | ✅ UIA SetValue |
+| **Cache** | ✅ 5s-5min TTL | ❌ No cache | ❌ No cache |
+
+---
+
+## Troubleshooting
+
+### ADB Connection Issues
+
+```bash
+# Restart ADB server
+adb kill-server
+adb start-server
+
+# Check device connection
+adb devices
+
+# If no devices shown:
+# 1. Check USB cable
+# 2. Verify USB debugging enabled on device
+# 3. Accept "Allow USB debugging" prompt on device
+```
+
+### Server Not Starting
+
+```bash
+# Check if ports are in use (Windows; on Linux/macOS pipe to `grep` instead of `findstr`)
+netstat -an | findstr "8020"
+netstat -an | findstr "8021"
+
+# Change ports if needed
+python -m ufo.client.mcp.http_servers.mobile_mcp_server --data-port 8030 --action-port 8031
+```
+
+### Controls Not Found
+
+```python
+# Force refresh cache
+controls = await computer.run_data_collection([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_controls_target_info",
+ parameters={"force_refresh": True}
+ )
+])
+
+# Or invalidate cache manually
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::invalidate_cache",
+ parameters={"cache_type": "controls"}
+ )
+])
+```
+
+### Text Input Fails
+
+```python
+# Ensure control is in cache
+controls = await computer.run_data_collection([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_controls_target_info",
+ parameters={}
+ )
+])
+
+# Verify control ID and name match
+field = next(c for c in controls[0].data if c.id == "5")
+print(f"Control name: {field.name}")
+
+# Use exact ID and name
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::type_text",
+ parameters={
+ "text": "test",
+ "control_id": field.id,
+ "control_name": field.name
+ }
+ )
+])
+```
+
+---
+
+## Related Documentation
+
+- [HardwareExecutor](./hardware_executor.md) - Hardware control (robot arm, mobile devices)
+- [BashExecutor](./bash_executor.md) - Linux command execution
+- [AppUIExecutor](./app_ui_executor.md) - Windows UI automation
+- [Remote Servers](../remote_servers.md) - HTTP deployment guide
+- [Action Servers](../action.md) - Action server concepts
+- [Data Collection Servers](../data_collection.md) - Data collection overview
diff --git a/documents/docs/mcp/servers/pdf_reader_executor.md b/documents/docs/mcp/servers/pdf_reader_executor.md
new file mode 100644
index 000000000..40b7da94e
--- /dev/null
+++ b/documents/docs/mcp/servers/pdf_reader_executor.md
@@ -0,0 +1,350 @@
+# PDFReaderExecutor Server
+
+## Overview
+
+**PDFReaderExecutor** provides PDF text extraction with optional human simulation capabilities.
+
+**Server Type:** Action
+**Deployment:** Local (in-process)
+**Agent:** AppAgent, HostAgent
+**LLM-Selectable:** ✅ Yes
+
+## Server Information
+
+| Property | Value |
+|----------|-------|
+| **Namespace** | `PDFReaderExecutor` |
+| **Server Name** | `UFO PDF Reader MCP Server` |
+| **Platform** | Cross-platform (Windows, Linux, macOS) |
+| **Dependencies** | PyPDF2 |
+| **Tool Type** | `action` |
+
+## Tools
+
+### extract_pdf_text
+
+Extract text content from a single PDF file with optional human simulation.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `pdf_path` | `str` | ✅ Yes | - | Full path to PDF file |
+| `simulate_human` | `bool` | No | `True` | Simulate human-like document review |
+
+#### Returns
+
+`str` - Extracted text content with page markers
+
+#### Human Simulation Behavior
+
+When `simulate_human=True`:
+1. Opens PDF with default application
+2. Waits 2-5 seconds (random) to simulate reading
+3. Extracts text with page-by-page delays (0.5-1.5 seconds)
+4. Closes PDF file
+
+When `simulate_human=False`:
+- Direct text extraction (no delays)
+- No application launching
+
+#### Example
+
+```python
+# With human simulation (default)
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::extract_pdf_text",
+ tool_name="extract_pdf_text",
+ parameters={
+ "pdf_path": "C:\\Documents\\report.pdf",
+ "simulate_human": True
+ }
+ )
+])
+
+# Fast extraction (no simulation)
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::extract_pdf_text",
+ tool_name="extract_pdf_text",
+ parameters={
+ "pdf_path": "C:\\Documents\\report.pdf",
+ "simulate_human": False
+ }
+ )
+])
+```
+
+#### Output Format
+
+```
+--- Page 1 ---
+This is the content of page 1.
+
+--- Page 2 ---
+This is the content of page 2.
+
+--- Page 3 ---
+This is the content of page 3.
+```
+
+#### Error Handling
+
+Returns error message string if:
+- File not found: `"Error: PDF file not found at {path}"`
+- Not a PDF: `"Error: File {path} is not a PDF file"`
+- Read error: `"Error reading PDF {path}: {details}"`
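+
+Because errors come back as plain strings rather than exceptions, callers need to tell an error message apart from extracted text before using it. The sketch below shows one way to do that and to split the text on the page markers from **Output Format**. `is_error_result` and `split_pages` are hypothetical helper names, not part of the server.
+
+```python
+import re
+from typing import Dict
+
+PAGE_MARKER = re.compile(r"^--- Page (\d+) ---$", re.MULTILINE)
+
+
+def is_error_result(extracted: str) -> bool:
+    """True when the tool returned an error string instead of PDF text."""
+    # Successful output starts with a page marker, so this prefix is unambiguous.
+    return extracted.startswith("Error")
+
+
+def split_pages(extracted: str) -> Dict[int, str]:
+    """Map page number -> page text, based on the '--- Page N ---' markers."""
+    parts = PAGE_MARKER.split(extracted)
+    # With one capture group, split yields [preamble, "1", text, "2", text, ...].
+    return {int(number): body.strip()
+            for number, body in zip(parts[1::2], parts[2::2])}
+
+
+sample = "--- Page 1 ---\nFirst page.\n\n--- Page 2 ---\nSecond page.\n"
+print(split_pages(sample))
+```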
+
+---
+
+### list_pdfs_in_directory
+
+List all PDF files in a specified directory.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `directory_path` | `str` | ✅ Yes | Directory path to scan |
+
+#### Returns
+
+`List[str]` - List of PDF file paths (sorted)
+
+#### Example
+
+```python
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::list_pdfs_in_directory",
+ tool_name="list_pdfs_in_directory",
+ parameters={"directory_path": "C:\\Documents\\Reports"}
+ )
+])
+
+# Output: [
+# "C:\\Documents\\Reports\\Q1_Report.pdf",
+# "C:\\Documents\\Reports\\Q2_Report.pdf",
+# "C:\\Documents\\Reports\\Q3_Report.pdf"
+# ]
+```
+
+#### Error Handling
+
+Returns empty list `[]` if:
+- Directory doesn't exist
+- Path is not a directory
+- No PDF files found
+
+---
+
+### extract_all_pdfs_text
+
+Extract text from all PDF files in a directory, with optional human simulation.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `directory_path` | `str` | ✅ Yes | - | Directory containing PDFs |
+| `simulate_human` | `bool` | No | `True` | Simulate human review for each PDF |
+
+#### Returns
+
+`Dict[str, str]` - Dictionary mapping filenames to extracted text
+
+#### Human Simulation Behavior
+
+When `simulate_human=True`:
+- Brief pause between files (1-3 seconds random)
+- Each PDF processed with human simulation
+- Progress messages logged
+
+#### Example
+
+```python
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::extract_all_pdfs_text",
+ tool_name="extract_all_pdfs_text",
+ parameters={
+ "directory_path": "C:\\Documents\\Reports",
+ "simulate_human": True
+ }
+ )
+])
+
+# Output: {
+# "Q1_Report.pdf": "--- Page 1 ---\nQ1 Sales Report\n...",
+# "Q2_Report.pdf": "--- Page 1 ---\nQ2 Sales Report\n...",
+# "Q3_Report.pdf": "--- Page 1 ---\nQ3 Sales Report\n..."
+# }
+```
+
+#### Error Handling
+
+Returns a dictionary with an `error` key (or a `message` key when the directory contains no PDFs) if:
+- Directory not found: `{"error": "Directory not found: {path}"}`
+- Not a directory: `{"error": "Path is not a directory: {path}"}`
+- No PDFs found: `{"message": "No PDF files found in directory: {path}"}`
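+
+Since failures are reported inside the returned dictionary, a caller has to inspect the keys before treating the result as a filename-to-text mapping. A minimal sketch, assuming filenames always end in `.pdf` so they cannot collide with the `error` and `message` keys (`classify_batch_result` is a hypothetical name):
+
+```python
+from typing import Dict, Tuple
+
+
+def classify_batch_result(result: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
+    """Return (status, texts) where status is 'error', 'empty', or 'ok'."""
+    if "error" in result:
+        return "error", {}
+    has_pdf_keys = any(key.lower().endswith(".pdf") for key in result)
+    if "message" in result and not has_pdf_keys:
+        # The directory exists but contains no PDF files.
+        return "empty", {}
+    return "ok", dict(result)
+
+
+status, texts = classify_batch_result({"Q1_Report.pdf": "--- Page 1 ---\nQ1"})
+print(status, list(texts))
+```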
+
+## Configuration
+
+```yaml
+AppAgent:
+  default:
+    action:
+      - namespace: PDFReaderExecutor
+        type: local
+
+HostAgent:
+  default:
+    action:
+      - namespace: PDFReaderExecutor
+        type: local
+```
+
+## Best Practices
+
+### 1. Disable Simulation for Batch Processing
+
+```python
+# ✅ Good: Fast batch processing
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::extract_all_pdfs_text",
+ tool_name="extract_all_pdfs_text",
+ parameters={
+ "directory_path": "C:\\Documents",
+ "simulate_human": False # Faster
+ }
+ )
+])
+
+# ❌ Bad: Slow with simulation
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::extract_all_pdfs_text",
+ tool_name="extract_all_pdfs_text",
+ parameters={
+ "directory_path": "C:\\Documents",
+ "simulate_human": True # 2-5 seconds per file
+ }
+ )
+])
+```
+
+### 2. Verify Files Exist
+
+```python
+# List PDFs first
+pdf_list = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::list_pdfs_in_directory",
+ parameters={"directory_path": "C:\\Documents"}
+ )
+])
+
+if pdf_list[0].data:
+ # Extract from first PDF
+ text = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::extract_pdf_text",
+ parameters={
+ "pdf_path": pdf_list[0].data[0],
+ "simulate_human": False
+ }
+ )
+ ])
+else:
+ logger.warning("No PDF files found")
+```
+
+### 3. Handle Large Documents
+
+```python
+# Extract text
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::extract_pdf_text",
+ parameters={"pdf_path": "large_document.pdf", "simulate_human": False}
+ )
+])
+
+text = result[0].data
+
+# Process in chunks if needed
+if len(text) > 100000:  # Large document
+    chunks = [text[i:i+50000] for i in range(0, len(text), 50000)]
+    for chunk in chunks:
+        process_chunk(chunk)
+```
+
+## Use Cases
+
+### Document Analysis Pipeline
+
+```python
+# 1. List all PDFs
+pdfs = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::list_pdfs_in_directory",
+ parameters={"directory_path": "C:\\Contracts"}
+ )
+])
+
+# 2. Extract text from each
+for pdf_path in pdfs[0].data:
+ text = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::extract_pdf_text",
+ parameters={"pdf_path": pdf_path, "simulate_human": False}
+ )
+ ])
+
+ # 3. Analyze text
+ analyze_contract(text[0].data)
+```
+
+### Batch Report Processing
+
+```python
+# Extract all reports at once
+reports = await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::extract_all_pdfs_text",
+ tool_name="extract_all_pdfs_text",
+ parameters={
+ "directory_path": "C:\\Reports\\2024",
+ "simulate_human": False
+ }
+ )
+])
+
+# Process all reports
+for filename, content in reports[0].data.items():
+ logger.info(f"Processing {filename}")
+ # Extract data from content
+ data = extract_report_data(content)
+```
+
+## Limitations
+
+- **Text-only**: Cannot extract images or formatting
+- **OCR not supported**: Scanned PDFs with no text layer return empty text
+- **Table parsing**: Complex tables may not preserve structure
+- **No modification**: Read-only operations (cannot edit PDFs)
+
+## Performance
+
+| Operation | simulate_human=True | simulate_human=False |
+|-----------|---------------------|----------------------|
+| Single PDF (10 pages) | ~10-20 seconds | ~1 second |
+| Batch 10 PDFs | ~2-3 minutes | ~10 seconds |
+| Large PDF (100 pages) | ~2-5 minutes | ~5-10 seconds |
+
+## Related Documentation
+
+- [Action Servers](../action.md) - Action server concepts
+- [Local Servers](../local_servers.md) - Local deployment
diff --git a/documents/docs/mcp/servers/ppt_com_executor.md b/documents/docs/mcp/servers/ppt_com_executor.md
new file mode 100644
index 000000000..910428a6c
--- /dev/null
+++ b/documents/docs/mcp/servers/ppt_com_executor.md
@@ -0,0 +1,324 @@
+# PowerPointCOMExecutor Server
+
+## Overview
+
+**PowerPointCOMExecutor** provides Microsoft PowerPoint automation via COM API for efficient presentation manipulation.
+
+**Server Type:** Action
+**Deployment:** Local (in-process)
+**Agent:** AppAgent
+**Target Application:** Microsoft PowerPoint (`POWERPNT.EXE`)
+**LLM-Selectable:** ✅ Yes
+
+## Server Information
+
+| Property | Value |
+|----------|-------|
+| **Namespace** | `PowerPointCOMExecutor` |
+| **Platform** | Windows |
+| **Requires** | Microsoft PowerPoint (COM interface) |
+| **Tool Type** | `action` |
+
+## Tools
+
+### set_background_color
+
+Set the background color for one or more slides in the presentation.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `color` | `str` | ✅ Yes | - | Hex color code (RGB format, e.g., `"FFFFFF"`) |
+| `slide_index` | `List[int]` | No | `None` | List of slide indices (1-based). `None` = all slides |
+
+#### Returns
+
+`str` - Success/failure message
+
+#### Example
+
+```python
+# Set white background for slide 1
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::set_background_color",
+ tool_name="set_background_color",
+ parameters={
+ "color": "FFFFFF",
+ "slide_index": [1]
+ }
+ )
+])
+
+# Set blue background for slides 1, 3, 5
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::set_background_color",
+ tool_name="set_background_color",
+ parameters={
+ "color": "0000FF",
+ "slide_index": [1, 3, 5]
+ }
+ )
+])
+
+# Set red background for ALL slides
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::set_background_color",
+ tool_name="set_background_color",
+ parameters={
+ "color": "FF0000",
+ "slide_index": None # All slides
+ }
+ )
+])
+```
+
+#### Color Format
+
+Use 6-character hex RGB codes (without `#`):
+
+| Color | Hex Code |
+|-------|----------|
+| White | `FFFFFF` |
+| Black | `000000` |
+| Red | `FF0000` |
+| Green | `00FF00` |
+| Blue | `0000FF` |
+| Yellow | `FFFF00` |
+| Gray | `808080` |
+
+---
+
+### save_as
+
+Save or export the PowerPoint presentation in the specified format.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `file_dir` | `str` | No | `""` | Directory path |
+| `file_name` | `str` | No | `""` | Filename without extension |
+| `file_ext` | `str` | No | `""` | Extension (default: `.pptx`) |
+| `current_slide_only` | `bool` | No | `False` | Image formats only: `True` saves just the current slide, `False` saves every slide |
+
+#### Supported Extensions
+
+**Presentation Formats**:
+- `.pptx` - PowerPoint presentation (default)
+- `.ppt` - PowerPoint 97-2003
+- `.pdf` - PDF format
+
+**Image Formats** (controlled by `current_slide_only`):
+- `.jpg`, `.jpeg` - JPEG image
+- `.png` - PNG image
+- `.gif` - GIF image
+- `.bmp` - Bitmap image
+- `.tiff` - TIFF image
+
+#### Returns
+
+`str` - Success/failure message
+
+#### Example
+
+```python
+# Save as PPTX
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::save_as",
+ tool_name="save_as",
+ parameters={
+ "file_dir": "C:\\Presentations",
+ "file_name": "Q4_Report",
+ "file_ext": ".pptx"
+ }
+ )
+])
+
+# Export as PDF
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::save_as",
+ tool_name="save_as",
+ parameters={
+ "file_ext": ".pdf"
+ }
+ )
+])
+
+# Save current slide as PNG
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::save_as",
+ tool_name="save_as",
+ parameters={
+ "file_name": "slide_1",
+ "file_ext": ".png",
+ "current_slide_only": True
+ }
+ )
+])
+
+# Export all slides as PNG images (creates directory)
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::save_as",
+ tool_name="save_as",
+ parameters={
+ "file_dir": "C:\\Exports\\Slides",
+ "file_ext": ".png",
+ "current_slide_only": False # Saves all slides
+ }
+ )
+])
+```
+
+#### Image Export Behavior
+
+| `current_slide_only` | Behavior |
+|----------------------|----------|
+| `True` | Single image file of current slide |
+| `False` | Directory containing multiple image files (one per slide) |
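+
+Because `current_slide_only` only matters for image formats, it can help to build the `parameters` dictionary in one place and reject unsupported extensions before the call is made. A minimal sketch, assuming the extension lists documented above are exhaustive; `build_save_as_parameters` is a hypothetical helper, not part of the server:
+
+```python
+from typing import Any, Dict
+
+# Extension sets copied from the "Supported Extensions" section above.
+PRESENTATION_EXTENSIONS = {".pptx", ".ppt", ".pdf"}
+IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff"}
+
+
+def build_save_as_parameters(
+    file_dir: str = "",
+    file_name: str = "",
+    file_ext: str = "",
+    current_slide_only: bool = False,
+) -> Dict[str, Any]:
+    """Build a parameters dict for save_as, validating the extension first."""
+    ext = file_ext.lower()
+    if ext and ext not in PRESENTATION_EXTENSIONS | IMAGE_EXTENSIONS:
+        raise ValueError(f"unsupported extension: {file_ext!r}")
+    parameters: Dict[str, Any] = {
+        "file_dir": file_dir,
+        "file_name": file_name,
+        "file_ext": ext,
+    }
+    # The flag is only meaningful for image exports, so omit it otherwise.
+    if ext in IMAGE_EXTENSIONS:
+        parameters["current_slide_only"] = current_slide_only
+    return parameters
+```
+
+The result can be passed as `parameters=` in an `MCPToolCall` for `action::save_as`.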
+
+## Configuration
+
+```yaml
+AppAgent:
+ POWERPNT.EXE:
+ action:
+ - namespace: AppUIExecutor
+ type: local
+ - namespace: PowerPointCOMExecutor
+ type: local
+ reset: true # Recommended
+```
+
+## Best Practices
+
+### 1. Bulk Background Setting
+
+```python
+# ✅ Good: Set multiple slides at once
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::set_background_color",
+ parameters={"color": "FFFFFF", "slide_index": [1, 2, 3, 4, 5]}
+ )
+])
+
+# ❌ Bad: One call per slide
+for i in range(1, 6):
+ await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::set_background_color",
+ parameters={"color": "FFFFFF", "slide_index": [i]}
+ )
+ ])
+```
+
+### 2. Use save_as for Exports
+
+```python
+# ✅ Good: Fast one-command export
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::save_as",
+ parameters={"file_ext": ".pdf"}
+ )
+])
+
+# ❌ Bad: Manual UI navigation
+await computer.run_actions([
+ MCPToolCall(tool_key="action::keyboard_input", parameters={"keys": "{VK_MENU}f"}) # Alt+F
+])
+# ... navigate File menu ...
+```
+
+### 3. Verify Hex Colors
+
+```python
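+import re
+
+def validate_hex_color(color: str) -> bool:
+    """Return True if color is a 6-character hex RGB code without '#'."""
+    # fullmatch rejects a trailing newline, which match() with '$' would accept
+    return bool(re.fullmatch(r'[0-9A-Fa-f]{6}', color))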
+
+color = "FFFFFF"
+if validate_hex_color(color):
+ await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::set_background_color",
+ parameters={"color": color, "slide_index": [1]}
+ )
+ ])
+```
+
+## Use Cases
+
+### Presentation Branding
+
+```python
+# Apply company color scheme
+brand_color = "003366" # Company blue
+
+# Set all slides to brand background
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::set_background_color",
+ tool_name="set_background_color",
+ parameters={
+ "color": brand_color,
+ "slide_index": None # All slides
+ }
+ )
+])
+
+# Save as PDF for distribution
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::save_as",
+ tool_name="save_as",
+ parameters={
+ "file_dir": "C:\\Distribution",
+ "file_name": "Company_Presentation",
+ "file_ext": ".pdf"
+ }
+ )
+])
+```
+
+### Slide Export for Documentation
+
+```python
+# Export each slide as PNG for documentation
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::save_as",
+ tool_name="save_as",
+ parameters={
+ "file_dir": "C:\\Docs\\Images",
+ "file_name": "presentation_slides",
+ "file_ext": ".png",
+ "current_slide_only": False # Export all
+ }
+ )
+])
+```
+
+## Limitations
+
+- **Limited tool set**: Only 2 tools (background color and save)
+- **No content creation**: Cannot add text, shapes, or images via COM (use UI automation)
+- **No slide management**: Cannot add/delete/reorder slides (use UI automation)
+
+**Tip:** Combine with **AppUIExecutor** for full PowerPoint automation:
+- **PowerPointCOMExecutor**: Background colors, export
+- **AppUIExecutor**: Add slides, insert text, shapes, animations
+
+## Related Documentation
+
+- [WordCOMExecutor](./word_com_executor.md) - Word COM automation
+- [ExcelCOMExecutor](./excel_com_executor.md) - Excel COM automation
+- [AppUIExecutor](./app_ui_executor.md) - UI-based PowerPoint automation
diff --git a/documents/docs/mcp/servers/ui_collector.md b/documents/docs/mcp/servers/ui_collector.md
new file mode 100644
index 000000000..7f2ba8bd4
--- /dev/null
+++ b/documents/docs/mcp/servers/ui_collector.md
@@ -0,0 +1,566 @@
+# UICollector Server
+
+## Overview
+
+**UICollector** is a data collection MCP server that provides comprehensive UI observation and information retrieval capabilities for the UFO² framework. It automatically gathers screenshots, window lists, control information, and UI trees to build the observation context for LLM decision-making.
+
+**Server Type:** Data Collection
+**Deployment:** Local (in-process)
+**Agent:** HostAgent, AppAgent
+**LLM-Selectable:** ❌ No (automatically invoked by framework)
+
+## Server Information
+
+| Property | Value |
+|----------|-------|
+| **Namespace** | `UICollector` |
+| **Server Name** | `UFO UI Data MCP Server` |
+| **Platform** | Windows |
+| **Backend** | UIAutomation (UIA) or Win32 |
+| **Tool Type** | `data_collection` |
+| **Tool Key Format** | `data_collection::{tool_name}` |
+
+## Tools
+
+### 1. get_desktop_app_info
+
+Get information about all application windows currently open on the desktop.
+
+#### Description
+
+Retrieves a list of all visible application windows on the Windows desktop, including window names, types, and identifiers. This is typically the first step in UI automation workflows to discover available applications.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `remove_empty` | `bool` | No | `True` | Whether to remove windows with no visible content |
+| `refresh_app_windows` | `bool` | No | `True` | Whether to refresh the list of application windows |
+
+#### Returns
+
+**Type**: `List[Dict[str, Any]]`
+
+List of window information dictionaries, each containing:
+
+```python
+{
+ "id": str, # Unique window identifier (e.g., "1", "2", "3")
+ "name": str, # Window title/text
+ "type": str, # Control type (e.g., "Window", "Pane")
+ "kind": str # Target kind: "window"
+}
+```
+
+#### Example
+
+```python
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="data_collection::get_desktop_app_info",
+ tool_name="get_desktop_app_info",
+ parameters={
+ "remove_empty": True,
+ "refresh_app_windows": True
+ }
+ )
+])
+
+# Example output:
+[
+ {
+ "id": "1",
+ "name": "Visual Studio Code",
+ "type": "Window",
+ "kind": "window"
+ },
+ {
+ "id": "2",
+ "name": "Microsoft Edge",
+ "type": "Window",
+ "kind": "window"
+ }
+]
+```
+
+---
+
+### 2. get_desktop_app_target_info
+
+Get comprehensive target information for all desktop application windows.
+
+#### Description
+
+Similar to `get_desktop_app_info`, but returns `TargetInfo` objects instead of plain dictionaries. This provides a more structured representation of window information for internal framework use.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `remove_empty` | `bool` | No | `True` | Whether to remove windows with no visible content |
+| `refresh_app_windows` | `bool` | No | `True` | Whether to refresh the list of application windows |
+
+#### Returns
+
+**Type**: `List[TargetInfo]`
+
+List of `TargetInfo` objects with properties:
+- `id`: Unique identifier
+- `name`: Window title
+- `type`: Control type
+- `kind`: TargetKind.WINDOW
+
+---
+
+### 3. get_app_window_info
+
+Get detailed information about the currently selected application window.
+
+#### Description
+
+Retrieves specific fields of information for the active/selected window. You must select a window using `select_application_window` (HostUIExecutor) before calling this tool.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `field_list` | `List[str]` | Yes | - | List of field names to retrieve |
+
+#### Supported Fields
+
+Common fields include:
+- `"control_text"`: Window title/text
+- `"control_type"`: Control type (e.g., "Window")
+- `"control_rect"`: Bounding rectangle coordinates
+- `"process_id"`: Process ID
+- `"class_name"`: Window class name
+- `"is_visible"`: Visibility status
+- `"is_enabled"`: Enabled status
+
+#### Returns
+
+**Type**: `Dict[str, Any]`
+
+Dictionary mapping field names to their values.
+
+#### Example
+
+```python
+# First select a window
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::select_application_window",
+ parameters={"id": "1", "name": "Calculator"}
+ )
+])
+
+# Then get window info
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_info",
+ tool_name="get_app_window_info",
+ parameters={
+ "field_list": ["control_text", "control_type", "control_rect"]
+ }
+ )
+])
+
+# Example output:
+{
+ "control_text": "Calculator",
+ "control_type": "Window",
+ "control_rect": {"x": 100, "y": 100, "width": 400, "height": 600}
+}
+```
+
+---
+
+### 4. get_app_window_controls_info
+
+Get information about all UI controls in the selected application window.
+
+#### Description
+
+Scans the currently selected window and retrieves information about all interactive controls (buttons, text boxes, etc.). This is essential for understanding what actions can be performed on the window.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `field_list` | `List[str]` | Yes | - | List of field names to retrieve for each control |
+
+#### Supported Fields
+
+- `"label"`: Control identifier/label
+- `"control_text"`: Text content of the control
+- `"control_type"`: Type of control (Button, Edit, etc.)
+- `"control_rect"`: Bounding rectangle
+- `"is_enabled"`: Whether control is enabled
+- `"is_visible"`: Whether control is visible
+
+#### Returns
+
+**Type**: `List[Dict[str, Any]]`
+
+List of dictionaries, each representing one UI control.
+
+#### Example
+
+```python
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_controls_info",
+ tool_name="get_app_window_controls_info",
+ parameters={
+ "field_list": ["label", "control_text", "control_type"]
+ }
+ )
+])
+
+# Example output:
+[
+ {
+ "label": "1",
+ "control_text": "Submit",
+ "control_type": "Button"
+ },
+ {
+ "label": "2",
+ "control_text": "",
+ "control_type": "Edit"
+ }
+]
+```
+
+---
+
+### 5. get_app_window_controls_target_info
+
+Get `TargetInfo` objects for all controls in the selected window.
+
+#### Description
+
+Similar to `get_app_window_controls_info`, but returns structured `TargetInfo` objects for internal framework use.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `field_list` | `List[str]` | Yes | - | List of field names to retrieve |
+
+#### Returns
+
+**Type**: `List[TargetInfo]`
+
+List of `TargetInfo` objects, each with:
+- `kind`: TargetKind.CONTROL
+- `id`: Control identifier
+- `name`: Control text
+- `type`: Control type
+- `rect`: Bounding rectangle
+- `source`: "uia"
+
+---
+
+### 6. capture_window_screenshot
+
+Capture a screenshot of the currently selected application window.
+
+#### Description
+
+Takes a screenshot of the active window and returns it as base64-encoded image data. This is crucial for visual observation and LLM vision capabilities.
+
+#### Parameters
+
+None
+
+#### Returns
+
+**Type**: `str`
+
+Base64-encoded PNG image data.
+
+#### Example
+
+```python
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="data_collection::capture_window_screenshot",
+ tool_name="capture_window_screenshot",
+ parameters={}
+ )
+])
+
+# Result is base64 string: "iVBORw0KGgoAAAANSUhEUgAA..."
+```
+
+#### Error Handling
+
+Returns an error message string if screenshot capture fails:
+```
+"Error: No window selected"
+"Error capturing screenshot: {error_details}"
+```
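+
+Since the return value is either base64 image data or an error string, a caller has to tell the two apart before decoding. A minimal sketch, assuming error strings start with `Error` as in the two messages above and that the payload is plain base64 with no data-URL prefix; `decode_screenshot` is a hypothetical helper, not part of UICollector:
+
+```python
+import base64
+
+# Every PNG file starts with this fixed 8-byte signature.
+PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
+
+
+def decode_screenshot(payload: str) -> bytes:
+    """Return raw PNG bytes, or raise if the payload is an error message."""
+    if payload.startswith("Error"):
+        raise RuntimeError(payload)
+    image_bytes = base64.b64decode(payload)
+    if not image_bytes.startswith(PNG_SIGNATURE):
+        raise ValueError("payload did not decode to PNG data")
+    return image_bytes
+```
+
+The decoded bytes can be written to a `.png` file or handed to an image library for inspection.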
+
+---
+
+### 7. capture_desktop_screenshot
+
+Capture a screenshot of the entire desktop or primary screen.
+
+#### Description
+
+Takes a screenshot of the desktop environment, either all monitors or just the primary screen.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `all_screens` | `bool` | No | `True` | Capture all screens (True) or primary screen only (False) |
+
+#### Returns
+
+**Type**: `str`
+
+Base64-encoded PNG image data of the desktop screenshot.
+
+#### Example
+
+```python
+# Capture all screens
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="data_collection::capture_desktop_screenshot",
+ tool_name="capture_desktop_screenshot",
+ parameters={"all_screens": True}
+ )
+])
+
+# Capture primary screen only
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="data_collection::capture_desktop_screenshot",
+ tool_name="capture_desktop_screenshot",
+ parameters={"all_screens": False}
+ )
+])
+```
+
+---
+
+### 8. get_ui_tree
+
+Get the complete UI tree structure for the selected window.
+
+#### Description
+
+Retrieves the hierarchical structure of all UI elements in the window as a tree. This provides deep insight into the window's layout and control relationships.
+
+#### Parameters
+
+None
+
+#### Returns
+
+**Type**: `Dict[str, Any]`
+
+UI tree structure as a nested dictionary representing the control hierarchy.
+
+#### Example
+
+```python
+result = await computer.run_actions([
+ MCPToolCall(
+ tool_key="data_collection::get_ui_tree",
+ tool_name="get_ui_tree",
+ parameters={}
+ )
+])
+
+# Example output (simplified):
+{
+ "control_type": "Window",
+ "name": "Calculator",
+ "children": [
+ {
+ "control_type": "Pane",
+ "name": "Display",
+ "children": [...]
+ },
+ {
+ "control_type": "Button",
+ "name": "1"
+ }
+ ]
+}
+```
+
+#### Error Handling
+
+Returns an error dictionary if UI tree extraction fails:
+```python
+{"error": "No window selected"}
+{"error": "Error getting UI tree: {details}"}
+```
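+
+Once the tree is available, a common task is finding every control of a given type. A minimal sketch, assuming the nested shape shown in the simplified example output (`control_type`, `name`, and an optional `children` list) and the `error` key shown above; the real tree may carry more fields, and `find_controls` is a hypothetical helper:
+
+```python
+from typing import Any, Dict, List
+
+
+def find_controls(tree: Dict[str, Any], control_type: str) -> List[str]:
+    """Return the names of all nodes of the given control type, in tree order."""
+    if "error" in tree:
+        raise RuntimeError(tree["error"])
+    names: List[str] = []
+    stack = [tree]
+    while stack:
+        node = stack.pop()
+        if node.get("control_type") == control_type:
+            names.append(node.get("name", ""))
+        # Push children in reverse so they are visited in their original order.
+        stack.extend(reversed(node.get("children", [])))
+    return names
+```
+
+An explicit stack is used instead of recursion so that deeply nested windows do not hit Python's recursion limit.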
+
+## Configuration
+
+### Basic Configuration
+
+```yaml
+HostAgent:
+ default:
+ data_collection:
+ - namespace: UICollector
+ type: local
+ reset: false
+
+AppAgent:
+ default:
+ data_collection:
+ - namespace: UICollector
+ type: local
+ reset: false
+```
+
+### Configuration Options
+
+| Option | Type | Description |
+|--------|------|-------------|
+| `namespace` | `str` | Must be `"UICollector"` |
+| `type` | `str` | Deployment type: `"local"` |
+| `reset` | `bool` | Whether to reset server state between tasks |
+
+## Internal State
+
+The UICollector maintains shared state across operations:
+
+- **photographer**: Screenshot capture facade
+- **control_inspector**: UI control inspection facade
+- **selected_app_window**: Currently selected window (set by HostUIExecutor)
+- **last_app_windows**: Cached list of desktop windows
+- **control_dict**: Dictionary mapping control IDs to control objects
+
+## Usage Patterns
+
+### Pattern 1: Complete Desktop Observation
+
+```python
+# 1. Get all windows
+windows = await computer.run_actions([
+ MCPToolCall(tool_key="data_collection::get_desktop_app_info", ...)
+])
+
+# 2. Capture desktop screenshot
+screenshot = await computer.run_actions([
+ MCPToolCall(tool_key="data_collection::capture_desktop_screenshot", ...)
+])
+
+# 3. Select target window
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::select_application_window",
+ parameters={"id": "1", "name": "Calculator"}
+ )
+])
+
+# 4. Get window controls
+controls = await computer.run_actions([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_controls_info",
+ parameters={"field_list": ["label", "control_text", "control_type"]}
+ )
+])
+```
+
+### Pattern 2: Window-Specific Observation
+
+```python
+# After window is selected by HostUIExecutor...
+
+# Get window info
+window_info = await computer.run_actions([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_info",
+ parameters={"field_list": ["control_text", "control_rect"]}
+ )
+])
+
+# Get window screenshot
+screenshot = await computer.run_actions([
+ MCPToolCall(tool_key="data_collection::capture_window_screenshot", ...)
+])
+
+# Get UI controls
+controls = await computer.run_actions([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_controls_info",
+ parameters={"field_list": ["label", "control_text"]}
+ )
+])
+```
+
+## Best Practices
+
+### 1. Caching Window Lists
+
+```python
+# First call: refresh windows
+windows = await computer.run_actions([
+ MCPToolCall(
+ tool_key="data_collection::get_desktop_app_info",
+ parameters={"refresh_app_windows": True}
+ )
+])
+
+# Subsequent calls: use cached data
+windows = await computer.run_actions([
+ MCPToolCall(
+ tool_key="data_collection::get_desktop_app_info",
+ parameters={"refresh_app_windows": False} # Faster
+ )
+])
+```
+
+### 2. Selective Field Retrieval
+
+```python
+# ✅ Good: Only request needed fields
+controls = await computer.run_actions([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_controls_info",
+ parameters={"field_list": ["label", "control_text"]}
+ )
+])
+
+# ❌ Bad: Requesting fields you don't need
+controls = await computer.run_actions([
+ MCPToolCall(
+ tool_key="data_collection::get_app_window_controls_info",
+ parameters={"field_list": [
+ "label", "control_text", "control_type", "control_rect",
+ "is_visible", "is_enabled", "automation_id", "class_name"
+ ]} # Too many fields slow down processing
+ )
+])
+```
+
+### 3. Error Handling
+
+```python
+# Always check for window selection
+window_info = await computer.run_actions([
+ MCPToolCall(tool_key="data_collection::get_app_window_info", ...)
+])
+
+if "error" in window_info[0].content[0].text:
+    # No window is selected yet: select one with
+    # action::select_application_window, then retry the call.
+    pass
+```
+
+## Related Documentation
+
+- [Data Collection Overview](../data_collection.md) - Data collection concepts
+- [HostUIExecutor](./host_ui_executor.md) - Window selection server
+- [AppUIExecutor](./app_ui_executor.md) - UI action execution
+- [Local Servers](../local_servers.md) - Local server deployment
diff --git a/documents/docs/mcp/servers/word_com_executor.md b/documents/docs/mcp/servers/word_com_executor.md
new file mode 100644
index 000000000..1743d04be
--- /dev/null
+++ b/documents/docs/mcp/servers/word_com_executor.md
@@ -0,0 +1,385 @@
+# WordCOMExecutor Server
+
+## Overview
+
+**WordCOMExecutor** provides Microsoft Word automation via the COM API for efficient document manipulation beyond UI automation.
+
+**Server Type:** Action
+**Deployment:** Local (in-process)
+**Agent:** AppAgent
+**Target Application:** Microsoft Word (`WINWORD.EXE`)
+**LLM-Selectable:** ✅ Yes
+
+## Server Information
+
+| Property | Value |
+|----------|-------|
+| **Namespace** | `WordCOMExecutor` |
+| **Server Name** | `UFO UI AppAgent Action MCP Server` |
+| **Platform** | Windows |
+| **Requires** | Microsoft Word (COM interface) |
+| **Tool Type** | `action` |
+
+## Tools Summary
+
+| Tool Name | Description |
+|-----------|-------------|
+| `insert_table` | Insert table into document |
+| `select_text` | Select specific text |
+| `select_table` | Select table by index |
+| `select_paragraph` | Select paragraph range |
+| `save_as` | Save/export document |
+| `set_font` | Set font properties for selected text |
+
+## Tool Details
+
+### insert_table
+
+Insert a table into the Word document at the current cursor position.
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `rows` | `int` | ✅ Yes | Number of rows in the table |
+| `columns` | `int` | ✅ Yes | Number of columns in the table |
+
+#### Returns
+
+`str` - Result message
+
+#### Example
+
+```python
+# Insert 3x4 table
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::insert_table",
+ tool_name="insert_table",
+ parameters={"rows": 3, "columns": 4}
+ )
+])
+```
+
+---
+
+### select_text
+
+Select exact text in the document for further operations (formatting, deletion, etc.).
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `text` | `str` | ✅ Yes | Exact text to select |
+
+#### Returns
+
+`str` - Selected text if successful, or "text not found" message
+
+#### Example
+
+```python
+# Select specific text
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::select_text",
+ tool_name="select_text",
+ parameters={"text": "Annual Report 2024"}
+ )
+])
+
+# Then format it
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::set_font",
+ tool_name="set_font",
+ parameters={"font_name": "Arial", "font_size": 18}
+ )
+])
+```
+
+---
+
+### select_table
+
+Select a table in the document by its index (1-based).
+
+#### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `number` | `int` | ✅ Yes | Table index (1-based) |
+
+#### Returns
+
+`str` - Success message or "out of range" message
+
+#### Example
+
+```python
+# Select first table
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::select_table",
+ tool_name="select_table",
+ parameters={"number": 1}
+ )
+])
+```
+
+---
+
+### select_paragraph
+
+Select a range of paragraphs in the document.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `start_index` | `int` | ✅ Yes | - | Start paragraph index |
+| `end_index` | `int` | ✅ Yes | - | End paragraph index (`-1` = end of document) |
+| `non_empty` | `bool` | No | `True` | Select only non-empty paragraphs |
+
+#### Returns
+
+`str` - Result message
+
+#### Example
+
+```python
+# Select paragraphs 1-5 (non-empty only)
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::select_paragraph",
+ tool_name="select_paragraph",
+ parameters={
+ "start_index": 1,
+ "end_index": 5,
+ "non_empty": True
+ }
+ )
+])
+
+# Select from paragraph 10 to end
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::select_paragraph",
+ tool_name="select_paragraph",
+ parameters={"start_index": 10, "end_index": -1}
+ )
+])
+```
+
+---
+
+### save_as
+
+Save or export the Word document in the specified format. **Fastest way to save documents.**
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `file_dir` | `str` | No | `""` | Directory path (empty = current directory) |
+| `file_name` | `str` | No | `""` | Filename without extension (empty = current name) |
+| `file_ext` | `str` | No | `""` | File extension (empty = `.pdf`) |
+
+#### Supported Extensions
+
+- `.pdf` - PDF format (default)
+- `.docx` - Word document
+- `.txt` - Plain text
+- `.html` - HTML format
+- `.rtf` - Rich Text Format
+
+#### Returns
+
+`str` - Success/failure message
+
+#### Example
+
+```python
+# Save as PDF in current directory
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::save_as",
+ tool_name="save_as",
+ parameters={
+ "file_dir": "",
+ "file_name": "",
+ "file_ext": "" # Defaults to .pdf
+ }
+ )
+])
+
+# Save as DOCX with specific name and path
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::save_as",
+ tool_name="save_as",
+ parameters={
+ "file_dir": "C:\\Documents\\Reports",
+ "file_name": "Q4_Report_2024",
+ "file_ext": ".docx"
+ }
+ )
+])
+```
+
+---
+
+### set_font
+
+Set font properties for currently selected text.
+
+!!!warning "Selection Required"
+ Text must be selected first using `select_text`, `select_paragraph`, or manual selection.
+
+#### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `font_name` | `str` | No | `None` | Font name (e.g., "Arial", "Times New Roman", "宋体") |
+| `font_size` | `int` | No | `None` | Font size in points |
+
+#### Returns
+
+`str` - Font change confirmation or "no text selected" message
+
+#### Example
+
+```python
+# Select text first
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::select_text",
+ parameters={"text": "Important Notice"}
+ )
+])
+
+# Set font to Arial 16pt
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::set_font",
+ tool_name="set_font",
+ parameters={"font_name": "Arial", "font_size": 16}
+ )
+])
+
+# Change only size
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::set_font",
+ tool_name="set_font",
+ parameters={"font_size": 20} # Keep current font name
+ )
+])
+```
+
+## Configuration
+
+```yaml
+AppAgent:
+ # Word-specific configuration
+ WINWORD.EXE:
+ action:
+ - namespace: AppUIExecutor
+ type: local
+ - namespace: WordCOMExecutor # Add COM automation
+ type: local
+ reset: true # Reset COM state when switching documents
+```
+
+### Configuration Options
+
+| Option | Value | Description |
+|--------|-------|-------------|
+| `reset` | `true` | **Recommended**: Reset COM state between documents to prevent data leakage |
+| `reset` | `false` | Keep COM state across documents (faster but risky) |
+
+## Best Practices
+
+### 1. Use COM for Bulk Operations
+
+```python
+# ✅ Good: Fast COM API
+await computer.run_actions([
+ MCPToolCall(tool_key="action::insert_table", parameters={"rows": 10, "columns": 5})
+])
+
+# ❌ Bad: Slow UI automation
+for i in range(10):
+ await computer.run_actions([
+ MCPToolCall(tool_key="action::click_input", ...) # Click Insert Table
+ ])
+```
+
+### 2. Prefer save_as Over Manual Saving
+
+```python
+# ✅ Good: One command
+await computer.run_actions([
+ MCPToolCall(
+ tool_key="action::save_as",
+ parameters={"file_dir": "C:\\Reports", "file_name": "report", "file_ext": ".pdf"}
+ )
+])
+
+# ❌ Bad: Multiple UI steps
+await computer.run_actions([
+ MCPToolCall(tool_key="action::keyboard_input", parameters={"keys": "{VK_CONTROL}s"})
+])
+# ... then navigate save dialog ...
+```
+
+### 3. Select Before Formatting
+
+```python
+# ✅ Good: Select then format
+await computer.run_actions([
+ MCPToolCall(tool_key="action::select_text", parameters={"text": "Title"})
+])
+await computer.run_actions([
+ MCPToolCall(tool_key="action::set_font", parameters={"font_size": 24})
+])
+
+# ❌ Bad: Format without selection
+await computer.run_actions([
+ MCPToolCall(tool_key="action::set_font", parameters={"font_size": 24})
+]) # Fails: "no text selected"
+```
+
+## Use Cases
+
+### Document Report Generation
+
+```python
+# 1. Select title
+await computer.run_actions([
+ MCPToolCall(tool_key="action::select_paragraph", parameters={"start_index": 1, "end_index": 1})
+])
+
+# 2. Format title
+await computer.run_actions([
+ MCPToolCall(tool_key="action::set_font", parameters={"font_name": "Arial", "font_size": 20})
+])
+
+# 3. Insert data table
+await computer.run_actions([
+ MCPToolCall(tool_key="action::insert_table", parameters={"rows": 5, "columns": 3})
+])
+
+# 4. Save as PDF
+await computer.run_actions([
+ MCPToolCall(tool_key="action::save_as", parameters={"file_ext": ".pdf"})
+])
+```
+
+## Related Documentation
+
+- [ExcelCOMExecutor](./excel_com_executor.md) - Excel COM automation
+- [PowerPointCOMExecutor](./ppt_com_executor.md) - PowerPoint COM automation
+- [AppUIExecutor](./app_ui_executor.md) - UI-based Word automation
+- [Action Servers](../action.md) - Action server concepts
diff --git a/documents/docs/mobile/as_galaxy_device.md b/documents/docs/mobile/as_galaxy_device.md
new file mode 100644
index 000000000..05d0f561f
--- /dev/null
+++ b/documents/docs/mobile/as_galaxy_device.md
@@ -0,0 +1,698 @@
+# Using Mobile Agent as Galaxy Device
+
+Configure Mobile Agent as a sub-agent in UFO's Galaxy framework to enable cross-platform, multi-device task orchestration. Galaxy can coordinate Mobile agents alongside Windows and Linux devices to execute complex workflows spanning multiple systems and platforms.
+
+> **📖 Prerequisites:**
+>
+> Before configuring Mobile Agent in Galaxy, ensure you have:
+>
+> - Completed the [Mobile Agent Quick Start Guide](../getting_started/quick_start_mobile.md) - Learn how to set up server, MCP services, and client
+> - Read the [Mobile Agent Overview](overview.md) - Understand Mobile Agent's design and capabilities
+> - Reviewed the [Galaxy Overview](../galaxy/overview.md) - Understand multi-device orchestration
+
+## Overview
+
+The **Galaxy framework** provides multi-tier orchestration capabilities, allowing you to manage multiple device agents (Windows, Linux, Android, etc.) from a central ConstellationAgent. When configured as a Galaxy device, MobileAgent becomes a **sub-agent** that can:
+
+- Execute Android-specific subtasks assigned by Galaxy
+- Participate in cross-platform workflows (e.g., Windows + Android + Linux collaboration)
+- Report execution status back to the orchestrator
+- Be dynamically selected based on capabilities and metadata
+
+For detailed information about MobileAgent's design and capabilities, see [Mobile Agent Overview](overview.md).
+
+## Galaxy Architecture with Mobile Agent
+
+```mermaid
+graph TB
+ User[User Request]
+ Galaxy[Galaxy ConstellationAgent Orchestrator]
+
+ subgraph "Device Pool"
+ Win1[Windows Device 1 HostAgent]
+ Linux1[Linux Agent 1 CLI Executor]
+ Mobile1[Mobile Agent 1 Android Phone]
+ Mobile2[Mobile Agent 2 Android Tablet]
+ Mobile3[Mobile Agent 3 Android Emulator]
+ end
+
+ User -->|Complex Task| Galaxy
+ Galaxy -->|Windows Subtask| Win1
+ Galaxy -->|Linux Subtask| Linux1
+ Galaxy -->|Mobile Subtask| Mobile1
+ Galaxy -->|Mobile Subtask| Mobile2
+ Galaxy -->|Mobile Subtask| Mobile3
+
+ style Galaxy fill:#ffe1e1
+ style Mobile1 fill:#c8e6c9
+ style Mobile2 fill:#c8e6c9
+ style Mobile3 fill:#c8e6c9
+```
+
+Galaxy orchestrates:
+
+- **Task decomposition** - Break complex requests into platform-specific subtasks
+- **Device selection** - Choose appropriate devices based on capabilities
+- **Parallel execution** - Execute subtasks concurrently across devices
+- **Result aggregation** - Combine results from all devices
+
+---
+
+## Configuration Guide
+
+### Step 1: Configure Device in `devices.yaml`
+
+Add your Mobile agent(s) to the device list in `config/galaxy/devices.yaml`:
+
+#### Example Configuration
+
+```yaml
+devices:
+  - device_id: "mobile_phone_1"
+    server_url: "ws://192.168.1.100:5001/ws"
+    os: "mobile"
+    capabilities:
+      - "mobile"
+      - "android"
+      - "messaging"
+      - "camera"
+      - "location"
+    metadata:
+      os: "mobile"
+      device_type: "phone"
+      android_version: "13"
+      screen_size: "1080x2400"
+      installed_apps:
+        - "com.google.android.apps.maps"
+        - "com.whatsapp"
+        - "com.android.chrome"
+      description: "Personal Android phone for mobile tasks"
+    auto_connect: true
+    max_retries: 5
+```
+
+### Step 2: Understanding Configuration Fields
+
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `device_id` | ✅ Yes | string | **Unique identifier** - must match client `--client-id` |
+| `server_url` | ✅ Yes | string | WebSocket URL - must match server endpoint |
+| `os` | ✅ Yes | string | Operating system - set to `"mobile"` |
+| `capabilities` | ❌ Optional | list | Skills/capabilities for task routing |
+| `metadata` | ❌ Optional | dict | Custom context for LLM-based task execution |
+| `auto_connect` | ❌ Optional | boolean | Auto-connect on Galaxy startup (default: `true`) |
+| `max_retries` | ❌ Optional | integer | Connection retry attempts (default: `5`) |
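+
+The field table above can be turned into a small pre-flight check. The following is a minimal sketch, not Galaxy's own loader: `normalize_device` is a hypothetical helper, and the empty defaults for `capabilities` and `metadata` are assumptions (the table only documents defaults for `auto_connect` and `max_retries`).
+
+```python
+from typing import Any
+
+REQUIRED_FIELDS = ("device_id", "server_url", "os")
+# auto_connect and max_retries defaults come from the table above;
+# the empty capabilities/metadata defaults are assumptions for this sketch.
+DEFAULTS = {"capabilities": [], "metadata": {}, "auto_connect": True, "max_retries": 5}
+
+
+def normalize_device(entry: dict[str, Any]) -> dict[str, Any]:
+    """Check required fields and fill in defaults for optional ones."""
+    missing = [name for name in REQUIRED_FIELDS if not entry.get(name)]
+    if missing:
+        raise ValueError(f"device entry is missing required fields: {missing}")
+    if not entry["server_url"].startswith(("ws://", "wss://")):
+        raise ValueError(f"server_url must be a WebSocket URL: {entry['server_url']}")
+    # Copy mutable defaults so entries never share the same list or dict.
+    filled = {
+        key: (value.copy() if hasattr(value, "copy") else value)
+        for key, value in DEFAULTS.items()
+    }
+    filled.update(entry)
+    return filled
+
+
+device = normalize_device({
+    "device_id": "mobile_phone_1",
+    "server_url": "ws://192.168.1.100:5001/ws",
+    "os": "mobile",
+})
+print(device["auto_connect"], device["max_retries"])
+```
+
+Running a check like this before starting Galaxy surfaces a missing `device_id` or a malformed `server_url` early, instead of as a connection failure later.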
+
+### Step 3: Capabilities-Based Task Routing
+
+Galaxy uses the `capabilities` field to intelligently route subtasks to appropriate devices. Define capabilities based on device features, installed apps, or task types.
+
+#### Example Capability Configurations
+
+**Personal Phone:**
+```yaml
+capabilities:
+ - "mobile"
+ - "android"
+ - "messaging"
+ - "whatsapp"
+ - "maps"
+ - "camera"
+ - "location"
+```
+
+**Work Phone:**
+```yaml
+capabilities:
+ - "mobile"
+ - "android"
+ - "email"
+ - "calendar"
+ - "office_apps"
+ - "vpn"
+```
+
+**Testing Emulator:**
+```yaml
+capabilities:
+ - "mobile"
+ - "android"
+ - "testing"
+ - "automation"
+ - "screenshots"
+```
+
+**Tablet:**
+```yaml
+capabilities:
+ - "mobile"
+ - "android"
+ - "tablet"
+ - "large_screen"
+ - "media"
+ - "reading"
+```
+
+### Step 4: Metadata for Contextual Execution
+
+The `metadata` field provides contextual information that the LLM uses when generating actions for the Mobile agent.
+
+#### Metadata Examples
+
+**Personal Phone Metadata:**
+```yaml
+metadata:
+ os: "mobile"
+ device_type: "phone"
+ android_version: "13"
+ sdk_version: "33"
+ screen_size: "1080x2400"
+ screen_density: "420"
+ installed_apps:
+ - "com.google.android.apps.maps"
+ - "com.whatsapp"
+ - "com.android.chrome"
+ - "com.spotify.music"
+ contacts:
+ - "John Doe"
+ - "Jane Smith"
+ description: "Personal Android phone with social and navigation apps"
+```
+
+**Work Device Metadata:**
+```yaml
+metadata:
+ os: "mobile"
+ device_type: "phone"
+ android_version: "12"
+ screen_size: "1080x2340"
+ installed_apps:
+ - "com.microsoft.office.outlook"
+ - "com.microsoft.teams"
+ - "com.slack"
+ vpn_configured: true
+ email_accounts:
+ - "work@company.com"
+ description: "Work phone with corporate apps and VPN"
+```
+
+**Testing Emulator Metadata:**
+```yaml
+metadata:
+ os: "mobile"
+ device_type: "emulator"
+ android_version: "14"
+ sdk_version: "34"
+ screen_size: "1080x1920"
+ installed_apps:
+ - "com.example.testapp"
+ adb_over_network: true
+ description: "Android emulator for app testing"
+```
+
+#### How Metadata is Used
+
+The LLM receives metadata in the system prompt, enabling context-aware action generation:
+
+- **App Availability**: LLM knows which apps can be launched
+- **Screen Size**: Informs swipe distances and touch coordinates
+- **Android Version**: Affects available features and UI patterns
+- **Device Type**: Phone vs tablet affects UI layout
+- **Custom Fields**: Any additional context you provide
+
+**Example**: With the personal phone metadata above, when the user requests "Navigate to restaurant", the LLM knows Maps is installed and can generate `launch_app(package_name="com.google.android.apps.maps")`.
+
+---
+
+## Multi-Device Configuration Example
+
+### Complete Galaxy Setup
+
+```yaml
+devices:
+  # Windows Desktop Agent
+  - device_id: "windows_desktop_1"
+    server_url: "ws://192.168.1.101:5000/ws"
+    os: "windows"
+    capabilities:
+      - "office_applications"
+      - "email"
+      - "web_browsing"
+    metadata:
+      os: "windows"
+      description: "Office productivity workstation"
+    auto_connect: true
+    max_retries: 5
+
+  # Linux Server Agent
+  - device_id: "linux_server_1"
+    server_url: "ws://192.168.1.102:5001/ws"
+    os: "linux"
+    capabilities:
+      - "server"
+      - "database"
+      - "api"
+    metadata:
+      os: "linux"
+      description: "Backend server"
+    auto_connect: true
+    max_retries: 5
+
+  # Personal Android Phone
+  - device_id: "mobile_phone_personal"
+    server_url: "ws://192.168.1.103:5002/ws"
+    os: "mobile"
+    capabilities:
+      - "mobile"
+      - "android"
+      - "messaging"
+      - "whatsapp"
+      - "maps"
+      - "camera"
+    metadata:
+      os: "mobile"
+      device_type: "phone"
+      android_version: "13"
+      screen_size: "1080x2400"
+      installed_apps:
+        - "com.google.android.apps.maps"
+        - "com.whatsapp"
+        - "com.android.chrome"
+      description: "Personal phone with social apps"
+    auto_connect: true
+    max_retries: 5
+
+  # Work Android Phone
+  - device_id: "mobile_phone_work"
+    server_url: "ws://192.168.1.104:5003/ws"
+    os: "mobile"
+    capabilities:
+      - "mobile"
+      - "android"
+      - "email"
+      - "calendar"
+      - "teams"
+    metadata:
+      os: "mobile"
+      device_type: "phone"
+      android_version: "12"
+      screen_size: "1080x2340"
+      installed_apps:
+        - "com.microsoft.office.outlook"
+        - "com.microsoft.teams"
+      description: "Work phone with corporate apps"
+    auto_connect: true
+    max_retries: 5
+
+  # Android Tablet
+  - device_id: "mobile_tablet_home"
+    server_url: "ws://192.168.1.105:5004/ws"
+    os: "mobile"
+    capabilities:
+      - "mobile"
+      - "android"
+      - "tablet"
+      - "media"
+      - "reading"
+    metadata:
+      os: "mobile"
+      device_type: "tablet"
+      android_version: "13"
+      screen_size: "2560x1600"
+      installed_apps:
+        - "com.netflix.mediaclient"
+        - "com.google.android.youtube"
+      description: "Tablet for media and entertainment"
+    auto_connect: true
+    max_retries: 5
+```
+
+---
+
+## Starting Galaxy with Mobile Agents
+
+### Prerequisites
+
+Ensure all components are running before starting Galaxy:
+
+1. ✅ Device Agent Servers running on all machines
+2. ✅ Device Agent Clients connected to their respective servers
+3. ✅ MCP Services running (both data collection and action servers)
+4. ✅ ADB accessible and Android devices connected
+5. ✅ USB debugging enabled on all Android devices
+6. ✅ LLM configured in `config/ufo/agents.yaml` or `config/galaxy/agent.yaml`
+
+### Launch Sequence
+
+#### Step 1: Prepare Android Devices
+
+```bash
+# Check ADB connection to all devices
+adb devices
+
+# Expected output:
+# List of devices attached
+# 192.168.1.103:5555 device
+# 192.168.1.104:5555 device
+# emulator-5554 device
+```
+
+**For Physical Devices:**
+1. Enable USB debugging in Developer Options
+2. Connect via USB or wireless ADB
+3. Accept ADB debugging prompt on device
+
+**For Emulators:**
+1. Start Android emulator
+2. ADB connects automatically
+
+#### Step 2: Start Device Agent Servers
+
+```bash
+# On machine hosting personal phone agent (192.168.1.103)
+python -m ufo.server.app --port 5002 --platform mobile
+
+# On machine hosting work phone agent (192.168.1.104)
+python -m ufo.server.app --port 5003 --platform mobile
+
+# On machine hosting tablet agent (192.168.1.105)
+python -m ufo.server.app --port 5004 --platform mobile
+```
+
+#### Step 3: Start MCP Servers for Each Device
+
+```bash
+# On machine hosting personal phone
+python -m ufo.client.mcp.http_servers.mobile_mcp_server \
+ --host localhost \
+ --data-port 8020 \
+ --action-port 8021 \
+ --server both
+
+# On machine hosting work phone
+python -m ufo.client.mcp.http_servers.mobile_mcp_server \
+ --host localhost \
+ --data-port 8022 \
+ --action-port 8023 \
+ --server both
+
+# On machine hosting tablet
+python -m ufo.client.mcp.http_servers.mobile_mcp_server \
+ --host localhost \
+ --data-port 8024 \
+ --action-port 8025 \
+ --server both
+```
+
+#### Step 4: Start Mobile Clients
+
+```bash
+# Personal phone client
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://192.168.1.103:5002/ws \
+ --client-id mobile_phone_personal \
+ --platform mobile
+
+# Work phone client
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://192.168.1.104:5003/ws \
+ --client-id mobile_phone_work \
+ --platform mobile
+
+# Tablet client
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://192.168.1.105:5004/ws \
+ --client-id mobile_tablet_home \
+ --platform mobile
+```
+
+#### Step 5: Launch Galaxy
+
+```bash
+# On your control machine (interactive mode)
+python -m galaxy --interactive
+```
+
+**Or launch with a specific request:**
+
+```bash
+python -m galaxy "Your cross-device task description here"
+```
+
+Galaxy will automatically connect to all configured devices and display the orchestration interface.
+
+---
+
+## Example Multi-Device Workflows
+
+### Workflow 1: Cross-Platform Productivity
+
+**User Request:**
+> "Get my meeting notes from email on work phone, summarize them on desktop, and send summary to team via WhatsApp on personal phone"
+
+**Galaxy Orchestration:**
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant Galaxy
+ participant WorkPhone as Work Phone (Android)
+ participant Desktop as Windows Desktop
+ participant PersonalPhone as Personal Phone (Android)
+
+ User->>Galaxy: Request meeting workflow
+ Galaxy->>Galaxy: Decompose task
+
+ Note over Galaxy,WorkPhone: Subtask 1: Get notes from email
+ Galaxy->>WorkPhone: "Open Outlook and find meeting notes"
+ WorkPhone->>WorkPhone: Launch Outlook app
+ WorkPhone->>WorkPhone: Navigate to inbox
+ WorkPhone->>WorkPhone: Find meeting email
+ WorkPhone->>WorkPhone: Extract notes text
+ WorkPhone-->>Galaxy: Notes content
+
+ Note over Galaxy,Desktop: Subtask 2: Summarize on desktop
+ Galaxy->>Desktop: "Summarize meeting notes"
+ Desktop->>Desktop: Open Word
+ Desktop->>Desktop: Paste notes
+ Desktop->>Desktop: Generate summary
+ Desktop-->>Galaxy: Summary document
+
+ Note over Galaxy,PersonalPhone: Subtask 3: Send via WhatsApp
+ Galaxy->>PersonalPhone: "Send summary to team on WhatsApp"
+ PersonalPhone->>PersonalPhone: Launch WhatsApp
+ PersonalPhone->>PersonalPhone: Select team group
+ PersonalPhone->>PersonalPhone: Type summary message
+ PersonalPhone->>PersonalPhone: Send message
+ PersonalPhone-->>Galaxy: Message sent
+
+ Galaxy-->>User: Workflow completed
+```
+
+### Workflow 2: Mobile Testing Across Devices
+
+**User Request:**
+> "Test the new app on phone, tablet, and emulator, capture screenshots of each screen"
+
+**Galaxy Orchestration:**
+
+1. **Mobile Phone**: Install app, navigate through screens, capture screenshots
+2. **Mobile Tablet**: Install app (tablet layout), navigate screens, capture screenshots
+3. **Mobile Emulator**: Install app, run automated test suite, capture screenshots
+4. **Windows Desktop**: Aggregate screenshots, generate test report
+
+### Workflow 3: Location-Based Multi-Device Task
+
+**User Request:**
+> "Find nearest coffee shops on phone, book table using tablet, add calendar event on work phone"
+
+**Galaxy Orchestration:**
+
+1. **Personal Phone**: Launch Maps, search "coffee shops near me", get results
+2. **Tablet**: Open booking app, select coffee shop, book table
+3. **Work Phone**: Open Calendar, create event with location and time
+4. **Galaxy**: Aggregate confirmations and notify user
+
+---
+
+## Task Assignment Behavior
+
+### How Galaxy Routes Tasks to Mobile Agents
+
+Galaxy's ConstellationAgent uses several factors to select the appropriate mobile device for each subtask:
+
+| Factor | Description | Example |
+|--------|-------------|---------|
+| **Capabilities** | Match subtask requirements to device capabilities | `"messaging"` → Personal phone |
+| **OS Requirement** | Platform-specific tasks routed to correct OS | Mobile tasks → Mobile agents |
+| **Metadata Context** | Use device-specific apps and configurations | WhatsApp task → Device with WhatsApp |
+| **Device Type** | Phone vs tablet for different UI requirements | Media viewing → Tablet |
+| **Device Status** | Only assign to online, healthy devices | Skip offline or failing devices |
+| **Load Balancing** | Distribute tasks across similar devices | Round-robin across phones |
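+
+To make the Capabilities and Device Status rows concrete, here is a deliberately simplified sketch. It is an illustration only: the ConstellationAgent's actual selection logic is not shown in this document, and `select_device` and the `status` key on each entry are hypothetical names chosen for the example.
+
+```python
+from typing import Optional
+
+
+def select_device(devices: list[dict], required: set[str]) -> Optional[dict]:
+    """Return the first online device whose capabilities cover `required`."""
+    for device in devices:
+        if device.get("status") != "online":
+            continue  # Device Status: skip offline or failing devices
+        if required <= set(device.get("capabilities", [])):
+            return device  # Capabilities: every required tag is present
+    return None
+
+
+pool = [
+    {"device_id": "mobile_phone_personal", "status": "online",
+     "capabilities": ["mobile", "android", "messaging", "whatsapp", "maps"]},
+    {"device_id": "mobile_phone_work", "status": "online",
+     "capabilities": ["mobile", "android", "email", "calendar", "teams"]},
+    {"device_id": "mobile_tablet_home", "status": "offline",
+     "capabilities": ["mobile", "android", "tablet", "media", "reading"]},
+]
+
+chosen = select_device(pool, {"android", "calendar"})
+print(chosen["device_id"] if chosen else None)
+```
+
+A subset test keeps the rule easy to reason about: a device qualifies only if it advertises every capability the subtask needs, which is why specific tags such as `"whatsapp"` or `"calendar"` are worth adding alongside generic ones like `"mobile"`.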
+
+### Example Task Decomposition
+
+**User Request:**
+> "Check messages on personal phone, review calendar on work phone, and play video on tablet"
+
+**Galaxy Decomposition:**
+
+```yaml
+Task 1:
+ Description: "Check messages on WhatsApp"
+ Target: mobile_phone_personal
+ Reason: Has "whatsapp" capability and personal messaging apps
+
+Task 2:
+ Description: "Review today's calendar events"
+ Target: mobile_phone_work
+ Reason: Has "calendar" capability and work email/calendar
+
+Task 3:
+ Description: "Play video on YouTube"
+ Target: mobile_tablet_home
+ Reason: Has "media" capability and larger screen suitable for video
+```
+
+---
+
+## Critical Configuration Requirements
+
+!!!danger "Configuration Validation"
+    Ensure all of the following hold, or Galaxy cannot control the device:
+
+    - **Device ID**: `device_id` in `devices.yaml` must match `--client-id` in the client command
+    - **Server URL**: `server_url` in `devices.yaml` must match `--ws-server` in the client command
+    - **Platform**: The client command must include `--platform mobile`
+    - **ADB Access**: The Android device must be accessible via ADB
+    - **MCP Servers**: Both the data collection and action servers must be running
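+
+The first three requirements are simple string comparisons, so they can be checked before launch. This is a minimal sketch under the assumption that you have the `devices.yaml` entry as a dict and the client's command-line values at hand; `find_mismatches` is a hypothetical helper, not part of UFO.
+
+```python
+def find_mismatches(device: dict, client_id: str, ws_server: str, platform: str) -> list[str]:
+    """Compare a devices.yaml entry with the values passed to the client command."""
+    problems = []
+    if device["device_id"] != client_id:
+        problems.append(
+            f"device_id {device['device_id']!r} does not match --client-id {client_id!r}"
+        )
+    if device["server_url"] != ws_server:
+        problems.append(
+            f"server_url {device['server_url']!r} does not match --ws-server {ws_server!r}"
+        )
+    if platform != "mobile":
+        problems.append(f"--platform is {platform!r}, expected 'mobile'")
+    return problems
+
+
+entry = {"device_id": "mobile_phone_personal", "server_url": "ws://192.168.1.103:5002/ws"}
+issues = find_mismatches(
+    entry,
+    client_id="mobile_phone_personal",
+    ws_server="ws://192.168.1.103:5002/ws",
+    platform="mobile",
+)
+print(issues or "configuration is consistent")
+```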
+
+---
+
+## Monitoring & Debugging
+
+### Verify Device Registration
+
+**Check Galaxy device pool:**
+
+```bash
+curl http://<galaxy-host>:5000/api/devices
+```
+
+**Expected response:**
+
+```json
+{
+ "devices": [
+ {
+ "device_id": "mobile_phone_personal",
+ "os": "mobile",
+ "status": "online",
+ "capabilities": ["mobile", "android", "messaging", "whatsapp", "maps"]
+ },
+ {
+ "device_id": "mobile_phone_work",
+ "os": "mobile",
+ "status": "online",
+ "capabilities": ["mobile", "android", "email", "calendar", "teams"]
+ }
+ ]
+}
+```
+
+### View Task Assignments
+
+Galaxy logs show task routing decisions:
+
+```log
+INFO - [Galaxy] Task decomposition: 3 subtasks created
+INFO - [Galaxy] Subtask 1 → mobile_phone_personal (capability match: messaging)
+INFO - [Galaxy] Subtask 2 → mobile_phone_work (capability match: calendar)
+INFO - [Galaxy] Subtask 3 → mobile_tablet_home (capability match: media)
+```
+
+### Troubleshooting Device Connection
+
+**Issue**: Mobile agent not appearing in Galaxy device pool
+
+**Diagnosis:**
+
+1. **Check ADB connection:**
+    ```bash
+    adb devices
+    ```
+
+2. **Verify client connection:**
+    ```bash
+    curl http://192.168.1.103:5002/api/clients
+    ```
+
+3. **Check `devices.yaml` configuration** matches client parameters
+
+4. **Review Galaxy logs** for connection errors
+
+5. **Ensure `auto_connect: true`** in `devices.yaml`
+
+6. **Check MCP servers** are running:
+    ```bash
+    curl http://localhost:8020/health  # Data collection server
+    curl http://localhost:8021/health  # Action server
+    ```
+
+---
+
+## Mobile-Specific Considerations
+
+### Screenshot Capture for Galaxy
+
+Mobile agents automatically capture screenshots during execution, which Galaxy can:
+
+- Display in orchestration UI
+- Include in execution reports
+- Use for debugging failed tasks
+- Share with other agents for context
+
+### Touch Coordinates Across Devices
+
+Different Android devices have different screen sizes and densities. Galaxy handles this by:
+
+- Using control IDs instead of absolute coordinates
+- Having each mobile agent handle device-specific coordinate calculations
+- Storing device resolution in metadata for reference
+
+### App Availability
+
+Galaxy can query `installed_apps` from metadata to:
+
+- Route tasks to devices with required apps
+- Skip devices missing necessary apps
+- Suggest app installation when needed
+
+---
+
+## Related Documentation
+
+- [Mobile Agent Overview](overview.md) - Architecture and design principles
+- [Mobile Agent Commands](commands.md) - MCP tools for device interaction
+- [Galaxy Overview](../galaxy/overview.md) - Multi-device orchestration framework
+- [Galaxy Quick Start](../getting_started/quick_start_galaxy.md) - Galaxy deployment guide
+- [Constellation Orchestrator](../galaxy/constellation_orchestrator/overview.md) - Task orchestration
+- [Galaxy Devices Configuration](../configuration/system/galaxy_devices.md) - Complete device configuration reference
+
+---
+
+## Summary
+
+Using Mobile Agent as a Galaxy device enables sophisticated multi-device orchestration:
+
+- **Cross-Platform Workflows**: Seamlessly combine Android, Windows, and Linux tasks
+- **Capability-Based Routing**: Galaxy selects the right device for each subtask
+- **Visual Context**: Screenshots provide rich execution tracing
+- **Parallel Execution**: Multiple mobile devices work concurrently
+- **Metadata-Aware**: LLM uses device-specific context (installed apps, screen size, etc.)
+- **Robust Caching**: Efficient ADB usage through smart caching strategies
+
+With Mobile Agent in Galaxy, you can orchestrate complex workflows spanning mobile apps, desktop applications, and server systems from a single unified interface.
diff --git a/documents/docs/mobile/commands.md b/documents/docs/mobile/commands.md
new file mode 100644
index 000000000..9c6eae475
--- /dev/null
+++ b/documents/docs/mobile/commands.md
@@ -0,0 +1,1006 @@
+# MobileAgent MCP Commands
+
+MobileAgent interacts with Android devices through MCP (Model Context Protocol) tools provided by two specialized MCP servers. These tools provide atomic building blocks for mobile task execution, isolating device-specific operations within the MCP server layer.
+
+> **📖 Related Documentation:**
+>
+> - [Mobile Agent Overview](overview.md) - Architecture and core responsibilities
+> - [State Machine](state.md) - FSM states and transitions
+> - [Processing Strategy](strategy.md) - How commands are orchestrated in the 4-phase pipeline
+> - [Quick Start Guide](../getting_started/quick_start_mobile.md) - Set up MCP servers for your device
+
+## Command Architecture
+
+### Dual MCP Server Design
+
+MobileAgent uses two separate MCP servers for different responsibilities:
+
+```mermaid
+graph LR
+ A[MobileAgent] --> B[Command Dispatcher]
+ B --> C[Data Collection Server Port 8020]
+ B --> D[Action Server Port 8021]
+
+ C --> E[ADB Commands screencap, uiautomator, pm list]
+ D --> F[ADB Commands input tap/swipe/text, monkey]
+
+ E --> G[Android Device]
+ F --> G
+
+ C -.Shared State.-> H[MobileServerState Singleton]
+ D -.Shared State.-> H
+```
+
+**Why Two Servers?**
+
+- **Separation of Concerns**: Data retrieval vs. device control
+- **Performance**: Data collection can cache aggressively, actions invalidate caches
+- **Security**: Different tools can have different permission levels
+- **Scalability**: Servers can run on different hosts if needed
+
+**Shared State**: Both servers share a singleton `MobileServerState` for:
+
+- Caching control information (5-second TTL)
+- Caching installed apps (5-minute TTL)
+- Caching the UI tree (5-second TTL)
+- Coordinating cache invalidation after actions
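+
+The caching behaviour listed above can be sketched as a small TTL cache with explicit invalidation. This is an illustration of the pattern only: `TTLCache` and `FakeClock` are hypothetical names, and the real `MobileServerState` may be structured differently.
+
+```python
+import time
+from typing import Any, Callable, Optional
+
+
+class TTLCache:
+    """Hold one value that expires after `ttl_seconds` or on invalidate()."""
+
+    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
+        self._ttl = ttl_seconds
+        self._clock = clock
+        self._value: Any = None
+        self._stored_at: Optional[float] = None
+
+    def set(self, value: Any) -> None:
+        self._value = value
+        self._stored_at = self._clock()
+
+    def get(self) -> Optional[Any]:
+        if self._stored_at is None:
+            return None
+        if self._clock() - self._stored_at >= self._ttl:
+            self.invalidate()  # entry is too old
+            return None
+        return self._value
+
+    def invalidate(self) -> None:
+        self._value = None
+        self._stored_at = None
+
+
+class FakeClock:
+    """Controllable clock so the example does not need to sleep."""
+
+    def __init__(self) -> None:
+        self.now = 0.0
+
+    def __call__(self) -> float:
+        return self.now
+
+
+clock = FakeClock()
+controls_cache = TTLCache(ttl_seconds=5, clock=clock)
+controls_cache.set([{"id": "1", "name": "Search"}])
+clock.now = 3.0
+fresh = controls_cache.get()         # still within the 5-second TTL
+controls_cache.invalidate()          # what an action such as tap would trigger
+after_action = controls_cache.get()  # cache is empty again
+print(fresh, after_action)
+```
+
+Short TTLs bound how stale a UI snapshot can get, while invalidation after actions covers the common case where the screen changes well before the TTL runs out.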
+
+### Command Dispatcher
+
+The command dispatcher routes commands to the appropriate MCP server:
+
+```python
+from aip.messages import Command
+
+# Create data collection command
+command = Command(
+ tool_name="capture_screenshot",
+ parameters={},
+ tool_type="data_collection"
+)
+
+# Execute command via dispatcher
+results = await command_dispatcher.execute_commands([command])
+screenshot_url = results[0].result
+```
+
+---
+
+## Data Collection Server Tools (Port 8020)
+
+The Data Collection Server provides read-only tools for gathering device information.
+
+### 1. capture_screenshot - Capture Device Screenshot
+
+**Purpose**: Take a screenshot of the Android device and return it as a base64-encoded PNG data URI.
+
+#### Tool Specification
+
+```python
+tool_name = "capture_screenshot"
+parameters = {} # No parameters required
+```
+
+#### Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant Agent
+ participant MCP
+ participant ADB
+ participant Device
+
+ Agent->>MCP: capture_screenshot()
+ MCP->>ADB: screencap -p /sdcard/screen_temp.png
+ ADB->>Device: Execute screenshot
+ Device-->>ADB: Screenshot saved
+
+ ADB->>Device: pull /sdcard/screen_temp.png
+ Device-->>ADB: PNG file
+
+ MCP->>MCP: Encode to base64
+ MCP->>ADB: rm /sdcard/screen_temp.png
+ MCP-->>Agent: data:image/png;base64,...
+```
+
+#### Result Format
+
+```python
+# Direct base64 data URI string (not a dict)
+"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAA..."
+```
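+
+Because the result is a plain data URI string, turning it back into PNG bytes needs only the standard library. The sketch below assumes the `data:image/png;base64,` prefix shown above; `decode_data_uri` is a hypothetical helper, and the sample payload is just the 8-byte PNG signature rather than a real screenshot.
+
+```python
+import base64
+
+PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
+
+
+def decode_data_uri(data_uri: str) -> bytes:
+    """Split off the data-URI header and decode the base64 payload."""
+    header, _, payload = data_uri.partition(",")
+    if not header.startswith("data:image/png") or not header.endswith(";base64"):
+        raise ValueError("expected a base64-encoded PNG data URI")
+    return base64.b64decode(payload)
+
+
+# Stand-in for a capture_screenshot result: only the PNG signature, base64-encoded.
+sample = "data:image/png;base64," + base64.b64encode(PNG_SIGNATURE).decode("ascii")
+png_bytes = decode_data_uri(sample)
+print(png_bytes.startswith(PNG_SIGNATURE))
+```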
+
+#### Use Cases
+
+| Use Case | Description |
+|----------|-------------|
+| **UI Analysis** | Understand current screen state |
+| **Visual Context** | Provide screenshots to LLM for decision making |
+| **Debugging** | Capture UI state at each step |
+| **Annotation Base** | Base image for control labeling |
+
+#### Error Handling
+
+```python
+# Failures are raised as exceptions
+try:
+    screenshot_url = await capture_screenshot()
+except Exception as e:
+    # Typical messages:
+    #   "Failed to capture screenshot on device"
+    #   "Failed to pull screenshot from device"
+    print(f"Screenshot failed: {e}")
+```
+
+---
+
+### 2. get_ui_tree - Get UI Hierarchy XML
+
+**Purpose**: Retrieve the complete UI hierarchy in XML format for detailed UI structure analysis.
+
+#### Tool Specification
+
+```python
+tool_name = "get_ui_tree"
+parameters = {} # No parameters required
+```
+
+#### Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant Agent
+ participant MCP
+ participant ADB
+ participant Device
+
+ Agent->>MCP: get_ui_tree()
+ MCP->>ADB: uiautomator dump /sdcard/window_dump.xml
+ ADB->>Device: Dump UI hierarchy
+ Device-->>ADB: XML created
+
+ ADB->>Device: cat /sdcard/window_dump.xml
+ Device-->>ADB: XML content
+ ADB-->>MCP: XML string
+
+ MCP->>MCP: Cache UI tree (5s TTL)
+ MCP-->>Agent: UI tree dictionary
+```
+
+#### Result Format
+
+```python
+{
+ "success": True,
+ "ui_tree": """
+
+
+
+ ...
+
+ """,
+ "format": "xml"
+}
+```
+
+#### Use Cases
+
+- Advanced UI analysis requiring full hierarchy
+- Custom control parsing logic
+- Debugging UI structure
+- Extracting accessibility information
+
+---
+
+### 3. get_device_info - Get Device Information
+
+**Purpose**: Gather comprehensive device information including model, Android version, screen size, and battery status.
+
+#### Tool Specification
+
+```python
+tool_name = "get_device_info"
+parameters = {} # No parameters required
+```
+
+#### Information Collected
+
+| Info Type | ADB Command | Data Returned |
+|-----------|-------------|---------------|
+| **Model** | `getprop ro.product.model` | Device model name |
+| **Android Version** | `getprop ro.build.version.release` | Android version (e.g., "13") |
+| **SDK Version** | `getprop ro.build.version.sdk` | API level (e.g., "33") |
+| **Screen Size** | `wm size` | Resolution (e.g., "Physical size: 1080x2400") |
+| **Screen Density** | `wm density` | DPI (e.g., "Physical density: 420") |
+| **Battery Level** | `dumpsys battery` | Battery percentage |
+| **Battery Status** | `dumpsys battery` | Charging status |
+
+#### Result Format
+
+```python
+{
+ "success": True,
+ "device_info": {
+ "model": "Pixel 6",
+ "android_version": "13",
+ "sdk_version": "33",
+ "screen_size": "Physical size: 1080x2400",
+ "screen_density": "Physical density: 420",
+ "battery_level": "85",
+ "battery_status": "2" # 2 = Charging, 3 = Discharging
+ },
+ "from_cache": False # True if returned from cache
+}
+```
+
+**Caching**: Device info is cached for 60 seconds as it changes infrequently.
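+
+Note that `screen_size` and `screen_density` are returned as the raw `wm` output strings, not as numbers. A minimal sketch of extracting integers from them, assuming the string shapes shown in the result above (`parse_screen_size` and `parse_density` are hypothetical helpers):
+
+```python
+import re
+
+
+def parse_screen_size(raw: str) -> tuple[int, int]:
+    """Extract (width, height) from a string like 'Physical size: 1080x2400'."""
+    match = re.search(r"(\d+)x(\d+)", raw)
+    if match is None:
+        raise ValueError(f"no WIDTHxHEIGHT found in {raw!r}")
+    return int(match.group(1)), int(match.group(2))
+
+
+def parse_density(raw: str) -> int:
+    """Extract the DPI from a string like 'Physical density: 420'."""
+    match = re.search(r"(\d+)", raw)
+    if match is None:
+        raise ValueError(f"no density found in {raw!r}")
+    return int(match.group(1))
+
+
+width, height = parse_screen_size("Physical size: 1080x2400")
+dpi = parse_density("Physical density: 420")
+print(width, height, dpi)
+```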
+
+---
+
+### 4. get_mobile_app_target_info - List Installed Apps
+
+**Purpose**: Retrieve the list of installed applications as TargetInfo objects.
+
+#### Tool Specification
+
+```python
+tool_name = "get_mobile_app_target_info"
+parameters = {
+ "filter": "", # Filter pattern (optional)
+ "include_system_apps": False, # Include system apps (default: False)
+ "force_refresh": False # Bypass cache (default: False)
+}
+```
+
+#### Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant Agent
+ participant MCP
+ participant Cache
+ participant ADB
+ participant Device
+
+ Agent->>MCP: get_mobile_app_target_info(include_system_apps=False)
+
+ alt Cache Hit (not forced refresh)
+ MCP->>Cache: Check cache (5min TTL)
+ Cache-->>MCP: Cached app list
+ MCP-->>Agent: Apps from cache
+ else Cache Miss
+ MCP->>ADB: pm list packages -3
+ ADB->>Device: List user-installed packages
+ Device-->>ADB: Package list
+ ADB-->>MCP: Packages
+
+ MCP->>MCP: Parse to TargetInfo objects
+ MCP->>Cache: Update cache
+ MCP-->>Agent: App list
+ end
+```
+
+#### Result Format
+
+```python
+[
+ {
+ "id": "1",
+ "name": "com.android.chrome",
+ "package": "com.android.chrome"
+ },
+ {
+ "id": "2",
+ "name": "com.google.android.apps.maps",
+ "package": "com.google.android.apps.maps"
+ },
+ {
+ "id": "3",
+ "name": "com.whatsapp",
+ "package": "com.whatsapp"
+ }
+]
+```
+
+**Notes**:
+- `id`: Sequential number for LLM reference
+- `name`: Package name (display name not available via simple ADB)
+- `package`: Full package identifier
+
+**Caching**: Apps list is cached for 5 minutes to reduce overhead.
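+
+The "Parse to TargetInfo objects" step can be sketched as follows. `pm list packages` prints one `package:<name>` line per app; the function below converts that text into the result shape shown above. `parse_package_list` is a hypothetical helper, and keeping the apps in output order is an assumption of this sketch.
+
+```python
+def parse_package_list(output: str) -> list[dict[str, str]]:
+    """Convert `pm list packages` output into id/name/package entries."""
+    prefix = "package:"
+    packages = [
+        line.strip()[len(prefix):]
+        for line in output.splitlines()
+        if line.strip().startswith(prefix)
+    ]
+    # IDs are sequential strings starting at "1", matching the result format.
+    return [
+        {"id": str(index), "name": package, "package": package}
+        for index, package in enumerate(packages, start=1)
+    ]
+
+
+adb_output = """package:com.android.chrome
+package:com.google.android.apps.maps
+package:com.whatsapp
+"""
+apps = parse_package_list(adb_output)
+print(apps[1])
+```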
+
+---
+
+### 5. get_app_window_controls_target_info - Get UI Controls
+
+**Purpose**: Extract UI controls from the current screen and assign each an ID for precise interaction.
+
+#### Tool Specification
+
+```python
+tool_name = "get_app_window_controls_target_info"
+parameters = {
+ "force_refresh": False # Bypass cache (default: False)
+}
+```
+
+#### Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant Agent
+ participant MCP
+ participant Cache
+ participant ADB
+ participant Device
+
+ Agent->>MCP: get_app_window_controls_target_info()
+
+ alt Cache Hit (not forced refresh)
+ MCP->>Cache: Check cache (5s TTL)
+ Cache-->>MCP: Cached controls
+ MCP-->>Agent: Controls from cache
+ else Cache Miss
+ MCP->>ADB: uiautomator dump /sdcard/window_dump.xml
+ ADB->>Device: Dump UI
+ Device-->>ADB: XML file
+
+ ADB->>Device: cat /sdcard/window_dump.xml
+ Device-->>ADB: XML content
+ ADB-->>MCP: UI hierarchy
+
+ MCP->>MCP: Parse XML
+ MCP->>MCP: Filter meaningful controls
+ MCP->>MCP: Validate rectangles
+ MCP->>MCP: Assign sequential IDs
+ MCP->>Cache: Update cache
+ MCP-->>Agent: Controls list
+ end
+```
+
+#### Control Selection Criteria
+
+Controls are included if they meet any of these criteria:
+
+- `clickable="true"` - Can be tapped
+- `long-clickable="true"` - Supports long-press
+- `scrollable="true"` - Can be scrolled
+- `checkable="true"` - Checkbox or toggle
+- Has `text` or `content-desc` - Has label
+- Type includes "Edit", "Button" - Input or action element
+
+#### Rectangle Validation
+
+Controls with invalid rectangles are filtered out:
+
+```python
+# Bounds format: [left, top, right, bottom]
+# A valid rectangle must have:
+# - right > left (positive width)
+# - bottom > top (positive height)
+# - non-zero right and bottom edges (left or top may be 0)
+if right <= left or bottom <= top or right == 0 or bottom == 0:
+    skip_control()  # Invalid rectangle
+```
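+
+In a `uiautomator dump` XML file, each node stores its bounds as a string of the form `[left,top][right,bottom]`. The sketch below parses that string and applies the same validity rule as above; `parse_bounds` is a hypothetical helper, not the server's actual function.
+
+```python
+import re
+from typing import Optional
+
+BOUNDS_PATTERN = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")
+
+
+def parse_bounds(bounds: str) -> Optional[list[int]]:
+    """Return [left, top, right, bottom], or None if malformed or invalid."""
+    match = BOUNDS_PATTERN.fullmatch(bounds.strip())
+    if match is None:
+        return None
+    left, top, right, bottom = (int(group) for group in match.groups())
+    if right <= left or bottom <= top or right == 0 or bottom == 0:
+        return None  # zero-area or degenerate rectangle
+    return [left, top, right, bottom]
+
+
+print(parse_bounds("[48,96][912,192]"))
+print(parse_bounds("[0,0][0,0]"))
+```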
+
+#### Result Format
+
+```python
+[
+ {
+ "id": "1",
+ "name": "Search",
+ "type": "EditText",
+ "rect": [48, 96, 912, 192] # [left, top, right, bottom] in pixels
+ },
+ {
+ "id": "2",
+ "name": "Search",
+ "type": "ImageButton",
+ "rect": [912, 96, 1032, 192]
+ },
+ {
+ "id": "3",
+ "name": "Maps",
+ "type": "TextView",
+ "rect": [0, 216, 1080, 360]
+ }
+]
+```
+
+**Caching**: Controls are cached for 5 seconds but **automatically invalidated** after any action (UI likely changed).
+
+---
+
+## Action Server Tools (Port 8021)
+
+The Action Server provides tools for device control and manipulation.
+
+### 6. tap - Tap at Coordinates
+
+**Purpose**: Perform a tap/click action at the specified screen coordinates.
+
+#### Tool Specification
+
+```python
+tool_name = "tap"
+parameters = {
+ "x": 480, # X coordinate (pixels from left)
+ "y": 240 # Y coordinate (pixels from top)
+}
+```
+
+#### Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant Agent
+ participant MCP
+ participant ADB
+ participant Device
+
+ Agent->>MCP: tap(x=480, y=240)
+ MCP->>ADB: input tap 480 240
+ ADB->>Device: Inject tap event
+ Device-->>ADB: Success
+ ADB-->>MCP: Success
+
+ MCP->>MCP: Invalidate controls cache
+ MCP-->>Agent: Result
+```
+
+#### Result Format
+
+```python
+{
+ "success": True,
+ "action": "tap(480, 240)",
+ "output": "",
+ "error": ""
+}
+```
+
+**Cache Invalidation**: Automatically invalidates control cache after tap (UI likely changed).
+
+---
+
+### 7. swipe - Swipe Gesture
+
+**Purpose**: Perform swipe gesture from start to end coordinates.
+
+#### Tool Specification
+
+```python
+tool_name = "swipe"
+parameters = {
+ "start_x": 500,
+ "start_y": 1500,
+ "end_x": 500,
+ "end_y": 500,
+ "duration": 300 # milliseconds (default: 300)
+}
+```
+
+#### Common Use Cases
+
+| Use Case | Start | End | Description |
+|----------|-------|-----|-------------|
+| **Scroll Up** | (500, 1500) | (500, 500) | Swipe from bottom to top |
+| **Scroll Down** | (500, 500) | (500, 1500) | Swipe from top to bottom |
+| **Scroll Left** | (900, 600) | (100, 600) | Swipe from right to left |
+| **Scroll Right** | (100, 600) | (900, 600) | Swipe from left to right |
+
+#### Result Format
+
+```python
+{
+ "success": True,
+ "action": "swipe(500,1500)->(500,500) in 300ms",
+ "output": "",
+ "error": ""
+}
+```
+
+**Cache Invalidation**: Automatically invalidates control cache after swipe.
+
+---
+
+### 8. type_text - Type Text into Control
+
+**Purpose**: Type text into a specific input field control.
+
+#### Tool Specification
+
+```python
+tool_name = "type_text"
+parameters = {
+ "text": "hello world",
+ "control_id": "5", # REQUIRED: Control ID from get_app_window_controls_target_info
+ "control_name": "Search", # REQUIRED: Control name (must match)
+ "clear_current_text": False # Clear existing text first (default: False)
+}
+```
+
+#### Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant Agent
+ participant MCP
+ participant Cache
+ participant ADB
+ participant Device
+
+ Agent->>MCP: type_text(text="hello", control_id="5", control_name="Search")
+
+ MCP->>Cache: Get control by ID
+ Cache-->>MCP: Control with rect
+
+ MCP->>MCP: Calculate center position
+ MCP->>ADB: input tap x y (focus control)
+ ADB->>Device: Tap input field
+
+ alt clear_current_text=True
+ MCP->>ADB: input keyevent KEYCODE_DEL (x50)
+ ADB->>Device: Delete existing text
+ end
+
+ MCP->>MCP: Escape text (spaces -> %s)
+ MCP->>ADB: input text hello%sworld
+ ADB->>Device: Type text
+ Device-->>ADB: Success
+
+ MCP->>MCP: Invalidate controls cache
+ MCP-->>Agent: Result
+```
+
+#### Important Notes
+
+!!!warning "Control ID Requirement"
+    The `control_id` parameter is **REQUIRED**. You must:
+
+    1. Call `get_app_window_controls_target_info` first
+    2. Identify the input field control
+    3. Use its `id` and `name` in `type_text`
+
+    The tool will:
+
+    - Verify the control exists in cache
+    - Click the control to focus it
+    - Then type the text
+
+**Text Escaping**: Spaces are automatically converted to `%s`, which the Android `input text` shell command interprets as a space.
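+
+The two computations in the flow above, finding the tap point for a control and escaping the text, can be sketched in a few lines. This is a simplified illustration: `rect_center` and `escape_for_input_text` are hypothetical names, and only the space-to-`%s` rule described in this document is applied (the real tool may escape more characters).
+
+```python
+def rect_center(rect: list[int]) -> tuple[int, int]:
+    """Centre of a [left, top, right, bottom] rectangle, in pixels."""
+    left, top, right, bottom = rect
+    return (left + right) // 2, (top + bottom) // 2
+
+
+def escape_for_input_text(text: str) -> str:
+    """Replace spaces with %s, as `input text` expects."""
+    return text.replace(" ", "%s")
+
+
+x, y = rect_center([48, 96, 912, 192])
+escaped = escape_for_input_text("hello world")
+print(f"input tap {x} {y}")
+print(f"input text {escaped}")
+```
+
+For the `EditText` rectangle `[48, 96, 912, 192]` used earlier, the centre is `(480, 144)`, which matches the coordinates in the result message below.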
+
+#### Result Format
+
+```python
+{
+ "success": True,
+ "action": "type_text(text='hello world', control_id='5', control_name='Search')",
+ "message": "Clicked control 'Search' at (480, 144) | Typed text: 'hello world'",
+ "control_info": {
+ "id": "5",
+ "name": "Search",
+ "type": "EditText"
+ }
+}
+```
+
+---
+
+### 9. launch_app - Launch Application
+
+**Purpose**: Launch an application by package name or app ID.
+
+#### Tool Specification
+
+```python
+tool_name = "launch_app"
+parameters = {
+    "package_name": "com.google.android.apps.maps",  # Package name, or an app name without "." for fuzzy search
+    "id": "2"  # Optional: App ID from get_mobile_app_target_info
+}
+```
+
+#### Usage Modes
+
+**Mode 1: Launch by package name**
+
+```python
+launch_app(package_name="com.android.settings")
+```
+
+**Mode 2: Launch from cached app list**
+
+```python
+# First call get_mobile_app_target_info to cache apps
+# Then use app ID from the list
+launch_app(package_name="com.android.settings", id="5")
+```
+
+**Mode 3: Launch by app name (fuzzy search)**
+
+```python
+# If package_name doesn't contain ".", search by name
+launch_app(package_name="Maps") # Finds "com.google.android.apps.maps"
+```
+
+#### Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant Agent
+ participant MCP
+ participant ADB
+ participant Device
+
+ Agent->>MCP: launch_app(package_name="com.google.android.apps.maps")
+
+ alt ID provided
+ MCP->>MCP: Verify ID in cache
+ MCP->>MCP: Get package from cache
+ else Name only (no dots)
+ MCP->>ADB: pm list packages
+ MCP->>MCP: Search for matching package
+ end
+
+ MCP->>ADB: monkey -p com.google.android.apps.maps -c android.intent.category.LAUNCHER 1
+ ADB->>Device: Launch app
+ Device-->>ADB: App started
+ ADB-->>MCP: Success
+ MCP-->>Agent: Result
+```
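+
+The name-resolution branch in the diagram can be sketched as follows. This is an illustration under stated assumptions: the helper `resolve_package` is hypothetical, and the case-insensitive substring match is a guess at what "search for matching package" means, not the server's confirmed rule.
+
+```python
+from typing import List, Optional
+
+
+def resolve_package(name_or_package: str, installed: List[str]) -> Optional[str]:
+    """Return a package name for a launch_app-style input.
+
+    Hypothetical helper: a value containing "." is treated as a package name;
+    anything else is matched case-insensitively against installed packages.
+    """
+    if "." in name_or_package:
+        return name_or_package
+    needle = name_or_package.lower()
+    for package in installed:
+        if needle in package.lower():
+            return package
+    return None  # No installed package matched the app name
+
+
+installed_packages = ["com.android.settings", "com.google.android.apps.maps"]
+resolved = resolve_package("Maps", installed_packages)
+```
+
+A `None` result corresponds to the case where no installed package matches, which the caller would report as a failed launch.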
+
+#### Result Format
+
+```python
+{
+ "success": True,
+ "message": "Launched com.google.android.apps.maps",
+ "package_name": "com.google.android.apps.maps",
+ "output": "Events injected: 1",
+ "error": "",
+ "app_info": { # If ID was provided
+ "id": "2",
+ "name": "com.google.android.apps.maps",
+ "package": "com.google.android.apps.maps"
+ }
+}
+```
+
+---
+
+### 10. press_key - Press Hardware/Software Key
+
+**Purpose**: Press a hardware or software key for navigation and system actions.
+
+#### Tool Specification
+
+```python
+tool_name = "press_key"
+parameters = {
+ "key_code": "KEYCODE_BACK" # Key code name
+}
+```
+
+#### Common Key Codes
+
+| Key Code | Description | Use Case |
+|----------|-------------|----------|
+| `KEYCODE_HOME` | Home button | Return to home screen |
+| `KEYCODE_BACK` | Back button | Navigate back |
+| `KEYCODE_MENU` | Menu button | Open options menu |
+| `KEYCODE_ENTER` | Enter key | Submit form |
+| `KEYCODE_DEL` | Delete key | Delete character |
+| `KEYCODE_APP_SWITCH` | Recent apps | Switch between apps |
+| `KEYCODE_POWER` | Power button | Lock screen |
+| `KEYCODE_VOLUME_UP` | Volume up | Increase volume |
+| `KEYCODE_VOLUME_DOWN` | Volume down | Decrease volume |
+
+#### Result Format
+
+```python
+{
+ "success": True,
+ "action": "press_key(KEYCODE_BACK)",
+ "output": "",
+ "error": ""
+}
+```
+
+---
+
+### 11. click_control - Click Control by ID
+
+**Purpose**: Click a UI control by its ID from the cached control list.
+
+#### Tool Specification
+
+```python
+tool_name = "click_control"
+parameters = {
+    "control_id": "5",  # REQUIRED: Control ID from get_app_window_controls_target_info
+    "control_name": "Search Button"  # REQUIRED: Control name (a mismatch only triggers a warning)
+}
+```
+
+#### Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant Agent
+ participant MCP
+ participant Cache
+ participant ADB
+ participant Device
+
+ Agent->>MCP: click_control(control_id="5", control_name="Search")
+
+ MCP->>Cache: Get control by ID "5"
+ Cache-->>MCP: Control with rect [48,96,912,192]
+
+ MCP->>MCP: Verify name matches
+ MCP->>MCP: Calculate center: x=(48+912)/2, y=(96+192)/2
+
+ MCP->>ADB: input tap 480 144
+ ADB->>Device: Tap at (480, 144)
+ Device-->>ADB: Success
+
+ MCP->>MCP: Invalidate controls cache
+ MCP-->>Agent: Result
+```
+
+#### Result Format
+
+```python
+{
+ "success": True,
+ "action": "click_control(id=5, name=Search)",
+ "message": "Clicked control 'Search' at (480, 144)",
+ "control_info": {
+ "id": "5",
+ "name": "Search",
+ "type": "EditText",
+ "rect": [48, 96, 912, 192]
+ }
+}
+```
+
+**Name Verification**: If the provided `control_name` doesn't match the cached control's name, a warning is returned but the action still executes using the ID.
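+
+The center calculation and the name check can be sketched as below. The rect layout `[left, top, right, bottom]` follows the arithmetic in the diagram above; the function names, the integer division, and the warning text are assumptions made for illustration.
+
+```python
+from typing import List, Optional, Tuple
+
+
+def center_of(rect: List[int]) -> Tuple[int, int]:
+    """Return the tap point for a rect given as [left, top, right, bottom]."""
+    left, top, right, bottom = rect
+    return (left + right) // 2, (top + bottom) // 2
+
+
+def name_warning(expected: str, actual: str) -> Optional[str]:
+    """Return a warning string on a name mismatch; the tap still proceeds by ID."""
+    if expected != actual:
+        return f"Control name mismatch: expected '{expected}', found '{actual}'"
+    return None
+
+
+tap_x, tap_y = center_of([48, 96, 912, 192])
+```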
+
+---
+
+### 12. wait - Wait/Sleep
+
+**Purpose**: Wait for a specified duration.
+
+#### Tool Specification
+
+```python
+tool_name = "wait"
+parameters = {
+ "seconds": 1.0 # Duration in seconds (can be decimal)
+}
+```
+
+#### Use Cases
+
+- Wait for app to load
+- Wait for animation to complete
+- Wait for UI transition
+- Pace actions for stability
+
+#### Examples
+
+```python
+wait(seconds=1.0) # Wait 1 second
+wait(seconds=0.5) # Wait 500ms
+wait(seconds=2.5) # Wait 2.5 seconds
+```
+
+#### Result Format
+
+```python
+{
+ "success": True,
+ "action": "wait(1.0s)",
+ "message": "Waited for 1.0 seconds"
+}
+```
+
+**Limits**:
+- Minimum: 0 seconds
+- Maximum: 60 seconds
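+
+This page does not say whether out-of-range values are clamped or rejected. The sketch below assumes clamping; `clamp_wait` and the two constants are hypothetical names.
+
+```python
+MIN_WAIT_SECONDS = 0.0
+MAX_WAIT_SECONDS = 60.0
+
+
+def clamp_wait(seconds: float) -> float:
+    """Clamp a requested wait to the documented 0-60 second range (assumed behavior)."""
+    return max(MIN_WAIT_SECONDS, min(MAX_WAIT_SECONDS, float(seconds)))
+```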
+
+---
+
+### 13. invalidate_cache - Manual Cache Invalidation
+
+**Purpose**: Manually invalidate cached data to force refresh on next query.
+
+#### Tool Specification
+
+```python
+tool_name = "invalidate_cache"
+parameters = {
+ "cache_type": "all" # "controls", "apps", "ui_tree", "device_info", or "all"
+}
+```
+
+#### Cache Types
+
+| Cache Type | Description | Auto-Invalidated |
+|------------|-------------|------------------|
+| `controls` | UI controls list | ✓ After actions |
+| `apps` | Installed apps list | ✗ Never |
+| `ui_tree` | UI hierarchy XML | ✗ Never |
+| `device_info` | Device information | ✗ Never |
+| `all` | All caches | Varies |
+
+#### Result Format
+
+```python
+# Example result for cache_type="controls"
+{
+    "success": True,
+    "message": "Controls cache invalidated"
+}
+```
+
+**Use Cases**:
+- Manually refresh apps list after installing/uninstalling
+- Force UI tree refresh after significant screen change
+- Debug caching issues
+
+---
+
+## Command Execution Pipeline
+
+### Atomic Building Blocks
+
+The MCP tools serve as atomic operations for mobile task execution:
+
+```mermaid
+graph TD
+ A[User Request] --> B[Data Collection Phase]
+ B --> B1[capture_screenshot]
+ B --> B2[get_mobile_app_target_info]
+ B --> B3[get_app_window_controls_target_info]
+
+ B1 --> C[LLM Reasoning]
+ B2 --> C
+ B3 --> C
+
+ C --> D{Select Action}
+ D -->|Launch| E[launch_app]
+ D -->|Type| F[type_text]
+ D -->|Click| G[click_control]
+ D -->|Swipe| H[swipe]
+ D -->|Navigate| I[press_key]
+ D -->|Wait| J[wait]
+
+ E --> K[Capture Result]
+ F --> K
+ G --> K
+ H --> K
+ I --> K
+ J --> K
+
+ K --> L[Update Memory]
+ L --> M{Task Complete?}
+ M -->|No| B
+ M -->|Yes| N[FINISH]
+```
+
+### Command Composition
+
+MobileAgent executes commands sequentially, building on previous results:
+
+```python
+# Round 1: Capture UI and launch app
+{
+ "action": {
+ "function": "launch_app",
+ "arguments": {"package_name": "com.google.android.apps.maps", "id": "2"}
+ }
+}
+# Result: Maps launched
+
+# Round 2: Capture new UI, identify search field
+{
+ "action": {
+ "function": "click_control",
+ "arguments": {"control_id": "5", "control_name": "Search"}
+ }
+}
+# Result: Search field focused
+
+# Round 3: Type query
+{
+ "action": {
+ "function": "type_text",
+ "arguments": {
+ "text": "restaurants",
+ "control_id": "5",
+ "control_name": "Search"
+ }
+ }
+}
+# Result: Text entered
+```
+
+---
+
+## Best Practices
+
+### Data Collection Tools
+
+- Use `get_app_window_controls_target_info` before every action to get fresh control IDs
+- Cache is your friend: don't force refresh unless necessary
+- Annotated screenshots help LLM identify controls precisely
+
+### Action Tools
+
+!!!success "Action Best Practices"
+    - **Always** call `get_app_window_controls_target_info` before `click_control` or `type_text`
+    - Use control IDs instead of coordinates for robustness
+    - Add `wait` after actions that trigger UI changes (app launch, navigation)
+    - Check the `success` field in results before treating an action as successful
+    - Use `press_key(KEYCODE_BACK)` for navigation instead of screen taps when possible
+
+### Caching
+
+- Controls cache: 5 seconds TTL, invalidated after actions
+- Apps cache: 5 minutes TTL, manually invalidate if apps change
+- Device info cache: 60 seconds TTL, useful for metadata
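+
+A minimal sketch of a TTL cache with explicit invalidation, using the TTL values listed above. `TTLCache` is an illustrative class, not the `MobileServerState` implementation, and the injectable `clock` parameter is an assumption added to make the behavior easy to test.
+
+```python
+import time
+from typing import Any, Callable, Optional
+
+
+class TTLCache:
+    """Single-value cache with a time-to-live (illustrative, not the server's class)."""
+
+    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
+        self._ttl = ttl_seconds
+        self._clock = clock
+        self._value: Optional[Any] = None
+        self._stored_at: Optional[float] = None
+
+    def get(self) -> Optional[Any]:
+        """Return the cached value, or None if it is missing or expired."""
+        if self._stored_at is None or self._clock() - self._stored_at > self._ttl:
+            return None
+        return self._value
+
+    def set(self, value: Any) -> None:
+        """Store a value and record when it was stored."""
+        self._value = value
+        self._stored_at = self._clock()
+
+    def invalidate(self) -> None:
+        """Drop the value, as happens to the controls cache after an action."""
+        self._value = None
+        self._stored_at = None
+
+
+controls_cache = TTLCache(ttl_seconds=5)       # controls: 5 seconds
+apps_cache = TTLCache(ttl_seconds=5 * 60)      # apps: 5 minutes
+device_info_cache = TTLCache(ttl_seconds=60)   # device info: 60 seconds
+```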
+
+### Error Handling
+
+```python
+# Always check success field
+result = await click_control(control_id="5", control_name="Search")
+if not result["success"]:
+ # Handle error: control not found, device disconnected, etc.
+ pass
+```
+
+---
+
+## Implementation Location
+
+The MCP server implementation can be found in:
+
+```
+ufo/client/mcp/http_servers/
+└── mobile_mcp_server.py
+```
+
+Key components:
+
+- `MobileServerState`: Singleton state manager for caching
+- `create_mobile_data_collection_server()`: Data collection server (port 8020)
+- `create_mobile_action_server()`: Action server (port 8021)
+
+---
+
+## Comparison with Other Agent Commands
+
+| Agent | Command Types | Execution Layer | Visual Context | Result Format |
+|-------|--------------|-----------------|----------------|---------------|
+| **MobileAgent** | UI + Apps + Touch | MCP (ADB) | ✓ Screenshots + Controls | success/message/control_info |
+| **LinuxAgent** | CLI + SysInfo | MCP (SSH) | ✗ Text-only | success/exit_code/stdout/stderr |
+| **AppAgent** | UI + API | Automator + MCP | ✓ Screenshots + Controls | UI state + API responses |
+
+MobileAgent's command set reflects the mobile environment:
+
+- **Touch-based**: tap, swipe instead of click, drag
+- **Visual**: Screenshots are essential for UI understanding
+- **App-centric**: Focus on app launching and switching
+- **Control-based**: Precise control IDs instead of coordinates
+- **Cached**: Aggressive caching to reduce ADB overhead
+
+---
+
+## Next Steps
+
+- [State Machine](state.md) - Understand how command execution fits into the FSM
+- [Processing Strategy](strategy.md) - See how commands are integrated into the 4-phase pipeline
+- [Overview](overview.md) - Return to MobileAgent architecture overview
+- [As Galaxy Device](as_galaxy_device.md) - Configure MobileAgent for multi-device workflows
diff --git a/documents/docs/mobile/overview.md b/documents/docs/mobile/overview.md
new file mode 100644
index 000000000..91aa1de34
--- /dev/null
+++ b/documents/docs/mobile/overview.md
@@ -0,0 +1,256 @@
+# MobileAgent: Android Task Executor
+
+**MobileAgent** is a specialized agent designed for executing tasks on Android mobile devices. It leverages the layered FSM architecture and server-client design to perform intelligent, iterative task execution in mobile environments through UI interaction and app control.
+
+**Quick Links:**
+
+- **New to Mobile Agent?** Start with the [Quick Start Guide](../getting_started/quick_start_mobile.md) - Set up your first Android device agent in 10 minutes
+- **Using as Sub-Agent in Galaxy?** See [Using Mobile Agent as Galaxy Device](as_galaxy_device.md)
+- **Deep Dive:** [State Machine](state.md) • [Processing Strategy](strategy.md) • [MCP Commands](commands.md)
+
+## Architecture Overview
+
+MobileAgent operates as a single-agent instance that interacts with Android devices through UI controls and app management. Unlike the two-tier architecture of UFO (HostAgent + AppAgent), MobileAgent uses a simplified single-agent model optimized for mobile device automation, similar to LinuxAgent but with visual interface support.
+
+## Core Responsibilities
+
+MobileAgent provides the following capabilities for Android device automation:
+
+### UI Interaction
+
+MobileAgent interprets user requests and translates them into appropriate UI interactions on Android devices through screenshot analysis and control manipulation.
+
+**Example:** User request "Search for restaurants on Maps" becomes:
+
+1. Take screenshot and identify app icons
+2. Launch Google Maps app
+3. Identify search field control
+4. Type "restaurants" into search field
+5. Tap search button
+
+### Visual Context Understanding
+
+The agent captures and analyzes device screenshots to understand the current UI state:
+
+- Screenshot capture (clean and annotated)
+- Control identification and labeling
+- UI hierarchy parsing
+- App detection and recognition
+
+### App Management
+
+MobileAgent can manage installed applications:
+
+- List installed apps (user apps and system apps)
+- Launch apps by package name or app name
+- Switch between apps
+- Monitor current app state
+
+### Iterative Task Execution
+
+MobileAgent executes tasks iteratively, evaluating execution outcomes at each step and determining the next action based on results and LLM reasoning.
+
+### Error Handling and Recovery
+
+The agent monitors action execution results and can adapt its strategy when errors occur, such as controls not found or apps not responding.
+
+## Key Characteristics
+
+- **Scope**: Single Android device (UI-based automation)
+- **Lifecycle**: One instance per task session
+- **Hierarchy**: Standalone agent (no child agents)
+- **Communication**: MCP server integration via ADB
+- **Control**: 3-state finite state machine with 4-phase processing pipeline
+- **Visual**: Screenshot-based UI understanding with control annotation
+
+## Execution Workflow
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant MobileAgent
+ participant LLM
+ participant MCPServer
+ participant Android
+
+ User->>MobileAgent: "Search for restaurants on Maps"
+ MobileAgent->>MobileAgent: State: CONTINUE
+
+ MobileAgent->>MCPServer: Capture screenshot
+ MCPServer->>Android: Take screenshot via ADB
+ Android-->>MCPServer: Screenshot PNG
+ MCPServer-->>MobileAgent: Base64 screenshot
+
+ MobileAgent->>MCPServer: Get installed apps
+ MCPServer->>Android: List packages via ADB
+ Android-->>MCPServer: App list
+ MCPServer-->>MobileAgent: Installed apps
+
+ MobileAgent->>MCPServer: Get current controls
+ MCPServer->>Android: UI dump via ADB
+ Android-->>MCPServer: UI hierarchy XML
+ MCPServer-->>MobileAgent: Control list with IDs
+
+ MobileAgent->>LLM: Send prompt with screenshot + apps + controls
+ LLM-->>MobileAgent: Action: launch_app(package="com.google.android.apps.maps")
+
+ MobileAgent->>MCPServer: launch_app
+ MCPServer->>Android: Start app via ADB
+ Android-->>MCPServer: App launched
+ MCPServer-->>MobileAgent: Success
+
+ MobileAgent->>MobileAgent: Update memory
+ MobileAgent->>MobileAgent: State: CONTINUE
+
+ Note over MobileAgent: Next round with new screenshot
+
+ MobileAgent->>MCPServer: Capture new screenshot + controls
+ MobileAgent->>LLM: Prompt with new UI state
+ LLM-->>MobileAgent: Action: type_text(control_id="5", text="restaurants")
+
+ MobileAgent->>MCPServer: click_control + type_text
+ MCPServer->>Android: Execute actions via ADB
+ Android-->>MCPServer: Actions completed
+ MCPServer-->>MobileAgent: Success
+
+ MobileAgent->>MobileAgent: State: FINISH
+ MobileAgent-->>User: Task completed
+```
+
+## Comparison with Other Agents
+
+| Aspect | MobileAgent | LinuxAgent | AppAgent |
+|--------|-------------|------------|----------|
+| **Platform** | Android Mobile | Linux (CLI) | Windows Applications |
+| **States** | 3 (CONTINUE, FINISH, FAIL) | 3 states | 6 states |
+| **Architecture** | Single-agent | Single-agent | Child executor |
+| **Interface** | Mobile UI (touch-based) | Command-line | Desktop GUI |
+| **Processing Phases** | 4 phases (with DATA_COLLECTION) | 3 phases | 4 phases |
+| **Visual** | ✓ Screenshots + Annotations | ✗ Text-only | ✓ Screenshots + Annotations |
+| **MCP Tools** | UI controls + App management | CLI commands | UI controls + API |
+| **Input Method** | Touch (tap, swipe, type) | Keyboard commands | Mouse + Keyboard |
+| **Control Identification** | UI hierarchy + bounds | N/A | UI Automation API |
+
+## Design Principles
+
+MobileAgent exemplifies mobile-specific design considerations:
+
+- **Visual Context**: Screenshot-based UI understanding with control annotation for precise interaction
+- **Control Caching**: Efficient control information caching to reduce ADB overhead
+- **Touch-based Interaction**: Specialized actions for mobile gestures (tap, swipe; long-press is planned)
+- **App-centric Navigation**: Focus on app launching and switching rather than window management
+- **Minimal State Set**: 3-state FSM for deterministic control flow
+- **Modular Strategies**: Clear separation between data collection, LLM interaction, action execution, and memory updates
+- **Traceable Execution**: Complete logging of screenshots, actions, and state transitions
+
+## Deep Dive Topics
+
+Explore the detailed architecture and implementation:
+
+- [State Machine](state.md) - 3-state FSM lifecycle and transitions
+- [Processing Strategy](strategy.md) - 4-phase pipeline (Data Collection, LLM, Action, Memory)
+- [MCP Commands](commands.md) - Mobile UI interaction and app management commands
+- [As Galaxy Device](as_galaxy_device.md) - Using Mobile Agent in multi-device workflows
+
+## Technology Stack
+
+### ADB (Android Debug Bridge)
+
+MobileAgent relies on ADB for all device interactions:
+
+- **Screenshot Capture**: `adb shell screencap` for visual context
+- **UI Hierarchy**: `adb shell uiautomator dump` for control information
+- **Touch Input**: `adb shell input tap/swipe` for user interaction
+- **Text Input**: `adb shell input text` for typing
+- **App Control**: `adb shell monkey` for app launching
+- **Device Info**: `adb shell getprop` for device properties
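+
+As a rough illustration of how these commands are assembled, the sketch below builds ADB argument lists without executing them. The helper names are hypothetical; the `input tap` and `monkey` invocations are standard ADB shell commands, and running them requires `adb` on the `PATH` and a connected device.
+
+```python
+from typing import List
+
+
+def adb_shell(*args: str) -> List[str]:
+    """Build the argument list for an `adb shell ...` invocation."""
+    return ["adb", "shell", *args]
+
+
+def tap_command(x: int, y: int) -> List[str]:
+    """Arguments for a single tap at screen coordinates (x, y)."""
+    return adb_shell("input", "tap", str(x), str(y))
+
+
+def launch_command(package: str) -> List[str]:
+    """Arguments for launching an app's launcher activity via monkey."""
+    return adb_shell("monkey", "-p", package, "-c", "android.intent.category.LAUNCHER", "1")
+
+
+# To run against a connected device:
+#   import subprocess
+#   subprocess.run(tap_command(480, 144), check=True)
+```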
+
+### MCP Server Architecture
+
+Two separate MCP servers handle different responsibilities:
+
+1. **Data Collection Server** (Port 8020):
+    - Screenshot capture
+    - UI tree retrieval
+    - App list collection
+    - Control information gathering
+    - Device information
+
+2. **Action Server** (Port 8021):
+    - Touch actions (tap, swipe)
+    - Text input
+    - App launching
+    - Key press events
+    - Control clicking
+
+Both servers share a singleton `MobileServerState` for efficient caching and coordination.
+
+## Use Cases
+
+MobileAgent is ideal for:
+
+- **Mobile App Testing**: Automated UI testing across different apps
+- **Cross-App Workflows**: Tasks spanning multiple mobile applications
+- **Data Entry**: Automated form filling and text input
+- **App Navigation**: Exploring and interacting with mobile UIs
+- **Mobile Productivity**: Automating repetitive mobile tasks
+- **Cross-Device Workflows**: As a sub-agent in Galaxy multi-device orchestration
+
+!!!tip "Galaxy Integration"
+    MobileAgent can serve as a device agent in Galaxy's multi-device orchestration framework, executing Android-specific tasks as part of cross-platform workflows alongside Windows and Linux devices.
+
+    See [Using Mobile Agent as Galaxy Device](as_galaxy_device.md) for configuration details.
+
+## Requirements
+
+### Hardware
+
+- Android device or emulator
+- USB connection (for physical devices) or network connection (for emulators)
+- USB debugging enabled on the device
+
+### Software
+
+- ADB (Android Debug Bridge) installed and accessible
+- Android device with API level 21+ (Android 5.0+)
+- Python 3.8+
+- Required Python packages (see requirements.txt)
+
+## Implementation Location
+
+The MobileAgent implementation can be found in:
+
+```
+ufo/
+├── agents/
+│   ├── agent/
+│   │   └── customized_agent.py  # MobileAgent class definition
+│   ├── states/
+│   │   └── mobile_agent_state.py  # State machine implementation
+│   └── processors/
+│       ├── customized/
+│       │   └── customized_agent_processor.py  # MobileAgentProcessor
+│       └── strategies/
+│           └── mobile_agent_strategy.py  # Processing strategies
+├── prompter/
+│   └── customized/
+│       └── mobile_agent_prompter.py  # Prompt construction
+├── module/
+│   └── sessions/
+│       └── mobile_session.py  # Session management
+└── client/
+    └── mcp/
+        └── http_servers/
+            └── mobile_mcp_server.py  # MCP server implementation
+```
+
+## Next Steps
+
+To understand MobileAgent's complete architecture:
+
+1. [State Machine](state.md) - Learn about the 3-state FSM
+2. [Processing Strategy](strategy.md) - Understand the 4-phase pipeline
+3. [MCP Commands](commands.md) - Explore mobile UI interaction commands
+4. [As Galaxy Device](as_galaxy_device.md) - Configure for multi-device workflows
+
+For deployment and configuration, see the [Quick Start Guide](../getting_started/quick_start_mobile.md).
diff --git a/documents/docs/mobile/state.md b/documents/docs/mobile/state.md
new file mode 100644
index 000000000..18a616b6c
--- /dev/null
+++ b/documents/docs/mobile/state.md
@@ -0,0 +1,403 @@
+# MobileAgent State Machine
+
+MobileAgent uses a **3-state finite state machine (FSM)** to manage Android device task execution flow. The minimal state set captures essential execution progression while maintaining simplicity and predictability. States transition based on LLM decisions and action execution results.
+
+> **📖 Related Documentation:**
+>
+> - [Mobile Agent Overview](overview.md) - Architecture and core responsibilities
+> - [Processing Strategy](strategy.md) - 4-phase pipeline execution in CONTINUE state
+> - [MCP Commands](commands.md) - Available mobile interaction commands
+> - [Quick Start Guide](../getting_started/quick_start_mobile.md) - Set up your first Mobile Agent
+
+## State Machine Architecture
+
+### State Enumeration
+
+```python
+from enum import Enum
+
+
+class MobileAgentStatus(Enum):
+    """Store the status of the mobile agent"""
+    CONTINUE = "CONTINUE"  # Task is ongoing, requires further actions
+    FINISH = "FINISH"  # Task completed successfully
+    FAIL = "FAIL"  # Task cannot proceed, unrecoverable error
+```
+
+### State Management
+
+MobileAgent states are managed by `MobileAgentStateManager`, which implements the agent state registry pattern:
+
+```python
+class MobileAgentStateManager(AgentStateManager):
+    """Manages the states of the mobile agent"""
+    _state_mapping: Dict[str, Type[MobileAgentState]] = {}
+
+    @property
+    def none_state(self) -> AgentState:
+        return NoneMobileAgentState()
+```
+
+All MobileAgent states are registered using the `@MobileAgentStateManager.register` decorator, enabling dynamic state lookup by name.
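+
+The registry pattern can be illustrated with a self-contained toy version. `State`, `StateRegistry`, and `ContinueState` below are stand-ins invented for this sketch; the real `AgentStateManager` base class may differ in its method signatures and lookup behavior.
+
+```python
+from typing import Dict, Type
+
+
+class State:
+    """Toy base class standing in for an agent state."""
+
+    @classmethod
+    def name(cls) -> str:
+        raise NotImplementedError
+
+
+class StateRegistry:
+    """Maps state names to state classes (illustrative stand-in)."""
+
+    _state_mapping: Dict[str, Type[State]] = {}
+
+    @classmethod
+    def register(cls, state_class: Type[State]) -> Type[State]:
+        """Class decorator: record the state under its name and return it unchanged."""
+        cls._state_mapping[state_class.name()] = state_class
+        return state_class
+
+    @classmethod
+    def get_state(cls, status: str) -> State:
+        """Instantiate the state registered under `status`."""
+        return cls._state_mapping[status]()
+
+
+@StateRegistry.register
+class ContinueState(State):
+    @classmethod
+    def name(cls) -> str:
+        return "CONTINUE"
+
+
+state = StateRegistry.get_state("CONTINUE")
+```
+
+Because the decorator returns the class unchanged, registration has no effect on how the state class itself behaves; it only makes the class discoverable by its status string.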
+
+## State Transition Diagram
+
+```mermaid
+stateDiagram-v2
+ [*] --> CONTINUE: Start Task
+
+ CONTINUE --> CONTINUE: More Actions Needed (LLM returns CONTINUE)
+ CONTINUE --> FINISH: Task Complete (LLM returns FINISH)
+ CONTINUE --> FAIL: Unrecoverable Error (LLM returns FAIL or Exception)
+
+ FINISH --> [*]: Session Ends
+ FAIL --> FINISH: Cleanup
+
+ note right of CONTINUE
+ Active execution state:
+ - Capture screenshots
+ - Collect UI controls
+ - Get LLM decision
+ - Execute actions
+ - Update memory
+ end note
+
+ note right of FINISH
+ Terminal state:
+ - Task completed successfully
+ - Results available in memory
+ - Agent can be terminated
+ end note
+
+ note right of FAIL
+ Error terminal state:
+ - Unrecoverable error occurred
+ - Error details logged
+ - Transitions to FINISH for cleanup
+ end note
+```
+
+## State Definitions
+
+### 1. CONTINUE State
+
+**Purpose**: Active execution state where MobileAgent processes the user request and executes mobile actions.
+
+```python
+@MobileAgentStateManager.register
+class ContinueMobileAgentState(MobileAgentState):
+    """The class for the continue mobile agent state"""
+
+    async def handle(self, agent: "MobileAgent", context: Optional["Context"] = None):
+        """Execute the 4-phase processing pipeline"""
+        await agent.process(context)
+
+    def is_round_end(self) -> bool:
+        return False  # Round continues
+
+    def is_subtask_end(self) -> bool:
+        return False  # Subtask continues
+
+    @classmethod
+    def name(cls) -> str:
+        return MobileAgentStatus.CONTINUE.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Active |
+| **Processor Executed** | ✓ Yes (4 phases) |
+| **Round Ends** | No |
+| **Subtask Ends** | No |
+| **Duration** | Single round |
+| **Next States** | CONTINUE, FINISH, FAIL |
+
+**Behavior**:
+
+1. **Data Collection Phase**:
+    - Captures device screenshot
+    - Retrieves installed apps list
+    - Collects current screen UI controls
+    - Creates annotated screenshot with control IDs
+
+2. **LLM Interaction Phase**:
+    - Constructs prompts with screenshots and control information
+    - Gets next action from LLM
+    - Parses and validates response
+
+3. **Action Execution Phase**:
+    - Executes mobile actions (tap, swipe, type, launch app, etc.)
+    - Captures execution results
+
+4. **Memory Update Phase**:
+    - Updates memory with screenshots and action results
+    - Stores control information for next round
+
+5. **State Determination**:
+    - Analyzes LLM response for next state
+
+**State Transition Logic**:
+
+- **CONTINUE → CONTINUE**: Task requires more actions to complete (e.g., need to navigate through multiple screens)
+- **CONTINUE → FINISH**: LLM determines task is complete (e.g., successfully filled form and submitted)
+- **CONTINUE → FAIL**: Unrecoverable error encountered (e.g., required app not installed, control not found after multiple attempts)
+
+### 2. FINISH State
+
+**Purpose**: Terminal state indicating successful task completion.
+
+```python
+@MobileAgentStateManager.register
+class FinishMobileAgentState(MobileAgentState):
+    """The class for the finish mobile agent state"""
+
+    def next_agent(self, agent: "MobileAgent") -> "MobileAgent":
+        return agent
+
+    def next_state(self, agent: "MobileAgent") -> MobileAgentState:
+        return FinishMobileAgentState()  # Remains in FINISH
+
+    def is_subtask_end(self) -> bool:
+        return True  # Subtask completed
+
+    def is_round_end(self) -> bool:
+        return True  # Round ends
+
+    @classmethod
+    def name(cls) -> str:
+        return MobileAgentStatus.FINISH.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Terminal |
+| **Processor Executed** | ✗ No |
+| **Round Ends** | Yes |
+| **Subtask Ends** | Yes |
+| **Duration** | Permanent |
+| **Next States** | FINISH (no transition) |
+
+**Behavior**:
+
+- Signals task completion to session manager
+- No further processing occurs
+- Agent instance can be terminated
+- Screenshots and action history available in memory
+
+**FINISH is reached directly from CONTINUE when**:
+
+- All required mobile actions have been executed successfully
+- The LLM determines the user request has been fulfilled
+- Target UI state has been achieved (e.g., form submitted, information displayed)
+- No errors or exceptions occurred during execution
+
+### 3. FAIL State
+
+**Purpose**: Terminal state indicating task failure due to unrecoverable errors.
+
+```python
+@MobileAgentStateManager.register
+class FailMobileAgentState(MobileAgentState):
+    """The class for the fail mobile agent state"""
+
+    def next_agent(self, agent: "MobileAgent") -> "MobileAgent":
+        return agent
+
+    def next_state(self, agent: "MobileAgent") -> MobileAgentState:
+        return FinishMobileAgentState()  # Transitions to FINISH for cleanup
+
+    def is_round_end(self) -> bool:
+        return True  # Round ends
+
+    def is_subtask_end(self) -> bool:
+        return True  # Subtask failed
+
+    @classmethod
+    def name(cls) -> str:
+        return MobileAgentStatus.FAIL.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Terminal (Error) |
+| **Processor Executed** | ✗ No |
+| **Round Ends** | Yes |
+| **Subtask Ends** | Yes |
+| **Duration** | Transitions to FINISH |
+| **Next States** | FINISH |
+
+**Behavior**:
+
+- Logs failure reason and context
+- Captures final screenshot for debugging
+- Transitions to FINISH state for cleanup
+- Session manager receives failure status
+
+!!!error "Failure Conditions"
+    FAIL state is reached when:
+
+    - **App Unavailable**: Required app is not installed or cannot be launched
+    - **Control Not Found**: Target UI control cannot be located after multiple attempts
+    - **Device Disconnected**: ADB connection lost during execution
+    - **Permission Denied**: Required permissions not granted on device
+    - **Timeout**: Actions take too long to complete
+    - **LLM Explicit Failure**: LLM explicitly indicates task cannot be completed
+    - **Repeated Action Failures**: Multiple consecutive actions fail
+
+**Error Recovery**:
+
+FAIL ends the task rather than retrying it. The error information is logged for debugging before the transition to FINISH:
+
+```python
+# Example error logging in FAIL state
+agent.logger.error(f"Mobile task failed: {error_message}")
+agent.logger.debug(f"Last action: {last_action}")
+agent.logger.debug(f"Current screenshot saved to: {screenshot_path}")
+agent.logger.debug(f"UI controls at failure: {current_controls}")
+```
+
+## State Transition Rules
+
+### Transition Decision Logic
+
+State transitions are determined by the LLM's response in the **CONTINUE** state:
+
+```python
+# LLM returns status in response
+parsed_response = {
+ "action": {
+ "function": "click_control",
+ "arguments": {"control_id": "5", "control_name": "Search"},
+ "status": "CONTINUE" # or "FINISH" or "FAIL"
+ },
+ "thought": "Need to click the search button to proceed"
+}
+
+# Agent updates its status based on LLM decision
+agent.status = parsed_response["action"]["status"]
+next_state = MobileAgentStateManager().get_state(agent.status)
+```
+
+### Transition Matrix
+
+| Current State | Condition | Next State | Trigger |
+|---------------|-----------|------------|---------|
+| **CONTINUE** | LLM returns CONTINUE | CONTINUE | More actions needed (e.g., navigating multiple screens) |
+| **CONTINUE** | LLM returns FINISH | FINISH | Task completed (e.g., information found and displayed) |
+| **CONTINUE** | LLM returns FAIL | FAIL | Unrecoverable error (e.g., required control not available) |
+| **CONTINUE** | Exception raised | FAIL | System error (e.g., ADB disconnected) |
+| **FINISH** | Any | FINISH | No transition |
+| **FAIL** | Any | FINISH | Cleanup transition |
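+
+The matrix can be expressed as a small pure function. This is a sketch under stated assumptions: `Status` and `next_status` are hypothetical names, and the real agent resolves states through `MobileAgentStateManager` rather than a helper like this one.
+
+```python
+from enum import Enum
+
+
+class Status(str, Enum):
+    CONTINUE = "CONTINUE"
+    FINISH = "FINISH"
+    FAIL = "FAIL"
+
+
+def next_status(current: Status, llm_status: str = "", raised: bool = False) -> Status:
+    """Apply the transition matrix (hypothetical helper, not the agent's API)."""
+    if current is Status.FINISH:
+        return Status.FINISH      # No transition out of FINISH
+    if current is Status.FAIL:
+        return Status.FINISH      # Cleanup transition
+    if raised:
+        return Status.FAIL        # System error, e.g. ADB disconnected
+    return Status(llm_status)     # CONTINUE, FINISH, or FAIL as returned by the LLM
+```
+
+In this sketch an unrecognized `llm_status` raises `ValueError`; how the real agent treats a malformed status is not specified here.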
+
+## State-Specific Processing
+
+### CONTINUE State Processing Pipeline
+
+When in CONTINUE state, MobileAgent executes the full 4-phase pipeline:
+
+```mermaid
+graph TD
+ A[CONTINUE State] --> B[Phase 1: Data Collection]
+ B --> B1[Capture Screenshot]
+ B1 --> B2[Get Installed Apps]
+ B2 --> B3[Get Current Controls]
+ B3 --> B4[Create Annotated Screenshot]
+
+ B4 --> C[Phase 2: LLM Interaction]
+ C --> C1[Construct Prompt with Visual Context]
+ C1 --> C2[Send to LLM]
+ C2 --> C3[Parse Response]
+
+ C3 --> D[Phase 3: Action Execution]
+ D --> D1[Execute Mobile Action]
+ D1 --> D2[Capture Result]
+
+ D2 --> E[Phase 4: Memory Update]
+ E --> E1[Store Screenshot]
+ E1 --> E2[Store Action Result]
+ E2 --> E3[Update Control Cache]
+
+ E3 --> F{Check Status}
+ F -->|CONTINUE| A
+ F -->|FINISH| G[FINISH State]
+ F -->|FAIL| H[FAIL State]
+```
+
+### Terminal States (FINISH / FAIL)
+
+Terminal states perform no processing:
+
+- **FINISH**: Clean termination, results and screenshots available in memory
+- **FAIL**: Error termination, error details and final screenshot logged
+
+## Deterministic Control Flow
+
+The 3-state design ensures deterministic, traceable execution:
+
+- **Predictable Behavior**: Every execution path is well-defined
+- **Debuggability**: State transitions are logged with screenshots for visual debugging
+- **Testability**: Finite state space simplifies testing
+- **Maintainability**: Simple state set reduces complexity
+- **Visual Traceability**: Screenshots at each state provide visual execution history
+
+## Comparison with Other Agents
+
+| Agent | States | Complexity | Visual | Use Case |
+|-------|--------|------------|--------|----------|
+| **MobileAgent** | 3 | Minimal | ✓ Screenshots | Android mobile automation |
+| **LinuxAgent** | 3 | Minimal | ✗ Text-only | Linux CLI task execution |
+| **AppAgent** | 6 | Moderate | ✓ Screenshots | Windows app automation |
+| **HostAgent** | 7 | High | ✓ Screenshots | Desktop orchestration |
+
+MobileAgent's minimal 3-state design reflects its focused scope: execute mobile UI actions to fulfill user requests. The simplified state machine eliminates unnecessary complexity while maintaining robust error handling and completion detection, similar to LinuxAgent but with visual context support.
+
+## Mobile-Specific Considerations
+
+### Screenshot-Based State Tracking
+
+Unlike LinuxAgent (text-based) or AppAgent (Windows UI API), MobileAgent relies heavily on screenshots for state understanding:
+
+- Each CONTINUE round starts with a fresh screenshot
+- Annotated screenshots show control IDs for precise interaction
+- Screenshots are saved to memory for debugging and analysis
+- Visual context helps the LLM understand the current UI state
+
+### Control Caching
+
+MobileAgent caches control information to minimize ADB overhead:
+
+- Controls are cached for 5 seconds
+- Cache is invalidated after each action (UI likely changed)
+- Control dictionary enables quick lookup by ID
+- Reduces repeated UI tree parsing
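+
+The caching rules above (a 5-second lifetime, invalidation after every action, lookup by ID) can be sketched as a small class. This is a minimal illustration, not the MCP server's code: `ControlCache`, its method names, and the injectable `clock` parameter are all hypothetical.
+
+```python
+import time
+from typing import Callable, Dict, List, Optional
+
+
+class ControlCache:
+    """Hypothetical TTL cache for the controls of the current screen."""
+
+    def __init__(self, ttl_seconds: float = 5.0,
+                 clock: Callable[[], float] = time.monotonic) -> None:
+        self._ttl = ttl_seconds
+        self._clock = clock  # injectable so expiry can be tested without sleeping
+        self._controls: Optional[List[dict]] = None
+        self._by_id: Dict[str, dict] = {}
+        self._stored_at = 0.0
+
+    def put(self, controls: List[dict]) -> None:
+        """Store a fresh controls list and index it by control ID."""
+        self._controls = list(controls)
+        self._by_id = {c["id"]: c for c in controls}
+        self._stored_at = self._clock()
+
+    def get(self) -> Optional[List[dict]]:
+        """Return the cached list, or None if it is missing or older than the TTL."""
+        if self._controls is None:
+            return None
+        if self._clock() - self._stored_at > self._ttl:
+            self.invalidate()
+            return None
+        return self._controls
+
+    def lookup(self, control_id: str) -> Optional[dict]:
+        """Look up one control by ID without re-parsing the UI tree."""
+        return self._by_id.get(control_id) if self.get() is not None else None
+
+    def invalidate(self) -> None:
+        """Drop everything; call after each action, since the UI likely changed."""
+        self._controls = None
+        self._by_id = {}
+```
+
+A caller would try `get()` first, fall back to a fresh UI dump on `None`, and call `invalidate()` once an action has been executed.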
+
+### Touch-Based Interaction
+
+State transitions in MobileAgent are triggered by touch actions rather than keyboard commands:
+
+- **Tap**: Primary interaction method
+- **Swipe**: For scrolling and gestures
+- **Type**: Text input (requires focused control)
+- **Long-press**: For context menus (planned)
+
+## Implementation Details
+
+The state machine implementation can be found in:
+
+```
+ufo/agents/states/mobile_agent_state.py
+```
+
+Key classes:
+
+- `MobileAgentStatus`: State enumeration (CONTINUE, FINISH, FAIL)
+- `MobileAgentStateManager`: State registry and lookup
+- `MobileAgentState`: Abstract base class
+- `ContinueMobileAgentState`: Active execution state with 4-phase pipeline
+- `FinishMobileAgentState`: Successful completion state
+- `FailMobileAgentState`: Error termination state
+- `NoneMobileAgentState`: Initial/undefined state
+
+## Next Steps
+
+- [Processing Strategy](strategy.md) - Understand the 4-phase processing pipeline executed in CONTINUE state
+- [MCP Commands](commands.md) - Explore mobile UI interaction and app management commands
+- [Overview](overview.md) - Return to MobileAgent architecture overview
diff --git a/documents/docs/mobile/strategy.md b/documents/docs/mobile/strategy.md
new file mode 100644
index 000000000..583228694
--- /dev/null
+++ b/documents/docs/mobile/strategy.md
@@ -0,0 +1,886 @@
+# MobileAgent Processing Strategy
+
+MobileAgent executes a **4-phase processing pipeline** in the **CONTINUE** state. Each phase handles a specific aspect of mobile task execution: data collection (screenshots and controls), LLM decision making, action execution, and memory recording. This design separates visual context gathering from prompt construction, LLM reasoning, mobile action execution, and state updates, enhancing modularity and traceability.
+
+> **📖 Related Documentation:**
+>
+> - [Mobile Agent Overview](overview.md) - Architecture and core responsibilities
+> - [State Machine](state.md) - FSM states (this strategy runs in CONTINUE state)
+> - [MCP Commands](commands.md) - Available commands used in each phase
+> - [Quick Start Guide](../getting_started/quick_start_mobile.md) - Set up your first Mobile Agent
+
+## Strategy Assembly
+
+Processing strategies are assembled and orchestrated by the `MobileAgentProcessor` class defined in `ufo/agents/processors/customized/customized_agent_processor.py`. The processor coordinates the 4-phase pipeline execution.
+
+### MobileAgentProcessor Overview
+
+The `MobileAgentProcessor` extends `CustomizedProcessor` and manages the Mobile-specific workflow:
+
+```python
+class MobileAgentProcessor(CustomizedProcessor):
+    """
+    Processor for Mobile Android MCP Agent.
+    Handles data collection, LLM interaction, and action execution for Android devices.
+    """
+
+    def _setup_strategies(self) -> None:
+        """Setup processing strategies for Mobile Agent."""
+
+        # Phase 1: Data Collection (composed strategy - fail_fast=True)
+        self.strategies[ProcessingPhase.DATA_COLLECTION] = ComposedStrategy(
+            strategies=[
+                MobileScreenshotCaptureStrategy(fail_fast=True),
+                MobileAppsCollectionStrategy(fail_fast=False),
+                MobileControlsCollectionStrategy(fail_fast=False),
+            ],
+            name="MobileDataCollectionStrategy",
+            fail_fast=True,
+        )
+
+        # Phase 2: LLM Interaction (critical - fail_fast=True)
+        self.strategies[ProcessingPhase.LLM_INTERACTION] = (
+            MobileLLMInteractionStrategy(fail_fast=True)
+        )
+
+        # Phase 3: Action Execution (graceful - fail_fast=False)
+        self.strategies[ProcessingPhase.ACTION_EXECUTION] = (
+            MobileActionExecutionStrategy(fail_fast=False)
+        )
+
+        # Phase 4: Memory Update (graceful - fail_fast=False)
+        self.strategies[ProcessingPhase.MEMORY_UPDATE] = (
+            AppMemoryUpdateStrategy(fail_fast=False)
+        )
+```
+
+### Strategy Registration
+
+| Phase | Strategy Class | fail_fast | Rationale |
+|-------|---------------|-----------|-----------|
+| **DATA_COLLECTION** | `ComposedStrategy` (3 sub-strategies) | ✓ True | Visual context is critical for mobile interaction |
+| **LLM_INTERACTION** | `MobileLLMInteractionStrategy` | ✓ True | LLM failure requires immediate recovery |
+| **ACTION_EXECUTION** | `MobileActionExecutionStrategy` | ✗ False | Action failures can be handled gracefully |
+| **MEMORY_UPDATE** | `AppMemoryUpdateStrategy` | ✗ False | Memory failures shouldn't block execution |
+
+**Fail-Fast vs Graceful:**
+
+- **fail_fast=True**: Critical phases where errors should immediately transition to FAIL state
+- **fail_fast=False**: Non-critical phases where errors can be logged and execution continues
+
+## Four-Phase Pipeline
+
+### Pipeline Execution Flow
+
+```mermaid
+graph LR
+ A[CONTINUE State] --> B[Phase 1: Data Collection]
+ B --> C[Phase 2: LLM Interaction]
+ C --> D[Phase 3: Action Execution]
+ D --> E[Phase 4: Memory Update]
+ E --> F[Determine Next State]
+ F --> G{Status?}
+ G -->|CONTINUE| A
+ G -->|FINISH| H[FINISH State]
+ G -->|FAIL| I[FAIL State]
+```
+
+## Phase 1: Data Collection Strategy (Composed)
+
+**Purpose**: Gather comprehensive visual and structural information about the current mobile UI state.
+
+Phase 1 is a **composed strategy** consisting of three sub-strategies executed sequentially:
+
+1. **Screenshot Capture**: Take device screenshot
+2. **Apps Collection**: List installed applications
+3. **Controls Collection**: Extract UI hierarchy and annotate controls
+
+### Sub-Strategy 1.1: Screenshot Capture
+
+```python
+@depends_on("log_path", "session_step")
+@provides(
+ "clean_screenshot_path",
+ "clean_screenshot_url",
+ "annotated_screenshot_url", # Initially None, set by Controls Collection
+ "screenshot_saved_time",
+)
+class MobileScreenshotCaptureStrategy(BaseProcessingStrategy):
+ """
+ Strategy for capturing Android device screenshots.
+ """
+```
+
+#### Workflow
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant MCP
+ participant ADB
+ participant Device
+
+ Strategy->>MCP: capture_screenshot command
+ MCP->>ADB: screencap -p /sdcard/screen_temp.png
+ ADB->>Device: Execute screenshot
+ Device-->>ADB: Screenshot saved
+
+ ADB->>Device: Pull screenshot
+ Device-->>ADB: PNG file
+ ADB-->>MCP: PNG data
+
+ MCP->>MCP: Encode to base64
+ MCP-->>Strategy: data:image/png;base64,...
+
+ Strategy->>Strategy: Save to log_path
+ Strategy-->>Agent: Screenshot URL + path
+```
+
+#### Output
+
+```python
+{
+ "clean_screenshot_path": "logs/.../action_step1.png",
+ "clean_screenshot_url": "data:image/png;base64,iVBORw0KGgoAAAANS...",
+ "annotated_screenshot_url": None, # Set by Controls Collection
+ "screenshot_saved_time": 0.234 # seconds
+}
+```
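+
+The screenshot travels as a base64 data URL and is also written to disk. A minimal sketch of that round trip follows; the helper names `encode_png_data_url` and `save_data_url` are hypothetical and are not taken from the UFO codebase.
+
+```python
+import base64
+from pathlib import Path
+
+PREFIX = "data:image/png;base64,"
+
+
+def encode_png_data_url(png_bytes: bytes) -> str:
+    """Wrap raw PNG bytes in a data URL, as the MCP side is described as doing."""
+    return PREFIX + base64.b64encode(png_bytes).decode("ascii")
+
+
+def save_data_url(data_url: str, path: Path) -> int:
+    """Decode a PNG data URL and write it to `path`; return the byte count."""
+    if not data_url.startswith(PREFIX):
+        raise ValueError("expected a base64 PNG data URL")
+    png_bytes = base64.b64decode(data_url[len(PREFIX):], validate=True)
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_bytes(png_bytes)
+    return len(png_bytes)
+```
+
+Keeping both the URL and the file path means the LLM prompt can embed the image directly while the log directory retains a copy for later inspection.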
+
+### Sub-Strategy 1.2: Apps Collection
+
+```python
+@depends_on("clean_screenshot_url")
+@provides("installed_apps", "apps_collection_time")
+class MobileAppsCollectionStrategy(BaseProcessingStrategy):
+ """
+ Strategy for collecting installed apps information from Android device.
+ """
+```
+
+#### Workflow
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant MCP
+ participant ADB
+ participant Device
+
+ Strategy->>MCP: get_mobile_app_target_info
+ MCP->>MCP: Check cache (5min TTL)
+
+ alt Cache Hit
+ MCP-->>Strategy: Cached app list
+ else Cache Miss
+ MCP->>ADB: pm list packages -3
+ ADB->>Device: List user apps
+ Device-->>ADB: Package list
+ ADB-->>MCP: Packages
+
+ MCP->>MCP: Parse to TargetInfo
+ MCP->>MCP: Update cache
+ MCP-->>Strategy: App list
+ end
+
+ Strategy-->>Agent: Installed apps
+```
+
+#### Output Format
+
+```python
+{
+ "installed_apps": [
+ {
+ "id": "1",
+ "name": "com.android.chrome",
+ "package": "com.android.chrome"
+ },
+ {
+ "id": "2",
+ "name": "com.google.android.apps.maps",
+ "package": "com.google.android.apps.maps"
+ },
+ ...
+ ],
+ "apps_collection_time": 0.156 # seconds
+}
+```
+
+**Caching**: The apps list is cached for 5 minutes to reduce ADB overhead, since installed apps rarely change during a session.
+
+### Sub-Strategy 1.3: Controls Collection
+
+```python
+@depends_on("clean_screenshot_url")
+@provides(
+ "current_controls",
+ "controls_collection_time",
+ "annotated_screenshot_url",
+ "annotated_screenshot_path",
+ "annotation_dict",
+)
+class MobileControlsCollectionStrategy(BaseProcessingStrategy):
+ """
+ Strategy for collecting current screen controls information from Android device.
+ Creates annotated screenshots with control labels.
+ """
+```
+
+#### Workflow
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant MCP
+ participant ADB
+ participant Device
+ participant Photographer
+
+ Strategy->>MCP: get_app_window_controls_target_info
+ MCP->>MCP: Check cache (5s TTL)
+
+ alt Cache Hit
+ MCP-->>Strategy: Cached controls
+ else Cache Miss
+ MCP->>ADB: uiautomator dump /sdcard/window_dump.xml
+ ADB->>Device: Dump UI hierarchy
+ Device-->>ADB: XML file
+
+ ADB->>Device: cat /sdcard/window_dump.xml
+ Device-->>ADB: XML content
+ ADB-->>MCP: UI hierarchy XML
+
+ MCP->>MCP: Parse XML
+ MCP->>MCP: Extract clickable controls
+ MCP->>MCP: Validate rectangles
+ MCP->>MCP: Assign IDs
+ MCP->>MCP: Update cache
+ MCP-->>Strategy: Controls list
+ end
+
+ Strategy->>Strategy: Convert to TargetInfo
+ Strategy->>Photographer: Create annotated screenshot
+ Photographer->>Photographer: Draw control IDs on screenshot
+ Photographer-->>Strategy: Annotated image
+
+ Strategy-->>Agent: Controls + Annotated screenshot
+```
+
+#### UI Hierarchy Parsing
+
+The strategy parses Android UI XML to extract meaningful controls:
+
+```xml
+
+
+
+
+
+
+
+
+
+```
+
+**Control Selection Criteria**:
+
+- `clickable="true"` - Can be tapped
+- `long-clickable="true"` - Supports long-press
+- `scrollable="true"` - Can be scrolled
+- `checkable="true"` - Checkbox or toggle
+- Has `text` or `content-desc` - Has label
+- Type includes "Edit", "Button" - Input or action element
+
+**Rectangle Validation**:
+
+Controls with invalid rectangles are filtered out:
+
+```python
+# Bounds format: [left, top, right, bottom]
+if right <= left or bottom <= top:
+ # Invalid: width or height is zero/negative
+ skip_control()
+```
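+
+As a rough sketch of parsing and rectangle validation: `uiautomator dump` writes each node's bounds as a string of the form `[left,top][right,bottom]`, which has to be converted into the four-integer rectangle used above. The sample XML, the function names, and the simplified selection rule (only `clickable="true"`) are assumptions made for illustration; the actual strategy applies the fuller criteria listed earlier.
+
+```python
+import re
+import xml.etree.ElementTree as ET
+from typing import List, Optional
+
+# Illustrative dump only; real devices emit many more attributes per node.
+SAMPLE_XML = """<hierarchy rotation="0">
+  <node class="android.widget.EditText" text="Search" content-desc=""
+        clickable="true" bounds="[48,96][912,192]" />
+  <node class="android.widget.ImageButton" text="" content-desc="Search"
+        clickable="true" bounds="[912,96][1032,192]" />
+  <node class="android.view.View" text="" content-desc=""
+        clickable="true" bounds="[0,0][0,0]" />
+</hierarchy>"""
+
+BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")
+
+
+def parse_bounds(bounds: str) -> Optional[List[int]]:
+    """Turn "[l,t][r,b]" into [left, top, right, bottom], or None if invalid."""
+    match = BOUNDS_RE.fullmatch(bounds.strip())
+    if match is None:
+        return None
+    left, top, right, bottom = (int(g) for g in match.groups())
+    if right <= left or bottom <= top:
+        return None  # zero or negative width/height
+    return [left, top, right, bottom]
+
+
+def extract_controls(xml_text: str) -> List[dict]:
+    """Collect clickable nodes with valid rectangles and assign sequential IDs."""
+    controls: List[dict] = []
+    for node in ET.fromstring(xml_text).iter("node"):
+        rect = parse_bounds(node.get("bounds", ""))
+        if node.get("clickable") != "true" or rect is None:
+            continue
+        controls.append({
+            "id": str(len(controls) + 1),
+            "name": node.get("text") or node.get("content-desc") or "",
+            "type": node.get("class", "").rsplit(".", 1)[-1],
+            "rect": rect,
+        })
+    return controls
+```
+
+Calling `extract_controls(SAMPLE_XML)` keeps the two Search controls and drops the zero-size view, because its rectangle fails validation.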
+
+#### Output Format
+
+```python
+{
+ "current_controls": [
+ {
+ "id": "1",
+ "name": "Search",
+ "type": "EditText",
+ "rect": [48, 96, 912, 192] # [left, top, right, bottom]
+ },
+ {
+ "id": "2",
+ "name": "Search",
+ "type": "ImageButton",
+ "rect": [912, 96, 1032, 192]
+ },
+ ...
+ ],
+ "annotated_screenshot_url": "data:image/png;base64,...",
+ "annotated_screenshot_path": "logs/.../action_step1_annotated.png",
+ "annotation_dict": {
+ "1": {"id": "1", "name": "Search", "type": "EditText", ...},
+ "2": {"id": "2", "name": "Search", "type": "ImageButton", ...},
+ ...
+ },
+ "controls_collection_time": 0.345 # seconds
+}
+```
+
+**Caching**: Controls are cached for 5 seconds, but the cache is invalidated after every action (UI likely changed).
+
+### Composed Strategy Execution
+
+The three sub-strategies are executed sequentially in a single composed strategy:
+
+```python
+ComposedStrategy(
+ strategies=[
+ MobileScreenshotCaptureStrategy(fail_fast=True),
+ MobileAppsCollectionStrategy(fail_fast=False),
+ MobileControlsCollectionStrategy(fail_fast=False),
+ ],
+ name="MobileDataCollectionStrategy",
+ fail_fast=True, # Overall failure if screenshot capture fails
+)
+```
+
+**Execution Order**:
+
+1. Screenshot Capture (critical)
+2. Apps Collection (optional, continues on failure)
+3. Controls Collection (optional, continues on failure)
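+
+The interaction between per-strategy `fail_fast` flags and sequential execution can be sketched as follows. This is a simplified model under stated assumptions, not the real `ComposedStrategy`: `Step`, `Outcome`, and `run_composed` are hypothetical names, and each step is modelled as a plain function that returns a dict of outputs.
+
+```python
+from dataclasses import dataclass, field
+from typing import Any, Callable, Dict, List
+
+
+@dataclass
+class Step:
+    """Hypothetical stand-in for one sub-strategy."""
+    name: str
+    run: Callable[[Dict[str, Any]], Dict[str, Any]]
+    fail_fast: bool
+
+
+@dataclass
+class Outcome:
+    success: bool
+    data: Dict[str, Any] = field(default_factory=dict)
+    errors: List[str] = field(default_factory=list)
+
+
+def run_composed(steps: List[Step]) -> Outcome:
+    """Run steps in order, merging their outputs into one shared context."""
+    outcome = Outcome(success=True)
+    for step in steps:
+        try:
+            outcome.data.update(step.run(outcome.data))
+        except Exception as exc:
+            outcome.errors.append(f"{step.name}: {exc}")
+            if step.fail_fast:
+                # A critical step failed: stop and report overall failure.
+                outcome.success = False
+                break
+            # A non-critical step failed: the error is recorded and the loop goes on.
+    return outcome
+```
+
+Under this model, a failed apps or controls collection only adds an entry to `errors`, while a failed screenshot capture ends the phase with `success=False`.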
+
+---
+
+## Phase 2: LLM Interaction Strategy
+
+**Purpose**: Construct mobile-specific prompts with visual context and obtain the next action from the LLM.
+
+### Strategy Implementation
+
+```python
+@depends_on("installed_apps", "current_controls", "clean_screenshot_url")
+@provides(
+ "parsed_response",
+ "response_text",
+ "llm_cost",
+ "prompt_message",
+ "action",
+ "thought",
+ "comment",
+)
+class MobileLLMInteractionStrategy(AppLLMInteractionStrategy):
+ """
+ Strategy for LLM interaction with Mobile Agent specific prompting.
+ """
+```
+
+### Phase 2 Workflow
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant Agent
+ participant Prompter
+ participant LLM
+
+ Strategy->>Agent: Get previous plan
+ Strategy->>Agent: Get blackboard context
+ Agent-->>Strategy: Previous execution results
+
+ Strategy->>Prompter: Construct mobile prompt
+ Prompter->>Prompter: Build system message (APIs + examples)
+ Prompter->>Prompter: Add screenshot images
+ Prompter->>Prompter: Add annotated screenshot
+ Prompter->>Prompter: Add text prompt with context
+ Prompter-->>Strategy: Complete multimodal prompt
+
+ Strategy->>LLM: Send prompt
+ LLM-->>Strategy: Mobile action + status
+
+ Strategy->>Strategy: Parse response
+ Strategy->>Strategy: Validate action
+ Strategy-->>Agent: Parsed response + cost
+```
+
+### Prompt Construction
+
+The strategy constructs comprehensive multimodal prompts:
+
+```python
+prompt_message = agent.message_constructor(
+ dynamic_examples=[], # Few-shot examples (optional)
+ dynamic_knowledge="", # Retrieved knowledge (optional)
+ plan=plan, # Previous execution plan
+ request=request, # User request
+ installed_apps=installed_apps, # Available apps
+ current_controls=current_controls, # UI controls with IDs
+ screenshot_url=clean_screenshot_url, # Clean screenshot
+ annotated_screenshot_url=annotated_screenshot_url, # With control IDs
+ blackboard_prompt=blackboard_prompt, # Shared context
+ last_success_actions=last_success_actions # Successful actions
+)
+```
+
+### Multimodal Content Structure
+
+The prompt includes both visual and textual elements:
+
+```python
+user_content = [
+ # 1. Clean screenshot (for visual understanding)
+ {
+ "type": "image_url",
+ "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."}
+ },
+
+ # 2. Annotated screenshot (for control identification)
+ {
+ "type": "image_url",
+ "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."}
+ },
+
+ # 3. Text prompt with context
+ {
+ "type": "text",
+ "text": """
+ [Previous Plan]: [...]
+ [User Request]: Search for restaurants on Maps
+ [Installed Apps]: [
+ {"id": "1", "name": "com.google.android.apps.maps", ...},
+ ...
+ ]
+ [Current Screen Controls]: [
+ {"id": "1", "name": "Search", "type": "EditText", ...},
+ {"id": "2", "name": "Search", "type": "ImageButton", ...},
+ ...
+ ]
+ [Last Success Actions]: [...]
+ """
+ }
+]
+```
+
+### LLM Response Format
+
+The LLM returns a structured mobile action:
+
+```json
+{
+ "thought": "I need to launch Google Maps app first",
+ "action": {
+ "function": "launch_app",
+ "arguments": {
+ "package_name": "com.google.android.apps.maps",
+ "id": "1"
+ },
+ "status": "CONTINUE"
+ },
+ "comment": "Launching Maps to search for restaurants"
+}
+```
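+
+Parsing and validating a response of this shape might look like the sketch below. `MobileAction` and `parse_llm_response` are hypothetical names, and the validation rules (an `action` object, a string `function`, one of the three FSM statuses) are assumptions inferred from the example rather than the checks performed by `MobileLLMInteractionStrategy`.
+
+```python
+import json
+from dataclasses import dataclass
+from typing import Any, Dict
+
+VALID_STATUSES = {"CONTINUE", "FINISH", "FAIL"}
+
+
+@dataclass
+class MobileAction:
+    function: str
+    arguments: Dict[str, Any]
+    status: str
+    thought: str = ""
+    comment: str = ""
+
+
+def parse_llm_response(text: str) -> MobileAction:
+    """Parse the JSON reply and reject shapes the pipeline could not execute."""
+    payload = json.loads(text)
+    action = payload.get("action")
+    if not isinstance(action, dict):
+        raise ValueError("response has no 'action' object")
+    function = action.get("function")
+    if not isinstance(function, str):
+        raise ValueError("action.function must be a string")
+    arguments = action.get("arguments") or {}
+    if not isinstance(arguments, dict):
+        raise ValueError("action.arguments must be an object")
+    status = action.get("status")
+    if status not in VALID_STATUSES:
+        raise ValueError(f"unexpected status: {status!r}")
+    return MobileAction(
+        function=function,
+        arguments=arguments,
+        status=status,
+        thought=payload.get("thought", ""),
+        comment=payload.get("comment", ""),
+    )
+```
+
+Rejecting malformed replies at this point fits the `fail_fast=True` setting of the phase: without a usable action there is nothing for Phase 3 to execute.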
+
+### Mobile-Specific Features
+
+**Visual Context Priority**: The LLM sees both clean and annotated screenshots, which gives it a better understanding of the UI than text-only descriptions.
+
+**Control ID References**: The annotated screenshot shows control IDs, allowing the LLM to reference UI elements precisely in actions.
+
+**App Awareness**: The LLM knows which apps are installed, so it can select and launch the appropriate app.
+
+**Touch-Based Actions**: The LLM generates mobile-specific actions (tap, swipe, type) instead of desktop actions (click, drag, keyboard).
+
+---
+
+## Phase 3: Action Execution Strategy
+
+**Purpose**: Execute the mobile actions returned by the LLM and capture structured results.
+
+### Strategy Implementation
+
+```python
+class MobileActionExecutionStrategy(AppActionExecutionStrategy):
+ """
+ Strategy for executing actions in Mobile Agent.
+ """
+```
+
+### Phase 3 Workflow
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant MCP
+ participant ADB
+ participant Device
+
+ Strategy->>Strategy: Extract action from LLM response
+
+ alt launch_app
+ Strategy->>MCP: launch_app(package_name)
+ MCP->>ADB: monkey -p package_name
+ else click_control
+ Strategy->>MCP: click_control(control_id, control_name)
+ MCP->>MCP: Get control from cache
+ MCP->>MCP: Calculate center position
+ MCP->>ADB: input tap x y
+ else type_text
+ Strategy->>MCP: type_text(text, control_id, ...)
+ MCP->>ADB: input tap (focus control)
+ MCP->>ADB: input text (type)
+ else swipe
+ Strategy->>MCP: swipe(start_x, start_y, end_x, end_y)
+ MCP->>ADB: input swipe ...
+ else tap
+ Strategy->>MCP: tap(x, y)
+ MCP->>ADB: input tap x y
+ else press_key
+ Strategy->>MCP: press_key(key_code)
+ MCP->>ADB: input keyevent KEY_CODE
+ else wait
+ Strategy->>Strategy: asyncio.sleep(seconds)
+ end
+
+ ADB->>Device: Execute command
+ Device-->>ADB: Result
+ ADB-->>MCP: Success/Failure
+
+ MCP->>MCP: Invalidate controls cache
+ MCP-->>Strategy: Execution result
+
+ Strategy->>Strategy: Create action info
+ Strategy->>Strategy: Format for memory
+ Strategy-->>Agent: Execution results
+```
+
+### Action Execution Flow
+
+```python
+# Extract parsed LLM response
+parsed_response: AppAgentResponse = context.get_local("parsed_response")
+command_dispatcher = context.global_context.command_dispatcher
+
+# Execute the action via MCP
+execution_results = await self._execute_app_action(
+ command_dispatcher,
+ parsed_response.action
+)
+```
+
+### Result Capture
+
+Execution results are structured for downstream processing:
+
+```python
+{
+ "success": True,
+ "action": "click_control(id=5, name=Search)",
+ "message": "Clicked control 'Search' at (480, 144)",
+ "control_info": {
+ "id": "5",
+ "name": "Search",
+ "type": "EditText",
+ "rect": [48, 96, 912, 192]
+ }
+}
+```
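+
+The tap position in the message above, (480, 144), is the center of the control's rectangle `[48, 96, 912, 192]`. A minimal sketch of that calculation and of the resulting `adb shell input tap` invocation is shown here; the function names are hypothetical, and the argument list is built but not executed.
+
+```python
+from typing import List, Sequence, Tuple
+
+
+def center_of(rect: Sequence[int]) -> Tuple[int, int]:
+    """Return the integer center of [left, top, right, bottom]."""
+    left, top, right, bottom = rect
+    return (left + right) // 2, (top + bottom) // 2
+
+
+def tap_argv(rect: Sequence[int]) -> List[str]:
+    """Build the adb argument list that taps the center of a control."""
+    x, y = center_of(rect)
+    return ["adb", "shell", "input", "tap", str(x), str(y)]
+```
+
+Returning an argument list rather than a shell string lets the caller pass it directly to a process runner without additional quoting.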
+
+### Action Info Creation
+
+Results are formatted into `ActionCommandInfo` objects:
+
+```python
+actions = self._create_action_info(
+ parsed_response.action,
+ execution_results,
+)
+
+action_info = ListActionCommandInfo(actions)
+action_info.color_print() # Pretty print to console
+```
+
+### Cache Invalidation
+
+After each action, control caches are invalidated:
+
+```python
+# Mobile MCP server automatically invalidates caches after actions
+# This ensures next round gets fresh UI state
+mobile_state.invalidate_controls()
+```
+
+---
+
+## Phase 4: Memory Update Strategy
+
+**Purpose**: Persist execution results, screenshots, and control information into agent memory for future reference.
+
+### Strategy Implementation
+
+MobileAgent reuses the `AppMemoryUpdateStrategy` from the app agent framework:
+
+```python
+self.strategies[ProcessingPhase.MEMORY_UPDATE] = AppMemoryUpdateStrategy(
+ fail_fast=False # Memory failures shouldn't stop process
+)
+```
+
+### Phase 4 Workflow
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant Memory
+ participant Context
+
+ Strategy->>Context: Get execution results
+ Strategy->>Context: Get LLM response
+ Strategy->>Context: Get screenshots
+
+ Strategy->>Memory: Create memory item
+ Memory->>Memory: Store screenshots (clean + annotated)
+ Memory->>Memory: Store action details
+ Memory->>Memory: Store control information
+ Memory->>Memory: Store timestamp
+
+ Strategy->>Context: Update round result
+ Strategy-->>Agent: Memory updated
+```
+
+### Memory Structure
+
+Each execution round is stored as a memory item:
+
+```python
+{
+ "round": 1,
+ "request": "Search for restaurants on Maps",
+ "thought": "I need to launch Google Maps app first",
+ "action": {
+ "function": "launch_app",
+ "arguments": {
+ "package_name": "com.google.android.apps.maps",
+ "id": "1"
+ }
+ },
+ "result": {
+ "success": True,
+ "message": "Launched com.google.android.apps.maps"
+ },
+ "screenshots": {
+ "clean": "logs/.../action_step1.png",
+ "annotated": "logs/.../action_step1_annotated.png"
+ },
+ "controls": [
+ {"id": "1", "name": "Search", "type": "EditText", ...},
+ ...
+ ],
+ "status": "CONTINUE",
+ "timestamp": "2025-11-14T10:30:45"
+}
+```
+
+### Iterative Refinement
+
+Memory enables iterative refinement across rounds:
+
+1. **Round 1**: Launch Maps app → Maps opened
+2. **Round 2**: Click search field (using a control ID from the annotated screenshot captured at the start of this round, after Maps opened)
+3. **Round 3**: Type "restaurants" → Text entered
+4. **Round 4**: Click search button → Results displayed
+
+Each round builds on previous results and screenshots stored in memory.
+
+### Visual Debugging
+
+Memory stores screenshots for each round, enabling visual debugging:
+
+- **Clean Screenshots**: Show actual device UI
+- **Annotated Screenshots**: Show control IDs used by LLM
+- **Action Sequence**: Visual trace of entire task execution
+
+---
+
+## Middleware Stack
+
+MobileAgent uses specialized middleware for logging:
+
+```python
+def _setup_middleware(self) -> None:
+    """Setup middleware pipeline for Mobile Agent"""
+    self.middleware_chain = [MobileLoggingMiddleware()]
+```
+
+### MobileLoggingMiddleware
+
+Provides enhanced logging specific to Mobile operations:
+
+```python
+class MobileLoggingMiddleware(AppAgentLoggingMiddleware):
+    """Specialized logging middleware for Mobile Agent"""
+
+    def starting_message(self, context: ProcessingContext) -> str:
+        request = context.get("request") or "Unknown Request"
+        return f"Completing the user request: [bold cyan]{request}[/bold cyan] on Mobile."
+```
+
+**Logged Information**:
+
+- User request
+- Screenshots captured (with paths)
+- Apps collected
+- Controls identified (with IDs)
+- Each mobile action executed
+- Action results
+- State transitions
+- LLM costs
+- Timing information
+
+---
+
+## Context Finalization
+
+After processing, the processor updates global context:
+
+```python
+def _finalize_processing_context(self, processing_context: ProcessingContext):
+    """Finalize processing context by updating ContextNames fields"""
+    super()._finalize_processing_context(processing_context)
+
+    try:
+        result = processing_context.get_local("result")
+        if result:
+            self.global_context.set(ContextNames.ROUND_RESULT, result)
+    except Exception as e:
+        self.logger.warning(f"Failed to update context: {e}")
+```
+
+This makes execution results available to:
+
+- Subsequent rounds (iterative execution)
+- Other agents (if part of multi-agent workflow)
+- Session manager (for monitoring and logging)
+
+---
+
+## Strategy Dependency Graph
+
+The four phases have clear dependencies:
+
+```mermaid
+graph TD
+ A[log_path + session_step] --> B[Phase 1.1: Screenshot Capture]
+ B --> C[clean_screenshot_url]
+
+ C --> D[Phase 1.2: Apps Collection]
+ D --> E[installed_apps]
+
+ C --> F[Phase 1.3: Controls Collection]
+ F --> G[current_controls]
+ F --> H[annotated_screenshot_url]
+ F --> I[annotation_dict]
+
+ E --> J[Phase 2: LLM Interaction]
+ G --> J
+ C --> J
+ H --> J
+ J --> K[parsed_response]
+ J --> L[llm_cost]
+
+ K --> M[Phase 3: Action Execution]
+ I --> M
+ M --> N[execution_result]
+ M --> O[action_info]
+
+ K --> P[Phase 4: Memory Update]
+ N --> P
+ O --> P
+ C --> P
+ H --> P
+ P --> Q[Memory Updated]
+
+ Q --> R[Next Round or Terminal State]
+```
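+
+One way to sanity-check an ordering against the graph above is to walk the stages in sequence and confirm that every key a stage depends on has already been provided. The sketch below is illustrative only: the `PIPELINE` table is transcribed from the diagram, and `missing_dependencies` is a hypothetical helper rather than part of the processor framework.
+
+```python
+from typing import Dict, Iterable, List, Set, Tuple
+
+Stage = Tuple[str, Set[str], Set[str]]  # (name, depends_on, provides)
+
+PIPELINE: List[Stage] = [
+    ("screenshot", {"log_path", "session_step"}, {"clean_screenshot_url"}),
+    ("apps", {"clean_screenshot_url"}, {"installed_apps"}),
+    ("controls", {"clean_screenshot_url"},
+     {"current_controls", "annotated_screenshot_url", "annotation_dict"}),
+    ("llm", {"installed_apps", "current_controls", "clean_screenshot_url"},
+     {"parsed_response", "llm_cost"}),
+    ("action", {"parsed_response", "annotation_dict"},
+     {"execution_result", "action_info"}),
+    ("memory", {"parsed_response", "execution_result", "action_info"}, set()),
+]
+
+
+def missing_dependencies(stages: Iterable[Stage],
+                         initial: Iterable[str]) -> Dict[str, Set[str]]:
+    """Map each stage name to the keys it needs that no earlier stage provided."""
+    available = set(initial)
+    missing: Dict[str, Set[str]] = {}
+    for name, needs, gives in stages:
+        lacking = needs - available
+        if lacking:
+            missing[name] = lacking
+        available |= gives
+    return missing
+```
+
+An empty result for `missing_dependencies(PIPELINE, {"log_path", "session_step"})` indicates that the order satisfies every declared dependency; swapping two stages shows which keys would be unavailable.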
+
+---
+
+## Modular Design Benefits
+
+The 4-phase strategy design provides:
+
+!!!success "Modularity Benefits"
+    - **Separation of Concerns**: Data collection, LLM reasoning, action execution, and memory are isolated
+    - **Visual Context**: Screenshots provide rich UI understanding beyond text descriptions
+    - **Testability**: Each phase can be tested independently with mocked data
+    - **Extensibility**: New data collection strategies can be added (e.g., accessibility info)
+    - **Reusability**: Memory strategy is shared with AppAgent
+    - **Maintainability**: Clear boundaries between perception, decision, and action
+    - **Traceability**: Each phase logs its operations independently with visual artifacts
+    - **Performance**: Caching strategies reduce ADB overhead
+
+---
+
+## Comparison with Other Agents
+
+| Agent | Phases | Data Collection | Visual | LLM | Action | Memory |
+|-------|--------|----------------|--------|-----|--------|--------|
+| **MobileAgent** | 4 | ✓ Screenshots + Controls + Apps | ✓ Multimodal | ✓ Mobile actions | ✓ Touch/swipe | ✓ Results + Screenshots |
+| **LinuxAgent** | 3 | ✗ On-demand | ✗ Text-only | ✓ CLI commands | ✓ Shell | ✓ Results |
+| **AppAgent** | 4 | ✓ Screenshots + UI | ✓ Multimodal | ✓ UI actions | ✓ GUI + API | ✓ Results + Screenshots |
+| **HostAgent** | 4 | ✓ Desktop snapshot | ✓ Multimodal | ✓ App selection | ✓ Orchestration | ✓ Results |
+
+MobileAgent's 4-phase pipeline includes a **DATA_COLLECTION** phase because:
+
+- Mobile UI requires visual context (screenshots)
+- Control identification needs UI hierarchy parsing
+- Touch targets need precise coordinates
+- The apps list informs which actions are available
+- Annotation ties the control IDs the LLM sees in the screenshot to the controls that are acted on during execution
+
+This reflects the visual, touch-based nature of mobile interaction.
+
+---
+
+## Implementation Location
+
+The strategy implementations can be found in:
+
+```
+ufo/agents/processors/
+├── customized/
+│   └── customized_agent_processor.py    # MobileAgentProcessor
+└── strategies/
+    └── mobile_agent_strategy.py         # Mobile-specific strategies
+```
+
+Key classes:
+
+- `MobileAgentProcessor`: Strategy orchestrator
+- `MobileScreenshotCaptureStrategy`: Screenshot capture via ADB
+- `MobileAppsCollectionStrategy`: Installed apps collection
+- `MobileControlsCollectionStrategy`: UI controls extraction and annotation
+- `MobileLLMInteractionStrategy`: Multimodal prompt construction and LLM interaction
+- `MobileActionExecutionStrategy`: Mobile action execution
+- `MobileLoggingMiddleware`: Enhanced logging
+
+---
+
+## Next Steps
+
+- [MCP Commands](commands.md) - Explore the mobile UI interaction and app management commands
+- [State Machine](state.md) - Understand the 3-state FSM that controls strategy execution
+- [Overview](overview.md) - Return to MobileAgent architecture overview
diff --git a/documents/docs/modules/context.md b/documents/docs/modules/context.md
deleted file mode 100644
index 9369cb0c7..000000000
--- a/documents/docs/modules/context.md
+++ /dev/null
@@ -1,74 +0,0 @@
-# Context
-
-The `Context` object is a shared state object that stores the state of the conversation across all `Rounds` within a `Session`. It is used to maintain the context of the conversation, as well as the overall status of the conversation.
-
-## Context Attributes
-
-The attributes of the `Context` object are defined in the `ContextNames` class, which is an `Enum`. The `ContextNames` class specifies various context attributes used throughout the session. Below is the definition:
-```python
-class ContextNames(Enum):
- """
- The context names.
- """
-
- ID = "ID" # The ID of the session
- MODE = "MODE" # The mode of the session
- LOG_PATH = "LOG_PATH" # The folder path to store the logs
- REQUEST = "REQUEST" # The current request
- SUBTASK = "SUBTASK" # The current subtask processed by the AppAgent
- PREVIOUS_SUBTASKS = "PREVIOUS_SUBTASKS" # The previous subtasks processed by the AppAgent
- HOST_MESSAGE = "HOST_MESSAGE" # The message from the HostAgent sent to the AppAgent
- REQUEST_LOGGER = "REQUEST_LOGGER" # The logger for the LLM request
- LOGGER = "LOGGER" # The logger for the session
- EVALUATION_LOGGER = "EVALUATION_LOGGER" # The logger for the evaluation
- ROUND_STEP = "ROUND_STEP" # The step of all rounds
- SESSION_STEP = "SESSION_STEP" # The step of the current session
- CURRENT_ROUND_ID = "CURRENT_ROUND_ID" # The ID of the current round
- APPLICATION_WINDOW = "APPLICATION_WINDOW" # The window of the application
- APPLICATION_PROCESS_NAME = "APPLICATION_PROCESS_NAME" # The process name of the application
- APPLICATION_ROOT_NAME = "APPLICATION_ROOT_NAME" # The root name of the application
- CONTROL_REANNOTATION = "CONTROL_REANNOTATION" # The re-annotation of the control provided by the AppAgent
- SESSION_COST = "SESSION_COST" # The cost of the session
- ROUND_COST = "ROUND_COST" # The cost of all rounds
- ROUND_SUBTASK_AMOUNT = "ROUND_SUBTASK_AMOUNT" # The amount of subtasks in all rounds
- CURRENT_ROUND_STEP = "CURRENT_ROUND_STEP" # The step of the current round
- CURRENT_ROUND_COST = "CURRENT_ROUND_COST" # The cost of the current round
- CURRENT_ROUND_SUBTASK_AMOUNT = "CURRENT_ROUND_SUBTASK_AMOUNT" # The amount of subtasks in the current round
- STRUCTURAL_LOGS = "STRUCTURAL_LOGS" # The structural logs of the session
-```
-Each attribute is a string that represents a specific aspect of the session context, ensuring that all necessary information is accessible and manageable within the application.
-
-
-## Attributes Description
-
-| Attribute | Description |
-|--------------------------------|---------------------------------------------------------|
-| `ID` | The ID of the session. |
-| `MODE` | The mode of the session. |
-| `LOG_PATH` | The folder path to store the logs. |
-| `REQUEST` | The current request. |
-| `SUBTASK` | The current subtask processed by the AppAgent. |
-| `PREVIOUS_SUBTASKS` | The previous subtasks processed by the AppAgent. |
-| `HOST_MESSAGE` | The message from the HostAgent sent to the AppAgent. |
-| `REQUEST_LOGGER` | The logger for the LLM request. |
-| `LOGGER` | The logger for the session. |
-| `EVALUATION_LOGGER` | The logger for the evaluation. |
-| `ROUND_STEP` | The step of all rounds. |
-| `SESSION_STEP` | The step of the current session. |
-| `CURRENT_ROUND_ID` | The ID of the current round. |
-| `APPLICATION_WINDOW` | The window of the application. |
-| `APPLICATION_PROCESS_NAME` | The process name of the application. |
-| `APPLICATION_ROOT_NAME` | The root name of the application. |
-| `CONTROL_REANNOTATION` | The re-annotation of the control provided by the AppAgent. |
-| `SESSION_COST` | The cost of the session. |
-| `ROUND_COST` | The cost of all rounds. |
-| `ROUND_SUBTASK_AMOUNT` | The amount of subtasks in all rounds. |
-| `CURRENT_ROUND_STEP` | The step of the current round. |
-| `CURRENT_ROUND_COST` | The cost of the current round. |
-| `CURRENT_ROUND_SUBTASK_AMOUNT` | The amount of subtasks in the current round. |
-| `STRUCTURAL_LOGS` | The structural logs of the session. |
-
-
-# Reference for the `Context` object
-
-::: module.context.Context
\ No newline at end of file
diff --git a/documents/docs/modules/round.md b/documents/docs/modules/round.md
deleted file mode 100644
index 3f53f7be3..000000000
--- a/documents/docs/modules/round.md
+++ /dev/null
@@ -1,58 +0,0 @@
-# Round
-
-A `Round` is a single interaction between the user and UFO that processes a single user request. A `Round` is responsible for orchestrating the `HostAgent` and `AppAgent` to fulfill the user's request.
-
-
-## Round Lifecycle
-
-In a `Round`, the following steps are executed:
-
-### 1. Round Initialization
-At the beginning of a `Round`, the `Round` object is created, and the user's request is processed by the `HostAgent` to determine the appropriate application to fulfill the request.
-
-### 2. Action Execution
-Once created, the `Round` orchestrates the `HostAgent` and `AppAgent` to execute the necessary actions to fulfill the user's request. The core logic of a `Round` is shown below:
-
-```python
-def run(self) -> None:
- """
- Run the round.
- """
-
- while not self.is_finished():
-
- self.agent.handle(self.context)
-
- self.state = self.agent.state.next_state(self.agent)
- self.agent = self.agent.state.next_agent(self.agent)
- self.agent.set_state(self.state)
-
- # If the subtask ends, capture the last snapshot of the application.
- if self.state.is_subtask_end():
- time.sleep(configs["SLEEP_TIME"])
- self.capture_last_snapshot(sub_round_id=self.subtask_amount)
- self.subtask_amount += 1
-
- self.agent.blackboard.add_requests(
- {"request_{i}".format(i=self.id), self.request}
- )
-
- if self.application_window is not None:
- self.capture_last_snapshot()
-
- if self._should_evaluate:
- self.evaluation()
-```
-
-At each step, the `Round` processes the user's request by invoking the `handle` method of the `AppAgent` or `HostAgent` based on the current state. The state determines the next agent to handle the request and the next state to transition to.
-
-### 3. Request Completion
-The `AppAgent` completes the actions within the application. If the request spans multiple applications, the `HostAgent` may switch to a different application to continue the task.
-
-### 4. Round Termination
-Once the user's request is fulfilled, the `Round` is terminated, and the results are returned to the user. If configured, the `EvaluationAgent` evaluates the completeness of the `Round`.
-
-
-# Reference
-
-::: module.basic.BaseRound
\ No newline at end of file
diff --git a/documents/docs/modules/session.md b/documents/docs/modules/session.md
deleted file mode 100644
index 1473ba76a..000000000
--- a/documents/docs/modules/session.md
+++ /dev/null
@@ -1,54 +0,0 @@
-# Session
-
-A `Session` is a conversation instance between the user and UFO. It is a continuous interaction that starts when the user initiates a request and ends when the request is completed. UFO supports multiple requests within the same session. Each request is processed sequentially, by a `Round` of interaction, until the user's request is fulfilled. We show the relationship between `Session` and `Round` in the following figure:
-
-
-
-
-
-## Session Lifecycle
-
-The lifecycle of a `Session` is as follows:
-
-### 1. Session Initialization
-
-A `Session` is initialized when the user starts a conversation with UFO. The `Session` object is created, and the first `Round` of interaction is initiated. At this stage, the user's request is processed by the `HostAgent` to determine the appropriate application to fulfill the request. The `Context` object is created to store the state of the conversation shared across all `Rounds` within the `Session`.
-
-### 2. Session Processing
-
-Once the `Session` is initialized, the `Round` of interaction begins, which completes a single user request by orchestrating the `HostAgent` and `AppAgent`.
-
-### 3. Next Round
-
-After the completion of the first `Round`, the `Session` requests the next request from the user to start the next `Round` of interaction. This process continues until there are no more requests from the user.
-The core logic of a `Session` is shown below:
-
-```python
-def run(self) -> None:
- """
- Run the session.
- """
-
- while not self.is_finished():
-
- round = self.create_new_round()
- if round is None:
- break
- round.run()
-
- if self.application_window is not None:
- self.capture_last_snapshot()
-
- if self._should_evaluate and not self.is_error():
- self.evaluation()
-
- self.print_cost()
-```
-
-### 4. Session Termination
-If the user has no more requests or decides to end the conversation, the `Session` is terminated, and the conversation ends. The `EvaluationAgent` evaluates the completeness of the `Session` if it is configured to do so.
-
-
-## Reference
-
-::: module.basic.BaseSession
\ No newline at end of file
diff --git a/documents/docs/project_directory_structure.md b/documents/docs/project_directory_structure.md
index 8f27afbd0..42a66c9b7 100644
--- a/documents/docs/project_directory_structure.md
+++ b/documents/docs/project_directory_structure.md
@@ -1,104 +1,487 @@
-The UFO project is organized into a well-defined directory structure to facilitate development, deployment, and documentation. Below is an overview of each directory and file, along with their purpose:
+# Project Directory Structure
+
+This repository implements **UFO³**, a multi-tier AgentOS architecture spanning from single-device automation (UFO²) to cross-device orchestration (Galaxy). This document provides an overview of the directory structure to help you understand the codebase organization.
+
+> **New to UFO³?** Start with the [Documentation Home](index.md) for an introduction and [Quick Start Guide](getting_started/quick_start_galaxy.md) to get up and running.
+
+**Architecture Overview:**
+
+- **🌌 Galaxy**: Multi-device DAG-based orchestration framework that coordinates agents across different platforms
+- **🎯 UFO²**: Single-device Windows desktop agent system that can serve as Galaxy's sub-agent
+- **🔌 AIP**: Agent Interaction Protocol for cross-device communication
+- **⚙️ Modular Configuration**: Type-safe configs in `config/galaxy/` and `config/ufo/`
+
+---
+
+## 📦 Root Directory Structure
+
+```
+UFO/
+├── galaxy/ # 🌌 Multi-device orchestration framework
+├── ufo/ # 🎯 Desktop AgentOS (can be Galaxy sub-agent)
+├── config/ # ⚙️ Modular configuration system
+├── aip/ # 🔌 Agent Interaction Protocol
+├── documents/ # 📖 MkDocs documentation site
+├── vectordb/ # 🗄️ Vector database for RAG
+├── learner/ # 📚 Help document indexing tools
+├── record_processor/ # 🎥 Human demonstration parser
+├── dataflow/ # 📊 Data collection pipeline
+├── model_worker/ # 🤖 Custom LLM deployment tools
+├── logs/ # 📝 Execution logs (auto-generated)
+├── scripts/ # 🛠️ Utility scripts
+├── tests/ # 🧪 Unit and integration tests
+└── requirements.txt # 📦 Python dependencies
+```
+
+---
+
+## 🌌 Galaxy Framework (`galaxy/`)
+
+The cross-device orchestration framework that transforms natural language requests into executable DAG workflows distributed across heterogeneous devices.
+
+### Directory Structure
+
+```
+galaxy/
+├── agents/ # 🤖 Constellation orchestration agents
+│ ├── agent/ # ConstellationAgent and basic agent classes
+│ ├── states/ # Agent state machines
+│ ├── processors/ # Request/result processing
+│ └── presenters/ # Response formatting
+│
+├── constellation/ # 🌟 Core DAG management system
+│ ├── task_constellation.py # TaskConstellation - DAG container
+│ ├── task_star.py # TaskStar - Task nodes
+│ ├── task_star_line.py # TaskStarLine - Dependency edges
+│ ├── enums.py # Enums for constellation components
+│ ├── editor/ # Interactive DAG editing with undo/redo
+│ └── orchestrator/ # Event-driven execution coordination
+│
+├── session/ # 📊 Session lifecycle management
+│ ├── galaxy_session.py # GalaxySession implementation
+│ └── observers/ # Event-driven observers
+│
+├── client/ # 📡 Device management
+│ ├── constellation_client.py # Device registration interface
+│ ├── device_manager.py # Device management coordinator
+│ ├── config_loader.py # Configuration loading
+│ ├── components/ # Device registry, connection manager, etc.
+│ └── support/ # Client support utilities
+│
+├── core/ # ⚡ Foundational components
+│ ├── types.py # Type system (protocols, dataclasses, enums)
+│ ├── interfaces.py # Interface definitions
+│ ├── di_container.py # Dependency injection container
+│ └── events.py # Event system
+│
+├── visualization/ # 🎨 Rich console visualization
+│ ├── dag_visualizer.py # DAG topology visualization
+│ ├── task_display.py # Task status displays
+│ └── components/ # Visualization components
+│
+├── prompts/ # 💬 Prompt templates
+│ ├── constellation_agent/ # ConstellationAgent prompts
+│ └── share/ # Shared examples
+│
+├── trajectory/ # 📈 Execution trajectory parsing
+│
+├── __main__.py # 🚀 Entry point: python -m galaxy
+├── galaxy.py # Main Galaxy orchestrator
+├── galaxy_client.py # Galaxy client interface
+├── README.md # Galaxy overview
+└── README_ZH.md # Galaxy overview (Chinese)
+```
+
+### Key Components
+
+| Component | Description | Documentation |
+|-----------|-------------|---------------|
+| **ConstellationAgent** | AI-powered agent that generates and modifies task DAGs | [Galaxy Overview](galaxy/overview.md) |
+| **TaskConstellation** | DAG container with validation and state management | [Constellation](galaxy/constellation/overview.md) |
+| **TaskOrchestrator** | Event-driven execution coordinator | [Constellation Orchestrator](galaxy/constellation_orchestrator/overview.md) |
+| **DeviceManager** | Multi-device coordination and assignment | [Device Manager](galaxy/client/device_manager.md) |
+| **Visualization** | Rich console DAG monitoring | [Galaxy Overview](galaxy/overview.md) |
+
+**Galaxy Documentation:**
+
+- [Galaxy Overview](galaxy/overview.md) - Architecture and concepts
+- [Quick Start](getting_started/quick_start_galaxy.md) - Get started with Galaxy
+- [Constellation Agent](galaxy/constellation_agent/overview.md) - AI-powered task planning
+- [Constellation Orchestrator](galaxy/constellation_orchestrator/overview.md) - Event-driven coordination
+- [Device Manager](galaxy/client/device_manager.md) - Multi-device management
+
+---
+
+## 🎯 UFO² Desktop AgentOS (`ufo/`)
+
+Single-device desktop automation system implementing a two-tier agent architecture (HostAgent + AppAgent) with hybrid GUI-API automation.
+
+### Directory Structure
+
+```
+ufo/
+├── agents/ # Two-tier agent implementation
+│ ├── agent/ # Base agent classes (HostAgent, AppAgent)
+│ ├── states/ # State machine implementations
+│ ├── processors/ # Processing strategy pipelines
+│ ├── memory/ # Agent memory and blackboard
+│ └── presenters/ # Response presentation logic
+│
+├── server/ # Server-client architecture components
+│ ├── websocket_server.py # WebSocket server for remote agent control
+│ └── handlers/ # Request handlers
+│
+├── client/ # MCP client and device management
+│ ├── mcp/ # MCP server manager
+│ │ ├── local_servers/ # Built-in MCP servers (UI, CLI, Office COM)
+│ │ └── http_servers/ # Remote MCP servers (hardware, Linux)
+│ ├── ufo_client.py # UFO² client implementation
+│ └── computer.py # Computer/device abstraction
+│
+├── automator/ # GUI and API automation layer
+│ ├── ui_control/ # GUI automation (inspector, controller)
+│ ├── puppeteer/ # Execution orchestration
+│ └── *_automator.py # App-specific automators (Excel, Word, etc.)
+│
+├── prompter/ # Prompt construction engines
+├── prompts/ # Jinja2 prompt templates
+│ ├── host_agent/ # HostAgent prompts
+│ ├── app_agent/ # AppAgent prompts
+│ └── share/ # Shared components
+│
+├── llm/ # LLM provider integrations
+├── rag/ # Retrieval-Augmented Generation
+├── trajectory/ # Task trajectory parsing
+├── experience/ # Self-experience learning
+├── module/ # Core modules (session, round, context)
+├── config/ # Legacy config support
+├── logging/ # Logging utilities
+├── utils/ # Utility functions
+├── tools/ # CLI tools (config conversion, etc.)
+│
+├── __main__.py # Entry point: python -m ufo
+└── ufo.py # Main UFO² orchestrator
+```
+
+### Key Components
+
+| Component | Description | Documentation |
+|-----------|-------------|---------------|
+| **HostAgent** | Desktop-level orchestration with 7-state FSM | [HostAgent Overview](ufo2/host_agent/overview.md) |
+| **AppAgent** | Application-level execution with 6-state FSM | [AppAgent Overview](ufo2/app_agent/overview.md) |
+| **MCP System** | Extensible command execution framework | [MCP Overview](mcp/overview.md) |
+| **Automator** | Hybrid GUI-API automation with fallback | [Core Features](ufo2/core_features/hybrid_actions.md) |
+| **RAG** | Knowledge retrieval from multiple sources | [Knowledge Substrate](ufo2/core_features/knowledge_substrate/overview.md) |
+
+**UFO² Documentation:**
+
+- [UFO² Overview](ufo2/overview.md) - Architecture and concepts
+- [Quick Start](getting_started/quick_start_ufo2.md) - Get started with UFO²
+- [HostAgent States](ufo2/host_agent/state.md) - Desktop orchestration states
+- [AppAgent States](ufo2/app_agent/state.md) - Application execution states
+- [As Galaxy Device](ufo2/as_galaxy_device.md) - Using UFO² as Galaxy sub-agent
+- [Creating Custom Agents](tutorials/creating_app_agent/overview.md) - Build your own application agents
+
+---
+
+## 🔌 Agent Interaction Protocol (`aip/`)
+
+Standardized message passing protocol for cross-device communication between Galaxy and UFO² agents.
+
+```
+aip/
+├── messages.py # Message types (Command, Result, Event, Error)
+├── protocol/ # Protocol definitions
+├── transport/ # Transport layers (HTTP, WebSocket, MQTT)
+├── endpoints/ # API endpoints
+├── extensions/ # Protocol extensions
+└── resilience/ # Retry and error handling
+```
+
+**Purpose**: Enables Galaxy to coordinate UFO² agents running on different devices and platforms through standardized messaging over HTTP/WebSocket.
+
+**Documentation**: See [AIP Overview](aip/overview.md) for protocol details and [Message Types](aip/messages.md) for message specifications.
+
+---
+
+## 🐧 Linux Agent
+
+Lightweight CLI-based agent for Linux devices that integrates with Galaxy as a third-party device agent.
+
+**Key Features**:
+- **CLI Execution**: Execute shell commands on Linux systems
+- **Galaxy Integration**: Register as device in Galaxy's multi-device orchestration
+- **Simple Architecture**: Minimal dependencies, easy deployment
+- **Cross-Platform Tasks**: Enable Windows + Linux workflows in Galaxy
+
+**Configuration**: Configured in `config/ufo/third_party.yaml` under `THIRD_PARTY_AGENT_CONFIG.LinuxAgent`
+
+**Linux Agent Documentation:**
+
+- [Linux Agent Overview](linux/overview.md) - Architecture and capabilities
+- [Quick Start](getting_started/quick_start_linux.md) - Setup and deployment
+- [As Galaxy Device](linux/as_galaxy_device.md) - Integration with Galaxy
+
+---
+
+## 📱 Mobile Agent
+
+Android device automation agent that enables UI automation, app control, and mobile-specific operations through ADB integration.
+
+**Key Features**:
+- **UI Automation**: Touch, swipe, and text input via ADB
+- **Visual Context**: Screenshot capture and UI hierarchy analysis
+- **App Management**: Launch apps, navigate between applications
+- **Galaxy Integration**: Serve as mobile device in cross-platform workflows
+- **Platform Support**: Android devices (physical and emulators)
+
+**Configuration**: Configured in `config/ufo/third_party.yaml` under `THIRD_PARTY_AGENT_CONFIG.MobileAgent`
+
+**Mobile Agent Documentation:**
+
+- [Mobile Agent Overview](mobile/overview.md) - Architecture and capabilities
+- [Quick Start](getting_started/quick_start_mobile.md) - Setup and deployment
+- [As Galaxy Device](mobile/as_galaxy_device.md) - Integration with Galaxy
+
+---
+
+## ⚙️ Configuration (`config/`)
+
+Modular configuration system with type-safe schemas and auto-discovery.
+
+```
+config/
+├── galaxy/ # Galaxy configuration
+│ ├── agent.yaml.template # ConstellationAgent LLM settings template
+│ ├── agent.yaml # ConstellationAgent LLM settings (active)
+│ ├── constellation.yaml # Constellation orchestration settings
+│ ├── devices.yaml # Multi-device registry
+│ └── dag_templates/ # Pre-built DAG templates (future)
+│
+├── ufo/ # UFO² configuration
+│ ├── agents.yaml.template # Agent LLM configs template
+│ ├── agents.yaml # Agent LLM configs (active)
+│ ├── system.yaml # System settings
+│ ├── rag.yaml # RAG settings
+│ ├── mcp.yaml # MCP server configs
+│ ├── third_party.yaml # Third-party agent configs (LinuxAgent, etc.)
+│ └── prices.yaml # API pricing data
+│
+├── config_loader.py # Auto-discovery config loader
+└── config_schemas.py # Pydantic validation schemas
+```
+
+**Configuration Files:**
+
+- Template files (`.yaml.template`) should be copied to `.yaml` and edited
+- Active config files (`.yaml`) contain API keys and should NOT be committed
+- **Galaxy**: Uses `config/galaxy/agent.yaml` for ConstellationAgent LLM settings
+- **UFO²**: Uses `config/ufo/agents.yaml` for HostAgent/AppAgent LLM settings
+- **Third-Party**: Configure LinuxAgent and HardwareAgent in `config/ufo/third_party.yaml`
+- Use `python -m ufo.tools.convert_config` to migrate from legacy configs
+
+**Configuration Documentation:**
+
+- [Configuration Overview](configuration/system/overview.md) - System architecture
+- [Agents Configuration](configuration/system/agents_config.md) - LLM and agent settings
+- [System Configuration](configuration/system/system_config.md) - Runtime and execution settings
+- [RAG Configuration](configuration/system/rag_config.md) - Knowledge retrieval
+- [Third-Party Configuration](configuration/system/third_party_config.md) - LinuxAgent and external agents
+- [MCP Configuration](configuration/system/mcp_reference.md) - MCP server setup
+- [Model Configuration](configuration/models/overview.md) - LLM provider setup
+
+---
+
+## 📖 Documentation (`documents/`)
+
+MkDocs documentation site with comprehensive guides and API references.
+
+```
+documents/
+├── docs/ # Markdown documentation source
+│ ├── getting_started/ # Installation and quick starts
+│ ├── galaxy/ # Galaxy framework docs
+│ ├── ufo2/ # UFO² architecture docs
+│ ├── linux/ # Linux agent documentation
+│ ├── mcp/ # MCP server documentation
+│ ├── aip/ # Agent Interaction Protocol docs
+│ ├── configuration/ # Configuration guides
+│ ├── infrastructure/ # Core infrastructure (agents, modules)
+│ ├── server/ # Server-client architecture docs
+│ ├── client/ # Client components docs
+│ ├── tutorials/ # Step-by-step tutorials
+│ ├── modules/ # Module-specific docs
+│ └── about/ # Project information
+│
+├── mkdocs.yml # MkDocs configuration
+└── site/ # Generated static site
+```
+
+**Documentation Sections**:
+
+| Section | Description |
+|---------|-------------|
+| **Getting Started** | Installation, quick starts, migration guides |
+| **Galaxy** | Multi-device orchestration, DAG workflows, device management |
+| **UFO²** | Desktop agents, automation features, benchmarks |
+| **Linux** | Linux agent integration, CLI executor for Galaxy |
+| **MCP** | Server documentation, custom server development |
+| **AIP** | Agent Interaction Protocol, message types, transport layers |
+| **Configuration** | System settings, model configs, deployment |
+| **Infrastructure** | Core components, agent design, server-client architecture |
+| **Tutorials** | Creating agents, custom automators, advanced usage |
+
+---
+
+## 🗄️ Supporting Modules
+
+### VectorDB (`vectordb/`)
+Vector database storage for RAG knowledge sources (help documents, execution traces, user demonstrations). See [RAG Configuration](configuration/system/rag_config.md) for setup details.
+
+### Learner (`learner/`)
+Tools for indexing help documents into the vector database for RAG retrieval. Integrates with the [Knowledge Substrate](ufo2/core_features/knowledge_substrate/overview.md) feature.
+
+### Record Processor (`record_processor/`)
+Parses human demonstrations from Windows Step Recorder for learning from user actions.
+
+### Dataflow (`dataflow/`)
+Data collection pipeline for Large Action Model (LAM) training. See the [Dataflow](ufo2/dataflow/overview.md) documentation for workflow details.
+
+### Model Worker (`model_worker/`)
+Custom LLM deployment tools for running local models. See [Model Configuration](configuration/models/overview.md) for supported providers.
+
+### Logs (`logs/`)
+Auto-generated execution logs organized by task and timestamp, including screenshots, UI trees, and agent actions.
+
+---
+
+## 🎯 Galaxy vs UFO² vs Linux Agent vs Mobile Agent: When to Use What?
+
+| Aspect | Galaxy | UFO² | Linux Agent | Mobile Agent |
+|--------|--------|------|-------------|--------------|
+| **Scope** | Multi-device orchestration | Single-device Windows automation | Single-device Linux CLI | Single-device Android automation |
+| **Use Cases** | Cross-platform workflows, distributed tasks | Desktop automation, Office tasks | Server management, CLI operations | Mobile app testing, UI automation |
+| **Architecture** | DAG-based task workflows | Two-tier state machines | Simple CLI executor | UI automation via ADB |
+| **Platform** | Orchestrator (platform-agnostic) | Windows | Linux | Android |
+| **Complexity** | Complex multi-step workflows | Simple to moderate tasks | Simple command execution | UI interaction and app control |
+| **Best For** | Cross-device collaboration | Windows desktop tasks | Linux server operations | Mobile app automation |
+| **Integration** | Orchestrates all agents | Can be Galaxy device | Can be Galaxy device | Can be Galaxy device |
+
+**Choosing the Right Framework:**
+
+- **Use Galaxy** when: Tasks span multiple devices/platforms, complex workflows with dependencies
+- **Use UFO² Standalone** when: Single-device Windows automation, rapid prototyping
+- **Use Linux Agent** when: Linux server/CLI operations needed in Galaxy workflows
+- **Use Mobile Agent** when: Android device automation, mobile app testing, UI interactions
+- **Best Practice**: Galaxy orchestrates UFO² (Windows) + Linux Agent (Linux) + Mobile Agent (Android) for comprehensive cross-platform tasks
+
+---
+
+## 🚀 Quick Start
+
+### Galaxy Multi-Device Orchestration
+
+```bash
+# Interactive mode
+python -m galaxy --interactive
+
+# Single request
+python -m galaxy --request "Your cross-device task"
+```
+
+**Documentation**: [Galaxy Quick Start](getting_started/quick_start_galaxy.md)
+
+### UFO² Desktop Automation
```bash
-📦project
- ┣ 📂documents # Folder to store project documentation
- ┣ 📂learner # Folder to build the vector database for help documents
- ┣ 📂model_worker # Folder to store tools for deploying your own model
- ┣ 📂record_processor # Folder to parse human demonstrations from Windows Step Recorder and build the vector database
- ┣ 📂dataflow # Folder for the code of data collection pipeline for Large Action Model (LAM)
- ┣ 📂vetordb # Folder to store all data in the vector database for RAG (Retrieval-Augmented Generation)
- ┣ 📂logs # Folder to store logs, generated after the program starts
- ┗ 📂ufo # Directory containing main project code
- ┣ 📂module # Directory for the basic module of UFO, e.g., session and round
- ┣ 📂agents # Code implementation of agents in UFO
- ┣ 📂automator # Implementation of the skill set of agents to automate applications
- ┣ 📂experience # Parse and save the agent's self-experience
- ┣ 📂llm # Folder to store the LLM (Large Language Model) implementation
- ┣ 📂prompter # Prompt constructor for the agent
- ┣ 📂prompts # Prompt templates and files to construct the full prompt
- ┣ 📂rag # Implementation of RAG from different sources to enhance agents' abilities
- ┣ 📂trajectory # Implementation of loading and parsing trajectories of task completion
- ┣ 📂utils # Utility functions
- ┣ 📂config # Configuration files
- ┣ 📜config.yaml # User configuration file for LLM and other settings
- ┣ 📜config_dev.yaml # Configuration file for developers
- ┗ ...
- ┗ 📄ufo.py # Main entry point for the UFO client
+# Interactive mode (replace <your_task_name> with a name for this run)
+python -m ufo --task <your_task_name>
+
+# With custom config
+python -m ufo --task <your_task_name> --config_path config/ufo/
```
-## Directory and File Descriptions
-
-### [documents]()
-- **Purpose:** Stores all the project documentation.
-- **Details:** This may include design documents, user manuals, API documentation, and any other relevant project documentation.
-
-### [learner](https://github.com/microsoft/UFO/tree/main/learner)
-- **Purpose:** Used to build the vector database for help documents.
-- **Details:** This directory contains scripts and tools to process help documents and create a searchable vector database, enhancing the agents' ability for task completion.
-### [model_worker](https://github.com/microsoft/UFO/tree/main/model_worker)
-- **Purpose:** Contains tools and scripts necessary for deploying custom models.
-- **Details:** This includes model deployment configurations, and management tools for integrating custom models into the project.
-### [dataflow](https://github.com/microsoft/UFO/tree/main/dataflow)
-- **Purpose:** Contains the code for the data collection pipeline for the Large Action Model (LAM).
-- **Details:** This directory includes scripts and tools for collecting and processing data to train the Large Action Model, improving the agents' performance and capabilities.
-### [record_processor](https://github.com/microsoft/UFO/tree/main/record_processor)
-- **Purpose:** Parses human demonstrations recorded using the Windows Step Recorder and builds the vector database.
-- **Details:** This directory includes parsers, data processing scripts, and tools to convert human demonstrations into a format suitable for agent's retrieval.
-### [vetordb](https://github.com/microsoft/UFO/tree/main/vectordb)
-- **Purpose:** Stores all data within the vector database for Retrieval-Augmented Generation (RAG).
-- **Details:** This directory is essential for maintaining the data that enhances the agents' ability to retrieve relevant information and generate more accurate responses.
-### [logs]()
-- **Purpose:** Stores log files generated by the application.
-- **Details:** This directory helps in monitoring, debugging, and analyzing the application's performance and behavior. Logs are generated dynamically as the application runs.
-### [ufo](https://github.com/microsoft/UFO/tree/main/ufo)
-- **Purpose:** The core directory containing the main project code.
-- **Details:** This directory is further subdivided into multiple subdirectories, each serving a specific purpose within the project.
-
- #### [module](https://github.com/microsoft/UFO/tree/main/ufo/module)
- - **Purpose:** Contains the basic modules of the UFO project, such as session management and rounds.
- - **Details:** This includes foundational classes and functions that are used throughout the project.
- #### [agents](https://github.com/microsoft/UFO/tree/main/ufo/agents)
- - **Purpose:** Houses the code implementations of various agents in the UFO project.
- - **Details:** Agents are components that perform specific tasks within the system, and this directory contains their logic, components, and behavior.
- #### [automator](https://github.com/microsoft/UFO/tree/main/ufo/automator)
- - **Purpose:** Implements the skill set of agents to automate applications.
- - **Details:** This includes scripts and tools that enable agents to interact with and automate tasks in various applications, such as mouse and keyboard actions and API calls.
- #### [experience](https://github.com/microsoft/UFO/tree/main/ufo/experience)
- - **Purpose:** Parses and saves the agent's self-experience.
- - **Details:** This directory contains mechanisms for agents to learn from their actions and outcomes, improving their performance over time.
- #### [llm](https://github.com/microsoft/UFO/tree/main/ufo/llm)
- - **Purpose:** Stores the implementation of the Large Language Model (LLM).
- - **Details:** This includes the implementation of APIs for different language models, such as GPT, Genimi, QWEN, etc., that are used by the agents.
- #### [prompter](https://github.com/microsoft/UFO/tree/main/ufo/prompter)
- - **Purpose:** Constructs prompts for the agents.
- - **Details:** This directory includes prompt construction logic and tools that help agents generate meaningful prompts for user interactions.
- #### [prompts](https://github.com/microsoft/UFO/tree/main/ufo/prompts)
- - **Purpose:** Contains prompt templates and files used to construct the full prompt.
- - **Details:** This includes predefined prompt structures and content that are used to create meaningful interactions with the agents.
- #### [rag](https://github.com/microsoft/UFO/tree/main/ufo/rag)
- - **Purpose:** Implements Retrieval-Augmented Generation (RAG) from different sources to enhance the agents' abilities.
- - **etails:** This directory includes scripts and tools for integrating various data sources into the RAG framework, improving the accuracy and relevance of the agents' outputs.
- #### [trajectory](https://github.com/microsoft/UFO/tree/main/ufo/trajectory)
- - **Purpose:** Implements loading and parsing of task completion trajectories.
- - **Details:** This directory includes tools and scripts to load and parse task completion trajectories, enabling agents to learn from past experiences or for evaluation purposes.
- #### [utils](https://github.com/microsoft/UFO/tree/main/ufo/utils)
- - **Purpose:** Contains utility functions.
- - **Details:** This directory includes helper functions, common utilities, and other reusable code snippets that support the project's operations.
- #### [config](https://github.com/microsoft/UFO/tree/main/ufo/config)
- - **Purpose:** Stores configuration files.
- - **Details:** This directory includes different configuration files for various environments and purposes.
- - **[config.yaml:](https://github.com/microsoft/UFO/blob/main/ufo/config/config.yaml.template)** User configuration file for LLM and other settings. You need to rename `config.yaml.template` to `config.yaml` and edit the configuration settings as needed.
- - **[config_dev.yaml](https://github.com/microsoft/UFO/blob/main/ufo/config/config_dev.yaml):** Developer-specific configuration file with settings tailored for development purposes.
- #### [ufo.py](https://github.com/microsoft/UFO/blob/main/ufo/ufo.py)
- - **Purpose:** Main entry point for the UFO client.
- - **Details:** This script initializes and starts the UFO application.
+**Documentation**: [UFO² Quick Start](getting_started/quick_start_ufo2.md)
+
+---
+
+## 📚 Key Documentation Links
+
+### Getting Started
+- [Installation & Setup](getting_started/quick_start_galaxy.md)
+- [Galaxy Quick Start](getting_started/quick_start_galaxy.md)
+- [UFO² Quick Start](getting_started/quick_start_ufo2.md)
+- [Linux Agent Quick Start](getting_started/quick_start_linux.md)
+- [Mobile Agent Quick Start](getting_started/quick_start_mobile.md)
+- [Migration Guide](getting_started/migration_ufo2_to_galaxy.md)
+
+### Galaxy Framework
+- [Galaxy Overview](galaxy/overview.md)
+- [Constellation Agent](galaxy/constellation_agent/overview.md)
+- [Constellation Orchestrator](galaxy/constellation_orchestrator/overview.md)
+- [Task Constellation](galaxy/constellation/overview.md)
+- [Device Manager](galaxy/client/device_manager.md)
+
+### UFO² Desktop AgentOS
+- [UFO² Overview](ufo2/overview.md)
+- [HostAgent](ufo2/host_agent/overview.md)
+- [AppAgent](ufo2/app_agent/overview.md)
+- [Core Features](ufo2/core_features/hybrid_actions.md)
+- [As Galaxy Device](ufo2/as_galaxy_device.md)
+
+### Linux Agent
+- [Linux Agent Overview](linux/overview.md)
+- [As Galaxy Device](linux/as_galaxy_device.md)
+
+### Mobile Agent
+- [Mobile Agent Overview](mobile/overview.md)
+- [As Galaxy Device](mobile/as_galaxy_device.md)
+
+### MCP System
+- [MCP Overview](mcp/overview.md)
+- [Local Servers](mcp/local_servers.md)
+- [Creating MCP Servers](tutorials/creating_mcp_servers.md)
+
+### Agent Interaction Protocol
+- [AIP Overview](aip/overview.md)
+- [Message Types](aip/messages.md)
+- [Transport Layers](aip/transport.md)
+
+### Configuration
+- [Configuration Overview](configuration/system/overview.md)
+- [Agents Configuration](configuration/system/agents_config.md)
+- [System Configuration](configuration/system/system_config.md)
+- [Model Configuration](configuration/models/overview.md)
+- [MCP Configuration](configuration/system/mcp_reference.md)
+
+---
+## 🏗️ Architecture Principles
+UFO³ follows **SOLID principles** and established software engineering patterns:
+- **Single Responsibility**: Each component has a focused purpose
+- **Open/Closed**: Extensible through interfaces and plugins
+- **Interface Segregation**: Focused interfaces for different capabilities
+- **Dependency Inversion**: Dependency injection for loose coupling
+- **Event-Driven**: Observer pattern for real-time monitoring
+- **State Machines**: Well-defined states and transitions for agents
+- **Command Pattern**: Encapsulated DAG editing with undo/redo
+---
+## 📝 Additional Resources
+- **[GitHub Repository](https://github.com/microsoft/UFO)** - Source code and issues
+- **[Research Paper](https://arxiv.org/abs/2504.14603)** - UFO² technical report
+- **[Documentation Site](https://microsoft.github.io/UFO/)** - Full documentation
+- **[Video Demo](https://www.youtube.com/watch?v=QT_OhygMVXU)** - YouTube demonstration
+---
+**Next Steps:**
+1. Start with [Galaxy Quick Start](getting_started/quick_start_galaxy.md) for multi-device orchestration
+2. Or explore [UFO² Quick Start](getting_started/quick_start_ufo2.md) for single-device automation
+3. Check [FAQ](faq.md) for common questions
+4. Join our community and contribute!
diff --git a/documents/docs/prompts/api_prompts.md b/documents/docs/prompts/api_prompts.md
deleted file mode 100644
index 86c5d7520..000000000
--- a/documents/docs/prompts/api_prompts.md
+++ /dev/null
@@ -1,36 +0,0 @@
-# API Prompts
-
-The API prompts provide the description and usage of the APIs used in UFO. Shared APIs and app-specific APIs are stored in different directories:
-
-| Directory | Description |
-| --- | --- |
-| `ufo/prompts/share/base/api.yaml` | Shared APIs used by multiple applications |
-| `ufo/prompts/{app_name}` | APIs specific to an application |
-
-!!! info
- You can configure the API prompt used in the `config.yaml` file. You can find more information about the configuration file [here](../configurations/developer_configuration.md).
-!!! tip
- You may customize the API prompt for a specific application by adding the API prompt in the application's directory.
-
-
-## Example API Prompt
-
-Below is an example of an API prompt:
-
-```yaml
-click_input:
- summary: |-
- "click_input" is to click the control item with mouse.
- class_name: |-
- ClickInputCommand
- usage: |-
- [1] API call: click_input(button: str, double: bool)
- [2] Args:
- - button: 'The mouse button to click. One of ''left'', ''right'', ''middle'' or ''x'' (Default: ''left'')'
- - double: 'Whether to perform a double click or not (Default: False)'
- [3] Example: click_input(button="left", double=False)
- [4] Available control item: All control items.
- [5] Return: None
-```
-
-To create a new API prompt, follow the template above and add it to the appropriate directory.
\ No newline at end of file
diff --git a/documents/docs/prompts/basic_template.md b/documents/docs/prompts/basic_template.md
deleted file mode 100644
index b213b1637..000000000
--- a/documents/docs/prompts/basic_template.md
+++ /dev/null
@@ -1,18 +0,0 @@
-# Basic Prompt Template
-
-The basic prompt template is a fixed format that is used to generate prompts for the `HostAgent`, `AppAgent`, `FollowerAgent`, and `EvaluationAgent`. It include the template for the `system` and `user` roles to construct the agent's prompt.
-
-Below is the default file path for the basic prompt template:
-
-| Agent | File Path | Version |
-| --- | --- | --- |
-| HostAgent | [ufo/prompts/share/base/host_agent.yaml](https://github.com/microsoft/UFO/blob/main/ufo/prompts/share/base/host_agent.yaml) | base |
-| HostAgent | [ufo/prompts/share/lite/host_agent.yaml](https://github.com/microsoft/UFO/blob/main/ufo/prompts/share/lite/host_agent.yaml) | lite |
-| AppAgent | [ufo/prompts/share/base/app_agent.yaml](https://github.com/microsoft/UFO/blob/main/ufo/prompts/share/base/app_agent.yaml) | base |
-| AppAgent | [ufo/prompts/share/lite/app_agent.yaml](https://github.com/microsoft/UFO/blob/main/ufo/prompts/share/lite/app_agent.yaml) | lite |
-| FollowerAgent | [ufo/prompts/share/base/app_agent.yaml](https://github.com/microsoft/UFO/blob/main/ufo/prompts/share/base/app_agent.yaml) | base |
-| FollowerAgent | [ufo/prompts/share/lite/app_agent.yaml](https://github.com/microsoft/UFO/blob/main/ufo/prompts/share/lite/app_agent.yaml) | lite |
-| EvaluationAgent | [ufo/prompts/evaluation/evaluation_agent.yaml](https://github.com/microsoft/UFO/blob/main/ufo/prompts/evaluation/evaluate.yaml) | - |
-
-!!! info
- You can configure the prompt template used in the `config.yaml` file. You can find more information about the configuration file [here](../configurations/developer_configuration.md).
diff --git a/documents/docs/prompts/examples_prompts.md b/documents/docs/prompts/examples_prompts.md
deleted file mode 100644
index 46b9fb8b4..000000000
--- a/documents/docs/prompts/examples_prompts.md
+++ /dev/null
@@ -1,84 +0,0 @@
-# Example Prompts
-
-The example prompts are used to generate textual demonstration examples for in-context learning. The examples are stored in the `ufo/prompts/examples` directory, with the following subdirectories:
-
-| Directory | Description |
-| --- | --- |
-| `lite` | Lite version of demonstration examples |
-| `non-visual` | Examples for non-visual LLMs |
-| `visual` | Examples for visual LLMs |
-
-!!!info
- You can configure the example prompt used in the `config.yaml` file. You can find more information about the configuration file [here](../configurations/developer_configuration.md).
-
-
-## Example Prompts
-
-Below are examples for the `HostAgent` and `AppAgent`:
-
-- **HostAgent**:
-
-```yaml
-Request: |-
- Summarize and add all to do items on Microsoft To Do from the meeting notes email, and write a summary on the meeting_notes.docx.
-Response:
- Observation: |-
- The current screenshot shows the Microsoft To Do application is visible, and outlook application and the meeting_notes.docx are available in the list of applications.
- Thought: |-
- The user request can be decomposed into three sub-tasks: (1) Summarize all to do items on Microsoft To Do from the meeting_notes email, (2) Add all to do items to Microsoft To Do, and (3) Write a summary on the meeting_notes.docx. I need to open the Microsoft To Do application to complete the first two sub-tasks.
- Each sub-task will be completed in individual applications sequentially.
- CurrentSubtask: |-
- Summarized all to do items from the meeting notes email in Outlook.
- Message:
- - (1) You need to first search for the meeting notes email in Outlook to summarize.
- - (2) Only summarize the to do items from the meeting notes email, without any redundant information.
- ControlLabel: |-
- 16
- ControlText: |-
- Mail - Outlook - Jim
- Status: |-
- CONTINUE
- Plan:
- - Add all to do items previously summarized from the meeting notes email to one-by-one Microsoft To Do.
- - Write a summary about the meeting notes email on the meeting_notes.docx.
- Comment: |-
- I plan to first summarize all to do items from the meeting notes email in Outlook.
- Questions: []
-```
-
-- **AppAgent**:
-
-```yaml
-Request: |-
- How many stars does the Imdiffusion repo have?
-Sub-task: |-
- Google search for the Imdiffusion repo on github and summarize the number of stars the Imdiffusion repo page visually.
-Response:
- Observation: |-
- I observe that the Edge browser is visible in the screenshot, with the Google search page opened.
- Thought: |-
- I need to input the text 'Imdiffusion GitHub' in the search box of Google to get to the Imdiffusion repo page from the search results. The search box is usually in a type of ComboBox.
- ControlLabel: |-
- 36
- ControlText: |-
- 搜索
- Function: |-
- set_edit_text
- Args:
- {"text": "Imdiffusion GitHub"}
- Status: |-
- CONTINUE
- Plan:
- - (1) After input 'Imdiffusion GitHub', click Google Search to search for the Imdiffusion repo on github.
- - (2) Once the searched results are visible, click the Imdiffusion repo Hyperlink in the searched results to open the repo page.
- - (3) Observing and summarize the number of stars the Imdiffusion repo page, and reply to the user request.
- Comment: |-
- I plan to use Google search for the Imdiffusion repo on github and summarize the number of stars the Imdiffusion repo page visually.
- SaveScreenshot:
- {"save": false, "reason": ""}
-Tips: |-
- - The search box is usually in a type of ComboBox.
- - The number of stars of a Github repo page can be found in the repo page visually.
-```
-
-These examples regulate the output format of the agent's response and provide a structured way to generate demonstration examples for in-context learning.
\ No newline at end of file
diff --git a/documents/docs/prompts/overview.md b/documents/docs/prompts/overview.md
deleted file mode 100644
index 98106c267..000000000
--- a/documents/docs/prompts/overview.md
+++ /dev/null
@@ -1,48 +0,0 @@
-# Prompts
-
-All prompts used in UFO are stored in the `ufo/prompts` directory. The folder structure is as follows:
-
-```bash
-📦prompts
- ┣ 📂apps # Stores API prompts for specific applications
- ┣ 📂excel # Stores API prompts for Excel
- ┣ 📂word # Stores API prompts for Word
- ┗ ...
- ┣ 📂demonstration # Stores prompts for summarizing demonstrations from humans using Step Recorder
- ┣ 📂experience # Stores prompts for summarizing the agent's self-experience
- ┣ 📂evaluation # Stores prompts for the EvaluationAgent
- ┣ 📂examples # Stores demonstration examples for in-context learning
- ┣ 📂lite # Lite version of demonstration examples
- ┣ 📂non-visual # Examples for non-visual LLMs
- ┗ 📂visual # Examples for visual LLMs
- ┗ 📂share # Stores shared prompts
- ┣ 📂lite # Lite version of shared prompts
- ┗ 📂base # Basic version of shared prompts
- ┣ 📜api.yaml # Basic API prompt
- ┣ 📜app_agent.yaml # Basic AppAgent prompt template
- ┗ 📜host_agent.yaml # Basic HostAgent prompt template
-```
-
-!!! note
- The `lite` version of prompts is a simplified version of the full prompts, which is used for LLMs that have a limited token budget. However, the `lite` version is not fully optimized and may lead to **suboptimal** performance.
-
-!!! note
- The `non-visual` and `visual` folders contain examples for non-visual and visual LLMs, respectively.
-
-## Agent Prompts
-
-Prompts used an agent usually contain the following information:
-
-| Prompt | Description |
-| --- | --- |
-| `Basic template` | A basic template for the agent prompt. |
-| `API` | A prompt for all skills and APIs used by the agent. |
-| `Examples` | Demonstration examples for the agent for in-context learning. |
-
-You can find these prompts `share` directory. The prompts for specific applications are stored in the `apps` directory.
-
-
-!!! tip
- All information is constructed using the agent's `Prompter` class. You can find more details about the `Prompter` class in the documentation [here](../agents/design/prompter.md).
-
-
diff --git a/documents/docs/server/api.md b/documents/docs/server/api.md
new file mode 100644
index 000000000..31a24ef73
--- /dev/null
+++ b/documents/docs/server/api.md
@@ -0,0 +1,1390 @@
+# HTTP API Reference
+
+The UFO Server provides a RESTful HTTP API for external systems to dispatch tasks, monitor client connections, retrieve results, and perform health checks. All endpoints are prefixed with `/api`.
+
+## 🎯 Overview
+
+```mermaid
+graph LR
+ subgraph "External Systems"
+ Web[Web App]
+ Script[Python Script]
+ Tool[Automation Tool]
+ end
+
+ subgraph "UFO Server HTTP API"
+ Dispatch[POST /api/dispatch]
+ Clients[GET /api/clients]
+ Result[GET /api/task_result]
+ Health[GET /api/health]
+ end
+
+ subgraph "Server Core"
+ WSM[Client Connection Manager]
+ SM[Session Manager]
+ WH[WebSocket Handler]
+ end
+
+ Web --> Dispatch
+ Script --> Clients
+ Tool --> Result
+ Tool --> Health
+
+ Dispatch --> WSM
+ Dispatch --> SM
+ Clients --> WSM
+ Result --> SM
+ Health --> WSM
+ Health --> SM
+
+ WSM --> WH
+ SM --> WH
+
+ style Dispatch fill:#bbdefb
+ style Clients fill:#c8e6c9
+ style Result fill:#fff9c4
+ style Health fill:#ffcdd2
+```
+
+**Core Capabilities:**
+
+| Capability | Endpoint | Description |
+|------------|----------|-------------|
+| **Task Dispatch** | `POST /api/dispatch` | Send tasks to connected devices via HTTP |
+| **Client Monitoring** | `GET /api/clients` | Query connected devices and constellations |
+| **Result Retrieval** | `GET /api/task_result/{task_name}` | Fetch task execution results |
+| **Health Checks** | `GET /api/health` | Monitor server status and uptime |
+
+**Why Use the HTTP API?**
+
+- **External Integration**: Trigger UFO tasks from web apps, scripts, or CI/CD pipelines
+- **Stateless**: No WebSocket connection required
+- **RESTful**: Standard HTTP methods and JSON payloads
+- **Monitoring**: Health checks for load balancers and monitoring systems
+
+---
+
+## 📡 Endpoints
+
+### POST /api/dispatch
+
+Send a task to a connected device without establishing a WebSocket connection. Ideal for external systems, web apps, and automation scripts.
+
+#### Request Format
+
+**Request Body:**
+
+```json
+{
+ "client_id": "device_windows_001",
+ "request": "Open Chrome and navigate to github.com",
+ "task_name": "github_navigation_task"
+}
+```
+
+**Request Schema:**
+
+| Field | Type | Required | Default | Description |
+|-------|------|----------|---------|-------------|
+| `client_id` | `string` | ✅ **Yes** | - | Target client identifier (device or constellation) |
+| `request` | `string` | ✅ **Yes** | - | Natural language task description (user request) |
+| `task_name` | `string` | ⚠️ No | Auto-generated UUID | Human-readable task identifier |
+
+**Important:** Use these exact parameter names:
+
+- `client_id` (not `device_id`)
+- `request` (not `task`)
+- `task_name` (optional identifier)
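+
+The rules above can be sketched as a small client-side helper. This is only an illustration: `build_dispatch_payload` is a hypothetical name, not part of UFO. It rejects the empty values that the server answers with a 400, and it leaves out `task_name` when none is given so that the server's auto-generated default applies.
+
+```python
+from typing import Dict, Optional
+
+
+def build_dispatch_payload(
+    client_id: str, request: str, task_name: Optional[str] = None
+) -> Dict[str, str]:
+    """Build a /api/dispatch body (hypothetical helper, not part of UFO)."""
+    if not request:
+        raise ValueError("Empty task content")
+    if not client_id:
+        raise ValueError("Empty client ID")
+
+    payload = {"client_id": client_id, "request": request}
+    if task_name:
+        # Omit the key entirely when no name is given.
+        payload["task_name"] = task_name
+    return payload
+```
+
+The returned dictionary can be passed as the `json=` argument of `requests.post`.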
+
+#### Success Response (200)
+
+```json
+{
+ "status": "dispatched",
+ "task_name": "github_navigation_task",
+ "client_id": "device_windows_001",
+ "session_id": "d4e5f6a7-b8c9-1234-5678-9abcdef01234"
+}
+```
+
+**Response Schema:**
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `status` | `string` | Always `"dispatched"` on success |
+| `task_name` | `string` | Task identifier (from request or auto-generated); use it with `/api/task_result/{task_name}` |
+| `client_id` | `string` | Target client ID |
+| `session_id` | `string` | UUID generated by the server for this task's session |
+
+#### Error Responses
+
+**Client Not Online (404):**
+```json
+{
+ "detail": "Client not online"
+}
+```
+
+**Cause:** Target client is not connected to the server.
+
+**Solution:** Check `/api/clients` to see available clients.
+
+**Empty Client ID (400):**
+```json
+{
+ "detail": "Empty client ID"
+}
+```
+
+**Cause:** `client_id` field is missing or empty.
+
+**Solution:** Provide a valid `client_id` in the request body.
+
+**Empty Task Content (400):**
+```json
+{
+ "detail": "Empty task content"
+}
+```
+
+**Cause:** `request` field is missing or empty.
+
+**Solution:** Provide a non-empty task description in the `request` field.
+
+#### Implementation Details
+
+**Source Code** (verified from `ufo/server/services/api.py`):
+
+```python
+@router.post("/api/dispatch")
+async def dispatch_task_api(data: Dict[str, Any]):
+    # Extract parameters
+    client_id = data.get("client_id")
+    user_request = data.get("request", "")
+    task_name = data.get("task_name", str(uuid4()))  # Auto-generate if not provided
+
+    # Validation: Empty request
+    if not user_request:
+        logger.error(f"Got empty task content for client {client_id}.")
+        raise HTTPException(status_code=400, detail="Empty task content")
+
+    # Validation: Empty client ID
+    if not client_id:
+        logger.error("Client ID must be provided.")
+        raise HTTPException(status_code=400, detail="Empty client ID")
+
+    # Logging
+    if not task_name:
+        logger.warning(f"Task name not provided, using {task_name}.")
+    else:
+        logger.info(f"Task name: {task_name}.")
+
+    logger.info(f"Dispatching task '{user_request}' to client '{client_id}'")
+
+    # Get client WebSocket
+    ws = client_manager.get_client(client_id)
+    if not ws:
+        logger.error(f"Client {client_id} not online.")
+        raise HTTPException(status_code=404, detail="Client not online")
+
+    # Use AIP TaskExecutionProtocol to send task
+    transport = WebSocketTransport(ws)
+    task_protocol = TaskExecutionProtocol(transport)
+
+    session_id = str(uuid4())
+    response_id = str(uuid4())
+
+    logger.info(
+        f"[AIP] Sending task assignment via API: task_name={task_name}, "
+        f"session_id={session_id}, client_id={client_id}"
+    )
+
+    # Send via AIP protocol
+    await task_protocol.send_task_assignment(
+        user_request=user_request,
+        task_name=task_name,
+        session_id=session_id,
+        response_id=response_id,
+    )
+
+    return {
+        "status": "dispatched",
+        "task_name": task_name,
+        "client_id": client_id,
+        "session_id": session_id,
+    }
+```
+
+**Tip:** Use the returned `task_name` to retrieve results via `GET /api/task_result/{task_name}`. The `session_id` identifies the session in the server logs.
+
+#### Sequence Diagram
+
+```mermaid
+sequenceDiagram
+    participant Client as External Client
+    participant API as HTTP API
+    participant WSM as Client Connection Manager
+    participant WS as Client WebSocket
+
+    Client->>API: POST /api/dispatch {client_id, request, task_name}
+
+    Note over API: Validate request (client_id, request not empty)
+
+    API->>WSM: get_client(client_id)
+
+    alt Client not online
+        WSM-->>API: None
+        API-->>Client: 404: Client not online
+    else Client online
+        WSM-->>API: WebSocket connection
+
+        Note over API: Generate session_id and response_id
+
+        API->>WS: send_task_assignment() (via AIP TaskExecutionProtocol)
+
+        Note over WS: Task queued for execution
+
+        API-->>Client: 200: {status: "dispatched", session_id, task_name}
+    end
+
+ Note over Client: Poll /api/task_result/{task_name} to get result
+```
+
+---
+
+### GET /api/clients
+
+Query all currently connected clients (devices and constellations) to determine which targets are available for task dispatch.
+
+#### Request
+
+```http
+GET /api/clients
+```
+
+**No parameters required.**
+
+#### Success Response (200)
+
+```json
+{
+ "online_clients": [
+ "device_windows_001",
+ "device_linux_002",
+ "constellation_orchestrator_001"
+ ]
+}
+```
+
+**Response Schema:**
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `online_clients` | `array` | List of all connected client IDs |
+
+**Source Code:**
+
+```python
+@router.get("/api/clients")
+async def list_clients():
+    return {"online_clients": client_manager.list_clients()}
+```
+
+#### Usage Patterns
+
+**Check Device Availability:**
+```python
+import requests
+
+response = requests.get("http://localhost:5000/api/clients")
+clients = response.json()["online_clients"]
+
+target_device = "device_windows_001"
+
+if target_device in clients:
+    print(f"✅ {target_device} is online")
+    # Dispatch task
+else:
+    print(f"❌ {target_device} is offline")
+```
+
+**Filter by Client Type:**
+```python
+import requests
+
+# Note: the current API doesn't return client types, so filtering relies on
+# your own client naming convention.
+# Example: devices start with "device_", constellations with "constellation_"
+
+response = requests.get("http://localhost:5000/api/clients")
+clients = response.json()["online_clients"]
+
+devices = [c for c in clients if c.startswith("device_")]
+constellations = [c for c in clients if c.startswith("constellation_")]
+
+print(f"Devices online: {len(devices)}")
+print(f"Constellations online: {len(constellations)}")
+```
+
+**Monitor Client Count:**
+```python
+import time
+
+import requests
+
+while True:
+    response = requests.get("http://localhost:5000/api/clients")
+    clients = response.json()["online_clients"]
+
+    print(f"[{time.strftime('%H:%M:%S')}] Clients online: {len(clients)}")
+
+    time.sleep(10)  # Check every 10 seconds
+```
+
+---
+
+### GET /api/task_result/{task_name}
+
+Poll this endpoint to get the result of a dispatched task. Use the `task_name` returned from `/api/dispatch`.
+
+#### Request
+
+```http
+GET /api/task_result/github_navigation_task
+```
+
+**Path Parameters:**
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `task_name` | `string` | Task identifier (from `/api/dispatch` response) |
+
+#### Response States
+
+**Pending (200):**
+Task is still running:
+
+```json
+{
+ "status": "pending"
+}
+```
+
+**Action:** Continue polling until status changes to `"done"`.
+
+**Completed (200):**
+Task has finished:
+
+```json
+{
+ "status": "done",
+ "result": {
+ "action": "Opened Chrome and navigated to github.com",
+ "screenshot": "base64_encoded_image_data",
+ "control_label": "Address bar",
+ "control_text": "github.com"
+ }
+}
+```
+
+**Action:** Process the result. The `result` structure depends on the task type and device implementation.
+
+**Not Found (Implicit):**
+If `task_name` doesn't exist in session manager:
+
+```json
+{
+ "status": "pending"
+}
+```
+
+**Note:** Current implementation returns `{"status": "pending"}` for non-existent tasks (not a 404 error).
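+
+Because an unknown `task_name` looks the same as a running task, a caller may want to keep track of the names it has dispatched. A minimal sketch, assuming a hypothetical client-side helper `classify_poll` and a set of dispatched names maintained by the caller (neither is part of UFO):
+
+```python
+from typing import Any, Dict, Set
+
+
+def classify_poll(task_name: str, dispatched: Set[str], payload: Dict[str, Any]) -> str:
+    """Classify a /api/task_result response as 'done', 'pending', or 'unknown'."""
+    if payload.get("status") == "done":
+        return "done"
+    if task_name not in dispatched:
+        # The server answered "pending", but this caller never dispatched the name.
+        return "unknown"
+    return "pending"
+```
+
+An `"unknown"` outcome usually points to a typo in the task name, or to a result that the server no longer holds.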
+
+#### Implementation Details
+
+**Source Code:**
+
+```python
+@router.get("/api/task_result/{task_name}")
+async def get_task_result(task_name: str):
+    # Query session manager for result
+    result = session_manager.get_result_by_task(task_name)
+
+    if not result:
+        return {"status": "pending"}
+
+    return {"status": "done", "result": result}
+```
+
+**Note on Result Retention:**
+
+Results are stored in memory and may be cleared after:
+
+- Server restart
+- Session cleanup (if implemented)
+- Memory limits reached
+
+**Recommendation:** Poll frequently and persist results on the client side.
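+
+One way to persist results on the client side is to write each fetched result to a JSON file. The sketch below is an assumption, not part of UFO: the helper names `save_result` and `load_result` and the `ufo_results` directory are hypothetical, the result is assumed to be JSON-serializable (it arrives as JSON from the API), and `task_name` is assumed to be safe to use as a file name.
+
+```python
+import json
+from pathlib import Path
+from typing import Any, Optional
+
+
+def save_result(task_name: str, result: Any, directory: str = "ufo_results") -> Path:
+    """Write a fetched task result to <directory>/<task_name>.json."""
+    target_dir = Path(directory)
+    target_dir.mkdir(parents=True, exist_ok=True)
+    path = target_dir / f"{task_name}.json"
+    path.write_text(json.dumps(result, indent=2, ensure_ascii=False), encoding="utf-8")
+    return path
+
+
+def load_result(task_name: str, directory: str = "ufo_results") -> Optional[Any]:
+    """Return a previously saved result, or None if nothing was saved."""
+    path = Path(directory) / f"{task_name}.json"
+    if not path.exists():
+        return None
+    return json.loads(path.read_text(encoding="utf-8"))
+```
+
+Calling `save_result` as soon as a poll returns `"done"` keeps the result available even if the server later restarts.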
+
+#### Polling Pattern
+
+**Recommended Polling Implementation:**
+
+```python
+import requests
+import time
+
+def wait_for_result(task_name: str, timeout: int = 300, interval: int = 2) -> dict:
+    """
+    Poll for task result with timeout.
+
+    Args:
+        task_name: Task identifier
+        timeout: Maximum wait time in seconds (default: 5 minutes)
+        interval: Poll interval in seconds (default: 2 seconds)
+
+    Returns:
+        Task result dictionary
+
+    Raises:
+        TimeoutError: If task doesn't complete within timeout
+    """
+    start_time = time.time()
+
+    while True:
+        elapsed = time.time() - start_time
+
+        if elapsed > timeout:
+            raise TimeoutError(
+                f"Task '{task_name}' did not complete within {timeout}s"
+            )
+
+        response = requests.get(
+            f"http://localhost:5000/api/task_result/{task_name}"
+        )
+        data = response.json()
+
+        if data["status"] == "done":
+            print(f"✅ Task completed in {elapsed:.1f}s")
+            return data["result"]
+
+        print(f"⏳ Waiting for task... ({elapsed:.0f}s)")
+        time.sleep(interval)
+
+# Usage
+try:
+    result = wait_for_result("github_navigation_task", timeout=60)
+    print("Result:", result)
+except TimeoutError as e:
+    print(f"❌ {e}")
+```
+
+---
+
+### GET /api/health
+
+Use this endpoint for monitoring systems, load balancers, and Kubernetes liveness/readiness probes.
+
+#### Request
+
+```http
+GET /api/health
+```
+
+**No parameters required.**
+
+#### Success Response (200)
+
+```json
+{
+ "status": "healthy",
+ "online_clients": [
+ "device_windows_001",
+ "device_linux_002",
+ "constellation_orchestrator_001"
+ ]
+}
+```
+
+**Response Schema:**
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `status` | `string` | Always `"healthy"` if server is responding |
+| `online_clients` | `array` | List of connected client IDs |
+
+#### Implementation Details
+
+**Source Code:**
+
+```python
+@router.get("/api/health")
+async def health_check():
+    return {
+        "status": "healthy",
+        "online_clients": client_manager.list_clients()
+    }
+```
+
+#### Integration Examples
+
+**Kubernetes Liveness Probe:**
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: ufo-server
+spec:
+  containers:
+  - name: ufo-server
+    image: ufo-server:latest
+    ports:
+    - containerPort: 5000
+    livenessProbe:
+      httpGet:
+        path: /api/health
+        port: 5000
+      initialDelaySeconds: 10
+      periodSeconds: 30
+      timeoutSeconds: 5
+      failureThreshold: 3
+    readinessProbe:
+      httpGet:
+        path: /api/health
+        port: 5000
+      initialDelaySeconds: 5
+      periodSeconds: 10
+
+**Monitoring Script:**
+```python
+import requests
+import time
+
+
+def send_alert(message: str) -> None:
+    """Placeholder alert hook; replace with email, Slack, PagerDuty, etc."""
+    print(f"ALERT: {message}")
+
+
+def monitor_server_health(url="http://localhost:5000/api/health"):
+    """Continuous health monitoring."""
+    consecutive_failures = 0
+
+    while True:
+        try:
+            response = requests.get(url, timeout=5)
+
+            if response.status_code == 200:
+                data = response.json()
+                client_count = len(data.get("online_clients", []))
+
+                print(
+                    f"✅ Server healthy - {client_count} clients connected"
+                )
+                consecutive_failures = 0
+            else:
+                consecutive_failures += 1
+                print(
+                    f"⚠️ Server returned {response.status_code} "
+                    f"(failures: {consecutive_failures})"
+                )
+        except requests.RequestException as e:
+            consecutive_failures += 1
+            print(
+                f"❌ Server unreachable: {e} "
+                f"(failures: {consecutive_failures})"
+            )
+
+        if consecutive_failures >= 3:
+            # Trigger alert via the placeholder hook defined above
+            send_alert(f"Server down for {consecutive_failures} checks")
+
+        time.sleep(30)
+```
+
+**nginx Health Check:**
+```nginx
+upstream ufo_backend {
+    server localhost:5000;
+
+    # Active health check. These directives come from the third-party
+    # nginx_upstream_check_module; NGINX Plus uses the `health_check` directive instead.
+    check interval=10000 rise=2 fall=3 timeout=5000 type=http;
+    check_http_send "GET /api/health HTTP/1.0\r\n\r\n";
+    check_http_expect_alive http_2xx http_3xx;
+}
+```
+
+---
+
+## 💻 Usage Examples
+
+### Python (requests)
+
+**Complete Task Dispatch Workflow:**
+
+```python
+import requests
+import time
+
+BASE_URL = "http://localhost:5000"
+
+# Step 1: Check if target device is online
+response = requests.get(f"{BASE_URL}/api/clients")
+clients = response.json()["online_clients"]
+
+target_client = "device_windows_001"
+
+if target_client not in clients:
+    print(f"❌ {target_client} is not online")
+    exit(1)
+
+print(f"✅ {target_client} is online")
+
+# Step 2: Dispatch task
+dispatch_response = requests.post(
+    f"{BASE_URL}/api/dispatch",
+    json={
+        "client_id": target_client,
+        "request": "Open Notepad and type 'Hello from UFO API'",
+        "task_name": "notepad_hello_world"
+    }
+)
+
+if dispatch_response.status_code != 200:
+    print(f"❌ Dispatch failed: {dispatch_response.json()}")
+    exit(1)
+
+dispatch_data = dispatch_response.json()
+task_name = dispatch_data["task_name"]
+session_id = dispatch_data["session_id"]
+
+print(f"Task dispatched: {task_name} (session: {session_id})")
+
+# Step 3: Poll for result
+print("⏳ Waiting for result...")
+
+max_wait = 120  # 2 minutes
+poll_interval = 2
+waited = 0
+
+while waited < max_wait:
+    result_response = requests.get(
+        f"{BASE_URL}/api/task_result/{task_name}"
+    )
+    result_data = result_response.json()
+
+    if result_data["status"] == "done":
+        print("✅ Task completed!")
+        print(f"Result: {result_data['result']}")
+        break
+
+    time.sleep(poll_interval)
+    waited += poll_interval
+    print(f"⏳ Still waiting... ({waited}s)")
+else:
+    print(f"⚠️ Timeout: Task did not complete in {max_wait}s")
+```
+
+### cURL
+
+**Command-Line HTTP Requests:**
+
+**Dispatch Task:**
+```bash
+curl -X POST http://localhost:5000/api/dispatch \
+  -H "Content-Type: application/json" \
+  -d '{
+    "client_id": "device_windows_001",
+    "request": "Open Calculator",
+    "task_name": "open_calculator"
+  }'
+
+# Response:
+# {
+#   "status": "dispatched",
+#   "task_name": "open_calculator",
+#   "client_id": "device_windows_001",
+#   "session_id": "a1b2c3d4-..."
+# }
+```
+
+**Get Clients:**
+```bash
+curl http://localhost:5000/api/clients
+
+# Response:
+# {
+#   "online_clients": [
+#     "device_windows_001",
+#     "device_linux_002"
+#   ]
+# }
+```
+
+**Get Task Result:**
+```bash
+curl http://localhost:5000/api/task_result/open_calculator
+
+# Response (pending):
+# {"status": "pending"}
+
+# Response (complete):
+# {
+#   "status": "done",
+#   "result": {"action": "Opened Calculator", ...}
+# }
+```
+
+**Health Check:**
+```bash
+curl http://localhost:5000/api/health
+
+# Response:
+# {
+#   "status": "healthy",
+#   "online_clients": ["device_windows_001", ...]
+# }
+```
+
+### JavaScript (fetch)
+
+**Browser/Node.js Integration:**
+
+```javascript
+// Dispatch task and wait for result
+async function dispatchAndWait(clientId, request, taskName) {
+  const BASE_URL = 'http://localhost:5000';
+
+  // Step 1: Dispatch
+  console.log(`📤 Dispatching task to ${clientId}...`);
+
+  const dispatchResponse = await fetch(`${BASE_URL}/api/dispatch`, {
+    method: 'POST',
+    headers: {'Content-Type': 'application/json'},
+    body: JSON.stringify({
+      client_id: clientId,
+      request: request,
+      task_name: taskName
+    })
+  });
+
+  if (!dispatchResponse.ok) {
+    const error = await dispatchResponse.json();
+    throw new Error(`Dispatch failed: ${error.detail}`);
+  }
+
+  const {session_id, task_name} = await dispatchResponse.json();
+  console.log(`✅ Dispatched: ${task_name} (session: ${session_id})`);
+
+  // Step 2: Poll for result
+  console.log('⏳ Waiting for result...');
+
+  const maxWait = 120000;  // 2 minutes in ms
+  const pollInterval = 2000;  // 2 seconds
+  const startTime = Date.now();
+
+  while (true) {
+    const elapsed = Date.now() - startTime;
+
+    if (elapsed > maxWait) {
+      throw new Error(`Timeout: Task did not complete in ${maxWait / 1000}s`);
+    }
+
+    const resultResponse = await fetch(
+      `${BASE_URL}/api/task_result/${task_name}`
+    );
+    const resultData = await resultResponse.json();
+
+    if (resultData.status === 'done') {
+      console.log('✅ Task completed!');
+      return resultData.result;
+    }
+
+    console.log(`⏳ Still waiting... (${Math.floor(elapsed / 1000)}s)`);
+    await new Promise(resolve => setTimeout(resolve, pollInterval));
+  }
+}
+
+// Usage (top-level await requires an ES module or an async wrapper)
+try {
+  const result = await dispatchAndWait(
+    'device_windows_001',
+    'Open Chrome and go to google.com',
+    'chrome_google'
+  );
+  console.log('Result:', result);
+} catch (error) {
+  console.error('Error:', error.message);
+}
+```
+
+---
+
+## ⚠️ Error Handling
+
+### Standard Error Format
+
+All API errors follow FastAPI's standard format:
+
+```json
+{
+ "detail": "Error message description"
+}
+```
+
+### HTTP Status Codes
+
+| Code | Meaning | When It Occurs | How to Handle |
+|------|---------|----------------|---------------|
+| **200** | OK | Request succeeded | Process response data |
+| **400** | Bad Request | Missing/empty `client_id` or `request` | Check request parameters |
+| **404** | Not Found | Client not online | Check `/api/clients` first |
+| **422** | Unprocessable Entity | Request body is malformed or is not a JSON object | Validate request body |
+| **500** | Internal Server Error | Unexpected server error | Retry or contact admin |
+
+### Error Handling Patterns
+
+**Robust Error Handling:**
+
+```python
+import requests
+from requests.exceptions import RequestException
+from typing import Optional
+
+def dispatch_task_safe(client_id: str, request: str, task_name: Optional[str] = None):
+    """
+    Dispatch task with comprehensive error handling.
+
+    Returns:
+        dict: Response data if successful
+        None: If dispatch failed
+    """
+    # Send task_name only when provided: an explicit null would bypass
+    # the server-side auto-generated default.
+    payload = {"client_id": client_id, "request": request}
+    if task_name is not None:
+        payload["task_name"] = task_name
+
+    try:
+        response = requests.post(
+            "http://localhost:5000/api/dispatch",
+            json=payload,
+            timeout=10
+        )
+
+        # Raise exception for 4xx/5xx status codes
+        response.raise_for_status()
+
+        return response.json()
+
+    except requests.HTTPError as e:
+        if e.response.status_code == 400:
+            detail = e.response.json().get("detail", "Unknown error")
+            print(f"Bad request: {detail}")
+
+            if "Empty client ID" in detail:
+                print("  Ensure 'client_id' is provided and not empty")
+            elif "Empty task content" in detail:
+                print("  Ensure 'request' is provided and not empty")
+
+        elif e.response.status_code == 404:
+            print(f"Client '{client_id}' is not online")
+            print("  Check /api/clients for available devices")
+
+        elif e.response.status_code == 422:
+            print("Invalid request format")
+            print("  Verify JSON structure matches API schema")
+
+        else:
+            print(f"HTTP {e.response.status_code}: {e.response.text}")
+
+        return None
+
+    except requests.Timeout:
+        print("Request timeout (server not responding)")
+        return None
+
+    except RequestException as e:
+        print(f"Network error: {e}")
+        return None
+
+# Usage
+result = dispatch_task_safe(
+    "device_windows_001",
+    "Open Notepad",
+    "notepad_task"
+)
+
+if result:
+    print(f"✅ Dispatched successfully: {result['session_id']}")
+else:
+    print("❌ Dispatch failed, check errors above")
+```
+
+---
+
+## 💡 Best Practices
+
+### 1. Validate Client Availability
+
+Always verify the target client is online before dispatching tasks.
+
+```python
+import requests
+
+
+def is_client_online(client_id: str) -> bool:
+    """Check if a client is currently connected."""
+    response = requests.get("http://localhost:5000/api/clients")
+    clients = response.json()["online_clients"]
+    return client_id in clients
+
+# Usage
+if is_client_online("device_windows_001"):
+    # Dispatch task
+    pass
+else:
+    print("Device is offline")
+```
+
+### 2. Implement Exponential Backoff
+
+Use exponential backoff to reduce server load when polling for results.
+
+```python
+import time
+
+import requests
+
+def poll_with_backoff(task_name: str, max_wait: int = 300):
+    """Poll for result with exponential backoff."""
+    interval = 1  # Start with 1 second
+    max_interval = 30  # Cap at 30 seconds
+    waited = 0
+
+    while waited < max_wait:
+        response = requests.get(
+            f"http://localhost:5000/api/task_result/{task_name}"
+        )
+        data = response.json()
+
+        if data["status"] == "done":
+            return data["result"]
+
+        time.sleep(interval)
+        waited += interval
+
+        # Exponential backoff: 1s, 2s, 4s, 8s, 16s, then 30s (capped)
+        interval = min(interval * 2, max_interval)
+
+    raise TimeoutError(f"Task did not complete in {max_wait}s")
+```
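+
+To see how the schedule plays out, the following sketch reproduces only the interval arithmetic of `poll_with_backoff`, without any HTTP calls. `backoff_schedule` is a hypothetical helper introduced here for illustration; it is not part of the UFO API.
+
+```python
+from typing import List
+
+
+def backoff_schedule(max_wait: int, start: int = 1, cap: int = 30) -> List[int]:
+    """Return the sleep intervals the polling loop would use before giving up."""
+    intervals: List[int] = []
+    interval, waited = start, 0
+    while waited < max_wait:
+        intervals.append(interval)
+        waited += interval
+        interval = min(interval * 2, cap)  # Double, but never exceed the cap
+    return intervals
+
+
+print(backoff_schedule(60))  # [1, 2, 4, 8, 16, 30]
+```
+
+Because the final sleep is not clipped, the total wait can overshoot `max_wait` by up to one interval (61 seconds in the example above).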
+
+### 3. Use Health Checks for Monitoring
+
+Integrate health checks into your monitoring infrastructure.
+
+```python
+import requests
+import logging
+
+def check_server_health() -> bool:
+ """
+ Check server health for monitoring.
+
+ Returns:
+ True if healthy, False otherwise
+ """
+ try:
+ response = requests.get(
+ "http://localhost:5000/api/health",
+ timeout=5
+ )
+
+ if response.status_code == 200:
+ data = response.json()
+ logging.info(
+ f"Server healthy - {len(data.get('online_clients', []))} clients"
+ )
+ return True
+ else:
+ logging.warning(f"Server returned {response.status_code}")
+ return False
+
+ except requests.RequestException as e:
+ logging.error(f"Health check failed: {e}")
+ return False
+```
+
+### 4. Handle Timeouts Gracefully
+
+Set appropriate timeouts - different tasks have different execution times.
+
+```python
+import time
+
+import requests
+
+def dispatch_with_timeout(
+ client_id: str,
+ request: str,
+ task_name: str,
+ result_timeout: int = 60
+):
+ """Dispatch task and wait for result with custom timeout."""
+
+ # Dispatch (short timeout for HTTP request)
+ dispatch_response = requests.post(
+ "http://localhost:5000/api/dispatch",
+ json={"client_id": client_id, "request": request, "task_name": task_name},
+ timeout=10 # 10 seconds for dispatch
+ )
+
+    dispatch_response.raise_for_status()  # Fail fast on 4xx/5xx before reading the body
+    task_name = dispatch_response.json()["task_name"]
+
+ # Wait for result (longer timeout for task execution)
+ start_time = time.time()
+
+ while time.time() - start_time < result_timeout:
+ result_response = requests.get(
+ f"http://localhost:5000/api/task_result/{task_name}",
+ timeout=5 # 5 seconds per poll
+ )
+
+ data = result_response.json()
+ if data["status"] == "done":
+ return data["result"]
+
+ time.sleep(2)
+
+ raise TimeoutError(
+ f"Task '{task_name}' did not complete within {result_timeout}s"
+ )
+```
+
+### 5. Log All API Interactions
+
+**Production Logging:**
+
+```python
+import logging
+import requests
+
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(levelname)s - %(message)s'
+)
+
+def dispatch_with_logging(client_id: str, request: str, task_name: str):
+    """Dispatch task with detailed logging."""
+
+    logging.info(
+        f"Dispatching task: client_id={client_id}, "
+        f"task_name={task_name}, request='{request}'"
+    )
+
+    try:
+        response = requests.post(
+            "http://localhost:5000/api/dispatch",
+            json={
+                "client_id": client_id,
+                "request": request,
+                "task_name": task_name
+            }
+        )
+
+        response.raise_for_status()
+        data = response.json()
+
+        logging.info(
+            f"Task dispatched successfully: session_id={data['session_id']}"
+        )
+
+        return data
+
+    except requests.HTTPError as e:
+        logging.error(
+            f"Dispatch failed: {e.response.status_code} - "
+            f"{e.response.json().get('detail')}"
+        )
+        raise
+    except Exception as e:
+        logging.error(f"Unexpected error during dispatch: {e}")
+        raise
+```
+
+### 6. Cache Client List
+
+Reduce API calls by caching the client list if you're dispatching multiple tasks.
+
+```python
+from datetime import datetime, timedelta
+
+class ClientCache:
+ def __init__(self, ttl_seconds=10):
+ self.ttl = timedelta(seconds=ttl_seconds)
+ self.cache = None
+ self.last_fetch = None
+
+ def get_clients(self):
+ """Get clients with caching."""
+ now = datetime.now()
+
+ # Return cache if still valid
+        # Compare against None so an empty client list is still served from cache
+        if self.cache is not None and self.last_fetch and (now - self.last_fetch) < self.ttl:
+ return self.cache
+
+ # Fetch new data
+ response = requests.get("http://localhost:5000/api/clients")
+ self.cache = response.json()["online_clients"]
+ self.last_fetch = now
+
+ return self.cache
+
+# Usage
+cache = ClientCache(ttl_seconds=30)
+
+for task in tasks:
+ clients = cache.get_clients() # Uses cache if within TTL
+ if task["client_id"] in clients:
+ dispatch_task(task)
+```
+
+---
+
+## 🔗 Integration Points
+
+### API Router Architecture
+
+```mermaid
+graph TB
+ subgraph "HTTP API Layer"
+ Router[FastAPI Router]
+ Dispatch[POST /api/dispatch]
+ Clients[GET /api/clients]
+ Result[GET /api/task_result]
+ Health[GET /api/health]
+ end
+
+ subgraph "Service Layer"
+ WSM[Client Connection Manager]
+ SM[Session Manager]
+ end
+
+ subgraph "Protocol Layer"
+ AIP[AIP TaskExecutionProtocol]
+ WS[WebSocket Transport]
+ end
+
+ Router --> Dispatch
+ Router --> Clients
+ Router --> Result
+ Router --> Health
+
+ Dispatch --> WSM
+ Dispatch --> AIP
+ Clients --> WSM
+ Result --> SM
+ Health --> WSM
+
+ AIP --> WS
+ WSM --> WS
+
+ style Dispatch fill:#bbdefb
+ style Clients fill:#c8e6c9
+ style Result fill:#fff9c4
+ style Health fill:#ffcdd2
+```
+
+### With Client Connection Manager
+
+**Methods the API calls on ClientConnectionManager:**
+
+- `get_client(client_id)`: Get WebSocket connection for task dispatch
+- `list_clients()`: List all online clients
+
+**Example:**
+
+```python
+# In POST /api/dispatch
+ws = client_manager.get_client(client_id)
+if not ws:
+ raise HTTPException(status_code=404, detail="Client not online")
+
+# In GET /api/clients
+clients = client_manager.list_clients()
+return {"online_clients": clients}
+```
+
+### With Session Manager
+
+**Methods the API calls on SessionManager:**
+
+- `get_result_by_task(task_name)`: Retrieve task result by task name
+
+**Example:**
+
+```python
+# In GET /api/task_result/{task_name}
+result = session_manager.get_result_by_task(task_name)
+
+if not result:
+ return {"status": "pending"}
+
+return {"status": "done", "result": result}
+```
+
+### With AIP Protocol
+
+**API uses AIP for task dispatch:**
+
+```python
+# Create AIP protocol instance
+transport = WebSocketTransport(ws)
+task_protocol = TaskExecutionProtocol(transport)
+
+# Send task via AIP
+await task_protocol.send_task_assignment(
+ user_request=user_request,
+ task_name=task_name,
+ session_id=session_id,
+ response_id=response_id,
+)
+```
+
+---
+
+## 📚 Complete API Reference
+
+### Endpoints Summary
+
+| Method | Endpoint | Description | Auth Required |
+|--------|----------|-------------|---------------|
+| `POST` | `/api/dispatch` | Dispatch task to client | No |
+| `GET` | `/api/clients` | List online clients | No |
+| `GET` | `/api/task_result/{task_name}` | Get task result | No |
+| `GET` | `/api/health` | Health check | No |
+
+**Note on Authentication:**
+
+The current API implementation does **not** include authentication. For production deployments, consider adding:
+
+- API keys
+- OAuth2/JWT tokens
+- Rate limiting
+- IP whitelisting
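+
+As one possible starting point for the API-key option, the sketch below checks a request header against a set of known keys using only the standard library. The header name `x-api-key`, the `VALID_API_KEYS` set, and `is_authorized` are assumptions made for illustration; the UFO server does not define them, and a real deployment would load keys from a secret store and wire the check into its web framework.
+
+```python
+import hmac
+from typing import Mapping, Optional
+
+# Hypothetical header name and key store; neither exists in the UFO server.
+API_KEY_HEADER = "x-api-key"
+VALID_API_KEYS = {"example-key-change-me"}
+
+
+def is_authorized(headers: Mapping[str, str]) -> bool:
+    """Return True if the request headers carry a known API key."""
+    supplied: Optional[str] = headers.get(API_KEY_HEADER)
+    if not supplied:
+        return False
+    # compare_digest avoids leaking key contents through timing differences
+    return any(
+        hmac.compare_digest(supplied.encode(), key.encode())
+        for key in VALID_API_KEYS
+    )
+
+
+print(is_authorized({"x-api-key": "example-key-change-me"}))  # True
+print(is_authorized({}))  # False
+```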
+
+### Request/Response Models
+
+#### Dispatch Request
+
+```python
+{
+ "client_id": str, # Required
+ "request": str, # Required
+ "task_name": str # Optional (auto-generated if not provided)
+}
+```
+
+#### Dispatch Response
+
+```python
+{
+ "status": "dispatched",
+ "task_name": str,
+ "client_id": str,
+ "session_id": str # UUID
+}
+```
+
+#### Clients Response
+
+```python
+{
+ "online_clients": List[str]
+}
+```
+
+#### Task Result Response
+
+```python
+# Pending
+{
+ "status": "pending"
+}
+
+# Complete
+{
+ "status": "done",
+ "result": Dict[str, Any] # Structure depends on task type
+}
+```
+
+#### Health Response
+
+```python
+{
+ "status": "healthy",
+ "online_clients": List[str]
+}
+```
+
+---
+
+## 🎓 Summary
+
+The HTTP API provides a **stateless, RESTful interface** for external systems to interact with the UFO server without maintaining WebSocket connections.
+
+**Key Characteristics:**
+
+| Aspect | Details |
+|--------|---------|
+| **Protocol** | HTTP/1.1, REST, JSON |
+| **Port** | 5000 (default, configurable) |
+| **Authentication** | None (add for production) |
+| **State** | Stateless (uses Client Connection Manager for client state) |
+| **Task Dispatch** | Via AIP TaskExecutionProtocol |
+| **Result Retrieval** | Polling-based (no push notifications) |
+
+**Use Cases:**
+
+1. **Web Applications**: Trigger UFO tasks from web frontends
+2. **Automation Scripts**: Integrate UFO into CI/CD pipelines
+3. **External Tools**: Connect third-party systems to UFO
+4. **Monitoring**: Health checks for infrastructure monitoring
+
+**Architecture Position:**
+
+```mermaid
+graph TD
+ subgraph "External World"
+ E1[Web App]
+ E2[Python Script]
+ E3[Automation Tool]
+ end
+
+ subgraph "UFO Server"
+ API[HTTP API]
+ WSM[Client Connection Manager]
+ SM[Session Manager]
+ WH[WebSocket Handler]
+ end
+
+ subgraph "Clients"
+ D1[Device 1]
+ D2[Device 2]
+ C1[Constellation]
+ end
+
+ E1 -->|HTTP POST/GET| API
+ E2 -->|HTTP POST/GET| API
+ E3 -->|HTTP POST/GET| API
+
+ API --> WSM
+ API --> SM
+
+ WSM --> WH
+ SM --> WH
+
+ WH <-->|WebSocket| D1
+ WH <-->|WebSocket| D2
+ WH <-->|WebSocket| C1
+
+ style API fill:#bbdefb
+ style WSM fill:#c8e6c9
+ style SM fill:#fff9c4
+```
+
+**For More Information:**
+
+- [Server Overview](./overview.md) - UFO server architecture and components
+- [Client Connection Manager](./client_connection_manager.md) - Client registry and connection management
+- [Session Manager](./session_manager.md) - Task execution and result tracking
+- [Quick Start](./quick_start.md) - Get started with UFO server
+
diff --git a/documents/docs/server/client_connection_manager.md b/documents/docs/server/client_connection_manager.md
new file mode 100644
index 000000000..ae77f6c50
--- /dev/null
+++ b/documents/docs/server/client_connection_manager.md
@@ -0,0 +1,1452 @@
+# Client Connection Manager
+
+The **ClientConnectionManager** is the central registry for all connected clients, maintaining connection state, session mappings, device information, and providing efficient lookup mechanisms for client routing and management.
+
+For more context on how this component fits into the server architecture, see the [Server Overview](overview.md).
+
+---
+
+## 🎯 Overview
+
+The Client Connection Manager serves as the "address book" and "session tracker" for the entire server:
+
+| Responsibility | Description | Benefit |
+|----------------|-------------|---------|
+| **Client Registry** | Store all connected device and constellation clients | Fast O(1) client lookup by ID |
+| **Session Tracking** | Map sessions to their constellation orchestrators | Enable proper cleanup on disconnection |
+| **Device Mapping** | Track which device is executing which session | Route task results correctly |
+| **Connection State** | Monitor which clients are online | Validate before dispatching tasks |
+| **System Info Caching** | Store device capabilities and configuration | Optimize constellation decision-making |
+| **Statistics** | Provide connection metrics | Monitoring and capacity planning |
+
+### Architecture Position
+
+```mermaid
+graph TB
+ subgraph "Clients"
+ D1[Device 1]
+ D2[Device 2]
+ C1[Constellation 1]
+ end
+
+ subgraph "Server - ClientConnectionManager"
+ WSM[Client Connection Manager]
+
+ subgraph "Storage"
+ CR[Client Registry online_clients]
+ CS[Constellation Sessions _constellation_sessions]
+ DS[Device Sessions _device_sessions]
+ SI[System Info Cache system_info]
+ end
+ end
+
+ subgraph "Server Components"
+ WH[WebSocket Handler]
+ SM[Session Manager]
+ API[API Router]
+ end
+
+ D1 -->|"add_client()"| WSM
+ D2 -->|"add_client()"| WSM
+ C1 -->|"add_client()"| WSM
+
+ WSM --> CR
+ WSM --> CS
+ WSM --> DS
+ WSM --> SI
+
+ WH -->|"get_client()"| WSM
+ WH -->|"is_device_connected()"| WSM
+ SM -->|"get_device_sessions()"| WSM
+ API -->|"list_clients()"| WSM
+
+ style WSM fill:#ffecb3
+ style CR fill:#c8e6c9
+ style CS fill:#bbdefb
+ style DS fill:#f8bbd0
+```
+
+---
+
+## 📦 Core Data Structures
+
+### ClientInfo Dataclass
+
+Each connected client is represented by a `ClientInfo` dataclass that stores all relevant connection details:
+
+```python
+@dataclass
+class ClientInfo:
+ """Information about a connected client."""
+ websocket: WebSocket # Active WebSocket connection
+ client_type: ClientType # DEVICE or CONSTELLATION
+ connected_at: datetime # Connection timestamp
+ metadata: Dict = None # Additional client metadata
+ platform: str = "windows" # OS platform (windows/linux)
+ system_info: Dict = None # Device system information (for devices only)
+
+ # AIP protocol instances for this client
+ transport: Optional[WebSocketTransport] = None # AIP WebSocket transport
+ task_protocol: Optional[TaskExecutionProtocol] = None # AIP task protocol
+```
+
+**Field Descriptions:**
+
+| Field | Type | Purpose | Example |
+|-------|------|---------|---------|
+| `websocket` | `WebSocket` | FastAPI WebSocket connection object | (runtime object) |
+| `client_type` | `ClientType` | Whether DEVICE or CONSTELLATION | `ClientType.DEVICE` |
+| `connected_at` | `datetime` | When client registered | `2024-11-04 14:30:22` |
+| `metadata` | `Dict` | Custom metadata from registration message | `{"hostname": "WIN-001"}` |
+| `platform` | `str` | Operating system | `"windows"`, `"linux"` |
+| `system_info` | `Dict` | Device capabilities and system specs | See System Info Structure below |
+| `transport` | `Optional[WebSocketTransport]` | AIP WebSocket transport layer | (runtime object) |
+| `task_protocol` | `Optional[TaskExecutionProtocol]` | AIP task execution protocol handler | (runtime object) |
+
+**System Info Structure Example:**
+
+```json
+{
+ "os": "Windows",
+ "os_version": "11 Pro 22H2",
+ "processor": "Intel Core i7-1185G7",
+ "memory_total": 17014632448,
+ "memory_available": 8459743232,
+ "screen_resolution": "1920x1080",
+ "installed_applications": ["Chrome", "Excel", "Notepad++"],
+ "supported_features": ["ui_automation", "web_browsing", "file_ops"],
+ "custom_metadata": {
+ "tags": ["production", "office"],
+ "tier": "high_performance"
+ }
+}
+```
+
+---
+
+## 👥 Client Registry Management
+
+The client registry (`online_clients`) is the authoritative source of truth for all connected clients.
+
+### Adding Clients
+
+```python
+def add_client(
+ self,
+ client_id: str,
+ platform: str,
+ ws: WebSocket,
+ client_type: ClientType = ClientType.DEVICE,
+ metadata: Dict = None,
+ transport: Optional[WebSocketTransport] = None,
+ task_protocol: Optional[TaskExecutionProtocol] = None
+):
+ """Register a new client connection."""
+
+ with self.lock: # Thread-safe access
+ # Extract system info if provided (device clients only)
+ system_info = None
+ if metadata and "system_info" in metadata and client_type == ClientType.DEVICE:
+ system_info = metadata.get("system_info")
+
+ # Merge with server-configured metadata if available
+ server_config = self._device_configs.get(client_id, {})
+ if server_config:
+ system_info = self._merge_device_info(system_info, server_config)
+ logger.info(f"Merged server config for device {client_id}")
+
+ # Create ClientInfo and add to registry
+ self.online_clients[client_id] = ClientInfo(
+ websocket=ws,
+ platform=platform,
+ client_type=client_type,
+ connected_at=datetime.now(),
+ metadata=metadata or {},
+ system_info=system_info,
+ transport=transport,
+ task_protocol=task_protocol
+ )
+```
+
+**Example - Adding a Device Client:**
+
+```python
+client_manager.add_client(
+ client_id="device_windows_001",
+ platform="windows",
+ ws=websocket,
+ client_type=ClientType.DEVICE,
+ metadata={
+ "hostname": "WIN-OFFICE-01",
+ "system_info": {
+ "os": "Windows",
+ "screen_resolution": "1920x1080",
+ "installed_applications": ["Chrome", "Excel"]
+ }
+ },
+ transport=websocket_transport,
+ task_protocol=task_execution_protocol
+)
+```
+
+**Example - Adding a Constellation Client:**
+
+```python
+client_manager.add_client(
+ client_id="constellation_orchestrator_001",
+ platform="linux", # Platform of the constellation server
+ ws=websocket,
+ client_type=ClientType.CONSTELLATION,
+ metadata={
+ "orchestrator_version": "2.0.0",
+ "max_concurrent_tasks": 10
+ },
+ transport=websocket_transport,
+ task_protocol=task_execution_protocol
+)
+```
+
+**Thread Safety:**
+
+```python
+with self.lock: # threading.Lock ensures atomic operations
+ self.online_clients[client_id] = client_info
+```
+
+!!! warning "Client ID Uniqueness"
+    If a client reconnects with the same `client_id`, the new connection **overwrites** the old entry. The old WebSocket is then unreachable through the registry, although the registration code shown above does not close it explicitly. Use unique IDs to prevent collisions.
+
+### Retrieving Clients
+
+The ClientConnectionManager provides several methods to lookup clients based on different criteria:
+
+**Get WebSocket Connection:**
+```python
+def get_client(self, client_id: str) -> WebSocket | None:
+ """Get WebSocket connection for a client."""
+ with self.lock:
+ client_info = self.online_clients.get(client_id)
+ return client_info.websocket if client_info else None
+```
+
+**Usage:**
+```python
+target_ws = client_manager.get_client("device_windows_001")
+if target_ws:
+ await target_ws.send_text(message)
+```
+
+**Get Full Client Info:**
+```python
+def get_client_info(self, client_id: str) -> ClientInfo | None:
+ """Get complete information about a client."""
+ with self.lock:
+ return self.online_clients.get(client_id)
+```
+
+**Usage:**
+```python
+client_info = client_manager.get_client_info("device_windows_001")
+if client_info:
+ print(f"Platform: {client_info.platform}")
+ print(f"Connected at: {client_info.connected_at}")
+ print(f"Type: {client_info.client_type}")
+```
+
+**Get Client Type:**
+```python
+def get_client_type(self, client_id: str) -> ClientType | None:
+ """Get the type of a client."""
+ with self.lock:
+ client_info = self.online_clients.get(client_id)
+ return client_info.client_type if client_info else None
+```
+
+**Usage:**
+```python
+client_type = client_manager.get_client_type("client_001")
+if client_type == ClientType.DEVICE:
+    pass  # Handle device-specific logic
+elif client_type == ClientType.CONSTELLATION:
+    pass  # Handle constellation-specific logic
+```
+
+**List All Clients:**
+```python
+def list_clients(self) -> List[str]:
+ """List all online client IDs."""
+ with self.lock:
+ return list(self.online_clients.keys())
+```
+
+**Usage:**
+```python
+online_ids = client_manager.list_clients()
+print(f"Currently online: {len(online_ids)} clients")
+```
+
+**List by Type:**
+```python
+def list_clients_by_type(self, client_type: ClientType) -> List[str]:
+ """List all online clients of a specific type."""
+ with self.lock:
+ return [
+ client_id
+ for client_id, client_info in self.online_clients.items()
+ if client_info.client_type == client_type
+ ]
+```
+
+**Usage:**
+```python
+devices = client_manager.list_clients_by_type(ClientType.DEVICE)
+constellations = client_manager.list_clients_by_type(ClientType.CONSTELLATION)
+
+print(f"Devices online: {len(devices)}")
+print(f"Constellations online: {len(constellations)}")
+```
+
+### Removing Clients
+
+```python
+def remove_client(self, client_id: str):
+ """Remove a client from the registry."""
+ with self.lock:
+ self.online_clients.pop(client_id, None)
+ logger.info(f"[ClientConnectionManager] Removed client: {client_id}")
+```
+
+!!!danger "Cleanup Required"
+ When removing a client, you should **also** clean up:
+
+ - Session mappings (`_constellation_sessions`, `_device_sessions`)
+ - Cached system info (automatically removed via ClientInfo deletion)
+ - Active sessions (via SessionManager.cancel_task())
+
+ See client disconnect cleanup pattern below.
+
+---
+
+## 🔍 Connection State Checking
+
+Always check if the target device is connected before attempting to dispatch tasks. This prevents errors and improves user experience.
+
+### Device Connection Validation
+
+```python
+def is_device_connected(self, device_id: str) -> bool:
+ """Check if a device client is currently connected."""
+
+ with self.lock:
+ client_info = self.online_clients.get(device_id)
+
+ if not client_info:
+ return False
+
+ # Verify it's a DEVICE client (not constellation)
+ return client_info.client_type == ClientType.DEVICE
+```
+
+**Example - Validate Before Task Dispatch:**
+
+```python
+# In WebSocket Handler - constellation requesting task on device
+target_device_id = data.target_id
+
+if not client_manager.is_device_connected(target_device_id):
+ error_msg = f"Target device '{target_device_id}' is not connected"
+ await send_error(error_msg)
+ raise ValueError(error_msg)
+
+# Safe to dispatch
+target_ws = client_manager.get_client(target_device_id)
+await dispatch_task(target_ws, task_request)
+```
+
+!!! warning "Type Check is Critical"
+ The method returns `False` if the client exists but is **not a device** (e.g., it's a constellation). This prevents accidentally dispatching device tasks to constellation clients.
+
+### Generic Online Status Check
+
+```python
+# Hypothetical helper: not present in the source, shown only for comparison
+def is_online(self, client_id: str) -> bool:
+ """Check if any client (device or constellation) is currently online."""
+ with self.lock:
+ return client_id in self.online_clients
+```
+
+**Comparison:**
+
+| Method | Checks | Returns True When |
+|--------|--------|-------------------|
+| `is_device_connected(device_id)` | Client exists **AND** is DEVICE type | Device client is online |
+| `is_online(client_id)` | Client exists (any type) | Any client is online |
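+
+The difference is easiest to see on a small stand-in registry. The `ClientType` enum and the `registry` dict below are simplified placeholders for the server's own types, and `is_online` remains a hypothetical helper.
+
+```python
+from enum import Enum
+from typing import Dict
+
+
+class ClientType(Enum):
+    """Stand-in for the server's ClientType."""
+    DEVICE = "device"
+    CONSTELLATION = "constellation"
+
+
+# Simplified registry: client ID -> client type (the real one stores ClientInfo)
+registry: Dict[str, ClientType] = {
+    "device_windows_001": ClientType.DEVICE,
+    "constellation_001": ClientType.CONSTELLATION,
+}
+
+
+def is_online(client_id: str) -> bool:
+    """True for any registered client, regardless of type."""
+    return client_id in registry
+
+
+def is_device_connected(client_id: str) -> bool:
+    """True only when the client is registered and is a DEVICE."""
+    return registry.get(client_id) is ClientType.DEVICE
+
+
+for cid in ("device_windows_001", "constellation_001", "unknown"):
+    print(cid, is_online(cid), is_device_connected(cid))
+```
+
+A constellation ID passes `is_online` but fails `is_device_connected`, which is why task dispatch should use the latter.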
+
+---
+
+## 📋 Session Mapping
+
+The ClientConnectionManager tracks sessions from **two perspectives**:
+
+1. **Constellation → Sessions**: Which sessions did a constellation initiate?
+2. **Device → Sessions**: Which sessions is a device currently executing?
+
+This dual tracking enables proper cleanup when either constellation or device disconnects.
+
+```mermaid
+graph TB
+ subgraph "Constellation Perspective"
+ C[Constellation_001]
+ CS[_constellation_sessions]
+ CS --> S1[session_abc]
+ CS --> S2[session_def]
+ CS --> S3[session_ghi]
+ end
+
+ subgraph "Device Perspective"
+ D[Device_windows_001]
+ DS[_device_sessions]
+ DS --> S1
+ DS --> S4[session_jkl]
+ end
+
+ subgraph "Disconnection Cleanup"
+ DC{Constellation Disconnects}
+ DD{Device Disconnects}
+ end
+
+ DC -->|Cancel| S1
+ DC -->|Cancel| S2
+ DC -->|Cancel| S3
+
+ DD -->|Cancel| S1
+ DD -->|Cancel| S4
+
+ style C fill:#bbdefb
+ style D fill:#c8e6c9
+ style S1 fill:#ffcdd2
+```
+
+### Constellation Session Mapping
+
+Constellation clients initiate tasks on remote devices. Track these sessions to enable cleanup when the orchestrator disconnects.
+
+**Add Constellation Session:**
+
+```python
+def add_constellation_session(self, client_id: str, session_id: str):
+ """Map a session to its constellation orchestrator."""
+
+ with self.lock:
+ if client_id not in self._constellation_sessions:
+ self._constellation_sessions[client_id] = []
+ self._constellation_sessions[client_id].append(session_id)
+```
+
+**Get Constellation Sessions:**
+
+```python
+def get_constellation_sessions(self, client_id: str) -> List[str]:
+ """Get all sessions initiated by a constellation client."""
+
+ with self.lock:
+ return self._constellation_sessions.get(client_id, []).copy()
+ # .copy() prevents external modification of internal list
+```
+
+**Remove Constellation Sessions:**
+
+```python
+def remove_constellation_sessions(self, client_id: str) -> List[str]:
+ """Remove and return all sessions for a constellation."""
+
+ with self.lock:
+ return self._constellation_sessions.pop(client_id, [])
+ # Returns removed sessions for cleanup
+```
+
+**Example - Constellation Disconnect Cleanup:**
+
+```python
+# In WebSocket Handler - when constellation disconnects
+constellation_id = "constellation_001"
+
+# Get all sessions this constellation initiated
+session_ids = client_manager.get_constellation_sessions(constellation_id)
+
+logger.info(
+ f"Constellation {constellation_id} disconnected, "
+ f"cancelling {len(session_ids)} sessions"
+)
+
+# Cancel each session
+for session_id in session_ids:
+ await session_manager.cancel_task(
+ session_id,
+ reason="constellation_disconnected" # Don't send callback
+ )
+
+# Remove mappings
+client_manager.remove_constellation_sessions(constellation_id)
+```
+
+### Device Session Mapping
+
+Device clients execute tasks sent by constellations (or themselves). Track these sessions to enable cleanup when the device disconnects.
+
+**Add Device Session:**
+
+```python
+def add_device_session(self, device_id: str, session_id: str):
+ """Map a session to the device executing it."""
+
+ with self.lock:
+ if device_id not in self._device_sessions:
+ self._device_sessions[device_id] = []
+ self._device_sessions[device_id].append(session_id)
+```
+
+**Get Device Sessions:**
+
+```python
+def get_device_sessions(self, device_id: str) -> List[str]:
+ """Get all sessions running on a specific device."""
+
+ with self.lock:
+ return self._device_sessions.get(device_id, []).copy()
+```
+
+**Remove Device Sessions:**
+
+```python
+def remove_device_sessions(self, device_id: str) -> List[str]:
+ """Remove and return all sessions for a device."""
+
+ with self.lock:
+ return self._device_sessions.pop(device_id, [])
+```
+
+!!!example "Device Disconnect Cleanup"
+ ```python
+ # In WebSocket Handler - when device disconnects
+ device_id = "device_windows_001"
+
+ # Get all sessions running on this device
+ session_ids = client_manager.get_device_sessions(device_id)
+
+ logger.info(
+ f"Device {device_id} disconnected, "
+ f"cancelling {len(session_ids)} sessions"
+ )
+
+ # Cancel each session
+ for session_id in session_ids:
+ await session_manager.cancel_task(
+ session_id,
+ reason="device_disconnected" # Send callback to constellation
+ )
+
+ # Remove mappings
+ client_manager.remove_device_sessions(device_id)
+ ```
+
+### Session Mapping Lifecycle
+
+```mermaid
+sequenceDiagram
+ participant C as Constellation
+ participant WH as WebSocket Handler
+ participant WSM as ClientConnectionManager
+ participant D as Device
+
+ Note over C,D: Task Dispatch
+ C->>WH: TASK request (target_id=device_001)
+ WH->>WH: Generate session_id="session_abc"
+
+ Note over WH,WSM: Map Session to Both Clients
+ WH->>WSM: add_constellation_session("constellation_001", "session_abc")
+ WH->>WSM: add_device_session("device_001", "session_abc")
+
+ Note over WSM: Session Mappings
+ WSM->>WSM: _constellation_sessions["constellation_001"] = ["session_abc"]
+ WSM->>WSM: _device_sessions["device_001"] = ["session_abc"]
+
+ Note over WH,D: Task Execution
+ WH->>D: TASK_ASSIGNMENT (session_abc)
+ D->>D: Execute task
+
+ Note over D,WH: Result Delivery
+ D->>WH: TASK_END (session_abc)
+ WH->>C: TASK_END (session_abc)
+
+ Note over WH,WSM: Cleanup (not shown in actual code)
+ Note right of WH: Sessions remain in mappings until client disconnects!
+```
+
+!!!warning "Sessions Persist Until Cleanup"
+ Session mappings are **not automatically removed** when tasks complete. They persist until:
+
+ 1. The constellation disconnects (removes all its sessions)
+ 2. The device disconnects (removes all its sessions)
+ 3. Manual cleanup (future feature)
+
+ **Implication:** Over time, `_constellation_sessions` and `_device_sessions` can grow large. Consider implementing periodic cleanup for completed sessions.
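+
+A minimal sketch of such a cleanup, assuming the caller can supply the set of session IDs that have finished. `prune_completed_sessions` is a hypothetical helper, not part of ClientConnectionManager; inside the manager it would have to run while holding `self.lock`.
+
+```python
+from typing import Dict, List, Set
+
+
+def prune_completed_sessions(
+    mapping: Dict[str, List[str]], completed: Set[str]
+) -> int:
+    """Drop completed session IDs from a client -> sessions mapping, in place.
+
+    Returns the number of session entries removed.
+    """
+    removed = 0
+    for client_id in list(mapping):  # Copy the keys: entries may be deleted
+        remaining = [s for s in mapping[client_id] if s not in completed]
+        removed += len(mapping[client_id]) - len(remaining)
+        if remaining:
+            mapping[client_id] = remaining
+        else:
+            del mapping[client_id]  # No sessions left for this client
+    return removed
+
+
+device_sessions = {
+    "device_001": ["session_abc", "session_jkl"],
+    "device_002": ["session_def"],
+}
+count = prune_completed_sessions(device_sessions, {"session_abc", "session_def"})
+print(count, device_sessions)  # 2 {'device_001': ['session_jkl']}
+```
+
+The same function could be applied to both `_constellation_sessions` and `_device_sessions`, since they share the same shape.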
+
+### Dual Mapping Example
+
+!!!example "Single Session, Dual Mapping"
+ When a constellation dispatches a task to a device:
+
+ ```python
+ constellation_id = "constellation_orchestrator_001"
+ device_id = "device_windows_001"
+ session_id = "session_abc123"
+
+ # Session is mapped to BOTH the constellation and the device
+ client_manager.add_constellation_session(constellation_id, session_id)
+ client_manager.add_device_session(device_id, session_id)
+
+ # Later retrieval
+ constellation_sessions = client_manager.get_constellation_sessions(constellation_id)
+ # Returns: ["session_abc123", ...]
+
+ device_sessions = client_manager.get_device_sessions(device_id)
+ # Returns: ["session_abc123", ...]
+ ```
+
+ **Why dual mapping?**
+
+ - If **constellation disconnects**: Cancel all its sessions (notify devices)
+ - If **device disconnects**: Cancel all sessions on that device (notify constellations)
+
+---
+
+## 💻 System Information Management
+
+The ClientConnectionManager caches device system information to enable intelligent task routing by constellations without repeatedly querying devices.
+
+### System Info Storage
+
+**Stored Automatically During Registration:**
+
+```python
+def add_client(self, client_id, platform, ws, client_type, metadata):
+ """Add client and extract system info if provided."""
+
+ system_info = None
+ if metadata and "system_info" in metadata and client_type == ClientType.DEVICE:
+ system_info = metadata.get("system_info")
+
+ # Merge with server configuration if available
+ server_config = self._device_configs.get(client_id, {})
+ if server_config:
+ system_info = self._merge_device_info(system_info, server_config)
+
+ self.online_clients[client_id] = ClientInfo(
+ websocket=ws,
+ platform=platform,
+ client_type=client_type,
+ system_info=system_info, # Cached here
+ ...
+ )
+```
+
+### Retrieving System Information
+
+**Get Single Device Info:**
+```python
+def get_device_system_info(self, device_id: str) -> Optional[Dict[str, Any]]:
+ """Get device system information by device ID."""
+
+ with self.lock:
+ client_info = self.online_clients.get(device_id)
+ if client_info and client_info.client_type == ClientType.DEVICE:
+ return client_info.system_info
+ return None
+```
+
+**Usage:**
+```python
+device_info = client_manager.get_device_system_info("device_windows_001")
+
+if device_info:
+ screen_res = device_info.get("screen_resolution")
+ apps = device_info.get("installed_applications", [])
+
+ print(f"Screen: {screen_res}")
+ print(f"Apps: {len(apps)} installed")
+```
+
+**Get All Devices Info:**
+```python
+def get_all_devices_info(self) -> Dict[str, Dict[str, Any]]:
+ """Get system information for all connected devices."""
+
+ with self.lock:
+ return {
+ device_id: client_info.system_info
+ for device_id, client_info in self.online_clients.items()
+ if client_info.client_type == ClientType.DEVICE
+ and client_info.system_info
+ }
+```
+
+**Usage:**
+```python
+all_devices = client_manager.get_all_devices_info()
+
+for device_id, info in all_devices.items():
+ print(f"{device_id}: {info.get('os')} - {info.get('screen_resolution')}")
+
+# Example output:
+# device_windows_001: Windows - 1920x1080
+# device_linux_001: Linux - 2560x1440
+```
+
+### Server Configuration Merging
+
+The ClientConnectionManager supports loading device-specific configuration from YAML/JSON files and **merging** them with auto-detected system info.
+
+**Device Configuration File (`device_config.yaml`):**
+
+```yaml
+devices:
+ device_windows_001:
+ tags: ["production", "office", "high_priority"]
+ tier: "high_performance"
+ additional_features: ["excel_automation", "pdf_generation"]
+ max_concurrent_tasks: 5
+
+ device_linux_001:
+ tags: ["development", "testing"]
+ tier: "standard"
+ additional_features: ["docker_support"]
+```
+
+**Loading Configuration:**
+
+```python
+# Initialize ClientConnectionManager with config file
+client_manager = ClientConnectionManager(device_config_path="config/device_config.yaml")
+
+# Configuration is automatically loaded during __init__
+```
+
+**Merge Process:**
+
+```python
+def _merge_device_info(
+ self,
+ system_info: Dict[str, Any],
+ server_config: Dict[str, Any]
+) -> Dict[str, Any]:
+ """Merge auto-detected system info with server configuration."""
+
+ merged = {**system_info} # Start with auto-detected info
+
+ # Add all server config to custom_metadata
+ if "custom_metadata" not in merged:
+ merged["custom_metadata"] = {}
+ merged["custom_metadata"].update(server_config)
+
+ # Special handling: merge capabilities
+ if "supported_features" in system_info and "additional_features" in server_config:
+ merged["supported_features"] = list(
+ set(system_info["supported_features"] + server_config["additional_features"])
+ )
+
+ # Add server tags
+ if "tags" in server_config:
+ merged["tags"] = server_config["tags"]
+
+ return merged
+```
+
+**Result:**
+
+```json
+{
+ "os": "Windows",
+ "screen_resolution": "1920x1080",
+ "supported_features": [
+ "ui_automation",
+ "web_browsing",
+ "file_ops",
+ "excel_automation",
+ "pdf_generation"
+ ],
+ "tags": ["production", "office", "high_priority"],
+ "custom_metadata": {
+ "tier": "high_performance",
+ "max_concurrent_tasks": 5,
+ "tags": ["production", "office", "high_priority"],
+ "additional_features": ["excel_automation", "pdf_generation"]
+ }
+}
+```
+
+**Why Merge Configuration?**
+
+- **Auto-detected info**: Reported by the device itself at registration (OS, memory, screen resolution), so it reflects the actual hardware and software
+- **Server config**: Administrative metadata (tags, tier, priorities)
+- **Combined**: Rich device profile for intelligent task routing
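+
+To see the merge on concrete data, here is a standalone sketch that mirrors `_merge_device_info` as a plain function. The name `merge_device_info` and the sample values are assumptions made for illustration. Unlike the method above, this version copies `custom_metadata` instead of updating it in place and sorts the merged feature list so the output is deterministic.
+
+```python
+from typing import Any, Dict
+
+
+def merge_device_info(
+    system_info: Dict[str, Any], server_config: Dict[str, Any]
+) -> Dict[str, Any]:
+    """Standalone version of the merge logic, for illustration only."""
+    merged = {**system_info}
+
+    # Build a new custom_metadata dict so the caller's data is not mutated
+    merged["custom_metadata"] = {
+        **system_info.get("custom_metadata", {}),
+        **server_config,
+    }
+
+    # Union of detected and configured features, sorted for stable output
+    if "supported_features" in system_info and "additional_features" in server_config:
+        merged["supported_features"] = sorted(
+            set(system_info["supported_features"])
+            | set(server_config["additional_features"])
+        )
+
+    if "tags" in server_config:
+        merged["tags"] = server_config["tags"]
+
+    return merged
+
+
+# Hypothetical sample data
+detected = {"os": "Windows", "supported_features": ["ui_automation", "file_ops"]}
+configured = {
+    "tier": "high_performance",
+    "additional_features": ["excel_automation"],
+    "tags": ["production"],
+}
+
+profile = merge_device_info(detected, configured)
+print(profile["supported_features"])
+# ['excel_automation', 'file_ops', 'ui_automation']
+```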
+
+---
+
+## 📊 Client Statistics and Monitoring
+
+The `get_stats()` method provides basic metrics for monitoring connected clients.
+
+### Get Statistics
+
+```python
+def get_stats(self) -> Dict[str, int]:
+ """Get statistics about connected clients."""
+
+ with self.lock:
+ device_count = sum(
+ 1
+ for info in self.online_clients.values()
+ if info.client_type == ClientType.DEVICE
+ )
+ constellation_count = sum(
+ 1
+ for info in self.online_clients.values()
+ if info.client_type == ClientType.CONSTELLATION
+ )
+ return {
+ "total": len(self.online_clients),
+ "device_clients": device_count,
+ "constellation_clients": constellation_count
+ }
+```
+
+**Example Usage:**
+
+```python
+# Get current statistics
+stats = client_manager.get_stats()
+
+print(f"📊 Server Statistics:")
+print(f" Total Clients: {stats['total']}")
+print(f" Devices: {stats['device_clients']}")
+print(f" Constellations: {stats['constellation_clients']}")
+
+# Output:
+# 📊 Server Statistics:
+# Total Clients: 5
+# Devices: 3
+# Constellations: 2
+```
+
+### Filtering and Querying
+
+**Filter by Platform:**
+```python
+def get_devices_by_platform(self, platform: str) -> List[str]:
+ """Get all device IDs for a specific platform."""
+
+ with self.lock:
+ return [
+ device_id
+ for device_id, client_info in self.online_clients.items()
+ if client_info.client_type == ClientType.DEVICE
+ and client_info.platform == platform
+ ]
+
+# Usage
+windows_devices = client_manager.get_devices_by_platform("Windows")
+linux_devices = client_manager.get_devices_by_platform("Linux")
+```
+
+**Filter by Connection Time:**
+```python
+from datetime import datetime, timedelta
+
+def get_recently_connected(self, minutes: int = 5) -> List[str]:
+ """Get clients connected in the last N minutes."""
+
+ cutoff_time = datetime.now() - timedelta(minutes=minutes)
+
+ with self.lock:
+ return [
+ client_id
+ for client_id, client_info in self.online_clients.items()
+ if client_info.connected_at >= cutoff_time
+ ]
+
+# Usage
+recent_clients = client_manager.get_recently_connected(minutes=10)
+```
+
+**Filter by Capability:**
+```python
+def find_devices_with_capability(self, capability: str) -> List[str]:
+ """Find devices that support a specific capability."""
+
+ with self.lock:
+ matches = []
+ for device_id, client_info in self.online_clients.items():
+ if client_info.client_type != ClientType.DEVICE:
+ continue
+
+ if not client_info.system_info:
+ continue
+
+ features = client_info.system_info.get("supported_features", [])
+ if capability in features:
+ matches.append(device_id)
+
+ return matches
+
+# Usage
+excel_devices = client_manager.find_devices_with_capability("excel_automation")
+docker_devices = client_manager.find_devices_with_capability("docker_support")
+```
+
+---
+
+## 🎯 Usage Patterns
+
+### Safe Task Dispatch
+
+```python
+async def dispatch_task_to_device(
+ client_manager: ClientConnectionManager,
+ constellation_id: str,
+ target_device_id: str,
+ task_request: dict,
+ session_id: str
+):
+ """Dispatch task with comprehensive validation."""
+
+ # Step 1: Validate constellation is connected
+ if not client_manager.is_online(constellation_id):
+ raise ValueError(f"Constellation {constellation_id} not connected")
+
+ # Step 2: Validate target device is connected
+ if not client_manager.is_device_connected(target_device_id):
+ raise ValueError(f"Device {target_device_id} not connected")
+
+ # Step 3: Get device WebSocket
+ device_ws = client_manager.get_client(target_device_id)
+ if not device_ws:
+ raise ValueError(f"Could not get WebSocket for device {target_device_id}")
+
+ # Step 4: Track session mappings
+ client_manager.add_constellation_session(constellation_id, session_id)
+ client_manager.add_device_session(target_device_id, session_id)
+
+ # Step 5: Send task
+ await device_ws.send_json({
+ "type": "TASK_ASSIGNMENT",
+ "session_id": session_id,
+ "request": task_request
+ })
+
+ logger.info(
+ f"Task {session_id} dispatched: "
+ f"{constellation_id} → {target_device_id}"
+ )
+```
+
+### Graceful Client Disconnect Handling
+
+```python
+async def handle_client_disconnect(
+ client_manager: ClientConnectionManager,
+ session_manager: SessionManager,
+ client_id: str,
+ client_type: ClientType
+):
+ """Handle client disconnect with full cleanup."""
+
+ logger.info(f"Client disconnected: {client_id} ({client_type})")
+
+ # Step 1: Get all related sessions
+ if client_type == ClientType.CONSTELLATION:
+ session_ids = client_manager.get_constellation_sessions(client_id)
+ cancel_reason = "constellation_disconnected"
+ else: # DEVICE
+ session_ids = client_manager.get_device_sessions(client_id)
+ cancel_reason = "device_disconnected"
+
+ # Step 2: Cancel all sessions
+ for session_id in session_ids:
+ try:
+ await session_manager.cancel_task(session_id, reason=cancel_reason)
+ logger.info(f"Cancelled session {session_id}")
+ except Exception as e:
+ logger.error(f"Failed to cancel {session_id}: {e}")
+
+ # Step 3: Remove session mappings
+ if client_type == ClientType.CONSTELLATION:
+ client_manager.remove_constellation_sessions(client_id)
+ else:
+ client_manager.remove_device_sessions(client_id)
+
+ # Step 4: Remove client from registry
+ client_manager.remove_client(client_id)
+
+ logger.info(
+ f"Cleanup complete: {client_id}, "
+ f"cancelled {len(session_ids)} sessions"
+ )
+```
+
+### Intelligent Device Selection
+
+```python
+from typing import List, Optional
+
+def select_optimal_device(
+    client_manager: ClientConnectionManager,
+    required_platform: Optional[str] = None,
+    required_capabilities: Optional[List[str]] = None,
+    preferred_tags: Optional[List[str]] = None
+) -> Optional[str]:
+    """Select the best available device for a task."""
+
+    with client_manager.lock:
+        candidates = []
+
+        for device_id, client_info in client_manager.online_clients.items():
+            # Filter by type
+            if client_info.client_type != ClientType.DEVICE:
+                continue
+
+            # Filter by platform
+            if required_platform and client_info.platform != required_platform:
+                continue
+
+            # Missing system info is treated as "no features, no tags"
+            system_info = client_info.system_info or {}
+
+            # Filter by capabilities (a device without system info cannot match)
+            if required_capabilities:
+                features = system_info.get("supported_features", [])
+                if not all(cap in features for cap in required_capabilities):
+                    continue
+
+            # Calculate score based on preferred tags
+            score = 0
+            if preferred_tags:
+                tags = system_info.get("tags", [])
+                score = len(set(tags) & set(preferred_tags))
+
+            candidates.append((device_id, score))
+
+    if not candidates:
+        return None
+
+    # Return device with highest score (or first if all score 0)
+    candidates.sort(key=lambda x: x[1], reverse=True)
+    return candidates[0][0]
+
+# Usage
+device_id = select_optimal_device(
+ client_manager,
+ required_platform="Windows",
+ required_capabilities=["excel_automation"],
+ preferred_tags=["production", "high_priority"]
+)
+
+if device_id:
+ print(f"Selected device: {device_id}")
+else:
+ print("No suitable device available")
+```
+
+### Session Cleanup After Task Completion
+
+**Note:** The current implementation does **not automatically remove** session mappings when tasks complete. Consider implementing this pattern:
+
+```python
+async def handle_task_completion(
+ client_manager: ClientConnectionManager,
+ session_id: str,
+ constellation_id: str,
+ device_id: str
+):
+ """Clean up session mappings after task completes."""
+
+ # Task has completed (or failed)
+
+ # Option 1: Remove from both mappings
+ # (Requires adding remove_session method to ClientConnectionManager)
+ # client_manager.remove_session(session_id)
+
+ # Option 2: Leave mappings until disconnect
+ # (Current behavior - sessions accumulate)
+
+ logger.info(f"Task {session_id} completed, mappings retained")
+```
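+
+Option 1 depends on a `remove_session` method that does not exist yet. The following is a minimal sketch of what it could look like, written against a hypothetical stand-in class (`SessionMappings`) rather than the real ClientConnectionManager; the attribute names and the return value are assumptions.
+
+```python
+import threading
+from collections import defaultdict
+from typing import Dict, List
+
+
+class SessionMappings:
+    """Hypothetical stand-in for the two session maps kept by the manager."""
+
+    def __init__(self) -> None:
+        self.lock = threading.Lock()
+        self.constellation_sessions: Dict[str, List[str]] = defaultdict(list)
+        self.device_sessions: Dict[str, List[str]] = defaultdict(list)
+
+    def remove_session(self, session_id: str) -> int:
+        """Remove session_id from both maps; return how many entries were removed."""
+        removed = 0
+        with self.lock:
+            for mapping in (self.constellation_sessions, self.device_sessions):
+                # Iterate over a copy of the keys because entries may be deleted
+                for client_id in list(mapping):
+                    sessions = mapping[client_id]
+                    if session_id in sessions:
+                        sessions.remove(session_id)
+                        removed += 1
+                    if not sessions:
+                        del mapping[client_id]  # drop empty entries
+        return removed
+
+
+store = SessionMappings()
+store.constellation_sessions["const_1"] += ["s1", "s2"]
+store.device_sessions["dev_1"] += ["s1"]
+
+print(store.remove_session("s1"))  # 2
+print(store.remove_session("s1"))  # 0 (already removed)
+```
+
+Returning the number of removed entries makes a repeated call harmless and easy to detect, which matters when both the constellation and the device report completion.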
+
+---
+
+## 💡 Best Practices
+
+### Thread Safety
+
+The `ClientConnectionManager` is accessed by multiple WebSocket handlers concurrently. **Always** acquire the lock before reading or modifying shared state.
+
+```python
+# WRONG - No thread safety
+def bad_example(self):
+ if "device_001" in self.online_clients:
+ client = self.online_clients["device_001"]
+ # Another thread might remove the client here!
+ return client.websocket
+
+# CORRECT - Thread-safe
+def good_example(self):
+ with self.lock:
+ if "device_001" in self.online_clients:
+ client = self.online_clients["device_001"]
+ return client.websocket
+ return None
+```
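+
+A related pattern is to hold the lock only long enough to copy what you need, then iterate or do slow work outside it. The sketch below uses a hypothetical `Registry` class (not the real ClientConnectionManager) and exercises it from several threads.
+
+```python
+import threading
+from typing import Dict, List
+
+
+class Registry:
+    """Hypothetical minimal registry guarded by a lock."""
+
+    def __init__(self) -> None:
+        self.lock = threading.Lock()
+        self.online_clients: Dict[str, str] = {}
+
+    def add(self, client_id: str, platform: str) -> None:
+        with self.lock:
+            self.online_clients[client_id] = platform
+
+    def remove(self, client_id: str) -> None:
+        with self.lock:
+            self.online_clients.pop(client_id, None)
+
+    def snapshot_ids(self) -> List[str]:
+        # Copy under the lock so callers can iterate without holding it
+        with self.lock:
+            return list(self.online_clients)
+
+
+registry = Registry()
+
+
+def churn(worker: int) -> None:
+    """Add, read, and remove clients repeatedly from one thread."""
+    for i in range(1000):
+        client_id = f"device_{worker}_{i}"
+        registry.add(client_id, "Windows")
+        registry.snapshot_ids()
+        registry.remove(client_id)
+
+
+threads = [threading.Thread(target=churn, args=(n,)) for n in range(4)]
+for thread in threads:
+    thread.start()
+for thread in threads:
+    thread.join()
+
+print(len(registry.snapshot_ids()))  # 0
+```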
+
+### Validate Before Dispatch
+
+Always check if the target device is connected before attempting to send messages.
+
+```python
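+# CORRECT - Validation first
+if client_manager.is_device_connected(target_device_id):
+    device_ws = client_manager.get_client(target_device_id)
+    # The device may disconnect between the check and the lookup
+    if device_ws is not None:
+        await device_ws.send_json(task_data)
+    else:
+        logger.error(f"Device {target_device_id} disconnected before dispatch")
+else:
+    logger.error(f"Device {target_device_id} not connected")
+    # Handle error appropriately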
+```
+
+### Cleanup on Disconnect
+
+When a client disconnects, clean up **all** related resources:
+
+**Checklist:**
+
+- [x] Cancel all related sessions
+- [x] Remove session mappings (constellation/device)
+- [x] Remove client from online registry
+- [x] Remove device info cache (if applicable)
+- [x] Notify affected parties
+
+### Cache Device Information
+
+Balance freshness and performance:
+
+- **Cache during registration**: Fast lookups for task routing
+- **Update on REQUEST_DEVICE_LIST**: Keep cache fresh
+- **Don't cache sensitive data**: Only cache non-sensitive system info
+
+```python
+# During registration - cache system info
+client_manager.add_client(
+ device_id,
+ platform="Windows",
+ ws=websocket,
+ client_type=ClientType.DEVICE,
+ metadata={"system_info": system_info} # Cached automatically
+)
+
+# Later - fast lookup without querying device
+device_info = client_manager.get_device_system_info(device_id)
+```
+
+### Handle Edge Cases
+
+**Case 1: Client re-connects with same ID**
+
+```python
+# Old connection still in registry
+if client_manager.is_online(client_id):
+    logger.warning(f"Client {client_id} already connected, removing old connection")
+    client_manager.remove_client(client_id)
+
+# Now add new connection
+client_manager.add_client(client_id, platform, ws, client_type, metadata)
+```
+
+**Case 2: Session mapped to disconnected clients**
+
+```python
+# Before dispatching
+if not client_manager.is_device_connected(device_id):
+    # Device disconnected, session mapping might still exist
+    # This is expected - cleanup happens on disconnect
+    raise ValueError(f"Device {device_id} no longer connected")
+```
+
+**Case 3: Constellation and device both disconnect**
+
+```python
+# Session will be cancelled twice (once for each disconnect)
+# Ensure cancel_task is idempotent:
+async def cancel_task(self, session_id, reason):
+    if session_id not in self.sessions:
+        logger.debug(f"Session {session_id} already cancelled")
+        return  # Idempotent
+
+    # Proceed with cancellation
+```
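+
+The idempotency requirement can be checked in isolation. This is a minimal runnable sketch, assuming a hypothetical `MiniSessionManager` that keeps sessions in a plain dict; the boolean return value is an addition for the demo, not part of the documented `cancel_task`.
+
+```python
+import asyncio
+from typing import Dict, List
+
+
+class MiniSessionManager:
+    """Hypothetical session manager with an idempotent cancel_task."""
+
+    def __init__(self) -> None:
+        self.sessions: Dict[str, str] = {}
+
+    async def cancel_task(self, session_id: str, reason: str) -> bool:
+        """Return True if this call cancelled the session, False if it was already gone."""
+        # pop() checks and removes in one step, so a second caller gets None
+        state = self.sessions.pop(session_id, None)
+        if state is None:
+            return False
+        # Real cancellation work (stopping tasks, notifying clients) would go here
+        return True
+
+
+async def main() -> List[bool]:
+    manager = MiniSessionManager()
+    manager.sessions["s1"] = "running"
+    first = await manager.cancel_task("s1", reason="constellation_disconnected")
+    second = await manager.cancel_task("s1", reason="device_disconnected")
+    return [first, second]
+
+
+print(asyncio.run(main()))  # [True, False]
+```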
+
+### Monitor Session Accumulation
+
+**Note:** Session mappings are **not automatically removed** after task completion. Over time, this can cause memory growth.
+
+**Mitigation strategies:**
+
+**Periodic Cleanup:**
+```python
+async def cleanup_completed_sessions(client_manager, session_manager):
+ """Remove mappings for completed sessions."""
+
+ all_sessions = set()
+ all_sessions.update(
+ session_id
+ for sessions in client_manager._constellation_sessions.values()
+ for session_id in sessions
+ )
+ all_sessions.update(
+ session_id
+ for sessions in client_manager._device_sessions.values()
+ for session_id in sessions
+ )
+
+ for session_id in all_sessions:
+ session = session_manager.get_session(session_id)
+ if session and session.state in [SessionState.COMPLETED, SessionState.FAILED]:
+ # Remove from ClientConnectionManager
+ # (Requires implementing remove_session method)
+ pass
+```
+
+**Cleanup on Completion:**
+```python
+# In task completion handler
+async def on_task_complete(session_id, constellation_id, device_id):
+ # Remove specific session from mappings
+ client_manager.remove_session_from_constellation(constellation_id, session_id)
+ client_manager.remove_session_from_device(device_id, session_id)
+```
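+
+The periodic strategy also needs something to drive it. One option is an asyncio background task, sketched below. `cleanup_once`, `run_periodic_cleanup`, the `is_finished` callback, and the interval are all assumptions; a production loop would run until shutdown instead of for a fixed number of rounds.
+
+```python
+import asyncio
+from typing import Callable, Set
+
+
+def cleanup_once(tracked: Set[str], is_finished: Callable[[str], bool]) -> int:
+    """Remove finished session IDs from `tracked`; return how many were removed."""
+    finished = {session_id for session_id in tracked if is_finished(session_id)}
+    tracked.difference_update(finished)
+    return len(finished)
+
+
+async def run_periodic_cleanup(
+    tracked: Set[str],
+    is_finished: Callable[[str], bool],
+    interval_seconds: float,
+    rounds: int,
+) -> int:
+    """Run cleanup_once every interval_seconds for a fixed number of rounds."""
+    removed = 0
+    for _ in range(rounds):
+        await asyncio.sleep(interval_seconds)
+        removed += cleanup_once(tracked, is_finished)
+    return removed
+
+
+# Hypothetical data: three tracked sessions, two of which have finished
+tracked_sessions = {"s1", "s2", "s3"}
+finished_sessions = {"s1", "s3"}
+
+removed = asyncio.run(
+    run_periodic_cleanup(
+        tracked_sessions, finished_sessions.__contains__, 0.01, rounds=2
+    )
+)
+print(removed, tracked_sessions)  # 2 {'s2'}
+```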
+
+---
+
+## 🔗 Integration Points
+
+### With WebSocket Handler
+
+```mermaid
+sequenceDiagram
+ participant WH as WebSocket Handler
+ participant WSM as ClientConnectionManager
+ participant SM as Session Manager
+
+ Note over WH,WSM: Client Registration
+ WH->>WSM: add_client(id, platform, ws, type, metadata)
+ WSM-->>WH: Client added
+
+ Note over WH,WSM: Task Dispatch
+ WH->>WSM: is_device_connected(device_id)
+ WSM-->>WH: True
+ WH->>WSM: get_client(device_id)
+ WSM-->>WH: WebSocket
+ WH->>WSM: add_constellation_session(const_id, session_id)
+ WH->>WSM: add_device_session(device_id, session_id)
+
+ Note over WH,SM: Task Execution
+ WH->>SM: execute_task_async(...)
+
+ Note over WH,WSM: Client Disconnect
+ WH->>WSM: get_constellation_sessions(client_id)
+ WSM-->>WH: [session_ids...]
+ WH->>SM: cancel_task(session_id, reason)
+ WH->>WSM: remove_constellation_sessions(client_id)
+ WH->>WSM: remove_client(client_id)
+```
+
+**ClientConnectionManager provides:**
+
+- Client registration and lookup
+- Connection state validation
+- Session tracking for cleanup
+
+**WebSocket Handler provides:**
+
+- WebSocket lifecycle management
+- Protocol message handling
+- Disconnect notifications
+
+### With Session Manager
+
+**ClientConnectionManager provides:**
+
+- Client connectivity status that the Session Manager checks before execution
+- Session mappings that the Session Manager uses for cleanup
+
+**Session Manager provides:**
+
+- Session state that ClientConnectionManager can query to clean up completed sessions (future)
+- Cancellation results that ClientConnectionManager uses to notify clients
+
+### With API Router
+
+```python
+# In HTTP API endpoints
+# (client_manager and ClientType are assumed to be available in this module)
+from fastapi import APIRouter, HTTPException
+
+router = APIRouter()
+
+@router.get("/devices")
+async def list_devices():
+    """List all connected devices."""
+    device_ids = client_manager.list_clients(client_type=ClientType.DEVICE)
+
+    # Group device IDs by platform using the registered client info
+    by_platform = {}
+    for device_id in device_ids:
+        info = client_manager.get_client_info(device_id)
+        if info is not None:
+            by_platform.setdefault(info.platform, []).append(device_id)
+
+    return {
+        "devices": device_ids,
+        "count": len(device_ids),
+        "by_platform": by_platform
+    }
+
+@router.get("/devices/{device_id}")
+async def get_device_info(device_id: str):
+    """Get device system information."""
+    if not client_manager.is_device_connected(device_id):
+        raise HTTPException(status_code=404, detail="Device not connected")
+
+    system_info = client_manager.get_device_system_info(device_id)
+    return {"device_id": device_id, "system_info": system_info}
+
+@router.get("/stats")
+async def get_server_stats():
+    """Get server statistics (total, device_clients, constellation_clients)."""
+    return client_manager.get_stats()
+```
+
+---
+
+## 📚 Complete API Reference
+
+### Client Management
+
+| Method | Parameters | Returns | Description |
+|--------|------------|---------|-------------|
+| `add_client()` | `client_id: str` `platform: str` `ws: WebSocket` `client_type: ClientType` `metadata: Optional[Dict]` | `None` | Register a new client connection |
+| `remove_client()` | `client_id: str` | `None` | Remove client from registry |
+| `get_client()` | `client_id: str` | `Optional[WebSocket]` | Get client WebSocket connection |
+| `get_client_info()` | `client_id: str` | `Optional[ClientInfo]` | Get full client information |
+| `get_client_type()` | `client_id: str` | `Optional[ClientType]` | Get client type (DEVICE/CONSTELLATION) |
+| `list_clients()` | `client_type: Optional[ClientType]` | `List[str]` | List all or filtered client IDs |
+
+### Connection State
+
+| Method | Parameters | Returns | Description |
+|--------|------------|---------|-------------|
+| `is_device_connected()` | `device_id: str` | `bool` | Check if device is connected |
+| `is_online()` | `client_id: str` | `bool` | Check if a client of any type (device or constellation) is online |
+
+### Session Mapping
+
+| Method | Parameters | Returns | Description |
+|--------|------------|---------|-------------|
+| `add_constellation_session()` | `client_id: str` `session_id: str` | `None` | Map session to constellation |
+| `get_constellation_sessions()` | `client_id: str` | `List[str]` | Get constellation's sessions |
+| `remove_constellation_sessions()` | `client_id: str` | `List[str]` | Remove and return sessions |
+| `add_device_session()` | `device_id: str` `session_id: str` | `None` | Map session to device |
+| `get_device_sessions()` | `device_id: str` | `List[str]` | Get device's sessions |
+| `remove_device_sessions()` | `device_id: str` | `List[str]` | Remove and return sessions |
+
+### Device Information
+
+| Method | Parameters | Returns | Description |
+|--------|------------|---------|-------------|
+| `get_device_system_info()` | `device_id: str` | `Optional[Dict]` | Get device system information |
+| `get_all_devices_info()` | None | `Dict[str, Dict]` | Get all devices' system info |
+
+### Statistics and Monitoring
+
+| Method | Parameters | Returns | Description |
+|--------|------------|---------|-------------|
+| `get_stats()` | None | `Dict[str, int]` | Get client counts (`total`, `device_clients`, `constellation_clients`) |
+
+### Data Structures
+
+**ClientInfo:**
+
+```python
+@dataclass
+class ClientInfo:
+ websocket: WebSocket # WebSocket connection
+ client_type: ClientType # DEVICE or CONSTELLATION
+ connected_at: datetime # Connection timestamp
+ metadata: Optional[Dict] # Registration metadata
+ platform: Optional[str] # "Windows", "Linux", "Darwin"
+ system_info: Optional[Dict] # Device capabilities and configuration
+```
+
+**ClientType:**
+
+```python
+class ClientType(Enum):
+ DEVICE = "device" # Execution client
+ CONSTELLATION = "constellation" # Orchestrator client
+```
+
+---
+
+## 🎓 Summary
+
+The ClientConnectionManager is the **central registry** for all client connections and session mappings in the UFO server. It provides thread-safe operations for tracking clients, validating connectivity, mapping sessions, and caching device information.
+
+**Core Capabilities:**
+
+| Capability | Description | Key Benefit |
+|------------|-------------|-------------|
+| **Client Registry** | Track connected devices and constellation clients | Single source of truth for client state |
+| **Connection State** | Query client availability and type | Prevent dispatch to disconnected clients |
+| **Session Mapping** | Associate sessions with orchestrators and executors | Enable proper cleanup on disconnect |
+| **Device Info** | Cache device capabilities for routing decisions | Fast task routing without repeated queries |
+| **Thread Safety** | Lock-protected concurrent access | Safe operation in async/multi-threaded environment |
+| **Statistics** | Real-time counts of connected clients by type | Monitoring and observability |
+
+**Key Design Patterns:**
+
+1. **Dual Session Mapping**: Track sessions from both constellation and device perspectives for comprehensive cleanup
+2. **Lazy Cleanup**: Session mappings persist until disconnect (consider periodic cleanup for production)
+3. **Configuration Merging**: Combine auto-detected device info with server-configured metadata
+4. **Type-Safe Validation**: Always verify client type (DEVICE vs CONSTELLATION) before operations
+
+**Integration with UFO Server:**
+
+```mermaid
+graph TD
+ subgraph "ClientConnectionManager Core"
+ R[Client Registry]
+ S[Session Mapping]
+ D[Device Info Cache]
+ T[Thread Safety]
+ end
+
+ WH[WebSocket Handler] -->|Register/Lookup| R
+ WH -->|Track Sessions| S
+ WH -->|Cache System Info| D
+
+ SM[Session Manager] -->|Validate Connection| R
+ SM -->|Query Sessions| S
+
+ API[API Router] -->|List Devices| R
+ API -->|Get Stats| R
+ API -->|Device Info| D
+
+ R -.->|Thread Lock| T
+ S -.->|Thread Lock| T
+ D -.->|Thread Lock| T
+
+ style R fill:#bbdefb
+ style S fill:#c8e6c9
+ style D fill:#fff9c4
+ style T fill:#ffcdd2
+```
+
+**For More Information:**
+
+- [Server Overview](./overview.md) - UFO server architecture and components
+- [WebSocket Handler](./websocket_handler.md) - AIP protocol message handling
+- [Session Manager](./session_manager.md) - Session lifecycle and background execution
+- [Quick Start](./quick_start.md) - Get started with UFO server
+
diff --git a/documents/docs/server/monitoring.md b/documents/docs/server/monitoring.md
new file mode 100644
index 000000000..7dffb96e7
--- /dev/null
+++ b/documents/docs/server/monitoring.md
@@ -0,0 +1,1226 @@
+# Monitoring and Observability
+
+Monitor the health, performance, and reliability of your UFO Server deployment with comprehensive observability tools, metrics, and alerting strategies.
+
+!!! tip "Before You Begin"
+ Make sure you have the UFO Server running. See the [Quick Start Guide](./quick_start.md) for setup instructions.
+
+## 🎯 Overview
+
+```mermaid
+graph TB
+ subgraph "Monitoring Layers"
+ Health[Health Checks]
+ Metrics[Performance Metrics]
+ Logs[Logs & Analysis]
+ Alerts[Alerting]
+ end
+
+ subgraph "UFO Server"
+ API[HTTP API]
+ WS[WebSocket]
+ end
+
+ subgraph "Tools"
+ K8s[Kubernetes]
+ Prom[Prometheus]
+ Slack[Notifications]
+ end
+
+ Health --> API
+ Metrics --> WS
+ Logs --> WS
+
+ Health --> K8s
+ Metrics --> Prom
+ Alerts --> Slack
+
+ style Health fill:#bbdefb
+ style Metrics fill:#c8e6c9
+ style Logs fill:#fff9c4
+ style Alerts fill:#ffcdd2
+```
+
+**Monitoring Capabilities:**
+
+| Layer | Purpose | Tools |
+|-------|---------|-------|
+| **Health Checks** | Service availability and uptime | `/api/health`, Kubernetes probes |
+| **Performance Metrics** | Latency, throughput, resource usage | Prometheus, custom dashboards |
+| **Logging** | Event tracking, debugging, auditing | Python logging, log aggregation |
+| **Alerting** | Proactive issue detection | Slack, Email, PagerDuty |
+
+**Why Monitor?**
+
+- **Detect Issues Early**: Catch problems before users notice
+- **Performance Optimization**: Identify bottlenecks and inefficiencies
+- **Capacity Planning**: Track growth and resource utilization
+- **Debugging**: Trace errors and understand system behavior
+- **SLA Compliance**: Ensure service level objectives are met
+
+---
+
+## 🏥 Health Checks
+
+### HTTP Health Endpoint
+
+The `/api/health` endpoint provides real-time server status without authentication. For detailed API specifications, see the [HTTP API Reference](./api.md).
+
+#### Endpoint Details
+
+```http
+GET /api/health
+```
+
+**Response (200 OK):**
+
+```json
+{
+ "status": "healthy",
+ "online_clients": [
+ "device_windows_001",
+ "device_linux_002",
+ "constellation_orchestrator_001"
+ ]
+}
+```
+
+**Response Schema:**
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `status` | `string` | Always `"healthy"` if server is responding |
+| `online_clients` | `array` | List of connected client IDs |
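+
+A monitoring client can check a response against this schema before trusting it. The helper below is a minimal sketch; `parse_health` is a hypothetical name, and it validates only the two documented fields.
+
+```python
+import json
+from typing import List
+
+
+def parse_health(body: str) -> List[str]:
+    """Return the online client IDs, or raise ValueError if the payload is malformed."""
+    data = json.loads(body)  # invalid JSON raises JSONDecodeError, a ValueError subclass
+    if not isinstance(data, dict):
+        raise ValueError("health payload must be a JSON object")
+    if data.get("status") != "healthy":
+        raise ValueError(f"unexpected status: {data.get('status')!r}")
+
+    clients = data.get("online_clients")
+    if not isinstance(clients, list) or not all(isinstance(c, str) for c in clients):
+        raise ValueError("online_clients must be a list of strings")
+    return clients
+
+
+sample = '{"status": "healthy", "online_clients": ["device_windows_001", "device_linux_002"]}'
+print(parse_health(sample))
+# ['device_windows_001', 'device_linux_002']
+```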
+
+**Quick Test:**
+
+```bash
+# Test health endpoint
+curl http://localhost:5000/api/health
+
+# With jq for formatted output
+curl -s http://localhost:5000/api/health | jq .
+```
+
+### Automated Health Monitoring
+
+#### Kubernetes Liveness and Readiness Probes
+
+Example production Kubernetes configuration:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+ name: ufo-server
+ labels:
+ app: ufo-server
+spec:
+ containers:
+ - name: ufo-server
+ image: ufo-server:latest
+ ports:
+ - containerPort: 5000
+ name: http
+ protocol: TCP
+
+ # Liveness probe - restart container if failing
+ livenessProbe:
+ httpGet:
+ path: /api/health
+ port: 5000
+ scheme: HTTP
+ initialDelaySeconds: 30 # Wait 30s after startup
+ periodSeconds: 10 # Check every 10s
+ timeoutSeconds: 5 # 5s timeout per check
+ successThreshold: 1 # 1 success = healthy
+ failureThreshold: 3 # 3 failures = restart
+
+ # Readiness probe - remove from service if failing
+ readinessProbe:
+ httpGet:
+ path: /api/health
+ port: 5000
+ scheme: HTTP
+ initialDelaySeconds: 10 # Wait 10s after startup
+ periodSeconds: 5 # Check every 5s
+ timeoutSeconds: 3 # 3s timeout
+ successThreshold: 1 # 1 success = ready
+ failureThreshold: 2 # 2 failures = not ready
+
+ # Resource limits
+ resources:
+ requests:
+ memory: "256Mi"
+ cpu: "250m"
+ limits:
+ memory: "512Mi"
+ cpu: "500m"
+```
+
+**Probe Configuration Guide:**
+
+| Parameter | Liveness | Readiness | Purpose |
+|-----------|----------|-----------|---------|
+| `initialDelaySeconds` | 30 | 10 | Time before first check (allow startup) |
+| `periodSeconds` | 10 | 5 | How often to check |
+| `timeoutSeconds` | 5 | 3 | Max time for response |
+| `failureThreshold` | 3 | 2 | Failures before action (restart/unready) |
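+
+As a rough worked example of how these settings combine: once a container stops responding, the probe acts after about `failureThreshold × periodSeconds`, plus up to one `timeoutSeconds` for the final probe. This is an approximation (the exact timing depends on where in the probe cycle the failure begins), and `worst_case_detection_seconds` is a hypothetical helper written for this calculation.
+
+```python
+def worst_case_detection_seconds(period: int, timeout: int, failure_threshold: int) -> int:
+    """Approximate upper bound on the time from failure to probe action."""
+    # failure_threshold consecutive probes must fail. Probes start one period
+    # apart, and the last one may take up to `timeout` seconds to be declared failed.
+    return failure_threshold * period + timeout
+
+
+# Values taken from the probe configuration above
+liveness = worst_case_detection_seconds(period=10, timeout=5, failure_threshold=3)
+readiness = worst_case_detection_seconds(period=5, timeout=3, failure_threshold=2)
+
+print(f"Liveness: restart within about {liveness}s")        # 35s
+print(f"Readiness: marked unready within about {readiness}s")  # 13s
+```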
+
+#### Uptime Monitoring Script
+
+Example continuous health monitoring script:
+
+```python
+import requests
+import time
+from datetime import datetime
+import logging
+
+logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(levelname)s - %(message)s'
+)
+
+class HealthMonitor:
+ def __init__(self, url="http://localhost:5000/api/health", interval=30):
+ self.url = url
+ self.interval = interval
+ self.consecutive_failures = 0
+ self.uptime_start = None
+ self.total_checks = 0
+ self.failed_checks = 0
+
+ def check_health(self):
+ """Perform single health check."""
+ self.total_checks += 1
+
+ try:
+ response = requests.get(self.url, timeout=5)
+
+ if response.status_code == 200:
+ data = response.json()
+ client_count = len(data.get("online_clients", []))
+
+ if self.uptime_start is None:
+ self.uptime_start = datetime.now()
+
+ uptime = datetime.now() - self.uptime_start
+ availability = ((self.total_checks - self.failed_checks) /
+ self.total_checks * 100)
+
+ logging.info(
+ f"✅ Server healthy - {client_count} clients connected | "
+ f"Uptime: {uptime} | Availability: {availability:.2f}%"
+ )
+
+ self.consecutive_failures = 0
+ return True
+ else:
+ raise Exception(f"HTTP {response.status_code}")
+
+ except Exception as e:
+ self.consecutive_failures += 1
+ self.failed_checks += 1
+ self.uptime_start = None # Reset uptime on failure
+
+ logging.error(
+ f"❌ Health check failed: {e} "
+ f"(consecutive failures: {self.consecutive_failures})"
+ )
+
+ # Alert after 3 consecutive failures
+ if self.consecutive_failures == 3:
+ self.send_alert(
+ f"Server down for {self.consecutive_failures} checks! "
+ f"Last error: {e}"
+ )
+
+ return False
+
+ def send_alert(self, message):
+ """Send alert (implement your alerting mechanism)."""
+ logging.critical(f"🚨 ALERT: {message}")
+ # TODO: Implement Slack/Email/PagerDuty notification
+
+ def run(self):
+ """Run continuous monitoring."""
+ logging.info(f"Starting health monitor (interval: {self.interval}s)")
+
+ while True:
+ self.check_health()
+ time.sleep(self.interval)
+
+# Run monitor
+if __name__ == "__main__":
+ monitor = HealthMonitor(interval=30)
+ monitor.run()
+```
+
+#### Docker Healthcheck
+
+Docker Compose health configuration:
+
+```yaml
+version: '3.8'
+
+services:
+ ufo-server:
+ image: ufo-server:latest
+ ports:
+ - "5000:5000"
+
+ # Docker health check
+ healthcheck:
+ test: ["CMD", "curl", "-f", "http://localhost:5000/api/health"]
+ interval: 30s
+ timeout: 5s
+ retries: 3
+ start_period: 40s
+
+ restart: unless-stopped
+
+ environment:
+ - LOG_LEVEL=INFO
+
+ volumes:
+ - ./logs:/app/logs
+ - ./config:/app/config
+```
+
+---
+
+## 📊 Performance Metrics
+
+### Request Latency Monitoring
+
+Track API response times to detect performance degradation.
+
+#### Latency Measurement
+
+```python
+import logging
+import requests
+import time
+import statistics
+from typing import List, Dict
+
+class LatencyMonitor:
+ def __init__(self):
+ self.measurements: Dict[str, List[float]] = {}
+
+ def measure_endpoint(self, url: str, name: str = None) -> float:
+ """Measure endpoint latency in milliseconds."""
+ if name is None:
+ name = url
+
+ start = time.time()
+ try:
+ response = requests.get(url, timeout=10)
+ latency_ms = (time.time() - start) * 1000
+
+ # Store measurement
+ if name not in self.measurements:
+ self.measurements[name] = []
+ self.measurements[name].append(latency_ms)
+
+ return latency_ms
+ except Exception as e:
+ logging.error(f"Failed to measure {name}: {e}")
+ return -1
+
+ def get_stats(self, name: str) -> Dict[str, float]:
+ """Get statistics for an endpoint."""
+ if name not in self.measurements or not self.measurements[name]:
+ return {}
+
+ data = self.measurements[name]
+ return {
+ "count": len(data),
+ "min": min(data),
+ "max": max(data),
+ "mean": statistics.mean(data),
+ "median": statistics.median(data),
+ "p95": statistics.quantiles(data, n=20)[18] if len(data) >= 20 else max(data),
+ "p99": statistics.quantiles(data, n=100)[98] if len(data) >= 100 else max(data)
+ }
+
+ def print_report(self):
+ """Print latency report."""
+ print("\n📊 Latency Report:")
+ print("=" * 80)
+
+ for name, measurements in self.measurements.items():
+ stats = self.get_stats(name)
+
+ print(f"\n{name}:")
+ print(f" Count: {stats['count']}")
+ print(f" Min: {stats['min']:.2f} ms")
+ print(f" Max: {stats['max']:.2f} ms")
+ print(f" Mean: {stats['mean']:.2f} ms")
+ print(f" Median: {stats['median']:.2f} ms")
+ print(f" P95: {stats['p95']:.2f} ms")
+ print(f" P99: {stats['p99']:.2f} ms")
+
+# Usage
+monitor = LatencyMonitor()
+
+for _ in range(100):
+ monitor.measure_endpoint("http://localhost:5000/api/health", "health")
+ monitor.measure_endpoint("http://localhost:5000/api/clients", "clients")
+ time.sleep(1)
+
+monitor.print_report()
+```
+
+**Sample Output:**
+
+```
+📊 Latency Report:
+================================================================================
+
+health:
+ Count: 100
+ Min: 2.34 ms
+ Max: 45.67 ms
+ Mean: 8.12 ms
+ Median: 6.89 ms
+ P95: 15.23 ms
+ P99: 32.45 ms
+
+clients:
+ Count: 100
+ Min: 3.12 ms
+ Max: 52.34 ms
+ Mean: 10.45 ms
+ Median: 9.12 ms
+ P95: 18.90 ms
+ P99: 38.67 ms
+```
+
+### Task Throughput Monitoring
+
+Track task completion rate to detect bottlenecks.
+
+```python
+from collections import deque
+import time
+
+class ThroughputMonitor:
+    def __init__(self, window_seconds=60):
+        self.window = window_seconds
+        self.completions = deque()
+        self.total_completions = 0
+
+    def _prune(self, now: float):
+        """Drop completions that fall outside the time window."""
+        cutoff = now - self.window
+        while self.completions and self.completions[0] < cutoff:
+            self.completions.popleft()
+
+    def record_completion(self):
+        """Record a task completion."""
+        now = time.time()
+        self.completions.append(now)
+        self.total_completions += 1
+        self._prune(now)
+
+    def get_rate_per_minute(self) -> float:
+        """Get current completion rate (tasks per minute)."""
+        return self.get_rate_per_second() * 60.0
+
+    def get_rate_per_second(self) -> float:
+        """Get current completion rate (tasks per second)."""
+        # Prune first so the rate decays when no tasks are completing
+        self._prune(time.time())
+        return len(self.completions) / self.window
+
+    def get_stats(self) -> dict:
+        """Get comprehensive statistics."""
+        rate_per_second = self.get_rate_per_second()  # also prunes stale entries
+        return {
+            "window_seconds": self.window,
+            "completions_in_window": len(self.completions),
+            "rate_per_second": rate_per_second,
+            "rate_per_minute": rate_per_second * 60.0,
+            "total_completions": self.total_completions
+        }
+
+# Usage
+monitor = ThroughputMonitor(window_seconds=60)
+
+# Record completions as they happen, e.g. from the task completion handler
+monitor.record_completion()
+
+# Get current rate
+stats = monitor.get_stats()
+print(f"Current throughput: {stats['rate_per_minute']:.2f} tasks/min")
+print(f"Tasks in last {stats['window_seconds']}s: {stats['completions_in_window']}")
+```
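+
+Because the monitor reads the wall clock directly, its window behavior is hard to verify without waiting. The sketch below shows the same sliding-window idea with an injectable clock so expiry can be checked deterministically. `SlidingWindowCounter` and its `clock` parameter are assumptions for illustration, not part of the UFO Server.
+
+```python
+import time
+from collections import deque
+from typing import Callable, Deque
+
+
+class SlidingWindowCounter:
+    """Counts events in the last `window_seconds`, using an injectable clock."""
+
+    def __init__(self, window_seconds: float, clock: Callable[[], float] = time.time) -> None:
+        self.window = window_seconds
+        self.clock = clock
+        self.events: Deque[float] = deque()
+
+    def record(self) -> None:
+        self.events.append(self.clock())
+
+    def count(self) -> int:
+        # Drop events older than the window before counting
+        cutoff = self.clock() - self.window
+        while self.events and self.events[0] < cutoff:
+            self.events.popleft()
+        return len(self.events)
+
+
+# A fake clock that the demo advances by hand
+fake_now = [0.0]
+counter = SlidingWindowCounter(window_seconds=60, clock=lambda: fake_now[0])
+
+for t in (0.0, 10.0, 50.0):
+    fake_now[0] = t
+    counter.record()
+
+fake_now[0] = 55.0
+print(counter.count())  # 3 (all events fall within the last 60s)
+
+fake_now[0] = 65.0
+print(counter.count())  # 2 (the event at t=0 has expired)
+```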
+
+### Connection Stability Metrics
+
+!!! warning "Monitor Client Connection Reliability"
+ Track disconnection rates to identify network or client issues. For more on client management, see the [Client Connection Manager](./client_connection_manager.md) documentation.
+
+```python
+from datetime import datetime, timedelta
+
+class ConnectionStabilityMonitor:
+ def __init__(self):
+ self.connections = []
+ self.disconnections = []
+ self.reconnections = {} # client_id -> count
+
+ def on_connect(self, client_id: str):
+ """Record client connection."""
+ now = datetime.now()
+ self.connections.append({
+ "client_id": client_id,
+ "timestamp": now
+ })
+
+ # Track reconnections
+ if client_id in self.reconnections:
+ self.reconnections[client_id] += 1
+ else:
+ self.reconnections[client_id] = 0
+
+ def on_disconnect(self, client_id: str, reason: str = "unknown"):
+ """Record client disconnection."""
+ now = datetime.now()
+ self.disconnections.append({
+ "client_id": client_id,
+ "timestamp": now,
+ "reason": reason
+ })
+
+ def get_stability_rate(self) -> float:
+ """
+ Calculate connection stability (0.0 to 1.0).
+ Returns: 1.0 - (disconnections / connections)
+ """
+ if not self.connections:
+ return 1.0
+
+ return 1.0 - (len(self.disconnections) / len(self.connections))
+
+ def get_disconnection_rate_per_hour(self) -> float:
+ """Get average disconnections per hour."""
+ if not self.disconnections:
+ return 0.0
+
+ first = self.disconnections[0]["timestamp"]
+ last = self.disconnections[-1]["timestamp"]
+ duration_hours = (last - first).total_seconds() / 3600
+
+ if duration_hours == 0:
+ return 0.0
+
+ return len(self.disconnections) / duration_hours
+
+ def get_flaky_clients(self, threshold=3) -> list:
+ """Get clients with excessive reconnections."""
+ return [
+ (client_id, count)
+ for client_id, count in self.reconnections.items()
+ if count >= threshold
+ ]
+
+ def get_stats(self) -> dict:
+ """Get comprehensive stability statistics."""
+ return {
+ "total_connections": len(self.connections),
+ "total_disconnections": len(self.disconnections),
+ "stability_rate": self.get_stability_rate(),
+ "disconnections_per_hour": self.get_disconnection_rate_per_hour(),
+ "flaky_clients": self.get_flaky_clients()
+ }
+
+# Usage
+monitor = ConnectionStabilityMonitor()
+
+# Record events
+monitor.on_connect("device_windows_001")
+monitor.on_disconnect("device_windows_001", reason="network_error")
+monitor.on_connect("device_windows_001") # Reconnection
+
+# Get statistics
+stats = monitor.get_stats()
+print(f"Stability: {stats['stability_rate'] * 100:.1f}%")
+print(f"Flaky clients: {stats['flaky_clients']}")
+```
+
+---
+
+## 📝 Logging and Analysis
+
+### Log Configuration
+
+Production logging setup:
+
+```python
+import json
+import logging
+import os
+import sys
+from datetime import datetime, timezone
+from logging.handlers import RotatingFileHandler, TimedRotatingFileHandler
+
+# Custom JSON formatter for structured logging
+class JsonFormatter(logging.Formatter):
+    def format(self, record):
+        log_data = {
+            # Timestamp of the log record itself, in UTC
+            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
+            "level": record.levelname,
+            "logger": record.name,
+            "message": record.getMessage(),
+            "module": record.module,
+            "function": record.funcName,
+            "line": record.lineno
+        }
+
+        # Add exception info if present
+        if record.exc_info:
+            log_data["exception"] = self.formatException(record.exc_info)
+
+        # Add custom fields
+        if hasattr(record, 'client_id'):
+            log_data["client_id"] = record.client_id
+        if hasattr(record, 'session_id'):
+            log_data["session_id"] = record.session_id
+
+        return json.dumps(log_data)
+
+# Configure root logger
+def setup_logging(log_level=logging.INFO, log_dir="logs"):
+    """Set up production logging configuration."""
+
+    # Make sure the log directory exists before attaching file handlers
+    os.makedirs(log_dir, exist_ok=True)
+
+    # Create logger
+    logger = logging.getLogger()
+    logger.setLevel(log_level)
+
+    # Remove default handlers
+    logger.handlers = []
+
+    # Console handler (human-readable)
+    console_handler = logging.StreamHandler(sys.stdout)
+    console_handler.setLevel(logging.INFO)
+    console_formatter = logging.Formatter(
+        '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+    )
+    console_handler.setFormatter(console_formatter)
+    logger.addHandler(console_handler)
+
+    # File handler (JSON, rotating by size)
+    file_handler = RotatingFileHandler(
+        filename=f"{log_dir}/ufo_server.log",
+        maxBytes=10 * 1024 * 1024,  # 10 MB
+        backupCount=5,  # Keep 5 backup files
+        encoding='utf-8'
+    )
+    file_handler.setLevel(logging.DEBUG)
+    file_handler.setFormatter(JsonFormatter())
+    logger.addHandler(file_handler)
+
+    # Daily rotation handler (for long-term storage)
+    daily_handler = TimedRotatingFileHandler(
+        filename=f"{log_dir}/ufo_server_daily.log",
+        when='midnight',
+        interval=1,
+        backupCount=30,  # Keep 30 days
+        encoding='utf-8'
+    )
+    daily_handler.setLevel(logging.INFO)
+    daily_handler.setFormatter(JsonFormatter())
+    logger.addHandler(daily_handler)
+
+    # Error-only handler (separate file for errors)
+    error_handler = RotatingFileHandler(
+        filename=f"{log_dir}/ufo_server_errors.log",
+        maxBytes=5 * 1024 * 1024,  # 5 MB
+        backupCount=10,
+        encoding='utf-8'
+    )
+    error_handler.setLevel(logging.ERROR)
+    error_handler.setFormatter(JsonFormatter())
+    logger.addHandler(error_handler)
+
+    logger.info("Logging configured successfully")
+
+# Usage
+setup_logging(log_level=logging.INFO, log_dir="./logs")
+```
+
+### Log Event Categories
+
+**Key Events to Log:**
+
+**Connection Events:**
+
+```python
+# These log messages are generated by the WebSocket Handler
+# See: WebSocket Handler documentation for connection lifecycle details
+logger.info(f"[WS] ✅ Registered {client_type} client: {client_id}",
+ extra={"client_id": client_id, "client_type": client_type})
+
+logger.info(f"[WS] 🔌 Client disconnected: {client_id}",
+ extra={"client_id": client_id})
+```
+
+**Task Events:**
+
+```python
+# These log messages are generated by the Session Manager
+# See: Session Manager documentation for task lifecycle details
+logger.info(f"[Session] Created session {session_id} for task: {task_name}",
+ extra={"session_id": session_id, "task_name": task_name})
+
+logger.info(f"[Session] Task completed: {session_id}",
+            extra={"session_id": session_id, "duration_seconds": duration})
+
+logger.warning(f"[Session] Task cancelled: {session_id} (reason: {reason})",
+ extra={"session_id": session_id, "cancel_reason": reason})
+```
+
+**Error Events:**
+
+```python
+logger.error(f"[WS] ❌ Failed to send result for session {session_id}: {error}",
+ extra={"session_id": session_id}, exc_info=True)
+
+logger.error(f"[Session] Task execution failed: {session_id}",
+ extra={"session_id": session_id}, exc_info=True)
+```
+
+### Log Analysis Scripts
+
+Parse and analyze JSON logs:
+
+```python
+ import json
+ from collections import Counter, defaultdict
+ from datetime import datetime
+
+ def analyze_logs(log_file="logs/ufo_server.log"):
+ """Analyze JSON logs and generate statistics."""
+
+ # Counters
+ level_counts = Counter()
+ module_counts = Counter()
+ error_types = Counter()
+ client_activity = defaultdict(int)
+ hourly_activity = defaultdict(int)
+
+        # Logs are written as UTF-8 (they contain emoji), so read them the same way
+        with open(log_file, 'r', encoding='utf-8') as f:
+ for line in f:
+ try:
+ log = json.loads(line)
+
+ # Count by level
+ level_counts[log.get("level")] += 1
+
+ # Count by module
+ module_counts[log.get("module")] += 1
+
+ # Track errors
+ if log.get("level") in ["ERROR", "WARNING"]:
+ error_types[log.get("message")[:50]] += 1
+
+ # Track client activity
+ if "client_id" in log:
+ client_activity[log["client_id"]] += 1
+
+ # Hourly activity
+ timestamp = datetime.fromisoformat(log.get("timestamp"))
+ hour = timestamp.hour
+ hourly_activity[hour] += 1
+
+ except json.JSONDecodeError:
+ continue
+
+ # Print report
+ print("\n📊 Log Analysis Report")
+ print("=" * 80)
+
+ print("\n📈 Events by Level:")
+ for level, count in level_counts.most_common():
+ print(f" {level:10s}: {count:6d}")
+
+ print("\n📦 Events by Module:")
+ for module, count in module_counts.most_common(10):
+ print(f" {module:20s}: {count:6d}")
+
+ print("\n⚠️ Top Errors/Warnings:")
+ for message, count in error_types.most_common(5):
+ print(f" [{count:3d}] {message}")
+
+ print("\n👥 Top Active Clients:")
+ for client_id, count in sorted(client_activity.items(),
+ key=lambda x: x[1], reverse=True)[:10]:
+ print(f" {client_id:30s}: {count:6d} events")
+
+ print("\n🕐 Activity by Hour:")
+ for hour in sorted(hourly_activity.keys()):
+ bar = "█" * (hourly_activity[hour] // 10)
+ print(f" {hour:02d}:00 - {bar} ({hourly_activity[hour]} events)")
+
+ # Run analysis
+ analyze_logs("logs/ufo_server.log")
+```
+
+---
+
+## 🚨 Alerting Systems
+
+### Alert Conditions
+
+!!! danger "Critical Conditions to Monitor"
+
+ Track these critical conditions to maintain server reliability.
+
+ **1. No Connected Devices**
+
+ ```python
+    def check_device_availability():
+        """Alert if no devices are connected."""
+        import requests
+
+        response = requests.get("http://localhost:5000/api/clients", timeout=5)
+ clients = response.json()["online_clients"]
+
+ devices = [c for c in clients if c.startswith("device_")]
+
+ if len(devices) == 0:
+ send_alert(
+ severity="critical",
+ title="No Devices Connected",
+ message="UFO server has no connected devices. Task dispatch unavailable."
+ )
+ return False
+ elif len(devices) < 3:
+ send_alert(
+ severity="warning",
+ title="Low Device Count",
+ message=f"Only {len(devices)} devices connected (expected 3+)."
+ )
+
+ return True
+ ```
+
+ **2. High Error Rate**
+
+ ```python
+ def check_error_rate(log_file="logs/ufo_server.log", threshold=0.1):
+ """Alert if error rate exceeds threshold."""
+ import json
+
+ total = 0
+ errors = 0
+
+        with open(log_file, 'r', encoding='utf-8') as f:
+            for line in f:
+                try:
+                    log = json.loads(line)
+                    total += 1
+                    if log.get("level") in ["ERROR", "CRITICAL"]:
+                        errors += 1
+                except json.JSONDecodeError:
+                    # Skip lines that are not valid JSON
+                    continue
+
+ error_rate = errors / total if total > 0 else 0
+
+ if error_rate > threshold:
+ send_alert(
+ severity="warning",
+ title=f"High Error Rate: {error_rate * 100:.1f}%",
+ message=f"{errors} errors out of {total} log entries"
+ )
+ return False
+
+ return True
+ ```
+
+ **3. Slow Response Times**
+
+ ```python
+    def check_latency(threshold_ms=1000):
+        """Alert if health endpoint is slow."""
+        import time
+        import requests
+
+        start = time.time()
+
+ try:
+ response = requests.get("http://localhost:5000/api/health", timeout=5)
+ latency_ms = (time.time() - start) * 1000
+
+ if latency_ms > threshold_ms:
+ send_alert(
+ severity="warning",
+ title=f"Slow Response Time: {latency_ms:.0f}ms",
+ message=f"/api/health responded in {latency_ms:.0f}ms (threshold: {threshold_ms}ms)"
+ )
+ return False
+
+ return True
+ except Exception as e:
+ send_alert(
+ severity="critical",
+ title="Health Check Failed",
+ message=f"Cannot reach health endpoint: {e}"
+ )
+ return False
+ ```
+
+ **4. Session Failure Rate**
+
+ ```python
+ def check_session_failure_rate(threshold=0.2):
+ """Alert if too many sessions are failing."""
+ # Requires session tracking in logs
+ import json
+
+ completed = 0
+ failed = 0
+
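+        with open("logs/ufo_server.log", 'r', encoding='utf-8') as f:
+            for line in f:
+                try:
+                    log = json.loads(line)
+                    message = log.get("message", "")
+
+                    if "Task completed" in message:
+                        completed += 1
+                    # Matches the "Task execution failed" and "Task cancelled" log messages
+                    elif "Task execution failed" in message or "Task cancelled" in message:
+                        failed += 1
+                except json.JSONDecodeError:
+                    # Skip lines that are not valid JSON
+                    continue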
+
+ total = completed + failed
+ failure_rate = failed / total if total > 0 else 0
+
+ if failure_rate > threshold:
+ send_alert(
+ severity="warning",
+ title=f"High Task Failure Rate: {failure_rate * 100:.1f}%",
+ message=f"{failed} failed out of {total} tasks"
+ )
+ return False
+
+ return True
+ ```
+
+### Alert Delivery Methods
+
+**Email Alerts:**
+```python
+import logging
+import smtplib
+from datetime import datetime
+from email.message import EmailMessage
+
+def send_email_alert(title, message, severity="info"):
+ """Send email alert via SMTP."""
+
+ # Email configuration
+ smtp_host = "smtp.gmail.com"
+ smtp_port = 587
+ sender = "alerts@example.com"
+ receiver = "admin@example.com"
+ password = "your_app_password"
+
+ # Create message
+ msg = EmailMessage()
+ msg['Subject'] = f"[{severity.upper()}] UFO Server - {title}"
+ msg['From'] = sender
+ msg['To'] = receiver
+
+ # Email body
+ body = f"""
+ UFO Server Alert
+
+ Severity: {severity.upper()}
+ Title: {title}
+
+ Message:
+ {message}
+
+ Timestamp: {datetime.now().isoformat()}
+
+ --
+ UFO Server Monitoring System
+ """
+ msg.set_content(body)
+
+ try:
+ with smtplib.SMTP(smtp_host, smtp_port) as server:
+ server.starttls()
+ server.login(sender, password)
+ server.send_message(msg)
+
+ logging.info(f"Email alert sent: {title}")
+ except Exception as e:
+ logging.error(f"Failed to send email alert: {e}")
+```
+
+**Slack Alerts:**
+```python
+import logging
+from datetime import datetime
+
+import requests
+
+def send_slack_alert(title, message, severity="info"):
+ """Send alert to Slack via webhook."""
+
+ webhook_url = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
+
+ # Color coding by severity
+ colors = {
+ "critical": "#ff0000",
+ "error": "#ff6600",
+ "warning": "#ffcc00",
+ "info": "#00ccff"
+ }
+
+ # Slack message payload
+ payload = {
+ "attachments": [{
+ "color": colors.get(severity, "#cccccc"),
+ "title": f"🚨 UFO Server Alert - {title}",
+ "text": message,
+ "fields": [
+ {
+ "title": "Severity",
+ "value": severity.upper(),
+ "short": True
+ },
+ {
+ "title": "Timestamp",
+ "value": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
+ "short": True
+ }
+ ],
+ "footer": "UFO Server Monitoring"
+ }]
+ }
+
+ try:
+ response = requests.post(webhook_url, json=payload, timeout=5)
+ response.raise_for_status()
+ logging.info(f"Slack alert sent: {title}")
+ except Exception as e:
+ logging.error(f"Failed to send Slack alert: {e}")
+```
+
+**PagerDuty Integration:**
+```python
+import logging
+from datetime import datetime
+
+import requests
+
+def send_pagerduty_alert(title, message, severity="error"):
+ """Send alert to PagerDuty."""
+
+ routing_key = "YOUR_PAGERDUTY_ROUTING_KEY"
+
+ # Map severity to PagerDuty severity
+ pd_severity_map = {
+ "critical": "critical",
+ "error": "error",
+ "warning": "warning",
+ "info": "info"
+ }
+
+ payload = {
+ "routing_key": routing_key,
+ "event_action": "trigger",
+ "payload": {
+ "summary": title,
+ "source": "ufo-server",
+ "severity": pd_severity_map.get(severity, "error"),
+ "custom_details": {
+ "message": message,
+ "timestamp": datetime.now().isoformat()
+ }
+ }
+ }
+
+ try:
+ response = requests.post(
+ "https://events.pagerduty.com/v2/enqueue",
+ json=payload,
+ timeout=5
+ )
+ response.raise_for_status()
+ logging.info(f"PagerDuty alert sent: {title}")
+ except Exception as e:
+ logging.error(f"Failed to send PagerDuty alert: {e}")
+```
+
+### Unified Alert Function
+
+```python
+def send_alert(title: str, message: str, severity: str = "info",
+               channels: list = None):
+    """
+    Send alert to multiple channels.
+
+    Args:
+        title: Alert title
+        message: Alert message
+        severity: One of "critical", "error", "warning", "info"
+        channels: List of channels to send to (defaults to Slack and email)
+    """
+    # Avoid a mutable default argument
+    if channels is None:
+        channels = ["slack", "email"]
+
+    for channel in channels:
+ try:
+ if channel == "slack":
+ send_slack_alert(title, message, severity)
+ elif channel == "email":
+ send_email_alert(title, message, severity)
+ elif channel == "pagerduty":
+ send_pagerduty_alert(title, message, severity)
+ else:
+ logging.warning(f"Unknown alert channel: {channel}")
+ except Exception as e:
+ logging.error(f"Failed to send alert via {channel}: {e}")
+
+# Usage
+send_alert(
+ title="Server Healthy",
+ message="All systems operational",
+ severity="info",
+ channels=["slack"]
+)
+
+send_alert(
+ title="No Devices Connected",
+ message="Critical: UFO server has no connected devices",
+ severity="critical",
+ channels=["slack", "email", "pagerduty"]
+)
+```
+
+---
+
+## Best Practices
+
+### 1. Monitoring Strategy
+
+**Layered Monitoring Approach:**
+
+| Layer | Purpose | Frequency |
+|-------|---------|-----------|
+| **Health Checks** | Service availability | Every 30-60 seconds |
+| **Performance Metrics** | Response times, throughput | Continuous |
+| **Error Logs** | Debugging and diagnostics | Real-time |
+| **Alerts** | Critical issue notification | Event-driven |
+
+### 2. Alert Thresholds
+
+!!! warning "Avoid Alert Fatigue"
+ Set reasonable thresholds to prevent excessive alerting:
+
+ - **No devices for > 5 minutes**: Critical
+ - **Error rate > 10%**: Warning
+ - **Response time > 2 seconds**: Warning
+ - **Session failure rate > 20%**: Warning
+ - **3 consecutive health check failures**: Critical
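+
+A threshold such as "3 consecutive health check failures" needs a little state carried between checks. The following is a minimal sketch, assuming a hypothetical `ConsecutiveFailureTracker` helper (not part of UFO) that a polling loop feeds with each health check result. It fires once per outage and re-arms after the next success:
+
+```python
+class ConsecutiveFailureTracker:
+    """Fire once when `threshold` failures occur in a row (hypothetical helper)."""
+
+    def __init__(self, threshold: int = 3):
+        self.threshold = threshold
+        self.failures = 0
+        self.alerted = False
+
+    def record(self, success: bool) -> bool:
+        """Record one check result; return True only when an alert should fire."""
+        if success:
+            # Any success resets the streak and re-arms the alert
+            self.failures = 0
+            self.alerted = False
+            return False
+
+        self.failures += 1
+        if self.failures >= self.threshold and not self.alerted:
+            self.alerted = True  # Suppress repeats until the next recovery
+            return True
+        return False
+
+
+tracker = ConsecutiveFailureTracker(threshold=3)
+results = [tracker.record(ok) for ok in (False, False, False, False, True, False)]
+print(results)  # [False, False, True, False, False, False]
+```
+
+Alerting only on the transition into the failing state is one way to keep a long outage from producing a notification on every poll.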
+
+### 3. Log Retention
+
+**Log Retention Policy:**
+
+| Log Type | Retention | Storage |
+|----------|-----------|---------|
+| **Detailed logs** | 7 days | Local SSD |
+| **Summary logs** | 30 days | Local disk |
+| **Monthly summaries** | 1 year | Archive storage |
+| **Error logs** | 90 days | Separate file |
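+
+A retention policy like this can be enforced by a periodic cleanup job. The sketch below is illustrative only: it assumes a hypothetical layout with one subdirectory per log type under `logs/` (`detailed`, `summary`, `errors`, `monthly`), which is not something the server creates by default, and it uses file modification time as the age.
+
+```python
+import time
+from pathlib import Path
+
+# Retention windows in days, mirroring the table above.
+# The subdirectory names are hypothetical; adjust them to your layout.
+RETENTION_DAYS = {
+    "detailed": 7,
+    "summary": 30,
+    "errors": 90,
+    "monthly": 365,
+}
+
+
+def find_expired(log_dir, max_age_days, now=None):
+    """Return files in log_dir whose modification time is older than max_age_days."""
+    now = time.time() if now is None else now
+    cutoff = now - max_age_days * 86400
+    directory = Path(log_dir)
+    if not directory.is_dir():
+        return []
+    return sorted(
+        p for p in directory.iterdir()
+        if p.is_file() and p.stat().st_mtime < cutoff
+    )
+
+
+def prune_logs(base_dir="logs", dry_run=True):
+    """List expired files per log type, deleting them unless dry_run is True."""
+    expired = []
+    for subdir, days in RETENTION_DAYS.items():
+        for path in find_expired(Path(base_dir) / subdir, days):
+            expired.append(path)
+            if not dry_run:
+                path.unlink()
+    return expired
+```
+
+Running with `dry_run=True` first shows what would be removed before anything is deleted.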
+
+### 4. Performance Baselines
+
+**Establish Baselines:**
+
+Track normal operating metrics to detect anomalies:
+
+ ```python
+ BASELINE_METRICS = {
+ "health_latency_ms": 10, # Typical: 5-15ms
+ "clients_latency_ms": 15, # Typical: 10-20ms
+ "active_clients": 5, # Expected: 3-10
+ "tasks_per_minute": 2, # Expected: 1-5
+ "error_rate": 0.02, # Expected: < 5%
+ }
+
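+    # Alert if deviation > 50%
+    # `actual_latency` is the most recently measured /api/health latency in ms
+    if actual_latency > BASELINE_METRICS["health_latency_ms"] * 1.5:
+        send_alert(
+            title="Performance degradation detected",
+            message=f"Health latency {actual_latency:.0f}ms is more than 50% above baseline",
+            severity="warning",
+        )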
+ ```
+
+### 5. Multi-Channel Alerting
+
+!!!example "Route Alerts by Severity"
+
+ ```python
+ ALERT_ROUTING = {
+ "critical": ["slack", "email", "pagerduty"],
+ "error": ["slack", "email"],
+ "warning": ["slack"],
+ "info": ["slack"]
+ }
+
+ def send_alert(title, message, severity="info"):
+ channels = ALERT_ROUTING.get(severity, ["slack"])
+ # Send to appropriate channels...
+    ```
+
+---
+
+## 🎓 Summary
+
+Production monitoring requires a **layered approach** combining health checks, performance metrics, structured logging, and proactive alerting.
+
+**Monitoring Stack:**
+
+```mermaid
+graph LR
+ subgraph "Collect"
+ HC[Health Checks]
+ PM[Metrics]
+ LOG[Logs]
+ end
+
+ subgraph "Store & Analyze"
+ Files[Log Files]
+ Dash[Dashboard]
+ end
+
+ subgraph "Alert"
+ Rules[Alert Rules]
+ Notify[Notifications]
+ end
+
+ HC --> Dash
+ PM --> Dash
+ LOG --> Files
+
+ Files --> Rules
+ Dash --> Rules
+ Rules --> Notify
+
+ style HC fill:#bbdefb
+ style PM fill:#c8e6c9
+ style LOG fill:#fff9c4
+ style Rules fill:#ffcdd2
+```
+
+**Key Takeaways:**
+
+1. **Health Checks**: Use `/api/health` for liveness/readiness probes
+2. **Metrics**: Track latency, throughput, and stability continuously
+3. **Logging**: Use structured JSON logs for machine-readable analysis
+4. **Alerting**: Set up multi-channel alerts with appropriate thresholds
+5. **Dashboards**: Build real-time dashboards for visibility
+
+**For More Information:**
+
+- [HTTP API](./api.md) - Health endpoint details
+- [Client Connection Manager](./client_connection_manager.md) - Client statistics
+- [Session Manager](./session_manager.md) - Task tracking
+- [Quick Start](./quick_start.md) - Get started with UFO server
+
+## Next Steps
+
+- [Quick Start](./quick_start.md) - Get the server running
+- [HTTP API](./api.md) - API endpoint reference
+- [WebSocket Handler](./websocket_handler.md) - Connection management
+- [Session Manager](./session_manager.md) - Task execution tracking
+
diff --git a/documents/docs/server/overview.md b/documents/docs/server/overview.md
new file mode 100644
index 000000000..9f7a66b76
--- /dev/null
+++ b/documents/docs/server/overview.md
@@ -0,0 +1,780 @@
+# Agent Server Overview
+
+The **Agent Server** is the central orchestration engine that transforms UFO into a distributed multi-agent system, enabling seamless task coordination across heterogeneous devices through persistent WebSocket connections and robust state management.
+
+New to the Agent Server? Start with the [Quick Start Guide](./quick_start.md) to get up and running in minutes.
+
+## What is the Agent Server?
+
+The Agent Server is a **FastAPI-based asynchronous WebSocket server** that serves as the communication hub for UFO's distributed architecture. It bridges constellation orchestrators, device agents, and external systems through a unified protocol interface.
+
+### Core Responsibilities
+
+| Capability | Description | Key Benefit |
+|------------|-------------|-------------|
+| **🔌 Connection Management** | Tracks device & constellation client lifecycles | Real-time device availability awareness |
+| **🎯 Task Orchestration** | Coordinates execution across distributed devices | Centralized workflow control |
+| **💾 State Management** | Maintains session lifecycles & execution contexts | Stateful multi-turn task execution |
+| **🌐 Dual API Interface** | WebSocket (AIP) + HTTP (REST) endpoints | Flexible integration options |
+| **🛡️ Resilience** | Handles disconnections, timeouts, failures gracefully | Production-grade reliability |
+
+**Why Use the Agent Server?**
+
+- **Centralized Control**: Single point of orchestration for multi-device workflows
+- **Protocol Abstraction**: Clients communicate via [AIP](../aip/overview.md), hiding network complexity
+- **Async by Design**: Non-blocking execution enables high concurrency
+- **Platform Agnostic**: Supports Windows, Linux, macOS (in development)
+
+The Agent Server is part of UFO's distributed **server-client architecture**, where it handles orchestration and state management while [Agent Clients](../client/overview.md) handle command execution. See [Server-Client Architecture](../infrastructure/agents/server_client_architecture.md) for the complete design rationale and communication patterns.
+
+---
+
+## Architecture
+
+The server follows a clean separation of concerns with distinct layers for web service, connection management, and protocol handling.
+
+### Architectural Overview
+
+**Component Interaction Diagram:**
+
+```mermaid
+graph TB
+ subgraph "Web Layer"
+ FastAPI[FastAPI App]
+ HTTP[HTTP API]
+ WS[WebSocket /ws]
+ end
+
+ subgraph "Service Layer"
+ WSM[Client Manager]
+ SM[Session Manager]
+ WSH[WebSocket Handler]
+ end
+
+ subgraph "Clients"
+ DC[Device Clients]
+ CC[Constellation Clients]
+ end
+
+ FastAPI --> HTTP
+ FastAPI --> WS
+
+ HTTP --> SM
+ HTTP --> WSM
+ WS --> WSH
+
+ WSH --> WSM
+ WSH --> SM
+
+ DC -->|WebSocket| WS
+ CC -->|WebSocket| WS
+
+ style FastAPI fill:#e1f5ff
+ style WSM fill:#fff4e1
+ style SM fill:#f0ffe1
+ style WSH fill:#ffe1f5
+```
+
+This layered design ensures each component has a single, well-defined responsibility. The managers maintain state while the handler implements protocol logic.
+
+### Core Components
+
+| Component | Responsibility | Key Operations |
+|-----------|---------------|----------------|
+| **FastAPI Application** | Web service layer | ✅ HTTP endpoint routing ✅ WebSocket connection acceptance ✅ Request/response handling ✅ CORS and middleware |
+| **Client Connection Manager** | Connection registry | ✅ Client identity tracking ✅ Session ↔ client mapping ✅ Device info caching ✅ Connection lifecycle hooks |
+| **Session Manager** | Execution lifecycle | ✅ Platform-specific session creation ✅ Background async task execution ✅ Result callback delivery ✅ Session cancellation |
+| **WebSocket Handler** | Protocol implementation | ✅ AIP message parsing/routing ✅ Client registration ✅ Heartbeat monitoring ✅ Task/command dispatch |
+
+**Component Documentation:**
+
+- [Session Manager](./session_manager.md) - Session lifecycle and background execution
+- [Client Connection Manager](./client_connection_manager.md) - Connection registry and client tracking
+- [WebSocket Handler](./websocket_handler.md) - AIP protocol message handling
+- [HTTP API](./api.md) - REST endpoint specifications
+
+---
+
+## Key Capabilities
+
+### 1. Multi-Client Coordination
+
+The server supports two distinct client types with different roles in the distributed architecture.
+
+**Client Type Comparison:**
+
+| Aspect | Device Client | Constellation Client |
+|--------|---------------|---------------------|
+| **Role** | Task executor | Task orchestrator |
+| **Connection** | Long-lived WebSocket | Long-lived WebSocket |
+| **Registration** | `ClientType.DEVICE` | `ClientType.CONSTELLATION` |
+| **Capabilities** | Local execution, telemetry | Multi-device coordination |
+| **Target Field** | Not required | Required for routing |
+| **Example** | Windows agent, Linux agent | ConstellationClient orchestrator |
+
+**Device Clients**
+
+- Execute tasks locally on Windows/Linux machines
+- Report hardware specs and real-time status
+- Respond to commands via MCP tool servers
+- Stream execution logs back to server
+
+See [Agent Client Overview](../client/overview.md) for detailed client architecture.
+
+**Constellation Clients**
+
+- Orchestrate multi-device workflows from a central point
+- Dispatch tasks to specific target devices via `target_id`
+- Coordinate complex cross-device DAG execution
+- Aggregate results from multiple devices
+
+Both client types connect to `/ws` and register using the `REGISTER` message. The server differentiates behavior based on the `client_type` field. For the complete server-client architecture and design rationale, see [Server-Client Architecture](../infrastructure/agents/server_client_architecture.md).
+
+See [Quick Start](./quick_start.md) for registration examples.
+
+---
+
+### 2. Session Lifecycle Management
+
+Unlike stateless HTTP servers, the Agent Server maintains **session state** throughout task execution, enabling multi-turn interactions and result callbacks.
+
+**Session Lifecycle State Machine:**
+
+```mermaid
+stateDiagram-v2
+ [*] --> Created: create_session()
+ Created --> Running: Start execution
+ Running --> Completed: Success
+ Running --> Failed: Error
+ Running --> Cancelled: Disconnect
+ Completed --> [*]
+ Failed --> [*]
+ Cancelled --> [*]
+
+ note right of Running
+ Async background execution
+ Non-blocking server
+ end note
+```
+
+**Lifecycle Stages:**
+
+| Stage | Trigger | Session Manager Action | Server State |
+|-------|---------|----------------------|--------------|
+| **Created** | HTTP dispatch or AIP `TASK` | Platform-specific session instantiation | Session ID generated |
+| **Running** | Background task start | Async execution without blocking | Awaiting results |
+| **Completed** | `TASK_END` (success) | Callback delivery to client | Results cached |
+| **Failed** | `TASK_END` (error) | Error callback delivery | Error logged |
+| **Cancelled** | Client disconnect | Cancel async task, cleanup | Session removed |
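+
+The lifecycle above can be read as a small state machine with three terminal states. The sketch below encodes only the transitions shown in the diagram; `SessionState`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical names for illustration and are not the Session Manager's actual types.
+
+```python
+from enum import Enum
+
+
+class SessionState(Enum):
+    CREATED = "created"
+    RUNNING = "running"
+    COMPLETED = "completed"
+    FAILED = "failed"
+    CANCELLED = "cancelled"
+
+
+# Transitions taken from the state diagram; terminal states allow none
+ALLOWED_TRANSITIONS = {
+    SessionState.CREATED: {SessionState.RUNNING},
+    SessionState.RUNNING: {
+        SessionState.COMPLETED,
+        SessionState.FAILED,
+        SessionState.CANCELLED,
+    },
+    SessionState.COMPLETED: set(),
+    SessionState.FAILED: set(),
+    SessionState.CANCELLED: set(),
+}
+
+
+def transition(current: SessionState, target: SessionState) -> SessionState:
+    """Return the new state, or raise if the diagram does not allow the move."""
+    if target not in ALLOWED_TRANSITIONS[current]:
+        raise ValueError(f"Invalid transition: {current.name} -> {target.name}")
+    return target
+
+
+state = transition(SessionState.CREATED, SessionState.RUNNING)
+state = transition(state, SessionState.CANCELLED)  # e.g. client disconnect
+print(state.name)  # CANCELLED
+```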
+
+!!!warning "Platform-Specific Sessions"
+    The SessionManager creates different session types based on the target platform:
+
+ - **Windows**: `WindowsSession` with UI automation support
+ - **Linux**: `LinuxSession` with bash automation
+ - Auto-detected or overridden via `--platform` flag
+
+**Session Manager Responsibilities:**
+
+- ✅ **Platform abstraction**: Hides Windows/Linux differences
+- ✅ **Background execution**: Non-blocking async task execution
+- ✅ **Callback routing**: Delivers results via WebSocket
+- ✅ **Resource cleanup**: Cancels tasks on disconnect
+- ✅ **Result caching**: Stores results for HTTP retrieval
+
+---
+
+### 3. Resilient Communication
+
+The server implements the [Agent Interaction Protocol (AIP)](../aip/overview.md), providing structured, type-safe communication with automatic failure handling.
+
+**Protocol Features:**
+
+| Feature | Implementation | Benefit |
+|---------|----------------|---------|
+| **Structured Messages** | Pydantic models with validation | Type safety, automatic serialization |
+| **Connection Health** | Heartbeat every 20-30s | Early failure detection |
+| **Error Recovery** | Exponential backoff reconnection | Transient fault tolerance |
+| **State Tracking** | Session ↔ client mapping | Proper cleanup on disconnect |
+| **Message Correlation** | `request_id`, `prev_response_id` chains | Request-response tracing |
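+
+To make the "exponential backoff reconnection" entry concrete, here is a minimal sketch of how the wait times could be computed. The parameter values (`base`, `factor`, `max_delay`) and the optional jitter are assumptions chosen for illustration, not the values used by UFO's clients.
+
+```python
+import random
+
+
+def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, attempts=6,
+                   jitter=0.0, rng=None):
+    """Yield the wait time (seconds) before each reconnection attempt."""
+    rng = rng or random.Random()
+    for attempt in range(attempts):
+        # Grow geometrically, but never wait longer than max_delay
+        delay = min(max_delay, base * factor ** attempt)
+        # Optional jitter spreads out clients that disconnected together
+        if jitter:
+            delay += rng.uniform(0, jitter * delay)
+        yield delay
+
+
+print(list(backoff_delays(attempts=8)))
+# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
+```
+
+Capping the delay keeps a long outage from pushing reconnection attempts arbitrarily far apart.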
+
+**Disconnection Handling Flow:**
+
+```mermaid
+sequenceDiagram
+ participant Client
+ participant Server
+ participant SM as Session Manager
+
+ Client-xServer: Connection lost
+ Server->>SM: Cancel sessions
+ SM->>SM: Cleanup resources
+ Server->>Server: Remove from registry
+
+ Note over Server: Client can reconnect with same client_id
+```
+
+!!!danger "Important: Session Cancellation on Disconnect"
+ When a client disconnects (device or constellation), **all associated sessions are immediately cancelled** to prevent orphaned tasks and resource leaks.
+
+---
+
+### 4. Dual API Interface
+
+The server provides two API styles to support different integration patterns: real-time WebSocket for agents and simple HTTP for external systems.
+
+**WebSocket API (AIP-based)**
+
+Purpose: Real-time bidirectional communication with agent clients
+
+| Message Type | Direction | Purpose |
+|--------------|-----------|---------|
+| `REGISTER` | Client → Server | Initial capability advertisement |
+| `TASK` | Server → Client | Task assignment with commands |
+| `COMMAND` | Server → Client | Individual command execution |
+| `COMMAND_RESULTS` | Client → Server | Execution results |
+| `TASK_END` | Bidirectional | Task completion notification |
+| `HEARTBEAT` | Bidirectional | Connection keepalive |
+| `DEVICE_INFO_REQUEST/RESPONSE` | Bidirectional | Telemetry exchange |
+| `ERROR` | Bidirectional | Error condition reporting |
+
+!!!example "WebSocket Connection"
+ ```python
+    import asyncio
+    import json
+
+    import websockets
+
+    async def register_device():
+        async with websockets.connect("ws://localhost:5000/ws") as ws:
+            # Register as device client
+            await ws.send(json.dumps({
+                "message_type": "REGISTER",
+                "client_id": "windows_agent_001",
+                "client_type": "device",
+                "metadata": {"platform": "windows", "gpu": "NVIDIA RTX 3080"}
+            }))
+
+    # `async with` must run inside a coroutine
+    asyncio.run(register_device())
+ ```
+
+**HTTP REST API**
+
+Purpose: Task dispatch and monitoring from external systems (HTTP clients, CI/CD, etc.)
+
+| Endpoint | Method | Purpose | Authentication |
+|----------|--------|---------|----------------|
+| `/api/dispatch` | POST | Dispatch task to device | Optional (if configured) |
+| `/api/task_result/{task_name}` | GET | Retrieve task results | Optional |
+| `/api/clients` | GET | List connected clients | Optional |
+| `/api/health` | GET | Server health check | None |
+
+!!!example "HTTP Task Dispatch"
+ ```bash
+ # Dispatch task to device
+ curl -X POST http://localhost:5000/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{
+ "client_id": "my_windows_device",
+ "request": "Open Notepad and type Hello World",
+ "task_name": "test_task_001"
+ }'
+
+ # Response: {"status": "dispatched", "session_id": "session_abc123", "task_name": "test_task_001"}
+
+ # Retrieve results
+ curl http://localhost:5000/api/task_result/test_task_001
+ ```
+
+See [HTTP API Reference](./api.md) for complete endpoint documentation.
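+
+The same dispatch call can be made from Python. This is a minimal sketch using only the standard library; `build_dispatch_request` and `dispatch_task` are hypothetical helper names, and the payload fields are the ones shown in the `curl` example above. The shape of the JSON response is assumed to match the sample response in that example.
+
+```python
+import json
+import urllib.request
+
+
+def build_dispatch_request(base_url, client_id, request_text, task_name):
+    """Build (but do not send) the POST request for /api/dispatch."""
+    body = json.dumps({
+        "client_id": client_id,
+        "request": request_text,
+        "task_name": task_name,
+    }).encode("utf-8")
+    return urllib.request.Request(
+        url=f"{base_url}/api/dispatch",
+        data=body,
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+def dispatch_task(base_url, client_id, request_text, task_name, timeout=10):
+    """Send the dispatch request and return the decoded JSON response."""
+    req = build_dispatch_request(base_url, client_id, request_text, task_name)
+    with urllib.request.urlopen(req, timeout=timeout) as resp:
+        return json.loads(resp.read().decode("utf-8"))
+
+
+# Building the request needs no running server; sending it does.
+req = build_dispatch_request(
+    "http://localhost:5000",
+    "my_windows_device",
+    "Open Notepad and type Hello World",
+    "test_task_001",
+)
+print(req.get_method(), req.full_url)
+```
+
+Separating request construction from sending keeps the payload easy to inspect and test without a live server.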
+
+---
+
+## Workflow Examples
+
+### Complete Task Dispatch Flow
+
+**End-to-End HTTP → WebSocket → Device Execution:**
+
+```mermaid
+sequenceDiagram
+ participant EXT as External System
+ participant HTTP as HTTP API
+ participant SM as Session Manager
+ participant WSH as WebSocket Handler
+ participant DC as Device Client
+
+ EXT->>HTTP: POST /api/dispatch {client_id, request, task_name}
+ HTTP->>SM: create_session()
+ SM->>SM: Create platform session
+ SM-->>HTTP: session_id
+ HTTP-->>EXT: 200 {session_id, task_name}
+
+ SM->>WSH: send_task(session_id, task)
+ WSH->>DC: TASK message (AIP)
+ DC-->>WSH: ACK
+
+ rect rgb(240, 255, 240)
+ Note over DC: Background Execution
+ DC->>DC: Execute via MCP tools
+ DC->>DC: Generate results
+ end
+
+ DC->>WSH: COMMAND_RESULTS
+ WSH->>SM: on_result_callback()
+ SM->>SM: Cache results
+
+ DC->>WSH: TASK_END (COMPLETED)
+ WSH->>SM: on_task_end()
+
+    EXT->>HTTP: GET /api/task_result/{task_name}
+ HTTP->>SM: get_results()
+ SM-->>HTTP: results
+ HTTP-->>EXT: 200 {results}
+```
+
+The green box highlights async execution on the device side, which doesn't block the server.
+
+### Multi-Device Constellation Workflow
+
+**Constellation Client Coordinating Multiple Devices:**
+
+```mermaid
+sequenceDiagram
+ participant CC as Constellation Client
+ participant Server as Agent Server
+ participant D1 as Device 1 (GPU)
+ participant D2 as Device 2 (CPU)
+
+ CC->>Server: REGISTER (constellation)
+ Server-->>CC: HEARTBEAT (OK)
+
+ Note over CC: Plan multi-device DAG
+
+ CC->>Server: TASK (target: device_1) Subtask 1: Image processing
+ Server->>D1: TASK (forward)
+
+ CC->>Server: TASK (target: device_2) Subtask 2: Data extraction
+ Server->>D2: TASK (forward)
+
+ par Parallel Execution
+ D1->>D1: Process image on GPU
+ D2->>D2: Extract data from DB
+ end
+
+ D1->>Server: COMMAND_RESULTS
+ Server->>CC: COMMAND_RESULTS (from device_1)
+
+ D2->>Server: COMMAND_RESULTS
+ Server->>CC: COMMAND_RESULTS (from device_2)
+
+ Note over CC: Combine results, Update DAG
+
+ D1->>Server: TASK_END
+ D2->>Server: TASK_END
+ Server->>CC: TASK_END (both tasks)
+```
+
+The server acts as a message router, forwarding tasks to target devices and routing results back to the constellation orchestrator. See [Constellation Documentation](../galaxy/overview.md) for more details on multi-device orchestration.
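+
+The routing role described above can be sketched as a toy in-memory router. This is only an illustration: `MessageRouter`, its method names, and the list-based inboxes are hypothetical stand-ins, not UFO's actual classes, and the real server forwards AIP messages over WebSocket connections.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class MessageRouter:
+    """Toy stand-in for the server's routing role (hypothetical, not UFO's class)."""
+
+    # Maps a device's client_id to its inbox (a stand-in for its WebSocket).
+    devices: dict[str, list[dict]] = field(default_factory=dict)
+    # Messages routed back to the constellation client.
+    constellation_inbox: list[dict] = field(default_factory=list)
+
+    def register_device(self, client_id: str) -> None:
+        self.devices[client_id] = []
+
+    def forward_task(self, target_id: str, task: dict) -> None:
+        """Forward a TASK message to the device named by target_id."""
+        if target_id not in self.devices:
+            raise KeyError(f"Target device '{target_id}' is not connected")
+        self.devices[target_id].append({"type": "TASK", **task})
+
+    def route_result(self, source_id: str, result: dict) -> None:
+        """Route a device's COMMAND_RESULTS back to the constellation."""
+        self.constellation_inbox.append(
+            {"type": "COMMAND_RESULTS", "from": source_id, **result}
+        )
+
+
+router = MessageRouter()
+router.register_device("device_1")
+router.register_device("device_2")
+router.forward_task("device_1", {"subtask": "Image processing"})
+router.forward_task("device_2", {"subtask": "Data extraction"})
+router.route_result("device_1", {"status": "ok"})
+```
+
+The router holds no task logic of its own: planning and result merging stay with the constellation client, which matches the diagram.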
+
+---
+
+## Platform Support
+
+The server automatically detects client platforms and creates appropriate session implementations.
+
+**Supported Platforms:**
+
+| Platform | Session Type | Capabilities | Status |
+|----------|--------------|--------------|--------|
+| **Windows** | `WindowsSession` | UI automation (UIA), COM API integration, native app control, screenshot capture | Full support |
+| **Linux** | `LinuxSession` | Bash automation, GUI tools (xdotool), package management, process control | Full support |
+| **macOS** | (Planned) | AppleScript, UI automation, native app control | 🚧 In development |
+
+**Platform Auto-Detection:**
+
+The server automatically detects the client's platform during registration. You can override this globally with the `--platform` flag when needed for testing or specific deployment scenarios.
+
+```bash
+python -m ufo.server.app --platform windows # Force Windows sessions
+python -m ufo.server.app --platform linux # Force Linux sessions
+python -m ufo.server.app # Auto-detect (default)
+```
+
+!!!warning "When to Use Platform Override"
+ Use `--platform` override when:
+ - Testing cross-platform sessions without actual devices
+ - Running server in container different from target platform
+ - Debugging platform-specific session behavior
+
+For more details on platform-specific implementations, see [Windows Agent](../linux/overview.md) and [Linux Agent](../linux/overview.md).
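+
+As a rough sketch of the precedence described above, platform selection could look like the following. The function name `resolve_platform`, the `reported` argument, and the fallback to the server's own OS are assumptions made for illustration; the actual server logic may differ.
+
+```python
+import platform
+from typing import Optional
+
+SUPPORTED_PLATFORMS = ("windows", "linux")
+
+
+def resolve_platform(override: Optional[str], reported: Optional[str]) -> str:
+    """Pick the session platform.
+
+    Assumed precedence: the --platform override, then the platform the
+    client reported at registration, then the server's own OS.
+    """
+    if override is not None:
+        choice = override.lower()
+    elif reported is not None:
+        choice = reported.lower()
+    else:
+        choice = platform.system().lower()
+    if choice not in SUPPORTED_PLATFORMS:
+        raise ValueError(f"Unsupported platform: {choice}")
+    return choice
+```
+
+With this precedence, `resolve_platform("linux", "windows")` yields a Linux session even though the client reported Windows, which is the behavior you want when testing cross-platform sessions without real devices.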
+
+---
+
+## Configuration
+
+The server runs out-of-the-box with sensible defaults. Advanced configuration inherits from UFO's central config system.
+
+### Command-Line Arguments
+
+```bash
+python -m ufo.server.app [OPTIONS]
+```
+
+**Available Options:**
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `--port` | int | 5000 | Server listening port |
+| `--host` | str | `0.0.0.0` | Bind address (use `127.0.0.1` for localhost only) |
+| `--platform` | str | auto | Force platform (`windows`, `linux`) |
+| `--log-level` | str | `WARNING` | Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`, `OFF`) |
+| `--local` | flag | False | Restrict to local connections only |
+
+!!!example "Configuration Examples"
+ ```bash
+ # Development: Local-only with debug logging
+ python -m ufo.server.app --local --log-level DEBUG --port 8000
+
+ # Production: External access, info logging
+ python -m ufo.server.app --host 0.0.0.0 --port 5000 --log-level INFO
+
+ # Testing: Force Linux sessions
+ python -m ufo.server.app --platform linux --port 9000
+ ```
+
+### UFO Configuration Inheritance
+
+The server uses UFO's central configuration from `config_dev.yaml`:
+
+| Config Section | Inherited Settings |
+|----------------|-------------------|
+| **Agent Strategies** | HostAgent, AppAgent, EvaluationAgent configurations |
+| **LLM Models** | Model endpoints, API keys, temperature settings |
+| **Automators** | UI automation, COM API, web automation configs |
+| **Logging** | Log file paths, rotation, format |
+| **Prompts** | Agent system prompts, example templates |
+
+See [Configuration Guide](../configuration/system/overview.md) for comprehensive config documentation.
+
+---
+
+## Monitoring & Operations
+
+### Health Monitoring
+
+Monitor server status and performance using HTTP endpoints.
+
+**Health Check Endpoints:**
+
+```bash
+# Server health and uptime
+curl http://localhost:5000/api/health
+
+# Response:
+# {
+# "status": "healthy",
+# "online_clients": [...]
+# }
+
+# Connected clients list
+curl http://localhost:5000/api/clients
+
+# Response:
+# {
+# "online_clients": ["windows_001", "linux_002", ...]
+# }
+```
+
+For comprehensive monitoring strategies including performance metrics collection, log aggregation patterns, alert configuration, and dashboard setup, see [Monitoring Guide](./monitoring.md).
+
+### Error Handling
+
+The server handles common failure scenarios gracefully to maintain system stability.
+
+**Disconnection Handling Matrix:**
+
+| Scenario | Server Detection | Automatic Action | Client Impact |
+|----------|-----------------|------------------|---------------|
+| **Device Disconnect** | Heartbeat timeout / WebSocket close | Cancel device sessions, notify constellation | Task fails, constellation retries |
+| **Constellation Disconnect** | Heartbeat timeout / WebSocket close | Continue device execution, skip callbacks | Device completes but results not delivered |
+| **Task Execution Failure** | `TASK_END` with error status | Log error, store in results | Client receives error via callback/HTTP |
+| **Network Partition** | Heartbeat timeout | Mark disconnected, enable reconnection | Client reconnects with same ID |
+| **Server Crash** | N/A | Clients detect via heartbeat | Clients reconnect to new instance |
+
+!!!note "Reconnection Support"
+ Clients can reconnect with the same `client_id`. The server will re-register the client and restore heartbeat monitoring, but **will not restore previous sessions** (sessions are ephemeral).
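+
+Heartbeat-based disconnect detection and re-registration can be pictured with a small tracker like the one below. It is a minimal sketch: the `HeartbeatMonitor` class, the 30-second timeout in the usage note, and the injectable clock are all assumptions for illustration and are not taken from the server's code.
+
+```python
+import time
+from typing import Callable, Dict, List
+
+
+class HeartbeatMonitor:
+    """Tracks the last heartbeat per client (hypothetical helper)."""
+
+    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
+        self.timeout_s = timeout_s
+        self.clock = clock
+        self.last_seen: Dict[str, float] = {}
+
+    def beat(self, client_id: str) -> None:
+        """Record a heartbeat; this also re-registers a reconnecting client."""
+        self.last_seen[client_id] = self.clock()
+
+    def expire(self) -> List[str]:
+        """Drop and return clients whose last heartbeat is older than the timeout."""
+        now = self.clock()
+        stale = [
+            cid for cid, seen in self.last_seen.items()
+            if now - seen > self.timeout_s
+        ]
+        for cid in stale:
+            del self.last_seen[cid]
+        return stale
+```
+
+A server loop might call `expire()` periodically with, say, `timeout_s=30.0`, and cancel the sessions of every returned `client_id`. A client that later reconnects with the same ID simply calls `beat()` again and is tracked as new, with no session state carried over.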
+
+---
+
+## Best Practices
+
+### Development Environment
+
+Optimize your development workflow with these recommended practices.
+
+**Recommended Development Configuration:**
+
+```bash
+# Isolate to localhost, enable detailed logging
+python -m ufo.server.app \
+ --host 127.0.0.1 \
+ --port 5000 \
+ --local \
+ --log-level DEBUG
+```
+
+**Development Checklist:**
+
+- Use `--local` flag to prevent external access
+- Enable `DEBUG` logging for detailed traces
+- Monitor logs in separate terminal: `tail -f logs/ufo_server.log`
+- Test with single device before adding multiple clients
+- Use HTTP API for quick task dispatch testing
+- Verify heartbeat monitoring with client disconnection
+
+!!!example "Development Testing Pattern"
+ ```bash
+ # Terminal 1: Start server with debug logging
+ python -m ufo.server.app --local --log-level DEBUG
+
+ # Terminal 2: Connect device client
+ python -m ufo.client.client --ws --ws-server ws://127.0.0.1:5000/ws --client-id windowsagent
+
+ # Terminal 3: Dispatch test task
+ curl -X POST http://127.0.0.1:5000/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{"client_id": "windowsagent", "request": "Open Notepad", "task_name": "test_001"}'
+ ```
+
+### Production Deployment
+
+The default configuration is **not production-ready**. Implement these security and reliability measures.
+
+**Production Architecture:**
+
+```mermaid
+graph LR
+ Internet[Internet]
+ LB[Load Balancer nginx/HAProxy]
+ SSL[SSL/TLS Termination]
+
+ subgraph "UFO Server Cluster"
+ S1[Server Instance 1 :5000]
+ S2[Server Instance 2 :5001]
+ S3[Server Instance 3 :5002]
+ end
+
+ Monitor[Monitoring Prometheus/Grafana]
+ PM[Process Manager systemd/PM2]
+
+ Internet --> LB
+ LB --> SSL
+ SSL --> S1
+ SSL --> S2
+ SSL --> S3
+
+ PM -.Manages.-> S1
+ PM -.Manages.-> S2
+ PM -.Manages.-> S3
+
+ S1 -.Metrics.-> Monitor
+ S2 -.Metrics.-> Monitor
+ S3 -.Metrics.-> Monitor
+
+ style LB fill:#ffe1f5
+ style SSL fill:#fff4e1
+ style Monitor fill:#f0ffe1
+```
+
+**Production Checklist:**
+
+| Category | Recommendation | Rationale |
+|----------|---------------|-----------|
+| **Reverse Proxy** | nginx, Apache, or cloud load balancer | SSL termination, rate limiting, DDoS protection |
+| **SSL/TLS** | Enable WSS (WebSocket Secure) | Encrypt client-server communication |
+| **Authentication** | Add auth middleware to FastAPI | Prevent unauthorized access |
+| **Process Management** | systemd (Linux), PM2 (Node.js), Docker | Auto-restart on crash, resource limits |
+| **Monitoring** | `/api/health` polling, metrics export | Detect issues proactively |
+| **Logging** | Structured logging, log aggregation (ELK) | Centralized debugging and audit trails |
+| **Resource Limits** | Set max connections, memory limits | Prevent resource exhaustion |
+
+**Example Nginx Configuration:**
+
+```nginx
+upstream ufo_server {
+ server localhost:5000;
+}
+
+server {
+ listen 443 ssl;
+ server_name ufo-server.example.com;
+
+ ssl_certificate /path/to/cert.pem;
+ ssl_certificate_key /path/to/key.pem;
+
+ # WebSocket endpoint
+ location /ws {
+ proxy_pass http://ufo_server;
+ proxy_http_version 1.1;
+ proxy_set_header Upgrade $http_upgrade;
+ proxy_set_header Connection "upgrade";
+ proxy_set_header Host $host;
+ proxy_read_timeout 3600s;
+ }
+
+ # HTTP API
+ location /api/ {
+ proxy_pass http://ufo_server;
+ proxy_set_header Host $host;
+ }
+}
+```
+
+
+
+### Scaling Strategies
+
+The server can scale horizontally for high-load deployments, but requires careful session management.
+
+**Scaling Patterns:**
+
+| Pattern | Description | Use Case | Considerations |
+|---------|-------------|----------|----------------|
+| **Vertical** | Increase CPU/RAM on single instance | < 100 concurrent clients | Simplest, no session distribution |
+| **Horizontal (Sticky Sessions)** | Multiple instances with session affinity | 100-1000 clients | Load balancer routes same client to same instance |
+| **Horizontal (Shared State)** | Multiple instances with Redis | > 1000 clients | Requires session state externalization |
+
+!!!warning "Current Limitation"
+ The current implementation stores session state in-memory. For horizontal scaling, use **sticky sessions** (client affinity) in your load balancer to route clients to consistent server instances. **Future**: Shared state backend (Redis) for true stateless horizontal scaling.
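+
+Client affinity is normally configured in the load balancer itself, but the idea can be sketched in a few lines: hash the `client_id` and use it to pick an instance, so the same client always lands on the same server. The function name `pick_instance` and the instance addresses in the usage note are hypothetical.
+
+```python
+import hashlib
+from typing import Sequence
+
+
+def pick_instance(client_id: str, instances: Sequence[str]) -> str:
+    """Map a client_id to one server instance deterministically."""
+    if not instances:
+        raise ValueError("No server instances configured")
+    # A stable hash (unlike Python's built-in hash()) gives the same
+    # answer across processes and restarts.
+    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
+    index = int.from_bytes(digest[:8], "big") % len(instances)
+    return instances[index]
+```
+
+For example, `pick_instance("windows_001", ["127.0.0.1:5000", "127.0.0.1:5001", "127.0.0.1:5002"])` returns the same address on every call. Note that plain modulo hashing remaps most clients whenever the instance list changes; consistent hashing reduces that churn if instances are added or removed often.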
+
+
+
+---
+
+## Troubleshooting
+
+### Common Issues
+
+**Issue: Clients Can't Connect**
+
+```bash
+# Symptom: connection refused
+#   Error: WebSocket connection to 'ws://localhost:5000/ws' failed
+
+# Diagnosis:
+# 1. Check that the server is running
+curl http://localhost:5000/api/health
+# 2. Verify that the port is listening
+netstat -an | grep 5000
+# 3. Check the firewall
+sudo ufw status
+
+# Solution: start the server with the correct host binding
+python -m ufo.server.app --host 0.0.0.0 --port 5000
+```
+
+**Issue: Sessions Not Executing**
+
+```bash
+# Symptom: task dispatched but no results
+
+# Diagnosis:
+# 1. Check the server logs for errors
+# 2. Verify that the client is connected
+curl http://localhost:5000/api/clients
+# 3. Check that target_id matches client_id
+
+# Solution: ensure client_id in the request matches the registered client
+curl -X POST http://localhost:5000/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{"client_id": "correct_client_id", "request": "test", "task_name": "test_001"}'
+```
+
+**Issue: Memory Leak / High Memory Usage**
+
+```bash
+# Symptom: server memory grows over time
+
+# Diagnosis:
+# 1. Check the logs for session cleanup
+# 2. Monitor /api/health for session count
+# 3. Profile with memory_profiler
+
+# Solution:
+# - Ensure clients send TASK_END to complete sessions
+# - Restart the server periodically (for example, via systemd)
+# - Implement a session timeout (future feature)
+```
+
+### Debug Mode
+
+!!!example "Enable Maximum Verbosity"
+ ```bash
+ # Ultra-verbose debugging
+ python -m ufo.server.app \
+ --log-level DEBUG \
+ --local \
+ --port 5000 2>&1 | tee debug.log
+
+ # Watch logs in real-time
+ tail -f debug.log | grep -E "(ERROR|WARNING|Session|WebSocket)"
+ ```
+
+---
+
+## Documentation Map
+
+Explore related documentation to deepen your understanding of the Agent Server ecosystem.
+
+### Getting Started
+
+| Document | Purpose |
+|----------|---------|
+| [Quick Start](./quick_start.md) | Get server running in < 5 minutes |
+| [Client Registration](./quick_start.md) | How clients connect to server |
+
+### Architecture & Components
+
+| Document | Purpose |
+|----------|---------|
+| [Session Manager](./session_manager.md) | Task execution lifecycle deep-dive |
+| [Client Connection Manager](./client_connection_manager.md) | Connection registry internals |
+| [WebSocket Handler](./websocket_handler.md) | AIP protocol message handling |
+| [HTTP API](./api.md) | REST endpoint specifications |
+
+### Operations
+
+| Document | Purpose |
+|----------|---------|
+| [Monitoring](./monitoring.md) | Health checks, metrics, alerting |
+
+### Related Documentation
+
+| Document | Purpose |
+|----------|---------|
+| [AIP Protocol](../aip/overview.md) | Communication protocol specification |
+| [Agent Architecture](../infrastructure/agents/overview.md) | Agent design and FSM framework |
+| [Server-Client Architecture](../infrastructure/agents/server_client_architecture.md) | Distributed architecture rationale |
+| [Client Overview](../client/overview.md) | Device client architecture |
+| [MCP Integration](../mcp/overview.md) | Model Context Protocol tool servers |
+
+---
+
+## Next Steps
+
+Follow this recommended sequence to master the Agent Server:
+
+**1. Run the Server** (5 minutes)
+- Follow the [Quick Start Guide](./quick_start.md)
+- Verify server responds to `/api/health`
+
+**2. Connect a Client** (10 minutes)
+- Use [Device Client](../client/quick_start.md)
+- Verify registration in server logs
+- Check `/api/clients` endpoint
+
+**3. Dispatch Tasks** (15 minutes)
+- Use [HTTP API](./api.md) to send tasks
+- Retrieve results via `/api/task_result/{task_name}`
+- Observe WebSocket message flow in logs
+
+**4. Understand Architecture** (30 minutes)
+- Read [Session Manager](./session_manager.md) internals
+- Study [WebSocket Handler](./websocket_handler.md) protocol implementation
+- Review [AIP Protocol](../aip/overview.md) message types
+
+**5. Deploy to Production** (varies)
+- Set up reverse proxy (nginx)
+- Configure SSL/TLS
+- Implement monitoring
+- Test failover scenarios
+
+
\ No newline at end of file
diff --git a/documents/docs/server/quick_start.md b/documents/docs/server/quick_start.md
new file mode 100644
index 000000000..19392c6de
--- /dev/null
+++ b/documents/docs/server/quick_start.md
@@ -0,0 +1,606 @@
+# Quick Start
+
+This hands-on guide walks you through starting the UFO Agent Server, connecting clients, and dispatching your first task. Perfect for first-time users.
+
+---
+
+## 📋 Prerequisites
+
+Before you begin, ensure you have:
+
+- **Python 3.10+** installed
+- **UFO dependencies** installed (`pip install -r requirements.txt`)
+- **Network connectivity** for WebSocket connections
+- **Terminal access** (PowerShell, bash, or equivalent)
+
+| Component | Minimum Version | Recommended |
+|-----------|----------------|-------------|
+| Python | 3.10 | 3.11+ |
+| FastAPI | 0.104+ | Latest |
+| Uvicorn | 0.24+ | Latest |
+| UFO | - | Latest commit |
+
+---
+
+## 🚀 Starting the Server
+
+### Basic Startup
+
+Start the server with default settings (port **5000**):
+
+```bash
+python -m ufo.server.app
+```
+
+**Expected Output:**
+
+```console
+2024-11-04 14:30:22 - ufo.server.app - INFO - Starting UFO Server on 0.0.0.0:5000
+2024-11-04 14:30:22 - ufo.server.app - INFO - Platform: auto-detected
+2024-11-04 14:30:22 - ufo.server.app - INFO - Log level: WARNING
+INFO: Started server process [12345]
+INFO: Waiting for application startup.
+INFO: Application startup complete.
+INFO: Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
+```
+
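+Once you see "Uvicorn running", the server is ready to accept WebSocket connections on the `/ws` path of port 5000. `0.0.0.0` is the bind address (all interfaces), so clients connect using a reachable address, for example `ws://127.0.0.1:5000/ws` from the same machine.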
+
+### Configuration Options
+
+| Argument | Type | Default | Description | Example |
+|----------|------|---------|-------------|---------|
+| `--port` | int | `5000` | Server listening port | `--port 8080` |
+| `--host` | str | `0.0.0.0` | Bind address (0.0.0.0 = all interfaces) | `--host 127.0.0.1` |
+| `--platform` | str | `auto` | Platform override (`windows`, `linux`) | `--platform windows` |
+| `--log-level` | str | `WARNING` | Logging verbosity | `--log-level DEBUG` |
+| `--local` | flag | `False` | Restrict to localhost connections only | `--local` |
+
+**Common Startup Configurations:**
+
+**Development (Local Only):**
+```bash
+python -m ufo.server.app --local --log-level DEBUG
+```
+- Accepts connections only from `localhost`
+- Verbose debug logging
+- Default port 5000
+
+**Custom Port:**
+```bash
+python -m ufo.server.app --port 8080
+```
+- Useful if port 5000 is already in use
+- Accessible from network
+
+**Production (Linux):**
+```bash
+python -m ufo.server.app --port 5000 --platform linux --log-level WARNING
+```
+- Explicit platform specification
+- Reduced logging for performance
+- Baseline flags for a production host; add a reverse proxy, TLS, and authentication before exposing it
+
+**Specific Interface Binding:**
+```bash
+python -m ufo.server.app --host 192.168.1.100 --port 5000
+```
+- Binds to specific network interface
+- Useful for multi-homed servers
+
+---
+
+## 🖥️ Connecting Device Clients
+
+A Device Client is an agent running on a physical or virtual machine that can execute tasks. Each device connects via WebSocket and registers with a unique `client_id`.
+
+Once the server is running, connect device agents using the command line:
+
+### Platform-Specific Commands
+
+**Windows Device:**
+```bash
+python -m ufo.client.client --ws --ws-server ws://127.0.0.1:5000/ws --client-id my_windows_device
+```
+
+**Linux Device:**
+```bash
+python -m ufo.client.client --ws --ws-server ws://127.0.0.1:5000/ws --client-id my_linux_device --platform linux
+```
+
+When a client connects successfully, the server logs will display:
+```console
+INFO: [WS] 📱 Device client my_windows_device connected
+```
+
+### Client Connection Parameters
+
+| Parameter | Required | Type | Description | Example |
+|-----------|----------|------|-------------|---------|
+| `--ws` | Yes | flag | Enable WebSocket mode (vs. local mode) | `--ws` |
+| `--ws-server` | Yes | URL | Server WebSocket endpoint | `ws://127.0.0.1:5000/ws` |
+| `--client-id` | Yes | string | Unique device identifier (must be unique across all clients) | `device_win_001` |
+| `--platform` | ⚠️ Optional | string | Platform type: `windows`, `linux` | `--platform windows` |
+
+!!!warning "Important: Client ID Uniqueness"
+ Each `client_id` must be globally unique. If a client connects with an existing ID, the old connection will be terminated.
+
+!!!tip "Platform Auto-Detection"
+ If you don't specify `--platform`, the client will auto-detect the operating system. However, **explicit specification is recommended** for clarity.
+
+### Registration Protocol Flow
+
+```mermaid
+sequenceDiagram
+ participant C as Device Client
+ participant S as Agent Server
+
+ C->>S: WebSocket CONNECT /ws
+ S-->>C: Connection accepted
+
+ C->>S: REGISTER {client_id, platform}
+ S->>S: Validate & register
+ S-->>C: REGISTER_CONFIRM
+
+ Note over C: Client Ready
+```
+
+The registration process uses the **Agent Interaction Protocol (AIP)** for structured communication. See [AIP Documentation](../aip/overview.md) for details.
+
+---
+
+## 🌌 Connecting Constellation Clients
+
+A Constellation Client is an orchestrator that coordinates multi-device tasks. It connects to the server and can dispatch work across multiple registered device clients.
+
+### Basic Constellation Connection
+
+```bash
+python -m galaxy.constellation.constellation --ws --ws-server ws://127.0.0.1:5000/ws --target-id my_windows_device
+```
+
+### Constellation Parameters
+
+| Parameter | Required | Description | Example |
+|-----------|----------|-------------|---------|
+| `--ws` | Yes | Enable WebSocket mode | `--ws` |
+| `--ws-server` | Yes | Server WebSocket URL | `ws://127.0.0.1:5000/ws` |
+| `--target-id` | ⚠️ Optional | Initial target device ID for tasks | `my_windows_device` |
+
+!!!danger "Important: Target Device Must Be Online"
+ If you specify `--target-id`, that device **must already be connected** to the server. Otherwise, registration will fail with: `Target device 'my_windows_device' is not connected`
+
+A constellation can dynamically dispatch tasks to different devices, not just the `target-id`. For more on multi-device orchestration, see [Constellation Documentation](../galaxy/overview.md).
+
+---
+
+## Verifying the Setup
+
+### Method 1: Check Connected Clients
+
+Use the HTTP API to verify connections:
+
+```bash
+curl http://localhost:5000/api/clients
+```
+
+**Expected Response:**
+
+```json
+{
+ "online_clients": ["my_windows_device", "my_linux_device"]
+}
+```
+
+If you see your `client_id` in the list, the device is successfully connected and ready to receive tasks.
+
+### Method 2: Health Check
+
+```bash
+curl http://localhost:5000/api/health
+```
+
+**Expected Response:**
+
+```json
+{
+ "status": "healthy",
+ "online_clients": ["my_windows_device"]
+}
+```
+
+The `/api/health` endpoint is useful for health checks in production monitoring systems.
+
+---
+
+## 🎯 Dispatching Your First Task
+
+The easiest way to send a task to a connected device is through the HTTP `/api/dispatch` endpoint.
+
+### Basic Task Dispatch
+
+Use the HTTP API to dispatch a task to a connected device:
+
+```bash
+curl -X POST http://localhost:5000/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{
+ "client_id": "my_windows_device",
+ "request": "Open Notepad and type Hello World",
+ "task_name": "test_task_001"
+ }'
+```
+
+**Request Body Parameters:**
+
+| Field | Required | Type | Description | Example |
+|-------|----------|------|-------------|---------|
+| `client_id` | Yes | string | Target device identifier | `"my_windows_device"` |
+| `request` | Yes | string | Natural language task description | `"Open Notepad"` |
+| `task_name` | ⚠️ Optional | string | Unique task identifier (auto-generated if omitted) | `"task_001"` |
+
+**Successful Response:**
+
+```json
+{
+ "status": "dispatched",
+ "task_name": "test_task_001",
+ "client_id": "my_windows_device",
+ "session_id": "3f4a2b1c-9d8e-4f3a-b2c1-9a8b7c6d5e4f"
+}
+```
+
+A `status` of `"dispatched"` means the task was sent to the device, which begins executing immediately.
+
+!!!warning "Client Must Be Online"
+ If the target `client_id` is not connected, you'll receive `{"detail": "Client not online"}`. Use `/api/clients` to verify the device is connected first.
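+
+If you would rather dispatch from Python than from `curl`, the same request can be built with the standard library. This is a sketch under assumptions: the helper name `build_dispatch_request` is hypothetical, and only the endpoint path and body fields come from the documentation above.
+
+```python
+import json
+import urllib.request
+from typing import Optional
+
+
+def build_dispatch_request(
+    base_url: str,
+    client_id: str,
+    request: str,
+    task_name: Optional[str] = None,
+) -> urllib.request.Request:
+    """Build (but do not send) a POST /api/dispatch request."""
+    if not request.strip():
+        raise ValueError("request must not be empty")
+    payload = {"client_id": client_id, "request": request}
+    if task_name is not None:
+        # task_name is optional; the server generates one if omitted.
+        payload["task_name"] = task_name
+    return urllib.request.Request(
+        url=f"{base_url.rstrip('/')}/api/dispatch",
+        data=json.dumps(payload).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+req = build_dispatch_request(
+    "http://localhost:5000",
+    "my_windows_device",
+    "Open Notepad and type Hello World",
+    "test_task_001",
+)
+# To send it (requires a running server and a connected device):
+#   with urllib.request.urlopen(req) as resp:
+#       print(json.load(resp))
+```
+
+Separating request construction from sending keeps the payload easy to inspect and test without a live server.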
+
+### Task Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant API as HTTP Client
+ participant S as Server
+ participant D as Device
+
+ API->>S: POST /api/dispatch
+ S->>D: TASK (AIP)
+ D->>D: Execute task
+ D->>S: TASK_RESULT
+ API->>S: GET /api/task_result/{task_name}
+ S->>API: Results
+```
+
+For detailed API specifications, see [HTTP API Reference](./api.md).
+
+### Checking Task Results
+
+Use the task name to retrieve results:
+
+```bash
+curl http://localhost:5000/api/task_result/test_task_001
+```
+
+**While Task is Running:**
+
+```json
+{
+ "status": "pending"
+}
+```
+
+**When Task Completes:**
+
+```json
+{
+ "status": "done",
+ "result": {
+ "action_taken": "Opened Notepad and typed 'Hello World'",
+ "screenshot": "base64_encoded_image...",
+ "observation": "Task completed successfully"
+ }
+}
+```
+
+!!!tip "Polling Best Practice"
+ For long-running tasks, poll every 2-5 seconds. Most simple tasks complete within 10-30 seconds.
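+
+The polling advice above can be wrapped in a small helper. The sketch below assumes only what the responses in this section show, namely that a running task reports `"status": "pending"`; the helper name `poll_until_done` and its parameters are hypothetical. The HTTP call is passed in as `fetch` so the loop stays independent of any particular client library.
+
+```python
+import time
+from typing import Callable, Dict
+
+
+def poll_until_done(
+    fetch: Callable[[], Dict],
+    interval_s: float = 2.0,
+    timeout_s: float = 120.0,
+    sleep: Callable[[float], None] = time.sleep,
+) -> Dict:
+    """Call fetch() until the status is no longer "pending" or time runs out."""
+    deadline = time.monotonic() + timeout_s
+    while True:
+        result = fetch()
+        if result.get("status") != "pending":
+            return result
+        if time.monotonic() >= deadline:
+            raise TimeoutError("Task did not finish before the timeout")
+        sleep(interval_s)
+```
+
+In practice `fetch` would issue `GET /api/task_result/test_task_001` and return the parsed JSON body. The timeout guards against polling forever if the device disconnects mid-task.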
+
+### Advanced Task Dispatch
+
+**Complex Multi-Step Task:**
+```bash
+curl -X POST http://localhost:5000/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{
+ "client_id": "my_windows_device",
+ "request": "Open Excel, create a new worksheet, and enter sales data for Q4 2024",
+ "task_name": "excel_q4_report"
+ }'
+```
+
+**Web Automation Task:**
+```bash
+curl -X POST http://localhost:5000/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{
+ "client_id": "my_windows_device",
+ "request": "Open Chrome, navigate to GitHub.com, and search for UFO framework",
+ "task_name": "github_search"
+ }'
+```
+
+**File Management Task:**
+```bash
+curl -X POST http://localhost:5000/api/dispatch \
+ -H "Content-Type: application/json" \
+ -d '{
+ "client_id": "my_linux_device",
+ "request": "Create a folder named test_data and copy all .txt files from Documents",
+ "task_name": "file_organization"
+ }'
+```
+
+---
+
+## 🐛 Common Issues & Troubleshooting
+
+### Issue 1: Port Already in Use
+
+**Symptoms:**
+```console
+ERROR: [Errno 98] Address already in use
+```
+
+**Cause:** Another process is already using port 5000.
+
+**Solutions:**
+
+**Use Different Port:**
+```bash
+python -m ufo.server.app --port 8080
+```
+
+**Find & Kill Process (Linux/Mac):**
+```bash
+# Find process using port 5000
+lsof -i :5000
+
+# Kill the process
+kill -9 <PID>
+```
+
+**Find & Kill Process (Windows):**
+```powershell
+# Find process using port 5000
+netstat -ano | findstr :5000
+
+# Kill the process
+taskkill /PID <PID> /F
+```
+
+### Issue 2: Connection Refused
+
+**Symptoms:**
+```console
+[WS] Failed to connect to ws://127.0.0.1:5000/ws
+Connection refused
+```
+
+**Diagnosis Checklist:**
+
+- Is the server actually running? Check for "Uvicorn running" message
+- Does the port match in both server and client commands?
+- Are you using `--local` mode? If yes, clients must connect from `localhost`
+- Is there a firewall blocking the connection?
+
+**Solutions:**
+
+1. Verify server is running:
+ ```bash
+ curl http://localhost:5000/api/health
+ ```
+
+2. Check server logs for startup errors
+
+3. If using `--local` mode, ensure client uses `127.0.0.1`
+
+4. If connecting from another machine, remove `--local` flag
+
+### Issue 3: Device Not Connected Error
+
+**Symptoms:**
+When dispatching a task:
+```json
+{
+ "detail": "Client not online"
+}
+```
+
+**Diagnosis:**
+
+1. List all connected clients:
+ ```bash
+ curl http://localhost:5000/api/clients
+ ```
+
+2. Check the `client_id` matches exactly (case-sensitive!)
+
+**Solutions:**
+
+- Verify the device client is running and successfully registered
+- Check server logs for `📱 Device client <client_id> connected`
+- Ensure no typos in `client_id` when dispatching
+- If the device disconnected, restart the client connection
+
+### Issue 4: Empty Task Content Error
+
+**Symptoms:**
+```json
+{
+ "detail": "Empty task content"
+}
+```
+
+**Cause:** The `request` field in `/api/dispatch` is missing or empty.
+
+**Solution:** Always include the `request` field with a task description.
+
+### Issue 5: Firewall Blocking Connections
+ **Symptoms:** Clients on other machines cannot connect, but `curl localhost:5000/api/health` works on server machine.
+
+ **Diagnosis:**
+
+ 1. **Check server is listening on all interfaces:**
+ ```bash
+ # Should show 0.0.0.0:5000 (not 127.0.0.1:5000)
+ netstat -tuln | grep 5000
+ ```
+
+ 2. **Test from remote machine:**
+ ```bash
+ curl http://<server-ip>:5000/api/health
+ ```
+
+ **Solutions:**
+
+ **Windows Firewall:**
+ ```powershell
+ # Allow incoming connections on port 5000
+ New-NetFirewallRule -DisplayName "UFO Server" `
+ -Direction Inbound `
+ -Protocol TCP `
+ -LocalPort 5000 `
+ -Action Allow
+ ```
+
+ **Linux (ufw):**
+ ```bash
+ sudo ufw allow 5000/tcp
+ sudo ufw reload
+ ```
+
+ **Linux (firewalld):**
+ ```bash
+ sudo firewall-cmd --zone=public --add-port=5000/tcp --permanent
+ sudo firewall-cmd --reload
+ ```
+
+### Issue 6: Target Device Not Connected (Constellation)
+
+**Symptoms:**
+```console
+Target device 'my_windows_device' is not connected
+```
+
+**Solution:**
+
+1. Connect the device client first
+2. Wait for registration confirmation (check server logs)
+3. Then connect constellation
+
+!!!tip "Debug Mode"
+ For maximum verbosity, start the server with: `python -m ufo.server.app --log-level DEBUG`
+
+---
+
+## 📚 Next Steps
+
+Now that you have the server running and can dispatch tasks, explore these topics:
+
+### Immediate Next Steps
+
+| Step | Topic | Time | Description |
+|------|-------|------|-------------|
+| 1️⃣ | [Server Architecture](./overview.md) | 10 min | Understand the three-tier architecture and component interactions |
+| 2️⃣ | [HTTP API Reference](./api.md) | 15 min | Explore all available API endpoints for integration |
+| 3️⃣ | [Client Setup Guide](../client/quick_start.md) | 10 min | Learn advanced client configuration options |
+| 4️⃣ | [AIP Protocol](../aip/overview.md) | 20 min | Deep dive into the Agent Interaction Protocol |
+
+### Advanced Topics
+
+| Topic | Relevance | Link |
+|-------|-----------|------|
+| **Session Management** | Understanding task lifecycle and state | [Session Manager](./session_manager.md) |
+| **WebSocket Handler** | Low-level connection handling | [WebSocket Handler](./websocket_handler.md) |
+| **Monitoring & Operations** | Production deployment best practices | [Monitoring](./monitoring.md) |
+| **Constellation Mode** | Multi-device orchestration | [Constellation Overview](../galaxy/overview.md) |
+
+---
+
+## 🚀 Production Deployment
+
+!!!warning "Production Readiness Checklist"
+ Before deploying to production, ensure you address these critical areas:
+
+### 1. Process Management
+
+!!!example "Systemd Service (Linux)"
+ Create `/etc/systemd/system/ufo-server.service`:
+
+ ```ini
+ [Unit]
+ Description=UFO Agent Server
+ After=network.target
+
+ [Service]
+ Type=simple
+ User=ufo
+ WorkingDirectory=/opt/ufo
+ Environment="PATH=/opt/ufo/venv/bin"
+ ExecStart=/opt/ufo/venv/bin/python -m ufo.server.app --port 5000 --log-level INFO
+ Restart=always
+ RestartSec=10
+ StandardOutput=journal
+ StandardError=journal
+
+ [Install]
+ WantedBy=multi-user.target
+ ```
+
+ **Enable and start:**
+ ```bash
+ sudo systemctl daemon-reload
+ sudo systemctl enable ufo-server
+ sudo systemctl start ufo-server
+ sudo systemctl status ufo-server
+ ```
+
+**PM2 Process Manager (Cross-Platform):**
+```bash
+# Install PM2
+npm install -g pm2
+
+# Start server with PM2
+pm2 start "python -m ufo.server.app --port 5000" --name ufo-server
+
+# Setup auto-restart on system boot
+pm2 startup
+pm2 save
+
+# Monitor
+pm2 logs ufo-server
+pm2 monit
+```
+
+For complete production deployment guidance including SSL/TLS, security hardening, and scaling strategies, see [Server Overview - Production Deployment](./overview.md#production-deployment).
+
+---
+
+## 🎓 What You Learned
+
+You've successfully:
+
+- Started the UFO Agent Server with custom configurations
+- Connected device and constellation clients via WebSocket
+- Dispatched tasks using the HTTP API
+- Verified connections and monitored health
+- Troubleshot common issues
+- Learned production deployment best practices
+
+Continue your journey with:
+
+- **Architecture Deep Dive**: [Server Overview](./overview.md)
+- **API Exploration**: [HTTP API Reference](./api.md)
+- **Client Development**: [Client Documentation](../client/overview.md)
+- **Multi-Device Coordination**: [Constellation Overview](../galaxy/overview.md)
+
\ No newline at end of file
diff --git a/documents/docs/server/session_manager.md b/documents/docs/server/session_manager.md
new file mode 100644
index 000000000..4d8035282
--- /dev/null
+++ b/documents/docs/server/session_manager.md
@@ -0,0 +1,1449 @@
+# Session Manager
+
+The **SessionManager** orchestrates agent session lifecycles, coordinates background task execution, and maintains execution state across the server. It serves as the "execution engine" that powers UFO's autonomous task capabilities.
+
+For context on how this component fits into the server architecture, see the [Server Overview](overview.md).
+
+---
+
+## 🎯 Overview
+
+The SessionManager is a critical server component that bridges task dispatch and actual execution:
+
+| Capability | Description | Benefit |
+|------------|-------------|---------|
+| **Platform-Agnostic Creation** | Automatically creates Windows/Linux sessions | No manual platform handling needed |
+| **Background Execution** | Tasks run without blocking WebSocket event loop | Maintains connection health during long tasks |
+| **State Tracking** | Monitors session lifecycle (created → running → completed/failed) | Enables task monitoring & result retrieval |
+| **Graceful Cancellation** | Handles disconnections with context-aware cleanup | Prevents orphaned tasks & resource leaks |
+| **Concurrent Management** | Multiple sessions can run simultaneously | Supports multi-device orchestration |
+
+### Architecture Position
+
+```mermaid
+graph TB
+ subgraph "Agent Server"
+ WH[WebSocket Handler]
+ SM[Session Manager]
+ SF[Session Factory]
+
+ subgraph "Sessions"
+ WS[Windows Service Session]
+ LS[Linux Service Session]
+ LOC[Local Session]
+ end
+ end
+
+ Client[Device Client] -->|WebSocket| WH
+ WH -->|"execute_task_async()"| SM
+ SM -->|"create session"| SF
+ SF -->|"platform=windows"| WS
+ SF -->|"platform=linux"| LS
+ SF -->|"local=true"| LOC
+
+ WS -->|"execute commands"| Client
+ LS -->|"execute commands"| Client
+
+ SM -->|"callback(result)"| WH
+ WH -->|"TASK_END message"| Client
+
+ style SM fill:#ffecb3
+ style SF fill:#c8e6c9
+ style WH fill:#bbdefb
+```
+
+**Why Background Execution Matters:**
+
+Without background execution, a long-running task (e.g., 5-minute workflow) would **block the WebSocket event loop**, preventing:
+
+- Heartbeat messages from being sent/received
+- Ping/pong frames from maintaining the connection
+- Other clients' tasks from being dispatched
+
+Background execution solves this by using Python's `asyncio.create_task()` to run sessions concurrently.
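+
+The effect can be reproduced outside the server with a minimal, self-contained sketch. The names `long_task`, `heartbeat`, and `demo` are hypothetical stand-ins rather than UFO APIs, and `asyncio.sleep()` simulates a long-running session:
+
+```python
+import asyncio
+
+
+async def long_task(duration: float) -> str:
+    """Hypothetical stand-in for a long-running session."""
+    await asyncio.sleep(duration)
+    return "done"
+
+
+async def heartbeat(beats: list, interval: float, count: int) -> None:
+    """Hypothetical stand-in for heartbeat handling; records each beat."""
+    for i in range(count):
+        await asyncio.sleep(interval)
+        beats.append(i)
+
+
+async def demo() -> tuple:
+    beats: list = []
+    # Schedule the long task in the background; create_task() returns at once.
+    task = asyncio.create_task(long_task(0.5))
+    # The event loop stays free, so heartbeats run while the task is pending.
+    await heartbeat(beats, interval=0.02, count=3)
+    pending_during_heartbeats = not task.done()
+    result = await task
+    return beats, pending_during_heartbeats, result
+```
+
+Running `asyncio.run(demo())` shows all three heartbeats completing while the background task is still pending, which is the property the WebSocket handler relies on.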
+
+---
+
+## 🏗 Core Functionality
+
+### Session Creation
+
+The SessionManager uses the **SessionFactory** pattern to create platform-specific session implementations. This abstraction layer automatically selects the correct session type based on platform and mode.
+
+**Creating a Session:**
+
+```python
+session = session_manager.get_or_create_session(
+ session_id="session_abc123",
+ task_name="create_file",
+ request="Open Notepad and create a file",
+ task_protocol=task_protocol, # AIP TaskExecutionProtocol instance
+ platform_override="windows" # or "linux" or None (auto-detect)
+)
+```
+
+**Session Types:**
+
+| Session Type | Use Case | Platform | Dispatcher | MCP Tools |
+|--------------|----------|----------|------------|-----------|
+| **ServiceSession (Windows)** | Remote Windows device | Windows | AIP protocol-based | Windows MCP servers |
+| **LinuxServiceSession** | Remote Linux device | Linux | AIP protocol-based | Linux MCP servers |
+| **Local Session** | Local testing/debugging | Any | Direct execution | Local MCP servers |
+
+**Platform Detection:**
+
+If `platform_override=None`, the SessionManager uses Python's `platform.system()` to auto-detect:
+
+- `"Windows"` → ServiceSession (Windows)
+- `"Linux"` → LinuxServiceSession
+- `"Darwin"` (macOS) → Currently uses LinuxServiceSession
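+
+The mapping above can be sketched as a small helper. `resolve_platform` and its `detected` parameter are hypothetical names introduced for illustration, and treating every non-Windows system as Linux is an assumption that generalizes the macOS rule:
+
+```python
+import platform
+from typing import Optional
+
+
+def resolve_platform(platform_override: Optional[str] = None,
+                     detected: Optional[str] = None) -> str:
+    """Return "windows" or "linux" following the rules listed above.
+
+    `detected` exists only to make the sketch testable; by default the
+    value comes from platform.system().
+    """
+    if platform_override:
+        return platform_override.lower()
+    system = (detected or platform.system()).lower()
+    # Assumption: anything that is not Windows (including "darwin")
+    # falls through to the Linux session type.
+    return "windows" if system == "windows" else "linux"
+```
+
+For example, `resolve_platform(None, detected="Darwin")` yields `"linux"`, matching the current macOS behavior.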
+
+**Session Factory Logic Flow:**
+
+```mermaid
+graph LR
+ A[get_or_create_session] --> B{Session exists?}
+ B -->|Yes| C[Return existing]
+ B -->|No| D{local mode?}
+ D -->|Yes| E[Create Local Session]
+ D -->|No| F{Platform?}
+ F -->|windows| G[ServiceSession]
+ F -->|linux| H[LinuxServiceSession]
+ E --> I[Store in sessions dict]
+ G --> I
+ H --> I
+ I --> J[Return session]
+
+ style D fill:#ffe0b2
+ style F fill:#ffe0b2
+ style I fill:#c8e6c9
+```
+
+### Background Execution
+
+The key design choice in the SessionManager is background task execution using `asyncio.create_task()`. This prevents long-running sessions from blocking the WebSocket event loop.
+
+**Execute Task Asynchronously:**
+
+```python
+await session_manager.execute_task_async(
+ session_id=session_id,
+ task_name=task_name,
+ request=user_request,
+ task_protocol=task_protocol, # AIP TaskExecutionProtocol instance
+ platform_override="windows",
+ callback=result_callback # Called when task completes
+)
+```
+
+**Benefits of Background Execution:**
+
+| Benefit | Description | Impact |
+|---------|-------------|--------|
+| **WebSocket Health** | Ping/pong continues uninterrupted | Prevents connection timeouts (30-60s) |
+| **Heartbeat Flow** | Heartbeat messages can be sent/received | Maintains connection liveness |
+| **Concurrent Sessions** | Multiple sessions run simultaneously | Supports multi-device orchestration |
+| **Event Loop Availability** | Handler can process other messages | Responsive to new connections/dispatches |
+| **Graceful Cancellation** | Tasks can be cancelled mid-execution | Clean disconnection handling |
+
+**Background Execution Flow:**
+
+```mermaid
+sequenceDiagram
+ participant WH as WebSocket Handler
+ participant SM as Session Manager
+ participant BT as Background Task
+ participant S as Session
+ participant CB as Callback
+
+ Note over WH,SM: 1️⃣ Task Dispatch
+ WH->>SM: execute_task_async(session_id, request, callback)
+ SM->>SM: get_or_create_session()
+ SM->>BT: asyncio.create_task(_run_session_background)
+ SM-->>WH: Return immediately (non-blocking!)
+
+ Note over WH: Event loop free for other tasks
+ WH->>WH: Can process heartbeats, ping/pong, new tasks
+
+ Note over BT,S: 2️⃣ Background Execution
+ BT->>S: await session.run()
+    S->>S: LLM reasoning → Action selection → Execution
+ Note over S: Long-running task (30s - 5min)
+ S-->>BT: Execution complete
+
+ Note over BT,CB: 3️⃣ Result Callback
+ BT->>BT: Build ServerMessage (TASK_END)
+ BT->>SM: set_results(session_id)
+ BT->>CB: await callback(session_id, result_message)
+ CB->>WH: Send result via WebSocket
+
+ Note over BT: 4️⃣ Cleanup
+ BT->>SM: Remove from _running_tasks dict
+```
+
+**Thread Safety:**
+
+The SessionManager uses `threading.Lock` for thread-safe access to shared dictionaries:
+
+```python
+with self.lock:
+ self.sessions[session_id] = session
+```
+
+This prevents race conditions in multi-threaded environments (though FastAPI primarily uses async/await).
+
+### Callback Mechanism
+
+When a task completes (successfully, with errors, or via cancellation), the SessionManager invokes a registered callback function. This decouples task execution from result delivery.
+
+**Registering a Callback:**
+
+```python
+async def send_result_to_client(session_id: str, result_msg: ServerMessage):
+ """Called when task completes."""
+ await websocket.send_text(result_msg.model_dump_json())
+ logger.info(f"Sent TASK_END for {session_id}")
+
+await session_manager.execute_task_async(
+ session_id="abc123",
+ task_name="open_notepad",
+ request="Open Notepad",
+ task_protocol=task_protocol,
+ callback=send_result_to_client # Register callback
+)
+```
+
+**Callback Execution Flow:**
+
+```mermaid
+stateDiagram-v2
+ [*] --> TaskRunning: Background task starts
+ TaskRunning --> ResultsCollected: session.run() completes
+ ResultsCollected --> StatusDetermined: Check session.is_finished() / is_error()
+ StatusDetermined --> MessageBuilt: Create ServerMessage(TASK_END)
+ MessageBuilt --> ResultsPersisted: set_results(session_id)
+ ResultsPersisted --> CallbackInvoked: await callback(session_id, message)
+ CallbackInvoked --> [*]: Cleanup _running_tasks
+
+ TaskRunning --> TaskCancelled: asyncio.CancelledError
+ TaskCancelled --> CancellationHandled: Check cancellation_reason
+ CancellationHandled --> MessageBuilt: Create failure message
+
+ TaskRunning --> ErrorOccurred: Exception raised
+ ErrorOccurred --> ErrorLogged: Log traceback
+ ErrorLogged --> MessageBuilt: Create error message
+```
+
+**ServerMessage Structure:**
+
+```python
+result_message = ServerMessage(
+ type=ServerMessageType.TASK_END,
+ status=TaskStatus.COMPLETED, # or FAILED
+ session_id="abc123",
+ error=None, # or error message if failed
+ result=session.results, # Dict[str, Any]
+ timestamp="2024-11-04T14:30:22.123456+00:00",
+ response_id="uuid-v4"
+)
+```
+
+| Field | Type | Description | Example |
+|-------|------|-------------|---------|
+| `type` | ServerMessageType | Always `TASK_END` for completion | `ServerMessageType.TASK_END` |
+| `status` | TaskStatus | `COMPLETED`, `FAILED`, or `CANCELLED` | `TaskStatus.COMPLETED` |
+| `session_id` | str | Session identifier | `"abc123"` |
+| `error` | Optional[str] | Error message if task failed | `"Device disconnected"` |
+| `result` | Dict[str, Any] | Task execution results | `{"action": "opened notepad", "screenshot": "..."}` |
+| `timestamp` | str | ISO 8601 timestamp (UTC) | `"2024-11-04T14:30:22.123456+00:00"` |
+| `response_id` | str | Unique response UUID | `"3f4a2b1c-9d8e-4f3a-b2c1-..."` |
+
+**Callback Error Handling:**
+
+If the callback raises an exception, the SessionManager **logs the error but doesn't fail the session**:
+
+```python
+try:
+ await callback(session_id, result_message)
+except Exception as e:
+ self.logger.error(f"Callback error: {e}")
+ # Session results are still persisted!
+```
+
+This prevents callback bugs from breaking task execution.
+
+### Task Cancellation
+
+The SessionManager supports **graceful task cancellation** with different behaviors based on **why** the cancellation occurred. This is critical for handling client disconnections properly.
+
+**Cancel a Running Task:**
+
+```python
+await session_manager.cancel_task(
+ session_id="session_abc123",
+ reason="device_disconnected" # or "constellation_disconnected"
+)
+```
+
+**Cancellation Reasons:**
+
+| Reason | Scenario | Callback Behavior | Use Case |
+|--------|----------|-------------------|----------|
+| `constellation_disconnected` | Constellation client lost connection | **No callback** (client is gone) | Task requester disconnected, no one to notify |
+| `device_disconnected` | Target device lost connection | **Send callback** to constellation | Notify orchestrator to reassign task |
+| `user_requested` | Manual cancellation via API | **Send callback** to requester | Explicit cancellation command |
+
+**Cancellation Flow:**
+
+```mermaid
+sequenceDiagram
+ participant C as Client (Constellation)
+ participant WH as WebSocket Handler
+ participant SM as Session Manager
+ participant BT as Background Task
+ participant D as Device
+
+ Note over C,BT: Scenario 1: Device Disconnects During Task
+ C->>WH: Task dispatched to device
+ WH->>SM: execute_task_async(session_id, callback)
+ SM->>BT: Background task running
+ BT->>D: Executing actions
+
+ Note over D: Device disconnects
+ D--xWH: WebSocket closed
+ WH->>SM: cancel_task(session_id, reason="device_disconnected")
+ SM->>BT: task.cancel()
+ BT->>BT: Catch asyncio.CancelledError
+ BT->>BT: Build failure message
+ BT->>WH: callback(session_id, failure_msg)
+ WH->>C: TASK_END (status=FAILED, error="Device disconnected")
+
+ Note over C,SM: Scenario 2: Constellation Disconnects During Task
+ C->>WH: Task dispatched
+ WH->>SM: execute_task_async()
+ SM->>BT: Background task running
+
+ Note over C: Constellation disconnects
+ C--xWH: WebSocket closed
+ WH->>SM: cancel_task(session_id, reason="constellation_disconnected")
+ SM->>BT: task.cancel()
+ BT->>BT: Catch asyncio.CancelledError
+ BT->>BT: Skip callback (client gone)
+ BT->>SM: Remove session
+```
+
+**Cancellation Implementation Details:**
+
+```python
+async def cancel_task(self, session_id: str, reason: str) -> bool:
+    """Cancel a running background task."""
+    task = self._running_tasks.get(session_id)
+
+    if task and not task.done():
+        # Store reason for use in _run_session_background
+        self._cancellation_reasons[session_id] = reason
+
+        # Request cancellation
+        task.cancel()
+
+        # Wait for graceful shutdown (max 2 seconds)
+        try:
+            await asyncio.wait_for(task, timeout=2.0)
+        except (asyncio.CancelledError, asyncio.TimeoutError):
+            pass  # Expected
+
+        # Cleanup
+        self._running_tasks.pop(session_id, None)
+        self._cancellation_reasons.pop(session_id, None)
+        self.remove_session(session_id)
+
+        return True
+
+    # No running task for this session
+    return False
+```
+
+**Important Notes:**
+
+- **Cancellation is asynchronous**: The background task receives an `asyncio.CancelledError` at the next `await` point. If the session is executing synchronous code (e.g., LLM inference), cancellation won't take effect until that operation completes.
+- **Grace Period**: The SessionManager waits up to **2 seconds** for graceful cancellation before giving up.
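+
+The first note can be observed with a small, self-contained sketch. `worker` and `demo` are hypothetical names, `time.sleep()` stands in for synchronous work, and a `threading.Timer` requests cancellation while the event loop is blocked:
+
+```python
+import asyncio
+import threading
+import time
+
+
+async def worker(log: list) -> None:
+    try:
+        await asyncio.sleep(0)      # yield once so the task is running
+        time.sleep(0.2)             # synchronous work: blocks the event loop
+        log.append("sync work finished")
+        await asyncio.sleep(10)     # next await: CancelledError lands here
+        log.append("never reached")
+    except asyncio.CancelledError:
+        log.append("cancelled at await")
+        raise
+
+
+async def demo() -> list:
+    log: list = []
+    loop = asyncio.get_running_loop()
+    task = asyncio.create_task(worker(log))
+    # Request cancellation from another thread while the worker is blocked.
+    timer = threading.Timer(0.05, loop.call_soon_threadsafe, args=(task.cancel,))
+    timer.start()
+    try:
+        await task
+    except asyncio.CancelledError:
+        log.append("task cancelled")
+    return log
+```
+
+Although cancellation is requested after roughly 50 ms, the synchronous section runs to completion first, and the `CancelledError` is only raised once the worker reaches its next `await`.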
+
+**Best Practice:**
+
+When a client disconnects, the WebSocket Handler should:
+
+1. Identify all active sessions for that client
+2. Call `cancel_task()` with the appropriate `reason`
+3. Clean up client registration in ClientConnectionManager
+
+This prevents orphaned sessions from consuming resources.
+
+---
+
+## 🔄 Session Lifecycle
+
+Sessions follow a predictable lifecycle from initial dispatch through execution to final cleanup. Understanding this flow is essential for debugging and monitoring.
+
+```mermaid
+stateDiagram-v2
+ [*] --> Created: get_or_create_session()
+ Created --> Stored: Add to sessions dict
+ Stored --> BackgroundTask: execute_task_async()
+ BackgroundTask --> Running: await session.run()
+
+ Running --> Completed: session.is_finished() == True
+ Running --> Failed: session.is_error() == True
+ Running --> Cancelled: asyncio.CancelledError
+ Running --> Exception: Exception raised
+
+ Completed --> ResultsCollected: Gather session.results
+ Failed --> ResultsCollected: Include error details
+ Cancelled --> ResultsCollected: Include cancellation reason
+ Exception --> ResultsCollected: Include exception message
+
+ ResultsCollected --> ResultsPersisted: set_results(session_id)
+ ResultsPersisted --> CallbackInvoked: await callback(session_id, message)
+ CallbackInvoked --> Cleanup: remove_session(session_id)
+ Cleanup --> [*]
+```
+
+### Lifecycle Stages
+
+| Stage | Description | Key Operations | Duration |
+|-------|-------------|----------------|----------|
+| **1. Creation** | Session object instantiated | `get_or_create_session()` | < 100ms |
+| **2. Registration** | Stored in sessions dict with ID | `sessions[session_id] = session` | < 10ms |
+| **3. Background Dispatch** | Task created with `asyncio.create_task()` | `_running_tasks[session_id] = task` | < 50ms |
+| **4. Execution** | Session runs (LLM + actions) | `await session.run()` | 10s - 5min |
+| **5. Result Collection** | Gather results and determine status | `session.results`, `session.is_finished()` | < 100ms |
+| **6. Persistence** | Save results to results dict | `set_results(session_id)` | < 10ms |
+| **7. Callback** | Notify registered callback | `await callback(session_id, msg)` | 50-500ms |
+| **8. Cleanup** | Remove from active sessions | `remove_session(session_id)` | < 10ms |
+
+**Complete Lifecycle Example:**
+
+```python
+# Stage 1-2: Creation
+session = session_manager.get_or_create_session(
+ session_id="abc123",
+ task_name="demo_task",
+ request="Open Notepad",
+ task_protocol=task_protocol,
+ platform_override="windows"
+)
+
+# Stage 3: Background Dispatch
+await session_manager.execute_task_async(
+ session_id="abc123",
+ task_name="demo_task",
+ request="Open Notepad",
+ task_protocol=task_protocol,
+ platform_override="windows",
+ callback=send_result_callback
+)
+# Returns immediately! Task runs in background
+
+# Stage 4: Execution (happens in background)
+# session.run() executes:
+# - LLM reasoning
+# - Action selection
+# - Command execution via device
+# - Result observation
+
+# Stage 5-6: Results (automatic)
+# Session completes, results collected and persisted
+
+# Stage 7: Callback (automatic)
+# await callback("abc123", ServerMessage(...))
+
+# Stage 8: Cleanup (manual or automatic)
+session_manager.remove_session("abc123")
+```
+
+**Session Persistence:**
+
+Sessions remain in the `sessions` dict until explicitly removed via `remove_session()`. This allows:
+
+- **Result retrieval** via `/api/task_result/{task_name}`
+- **Session inspection** for debugging
+- **Reconnection scenarios** (future feature)
+
+However, this means **sessions consume memory** until cleaned up. Implement periodic cleanup for production deployments.
+
+---
+
+## 💾 State Management
+
+The SessionManager maintains five separate dictionaries for different aspects of session state:
+
+### 1. Active Sessions Storage
+
+```python
+self.sessions: Dict[str, BaseSession] = {}
+```
+
+| Purpose | Structure | Lifecycle | Thread Safety |
+|---------|-----------|-----------|---------------|
+| Store active session objects | `{session_id: BaseSession}` | Until `remove_session()` called | `threading.Lock` |
+
+**Session Storage Operations:**
+
+```python
+# Store session
+with self.lock:
+ self.sessions[session_id] = session
+
+# Retrieve session
+with self.lock:
+ session = self.sessions.get(session_id)
+
+# Remove session
+with self.lock:
+ self.sessions.pop(session_id, None)
+```
+
+**Benefits:**
+
+- Fast O(1) lookup by session ID
+- Thread-safe with lock
+- Supports session reuse (future reconnections)
+
+**Considerations:**
+
+- ⚠️ Memory grows with active sessions
+- ⚠️ Manual cleanup required (`remove_session()`)
+- ⚠️ No automatic expiration
+
+### 2. Result Caching
+
+```python
+self.results: Dict[str, Dict[str, Any]] = {}
+```
+
+| Purpose | Structure | When Populated | Retrieval Methods |
+|---------|-----------|----------------|-------------------|
+| Cache completed task results | `{session_id: results_dict}` | After task completion via `set_results()` | `get_result()`, `get_result_by_task()` |
+
+**Result Storage & Retrieval:**
+
+```python
+# Persist results after completion
+def set_results(self, session_id: str):
+ with self.lock:
+ if session_id in self.sessions:
+ self.results[session_id] = self.sessions[session_id].results
+
+# Retrieve by session ID
+result = session_manager.get_result("abc123")
+# Returns: {"action": "opened notepad", "screenshot": "base64..."}
+
+# Retrieve by task name
+result = session_manager.get_result_by_task("demo_task")
+```
+
+**Result Structure Example:**
+
+```json
+{
+ "action_taken": "Opened Notepad and typed 'Hello World'",
+ "screenshot": "base64_encoded_screenshot_data...",
+ "observation": "Notepad window is visible with text 'Hello World'",
+ "success": true,
+ "metadata": {
+ "steps_taken": 3,
+ "execution_time_seconds": 12.5
+ }
+}
+```
+
+### 3. Task Name Mapping
+
+```python
+self.session_id_dict: Dict[str, str] = {}
+```
+
+| Purpose | Structure | Use Case |
+|---------|-----------|----------|
+| Map task names to session IDs | `{task_name: session_id}` | Allow result retrieval by task name instead of session ID |
+
+**Task Name Mapping:**
+
+```python
+# Created during session creation
+self.session_id_dict[task_name] = session_id
+
+# Usage: Get result by task name
+def get_result_by_task(self, task_name: str):
+ with self.lock:
+ session_id = self.session_id_dict.get(task_name)
+ if session_id:
+ return self.get_result(session_id)
+```
+
+**Why This Matters:**
+
+The HTTP API endpoint `/api/task_result/{task_name}` allows clients to check results using the **task name** they provided, without needing to track session IDs:
+
+```bash
+# Client only needs to remember task name
+curl http://localhost:5000/api/task_result/demo_task
+
+# Instead of tracking session ID
+curl http://localhost:5000/api/task_result/abc123
+```
+
+### 4. Running Tasks Tracking
+
+```python
+self._running_tasks: Dict[str, asyncio.Task] = {}
+```
+
+| Purpose | Structure | Use Case |
+|---------|-----------|----------|
+| Track active background tasks for cancellation | `{session_id: asyncio.Task}` | Enable graceful task cancellation when clients disconnect |
+
+**Running Task Management:**
+
+```python
+# Register background task
+task = asyncio.create_task(self._run_session_background(...))
+self._running_tasks[session_id] = task
+
+# Cancel running task
+task = self._running_tasks.get(session_id)
+if task and not task.done():
+ task.cancel()
+ await asyncio.wait_for(task, timeout=2.0)
+
+# Cleanup after completion
+self._running_tasks.pop(session_id, None)
+```
+
+### 5. Cancellation Reasons Tracking
+
+```python
+self._cancellation_reasons: Dict[str, str] = {}
+```
+
+| Purpose | Structure | Lifecycle |
+|---------|-----------|-----------|
+| Store why each task was cancelled | `{session_id: reason}` | From `cancel_task()` to `_run_session_background()` cleanup |
+
+**Cancellation Reason Flow:**
+
+```python
+# Store reason when cancelling
+async def cancel_task(self, session_id: str, reason: str):
+    task = self._running_tasks.get(session_id)
+    if task and not task.done():
+        self._cancellation_reasons[session_id] = reason
+        task.cancel()
+
+# Retrieve reason during cancellation handling
+async def _run_session_background(...):
+    try:
+        await session.run()
+    except asyncio.CancelledError:
+        reason = self._cancellation_reasons.get(session_id, "unknown")
+        if reason == "device_disconnected":
+            ...  # Send callback to constellation
+        elif reason == "constellation_disconnected":
+            ...  # Skip callback
+```
+
+---
+
+### Thread Safety
+
+The SessionManager uses `threading.Lock` for thread-safe access to shared dictionaries:
+
+```python
+def __init__(self):
+ self.lock = threading.Lock()
+
+def get_or_create_session(self, ...):
+ with self.lock:
+ if session_id not in self.sessions:
+ self.sessions[session_id] = session
+ return self.sessions[session_id]
+```
+
+**Why this matters:** Although FastAPI primarily uses async/await (single-threaded event loop), the lock protects against:
+
+- **Thread pool executors** for sync operations
+- **Background tasks** accessing shared state
+- **Future multi-threading** in FastAPI/Uvicorn
+
+**Performance Consideration:**
+
+Lock contention is minimal because:
+
+- Lock is held only for **dictionary operations** (O(1) operations)
+- Session execution happens **outside the lock** (async background tasks)
+- Most operations are **reads** (e.g., `get_result`), which complete quickly
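+
+A minimal sketch of this locking pattern, assuming a hypothetical `SessionRegistry` class in place of the real SessionManager, shows the lock being held only for the duration of each dictionary operation while several threads register sessions concurrently:
+
+```python
+import threading
+from typing import Dict, Optional
+
+
+class SessionRegistry:
+    """Hypothetical stand-in for the SessionManager's session storage."""
+
+    def __init__(self) -> None:
+        self._lock = threading.Lock()
+        self._sessions: Dict[str, object] = {}
+
+    def add(self, session_id: str, session: object) -> None:
+        with self._lock:  # held only for the dict write
+            self._sessions[session_id] = session
+
+    def get(self, session_id: str) -> Optional[object]:
+        with self._lock:  # held only for the dict read
+            return self._sessions.get(session_id)
+
+    def count(self) -> int:
+        with self._lock:
+            return len(self._sessions)
+
+
+def register_many(registry: SessionRegistry, prefix: str, n: int) -> None:
+    """Register n placeholder sessions under distinct IDs."""
+    for i in range(n):
+        registry.add(f"{prefix}-{i}", object())
+
+
+registry = SessionRegistry()
+threads = [
+    threading.Thread(target=register_many, args=(registry, f"t{t}", 100))
+    for t in range(4)
+]
+for thread in threads:
+    thread.start()
+for thread in threads:
+    thread.join()
+```
+
+Any long-running work (the session itself) would happen outside these short critical sections, so threads rarely wait on each other.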
+
+---
+
+## 🖥 Platform Support
+
+The SessionManager supports both Windows and Linux platforms through the **SessionFactory** abstraction layer. Platform-specific implementations handle OS-specific UI automation and tool execution.
+
+### Platform Detection
+
+```mermaid
+graph TD
+ A[get_or_create_session] --> B{platform_override specified?}
+ B -->|Yes| C[Use specified platform]
+ B -->|No| D[Auto-detect via platform.system]
+ D --> E{OS Detected}
+ E -->|"Windows"| F[platform = 'windows']
+ E -->|"Linux"| G[platform = 'linux']
+ E -->|"Darwin" macOS| H[platform = 'linux' ⚠️ Treated as Linux]
+
+ C --> I[SessionFactory.create_service_session]
+ F --> I
+ G --> I
+ H --> I
+
+ I --> J{Platform?}
+ J -->|windows| K[ServiceSession]
+ J -->|linux| L[LinuxServiceSession]
+
+ style H fill:#ffe0b2
+ style K fill:#c8e6c9
+ style L fill:#bbdefb
+```
+
+**Platform Detection Code:**
+
+```python
+def __init__(self, platform_override: Optional[str] = None):
+ self.platform = platform_override or platform.system().lower()
+ # platform.system() returns: "Windows", "Linux", or "Darwin"
+ self.logger.info(f"SessionManager initialized for platform: {self.platform}")
+```
+
+### Platform-Specific Sessions
+
+| Platform | Session Class | UI Automation | MCP Tools | Status |
+|----------|---------------|---------------|-----------|--------|
+| **Windows** | `ServiceSession` | Win32 API, UI Automation | Windows MCP servers (filesystem, browser, etc.) | Fully Supported |
+| **Linux** | `LinuxServiceSession` | X11/Wayland, AT-SPI | Linux MCP servers | Fully Supported |
+| **macOS (Darwin)** | `LinuxServiceSession` | Currently treated as Linux | Linux MCP servers | ⚠️ Experimental |
+
+**Windows Session Creation:**
+
+```python
+# Explicit Windows platform
+session = session_manager.get_or_create_session(
+ session_id="win_session_001",
+ task_name="windows_task",
+ request="Open File Explorer and navigate to Downloads",
+ task_protocol=task_protocol,
+ platform_override="windows"
+)
+# Creates ServiceSession
+```
+
+**Linux Session Creation:**
+
+```python
+# Explicit Linux platform
+session = session_manager.get_or_create_session(
+ session_id="linux_session_001",
+ task_name="linux_task",
+ request="Open Nautilus and create a new folder",
+ task_protocol=task_protocol,
+ platform_override="linux"
+)
+# Creates LinuxServiceSession
+```
+
+**Auto-Detection:**
+
+```python
+# Let SessionManager detect platform
+session = session_manager.get_or_create_session(
+ session_id="auto_session_001",
+ task_name="auto_task",
+ request="Open text editor",
+ task_protocol=task_protocol,
+ platform_override=None # Auto-detect
+)
+# Uses platform.system() to determine session type
+```
+
+**macOS Limitations:**
+
+macOS (Darwin) is currently treated as Linux, which may result in:
+
+- Incorrect UI automation commands
+- Missing macOS-specific tool integrations
+- ⚠️ Limited functionality
+
+**Recommendation:** Use explicit `platform_override="linux"` for Linux-like behavior, or wait for dedicated macOS session implementation.
+
+---
+
+## 🐛 Error Handling
+
+The SessionManager implements comprehensive error handling to prevent task failures from breaking the server.
+
+### Error Categories
+
+| Error Type | Handler | Behavior | Example |
+|------------|---------|----------|---------|
+| **Session Execution Errors** | `try/except in _run_session_background` | Status = FAILED, error message in results | LLM API timeout, invalid action |
+| **Callback Errors** | `try/except around callback invocation` | Log error, continue execution | WebSocket closed before callback |
+| **Cancellation** | `asyncio.CancelledError handler` | Check reason, conditional callback | Client disconnected mid-task |
+| **Unknown State** | Status check after `session.run()` | Status = FAILED, error = "unknown state" | Session neither finished nor errored |
+
+### Session Execution Error Handling
+
+```python
+async def _run_session_background(...):
+ try:
+ await session.run() # May raise exceptions
+
+ # Determine status
+ if session.is_error():
+ status = TaskStatus.FAILED
+ session.results = session.results or {"failure": "session ended with an error"}
+ elif session.is_finished():
+ status = TaskStatus.COMPLETED
+ else:
+ status = TaskStatus.FAILED
+ error = "Session ended in unknown state"
+
+ except asyncio.CancelledError:
+ # Handle cancellation (see Cancellation section)
+ ...
+
+ except Exception as e:
+ # Catch all other exceptions
+ import traceback
+ traceback.print_exc()
+ self.logger.error(f"Error in session {session_id}: {e}")
+ status = TaskStatus.FAILED
+ error = str(e)
+```
+
+**Error Result Structure:**
+
+When a session fails, the result includes error details:
+
+```json
+{
+ "status": "FAILED",
+ "error": "LLM API timeout after 60 seconds",
+ "session_id": "abc123",
+ "result": {
+ "failure": "session ended with an error",
+ "last_action": "open_notepad",
+ "traceback": "Traceback (most recent call last)..."
+ }
+}
+```
+
+### Callback Error Handling
+
+```python
+try:
+ await callback(session_id, result_message)
+except Exception as e:
+ import traceback
+ self.logger.error(
+ f"Callback error for session {session_id}: {e}\n{traceback.format_exc()}"
+ )
+ # Session results are STILL persisted!
+ # Client may not receive notification
+```
+
+**Callback Failures Don't Fail Sessions:**
+
+If the callback raises an exception (e.g., WebSocket already closed), the SessionManager:
+
+- **Logs the error** for debugging
+- **Persists the results** in `self.results`
+- **Completes cleanup** (removes from `_running_tasks`)
+- **Does NOT re-raise** the exception
+
+**Implication:** Results can be retrieved via `/api/task_result/{task_name}` even if WebSocket notification failed.
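+
+The ordering that makes this possible (persist first, notify second) can be sketched in isolation. `finish_session`, `broken_callback`, and the module-level `results` dict are hypothetical stand-ins for the SessionManager internals:
+
+```python
+import asyncio
+import logging
+from typing import Any, Awaitable, Callable, Dict
+
+logger = logging.getLogger("session_manager_sketch")
+
+# Stand-in for SessionManager.results
+results: Dict[str, Dict[str, Any]] = {}
+
+Callback = Callable[[str, Dict[str, Any]], Awaitable[None]]
+
+
+async def finish_session(session_id: str, result: Dict[str, Any],
+                         callback: Callback) -> None:
+    """Persist the result, then notify; callback failures are only logged."""
+    results[session_id] = result  # persisted before the callback runs
+    try:
+        await callback(session_id, result)
+    except Exception as exc:
+        logger.error("Callback error for session %s: %s", session_id, exc)
+
+
+async def broken_callback(session_id: str, result: Dict[str, Any]) -> None:
+    """Simulates a WebSocket that closed before the result could be sent."""
+    raise ConnectionError("WebSocket already closed")
+```
+
+After `asyncio.run(finish_session("abc123", {"success": True}, broken_callback))`, the entry for `"abc123"` is still present in `results`, so a later HTTP lookup can succeed.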
+
+### Unknown State Handling
+
+```python
+if session.is_error():
+ status = TaskStatus.FAILED
+elif session.is_finished():
+ status = TaskStatus.COMPLETED
+else:
+ # Unknown state - neither finished nor errored
+ status = TaskStatus.FAILED
+ error = "Session ended in unknown state"
+ self.logger.warning(f"Session {session_id} ended in unknown state")
+```
+
+**Edge Case - Unknown Final State:**
+
+If `session.run()` returns but neither `is_finished()` nor `is_error()` is true, this indicates:
+
+- Possible bug in session state management
+- Incomplete session implementation
+- Unexpected session interruption
+
+The SessionManager marks this as **FAILED** to prevent silent failures.
+
+---
+
+## 💡 Best Practices
+
+Follow these best practices to ensure reliable, scalable session management:
+
+### 1. Configure Appropriate Timeouts
+
+Session timeouts should match task complexity:
+
+| Task Type | Timeout | Examples |
+|-----------|---------|----------|
+| **Simple UI Actions** | 60-120s | Open app, click button, type text |
+| **Medium Workflows** | 120-300s | Multi-step automation (3-5 steps) |
+| **Complex Tasks** | 300-600s | Complex workflows requiring LLM reasoning |
+| **Batch Operations** | 600-1800s | Processing multiple files, data entry |
+
+```python
+# Configure in UFO config
+ufo_config.system.timeout = 300  # 5 minutes for medium tasks
+```
+
+### 2. Monitor Session Count
+
+Sessions consume memory. Implement limits to prevent resource exhaustion:
+
+```python
+from fastapi import HTTPException
+
+MAX_CONCURRENT_SESSIONS = 100  # Adjust based on server resources
+
+async def execute_task_safe(session_manager, **task_kwargs):
+    """Dispatch a task only if the server is below its session limit."""
+    active_count = len(session_manager.sessions)
+
+    if active_count >= MAX_CONCURRENT_SESSIONS:
+        # Option 1: Reject new sessions
+        raise HTTPException(
+            status_code=503,
+            detail=f"Server at capacity ({active_count} active sessions)"
+        )
+
+        # Option 2 (use instead of Option 1): Cancel the oldest session.
+        # Assumes sessions expose a `created_at` timestamp.
+        # oldest_session_id = min(
+        #     session_manager.sessions,
+        #     key=lambda s: session_manager.sessions[s].created_at
+        # )
+        # await session_manager.cancel_task(oldest_session_id, reason="capacity_limit")
+
+    # Proceed with new session
+    await session_manager.execute_task_async(**task_kwargs)
+```
+
+### 3. Clean Up Completed Sessions
+
+⚠️ **Memory Leak Prevention:**
+
+Sessions persist in `sessions` dict until explicitly removed. Implement cleanup:
+
+```python
+import asyncio
+import time
+
+# Option 1: Cleanup immediately after result retrieval
+result = session_manager.get_result(session_id)
+if result:
+    session_manager.remove_session(session_id)
+
+# Option 2: Periodic cleanup task
+async def cleanup_old_sessions(session_manager, max_age_seconds=3600):
+    """Remove sessions older than max_age_seconds."""
+    while True:
+        await asyncio.sleep(300)  # Check every 5 minutes
+
+        current_time = time.time()
+        # Collect candidates under the lock; assumes sessions expose `created_at`
+        with session_manager.lock:
+            to_remove = [
+                session_id
+                for session_id, session in session_manager.sessions.items()
+                if current_time - session.created_at > max_age_seconds
+                and session_id not in session_manager._running_tasks
+            ]
+
+        # Remove outside the lock: threading.Lock is not reentrant, and
+        # remove_session() may acquire it itself
+        for session_id in to_remove:
+            session_manager.remove_session(session_id)
+            logger.info(f"Cleaned up old session: {session_id}")
+
+# Start cleanup task on server startup (must be called inside the running event loop)
+asyncio.create_task(cleanup_old_sessions(session_manager))
+```
+
+### 4. Handle Cancellation Gracefully
+
+Different cancellation reasons require different responses:
+
+```python
+async def handle_client_disconnect(client_id, client_type, session_manager, client_manager):
+ """Handle disconnection based on client type."""
+
+ if client_type == ClientType.CONSTELLATION:
+ # Constellation disconnected - cancel all its tasks
+ session_ids = client_manager.get_constellation_sessions(client_id)
+ for session_id in session_ids:
+ await session_manager.cancel_task(
+ session_id,
+ reason="constellation_disconnected" # Don't send callback
+ )
+
+ elif client_type == ClientType.DEVICE:
+ # Device disconnected - notify constellations to reassign
+ session_ids = client_manager.get_device_sessions(client_id)
+ for session_id in session_ids:
+ await session_manager.cancel_task(
+ session_id,
+ reason="device_disconnected" # Send callback to constellation
+ )
+
+ # Clean up client registration
+ client_manager.remove_client(client_id)
+```
+
+### 5. Log Session Lifecycle Events
+
+Log key lifecycle events for debugging and monitoring:
+
+```python
+# Session creation
+self.logger.info(f"Created {platform} session: {session_id} (type: {session_type})")
+
+# Background task start
+self.logger.info(f"🚀 Started background task {session_id}")
+
+# Execution timing
+elapsed = loop.time() - start_time
+self.logger.info(f"⏱️ Session {session_id} execution took {elapsed:.2f}s")
+
+# Status determination
+self.logger.info(f"Session {session_id} finished successfully")
+self.logger.warning(f"⚠️ Session {session_id} ended with error")
+
+# Cancellation
+self.logger.warning(f"🛑 Session {session_id} was cancelled (reason: {reason})")
+
+# Cleanup
+self.logger.info(f"Session {session_id} completed with status {status}")
+```
+
+### 6. Implement Result Expiration
+
+Prevent `results` dict from growing indefinitely:
+
+```python
+import time
+from typing import Dict, Optional, Tuple
+
+class SessionManagerWithExpiration(SessionManager):
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        # Store (result, timestamp) tuples
+        self.results: Dict[str, Tuple[Dict, float]] = {}
+        self.result_ttl = 3600  # 1 hour
+
+    def set_results(self, session_id: str):
+        with self.lock:
+            if session_id in self.sessions:
+                self.results[session_id] = (
+                    self.sessions[session_id].results,
+                    time.time()
+                )
+
+    def get_result(self, session_id: str) -> Optional[Dict]:
+        with self.lock:
+            if session_id in self.results:
+                result, timestamp = self.results[session_id]
+                # Check expiration
+                if time.time() - timestamp > self.result_ttl:
+                    self.results.pop(session_id)
+                    return None
+                return result
+            return None
+```
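+
+The class above evicts an entry only when `get_result()` is called for it, so results that are never read stay in memory. A minimal sketch of a periodic sweep follows, assuming results are stored as `(result, timestamp)` tuples as shown; `purge_expired` and its `now` parameter are hypothetical names, not part of SessionManager:
+
+```python
+import time
+from typing import Any, Dict, Optional, Tuple
+
+
+def purge_expired(
+    results: Dict[str, Tuple[Any, float]],
+    ttl: float,
+    now: Optional[float] = None,
+) -> int:
+    """Remove entries older than ``ttl`` seconds; return how many were removed."""
+    current = time.time() if now is None else now
+    # Collect keys first so the dict is not mutated while iterating over it.
+    expired = [sid for sid, (_, ts) in results.items() if current - ts > ttl]
+    for sid in expired:
+        del results[sid]
+    return len(expired)
+```
+
+If used with the class above, it would need to run while holding the same lock that guards `results`.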
+
+### 7. Monitor Background Tasks
+
+Monitor background tasks for unexpectedly long execution:
+
+```python
+import asyncio
+import logging
+
+logger = logging.getLogger(__name__)
+
+async def monitor_long_running_tasks(session_manager, threshold=600):
+    """Alert on tasks running longer than threshold seconds."""
+    while True:
+        await asyncio.sleep(60)  # Check every minute
+
+        current_time = asyncio.get_running_loop().time()
+        for session_id, task in session_manager._running_tasks.items():
+            # Calculate task age (approximation; assumes start_time uses the loop clock)
+            session = session_manager.sessions.get(session_id)
+            if session and hasattr(session, 'start_time'):
+                age = current_time - session.start_time
+                if age > threshold:
+                    logger.warning(
+                        f"⚠️ Long-running task detected: {session_id} "
+                        f"(running for {age:.1f}s)"
+                    )
+```
+
+---
+
+## 🔗 Integration with Server Components
+
+The SessionManager doesn't operate in isolation—it's deeply integrated with other server components.
+
+### Integration Architecture
+
+```mermaid
+graph TB
+ subgraph "External"
+ HTTP[HTTP API Client]
+ WS_C[WebSocket Client]
+ end
+
+ subgraph "Server Components"
+ API[API Router /api/dispatch]
+ WH[WebSocket Handler]
+ WSM[Client Connection Manager]
+ SM[Session Manager]
+ SF[Session Factory]
+ end
+
+ subgraph "Sessions"
+ WIN[Windows Session]
+ LIN[Linux Session]
+ end
+
+ HTTP -->|POST /api/dispatch| API
+ WS_C -->|WebSocket /ws| WH
+
+ API -->|execute_task_async| SM
+ WH -->|execute_task_async| SM
+
+ SM -->|create session| SF
+ SF -->|windows| WIN
+ SF -->|linux| LIN
+
+ SM -->|add_constellation_session| WSM
+ SM -->|add_device_session| WSM
+
+ SM -->|callback| WH
+ WH -->|TASK_END message| WS_C
+
+ style SM fill:#ffecb3
+ style SF fill:#c8e6c9
+ style WSM fill:#bbdefb
+```
+
+### 1. WebSocket Handler Integration
+
+The WebSocket Handler creates sessions with callbacks to send results back to clients:
+
+```python
+# In WebSocket Handler
+async def handle_task_dispatch(self, session_id, request, client_id):
+    """Handle incoming task from constellation."""
+
+    # Define callback to send results back
+    async def send_result(sid: str, msg: ServerMessage):
+        await self.websocket.send_text(msg.model_dump_json())
+        logger.info(f"Sent TASK_END for {sid}")
+
+    # Execute task with callback
+    await self.session_manager.execute_task_async(
+        session_id=session_id,
+        task_name=f"task_{session_id[:8]}",
+        request=request,
+        task_protocol=self.task_protocol,  # AIP protocol instance
+        platform_override=None,  # Auto-detect
+        callback=send_result  # Register callback
+    )
+```
+
+For more details, see the [WebSocket Handler Documentation](websocket_handler.md).
+
+### 2. Client Connection Manager Integration
+
+The Client Connection Manager tracks which clients own which sessions:
+
+```python
+# Track constellation sessions
+client_manager.add_constellation_session(
+    constellation_id="constellation_001",
+    session_id="session_abc123"
+)
+
+# Track device sessions
+client_manager.add_device_session(
+    device_id="device_windows_001",
+    session_id="session_abc123"
+)
+
+# Retrieve all sessions for a client
+session_ids = client_manager.get_constellation_sessions("constellation_001")
+
+# On constellation disconnect, cancel all of its sessions
+for session_id in session_ids:
+    await session_manager.cancel_task(session_id, reason="constellation_disconnected")
+```
+
+For more details, see the [Client Connection Manager Documentation](client_connection_manager.md).
+
+### 3. HTTP API Integration
+
+The API router uses SessionManager to retrieve results:
+
+```python
+# In API router (ufo/server/services/api.py)
+@router.post("/api/dispatch")
+async def dispatch_task_api(data: Dict[str, Any]):
+    client_id = data.get("client_id")
+    user_request = data.get("request")
+    task_name = data.get("task_name", str(uuid4()))
+
+    # Get client protocol
+    task_protocol = client_manager.get_task_protocol(client_id)
+    if not task_protocol:
+        raise HTTPException(status_code=404, detail="Client not online")
+
+    session_id = str(uuid4())
+
+    # Use AIP protocol to send task
+    # ... send TASK_ASSIGNMENT via protocol ...
+
+    return {
+        "status": "dispatched",
+        "task_name": task_name,
+        "client_id": client_id,
+        "session_id": session_id
+    }
+
+@router.get("/api/task_result/{task_name}")
+async def get_task_result(task_name: str):
+    # Use SessionManager to retrieve results
+    result = session_manager.get_result_by_task(task_name)
+    if not result:
+        return {"status": "pending"}
+    return {"status": "done", "result": result}
+```
+
+---
+
+## 📖 API Reference
+
+Complete SessionManager API reference:
+
+### Initialization
+
+```python
+from ufo.server.services.session_manager import SessionManager
+
+# Initialize with platform override
+manager = SessionManager(platform_override="windows")
+
+# Initialize with auto-detection
+manager = SessionManager(platform_override=None)
+```
+
+**Parameters:**
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `platform_override` | `Optional[str]` | `None` | Platform type (`"windows"`, `"linux"`, or `None` for auto-detect) |
+
+---
+
+### get_or_create_session()
+
+```python
+session = manager.get_or_create_session(
+    session_id="abc123",
+    task_name="demo_task",
+    request="Open Notepad",
+    task_protocol=task_protocol,
+    platform_override="windows",
+    local=False
+)
+```
+
+**Parameters:**
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `session_id` | `str` | Yes | - | Unique session identifier |
+| `task_name` | `Optional[str]` | No | `"test_task"` | Human-readable task name |
+| `request` | `Optional[str]` | No | `None` | User request text |
+| `task_protocol` | `Optional[TaskExecutionProtocol]` | No | `None` | AIP TaskExecutionProtocol instance |
+| `platform_override` | `Optional[str]` | No | `None` | Platform type override |
+| `local` | `bool` | No | `False` | Whether to create local session (for testing) |
+
+**Returns:** `BaseSession` - Platform-specific session instance
+
+---
+
+### execute_task_async()
+
+```python
+session_id = await manager.execute_task_async(
+    session_id="abc123",
+    task_name="demo_task",
+    request="Open Notepad",
+    task_protocol=task_protocol,
+    platform_override="windows",
+    callback=my_callback
+)
+```
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `session_id` | `str` | Yes | Session identifier |
+| `task_name` | `str` | Yes | Task name |
+| `request` | `str` | Yes | User request text |
+| `task_protocol` | `Optional[TaskExecutionProtocol]` | No | AIP TaskExecutionProtocol instance |
+| `platform_override` | `Optional[str]` | No | Platform type (`None` for auto-detect) |
+| `callback` | `Optional[Callable]` | No | Async function called on completion |
+
+**Callback Signature:**
+
+```python
+async def callback(session_id: str, result_message: ServerMessage) -> None:
+    ...
+```
+
+**Returns:** `str` - The session ID (same as input)
+
+---
+
+### cancel_task()
+
+```python
+success = await manager.cancel_task(
+    session_id="abc123",
+    reason="device_disconnected"
+)
+```
+
+**Parameters:**
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `session_id` | `str` | Yes | - | Session to cancel |
+| `reason` | `str` | No | `"constellation_disconnected"` | Cancellation reason |
+
+**Valid Reasons:**
+
+- `"constellation_disconnected"` - Don't send callback
+- `"device_disconnected"` - Send callback to constellation
+- `"user_requested"` - Manual cancellation
+
+**Returns:** `bool` - `True` if task was found and cancelled, `False` otherwise
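+
+The documented reasons differ mainly in whether the requester is still around to be notified. A minimal sketch of that decision follows; `CancelReason` and `should_send_callback` are hypothetical names, and the treatment of `"user_requested"` is an assumption, since its callback behavior is not stated above:
+
+```python
+from enum import Enum
+
+
+class CancelReason(str, Enum):
+    CONSTELLATION_DISCONNECTED = "constellation_disconnected"
+    DEVICE_DISCONNECTED = "device_disconnected"
+    USER_REQUESTED = "user_requested"
+
+
+def should_send_callback(reason: str) -> bool:
+    """Return True when the requester should still be notified."""
+    parsed = CancelReason(reason)  # raises ValueError for unknown reasons
+    # The requester is gone only when the constellation itself disconnected.
+    # Treating "user_requested" as notify-worthy is an assumption of this sketch.
+    return parsed is not CancelReason.CONSTELLATION_DISCONNECTED
+```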
+
+---
+
+### get_result()
+
+```python
+result = manager.get_result("abc123")
+```
+
+**Parameters:**
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `session_id` | `str` | Session identifier |
+
+**Returns:** `Optional[Dict[str, Any]]` - Session results dict, or `None` if not found
+
+---
+
+### get_result_by_task()
+
+```python
+result = manager.get_result_by_task("demo_task")
+```
+
+**Parameters:**
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `task_name` | `str` | Task name |
+
+**Returns:** `Optional[Dict[str, Any]]` - Session results dict, or `None` if not found
+
+---
+
+### set_results()
+
+```python
+manager.set_results("abc123")
+```
+
+**Parameters:**
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `session_id` | `str` | Session identifier |
+
+**Returns:** `None`
+
+**Purpose:** Persist session results to `results` dict for later retrieval
+
+---
+
+### remove_session()
+
+```python
+manager.remove_session("abc123")
+```
+
+**Parameters:**
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `session_id` | `str` | Session to remove |
+
+**Returns:** `None`
+
+**Purpose:** Remove session from active sessions dict (cleanup)
+
+---
+
+## 📚 Related Documentation
+
+Explore related components to understand the full server architecture:
+
+| Component | Purpose | Link |
+|-----------|---------|------|
+| **Server Overview** | High-level architecture and capabilities | [Overview](./overview.md) |
+| **Quick Start** | Start server and dispatch first task | [Quick Start](./quick_start.md) |
+| **WebSocket Handler** | Message handling and protocol implementation | [WebSocket Handler](./websocket_handler.md) |
+| **Client Connection Manager** | Connection management and client tracking | [Client Connection Manager](./client_connection_manager.md) |
+| **HTTP API** | RESTful API endpoints | [API Reference](./api.md) |
+| **Session Factory** | Session creation patterns | [Session Pool](../infrastructure/modules/session_pool.md) |
+| **AIP Protocol** | Agent Interaction Protocol details | [AIP Overview](../aip/overview.md) |
+
+---
+
+## 🎓 Key Takeaways
+
+After reading this guide, you should understand:
+
+- **Background execution** prevents WebSocket blocking during long tasks
+- **SessionFactory** creates platform-specific sessions (Windows/Linux)
+- **Callbacks** decouple task execution from result delivery
+- **Cancellation reasons** enable context-aware disconnection handling
+- **Thread safety** protects shared state in concurrent environments
+- **State management** uses five separate dicts (sessions, results, task_names, running_tasks, cancellation_reasons)
+- **Best practices** prevent resource exhaustion and memory leaks
+
+**Next Steps:**
+
+- Explore [WebSocket Handler](./websocket_handler.md) to see how sessions are triggered
+- Learn about [AIP Protocol](../aip/overview.md) for task assignment message format
+- Review [Client Connection Manager](./client_connection_manager.md) for session-to-client mapping
+
diff --git a/documents/docs/server/websocket_handler.md b/documents/docs/server/websocket_handler.md
new file mode 100644
index 000000000..7b919064a
--- /dev/null
+++ b/documents/docs/server/websocket_handler.md
@@ -0,0 +1,1135 @@
+# WebSocket Handler
+
+The **UFOWebSocketHandler** is the central nervous system of the server, implementing the Agent Interaction Protocol (AIP) to manage structured, reliable communication between the server and all connected clients.
+
+For context on how this component fits into the server architecture, see the [Server Overview](overview.md).
+
+---
+
+## 🎯 Overview
+
+The WebSocket Handler acts as the protocol orchestrator, managing all aspects of client communication:
+
+| Responsibility | Description | Protocol Used |
+|----------------|-------------|---------------|
+| **Client Registration** | Validate and register new device/constellation connections | AIP Registration Protocol |
+| **Task Dispatch** | Route task requests to appropriate devices | AIP Task Execution Protocol |
+| **Heartbeat Monitoring** | Maintain connection health via periodic pings | AIP Heartbeat Protocol |
+| **Device Info Exchange** | Query and share device capabilities | AIP Device Info Protocol |
+| **Command Results** | Relay execution results from devices to requesters | AIP Message Transport |
+| **Error Handling** | Gracefully handle communication failures | AIP Error Protocol |
+| **Connection Lifecycle** | Manage registration → active → cleanup flow | WebSocket + AIP |
+
+### Architecture Position
+
+```mermaid
+graph TB
+ subgraph "Clients"
+ DC[Device Clients]
+ CC[Constellation Clients]
+ end
+
+ subgraph "Server - WebSocket Handler"
+ WH[UFOWebSocketHandler]
+
+ subgraph "AIP Protocols"
+ REG[Registration Protocol]
+ HB[Heartbeat Protocol]
+ DI[Device Info Protocol]
+ TE[Task Execution Protocol]
+ end
+
+ subgraph "Message Router"
+ MH[handle_message]
+ end
+ end
+
+ subgraph "Server Components"
+ WSM[Client Connection Manager]
+ SM[Session Manager]
+ end
+
+ DC -->|WebSocket /ws| WH
+ CC -->|WebSocket /ws| WH
+
+ WH --> REG
+ WH --> HB
+ WH --> DI
+ WH --> TE
+
+ WH --> MH
+ MH -->|"handle_task_request"| TE
+ MH -->|"handle_heartbeat"| HB
+ MH -->|"handle_device_info_request"| DI
+ MH -->|"handle_command_result"| SM
+
+ WH -->|"add_client / get_client"| WSM
+ WH -->|"execute_task_async"| SM
+
+ style WH fill:#ffecb3
+ style MH fill:#bbdefb
+ style SM fill:#c8e6c9
+ style WSM fill:#f8bbd0
+```
+
+---
+
+## 🔌 AIP Protocol Integration
+
+The handler uses **four specialized AIP protocols**, each handling a specific aspect of communication. This separation of concerns makes the code maintainable and testable. For detailed protocol specifications, see the [AIP Protocol Documentation](../aip/overview.md).
+
+```python
+def __init__(self, client_manager, session_manager, local=False):
+    # Initialize per-connection protocols
+    self.transport = None
+    self.registration_protocol = None
+    self.heartbeat_protocol = None
+    self.device_info_protocol = None
+    self.task_protocol = None
+```
+
+| Protocol | Purpose | Key Methods | Message Types |
+|----------|---------|-------------|---------------|
+| **Registration Protocol** | Client identity and validation | `send_registration_confirmation()`, `send_registration_error()` | `REGISTER`, `REGISTER_CONFIRM` |
+| **Heartbeat Protocol** | Connection health monitoring | `send_heartbeat_ack()` | `HEARTBEAT`, `HEARTBEAT_ACK` |
+| **Device Info Protocol** | Capability exchange | `send_device_info_response()`, `send_device_info_request()` | `DEVICE_INFO_REQUEST`, `DEVICE_INFO_RESPONSE` |
+| **Task Execution Protocol** | Task lifecycle management | `send_task_assignment()`, `send_ack()`, `send_error()` | `TASK`, `TASK_ASSIGNMENT`, `TASK_END` |
+
+**Protocol Initialization per Connection:**
+
+```python
+async def connect(self, websocket: WebSocket) -> str:
+    await websocket.accept()
+
+    # Initialize AIP protocols for this connection
+    self.transport = WebSocketTransport(websocket)
+    self.registration_protocol = RegistrationProtocol(self.transport)
+    self.heartbeat_protocol = HeartbeatProtocol(self.transport)
+    self.device_info_protocol = DeviceInfoProtocol(self.transport)
+    self.task_protocol = TaskExecutionProtocol(self.transport)
+
+    # ... registration flow ...
+```
+
+**Per-Connection Protocol Instances:**
+
+Each WebSocket connection gets its **own set of protocol instances**, ensuring message routing and state management are isolated between clients.
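+
+To illustrate the isolation described above, here is a minimal sketch using stand-in classes. `FakeTransport` and `FakeHeartbeatProtocol` are hypothetical placeholders, not the real AIP classes; the point is only that each connection builds its own objects, so nothing sent on one connection shows up on another:
+
+```python
+from dataclasses import dataclass, field
+from typing import List
+
+
+@dataclass
+class FakeTransport:
+    """Stand-in for WebSocketTransport: records what would be sent."""
+    sent: List[str] = field(default_factory=list)
+
+    def send(self, payload: str) -> None:
+        self.sent.append(payload)
+
+
+class FakeHeartbeatProtocol:
+    """Stand-in for HeartbeatProtocol, bound to exactly one transport."""
+
+    def __init__(self, transport: FakeTransport) -> None:
+        self.transport = transport
+
+    def send_heartbeat_ack(self) -> None:
+        self.transport.send("HEARTBEAT_ACK")
+
+
+def new_connection() -> FakeHeartbeatProtocol:
+    # Each connection builds its own transport and its own protocol object.
+    return FakeHeartbeatProtocol(FakeTransport())
+```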
+
+---
+
+## 📝 Client Registration
+
+Registration is the critical first step when a client connects. The handler validates client identity, checks permissions, and establishes the communication session.
+
+### Registration Flow
+
+```mermaid
+sequenceDiagram
+ participant C as Client (Device/Constellation)
+ participant WS as WebSocket Handler
+ participant RP as Registration Protocol
+ participant WSM as Client Connection Manager
+
+ Note over C,WS: 1️⃣ Connection Establishment
+ C->>WS: WebSocket CONNECT /ws
+ WS->>WS: await websocket.accept()
+ WS->>WS: Initialize AIP protocols
+
+ Note over WS,RP: 2️⃣ Registration Message
+ WS->>C: (AIP Transport ready)
+ C->>RP: REGISTER {client_id, client_type, platform, metadata}
+ RP->>RP: Parse & validate JSON
+
+ Note over RP,WS: 3️⃣ Validation
+ RP->>WS: ClientMessage object
+ WS->>WS: Validate client_id exists
+ WS->>WS: Validate message type = REGISTER
+
+ alt Client Type = Constellation
+ WS->>WSM: is_device_connected(target_id)?
+ alt Target device offline
+ WSM-->>WS: False
+ WS->>RP: send_registration_error()
+ RP->>C: ERROR: "Target device not connected"
+ WS->>C: WebSocket close()
+ else Target device online
+ WSM-->>WS: True
+ WS->>WSM: add_client(client_id, ...)
+ end
+ else Client Type = Device
+ WS->>WSM: add_client(client_id, platform, ...)
+ end
+
+ Note over WS,C: 4️⃣ Confirmation
+ WS->>RP: send_registration_confirmation()
+ RP->>C: REGISTER_CONFIRM {status: "success"}
+
+ Note over C: Client registered, ready for tasks
+ C->>C: Start message listening loop
+```
+
+### Registration Steps (Code Walkthrough)
+
+**Step 1: WebSocket Connection Accepted**
+
+```python
+async def connect(self, websocket: WebSocket) -> str:
+    # Accept WebSocket connection
+    await websocket.accept()
+
+    # Initialize AIP protocols for this connection
+    self.transport = WebSocketTransport(websocket)
+    self.registration_protocol = RegistrationProtocol(self.transport)
+    self.heartbeat_protocol = HeartbeatProtocol(self.transport)
+    self.device_info_protocol = DeviceInfoProtocol(self.transport)
+    self.task_protocol = TaskExecutionProtocol(self.transport)
+```
+
+**Step 2: Receive Registration Message**
+
+```python
+async def _parse_registration_message(self) -> ClientMessage:
+    """Parse and validate registration message using AIP Transport."""
+    self.logger.info("[WS] [AIP] Waiting for registration message...")
+
+    # Receive via AIP Transport
+    reg_data = await self.transport.receive()
+    if isinstance(reg_data, bytes):
+        reg_data = reg_data.decode("utf-8")
+
+    # Parse using Pydantic model
+    reg_info = ClientMessage.model_validate_json(reg_data)
+
+    self.logger.info(
+        f"[WS] [AIP] Received registration from {reg_info.client_id}, "
+        f"type={reg_info.client_type}"
+    )
+
+    return reg_info
+```
+
+**Expected Registration Message:**
+
+```json
+{
+ "type": "REGISTER",
+ "client_id": "device_windows_001",
+ "client_type": "DEVICE",
+ "platform": "windows",
+ "metadata": {
+ "hostname": "DESKTOP-ABC123",
+ "os_version": "Windows 11 Pro",
+ "screen_resolution": "1920x1080"
+ }
+}
+```
+
+**Step 3: Validation**
+
+```python
+# Basic validation
+if not reg_info.client_id:
+    raise ValueError("Client ID is required for WebSocket registration")
+if reg_info.type != ClientMessageType.REGISTER:
+    raise ValueError("First message must be a registration message")
+
+# Constellation-specific validation
+if reg_info.client_type == ClientType.CONSTELLATION:
+    await self._validate_constellation_client(reg_info)
+```
+
+**Constellation Validation:**
+
+```python
+async def _validate_constellation_client(self, reg_info: ClientMessage) -> None:
+    """Validate constellation's claimed target_id."""
+    claimed_device_id = reg_info.target_id
+
+    if not claimed_device_id:
+        return  # No device_id to validate
+
+    # Check if target device is connected
+    if not self.client_manager.is_device_connected(claimed_device_id):
+        error_msg = f"Target device '{claimed_device_id}' is not connected"
+        self.logger.warning(f"[WS] Constellation registration failed: {error_msg}")
+
+        # Send error via AIP protocol
+        await self._send_error_response(error_msg)
+        await self.transport.close()
+        raise ValueError(error_msg)
+```
+
+**Step 4: Register Client in [ClientConnectionManager](./client_connection_manager.md)**
+
+```python
+client_type = reg_info.client_type
+platform = reg_info.metadata.get("platform", "windows") if reg_info.metadata else "windows"
+
+# Register in Client Connection Manager
+self.client_manager.add_client(
+    client_id,
+    platform,
+    websocket,
+    client_type,
+    reg_info.metadata,
+    transport=self.transport,
+    task_protocol=self.task_protocol,
+)
+```
+
+**Step 5: Send Confirmation**
+
+```python
+async def _send_registration_confirmation(self) -> None:
+    """Send successful registration confirmation using AIP RegistrationProtocol."""
+    self.logger.info("[WS] [AIP] Sending registration confirmation...")
+    await self.registration_protocol.send_registration_confirmation()
+    self.logger.info("[WS] [AIP] Registration confirmation sent")
+```
+
+**Confirmation Message:**
+
+```json
+{
+ "type": "REGISTER_CONFIRM",
+ "status": "success",
+ "timestamp": "2024-11-04T14:30:22.123456+00:00",
+ "response_id": "uuid-v4"
+}
+```
+
+**Step 6: Log Success**
+
+```python
+def _log_client_connection(self, client_id: str, client_type: ClientType) -> None:
+    """Log successful client connection with appropriate emoji."""
+    if client_type == ClientType.DEVICE:
+        self.logger.info(f"[WS] Registered device client: {client_id}")
+    elif client_type == ClientType.CONSTELLATION:
+        self.logger.info(f"[WS] 🌟 Registered constellation client: {client_id}")
+```
+
+### Validation Rules
+
+| Validation | Check | Error Message | Action |
+|------------|-------|---------------|--------|
+| **Client ID Presence** | `client_id` field exists and not empty | `"Client ID is required"` | Reject connection |
+| **Message Type** | First message type == `REGISTER` | `"First message must be a registration message"` | Reject connection |
+| **Target Device (Constellation)** | If `target_id` specified, device must be online | `"Target device '{target_id}' is not connected"` | Send error + close |
+| **Client ID Uniqueness** | No existing client with same ID | Handled by ClientConnectionManager | Disconnect old connection |
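+
+The rules in the table can be sketched as a single function. This is an illustration, not the handler's actual code: `Registration` is a hypothetical, trimmed-down stand-in for `ClientMessage`, and a plain set of device IDs stands in for `ClientConnectionManager`. The uniqueness rule is omitted because the manager handles it.
+
+```python
+from dataclasses import dataclass
+from typing import Optional, Set
+
+
+@dataclass
+class Registration:
+    """Hypothetical, trimmed-down stand-in for ClientMessage."""
+    type: str
+    client_id: Optional[str]
+    client_type: str
+    target_id: Optional[str] = None
+
+
+def validate_registration(reg: Registration, connected_devices: Set[str]) -> None:
+    """Raise ValueError with the matching error message if a rule fails."""
+    if not reg.client_id:
+        raise ValueError("Client ID is required for WebSocket registration")
+    if reg.type != "REGISTER":
+        raise ValueError("First message must be a registration message")
+    # Only constellations that name a target device are checked against it.
+    if reg.client_type == "CONSTELLATION" and reg.target_id:
+        if reg.target_id not in connected_devices:
+            raise ValueError(f"Target device '{reg.target_id}' is not connected")
+```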
+
+**Constellation Dependency:**
+
+If a constellation specifies a `target_id`, it **must** refer to an already-connected device. If that device is offline or doesn't exist, registration fails immediately. A registration without a `target_id` skips this check.
+
+**Workaround:** Connect devices first, then constellations.
+
+**Security Consideration:**
+ The current implementation does **not** authenticate clients. Any client can register with any `client_id`. For production deployments:
+
+ - Implement authentication tokens in `metadata`
+ - Validate client certificates (TLS client auth)
+ - Use API keys or OAuth tokens
+ - Whitelist allowed `client_id` patterns
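+
+As one possible shape for the first suggestion, the sketch below checks a shared-secret token carried in `metadata`. The `auth_token` field name, the `expected_tokens` mapping, and the `is_authorized` function are assumptions made for illustration; nothing like this exists in the current handler.
+
+```python
+import hmac
+from typing import Any, Dict, Optional
+
+
+def is_authorized(
+    client_id: str,
+    metadata: Optional[Dict[str, Any]],
+    expected_tokens: Dict[str, str],
+) -> bool:
+    """Check a shared-secret token carried in registration metadata."""
+    expected = expected_tokens.get(client_id)
+    supplied = (metadata or {}).get("auth_token")  # hypothetical field name
+    if expected is None or not isinstance(supplied, str):
+        return False
+    # compare_digest avoids leaking how many leading characters matched.
+    return hmac.compare_digest(supplied.encode(), expected.encode())
+```
+
+A check like this would run before `add_client()`, so an unauthorized client is never registered.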
+
+---
+
+## 📨 Message Handling
+
+After registration, the handler enters a message loop, routing incoming client messages to specialized handlers based on message type.
+
+### Message Dispatcher
+
+```mermaid
+graph TB
+ WS[WebSocket receive_text] --> Parse[Parse ClientMessage JSON]
+ Parse --> Router{Message Type?}
+
+ Router -->|TASK| HT[handle_task_request]
+ Router -->|COMMAND_RESULTS| HC[handle_command_result]
+ Router -->|HEARTBEAT| HH[handle_heartbeat]
+ Router -->|ERROR| HE[handle_error]
+ Router -->|DEVICE_INFO_REQUEST| HD[handle_device_info_request]
+ Router -->|DEVICE_INFO_RESPONSE| HDR[handle_device_info_response]
+ Router -->|Unknown| HU[handle_unknown]
+
+ HT --> SM[Session Manager]
+ HC --> CD[Command Dispatcher]
+ HH --> HP[Heartbeat Protocol]
+ HE --> Log[Error Logging]
+ HD --> DIP[Device Info Protocol]
+
+ style Router fill:#ffe0b2
+ style SM fill:#c8e6c9
+ style HP fill:#bbdefb
+```
+
+**Dispatcher Implementation:**
+
+```python
+async def handle_message(self, msg: str, websocket: WebSocket) -> None:
+    """Dispatch incoming messages to specific handlers."""
+    try:
+        # Parse message using Pydantic model
+        data = ClientMessage.model_validate_json(msg)
+
+        client_id = data.client_id
+        client_type = data.client_type
+        msg_type = data.type
+
+        # Route to appropriate handler
+        if msg_type == ClientMessageType.TASK:
+            await self.handle_task_request(data, websocket)
+        elif msg_type == ClientMessageType.COMMAND_RESULTS:
+            await self.handle_command_result(data)
+        elif msg_type == ClientMessageType.HEARTBEAT:
+            await self.handle_heartbeat(data, websocket)
+        elif msg_type == ClientMessageType.ERROR:
+            await self.handle_error(data, websocket)
+        elif msg_type == ClientMessageType.DEVICE_INFO_REQUEST:
+            await self.handle_device_info_request(data, websocket)
+        elif msg_type == ClientMessageType.DEVICE_INFO_RESPONSE:
+            await self.handle_device_info_response(data, websocket)
+        else:
+            await self.handle_unknown(data, websocket)
+
+    except Exception as e:
+        self.logger.error(f"Error handling message: {e}")
+        try:
+            await self.task_protocol.send_error(str(e))
+        except (ConnectionError, IOError):
+            pass  # Connection already closed
+```
+
+**Message Type Handlers:**
+
+| Handler | Triggered By | Purpose | Response |
+|---------|-------------|---------|----------|
+| `handle_task_request` | `TASK` | Client requests task execution | ACK to requester; `TASK_ASSIGNMENT` to target device |
+| `handle_command_result` | `COMMAND_RESULTS` | Device reports command execution result | Unblock command dispatcher |
+| `handle_heartbeat` | `HEARTBEAT` | Connection health ping | `HEARTBEAT_ACK` |
+| `handle_error` | `ERROR` | Client reports error | Log + send error acknowledgment |
+| `handle_device_info_request` | `DEVICE_INFO_REQUEST` | Constellation queries device capabilities | `DEVICE_INFO_RESPONSE` |
+| `handle_device_info_response` | `DEVICE_INFO_RESPONSE` | Device provides info (pull model) | Store in ClientConnectionManager |
+| `handle_unknown` | Any other type | Unknown/unsupported message | Log warning + send error |
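+
+The `if`/`elif` chain in the dispatcher could also be written as a lookup table. The following is a sketch of that alternative, not the handler's implementation; `Router` and its plain-dict messages are hypothetical, and only two message types are registered to keep it short:
+
+```python
+import asyncio
+from typing import Awaitable, Callable, Dict, List
+
+Handler = Callable[[dict], Awaitable[None]]
+
+
+class Router:
+    """Maps message type strings to async handlers, with a fallback."""
+
+    def __init__(self) -> None:
+        self.log: List[str] = []
+        self.handlers: Dict[str, Handler] = {
+            "HEARTBEAT": self.handle_heartbeat,
+            "TASK": self.handle_task_request,
+        }
+
+    async def handle_heartbeat(self, msg: dict) -> None:
+        self.log.append("heartbeat")
+
+    async def handle_task_request(self, msg: dict) -> None:
+        self.log.append("task")
+
+    async def handle_unknown(self, msg: dict) -> None:
+        self.log.append("unknown")
+
+    async def dispatch(self, msg: dict) -> None:
+        # Unregistered types fall through to handle_unknown, like the else branch.
+        handler = self.handlers.get(msg.get("type", ""), self.handle_unknown)
+        await handler(msg)
+```
+
+With a table, adding a message type means adding one dictionary entry rather than another branch.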
+
+---
+
+### Task Request Handling
+
+The handler supports task requests from **both device clients** (self-execution) and **constellation clients** (orchestrated execution on target devices).
+
+**Task Request Flow:**
+
+```mermaid
+sequenceDiagram
+ participant C as Constellation
+ participant WH as WebSocket Handler
+ participant WSM as Client Connection Manager
+ participant SM as Session Manager
+ participant D as Device
+
+ Note over C,WH: 1️⃣ Task Request
+ C->>WH: TASK {request, target_id, session_id}
+ WH->>WH: Validate target_id
+
+ Note over WH,WSM: 2️⃣ Resolve Target Device
+ WH->>WSM: get_client(target_id)
+ WSM-->>WH: Device WebSocket
+ WH->>WSM: get_client_info(target_id)
+ WSM-->>WH: {platform: "windows"}
+
+ Note over WH,SM: 3️⃣ Create Session
+ WH->>SM: execute_task_async(session_id, request, target_ws, platform, callback)
+ SM-->>WH: session_id (non-blocking!)
+
+ Note over WH,C: 4️⃣ Immediate Acknowledgment
+ WH->>C: ACK {session_id, status: "dispatched"}
+
+ Note over SM,D: 5️⃣ Background Execution
+ SM->>D: TASK_ASSIGNMENT {request, session_id}
+ D->>D: Execute task (LLM + actions)
+
+ Note over D,WH: 6️⃣ Result Callback
+ D-->>SM: Task complete
+ SM->>WH: callback(session_id, result_msg)
+ WH->>C: TASK_END {status, result}
+```
+
+**Device Client Self-Execution:**
+
+When a **device** requests a task for itself:
+
+```python
+async def handle_task_request(self, data: ClientMessage, websocket: WebSocket):
+    client_id = data.client_id
+    client_type = data.client_type
+
+    if client_type == ClientType.DEVICE:
+        # Device executing task on itself
+        target_ws = websocket  # Use requesting client's WebSocket
+        platform = self.client_manager.get_client_info(client_id).platform
+        target_device_id = client_id
+        # ...
+```
+
+**Constellation Orchestrated Execution:**
+
+When a **constellation** dispatches a task to a target device:
+
+```python
+async def handle_task_request(self, data: ClientMessage, websocket: WebSocket):
+    client_id = data.client_id
+    client_type = data.client_type
+
+    if client_type == ClientType.CONSTELLATION:
+        # Constellation dispatching to target device
+        target_device_id = data.target_id
+        target_ws = self.client_manager.get_client(target_device_id)
+
+        # Validate target device exists before reading its info
+        if not target_ws:
+            raise ValueError(f"Target device '{target_device_id}' not connected")
+
+        platform = self.client_manager.get_client_info(target_device_id).platform
+
+        # Track session mappings
+        session_id = data.session_id or str(uuid.uuid4())
+        self.client_manager.add_constellation_session(client_id, session_id)
+        self.client_manager.add_device_session(target_device_id, session_id)
+        # ...
+```
+
+**Background Task Execution:**
+
+```python
+# Define callback for result delivery
+async def send_result(sid: str, result_msg: ServerMessage):
+    """Send result back to requester when task completes."""
+    # Send to constellation client
+    if client_type == ClientType.CONSTELLATION:
+        if websocket.client_state == WebSocketState.CONNECTED:
+            await websocket.send_text(result_msg.model_dump_json())
+
+        # Also send to target device (optional)
+        if target_ws and target_ws.client_state == WebSocketState.CONNECTED:
+            await target_ws.send_text(result_msg.model_dump_json())
+    else:
+        # Send to device client
+        if websocket.client_state == WebSocketState.CONNECTED:
+            await websocket.send_text(result_msg.model_dump_json())
+
+# Execute in background via SessionManager
+await self.session_manager.execute_task_async(
+    session_id=session_id,
+    task_name=task_name,
+    request=data.request,
+    websocket=target_ws,  # Device WebSocket for command dispatcher
+    platform_override=platform,
+    callback=send_result  # Called when task completes
+)
+
+# Send immediate acknowledgment (non-blocking)
+await self.task_protocol.send_ack(session_id=session_id)
+```
+
+**Why Immediate ACK?**
+
+The handler sends an **immediate ACK** after dispatching the task to the [SessionManager](./session_manager.md). This confirms:
+
+- Task was received and validated
+- Session was created successfully
+- Task is now executing in background
+
+The actual task result is delivered later via the `send_result` callback.
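+
+The ordering (ACK first, result later) can be reproduced with plain `asyncio`. In this sketch, `run_task`, `dispatch`, and the string messages are hypothetical stand-ins for the session, the handler, and AIP messages:
+
+```python
+import asyncio
+from typing import Awaitable, Callable, List
+
+Callback = Callable[[str, str], Awaitable[None]]
+
+
+async def run_task(session_id: str, callback: Callback) -> None:
+    await asyncio.sleep(0.01)  # stands in for the long-running session
+    await callback(session_id, "TASK_END")
+
+
+async def dispatch(session_id: str, events: List[str]) -> asyncio.Task:
+    async def send_result(sid: str, msg: str) -> None:
+        events.append(f"{msg}:{sid}")
+
+    # Schedule the work without awaiting it, then acknowledge right away.
+    task = asyncio.create_task(run_task(session_id, send_result))
+    events.append(f"ACK:{session_id}")
+    return task
+
+
+async def demo() -> List[str]:
+    events: List[str] = []
+    task = await dispatch("abc123", events)
+    await task  # a real server would keep a reference and move on
+    return events
+```
+
+Because `dispatch` never awaits the task itself, the acknowledgment is recorded before the simulated session finishes.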
+
+**Session Tracking:**
+
+| Client Type | Session Tracking | Purpose |
+|-------------|------------------|---------|
+| **Device** | `client_manager.add_device_session(device_id, session_id)` | Track which device is executing the session |
+| **Constellation** | `client_manager.add_constellation_session(constellation_id, session_id)` | Track which constellation requested the session |
+| Both | Session Manager stores the session's `BaseSession` object | Execute and manage task lifecycle |
+
+---
+
+### Command Result Handling
+
+When a device executes a command (e.g., "click button", "type text"), it sends results back to the server for processing by the session's command dispatcher.
+
+**Command Result Flow:**
+
+```mermaid
+sequenceDiagram
+ participant S as Session (on server)
+ participant CD as Command Dispatcher
+ participant D as Device
+ participant WH as WebSocket Handler
+
+ Note over S,D: Session is running on server
+ S->>CD: execute_command("open_notepad")
+ CD->>D: Send command via WebSocket (response_id="cmd_123")
+ CD->>CD: await response (blocking)
+
+ Note over D: Device executes command
+ D->>D: Open Notepad application
+ D->>D: Take screenshot
+
+ Note over D,WH: Send result back
+ D->>WH: COMMAND_RESULTS {response_id="cmd_123", result, screenshot}
+ WH->>WH: handle_command_result()
+ WH->>CD: set_result(response_id, data)
+
+ Note over CD: Unblocks await!
+ CD-->>S: Return command result
+ S->>S: Continue session execution
+```
+
+**Handler Implementation:**
+
+```python
+async def handle_command_result(self, data: ClientMessage):
+    """
+    Handle command execution results from devices.
+    Unblocks the command dispatcher waiting for this response.
+    """
+    response_id = data.prev_response_id  # ID of the command request
+    session_id = data.session_id
+
+    self.logger.debug(
+        f"[WS] Received command result for response_id={response_id}, "
+        f"session_id={session_id}"
+    )
+
+    # Get session's command dispatcher
+    session = self.session_manager.get_or_create_session(session_id)
+    command_dispatcher = session.context.command_dispatcher
+
+    # Set result (unblocks waiting dispatcher)
+    await command_dispatcher.set_result(response_id, data)
+
+    self.logger.debug(
+        f"[WS] Command result set for response_id={response_id}"
+    )
+
+**Critical for Session Execution:**
+
+Without proper command result handling, sessions would **hang indefinitely** waiting for device responses. The `set_result()` call is what unblocks the `await` in the command dispatcher.
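+
+The unblocking mechanism described above can be sketched with one `asyncio.Future` per pending command. This is a minimal illustration, not the project's actual dispatcher: the `PendingResults` class, its `wait_for` method, and the timeout value are hypothetical names chosen for this example, and the synchronous `set_result` here differs from the awaited call in the handler above.
+
+```python
+import asyncio
+
+
+class PendingResults:
+    """Hypothetical helper: maps each response_id to a Future the sender awaits."""
+
+    def __init__(self) -> None:
+        self._pending: dict[str, asyncio.Future] = {}
+
+    async def wait_for(self, response_id: str, timeout: float) -> object:
+        # Register the future before awaiting so set_result() can find it.
+        future = asyncio.get_running_loop().create_future()
+        self._pending[response_id] = future
+        try:
+            # The timeout bounds how long a lost result can stall the caller.
+            return await asyncio.wait_for(future, timeout)
+        finally:
+            self._pending.pop(response_id, None)
+
+    def set_result(self, response_id: str, data: object) -> bool:
+        future = self._pending.get(response_id)
+        if future is None or future.done():
+            return False  # Unknown or late result: nothing is waiting
+        future.set_result(data)  # Wakes the coroutine suspended in wait_for()
+        return True
+
+
+async def demo() -> object:
+    pending = PendingResults()
+    waiter = asyncio.create_task(pending.wait_for("cmd_123", timeout=1.0))
+    await asyncio.sleep(0)  # Let the waiter register its future
+    pending.set_result("cmd_123", {"status": "ok"})
+    return await waiter
+```
+
+Adding a timeout to the wait is one way to keep a lost result from hanging a session indefinitely; whether the real dispatcher does so is not shown in this guide.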
+
+---
+
+### Heartbeat Handling
+
+Heartbeats are lightweight ping/pong messages that ensure the WebSocket connection is alive and healthy.
+
+```python
+async def handle_heartbeat(self, data: ClientMessage, websocket: WebSocket) -> None:
+    """Handle heartbeat messages using AIP HeartbeatProtocol."""
+    self.logger.debug(f"[WS] [AIP] Heartbeat from {data.client_id}")
+
+    try:
+        # Send acknowledgment via AIP protocol
+        await self.heartbeat_protocol.send_heartbeat_ack()
+        self.logger.debug(f"[WS] [AIP] Heartbeat response sent to {data.client_id}")
+    except (ConnectionError, IOError) as e:
+        # Connection closed - log but don't fail
+        self.logger.debug(f"[WS] [AIP] Could not send heartbeat ack: {e}")
+```
+
+**Heartbeat Message:**
+
+```json
+{
+ "type": "HEARTBEAT",
+ "client_id": "device_windows_001",
+ "timestamp": "2024-11-04T14:30:22.123456+00:00"
+}
+```
+
+**Heartbeat ACK:**
+
+```json
+{
+ "type": "HEARTBEAT_ACK",
+ "timestamp": "2024-11-04T14:30:22.234567+00:00",
+ "response_id": "uuid-v4"
+}
+```
+
+**Heartbeat Best Practices:**
+
+- **Frequency:** Clients should send heartbeats every **30-60 seconds**
+- **Timeout:** Server should consider connection dead after **2-3 missed heartbeats**
+- **Lightweight:** Heartbeat messages are small and processed quickly
+- **Non-blocking:** Heartbeat handling doesn't block task execution
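+
+The frequency and timeout guidance above can be turned into a simple liveness check. The sketch below assumes a 30-second interval and a three-miss threshold; `HeartbeatMonitor` and its methods are hypothetical and are not part of the handler shown in this guide.
+
+```python
+import time
+from typing import Optional
+
+
+class HeartbeatMonitor:
+    """Hypothetical tracker: a client is dead after `max_missed` silent intervals."""
+
+    def __init__(self, interval_s: float = 30.0, max_missed: int = 3) -> None:
+        self.interval_s = interval_s
+        self.max_missed = max_missed
+        self._last_seen: dict[str, float] = {}
+
+    def record(self, client_id: str, now: Optional[float] = None) -> None:
+        # A monotonic clock is not affected by wall-clock adjustments.
+        self._last_seen[client_id] = time.monotonic() if now is None else now
+
+    def is_alive(self, client_id: str, now: Optional[float] = None) -> bool:
+        last = self._last_seen.get(client_id)
+        if last is None:
+            return False  # Never heard from this client
+        current = time.monotonic() if now is None else now
+        return (current - last) <= self.interval_s * self.max_missed
+```
+
+The optional `now` argument exists only to make the check easy to test with fixed timestamps.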
+
+---
+
+### Device Info Handling
+
+Constellations can query device capabilities (screen resolution, installed apps, OS version) to make intelligent task routing decisions.
+
+**Device Info Request Flow:**
+
+```mermaid
+sequenceDiagram
+ participant C as Constellation
+ participant WH as WebSocket Handler
+ participant WSM as Client Connection Manager
+ participant DIP as Device Info Protocol
+ participant D as Device
+
+ Note over C,WH: 1️⃣ Request Device Info
+ C->>WH: DEVICE_INFO_REQUEST {target_id, request_id}
+
+ Note over WH,WSM: 2️⃣ Resolve Device
+ WH->>WSM: get_client(target_id)
+ WSM-->>WH: Device WebSocket
+
+ Note over WH,D: 3️⃣ Forward Request
+ WH->>DIP: send_device_info_request()
+ DIP->>D: DEVICE_INFO_REQUEST
+
+ Note over D: 4️⃣ Collect Info
+ D->>D: Gather system info (screen, OS, apps)
+
+ Note over D,WH: 5️⃣ Response
+ D->>DIP: DEVICE_INFO_RESPONSE {screen_res, os_version, ...}
+ DIP->>WH: Parse response
+
+ Note over WH,C: 6️⃣ Forward to Constellation
+ WH->>C: DEVICE_INFO_RESPONSE {device_info, request_id}
+```
+
+```python
+async def handle_device_info_request(
+    self, data: ClientMessage, websocket: WebSocket
+) -> None:
+    """Handle device info requests from constellations."""
+    device_id = data.target_id
+    request_id = data.request_id
+
+    self.logger.info(
+        f"[WS] Constellation {data.client_id} requesting info for device {device_id}"
+    )
+
+    # Get device info (may involve querying the device)
+    device_info = await self.get_device_info(device_id)
+
+    # Send via AIP protocol
+    await self.device_info_protocol.send_device_info_response(
+        device_info=device_info,
+        request_id=request_id
+    )
+```
+
+**Device Info Structure:**
+
+```json
+{
+ "device_id": "device_windows_001",
+ "platform": "windows",
+ "os_version": "Windows 11 Pro 22H2",
+ "screen_resolution": "1920x1080",
+ "installed_applications": ["Chrome", "Excel", "Notepad", "..."],
+ "capabilities": ["ui_automation", "file_operations", "web_browsing"],
+ "cpu_cores": 8,
+ "memory_gb": 16
+}
+```
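+
+As an illustration of how a constellation might use this structure for routing, the sketch below picks the first device that offers every required capability and enough memory. The `pick_device` function and its selection rule are assumptions made for this example; only the field names come from the structure above.
+
+```python
+from typing import Iterable, Optional
+
+
+def pick_device(
+    devices: Iterable[dict], required: set[str], min_memory_gb: int = 0
+) -> Optional[str]:
+    """Return the device_id of the first device meeting all requirements."""
+    for info in devices:
+        has_caps = required.issubset(info.get("capabilities", []))
+        has_memory = info.get("memory_gb", 0) >= min_memory_gb
+        if has_caps and has_memory:
+            return info["device_id"]
+    return None  # No connected device qualifies
+
+
+devices = [
+    {"device_id": "device_linux_002", "capabilities": ["file_operations"], "memory_gb": 8},
+    {
+        "device_id": "device_windows_001",
+        "capabilities": ["ui_automation", "file_operations", "web_browsing"],
+        "memory_gb": 16,
+    },
+]
+chosen = pick_device(devices, {"ui_automation", "web_browsing"})
+```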
+
+---
+
+## 🔌 Client Disconnection
+
+**Critical Cleanup Process:**
+
+When a client disconnects (gracefully or abruptly), the handler must clean up sessions, remove registry entries, and prevent resource leaks.
+
+### Disconnection Detection
+
+```python
+async def handler(self, websocket: WebSocket) -> None:
+    """FastAPI WebSocket entry point."""
+    client_id = None
+
+    try:
+        # Registration
+        client_id = await self.connect(websocket)
+
+        # Message loop
+        while True:
+            msg = await websocket.receive_text()
+            asyncio.create_task(self.handle_message(msg, websocket))
+
+    except WebSocketDisconnect as e:
+        # Normal disconnection
+        self.logger.warning(
+            f"[WS] {client_id} disconnected code={e.code}, reason={e.reason}"
+        )
+        if client_id:
+            await self.disconnect(client_id)
+
+    except Exception as e:
+        # Unexpected error
+        self.logger.error(f"[WS] Error with client {client_id}: {e}")
+        if client_id:
+            await self.disconnect(client_id)
+```
+
+### Cleanup Process
+
+```mermaid
+graph TD
+ A[Client Disconnects] --> B{Get Client Info}
+ B --> C{Client Type?}
+
+ C -->|Device| D[Get Device Sessions]
+ C -->|Constellation| E[Get Constellation Sessions]
+
+ D --> F[Cancel Each Session reason='device_disconnected']
+ E --> G[Cancel Each Session reason='constellation_disconnected']
+
+ F --> H[Remove Device Session Mappings]
+ G --> I[Remove Constellation Session Mappings]
+
+ H --> J[Remove Client from ClientConnectionManager]
+ I --> J
+
+ J --> K[Log Disconnection]
+ K --> L[Cleanup Complete]
+
+ style F fill:#ffcdd2
+ style G fill:#ffcdd2
+ style J fill:#c8e6c9
+```
+
+**Device Client Cleanup:**
+
+```python
+async def disconnect(self, client_id: str) -> None:
+    """Handle client disconnection and cleanup."""
+    client_info = self.client_manager.get_client_info(client_id)
+
+    if client_info and client_info.client_type == ClientType.DEVICE:
+        # Get all sessions running on this device
+        session_ids = self.client_manager.get_device_sessions(client_id)
+
+        if session_ids:
+            self.logger.info(
+                f"[WS] 📱 Device {client_id} disconnected, "
+                f"cancelling {len(session_ids)} active session(s)"
+            )
+
+            # Cancel all sessions
+            for session_id in session_ids:
+                try:
+                    await self.session_manager.cancel_task(
+                        session_id,
+                        reason="device_disconnected"  # Send callback to constellation
+                    )
+                except Exception as e:
+                    self.logger.error(f"Error cancelling session {session_id}: {e}")
+
+            # Clean up mappings
+            self.client_manager.remove_device_sessions(client_id)
+```
+
+**Constellation Client Cleanup:**
+
+```python
+if client_info and client_info.client_type == ClientType.CONSTELLATION:
+    # Get all sessions initiated by constellation
+    session_ids = self.client_manager.get_constellation_sessions(client_id)
+
+    if session_ids:
+        self.logger.info(
+            f"[WS] 🌟 Constellation {client_id} disconnected, "
+            f"cancelling {len(session_ids)} active session(s)"
+        )
+
+        # Cancel all associated sessions
+        for session_id in session_ids:
+            try:
+                await self.session_manager.cancel_task(
+                    session_id,
+                    reason="constellation_disconnected"  # Don't send callback
+                )
+            except Exception as e:
+                self.logger.error(f"Error cancelling session {session_id}: {e}")
+
+        # Clean up mappings
+        self.client_manager.remove_constellation_sessions(client_id)
+```
+
+**Final Registry Cleanup:**
+
+```python
+# Remove client from registry
+self.client_manager.remove_client(client_id)
+self.logger.info(f"[WS] {client_id} disconnected")
+```
+
+### Cancellation Behavior Comparison
+
+| Scenario | Cancellation Reason | Callback Sent? | Why? |
+|----------|---------------------|----------------|------|
+| **Device Disconnects** | `device_disconnected` | Yes, to the constellation | Notify orchestrator to reassign task |
+| **Constellation Disconnects** | `constellation_disconnected` | No | Requester is gone, no one to notify |
+
+**Proper Cleanup is Critical:**
+
+Failing to clean up disconnected clients leads to:
+
+- **Orphaned sessions** consuming server memory
+- **Stale WebSocket references** causing errors
+- **Registry pollution** with non-existent clients
+- **Resource leaks** (file handles, memory)
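+
+The cleanup order above (look up the client, collect its sessions with the right cancellation reason, then drop every mapping) can be sketched with a small in-memory registry. `SessionRegistry` is a hypothetical stand-in for the client connection manager, and returning the sessions to cancel instead of cancelling them directly is a simplification for this example.
+
+```python
+from collections import defaultdict
+
+
+class SessionRegistry:
+    """Hypothetical registry: tracks client types and their session IDs."""
+
+    def __init__(self) -> None:
+        self.clients: dict[str, str] = {}  # client_id -> "device" or "constellation"
+        self.sessions: dict[str, set[str]] = defaultdict(set)
+
+    def add_client(self, client_id: str, client_type: str) -> None:
+        self.clients[client_id] = client_type
+
+    def add_session(self, client_id: str, session_id: str) -> None:
+        self.sessions[client_id].add(session_id)
+
+    def disconnect(self, client_id: str) -> list[tuple[str, str]]:
+        """Remove the client and return (session_id, reason) pairs to cancel."""
+        client_type = self.clients.pop(client_id, None)
+        if client_type is None:
+            return []  # Already cleaned up, so a second call is harmless
+        reason = f"{client_type}_disconnected"
+        session_ids = self.sessions.pop(client_id, set())
+        return [(sid, reason) for sid in sorted(session_ids)]
+```
+
+Making `disconnect` safe to call twice is a deliberate choice here, since a handler may reach its cleanup path from more than one `except` branch.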
+
+---
+
+## 🚨 Error Handling
+
+The handler catches errors at each stage (sending, parsing, validation, task execution, and callbacks) so that a failure on one connection or message does not cascade and bring down the server.
+
+### Error Categories
+
+| Error Type | Handler Location | Recovery Strategy |
+|------------|------------------|-------------------|
+| **Connection Errors** | `send_*` methods | Log and skip (connection already closed) |
+| **Message Parsing Errors** | `handle_message` | Send error response via AIP |
+| **Task Execution Errors** | `handle_task_request` | Log + send error via task protocol |
+| **Validation Errors** | `_validate_*` methods | Send error + close connection |
+| **Callback Errors** | Session Manager | Log but don't fail session |
+
+### Connection Error Handling
+
+```python
+async def handle_heartbeat(self, data: ClientMessage, websocket: WebSocket):
+    try:
+        await self.heartbeat_protocol.send_heartbeat_ack()
+    except (ConnectionError, IOError) as e:
+        # Connection closed - log but don't fail
+        self.logger.debug(f"Could not send heartbeat ack: {e}")
+        # Don't raise - connection is already closed
+```
+
+**Why Catch and Ignore?**
+
+When a connection is abruptly closed, attempts to send messages will raise `ConnectionError`. Since the client is already gone, there's no point in propagating the error—just log it and continue cleanup.
+
+### Message Parsing Errors
+
+```python
+async def handle_message(self, msg: str, websocket: WebSocket):
+    try:
+        data = ClientMessage.model_validate_json(msg)
+        # ... route to handlers ...
+
+    except Exception as e:
+        import traceback
+        traceback.print_exc()
+        self.logger.error(f"Error handling message: {e}")
+
+        # Try to send error response
+        try:
+            await self.task_protocol.send_error(str(e))
+        except (ConnectionError, IOError) as send_error:
+            self.logger.debug(f"Could not send error response: {send_error}")
+```
+
+**Error Message Format:**
+
+```json
+{
+ "type": "ERROR",
+ "error": "Invalid message format: missing required field 'client_id'",
+ "timestamp": "2024-11-04T14:30:22.123456+00:00",
+ "response_id": "uuid-v4"
+}
+```
+
+### Task Execution Errors
+
+```python
+async def handle_task_request(self, data: ClientMessage, websocket: WebSocket):
+    # Fields used below, read from the incoming message
+    client_type = data.client_type
+    target_device_id = data.target_id
+
+    try:
+        # Validate target device
+        if client_type == ClientType.CONSTELLATION:
+            target_ws = self.client_manager.get_client(target_device_id)
+            if not target_ws:
+                raise ValueError(f"Target device '{target_device_id}' not connected")
+
+        # Execute task
+        await self.session_manager.execute_task_async(...)
+
+    except Exception as e:
+        self.logger.error(f"Error handling task: {e}")
+        await self.task_protocol.send_error(str(e))
+```
+
+### Validation Errors with Connection Closure
+
+```python
+async def _validate_constellation_client(self, reg_info: ClientMessage) -> None:
+    """Validate constellation's target device."""
+    claimed_device_id = reg_info.target_id
+
+    if not self.client_manager.is_device_connected(claimed_device_id):
+        error_msg = f"Target device '{claimed_device_id}' is not connected"
+        self.logger.warning(f"Constellation registration failed: {error_msg}")
+
+        # Send error via AIP protocol
+        await self._send_error_response(error_msg)
+
+        # Close connection immediately
+        await self.transport.close()
+
+        # Raise to prevent further processing
+        raise ValueError(error_msg)
+```
+
+**When to Close Connections:**
+
+Close connections immediately for:
+
+- **Invalid registration** (missing client_id, wrong message type)
+- **Authorization failures** (target device not connected for constellations)
+- **Protocol violations** (sending TASK before REGISTER)
+
+For other errors, log and send error messages but **keep connection alive**.
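+
+The close-or-keep rule above can be expressed as a small lookup. The `ErrorKind` enum and `should_close` function are hypothetical names for this sketch; the handler in this guide makes the same decision inline rather than through a shared helper.
+
+```python
+from enum import Enum
+
+
+class ErrorKind(Enum):
+    INVALID_REGISTRATION = "invalid_registration"
+    AUTHORIZATION_FAILURE = "authorization_failure"
+    PROTOCOL_VIOLATION = "protocol_violation"
+    MESSAGE_PARSING = "message_parsing"
+    TASK_EXECUTION = "task_execution"
+
+
+# Errors after which the connection cannot be trusted any further
+_FATAL = frozenset(
+    {
+        ErrorKind.INVALID_REGISTRATION,
+        ErrorKind.AUTHORIZATION_FAILURE,
+        ErrorKind.PROTOCOL_VIOLATION,
+    }
+)
+
+
+def should_close(kind: ErrorKind) -> bool:
+    """Return True when the connection should be closed after reporting the error."""
+    return kind in _FATAL
+```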
+
+---
+
+## Best Practices
+
+### 1. Validate Early and Thoroughly
+
+```python
+# Good: Validate immediately after parsing
+async def handle_task_request(self, data: ClientMessage, websocket: WebSocket):
+    if not data.request:
+        raise ValueError("Task request cannot be empty")
+    if not data.client_id:
+        raise ValueError("Client ID required")
+    if data.client_type == ClientType.CONSTELLATION and not data.target_id:
+        raise ValueError("Constellation must specify target_id")
+    # ... proceed with validated data ...
+```
+
+### 2. Always Check Connection State Before Sending
+
+```python
+from starlette.websockets import WebSocketState
+
+# Good: Check state before sending
+# (assumed to be a nested callback, so `self` and `websocket` come from the enclosing handler method)
+async def send_result(sid: str, result_msg: ServerMessage):
+    if websocket.client_state == WebSocketState.CONNECTED:
+        await websocket.send_text(result_msg.model_dump_json())
+    else:
+        self.logger.debug(f"Cannot send result, connection closed for {sid}")
+```
+
+**WebSocket States:**
+
+| State | Description | Can Send? |
+|-------|-------------|-----------|
+| `CONNECTING` | Handshake in progress | No |
+| `CONNECTED` | Active connection | Yes |
+| `DISCONNECTED` | Connection closed | No |
+
+### 3. Handle Cancellation Gracefully with Context
+
+```python
+# Good: Different reasons need different handling
+async def disconnect(self, client_id: str):
+    client_info = self.client_manager.get_client_info(client_id)
+    if client_info is None:
+        return  # Unknown or already-removed client
+
+    if client_info.client_type == ClientType.CONSTELLATION:
+        reason = "constellation_disconnected"  # Don't send callback
+        session_ids = self.client_manager.get_constellation_sessions(client_id)
+    else:
+        reason = "device_disconnected"  # Send callback to constellation
+        session_ids = self.client_manager.get_device_sessions(client_id)
+
+    for session_id in session_ids:
+        await self.session_manager.cancel_task(session_id, reason=reason)
+```
+
+### 4. Use Structured Logging with Context
+
+```python
+# Good: Include client type and context
+if client_type == ClientType.CONSTELLATION:
+    self.logger.info(
+        f"[WS] 🌟 Constellation {client_id} requesting task on {target_id}"
+    )
+else:
+    self.logger.debug(
+        f"[WS] 📱 Received device message from {client_id}, type: {data.type}"
+    )
+```
+
+**Logging Levels:**
+
+- `DEBUG`: Heartbeats, message routing, low-level protocol details
+- `INFO`: Registration, disconnection, task dispatch, major lifecycle events
+- `WARNING`: Validation failures, connection issues, recoverable errors
+- `ERROR`: Unexpected exceptions, critical failures
+
+### 5. Implement Async Message Handling
+
+```python
+# Good: Process messages in background tasks
+async def handler(self, websocket: WebSocket):
+    while True:
+        msg = await websocket.receive_text()
+        asyncio.create_task(self.handle_message(msg, websocket))
+        # Loop continues immediately, doesn't wait for handler to finish
+```
+
+**Why `asyncio.create_task`?**
+
+Without `create_task`, the handler would process messages **sequentially**, blocking new messages while handling the current one. This is problematic for:
+
+- Long-running task dispatches
+- Command result processing
+- Device info queries
+
+Background tasks allow **concurrent message processing** while keeping the receive loop responsive.
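+
+The effect of concurrent processing can be seen in a standalone sketch. The `handle_message` and `receive_loop` functions below are simplified stand-ins, not the handler's real methods. The example also keeps a reference to each task, since the `asyncio` documentation advises saving the result of `create_task` so that a task is not garbage-collected mid-execution.
+
+```python
+import asyncio
+
+
+async def handle_message(msg: str, done: list[str]) -> None:
+    # Pretend "slow" messages take longer to process than the others.
+    await asyncio.sleep(0.05 if msg == "slow" else 0.0)
+    done.append(msg)
+
+
+async def receive_loop(messages: list[str]) -> list[str]:
+    done: list[str] = []
+    tasks: set[asyncio.Task] = set()
+    for msg in messages:
+        task = asyncio.create_task(handle_message(msg, done))
+        tasks.add(task)  # Keep a strong reference until the task finishes
+        task.add_done_callback(tasks.discard)
+    await asyncio.gather(*tasks)  # Only needed here so the demo can return
+    return done
+
+
+completion_order = asyncio.run(receive_loop(["slow", "fast"]))
+```
+
+With sequential processing the slow message would finish first; with background tasks the fast one completes while the slow one is still waiting.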
+
+---
+
+## 📚 Related Documentation
+
+Explore related components to understand the full server architecture:
+
+| Component | Purpose | Link |
+|-----------|---------|------|
+| **Server Overview** | High-level architecture and capabilities | [Overview](./overview.md) |
+| **Quick Start** | Start server and dispatch first task | [Quick Start](./quick_start.md) |
+| **Session Manager** | Session lifecycle and background execution | [Session Manager](./session_manager.md) |
+| **Client Connection Manager** | Connection registry and session tracking | [Client Connection Manager](./client_connection_manager.md) |
+| **HTTP API** | RESTful API endpoints | [API Reference](./api.md) |
+| **AIP Protocol** | Agent Interaction Protocol details | [AIP Overview](../aip/overview.md) |
+
+---
+
+## 🎓 What You Learned
+
+After reading this guide, you should understand:
+
+- **AIP Protocol Integration** - Four specialized protocols handle different communication aspects
+- **Registration Flow** - Validation → Registration → Confirmation
+- **Message Routing** - Central dispatcher routes messages to specialized handlers
+- **Dual Client Support** - Devices (self-execution) vs. Constellations (orchestration)
+- **Background Task Dispatch** - Immediate ACK + async execution
+- **Command Result Handling** - Unblocks command dispatcher waiting for device responses
+- **Heartbeat Monitoring** - Lightweight connection health checks
+- **Disconnection Cleanup** - Context-aware session cancellation and registry cleanup
+- **Error Handling** - Graceful degradation without cascading failures
+
+**Next Steps:**
+
+- Explore [Session Manager](./session_manager.md) to understand background execution internals
+- Learn about [Client Connection Manager](./client_connection_manager.md) for client registry management
+- Review [AIP Protocol Documentation](../aip/overview.md) for message format specifications
+
diff --git a/documents/docs/supported_models/azure_openai.md b/documents/docs/supported_models/azure_openai.md
deleted file mode 100644
index 964f9c387..000000000
--- a/documents/docs/supported_models/azure_openai.md
+++ /dev/null
@@ -1,31 +0,0 @@
-# Azure OpenAI (AOAI)
-
-## Step 1
-To use the Azure OpenAI API, you need to create an account on the [Azure OpenAI website](https://azure.microsoft.com/en-us/products/ai-services/openai-service). After creating an account, you can deploy the AOAI API and access the API key.
-
-## Step 2
-After obtaining the API key, you can configure the `HOST_AGENT` and `APP_AGENT` in the `config.yaml` file (rename the `config_template.yaml` file to `config.yaml`) to use the Azure OpenAI API. The following is an example configuration for the Azure OpenAI API:
-
-```yaml
-VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
-API_TYPE: "aoai" , # The API type, "openai" for the OpenAI API, "aoai" for the AOAI API, 'azure_ad' for the ad authority of the AOAI API.
-API_BASE: "YOUR_ENDPOINT", # The AOAI API address. Format: https://{your-resource-name}.openai.azure.com
-API_KEY: "YOUR_KEY", # The aoai API key
-API_VERSION: "2024-02-15-preview", # The version of the API, "2024-02-15-preview" by default
-API_MODEL: "gpt-4-vision-preview", # The OpenAI model name, "gpt-4-vision-preview" by default. You may also use "gpt-4o" for using the GPT-4O model.
-API_DEPLOYMENT_ID: "YOUR_AOAI_DEPLOYMENT", # The deployment id for the AOAI API
-```
-
-If you want to use AAD for authentication, you should also set the following configuration:
-
-```yaml
- AAD_TENANT_ID: "YOUR_TENANT_ID", # Set the value to your tenant id for the llm model
- AAD_API_SCOPE: "YOUR_SCOPE", # Set the value to your scope for the llm model
- AAD_API_SCOPE_BASE: "YOUR_SCOPE_BASE" # Set the value to your scope base for the llm model, whose format is API://YOUR_SCOPE_BASE, and the only need is the YOUR_SCOPE_BASE
-```
-
-!!! tip
- If you set `VISUAL_MODE` to `True`, make sure the `API_DEPLOYMENT_ID` supports visual inputs.
-
-## Step 3
-After configuring the `HOST_AGENT` and `APP_AGENT` with the OpenAI API, you can start using UFO to interact with the AOAI API for various tasks on Windows OS. Please refer to the [Quick Start Guide](../getting_started/quick_start.md) for more details on how to get started with UFO.
\ No newline at end of file
diff --git a/documents/docs/supported_models/claude.md b/documents/docs/supported_models/claude.md
deleted file mode 100644
index 08da73916..000000000
--- a/documents/docs/supported_models/claude.md
+++ /dev/null
@@ -1,29 +0,0 @@
-# Anthropic Claude
-
-## Step 1
-To use the Claude API, you need to create an account on the [Claude website](https://www.anthropic.com/) and access the API key.
-
-## Step 2
-You may need to install additional dependencies to use the Claude API. You can install the dependencies using the following command:
-
-```bash
-pip install -U anthropic==0.37.1
-```
-
-## Step 3
-Configure the `HOST_AGENT` and `APP_AGENT` in the `config.yaml` file (rename the `config_template.yaml` file to `config.yaml`) to use the Claude API. The following is an example configuration for the Claude API:
-
-```yaml
-VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
-API_TYPE: "Claude" ,
-API_KEY: "YOUR_KEY",
-API_MODEL: "YOUR_MODEL"
-```
-
-!!! tip
- If you set `VISUAL_MODE` to `True`, make sure the `API_MODEL` supports visual inputs.
-!!! tip
- `API_MODEL` is the model name of Claude LLM API. You can find the model name in the [Claude LLM model](https://www.anthropic.com/pricing#anthropic-api) list.
-
-## Step 4
-After configuring the `HOST_AGENT` and `APP_AGENT` with the Claude API, you can start using UFO to interact with the Claude API for various tasks on Windows OS. Please refer to the [Quick Start Guide](../getting_started/quick_start.md) for more details on how to get started with UFO.
\ No newline at end of file
diff --git a/documents/docs/supported_models/custom_model.md b/documents/docs/supported_models/custom_model.md
deleted file mode 100644
index a82485e5c..000000000
--- a/documents/docs/supported_models/custom_model.md
+++ /dev/null
@@ -1,49 +0,0 @@
-# Customized LLM Models
-
-We support and welcome the integration of custom LLM models in UFO. If you have a custom LLM model that you would like to use with UFO, you can follow the steps below to configure the model in UFO.
-
-## Step 1
- Create a custom LLM model and serve it on your local environment.
-
-## Step 2
- Create a python script under the `ufo/llm` directory, and implement your own LLM model class by inheriting the `BaseService` class in the `ufo/llm/base.py` file. We leave a `PlaceHolderService` class in the `ufo/llm/placeholder.py` file as an example. You must implement the `chat_completion` method in your LLM model class to accept a list of messages and return a list of completions for each message.
-
-```python
-def chat_completion(
- self,
- messages,
- n,
- temperature: Optional[float] = None,
- max_tokens: Optional[int] = None,
- top_p: Optional[float] = None,
- **kwargs: Any,
-):
- """
- Generates completions for a given list of messages.
- Args:
- messages (List[str]): The list of messages to generate completions for.
- n (int): The number of completions to generate for each message.
- temperature (float, optional): Controls the randomness of the generated completions. Higher values (e.g., 0.8) make the completions more random, while lower values (e.g., 0.2) make the completions more focused and deterministic. If not provided, the default value from the model configuration will be used.
- max_tokens (int, optional): The maximum number of tokens in the generated completions. If not provided, the default value from the model configuration will be used.
- top_p (float, optional): Controls the diversity of the generated completions. Higher values (e.g., 0.8) make the completions more diverse, while lower values (e.g., 0.2) make the completions more focused. If not provided, the default value from the model configuration will be used.
- **kwargs: Additional keyword arguments to be passed to the underlying completion method.
- Returns:
- List[str], None:A list of generated completions for each message and the cost set to be None.
- Raises:
- Exception: If an error occurs while making the API request.
- """
- pass
-```
-
-## Step 3
-After implementing the LLM model class, you can configure the `HOST_AGENT` and `APP_AGENT` in the `config.yaml` file (rename the `config_template.yaml` file to `config.yaml`) to use the custom LLM model. The following is an example configuration for the custom LLM model:
-
-```yaml
-VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
-API_TYPE: "custom_model" , # The API type, "openai" for the OpenAI API, "aoai" for the AOAI API, 'azure_ad' for the ad authority of the AOAI API.
-API_BASE: "YOUR_ENDPOINT", # The custom LLM API address.
-API_MODEL: "YOUR_MODEL", # The custom LLM model name.
-```
-
-## Step 4
-After configuring the `HOST_AGENT` and `APP_AGENT` with the custom LLM model, you can start using UFO to interact with the custom LLM model for various tasks on Windows OS. Please refer to the [Quick Start Guide](../getting_started/quick_start.md) for more details on how to get started with UFO.
\ No newline at end of file
diff --git a/documents/docs/supported_models/deepseek.md b/documents/docs/supported_models/deepseek.md
deleted file mode 100644
index 8cc10fc95..000000000
--- a/documents/docs/supported_models/deepseek.md
+++ /dev/null
@@ -1,20 +0,0 @@
-# DeepSeek Model
-
-## Step 1
-To use the DeepSeek models, go to [DeepSeek](https://www.deepseek.com/), register an account, and get the API key.
-
-## Step 2
-Configure the `HOST_AGENT` and `APP_AGENT` in the `config.yaml` file (rename the `config_template.yaml` file to `config.yaml`) to use the DeepSeek model. The following is an example configuration for the DeepSeek model:
-
-```yaml
- VISUAL_MODE: False, # Whether to use visual mode to understand screenshots and take actions
- API_TYPE: "deepseek" , # The API type, "deepseek" for the DeepSeek model.
- API_KEY: "YOUR_KEY", # The DeepSeek API key
- API_MODEL: "YOUR_MODEL" # The DeepSeek model name
-```
-
-!!! tip
- Most DeepSeek models don't support visual inputs; remember to set `VISUAL_MODE` to `False`.
-
-## Step 3
-After configuring the `HOST_AGENT` and `APP_AGENT` with the DeepSeek model, you can start using UFO to interact with the DeepSeek model for various tasks on Windows OS. Please refer to the [Quick Start Guide](../getting_started/quick_start.md) for more details on how to get started with UFO.
diff --git a/documents/docs/supported_models/gemini.md b/documents/docs/supported_models/gemini.md
deleted file mode 100644
index ef352eadb..000000000
--- a/documents/docs/supported_models/gemini.md
+++ /dev/null
@@ -1,29 +0,0 @@
-# Google Gemini
-
-## Step 1
-To use the Google Gemini API, you need to create an account on the [Google Gemini website](https://ai.google.dev/) and access the API key.
-
-## Step 2
-You may need to install additional dependencies to use the Google Gemini API. You can install the dependencies using the following command:
-
-```bash
-pip install -U google-genai==1.12.1
-```
-
-## Step 3
-Configure the `HOST_AGENT` and `APP_AGENT` in the `config.yaml` file (rename the `config_template.yaml` file to `config.yaml`) to use the Google Gemini API. The following is an example configuration for the Google Gemini API:
-
-```yaml
-VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
-API_TYPE: "Gemini" ,
-API_KEY: "YOUR_KEY",
-API_MODEL: "YOUR_MODEL"
-```
-
-!!! tip
- If you set `VISUAL_MODE` to `True`, make sure the `API_MODEL` supports visual inputs.
-!!! tip
- `API_MODEL` is the model name of the Gemini LLM API. You can find the model name in the [Gemini LLM model](https://ai.google.dev/gemini-api) list. If you encounter the error `429 Resource has been exhausted (e.g. check quota).`, it may be caused by the rate limit of your Gemini API.
-
-## Step 4
-After configuring the `HOST_AGENT` and `APP_AGENT` with the Gemini API, you can start using UFO to interact with the Gemini API for various tasks on Windows OS. Please refer to the [Quick Start Guide](../getting_started/quick_start.md) for more details on how to get started with UFO.
\ No newline at end of file
diff --git a/documents/docs/supported_models/ollama.md b/documents/docs/supported_models/ollama.md
deleted file mode 100644
index 05ee9e1d2..000000000
--- a/documents/docs/supported_models/ollama.md
+++ /dev/null
@@ -1,48 +0,0 @@
-# Ollama
-
-## Step 1
-If you want to use the Ollama model, Go to [Ollama](https://github.com/jmorganca/ollama) and follow the instructions to serve a LLM model on your local environment. We provide a short example to show how to configure the ollama in the following, which might change if ollama makes updates.
-
-```bash
-## Install ollama on Linux & WSL2
-curl https://ollama.ai/install.sh | sh
-## Run the serving
-ollama serve
-```
-
-## Step 2
-Open another terminal and run the following command to test the ollama model:
-
-```bash
-ollama run YOUR_MODEL
-```
-
-!!!info
- When serving LLMs via Ollama, it will by default start a server at `http://localhost:11434`, which will later be used as the API base in `config.yaml`.
-
-## Step 3
-After obtaining the API key, you can configure the `HOST_AGENT` and `APP_AGENT` in the `config.yaml` file (rename the `config_template.yaml` file to `config.yaml`) to use the Ollama API. The following is an example configuration for the Ollama API:
-
-```yaml
-VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
-API_TYPE: "ollama" ,
-API_BASE: "YOUR_ENDPOINT",
-API_KEY: "ollama", # not used but required
-API_MODEL: "YOUR_MODEL"
-```
-
-
-!!! tip
- `API_BASE` is the URL started in the Ollama LLM server and `API_MODEL` is the model name of Ollama LLM, it should be same as the one you served before. In addition, due to model token limitations, you can use lite version of prompt to have a taste on UFO which can be configured in `config_dev.yaml`.
-
-!!! note
- To run UFO successfully with Ollama, you must increase the default token limit of 2048 tokens by creating a custom model with a modified Modelfile. Create a new Modelfile that specifies `PARAMETER num_ctx 32768` (or your model's maximum context length), then build your custom model with `ollama create [model]-max-ctx -f Modelfile`. UFO requires at least 20,000 tokens to function properly, so setting the `num_ctx` parameter to your model's maximum supported context length will ensure optimal performance. For more details on Modelfile configuration, refer to [Ollama's official documentation](https://github.com/ollama/ollama/blob/main/docs/modelfile.md).
-
-!!! tip
- If you set `VISUAL_MODE` to `True`, make sure the `API_MODEL` supports visual inputs.
-
-## Step 4
-After configuring the `HOST_AGENT` and `APP_AGENT` with the Ollama API, you can start using UFO to interact with the Ollama API for various tasks on Windows OS. Please refer to the [Quick Start Guide](../getting_started/quick_start.md) for more details on how to get started with UFO.
-
-
-
diff --git a/documents/docs/supported_models/openai.md b/documents/docs/supported_models/openai.md
deleted file mode 100644
index 8e704c657..000000000
--- a/documents/docs/supported_models/openai.md
+++ /dev/null
@@ -1,26 +0,0 @@
-# OpenAI
-
-## Step 1
-
-To use the OpenAI API, you need to create an account on the [OpenAI website](https://platform.openai.com/signup). After creating an account, you can access the API key from the [API keys page](https://platform.openai.com/account/api-keys).
-
-## Step 2
-
-After obtaining the API key, you can configure the `HOST_AGENT` and `APP_AGENT` in the `config.yaml` file (rename the `config_template.yaml` file to `config.yaml`) to use the OpenAI API. The following is an example configuration for the OpenAI API:
-
-```yaml
-VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
-API_TYPE: "openai" , # The API type, "openai" for the OpenAI API, "aoai" for the AOAI API, 'azure_ad' for the ad authority of the AOAI API.
-API_BASE: "https://api.openai.com/v1/chat/completions", # The the OpenAI API endpoint, "https://api.openai.com/v1/chat/completions" for the OpenAI API.
-API_KEY: "sk-", # The OpenAI API key, begin with sk-
-API_VERSION: "2024-02-15-preview", # The version of the API, "2024-02-15-preview" by default
-API_MODEL: "gpt-4-vision-preview", # The OpenAI model name, "gpt-4-vision-preview" by default. You may also use "gpt-4o" for using the GPT-4O model.
-```
-
-!!! tip
- If you set `VISUAL_MODE` to `True`, make sure the `API_MODEL` supports visual inputs. You can find the list of models [here](https://platform.openai.com/docs/models).
-
-
-
-## Step 3
-After configuring the `HOST_AGENT` and `APP_AGENT` with the OpenAI API, you can start using UFO to interact with the OpenAI API for various tasks on Windows OS. Please refer to the [Quick Start Guide](../getting_started/quick_start.md) for more details on how to get started with UFO.
\ No newline at end of file
diff --git a/documents/docs/supported_models/operator.md b/documents/docs/supported_models/operator.md
deleted file mode 100644
index 313b24a6e..000000000
--- a/documents/docs/supported_models/operator.md
+++ /dev/null
@@ -1,37 +0,0 @@
-# OpenAI CUA (Operator)
-
-The [Operator](https://openai.com/index/computer-using-agent/) is a specialized agentic model tailored for Computer-Using Agents (CUA). UFO supports calling it via the Azure OpenAI API (AOAI). The following sections explain how to set up and use the AOAI API with UFO. Note that AOAI currently supports only the [Responses API](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/responses?tabs=python-secure) for invoking the model.
-
-
-
-## Step 1
-To use the Azure OpenAI API, you need to create an account on the [Azure OpenAI website](https://azure.microsoft.com/en-us/products/ai-services/openai-service). After creating an account, you can deploy the AOAI API and access the API key.
-
-## Step 2
-After obtaining the API key, you can configure the `OPERATOR` in the `config.yaml` file (rename the `config_template.yaml` file to `config.yaml`) to use the Azure OpenAI API. The following is an example configuration for the Azure OpenAI API:
-
-```yaml
-OPERATOR: {
- SCALER: [1024, 768], # The scaler for the visual input in a list format, [width, height]
- API_TYPE: "azure_ad" , # The API type, "openai" for the OpenAI API, "aoai" for the AOAI API, 'azure_ad' for the ad authority of the AOAI API.
- API_MODEL: "computer-use-preview-20250311", #"gpt-4o-mini-20240718", #"gpt-4o-20240513", # The only OpenAI model by now that accepts visual input
- API_VERSION: "2025-03-01-preview", # "2024-02-15-preview" by default
- API_BASE: "", # The the OpenAI API endpoint, "https://api.openai.com/v1/chat/completions" for the OpenAI API. As for the AAD, it should be your endpoints.
-}
-```
-
-If you want to use AAD for authentication, you should additionally set the following configuration:
-
-```yaml
- AAD_TENANT_ID: "YOUR_TENANT_ID", # Set the value to your tenant id for the llm model
- AAD_API_SCOPE: "YOUR_SCOPE", # Set the value to your scope for the llm model
- AAD_API_SCOPE_BASE: "YOUR_SCOPE_BASE" # Set the value to your scope base for the llm model, whose format is API://YOUR_SCOPE_BASE, and the only need is the YOUR_SCOPE_BASE
-```
-
-## Step 3
-
-UFO currently supports running Operator only as a single agent, or as a separate `AppAgent` that the `HostAgent` can call. Please refer to the [documentation](../advanced_usage/operator_as_app_agent.md) for how to run Operator within UFO.
-
-!!!note
-    The Operator is a visual-only model and uses a different workflow from the other models. Currently, it does not support reusing the `AppAgent` workflow. Please refer to the documentation for how to run Operator within UFO.
-
diff --git a/documents/docs/supported_models/overview.md b/documents/docs/supported_models/overview.md
deleted file mode 100644
index 048699f22..000000000
--- a/documents/docs/supported_models/overview.md
+++ /dev/null
@@ -1,19 +0,0 @@
-# Supported Models
-
-UFO supports a variety of LLM models and APIs. You can customize the model and API used by the `HOST_AGENT` and `APP_AGENT` in the `config.yaml` file. Additionally, you can configure a `BACKUP_AGENT` to handle requests when the primary agent fails to respond.
-
-Please refer to the following sections for more information on the supported models and APIs:
-
-| LLMs | Documentation |
-| --- | --- |
-| `OPENAI` | [OpenAI API](./openai.md) |
-| `Azure OpenAI (AOAI)` | [Azure OpenAI API](./azure_openai.md) |
-| `Gemini` | [Gemini API](./gemini.md) |
-| `Claude` | [Claude API](./claude.md) |
-| `QWEN` | [QWEN API](./qwen.md) |
-| `Ollama` | [Ollama API](./ollama.md) |
-| `Custom` | [Custom API](./custom_model.md) |
-
-
-!!! info
- Each model is implemented as a separate class in the `ufo/llm` directory, and uses the functions `chat_completion` defined in the `BaseService` class of the `ufo/llm/base.py` file to obtain responses from the model.
\ No newline at end of file
diff --git a/documents/docs/supported_models/qwen.md b/documents/docs/supported_models/qwen.md
deleted file mode 100644
index 06301f854..000000000
--- a/documents/docs/supported_models/qwen.md
+++ /dev/null
@@ -1,23 +0,0 @@
-# Qwen Model
-
-## Step 1
-Qwen (Tongyi Qianwen) is developed by Alibaba DAMO Academy. To use the Qwen model, go to [Qwen](https://dashscope.aliyun.com/), register an account, and get the API key. More details can be found [here](https://help.aliyun.com/zh/dashscope/developer-reference/activate-dashscope-and-create-an-api-key?spm=a2c4g.11186623.0.0.7b5749d72j3SYU) (in Chinese).
-
-## Step 2
-Configure the `HOST_AGENT` and `APP_AGENT` in the `config.yaml` file (rename the `config_template.yaml` file to `config.yaml`) to use the Qwen model. The following is an example configuration for the Qwen model:
-
-```yaml
- VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
- API_TYPE: "qwen" , # The API type, "qwen" for the Qwen model.
- API_KEY: "YOUR_KEY", # The Qwen API key
- API_MODEL: "YOUR_MODEL" # The Qwen model name
-```
-
-!!! tip
- If you set `VISUAL_MODE` to `True`, make sure the `API_MODEL` supports visual inputs.
-
-!!! tip
- `API_MODEL` is the model name of Qwen LLM API. You can find the model name in the [Qwen LLM model](https://help.aliyun.com/zh/dashscope/developer-reference/model-square/?spm=a2c4g.11186623.0.0.35a36ffdt97ljI) list.
-
-## Step 3
-After configuring the `HOST_AGENT` and `APP_AGENT` with the Qwen model, you can start using UFO to interact with the Qwen model for various tasks on Windows OS. Please refer to the [Quick Start Guide](../getting_started/quick_start.md) for more details on how to get started with UFO.
diff --git a/documents/docs/creating_app_agent/demonstration_provision.md b/documents/docs/tutorials/creating_app_agent/demonstration_provision.md
similarity index 73%
rename from documents/docs/creating_app_agent/demonstration_provision.md
rename to documents/docs/tutorials/creating_app_agent/demonstration_provision.md
index 1a926ec04..1988f1e5d 100644
--- a/documents/docs/creating_app_agent/demonstration_provision.md
+++ b/documents/docs/tutorials/creating_app_agent/demonstration_provision.md
@@ -1,8 +1,8 @@
-## Provide Human Demonstrations to the AppAgent
+# Provide Human Demonstrations to the AppAgent
Users or application developers can provide human demonstrations to the `AppAgent` to guide it in executing similar tasks in the future. The `AppAgent` uses these demonstrations to understand the context of the task and the steps required to execute it, effectively becoming an expert in the application.
-### How to Prepare Human Demonstrations for the AppAgent?
+## How to Prepare Human Demonstrations for the AppAgent?
Currently, UFO supports learning from user trajectories recorded by [Steps Recorder](https://support.microsoft.com/en-us/windows/record-steps-to-reproduce-a-problem-46582a9b-620f-2e36-00c9-04e25d784e47) integrated within Windows. More tools will be supported in the future.
@@ -14,9 +14,10 @@ Follow the [official guidance](https://support.microsoft.com/en-us/windows/recor
Include any specific details or instructions for UFO to notice by adding comments. Since Steps Recorder doesn't capture typed text, include any necessary typed content in the comments as well.
-
-
-
+
+ 
+ Adding comments in Steps Recorder for additional context
+
### Step 3: Review and Save the Recorded Demonstrations
@@ -57,14 +58,15 @@ Would you like to save any one of them as a future reference for the agent? Pres
Press `1` to save the plan into its memory for future reference. A sample can be found [here](https://github.com/microsoft/UFO/blob/main/vectordb/demonstration/example.yaml).
-You can view a demonstration video below:
+You can view a demonstration video [here](https://github.com/yunhao0204/UFO/assets/59384816/0146f83e-1b5e-4933-8985-fe3f24ec4777).
-
+## How to Use Human Demonstrations to Enhance the AppAgent?
-
+After creating the offline indexer, refer to the [Learning from User Demonstrations](../../ufo2/core_features/knowledge_substrate/learning_from_demonstration.md) section for guidance on how to use human demonstrations to enhance the AppAgent.
-### How to Use Human Demonstrations to Enhance the AppAgent?
+## Related Documentation
-After creating the offline indexer, refer to the [Learning from User Demonstrations](../advanced_usage/reinforce_appagent/learning_from_demonstration.md) section for guidance on how to use human demonstrations to enhance the AppAgent.
-
----
\ No newline at end of file
+- [Overview: Enhancing AppAgent Capabilities](./overview.md) - Learn about all enhancement approaches
+- [Help Document Provision](./help_document_provision.md) - Provide knowledge through documentation
+- [Wrapping App-Native API](./warpping_app_native_api.md) - Create efficient MCP action servers
+- [Knowledge Substrate Overview](../../ufo2/core_features/knowledge_substrate/overview.md) - Understanding the RAG architecture
\ No newline at end of file
diff --git a/documents/docs/creating_app_agent/help_document_provision.md b/documents/docs/tutorials/creating_app_agent/help_document_provision.md
similarity index 73%
rename from documents/docs/creating_app_agent/help_document_provision.md
rename to documents/docs/tutorials/creating_app_agent/help_document_provision.md
index 2722eb12e..31a9d3cb3 100644
--- a/documents/docs/creating_app_agent/help_document_provision.md
+++ b/documents/docs/tutorials/creating_app_agent/help_document_provision.md
@@ -2,9 +2,7 @@
Help documents provide guidance to the `AppAgent` in executing specific tasks. The `AppAgent` uses these documents to understand the context of the task and the steps required to execute it, effectively becoming an expert in the application.
-## How to Provide Help Documents to the AppAgent?
-
-### Step 1: Prepare Help Documents and Metadata
+## Step 1: Prepare Help Documents and Metadata
UFO currently supports processing help documents in `json` format. More formats will be supported in the future.
@@ -30,11 +28,11 @@ An example of a help document in `json` format is as follows:
Save each help document in a `json` file of your target folder.
-### Step 2: Place Help Documents in the AppAgent Directory
+## Step 2: Place Help Documents in the AppAgent Directory
Once you have prepared all help documents and their metadata, place them into a folder. Sub-folders for the help documents are allowed, but ensure that each help document and its corresponding metadata are placed in the same directory.
-### Step 3: Create a Help Document Indexer
+## Step 3: Create a Help Document Indexer
After organizing your documents in a folder named `path_of_the_docs`, you can create an offline indexer to support RAG for UFO. Follow these steps:
@@ -48,10 +46,16 @@ python -m learner --app --docs
This command will create an offline indexer for all documents in the `path_of_the_docs` folder using Faiss and embedding with sentence transformer (additional embeddings will be supported soon). By default, the created index will be placed [here](https://github.com/microsoft/UFO/tree/main/vectordb/docs).
-!!! note
+!!! note "Application Name Requirement"
Ensure the `app_name` is accurately defined, as it is used to match the offline indexer in online RAG.
+## How to Use Help Documents to Enhance the AppAgent?
+
+After creating the offline indexer, refer to the [Learning from Help Documents](../../ufo2/core_features/knowledge_substrate/learning_from_help_document.md) section for guidance on how to use the help documents to enhance the `AppAgent`.
-### How to Use Help Documents to Enhance the AppAgent?
+## Related Documentation
-After creating the offline indexer, you can find the guidance on how to use the help documents to enhance the `AppAgent` in the [Learning from Help Documents](../advanced_usage/reinforce_appagent/learning_from_help_document.md) section.
\ No newline at end of file
+- [Overview: Enhancing AppAgent Capabilities](./overview.md) - Learn about all enhancement approaches
+- [User Demonstrations Provision](./demonstration_provision.md) - Teach through examples
+- [Wrapping App-Native API](./warpping_app_native_api.md) - Create efficient MCP action servers
+- [Knowledge Substrate Overview](../../ufo2/core_features/knowledge_substrate/overview.md) - Understanding the RAG architecture
\ No newline at end of file
diff --git a/documents/docs/tutorials/creating_app_agent/overview.md b/documents/docs/tutorials/creating_app_agent/overview.md
new file mode 100644
index 000000000..c6cc3bdd5
--- /dev/null
+++ b/documents/docs/tutorials/creating_app_agent/overview.md
@@ -0,0 +1,104 @@
+# Enhancing AppAgent Capabilities
+
+UFO² provides a flexible framework for application developers and users to enhance `AppAgent` capabilities for specific applications. AppAgent enhancement is about **augmenting** the existing AppAgent's capabilities through:
+
+- **Knowledge** (help documents, demonstrations) to guide decision-making
+- **Native API tools** (via MCP servers) for efficient automation
+- **Application-specific context** for better understanding
+
+## Enhancement Components
+
+The `AppAgent` can be enhanced through three complementary approaches:
+
+| Component | Description | Tutorial | Implementation Guide |
+| --- | --- | --- | --- |
+| **[Help Documents](./help_document_provision.md)** | Provide application-specific guidance and instructions to help the agent understand tasks and workflows | [Provision Guide](./help_document_provision.md) | [Learning from Help Documents](../../ufo2/core_features/knowledge_substrate/learning_from_help_document.md) |
+| **[User Demonstrations](./demonstration_provision.md)** | Supply recorded user interactions to teach the agent how to perform specific tasks through examples | [Provision Guide](./demonstration_provision.md) | [Learning from Demonstrations](../../ufo2/core_features/knowledge_substrate/learning_from_demonstration.md) |
+| **[Native API Tools](./warpping_app_native_api.md)** | Create custom MCP action servers that wrap application COM APIs or other native interfaces for efficient automation | [Wrapping Guide](./warpping_app_native_api.md) | [Creating MCP Servers](../creating_mcp_servers.md) |
+
+## Enhancement Workflow
+
+```mermaid
+graph TB
+ Enhancement[AppAgent Enhancement Workflow]
+
+ Enhancement --> KnowledgeLayer[Knowledge Layer RAG-based]
+ Enhancement --> ToolLayer[Tool Layer MCP Servers]
+
+ KnowledgeLayer --> HelpDocs[Help Documents]
+ KnowledgeLayer --> DemoTraj[User Demonstrations]
+
+ ToolLayer --> UITools[UI Automation Tools]
+ ToolLayer --> APITools[Native API Tools]
+
+ HelpDocs --> EnhancedAgent[Enhanced AppAgent]
+ DemoTraj --> EnhancedAgent
+ UITools --> EnhancedAgent
+ APITools --> EnhancedAgent
+
+ style Enhancement fill:#e1f5ff,stroke:#01579b,stroke-width:3px
+ style KnowledgeLayer fill:#fff3e0,stroke:#e65100,stroke-width:2px
+ style ToolLayer fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
+ style HelpDocs fill:#fffde7,stroke:#f57f17,stroke-width:2px
+ style DemoTraj fill:#fffde7,stroke:#f57f17,stroke-width:2px
+ style UITools fill:#fce4ec,stroke:#880e4f,stroke-width:2px
+ style APITools fill:#fce4ec,stroke:#880e4f,stroke-width:2px
+ style EnhancedAgent fill:#e8f5e9,stroke:#1b5e20,stroke-width:3px
+```
+
+## When to Use Each Component?
+
+### Help Documents
+**Use when:**
+- You have official documentation, tutorials, or guides for your application
+- Tasks require domain-specific knowledge or procedures
+- You want the agent to understand application concepts and terminology
+
+**Example:** Providing Excel formula documentation to help the agent use advanced Excel functions correctly.
+
+### User Demonstrations
+**Use when:**
+- You can demonstrate the task yourself
+- The task involves a specific sequence of UI interactions
+- Visual/procedural knowledge is easier to show than describe
+
+**Example:** Recording how to create a pivot table in Excel to teach the agent the exact steps.
+
+### Native API Tools
+**Use when:**
+- Your application exposes COM APIs, REST APIs, or other programmable interfaces
+- GUI automation is slow or unreliable for certain operations
+- You need deterministic, high-performance automation
+
+**Example:** Creating an MCP server that wraps Excel's COM API for inserting tables, formatting cells, etc.
+
+## Enhancement Strategy
+
+!!!tip "Hybrid Approach for Best Results"
+    Combine all three components for maximum effectiveness:
+
+    1. **Knowledge Foundation**: Provide help documents for conceptual understanding
+    2. **Procedural Learning**: Add demonstrations for complex workflows
+    3. **Efficient Execution**: Implement native API tools for performance-critical operations
+
+    The AppAgent will:
+    - Use knowledge to **understand** what to do
+    - Reference demonstrations to **learn** how to do it
+    - Leverage API tools when available for **efficient** execution
+    - Fall back to UI automation when needed
+
+## Getting Started
+
+Follow the tutorials in order to enhance your AppAgent:
+
+1. **[Provide Help Documents](./help_document_provision.md)** - Start with knowledge
+2. **[Add User Demonstrations](./demonstration_provision.md)** - Teach by example
+3. **[Wrap Native APIs](./warpping_app_native_api.md)** - Enable efficient automation
+
+## Related Documentation
+
+- [AppAgent Overview](../../ufo2/app_agent/overview.md) - Understanding AppAgent architecture
+- [Knowledge Substrate](../../ufo2/core_features/knowledge_substrate/overview.md) - How knowledge enhancement works
+- [Creating MCP Servers](../creating_mcp_servers.md) - Building custom automation tools
+- [MCP Configuration](../../mcp/configuration.md) - Registering MCP servers with AppAgent
+- [Hybrid GUI–API Actions](../../ufo2/core_features/hybrid_actions.md) - Understanding dual-mode automation
diff --git a/documents/docs/tutorials/creating_app_agent/warpping_app_native_api.md b/documents/docs/tutorials/creating_app_agent/warpping_app_native_api.md
new file mode 100644
index 000000000..b249f4dd8
--- /dev/null
+++ b/documents/docs/tutorials/creating_app_agent/warpping_app_native_api.md
@@ -0,0 +1,606 @@
+# Wrapping Application Native APIs as MCP Action Servers
+
+UFO² uses **MCP (Model Context Protocol) servers** to expose application native APIs to the AppAgent. This document shows you how to create custom MCP action servers that wrap your application's COM APIs, REST APIs, or other programmable interfaces.
+
+## Overview
+
+While AppAgent can automate applications through UI controls, providing **native API tools** via MCP servers offers significant advantages:
+
+| Automation Method | Speed | Reliability | Use Case |
+|-------------------|-------|-------------|----------|
+| **UI Automation** | Slower | Prone to UI changes | Visual elements, dialogs, menus |
+| **Native API** | ~10x faster | Deterministic | Data manipulation, batch operations |
+
+!!! tip "Hybrid Automation"
+    AppAgent combines both approaches: the LLM selects **GUI tools** (from UIExecutor) or **API tools** (from your custom MCP server) based on the task requirements.
+
+## Prerequisites
+
+Before creating a native API MCP server:
+
+1. **Understand MCP Servers**: Read [Creating MCP Servers Tutorial](../creating_mcp_servers.md)
+2. **Know Your API**: Familiarize yourself with your application's COM API, REST API, or SDK
+3. **Review Examples**: Study existing servers in `ufo/client/mcp/local_servers/`
+
+## Step-by-Step Guide
+
+### Step 1: Create Your MCP Server File
+
+Create a new Python file in `ufo/client/mcp/local_servers/` for your application's MCP server:
+
+```python
+# File: ufo/client/mcp/local_servers/your_app_executor.py
+
+from typing import Annotated, Optional
+from fastmcp import FastMCP
+from pydantic import Field
+from ufo.client.mcp.mcp_registry import MCPRegistry
+from ufo.automator.puppeteer import AppPuppeteer
+from ufo.automator.action_execution import ActionExecutor
+from ufo.agents.processors.schemas.actions import ActionCommandInfo
+
+
+@MCPRegistry.register_factory_decorator("YourAppExecutor")
+def create_your_app_executor(process_name: str, *args, **kwargs) -> FastMCP:
+    """
+    Create MCP server for YourApp COM API automation.
+
+    :param process_name: Process name for UI automation context.
+    :return: FastMCP instance with YourApp tools.
+    """
+
+    # Initialize puppeteer for UI context
+    puppeteer = AppPuppeteer(
+        process_name=process_name,
+        app_root_name="YOURAPP.EXE",  # Your app's executable name
+    )
+
+    # Create COM API receiver
+    puppeteer.receiver_manager.create_api_receiver(
+        app_root_name="YOURAPP.EXE",
+        process_name=process_name,
+    )
+
+    executor = ActionExecutor()
+
+    def _execute_action(action: ActionCommandInfo) -> dict:
+        """Execute action via puppeteer."""
+        return executor.execute(action, puppeteer, control_dict={})
+
+    # Create FastMCP instance
+    mcp = FastMCP("YourApp COM Executor MCP Server")
+
+    # Define tools below...
+
+    return mcp
+```
+
+### Step 2: Define Tool Methods with @mcp.tool()
+
+Add tool methods to your MCP server using the `@mcp.tool()` decorator. Each tool wraps a native API call:
+
+```python
+    @mcp.tool()
+    def insert_data_table(
+        data: Annotated[
+            list[list[str]],
+            Field(description="2D array of table data. Example: [['Name', 'Age'], ['Alice', '25']]")
+        ],
+        start_row: Annotated[
+            int,
+            Field(description="Starting row index (1-based).")
+        ] = 1,
+        start_col: Annotated[
+            int,
+            Field(description="Starting column index (1-based).")
+        ] = 1,
+    ) -> Annotated[str, Field(description="Result message.")]:
+        """
+        Insert a data table into the application at the specified position.
+        Use this for bulk data insertion instead of manual cell-by-cell input.
+
+        Example usage:
+        - Insert CSV data: insert_data_table(data=csv_data, start_row=1, start_col=1)
+        - Add header and rows: insert_data_table(data=[['ID', 'Name'], ['1', 'Alice'], ['2', 'Bob']])
+        """
+        action = ActionCommandInfo(
+            function="insert_table",
+            arguments={
+                "data": data,
+                "start_row": start_row,
+                "start_col": start_col,
+            },
+        )
+        return _execute_action(action)
+
+    @mcp.tool()
+    def format_range(
+        start_cell: Annotated[
+            str,
+            Field(description="Starting cell address (e.g., 'A1').")
+        ],
+        end_cell: Annotated[
+            str,
+            Field(description="Ending cell address (e.g., 'B10').")
+        ],
+        font_bold: Annotated[
+            Optional[bool],
+            Field(description="Make font bold?")
+        ] = None,
+        font_size: Annotated[
+            Optional[int],
+            Field(description="Font size in points.")
+        ] = None,
+        background_color: Annotated[
+            Optional[str],
+            Field(description="Background color (hex code like '#FF0000' for red).")
+        ] = None,
+    ) -> Annotated[str, Field(description="Formatting result.")]:
+        """
+        Apply formatting to a cell range in the application.
+        Much faster than clicking format buttons multiple times.
+
+        Example:
+        - Bold header: format_range(start_cell='A1', end_cell='E1', font_bold=True)
+        - Highlight cells: format_range(start_cell='A2', end_cell='A10', background_color='#FFFF00')
+        """
+        action = ActionCommandInfo(
+            function="format_cells",
+            arguments={
+                "start_cell": start_cell,
+                "end_cell": end_cell,
+                "font_bold": font_bold,
+                "font_size": font_size,
+                "background_color": background_color,
+            },
+        )
+        return _execute_action(action)
+
+    @mcp.tool()
+    def save_as_pdf(
+        output_path: Annotated[
+            str,
+            Field(description="Full path for the PDF file (e.g., 'C:/Users/Documents/report.pdf').")
+        ],
+    ) -> Annotated[str, Field(description="Save result message.")]:
+        """
+        Export the current document as a PDF file.
+        One-click operation - much faster than File > Save As > PDF > Navigate > Save.
+
+        Example: save_as_pdf(output_path='C:/Reports/monthly_report.pdf')
+        """
+        action = ActionCommandInfo(
+            function="save_as",
+            arguments={
+                "file_path": output_path,
+                "file_format": "pdf",
+            },
+        )
+        return _execute_action(action)
+```
+
+!!!tip "Tool Design Best Practices"
+    - **Clear docstrings**: Explain what the tool does, when to use it, and provide examples
+    - **Descriptive parameters**: Use `Annotated` with `Field(description=...)` for all parameters
+    - **Error handling**: Return descriptive error messages when operations fail
+    - **Comprehensive coverage**: Wrap common operations that benefit from API speed
+
+### Step 3: Implement the Underlying API Receiver
+
+The receiver class executes the actual COM API calls. Create it in `ufo/automator/app_apis/`:
+
+```python
+# File: ufo/automator/app_apis/your_app/your_app_client.py
+
+import win32com.client
+from typing import Dict, Any, List, Optional
+from ufo.automator.app_apis.basic import WinCOMReceiverBasic
+from ufo.automator.basic import CommandBasic
+
+
+class YourAppCOMReceiver(WinCOMReceiverBasic):
+    """
+    COM API receiver for YourApp automation.
+    """
+
+    _command_registry: Dict[str, type[CommandBasic]] = {}
+
+    def __init__(self, app_root_name: str, process_name: str, clsid: str) -> None:
+        """
+        Initialize the YourApp COM client.
+        :param app_root_name: Application root name.
+        :param process_name: Process name.
+        :param clsid: COM object CLSID.
+        """
+        super().__init__(app_root_name, process_name, clsid)
+
+    def insert_table_data(
+        self,
+        data: List[List[str]],
+        start_row: int = 1,
+        start_col: int = 1
+    ) -> str:
+        """
+        Insert table data using COM API.
+        :param data: 2D array of table data.
+        :param start_row: Starting row (1-based).
+        :param start_col: Starting column (1-based).
+        :return: Result message.
+        """
+        try:
+            # Access the active document/workbook via COM
+            doc = self.com_object.ActiveDocument  # Or ActiveWorkbook for Excel
+
+            # Insert data row by row
+            for i, row in enumerate(data):
+                for j, cell_value in enumerate(row):
+                    # Example: Set cell value
+                    cell = doc.Tables(1).Cell(start_row + i, start_col + j)
+                    cell.Range.Text = str(cell_value)
+
+            return f"Successfully inserted {len(data)} rows of data"
+        except Exception as e:
+            return f"Error inserting table: {str(e)}"
+
+    def format_cells(
+        self,
+        start_cell: str,
+        end_cell: str,
+        font_bold: Optional[bool] = None,
+        font_size: Optional[int] = None,
+        background_color: Optional[str] = None,
+    ) -> str:
+        """
+        Format cell range using COM API.
+        """
+        try:
+            doc = self.com_object.ActiveDocument
+            # Illustrative only: adapt the range lookup to your application's object model
+            range_obj = doc.Range(start_cell, end_cell)
+
+            if font_bold is not None:
+                range_obj.Font.Bold = font_bold
+            if font_size is not None:
+                range_obj.Font.Size = font_size
+            if background_color is not None:
+                # Convert hex to a COM color integer and apply
+                range_obj.Shading.BackgroundPatternColor = self._hex_to_rgb(background_color)
+
+            return f"Successfully formatted range {start_cell}:{end_cell}"
+        except Exception as e:
+            return f"Error formatting cells: {str(e)}"
+
+    def save_document_as(self, file_path: str, file_format: str) -> str:
+        """
+        Save document in specified format.
+        """
+        try:
+            doc = self.com_object.ActiveDocument
+
+            # Map format string to COM constant
+            format_map = {
+                "pdf": 17,  # wdFormatPDF
+                "docx": 16,  # wdFormatDocumentDefault (.docx)
+                # Add more formats as needed
+            }
+
+            format_code = format_map.get(file_format.lower(), 16)
+            doc.SaveAs2(file_path, FileFormat=format_code)
+
+            return f"Successfully saved document to {file_path}"
+        except Exception as e:
+            return f"Error saving document: {str(e)}"
+
+    @staticmethod
+    def _hex_to_rgb(hex_color: str) -> int:
+        """Convert hex color to the integer color value used by COM (red in the low byte)."""
+        hex_color = hex_color.lstrip('#')
+        r, g, b = tuple(int(hex_color[i:i+2], 16) for i in (0, 2, 4))
+        return r + (g << 8) + (b << 16)
+```
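Office COM color properties take a single integer with red in the lowest byte, green in the middle byte, and blue in the highest byte, which is why `_hex_to_rgb` shifts blue rather than red. The standalone sketch below mirrors that packing so the byte order can be checked without a COM object. The name `hex_to_com_color` and the six-digit length check are assumptions added for illustration, not part of UFO².

```python
def hex_to_com_color(hex_color: str) -> int:
    """Pack '#RRGGBB' into the integer layout used by Office COM colors.

    Hypothetical standalone helper mirroring _hex_to_rgb above;
    assumes exactly six hex digits after an optional '#'.
    """
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"Expected 6 hex digits, got {hex_color!r}")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    # Red occupies the low byte, blue the high byte.
    return r | (g << 8) | (b << 16)


print(hex_to_com_color("#FF0000"))  # 255 (pure red)
print(hex_to_com_color("#0000FF"))  # 16711680 (pure blue)
```

Raising on malformed input keeps the helper honest; inside a receiver method the surrounding `try`/`except` would turn that into an error message for the agent.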
+
+### Step 4: Create Command Classes
+
+Define command classes that bridge the MCP tools to the receiver methods:
+
+```python
+# In the same file: ufo/automator/app_apis/your_app/your_app_client.py
+
+@YourAppCOMReceiver.register
+class InsertTableCommand(CommandBasic):
+    """Command to insert table data."""
+
+    def execute(self) -> str:
+        """Execute table insertion."""
+        return self.receiver.insert_table_data(
+            data=self.params.get("data", []),
+            start_row=self.params.get("start_row", 1),
+            start_col=self.params.get("start_col", 1),
+        )
+
+    @classmethod
+    def name(cls) -> str:
+        # Assumed registry key; should match `function` in the MCP tool
+        return "insert_table"
+
+
+@YourAppCOMReceiver.register
+class FormatCellsCommand(CommandBasic):
+    """Command to format cell range."""
+
+    def execute(self) -> str:
+        """Execute cell formatting."""
+        return self.receiver.format_cells(
+            start_cell=self.params.get("start_cell"),
+            end_cell=self.params.get("end_cell"),
+            font_bold=self.params.get("font_bold"),
+            font_size=self.params.get("font_size"),
+            background_color=self.params.get("background_color"),
+        )
+
+    @classmethod
+    def name(cls) -> str:
+        # Assumed registry key; should match `function` in the MCP tool
+        return "format_cells"
+
+
+@YourAppCOMReceiver.register
+class SaveAsCommand(CommandBasic):
+    """Command to save document."""
+
+    def execute(self) -> str:
+        """Execute document save."""
+        return self.receiver.save_document_as(
+            file_path=self.params.get("file_path"),
+            file_format=self.params.get("file_format", "pdf"),
+        )
+
+    @classmethod
+    def name(cls) -> str:
+        # Assumed registry key; should match `function` in the MCP tool
+        return "save_as"
+```
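To make the bridging concrete, the sketch below shows the general register-and-dispatch pattern in isolation: a decorator stores each command class under a string key, and a function name such as `insert_table` is later resolved to that class. This is not UFO²'s actual implementation; `Receiver`, `Command`, `name()` and `run` are hypothetical stand-ins chosen for illustration.

```python
from typing import Any, Dict, Type


class Receiver:
    """Hypothetical stand-in for a COM receiver that owns a command registry."""

    _command_registry: Dict[str, Type["Command"]] = {}

    @classmethod
    def register(cls, command_class: Type["Command"]) -> Type["Command"]:
        # Key each command by its name so it can be looked up by string.
        cls._command_registry[command_class.name()] = command_class
        return command_class

    def insert_table_data(self, data: list) -> str:
        return f"Successfully inserted {len(data)} rows of data"

    def run(self, function: str, params: Dict[str, Any]) -> str:
        # Resolve the function name to a command class, then execute it.
        command_class = self._command_registry.get(function)
        if command_class is None:
            return f"Error: unknown command {function!r}"
        return command_class(self, params).execute()


class Command:
    """Base class: a command holds its receiver and its parameters."""

    def __init__(self, receiver: Receiver, params: Dict[str, Any]) -> None:
        self.receiver = receiver
        self.params = params

    @classmethod
    def name(cls) -> str:
        raise NotImplementedError

    def execute(self) -> str:
        raise NotImplementedError


@Receiver.register
class InsertTableCommand(Command):
    @classmethod
    def name(cls) -> str:
        return "insert_table"

    def execute(self) -> str:
        return self.receiver.insert_table_data(self.params.get("data", []))


print(Receiver().run("insert_table", {"data": [["Name"], ["Alice"]]}))
```

The point of the indirection is that the MCP tool only needs to know a function name and an argument dictionary; the receiver decides which method actually runs.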
+
+!!!note "Command Registration"
+    Use the `@YourAppCOMReceiver.register` decorator to register each command class with the receiver.
+
+### Step 5: Register Your Receiver in the Factory
+
+Add your receiver to the COM receiver factory in `ufo/automator/app_apis/factory.py`:
+
+```python
+def __com_client_mapper(self, app_root_name: str) -> Type[WinCOMReceiverBasic]:
+    """Map application to its COM receiver class."""
+    mapping = {
+        "WINWORD.EXE": WordWinCOMReceiver,
+        "EXCEL.EXE": ExcelWinCOMReceiver,
+        "POWERPNT.EXE": PowerPointWinCOMReceiver,
+        "YOURAPP.EXE": YourAppCOMReceiver,  # Add your app here
+    }
+    return mapping.get(app_root_name)
+
+def __app_root_mappping(self, app_root_name: str) -> Optional[str]:
+    """Map application to its COM CLSID."""
+    mapping = {
+        "WINWORD.EXE": "Word.Application",
+        "EXCEL.EXE": "Excel.Application",
+        "POWERPNT.EXE": "PowerPoint.Application",
+        "YOURAPP.EXE": "YourApp.Application",  # Add your CLSID here
+    }
+    return mapping.get(app_root_name)
+```
+
+### Step 6: Register the MCP Server in mcp.yaml
+
+Configure the MCP server for your application in `config/ufo/mcp.yaml`:
+
+```yaml
+AppAgent:
+  YOURAPP.EXE:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        reset: false
+    action:
+      - namespace: AppUIExecutor  # Generic UI automation
+        type: local
+        reset: false
+      - namespace: YourAppExecutor  # Your custom COM API tools
+        type: local
+        reset: true  # Reset COM state when switching documents
+      - namespace: CommandLineExecutor  # Shell commands
+        type: local
+        reset: false
+```
+
+!!!tip "Why `reset: true`?"
+    Set `reset: true` for COM-based MCP servers to prevent state leakage when switching between documents or application instances.
+
+### Step 7: Test Your MCP Server
+
+Test your server in isolation before integration:
+
+```python
+# File: test_your_app_server.py
+
+import asyncio
+from fastmcp.client import Client
+from ufo.client.mcp.local_servers.your_app_executor import create_your_app_executor
+
+
+async def test_server():
+    """Test YourApp MCP server."""
+    process_name = "your_app_process"
+    server = create_your_app_executor(process_name)
+
+    async with Client(server) as client:
+        # List available tools
+        tools = await client.list_tools()
+        print(f"Available tools: {[t.name for t in tools]}")
+
+        # Test insert_data_table
+        result = await client.call_tool(
+            "insert_data_table",
+            arguments={
+                "data": [["Name", "Age"], ["Alice", "25"], ["Bob", "30"]],
+                "start_row": 1,
+                "start_col": 1,
+            }
+        )
+        print(f"Insert result: {result.data}")
+
+        # Test format_range
+        result = await client.call_tool(
+            "format_range",
+            arguments={
+                "start_cell": "A1",
+                "end_cell": "B1",
+                "font_bold": True,
+                "font_size": 14,
+            }
+        )
+        print(f"Format result: {result.data}")
+
+
+if __name__ == "__main__":
+    asyncio.run(test_server())
+```
+
+## Complete Example: Excel COM Executor
+
+See the complete implementation in UFO²'s codebase:
+
+- **MCP Server**: `ufo/client/mcp/local_servers/excel_wincom_mcp_server.py`
+- **COM Receiver**: `ufo/automator/app_apis/excel/excel_client.py`
+- **Configuration**: `config/ufo/mcp.yaml` (under `AppAgent.EXCEL.EXE`)
+
+Key features:
+- `insert_table`: Bulk data insertion
+- `format_cells`: Cell formatting (fonts, colors, borders)
+- `create_chart`: Chart generation
+- `apply_formula`: Formula application
+- `save_as`: Export to PDF/CSV
+
+## Legacy Approach: API Prompt Files (Deprecated)
+
+!!!warning "Deprecated: API Prompt Files"
+ The old approach of creating `api.yaml` prompt files and configuring `APP_API_PROMPT_ADDRESS` is **deprecated**. The new MCP architecture provides:
+
+ - ✅ **Better tool discovery**: Tools are automatically introspected from MCP servers
+ - ✅ **Type safety**: Pydantic models ensure parameter validation
+ - ✅ **Cleaner code**: No manual prompt file maintenance
+ - ✅ **Better testing**: Direct server testing with FastMCP Client
+
+ If you're migrating from the old system, see [Creating MCP Servers Tutorial](../creating_mcp_servers.md).
+
+## Best Practices
+
+### 1. Comprehensive Docstrings
+
+```python
+@mcp.tool()
+def insert_data_table(...) -> ...:
+    """
+    Insert a data table into the application at the specified position.
+    Use this for bulk data insertion instead of manual cell-by-cell input.
+
+    When to use:
+    - Inserting CSV/Excel data
+    - Creating tables from lists
+    - Bulk data population
+
+    Example usage:
+    - Insert CSV data: insert_data_table(data=csv_data, start_row=1, start_col=1)
+    - Add header and rows: insert_data_table(data=[['ID', 'Name'], ['1', 'Alice']])
+    """
+```
+
+### 2. Error Handling
+
+```python
+def insert_table_data(self, data: List[List[str]], ...) -> str:
+    """Insert table data using COM API."""
+    try:
+        # Validate input
+        if not data or not data[0]:
+            return "Error: Empty data table provided"
+
+        # Execute COM operation
+        doc = self.com_object.ActiveDocument
+        # ... insert logic ...
+
+        return f"Successfully inserted {len(data)} rows"
+    except Exception as e:
+        return f"Error inserting table: {str(e)}"
+```
+
+### 3. Parameter Validation
+
+```python
+@mcp.tool()
+def format_range(
+    start_cell: Annotated[
+        str,
+        Field(
+            description="Starting cell address (e.g., 'A1'). Must be valid Excel notation.",
+            pattern=r"^[A-Z]+[0-9]+$"  # Regex validation
+        )
+    ],
+    ...
+) -> ...:
+    """Format cell range."""
+```
+
+### 4. Fallback to UI Automation
+
+Design your API tools to complement (not replace) UI automation:
+
+```python
+@mcp.tool()
+def apply_table_style(style_name: str) -> str:
+    """
+    Apply a predefined table style.
+
+    Note: For custom styling, use format_range() or UI automation
+    via AppUIExecutor::click_input() on the Design tab.
+    """
+```
+
+## Troubleshooting
+
+### Issue: COM Object Not Found
+
+**Symptom**: `pywintypes.com_error: (-2147221005, 'Invalid class string', None, None)`
+
+**Solution**:
+
+1. Verify the CLSID is correct for your application
+2. Ensure the application is installed and registered
+3. Check if the application supports COM automation
+
+### Issue: Permission Denied
+
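+**Symptom**: `com_error: (-2147352567, 'Exception occurred.', ...)`. This is the generic COM dispatch error (`DISP_E_EXCEPTION`); the nested tuple in the exception carries the underlying cause, such as an access-denied code.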
+
+**Solution**:
+
+- Run UFO² with administrator privileges
+- Check application security settings
+- Verify COM permissions in `dcomcnfg`
+
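+The numeric codes in the symptoms above are signed 32-bit HRESULT values. As a minimal sketch (the `hresult_to_hex` helper is hypothetical and not part of UFO² or pywin32), converting them to unsigned hex makes them easier to look up in Microsoft's documentation:
+
+```python
+def hresult_to_hex(code: int) -> str:
+    """Convert a signed 32-bit HRESULT to its unsigned hex form."""
+    return f"0x{code & 0xFFFFFFFF:08X}"
+
+
+# Codes taken from the symptoms above.
+print(hresult_to_hex(-2147221005))  # 0x800401F3 (CO_E_CLASSSTRING)
+print(hresult_to_hex(-2147352567))  # 0x80020009 (DISP_E_EXCEPTION)
+```
+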
+### Issue: Tools Not Appearing in LLM Prompt
+
+**Symptom**: AppAgent doesn't use your API tools
+
+**Solution**:
+
+1. Verify MCP server is registered in `mcp.yaml`
+2. Check namespace matches: `@MCPRegistry.register_factory_decorator("YourAppExecutor")`
+3. Ensure server is under `action:` (not `data_collection:`)
+4. Test server independently with FastMCP Client
+
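+Checks 1 to 3 can be scripted once `mcp.yaml` has been parsed into a dictionary. The sketch below is an assumption-laden illustration: `find_namespace` is a hypothetical helper, and the nested `agent -> application -> section` layout is inferred from the configuration shown in this guide rather than taken from UFO²'s loader.
+
+```python
+from typing import Any, Dict, List
+
+
+def find_namespace(config: Dict[str, Any], namespace: str) -> List[str]:
+    """Return the config paths (agent.app.section) that list the namespace."""
+    hits = []
+    for agent, apps in config.items():
+        for app, sections in apps.items():
+            for section, servers in sections.items():
+                if any(s.get("namespace") == namespace for s in servers):
+                    hits.append(f"{agent}.{app}.{section}")
+    return hits
+
+
+# Hypothetical parsed mcp.yaml content, for illustration only.
+config = {
+    "AppAgent": {
+        "YOURAPP.EXE": {
+            "data_collection": [{"namespace": "UICollector", "type": "local"}],
+            "action": [
+                {"namespace": "YourAppExecutor", "type": "local", "reset": True}
+            ],
+        }
+    }
+}
+print(find_namespace(config, "YourAppExecutor"))  # ['AppAgent.YOURAPP.EXE.action']
+```
+
+An empty result, or a path ending in `data_collection`, points to a registration problem rather than a server bug.
+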
+## Related Documentation
+
+**Core Tutorials:**
+
+- **[Creating MCP Servers Tutorial](../creating_mcp_servers.md)** - Complete MCP server development guide
+- [Overview: Enhancing AppAgent Capabilities](./overview.md) - Learn about all enhancement approaches
+- [Help Document Provision](./help_document_provision.md) - Provide knowledge through documentation
+- [User Demonstrations Provision](./demonstration_provision.md) - Teach through examples
+
+**MCP Documentation:**
+
+- [MCP Configuration](../../mcp/configuration.md) - Registering MCP servers
+- [MCP Overview](../../mcp/overview.md) - Understanding MCP architecture
+- [WordCOMExecutor](../../mcp/servers/word_com_executor.md) - Reference implementation
+- [ExcelCOMExecutor](../../mcp/servers/excel_com_executor.md) - Reference implementation
+
+**Advanced Features:**
+
+- [Hybrid GUI–API Actions](../../ufo2/core_features/hybrid_actions.md) - How AppAgent chooses tools
+- [Knowledge Substrate Overview](../../ufo2/core_features/knowledge_substrate/overview.md) - Understanding the RAG architecture
+
+---
+
+By following this guide, you've successfully wrapped your application's native API as an MCP action server, enabling the AppAgent to perform fast, reliable automation through direct API calls!
diff --git a/documents/docs/tutorials/creating_device_agent/client_setup.md b/documents/docs/tutorials/creating_device_agent/client_setup.md
new file mode 100644
index 000000000..6110a4d23
--- /dev/null
+++ b/documents/docs/tutorials/creating_device_agent/client_setup.md
@@ -0,0 +1,1107 @@
+# Part 3: Client Setup
+
+This tutorial teaches you how to set up the **UFO device client** that runs on the target device, manages MCP servers, and communicates with the agent server via WebSocket. We'll use the existing client implementation as a reference.
+
+---
+
+## Table of Contents
+
+1. [Client Architecture Overview](#client-architecture-overview)
+2. [Client Components](#client-components)
+3. [UFO Client Implementation](#ufo-client-implementation)
+4. [WebSocket Client](#websocket-client)
+5. [MCP Server Manager](#mcp-server-manager)
+6. [Platform Detection](#platform-detection)
+7. [Configuration and Deployment](#configuration-and-deployment)
+8. [Testing Your Client](#testing-your-client)
+
+---
+
+## Client Architecture Overview
+
+### Client Role in Device Agent System
+
+```mermaid
+graph TB
+ subgraph "Agent Server (Orchestrator)"
+ Agent[Device Agent]
+ Dispatcher[Command Dispatcher]
+ end
+
+ subgraph "Network Layer"
+ WS[WebSocket AIP Protocol]
+ end
+
+ subgraph "Device Client (Your Implementation)"
+ Main[client.py Entry Point]
+ UFOClient[UFOClient Core Logic]
+ WSClient[WebSocketClient Communication]
+
+ subgraph "Managers"
+ MCPMgr[MCP Server Manager]
+ CompMgr[Computer Manager]
+ CmdRouter[Command Router]
+ end
+
+ Main --> UFOClient
+ Main --> WSClient
+ UFOClient --> MCPMgr
+ UFOClient --> CompMgr
+ UFOClient --> CmdRouter
+ WSClient --> UFOClient
+ end
+
+ subgraph "MCP Servers"
+ MCP1[Mobile MCP Server]
+ MCP2[Linux MCP Server]
+ MCPN[...]
+ end
+
+ Agent --> Dispatcher
+ Dispatcher -->|Commands| WS
+ WS -->|Commands| WSClient
+ WSClient -->|Results| WS
+ WS -->|Results| Agent
+
+ UFOClient --> MCPMgr
+ MCPMgr --> MCP1 & MCP2 & MCPN
+
+ style Main fill:#c8e6c9
+ style UFOClient fill:#e1f5ff
+ style WSClient fill:#fff3e0
+ style MCPMgr fill:#f3e5f5
+```
+
+**Client Responsibilities**:
+
+| Component | Responsibility | Example |
+|-----------|----------------|---------|
+| **Entry Point** | Parse args, initialize services | `client.py main()` |
+| **UFO Client** | Execute commands, route actions | `UFOClient.execute_actions()` |
+| **WebSocket Client** | Bidirectional communication | `UFOWebSocketClient.handle_messages()` |
+| **MCP Server Manager** | Start/stop MCP servers | `MCPServerManager.start_server()` |
+| **Computer Manager** | Manage device computers | `ComputerManager.get_computer()` |
+| **Command Router** | Route commands to MCP tools | `CommandRouter.execute()` |
+
+---
+
+## Client Components
+
+### Component Hierarchy
+
+```mermaid
+graph TB
+ subgraph "Client Entry Point"
+ Main[client.py main function]
+ end
+
+ subgraph "Core Components"
+ UFO[UFOClient]
+ WS[UFOWebSocketClient]
+ end
+
+ subgraph "Management Layer"
+ MCP[MCPServerManager]
+ Comp[ComputerManager]
+ Router[CommandRouter]
+ end
+
+ subgraph "Protocol Layer"
+ AIP[AIP Protocol]
+ Reg[RegistrationProtocol]
+ Heart[HeartbeatProtocol]
+ Task[TaskExecutionProtocol]
+ end
+
+ subgraph "MCP Integration"
+ HTTP[HTTPMCPServer]
+ Local[LocalMCPServer]
+ Stdio[StdioMCPServer]
+ end
+
+ Main --> UFO
+ Main --> WS
+ UFO --> MCP
+ UFO --> Comp
+ UFO --> Router
+ WS --> Reg & Heart & Task
+ MCP --> HTTP & Local & Stdio
+
+ style Main fill:#c8e6c9
+ style UFO fill:#e1f5ff
+ style WS fill:#fff3e0
+ style MCP fill:#f3e5f5
+```
+
+---
+
+## UFO Client Implementation
+
+### File Location
+
+**Path**: `ufo/client/ufo_client.py`
+
+### Core UFO Client Class
+
+```python
+# ufo/client/ufo_client.py
+
+import asyncio
+import logging
+from typing import List, Optional
+
+from ufo.client.computer import CommandRouter, ComputerManager
+from ufo.client.mcp.mcp_server_manager import MCPServerManager
+from aip.messages import Command, Result, ServerMessage
+
+
+class UFOClient:
+    """
+    Client for interacting with the UFO web service.
+    Executes commands from agent server and returns results.
+    """
+
+    def __init__(
+        self,
+        mcp_server_manager: MCPServerManager,
+        computer_manager: ComputerManager,
+        client_id: Optional[str] = None,
+        platform: Optional[str] = None,
+    ):
+        """
+        Initialize the UFO client.
+
+        :param mcp_server_manager: Manages MCP servers
+        :param computer_manager: Manages computer instances
+        :param client_id: Unique client identifier
+        :param platform: Platform type ('windows', 'linux', 'android', 'ios')
+        """
+        self.mcp_server_manager = mcp_server_manager
+        self.computer_manager = computer_manager
+        self.command_router = CommandRouter(
+            computer_manager=self.computer_manager,
+        )
+        self.logger = logging.getLogger(__name__)
+        # Serializes task execution across coroutines (asyncio.Lock is not thread-safe)
+        self.task_lock = asyncio.Lock()
+
+        self.client_id = client_id or "client_001"
+        self.platform = platform
+
+        # Session state
+        self._agent_name: Optional[str] = None
+        self._process_name: Optional[str] = None
+        self._root_name: Optional[str] = None
+        self._session_id: Optional[str] = None
+
+    async def execute_step(self, response: ServerMessage) -> List[Result]:
+        """
+        Execute a single step from the agent server.
+
+        :param response: ServerMessage with commands to execute
+        :return: List of execution results
+        """
+        # Update agent context
+        self.agent_name = response.agent_name
+        self.process_name = response.process_name
+        self.root_name = response.root_name
+
+        # Execute actions and collect results
+        action_results = await self.execute_actions(response.actions)
+        return action_results
+
+    async def execute_actions(
+        self, commands: Optional[List[Command]]
+    ) -> List[Result]:
+        """
+        Execute commands via MCP servers.
+
+        :param commands: List of commands to execute
+        :return: List of execution results
+        """
+        action_results = []
+
+        if commands:
+            self.logger.info(f"Executing {len(commands)} commands")
+
+            # Route commands to appropriate MCP servers
+            action_results = await self.command_router.execute(
+                agent_name=self.agent_name,
+                process_name=self.process_name,
+                root_name=self.root_name,
+                commands=commands,
+            )
+
+        return action_results
+
+    # Property setters/getters for agent context
+    @property
+    def session_id(self) -> Optional[str]:
+        """Get current session ID."""
+        return self._session_id
+
+    @session_id.setter
+    def session_id(self, value: Optional[str]):
+        """Set session ID."""
+        if value is not None and not isinstance(value, str):
+            raise ValueError("Session ID must be a string or None.")
+        self._session_id = value
+        self.logger.info(f"Session ID set to: {value}")
+
+    @property
+    def agent_name(self) -> Optional[str]:
+        """Get agent name."""
+        return self._agent_name
+
+    @agent_name.setter
+    def agent_name(self, value: Optional[str]):
+        """Set agent name."""
+        self._agent_name = value
+        self.logger.info(f"Agent name: {value}")
+
+    @property
+    def process_name(self) -> Optional[str]:
+        """Get process name."""
+        return self._process_name
+
+    @process_name.setter
+    def process_name(self, value: Optional[str]):
+        """Set process name."""
+        self._process_name = value
+
+    @property
+    def root_name(self) -> Optional[str]:
+        """Get root name."""
+        return self._root_name
+
+    @root_name.setter
+    def root_name(self, value: Optional[str]):
+        """Set root name."""
+        self._root_name = value
+```
+
+### Key Client Methods
+
+| Method | Purpose | Called By |
+|--------|---------|-----------|
+| `execute_step()` | Process one agent step | WebSocket client |
+| `execute_actions()` | Execute command list | `execute_step()` |
+| Property setters | Update agent context | WebSocket client |
+
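+The call chain in the table can be exercised without a server. The following is a minimal sketch under stated assumptions: `EchoRouter` and `MiniClient` are hypothetical stand-ins that only imitate the shape of `CommandRouter` and `UFOClient`, and plain dictionaries replace the AIP message types.
+
+```python
+import asyncio
+from typing import Any, Dict, List, Optional
+
+
+class EchoRouter:
+    """Hypothetical router that reports success for every command."""
+
+    async def execute(self, commands: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
+        return [{"function": c["function"], "success": True} for c in commands]
+
+
+class MiniClient:
+    """Hypothetical client mirroring execute_step() -> execute_actions()."""
+
+    def __init__(self) -> None:
+        self.router = EchoRouter()
+        self.agent_name: Optional[str] = None
+
+    async def execute_step(self, message: Dict[str, Any]) -> List[Dict[str, Any]]:
+        self.agent_name = message.get("agent_name")  # Update agent context
+        return await self.execute_actions(message.get("actions"))
+
+    async def execute_actions(
+        self, commands: Optional[List[Dict[str, Any]]]
+    ) -> List[Dict[str, Any]]:
+        if not commands:
+            return []  # Nothing to route
+        return await self.router.execute(commands)
+
+
+step = {"agent_name": "MobileAgent", "actions": [{"function": "tap_screen"}]}
+print(asyncio.run(MiniClient().execute_step(step)))
+# [{'function': 'tap_screen', 'success': True}]
+```
+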
+---
+
+## WebSocket Client
+
+### File Location
+
+**Path**: `ufo/client/websocket.py`
+
+### WebSocket Client Implementation
+
+```python
+# ufo/client/websocket.py (simplified)
+
+import asyncio
+import logging
+import websockets
+from typing import TYPE_CHECKING, Optional
+
+from aip.protocol.registration import RegistrationProtocol
+from aip.protocol.heartbeat import HeartbeatProtocol
+from aip.protocol.task_execution import TaskExecutionProtocol
+from aip.transport.websocket import WebSocketTransport
+from aip.messages import ServerMessage, ServerMessageType
+
+if TYPE_CHECKING:
+ from ufo.client.ufo_client import UFOClient
+
+
+class UFOWebSocketClient:
+    """
+    WebSocket client for UFO device agents.
+    Uses AIP (Agent Interaction Protocol) for structured communication.
+    """
+
+    def __init__(
+        self,
+        ws_url: str,
+        ufo_client: "UFOClient",
+        max_retries: int = 3,
+        timeout: float = 120,
+    ):
+        """
+        Initialize WebSocket client.
+
+        :param ws_url: WebSocket server URL (e.g., ws://localhost:5010/ws)
+        :param ufo_client: UFOClient instance
+        :param max_retries: Maximum connection retries
+        :param timeout: Connection timeout in seconds
+        """
+        self.ws_url = ws_url
+        self.ufo_client = ufo_client
+        self.max_retries = max_retries
+        self.retry_count = 0
+        self.timeout = timeout
+        self.logger = logging.getLogger(__name__)
+
+        self.connected_event = asyncio.Event()
+        self._ws: Optional[websockets.WebSocketClientProtocol] = None
+
+        # AIP protocol instances
+        self.transport: Optional[WebSocketTransport] = None
+        self.registration_protocol: Optional[RegistrationProtocol] = None
+        self.heartbeat_protocol: Optional[HeartbeatProtocol] = None
+        self.task_protocol: Optional[TaskExecutionProtocol] = None
+
+    async def connect_and_listen(self):
+        """
+        Connect to server and listen for messages.
+        Automatically retries on failure.
+        """
+        while True:
+            try:
+                # Check retry limit
+                if self.retry_count >= self.max_retries:
+                    self.logger.error(f"Max retries ({self.max_retries}) reached")
+                    break
+
+                self.logger.info(
+                    f"Connecting to {self.ws_url} "
+                    f"(attempt {self.retry_count + 1}/{self.max_retries})"
+                )
+
+                # Reset connection state
+                self.connected_event.clear()
+                self._ws = None
+
+                # Establish WebSocket connection
+                async with websockets.connect(
+                    self.ws_url,
+                    ping_interval=20,
+                    ping_timeout=180,
+                    close_timeout=10,
+                    max_size=100 * 1024 * 1024,  # 100MB max message size
+                ) as ws:
+                    self._ws = ws
+
+                    # Initialize AIP protocols
+                    self.transport = WebSocketTransport(ws)
+                    self.registration_protocol = RegistrationProtocol(self.transport)
+                    self.heartbeat_protocol = HeartbeatProtocol(self.transport)
+                    self.task_protocol = TaskExecutionProtocol(self.transport)
+
+                    # Register with server
+                    await self.register_client()
+
+                    # Reset retry count on success
+                    self.retry_count = 0
+
+                    # Start message handling loop
+                    await self.handle_messages()
+
+            except (
+                websockets.ConnectionClosed,
+                websockets.ConnectionClosedError,
+                asyncio.TimeoutError,
+            ) as e:
+                self.logger.warning(f"Connection closed: {e}. Retrying...")
+                self.connected_event.clear()
+                self.retry_count += 1
+                await self._maybe_retry()
+
+            except Exception as e:
+                self.logger.error(f"Unexpected error: {e}", exc_info=True)
+                self.connected_event.clear()
+                self.retry_count += 1
+                await self._maybe_retry()
+
+    async def register_client(self):
+        """
+        Register client with server.
+        Sends client ID and device system information.
+        """
+        from ufo.client.device_info_provider import DeviceInfoProvider
+
+        # Collect device system information
+        system_info = DeviceInfoProvider.collect_system_info(
+            self.ufo_client.client_id,
+            custom_metadata=None,
+        )
+
+        # Prepare metadata
+        metadata = {
+            "system_info": system_info,
+            "platform": self.ufo_client.platform,
+            "client_version": "3.0",
+        }
+
+        # Send registration via AIP
+        response = await self.registration_protocol.register(
+            client_id=self.ufo_client.client_id,
+            metadata=metadata,
+        )
+
+        if response.status == "success":
+            self.logger.info(f"✅ Client registered: {self.ufo_client.client_id}")
+            self.connected_event.set()  # Signal connection ready
+        else:
+            raise ConnectionError(f"Registration failed: {response.message}")
+
+    async def handle_messages(self):
+        """
+        Handle incoming messages from server.
+        Dispatches to appropriate protocol handlers.
+        """
+        self.logger.info("Starting message handling loop")
+
+        while True:
+            try:
+                # Receive message via transport
+                message = await self.transport.receive()
+
+                if message is None:
+                    self.logger.warning("Received None message, closing")
+                    break
+
+                # Dispatch based on message type
+                if message.type == ServerMessageType.TASK_REQUEST:
+                    await self._handle_task_request(message)
+
+                elif message.type == ServerMessageType.HEARTBEAT:
+                    await self._handle_heartbeat(message)
+
+                elif message.type == ServerMessageType.RESULT_ACK:
+                    await self._handle_result_ack(message)
+
+                else:
+                    self.logger.warning(f"Unknown message type: {message.type}")
+
+            except Exception as e:
+                self.logger.error(f"Error handling message: {e}", exc_info=True)
+                break
+
+    async def _handle_task_request(self, message: ServerMessage):
+        """Handle task request from server."""
+        self.logger.info(f"📨 Task request received: {message.task_id}")
+
+        # Execute task via UFO client
+        results = await self.ufo_client.execute_step(message)
+
+        # Send results back via AIP
+        await self.task_protocol.send_result(
+            task_id=message.task_id,
+            results=results,
+        )
+
+        self.logger.info(f"✅ Task completed: {message.task_id}")
+
+    async def _handle_heartbeat(self, message: ServerMessage):
+        """Handle heartbeat from server."""
+        await self.heartbeat_protocol.send_heartbeat_ack(
+            timestamp=message.timestamp
+        )
+
+    async def _handle_result_ack(self, message: ServerMessage):
+        """Handle result acknowledgment from server."""
+        self.logger.info(f"✅ Result acknowledged: {message.task_id}")
+
+    async def _maybe_retry(self):
+        """Wait before retrying connection."""
+        if self.retry_count < self.max_retries:
+            wait_time = 2 ** self.retry_count  # Exponential backoff
+            self.logger.info(f"Retrying in {wait_time}s...")
+            await asyncio.sleep(wait_time)
+
+    async def start_task(self, request_text: str, task_name: Optional[str] = None):
+        """
+        Initiate a task from client side (optional feature).
+
+        :param request_text: Task description
+        :param task_name: Optional task name
+        """
+        await self.task_protocol.request_task(
+            request_text=request_text,
+            task_name=task_name or "client_task",
+        )
+```
+
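+With this retry logic, the delay doubles after each failed attempt, and no delay is scheduled once the retry limit is reached. A minimal sketch of the resulting schedule (the `backoff_schedule` helper is hypothetical and only mirrors the arithmetic in `connect_and_listen()` and `_maybe_retry()`):
+
+```python
+from typing import List
+
+
+def backoff_schedule(max_retries: int) -> List[int]:
+    """Return the wait times (in seconds) between failed connection attempts."""
+    waits = []
+    retry_count = 0
+    while True:
+        retry_count += 1  # A connection attempt failed
+        if retry_count >= max_retries:
+            break  # Limit reached: no further wait, the client gives up
+        waits.append(2 ** retry_count)  # Exponential backoff
+    return waits
+
+
+print(backoff_schedule(3))  # [2, 4]
+print(backoff_schedule(5))  # [2, 4, 8, 16]
+```
+
+With the class default of `max_retries=3`, a client that never reaches the server gives up after roughly six seconds of waiting plus the time spent on the connection attempts themselves.
+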
+### WebSocket Communication Flow
+
+```mermaid
+sequenceDiagram
+ participant Client as UFOWebSocketClient
+ participant Server as Agent Server
+ participant UFO as UFOClient
+ participant MCP as MCP Server
+
+ Client->>Server: REGISTER (client_id, metadata)
+ Server->>Client: REGISTER_ACK (success)
+
+ Note over Client,Server: Connection Established
+
+ Server->>Client: HEARTBEAT
+ Client->>Server: HEARTBEAT_ACK
+
+ Server->>Client: TASK_REQUEST (commands)
+ Client->>UFO: execute_step(message)
+ UFO->>MCP: execute(commands)
+ MCP->>UFO: results
+ UFO->>Client: results
+ Client->>Server: TASK_RESULT (results)
+ Server->>Client: RESULT_ACK
+
+ Note over Client,Server: Continuous Loop
+```
+
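+The steady-state part of this flow can be simulated without a network. The sketch below is an illustration under stated assumptions: `MessageType`, `run_client`, and the queue-based transport are hypothetical stand-ins, not the AIP classes used by `UFOWebSocketClient`.
+
+```python
+import asyncio
+from enum import Enum
+from typing import List
+
+
+class MessageType(Enum):
+    """Stand-in for the server message types named in the diagram."""
+
+    HEARTBEAT = "heartbeat"
+    TASK_REQUEST = "task_request"
+    RESULT_ACK = "result_ack"
+
+
+async def run_client(inbox: asyncio.Queue) -> List[str]:
+    """Consume server messages and record the reply sent for each one."""
+    sent = []
+    while True:
+        message = await inbox.get()
+        if message is None:  # Transport closed
+            break
+        msg_type, payload = message
+        if msg_type is MessageType.HEARTBEAT:
+            sent.append("HEARTBEAT_ACK")
+        elif msg_type is MessageType.TASK_REQUEST:
+            sent.append(f"TASK_RESULT:{payload}")
+        # RESULT_ACK needs no reply
+    return sent
+
+
+async def demo() -> List[str]:
+    inbox = asyncio.Queue()
+    for item in [
+        (MessageType.HEARTBEAT, ""),
+        (MessageType.TASK_REQUEST, "task_1"),
+        (MessageType.RESULT_ACK, "task_1"),
+        None,
+    ]:
+        inbox.put_nowait(item)
+    return await run_client(inbox)
+
+
+print(asyncio.run(demo()))  # ['HEARTBEAT_ACK', 'TASK_RESULT:task_1']
+```
+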
+---
+
+## MCP Server Manager
+
+### Manager Architecture
+
+```mermaid
+graph TB
+ subgraph "MCP Server Manager"
+ Mgr[MCPServerManager]
+
+ subgraph "Server Types"
+ HTTP[HTTPMCPServer Remote HTTP]
+ Local[LocalMCPServer In-Memory]
+ Stdio[StdioMCPServer Process]
+ end
+
+ Mgr --> HTTP & Local & Stdio
+ end
+
+ subgraph "MCP Servers"
+ MCP1[Mobile MCP port 8020]
+ MCP2[Linux MCP port 8010]
+ MCP3[Custom MCP port 8030]
+ end
+
+ HTTP --> MCP1 & MCP2
+ Local --> MCP3
+
+ style Mgr fill:#c8e6c9
+ style HTTP fill:#e1f5ff
+ style MCP1 fill:#fff3e0
+```
+
+### MCP Server Manager Implementation
+
+```python
+# ufo/client/mcp/mcp_server_manager.py (simplified)
+
+from typing import Dict, Any, Optional
+from abc import ABC, abstractmethod
+
+
+class BaseMCPServer(ABC):
+    """Base class for MCP servers."""
+
+    def __init__(self, config: Dict[str, Any]):
+        self._config = config
+        self._server = None
+        self._namespace = config.get("namespace", "default")
+
+    @abstractmethod
+    def start(self, *args, **kwargs) -> None:
+        """Start the MCP server."""
+        pass
+
+    @abstractmethod
+    def stop(self) -> None:
+        """Stop the MCP server."""
+        pass
+
+
+class HTTPMCPServer(BaseMCPServer):
+    """HTTP-based MCP server (most common for device agents)."""
+
+    def start(self, *args, **kwargs) -> None:
+        """Construct HTTP URL for MCP server."""
+        host = self._config.get("host", "localhost")
+        port = self._config.get("port", 8000)
+        path = self._config.get("path", "/mcp")
+        self._server = f"http://{host}:{port}{path}"
+
+    def stop(self) -> None:
+        """HTTP servers are typically managed externally."""
+        pass
+
+
+class LocalMCPServer(BaseMCPServer):
+    """Local in-memory MCP server."""
+
+    def start(self, *args, **kwargs) -> None:
+        """Get server from registry."""
+        from ufo.client.mcp.mcp_registry import MCPRegistry
+
+        server_namespace = self._config.get("namespace")
+        self._server = MCPRegistry.get(server_namespace, *args, **kwargs)
+
+    def stop(self) -> None:
+        """Drop the server reference (assumed minimal cleanup for this simplified listing)."""
+        # Required: the abstract stop() must be implemented before the class can be instantiated.
+        self._server = None
+
+
+class StdioMCPServer(BaseMCPServer):
+    """Standard I/O MCP server (for subprocess-based tools)."""
+
+    def start(self, *args, **kwargs) -> None:
+        """Create StdioTransport."""
+        from fastmcp.client.transports import StdioTransport
+
+        command = self._config.get("command", "python")
+        start_args = self._config.get("start_args", [])
+        self._server = StdioTransport(command, start_args)
+
+    def stop(self) -> None:
+        """Drop the transport reference (assumed minimal cleanup for this simplified listing)."""
+        self._server = None
+
+
+class MCPServerManager:
+    """Manages multiple MCP servers."""
+
+    def __init__(self):
+        self.servers: Dict[str, BaseMCPServer] = {}
+
+    def register_server(self, name: str, server_type: str, config: Dict):
+        """
+        Register an MCP server.
+
+        :param name: Server name
+        :param server_type: Type ('http', 'local', 'stdio')
+        :param config: Server configuration
+        """
+        if server_type == "http":
+            server = HTTPMCPServer(config)
+        elif server_type == "local":
+            server = LocalMCPServer(config)
+        elif server_type == "stdio":
+            server = StdioMCPServer(config)
+        else:
+            raise ValueError(f"Unknown server type: {server_type}")
+
+        self.servers[name] = server
+
+    def start_server(self, name: str):
+        """Start a registered MCP server."""
+        if name not in self.servers:
+            raise KeyError(f"Server '{name}' not registered")
+
+        self.servers[name].start()
+
+    def get_server(self, name: str) -> Optional[BaseMCPServer]:
+        """Get a registered server, or None if the name is unknown."""
+        return self.servers.get(name)
+```
+
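+For HTTP servers, `start()` does not launch a process; it only derives the endpoint URL from the configuration. A minimal sketch of that derivation (the `build_mcp_url` helper is hypothetical and simply repeats the defaults used by the simplified `HTTPMCPServer` above):
+
+```python
+from typing import Any, Dict
+
+
+def build_mcp_url(config: Dict[str, Any]) -> str:
+    """Build an MCP endpoint URL with the same defaults as HTTPMCPServer.start()."""
+    host = config.get("host", "localhost")
+    port = config.get("port", 8000)
+    path = config.get("path", "/mcp")
+    return f"http://{host}:{port}{path}"
+
+
+print(build_mcp_url({"port": 8020}))  # http://localhost:8020/mcp
+print(build_mcp_url({}))  # http://localhost:8000/mcp
+```
+
+Because missing keys fall back silently, a typo in a key name (for example `prot` instead of `port`) yields the default port rather than an error, so it is worth logging the resolved URL.
+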
+---
+
+## Platform Detection
+
+### Auto-Detection Logic
+
+```python
+# ufo/client/client.py (platform detection)
+
+import platform as platform_module
+
+# Auto-detect platform if not specified
+if args.platform is None:
+    detected_platform = platform_module.system().lower()
+
+    if detected_platform in ["windows", "linux"]:
+        args.platform = detected_platform
+
+    elif detected_platform == "darwin":
+        # macOS detection
+        args.platform = "macos"
+
+    else:
+        # Fallback for unknown platforms
+        args.platform = "windows"
+
+logger.info(f"Platform: {args.platform}")
+```
+
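+The same mapping can be written as a pure function, which makes it straightforward to unit test. This is a minimal sketch: `detect_platform` is a hypothetical helper, and its optional argument exists only so that tests can inject a system name.
+
+```python
+import platform as platform_module
+from typing import Optional
+
+
+def detect_platform(system_name: Optional[str] = None) -> str:
+    """Map platform.system() output to a client platform name."""
+    name = (system_name or platform_module.system()).lower()
+    if name in ("windows", "linux"):
+        return name
+    if name == "darwin":
+        return "macos"
+    return "windows"  # Fallback for unknown platforms
+
+
+print(detect_platform("Linux"))    # linux
+print(detect_platform("Darwin"))   # macos
+print(detect_platform("FreeBSD"))  # windows
+```
+
+Note that the deployment commands in this tutorial pass `--platform` explicitly for `android` and `ios`, since this mapping never produces those values.
+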
+### Platform-Specific Configuration
+
+```python
+# Platform-specific MCP server registration
+
+def setup_mcp_servers(platform: str, mcp_manager: MCPServerManager):
+    """Setup MCP servers based on platform."""
+
+    if platform == "android":
+        # Register Android MCP server
+        mcp_manager.register_server(
+            name="mobile_mcp",
+            server_type="http",
+            config={
+                "host": "localhost",
+                "port": 8020,
+                "path": "/mcp",
+                "namespace": "mobile",
+            }
+        )
+        mcp_manager.start_server("mobile_mcp")
+
+    elif platform == "linux":
+        # Register Linux MCP server
+        mcp_manager.register_server(
+            name="linux_mcp",
+            server_type="http",
+            config={
+                "host": "localhost",
+                "port": 8010,
+                "path": "/mcp",
+                "namespace": "linux",
+            }
+        )
+        mcp_manager.start_server("linux_mcp")
+
+    elif platform == "windows":
+        # Windows uses local MCP servers
+        mcp_manager.register_server(
+            name="windows_mcp",
+            server_type="local",
+            config={"namespace": "windows"}
+        )
+        mcp_manager.start_server("windows_mcp")
+```
+
+---
+
+## Configuration and Deployment
+
+### Client Entry Point
+
+**File**: `ufo/client/client.py`
+
+```python
+#!/usr/bin/env python
+# ufo/client/client.py
+
+import argparse
+import asyncio
+import logging
+import platform as platform_module
+
+from ufo.client.computer import ComputerManager
+from ufo.client.mcp.mcp_server_manager import MCPServerManager
+from ufo.client.ufo_client import UFOClient
+from ufo.client.websocket import UFOWebSocketClient
+from config.config_loader import get_ufo_config
+from ufo.logging.setup import setup_logger
+
+
+def parse_arguments():
+    """Parse command line arguments."""
+    parser = argparse.ArgumentParser(description="UFO Device Client")
+
+    parser.add_argument(
+        "--client-id",
+        default="client_001",
+        help="Unique client ID (default: client_001)"
+    )
+
+    parser.add_argument(
+        "--ws-server",
+        default="ws://localhost:5000/ws",
+        help="WebSocket server URL (default: ws://localhost:5000/ws)"
+    )
+
+    parser.add_argument(
+        "--ws",
+        action="store_true",
+        help="Enable WebSocket mode (required)"
+    )
+
+    parser.add_argument(
+        "--max-retries",
+        type=int,
+        default=5,
+        help="Maximum connection retries (default: 5)"
+    )
+
+    parser.add_argument(
+        "--platform",
+        choices=["windows", "linux", "android", "ios"],
+        default=None,
+        help="Platform type (auto-detected if not specified)"
+    )
+
+    parser.add_argument(
+        "--log-level",
+        default="WARNING",
+        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL", "OFF"],
+        help="Logging level (default: WARNING)"
+    )
+
+    return parser.parse_args()
+
+
+async def main():
+    """Main client entry point."""
+
+    # Parse arguments
+    args = parse_arguments()
+
+    # Auto-detect platform if not specified
+    if args.platform is None:
+        detected = platform_module.system().lower()
+        args.platform = detected if detected in ["windows", "linux"] else "windows"
+
+    # Setup logging
+    setup_logger(args.log_level)
+    logger = logging.getLogger(__name__)
+    logger.info(f"Platform: {args.platform}")
+
+    # Load configuration
+    ufo_config = get_ufo_config()
+
+    # Initialize managers
+    mcp_server_manager = MCPServerManager()
+    computer_manager = ComputerManager(ufo_config.to_dict(), mcp_server_manager)
+
+    # Setup platform-specific MCP servers
+    # (setup_mcp_servers is the helper from "Platform-Specific Configuration";
+    # it must be defined or imported in this module)
+    setup_mcp_servers(args.platform, mcp_server_manager)
+
+    # Create UFO client
+    client = UFOClient(
+        mcp_server_manager=mcp_server_manager,
+        computer_manager=computer_manager,
+        client_id=args.client_id,
+        platform=args.platform,
+    )
+
+    logger.info(f"UFO Client initialized: {args.client_id}")
+
+    # Create WebSocket client
+    ws_client = UFOWebSocketClient(
+        args.ws_server,
+        client,
+        max_retries=args.max_retries,
+    )
+
+    # Start connection
+    try:
+        await ws_client.connect_and_listen()
+    except Exception as e:
+        logger.error(f"Client error: {e}", exc_info=True)
+        return 1
+
+    return 0
+
+
+if __name__ == "__main__":
+    exit_code = asyncio.run(main())
+    # The exit() builtin is provided by the site module for interactive use;
+    # raising SystemExit works in every environment.
+    raise SystemExit(exit_code)
+```
+
+### Deployment Commands
+
+```bash
+# ========================================
+# Mobile Agent Client (Android)
+# ========================================
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://192.168.1.100:5010/ws \
+ --client-id mobile_agent_1 \
+ --platform android \
+ --log-level INFO
+
+# ========================================
+# Linux Agent Client
+# ========================================
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://localhost:5001/ws \
+ --client-id linux_agent_1 \
+ --platform linux \
+ --max-retries 10
+
+# ========================================
+# iOS Agent Client
+# ========================================
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://192.168.1.100:5020/ws \
+ --client-id ios_agent_1 \
+ --platform ios \
+ --log-level DEBUG
+```
+
+### Configuration File (Optional)
+
+**File**: `config/client_config.yaml`
+
+```yaml
+# Client Configuration
+
+client:
+  client_id: "mobile_agent_1"
+  platform: "android"
+
+websocket:
+  server_url: "ws://192.168.1.100:5010/ws"
+  max_retries: 5
+  timeout: 120
+
+logging:
+  level: "INFO"
+  file: "logs/client.log"
+
+mcp_servers:
+  - name: "mobile_mcp"
+    type: "http"
+    config:
+      host: "localhost"
+      port: 8020
+      path: "/mcp"
+
+---
+
+## Testing Your Client
+
+### Unit Testing
+
+```python
+# tests/unit/test_ufo_client.py
+
+import pytest
+from unittest.mock import MagicMock, AsyncMock
+from ufo.client.ufo_client import UFOClient
+from aip.messages import ServerMessage, Command
+
+
+class TestUFOClient:
+    """Unit tests for UFO Client."""
+
+    @pytest.fixture
+    def client(self):
+        """Create test client."""
+        mcp_manager = MagicMock()
+        comp_manager = MagicMock()
+
+        return UFOClient(
+            mcp_server_manager=mcp_manager,
+            computer_manager=comp_manager,
+            client_id="test_client",
+            platform="android",
+        )
+
+    @pytest.mark.asyncio
+    async def test_execute_actions(self, client):
+        """Test command execution."""
+        commands = [
+            Command(function="tap_screen", arguments={"x": 100, "y": 200})
+        ]
+
+        # Mock command router
+        client.command_router.execute = AsyncMock(return_value=[
+            {"success": True, "message": "Tapped"}
+        ])
+
+        results = await client.execute_actions(commands)
+
+        assert len(results) == 1
+        assert results[0]["success"] is True
+
+    def test_session_id_setter(self, client):
+        """Test session ID property."""
+        client.session_id = "session_123"
+        assert client.session_id == "session_123"
+```
+
+### Integration Testing
+
+```python
+# tests/integration/test_client_integration.py
+
+import pytest
+import asyncio
+from ufo.client.client import main
+
+
+class TestClientIntegration:
+    """Integration tests for client."""
+
+    @pytest.mark.asyncio
+    async def test_client_startup(self, monkeypatch):
+        """Test client starts successfully."""
+        # Mock arguments (monkeypatch restores sys.argv after the test)
+        monkeypatch.setattr(
+            "sys.argv",
+            [
+                "client.py",
+                "--ws",
+                "--ws-server", "ws://localhost:5010/ws",
+                "--client-id", "test_client",
+                "--platform", "android",
+            ],
+        )
+
+        # Should not raise exceptions
+        # (Note: without a server, the client keeps retrying until cancelled)
+        task = asyncio.create_task(main())
+        await asyncio.sleep(2)
+        task.cancel()
+
+        # Await the cancelled task so it is cleaned up before the loop closes
+        try:
+            await task
+        except asyncio.CancelledError:
+            pass
+
+### Manual Testing
+
+```bash
+# 1. Start MCP server
+python -m ufo.client.mcp.http_servers.mobile_mcp_server --port 8020
+
+# 2. Start agent server
+python -m ufo.server.app --port 5010
+
+# 3. Start client (in another terminal)
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://localhost:5010/ws \
+ --client-id test_client \
+ --platform android \
+ --log-level DEBUG
+
+# 4. Check logs
+tail -f logs/client.log
+```
+
+---
+
+## Summary
+
+**What You've Built**:
+
+- ✅ UFO Client for command execution
+- ✅ WebSocket client for server communication
+- ✅ MCP Server Manager for MCP integration
+- ✅ Platform detection and configuration
+- ✅ Complete deployment setup
+
+**Key Takeaways**:
+
+| Component | Purpose | Key Methods |
+|-----------|---------|-------------|
+| **UFOClient** | Execute commands | `execute_step()`, `execute_actions()` |
+| **UFOWebSocketClient** | Server communication | `connect_and_listen()`, `handle_messages()` |
+| **MCPServerManager** | Manage MCP servers | `register_server()`, `start_server()` |
+| **client.py** | Entry point | `main()`, argument parsing |
+
+---
+
+## Next Steps
+
+**Continue to**: [Part 4: Configuration & Deployment →](configuration.md)
+
+Learn how to configure your device agent in `third_party.yaml`, register devices in `devices.yaml`, and deploy the complete system.
+
+---
+
+## Related Documentation
+
+- **[Client Overview](../../client/overview.md)** - Client architecture deep dive
+- **[AIP Protocol](../../aip/overview.md)** - Agent Interaction Protocol
+- **[MCP Integration](../../mcp/overview.md)** - MCP fundamentals
+
+---
+
+**Previous**: [← Part 2: MCP Server](mcp_server.md)
+**Next**: [Part 4: Configuration & Deployment →](configuration.md)
diff --git a/documents/docs/tutorials/creating_device_agent/configuration.md b/documents/docs/tutorials/creating_device_agent/configuration.md
new file mode 100644
index 000000000..c0bf43798
--- /dev/null
+++ b/documents/docs/tutorials/creating_device_agent/configuration.md
@@ -0,0 +1,879 @@
+# Part 4: Configuration & Deployment
+
+This tutorial covers the **configuration files and deployment procedures** needed to integrate your device agent into UFO³. You'll learn to configure `third_party.yaml`, register devices in `devices.yaml`, create prompt templates, and deploy the complete system.
+
+---
+
+## Table of Contents
+
+1. [Configuration Overview](#configuration-overview)
+2. [Third-Party Agent Configuration](#third-party-agent-configuration)
+3. [Device Registration](#device-registration)
+4. [Prompt Template Creation](#prompt-template-creation)
+5. [Step-by-Step Deployment](#step-by-step-deployment)
+6. [Galaxy Multi-Device Integration](#galaxy-multi-device-integration)
+7. [Common Configuration Patterns](#common-configuration-patterns)
+
+---
+
+## Configuration Overview
+
+### Configuration File Hierarchy
+
+```mermaid
+graph TB
+    subgraph "UFO Configuration"
+        UFOConfig["config/ufo/<br/>UFO Framework Config"]
+        ThirdParty["third_party.yaml<br/>Agent Registration"]
+
+        UFOConfig --> ThirdParty
+    end
+
+    subgraph "Galaxy Configuration"
+        GalaxyConfig["config/galaxy/<br/>Multi-Device Config"]
+        Devices["devices.yaml<br/>Device Registry"]
+        Constellation["constellation.yaml<br/>Orchestration"]
+
+        GalaxyConfig --> Devices
+        GalaxyConfig --> Constellation
+    end
+
+    subgraph "Prompt Templates"
+        MainPrompt["ufo/prompts/third_party/<br/>agent_name.yaml"]
+        ExamplePrompt["ufo/prompts/third_party/<br/>agent_name_example.yaml"]
+    end
+
+    ThirdParty -.references.-> MainPrompt
+    ThirdParty -.references.-> ExamplePrompt
+    Devices -.references.-> ThirdParty
+
+    style ThirdParty fill:#c8e6c9
+    style Devices fill:#e1f5ff
+    style MainPrompt fill:#fff3e0
+```
+
+**Configuration Files**:
+
+| File | Purpose | Required |
+|------|---------|----------|
+| `config/ufo/third_party.yaml` | Register agent with UFO | ✅ Yes |
+| `config/galaxy/devices.yaml` | Register device instances | ✅ Yes (for Galaxy) |
+| `config/galaxy/constellation.yaml` | Multi-device orchestration | Optional |
+| `ufo/prompts/third_party/<agent_name>.yaml` | Main prompt template | ✅ Yes |
+| `ufo/prompts/third_party/<agent_name>_example.yaml` | Few-shot examples | ✅ Yes |
+
+---
+
+## Third-Party Agent Configuration
+
+### File Location
+
+**Path**: `config/ufo/third_party.yaml`
+
+### Configuration Structure
+
+```yaml
+# Third-Party Agent Integration Configuration
+# This file configures external/third-party agents that extend UFO's capabilities
+
+# ========================================
+# Enabled Agents
+# ========================================
+# List of third-party agents to enable
+ENABLED_THIRD_PARTY_AGENTS: ["MobileAgent", "LinuxAgent"]
+
+
+# ========================================
+# Agent Configurations
+# ========================================
+THIRD_PARTY_AGENT_CONFIG:
+
+  # ----------------------------------
+  # MobileAgent Configuration
+  # ----------------------------------
+  MobileAgent:
+    # Visual mode enables screenshot capture
+    VISUAL_MODE: True
+
+    # Agent name (must match @AgentRegistry.register)
+    AGENT_NAME: "MobileAgent"
+
+    # Prompt template paths (relative to project root)
+    APPAGENT_PROMPT: "ufo/prompts/third_party/mobile_agent.yaml"
+    APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/third_party/mobile_agent_example.yaml"
+
+    # Optional: API prompt template (for custom tool descriptions)
+    # API_PROMPT: "ufo/prompts/third_party/mobile_agent_api.yaml"
+
+    # Agent introduction (shown to HostAgent for delegation)
+    INTRODUCTION: >
+      The MobileAgent controls Android and iOS mobile devices.
+      It can perform UI automation, tap/swipe gestures, type text,
+      launch apps, and capture screenshots. Use it for mobile
+      app testing, automation, and device control tasks.
+
+  # ----------------------------------
+  # LinuxAgent Configuration (Reference)
+  # ----------------------------------
+  LinuxAgent:
+    # Visual mode disabled for CLI-based agent
+    VISUAL_MODE: False
+
+    AGENT_NAME: "LinuxAgent"
+    APPAGENT_PROMPT: "ufo/prompts/third_party/linux_agent.yaml"
+    APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/third_party/linux_agent_example.yaml"
+
+    INTRODUCTION: >
+      The LinuxAgent executes commands on Linux systems.
+      It can run bash commands, manage files, inspect processes,
+      configure services, and perform system administration tasks.
+      Use it for Linux server management and automation.
+```
+
+### Configuration Field Reference
+
+| Field | Type | Required | Description | Example |
+|-------|------|----------|-------------|---------|
+| `VISUAL_MODE` | boolean | ✅ Yes | Enable screenshot capture | `True` for mobile/GUI, `False` for CLI |
+| `AGENT_NAME` | string | ✅ Yes | Must match the `agent_name` passed to `@AgentRegistry.register` | `"MobileAgent"` |
+| `APPAGENT_PROMPT` | string | ✅ Yes | Path to main prompt template | `"ufo/prompts/third_party/mobile_agent.yaml"` |
+| `APPAGENT_EXAMPLE_PROMPT` | string | ✅ Yes | Path to example prompt template | `"ufo/prompts/third_party/mobile_agent_example.yaml"` |
+| `API_PROMPT` | string | Optional | Custom API descriptions | `"ufo/prompts/third_party/mobile_agent_api.yaml"` |
+| `INTRODUCTION` | string | ✅ Yes | Agent description for HostAgent | Multi-line string describing capabilities |
+
+!!! warning "Configuration Checklist"
+    - ✅ Add your agent to `ENABLED_THIRD_PARTY_AGENTS` list
+    - ✅ Create a config section with agent name as key
+    - ✅ Set `AGENT_NAME` to match `@AgentRegistry.register(agent_name="...")`
+    - ✅ Set `VISUAL_MODE` based on whether agent uses screenshots
+    - ✅ Create prompt template files before starting UFO
+    - ✅ Write descriptive `INTRODUCTION` for Galaxy orchestration
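+
+The checklist above can be checked mechanically before starting UFO. The following is a minimal sketch and not part of UFO: `validate_third_party_config` is a hypothetical helper that takes the already-parsed contents of `third_party.yaml` as a plain dictionary and returns a list of problems. It assumes only the required fields listed in the field reference table.
+
+```python
+from typing import Any, Dict, List
+
+# Required per-agent fields, taken from the field reference table above.
+REQUIRED_FIELDS = (
+    "VISUAL_MODE",
+    "AGENT_NAME",
+    "APPAGENT_PROMPT",
+    "APPAGENT_EXAMPLE_PROMPT",
+    "INTRODUCTION",
+)
+
+
+def validate_third_party_config(config: Dict[str, Any]) -> List[str]:
+    """Return human-readable problems found in a parsed third_party.yaml."""
+    problems: List[str] = []
+    enabled = config.get("ENABLED_THIRD_PARTY_AGENTS", [])
+    sections = config.get("THIRD_PARTY_AGENT_CONFIG", {})
+
+    for agent in enabled:
+        section = sections.get(agent)
+        if section is None:
+            problems.append(f"{agent}: enabled but has no config section")
+            continue
+        for field in REQUIRED_FIELDS:
+            if field not in section:
+                problems.append(f"{agent}: missing required field {field}")
+        # The section key and AGENT_NAME should both equal the registered name.
+        if section.get("AGENT_NAME", agent) != agent:
+            problems.append(f"{agent}: AGENT_NAME does not match the section key")
+    return problems
+
+
+if __name__ == "__main__":
+    # Hypothetical, deliberately incomplete configuration.
+    sample = {
+        "ENABLED_THIRD_PARTY_AGENTS": ["MobileAgent"],
+        "THIRD_PARTY_AGENT_CONFIG": {
+            "MobileAgent": {"VISUAL_MODE": True, "AGENT_NAME": "MobileAgent"},
+        },
+    }
+    for problem in validate_third_party_config(sample):
+        print(problem)
+```
+
+Returning a list instead of raising on the first error lets you fix every missing field in one pass.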
+
+---
+
+## Device Registration
+
+### File Location
+
+**Path**: `config/galaxy/devices.yaml`
+
+### Device Configuration Structure
+
+```yaml
+# Device Configuration - YAML Format
+# This configuration defines device instances for Galaxy constellation
+
+devices:
+  # ----------------------------------
+  # Mobile Agent Device 1 (Android)
+  # ----------------------------------
+  - device_id: "mobile_agent_1"
+
+    # WebSocket server URL for this device
+    server_url: "ws://192.168.1.100:5010/ws"
+
+    # Operating system
+    os: "android"
+
+    # Device capabilities (used by Galaxy for task routing)
+    capabilities:
+      - "ui_automation"
+      - "mobile_app_testing"
+      - "touch_gestures"
+      - "screenshot_capture"
+      - "android_apps"
+
+    # Custom metadata (accessible in prompts via {tips})
+    metadata:
+      device_model: "Google Pixel 6"
+      android_version: "14"
+      screen_resolution: "1080x2400"
+      device_location: "Test Lab A"
+      performance: "high"
+      description: "Primary Android test device"
+
+      # Custom instructions for the agent
+      tips: >
+        This device runs Android 14 on Google Pixel 6.
+        Screen resolution is 1080x2400 pixels.
+        All standard Android apps are installed.
+        For app testing, use package name format: com.example.app
+
+    # Auto-connect on startup
+    auto_connect: true
+
+    # Maximum connection retries
+    max_retries: 5
+
+  # ----------------------------------
+  # Mobile Agent Device 2 (iOS)
+  # ----------------------------------
+  - device_id: "mobile_agent_2"
+    server_url: "ws://192.168.1.101:5020/ws"
+    os: "ios"
+    capabilities:
+      - "ui_automation"
+      - "ios_app_testing"
+      - "xcuitest"
+      - "screenshot_capture"
+    metadata:
+      device_model: "iPhone 14 Pro"
+      ios_version: "17.2"
+      screen_resolution: "1179x2556"
+      device_location: "Test Lab B"
+      tips: >
+        iOS device using XCUITest for automation.
+        Use bundle ID format: com.company.AppName
+    auto_connect: true
+    max_retries: 5
+
+  # ----------------------------------
+  # Linux Agent (Server)
+  # ----------------------------------
+  - device_id: "linux_agent_1"
+    server_url: "ws://192.168.1.50:5001/ws"
+    os: "linux"
+    capabilities:
+      - "bash_commands"
+      - "server_management"
+      - "file_operations"
+      - "process_management"
+    metadata:
+      os_version: "Ubuntu 22.04"
+      hostname: "server-01"
+      logs_file_path: "/var/log/app/app.log"
+      dev_path: "/home/developer/projects/"
+      warning_log_pattern: "WARN"
+      error_log_pattern: "ERROR|FATAL"
+      tips: >
+        Ubuntu 22.04 server.
+        Application logs: /var/log/app/app.log
+        Development path: /home/developer/projects/
+        Use 'sudo' for privileged operations.
+    auto_connect: true
+    max_retries: 10
+
+  # ----------------------------------
+  # Additional Device Template
+  # ----------------------------------
+  # - device_id: "your_device_id"
+  #   server_url: "ws://HOST:PORT/ws"
+  #   os: "android|ios|linux|windows"
+  #   capabilities: ["capability1", "capability2"]
+  #   metadata:
+  #     key: "value"
+  #     tips: "Custom instructions"
+  #   auto_connect: true
+  #   max_retries: 5
+```
+
+### Device Configuration Field Reference
+
+| Field | Type | Required | Description | Example |
+|-------|------|----------|-------------|---------|
+| `device_id` | string | ✅ Yes | Unique device identifier | `"mobile_agent_1"` |
+| `server_url` | string | ✅ Yes | WebSocket server URL | `"ws://192.168.1.100:5010/ws"` |
+| `os` | string | ✅ Yes | Operating system | `"android"`, `"ios"`, `"linux"`, `"windows"` |
+| `capabilities` | list[string] | ✅ Yes | Device capabilities | `["ui_automation", "app_testing"]` |
+| `metadata` | dict | Optional | Custom device metadata | `{device_model: "Pixel 6", ...}` |
+| `metadata.tips` | string | Recommended | Agent-specific instructions | Multi-line instructions |
+| `auto_connect` | boolean | Optional | Auto-connect on startup | `true` (default: `false`) |
+| `max_retries` | integer | Optional | Connection retry limit | `5` (default: `3`) |
+
+!!! tip "Device Configuration Best Practices"
+    - ✅ Use descriptive `device_id` (e.g., `mobile_android_pixel6_lab1`)
+    - ✅ Add comprehensive `capabilities` for Galaxy task routing
+    - ✅ Include device-specific details in `metadata.tips`
+    - ✅ Set `auto_connect: true` for production devices
+    - ✅ Use higher `max_retries` for unstable networks
+    - ✅ Include log paths, dev paths, and patterns in `metadata`
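+
+Device entries can be sanity-checked in the same spirit. The sketch below is an illustration under stated assumptions, not Galaxy's own loader: `validate_devices` is a hypothetical helper that receives the parsed `devices` list and checks the required fields from the table above, `device_id` uniqueness, and that `server_url` is a WebSocket URL.
+
+```python
+from typing import Any, Dict, List
+from urllib.parse import urlparse
+
+# Required per-device fields, taken from the field reference table above.
+REQUIRED_DEVICE_FIELDS = ("device_id", "server_url", "os", "capabilities")
+
+
+def validate_devices(devices: List[Dict[str, Any]]) -> List[str]:
+    """Return human-readable problems found in a parsed devices list."""
+    problems: List[str] = []
+    seen_ids = set()
+
+    for index, device in enumerate(devices):
+        # Fall back to the list position when device_id itself is missing.
+        label = device.get("device_id", f"devices[{index}]")
+        for field in REQUIRED_DEVICE_FIELDS:
+            if field not in device:
+                problems.append(f"{label}: missing required field {field}")
+
+        if label in seen_ids:
+            problems.append(f"{label}: duplicate device_id")
+        seen_ids.add(label)
+
+        if "server_url" in device:
+            url = urlparse(device["server_url"])
+            if url.scheme not in ("ws", "wss") or not url.hostname:
+                problems.append(f"{label}: server_url is not a ws:// or wss:// URL")
+    return problems
+```
+
+Running such a check in CI catches copy-paste mistakes, for example two pool devices that accidentally share one `device_id`.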
+
+---
+
+## Prompt Template Creation
+
+### Main Prompt Template
+
+**File**: `ufo/prompts/third_party/mobile_agent.yaml`
+
+```yaml
+version: 1.0
+
+system: |-
+ You are **MobileAgent**, the UFO framework's intelligent agent for mobile device automation.
+ Your goal is to **complete the entire User Request** by interacting with mobile devices using touch gestures, UI automation, and available APIs.
+
+ ## Capabilities
+ - **Tap** elements by coordinates or UI element properties
+ - **Swipe** gestures (up, down, left, right) for scrolling and navigation
+ - **Type** text into input fields
+ - **Launch** applications by package/bundle ID
+ - **Capture** screenshots for visual inspection
+ - **Extract** UI hierarchy (XML tree on Android, Accessibility tree on iOS)
+
+ ## Platform Support
+ - **Android**: Via ADB (Android Debug Bridge) and UI Automator
+ - **iOS**: Via XCTest framework and accessibility APIs
+
+ ## Task Status
+ After each step, decide the overall status of the **User Request**:
+ - `CONTINUE` — the request is partially complete; further actions are required.
+ - `FINISH` — the request has been successfully fulfilled; no further actions are needed.
+ - `FAIL` — the request cannot be completed due to errors, invalid UI state, or repeated ineffective attempts.
+
+ ## Response Format
+ Always respond **only** with valid JSON that strictly follows the structure below.
+ Your output must be directly parseable by `json.loads()` — no markdown, comments, or extra text.
+
+ Required JSON keys:
+
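+ {{{{
+   "observation": str, "<What you observe in the current screenshot and UI tree.>",
+   "thought": str, "<Your reasoning for the next action.>",
+   "action": {{{{
+     "function": str, "<Name of the function to call.>",
+     "arguments": Dict[str, Any], "<The arguments, in the form {{'<arg_name>': '<arg_value>'}}, for the function. Use an empty dictionary if no arguments are needed.>",
+     "status": str, "<CONTINUE | FINISH | FAIL>"
+   }}}},
+   "plan": List[str], "<The remaining steps after the current action.>",
+   "result": str, "<A summary of the outcome so far.>"
+ }}}}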
+
+ ## Operational Rules
+ - **Use screenshots and UI tree** to understand the current mobile UI state
+ - **Identify UI elements** by text, content-desc, resource-id, or coordinates
+ - **Plan actions carefully** - mobile UIs may have animations, loading states, or modal dialogs
+ - **Verify actions** - after tapping a button, check if the expected screen transition occurred
+ - **Handle edge cases** - pop-ups, permissions dialogs, network errors, app crashes
+ - Do **not** ask for user confirmation
+ - Avoid **destructive actions** (uninstall apps, factory reset) unless explicitly instructed
+ - Review previous actions to avoid repeating ineffective steps
+
+ ## Actions
+ - You are able to use the following APIs to interact with the mobile device.
+ {{apis}}
+
+ ## Examples
+ - Below are some examples for your reference. Only use them as guidance and do not copy them directly.
+ {{examples}}
+
+ ## Final Reminder
+ Please observe the previous steps, current screenshot, and UI tree carefully to decide your next action.
+ Think step-by-step, act carefully, and output only the required JSON structure.
+ Any invalid JSON or extra text will crash the system.
+
+
+user: |-
+ {{user_request}}
+ [See attached image]
+ {{installed_apps}}
+ {{current_controls}}
+ {{last_success_actions}}
+ {{prev_plan}}
+
+```
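+
+Because the prompt demands output that `json.loads()` can parse directly, it can help to validate replies against the required keys while developing your prompt. The following is a minimal sketch, not UFO's actual response parser: `parse_agent_response` is a hypothetical helper, and the key names and status values are the ones listed in the system prompt above.
+
+```python
+import json
+from typing import Any, Dict
+
+TOP_LEVEL_KEYS = ("observation", "thought", "action", "plan", "result")
+ACTION_KEYS = ("function", "arguments", "status")
+VALID_STATUSES = ("CONTINUE", "FINISH", "FAIL")
+
+
+def parse_agent_response(raw: str) -> Dict[str, Any]:
+    """Parse a model reply; raise ValueError if it breaks the response contract."""
+    try:
+        data = json.loads(raw)
+    except json.JSONDecodeError as exc:
+        raise ValueError(f"response is not valid JSON: {exc}") from exc
+    if not isinstance(data, dict):
+        raise ValueError("response must be a JSON object")
+
+    missing = [key for key in TOP_LEVEL_KEYS if key not in data]
+    if missing:
+        raise ValueError(f"missing top-level keys: {missing}")
+
+    action = data["action"]
+    if not isinstance(action, dict):
+        raise ValueError("'action' must be a JSON object")
+    missing = [key for key in ACTION_KEYS if key not in action]
+    if missing:
+        raise ValueError(f"missing action keys: {missing}")
+    if action["status"] not in VALID_STATUSES:
+        raise ValueError(f"unknown status: {action['status']!r}")
+    return data
+
+
+if __name__ == "__main__":
+    reply = json.dumps({
+        "observation": "Home screen is visible.",
+        "thought": "Launch Settings first.",
+        "action": {
+            "function": "launch_app",
+            "arguments": {"package_name": "com.android.settings"},
+            "status": "CONTINUE",
+        },
+        "plan": ["(1) Open Wi-Fi settings."],
+        "result": "Settings launched.",
+    })
+    print(parse_agent_response(reply)["action"]["function"])
+```
+
+A reply wrapped in a Markdown code fence fails at the `json.loads` step, which is exactly the failure the "JSON only" rule in the prompt is meant to prevent.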
+
+### Example Prompt Template
+
+**File**: `ufo/prompts/third_party/mobile_agent_example.yaml`
+
+```yaml
+version: 1.0
+
+example1:
+  Request: |-
+    Open the Settings app and enable Wi-Fi.
+  Response:
+    observation: |-
+      The device is on the home screen. I can see app icons for various applications. The Settings app icon is visible in the UI tree with package name "com.android.settings".
+    thought: |-
+      To enable Wi-Fi, I need to first launch the Settings app. I'll use the launch_app function with the Settings package name.
+    action:
+      function: |-
+        launch_app
+      arguments:
+        {"package_name": "com.android.settings"}
+      status: |-
+        CONTINUE
+    plan:
+      - (1) Launch Settings app.
+      - (2) Navigate to Wi-Fi settings (may require scrolling).
+      - (3) Tap the Wi-Fi toggle to enable it.
+      - (4) Verify Wi-Fi is enabled and mark FINISH.
+    result: |-
+      Request in CONTINUE state. Successfully launched Settings app. Next steps: navigate to Wi-Fi settings and enable the toggle.
+
+example2:
+  Request: |-
+    Find and tap the "Login" button on the current screen.
+  Response:
+    observation: |-
+      The current screenshot shows a login screen with email and password input fields. There is a button with text "Login" visible near the bottom of the screen. According to the current screen controls list, the button is located at coordinates (540, 1650) with resource-id "com.example.app:id/login_button".
+    thought: |-
+      I can see the Login button in the controls list. I'll tap it using the coordinates provided.
+    action:
+      function: |-
+        tap_screen
+      arguments:
+        {"x": 540, "y": 1650}
+      status: |-
+        CONTINUE
+    plan:
+      - (1) Tap the Login button at (540, 1650).
+      - (2) Wait for navigation or error message.
+      - (3) Verify successful login or handle errors.
+      - (4) Mark FINISH if logged in successfully.
+    result: |-
+      Request in CONTINUE state. Tapped Login button at coordinates (540, 1650). Awaiting app response (loading indicator or next screen).
+
+example3:
+  Request: |-
+    Scroll down to view more content in the current list.
+  Response:
+    observation: |-
+      The screenshot shows a scrollable list view with multiple items. The UI tree indicates this is a RecyclerView with id "com.example.app:id/content_list". The list appears to have more content below the visible area based on the scroll indicators.
+    thought: |-
+      To scroll down, I'll perform a swipe gesture from bottom to top. I'll use coordinates in the middle of the screen, starting near the bottom and ending near the top.
+    action:
+      function: |-
+        swipe
+      arguments:
+        {"start_x": 540, "start_y": 1800, "end_x": 540, "end_y": 600, "duration_ms": 300}
+      status: |-
+        CONTINUE
+    plan:
+      - (1) Swipe from bottom to top to scroll the list down.
+      - (2) Capture new screenshot to see newly visible content.
+      - (3) Check if target content is now visible.
+      - (4) Repeat scroll if needed or mark FINISH.
+    result: |-
+      Request in CONTINUE state. Performed swipe gesture to scroll down. The list should now display content that was previously below the visible area.
+
+example4:
+  Request: |-
+    Type "testuser@example.com" into the email field.
+  Response:
+    observation: |-
+      The screenshot shows a login form with two input fields. The UI tree shows an EditText with hint "Email" at coordinates (540, 800) with resource-id "com.example.app:id/email_input". The field appears to be empty and ready for input.
+    thought: |-
+      I need to first tap the email field to focus it, then type the email address. I'll tap the field coordinates first.
+    action:
+      function: |-
+        tap_screen
+      arguments:
+        {"x": 540, "y": 800}
+      status: |-
+        CONTINUE
+    plan:
+      - (1) Tap the email input field to focus it.
+      - (2) Type the email address "testuser@example.com".
+      - (3) Verify text is entered correctly.
+      - (4) Mark FINISH.
+    result: |-
+      Request in CONTINUE state. Tapped email field at (540, 800) to focus it. Next step: type the email address into the focused field.
+```
+
+### Prompt Template Best Practices
+
+| Component | Best Practice | Example |
+|-----------|---------------|---------|
+| **System Prompt** | Comprehensive instructions | Capabilities, rules, response format |
+| **Response Format** | JSON schema with examples | `{"observation": ..., "thought": ..., "action": ...}` |
+| **API Placeholder** | Use `{apis}` for tool injection | Populated by prompter |
+| **Examples Placeholder** | Use `{examples}` for few-shot | Populated from example template |
+| **User Prompt** | Include all context | Request, screenshot, UI tree, history |
+| **Examples** | Cover common scenarios | Launch app, tap, swipe, type, scroll |
+
+!!! tip "Prompt Template Tips"
+    - ✅ Use `{{variable}}` for template variables (double braces)
+    - ✅ Provide clear JSON structure with type annotations
+    - ✅ Include platform-specific guidance (Android vs iOS)
+    - ✅ Add examples covering success and failure cases
+    - ✅ Reference screenshots and UI trees in prompts
+    - ✅ Emphasize JSON-only output (no markdown)
+    - ❌ Don't hardcode API descriptions (use `{apis}` placeholder)
+
+---
+
+## Step-by-Step Deployment
+
+### Deployment Checklist
+
+```mermaid
+graph TB
+    Start([Start Deployment]) --> Config["Steps 1-3: Configure Files and Prompts"]
+    Config --> Code["Step 4: Implement Agent Code"]
+    Code --> MCP["Step 5: Create MCP Server"]
+    MCP --> Test["Step 6: Test Components"]
+    Test --> Server["Step 7: Start Agent Server"]
+    Server --> MCPStart["Step 8: Start MCP Server"]
+    MCPStart --> Client["Step 9: Start Device Client"]
+    Client --> Verify["Step 10: Verify Connection"]
+    Verify --> Ready([Ready for Tasks])
+
+    style Start fill:#c8e6c9
+    style Ready fill:#c8e6c9
+    style Test fill:#fff3e0
+    style Verify fill:#fff3e0
+```
+
+### Step 1: Configure third_party.yaml
+
+```bash
+# Edit config/ufo/third_party.yaml
+nano config/ufo/third_party.yaml
+```
+
+Add your agent configuration:
+
+```yaml
+ENABLED_THIRD_PARTY_AGENTS: ["MobileAgent"]
+
+THIRD_PARTY_AGENT_CONFIG:
+  MobileAgent:
+    VISUAL_MODE: True
+    AGENT_NAME: "MobileAgent"
+    APPAGENT_PROMPT: "ufo/prompts/third_party/mobile_agent.yaml"
+    APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/third_party/mobile_agent_example.yaml"
+    INTRODUCTION: "MobileAgent controls Android/iOS devices..."
+```
+
+### Step 2: Register Device in devices.yaml
+
+```bash
+# Edit config/galaxy/devices.yaml
+nano config/galaxy/devices.yaml
+```
+
+Add device registration:
+
+```yaml
+devices:
+  - device_id: "mobile_agent_1"
+    server_url: "ws://192.168.1.100:5010/ws"
+    os: "android"
+    capabilities: ["ui_automation", "app_testing"]
+    metadata:
+      device_model: "Pixel 6"
+      tips: "Android device for testing"
+    auto_connect: true
+    max_retries: 5
+```
+
+### Step 3: Create Prompt Templates
+
+```bash
+# Create main prompt
+touch ufo/prompts/third_party/mobile_agent.yaml
+
+# Create example prompt
+touch ufo/prompts/third_party/mobile_agent_example.yaml
+```
+
+Copy the content from the [Prompt Template Creation](#prompt-template-creation) section into these files.
+
+### Step 4: Implement Agent Components
+
+```bash
+# Agent class
+# Edit: ufo/agents/agent/customized_agent.py
+
+# Processor
+# Edit: ufo/agents/processors/customized/customized_agent_processor.py
+
+# States
+# Create: ufo/agents/states/mobile_agent_state.py
+
+# Strategies
+# Create: ufo/agents/processors/strategies/mobile_agent_strategy.py
+
+# Prompter
+# Create: ufo/prompter/customized/mobile_agent_prompter.py
+```
+
+### Step 5: Create MCP Server
+
+```bash
+# Create MCP server
+touch ufo/client/mcp/http_servers/mobile_mcp_server.py
+```
+
+Implement the MCP server as described in [Part 2: MCP Server Development](mcp_server.md).
+
+### Step 6: Test Components
+
+```bash
+# Run unit tests
+pytest tests/unit/test_mobile_agent.py
+
+# Run integration tests
+pytest tests/integration/test_mobile_agent_integration.py
+```
+
+### Step 7: Start Agent Server
+
+```bash
+# Terminal 1: Start UFO agent server
+python -m ufo.server.app --port 5010
+```
+
+Expected output:
+```
+========================================
+UFO Agent Server
+========================================
+INFO: Server starting on 0.0.0.0:5010
+INFO: Registered agents: MobileAgent, LinuxAgent
+INFO: WebSocket endpoint: ws://localhost:5010/ws
+========================================
+```
+
+### Step 8: Start MCP Server
+
+```bash
+# Terminal 2: Start MCP server
+python -m ufo.client.mcp.http_servers.mobile_mcp_server \
+ --host localhost \
+ --port 8020 \
+ --platform android
+```
+
+Expected output:
+```
+==================================================
+UFO Mobile MCP Server (Android)
+Mobile device automation via Model Context Protocol
+Running on localhost:8020
+==================================================
+INFO: Server started successfully
+INFO: Registered tools: tap_screen, swipe, type_text, ...
+```
+
+### Step 9: Start Device Client
+
+```bash
+# Terminal 3: Start device client
+python -m ufo.client.client \
+ --ws \
+ --ws-server ws://localhost:5010/ws \
+ --client-id mobile_agent_1 \
+ --platform android \
+ --log-level INFO
+```
+
+Expected output:
+```
+INFO: Platform: android
+INFO: UFO Client initialized: mobile_agent_1
+INFO: Connecting to ws://localhost:5010/ws (attempt 1/5)
+INFO: ✅ Client registered: mobile_agent_1
+INFO: Starting message handling loop
+```
+
+### Step 10: Verify Connection
+
+```bash
+# Check agent server logs
+# Should show: "Client mobile_agent_1 registered"
+
+# Check client logs
+# Should show: "✅ Client registered: mobile_agent_1"
+
+# Test basic command (optional)
+curl -X POST http://localhost:5010/api/v1/task \
+ -H "Content-Type: application/json" \
+ -d '{
+ "request": "Tap at coordinates (500, 1000)",
+ "device_id": "mobile_agent_1"
+ }'
+```
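+
+If the client never registers, a quick first check is whether the two servers are listening at all. The sketch below uses only the Python standard library; `port_is_open` is a hypothetical helper, and the ports `5010` and `8020` are simply the ones used in Steps 7 and 8. It confirms TCP reachability only, not that the WebSocket or MCP handshake succeeds.
+
+```python
+import socket
+
+
+def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
+    """Return True if a TCP connection to host:port can be established."""
+    try:
+        # create_connection raises OSError on refusal, timeout, or DNS failure.
+        with socket.create_connection((host, port), timeout=timeout):
+            return True
+    except OSError:
+        return False
+
+
+if __name__ == "__main__":
+    for name, port in (("agent server", 5010), ("MCP server", 8020)):
+        state = "reachable" if port_is_open("localhost", port) else "NOT reachable"
+        print(f"{name} on port {port}: {state}")
+```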
+
+---
+
+## Galaxy Multi-Device Integration
+
+### Constellation Configuration
+
+**File**: `config/galaxy/constellation.yaml`
+
+```yaml
+# Galaxy Constellation Configuration
+# Multi-device orchestration settings
+
+constellation:
+  # Constellation ID (unique identifier)
+  constellation_id: "mobile_test_constellation"
+
+  # Heartbeat interval (seconds)
+  heartbeat_interval: 30
+
+  # Task timeout (seconds)
+  task_timeout: 300
+
+  # Retry strategy
+  max_task_retries: 3
+  retry_delay: 5
+
+  # Load balancing
+  load_balancing_strategy: "round_robin"  # Options: round_robin, least_loaded, capability_based
+
+  # Device selection
+  device_selection_strategy: "capability_match"  # Options: capability_match, explicit, random
+
+# Task routing rules
+routing_rules:
+  - task_type: "mobile_app_testing"
+    preferred_devices: ["mobile_agent_1", "mobile_agent_2"]
+    required_capabilities: ["ui_automation"]
+
+  - task_type: "server_management"
+    preferred_devices: ["linux_agent_1", "linux_agent_2"]
+    required_capabilities: ["bash_commands"]
+```
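+
+To make the routing rules concrete, here is one way a `capability_match` selection could work. This is an illustrative sketch only and makes no claim about how Galaxy implements routing: `select_devices` is a hypothetical function that keeps devices offering every required capability and lists preferred devices first.
+
+```python
+from typing import Any, Dict, List, Sequence
+
+
+def select_devices(
+    devices: List[Dict[str, Any]],
+    required_capabilities: Sequence[str],
+    preferred_devices: Sequence[str] = (),
+) -> List[str]:
+    """Return matching device IDs, preferred devices first."""
+    required = set(required_capabilities)
+    matching = [
+        device for device in devices
+        if required.issubset(device.get("capabilities", []))
+    ]
+
+    preferred = list(preferred_devices)
+
+    def rank(device: Dict[str, Any]) -> int:
+        # Preferred devices sort by their position; all others come after.
+        device_id = device["device_id"]
+        return preferred.index(device_id) if device_id in preferred else len(preferred)
+
+    # sorted() is stable, so non-preferred devices keep their registry order.
+    return [device["device_id"] for device in sorted(matching, key=rank)]
+```
+
+With the devices registered earlier, a `mobile_app_testing` rule requiring `ui_automation` would select both mobile devices and skip `linux_agent_1`.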
+
+### Galaxy Deployment Example
+
+```bash
+# ========================================
+# Start Galaxy with Multiple Devices
+# ========================================
+
+# Terminal 1: Galaxy orchestrator
+python -m galaxy \
+ --constellation-id mobile_test_constellation \
+ --config config/galaxy/constellation.yaml
+
+# Terminal 2-4: Device clients
+python -m ufo.client.client --ws --ws-server ws://localhost:5010/ws --client-id mobile_agent_1 --platform android &
+python -m ufo.client.client --ws --ws-server ws://localhost:5011/ws --client-id mobile_agent_2 --platform ios &
+python -m ufo.client.client --ws --ws-server ws://localhost:5001/ws --client-id linux_agent_1 --platform linux &
+
+# Terminal 5: Submit multi-device task
+python -m galaxy.client.submit_task \
+ --constellation mobile_test_constellation \
+ --request "Test app on both Android and iOS devices" \
+ --devices mobile_agent_1,mobile_agent_2
+```
+
+---
+
+## Common Configuration Patterns
+
+### Pattern 1: Development vs Production
+
+```yaml
+# Development configuration
+ENABLED_THIRD_PARTY_AGENTS: ["MobileAgent"]
+THIRD_PARTY_AGENT_CONFIG:
+  MobileAgent:
+    VISUAL_MODE: True
+    # Use local test device
+
+# config/galaxy/devices.yaml (dev)
+devices:
+  - device_id: "mobile_dev"
+    server_url: "ws://localhost:5010/ws"
+    auto_connect: false  # Manual connection for debugging
+
+---
+
+# Production configuration
+ENABLED_THIRD_PARTY_AGENTS: ["MobileAgent", "LinuxAgent"]
+THIRD_PARTY_AGENT_CONFIG:
+  MobileAgent:
+    VISUAL_MODE: True
+    # Use production device farm
+
+# config/galaxy/devices.yaml (prod)
+devices:
+  - device_id: "mobile_prod_01"
+    server_url: "ws://192.168.1.100:5010/ws"
+    auto_connect: true  # Auto-connect for reliability
+    max_retries: 10
+```
+
+### Pattern 2: Multi-Platform Support
+
+```yaml
+# Support both Android and iOS with same agent
+THIRD_PARTY_AGENT_CONFIG:
+  MobileAgent:
+    VISUAL_MODE: True
+    AGENT_NAME: "MobileAgent"
+    APPAGENT_PROMPT: "ufo/prompts/third_party/mobile_agent.yaml"
+    APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/third_party/mobile_agent_example.yaml"
+
+# Separate device registrations
+devices:
+  - device_id: "android_device"
+    os: "android"
+    capabilities: ["ui_automation", "android_apps"]
+
+  - device_id: "ios_device"
+    os: "ios"
+    capabilities: ["ui_automation", "ios_apps", "xcuitest"]
+```
+
+### Pattern 3: Device Pool Management
+
+```yaml
+# Multiple devices of same type for load balancing
+devices:
+  - device_id: "android_pool_1"
+    server_url: "ws://192.168.1.101:5010/ws"
+    os: "android"
+    capabilities: ["ui_automation"]
+    metadata:
+      pool: "android_test_farm"
+      device_index: 1
+
+  - device_id: "android_pool_2"
+    server_url: "ws://192.168.1.102:5010/ws"
+    os: "android"
+    capabilities: ["ui_automation"]
+    metadata:
+      pool: "android_test_farm"
+      device_index: 2
+```
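+
+A pool like this pairs naturally with round-robin dispatch. The snippet below is a hedged sketch of the idea using `itertools.cycle`; `round_robin` is a hypothetical helper, and reading the pool name from `metadata.pool` is an assumption based on the example above, not documented Galaxy behavior.
+
+```python
+from itertools import cycle
+from typing import Any, Dict, Iterator, List
+
+
+def round_robin(devices: List[Dict[str, Any]], pool: str) -> Iterator[str]:
+    """Yield device IDs from the named pool in an endless rotation."""
+    ids = [
+        device["device_id"]
+        for device in devices
+        if device.get("metadata", {}).get("pool") == pool
+    ]
+    if not ids:
+        raise ValueError(f"no devices in pool {pool!r}")
+    return cycle(ids)
+
+
+if __name__ == "__main__":
+    farm = [
+        {"device_id": "android_pool_1", "metadata": {"pool": "android_test_farm"}},
+        {"device_id": "android_pool_2", "metadata": {"pool": "android_test_farm"}},
+    ]
+    picker = round_robin(farm, "android_test_farm")
+    for _ in range(3):
+        print(next(picker))
+```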
+
+---
+
+## Summary
+
+**What You've Configured**:
+
+- ✅ Third-party agent registration in `third_party.yaml`
+- ✅ Device registration in `devices.yaml`
+- ✅ Main and example prompt templates
+- ✅ Step-by-step deployment procedure
+- ✅ Galaxy multi-device integration (optional)
+
+**Key Takeaways**:
+
+| Configuration | Purpose | File |
+|---------------|---------|------|
+| **Agent Registration** | Enable agent in UFO | `config/ufo/third_party.yaml` |
+| **Device Registry** | Register device instances | `config/galaxy/devices.yaml` |
+| **Prompt Templates** | Define LLM prompts | `ufo/prompts/third_party/*.yaml` |
+| **Deployment** | Start servers and clients | Terminal commands |
+
+---
+
+## Next Steps
+
+**Continue to**: [Part 5: Testing & Debugging →](testing.md)
+
+Learn comprehensive testing strategies, debugging techniques, and common issue resolution.
+
+---
+
+## Related Documentation
+
+- **[Galaxy Overview](../../galaxy/overview.md)** - Multi-device orchestration
+- **[Third-Party Agents](../creating_third_party_agents.md)** - Related tutorial
+- **[Agent Architecture](../../infrastructure/agents/overview.md)** - Agent design patterns
+
+---
+
+**Previous**: [← Part 3: Client Setup](client_setup.md)
+**Next**: [Part 5: Testing & Debugging →](testing.md)
diff --git a/documents/docs/tutorials/creating_device_agent/core_components.md b/documents/docs/tutorials/creating_device_agent/core_components.md
new file mode 100644
index 000000000..5b0c3bf15
--- /dev/null
+++ b/documents/docs/tutorials/creating_device_agent/core_components.md
@@ -0,0 +1,1649 @@
+# Part 1: Core Components - Server-Side Implementation
+
+This tutorial covers the **server-side components** of your device agent. You'll learn to implement the Agent Class, Processor, State Manager, Strategies, and Prompter using **LinuxAgent** as reference.
+
+---
+
+## Table of Contents
+
+1. [Component Overview](#component-overview)
+2. [Step 1: Agent Class](#step-1-agent-class)
+3. [Step 2: Processor](#step-2-processor)
+4. [Step 3: State Manager](#step-3-state-manager)
+5. [Step 4: Processing Strategies](#step-4-processing-strategies)
+6. [Step 5: Prompter](#step-5-prompter)
+7. [Testing Your Implementation](#testing-your-implementation)
+
+---
+
+## Component Overview
+
+### What You'll Build
+
+```mermaid
+graph TB
+    subgraph "Server-Side Components"
+        A["MobileAgent Class<br/>Agent Definition"]
+        B["MobileAgentProcessor<br/>Strategy Orchestration"]
+        C["MobileAgentStateManager<br/>FSM Control"]
+        D["Strategies<br/>LLM & Action Logic"]
+        E["MobileAgentPrompter<br/>Prompt Construction"]
+
+        A --> B
+        A --> C
+        B --> D
+        A --> E
+    end
+
+    style A fill:#c8e6c9
+    style B fill:#fff3e0
+    style C fill:#e1f5ff
+    style D fill:#f3e5f5
+    style E fill:#ffe1e1
+```
+
+**Component Responsibilities**:
+
+| Component | File | Purpose | Example (LinuxAgent) |
+|-----------|------|---------|---------------------|
+| **Agent Class** | `customized_agent.py` | Agent definition, initialization | `LinuxAgent` class |
+| **Processor** | `customized_agent_processor.py` | Strategy orchestration | `LinuxAgentProcessor` |
+| **State Manager** | `linux_agent_state.py` | FSM states and transitions | `LinuxAgentStateManager` |
+| **Strategies** | `linux_agent_strategy.py` | LLM and action execution logic | `LinuxLLMInteractionStrategy` |
+| **Prompter** | `linux_agent_prompter.py` | Prompt construction for LLM | `LinuxAgentPrompter` |
+
+---
+
+## Step 1: Agent Class
+
+### Understanding the Agent Class
+
+The **Agent Class** is the entry point for your device agent. It:
+
+- Inherits from `CustomizedAgent` (which extends `AppAgent`)
+- Registers with `AgentRegistry` for automatic discovery
+- Initializes prompter and default state
+- Maintains blackboard for multi-agent coordination
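+
+To see what "registers with `AgentRegistry` for automatic discovery" means in practice, the decorator pattern can be sketched in a few lines. This is a toy stand-in written for illustration, not UFO's `AgentRegistry`: the class `ToyAgentRegistry`, its `get` method, and the attributes it sets are all hypothetical; only the `register(agent_name=..., third_party=..., processor_cls=...)` call shape mirrors the tutorial.
+
+```python
+from typing import Callable, Dict, Optional, Type
+
+
+class ToyAgentRegistry:
+    """Toy name-to-class registry illustrating decorator-based discovery."""
+
+    _agents: Dict[str, Type] = {}
+
+    @classmethod
+    def register(
+        cls,
+        agent_name: str,
+        third_party: bool = False,
+        processor_cls: Optional[Type] = None,
+    ) -> Callable[[Type], Type]:
+        def decorator(agent_cls: Type) -> Type:
+            if agent_name in cls._agents:
+                raise ValueError(f"agent {agent_name!r} is already registered")
+            # Attach metadata so the framework can find the processor later.
+            agent_cls.third_party = third_party
+            agent_cls.processor_cls = processor_cls
+            cls._agents[agent_name] = agent_cls
+            return agent_cls
+
+        return decorator
+
+    @classmethod
+    def get(cls, agent_name: str) -> Type:
+        """Look up a registered agent class by name."""
+        return cls._agents[agent_name]
+
+
+if __name__ == "__main__":
+    class DemoProcessor:
+        pass
+
+    @ToyAgentRegistry.register(
+        agent_name="DemoAgent", third_party=True, processor_cls=DemoProcessor
+    )
+    class DemoAgent:
+        pass
+
+    print(ToyAgentRegistry.get("DemoAgent").__name__)
+```
+
+The lookup is by string name, which is why `AGENT_NAME` in `third_party.yaml` has to match the `agent_name` given to the decorator.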
+
+### LinuxAgent Implementation
+
+```python
+# File: ufo/agents/agent/customized_agent.py
+
+import logging
+
+from ufo.agents.agent.app_agent import AppAgent
+from ufo.agents.agent.basic import AgentRegistry
+from ufo.agents.memory.blackboard import Blackboard
+from ufo.agents.processors.customized.customized_agent_processor import (
+    LinuxAgentProcessor,
+)
+from ufo.agents.states.linux_agent_state import ContinueLinuxAgentState
+from ufo.prompter.customized.linux_agent_prompter import LinuxAgentPrompter
+
+# Note: CustomizedAgent (which extends AppAgent) is assumed to be defined
+# earlier in this same module, so it needs no import here.
+
+
+@AgentRegistry.register(
+    agent_name="LinuxAgent",           # Unique identifier
+    third_party=True,                  # Mark as third-party/device agent
+    processor_cls=LinuxAgentProcessor  # Link to processor class
+)
+class LinuxAgent(CustomizedAgent):
+    """
+    LinuxAgent is a specialized agent that interacts with Linux systems.
+    Executes shell commands via MCP and manages Linux device tasks.
+    """
+
+    def __init__(
+        self,
+        name: str,
+        main_prompt: str,
+        example_prompt: str,
+    ) -> None:
+        """
+        Initialize the LinuxAgent.
+
+        :param name: The name of the agent instance
+        :param main_prompt: Path to main prompt template YAML
+        :param example_prompt: Path to example prompt template YAML
+        """
+        # Call parent constructor with None for process/app (not GUI-based)
+        super().__init__(
+            name=name,
+            main_prompt=main_prompt,
+            example_prompt=example_prompt,
+            process_name=None,   # No Windows process for Linux
+            app_root_name=None,  # No Windows app for Linux
+            is_visual=None,      # Typically False for CLI-based agents
+        )
+
+        # Initialize blackboard for multi-agent coordination
+        self._blackboard = Blackboard()
+
+        # Set default state (ContinueLinuxAgentState)
+        self.set_state(self.default_state)
+
+        # Flag to track context provision
+        self._context_provision_executed = False
+
+        # Logger for debugging
+        self.logger = logging.getLogger(__name__)
+        self.logger.info(
+            f"LinuxAgent initialized with prompts: {main_prompt}, {example_prompt}"
+        )
+
+    def get_prompter(
+        self, is_visual: bool, main_prompt: str, example_prompt: str
+    ) -> LinuxAgentPrompter:
+        """
+        Get the prompter for the agent.
+
+        :param is_visual: Whether the agent uses visual mode (screenshots)
+        :param main_prompt: Path to main prompt template
+        :param example_prompt: Path to example prompt template
+        :return: LinuxAgentPrompter instance
+        """
+        return LinuxAgentPrompter(main_prompt, example_prompt)
+
+    @property
+    def default_state(self) -> ContinueLinuxAgentState:
+        """
+        Get the default state for LinuxAgent.
+
+        :return: ContinueLinuxAgentState instance
+        """
+        return ContinueLinuxAgentState()
+
+    @property
+    def blackboard(self) -> Blackboard:
+        """
+        Get the blackboard for multi-agent coordination.
+
+        :return: Blackboard instance
+        """
+        return self._blackboard
+```
+
+### Creating Your MobileAgent Class
+
+Now let's create `MobileAgent` following the same pattern:
+
+```python
+# File: ufo/agents/agent/customized_agent.py
+
+import logging
+
+# CustomizedAgent (the base class used below) is assumed to be defined earlier in this module
+from ufo.agents.agent.app_agent import AppAgent
+from ufo.agents.agent.basic import AgentRegistry
+from ufo.agents.memory.blackboard import Blackboard
+from ufo.agents.processors.customized.customized_agent_processor import (
+ MobileAgentProcessor, # We'll create this in Step 2
+)
+from ufo.agents.states.mobile_agent_state import ContinueMobileAgentState
+from ufo.prompter.customized.mobile_agent_prompter import MobileAgentPrompter
+
+
+@AgentRegistry.register(
+ agent_name="MobileAgent",
+ third_party=True,
+ processor_cls=MobileAgentProcessor
+)
+class MobileAgent(CustomizedAgent):
+ """
+ MobileAgent controls Android/iOS mobile devices.
+ Supports UI automation, app testing, and mobile-specific operations.
+ """
+
+ def __init__(
+ self,
+ name: str,
+ main_prompt: str,
+ example_prompt: str,
+ platform: str = "android", # Platform: "android" or "ios"
+ ) -> None:
+ """
+ Initialize the MobileAgent.
+
+ :param name: Agent instance name
+ :param main_prompt: Main prompt template path
+ :param example_prompt: Example prompt template path
+ :param platform: Mobile platform ("android" or "ios")
+ """
+ super().__init__(
+ name=name,
+ main_prompt=main_prompt,
+ example_prompt=example_prompt,
+ process_name=None,
+ app_root_name=None,
+ is_visual=True, # Mobile agents typically use screenshots
+ )
+
+ # Store platform information
+ self._platform = platform
+
+ # Initialize blackboard
+ self._blackboard = Blackboard()
+
+ # Set default state
+ self.set_state(self.default_state)
+
+ # Logger
+ self.logger = logging.getLogger(__name__)
+ self.logger.info(
+ f"MobileAgent initialized for platform: {platform}"
+ )
+
+ def get_prompter(
+ self, is_visual: bool, main_prompt: str, example_prompt: str
+ ) -> MobileAgentPrompter:
+ """Get the prompter for MobileAgent."""
+ return MobileAgentPrompter(main_prompt, example_prompt)
+
+ @property
+ def default_state(self) -> ContinueMobileAgentState:
+ """Get the default state."""
+ return ContinueMobileAgentState()
+
+ @property
+ def blackboard(self) -> Blackboard:
+ """Get the blackboard."""
+ return self._blackboard
+
+ @property
+ def platform(self) -> str:
+ """Get the mobile platform (android/ios)."""
+ return self._platform
+```
+
+### Key Differences from LinuxAgent
+
+| Aspect | LinuxAgent | MobileAgent |
+|--------|-----------|-------------|
+| **is_visual** | `None` (no screenshots) | `True` (UI screenshots needed) |
+| **Platform Tracking** | Not needed | `self._platform` stores "android"/"ios" |
+| **Processor** | `LinuxAgentProcessor` | `MobileAgentProcessor` |
+| **Prompter** | `LinuxAgentPrompter` | `MobileAgentPrompter` |
+| **Default State** | `ContinueLinuxAgentState` | `ContinueMobileAgentState` |
+
+!!! tip "Agent Class Best Practices"
+    - ✅ Always call `super().__init__()` first
+    - ✅ Initialize blackboard for multi-agent coordination
+    - ✅ Set `is_visual=True` if your agent uses screenshots
+    - ✅ Use meaningful logger messages for debugging
+    - ✅ Store platform-specific metadata as properties
+    - ✅ Keep initialization logic minimal (delegate to processor)
+
+---
+
+## Step 2: Processor
+
+### Understanding the Processor
+
+The **Processor** orchestrates the execution pipeline through modular strategies. It:
+
+- Manages strategy execution across 4 phases
+- Configures middleware (logging, error handling, metrics)
+- Validates strategy dependencies
+- Finalizes processing context
+
+### Four Processing Phases
+
+```mermaid
+graph LR
+    A["DATA_COLLECTION<br/>Screenshots, UI Tree"] --> B["LLM_INTERACTION<br/>Prompt → LLM → Response"]
+    B --> C["ACTION_EXECUTION<br/>Execute Commands"]
+    C --> D["MEMORY_UPDATE<br/>Update Context"]
+
+    style A fill:#e3f2fd
+    style B fill:#fff3e0
+    style C fill:#f3e5f5
+    style D fill:#e8f5e9
+```
+
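The phase ordering in the diagram can be sketched as a loop over an ordered enum, where an unconfigured phase is skipped and a failing phase halts the run only when its `fail_fast` flag is set. This is a simplified illustration under those assumptions. `Phase`, `Step`, and `run_pipeline` are hypothetical names, not the framework's processor classes.

```python
from enum import Enum
from typing import Any, Callable, Dict, List


class Phase(Enum):
    """Hypothetical mirror of the four phases in the diagram."""

    DATA_COLLECTION = 1
    LLM_INTERACTION = 2
    ACTION_EXECUTION = 3
    MEMORY_UPDATE = 4


class Step:
    """A phase handler plus its fail_fast flag."""

    def __init__(
        self, run: Callable[[Dict[str, Any]], Dict[str, Any]], fail_fast: bool
    ) -> None:
        self.run = run
        self.fail_fast = fail_fast


def run_pipeline(steps: Dict[Phase, Step], context: Dict[str, Any]) -> List[str]:
    """Run the configured phases in order and return a log of outcomes."""
    log: List[str] = []
    for phase in Phase:  # Enum iteration follows definition order
        step = steps.get(phase)
        if step is None:
            log.append(f"{phase.name}: skipped")
            continue
        try:
            context.update(step.run(context))
            log.append(f"{phase.name}: ok")
        except Exception as exc:
            log.append(f"{phase.name}: failed ({exc})")
            if step.fail_fast:
                break  # halt the remaining phases
    return log
```

An agent without a `DATA_COLLECTION` step, such as a shell-only agent, simply leaves that key out of `steps`.
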
+### LinuxAgentProcessor Implementation
+
+```python
+# File: ufo/agents/processors/customized/customized_agent_processor.py
+
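+from typing import TYPE_CHECKING
+
+from ufo.agents.processors.app_agent_processor import AppAgentProcessor
+from ufo.agents.processors.context.processing_context import (
+    ProcessingContext,
+    ProcessingPhase,
+)
+from ufo.module.context import ContextNames  # assumed import path
+
+# CustomizedProcessor (the base class used below) is assumed to be defined earlier in this module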
+from ufo.agents.processors.strategies.app_agent_processing_strategy import (
+ AppMemoryUpdateStrategy,
+)
+from ufo.agents.processors.strategies.linux_agent_strategy import (
+ LinuxActionExecutionStrategy,
+ LinuxLLMInteractionStrategy,
+ LinuxLoggingMiddleware,
+)
+
+if TYPE_CHECKING:
+ from ufo.agents.agent.customized_agent import LinuxAgent
+
+
+class LinuxAgentProcessor(CustomizedProcessor):
+ """
+ Processor for Linux MCP Agent.
+
+ Manages the execution pipeline with strategies for:
+ - LLM Interaction: Generate shell commands
+ - Action Execution: Execute commands via Linux MCP
+ - Memory Update: Update agent memory and blackboard
+ """
+
+ def _setup_strategies(self) -> None:
+ """
+ Setup processing strategies for LinuxAgent.
+
+ Note: No DATA_COLLECTION strategy since LinuxAgent doesn't
+ use screenshots (relies on shell command output).
+ """
+
+ # Phase 2: LLM Interaction
+ self.strategies[ProcessingPhase.LLM_INTERACTION] = (
+ LinuxLLMInteractionStrategy(
+ fail_fast=True # LLM failures should halt processing
+ )
+ )
+
+ # Phase 3: Action Execution
+ self.strategies[ProcessingPhase.ACTION_EXECUTION] = (
+ LinuxActionExecutionStrategy(
+ fail_fast=False # Continue on action failures
+ )
+ )
+
+ # Phase 4: Memory Update
+ self.strategies[ProcessingPhase.MEMORY_UPDATE] = (
+ AppMemoryUpdateStrategy(
+ fail_fast=False # Memory failures shouldn't stop agent
+ )
+ )
+
+ def _setup_middleware(self) -> None:
+ """
+ Setup middleware pipeline for LinuxAgent.
+
+ Uses custom logging middleware for Linux-specific context.
+ """
+ self.middleware_chain = [LinuxLoggingMiddleware()]
+
+ def _finalize_processing_context(
+ self, processing_context: ProcessingContext
+ ) -> None:
+ """
+ Finalize processing context by updating global context.
+
+ :param processing_context: The processing context to finalize
+ """
+ super()._finalize_processing_context(processing_context)
+
+ try:
+ # Extract result from local context
+ result = processing_context.get_local("result")
+ if result:
+ # Update global context with result
+ self.global_context.set(ContextNames.ROUND_RESULT, result)
+ except Exception as e:
+ self.logger.warning(
+ f"Failed to update ContextNames from results: {e}"
+ )
+```
+
+### Creating MobileAgentProcessor
+
+```python
+# File: ufo/agents/processors/customized/customized_agent_processor.py
+
+from ufo.agents.processors.strategies.customized_agent_processing_strategy import (
+ CustomizedScreenshotCaptureStrategy,
+)
+from ufo.agents.processors.strategies.mobile_agent_strategy import (
+ MobileActionExecutionStrategy,
+ MobileLLMInteractionStrategy,
+ MobileLoggingMiddleware,
+)
+
+if TYPE_CHECKING:
+ from ufo.agents.agent.customized_agent import MobileAgent
+
+
+class MobileAgentProcessor(CustomizedProcessor):
+ """
+ Processor for MobileAgent.
+
+ Manages execution pipeline with mobile-specific strategies:
+ - Data Collection: Screenshots and UI hierarchy
+ - LLM Interaction: Mobile UI understanding
+ - Action Execution: Touch gestures, swipes, taps
+ - Memory Update: Context tracking
+ """
+
+ def _setup_strategies(self) -> None:
+ """Setup processing strategies for MobileAgent."""
+
+ # Phase 1: Data Collection (screenshots + UI tree)
+ self.strategies[ProcessingPhase.DATA_COLLECTION] = (
+ CustomizedScreenshotCaptureStrategy(
+ fail_fast=True # Stop if screenshot capture fails
+ )
+ )
+
+ # Phase 2: LLM Interaction (mobile UI understanding)
+ self.strategies[ProcessingPhase.LLM_INTERACTION] = (
+ MobileLLMInteractionStrategy(
+ fail_fast=True # LLM failures should halt
+ )
+ )
+
+ # Phase 3: Action Execution (touch gestures)
+ self.strategies[ProcessingPhase.ACTION_EXECUTION] = (
+ MobileActionExecutionStrategy(
+ fail_fast=False # Continue on action failures
+ )
+ )
+
+ # Phase 4: Memory Update
+ self.strategies[ProcessingPhase.MEMORY_UPDATE] = (
+ AppMemoryUpdateStrategy(
+ fail_fast=False # Don't stop on memory errors
+ )
+ )
+
+ def _setup_middleware(self) -> None:
+ """Setup middleware for mobile-specific logging."""
+ self.middleware_chain = [MobileLoggingMiddleware()]
+
+ def _finalize_processing_context(
+ self, processing_context: ProcessingContext
+ ) -> None:
+ """Finalize context with mobile-specific results."""
+ super()._finalize_processing_context(processing_context)
+
+ try:
+ # Extract mobile-specific results
+ result = processing_context.get_local("result")
+ ui_state = processing_context.get_local("ui_state")
+
+ if result:
+ self.global_context.set(ContextNames.ROUND_RESULT, result)
+ if ui_state:
+ # Store UI state for next round
+ self.global_context.set("MOBILE_UI_STATE", ui_state)
+
+ except Exception as e:
+ self.logger.warning(f"Failed to finalize context: {e}")
+```
+
+### Processor Configuration Guide
+
+| Phase | Required? | When to Use | Example Strategies |
+|-------|-----------|-------------|-------------------|
+| **DATA_COLLECTION** | Optional | Agent needs observations (screenshots, sensor data) | `CustomizedScreenshotCaptureStrategy` |
+| **LLM_INTERACTION** | **Required** | All agents need LLM reasoning | `LinuxLLMInteractionStrategy`, `MobileLLMInteractionStrategy` |
+| **ACTION_EXECUTION** | **Required** | All agents need command execution | `LinuxActionExecutionStrategy`, `MobileActionExecutionStrategy` |
+| **MEMORY_UPDATE** | Recommended | Track agent history and context | `AppMemoryUpdateStrategy` |
+
+!!! warning "Common Processor Mistakes"
+    - ❌ **Don't** skip `_setup_strategies()` - processor won't execute
+    - ❌ **Don't** use `fail_fast=True` for all strategies - agent becomes brittle
+    - ❌ **Don't** forget to call `super()._finalize_processing_context()` - context won't propagate
+    - ✅ **Do** use `fail_fast=True` for LLM_INTERACTION - ensures valid responses
+    - ✅ **Do** use `fail_fast=False` for ACTION_EXECUTION - allows retry logic
+    - ✅ **Do** add custom middleware for debugging and logging
+
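The required and recommended phases from the configuration guide can be checked before an agent runs. The following is a small sketch of such a check. `check_phase_config` and the two constant tuples are hypothetical helpers for illustration and are not part of the framework.

```python
from typing import Dict, List

# Phase names taken from the configuration guide table
REQUIRED_PHASES = ("LLM_INTERACTION", "ACTION_EXECUTION")
RECOMMENDED_PHASES = ("MEMORY_UPDATE",)


def check_phase_config(strategies: Dict[str, object]) -> List[str]:
    """Return human-readable problems for a phase-name -> strategy mapping."""
    problems: List[str] = []
    for phase in REQUIRED_PHASES:
        if phase not in strategies:
            problems.append(f"error: missing required phase {phase}")
    for phase in RECOMMENDED_PHASES:
        if phase not in strategies:
            problems.append(f"warning: missing recommended phase {phase}")
    return problems
```

`DATA_COLLECTION` is deliberately absent from both tuples because the guide marks it as optional.
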
+---
+
+## Step 3: State Manager
+
+### Understanding the State Manager
+
+The **State Manager** implements the finite state machine (FSM) that controls the agent lifecycle. It defines:
+
+- **States**: `CONTINUE`, `FINISH`, `FAIL`, etc.
+- **Transitions**: Rules for moving between states
+- **State Handlers**: Logic executed in each state
+
+### LinuxAgent States
+
+```mermaid
+stateDiagram-v2
+ [*] --> CONTINUE: Agent Started
+
+ CONTINUE --> CONTINUE: Processing
+ CONTINUE --> FINISH: Task Complete
+ CONTINUE --> FAIL: Error Occurred
+
+ FAIL --> FINISH: Terminal State
+
+ FINISH --> [*]
+
+ note right of CONTINUE
+ Calls agent.process(context)
+ Executes processor strategies
+ end note
+
+ note right of FINISH
+ Task completed successfully
+ Returns control to orchestrator
+ end note
+
+ note right of FAIL
+ Error occurred
+ Transitions to FINISH
+ end note
+```
+
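The transitions in the diagram can be written down as a plain lookup table, which makes it easy to check whether a sequence of statuses is legal. This sketch only transcribes the diagram. `TRANSITIONS` and `is_valid_path` are hypothetical names and do not reflect how the state classes are implemented.

```python
from typing import Dict, List, Set

# Allowed transitions, transcribed from the state diagram
TRANSITIONS: Dict[str, Set[str]] = {
    "CONTINUE": {"CONTINUE", "FINISH", "FAIL"},
    "FAIL": {"FINISH"},
    "FINISH": set(),  # terminal: no outgoing transitions
}


def is_valid_path(path: List[str]) -> bool:
    """Check that a status sequence starts at CONTINUE and follows the diagram."""
    if not path or path[0] != "CONTINUE":
        return False
    # Every consecutive pair must be an allowed transition
    return all(
        nxt in TRANSITIONS.get(cur, set()) for cur, nxt in zip(path, path[1:])
    )
```
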
+### LinuxAgent State Implementation
+
+```python
+# File: ufo/agents/states/linux_agent_state.py
+
+from __future__ import annotations  # lets annotations reference classes defined later
+
+from enum import Enum
+from typing import TYPE_CHECKING, Dict, Optional, Type
+from ufo.agents.states.basic import AgentState, AgentStateManager
+
+if TYPE_CHECKING:
+ from ufo.agents.agent.customized_agent import LinuxAgent
+
+
+class LinuxAgentStatus(Enum):
+ """Status enum for LinuxAgent states."""
+ FINISH = "FINISH"
+ CONTINUE = "CONTINUE"
+ FAIL = "FAIL"
+
+
+class LinuxAgentStateManager(AgentStateManager):
+ """
+ State manager for LinuxAgent.
+ Manages state registration and retrieval.
+ """
+
+ _state_mapping: Dict[str, Type[LinuxAgentState]] = {}
+
+ @property
+ def none_state(self) -> AgentState:
+ """Return the none state."""
+ return NoneLinuxAgentState()
+
+
+class LinuxAgentState(AgentState):
+ """
+ Abstract base class for LinuxAgent states.
+ All LinuxAgent states inherit from this class.
+ """
+
+ async def handle(
+ self, agent: "LinuxAgent", context: Optional["Context"] = None
+ ) -> None:
+ """
+ Handle the agent for the current step.
+
+ :param agent: The LinuxAgent instance
+ :param context: The global context
+ """
+ pass
+
+ @classmethod
+ def agent_class(cls) -> Type[LinuxAgent]:
+ """Return the agent class this state belongs to."""
+ from ufo.agents.agent.customized_agent import LinuxAgent
+ return LinuxAgent
+
+ def next_agent(self, agent: "LinuxAgent") -> "LinuxAgent":
+ """
+ Get the agent for the next step.
+ Default: return same agent (no delegation).
+
+ :param agent: Current agent
+ :return: Next agent (typically same agent for device agents)
+ """
+ return agent
+
+ def next_state(self, agent: "LinuxAgent") -> LinuxAgentState:
+ """
+ Determine next state based on agent status.
+
+ :param agent: Current agent
+ :return: Next state instance
+ """
+ status = agent.status
+ state = LinuxAgentStateManager().get_state(status)
+ return state
+
+ def is_round_end(self) -> bool:
+ """Check if the round ends."""
+ return False
+
+
+@LinuxAgentStateManager.register
+class ContinueLinuxAgentState(LinuxAgentState):
+ """
+ CONTINUE state: Normal execution state.
+ Calls agent.process() to execute processor strategies.
+ """
+
+ async def handle(
+ self, agent: "LinuxAgent", context: Optional["Context"] = None
+ ) -> None:
+ """
+ Handle CONTINUE state by executing processor.
+
+ :param agent: LinuxAgent instance
+ :param context: Global context
+ """
+ await agent.process(context)
+
+ def is_subtask_end(self) -> bool:
+ """Subtask does not end in CONTINUE state."""
+ return False
+
+ @classmethod
+ def name(cls) -> str:
+ """State name matching LinuxAgentStatus enum."""
+ return LinuxAgentStatus.CONTINUE.value
+
+
+@LinuxAgentStateManager.register
+class FinishLinuxAgentState(LinuxAgentState):
+ """
+ FINISH state: Terminal state indicating task completion.
+ """
+
+ def next_agent(self, agent: "LinuxAgent") -> "LinuxAgent":
+ """Return same agent (no further delegation)."""
+ return agent
+
+ def next_state(self, agent: "LinuxAgent") -> LinuxAgentState:
+ """Stay in FINISH state."""
+ return FinishLinuxAgentState()
+
+ def is_subtask_end(self) -> bool:
+ """Subtask ends in FINISH state."""
+ return True
+
+ def is_round_end(self) -> bool:
+ """Round ends in FINISH state."""
+ return True
+
+ @classmethod
+ def name(cls) -> str:
+ """State name."""
+ return LinuxAgentStatus.FINISH.value
+
+
+@LinuxAgentStateManager.register
+class FailLinuxAgentState(LinuxAgentState):
+ """
+ FAIL state: Error occurred, transition to FINISH.
+ """
+
+ def next_agent(self, agent: "LinuxAgent") -> "LinuxAgent":
+ """Return same agent."""
+ return agent
+
+ def next_state(self, agent: "LinuxAgent") -> LinuxAgentState:
+ """Transition to FINISH after failure."""
+ return FinishLinuxAgentState()
+
+ def is_round_end(self) -> bool:
+ """Round ends after failure."""
+ return True
+
+ def is_subtask_end(self) -> bool:
+ """Subtask ends after failure."""
+ return True
+
+ @classmethod
+ def name(cls) -> str:
+ """State name."""
+ return LinuxAgentStatus.FAIL.value
+
+
+@LinuxAgentStateManager.register
+class NoneLinuxAgentState(LinuxAgentState):
+ """
+ NONE state: Initial/default state, transitions to FINISH.
+ """
+
+ def next_agent(self, agent: "LinuxAgent") -> "LinuxAgent":
+ """Return same agent."""
+ return agent
+
+ def next_state(self, agent: "LinuxAgent") -> LinuxAgentState:
+ """Transition to FINISH."""
+ return FinishLinuxAgentState()
+
+ def is_subtask_end(self) -> bool:
+ """Subtask ends in NONE state."""
+ return True
+
+ def is_round_end(self) -> bool:
+ """Round ends in NONE state."""
+ return True
+
+ @classmethod
+ def name(cls) -> str:
+ """Empty name for NONE state."""
+ return ""
+```
+
+### Creating MobileAgent States
+
+```python
+# File: ufo/agents/states/mobile_agent_state.py
+
+from __future__ import annotations  # lets annotations reference classes defined later
+
+from enum import Enum
+from typing import TYPE_CHECKING, Dict, Optional, Type
+from ufo.agents.states.basic import AgentState, AgentStateManager
+
+if TYPE_CHECKING:
+ from ufo.agents.agent.customized_agent import MobileAgent
+
+
+class MobileAgentStatus(Enum):
+ """Status enum for MobileAgent states."""
+ FINISH = "FINISH"
+ CONTINUE = "CONTINUE"
+ FAIL = "FAIL"
+ WAITING = "WAITING" # Waiting for app to load
+
+
+class MobileAgentStateManager(AgentStateManager):
+ """State manager for MobileAgent."""
+
+ _state_mapping: Dict[str, Type[MobileAgentState]] = {}
+
+ @property
+ def none_state(self) -> AgentState:
+ """Return the none state."""
+ return NoneMobileAgentState()
+
+
+class MobileAgentState(AgentState):
+ """Abstract base class for MobileAgent states."""
+
+ async def handle(
+ self, agent: "MobileAgent", context: Optional["Context"] = None
+ ) -> None:
+ """Handle the agent for the current step."""
+ pass
+
+ @classmethod
+ def agent_class(cls) -> Type[MobileAgent]:
+ """Return the agent class."""
+ from ufo.agents.agent.customized_agent import MobileAgent
+ return MobileAgent
+
+ def next_agent(self, agent: "MobileAgent") -> "MobileAgent":
+ """Get next agent (same agent for device agents)."""
+ return agent
+
+ def next_state(self, agent: "MobileAgent") -> MobileAgentState:
+ """Determine next state based on status."""
+ status = agent.status
+ state = MobileAgentStateManager().get_state(status)
+ return state
+
+ def is_round_end(self) -> bool:
+ """Check if round ends."""
+ return False
+
+
+@MobileAgentStateManager.register
+class ContinueMobileAgentState(MobileAgentState):
+ """CONTINUE state for MobileAgent."""
+
+ async def handle(
+ self, agent: "MobileAgent", context: Optional["Context"] = None
+ ) -> None:
+ """Execute processor strategies."""
+ await agent.process(context)
+
+ def is_subtask_end(self) -> bool:
+ return False
+
+ @classmethod
+ def name(cls) -> str:
+ return MobileAgentStatus.CONTINUE.value
+
+
+@MobileAgentStateManager.register
+class FinishMobileAgentState(MobileAgentState):
+ """FINISH state for MobileAgent."""
+
+ def next_state(self, agent: "MobileAgent") -> MobileAgentState:
+ return FinishMobileAgentState()
+
+ def is_subtask_end(self) -> bool:
+ return True
+
+ def is_round_end(self) -> bool:
+ return True
+
+ @classmethod
+ def name(cls) -> str:
+ return MobileAgentStatus.FINISH.value
+
+
+@MobileAgentStateManager.register
+class FailMobileAgentState(MobileAgentState):
+ """FAIL state for MobileAgent."""
+
+ def next_state(self, agent: "MobileAgent") -> MobileAgentState:
+ return FinishMobileAgentState()
+
+ def is_round_end(self) -> bool:
+ return True
+
+ def is_subtask_end(self) -> bool:
+ return True
+
+ @classmethod
+ def name(cls) -> str:
+ return MobileAgentStatus.FAIL.value
+
+
+@MobileAgentStateManager.register
+class WaitingMobileAgentState(MobileAgentState):
+ """
+ WAITING state: Wait for app to load or animation to complete.
+ """
+
+ async def handle(
+ self, agent: "MobileAgent", context: Optional["Context"] = None
+ ) -> None:
+ """Wait and then transition to CONTINUE."""
+ import asyncio
+ await asyncio.sleep(2) # Wait 2 seconds
+ agent.status = MobileAgentStatus.CONTINUE.value
+
+ def is_subtask_end(self) -> bool:
+ return False
+
+ @classmethod
+ def name(cls) -> str:
+ return MobileAgentStatus.WAITING.value
+
+
+@MobileAgentStateManager.register
+class NoneMobileAgentState(MobileAgentState):
+ """NONE state for MobileAgent."""
+
+ def next_state(self, agent: "MobileAgent") -> MobileAgentState:
+ return FinishMobileAgentState()
+
+ def is_subtask_end(self) -> bool:
+ return True
+
+ def is_round_end(self) -> bool:
+ return True
+
+ @classmethod
+ def name(cls) -> str:
+ return ""
+```
+
+### State Design Guidelines
+
+| State | When to Use | Required Methods | Terminal? |
+|-------|-------------|------------------|-----------|
+| **CONTINUE** | Normal execution | `handle()` calls `agent.process()` | No |
+| **FINISH** | Task complete | `is_subtask_end()` → `True` | Yes |
+| **FAIL** | Error occurred | `next_state()` → `FINISH` | Yes |
+| **WAITING** | Async delays | `handle()` with `await asyncio.sleep()` | No |
+| **NONE** | Default/initial | `next_state()` → `FINISH` | Yes |
+
+!!! tip "State Design Best Practices"
+    - ✅ Always register states with `@StateManager.register`
+    - ✅ Implement `name()` to match status enum value
+    - ✅ Call `agent.process()` in `CONTINUE.handle()`
+    - ✅ Set `is_round_end()` = `True` for terminal states
+    - ✅ Transition `FAIL` → `FINISH` for graceful termination
+    - ❌ Don't create too many states - keep it simple
+    - ❌ Don't call processor directly - use `agent.process()`
+
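To see how non-terminal states such as `CONTINUE` and `WAITING` hand control back and forth, the sketch below drives a toy agent through an async loop until it reaches a terminal status. It is a simplified model, not the framework's round logic. `SketchAgent`, `HANDLERS`, and `run_round` are hypothetical, and the zero-second sleep stands in for a real delay.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


class SketchAgent:
    """Hypothetical agent that only tracks a status string."""

    def __init__(self, script: List[str]) -> None:
        self.status = "CONTINUE"
        self._script = list(script)  # statuses that process() sets, in order

    async def process(self) -> None:
        self.status = self._script.pop(0)


async def handle_continue(agent: SketchAgent) -> None:
    await agent.process()


async def handle_waiting(agent: SketchAgent) -> None:
    await asyncio.sleep(0)  # placeholder for a real delay
    agent.status = "CONTINUE"


HANDLERS: Dict[str, Callable[[SketchAgent], Awaitable[None]]] = {
    "CONTINUE": handle_continue,
    "WAITING": handle_waiting,
}
TERMINAL = {"FINISH", "FAIL", ""}  # "" is the NONE state's name


async def run_round(agent: SketchAgent, max_steps: int = 20) -> List[str]:
    """Drive the agent until a terminal status and return the visited statuses."""
    visited = [agent.status]
    for _ in range(max_steps):  # guard against a loop that never terminates
        if agent.status in TERMINAL:
            break
        await HANDLERS[agent.status](agent)
        visited.append(agent.status)
    return visited
```

The `max_steps` guard is a design choice of this sketch: a `WAITING` state that keeps returning to `CONTINUE` could otherwise loop forever.
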
+---
+
+## Step 4: Processing Strategies
+
+### Understanding Strategies
+
+**Strategies** are modular execution units that implement specific phases of the processing pipeline. Each strategy:
+
+- Executes independently within its phase
+- Declares dependencies using `@depends_on` decorator
+- Provides results using `@provides` decorator
+- Returns `ProcessingResult` with success/failure status
+
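One way to picture the dependency declarations is as decorators that attach metadata to the strategy class, which a processor can then use to verify that every required key is supplied by an earlier strategy. The sketch below assumes that model. These `depends_on` and `provides` functions are simplified re-implementations for illustration and may differ from the framework's own decorators.

```python
from typing import Callable, List, Set, Tuple, Type


def depends_on(*keys: str) -> Callable[[Type], Type]:
    """Hypothetical decorator: record the context keys a strategy reads."""

    def decorator(cls: Type) -> Type:
        cls._dependencies = tuple(keys)
        return cls

    return decorator


def provides(*keys: str) -> Callable[[Type], Type]:
    """Hypothetical decorator: record the context keys a strategy writes."""

    def decorator(cls: Type) -> Type:
        cls._provides = tuple(keys)
        return cls

    return decorator


def missing_dependencies(
    strategies: List[Type], initial_keys: Set[str]
) -> List[Tuple[str, str]]:
    """Walk strategies in order and report (strategy, key) pairs nothing supplies."""
    available = set(initial_keys)
    missing: List[Tuple[str, str]] = []
    for strategy in strategies:
        for key in getattr(strategy, "_dependencies", ()):
            if key not in available:
                missing.append((strategy.__name__, key))
        available.update(getattr(strategy, "_provides", ()))
    return missing


@depends_on("request")
@provides("parsed_response")
class SketchLLMStrategy:
    """Placeholder LLM interaction strategy."""


@depends_on("parsed_response", "command_dispatcher")
@provides("execution_result")
class SketchActionStrategy:
    """Placeholder action execution strategy."""
```
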
+### LinuxAgent Strategies
+
+#### Strategy 1: LLM Interaction
+
+```python
+# File: ufo/agents/processors/strategies/linux_agent_strategy.py
+
+from typing import TYPE_CHECKING
+from ufo.agents.processors.strategies.app_agent_processing_strategy import (
+    AppActionExecutionStrategy,
+    AppLLMInteractionStrategy,
+)
+from ufo.agents.processors.context.processing_context import (
+ ProcessingContext,
+ ProcessingResult,
+ ProcessingPhase,
+)
+from ufo.agents.processors.core.strategy_dependency import depends_on, provides
+
+if TYPE_CHECKING:
+ from ufo.agents.agent.customized_agent import LinuxAgent
+
+
+@depends_on("request") # Requires "request" in context
+@provides(
+ "parsed_response", # Provides LLM parsed response
+ "response_text", # Raw LLM response text
+ "llm_cost", # LLM API cost
+ "prompt_message", # Prompt sent to LLM
+ "action", # Action to execute
+ "thought", # LLM reasoning
+ "comment", # LLM comment
+)
+class LinuxLLMInteractionStrategy(AppLLMInteractionStrategy):
+ """
+ Strategy for LLM interaction with Linux Agent.
+
+ Constructs prompts with Linux context, calls LLM,
+ parses response into structured action.
+ """
+
+ def __init__(self, fail_fast: bool = True) -> None:
+ """
+ Initialize Linux LLM interaction strategy.
+
+ :param fail_fast: Raise exceptions immediately on errors
+ """
+ super().__init__(fail_fast=fail_fast)
+
+ async def execute(
+ self, agent: "LinuxAgent", context: ProcessingContext
+ ) -> ProcessingResult:
+ """
+ Execute LLM interaction for LinuxAgent.
+
+ :param agent: LinuxAgent instance
+ :param context: Processing context with request data
+ :return: ProcessingResult with parsed LLM response
+ """
+ try:
+ # Step 1: Extract request from context
+ request = context.get("request")
+ plan = self._get_prev_plan(agent)
+
+ # Step 2: Build comprehensive prompt
+ self.logger.info("Building Linux Agent prompt")
+
+ # Get blackboard context (if multi-agent)
+ blackboard_prompt = []
+ if not agent.blackboard.is_empty():
+ blackboard_prompt = agent.blackboard.blackboard_to_prompt()
+
+ # Construct prompt message
+ prompt_message = agent.message_constructor(
+ dynamic_examples=[],
+ dynamic_knowledge="",
+ plan=plan,
+ request=request,
+ blackboard_prompt=blackboard_prompt,
+ last_success_actions=self._get_last_success_actions(agent),
+ )
+
+ # Step 3: Get LLM response
+ self.logger.info("Getting LLM response for Linux Agent")
+ response_text, llm_cost = await self._get_llm_response(
+ agent, prompt_message
+ )
+
+ # Step 4: Parse and validate response
+ self.logger.info("Parsing Linux Agent response")
+ parsed_response = self._parse_app_response(agent, response_text)
+
+ # Step 5: Extract structured data
+ structured_data = parsed_response.model_dump()
+
+ return ProcessingResult(
+ success=True,
+ data={
+ "parsed_response": parsed_response,
+ "response_text": response_text,
+ "llm_cost": llm_cost,
+ "prompt_message": prompt_message,
+ **structured_data, # action, thought, comment, etc.
+ },
+ phase=ProcessingPhase.LLM_INTERACTION,
+ )
+
+ except Exception as e:
+ error_msg = f"Linux LLM interaction failed: {str(e)}"
+ self.logger.error(error_msg)
+ return self.handle_error(e, ProcessingPhase.LLM_INTERACTION, context)
+```
+
+#### Strategy 2: Action Execution
+
+```python
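+# File: ufo/agents/processors/strategies/linux_agent_strategy.py (continued)
+
+# Assumed import path for the action schema classes used below
+from ufo.agents.processors.schemas.actions import (
+    ActionCommandInfo,
+    ListActionCommandInfo,
+)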
+
+@depends_on("parsed_response", "command_dispatcher")
+@provides("execution_result", "action_info", "control_log", "status")
+class LinuxActionExecutionStrategy(AppActionExecutionStrategy):
+ """
+ Strategy for executing actions in LinuxAgent.
+
+ Dispatches shell commands to Linux MCP server,
+ captures results, and creates action logs.
+ """
+
+ def __init__(self, fail_fast: bool = False) -> None:
+ """
+ Initialize Linux action execution strategy.
+
+ :param fail_fast: Raise exceptions immediately (typically False)
+ """
+ super().__init__(fail_fast=fail_fast)
+
+ async def execute(
+ self, agent: "LinuxAgent", context: ProcessingContext
+ ) -> ProcessingResult:
+ """
+ Execute Linux Agent actions.
+
+ :param agent: LinuxAgent instance
+ :param context: Processing context with parsed response
+ :return: ProcessingResult with execution results
+ """
+ try:
+ # Step 1: Extract context variables
+ parsed_response = context.get_local("parsed_response")
+ command_dispatcher = context.global_context.command_dispatcher
+
+ if not parsed_response:
+ return ProcessingResult(
+ success=True,
+ data={"message": "No response for action execution"},
+ phase=ProcessingPhase.ACTION_EXECUTION,
+ )
+
+ # Step 2: Execute the action via command dispatcher
+ execution_results = await self._execute_app_action(
+ command_dispatcher, parsed_response.action
+ )
+
+ # Step 3: Create action info for memory tracking
+ actions = self._create_action_info(
+ parsed_response.action,
+ execution_results,
+ )
+
+ # Step 4: Print action info (for debugging)
+ action_info = ListActionCommandInfo(actions)
+ action_info.color_print()
+
+ # Step 5: Create control log
+ control_log = action_info.get_target_info()
+
+ status = (
+ parsed_response.action.status
+ if isinstance(parsed_response.action, ActionCommandInfo)
+ else action_info.status
+ )
+
+ return ProcessingResult(
+ success=True,
+ data={
+ "execution_result": execution_results,
+ "action_info": action_info,
+ "control_log": control_log,
+ "status": status,
+ },
+ phase=ProcessingPhase.ACTION_EXECUTION,
+ )
+
+ except Exception as e:
+ error_msg = f"Linux action execution failed: {str(e)}"
+ self.logger.error(error_msg)
+ return self.handle_error(e, ProcessingPhase.ACTION_EXECUTION, context)
+```
+
+### Creating MobileAgent Strategies
+
+```python
+# File: ufo/agents/processors/strategies/mobile_agent_strategy.py
+
+from typing import TYPE_CHECKING
+from ufo.agents.processors.strategies.app_agent_processing_strategy import (
+ AppLLMInteractionStrategy,
+ AppActionExecutionStrategy,
+)
+from ufo.agents.processors.context.processing_context import (
+ ProcessingContext,
+ ProcessingResult,
+ ProcessingPhase,
+)
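+from ufo.agents.processors.core.strategy_dependency import depends_on, provides
+
+# Assumed import paths for the action schema and base logging middleware used below
+from ufo.agents.processors.schemas.actions import ListActionCommandInfo
+from ufo.agents.processors.app_agent_processor import AppAgentLoggingMiddleware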
+
+if TYPE_CHECKING:
+ from ufo.agents.agent.customized_agent import MobileAgent
+
+
+@depends_on("request", "screenshot", "ui_tree")
+@provides(
+ "parsed_response",
+ "response_text",
+ "llm_cost",
+ "prompt_message",
+ "action",
+ "thought",
+ "comment",
+)
+class MobileLLMInteractionStrategy(AppLLMInteractionStrategy):
+ """
+ LLM interaction strategy for MobileAgent.
+
+ Handles mobile UI screenshots and hierarchy for LLM understanding.
+ """
+
+ def __init__(self, fail_fast: bool = True) -> None:
+ super().__init__(fail_fast=fail_fast)
+
+ async def execute(
+ self, agent: "MobileAgent", context: ProcessingContext
+ ) -> ProcessingResult:
+ """Execute LLM interaction for mobile UI."""
+ try:
+ # Extract mobile-specific context
+ request = context.get("request")
+ screenshot = context.get_local("screenshot")
+ ui_tree = context.get_local("ui_tree")
+
+ self.logger.info(f"Building Mobile Agent prompt for {agent.platform}")
+
+ # Build prompt with mobile context
+ prompt_message = agent.message_constructor(
+ dynamic_examples=[],
+ dynamic_knowledge="",
+ plan=self._get_prev_plan(agent),
+ request=request,
+ screenshot=screenshot,
+ ui_tree=ui_tree,
+ blackboard_prompt=(
+ agent.blackboard.blackboard_to_prompt()
+ if not agent.blackboard.is_empty() else []
+ ),
+ last_success_actions=self._get_last_success_actions(agent),
+ )
+
+ # Get LLM response
+ response_text, llm_cost = await self._get_llm_response(
+ agent, prompt_message
+ )
+
+ # Parse response
+ parsed_response = self._parse_app_response(agent, response_text)
+
+ return ProcessingResult(
+ success=True,
+ data={
+ "parsed_response": parsed_response,
+ "response_text": response_text,
+ "llm_cost": llm_cost,
+ "prompt_message": prompt_message,
+ **parsed_response.model_dump(),
+ },
+ phase=ProcessingPhase.LLM_INTERACTION,
+ )
+
+ except Exception as e:
+ self.logger.error(f"Mobile LLM interaction failed: {e}")
+ return self.handle_error(e, ProcessingPhase.LLM_INTERACTION, context)
+
+
+@depends_on("parsed_response", "command_dispatcher")
+@provides("execution_result", "action_info", "control_log", "status")
+class MobileActionExecutionStrategy(AppActionExecutionStrategy):
+ """
+ Action execution strategy for MobileAgent.
+
+ Executes mobile-specific actions (tap, swipe, type, etc.)
+ via Mobile MCP server.
+ """
+
+ def __init__(self, fail_fast: bool = False) -> None:
+ super().__init__(fail_fast=fail_fast)
+
+ async def execute(
+ self, agent: "MobileAgent", context: ProcessingContext
+ ) -> ProcessingResult:
+ """Execute mobile actions."""
+ try:
+ parsed_response = context.get_local("parsed_response")
+ command_dispatcher = context.global_context.command_dispatcher
+
+ if not parsed_response:
+ return ProcessingResult(
+ success=True,
+ data={"message": "No action to execute"},
+ phase=ProcessingPhase.ACTION_EXECUTION,
+ )
+
+ # Execute mobile action
+ execution_results = await self._execute_app_action(
+ command_dispatcher, parsed_response.action
+ )
+
+ # Create action info
+ actions = self._create_action_info(
+ parsed_response.action,
+ execution_results,
+ )
+
+ action_info = ListActionCommandInfo(actions)
+ action_info.color_print()
+
+ control_log = action_info.get_target_info()
+ status = action_info.status
+
+ return ProcessingResult(
+ success=True,
+ data={
+ "execution_result": execution_results,
+ "action_info": action_info,
+ "control_log": control_log,
+ "status": status,
+ },
+ phase=ProcessingPhase.ACTION_EXECUTION,
+ )
+
+ except Exception as e:
+ self.logger.error(f"Mobile action execution failed: {e}")
+ return self.handle_error(e, ProcessingPhase.ACTION_EXECUTION, context)
+
+
+# Middleware for mobile-specific logging
+class MobileLoggingMiddleware(AppAgentLoggingMiddleware):
+ """Logging middleware for MobileAgent."""
+
+ def starting_message(self, context: ProcessingContext) -> str:
+ """Return starting message."""
+ request = context.get_local("request")
+ return f"Executing mobile task: [{request}]"
+```
+
+### Strategy Design Checklist
+
+- [ ] Use `@depends_on()` decorator to declare dependencies
+- [ ] Use `@provides()` decorator to declare outputs
+- [ ] Return `ProcessingResult` with success status
+- [ ] Handle exceptions gracefully (log, return error result)
+- [ ] Respect `fail_fast` setting
+- [ ] Use `self.logger` for debugging
+- [ ] Call `self.handle_error()` in except blocks
+
+---
+
+## Step 5: Prompter
+
+### Understanding the Prompter
+
+The **Prompter** constructs prompts for LLM interaction. It:
+
+- Loads prompt templates from YAML files
+- Constructs system and user messages
+- Inserts dynamic context (request, plan, examples)
+- Formats API/tool descriptions
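+
+As a rough illustration of the last three bullets, the sketch below fills a system/user template with `str.format`. The `PROMPT_TEMPLATE` dict stands in for a loaded YAML file, and `build_messages` is a hypothetical helper, not part of UFO; the real prompters in this step rely on `AppAgentPrompter` helpers instead.
+
+```python
+import json
+from typing import Dict, List
+
+# Stand-in for a template loaded from YAML (assumption: "system" and "user" keys).
+PROMPT_TEMPLATE: Dict[str, str] = {
+    "system": "You control a device.\nAvailable APIs:\n{apis}\nExamples:\n{examples}",
+    # Doubled braces render as literal braces after str.format().
+    "user": "Request: {user_request}\nPrevious plan: {prev_plan}\nReply as {{\"action\": ...}}",
+}
+
+
+def build_messages(
+    user_request: str, prev_plan: List[str], apis: str, examples: str
+) -> List[Dict[str, str]]:
+    """Fill both templates and return chat-style messages."""
+    system = PROMPT_TEMPLATE["system"].format(apis=apis, examples=examples)
+    user = PROMPT_TEMPLATE["user"].format(
+        user_request=user_request,
+        prev_plan=json.dumps(prev_plan),  # serialize lists as JSON text
+    )
+    return [
+        {"role": "system", "content": system},
+        {"role": "user", "content": user},
+    ]
+
+
+if __name__ == "__main__":
+    for message in build_messages(
+        "List files in /tmp", ["open shell"], "- execute_command(command)", "(none)"
+    ):
+        print(message["role"], "->", message["content"])
+```
+
+Because `str.format` treats braces as placeholders, any literal `{` or `}` in a template (for example an inline JSON snippet) has to be doubled.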
+
+### LinuxAgent Prompter
+
+```python
+# File: ufo/prompter/customized/linux_agent_prompter.py
+
+import json
+from typing import Any, Dict, List
+from ufo.prompter.agent_prompter import AppAgentPrompter
+
+
+class LinuxAgentPrompter(AppAgentPrompter):
+ """
+ Prompter for LinuxAgent.
+
+ Constructs prompts for shell command generation.
+ """
+
+ def __init__(
+ self,
+ prompt_template: str,
+ example_prompt_template: str,
+ ):
+ """
+ Initialize LinuxAgentPrompter.
+
+ :param prompt_template: Path to main prompt YAML
+ :param example_prompt_template: Path to example prompt YAML
+ """
+ super().__init__(None, prompt_template, example_prompt_template)
+ self.api_prompt_template = None
+
+ def system_prompt_construction(
+ self, additional_examples: List[str] = []
+ ) -> str:
+ """
+ Construct system prompt for LinuxAgent.
+
+ :param additional_examples: Additional examples to include
+ :return: System prompt string
+ """
+ # Format API descriptions
+ apis = self.api_prompt_helper(verbose=1)
+
+ # Format examples
+ examples = self.examples_prompt_helper(
+ additional_examples=additional_examples
+ )
+
+ # Fill template
+ return self.prompt_template["system"].format(
+ apis=apis, examples=examples
+ )
+
+ def user_prompt_construction(
+ self,
+ prev_plan: List[str],
+ user_request: str,
+ retrieved_docs: str = "",
+ last_success_actions: List[Dict[str, Any]] = [],
+ ) -> str:
+ """
+ Construct user prompt for LinuxAgent.
+
+ :param prev_plan: Previous execution plan
+ :param user_request: User's request
+ :param retrieved_docs: Retrieved documentation (optional)
+ :param last_success_actions: Last successful actions
+ :return: User prompt string
+ """
+ prompt = self.prompt_template["user"].format(
+ prev_plan=json.dumps(prev_plan),
+ user_request=user_request,
+ retrieved_docs=retrieved_docs,
+ last_success_actions=json.dumps(last_success_actions),
+ )
+
+ return prompt
+
+ def user_content_construction(
+ self,
+ prev_plan: List[str],
+ user_request: str,
+ retrieved_docs: str = "",
+ last_success_actions: List[Dict[str, Any]] = [],
+ ) -> List[Dict[str, str]]:
+ """
+ Construct user content for LLM (supports multi-modal).
+
+ :param prev_plan: Previous plan
+ :param user_request: User request
+ :param retrieved_docs: Retrieved docs
+ :param last_success_actions: Last actions
+ :return: List of content dicts
+ """
+ user_content = []
+
+ user_content.append({
+ "type": "text",
+ "text": self.user_prompt_construction(
+ prev_plan=prev_plan,
+ user_request=user_request,
+ retrieved_docs=retrieved_docs,
+ last_success_actions=last_success_actions,
+ ),
+ })
+
+ return user_content
+```
+
+### Creating MobileAgent Prompter
+
+```python
+# File: ufo/prompter/customized/mobile_agent_prompter.py
+
+import json
+from typing import Any, Dict, List
+from ufo.prompter.agent_prompter import AppAgentPrompter
+
+
+class MobileAgentPrompter(AppAgentPrompter):
+ """
+ Prompter for MobileAgent.
+
+ Handles mobile UI screenshots and hierarchy for LLM prompts.
+ """
+
+ def __init__(
+ self,
+ prompt_template: str,
+ example_prompt_template: str,
+ ):
+ """
+ Initialize MobileAgentPrompter.
+
+ :param prompt_template: Path to main prompt YAML
+ :param example_prompt_template: Path to example prompt YAML
+ """
+ super().__init__(None, prompt_template, example_prompt_template)
+ self.api_prompt_template = None
+
+ def system_prompt_construction(
+ self, additional_examples: List[str] = []
+ ) -> str:
+ """Construct system prompt for MobileAgent."""
+ apis = self.api_prompt_helper(verbose=1)
+ examples = self.examples_prompt_helper(
+ additional_examples=additional_examples
+ )
+
+ return self.prompt_template["system"].format(
+ apis=apis, examples=examples
+ )
+
+ def user_prompt_construction(
+ self,
+ prev_plan: List[str],
+ user_request: str,
+ ui_tree: str = "",
+ retrieved_docs: str = "",
+ last_success_actions: List[Dict[str, Any]] = [],
+ ) -> str:
+ """
+ Construct user prompt with mobile UI context.
+
+ :param prev_plan: Previous plan
+ :param user_request: User request
+ :param ui_tree: Mobile UI hierarchy (XML/JSON)
+ :param retrieved_docs: Retrieved docs
+ :param last_success_actions: Last actions
+ :return: User prompt string
+ """
+ prompt = self.prompt_template["user"].format(
+ prev_plan=json.dumps(prev_plan),
+ user_request=user_request,
+ ui_tree=ui_tree, # Mobile-specific
+ retrieved_docs=retrieved_docs,
+ last_success_actions=json.dumps(last_success_actions),
+ )
+
+ return prompt
+
+ def user_content_construction(
+ self,
+ prev_plan: List[str],
+ user_request: str,
+ screenshot: Any = None, # Mobile screenshot
+ ui_tree: str = "",
+ retrieved_docs: str = "",
+ last_success_actions: List[Dict[str, Any]] = [],
+    ) -> List[Dict[str, Any]]:
+ """
+ Construct user content with screenshot for vision LLMs.
+
+ :param prev_plan: Previous plan
+ :param user_request: User request
+        :param screenshot: Base64-encoded PNG screenshot (embedded below as a data URL)
+ :param ui_tree: UI hierarchy
+ :param retrieved_docs: Retrieved docs
+ :param last_success_actions: Last actions
+ :return: List of content dicts (text + image)
+ """
+ user_content = []
+
+ # Add text prompt
+ user_content.append({
+ "type": "text",
+ "text": self.user_prompt_construction(
+ prev_plan=prev_plan,
+ user_request=user_request,
+ ui_tree=ui_tree,
+ retrieved_docs=retrieved_docs,
+ last_success_actions=last_success_actions,
+ ),
+ })
+
+ # Add screenshot if available (for vision LLMs)
+ if screenshot:
+ user_content.append({
+ "type": "image_url",
+ "image_url": {
+ "url": f"data:image/png;base64,{screenshot}"
+ },
+ })
+
+ return user_content
+```
+
+### Prompter Best Practices
+
+- ✅ Inherit from `AppAgentPrompter` for standard structure
+- ✅ Use `self.prompt_template` and `self.example_prompt_template`
+- ✅ Implement `system_prompt_construction()` and `user_prompt_construction()`
+- ✅ Use `user_content_construction()` for multi-modal content
+- ✅ Format examples with `examples_prompt_helper()`
+- ✅ Format APIs with `api_prompt_helper()`
+- ❌ Don't hardcode prompts - use YAML templates
+
+---
+
+## Testing Your Implementation
+
+### Unit Test: Agent Class
+
+```python
+# File: tests/unit/test_mobile_agent.py
+
+import pytest
+from ufo.agents.agent.customized_agent import MobileAgent
+from ufo.agents.processors.customized.customized_agent_processor import (
+ MobileAgentProcessor
+)
+
+
+class TestMobileAgent:
+ """Unit tests for MobileAgent."""
+
+ @pytest.fixture
+ def agent(self):
+ """Create test MobileAgent instance."""
+ return MobileAgent(
+ name="test_mobile_agent",
+ main_prompt="ufo/prompts/third_party/mobile_agent.yaml",
+ example_prompt="ufo/prompts/third_party/mobile_agent_example.yaml",
+ platform="android",
+ )
+
+ def test_agent_initialization(self, agent):
+ """Test agent initializes correctly."""
+ assert agent.name == "test_mobile_agent"
+ assert agent.platform == "android"
+ assert agent.prompter is not None
+ assert agent.blackboard is not None
+
+ def test_processor_registration(self, agent):
+ """Test processor is registered correctly."""
+ assert hasattr(agent, "_processor_cls")
+ assert agent._processor_cls == MobileAgentProcessor
+
+ def test_default_state(self, agent):
+ """Test default state is set."""
+ from ufo.agents.states.mobile_agent_state import ContinueMobileAgentState
+ assert isinstance(agent.default_state, ContinueMobileAgentState)
+```
+
+### Integration Test: Full Pipeline
+
+```python
+# File: tests/integration/test_mobile_agent_pipeline.py
+
+import pytest
+from ufo.agents.agent.customized_agent import MobileAgent
+from ufo.module.context import Context
+
+
+class TestMobileAgentPipeline:
+ """Integration tests for MobileAgent pipeline."""
+
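+    # Plain (sync) fixture: nothing here is awaited, and an `async def` fixture under
+    # `@pytest.fixture` can reach the test as an un-awaited coroutine.
+    @pytest.fixture
+    def agent_with_context(self):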
+ """Create agent with context."""
+ agent = MobileAgent(
+ name="test_agent",
+ main_prompt="ufo/prompts/third_party/mobile_agent.yaml",
+ example_prompt="ufo/prompts/third_party/mobile_agent_example.yaml",
+ platform="android",
+ )
+
+ context = Context()
+ context.set("request", "Tap the login button")
+
+ return agent, context
+
+ @pytest.mark.asyncio
+ async def test_processor_execution(self, agent_with_context):
+ """Test processor executes all strategies."""
+ agent, context = agent_with_context
+
+ # Execute processor
+ processor = agent._processor_cls(agent, context)
+ result = await processor.process()
+
+ # Verify strategies executed
+ assert result is not None
+ assert "parsed_response" in context.get_all_local()
+```
+
+---
+
+## Summary
+
+**What You've Built**:
+
+- **Agent Class** - MobileAgent with registration and initialization
+- **Processor** - MobileAgentProcessor with strategy orchestration
+- **State Manager** - MobileAgentStateManager with FSM states
+- **Strategies** - LLM and action execution strategies
+- **Prompter** - MobileAgentPrompter for prompt construction
+
+**Next Step**: [Part 2: MCP Server Development →](mcp_server.md)
+
+---
+
+## Related Documentation
+
+- **[Agent Architecture](../../infrastructure/agents/overview.md)** - Three-layer architecture
+- **[Processor Design](../../infrastructure/agents/design/processor.md)** - Processor deep dive
+- **[Strategy Pattern](../../infrastructure/agents/design/strategy.md)** - Strategy implementation
+- **[State Machine](../../infrastructure/agents/design/state.md)** - State management
+
diff --git a/documents/docs/tutorials/creating_device_agent/example_mobile_agent.md b/documents/docs/tutorials/creating_device_agent/example_mobile_agent.md
new file mode 100644
index 000000000..7e045a1c1
--- /dev/null
+++ b/documents/docs/tutorials/creating_device_agent/example_mobile_agent.md
@@ -0,0 +1,107 @@
+# Part 6: Complete Example - MobileAgent
+
+**Note**: This comprehensive hands-on tutorial is currently under development. Check back soon for a complete MobileAgent implementation walkthrough.
+
+## What You'll Build
+
+A fully functional **MobileAgent** that can:
+
+- Control Android/iOS devices
+- Perform UI automation
+- Execute touch gestures (tap, swipe, type)
+- Capture screenshots and UI hierarchy
+- Integrate with Galaxy orchestration
+
+## Planned Content
+
+### 1. Platform-Specific Setup
+
+#### Android
+- ADB (Android Debug Bridge) integration
+- UI Automator framework
+- Accessibility services
+
+#### iOS
+- XCTest framework
+- Accessibility API
+- Instruments (Xcode profiling tools)
+
+### 2. Complete Implementation
+
+- Agent class
+- Processor and strategies
+- State manager
+- MCP server with mobile tools
+- Prompter for mobile UI
+
+### 3. Advanced Features
+
+- Multi-device coordination
+- App-specific automation
+- Error recovery strategies
+- Performance optimization
+
+## Temporary Reference
+
+For now, study the **LinuxAgent** implementation as a complete reference:
+
+### Key Files
+
+| Component | File Path |
+|-----------|-----------|
+| Agent Class | `ufo/agents/agent/customized_agent.py` |
+| Processor | `ufo/agents/processors/customized/customized_agent_processor.py` |
+| Strategies | `ufo/agents/processors/strategies/linux_agent_strategy.py` |
+| States | `ufo/agents/states/linux_agent_state.py` |
+| Prompter | `ufo/prompter/customized/linux_agent_prompter.py` |
+| MCP Server | `ufo/client/mcp/http_servers/linux_mcp_server.py` |
+
+### Quick Start Template
+
+```python
+# Minimal MobileAgent structure (to be expanded)
+
+@AgentRegistry.register(
+ agent_name="MobileAgent",
+ third_party=True,
+ processor_cls=MobileAgentProcessor
+)
+class MobileAgent(CustomizedAgent):
+ def __init__(self, name, main_prompt, example_prompt):
+ super().__init__(name, main_prompt, example_prompt,
+ process_name=None, app_root_name=None, is_visual=None)
+ self._blackboard = Blackboard()
+ self.set_state(self.default_state)
+ self._context_provision_executed = False
+
+ @property
+ def default_state(self):
+ return ContinueMobileAgentState()
+
+ def message_constructor(
+ self,
+ dynamic_examples,
+ dynamic_knowledge,
+ plan,
+ request,
+ installed_apps,
+ current_controls,
+ screenshot_url=None,
+ annotated_screenshot_url=None,
+ blackboard_prompt=None,
+ last_success_actions=None,
+ ):
+ # Construct prompt for LLM with mobile-specific context
+ return self.prompter.prompt_construction(...)
+```
+
+## Related Documentation
+
+- **[Agent Architecture](../../infrastructure/agents/overview.md)** - Architecture overview
+- **[Agent Types](../../infrastructure/agents/agent_types.md)** - Platform implementations
+- **[Linux Quick Start](../../getting_started/quick_start_linux.md)** - LinuxAgent deployment
+
+---
+
+**Previous**: [← Part 5: Testing & Debugging](testing.md)
+**Back to Index**: [Tutorial Series](index.md)
diff --git a/documents/docs/tutorials/creating_device_agent/index.md b/documents/docs/tutorials/creating_device_agent/index.md
new file mode 100644
index 000000000..c4903f6a1
--- /dev/null
+++ b/documents/docs/tutorials/creating_device_agent/index.md
@@ -0,0 +1,187 @@
+# Creating Device Agents - Tutorial Series
+
+This tutorial series teaches you how to create new device agents for UFO³, using **LinuxAgent** as a reference implementation.
+
+## 📚 Tutorial Structure
+
+### [Part 0: Overview](overview.md)
+**Introduction to device agents and architecture overview**
+
+- Understanding device agents vs third-party agents
+- Server-client architecture
+- LinuxAgent as reference implementation
+- Tutorial roadmap
+
+**Time**: 15 minutes | **Difficulty**: ⭐
+
+---
+
+### [Part 1: Core Components](core_components.md)
+**Building server-side components**
+
+- Agent Class implementation
+- Processor and strategy orchestration
+- State Manager and FSM
+- Processing Strategies (LLM, Action)
+- Prompter for LLM interaction
+
+**Time**: 45 minutes | **Difficulty**: ⭐⭐⭐
+
+---
+
+### [Part 2: MCP Server Development](mcp_server.md)
+**Creating platform-specific MCP servers** *(Placeholder - Under Development)*
+
+- MCP server architecture
+- Defining MCP tools
+- Command execution logic
+- Error handling and validation
+
+**Time**: 30 minutes | **Difficulty**: ⭐⭐
+
+---
+
+### [Part 3: Client Setup](client_setup.md)
+**Setting up the device client** *(Placeholder - Under Development)*
+
+- Client initialization and configuration
+- MCP server manager integration
+- WebSocket connection setup
+- Platform detection
+
+**Time**: 20 minutes | **Difficulty**: ⭐⭐
+
+---
+
+### [Part 4: Configuration & Deployment](configuration.md)
+**Configuring and deploying your agent** *(Placeholder - Under Development)*
+
+- `third_party.yaml` configuration
+- `devices.yaml` device registration
+- Prompt template creation
+- Deployment steps
+- Galaxy integration
+
+**Time**: 25 minutes | **Difficulty**: ⭐⭐
+
+---
+
+### [Part 5: Testing & Debugging](testing.md)
+**Testing and debugging your implementation** *(Placeholder - Under Development)*
+
+- Unit testing strategies
+- Integration testing
+- Debugging techniques
+- Common issues and solutions
+
+**Time**: 30 minutes | **Difficulty**: ⭐⭐⭐
+
+---
+
+### [Part 6: Complete Example: MobileAgent](example_mobile_agent.md)
+**Hands-on walkthrough creating MobileAgent** *(Placeholder - Under Development)*
+
+- Step-by-step implementation
+- Android/iOS platform specifics
+- UI Automator integration
+- Complete working example
+
+**Time**: 60 minutes | **Difficulty**: ⭐⭐⭐⭐
+
+---
+
+## Quick Navigation
+
+| I Want To... | Go To |
+|--------------|-------|
+| Understand device agent architecture | [Overview](overview.md#understanding-device-agents) |
+| Study LinuxAgent implementation | [Overview](overview.md#linuxagent-reference-implementation) |
+| Create Agent Class | [Core Components - Step 1](core_components.md#step-1-agent-class) |
+| Build Processor | [Core Components - Step 2](core_components.md#step-2-processor) |
+| Implement State Machine | [Core Components - Step 3](core_components.md#step-3-state-manager) |
+| Write Processing Strategies | [Core Components - Step 4](core_components.md#step-4-processing-strategies) |
+| Create Prompter | [Core Components - Step 5](core_components.md#step-5-prompter) |
+| Build MCP Server | [MCP Server](mcp_server.md) *(placeholder)* |
+| Setup Client | [Client Setup](client_setup.md) *(placeholder)* |
+| Configure & Deploy | [Configuration](configuration.md) *(placeholder)* |
+| Test & Debug | [Testing](testing.md) *(placeholder)* |
+| Complete Example | [MobileAgent Example](example_mobile_agent.md) *(placeholder)* |
+
+---
+
+## Prerequisites
+
+Before starting, ensure you have:
+
+- ✅ Python 3.10+
+- ✅ UFO³ repository cloned
+- ✅ Basic understanding of async programming
+- ✅ Familiarity with [Agent Architecture](../../infrastructure/agents/overview.md)
+
+---
+
+## Learning Path
+
+```mermaid
+graph LR
+ A[Overview ✅ Complete] --> B[Core Components ✅ Complete]
+ B --> C[MCP Server 📝 Placeholder]
+ C --> D[Client Setup 📝 Placeholder]
+ D --> E[Configuration 📝 Placeholder]
+ E --> F[Testing 📝 Placeholder]
+ F --> G[Complete Example 📝 Placeholder]
+
+ style A fill:#c8e6c9
+ style B fill:#c8e6c9
+ style C fill:#fff3e0
+ style D fill:#fff3e0
+ style E fill:#fff3e0
+ style F fill:#fff3e0
+ style G fill:#fff3e0
+```
+
+**Recommended Path**:
+1. ✅ **Completed**: [Overview](overview.md) - Understand architecture
+2. ✅ **Completed**: [Core Components](core_components.md) - Build server-side components
+3. 📝 **Placeholder**: [MCP Server](mcp_server.md) - Create device commands
+4. 📝 **Placeholder**: [Client Setup](client_setup.md) - Setup device client
+5. 📝 **Placeholder**: [Configuration](configuration.md) - Configure and deploy
+6. 📝 **Placeholder**: [Testing](testing.md) - Test and debug
+7. 📝 **Placeholder**: [Complete Example](example_mobile_agent.md) - Full MobileAgent implementation
+
+---
+
+## Additional Resources
+
+- **[Agent Architecture Overview](../../infrastructure/agents/overview.md)** - Three-layer architecture
+- **[Agent Types](../../infrastructure/agents/agent_types.md)** - Platform-specific implementations
+- **[Linux Quick Start](../../getting_started/quick_start_linux.md)** - Deploy LinuxAgent
+- **[Creating Third-Party Agents](../creating_third_party_agents.md)** - Related tutorial
+- **[MCP Overview](../../mcp/overview.md)** - Model Context Protocol
+- **[Server Overview](../../server/overview.md)** - Server architecture
+- **[Client Overview](../../client/overview.md)** - Client architecture
+
+---
+
+## Getting Help
+
+If you encounter issues:
+
+1. 📖 Review the [FAQ](../../faq.md)
+2. 🐛 Check the [testing guidance in Core Components](core_components.md#testing-your-implementation)
+3. 💬 Ask in GitHub Discussions
+4. 🐞 Report bugs on GitHub Issues
+
+---
+
+## Contributing
+
+Found an issue or want to improve these tutorials?
+
+- 📝 Submit a PR with improvements
+- 💡 Suggest new topics
+- 🔍 Report errors or unclear sections
+
+---
+
+**Ready to start?** → [Begin with Overview](overview.md)
diff --git a/documents/docs/tutorials/creating_device_agent/mcp_server.md b/documents/docs/tutorials/creating_device_agent/mcp_server.md
new file mode 100644
index 000000000..283c6d6d5
--- /dev/null
+++ b/documents/docs/tutorials/creating_device_agent/mcp_server.md
@@ -0,0 +1,1140 @@
+# Part 2: MCP Server Development
+
+This tutorial teaches you how to create a **platform-specific MCP (Model Context Protocol) server** that enables your device agent to execute commands on the target device. We'll use **LinuxAgent's MCP server** as the reference implementation.
+
+---
+
+## Table of Contents
+
+1. [MCP Server Overview](#mcp-server-overview)
+2. [Architecture and Design](#architecture-and-design)
+3. [LinuxAgent MCP Server Analysis](#linuxagent-mcp-server-analysis)
+4. [Creating Your MCP Server](#creating-your-mcp-server)
+5. [Tool Definition Best Practices](#tool-definition-best-practices)
+6. [Error Handling and Validation](#error-handling-and-validation)
+7. [Testing Your MCP Server](#testing-your-mcp-server)
+
+---
+
+## MCP Server Overview
+
+### What is an MCP Server?
+
+An **MCP Server** is a service that exposes **platform-specific tools** (commands) to LLM agents via the Model Context Protocol. For device agents, the MCP server:
+
+- Runs on or near the target device
+- Exposes tools as callable functions
+- Executes system-level commands safely
+- Returns structured results to the agent
+
+### MCP Server in Device Agent Architecture
+
+```mermaid
+graph TB
+ subgraph "Agent Server (Orchestrator)"
+ Agent[Device Agent]
+ Strategy[Action Execution Strategy]
+ Dispatcher[Command Dispatcher]
+ end
+
+ subgraph "Device Client"
+ Client[UFO Client]
+ Manager[MCP Server Manager]
+ end
+
+ subgraph "MCP Server (Device/Remote)"
+ MCP[MCP Server FastMCP]
+ Tool1[Tool: execute_command]
+ Tool2[Tool: get_system_info]
+ ToolN[Tool: ...]
+
+ MCP --> Tool1
+ MCP --> Tool2
+ MCP --> ToolN
+ end
+
+ subgraph "Target Device/System"
+ OS[Operating System Linux/Android/iOS]
+ Shell[Shell/API]
+
+ Tool1 --> Shell
+ Tool2 --> Shell
+ ToolN --> Shell
+ Shell --> OS
+ end
+
+ Agent --> Strategy
+ Strategy --> Dispatcher
+ Dispatcher -->|AIP Protocol| Client
+ Client --> Manager
+ Manager -->|HTTP/Stdio| MCP
+
+ style Agent fill:#c8e6c9
+ style MCP fill:#e1f5ff
+ style OS fill:#fff3e0
+```
+
+**Key Points**:
+
+- The **MCP Server** runs separately from the agent server (security isolation)
+- **Tools** are atomic operations exposed to the LLM
+- The **Command Dispatcher** translates LLM actions into MCP tool calls
+- **Results** flow back through the same path
+
+---
+
+## Architecture and Design
+
+### MCP Server Components
+
+```mermaid
+graph TB
+ subgraph "MCP Server Structure"
+ Server[FastMCP Server HTTP/Stdio Transport]
+
+ subgraph "Tools Layer"
+ T1[Tool 1 @mcp.tool]
+ T2[Tool 2 @mcp.tool]
+ T3[Tool N @mcp.tool]
+ end
+
+ subgraph "Execution Layer"
+ Executor[Command Executor asyncio subprocess]
+ Validator[Input Validator Security checks]
+ ErrorHandler[Error Handler Exception handling]
+ end
+
+ subgraph "Platform Interface"
+ API[Platform API Shell/SDK/ADB]
+ end
+
+ Server --> T1 & T2 & T3
+ T1 & T2 & T3 --> Validator
+ Validator --> Executor
+ Executor --> ErrorHandler
+ ErrorHandler --> API
+ end
+
+ style Server fill:#e1f5ff
+ style T1 fill:#c8e6c9
+ style Executor fill:#fff3e0
+ style API fill:#f3e5f5
+```
+
+### MCP Server Design Principles
+
+| Principle | Description | Example |
+|-----------|-------------|---------|
+| **Atomic Tools** | Each tool performs one specific operation | `execute_command` vs `execute_and_parse_command` |
+| **Type Safety** | Use `Annotated` with Pydantic `Field` to type and describe parameters | `Annotated[str, Field(description="...")]` |
+| **Error Resilience** | Handle all exceptions gracefully | Try/except with structured error responses |
+| **Security First** | Validate and sanitize all inputs | Block dangerous commands, validate paths |
+| **Platform Agnostic** | Abstract platform differences | Use subprocess for shell, ADB for Android |
+| **Async Execution** | Use asyncio for non-blocking operations | `async def`, `await asyncio.create_subprocess_shell(...)` |
+
+---
+
+## LinuxAgent MCP Server Analysis
+
+### File Location
+
+**Path**: `ufo/client/mcp/http_servers/linux_mcp_server.py`
+
+### Complete Implementation
+
+```python
+#!/usr/bin/env python3
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Linux MCP Server
+Provides MCP interface for executing shell commands on Linux systems.
+"""
+
+import argparse
+import asyncio
+from typing import Annotated, Any, Dict, Optional
+from fastmcp import FastMCP
+from pydantic import Field
+
+
+def create_bash_mcp_server(host: str = "", port: int = 8010) -> None:
+ """Create an MCP server for Linux command execution."""
+
+ # Initialize FastMCP server with configuration
+ mcp = FastMCP(
+ "Linux Bash MCP Server", # Server name
+ instructions="MCP server for executing shell commands on Linux.",
+ stateless_http=False, # Maintain state across requests
+ json_response=True, # Return JSON responses
+ host=host,
+ port=port,
+ )
+
+ # ========================================
+ # Tool 1: Execute Shell Command
+ # ========================================
+ @mcp.tool()
+ async def execute_command(
+ command: Annotated[
+ str,
+ Field(
+ description="Shell command to execute on the Linux system. "
+ "This should be a valid bash/sh command that will be executed "
+ "in a shell environment. Examples: 'ls -la /home', "
+ "'cat /etc/os-release', 'python3 --version', "
+ "'grep -r \"pattern\" /path/to/search'. Be cautious with "
+ "destructive commands as some dangerous operations are blocked."
+ ),
+ ],
+ timeout: Annotated[
+ int,
+ Field(
+ description="Maximum execution time in seconds before the "
+ "command is forcefully terminated. Default is 30 seconds. "
+ "Use higher values for long-running operations."
+ ),
+ ] = 30,
+ cwd: Annotated[
+ Optional[str],
+ Field(
+ description="Working directory path where the command should "
+ "be executed. If not specified, uses server's current directory. "
+ "Use absolute paths for reliability."
+ ),
+ ] = None,
+ ) -> Annotated[
+ Dict[str, Any],
+ Field(
+ description="Dictionary containing execution results with keys: "
+ "'success' (bool), 'exit_code' (int), 'stdout' (str), "
+ "'stderr' (str), or 'error' (str error message if execution failed)"
+ ),
+ ]:
+ """
+ Execute a shell command on Linux and return stdout/stderr.
+
+ Security: Blocks known dangerous commands.
+ """
+ # Security: Block dangerous commands
+ dangerous = [
+ "rm -rf /",
+ ":(){ :|:& };:", # Fork bomb
+ "mkfs",
+ "dd if=/dev/zero",
+ "shutdown",
+ "reboot",
+ ]
+ if any(d in command.lower() for d in dangerous):
+ return {"success": False, "error": "Blocked dangerous command."}
+
+ try:
+ # Create async subprocess
+ proc = await asyncio.create_subprocess_shell(
+ command,
+ stdout=asyncio.subprocess.PIPE,
+ stderr=asyncio.subprocess.PIPE,
+ cwd=cwd,
+ )
+
+ try:
+ # Wait for completion with timeout
+ stdout, stderr = await asyncio.wait_for(
+ proc.communicate(), timeout=timeout
+ )
+ except asyncio.TimeoutError:
+ # Kill process on timeout
+ proc.kill()
+ await proc.wait()
+ return {"success": False, "error": f"Timeout after {timeout}s."}
+
+ # Return structured result
+ return {
+ "success": proc.returncode == 0,
+ "exit_code": proc.returncode,
+ "stdout": stdout.decode("utf-8", errors="replace"),
+ "stderr": stderr.decode("utf-8", errors="replace"),
+ }
+ except Exception as e:
+ return {"success": False, "error": str(e)}
+
+ # ========================================
+ # Tool 2: Get System Information
+ # ========================================
+ @mcp.tool()
+ async def get_system_info() -> Annotated[
+ Dict[str, Any],
+ Field(
+ description="Dictionary containing basic Linux system information "
+ "with keys: 'uname', 'uptime', 'memory', 'disk'"
+ ),
+ ]:
+ """
+ Get basic system info (uname, uptime, memory, disk).
+ """
+ info = {}
+ cmds = {
+ "uname": "uname -a",
+ "uptime": "uptime",
+ "memory": "free -h",
+ "disk": "df -h",
+ }
+
+ for k, cmd in cmds.items():
+ try:
+ proc = await asyncio.create_subprocess_shell(
+ cmd, stdout=asyncio.subprocess.PIPE
+ )
+ out, _ = await proc.communicate()
+ info[k] = out.decode("utf-8", errors="replace").strip()
+ except Exception as e:
+ info[k] = f"Error: {e}"
+
+ return info
+
+ # Start the server
+ mcp.run(transport="streamable-http")
+
+
+def main():
+ """CLI entry point for Linux MCP server."""
+ parser = argparse.ArgumentParser(description="Linux Bash MCP Server")
+ parser.add_argument(
+ "--port", type=int, default=8010, help="Port to run the server on"
+ )
+ parser.add_argument(
+ "--host", default="localhost", help="Host to bind the server to"
+ )
+ args = parser.parse_args()
+
+ print("=" * 50)
+ print("UFO Linux Bash MCP Server")
+ print("Linux command execution via Model Context Protocol")
+ print(f"Running on {args.host}:{args.port}")
+ print("=" * 50)
+
+ create_bash_mcp_server(host=args.host, port=args.port)
+
+
+if __name__ == "__main__":
+ main()
+```
+
+### Key Design Patterns
+
+#### 1. Type-Safe Tool Definitions
+
+```python
+@mcp.tool()
+async def execute_command(
+    command: Annotated[str, Field(description="...")],  # Required parameter
+    timeout: Annotated[int, Field(description="...")] = 30,  # Optional with default
+    cwd: Annotated[Optional[str], Field(description="...")] = None,  # Optional
+) -> Annotated[Dict[str, Any], Field(description="...")]:  # Return type
+    ...  # Body omitted; see the complete implementation above
+```
+
+**Benefits**:
+- ✅ LLM understands parameter types and descriptions
+- ✅ Runtime validation via Pydantic
+- ✅ Auto-generated API documentation
+- ✅ Clear contracts for consumers
+
+#### 2. Security-First Validation
+
+```python
+# Block dangerous commands
+dangerous = ["rm -rf /", ":(){ :|:& };:", "mkfs", ...]
+if any(d in command.lower() for d in dangerous):
+    return {"success": False, "error": "Blocked dangerous command."}
+```
+
+**Best Practices**:
+- ✅ Whitelist safe operations when possible
+- ✅ Blacklist known dangerous patterns
+- ✅ Validate paths (prevent directory traversal)
+- ✅ Limit command complexity
+- ❌ Don't rely on sanitization alone
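+
+The allowlist bullet can be sketched as follows. This is a minimal illustration under stated assumptions: `ALLOWED_EXECUTABLES`, `SHELL_METACHARACTERS`, and `is_command_allowed` are hypothetical names and are not part of the LinuxAgent server, which uses the substring check shown above.
+
+```python
+import shlex
+
+# Hypothetical allowlist for illustration; tailor it to what your agent needs.
+ALLOWED_EXECUTABLES = {"ls", "cat", "grep", "uname", "df"}
+# Characters that let one command chain, redirect, or substitute another.
+SHELL_METACHARACTERS = set(";|&$`<>\n")
+
+
+def is_command_allowed(command: str) -> bool:
+    """Return True only if the command runs a single allowlisted executable."""
+    if any(ch in SHELL_METACHARACTERS for ch in command):
+        return False
+    try:
+        tokens = shlex.split(command)
+    except ValueError:  # e.g. unbalanced quotes
+        return False
+    return bool(tokens) and tokens[0] in ALLOWED_EXECUTABLES
+
+
+if __name__ == "__main__":
+    for candidate in ["ls -la /home", "rm -r -f /", "ls; cat /etc/passwd"]:
+        print(f"{candidate!r}: {is_command_allowed(candidate)}")
+```
+
+A check like this would run before the subprocess is created. It rejects variants such as `rm -r -f /` that slip past substring matching, at the cost of also refusing legitimate pipelines.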
+
+#### 3. Async Execution with Timeout
+
+```python
+proc = await asyncio.create_subprocess_shell(...)
+try:
+    stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
+except asyncio.TimeoutError:
+    proc.kill()
+    await proc.wait()
+    return {"success": False, "error": f"Timeout after {timeout}s."}
+```
+
+**Why Async?**:
+- Non-blocking execution (server remains responsive)
+- Timeout enforcement (prevent hanging)
+- Concurrent tool execution support
+- Better resource utilization
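+
+To make the concurrency and timeout points concrete, here is a small self-contained sketch. It assumes a hypothetical helper `run_with_timeout` and uses `asyncio.create_subprocess_exec` with an argument list instead of a shell string, which is a swapped-in variant of the pattern above, not the LinuxAgent code itself.
+
+```python
+import asyncio
+import sys
+from typing import Any, Dict, List, Sequence
+
+
+async def run_with_timeout(args: Sequence[str], timeout: float) -> Dict[str, Any]:
+    """Run an argv list without a shell and enforce a timeout."""
+    proc = await asyncio.create_subprocess_exec(
+        *args,
+        stdout=asyncio.subprocess.PIPE,
+        stderr=asyncio.subprocess.PIPE,
+    )
+    try:
+        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
+    except asyncio.TimeoutError:
+        proc.kill()  # stop the child, then reap it to avoid a zombie process
+        await proc.wait()
+        return {"success": False, "error": f"Timeout after {timeout}s."}
+    return {
+        "success": proc.returncode == 0,
+        "exit_code": proc.returncode,
+        "stdout": stdout.decode("utf-8", errors="replace"),
+        "stderr": stderr.decode("utf-8", errors="replace"),
+    }
+
+
+async def main() -> List[Dict[str, Any]]:
+    # Two independent commands run concurrently; the slow one is cut off.
+    fast = [sys.executable, "-c", "print('ok')"]
+    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
+    return await asyncio.gather(
+        run_with_timeout(fast, timeout=5),
+        run_with_timeout(slow, timeout=0.5),
+    )
+
+
+if __name__ == "__main__":
+    for result in asyncio.run(main()):
+        print(result)
+```
+
+Both calls are awaited together with `asyncio.gather`, so the slow command's timeout does not delay the fast command's result.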
+
+#### 4. Structured Error Handling
+
+```python
+return {
+    "success": proc.returncode == 0,  # Boolean success flag
+    "exit_code": proc.returncode,  # Numeric exit code
+    "stdout": stdout.decode("utf-8", errors="replace"),  # Output
+    "stderr": stderr.decode("utf-8", errors="replace"),  # Errors
+}
+```
+
+**Error Response Contract**:
+- Always return a dict (never raise exceptions to the LLM)
+- Include `success` boolean field
+- Provide detailed error messages
+- Preserve stdout/stderr for debugging
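+
+One way to enforce this contract across many tools is a small decorator. The sketch below is an assumption-laden illustration: `returns_error_dict` and `read_text` are hypothetical names, and whether the wrapper composes cleanly with `@mcp.tool()` should be verified against your FastMCP version.
+
+```python
+import asyncio
+import functools
+from typing import Any, Awaitable, Callable, Dict
+
+ToolFunc = Callable[..., Awaitable[Dict[str, Any]]]
+
+
+def returns_error_dict(func: ToolFunc) -> ToolFunc:
+    """Wrap an async tool so unexpected exceptions become structured errors."""
+
+    @functools.wraps(func)
+    async def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
+        try:
+            return await func(*args, **kwargs)
+        except Exception as exc:  # deliberate catch-all at the tool boundary
+            return {"success": False, "error": f"{type(exc).__name__}: {exc}"}
+
+    return wrapper
+
+
+@returns_error_dict
+async def read_text(path: str) -> Dict[str, Any]:
+    """Hypothetical tool: read a UTF-8 text file."""
+    with open(path, encoding="utf-8") as handle:
+        return {"success": True, "stdout": handle.read()}
+
+
+if __name__ == "__main__":
+    print(asyncio.run(read_text("/definitely/missing/file.txt")))
+```
+
+The caller always receives a dict with a `success` flag, and the exception type is kept in the message to help with debugging.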
+
+---
+
+## Creating Your MCP Server
+
+### Step-by-Step Guide: MobileAgent MCP Server
+
+Let's create a complete MCP server for mobile automation (Android/iOS):
+
+**File**: `ufo/client/mcp/http_servers/mobile_mcp_server.py`
+
+```python
+#!/usr/bin/env python3
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+"""
+Mobile MCP Server
+Provides MCP interface for mobile device automation (Android/iOS).
+"""
+
+import argparse
+import asyncio
+import subprocess
+from typing import Annotated, Any, Dict, Optional, Literal
+from fastmcp import FastMCP
+from pydantic import Field
+
+
+def create_mobile_mcp_server(
+ host: str = "localhost",
+ port: int = 8020,
+ platform: str = "android"
+) -> None:
+ """Create an MCP server for mobile device automation."""
+
+ mcp = FastMCP(
+ f"Mobile MCP Server ({platform.capitalize()})",
+ instructions=f"MCP server for {platform} mobile device automation",
+ stateless_http=False,
+ json_response=True,
+ host=host,
+ port=port,
+ )
+
+    # ========================================
+    # Tool 1: Tap Element by Coordinates
+    # ========================================
+    @mcp.tool()
+    async def tap_screen(
+        x: Annotated[int, Field(description="X coordinate (pixels from left)")],
+        y: Annotated[int, Field(description="Y coordinate (pixels from top)")],
+        duration_ms: Annotated[
+            int,
+            Field(description="Tap duration in milliseconds (default: 100)")
+        ] = 100,
+    ) -> Annotated[
+        Dict[str, Any],
+        Field(description="Result with 'success', 'message', and optional 'error'")
+    ]:
+        """
+        Tap the screen at specified coordinates.
+
+        Platform support:
+        - Android: Uses ADB input tap
+        - iOS: Uses xcrun simctl (simulator only)
+        """
+        # NOTE: `adb shell input tap` takes no duration argument, so
+        # duration_ms is accepted but not used by this example.
+        try:
+            if platform == "android":
+                # Android: adb shell input tap x y
+                result = subprocess.run(
+                    ["adb", "shell", "input", "tap", str(x), str(y)],
+                    capture_output=True,
+                    text=True,
+                    timeout=5
+                )
+
+                if result.returncode == 0:
+                    return {
+                        "success": True,
+                        "message": f"Tapped at ({x}, {y})",
+                        "platform": "android"
+                    }
+                else:
+                    return {
+                        "success": False,
+                        "error": f"ADB error: {result.stderr}",
+                        "platform": "android"
+                    }
+
+            elif platform == "ios":
+                # iOS: xcrun simctl (for simulator)
+                # Note: Real device requires more complex setup.
+                # NOTE: stock `simctl io` may not provide a `tap` subcommand;
+                # treat this call as a placeholder for your own tooling.
+                result = subprocess.run(
+                    ["xcrun", "simctl", "io", "booted", "tap", str(x), str(y)],
+                    capture_output=True,
+                    text=True,
+                    timeout=5
+                )
+
+                if result.returncode == 0:
+                    return {
+                        "success": True,
+                        "message": f"Tapped at ({x}, {y})",
+                        "platform": "ios"
+                    }
+                else:
+                    return {
+                        "success": False,
+                        "error": f"iOS error: {result.stderr}",
+                        "platform": "ios"
+                    }
+
+        except subprocess.TimeoutExpired:
+            return {"success": False, "error": "Command timeout"}
+        except Exception as e:
+            return {"success": False, "error": str(e)}
+
+    # ========================================
+    # Tool 2: Swipe Gesture
+    # ========================================
+    @mcp.tool()
+    async def swipe(
+        start_x: Annotated[int, Field(description="Start X coordinate")],
+        start_y: Annotated[int, Field(description="Start Y coordinate")],
+        end_x: Annotated[int, Field(description="End X coordinate")],
+        end_y: Annotated[int, Field(description="End Y coordinate")],
+        duration_ms: Annotated[
+            int,
+            Field(description="Swipe duration in milliseconds (default: 300)")
+        ] = 300,
+    ) -> Dict[str, Any]:
+        """
+        Perform a swipe gesture from start to end coordinates.
+        """
+        try:
+            if platform == "android":
+                # Android: adb shell input swipe x1 y1 x2 y2 duration
+                result = subprocess.run(
+                    [
+                        "adb", "shell", "input", "swipe",
+                        str(start_x), str(start_y),
+                        str(end_x), str(end_y),
+                        str(duration_ms)
+                    ],
+                    capture_output=True,
+                    text=True,
+                    timeout=5
+                )
+
+                return {
+                    "success": result.returncode == 0,
+                    "message": f"Swiped from ({start_x},{start_y}) to ({end_x},{end_y})",
+                    "error": result.stderr if result.returncode != 0 else None
+                }
+
+            elif platform == "ios":
+                # iOS simulator: multiple taps with delay
+                # (Approximates swipe - real swipe requires XCUITest)
+                await asyncio.sleep(0.1)  # Placeholder: no gesture is sent
+                return {
+                    "success": True,
+                    "message": "Swipe gesture simulated (iOS)",
+                    "note": "Real device requires XCUITest integration"
+                }
+
+        except Exception as e:
+            return {"success": False, "error": str(e)}
+
+    # ========================================
+    # Tool 3: Type Text
+    # ========================================
+    @mcp.tool()
+    async def type_text(
+        text: Annotated[str, Field(description="Text to type")],
+        clear_first: Annotated[
+            bool,
+            Field(description="Clear existing text before typing")
+        ] = False,
+    ) -> Dict[str, Any]:
+        """
+        Type text into the currently focused input field.
+        """
+        try:
+            if platform == "android":
+                # Escape special characters for ADB
+                escaped_text = text.replace(" ", "%s").replace("'", "\\'")
+
+                if clear_first:
+                    # Clear existing text (Ctrl+A + Delete)
+                    # NOTE: `KEYCODE_CTRL_A` is not a standard Android key
+                    # code; select-all normally needs a CTRL + A key
+                    # combination, so adapt this step to your device.
+                    subprocess.run(
+                        ["adb", "shell", "input", "keyevent", "KEYCODE_CTRL_A"],
+                        timeout=2
+                    )
+                    subprocess.run(
+                        ["adb", "shell", "input", "keyevent", "KEYCODE_DEL"],
+                        timeout=2
+                    )
+
+                # Type new text
+                result = subprocess.run(
+                    ["adb", "shell", "input", "text", escaped_text],
+                    capture_output=True,
+                    text=True,
+                    timeout=5
+                )
+
+                return {
+                    "success": result.returncode == 0,
+                    "message": f"Typed: {text}",
+                    "error": result.stderr if result.returncode != 0 else None
+                }
+
+            elif platform == "ios":
+                # iOS: xcrun simctl io booted text
+                # NOTE: stock `simctl io` may not provide a `text` subcommand;
+                # treat this call as a placeholder for your own tooling.
+                result = subprocess.run(
+                    ["xcrun", "simctl", "io", "booted", "text", text],
+                    capture_output=True,
+                    text=True,
+                    timeout=5
+                )
+
+                return {
+                    "success": result.returncode == 0,
+                    "message": f"Typed: {text}",
+                    "error": result.stderr if result.returncode != 0 else None
+                }
+
+        except Exception as e:
+            return {"success": False, "error": str(e)}
+
+    # ========================================
+    # Tool 4: Capture Screenshot
+    # ========================================
+    @mcp.tool()
+    async def capture_screenshot(
+        save_path: Annotated[
+            str,
+            Field(description="Local path to save screenshot (e.g., '/tmp/screen.png')")
+        ],
+    ) -> Dict[str, Any]:
+        """
+        Capture a screenshot from the mobile device.
+        """
+        try:
+            if platform == "android":
+                # Android: adb exec-out screencap -p > file
+                result = subprocess.run(
+                    ["adb", "exec-out", "screencap", "-p"],
+                    capture_output=True,
+                    timeout=10
+                )
+
+                if result.returncode == 0:
+                    with open(save_path, "wb") as f:
+                        f.write(result.stdout)
+                    return {
+                        "success": True,
+                        "message": f"Screenshot saved to {save_path}",
+                        "path": save_path
+                    }
+                else:
+                    return {"success": False, "error": result.stderr.decode()}
+
+            elif platform == "ios":
+                # iOS: xcrun simctl io booted screenshot
+                result = subprocess.run(
+                    ["xcrun", "simctl", "io", "booted", "screenshot", save_path],
+                    capture_output=True,
+                    text=True,
+                    timeout=10
+                )
+
+                return {
+                    "success": result.returncode == 0,
+                    "message": f"Screenshot saved to {save_path}",
+                    "path": save_path,
+                    "error": result.stderr if result.returncode != 0 else None
+                }
+
+        except Exception as e:
+            return {"success": False, "error": str(e)}
+
+    # ========================================
+    # Tool 5: Get UI Hierarchy
+    # ========================================
+    @mcp.tool()
+    async def get_ui_tree(
+        format: Annotated[
+            Literal["xml", "json"],
+            Field(description="Output format (xml or json)")
+        ] = "xml",
+    ) -> Dict[str, Any]:
+        """
+        Get the current UI hierarchy/tree from the device.
+        """
+        # NOTE: this example always returns XML; the "json" option is
+        # accepted but not converted.
+        try:
+            if platform == "android":
+                # Android: adb shell uiautomator dump
+                # Dump to device, then pull
+                subprocess.run(
+                    ["adb", "shell", "uiautomator", "dump", "/sdcard/window_dump.xml"],
+                    timeout=10
+                )
+
+                result = subprocess.run(
+                    ["adb", "shell", "cat", "/sdcard/window_dump.xml"],
+                    capture_output=True,
+                    text=True,
+                    timeout=5
+                )
+
+                if result.returncode == 0:
+                    return {
+                        "success": True,
+                        "ui_tree": result.stdout,
+                        "format": "xml"
+                    }
+                else:
+                    return {"success": False, "error": result.stderr}
+
+            elif platform == "ios":
+                # iOS: requires XCUITest or Appium
+                return {
+                    "success": False,
+                    "error": "iOS UI tree requires XCUITest integration",
+                    "note": "Use accessibility inspector or Appium"
+                }
+
+        except Exception as e:
+            return {"success": False, "error": str(e)}
+
+    # ========================================
+    # Tool 6: Launch App
+    # ========================================
+    @mcp.tool()
+    async def launch_app(
+        package_name: Annotated[
+            str,
+            Field(description="App package name (Android) or bundle ID (iOS)")
+        ],
+    ) -> Dict[str, Any]:
+        """
+        Launch an application by package name or bundle ID.
+        """
+        try:
+            if platform == "android":
+                # Android: adb shell monkey
+                result = subprocess.run(
+                    [
+                        "adb", "shell", "monkey", "-p", package_name,
+                        "-c", "android.intent.category.LAUNCHER", "1"
+                    ],
+                    capture_output=True,
+                    text=True,
+                    timeout=10
+                )
+
+                # monkey reports failures such as "monkey aborted" in its
+                # output, so check for that in addition to the exit code.
+                launched = (
+                    result.returncode == 0
+                    and "aborted" not in result.stdout.lower()
+                )
+                return {
+                    "success": launched,
+                    "message": f"Launched {package_name}",
+                    "output": result.stdout
+                }
+
+            elif platform == "ios":
+                # iOS: xcrun simctl launch
+                result = subprocess.run(
+                    ["xcrun", "simctl", "launch", "booted", package_name],
+                    capture_output=True,
+                    text=True,
+                    timeout=10
+                )
+
+                return {
+                    "success": result.returncode == 0,
+                    "message": f"Launched {package_name}",
+                    "error": result.stderr if result.returncode != 0 else None
+                }
+
+        except Exception as e:
+            return {"success": False, "error": str(e)}
+
+    # Start the server
+    mcp.run(transport="streamable-http")
+
+
+def main():
+    """CLI entry point for Mobile MCP server."""
+    parser = argparse.ArgumentParser(description="Mobile MCP Server")
+    parser.add_argument(
+        "--port", type=int, default=8020, help="Port to run the server on"
+    )
+    parser.add_argument(
+        "--host", default="localhost", help="Host to bind the server to"
+    )
+    parser.add_argument(
+        "--platform",
+        choices=["android", "ios"],
+        default="android",
+        help="Mobile platform (android or ios)"
+    )
+    args = parser.parse_args()
+
+    print("=" * 50)
+    print(f"UFO Mobile MCP Server ({args.platform.capitalize()})")
+    print("Mobile device automation via Model Context Protocol")
+    print(f"Running on {args.host}:{args.port}")
+    print("=" * 50)
+
+    create_mobile_mcp_server(host=args.host, port=args.port, platform=args.platform)
+
+
+if __name__ == "__main__":
+    main()
+```
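+
+The XML returned by `get_ui_tree` can be turned into tap targets for `tap_screen`. The sketch below assumes the usual `uiautomator dump` layout, where each `<node>` carries `clickable` and `bounds="[x1,y1][x2,y2]"` attributes. `clickable_centers` and `SAMPLE_DUMP` are hypothetical names used only for illustration.
+
```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

# Matches the "[x1,y1][x2,y2]" bounds format used in UI dumps.
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def clickable_centers(ui_xml: str) -> List[Dict[str, Union[str, int]]]:
    """Return the center point of every clickable node in a UI dump."""
    centers: List[Dict[str, Union[str, int]]] = []
    for node in ET.fromstring(ui_xml).iter("node"):
        match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if node.get("clickable") != "true" or match is None:
            continue  # skip non-clickable nodes and malformed bounds
        x1, y1, x2, y2 = map(int, match.groups())
        centers.append({
            "text": node.get("text", ""),
            "x": (x1 + x2) // 2,
            "y": (y1 + y2) // 2,
        })
    return centers


# Hand-written sample in the assumed dump format (not captured from a device).
SAMPLE_DUMP = """<hierarchy rotation="0">
  <node text="" clickable="false" bounds="[0,0][1080,1920]">
    <node text="OK" clickable="true" bounds="[100,200][300,400]" />
  </node>
</hierarchy>"""

centers = clickable_centers(SAMPLE_DUMP)
```
+
+Each entry can be passed straight to `tap_screen(x=..., y=...)`, which lets the agent tap a control by its label instead of guessing coordinates from a screenshot.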
+
+---
+
+## Tool Definition Best Practices
+
+### 1. Descriptive Tool Names
+
+| ❌ Bad | ✅ Good | Why |
+|--------|---------|-----|
+| `do_thing` | `tap_screen` | Clear action |
+| `cmd` | `execute_command` | Self-documenting |
+| `get` | `get_ui_tree` | Specific purpose |
+
+### 2. Rich Type Annotations
+
+```python
+# ✅ Excellent: Full type hints with descriptions
+@mcp.tool()
+async def tap_screen(
+    x: Annotated[int, Field(description="X coordinate in pixels from left edge")],
+    y: Annotated[int, Field(description="Y coordinate in pixels from top edge")],
+    duration_ms: Annotated[int, Field(description="Tap duration in milliseconds")] = 100,
+) -> Annotated[Dict[str, Any], Field(description="Result dict with 'success' and 'message'")]:
+    ...  # Tool body omitted; see the full server above
+```
+
+### 3. Consistent Return Format
+
+```python
+# ✅ Always return structured dict
+{
+    "success": bool,  # Required: operation status
+    "message": str,   # Optional: human-readable result
+    "error": str,     # Optional: error details if success=False
+    "data": Any,      # Optional: additional result data
+}
+
+# ❌ Don't mix return types
+return True  # Bad: not structured
+raise Exception("Error")  # Bad: the LLM cannot handle a raised exception
+```
+
+### 4. Comprehensive Docstrings
+
+```python
+@mcp.tool()
+async def swipe(start_x: int, start_y: int, end_x: int, end_y: int) -> Dict:
+    """
+    Perform a swipe gesture from start to end coordinates.
+
+    Platform support:
+    - Android: Uses ADB input swipe
+    - iOS: Simulated via multiple taps (requires XCUITest for real swipe)
+
+    Args:
+        start_x: Starting X coordinate (pixels from left)
+        start_y: Starting Y coordinate (pixels from top)
+        end_x: Ending X coordinate
+        end_y: Ending Y coordinate
+
+    Returns:
+        Dict with 'success', 'message', and optional 'error'
+
+    Example:
+        >>> await swipe(100, 500, 100, 100)  # Swipe up
+        {"success": True, "message": "Swiped from (100,500) to (100,100)"}
+    """
+```
+
+---
+
+## Error Handling and Validation
+
+### Input Validation Strategies
+
+```python
+@mcp.tool()
+async def tap_screen(x: int, y: int) -> Dict[str, Any]:
+    """Tap with validation."""
+
+    # 1. Range validation
+    if x < 0 or y < 0:
+        return {
+            "success": False,
+            "error": f"Invalid coordinates: ({x}, {y}). Must be non-negative."
+        }
+
+    # 2. Boundary checks (if screen size known)
+    max_x, max_y = 1080, 1920  # Example resolution
+    if x >= max_x or y >= max_y:  # Valid pixels run from 0 to max - 1
+        return {
+            "success": False,
+            "error": f"Coordinates out of bounds. Screen: {max_x}x{max_y}"
+        }
+
+    # 3. Execute with error handling
+    try:
+        result = subprocess.run(
+            ["adb", "shell", "input", "tap", str(x), str(y)],
+            capture_output=True,
+            text=True,
+            timeout=5,
+        )
+        return {"success": result.returncode == 0}
+    except subprocess.TimeoutExpired:
+        return {"success": False, "error": "Tap command timeout"}
+    except Exception as e:
+        return {"success": False, "error": f"Unexpected error: {str(e)}"}
+```
+
+### Security Validation
+
+```python
+def validate_app_package(package: str) -> bool:
+    """Validate app package name format."""
+    import re
+    # Android: com.example.app
+    android_pattern = r'[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+'
+    # iOS: com.example.App (bundle IDs may also contain hyphens)
+    ios_pattern = r'[a-zA-Z][a-zA-Z0-9_-]*(\.[a-zA-Z][a-zA-Z0-9_-]*)+'
+
+    # fullmatch rejects a trailing newline, which `$` with re.match would accept
+    return bool(
+        re.fullmatch(android_pattern, package)
+        or re.fullmatch(ios_pattern, package)
+    )
+
+@mcp.tool()
+async def launch_app(package_name: str) -> Dict:
+    """Launch app with validation."""
+    if not validate_app_package(package_name):
+        return {
+            "success": False,
+            "error": f"Invalid package name format: {package_name}"
+        }
+    # ... continue execution
+```
+
+### Timeout Strategies
+
+```python
+# Strategy 1: Command-level timeout
+result = subprocess.run([...], timeout=5)
+
+# Strategy 2: Async timeout with cleanup (inside an async tool function)
+proc = await asyncio.create_subprocess_exec(...)
+try:
+    stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=10)
+except asyncio.TimeoutError:
+    proc.kill()  # Clean up process
+    await proc.wait()
+    return {"success": False, "error": "Operation timeout"}
+
+# Strategy 3: Retry with backoff
+async def execute_with_retry(cmd, max_retries=3):
+    for attempt in range(max_retries):
+        try:
+            return await execute_command(cmd)
+        # Before Python 3.11, asyncio.TimeoutError is not the built-in TimeoutError
+        except asyncio.TimeoutError:
+            if attempt == max_retries - 1:
+                raise
+            await asyncio.sleep(2 ** attempt)  # Exponential backoff
+```
+
+---
+
+## Testing Your MCP Server
+
+### Unit Testing
+
+```python
+# tests/test_mobile_mcp_server.py
+
+import subprocess
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+# Assumes the tool functions are exposed at module level for testing.
+from ufo.client.mcp.http_servers.mobile_mcp_server import swipe, tap_screen
+
+
+class TestMobileMCPServer:
+    """Unit tests for Mobile MCP Server tools."""
+
+    @pytest.mark.asyncio
+    @patch('subprocess.run')
+    async def test_tap_screen_success(self, mock_run):
+        """Test successful tap execution."""
+        # Mock subprocess result
+        mock_run.return_value = MagicMock(
+            returncode=0,
+            stdout="",
+            stderr=""
+        )
+
+        result = await tap_screen(x=100, y=200)
+
+        assert result["success"] is True
+        assert "Tapped at (100, 200)" in result["message"]
+        mock_run.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_tap_screen_invalid_coordinates(self):
+        """Test tap with invalid coordinates."""
+        # Relies on the checks from "Input Validation Strategies"
+        result = await tap_screen(x=-10, y=50)
+
+        assert result["success"] is False
+        assert "Invalid coordinates" in result["error"]
+
+    @pytest.mark.asyncio
+    @patch('subprocess.run')
+    async def test_swipe_timeout(self, mock_run):
+        """Test swipe with timeout."""
+        mock_run.side_effect = subprocess.TimeoutExpired(cmd="adb", timeout=5)
+
+        result = await swipe(0, 0, 100, 100)
+
+        assert result["success"] is False
+        # str(TimeoutExpired) reads "Command 'adb' timed out after 5 seconds"
+        assert "timed out" in result["error"].lower()
+```
+
+### Integration Testing
+
+```python
+# tests/integration/test_mcp_server_integration.py
+
+import pytest
+import requests
+from ufo.client.mcp.mcp_server_manager import HTTPMCPServer
+
+
+class TestMCPServerIntegration:
+    """Integration tests for MCP server."""
+
+    @pytest.fixture
+    def mcp_server(self):
+        """Start MCP server for testing."""
+        config = {
+            "host": "localhost",
+            "port": 8020,
+            "path": "/mcp"
+        }
+        server = HTTPMCPServer(config)
+        server.start()
+        yield server
+        server.stop()
+
+    def test_server_health(self, mcp_server):
+        """Test server is reachable."""
+        response = requests.get(f"{mcp_server.server}/health")
+        assert response.status_code == 200
+
+    def test_tap_screen_end_to_end(self, mcp_server):
+        """Test tap screen tool end-to-end."""
+        payload = {
+            "tool": "tap_screen",
+            "parameters": {"x": 100, "y": 200}
+        }
+        response = requests.post(
+            f"{mcp_server.server}/execute",
+            json=payload
+        )
+
+        assert response.status_code == 200
+        result = response.json()
+        assert "success" in result
+```
+
+### Manual Testing
+
+```bash
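+# 1. Start MCP server
+python -m ufo.client.mcp.http_servers.mobile_mcp_server \
+    --host localhost \
+    --port 8020 \
+    --platform android
+
+# 2. Test with curl
+# The streamable HTTP transport speaks JSON-RPC 2.0. A stateful server
+# (stateless_http=False) also expects an `initialize` request first and the
+# returned Mcp-Session-Id header on every later request.
+curl -X POST http://localhost:8020/mcp \
+    -H "Content-Type: application/json" \
+    -H "Accept: application/json, text/event-stream" \
+    -d '{
+      "jsonrpc": "2.0",
+      "id": 1,
+      "method": "tools/call",
+      "params": {
+        "name": "tap_screen",
+        "arguments": {"x": 500, "y": 1000}
+      }
+    }'
+
+# 3. Expected tool result (returned inside the JSON-RPC response envelope)
+{
+  "success": true,
+  "message": "Tapped at (500, 1000)",
+  "platform": "android"
+}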
+```
+
+---
+
+## Summary
+
+**What You've Built**:
+
+- ✅ Platform-specific MCP server with FastMCP
+- ✅ Type-safe tool definitions with Pydantic
+- ✅ Async execution with timeout handling
+- ✅ Security validation and error handling
+- ✅ Comprehensive testing strategy
+
+**Key Takeaways**:
+
+| Concept | Best Practice |
+|---------|---------------|
+| **Tool Design** | Atomic, single-purpose operations |
+| **Type Safety** | Use `Annotated[T, Field(description=...)]` |
+| **Error Handling** | Always return structured dicts, never raise |
+| **Security** | Validate inputs, block dangerous operations |
+| **Async** | Use `asyncio` for non-blocking execution |
+| **Testing** | Unit + integration tests for all tools |
+
+---
+
+## Next Steps
+
+**Continue to**: [Part 3: Client Setup →](client_setup.md)
+
+Learn how to configure the UFO client to connect to your MCP server and enable device agent execution.
+
+---
+
+## Related Documentation
+
+- **[MCP Overview](../../mcp/overview.md)** - Model Context Protocol fundamentals
+- **[Creating MCP Servers](../creating_mcp_servers.md)** - General MCP server tutorial
+- **[FastMCP Documentation](https://github.com/jlowin/fastmcp)** - FastMCP library reference
+- **[AIP Protocol](../../aip/overview.md)** - Agent Interaction Protocol
+
+---
+
+**Previous**: [← Part 1: Core Components](core_components.md)
+**Next**: [Part 3: Client Setup →](client_setup.md)
diff --git a/documents/docs/tutorials/creating_device_agent/overview.md b/documents/docs/tutorials/creating_device_agent/overview.md
new file mode 100644
index 000000000..a5c7c25b2
--- /dev/null
+++ b/documents/docs/tutorials/creating_device_agent/overview.md
@@ -0,0 +1,642 @@
+# Creating a New Device Agent - Complete Tutorial
+
+This comprehensive tutorial teaches you how to create a new device agent (like `MobileAgent`, `AndroidAgent`, or `iOSAgent`) and integrate it with UFO³'s multi-device orchestration system. We'll use **LinuxAgent** as our primary reference implementation.
+
+---
+
+## 📋 Table of Contents
+
+1. [Introduction](#introduction)
+2. [Prerequisites](#prerequisites)
+3. [Understanding Device Agents](#understanding-device-agents)
+4. [LinuxAgent: Reference Implementation](#linuxagent-reference-implementation)
+5. [Architecture Overview](#architecture-overview)
+6. [Tutorial Roadmap](#tutorial-roadmap)
+7. [Quick Start Guide](#quick-start-guide)
+
+---
+
+## Introduction
+
+### What is a Device Agent?
+
+A **Device Agent** is a specialized AI agent that controls and automates tasks on a specific type of device or platform. Unlike traditional third-party agents that extend specific functionality, device agents represent entire computing platforms with their own:
+
+- **Execution Environment**: Device-specific OS, runtime, and APIs
+- **Control Mechanism**: UI automation, CLI commands, or platform APIs
+- **Communication Protocol**: Client-server architecture via WebSocket
+- **MCP Integration**: Device-specific MCP servers for command execution
+
+### Device Agent vs Third-Party Agent
+
+| Aspect | Device Agent | Third-Party Agent |
+|--------|--------------|-------------------|
+| **Scope** | Full platform control (Windows, Linux, Mobile) | Specific functionality (Hardware, Web) |
+| **Architecture** | Client-Server separation | Runs on orchestrator server |
+| **Communication** | WebSocket + AIP Protocol | Direct method calls |
+| **MCP Servers** | Platform-specific MCP servers | Shares MCP servers |
+| **Examples** | WindowsAgent, LinuxAgent, MobileAgent | HardwareAgent, WebAgent |
+| **Deployment** | Separate client process on device | Part of orchestrator |
+
+### When to Create a Device Agent
+
+Create a **Device Agent** when you need to:
+
+- Control an entirely new platform (mobile, IoT, embedded)
+- Execute tasks on remote or distributed devices
+- Integrate with Galaxy multi-device orchestration
+- Isolate execution for security or scalability
+
+Create a **Third-Party Agent** when you need to:
+
+- Extend an existing platform with new capabilities
+- Add specialized tools or APIs
+- Run alongside existing agents
+
+---
+
+## Prerequisites
+
+Before starting this tutorial, ensure you have:
+
+### Knowledge Requirements
+
+- ✅ **Python 3.10+**: Intermediate Python programming skills
+- ✅ **Async Programming**: Understanding of `async`/`await` patterns
+- ✅ **UFO³ Basics**: Familiarity with [Agent Architecture](../../infrastructure/agents/overview.md)
+- ✅ **MCP Protocol**: Understanding of [Model Context Protocol](../../mcp/overview.md)
+- ✅ **WebSocket**: Basic knowledge of WebSocket communication
+
+### Recommended Reading
+
+| Priority | Topic | Link | Time |
+|----------|-------|------|------|
+| 🥇 | **Agent Architecture Overview** | [Infrastructure/Agents](../../infrastructure/agents/overview.md) | 20 min |
+| 🥇 | **LinuxAgent Quick Start** | [Quick Start: Linux](../../getting_started/quick_start_linux.md) | 15 min |
+| 🥈 | **Server-Client Architecture** | [Server Overview](../../server/overview.md), [Client Overview](../../client/overview.md) | 30 min |
+| 🥈 | **MCP Integration** | [MCP Overview](../../mcp/overview.md) | 20 min |
+| 🥉 | **AIP Protocol** | [AIP Protocol](../../aip/overview.md) | 15 min |
+
+### Development Environment
+
+```bash
+# Clone UFO³ repository
+git clone https://github.com/microsoft/UFO.git
+cd UFO
+
+# Install dependencies
+pip install -r requirements.txt
+
+# Verify installation
+python -c "import ufo; print('UFO³ installed successfully')"
+```
+
+---
+
+## Understanding Device Agents
+
+### Three-Layer Architecture
+
+All device agents in UFO³ follow a **unified three-layer architecture**:
+
+```mermaid
+graph TB
+ subgraph "Device Agent Architecture"
+ subgraph "Level-1: State Layer (FSM)"
+ S1[AgentState]
+ S2[State Machine]
+ S3[State Transitions]
+ S1 --> S2 --> S3
+ end
+
+ subgraph "Level-2: Strategy Layer (Execution Logic)"
+ P1[ProcessorTemplate]
+ P2[DATA_COLLECTION]
+ P3[LLM_INTERACTION]
+ P4[ACTION_EXECUTION]
+ P5[MEMORY_UPDATE]
+ P1 --> P2 --> P3 --> P4 --> P5
+ end
+
+ subgraph "Level-3: Command Layer (System Interface)"
+ C1[CommandDispatcher]
+ C2[MCP Tools]
+ C3[Device Commands]
+ C1 --> C2 --> C3
+ end
+
+ S3 -->|delegates to| P1
+ P5 -->|executes via| C1
+ end
+
+ style S1 fill:#e1f5ff
+ style P1 fill:#fff3e0
+ style C1 fill:#f3e5f5
+```
+
+**Key Layers**:
+
+1. **State Layer (Level-1)**: Finite State Machine controlling agent lifecycle
+2. **Strategy Layer (Level-2)**: Processing pipeline with modular strategies
+3. **Command Layer (Level-3)**: Atomic system operations via MCP
+
+For detailed architecture, see [Agent Architecture Documentation](../../infrastructure/agents/overview.md).
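+
+To make the layering concrete, here is a toy model of how a request passes from the state layer through the four processing phases to the command layer. It is a sketch only: `ToyAgent`, `ToyProcessor`, `ToyDispatcher`, and the lambda strategies are invented for illustration and do not mirror UFO's actual classes.
+
```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Phase(Enum):
    DATA_COLLECTION = 1
    LLM_INTERACTION = 2
    ACTION_EXECUTION = 3
    MEMORY_UPDATE = 4


@dataclass
class ToyDispatcher:
    """Level-3: the only layer that touches the device."""
    sent: List[str] = field(default_factory=list)

    def execute(self, command: str) -> str:
        self.sent.append(command)
        return f"ran {command}"


@dataclass
class ToyProcessor:
    """Level-2: runs one strategy per phase, in declaration order."""
    dispatcher: ToyDispatcher
    trace: List[str] = field(default_factory=list)

    def process(self, request: str) -> Dict[str, str]:
        ctx: Dict[str, str] = {"request": request}
        strategies: Dict[Phase, Callable[[], None]] = {
            Phase.DATA_COLLECTION: lambda: ctx.update(observation="screenshot"),
            Phase.LLM_INTERACTION: lambda: ctx.update(command="tap 10 20"),
            Phase.ACTION_EXECUTION: lambda: ctx.update(
                result=self.dispatcher.execute(ctx["command"])
            ),
            Phase.MEMORY_UPDATE: lambda: ctx.update(remembered=ctx["result"]),
        }
        for phase in Phase:  # Enum iteration follows definition order
            strategies[phase]()
            self.trace.append(phase.name)
        return ctx


class ToyAgent:
    """Level-1: a two-state machine (CONTINUE -> FINISH)."""

    def __init__(self, processor: ToyProcessor) -> None:
        self.state = "CONTINUE"
        self.processor = processor

    def handle(self, request: str) -> Dict[str, str]:
        ctx = self.processor.process(request)  # state delegates to processor
        self.state = "FINISH"
        return ctx


dispatcher = ToyDispatcher()
agent = ToyAgent(ToyProcessor(dispatcher))
context = agent.handle("open settings")
```
+
+The point of the split is that only the dispatcher knows how to reach the device, so porting to a new platform mostly means swapping the strategies and the command layer while the state machine stays the same.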
+
+---
+
+### Server-Client Separation
+
+Device agents use a **server-client architecture** for security and scalability:
+
+```mermaid
+graph LR
+ subgraph "Server Side (Orchestrator)"
+ Server[Device Agent Server]
+ State[State Machine]
+ Processor[Strategy Processor]
+ LLM[LLM Service]
+
+ Server --> State
+ Server --> Processor
+ Processor -.-> LLM
+ end
+
+ subgraph "Communication"
+ AIP[AIP Protocol WebSocket]
+ end
+
+ subgraph "Client Side (Device)"
+ Client[Device Client]
+ MCP[MCP Server Manager]
+ Tools[Platform Tools]
+ OS[Device OS]
+
+ Client --> MCP
+ MCP --> Tools
+ Tools --> OS
+ end
+
+ Server <-->|Commands/Results| AIP
+ AIP <-->|Commands/Results| Client
+
+ style Server fill:#e1f5ff
+ style Client fill:#c8e6c9
+ style AIP fill:#fff3e0
+```
+
+**Component Responsibilities**:
+
+| Component | Location | Responsibilities | Security |
+|-----------|----------|------------------|----------|
+| **Agent Server** | Orchestrator | Reasoning, planning, state management | Untrusted (LLM-driven) |
+| **Device Client** | Target Device | Command execution, resource access | Trusted (validated operations) |
+| **AIP Protocol** | Network | Message transport, serialization | Encrypted channel |
+
+**Separation Benefits**:
+
+- **Security**: Isolates LLM reasoning from system-level execution
+- **Scalability**: Single orchestrator manages multiple devices
+- **Flexibility**: Clients run on resource-constrained devices (mobile, IoT)
+- **Safety**: Client validates all commands before execution
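+
+The safety point above can be sketched as an allow-list check that the client runs before handing a command to an MCP tool. This is a minimal illustration under stated assumptions: `ALLOWED_TOOLS` and `validate_command` are hypothetical names, and a real client would derive the rules from its registered tools.
+
```python
from typing import Any, Callable, Dict

# Hypothetical allow-list: tool name -> validator for its arguments.
ALLOWED_TOOLS: Dict[str, Callable[[Dict[str, Any]], bool]] = {
    "tap_screen": lambda args: (
        isinstance(args.get("x"), int)
        and isinstance(args.get("y"), int)
        and args["x"] >= 0
        and args["y"] >= 0
    ),
    "launch_app": lambda args: isinstance(args.get("package_name"), str),
}


def validate_command(tool: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Reject unknown tools and malformed arguments before execution."""
    validator = ALLOWED_TOOLS.get(tool)
    if validator is None:
        return {"success": False, "error": f"Tool not allowed: {tool}"}
    if not validator(arguments):
        return {"success": False, "error": f"Invalid arguments for {tool}"}
    return {"success": True}


accepted = validate_command("tap_screen", {"x": 10, "y": 20})
unknown = validate_command("format_disk", {})
malformed = validate_command("tap_screen", {"x": -1, "y": 20})
```
+
+Because the server side is driven by LLM output, the check runs on the device: even a malformed or malicious command is stopped before it reaches the OS.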
+
+---
+
+## LinuxAgent: Reference Implementation
+
+### Why LinuxAgent as Reference?
+
+**LinuxAgent** is the ideal reference for creating new device agents because:
+
+- ✅ **Simple Architecture**: Single-tier agent (no HostAgent delegation)
+- ✅ **Clear Separation**: Clean server-client boundary
+- ✅ **Well-Documented**: Comprehensive code and documentation
+- ✅ **Production-Ready**: Battle-tested in real deployments
+- ✅ **Minimal Complexity**: Focuses on core device agent patterns
+
+### LinuxAgent Components
+
+```mermaid
+graph TB
+ subgraph "Server Side (ufo/agents/)"
+ LA[LinuxAgent Class customized_agent.py]
+ LAP[LinuxAgentProcessor customized_agent_processor.py]
+ LAS[LinuxAgent Strategies linux_agent_strategy.py]
+ LAST[LinuxAgent States linux_agent_state.py]
+
+ LA --> LAP
+ LAP --> LAS
+ LA --> LAST
+ end
+
+ subgraph "Client Side (ufo/client/)"
+ Client[UFO Client client.py]
+ MCP[MCP Server Manager mcp_server_manager.py]
+ LinuxMCP[Linux MCP Server linux_mcp_server.py]
+
+ Client --> MCP
+ MCP --> LinuxMCP
+ end
+
+ subgraph "Configuration"
+ Config[third_party.yaml]
+ Devices[devices.yaml]
+ Prompts[Prompt Templates]
+ end
+
+ LA -.reads.-> Config
+ Client -.reads.-> Devices
+ LA -.uses.-> Prompts
+
+ style LA fill:#c8e6c9
+ style LAP fill:#c8e6c9
+ style LAS fill:#c8e6c9
+ style LAST fill:#c8e6c9
+ style Client fill:#e1f5ff
+ style MCP fill:#e1f5ff
+ style LinuxMCP fill:#e1f5ff
+```
+
+**File Locations**:
+
+| Component | File Path | Purpose |
+|-----------|-----------|---------|
+| **Agent Class** | `ufo/agents/agent/customized_agent.py` | LinuxAgent definition |
+| **Processor** | `ufo/agents/processors/customized/customized_agent_processor.py` | LinuxAgentProcessor |
+| **Strategies** | `ufo/agents/processors/strategies/linux_agent_strategy.py` | LLM & Action strategies |
+| **States** | `ufo/agents/states/linux_agent_state.py` | State machine states |
+| **Prompter** | `ufo/prompter/customized/linux_agent_prompter.py` | Prompt construction |
+| **Client** | `ufo/client/client.py` | Device client entry point |
+| **MCP Server** | `ufo/client/mcp/http_servers/linux_mcp_server.py` | Command execution |
+
+---
+
+### LinuxAgent Architecture Diagram
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant Server as LinuxAgent Server
+ participant AIP as AIP Protocol
+ participant Client as Linux Client
+ participant MCP as Linux MCP Server
+ participant Shell as Bash Shell
+
+ User->>Server: User Request: "List files in /tmp"
+
+ Server->>Server: State: ContinueLinuxAgentState
+ Server->>Server: Processor: LinuxAgentProcessor
+
+ Server->>Server: Strategy: LLM_INTERACTION
+ Note over Server: Construct prompt, call LLM
+ Server->>Server: LLM Response: execute_command("ls -la /tmp")
+
+ Server->>Server: Strategy: ACTION_EXECUTION
+ Server->>AIP: COMMAND: execute_command
+ AIP->>Client: WebSocket: COMMAND
+
+ Client->>MCP: Call MCP Tool: execute_command
+ MCP->>Shell: Execute: ls -la /tmp
+ Shell-->>MCP: stdout, stderr, exit_code
+ MCP-->>Client: Result
+ Client->>AIP: WebSocket: RESULT
+ AIP->>Server: RESULT
+
+ Server->>Server: Strategy: MEMORY_UPDATE
+ Server->>Server: Update memory & blackboard
+
+ Server->>Server: State Transition: FINISH
+ Server->>User: Task Complete
+```
+
+**Key Execution Flow**:
+
+1. **User Request** → LinuxAgent Server receives request
+2. **State Machine** → Activates `ContinueLinuxAgentState`
+3. **Processor** → Executes `LinuxAgentProcessor` strategies
+4. **LLM Interaction** → Generates shell command
+5. **Action Execution** → Sends command via AIP to client
+6. **MCP Execution** → Client executes via Linux MCP Server
+7. **Result Handling** → Server receives result, updates memory
+8. **State Transition** → Moves to `FINISH` state
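+
+Steps 5 to 7 amount to a command/result round trip. The sketch below simulates it in memory with JSON messages. The message fields (`type`, `id`, `tool`, `arguments`, `result`) are a made-up schema for illustration, not the actual AIP wire format, and `fake_execute_command` returns canned output instead of running a shell.
+
```python
import json
from typing import Any, Callable, Dict


def make_command(tool: str, arguments: Dict[str, Any], request_id: int) -> str:
    """Server side: serialize a COMMAND message (made-up schema)."""
    return json.dumps(
        {"type": "COMMAND", "id": request_id, "tool": tool, "arguments": arguments}
    )


def handle_command(raw: str, tools: Dict[str, Callable[..., Dict[str, Any]]]) -> str:
    """Client side: run the named tool and serialize a RESULT message."""
    message = json.loads(raw)
    tool = tools.get(message["tool"])
    if tool is None:
        result = {"success": False, "error": f"Unknown tool: {message['tool']}"}
    else:
        result = tool(**message["arguments"])
    return json.dumps({"type": "RESULT", "id": message["id"], "result": result})


def fake_execute_command(command: str) -> Dict[str, Any]:
    """Stand-in for the Linux MCP tool; no shell is invoked."""
    return {
        "success": True,
        "exit_code": 0,
        "stdout": f"(output of {command})",
        "stderr": "",
    }


raw_command = make_command("execute_command", {"command": "ls -la /tmp"}, request_id=1)
raw_result = handle_command(raw_command, {"execute_command": fake_execute_command})
reply = json.loads(raw_result)
```
+
+Echoing the request `id` in the result lets the server match each reply to the command that produced it, which matters once several commands are in flight over one WebSocket.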
+
+---
+
+## Architecture Overview
+
+### Complete Device Agent Architecture
+
+When creating a new device agent (e.g., `MobileAgent`), you'll implement these components:
+
+```mermaid
+graph TB
+ subgraph "1. Agent Definition"
+ A1[Agent Class MobileAgent]
+ A2[Processor MobileAgentProcessor]
+ A3[State Manager MobileAgentStateManager]
+ end
+
+ subgraph "2. Processing Strategies"
+ S1[DATA_COLLECTION Screenshot, UI Tree]
+ S2[LLM_INTERACTION Prompt Construction]
+ S3[ACTION_EXECUTION Command Dispatch]
+ S4[MEMORY_UPDATE Context Update]
+ end
+
+ subgraph "3. MCP Server"
+ M1[MCP Server mobile_mcp_server.py]
+ M2[MCP Tools tap, swipe, type, etc.]
+ end
+
+ subgraph "4. Configuration"
+ C1[third_party.yaml Agent Config]
+ C2[devices.yaml Device Registry]
+ C3[Prompt Templates LLM Prompts]
+ end
+
+ subgraph "5. Client"
+ CL1[Device Client client.py]
+ CL2[MCP Manager mcp_server_manager.py]
+ end
+
+ A1 --> A2
+ A2 --> S1 & S2 & S3 & S4
+ S3 --> M1
+ M1 --> M2
+ A1 -.reads.-> C1
+ CL1 --> CL2
+ CL2 --> M1
+ CL1 -.reads.-> C2
+ A2 -.uses.-> C3
+
+ style A1 fill:#c8e6c9
+ style A2 fill:#c8e6c9
+ style A3 fill:#c8e6c9
+ style M1 fill:#e1f5ff
+ style CL1 fill:#e1f5ff
+```
+
+**Implementation Checklist**:
+
+- [ ] **Agent Class**: Define `MobileAgent` inheriting from `CustomizedAgent`
+- [ ] **Processor**: Create `MobileAgentProcessor` with custom strategies
+- [ ] **State Manager**: Implement `MobileAgentStateManager` and states
+- [ ] **Strategies**: Build platform-specific LLM and action strategies
+- [ ] **MCP Server**: Develop MCP server with platform tools
+- [ ] **Prompter**: Create custom prompter for mobile context
+- [ ] **Client Setup**: Configure client to run on mobile device
+- [ ] **Configuration**: Add agent config to `third_party.yaml`
+- [ ] **Device Registry**: Register device in `devices.yaml`
+- [ ] **Prompt Templates**: Write LLM prompt templates
+
+---
+
+## Tutorial Roadmap
+
+This tutorial is split into **6 detailed guides**:
+
+### 📘 Part 1: [Core Components](core_components.md)
+
+Learn to implement the **server-side components**:
+
+- Agent Class definition
+- Processor and strategies
+- State Manager and states
+- Prompter for LLM interaction
+
+**Time**: 45 minutes
+**Difficulty**: ⭐⭐⭐
+
+---
+
+### 📘 Part 2: [MCP Server Development](mcp_server.md)
+
+Create a **platform-specific MCP server**:
+
+- MCP server architecture
+- Defining MCP tools
+- Command execution logic
+- Error handling and validation
+
+**Time**: 30 minutes
+**Difficulty**: ⭐⭐
+
+---
+
+### 📘 Part 3: [Client Configuration](client_setup.md)
+
+Set up the **device client**:
+
+- Client initialization
+- MCP server manager integration
+- WebSocket connection setup
+- Platform detection
+
+**Time**: 20 minutes
+**Difficulty**: ⭐⭐
+
+---
+
+### 📘 Part 4: [Configuration & Deployment](configuration.md)
+
+Configure and deploy your agent:
+
+- `third_party.yaml` configuration
+- `devices.yaml` device registration
+- Prompt template creation
+- Galaxy integration
+
+**Time**: 25 minutes
+**Difficulty**: ⭐⭐
+
+---
+
+### 📘 Part 5: [Testing & Debugging](testing.md)
+
+Test and debug your implementation:
+
+- Unit testing strategies
+- Integration testing
+- Debugging techniques
+- Common issues and solutions
+
+**Time**: 30 minutes
+**Difficulty**: ⭐⭐⭐
+
+---
+
+### 📘 Part 6: [Complete Example: MobileAgent](example_mobile_agent.md)
+
+**Hands-on walkthrough** creating `MobileAgent`:
+
+- Step-by-step implementation
+- Android/iOS platform specifics
+- UI Automator integration
+- Complete working example
+
+**Time**: 60 minutes
+**Difficulty**: ⭐⭐⭐⭐
+
+---
+
+## Quick Start Guide
+
+For experienced developers, here's a **minimal implementation checklist**:
+
+### 1️⃣ Create Agent Class
+
+```python
+# ufo/agents/agent/customized_agent.py
+
+@AgentRegistry.register(
+    agent_name="MobileAgent",
+    third_party=True,
+    processor_cls=MobileAgentProcessor
+)
+class MobileAgent(CustomizedAgent):
+    def __init__(self, name, main_prompt, example_prompt):
+        super().__init__(name, main_prompt, example_prompt,
+                         process_name=None, app_root_name=None, is_visual=None)
+        self._blackboard = Blackboard()
+        self.set_state(self.default_state)
+        self._context_provision_executed = False
+
+    @property
+    def default_state(self):
+        return ContinueMobileAgentState()
+```
+
+### 2️⃣ Create Processor
+
+```python
+# ufo/agents/processors/customized/customized_agent_processor.py
+
+class MobileAgentProcessor(CustomizedProcessor):
+    def _setup_strategies(self):
+        # Compose multiple data collection strategies
+        self.strategies[ProcessingPhase.DATA_COLLECTION] = ComposedStrategy(
+            strategies=[
+                MobileScreenshotCaptureStrategy(fail_fast=True),
+                MobileAppsCollectionStrategy(fail_fast=False),
+                MobileControlsCollectionStrategy(fail_fast=False),
+            ],
+            name="MobileDataCollectionStrategy",
+            fail_fast=True,
+        )
+
+        self.strategies[ProcessingPhase.LLM_INTERACTION] = (
+            MobileLLMInteractionStrategy(fail_fast=True)
+        )
+        self.strategies[ProcessingPhase.ACTION_EXECUTION] = (
+            MobileActionExecutionStrategy(fail_fast=False)
+        )
+        self.strategies[ProcessingPhase.MEMORY_UPDATE] = (
+            AppMemoryUpdateStrategy(fail_fast=False)
+        )
+```
+
+### 3️⃣ Create MCP Server
+
+```python
+# ufo/client/mcp/http_servers/mobile_mcp_server.py
+
+def create_mobile_mcp_server(host="localhost", port=8020):
+    mcp = FastMCP("Mobile MCP Server", stateless_http=False,
+                  json_response=True, host=host, port=port)
+
+    @mcp.tool()
+    async def tap_element(x: int, y: int) -> dict:
+        # Execute tap via ADB or platform API
+        pass
+
+    mcp.run(transport="streamable-http")
+```
+
+### 4️⃣ Configure Agent
+
+```yaml
+# config/ufo/third_party.yaml
+
+ENABLED_THIRD_PARTY_AGENTS: ["MobileAgent"]
+
+THIRD_PARTY_AGENT_CONFIG:
+  MobileAgent:
+    VISUAL_MODE: True
+    AGENT_NAME: "MobileAgent"
+    APPAGENT_PROMPT: "ufo/prompts/third_party/mobile_agent.yaml"
+    APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/third_party/mobile_agent_example.yaml"
+    INTRODUCTION: "MobileAgent controls Android/iOS devices..."
+```
+
+### 5️⃣ Register Device
+
+```yaml
+# config/galaxy/devices.yaml
+
+devices:
+  - device_id: "mobile_agent_1"
+    server_url: "ws://localhost:5010/ws"
+    os: "android"
+    capabilities: ["ui_automation", "app_testing"]
+    metadata:
+      device_model: "Pixel 6"
+      android_version: "13"
+    max_retries: 5
+```
+
+### 6️⃣ Start Server & Client
+
+```bash
+# Terminal 1: Start Agent Server
+python -m ufo.server.app --port 5010
+
+# Terminal 2: Start Device Client
+python -m ufo.client.client \
+  --ws --ws-server ws://localhost:5010/ws \
+  --client-id mobile_agent_1 \
+  --platform android
+
+# Terminal 3: Start MCP Server (on device or accessible endpoint)
+python -m ufo.client.mcp.http_servers.mobile_mcp_server --port 8020
+```
+
+---
+
+## Next Steps
+
+**Ready to Build Your Device Agent?**
+
+Start with Part 1: [Core Components →](core_components.md)
+
+Or jump to a specific topic:
+
+- [MCP Server Development](mcp_server.md)
+- [Configuration & Deployment](configuration.md)
+- [Complete Example: MobileAgent](example_mobile_agent.md)
+
+---
+
+## Related Documentation
+
+- **[Agent Architecture](../../infrastructure/agents/overview.md)** - Three-layer architecture deep dive
+- **[Linux Agent Quick Start](../../getting_started/quick_start_linux.md)** - LinuxAgent deployment guide
+- **[Server Overview](../../server/overview.md)** - Server-side orchestration
+- **[Client Overview](../../client/overview.md)** - Client-side execution
+- **[MCP Overview](../../mcp/overview.md)** - Model Context Protocol
+- **[AIP Protocol](../../aip/overview.md)** - Agent Interaction Protocol
+- **[Creating Third-Party Agents](../creating_third_party_agents.md)** - Third-party agent tutorial
+
+---
+
+## Summary
+
+**Key Takeaways**:
+
+- **Device Agents** control entire platforms (Windows, Linux, Mobile)
+- **Server-Client Architecture** separates reasoning from execution
+- **Three-Layer Design** provides modular, extensible framework
+- **LinuxAgent** is the best reference implementation
+- **6-Part Tutorial** covers all aspects of device agent creation
+- **MCP Integration** enables platform-specific command execution
+- **Galaxy Integration** supports multi-device orchestration
+
+**Ready to build your first device agent? Let's get started!** 🚀
+
diff --git a/documents/docs/tutorials/creating_device_agent/testing.md b/documents/docs/tutorials/creating_device_agent/testing.md
new file mode 100644
index 000000000..0a4aa147a
--- /dev/null
+++ b/documents/docs/tutorials/creating_device_agent/testing.md
@@ -0,0 +1,50 @@
+# Part 5: Testing & Debugging
+
+**Note**: This tutorial is currently under development. Check back soon for comprehensive testing and debugging guidance.
+
+## What You'll Learn
+
+- Unit testing strategies
+- Integration testing
+- Debugging techniques
+- Common issues and solutions
+- Performance optimization
+
+## Temporary Quick Guide
+
+### Basic Testing
+
+```python
+# tests/test_mobile_agent.py
+
+import pytest
+from ufo.agents.agent.customized_agent import MobileAgent
+
+def test_agent_initialization():
+    # Constructor arguments follow the Quick Start `MobileAgent.__init__` signature
+    agent = MobileAgent(
+        name="test_agent",
+        main_prompt="ufo/prompts/third_party/mobile_agent.yaml",
+        example_prompt="ufo/prompts/third_party/mobile_agent_example.yaml",
+    )
+    assert agent.name == "test_agent"
+    assert agent.default_state is not None
+```
+
+### Common Issues
+
+| Issue | Solution |
+|-------|----------|
+| Agent not registered | Check `@AgentRegistry.register()` decorator |
+| MCP server not responding | Verify MCP server is running on correct port |
+| WebSocket connection failed | Check server URL and network connectivity |
+
+## Related Documentation
+
+- **[Testing Best Practices](../../infrastructure/agents/overview.md#best-practices)** - Agent testing
+- **[Troubleshooting](../../getting_started/quick_start_linux.md#common-issues-troubleshooting)** - Common issues
+
+---
+
+**Previous**: [← Part 4: Configuration](configuration.md)
+**Next**: [Part 6: Complete Example →](example_mobile_agent.md)
diff --git a/documents/docs/tutorials/creating_mcp_servers.md b/documents/docs/tutorials/creating_mcp_servers.md
new file mode 100644
index 000000000..2ae2cb51a
--- /dev/null
+++ b/documents/docs/tutorials/creating_mcp_servers.md
@@ -0,0 +1,1284 @@
+# Creating Custom MCP Servers - Complete Tutorial
+
+This tutorial teaches you how to create, register, and deploy custom MCP servers for UFO² agents. You'll learn to build **local**, **HTTP**, and **stdio** MCP servers, and how to register them with different agents.
+
+**Prerequisites**: Basic Python knowledge, familiarity with [MCP Overview](../mcp/overview.md) and [MCP Configuration](../mcp/configuration.md). Review [Built-in Local Servers](../mcp/local_servers.md) as examples.
+
+---
+
+## Table of Contents
+
+1. [Overview](#overview)
+2. [Local MCP Servers](#local-mcp-servers)
+3. [HTTP MCP Servers](#http-mcp-servers)
+4. [Stdio MCP Servers](#stdio-mcp-servers)
+5. [Registering Servers with Agents](#registering-servers-with-agents)
+6. [Best Practices](#best-practices)
+7. [Troubleshooting](#troubleshooting)
+
+---
+
+## Overview
+
+### MCP Server Types
+
+UFO² supports three deployment models:
+
+| Type | Deployment | Use Case | Complexity |
+|------|------------|----------|------------|
+| **Local** | In-process with agent | Fast, built-in tools | ⭐ Simple |
+| **HTTP** | Standalone HTTP server | Cross-platform, remote control | ⭐⭐ Moderate |
+| **Stdio** | Child process (stdin/stdout) | Process isolation, third-party tools | ⭐⭐⭐ Advanced |
+
+### Server Categories
+
+All MCP servers fall into two categories:
+
+| Category | Purpose | LLM Selectable? | Auto-Invoked? |
+|----------|---------|-----------------|---------------|
+| **Data Collection** | Read-only observation | ❌ No | ✅ Yes |
+| **Action** | State-changing execution | ✅ Yes | ❌ No |
+
+**Tool Selection:**
+- **Data Collection tools**: Automatically invoked by the framework to build observation prompts
+- **Action tools**: LLM agent actively selects which tool to execute at each step
+
+**Important**: Write clear docstrings and type annotations - they become LLM instructions!
+
+---
+
+## Local MCP Servers
+
+Local servers run **in-process** with the UFO² agent, providing the fastest tool access.
+
+### Step 1: Create Your Server
+
+Create a Python file in `ufo/client/mcp/local_servers/` (or your custom location):
+
+```python
+# File: ufo/client/mcp/local_servers/my_custom_server.py
+
+from typing import Annotated
+from fastmcp import FastMCP
+from pydantic import Field
+from ufo.client.mcp.mcp_registry import MCPRegistry
+
+
+@MCPRegistry.register_factory_decorator("MyCustomExecutor")
+def create_my_custom_server(*args, **kwargs) -> FastMCP:
+    """
+    Create a custom MCP server for specialized automation.
+    Factory function registered with MCPRegistry for lazy initialization.
+
+    :return: FastMCP instance with custom tools.
+    """
+
+    # Create FastMCP instance
+    mcp = FastMCP("My Custom MCP Server")
+
+    # Define tools using @mcp.tool() decorator
+    @mcp.tool()
+    def greet_user(
+        name: Annotated[str, Field(description="The name of the user to greet.")],
+        formal: Annotated[bool, Field(description="Use formal greeting?")] = False,
+    ) -> Annotated[str, Field(description="The greeting message.")]:
+        """
+        Greet a user with a customized message.
+        Use formal=True for business contexts, False for casual.
+        """
+        if formal:
+            return f"Good day, {name}. How may I assist you?"
+        else:
+            return f"Hey {name}! What's up?"
+
+    @mcp.tool()
+    def calculate_sum(
+        numbers: Annotated[
+            list[int],
+            Field(description="List of integers to sum.")
+        ],
+    ) -> Annotated[int, Field(description="The sum of all numbers.")]:
+        """
+        Calculate the sum of a list of numbers.
+        Useful for quick arithmetic operations.
+        """
+        return sum(numbers)
+
+    return mcp
+```
+
+!!!warning "Critical Design Rules"
+    1. **Use `@MCPRegistry.register_factory_decorator("Namespace")`** to register the factory
+    2. **Factory function must return a `FastMCP` instance**
+    3. **Use `@mcp.tool()` decorator** for each tool
+    4. **Write detailed docstrings** - they become LLM instructions
+    5. **Use `Annotated[Type, Field(description="...")]`** for all parameters and return values
+    6. **Namespace must be unique** across all servers
+
+### Step 2: Import the Server
+
+Add your server to `ufo/client/mcp/local_servers/__init__.py`:
+
+```python
+# File: ufo/client/mcp/local_servers/__init__.py
+
+from .my_custom_server import create_my_custom_server
+# ... other imports
+
+__all__ = [
+    "create_my_custom_server",
+    # ... other exports
+]
+```
+
+### Step 3: Configure in mcp.yaml
+
+Add your server to the appropriate agent in `config/ufo/mcp.yaml`:
+
+```yaml
+# Register the server under the agent that should use it.
+# A single agent entry can hold both categories.
+CustomAgent:
+  default:
+    # Action server (LLM-selectable)
+    action:
+      - namespace: MyCustomExecutor
+        type: local
+        reset: false
+
+    # Data collection server (auto-invoked)
+    data_collection:
+      - namespace: MyCustomCollector
+        type: local
+        reset: false
+
+
+### Step 4: Test Your Server
+
+Test locally before integration:
+
+```python
+# File: test_my_server.py
+
+import asyncio
+from fastmcp.client import Client
+from ufo.client.mcp.local_servers.my_custom_server import create_my_custom_server
+
+
+async def test_server():
+    """Test the custom MCP server."""
+    server = create_my_custom_server()
+
+    async with Client(server) as client:
+        # List available tools
+        tools = await client.list_tools()
+        print(f"Available tools: {[t.name for t in tools]}")
+
+        # Test greet_user tool
+        result = await client.call_tool(
+            "greet_user",
+            arguments={"name": "Alice", "formal": True}
+        )
+        print(f"Greeting: {result.data}")
+
+        # Test calculate_sum tool
+        result = await client.call_tool(
+            "calculate_sum",
+            arguments={"numbers": [1, 2, 3, 4, 5]}
+        )
+        print(f"Sum: {result.data}")
+
+
+if __name__ == "__main__":
+    asyncio.run(test_server())
+```
+
+### Example: Application-Specific Server
+
+Here's a real-world example - a server for Chrome browser automation. For more details on wrapping application native APIs, see [Wrapping App Native API](creating_app_agent/warpping_app_native_api.md).
+
+```python
+# File: ufo/client/mcp/local_servers/chrome_executor.py
+
+from typing import Annotated, Optional
+from fastmcp import FastMCP
+from pydantic import Field
+from ufo.client.mcp.mcp_registry import MCPRegistry
+from ufo.automator.puppeteer import AppPuppeteer
+from ufo.automator.action_execution import ActionExecutor
+from ufo.agents.processors.schemas.actions import ActionCommandInfo
+
+
+@MCPRegistry.register_factory_decorator("ChromeExecutor")
+def create_chrome_executor(process_name: str, *args, **kwargs) -> FastMCP:
+    """
+    Create a Chrome-specific automation server.
+
+    :param process_name: Chrome process name for UI automation.
+    :return: FastMCP instance for Chrome automation.
+    """
+
+    # Initialize puppeteer for Chrome
+    puppeteer = AppPuppeteer(
+        process_name=process_name,
+        app_root_name="chrome.exe",
+    )
+    executor = ActionExecutor()
+
+    def _execute(action: ActionCommandInfo) -> dict:
+        """Execute action via puppeteer."""
+        return executor.execute(action, puppeteer, control_dict={})
+
+    mcp = FastMCP("Chrome Automation MCP Server")
+
+    @mcp.tool()
+    def navigate_to_url(
+        url: Annotated[str, Field(description="The URL to navigate to.")],
+    ) -> Annotated[dict, Field(description="Navigation result.")]:
+        """
+        Navigate Chrome to a specific URL.
+        Example: navigate_to_url(url="https://www.google.com")
+        """
+        action = ActionCommandInfo(
+            function="navigate",
+            arguments={"url": url},
+        )
+        return _execute(action)
+
+    @mcp.tool()
+    def search_in_page(
+        query: Annotated[str, Field(description="Search query text.")],
+        case_sensitive: Annotated[
+            bool, Field(description="Case-sensitive search?")
+        ] = False,
+    ) -> Annotated[dict, Field(description="Search results.")]:
+        """
+        Search for text in the current Chrome page.
+        Returns the number of matches found.
+        """
+        action = ActionCommandInfo(
+            function="find_in_page",
+            arguments={"query": query, "case_sensitive": case_sensitive},
+        )
+        return _execute(action)
+
+    @mcp.tool()
+    def get_page_title() -> Annotated[dict, Field(description="Result containing the page title.")]:
+        """
+        Get the title of the current Chrome page.
+        Useful for verifying page navigation.
+        """
+        action = ActionCommandInfo(function="get_title", arguments={})
+        return _execute(action)
+
+    return mcp
+```
+
+**Configuration:**
+
+```yaml
+AppAgent:
+  chrome.exe:
+    data_collection:
+      - namespace: UICollector
+        type: local
+    action:
+      - namespace: AppUIExecutor    # Generic UI automation
+        type: local
+      - namespace: ChromeExecutor   # Chrome-specific tools
+        type: local
+        reset: true                 # Reset when switching tabs/windows
+```
+
+---
+
+## HTTP MCP Servers
+
+HTTP servers run as **standalone services**, enabling cross-platform automation and distributed workflows.
+
+### Step 1: Create HTTP Server
+
+Create a standalone Python script:
+
+```python
+# File: ufo/client/mcp/http_servers/my_http_server.py
+
+import argparse
+from typing import Annotated, Any, Dict
+from fastmcp import FastMCP
+from pydantic import Field
+
+
+def create_my_http_server(host: str = "localhost", port: int = 8020) -> None:
+    """
+    Create and run an HTTP MCP server.
+
+    :param host: Host address to bind the server.
+    :param port: Port number for the server.
+    """
+
+    # Create FastMCP with HTTP transport
+    mcp = FastMCP(
+        "My Custom HTTP MCP Server",
+        instructions="Custom automation server via HTTP.",
+        stateless_http=True,  # Stateless HTTP (one-shot JSON)
+        json_response=True,  # Return pure JSON bodies
+        host=host,
+        port=port,
+    )
+
+    @mcp.tool()
+    async def process_data(
+        data: Annotated[str, Field(description="Data to process.")],
+        transform: Annotated[
+            str, Field(description="Transformation type: 'upper', 'lower', 'reverse'.")
+        ] = "upper",
+    ) -> Annotated[Dict[str, Any], Field(description="Processing result.")]:
+        """
+        Process text data with various transformations.
+        Supports: 'upper' (uppercase), 'lower' (lowercase), 'reverse' (reverse string).
+        """
+        try:
+            if transform == "upper":
+                result = data.upper()
+            elif transform == "lower":
+                result = data.lower()
+            elif transform == "reverse":
+                result = data[::-1]
+            else:
+                return {"success": False, "error": f"Unknown transform: {transform}"}
+
+            return {
+                "success": True,
+                "original": data,
+                "transformed": result,
+                "transform_type": transform,
+            }
+        except Exception as e:
+            return {"success": False, "error": str(e)}
+
+    @mcp.tool()
+    async def get_server_info() -> Annotated[
+        Dict[str, Any], Field(description="Server information.")
+    ]:
+        """
+        Get information about the HTTP MCP server.
+        Returns server name, version, and status.
+        """
+        import platform
+        return {
+            "server": "My Custom HTTP MCP Server",
+            "version": "1.0.0",
+            "platform": platform.system(),
+            "status": "running",
+        }
+
+    # Start the HTTP server
+    mcp.run(transport="streamable-http")
+
+
+def main():
+    """Main entry point for the HTTP server."""
+    parser = argparse.ArgumentParser(description="My Custom HTTP MCP Server")
+    parser.add_argument("--port", type=int, default=8020, help="Server port")
+    parser.add_argument("--host", default="localhost", help="Server host")
+    args = parser.parse_args()
+
+    print("=" * 60)
+    print("My Custom HTTP MCP Server")
+    print(f"Running on {args.host}:{args.port}")
+    print("=" * 60)
+
+    create_my_http_server(host=args.host, port=args.port)
+
+
+if __name__ == "__main__":
+    main()
+```
+
+### Step 2: Start the HTTP Server
+
+Run the server as a standalone process:
+
+```bash
+# Start on localhost
+python -m ufo.client.mcp.http_servers.my_http_server --host localhost --port 8020
+
+# Start on all interfaces (for remote access)
+python -m ufo.client.mcp.http_servers.my_http_server --host 0.0.0.0 --port 8020
+```
+
+**For production, run as a background service:**
+
+**Linux/macOS:**
+```bash
+nohup python -m ufo.client.mcp.http_servers.my_http_server --host 0.0.0.0 --port 8020 &
+```
+
+**Windows:**
+```powershell
+Start-Process python -ArgumentList "-m", "ufo.client.mcp.http_servers.my_http_server", "--host", "0.0.0.0", "--port", "8020" -WindowStyle Hidden
+```
+
+### Step 3: Configure HTTP Server in mcp.yaml
+
+```yaml
+RemoteAgent:
+  default:
+    action:
+      - namespace: MyHTTPExecutor
+        type: http
+        host: "localhost"  # Or remote IP: "192.168.1.100"
+        port: 8020
+        path: "/mcp"
+        reset: false
+```
+
+### Step 4: Test HTTP Server
+
+Test connectivity before integration:
+
+```python
+# File: test_http_server.py
+
+import asyncio
+from fastmcp.client import Client
+
+
+async def test_http_server():
+    """Test the HTTP MCP server."""
+    server_url = "http://localhost:8020/mcp"
+
+    async with Client(server_url) as client:
+        # List tools
+        tools = await client.list_tools()
+        print(f"Available tools: {[t.name for t in tools]}")
+
+        # Test process_data
+        result = await client.call_tool(
+            "process_data",
+            arguments={"data": "Hello World", "transform": "reverse"}
+        )
+        print(f"Process result: {result.data}")
+
+        # Test get_server_info
+        result = await client.call_tool("get_server_info", arguments={})
+        print(f"Server info: {result.data}")
+
+
+if __name__ == "__main__":
+    asyncio.run(test_http_server())
+```
+
+### Example: Cross-Platform Linux Executor
+
+Real-world example - controlling Linux systems from Windows:
+
+```python
+# File: ufo/client/mcp/http_servers/linux_executor.py
+
+import argparse
+import asyncio
+from typing import Annotated, Any, Dict, Optional
+from fastmcp import FastMCP
+from pydantic import Field
+
+
+def create_linux_executor(host: str = "0.0.0.0", port: int = 8010) -> None:
+    """Linux command execution MCP server."""
+
+    mcp = FastMCP(
+        "Linux Executor MCP Server",
+        instructions="Execute shell commands on Linux.",
+        stateless_http=True,
+        json_response=True,
+        host=host,
+        port=port,
+    )
+
+    @mcp.tool()
+    async def execute_command(
+        command: Annotated[str, Field(description="Shell command to execute.")],
+        timeout: Annotated[int, Field(description="Timeout in seconds.")] = 30,
+        cwd: Annotated[
+            Optional[str], Field(description="Working directory.")
+        ] = None,
+    ) -> Annotated[Dict[str, Any], Field(description="Execution result.")]:
+        """
+        Execute a shell command on Linux and return stdout/stderr.
+        Dangerous commands (rm -rf /, shutdown, etc.) are blocked.
+        """
+        # Security check
+        dangerous = ["rm -rf /", "shutdown", "reboot", "mkfs"]
+        if any(d in command.lower() for d in dangerous):
+            return {"success": False, "error": "Blocked dangerous command."}
+
+        try:
+            proc = await asyncio.create_subprocess_shell(
+                command,
+                stdout=asyncio.subprocess.PIPE,
+                stderr=asyncio.subprocess.PIPE,
+                cwd=cwd,
+            )
+
+            try:
+                stdout, stderr = await asyncio.wait_for(
+                    proc.communicate(), timeout=timeout
+                )
+            except asyncio.TimeoutError:
+                proc.kill()
+                await proc.wait()
+                return {"success": False, "error": f"Timeout after {timeout}s."}
+
+            return {
+                "success": proc.returncode == 0,
+                "exit_code": proc.returncode,
+                "stdout": stdout.decode("utf-8", errors="replace"),
+                "stderr": stderr.decode("utf-8", errors="replace"),
+            }
+        except Exception as e:
+            return {"success": False, "error": str(e)}
+
+    @mcp.tool()
+    async def get_system_info() -> Annotated[
+        Dict[str, Any], Field(description="System information.")
+    ]:
+        """Get basic Linux system information."""
+        info = {}
+        cmds = {
+            "uname": "uname -a",
+            "uptime": "uptime",
+            "memory": "free -h",
+        }
+        for key, cmd in cmds.items():
+            try:
+                proc = await asyncio.create_subprocess_shell(
+                    cmd, stdout=asyncio.subprocess.PIPE
+                )
+                out, _ = await proc.communicate()
+                info[key] = out.decode("utf-8", errors="replace").strip()
+            except Exception as e:
+                info[key] = f"Error: {e}"
+        return info
+
+    mcp.run(transport="streamable-http")
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Linux Executor MCP Server")
+    parser.add_argument("--port", type=int, default=8010)
+    parser.add_argument("--host", default="0.0.0.0")
+    args = parser.parse_args()
+
+    print(f"Linux Executor running on {args.host}:{args.port}")
+    create_linux_executor(host=args.host, port=args.port)
+
+
+if __name__ == "__main__":
+    main()
+```
+
+**Deploy on Linux:**
+
+```bash
+# Start server on Linux machine
+python -m ufo.client.mcp.http_servers.linux_executor --host 0.0.0.0 --port 8010
+```
+
+**Configure on Windows UFO²:**
+
+```yaml
+LinuxAgent:
+  default:
+    action:
+      - namespace: LinuxExecutor
+        type: http
+        host: "192.168.1.50"  # Linux machine IP
+        port: 8010
+        path: "/mcp"
+```
+
+**Cross-Platform Workflow**: Now your Windows UFO² agent can execute Linux commands remotely! The LLM will select `execute_command` or `get_system_info` as needed.
+
+---
+
+## Stdio MCP Servers
+
+Stdio servers run as **child processes**, communicating via stdin/stdout. They provide process isolation and work with any language.
+
+### Step 1: Create Stdio Server
+
+Create a standalone script that reads JSON-RPC from stdin and writes to stdout:
+
+```python
+# File: custom_stdio_server.py
+
+import sys
+import json
+from typing import Any, Dict, Optional
+
+
+def handle_request(request: Dict[str, Any]) -> Optional[Dict[str, Any]]:
+    """
+    Handle incoming MCP request.
+
+    :param request: JSON-RPC request from stdin.
+    :return: JSON-RPC response, or None for notifications.
+    """
+    method = request.get("method", "")
+    params = request.get("params", {})
+
+    # Notifications (e.g. "notifications/initialized") carry no "id"
+    # and must not be answered.
+    if "id" not in request:
+        return None
+
+    if method == "initialize":
+        # MCP handshake: clients send this before any tools/* request
+        return {
+            "jsonrpc": "2.0",
+            "id": request.get("id"),
+            "result": {
+                "protocolVersion": params.get("protocolVersion", "2024-11-05"),
+                "capabilities": {"tools": {}},
+                "serverInfo": {"name": "custom-stdio-server", "version": "1.0.0"},
+            },
+        }
+
+    elif method == "tools/list":
+        # Return available tools
+        return {
+            "jsonrpc": "2.0",
+            "id": request.get("id"),
+            "result": {
+                "tools": [
+                    {
+                        "name": "echo",
+                        "description": "Echo back a message.",
+                        "inputSchema": {
+                            "type": "object",
+                            "properties": {
+                                "message": {
+                                    "type": "string",
+                                    "description": "Message to echo.",
+                                }
+                            },
+                            "required": ["message"],
+                        },
+                    }
+                ]
+            },
+        }
+
+    elif method == "tools/call":
+        tool_name = params.get("name", "")
+        arguments = params.get("arguments", {})
+
+        if tool_name == "echo":
+            message = arguments.get("message", "")
+            return {
+                "jsonrpc": "2.0",
+                "id": request.get("id"),
+                "result": {
+                    "content": [
+                        {
+                            "type": "text",
+                            "text": f"Echo: {message}",
+                        }
+                    ]
+                },
+            }
+        else:
+            return {
+                "jsonrpc": "2.0",
+                "id": request.get("id"),
+                "error": {
+                    "code": -32601,
+                    "message": f"Unknown tool: {tool_name}",
+                },
+            }
+
+    else:
+        return {
+            "jsonrpc": "2.0",
+            "id": request.get("id"),
+            "error": {
+                "code": -32601,
+                "message": f"Unknown method: {method}",
+            },
+        }
+
+
+def main():
+    """Main stdio loop: one JSON-RPC message per line."""
+    for line in sys.stdin:
+        try:
+            request = json.loads(line)
+            response = handle_request(request)
+            if response is not None:
+                print(json.dumps(response), flush=True)
+        except Exception as e:
+            error_response = {
+                "jsonrpc": "2.0",
+                "id": None,
+                "error": {
+                    "code": -32603,
+                    "message": str(e),
+                },
+            }
+            print(json.dumps(error_response), flush=True)
+
+
+if __name__ == "__main__":
+    main()
+```
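+
+To exercise the script by hand, it helps to see what one request line looks like on the wire. The following is a minimal sketch using only the standard library; the helper name `make_request` and the file name `make_requests.py` are illustrative and not part of UFO².
+
+```python
+import json
+from typing import Any, Dict, Optional
+
+
+def make_request(
+    request_id: int, method: str, params: Optional[Dict[str, Any]] = None
+) -> str:
+    """Serialize one JSON-RPC 2.0 request as a single line of text."""
+    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": request_id, "method": method}
+    if params is not None:
+        message["params"] = params
+    return json.dumps(message)
+
+
+if __name__ == "__main__":
+    # One message per line, matching the server's `for line in sys.stdin` loop
+    print(make_request(1, "tools/list"))
+    print(
+        make_request(
+            2, "tools/call", {"name": "echo", "arguments": {"message": "hi"}}
+        )
+    )
+```
+
+Saved as `make_requests.py`, its output can be piped into the server with `python make_requests.py | python custom_stdio_server.py`, which should print one JSON response per request.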
+
+### Step 2: Configure Stdio Server in mcp.yaml
+
+```yaml
+CustomAgent:
+  default:
+    action:
+      - namespace: CustomStdioExecutor
+        type: stdio
+        command: "python"
+        start_args: ["custom_stdio_server.py"]
+        env:
+          API_KEY: "secret_key"
+          LOG_LEVEL: "INFO"
+        cwd: "/path/to/server/directory"
+        reset: false
+```
+
+!!!warning "Stdio Limitations"
+    - **More complex** than local/HTTP servers
+    - Requires implementing **JSON-RPC protocol** manually
+    - Better suited for **third-party MCP servers** than custom tools
+    - For custom Python tools, **prefer local or HTTP servers**
+
+### Example: Third-Party Node.js Server
+
+Stdio is ideal for integrating existing MCP servers written in other languages:
+
+```yaml
+CustomAgent:
+  default:
+    action:
+      - namespace: NodeJSTools
+        type: stdio
+        command: "node"
+        start_args: ["./node_mcp_server/index.js"]
+        env:
+          NODE_ENV: "production"
+        cwd: "/path/to/node_mcp_server"
+```
+
+---
+
+## Registering Servers with Agents
+
+### Agent-Specific Registration
+
+Different agents can use different MCP server configurations:
+
+```yaml
+# HostAgent: System-level automation
+HostAgent:
+  default:
+    data_collection:
+      - namespace: UICollector
+        type: local
+    action:
+      - namespace: HostUIExecutor
+        type: local
+      - namespace: CommandLineExecutor
+        type: local
+
+# AppAgent: Application-specific automation
+AppAgent:
+  # Default configuration for all apps
+  default:
+    data_collection:
+      - namespace: UICollector
+        type: local
+    action:
+      - namespace: AppUIExecutor
+        type: local
+      - namespace: CommandLineExecutor
+        type: local
+
+  # Word-specific configuration
+  WINWORD.EXE:
+    data_collection:
+      - namespace: UICollector
+        type: local
+    action:
+      - namespace: AppUIExecutor
+        type: local
+      - namespace: WordCOMExecutor  # Word COM API
+        type: local
+        reset: true
+      - namespace: CommandLineExecutor
+        type: local
+
+  # Excel-specific configuration
+  EXCEL.EXE:
+    data_collection:
+      - namespace: UICollector
+        type: local
+    action:
+      - namespace: AppUIExecutor
+        type: local
+      - namespace: ExcelCOMExecutor  # Excel COM API
+        type: local
+        reset: true
+
+  # Chrome-specific configuration
+  chrome.exe:
+    data_collection:
+      - namespace: UICollector
+        type: local
+    action:
+      - namespace: AppUIExecutor
+        type: local
+      - namespace: ChromeExecutor  # Custom Chrome tools
+        type: local
+        reset: true
+
+# Custom Agent: Specialized automation
+CustomAutomationAgent:
+  default:
+    data_collection:
+      - namespace: UICollector
+        type: local
+      - namespace: MyCustomCollector  # Custom data collection
+        type: local
+    action:
+      - namespace: MyCustomExecutor  # Custom actions
+        type: local
+      - namespace: MyHTTPExecutor  # Remote HTTP actions
+        type: http
+        host: "192.168.1.100"
+        port: 8020
+        path: "/mcp"
+```
+
+### Multi-Server Agent Configuration
+
+Agents can register **multiple servers** of the same category:
+
+```yaml
+HybridAgent:
+  default:
+    # Multiple data collection sources
+    data_collection:
+      - namespace: UICollector
+        type: local
+      - namespace: HardwareCollector  # Remote hardware monitoring
+        type: http
+        host: "192.168.1.50"
+        port: 8006
+        path: "/mcp"
+      - namespace: SystemMetrics  # Custom metrics
+        type: local
+
+    # Multiple action executors (LLM chooses best tool)
+    action:
+      - namespace: AppUIExecutor  # GUI automation
+        type: local
+      - namespace: WordCOMExecutor  # API automation
+        type: local
+        reset: true
+      - namespace: LinuxExecutor  # Remote Linux control
+        type: http
+        host: "192.168.1.100"
+        port: 8010
+        path: "/mcp"
+      - namespace: CustomExecutor  # Custom actions
+        type: local
+```
+
+**How it works:**
+
+1. **Data collection tools**: All servers are invoked automatically to build observation
+2. **Action tools**: LLM sees tools from ALL action servers and selects the best one
+
+**Example LLM decision:**
+
+```
+Task: "Create a Word document with sales data from the Linux database"
+
+Step 1: Get data from Linux
+  → LLM selects: LinuxExecutor::execute_command(
+      command="mysql -e 'SELECT * FROM sales'"
+    )
+
+Step 2: Create Word document
+  → LLM selects: WordCOMExecutor::insert_table(rows=10, columns=3)
+
+Step 3: Format the table
+  → LLM selects: WordCOMExecutor::select_table(number=1)
+  → AppUIExecutor::click_input(name="Table Design")
+```
+
+### Configuration Hierarchy
+
+Agent configurations follow this **lookup hierarchy**, in which a sub-type entry replaces the `default` entry rather than merging with it:
+
+```
+AgentName
+  ├─ default (fallback configuration)
+  │   ├─ data_collection
+  │   └─ action
+  └─ SubType (e.g., "WINWORD.EXE")
+      ├─ data_collection
+      └─ action
+```
+
+**Lookup logic:**
+
+1. Check for `AgentName.SubType`
+2. If not found, use `AgentName.default`
+3. If neither exists, raise error
+
+**Example:**
+
+```yaml
+AppAgent:
+  # Fallback for all apps
+  default:
+    action:
+      - namespace: AppUIExecutor
+        type: local
+
+  # Overrides default for Word
+  WINWORD.EXE:
+    action:
+      - namespace: AppUIExecutor
+        type: local
+      - namespace: WordCOMExecutor
+        type: local
+```
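+
+The lookup logic above can be sketched in a few lines of Python. This is an illustration of the described fallback only, assuming the YAML has already been loaded into a nested dict; the function name `resolve_mcp_config` is hypothetical and not a UFO² API.
+
+```python
+from typing import Any, Dict
+
+
+def resolve_mcp_config(
+    config: Dict[str, Any], agent_name: str, sub_type: str
+) -> Dict[str, Any]:
+    """Return the SubType entry if present, else the agent's default entry."""
+    agent_config = config.get(agent_name, {})
+    if sub_type in agent_config:
+        return agent_config[sub_type]
+    if "default" in agent_config:
+        return agent_config["default"]
+    raise KeyError(f"No MCP configuration for {agent_name}.{sub_type}")
+
+
+# Same shape as the YAML example above, written as a dict
+config = {
+    "AppAgent": {
+        "default": {"action": [{"namespace": "AppUIExecutor", "type": "local"}]},
+        "WINWORD.EXE": {
+            "action": [
+                {"namespace": "AppUIExecutor", "type": "local"},
+                {"namespace": "WordCOMExecutor", "type": "local"},
+            ]
+        },
+    }
+}
+
+word = resolve_mcp_config(config, "AppAgent", "WINWORD.EXE")
+other = resolve_mcp_config(config, "AppAgent", "notepad.exe")
+print([s["namespace"] for s in word["action"]])   # ['AppUIExecutor', 'WordCOMExecutor']
+print([s["namespace"] for s in other["action"]])  # ['AppUIExecutor']
+```
+
+Because the Word entry replaces the default rather than extending it, `AppUIExecutor` has to be listed again under `WINWORD.EXE`.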
+
+---
+
+## Best Practices
+
+### 1. Write Comprehensive Docstrings
+
+Your docstrings are **directly converted to LLM prompts**. The LLM uses them to understand:
+- **What** the tool does
+- **When** to use it
+- **How** to use it correctly
+
+**Bad Example:**
+```python
+@mcp.tool()
+def process(data: str) -> str:
+    """Process data."""  # ❌ Too vague
+    return data.upper()
+```
+
+**Good Example:**
+```python
+@mcp.tool()
+def process_text_to_uppercase(
+    text: Annotated[str, Field(description="The input text to convert.")],
+) -> Annotated[str, Field(description="The text converted to uppercase.")]:
+    """
+    Convert text to uppercase letters.
+
+    Use this tool when you need to standardize text formatting or make text
+    more prominent. Works with all Unicode characters.
+
+    Examples:
+    - "hello world" → "HELLO WORLD"
+    - "Café" → "CAFÉ"
+    """  # ✅ Clear, detailed, with examples
+    return text.upper()
+```
+
+### 2. Use Descriptive Parameter Names
+
+```python
+# ❌ Bad: Unclear parameter names
+@mcp.tool()
+def func(a: str, b: int, c: bool) -> str:
+    ...
+
+# ✅ Good: Self-documenting parameter names
+@mcp.tool()
+def send_email(
+    recipient_address: str,
+    message_body: str,
+    use_html_format: bool = False,
+) -> str:
+    ...
+```
+
+### 3. Provide Default Values
+
+```python
+@mcp.tool()
+def search_files(
+    query: Annotated[str, Field(description="Search query.")],
+    case_sensitive: Annotated[
+        bool, Field(description="Case-sensitive search?")
+    ] = False,  # ✅ Sensible default
+    max_results: Annotated[
+        int, Field(description="Maximum results to return.")
+    ] = 10,  # ✅ Sensible default
+) -> list[str]:
+    """Search for files matching the query."""
+    ...
+```
+
+### 4. Handle Errors Gracefully
+
+```python
+@mcp.tool()
+def divide_numbers(
+    dividend: Annotated[float, Field(description="Number to divide.")],
+    divisor: Annotated[float, Field(description="Number to divide by.")],
+) -> Annotated[dict, Field(description="Division result or error.")]:
+    """
+    Divide two numbers and return the result.
+    Returns an error if divisor is zero.
+    """
+    try:
+        if divisor == 0:
+            return {
+                "success": False,
+                "error": "Cannot divide by zero.",
+            }
+
+        result = dividend / divisor
+        return {
+            "success": True,
+            "result": result,
+        }
+    except Exception as e:
+        return {
+            "success": False,
+            "error": f"Division failed: {str(e)}",
+        }
+```
+
+### 5. Use Reset for Stateful Servers
+
+```yaml
+# ✅ Good: Reset COM servers when switching contexts
+AppAgent:
+ WINWORD.EXE:
+ action:
+ - namespace: WordCOMExecutor
+ type: local
+ reset: true # Prevents state leakage between documents
+
+# ❌ Bad: Not resetting can cause issues
+AppAgent:
+ WINWORD.EXE:
+ action:
+ - namespace: WordCOMExecutor
+ type: local
+ reset: false # May retain state from previous document
+```
+
+### 6. Validate Remote Server Connectivity
+
+Before deploying, test connectivity:
+
+```python
+import asyncio
+from fastmcp.client import Client
+
+
+async def validate_server(url: str):
+ """Validate HTTP server is accessible."""
+ try:
+ async with Client(url) as client:
+ tools = await client.list_tools()
+ print(f"✅ Server {url} is accessible")
+ print(f" Tools: {[t.name for t in tools]}")
+ return True
+ except Exception as e:
+ print(f"❌ Server {url} is NOT accessible: {e}")
+ return False
+
+
+# Test before adding to mcp.yaml
+asyncio.run(validate_server("http://192.168.1.100:8020/mcp"))
+```
+
+### 7. Use Environment Variables for Secrets
+
+```yaml
+# ❌ Bad: Hardcoded secrets
+CustomAgent:
+ default:
+ action:
+ - namespace: APIExecutor
+ type: http
+ host: "api.example.com"
+ port: 443
+ auth_token: "sk-1234567890" # Don't commit this!
+
+# ✅ Good: Use environment variables
+CustomAgent:
+ default:
+ action:
+ - namespace: APIExecutor
+ type: http
+ host: "${API_HOST}"
+ port: "${API_PORT}"
+ auth_token: "${API_TOKEN}"
+```
+
+Set environment variables before running UFO²:
+
+```bash
+export API_HOST="api.example.com"
+export API_PORT="443"
+export API_TOKEN="sk-1234567890"
+```
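+
+Whether UFO²'s configuration loader expands `${...}` placeholders on its own is not shown in this guide. As a minimal sketch, assuming you need to resolve them yourself after loading the YAML, a helper along these lines would do it. The name `expand_env_placeholders` is hypothetical and not part of UFO².
+
+```python
+import os
+import re
+
+# Matches ${NAME} where NAME is a conventional environment variable name.
+_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")
+
+
+def expand_env_placeholders(value, env=None):
+    """Recursively replace ${NAME} in strings; raise KeyError if NAME is unset."""
+    env = os.environ if env is None else env
+    if isinstance(value, dict):
+        return {k: expand_env_placeholders(v, env) for k, v in value.items()}
+    if isinstance(value, list):
+        return [expand_env_placeholders(v, env) for v in value]
+    if not isinstance(value, str):
+        return value
+
+    def _substitute(match):
+        name = match.group(1)
+        if name not in env:
+            raise KeyError(f"Environment variable {name!r} is not set")
+        return env[name]
+
+    return _PLACEHOLDER.sub(_substitute, value)
+
+
+# Stand-in for one parsed server entry and for os.environ.
+server = {"host": "${API_HOST}", "port": "${API_PORT}", "auth_token": "${API_TOKEN}"}
+fake_env = {"API_HOST": "api.example.com", "API_PORT": "443", "API_TOKEN": "dummy"}
+
+resolved = expand_env_placeholders(server, fake_env)
+resolved["port"] = int(resolved["port"])  # Placeholders always expand to strings
+print(resolved["host"], resolved["port"])
+```
+
+Failing on an unset variable, rather than silently substituting an empty string, surfaces a missing secret at startup instead of at the first request.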
+
+---
+
+## Troubleshooting
+
+### Common Issues
+
+#### 1. "No MCP server found for name 'MyServer'"
+
+**Cause**: Server not registered in MCPRegistry.
+
+**Solution**:
+```python
+# Ensure you're using the decorator
+@MCPRegistry.register_factory_decorator("MyServer")
+def create_my_server(*args, **kwargs) -> FastMCP:
+ ...
+
+# Or manually register
+MCPRegistry.register_factory("MyServer", create_my_server)
+```
+
+#### 2. "Connection refused" for HTTP Server
+
+**Cause**: HTTP server not running or wrong host/port.
+
+**Solution**:
+```bash
+# Verify server is running
+curl http://localhost:8020/mcp
+
+# Check firewall rules
+# Windows:
+netsh advfirewall firewall add rule name="MCP Server" dir=in action=allow protocol=TCP localport=8020
+
+# Linux:
+sudo ufw allow 8020/tcp
+```
+
+#### 3. Tools Not Appearing in LLM Prompt
+
+**Cause**: Server registered in wrong category (data_collection vs action).
+
+**Solution**:
+```yaml
+# For LLM-selectable tools, use 'action'
+CustomAgent:
+ default:
+ action: # ✅ Correct for LLM-selectable tools
+ - namespace: MyExecutor
+ type: local
+
+# For auto-invoked observation, use 'data_collection'
+CustomAgent:
+ default:
+ data_collection: # ✅ Correct for automatic observation
+ - namespace: MyCollector
+ type: local
+```
+
+#### 4. Server State Leaking Between Contexts
+
+**Cause**: `reset: false` for stateful servers.
+
+**Solution**:
+```yaml
+# Set reset: true for stateful servers
+AppAgent:
+ WINWORD.EXE:
+ action:
+ - namespace: WordCOMExecutor
+ type: local
+ reset: true # ✅ Reset COM state when switching documents
+```
+
+#### 5. Timeout Errors for Long-Running Tools
+
+**Cause**: The tool ran longer than the default timeout of 6000 seconds (100 minutes).
+
+**Solution**:
+```python
+# In Computer class, adjust timeout
+self._tool_timeout = 12000 # 200 minutes
+```
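+
+How the `Computer` class applies `_tool_timeout` internally is not shown here. As a general illustration of the pattern, the sketch below wraps a coroutine in `asyncio.wait_for` and turns a timeout into a structured error. The helper `call_with_timeout` and the stand-in `slow_tool` are hypothetical.
+
+```python
+import asyncio
+
+
+async def call_with_timeout(tool_coro, timeout_seconds: float) -> dict:
+    """Await tool_coro, returning a structured error if it exceeds the timeout."""
+    try:
+        result = await asyncio.wait_for(tool_coro, timeout=timeout_seconds)
+        return {"success": True, "result": result}
+    except asyncio.TimeoutError:
+        return {"success": False, "error": f"Tool timed out after {timeout_seconds} s"}
+
+
+async def slow_tool() -> str:
+    """Stand-in for a long-running tool call."""
+    await asyncio.sleep(0.2)
+    return "done"
+
+
+timed_out = asyncio.run(call_with_timeout(slow_tool(), 0.05))
+finished = asyncio.run(call_with_timeout(slow_tool(), 1.0))
+print(timed_out["success"], finished["success"])
+```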
+
+### Debugging Tips
+
+#### Enable Debug Logging
+
+```python
+import logging
+
+logging.basicConfig(level=logging.DEBUG)
+logger = logging.getLogger("ufo.client.mcp")
+```
+
+#### Check Registered Servers
+
+```python
+from ufo.client.mcp.mcp_server_manager import MCPServerManager
+
+# List all registered servers
+for namespace, server in MCPServerManager._servers_mapping.items():
+ print(f"Server: {namespace}, Type: {type(server).__name__}")
+```
+
+#### Test Server in Isolation
+
+```python
+# Test local server
+from ufo.client.mcp.local_servers.my_custom_server import create_my_custom_server
+import asyncio
+from fastmcp.client import Client
+
+
+async def test():
+ server = create_my_custom_server()
+ async with Client(server) as client:
+ tools = await client.list_tools()
+ print(f"Tools: {[t.name for t in tools]}")
+
+
+asyncio.run(test())
+```
+
+---
+
+## Next Steps
+
+Now that you've learned to create MCP servers, explore these related topics:
+
+1. **Review Built-in Servers**: See [Local Servers](../mcp/local_servers.md) for production examples
+2. **Explore HTTP Deployment**: Read [Remote Servers](../mcp/remote_servers.md) for cross-platform automation
+3. **Understand Agent Configuration**: Study [MCP Configuration](../mcp/configuration.md) for advanced setups
+4. **Learn about Computer Class**: Review [Computer](../client/computer.md) to understand the MCP client integration
+5. **Create Your First Agent**: Follow [Creating App Agent](creating_app_agent/overview.md) to build custom agents
+
+---
+
+## Related Documentation
+
+- [MCP Overview](../mcp/overview.md) - MCP architecture and concepts
+- [MCP Configuration](../mcp/configuration.md) - Complete configuration reference
+- [Local Servers](../mcp/local_servers.md) - Built-in local servers
+- [Remote Servers](../mcp/remote_servers.md) - HTTP/Stdio deployment
+- [Data Collection Servers](../mcp/data_collection.md) - Observation tools
+- [Action Servers](../mcp/action.md) - Execution tools
+- [MCP Reference](../configuration/system/mcp_reference.md) - Quick reference guide
+
+---
+
+## Best Practices Summary
+
+- ✅ **Write clear docstrings** - they become LLM instructions
+- ✅ **Use descriptive names** - for tools, parameters, and namespaces
+- ✅ **Handle errors gracefully** - return structured error messages
+- ✅ **Test in isolation** - before integrating with agents
+- ✅ **Use `reset: true`** - for stateful servers (COM, API clients)
+- ✅ **Validate connectivity** - for HTTP/Stdio servers before deployment
diff --git a/documents/docs/tutorials/creating_third_party_agents.md b/documents/docs/tutorials/creating_third_party_agents.md
new file mode 100644
index 000000000..f03e90c5d
--- /dev/null
+++ b/documents/docs/tutorials/creating_third_party_agents.md
@@ -0,0 +1,1377 @@
+# Creating Custom Third-Party Agents - Complete Tutorial
+
+This tutorial teaches you how to create, register, and deploy custom third-party agents that extend UFO²'s capabilities beyond Windows GUI automation. You'll learn the complete process using **HardwareAgent** as a reference implementation.
+
+**Prerequisites**: Basic Python knowledge, familiarity with UFO² agent architecture, [Agent Configuration](../configuration/system/agents_config.md), and [Third-Party Configuration](../configuration/system/third_party_config.md).
+
+---
+
+## Table of Contents
+
+1. [Overview](#overview)
+2. [Understanding Third-Party Agents](#understanding-third-party-agents)
+3. [Step-by-Step Implementation](#step-by-step-implementation)
+4. [Complete Example: HardwareAgent](#complete-example-hardwareagent)
+5. [Registering with HostAgent](#registering-with-hostagent)
+6. [Configuration and Deployment](#configuration-and-deployment)
+7. [Best Practices](#best-practices)
+8. [Troubleshooting](#troubleshooting)
+
+---
+
+## Overview
+
+### What are Third-Party Agents?
+
+Third-party agents are specialized agents that extend UFO²'s capabilities to handle tasks beyond standard Windows GUI automation. They work alongside the core agents (HostAgent and AppAgent) to provide domain-specific functionality.
+
+**Key Characteristics**:
+- ✅ Independent agent implementation with custom logic
+- ✅ Registered and managed by HostAgent
+- ✅ Selectable as execution targets by the LLM
+- ✅ Can use MCP servers and custom tools
+- ✅ Configurable via YAML files
+
+**Common Use Cases**:
+- 🔧 **Hardware Control**: Physical device manipulation (HardwareAgent)
+- 🐧 **Linux CLI**: Shell command execution on Linux servers (LinuxAgent)
+- 🌐 **Web Automation**: Browser-based tasks without GUI
+- 📡 **IoT Integration**: Smart device control
+- 🤖 **Robotic Process Automation**: Custom automation workflows
+
+---
+
+## Understanding Third-Party Agents
+
+### Architecture Overview
+
+Third-party agents integrate with UFO² through a well-defined architecture:
+
+```mermaid
+graph TB
+ HostAgent["HostAgent - Orchestrates all agents - Registers third-party agents as selectable targets - Routes tasks to appropriate agents"]
+
+ AppAgent["AppAgent (GUI tasks)"]
+ HardwareAgent["HardwareAgent (Hardware)"]
+ YourAgent["YourAgent (Custom)"]
+
+ Strategies["Processing Strategies - LLM Interaction - Action Execution - Memory Updates"]
+
+ HostAgent --> AppAgent
+ HostAgent --> HardwareAgent
+ HostAgent --> YourAgent
+
+ AppAgent --> Strategies
+ HardwareAgent --> Strategies
+ YourAgent --> Strategies
+
+ style HostAgent fill:#e1f5ff,stroke:#0288d1,stroke-width:2px
+ style AppAgent fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
+ style HardwareAgent fill:#fff3e0,stroke:#ff9800,stroke-width:2px
+ style YourAgent fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
+ style Strategies fill:#fce4ec,stroke:#e91e63,stroke-width:2px
+```
+
+### Agent Registry System
+
+UFO² uses a registry pattern to dynamically load and manage agents:
+
+```python
+@AgentRegistry.register(
+    agent_name="YourAgent",       # Unique identifier
+    third_party=True,             # Mark as third-party
+    processor_cls=YourProcessor,  # Processing logic
+)
+class YourAgent(CustomizedAgent):
+    """Your custom agent implementation."""
+    pass
+```
+
+**How it works**:
+
+1. **Registration**: The `@AgentRegistry.register()` decorator registers your agent class.
+2. **Filtering**: The registry checks whether the agent is listed in the `ENABLED_THIRD_PARTY_AGENTS` config.
+3. **Instantiation**: HostAgent creates instances when needed.
+4. **Target Selection**: The LLM can select your agent as an execution target.
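+
+The four steps above can be sketched with a toy registry. This is not UFO²'s `AgentRegistry`: the class `SketchRegistry`, its `enabled` parameter, and the two demo agents are hypothetical. The real registry reads the enabled list from configuration rather than taking it as an argument.
+
+```python
+from typing import Callable, Dict, Iterable, List, Type
+
+
+class SketchRegistry:
+    """Toy decorator-based registry illustrating registration and filtering."""
+
+    _agents: Dict[str, Type] = {}
+
+    @classmethod
+    def register(
+        cls, agent_name: str, third_party: bool = False, enabled: Iterable[str] = ()
+    ) -> Callable[[Type], Type]:
+        def decorator(agent_cls: Type) -> Type:
+            # Filtering: third-party agents are stored only when enabled.
+            if third_party and agent_name not in set(enabled):
+                return agent_cls
+            cls._agents[agent_name] = agent_cls
+            return agent_cls
+
+        return decorator
+
+    @classmethod
+    def get(cls, agent_name: str) -> Type:
+        return cls._agents[agent_name]
+
+    @classmethod
+    def list_agents(cls) -> List[str]:
+        return sorted(cls._agents)
+
+
+ENABLED = ["YourAgent"]  # Stand-in for ENABLED_THIRD_PARTY_AGENTS
+
+
+@SketchRegistry.register("YourAgent", third_party=True, enabled=ENABLED)
+class YourAgent:
+    def __init__(self, name: str) -> None:
+        self.name = name
+
+
+@SketchRegistry.register("DisabledAgent", third_party=True, enabled=ENABLED)
+class DisabledAgent:
+    pass
+
+
+# Instantiation: look the class up by name, then construct it.
+agent = SketchRegistry.get("YourAgent")(name="demo")
+print(SketchRegistry.list_agents(), agent.name)
+```
+
+Because filtering happens at decoration time in this sketch, a disabled agent's class still exists but can never be looked up by name.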
+
+---
+
+## Step-by-Step Implementation
+
+### Step 1: Create Agent Class
+
+Create your agent class by inheriting from `CustomizedAgent`:
+
+```python
+# File: ufo/agents/agent/customized_agent.py
+
+from ufo.agents.agent.app_agent import AppAgent
+from ufo.agents.agent.basic import AgentRegistry
+from ufo.agents.processors.customized.customized_agent_processor import (
+ CustomizedProcessor,
+ YourAgentProcessor, # Import your processor
+)
+
+@AgentRegistry.register(
+ agent_name="YourAgent",
+ third_party=True,
+ processor_cls=YourAgentProcessor
+)
+class YourAgent(CustomizedAgent):
+ """
+ YourAgent is a specialized agent that handles [specific functionality].
+
+ This agent extends CustomizedAgent to provide:
+ - Custom domain logic (e.g., hardware control, web automation)
+ - Specialized action execution
+ - Domain-specific tool integration
+ """
+
+ def __init__(
+ self,
+ name: str,
+ main_prompt: str,
+ example_prompt: str,
+ api_prompt: str = None,
+ ) -> None:
+ """
+ Initialize YourAgent.
+
+ :param name: The name of the agent instance
+ :param main_prompt: Path to main prompt template YAML
+ :param example_prompt: Path to example prompt template YAML
+ :param api_prompt: Optional path to API prompt template YAML
+ """
+ super().__init__(
+ name=name,
+ main_prompt=main_prompt,
+ example_prompt=example_prompt,
+ process_name=None,
+ app_root_name=None,
+ is_visual=None, # Set True if your agent uses screenshots
+ )
+
+ # Optional: Add custom initialization
+ self._custom_state = {}
+ self.logger.info(f"YourAgent initialized with prompts: {main_prompt}")
+
+ # Optional: Override methods for custom behavior
+ def get_prompter(self, is_visual: bool, main_prompt: str, example_prompt: str):
+ """Get the prompter for your agent."""
+ # Use default or create custom prompter
+ return super().get_prompter(is_visual, main_prompt, example_prompt)
+```
+
+**Key Points**:
+- ✅ **Inherit from `CustomizedAgent`**: Provides base functionality
+- ✅ **Use `@AgentRegistry.register()`**: Enables dynamic loading
+- ✅ **Set `third_party=True`**: Triggers configuration filtering
+- ✅ **Specify `processor_cls`**: Links to your processing logic
+
+---
+
+### Step 2: Create Processor Class
+
+Create a processor that defines how your agent processes tasks. For detailed information about processors and strategies, see [Agent Architecture](../infrastructure/agents/overview.md).
+
+```python
+# File: ufo/agents/processors/customized/customized_agent_processor.py
+
+from typing import TYPE_CHECKING
+from ufo.agents.processors.app_agent_processor import AppAgentProcessor
+from ufo.agents.processors.context.processing_context import (
+ ProcessingContext,
+ ProcessingPhase,
+)
+from ufo.agents.processors.strategies.app_agent_processing_strategy import (
+ AppActionExecutionStrategy,
+ AppMemoryUpdateStrategy,
+)
+from ufo.agents.processors.strategies.customized_agent_processing_strategy import (
+ CustomizedLLMInteractionStrategy,
+ CustomizedScreenshotCaptureStrategy,
+)
+
+if TYPE_CHECKING:
+ from ufo.agents.agent.customized_agent import YourAgent
+
+
+class YourAgentProcessor(CustomizedProcessor):
+ """
+ Processor for YourAgent - defines processing pipeline and strategies.
+ """
+
+ def __init__(self, agent: "YourAgent", global_context: "Context") -> None:
+ """
+ Initialize YourAgent processor.
+
+ :param agent: The YourAgent instance
+ :param global_context: Global context shared across processing
+ """
+ super().__init__(agent, global_context)
+
+ def _setup_strategies(self) -> None:
+ """
+ Setup processing strategies for YourAgent.
+
+ Define how your agent processes each phase:
+ - DATA_COLLECTION: Gather observations (screenshots, data)
+ - LLM_INTERACTION: Communicate with LLM to get actions
+ - ACTION_EXECUTION: Execute the selected action
+ - MEMORY_UPDATE: Update agent memory and history
+ """
+
+ # Phase 1: Data Collection (if your agent uses visual input)
+ self.strategies[ProcessingPhase.DATA_COLLECTION] = (
+ CustomizedScreenshotCaptureStrategy(
+ fail_fast=True, # Stop if screenshot capture fails
+ )
+ )
+
+ # Phase 2: LLM Interaction
+ self.strategies[ProcessingPhase.LLM_INTERACTION] = (
+ CustomizedLLMInteractionStrategy(
+ fail_fast=True # LLM failures should halt processing
+ )
+ )
+
+ # Phase 3: Action Execution
+ # Option A: Use default strategy
+ self.strategies[ProcessingPhase.ACTION_EXECUTION] = (
+ AppActionExecutionStrategy(
+ fail_fast=False # Continue on action failures
+ )
+ )
+
+ # Option B: Create custom strategy (see Step 3)
+ # self.strategies[ProcessingPhase.ACTION_EXECUTION] = (
+ # YourActionExecutionStrategy(fail_fast=False)
+ # )
+
+ # Phase 4: Memory Update
+ self.strategies[ProcessingPhase.MEMORY_UPDATE] = (
+ AppMemoryUpdateStrategy(
+ fail_fast=False # Memory failures shouldn't stop agent
+ )
+ )
+
+ def _setup_middleware(self) -> None:
+ """
+ Optional: Setup middleware for logging, error handling, etc.
+ """
+ # Use default middleware or add custom middleware
+ super()._setup_middleware()
+
+ # Example: Add custom middleware
+ # self.middleware_chain.append(YourCustomMiddleware())
+```
+
+**Strategy Setup Guidelines**:
+
+| Phase | Purpose | fail_fast | Strategy Options |
+|-------|---------|-----------|------------------|
+| **DATA_COLLECTION** | Capture observations | `True` | Screenshot, sensor data, API calls |
+| **LLM_INTERACTION** | Get LLM decision | `True` | Custom prompts, function calling |
+| **ACTION_EXECUTION** | Execute action | `False` | Custom tools, API calls, commands |
+| **MEMORY_UPDATE** | Save history | `False` | Standard or custom memory logic |
+
+---
+
+### Step 3: Create Custom Strategies (Optional)
+
+If you need custom processing logic, create strategy classes:
+
+```python
+# File: ufo/agents/processors/strategies/your_agent_strategy.py
+
+from typing import TYPE_CHECKING
+from ufo.agents.processors.strategies.base import (
+ BaseProcessingStrategy,
+ ProcessingResult,
+)
+from ufo.agents.processors.context.processing_context import ProcessingContext
+
+if TYPE_CHECKING:
+ from ufo.agents.agent.customized_agent import YourAgent
+
+
+class YourActionExecutionStrategy(BaseProcessingStrategy):
+ """
+ Custom action execution strategy for YourAgent.
+ """
+
+ def __init__(self, fail_fast: bool = False) -> None:
+ super().__init__(name="your_action_execution", fail_fast=fail_fast)
+
+ async def execute(
+ self,
+ agent: "YourAgent",
+ context: ProcessingContext
+ ) -> ProcessingResult:
+ """
+ Execute custom actions for your agent.
+
+ :param agent: YourAgent instance
+ :param context: Processing context with LLM response
+ :return: ProcessingResult with execution outcome
+ """
+ try:
+ # Extract action from LLM response
+ parsed_response = context.get_local("parsed_response")
+ function_name = parsed_response.get("function")
+ arguments = parsed_response.get("arguments", {})
+
+ self.logger.info(f"Executing action: {function_name}")
+
+ # Execute your custom action logic
+ if function_name == "your_custom_action":
+ result = self._execute_custom_action(arguments)
+ else:
+ # Fallback to standard action execution
+ result = await self._execute_standard_action(
+ agent, function_name, arguments
+ )
+
+ # Store results in context
+ context.set_local("action_result", result)
+ context.set_local("action_status", "success")
+
+ return ProcessingResult(
+ success=True,
+ data={"result": result},
+ error=None
+ )
+
+ except Exception as e:
+ self.logger.error(f"Action execution failed: {str(e)}")
+
+ return ProcessingResult(
+ success=False,
+ data={},
+ error=str(e)
+ )
+
+ def _execute_custom_action(self, arguments: dict) -> dict:
+ """
+ Implement your custom action logic here.
+
+ Example: Hardware control, API calls, CLI commands, etc.
+ """
+ # Your custom implementation
+ return {"status": "executed", "details": arguments}
+```
+
+**When to Create Custom Strategies**:
+- ✅ Need domain-specific action execution (e.g., hardware APIs)
+- ✅ Special LLM interaction patterns (e.g., multi-turn dialogs)
+- ✅ Custom data collection (e.g., sensor readings, external APIs)
+- ❌ Standard GUI automation (use default strategies)
+
+---
+
+### Step 4: Create Prompt Templates
+
+Create YAML prompt templates to guide your agent's LLM interactions:
+
+```yaml
+# File: ufo/prompts/third_party/your_agent.yaml
+
+system: |
+ You are YourAgent, a specialized AI agent that handles [specific domain tasks].
+
+ Your capabilities include:
+ - [Capability 1]: Description
+ - [Capability 2]: Description
+ - [Capability 3]: Description
+
+ You have access to the following tools:
+ {apis}
+
+ Guidelines:
+ 1. Analyze the user's request carefully
+ 2. Select the most appropriate tool for the task
+ 3. Provide clear reasoning for your decisions
+ 4. Handle errors gracefully
+
+ Available actions:
+ - your_action_1: Description and usage
+ - your_action_2: Description and usage
+ - finish: Complete the task
+
+user: |
+ ## Previous Actions
+ {previous_actions}
+
+ ## Current Task
+ User Request: {request}
+
+ ## Available Tools
+ {tool_list}
+
+ ## Instructions
+ Based on the above information:
+ 1. Analyze what needs to be done
+ 2. Select the appropriate action
+ 3. Provide the action parameters
+
+ Respond with:
+ - Thought: Your reasoning
+ - Action: The action to take
+ - Arguments: Parameters for the action
+```
+
+```yaml
+# File: ufo/prompts/third_party/your_agent_example.yaml
+
+example_1: |
+ User Request: [Example request]
+
+ Thought: [Agent's reasoning]
+ Action: your_action_1
+ Arguments:
+ param1: value1
+ param2: value2
+
+example_2: |
+ User Request: [Another example]
+
+ Thought: [Agent's reasoning]
+ Action: finish
+ Arguments:
+ summary: Task completed successfully
+```
+
+**Prompt Design Best Practices**:
+- ✅ **Clear role definition**: Explain what your agent does
+- ✅ **Tool descriptions**: List available actions with usage
+- ✅ **Examples**: Provide concrete examples of interactions
+- ✅ **Error handling**: Include guidance for error scenarios
+- ✅ **Output format**: Specify expected response structure
+
+---
+
+## Complete Example: HardwareAgent
+
+Let's examine the complete implementation of **HardwareAgent** as a reference:
+
+### Agent Class
+
+```python
+# File: ufo/agents/agent/customized_agent.py
+
+@AgentRegistry.register(
+ agent_name="HardwareAgent",
+ third_party=True,
+ processor_cls=HardwareAgentProcessor
+)
+class HardwareAgent(CustomizedAgent):
+ """
+ HardwareAgent is a specialized agent that interacts with hardware components.
+ It extends CustomizedAgent to provide additional functionality specific to hardware.
+
+ Use cases:
+ - Robotic arm control for keyboard/mouse input
+ - USB device plug/unplug automation
+ - Physical hardware testing
+ - Sensor data collection
+ """
+ pass # Inherits all functionality from CustomizedAgent
+```
+
+**Why so simple?**
+- ✅ **Inheritance**: Gets all functionality from `CustomizedAgent`
+- ✅ **Composition**: Custom logic goes in the Processor
+- ✅ **Separation of Concerns**: Agent defines "what", Processor defines "how"
+
+---
+
+### Processor Class
+
+```python
+# File: ufo/agents/processors/customized/customized_agent_processor.py
+
+class HardwareAgentProcessor(CustomizedProcessor):
+ """
+ Processor for Hardware Agent.
+
+ Handles hardware-specific processing logic including:
+ - Visual mode for screenshot understanding
+ - Custom action execution for hardware APIs
+ - Hardware-specific error handling
+ """
+ pass # Uses default strategy setup from CustomizedProcessor
+```
+
+**Default Strategy Setup**:
+```python
+# From CustomizedProcessor._setup_strategies()
+def _setup_strategies(self) -> None:
+ # Data collection with screenshots
+ self.strategies[ProcessingPhase.DATA_COLLECTION] = (
+ CustomizedScreenshotCaptureStrategy(fail_fast=True)
+ )
+
+ # LLM interaction with custom prompts
+ self.strategies[ProcessingPhase.LLM_INTERACTION] = (
+ CustomizedLLMInteractionStrategy(fail_fast=True)
+ )
+
+ # Action execution using standard tools
+ self.strategies[ProcessingPhase.ACTION_EXECUTION] = (
+ AppActionExecutionStrategy(fail_fast=False)
+ )
+
+ # Memory updates
+ self.strategies[ProcessingPhase.MEMORY_UPDATE] = (
+ AppMemoryUpdateStrategy(fail_fast=False)
+ )
+```
+
+---
+
+### Configuration
+
+```yaml
+# File: config/ufo/third_party.yaml
+
+ENABLED_THIRD_PARTY_AGENTS: ["HardwareAgent"]
+
+THIRD_PARTY_AGENT_CONFIG:
+ HardwareAgent:
+ # Enable visual mode for screenshot understanding
+ VISUAL_MODE: True
+
+ # Agent identifier (must match @AgentRegistry.register name)
+ AGENT_NAME: "HardwareAgent"
+
+ # Prompt templates
+ APPAGENT_PROMPT: "ufo/prompts/share/base/app_agent.yaml"
+ APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/examples/visual/app_agent_example.yaml"
+ API_PROMPT: "ufo/prompts/third_party/hardware_agent_api.yaml"
+
+ # Description for LLM context
+ INTRODUCTION: "The HardwareAgent is used to manipulate hardware components of the computer without using GUI, such as robotic arms for keyboard input and mouse control, plug and unplug devices such as USB drives, and other hardware-related tasks."
+```
+
+**Configuration Fields**:
+
+| Field | Required | Description |
+|-------|----------|-------------|
+| `VISUAL_MODE` | Optional | Enable screenshot-based reasoning |
+| `AGENT_NAME` | **Required** | Must match registry name exactly |
+| `APPAGENT_PROMPT` | **Required** | Main prompt template path |
+| `APPAGENT_EXAMPLE_PROMPT` | **Required** | Example prompt template path |
+| `API_PROMPT` | Optional | API/tool description prompt |
+| `INTRODUCTION` | **Required** | Agent description for LLM |
+
+---
+
+## Registering with HostAgent
+
+### How HostAgent Discovers Third-Party Agents
+
+The registration process is automatic through the Agent Registry system:
+
+```python
+# File: ufo/agents/processors/strategies/host_agent_processing_strategy.py
+
+def _register_third_party_agents(
+ self, target_registry: TargetRegistry, start_index: int
+) -> int:
+ """
+ Register enabled third-party agents with HostAgent.
+
+ This method:
+ 1. Reads ENABLED_THIRD_PARTY_AGENTS from config
+ 2. Creates TargetInfo entries for each agent
+ 3. Registers them as selectable targets for the LLM
+ """
+ try:
+ # Get enabled third-party agent names from configuration
+ third_party_agent_names = ufo_config.system.enabled_third_party_agents
+
+ if not third_party_agent_names:
+ self.logger.info("No third-party agents configured")
+ return 0
+
+ # Create third-party agent entries
+ third_party_agent_list = []
+ for i, agent_name in enumerate(third_party_agent_names):
+ agent_id = str(i + start_index + 1) # Unique ID for selection
+ third_party_agent_list.append(
+ TargetInfo(
+ kind=TargetKind.THIRD_PARTY_AGENT.value,
+ id=agent_id,
+ type="ThirdPartyAgent",
+ name=agent_name, # e.g., "HardwareAgent"
+ )
+ )
+
+ # Register third-party agents in target registry
+ target_registry.register(third_party_agent_list)
+
+ return len(third_party_agent_list)
+
+ except Exception as e:
+ self.logger.warning(f"Failed to register third-party agents: {str(e)}")
+ return 0
+```
+
+**Target Registry Flow**:
+
+```
+1. HostAgent starts processing
+ ↓
+2. _register_applications_and_agents() called
+ ↓
+3. _register_third_party_agents() called
+ ↓
+4. Read ENABLED_THIRD_PARTY_AGENTS from config
+ ↓
+5. Create TargetInfo for each agent
+ ↓
+6. Register in TargetRegistry
+ ↓
+7. LLM can now select third-party agents as targets
+```
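+
+The ID arithmetic in `_register_third_party_agents()` can be checked in isolation. The sketch below is a simplified stand-in that uses plain dictionaries instead of `TargetInfo`, and the helper name `assign_third_party_ids` is hypothetical. It assumes `start_index` is the number of targets registered before the third-party agents.
+
+```python
+from typing import Dict, List
+
+
+def assign_third_party_ids(agent_names: List[str], start_index: int) -> List[Dict[str, str]]:
+    """Give each agent the next free ID after the already-registered targets."""
+    return [
+        {"kind": "THIRD_PARTY_AGENT", "id": str(i + start_index + 1), "name": name}
+        for i, name in enumerate(agent_names)
+    ]
+
+
+# Two applications are already registered with IDs "1" and "2".
+targets = assign_third_party_ids(["HardwareAgent", "LinuxAgent"], start_index=2)
+for target in targets:
+    print(target["id"], target["name"])
+```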
+
+### LLM Target Selection
+
+When HostAgent presents targets to the LLM:
+
+```json
+{
+ "available_targets": [
+ {"id": "1", "name": "Microsoft Word", "kind": "APPLICATION"},
+ {"id": "2", "name": "Google Chrome", "kind": "APPLICATION"},
+ {"id": "3", "name": "HardwareAgent", "kind": "THIRD_PARTY_AGENT"},
+ {"id": "4", "name": "LinuxAgent", "kind": "THIRD_PARTY_AGENT"}
+ ]
+}
+```
+
+The LLM selects a target based on the task. In this example, target `3` is HardwareAgent:
+
+```json
+{
+  "thought": "Need to control physical hardware for USB operations",
+  "selected_target": "3",
+  "action": "delegate_to_agent"
+}
+```
+
+### Agent Instantiation
+
+When the LLM selects your agent, HostAgent creates an instance:
+
+```python
+# File: ufo/agents/agent/host_agent.py
+
+@staticmethod
+def create_agent(agent_type: str, *args, **kwargs) -> BasicAgent:
+    """
+    Create an agent based on the given type.
+    """
+    if agent_type == "host":
+        return HostAgent(*args, **kwargs)
+    elif agent_type == "app":
+        return AppAgent(*args, **kwargs)
+    elif agent_type in AgentRegistry.list_agents():
+        # Third-party agents are retrieved from registry
+        return AgentRegistry.get(agent_type)(*args, **kwargs)
+    else:
+        raise ValueError("Invalid agent type: {}".format(agent_type))
+```
+
+**Instantiation Flow**:
+
+```
+1. LLM selects "HardwareAgent"
+ ↓
+2. HostAgent calls create_agent("HardwareAgent")
+ ↓
+3. AgentRegistry.get("HardwareAgent") retrieves class
+ ↓
+4. Class instantiated with config parameters
+ ↓
+5. Agent executes task
+ ↓
+6. Results returned to HostAgent
+```
+
+---
+
+## Configuration and Deployment
+
+### Step 1: Enable Your Agent
+
+Edit `config/ufo/third_party.yaml`:
+
+```yaml
+ENABLED_THIRD_PARTY_AGENTS: ["YourAgent"]
+
+THIRD_PARTY_AGENT_CONFIG:
+ YourAgent:
+ VISUAL_MODE: False # Set True if using screenshots
+ AGENT_NAME: "YourAgent"
+ APPAGENT_PROMPT: "ufo/prompts/third_party/your_agent.yaml"
+ APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/third_party/your_agent_example.yaml"
+ INTRODUCTION: "YourAgent handles [specific tasks] by [method]. Use this agent when you need to [use case]."
+```
+
+**Configuration Checklist**:
+- ✅ Add agent name to `ENABLED_THIRD_PARTY_AGENTS`
+- ✅ Create config block under `THIRD_PARTY_AGENT_CONFIG`
+- ✅ Set `AGENT_NAME` to match registry name
+- ✅ Provide paths to prompt templates
+- ✅ Write clear `INTRODUCTION` for LLM context
+
+---
+
+### Step 2: Add Prompt Templates
+
+Create your prompt files:
+
+```
+ufo/prompts/third_party/
+├── your_agent.yaml # Main prompt template
+└── your_agent_example.yaml # Example interactions
+```
+
+**Template Requirements**:
+- ✅ Define agent role and capabilities
+- ✅ List available actions/tools
+- ✅ Provide clear output format
+- ✅ Include error handling guidance
+- ✅ Add concrete examples
+
+---
+
+### Step 3: Test Configuration
+
+Test that your agent loads correctly:
+
+```python
+# test_your_agent.py
+
+from config.config_loader import get_ufo_config
+from ufo.agents.agent.basic import AgentRegistry
+
+def test_agent_registration():
+ """Test that YourAgent is registered correctly."""
+ config = get_ufo_config()
+
+ # Check if agent is enabled
+ assert "YourAgent" in config.system.enabled_third_party_agents
+ print("✅ Agent is enabled in config")
+
+ # Check if agent is registered
+ registered_agents = AgentRegistry.list_agents()
+ assert "YourAgent" in registered_agents
+ print("✅ Agent is registered in AgentRegistry")
+
+ # Test agent instantiation
+ agent_cls = AgentRegistry.get("YourAgent")
+ agent_config = config.system.third_party_agent_config["YourAgent"]
+
+ agent = agent_cls(
+ name="test_agent",
+ main_prompt=agent_config["APPAGENT_PROMPT"],
+ example_prompt=agent_config["APPAGENT_EXAMPLE_PROMPT"],
+ )
+ print(f"✅ Agent instantiated: {agent}")
+
+ # Check processor
+ assert hasattr(agent, "_processor_cls")
+ print(f"✅ Processor registered: {agent._processor_cls}")
+
+if __name__ == "__main__":
+ test_agent_registration()
+```
+
+Run the test:
+```powershell
+python test_your_agent.py
+```
+
+---
+
+### Step 4: Integration Testing
+
+Test your agent in a full UFO² session:
+
+```python
+# integration_test.py
+
+from ufo.agents.agent.host_agent import HostAgent
+from config.config_loader import get_ufo_config
+
+def test_agent_selection():
+ """Test that HostAgent can discover and select YourAgent."""
+ config = get_ufo_config()
+
+ # Create HostAgent
+ host_agent = HostAgent(
+ name="host",
+ is_visual=True,
+ main_prompt="ufo/prompts/share/base/host_agent.yaml",
+ example_prompt="ufo/prompts/examples/visual/host_agent_example.yaml",
+ api_prompt="ufo/prompts/share/base/api.yaml",
+ )
+
+ # Verify third-party agents are in target registry
+ # (This happens during HostAgent processing)
+ print("✅ HostAgent created successfully")
+ print(f"Enabled third-party agents: {config.system.enabled_third_party_agents}")
+
+if __name__ == "__main__":
+ test_agent_selection()
+```
+
+---
+
+## Best Practices
+
+### Code Organization
+
+```
+ufo/
+├── agents/
+│ ├── agent/
+│ │ └── customized_agent.py # Agent classes
+│ └── processors/
+│ ├── customized/
+│ │ └── customized_agent_processor.py # Processors
+│ └── strategies/
+│ └── your_agent_strategy.py # Custom strategies
+├── prompts/
+│ └── third_party/
+│ ├── your_agent.yaml # Main prompt
+│ └── your_agent_example.yaml # Examples
+config/
+└── ufo/
+ └── third_party.yaml # Configuration
+```
+
+**Organization Guidelines**:
+- ✅ **Agent classes** → `ufo/agents/agent/customized_agent.py`
+- ✅ **Processors** → `ufo/agents/processors/customized/`
+- ✅ **Custom strategies** → `ufo/agents/processors/strategies/`
+- ✅ **Prompts** → `ufo/prompts/third_party/`
+- ✅ **Configuration** → `config/ufo/third_party.yaml`
+
+---
+
+### Naming Conventions
+
+| Component | Naming Pattern | Example |
+|-----------|----------------|---------|
+| Agent Class | `{Name}Agent` | `HardwareAgent`, `WebAgent` |
+| Processor Class | `{Name}AgentProcessor` | `HardwareAgentProcessor` |
+| Strategy Class | `{Name}{Phase}Strategy` | `HardwareActionExecutionStrategy` |
+| Registry Name | Same as the agent class name | `"HardwareAgent"` |
+| Config Key | Same as registry name | `HardwareAgent:` |
+
+---
+
+### Error Handling
+
+Implement robust error handling in your strategies:
+
+```python
+async def execute(self, agent, context) -> ProcessingResult:
+    try:
+        # Main execution logic
+        result = await self._do_work(agent, context)
+
+        return ProcessingResult(
+            success=True,
+            data=result,
+            error=None
+        )
+
+    except SpecificError as e:
+        # Handle expected errors gracefully
+        self.logger.warning(f"Expected error: {e}")
+        return ProcessingResult(
+            success=False,
+            data={"partial_result": "..."},
+            error=f"Recoverable error: {str(e)}"
+        )
+
+    except Exception as e:
+        # Log unexpected errors
+        self.logger.error(f"Unexpected error: {e}", exc_info=True)
+
+        if self.fail_fast:
+            raise  # Re-raise if configured to fail fast
+
+        return ProcessingResult(
+            success=False,
+            data={},
+            error=f"Fatal error: {str(e)}"
+        )
+```
+
+**Error Handling Guidelines**:
+- ✅ Use `ProcessingResult` to communicate outcomes
+- ✅ Log errors at appropriate levels (warning/error)
+- ✅ Respect `fail_fast` setting
+- ✅ Provide actionable error messages
+- ✅ Return partial results when possible
+
+---
+
+### Logging
+
+Use structured logging throughout your agent:
+
+```python
+import logging
+
+class YourAgentProcessor(CustomizedProcessor):
+    def __init__(self, agent, global_context):
+        super().__init__(agent, global_context)
+        self.logger = logging.getLogger(__name__)
+
+    async def execute(self, agent, context):
+        # Info: Normal operation flow
+        self.logger.info(f"Processing task: {context.get_local('task')}")
+
+        # Debug: Detailed debugging info
+        self.logger.debug(f"Context state: {context.get_all_local()}")
+
+        # Warning: Recoverable issues
+        self.logger.warning("Retrying action after failure")
+
+        try:
+            await self._do_work(agent, context)  # Placeholder for your own logic
+        except Exception as error:
+            # Error: Serious problems
+            self.logger.error(f"Action failed: {error}", exc_info=True)
+            raise
+```
+
+**Logging Best Practices**:
+- ✅ Use `self.logger` from base class
+- ✅ Log at appropriate levels (debug/info/warning/error)
+- ✅ Include context in log messages
+- ✅ Use `exc_info=True` for exceptions
+- ✅ Avoid logging sensitive data
+
+---
+
+### Testing
+
+Create comprehensive tests for your agent:
+
+```python
+# tests/test_your_agent.py
+
+import pytest
+from ufo.agents.agent.customized_agent import YourAgent
+from ufo.agents.processors.customized.customized_agent_processor import (
+    YourAgentProcessor
+)
+# NOTE: also import ProcessingPhase from the module that defines it in your
+# UFO version, and provide a `mock_context` fixture (for example in conftest.py).
+
+class TestYourAgent:
+    @pytest.fixture
+    def agent(self):
+        """Create test agent instance."""
+        return YourAgent(
+            name="test_agent",
+            main_prompt="ufo/prompts/third_party/your_agent.yaml",
+            example_prompt="ufo/prompts/third_party/your_agent_example.yaml",
+        )
+
+    def test_agent_initialization(self, agent):
+        """Test agent initializes correctly."""
+        assert agent.name == "test_agent"
+        assert agent.prompter is not None
+
+    def test_processor_registration(self, agent):
+        """Test processor is registered."""
+        assert hasattr(agent, "_processor_cls")
+        assert agent._processor_cls == YourAgentProcessor
+
+    @pytest.mark.asyncio
+    async def test_action_execution(self, agent, mock_context):
+        """Test action execution logic."""
+        processor = YourAgentProcessor(agent, mock_context)
+        result = await processor.execute_phase(
+            ProcessingPhase.ACTION_EXECUTION,
+            agent,
+            mock_context
+        )
+        assert result.success
+```
+
+**Test Coverage Checklist**:
+- ✅ Agent initialization
+- ✅ Processor registration
+- ✅ Strategy execution
+- ✅ Error handling
+- ✅ Configuration loading
+- ✅ Integration with HostAgent
+
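The test above takes a `mock_context` argument that the snippet does not define. One way to supply it is a stand-in built with the standard library's `unittest.mock`. The sketch below is an assumption-heavy illustration: `make_mock_context` is a hypothetical helper, and it only mimics the two context methods used earlier in this guide (`get_local` and `get_all_local`). The real context class may expose more methods or different signatures.

```python
from unittest.mock import MagicMock


def make_mock_context(local_values=None):
    """Return a MagicMock that behaves like a minimal processing context.

    Hypothetical helper: only `get_local` and `get_all_local` are backed
    by real data. Any other attribute access returns a plain MagicMock.
    """
    values = dict(local_values or {})
    context = MagicMock(name="mock_context")
    # `default` is an assumed parameter, added for test convenience.
    context.get_local.side_effect = lambda key, default=None: values.get(key, default)
    context.get_all_local.side_effect = lambda: dict(values)
    return context


ctx = make_mock_context({"task": "demo task"})
print(ctx.get_local("task"))
```

In a pytest suite this helper could be wrapped in a fixture named `mock_context` inside `conftest.py`, so that every test receives a fresh instance.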
+---
+
+## Troubleshooting
+
+### Issue 1: Agent Not Registered
+
+!!!bug "Error Message"
+ ```
+ ValueError: No agent class registered under 'YourAgent'
+ ```
+
+ **Diagnosis**: Agent is not enabled in configuration or decorator is missing.
+
+ **Solutions**:
+
+ 1. Check configuration:
+ ```yaml
+ # config/ufo/third_party.yaml
+ ENABLED_THIRD_PARTY_AGENTS: ["YourAgent"] # ← Must include your agent
+ ```
+
+ 2. Verify decorator:
+ ```python
+ @AgentRegistry.register(
+ agent_name="YourAgent", # ← Must match config
+ third_party=True, # ← Must be True
+ processor_cls=YourAgentProcessor
+ )
+ class YourAgent(CustomizedAgent):
+ pass
+ ```
+
+ 3. Check import:
+ ```python
+ # Ensure your agent module is imported
+ # In ufo/agents/agent/__init__.py or customized_agent.py
+ from ufo.agents.agent.customized_agent import YourAgent
+ ```
+
+---
+
+### Issue 2: Prompt Files Not Found
+
+!!!bug "Error Message"
+ ```
+ FileNotFoundError: ufo/prompts/third_party/your_agent.yaml
+ ```
+
+ **Diagnosis**: Prompt template files don't exist or paths are incorrect.
+
+ **Solutions**:
+
+ 1. Create prompt files:
+ ```powershell
+ # Create directory if needed
+ New-Item -ItemType Directory -Force -Path "ufo\prompts\third_party"
+
+ # Create prompt files
+ New-Item -ItemType File -Path "ufo\prompts\third_party\your_agent.yaml"
+ New-Item -ItemType File -Path "ufo\prompts\third_party\your_agent_example.yaml"
+ ```
+
+ 2. Verify paths in configuration:
+ ```yaml
+ THIRD_PARTY_AGENT_CONFIG:
+ YourAgent:
+ APPAGENT_PROMPT: "ufo/prompts/third_party/your_agent.yaml"
+ APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/third_party/your_agent_example.yaml"
+ ```
+
+ 3. Check that the files exist:
+ ```powershell
+ # Returns True when the file exists (this does not check read permissions)
+ Test-Path "ufo\prompts\third_party\your_agent.yaml"
+ ```
+
+---
+
+### Issue 3: Agent Not Appearing in Target List
+
+!!!bug "Symptom"
+ HostAgent doesn't show your third-party agent as a selectable target.
+
+ **Diagnosis**: Agent is registered but not appearing in TargetRegistry.
+
+ **Solutions**:
+
+ 1. Check enabled agents:
+ ```python
+ from config.config_loader import get_ufo_config
+ config = get_ufo_config()
+ print(config.system.enabled_third_party_agents)
+ # Should include "YourAgent"
+ ```
+
+ 2. Verify TargetKind:
+ ```python
+ # In your registration code
+ TargetInfo(
+ kind=TargetKind.THIRD_PARTY_AGENT.value, # ← Correct kind
+ name=agent_name,
+ )
+ ```
+
+ 3. Check HostAgent logs:
+ ```
+ [INFO] Registered 2 third-party agents
+ ```
+
+ 4. Test target registry directly:
+ ```python
+ from ufo.agents.processors.schemas.target import TargetRegistry, TargetKind
+ registry = TargetRegistry()
+ targets = registry.get_by_kind(TargetKind.THIRD_PARTY_AGENT)
+ print(targets) # Should include your agent
+ ```
+
+---
+
+### Issue 4: Processor Not Executing
+
+!!!bug "Symptom"
+ Agent instantiates but processor strategies don't execute.
+
+ **Diagnosis**: Processor class not properly linked or strategies not set up.
+
+ **Solutions**:
+
+ 1. Verify processor_cls in decorator:
+ ```python
+ @AgentRegistry.register(
+ agent_name="YourAgent",
+ third_party=True,
+ processor_cls=YourAgentProcessor # ← Must be specified
+ )
+ ```
+
+ 2. Check processor initialization:
+ ```python
+ class YourAgentProcessor(CustomizedProcessor):
+ def __init__(self, agent, global_context):
+ super().__init__(agent, global_context) # ← Must call super
+ # Your custom init
+ ```
+
+ 3. Verify strategy setup:
+ ```python
+ def _setup_strategies(self) -> None:
+ # Must populate self.strategies dict
+ self.strategies[ProcessingPhase.LLM_INTERACTION] = ...
+ ```
+
+ 4. Check processor is created:
+ ```python
+ # In your test
+ assert hasattr(agent, "_processor_cls")
+ processor = agent._processor_cls(agent, global_context)
+ assert processor is not None
+ ```
+
+---
+
+### Issue 5: LLM Not Selecting Your Agent
+
+!!!bug "Symptom"
+ Agent is registered but LLM never selects it.
+
+ **Diagnosis**: Agent description unclear or not suitable for user requests.
+
+ **Solutions**:
+
+ 1. Improve `INTRODUCTION`:
+ ```yaml
+ INTRODUCTION: "Use YourAgent when you need to [clear use case]. It provides [specific capabilities] through [method]. Examples: [concrete examples]."
+ ```
+
+ 2. Add clear examples in prompt:
+ ```yaml
+ # your_agent_example.yaml
+ example_1: |
+ User: [Clear example request]
+ Agent: [Clear example response]
+ ```
+
+ 3. Test with explicit requests:
+ ```python
+ # Test with request that clearly needs your agent
+ user_request = "Use YourAgent to [specific task]"
+ ```
+
+ 4. Check HostAgent prompt includes your agent:
+ ```
+ Available targets:
+ - YourAgent: [Your INTRODUCTION text should appear here]
+ ```
+
+---
+
+## Advanced Topics
+
+### Multi-MCP Integration
+
+Integrate multiple MCP servers with your agent:
+
+```yaml
+# config/ufo/agent_mcp.yaml
+
+YourAgent:
+  mcp_servers:
+    hardware_control:
+      type: "local"
+      module: "your_package.hardware_mcp"
+      config:
+        device_port: "/dev/ttyUSB0"
+
+    data_collection:
+      type: "http"
+      url: "http://localhost:8080/mcp"
+      config:
+        api_key: "${SENSOR_API_KEY}"
+```
+
+See [Creating Custom MCP Servers](./creating_mcp_servers.md) for details.
+
+---
+
+### State Management
+
+Maintain agent state across invocations:
+
+```python
+from typing import Any
+
+from ufo.agents.memory.blackboard import Blackboard
+
+class YourAgent(CustomizedAgent):
+    def __init__(self, name, main_prompt, example_prompt):
+        super().__init__(name, main_prompt, example_prompt)
+
+        # Use blackboard for persistent state
+        self._blackboard = Blackboard()
+
+    @property
+    def blackboard(self) -> Blackboard:
+        return self._blackboard
+
+    def save_state(self, key: str, value: Any):
+        """Save state to blackboard."""
+        self.blackboard.add_entry(key, value)
+
+    def load_state(self, key: str) -> Any:
+        """Load state from blackboard."""
+        return self.blackboard.get_entry(key)
+```
+
+---
+
+### Custom Prompter
+
+Create a custom prompter for specialized LLM interactions:
+
+```python
+from typing import Dict, List
+
+from ufo.prompter.app_prompter import AppPrompter
+
+class YourAgentPrompter(AppPrompter):
+    """Custom prompter for YourAgent."""
+
+    def user_content_construction(
+        self,
+        prev_plan: List[str],
+        user_request: str,
+        retrieved_docs: str,
+        last_success_actions: List[Dict],
+        **kwargs
+    ) -> List[Dict[str, str]]:
+        """
+        Construct custom user message content.
+        """
+        # Add custom context (_build_custom_context is a helper you implement yourself)
+        custom_context = self._build_custom_context(**kwargs)
+
+        # Call parent method
+        base_content = super().user_content_construction(
+            prev_plan=prev_plan,
+            user_request=user_request,
+            retrieved_docs=retrieved_docs,
+            last_success_actions=last_success_actions
+        )
+
+        # Insert custom content at the start of the message
+        base_content.insert(0, {
+            "type": "text",
+            "text": custom_context
+        })
+
+        return base_content
+```
+
+Use custom prompter in your agent:
+
+```python
+class YourAgent(CustomizedAgent):
+    def get_prompter(self, is_visual, main_prompt, example_prompt):
+        return YourAgentPrompter(main_prompt, example_prompt)
+```
+
+---
+
+## Related Documentation
+
+- **[Third-Party Agent Configuration](../configuration/system/third_party_config.md)** - Configuration reference
+- **[Agent Configuration](../configuration/system/agents_config.md)** - Core agent LLM settings
+- **[Creating Custom MCP Servers](./creating_mcp_servers.md)** - MCP server development for custom tools
+- **[Agent Architecture](../infrastructure/agents/overview.md)** - Understanding agent design patterns
+- **[HostAgent Strategy](../ufo2/host_agent/strategy.md)** - Learn how HostAgent orchestrates third-party agents
+- **[AppAgent Strategy](../ufo2/app_agent/strategy.md)** - Processing strategies reference
+
+---
+
+## Summary
+
+**Key Takeaways:**
+
+- ✅ **Third-party agents extend UFO²** with specialized capabilities
+- ✅ **Use `@AgentRegistry.register()`** to register your agent
+- ✅ **Create processor classes** to define processing logic
+- ✅ **Configure in third_party.yaml** to enable your agent
+- ✅ **HostAgent automatically discovers** enabled third-party agents
+- ✅ **LLM selects agents** based on task requirements
+- ✅ **Follow HardwareAgent** as a reference implementation
+
+**Build powerful third-party agents to extend UFO²!** 🚀
+
+---
+
+## Quick Reference
+
+### Minimal Agent Implementation
+
+```python
+# 1. Processor class (defined first so the decorator below can reference it)
+class MyProcessor(CustomizedProcessor):
+    pass  # Use default strategies
+
+# 2. Agent class
+@AgentRegistry.register(
+    agent_name="MyAgent", third_party=True, processor_cls=MyProcessor
+)
+class MyAgent(CustomizedAgent):
+    pass
+
+# 3. Configuration (YAML, in config/ufo/third_party.yaml)
+# ENABLED_THIRD_PARTY_AGENTS: ["MyAgent"]
+# THIRD_PARTY_AGENT_CONFIG:
+#   MyAgent:
+#     AGENT_NAME: "MyAgent"
+#     APPAGENT_PROMPT: "ufo/prompts/third_party/my_agent.yaml"
+#     APPAGENT_EXAMPLE_PROMPT: "ufo/prompts/third_party/my_agent_example.yaml"
+#     INTRODUCTION: "MyAgent handles [tasks]."
+
+# 4. Prompt templates
+# Create: ufo/prompts/third_party/my_agent.yaml
+# Create: ufo/prompts/third_party/my_agent_example.yaml
+```
+
+**That's all you need to get started!** 🎉
diff --git a/documents/docs/ufo2/advanced_usage/batch_mode.md b/documents/docs/ufo2/advanced_usage/batch_mode.md
new file mode 100644
index 000000000..2b57cef7e
--- /dev/null
+++ b/documents/docs/ufo2/advanced_usage/batch_mode.md
@@ -0,0 +1,84 @@
+# Batch Mode
+
+Batch mode allows automated execution of tasks on specific applications or files using predefined plan files. This mode is particularly useful for repetitive tasks on Microsoft Office applications (Word, Excel, PowerPoint).
+
+## Quick Start
+
+### Step 1: Create a Plan File
+
+Create a JSON plan file that defines the task to be automated. The plan file should contain the following fields:
+
+| Field | Description | Type |
+| ------ | -------------------------------------------------------------------------------------------- | ------- |
+| task | The task description. | String |
+| object | The application or file to interact with. | String |
+| close | Determines whether to close the corresponding application or file after completing the task. | Boolean |
+
+Example plan file:
+
+```json
+{
+ "task": "Type in a text of 'Test For Fun' with heading 1 level",
+ "object": "draft.docx",
+ "close": false
+}
+```
+
+**Important:** The `close` field must be a JSON boolean (`true` or `false`, lowercase). Python-style booleans (`True` or `False`) are not valid JSON.
+
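The field table and the note about JSON booleans can be checked before a run with a short validator. This is a minimal sketch, not UFO's own validation logic: `validate_batch_plan` is a hypothetical helper, and only the three field names and types are taken from the table above.

```python
import json

# Field names and types come from the plan-file table above.
REQUIRED_FIELDS = {"task": str, "object": str, "close": bool}


def validate_batch_plan(text: str) -> dict:
    """Parse plan JSON and check the three documented fields.

    Hypothetical helper. Raises ValueError on any problem;
    json.JSONDecodeError is itself a subclass of ValueError.
    """
    plan = json.loads(text)  # Python-style True/False fails to parse here
    if not isinstance(plan, dict):
        raise ValueError("plan must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in plan:
            raise ValueError(f"missing field: {field!r}")
        if not isinstance(plan[field], expected_type):
            raise ValueError(
                f"field {field!r} must be of type {expected_type.__name__}"
            )
    return plan


plan = validate_batch_plan(
    '{"task": "Type a heading", "object": "draft.docx", "close": false}'
)
print(plan["object"])
```

Running a check like this first turns a mistyped `close` value into an immediate, readable error instead of a failure partway through a session.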
+The file structure should be organized as follows:
+
+```
+Parent/
+├── tasks/
+│ └── plan.json
+└── files/
+ └── draft.docx
+```
+
+The `object` field in the plan file refers to files in the `files` directory. The plan reader will automatically resolve the full file path by replacing `tasks` with `files` in the directory structure.
+
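The path resolution described above can be sketched as follows. This mirrors only the documented behaviour (swap `tasks` for `files`, then append the `object` name). `resolve_object_path` is a hypothetical function, and the actual `PlanReader` may implement the replacement differently, for example as a plain string substitution.

```python
from pathlib import PurePosixPath


def resolve_object_path(plan_file: str, object_name: str) -> str:
    """Map <root>/tasks/plan.json plus an object name to <root>/files/<object>.

    Hypothetical sketch. Assumes forward-slash paths such as
    C:/Parent/tasks/plan.json and replaces every directory component
    that is exactly "tasks".
    """
    plan_dir = PurePosixPath(plan_file).parent
    parts = ["files" if part == "tasks" else part for part in plan_dir.parts]
    return str(PurePosixPath(*parts, object_name))


print(resolve_object_path("C:/Parent/tasks/plan.json", "draft.docx"))
```

Replacing whole path components rather than substrings avoids touching directory names that merely contain the word `tasks`.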
+### Step 2: Start Batch Mode
+
+Run the following command to start batch mode:
+
+```bash
+# Assume you are in the cloned UFO folder
+python -m ufo --task {task_name} --mode batch_normal --plan {plan_file}
+```
+
+**Parameters:**
+- `{task_name}`: Name for this task execution (used for logging)
+- `{plan_file}`: Full path to the plan JSON file (e.g., `C:/Parent/tasks/plan.json`)
+
+### Supported Applications
+
+Batch mode currently supports the following Microsoft Office applications:
+
+- **Word** (`.docx` files) - `WINWORD.EXE`
+- **Excel** (`.xlsx` files) - `EXCEL.EXE`
+- **PowerPoint** (`.pptx` files) - `POWERPNT.EXE`
+
+The application will be automatically launched when the batch mode starts, and the specified file will be opened and maximized.
+
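The mapping between file types and applications listed above can be expressed as a lookup table. The sketch below is illustrative: `app_for_object` is a hypothetical helper, and only the three extension/executable pairs come from this page.

```python
from pathlib import PurePath

# Extension-to-executable pairs from the "Supported Applications" list.
APP_BY_EXTENSION = {
    ".docx": "WINWORD.EXE",
    ".xlsx": "EXCEL.EXE",
    ".pptx": "POWERPNT.EXE",
}


def app_for_object(object_name: str) -> str:
    """Return the Office executable expected to open `object_name`.

    Hypothetical helper: raises ValueError for unsupported file types.
    """
    suffix = PurePath(object_name).suffix.lower()
    try:
        return APP_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix or object_name!r}") from None


print(app_for_object("draft.docx"))
```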
+## Evaluation
+
+UFO can automatically evaluate whether the task was completed successfully. To enable evaluation, ensure `EVA_SESSION` is set to `True` in the `config/ufo/system.yaml` file.
+
+Check the evaluation results in `logs/{task_name}/evaluation.log`.
+
+## References
+
+The batch mode uses a `PlanReader` to parse the plan file and creates a `FromFileSession` to execute the plan.
+
+### PlanReader
+
+The `PlanReader` is located at `ufo/module/sessions/plan_reader.py`.
+
+:::module.sessions.plan_reader.PlanReader
+
+### FromFileSession
+
+The `FromFileSession` is located at `ufo/module/sessions/session.py`.
+
+:::module.sessions.session.FromFileSession
\ No newline at end of file
diff --git a/documents/docs/ufo2/advanced_usage/customization.md b/documents/docs/ufo2/advanced_usage/customization.md
new file mode 100644
index 000000000..ec25d3f26
--- /dev/null
+++ b/documents/docs/ufo2/advanced_usage/customization.md
@@ -0,0 +1,37 @@
+# Customization
+
+UFO can ask users for additional context or information when needed and save it in local memory for future reference. This customization feature enables a more personalized user experience by remembering user-specific information across sessions.
+
+## Example Scenario
+
+Consider a task where UFO needs to book a cab. To complete this task, UFO requires the user's address. UFO will:
+
+1. Ask the user for their address
+2. Save the address in local memory
+3. Use the saved address automatically in future tasks that require it
+
+This eliminates the need to repeatedly provide the same information.
+
+## How It Works
+
+The customization feature is implemented across multiple agent types (`HostAgent`, `AppAgent`, and `OpenAIOperatorAgent`). When an agent needs additional information:
+
+1. The agent transitions to the `PENDING` state
+2. The agent asks the user for the required information (if `ASK_QUESTION` is enabled)
+3. The user's response is stored on the `blackboard` and written to the QA pairs file
+4. All agents in the session can access this information from the shared `blackboard`
+
+The saved QA pairs are stored locally as JSON lines in the file specified by `QA_PAIR_FILE`. Privacy is preserved as this information never leaves the local machine.
+
+## Configuration
+
+Configure the customization feature in `config/ufo/system.yaml`:
+
+| Configuration Option | Description | Type | Default Value |
+|------------------------|------------------------------------------------------------------|---------|---------------------------------------|
+| `ASK_QUESTION` | Whether to allow agents to ask users questions | Boolean | False |
+| `USE_CUSTOMIZATION` | Whether to load and use saved QA pairs from previous sessions | Boolean | False |
+| `QA_PAIR_FILE` | Path to the file storing historical QA pairs | String | "customization/global_memory.jsonl" |
+| `QA_PAIR_NUM` | Maximum number of recent QA pairs to load into memory | Integer | 20 |
+
+**Note:** Both `ASK_QUESTION` and `USE_CUSTOMIZATION` need to be enabled for the full customization experience. `ASK_QUESTION` controls whether agents can prompt users for information, while `USE_CUSTOMIZATION` controls whether previously saved information is loaded.
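
To make the roles of `QA_PAIR_FILE` and `QA_PAIR_NUM` concrete, the following sketch loads the most recent entries from a JSON Lines file. It is an illustration under stated assumptions, not UFO's loader: `load_recent_qa_pairs` is a hypothetical function, and the `question`/`answer` keys in the usage example are assumed, since this page does not specify the record schema.

```python
import json
import os
from collections import deque


def load_recent_qa_pairs(path: str, limit: int = 20) -> list:
    """Return up to `limit` of the most recent records from a JSONL file.

    Hypothetical helper: one JSON object per line, oldest first.
    A missing file yields an empty list.
    """
    if not os.path.exists(path):
        return []
    recent = deque(maxlen=limit)  # silently discards the oldest entries
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                recent.append(json.loads(line))
    return list(recent)


pairs = load_recent_qa_pairs("customization/global_memory.jsonl", limit=20)
print(len(pairs))
```

A bounded `deque` keeps memory use constant however large the history file grows, which matches the intent of capping the number of loaded pairs.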
diff --git a/documents/docs/ufo2/advanced_usage/follower_mode.md b/documents/docs/ufo2/advanced_usage/follower_mode.md
new file mode 100644
index 000000000..3fd6fe85a
--- /dev/null
+++ b/documents/docs/ufo2/advanced_usage/follower_mode.md
@@ -0,0 +1,84 @@
+# Follower Mode
+
+Follower mode enables UFO to execute a predefined list of steps in natural language. Unlike normal mode where the agent generates its own plan, follower mode creates an `AppAgent` that follows user-provided steps to interact with applications. This mode is particularly useful for debugging, software testing, and verification.
+
+## Quick Start
+
+### Step 1: Create a Plan File
+
+Create a JSON plan file containing the steps for the agent to follow:
+
+| Field | Description | Type |
+| --- | --- | --- |
+| task | The task description. | String |
+| steps | The list of steps for the agent to follow. | List of Strings |
+| object | The application or file to interact with. | String |
+
+Example plan file:
+
+```json
+{
+ "task": "Type in a text of 'Test For Fun' with heading 1 level",
+ "steps":
+ [
+ "1.type in 'Test For Fun'",
+ "2.Select the 'Test For Fun' text",
+ "3.Click 'Home' tab to show the 'Styles' ribbon tab",
+ "4.Click 'Styles' ribbon tab to show the style 'Heading 1'",
+ "5.Click 'Heading 1' style to apply the style to the selected text"
+ ],
+ "object": "draft.docx"
+}
+```
+
+The `object` field specifies the application or file the agent will interact with. This object should be opened and accessible before starting follower mode.
+
+### Step 2: Start Follower Mode
+
+Run the following command:
+
+```bash
+# Assume you are in the cloned UFO folder
+python -m ufo --task {task_name} --mode follower --plan {plan_file}
+```
+
+**Parameters:**
+- `{task_name}`: Name for this task execution (used for logging)
+- `{plan_file}`: Path to the plan JSON file
+
+### Step 3: Run in Batch (Optional)
+
+To execute multiple plan files sequentially, provide a folder containing multiple plan files:
+
+```bash
+# Assume you are in the cloned UFO folder
+python -m ufo --task {task_name} --mode follower --plan {plan_folder}
+```
+
+UFO will automatically detect and execute all plan files in the folder sequentially.
+
+**Parameters:**
+- `{task_name}`: Name for this batch execution (used for logging)
+- `{plan_folder}`: Path to the folder containing plan JSON files
+
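The single-file and folder cases for `--plan` can be pictured with a small helper. This is only a sketch of the idea: `collect_plan_files` is a hypothetical function, and both the `.json` filter and the alphabetical ordering are assumptions, since this page does not state how UFO orders the files in a folder.

```python
from pathlib import Path
from typing import List


def collect_plan_files(plan: str) -> List[Path]:
    """Return the plan files to run for a file path or a folder path.

    Hypothetical helper: a folder expands to its .json files in
    alphabetical order, and a file path is returned as a one-item list.
    """
    path = Path(plan)
    if path.is_dir():
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() == ".json"
        )
    return [path]


for plan_path in collect_plan_files("."):
    print(plan_path.name)
```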
+## Evaluation
+
+UFO can automatically evaluate task completion. To enable evaluation, ensure `EVA_SESSION` is set to `True` in `config/ufo/system.yaml`.
+
+Check the evaluation results in `logs/{task_name}/evaluation.log`.
+
+## References
+
+Follower mode uses a `PlanReader` to parse the plan file and creates a `FollowerSession` to execute the steps.
+
+### PlanReader
+
+The `PlanReader` is located at `ufo/module/sessions/plan_reader.py`.
+
+:::module.sessions.plan_reader.PlanReader
+
+### FollowerSession
+
+The `FollowerSession` is located at `ufo/module/sessions/session.py`.
+
+:::module.sessions.session.FollowerSession
\ No newline at end of file
diff --git a/documents/docs/ufo2/advanced_usage/operator_as_app_agent.md b/documents/docs/ufo2/advanced_usage/operator_as_app_agent.md
new file mode 100644
index 000000000..6c439168e
--- /dev/null
+++ b/documents/docs/ufo2/advanced_usage/operator_as_app_agent.md
@@ -0,0 +1,52 @@
+# Operator as an AppAgent
+
+UFO² supports wrapping third-party agents as AppAgents, enabling them to be orchestrated by the HostAgent in multi-agent workflows. This guide demonstrates how to run **Operator**, OpenAI's Computer-Using Agent (CUA), within the UFO² ecosystem.
+
+
+
+## Prerequisites
+
+Before proceeding, ensure that Operator has been properly configured. Follow the setup instructions in the [OpenAI CUA (Operator) guide](../../configuration/models/operator.md).
+
+## Running the Operator
+
+UFO² provides two modes for running Operator:
+
+1. **Single Agent Mode (`operator`)** — Run Operator independently through UFO² as a launcher
+2. **AppAgent Mode (`normal_operator`)** — Run Operator as an `AppAgent` orchestrated by the `HostAgent`
+
+### Single Agent Mode
+
+In single agent mode, Operator functions independently but is launched through UFO². This mode is useful for debugging or quick prototyping.
+
+```powershell
+python -m ufo --mode operator --task {task_name} --request "{request}"
+```
+
+**Example:**
+```powershell
+python -m ufo --mode operator --task test_operator --request "Open Notepad and type Hello World"
+```
+
+### AppAgent Mode
+
+In AppAgent mode, Operator is wrapped as an `AppAgent` and can be triggered as a sub-agent within the HostAgent workflow. This enables task decomposition where the HostAgent coordinates multiple agents including Operator.
+
+```powershell
+python -m ufo --mode normal_operator --task {task_name} --request "{request}"
+```
+
+**Example:**
+```powershell
+python -m ufo --mode normal_operator --task test_integration --request "Search for Python documentation and open the first result"
+```
+
+## Logs
+
+In both modes, execution logs are saved in:
+
+```
+logs/{task_name}/
+```
+
+These logs follow the same structure and conventions as other UFO² sessions.
\ No newline at end of file
diff --git a/documents/docs/ufo2/app_agent/commands.md b/documents/docs/ufo2/app_agent/commands.md
new file mode 100644
index 000000000..fc35466eb
--- /dev/null
+++ b/documents/docs/ufo2/app_agent/commands.md
@@ -0,0 +1,299 @@
+# AppAgent Command System
+
+AppAgent executes application-level commands through the **MCP (Model Context Protocol)** system. Commands are dynamically provided by MCP servers and executed through the `CommandDispatcher` interface. This document describes the MCP configuration for AppAgent commands.
+
+---
+
+## Command Execution Architecture
+
+```mermaid
+graph LR
+ Agent[AppAgent] --> Dispatcher[CommandDispatcher]
+ Dispatcher --> MCPClient[MCP Client]
+ MCPClient --> UICollector[UICollector Server]
+ MCPClient --> AppUIExecutor[AppUIExecutor Server]
+ MCPClient --> COMExecutor[COM Executor Servers]
+ MCPClient --> CLIExecutor[CommandLine Executor]
+
+ UICollector --> DataCollection[Data Collection Commands]
+ AppUIExecutor --> UIActions[UI Automation Commands]
+ COMExecutor --> APIActions[Application API Commands]
+ CLIExecutor --> ShellActions[Shell Commands]
+
+ style Agent fill:#e3f2fd
+ style Dispatcher fill:#fff3e0
+ style MCPClient fill:#f1f8e9
+ style UICollector fill:#c8e6c9
+ style AppUIExecutor fill:#fff9c4
+ style COMExecutor fill:#ffccbc
+ style CLIExecutor fill:#d1c4e9
+```
+
+!!!note "Dynamic Commands"
+ AppAgent commands are **not hardcoded**. They are dynamically discovered from configured MCP servers. The available commands depend on:
+
+ - **MCP server configuration** in `config/ufo/mcp.yaml`
+ - **Application context** (e.g., Word, Excel, PowerPoint)
+ - **Installed MCP servers** (local, HTTP, or stdio)
+
+---
+
+## MCP Server Configuration
+
+### Configuration File
+
+AppAgent commands are configured in **`config/ufo/mcp.yaml`**:
+
+```yaml
+# Default configuration for all applications
+AppAgent:
+  default:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        start_args: []
+        reset: false
+    action:
+      - namespace: AppUIExecutor
+        type: local
+        start_args: []
+        reset: false
+      - namespace: CommandLineExecutor
+        type: local
+        start_args: []
+        reset: false
+
+  # Application-specific configurations
+  WINWORD.EXE:
+    action:
+      - namespace: AppUIExecutor
+        type: local
+      - namespace: WordCOMExecutor
+        type: local
+        reset: true  # Reset on document switch
+
+  EXCEL.EXE:
+    action:
+      - namespace: AppUIExecutor
+        type: local
+      - namespace: ExcelCOMExecutor
+        type: local
+        reset: true
+
+  POWERPNT.EXE:
+    action:
+      - namespace: AppUIExecutor
+        type: local
+      - namespace: PowerPointCOMExecutor
+        type: local
+        reset: true
+
+  explorer.exe:
+    action:
+      - namespace: AppUIExecutor
+        type: local
+      - namespace: PDFReaderExecutor
+        type: local
+        reset: true
+```
+
+### MCP Servers Used by AppAgent
+
+| Server | Namespace | Type | Purpose | Command Categories |
+|--------|-----------|------|---------|-------------------|
+| **UICollector** | `UICollector` | Local | Data collection | Screenshot capture, control detection, UI tree |
+| **AppUIExecutor** | `AppUIExecutor` | Local | UI automation | Mouse clicks, keyboard input, text entry |
+| **CommandLineExecutor** | `CommandLineExecutor` | Local | Shell execution | PowerShell, Bash commands |
+| **WordCOMExecutor** | `WordCOMExecutor` | Local | Word automation | Document creation, text manipulation, formatting |
+| **ExcelCOMExecutor** | `ExcelCOMExecutor` | Local | Excel automation | Workbook creation, data entry, charts |
+| **PowerPointCOMExecutor** | `PowerPointCOMExecutor` | Local | PowerPoint automation | Presentation creation, slides, shapes |
+| **PDFReaderExecutor** | `PDFReaderExecutor` | Local | PDF operations | Text extraction, page navigation |
+
+When AppAgent works with specific applications (Word, Excel, PowerPoint), additional **COM executor servers** are automatically loaded to provide native API access alongside UI automation commands. These servers have `reset: true` to prevent state leakage between documents.
+
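One way to picture how an application-specific entry relates to `default` is a lookup with fallback. The sketch below is an assumption, not the documented resolution algorithm: `servers_for` is a hypothetical function, the configuration is an abbreviated in-memory copy of the YAML above, and falling back to `default` per category is a guess about how missing keys are handled.

```python
# Abbreviated, hand-written copy of part of config/ufo/mcp.yaml.
MCP_CONFIG = {
    "AppAgent": {
        "default": {
            "data_collection": [{"namespace": "UICollector", "type": "local"}],
            "action": [
                {"namespace": "AppUIExecutor", "type": "local"},
                {"namespace": "CommandLineExecutor", "type": "local"},
            ],
        },
        "WINWORD.EXE": {
            "action": [
                {"namespace": "AppUIExecutor", "type": "local"},
                {"namespace": "WordCOMExecutor", "type": "local", "reset": True},
            ],
        },
    }
}


def servers_for(config: dict, agent: str, process_name: str, category: str) -> list:
    """Return the server entries for one agent, process, and category.

    Hypothetical helper: uses the process-specific list when present and
    otherwise falls back to the agent's `default` section (assumed behaviour).
    """
    agent_cfg = config.get(agent, {})
    app_cfg = agent_cfg.get(process_name, {})
    if category in app_cfg:
        return app_cfg[category]
    return agent_cfg.get("default", {}).get(category, [])


for server in servers_for(MCP_CONFIG, "AppAgent", "WINWORD.EXE", "action"):
    print(server["namespace"])
```

Under this reading, Word gets `AppUIExecutor` plus `WordCOMExecutor` for actions, while its data collection still comes from the default `UICollector` entry.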
+---
+
+## Command Discovery
+
+### Listing Available Commands
+
+AppAgent dynamically discovers available commands from MCP servers:
+
+```python
+# Get all available tools from MCP servers
+result = await command_dispatcher.execute_commands([
+ Command(tool_name="list_tools", parameters={})
+])
+
+tools = result[0].result
+# Returns list of all available commands with their schemas
+```
+
+### Command Categories
+
+Commands are categorized by purpose:
+
+| Category | Server | Examples |
+|----------|--------|----------|
+| **Data Collection** | UICollector | `capture_window_screenshot`, `get_app_window_controls_target_info`, `get_ui_tree` |
+| **Mouse Actions** | AppUIExecutor | `click_input`, `click_on_coordinates`, `drag_on_coordinates`, `wheel_mouse_input` |
+| **Keyboard Actions** | AppUIExecutor | `set_edit_text`, `keyboard_input` |
+| **Data Retrieval** | AppUIExecutor | `texts`, `get_text` |
+| **Document API** | WordCOMExecutor | `create_document`, `insert_text`, `save_document` |
+| **Spreadsheet API** | ExcelCOMExecutor | `create_workbook`, `insert_data`, `create_chart` |
+| **Presentation API** | PowerPointCOMExecutor | `create_presentation`, `add_slide`, `insert_shape` |
+| **Shell Execution** | CommandLineExecutor | `execute_command` |
+
+---
+
+## Command Execution
+
+### Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant Executor as ActionExecutor
+ participant Dispatcher as CommandDispatcher
+ participant MCP as MCP Server
+
+ Strategy->>Executor: execute(action_info)
+ Executor->>Dispatcher: execute_commands([Command(...)])
+ Dispatcher->>MCP: Invoke tool
+ MCP->>MCP: Execute command logic
+ MCP-->>Dispatcher: Result
+ Dispatcher-->>Executor: Result
+ Executor-->>Strategy: Success/Error
+```
+
+### Example: Execute UI Command
+
+```python
+from aip.messages import Command
+
+# Create command
+command = Command(
+ tool_name="click_input",
+ parameters={
+ "id": "12",
+ "name": "Export",
+ "button": "left",
+ "double": False
+ },
+ tool_type="action",
+)
+
+# Execute command
+results = await command_dispatcher.execute_commands([command])
+
+# Check result
+if results[0].status == "SUCCESS":
+ print(f"Command executed: {results[0].result}")
+```
+
+---
+
+## Configuration Resources
+
+For detailed MCP configuration, server setup, and command reference:
+
+**Quick References:**
+
+- **[MCP Configuration Reference](../../configuration/system/mcp_reference.md)** - Quick MCP settings reference
+- **[MCP Overview](../../mcp/overview.md)** - MCP architecture and concepts
+
+**Configuration Guides:**
+
+- **[MCP Configuration Guide](../../mcp/configuration.md)** - Complete configuration documentation
+- **[Local Servers](../../mcp/local_servers.md)** - Built-in MCP servers
+- **[Remote Servers](../../mcp/remote_servers.md)** - HTTP and stdio servers
+- **[Creating MCP Servers](../../tutorials/creating_mcp_servers.md)** - Creating custom MCP servers
+
+**Server Type Documentation:**
+
+- **[Action Servers](../../mcp/action.md)** - Action server documentation
+- **[Data Collection Servers](../../mcp/data_collection.md)** - Data collection server documentation
+
+### Detailed Server Documentation
+
+Each MCP server has comprehensive documentation:
+
+| Server | Documentation | Command Details |
+|--------|--------------|----------------|
+| UICollector | [UICollector Server](../../mcp/servers/ui_collector.md) | Screenshot, control detection, UI tree commands |
+| AppUIExecutor | [AppUIExecutor Server](../../mcp/servers/app_ui_executor.md) | UI automation commands with parameters |
+| WordCOMExecutor | [Word COM Executor](../../mcp/servers/word_com_executor.md) | Microsoft Word API commands |
+| ExcelCOMExecutor | [Excel COM Executor](../../mcp/servers/excel_com_executor.md) | Microsoft Excel API commands |
+| PowerPointCOMExecutor | [PowerPoint COM Executor](../../mcp/servers/ppt_com_executor.md) | Microsoft PowerPoint API commands |
+| PDFReaderExecutor | [PDF Reader Executor](../../mcp/servers/pdf_reader_executor.md) | PDF reading commands |
+| CommandLineExecutor | [CommandLine Executor](../../mcp/servers/command_line_executor.md) | Shell command execution |
+
+!!!warning "Command Details Subject to Change"
+ Specific command parameters, names, and behaviors may change as MCP servers evolve. Always refer to the **server-specific documentation** for the most up-to-date command reference.
+
+---
+
+## Agent Configuration Settings
+
+### AppAgent Configuration
+
+```yaml
+# config/ufo/app_agent_config.yaml
+system:
+  # Control detection backend
+  control_backend:
+    - "uia"           # Windows UI Automation
+    - "omniparser"    # Vision-based detection
+
+  # Screenshot settings
+  save_full_screen: true              # Also capture desktop
+  save_ui_tree: true                  # Save UI tree JSON
+  include_last_screenshot: true       # Include previous step
+  concat_screenshot: true             # Concatenate clean + annotated
+
+  # Window behavior
+  maximize_window: false              # Maximize on selection
+  show_visual_outline_on_screen: true # Draw red outline
+```
+
+See **[Configuration Overview](../../configuration/system/overview.md)** and **[System Configuration](../../configuration/system/system_config.md)** for complete configuration options.
+
+---
+
+## Related Documentation
+
+**Architecture & Design:**
+
+- **[AppAgent Overview](overview.md)** - High-level AppAgent architecture
+- **[State Machine](state.md)** - State machine documentation
+- **[Processing Strategy](strategy.md)** - 4-phase processing pipeline
+- **[HostAgent Commands](../host_agent/commands.md)** - Desktop-level commands
+
+**Core Features:**
+
+- **[Hybrid Actions](../core_features/hybrid_actions.md)** - MCP command system architecture
+- **[Control Detection](../core_features/control_detection/overview.md)** - UIA and OmniParser backends
+- **[Command Dispatcher](../../infrastructure/modules/dispatcher.md)** - Command routing
+
+---
+
+## Summary
+
+**Key Takeaways:**
+
+- ✅ **MCP-Based**: All commands are provided by MCP servers configured in `mcp.yaml`
+- ✅ **Dynamic Discovery**: Commands are discovered at runtime via `list_tools`
+- ✅ **Application-Specific**: COM executors are auto-loaded for Word, Excel, and PowerPoint
+- ✅ **Hybrid Approach**: UI automation combined with native API commands
+- ✅ **Configurable**: Extensive MCP server configuration options
+- ✅ **Documented**: Each server has a detailed command reference
+
+**Next Steps:**
+
+1. **Review MCP Configuration**: [MCP Configuration Reference](../../configuration/system/mcp_reference.md)
+2. **Explore Server Documentation**: Click server links above for command details
+3. **Understand Processing**: [Processing Strategy](strategy.md) shows commands in action
+4. **Learn State Machine**: [State Machine](state.md) explains when commands execute
diff --git a/documents/docs/ufo2/app_agent/overview.md b/documents/docs/ufo2/app_agent/overview.md
new file mode 100644
index 000000000..b6cd8805e
--- /dev/null
+++ b/documents/docs/ufo2/app_agent/overview.md
@@ -0,0 +1,293 @@
+# AppAgent: Application Execution Agent
+
+**AppAgent** is the core execution runtime in UFO, responsible for carrying out individual subtasks within a specific Windows application. Each AppAgent functions as an isolated, application-specialized worker process launched and orchestrated by the central HostAgent.
+
+---
+
+## What is AppAgent?
+
+
+ 
+ AppAgent Architecture: Application-specialized worker process for subtask execution
+
+
+**AppAgent** operates as a **child agent** under the HostAgent's orchestration:
+
+- **Isolated Runtime**: Each AppAgent is dedicated to a single Windows application
+- **Subtask Executor**: Executes specific subtasks delegated by HostAgent
+- **Application Expert**: Tailored with deep knowledge of the target app's API surface, control semantics, and domain logic
+- **Hybrid Execution**: Leverages both GUI automation and API-based actions through MCP commands
+
+Unlike monolithic Computer-Using Agents (CUAs) that treat all GUI contexts uniformly, each AppAgent is tailored to a single application and operates with specialized knowledge of its interface and capabilities.
+
+---
+
+## Core Responsibilities
+
+```mermaid
+graph TB
+ subgraph "AppAgent Core Responsibilities"
+ SR[Sense: Capture Application State]
+ RE[Reason: Analyze Next Action]
+ EX[Execute: GUI or API Action]
+ RP[Report: Write Results to Blackboard]
+ end
+
+ SR --> RE
+ RE --> EX
+ EX --> RP
+ RP --> SR
+
+ style SR fill:#e3f2fd
+ style RE fill:#fff3e0
+ style EX fill:#f1f8e9
+ style RP fill:#fce4ec
+```
+
+| Responsibility | Description | Example |
+|---------------|-------------|---------|
+| **State Sensing** | Capture application UI, detect controls, understand current state | Screenshot Word window → Detect 50 controls → Annotate UI elements |
+| **Reasoning** | Analyze state and determine next action using LLM | "Table visible with Export button [12] → Click to export data" |
+| **Action Execution** | Execute GUI clicks or API calls via MCP commands | `click_input(control_id=12)` or `execute_word_command("export_table")` |
+| **Result Reporting** | Write execution results to shared Blackboard | Write extracted data to `subtask_result_1` for HostAgent |
+
+---
+
+## ReAct-Style Control Loop
+
+Upon receiving a subtask and execution context from the HostAgent, the AppAgent initializes a **ReAct-style control loop** where it iteratively:
+
+1. **Observes** the current application state (screenshot + control detection)
+2. **Thinks** about the next step (LLM reasoning)
+3. **Acts** by executing either a GUI or API-based action (MCP commands)
+
+```mermaid
+sequenceDiagram
+ participant HostAgent
+ participant AppAgent
+ participant Application
+ participant Blackboard
+
+ HostAgent->>AppAgent: Delegate subtask "Extract table from Word"
+
+ loop ReAct Loop
+ AppAgent->>Application: Observe (screenshot + controls)
+ Application-->>AppAgent: UI state
+ AppAgent->>AppAgent: Think (LLM reasoning)
+ AppAgent->>Application: Act (click/API call)
+ Application-->>AppAgent: Action result
+ end
+
+ AppAgent->>Blackboard: Write result
+ AppAgent->>HostAgent: Return control
+```
+
+The MCP command system enables **reliable control** over dynamic and complex UIs by favoring structured API commands whenever available, while retaining fallback to GUI-based interaction commands when necessary.
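
The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not UFO's implementation: `observe`, `think`, and `act` are hypothetical stand-ins for screenshot capture, the LLM call, and MCP command dispatch, and a scripted iterator plays the role of the model.

```python
from typing import Any, Callable, Dict, List

# Hypothetical type aliases; none of these names come from the UFO codebase.
Observation = Dict[str, Any]
Decision = Dict[str, Any]


def run_react_loop(
    observe: Callable[[], Observation],
    think: Callable[[Observation], Decision],
    act: Callable[[Decision], str],
    max_steps: int = 10,
) -> List[str]:
    """Repeat observe -> think -> act until a decision reports FINISH."""
    results: List[str] = []
    for _ in range(max_steps):
        observation = observe()        # screenshot + control detection
        decision = think(observation)  # LLM reasoning
        results.append(act(decision))  # GUI or API action
        if decision["Status"] == "FINISH":
            break
    return results


# Scripted "LLM": one click, then a final text entry that finishes the subtask.
scripted = iter([
    {"Function": "click_input", "Status": "CONTINUE"},
    {"Function": "set_edit_text", "Status": "FINISH"},
])

trace = run_react_loop(
    observe=lambda: {"controls": ["Export [12]"]},
    think=lambda obs: next(scripted),
    act=lambda decision: f"executed {decision['Function']}",
)
print(trace)
```

The `max_steps` bound is an assumption added here so the sketch always terminates; the real agent relies on its state machine to decide when to stop.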
+
+---
+
+## Execution Architecture
+
+### Finite State Machine
+
+AppAgent uses a finite state machine with 7 states to control its execution flow:
+
+- **CONTINUE**: Continue processing the current subtask
+- **FINISH**: Successfully complete the subtask
+- **ERROR**: Encounter an unrecoverable error
+- **FAIL**: Fail to complete the subtask
+- **PENDING**: Wait for user input or clarification
+- **CONFIRM**: Request user confirmation for sensitive actions
+- **SCREENSHOT**: Capture and re-annotate the application screenshot
+
+**State Details**: See [State Machine Documentation](state.md) for complete state definitions and transitions.
+
+### 4-Phase Processing Pipeline
+
+Each execution round follows a 4-phase pipeline:
+
+```mermaid
+graph LR
+ DC[Phase 1: DATA_COLLECTION Screenshot + Controls] --> LLM[Phase 2: LLM_INTERACTION Reasoning]
+ LLM --> AE[Phase 3: ACTION_EXECUTION GUI/API Action]
+ AE --> MU[Phase 4: MEMORY_UPDATE Record Action]
+
+ style DC fill:#e1f5ff
+ style LLM fill:#fff4e6
+ style AE fill:#e8f5e9
+ style MU fill:#fce4ec
+```
+
+**Strategy Details**: See [Processing Strategy Documentation](strategy.md) for complete pipeline implementation.
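
As a rough illustration of how the four phases chain together, the sketch below runs them in order over a shared context dictionary. Only the phase names come from the diagram above; the phase functions, the context keys, and the `run_round` helper are hypothetical placeholders rather than the actual processor API.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Phase = Tuple[str, Callable[[Context], None]]


def collect_data(ctx: Context) -> None:
    # Stand-in for screenshot capture and control detection.
    ctx["controls"] = ["Export [12]"]


def interact_with_llm(ctx: Context) -> None:
    # Stand-in for the LLM choosing an action from the detected controls.
    ctx["action"] = {"Function": "click_input", "ControlLabel": "12"}


def execute_action(ctx: Context) -> None:
    # Stand-in for dispatching the chosen GUI or API command.
    ctx["result"] = f"ran {ctx['action']['Function']}"


def update_memory(ctx: Context) -> None:
    # Record the outcome so later rounds can see it.
    ctx.setdefault("memory", []).append(ctx["result"])


PIPELINE: List[Phase] = [
    ("DATA_COLLECTION", collect_data),
    ("LLM_INTERACTION", interact_with_llm),
    ("ACTION_EXECUTION", execute_action),
    ("MEMORY_UPDATE", update_memory),
]


def run_round(ctx: Context) -> List[str]:
    """Run every phase once, in order, and return the phase names executed."""
    executed: List[str] = []
    for name, phase in PIPELINE:
        phase(ctx)
        executed.append(name)
    return executed


context: Context = {}
print(run_round(context))
print(context["memory"])
```

Each phase reads what the previous one wrote, which is why the order is fixed.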
+
+---
+
+## Hybrid GUI–API Execution
+
+AppAgent executes actions through the **MCP (Model Context Protocol) command system**, which provides a unified interface for both GUI automation and native API calls:
+
+```python
+# GUI-based command (fallback)
+command = Command(
+ tool_name="click_input",
+ parameters={"control_id": "12", "button": "left"}
+)
+await command_dispatcher.execute_commands([command])
+
+# API-based command (preferred when available)
+command = Command(
+ tool_name="word_export_table",
+ parameters={"format": "csv", "path": "output.csv"}
+)
+await command_dispatcher.execute_commands([command])
+```
+
+**Implementation**: See [Hybrid Actions](../core_features/hybrid_actions.md) for details on the MCP command system.
+
+---
+
+## Knowledge Enhancement
+
+AppAgent is enhanced with **Retrieval Augmented Generation (RAG)** from heterogeneous sources:
+
+| Knowledge Source | Purpose | Configuration |
+|-----------------|---------|---------------|
+| **Help Documents** | Application-specific documentation | [Learning from Help Documents](../core_features/knowledge_substrate/learning_from_help_document.md) |
+| **Bing Search** | Latest information and updates | [Learning from Bing Search](../core_features/knowledge_substrate/learning_from_bing_search.md) |
+| **Self-Demonstrations** | Successful action trajectories | [Experience Learning](../core_features/knowledge_substrate/experience_learning.md) |
+| **Human Demonstrations** | Expert-provided workflows | [Learning from Demonstrations](../core_features/knowledge_substrate/learning_from_demonstration.md) |
+
+**Knowledge Substrate Overview**: See [Knowledge Substrate](../core_features/knowledge_substrate/overview.md) for the complete RAG architecture.
+
+---
+
+## Command System
+
+AppAgent issues every action as a command in the **MCP (Model Context Protocol)** command system:
+
+**Application-Level Commands**:
+
+- `capture_window_screenshot` - Capture application window
+- `get_control_info` - Detect UI controls via UIA/OmniParser
+- `click_input` - Click on UI control
+- `set_edit_text` - Type text into input field
+- `annotation` - Annotate screenshot with control labels
+
+**Command Details**: See [Command System Documentation](commands.md) for complete command reference.
+
+---
+
+## Control Detection Backends
+
+AppAgent supports multiple control detection backends for comprehensive UI understanding:
+
+**UIA (UI Automation):**
+Native Windows UI Automation API for standard controls
+
+- ✅ Fast and accurate
+- ✅ Works with most Windows applications
+- ❌ May miss custom controls
+
+**OmniParser (Visual Detection):**
+Vision-based grounding model for visual elements
+
+- ✅ Detects icons, images, custom controls
+- ✅ Works with web content
+- ❌ Requires external service
+
+**Hybrid (UIA + OmniParser):**
+Runs both backends together so that native controls and purely visual elements are both covered
+
+- ✅ Native controls + visual elements
+- ✅ Comprehensive UI understanding
+
+**Control Detection Details**: See [Control Detection Overview](../core_features/control_detection/overview.md).
+
+---
+
+## Input and Output
+
+### AppAgent Input
+
+| Input | Description | Source |
+|-------|-------------|--------|
+| **User Request** | Original user request in natural language | HostAgent |
+| **Sub-Task** | Specific subtask to execute | HostAgent delegation |
+| **Application Context** | Target app name, window info | HostAgent |
+| **Control Information** | Detected UI controls with labels | Data collection phase |
+| **Screenshots** | Clean, annotated, previous step images | Data collection phase |
+| **Blackboard** | Shared memory for inter-agent communication | Global context |
+| **Retrieved Knowledge** | Help docs, demos, search results | RAG system |
+
+### AppAgent Output
+
+| Output | Description | Consumer |
+|--------|-------------|----------|
+| **Observation** | Current UI state description | LLM context |
+| **Thought** | Reasoning about next action | Execution log |
+| **ControlLabel** | Selected control to interact with | Action executor |
+| **Function** | MCP command to execute (click_input, set_edit_text, etc.) | Command dispatcher |
+| **Args** | Command parameters | Command dispatcher |
+| **Status** | Agent state (CONTINUE, FINISH, etc.) | State machine |
+| **Blackboard Update** | Execution results | HostAgent |
+
+**Example Output**:
+```json
+{
+ "Observation": "Word document with table, Export button at [12]",
+ "Thought": "Click Export to extract table data",
+ "ControlLabel": "12",
+ "Function": "click_input",
+ "Args": {"button": "left"},
+ "Status": "CONTINUE"
+}
+```
+
+---
+
+## Related Documentation
+
+**Detailed Documentation:**
+
+- **[State Machine](state.md)**: Complete FSM with state definitions and transitions
+- **[Processing Strategy](strategy.md)**: 4-phase pipeline implementation details
+- **[Command System](commands.md)**: Application-level MCP commands reference
+
+**Core Features:**
+
+- **[Hybrid Actions](../core_features/hybrid_actions.md)**: MCP command system for GUI–API execution
+- **[Control Detection](../core_features/control_detection/overview.md)**: UIA and visual detection
+- **[Knowledge Substrate](../core_features/knowledge_substrate/overview.md)**: RAG system overview
+
+**Tutorials:**
+
+- **[Creating AppAgent](../../tutorials/creating_app_agent/overview.md)**: Step-by-step guide
+- **[Help Document Provision](../../tutorials/creating_app_agent/help_document_provision.md)**: Add help docs
+- **[Demonstration Provision](../../tutorials/creating_app_agent/demonstration_provision.md)**: Add demos
+- **[Wrapping App-Native API](../../tutorials/creating_app_agent/warpping_app_native_api.md)**: Integrate APIs
+
+---
+
+## API Reference
+
+:::agents.agent.app_agent.AppAgent
+
+---
+
+## Summary
+
+**AppAgent Key Characteristics:**
+
+- ✅ **Application-Specialized Worker**: Dedicated to a single Windows application
+- ✅ **ReAct Control Loop**: Iterative observe → think → act execution
+- ✅ **Hybrid Execution**: GUI automation + API calls via MCP commands
+- ✅ **7-State FSM**: Robust state management for execution control
+- ✅ **4-Phase Pipeline**: Structured data collection → reasoning → action → memory
+- ✅ **Knowledge-Enhanced**: RAG from docs, demos, and search
+- ✅ **Orchestrated by HostAgent**: Child agent in a hierarchical architecture
+
+**Next Steps:**
+
+1. **Deep Dive**: Read [State Machine](state.md) and [Processing Strategy](strategy.md) for implementation details
+2. **Learn Features**: Explore [Core Features](../core_features/hybrid_actions.md) for advanced capabilities
+3. **Hands-On Tutorial**: Follow [Creating AppAgent](../../tutorials/creating_app_agent/overview.md) guide
diff --git a/documents/docs/ufo2/app_agent/state.md b/documents/docs/ufo2/app_agent/state.md
new file mode 100644
index 000000000..e32df21ea
--- /dev/null
+++ b/documents/docs/ufo2/app_agent/state.md
@@ -0,0 +1,842 @@
+# AppAgent State Machine
+
+AppAgent uses a **7-state finite state machine (FSM)** to control execution flow within a specific Windows application. The state machine manages subtask execution, UI re-annotation, user confirmations, error handling, and handoff back to HostAgent.
+
+---
+
+## State Overview
+
+AppAgent implements a robust 7-state FSM defined in `ufo/agents/states/app_agent_state.py`:
+
+```mermaid
+graph TB
+ subgraph "Execution States"
+ CONTINUE[CONTINUE Main Execution]
+ SCREENSHOT[SCREENSHOT UI Re-annotation]
+ end
+
+ subgraph "Interaction States"
+ PENDING[PENDING Await User Input]
+ CONFIRM[CONFIRM Safety Confirmation]
+ end
+
+ subgraph "Terminal States"
+ FINISH[FINISH Success Return]
+ FAIL[FAIL Failed Return]
+ ERROR[ERROR Error Return]
+ end
+
+ style CONTINUE fill:#e3f2fd
+ style SCREENSHOT fill:#fff3e0
+ style PENDING fill:#f1f8e9
+ style CONFIRM fill:#fce4ec
+ style FINISH fill:#c8e6c9
+ style FAIL fill:#ffe0b2
+ style ERROR fill:#ffcdd2
+```
+
+### State Enumeration
+
+```python
+class AppAgentStatus(Enum):
+ """Store the status of the app agent."""
+
+ CONTINUE = "CONTINUE" # Main execution state
+ SCREENSHOT = "SCREENSHOT" # Re-annotation state
+ FINISH = "FINISH" # Subtask completed successfully
+ FAIL = "FAIL" # Subtask failed but recoverable
+ PENDING = "PENDING" # Awaiting user input
+ CONFIRM = "CONFIRM" # Safety confirmation required
+ ERROR = "ERROR" # Critical failure
+```
+
+| State | Purpose | Processor Executed | Subtask Ends | Returns to HostAgent |
+|-------|---------|-------------------|--------------|---------------------|
+| **CONTINUE** | Main execution - interact with app controls | ✅ Yes (4 phases) | ❌ No | ❌ No |
+| **SCREENSHOT** | Re-capture and re-annotate UI after changes | ✅ Yes (4 phases) | ❌ No | ❌ No |
+| **FINISH** | Subtask completed successfully | ❌ No | ✅ Yes | ✅ Yes |
+| **FAIL** | Subtask failed but can be retried | ❌ No | ✅ Yes | ✅ Yes |
+| **PENDING** | Await user input for clarification | ✅ Yes (ask user) | ❌ No | ❌ No |
+| **CONFIRM** | Request user approval for safety-critical action | ✅ Yes (present dialog) | ❌ No | ❌ No |
+| **ERROR** | Unhandled exception or critical failure | ❌ No | ✅ Yes | ✅ Yes |
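
The table can also be read as a lookup from status to behavior. The sketch below encodes it that way; the `Status` enum mirrors `AppAgentStatus`, while `StateTraits`, `TRAITS`, and `returns_to_host` are hypothetical names introduced only for this illustration.

```python
from enum import Enum
from typing import Dict, NamedTuple


class Status(Enum):
    """Local copy of the seven status values listed above."""

    CONTINUE = "CONTINUE"
    SCREENSHOT = "SCREENSHOT"
    FINISH = "FINISH"
    FAIL = "FAIL"
    PENDING = "PENDING"
    CONFIRM = "CONFIRM"
    ERROR = "ERROR"


class StateTraits(NamedTuple):
    runs_processor: bool  # "Processor Executed" column
    subtask_ends: bool    # "Subtask Ends" column


TRAITS: Dict[Status, StateTraits] = {
    Status.CONTINUE: StateTraits(True, False),
    Status.SCREENSHOT: StateTraits(True, False),
    Status.FINISH: StateTraits(False, True),
    Status.FAIL: StateTraits(False, True),
    Status.PENDING: StateTraits(True, False),
    Status.CONFIRM: StateTraits(True, False),
    Status.ERROR: StateTraits(False, True),
}


def returns_to_host(status: Status) -> bool:
    """In the table, control returns to HostAgent exactly when the subtask ends."""
    return TRAITS[status].subtask_ends


terminal = sorted(s.value for s in Status if returns_to_host(s))
print(terminal)
```

Only the three terminal states hand control back to HostAgent; the other four keep the AppAgent in charge of the subtask.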
+
+---
+
+## State Definitions
+
+### CONTINUE State
+
+**Purpose**: Main execution state where AppAgent iteratively interacts with the application.
+
+```python
+@AppAgentStateManager.register
+class ContinueAppAgentState(AppAgentState):
+ """The class for the continue app agent state."""
+
+ async def handle(
+ self, agent: "AppAgent", context: Optional["Context"] = None
+ ) -> None:
+ """
+ Handle the agent for the current step.
+ :param agent: The agent for the current step.
+ :param context: The context for the agent and session.
+ """
+ await agent.process(context)
+
+ def is_subtask_end(self) -> bool:
+ """Check if the subtask ends."""
+ return False
+
+ @classmethod
+ def name(cls) -> str:
+ """The class name of the state."""
+ return AppAgentStatus.CONTINUE.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Execution |
+| **Processor Executed** | ✓ Yes (4-phase pipeline) |
+| **Subtask Ends** | No |
+| **Round Ends** | No |
+| **Next States** | CONTINUE / SCREENSHOT / FINISH / FAIL / PENDING / CONFIRM / ERROR |
+
+**Behavior**:
+
+- Executes 4-phase processing pipeline (DATA_COLLECTION → LLM_INTERACTION → ACTION_EXECUTION → MEMORY_UPDATE)
+- LLM analyzes UI and selects control to interact with
+- Executes action on selected control
+- Records action in memory and Blackboard
+- Transitions based on LLM's `Status` field in response
+
+**Example Flow**:
+```
+CONTINUE → Capture UI → LLM selects "Export [12]" → Click control 12
+→ LLM returns Status: "SCREENSHOT" → Transition to SCREENSHOT
+```
+
+CONTINUE is the primary execution state where AppAgent spends most of its time during subtask execution.
+
+---
+
+### SCREENSHOT State
+
+**Purpose**: Re-capture and re-annotate UI after control interactions that change the interface.
+
+```python
+@AppAgentStateManager.register
+class ScreenshotAppAgentState(ContinueAppAgentState):
+ """The class for the screenshot app agent state."""
+
+ @classmethod
+ def name(cls) -> str:
+ """The class name of the state."""
+ return AppAgentStatus.SCREENSHOT.value
+
+ def next_state(self, agent: BasicAgent) -> AgentState:
+ """Determine next state based on control_reannotate."""
+ agent_processor = agent.processor
+
+ if agent_processor is None:
+ agent.status = AppAgentStatus.CONTINUE.value
+ return ContinueAppAgentState()
+
+ control_reannotate = agent_processor.control_reannotate
+
+ if control_reannotate is None or len(control_reannotate) == 0:
+ agent.status = AppAgentStatus.CONTINUE.value
+ return ContinueAppAgentState()
+ else:
+ return super().next_state(agent)
+
+ def is_subtask_end(self) -> bool:
+ """Check if the subtask ends."""
+ return False
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Execution |
+| **Processor Executed** | ✓ Yes (same as CONTINUE) |
+| **Subtask Ends** | No |
+| **Duration** | Single re-annotation cycle |
+| **Next States** | SCREENSHOT (if controls need re-annotation) / CONTINUE (if complete) |
+
+**Behavior**:
+
+- Inherits from `ContinueAppAgentState` - executes same 4-phase pipeline
+- Re-captures screenshot after UI changes (dialog opened, menu expanded, etc.)
+- Re-detects and re-annotates controls with updated labels
+- Checks `control_reannotate` to determine if more re-annotation needed
+- Transitions to CONTINUE once UI stabilizes
+
+**When to Use**:
+
+- LLM sets `Status: "SCREENSHOT"` when it expects UI changes
+- After clicking buttons that open dialogs
+- After expanding dropdown menus or combo boxes
+- After any action that significantly alters the UI
+
+**Screenshot Example:**
+
+```
+Action: Click "Export" button [12]
+→ Dialog opens with new controls
+→ LLM sets Status: "SCREENSHOT"
+→ SCREENSHOT state re-annotates dialog controls as [1], [2], [3]...
+→ Transitions to CONTINUE with fresh annotations
+```
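
The branch that decides when to leave SCREENSHOT can be isolated as a pure function. This is a simplified sketch, not the class shown above: `status_after_screenshot` is a hypothetical helper, and it assumes that the inherited `next_state` simply follows the status most recently set on the agent.

```python
from typing import Optional, Sequence


def status_after_screenshot(
    control_reannotate: Optional[Sequence[object]],
    llm_status: str,
) -> str:
    """Return the status that follows a SCREENSHOT step.

    With nothing left to re-annotate (None or empty), fall back to CONTINUE;
    otherwise keep whatever status the agent currently carries (assumption).
    """
    if not control_reannotate:
        return "CONTINUE"
    return llm_status


print(status_after_screenshot(None, "SCREENSHOT"))
print(status_after_screenshot(["dialog_button"], "SCREENSHOT"))
```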
+
+---
+
+### FINISH State
+
+**Purpose**: Subtask completed successfully - archive results and return control to HostAgent.
+
+```python
+@AppAgentStateManager.register
+class FinishAppAgentState(AppAgentState):
+ """The class for the finish app agent state."""
+
+ async def handle(
+ self, agent: "AppAgent", context: Optional["Context"] = None
+ ) -> None:
+ """Archive subtask result."""
+ if agent.processor:
+ result = agent.processor.processing_context.get_local("result")
+ else:
+ result = None
+
+ await self.archive_subtask(context, result)
+
+ def next_agent(self, agent: "AppAgent") -> HostAgent:
+ """Get the agent for the next step."""
+ return agent.host
+
+ def next_state(self, agent: "AppAgent") -> HostAgentState:
+ """Get the next state of the agent."""
+ if agent.mode == "follower":
+ return FinishHostAgentState()
+ else:
+ return ContinueHostAgentState()
+
+ def is_subtask_end(self) -> bool:
+ """Check if the subtask ends."""
+ return True
+
+ @classmethod
+ def name(cls) -> str:
+ """The class name of the state."""
+ return AppAgentStatus.FINISH.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Terminal |
+| **Processor Executed** | ✗ No |
+| **Subtask Ends** | ✓ Yes |
+| **Round Ends** | No (HostAgent continues) |
+| **Next Agent** | HostAgent |
+| **Next States** | HostAgent.CONTINUE (normal) / HostAgent.FINISH (follower mode) |
+
+**Behavior**:
+
+- Archives subtask to `previous_subtasks` with status and result
+- Writes execution results to Blackboard for HostAgent
+- Returns control to HostAgent
+- HostAgent determines next action (new subtask, finish, etc.)
+
+**Transition Logic**:
+
+```python
+# In LLM response
+{
+ "Status": "FINISH",
+ "Comment": "Table data successfully extracted and saved"
+}
+
+# Next agent and state
+next_agent = agent.host # HostAgent
+next_state = ContinueHostAgentState() # HostAgent continues orchestration
+```
+
+!!!success "Subtask Completion"
+ FINISH indicates successful completion. The subtask result is available in the Blackboard for HostAgent to access and use in subsequent orchestration decisions.
+
+---
+
+### PENDING State
+
+**Purpose**: Await user input to clarify ambiguous situations or provide additional information.
+
+```python
+@AppAgentStateManager.register
+class PendingAppAgentState(AppAgentState):
+ """The class for the pending app agent state."""
+
+ async def handle(
+ self, agent: "AppAgent", context: Optional["Context"] = None
+ ) -> None:
+ """Ask the user questions to help the agent proceed."""
+ agent.process_asker(ask_user=ufo_config.system.ask_question)
+
+ def next_state(self, agent: AppAgent) -> AppAgentState:
+ """Get the next state of the agent."""
+ agent.status = AppAgentStatus.CONTINUE.value
+ return ContinueAppAgentState()
+
+ def is_subtask_end(self) -> bool:
+ """Check if the subtask ends."""
+ return False
+
+ @classmethod
+ def name(cls) -> str:
+ """The class name of the state."""
+ return AppAgentStatus.PENDING.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Interaction |
+| **Processor Executed** | ✓ Yes (ask user) |
+| **Subtask Ends** | No |
+| **Duration** | Until user responds |
+| **Next States** | CONTINUE (user provided input) |
+
+**Behavior**:
+
+- Displays question to user via `process_asker`
+- Waits for user response (configurable via `ask_question` setting)
+- User input is added to context for next CONTINUE execution
+- Always transitions to CONTINUE after user responds
+
+**Use Cases**:
+
+- Ambiguous control selection: "Which 'Export' button should I click?"
+- Missing information: "What filename should I use for the export?"
+- Clarification needed: "Should I overwrite the existing file?"
+
+!!!warning "Configuration Required"
+ Set `system.ask_question = true` in configuration to enable PENDING state user interaction. If disabled, the agent will skip asking and make a best-effort decision.
+
+---
+
+### CONFIRM State
+
+**Purpose**: Request user approval before executing safety-critical or irreversible actions.
+
+```python
+@AppAgentStateManager.register
+class ConfirmAppAgentState(AppAgentState):
+ """The class for the confirm app agent state."""
+
+ def __init__(self) -> None:
+ """Initialize the confirm state."""
+ self._confirm = None
+
+ async def handle(
+ self, agent: "AppAgent", context: Optional["Context"] = None
+ ) -> None:
+ """Request user confirmation for the action."""
+ # If safe guard disabled, proceed automatically
+ if not ufo_config.system.safe_guard:
+ await agent.process_resume()
+ self._confirm = True
+ return
+
+ # Ask user for confirmation
+ self._confirm = agent.process_confirmation()
+
+ # If user confirms, resume the task
+ if self._confirm:
+ await agent.process_resume()
+
+ def next_state(self, agent: AppAgent) -> AppAgentState:
+ """Get the next state based on user decision."""
+ if self._confirm:
+ agent.status = AppAgentStatus.CONTINUE.value
+ return ContinueAppAgentState()
+ else:
+ agent.status = AppAgentStatus.FINISH.value
+ return FinishAppAgentState()
+
+ def is_subtask_end(self) -> bool:
+ """Check if the subtask ends."""
+ return False
+
+ @classmethod
+ def name(cls) -> str:
+ """The class name of the state."""
+ return AppAgentStatus.CONFIRM.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Interaction |
+| **Processor Executed** | ✓ Yes (present confirmation) |
+| **Subtask Ends** | No |
+| **Duration** | Until user approves/rejects |
+| **Next States** | CONTINUE (approved) / FINISH (rejected) |
+
+**Behavior**:
+
+- Presents action for user approval via `process_confirmation`
+- Waits for user decision (approve/reject)
+- If approved: Resumes processing via `process_resume` → CONTINUE
+- If rejected: Archives subtask → FINISH
+- Bypassed if `safe_guard` configuration is disabled
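
The outcomes listed above reduce to a small truth table. The helper below is a hypothetical restatement of the `handle`/`next_state` logic shown earlier, written as a pure function so the three cases are easy to check.

```python
def status_after_confirm(safe_guard: bool, user_approves: bool) -> str:
    """Return the status that follows CONFIRM.

    safe_guard off  -> the prompt is skipped and execution continues.
    safe_guard on   -> CONTINUE if the user approves, FINISH if they reject.
    """
    if not safe_guard:
        return "CONTINUE"
    return "CONTINUE" if user_approves else "FINISH"


for safe_guard in (False, True):
    for user_approves in (False, True):
        outcome = status_after_confirm(safe_guard, user_approves)
        print(f"safe_guard={safe_guard} approves={user_approves} -> {outcome}")
```

Note that a rejection ends the subtask through FINISH rather than FAIL, as in the state class above.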
+
+**Safety-Critical Actions**:
+
+- File deletions: "About to delete file.txt - Confirm?"
+- Application launches: "Launch Calculator.exe?"
+- System configuration changes: "Modify registry key?"
+
+!!!warning "Safety Mechanism"
+ CONFIRM provides a safety net for potentially destructive operations. Configure `system.safe_guard = true` to enable confirmation prompts.
+
+---
+
+### ERROR State
+
+**Purpose**: Handle unrecoverable exceptions and critical failures - archive error and return to HostAgent.
+
+```python
+@AppAgentStateManager.register
+class ErrorAppAgentState(AppAgentState):
+ """The class for the error app agent state."""
+
+ async def handle(
+ self, agent: "AppAgent", context: Optional["Context"] = None
+ ) -> None:
+ """Archive subtask with error result."""
+ if agent.processor:
+ result = agent.processor.processing_context.get_local("result")
+ else:
+ result = None
+
+ await self.archive_subtask(context, result)
+
+ def next_agent(self, agent: "AppAgent") -> HostAgent:
+ """Get the agent for the next step."""
+ return agent.host
+
+ def next_state(self, agent: "AppAgent") -> HostAgentState:
+ """Get the next state of the agent."""
+ return FinishHostAgentState()
+
+ def is_round_end(self) -> bool:
+ """Check if the round ends."""
+ return True
+
+ def is_subtask_end(self) -> bool:
+ """Check if the subtask ends."""
+ return True
+
+ @classmethod
+ def name(cls) -> str:
+ """The class name of the state."""
+ return AppAgentStatus.ERROR.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Terminal |
+| **Processor Executed** | ✗ No |
+| **Subtask Ends** | ✓ Yes |
+| **Round Ends** | ✓ Yes |
+| **Next Agent** | HostAgent |
+| **Next States** | HostAgent.FINISH (terminate round) |
+
+**Behavior**:
+
+- Archives subtask with error status and error details
+- Returns control to HostAgent
+- HostAgent transitions to FINISH (ends current round)
+- Error details logged for debugging
+
+**Error Scenarios**:
+
+- Unhandled Python exceptions during processing
+- Critical LLM failures (timeout, invalid response)
+- Command dispatcher failures
+- Unrecoverable application crashes
+
+!!!danger "Terminal State"
+ ERROR terminates both the subtask and the current round. HostAgent will end the session or start a new round depending on configuration.
+
+---
+
+### FAIL State
+
+**Purpose**: Handle recoverable failures - archive failed subtask and return to HostAgent for retry or alternative approach.
+
+```python
+@AppAgentStateManager.register
+class FailAppAgentState(AppAgentState):
+ """The class for the fail app agent state."""
+
+ async def handle(
+ self, agent: "AppAgent", context: Optional["Context"] = None
+ ) -> None:
+ """Archive subtask with failure result."""
+ if agent.processor:
+ result = agent.processor.processing_context.get_local("result")
+ else:
+ result = None
+
+ await self.archive_subtask(context, result)
+
+ def next_agent(self, agent: "AppAgent") -> HostAgent:
+ """Get the agent for the next step."""
+ return agent.host
+
+ def next_state(self, agent: "AppAgent") -> HostAgentState:
+ """Get the next state of the agent."""
+ return FinishHostAgentState()
+
+ def is_round_end(self) -> bool:
+ """Check if the round ends."""
+ return False
+
+ def is_subtask_end(self) -> bool:
+ """Check if the subtask ends."""
+ return True
+
+ @classmethod
+ def name(cls) -> str:
+ """The class name of the state."""
+ return AppAgentStatus.FAIL.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Terminal |
+| **Processor Executed** | ✗ No |
+| **Subtask Ends** | ✓ Yes |
+| **Round Ends** | ✗ No (unlike ERROR) |
+| **Next Agent** | HostAgent |
+| **Next States** | HostAgent.FINISH (but round doesn't end) |
+
+**Behavior**:
+
+- Archives subtask with FAIL status and failure details
+- Returns control to HostAgent
+- HostAgent can retry subtask or try alternative approach
+- Unlike ERROR, does not terminate the round
+- Allows for graceful degradation and recovery
+
+**Failure Scenarios**:
+
+- Control not found but task can be retried
+- Action timeout but application still responsive
+- Partial completion with known issues
+- Expected failure conditions
+
+!!!info "Recoverable Failures"
+ FAIL indicates a recoverable failure that the HostAgent can handle gracefully, unlike ERROR which terminates the entire round. Use FAIL when the task failed but the system is still in a valid state.
+
+---
+
+## State Transition Diagram
+
+```mermaid
+stateDiagram-v2
+ [*] --> CONTINUE: HostAgent Delegates Subtask
+
+ CONTINUE --> CONTINUE: LLM: More actions Status: CONTINUE
+ CONTINUE --> SCREENSHOT: LLM: UI changed Status: SCREENSHOT
+ CONTINUE --> FINISH: LLM: Complete Status: FINISH
+ CONTINUE --> FAIL: LLM: Failed Status: FAIL
+ CONTINUE --> CONFIRM: LLM: Need approval Status: CONFIRM
+ CONTINUE --> PENDING: LLM: Need info Status: PENDING
+ CONTINUE --> ERROR: System: Exception Status: ERROR
+
+ SCREENSHOT --> SCREENSHOT: System: More re-annotation
+ SCREENSHOT --> CONTINUE: System: Re-annotation done
+
+ CONFIRM --> CONTINUE: User: Approved
+ CONFIRM --> FINISH: User: Rejected
+
+ PENDING --> CONTINUE: User: Provided input
+
+ FINISH --> HostAgent_CONTINUE: Return to HostAgent
+    FAIL --> HostAgent_FINISH: Return to HostAgent (round not ended)
+ ERROR --> HostAgent_FINISH: Return to HostAgent
+
+ HostAgent_CONTINUE --> [*]: HostAgent Takes Control
+ HostAgent_FINISH --> [*]: Round Terminated
+
+ note right of CONTINUE: Main execution 4-phase pipeline
+ note right of SCREENSHOT: UI re-annotation after changes
+ note left of CONFIRM: Safety check for critical actions
+ note left of PENDING: User input for clarification
+```
+
+
+ 
+ AppAgent State Machine: Visual representation of the 7-state FSM with transitions and conditions
+
+
+---
+
+## State Transition Control
+
+### LLM-Driven Transitions
+
+Most state transitions are controlled by the LLM through the `Status` field in its response:
+
+```json
+{
+ "Observation": "Word document with Export button [12] visible",
+ "Thought": "I should click the Export button to extract table data",
+ "ControlLabel": "12",
+ "ControlText": "Export",
+ "Function": "click_input",
+ "Args": {"button": "left"},
+ "Status": "SCREENSHOT",
+ "Comment": "Clicking Export will open a dialog"
+}
+```
+
+**Status Mapping**:
+
+| LLM Status Value | Next State | Decision Logic |
+|-----------------|------------|----------------|
+| `"CONTINUE"` | CONTINUE | More actions needed, continue execution |
+| `"SCREENSHOT"` | SCREENSHOT | UI will change, re-annotate controls |
+| `"FINISH"` | FINISH | Subtask complete, return to HostAgent |
+| `"FAIL"` | FAIL | Subtask failed but recoverable |
+| `"PENDING"` | PENDING | Need user clarification |
+| `"CONFIRM"` | CONFIRM | Safety-critical action needs approval |
+| `"ERROR"` | ERROR | Manually triggered error (rare) |
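
A minimal sketch of how the `Status` field could be turned into a state name is shown below. The function name `next_state_name` is hypothetical, and treating malformed or unrecognized responses as ERROR is an assumption made for this example rather than documented behavior.

```python
import json

# The seven status values from the mapping table above.
VALID_STATUSES = (
    "CONTINUE", "SCREENSHOT", "FINISH", "FAIL", "PENDING", "CONFIRM", "ERROR",
)


def next_state_name(raw_response: str) -> str:
    """Map an LLM response (JSON text) to the name of the next state."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError:
        return "ERROR"  # assumption: unparseable output is treated as an error
    status = payload.get("Status") if isinstance(payload, dict) else None
    # Assumption: a missing or unknown status also falls back to ERROR.
    return status if status in VALID_STATUSES else "ERROR"


print(next_state_name('{"Function": "click_input", "Status": "SCREENSHOT"}'))
print(next_state_name('{"Status": "DONE"}'))
```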
+
+### System-Driven Transitions
+
+Some transitions are triggered by system conditions:
+
+```python
+# Exception handling in processor
+try:
+    result = await processor.process(agent, context)
+except Exception as e:
+    agent.status = AppAgentStatus.ERROR.value
+    # Transitions to ERROR state
+
+# Screenshot re-annotation check
+if control_reannotate and len(control_reannotate) > 0:
+    # Stay in SCREENSHOT state
+    return ScreenshotAppAgentState()
+else:
+    # Transition to CONTINUE
+    agent.status = AppAgentStatus.CONTINUE.value
+    return ContinueAppAgentState()
+```
+
+---
+
+## Implementation Details
+
+### State Class Hierarchy
+
+```mermaid
+classDiagram
+ class AgentState {
+ <<abstract>>
+ +handle(agent, context)*
+ +next_agent(agent)*
+ +next_state(agent)*
+ +is_subtask_end()*
+ +is_round_end()
+ +name()*
+ }
+
+ class AppAgentState {
+ <<abstract>>
+ +agent_class() AppAgent
+ +archive_subtask(context, result)
+ }
+
+ class ContinueAppAgentState {
+ +handle() await agent.process()
+ +is_subtask_end() False
+ +name() "CONTINUE"
+ }
+
+ class ScreenshotAppAgentState {
+ +next_state() check control_reannotate
+ +name() "SCREENSHOT"
+ }
+
+ class FinishAppAgentState {
+ +handle() archive_subtask
+ +next_agent() HostAgent
+ +next_state() HostAgent.CONTINUE
+ +is_subtask_end() True
+ +name() "FINISH"
+ }
+
+ class PendingAppAgentState {
+ +handle() process_asker
+ +next_state() CONTINUE
+ +name() "PENDING"
+ }
+
+ class ConfirmAppAgentState {
+ -_confirm: bool
+ +handle() process_confirmation
+ +next_state() CONTINUE or FINISH
+ +name() "CONFIRM"
+ }
+
+ class ErrorAppAgentState {
+ +handle() archive_subtask
+ +next_agent() HostAgent
+ +next_state() HostAgent.FINISH
+ +is_round_end() True
+ +is_subtask_end() True
+ +name() "ERROR"
+ }
+
+ class FailAppAgentState {
+ +handle() archive_subtask
+ +next_agent() HostAgent
+ +next_state() HostAgent.FINISH
+ +is_round_end() False
+ +is_subtask_end() True
+ +name() "FAIL"
+ }
+
+ AgentState <|-- AppAgentState
+ AppAgentState <|-- ContinueAppAgentState
+ AppAgentState <|-- FinishAppAgentState
+ AppAgentState <|-- PendingAppAgentState
+ AppAgentState <|-- ConfirmAppAgentState
+ AppAgentState <|-- ErrorAppAgentState
+ AppAgentState <|-- FailAppAgentState
+ ContinueAppAgentState <|-- ScreenshotAppAgentState
+```
+
+### State Manager Registry
+
+```python
+class AppAgentStateManager(AgentStateManager):
+    """State manager for AppAgent with registration system."""
+
+    _state_mapping: Dict[str, Type[AppAgentState]] = {}
+
+    @property
+    def none_state(self) -> AgentState:
+        """The none state of the state manager."""
+        return NoneAppAgentState()
+
+# States are registered via decorator
+@AppAgentStateManager.register
+class ContinueAppAgentState(AppAgentState):
+    ...
+```
+
+**Registration Benefits**:
+
+- Automatic state mapping by name
+- Centralized state lookup via `get_state(status)`
+- Type-safe state retrieval
+- Easy to add new states
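
The registration pattern listed above can be illustrated with a small self-contained sketch. Everything here is hypothetical and simplified: `State`, `StateRegistry`, `ContinueState` and `FinishState` are stand-ins, and the real `AgentStateManager` may store and look up states differently.

```python
from typing import Dict, Type


class State:
    """Hypothetical base class standing in for AppAgentState."""

    @classmethod
    def name(cls) -> str:
        raise NotImplementedError


class StateRegistry:
    """Hypothetical registry that maps a state's name to its class."""

    _state_mapping: Dict[str, Type[State]] = {}

    @classmethod
    def register(cls, state_cls: Type[State]) -> Type[State]:
        # Used as a class decorator: record the class, then return it unchanged.
        cls._state_mapping[state_cls.name()] = state_cls
        return state_cls

    @classmethod
    def get_state(cls, status: str) -> State:
        # Unknown names raise KeyError in this sketch.
        return cls._state_mapping[status]()


@StateRegistry.register
class ContinueState(State):
    @classmethod
    def name(cls) -> str:
        return "CONTINUE"


@StateRegistry.register
class FinishState(State):
    @classmethod
    def name(cls) -> str:
        return "FINISH"


state = StateRegistry.get_state("FINISH")
```

Adding a state in this sketch only requires defining a class and decorating it; no central table has to be edited.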
+
+---
+
+## Execution Flow Example
+
+### Multi-Step Subtask Execution
+
+```mermaid
+sequenceDiagram
+ participant HostAgent
+ participant AppAgent
+ participant CONTINUE
+ participant SCREENSHOT
+ participant FINISH
+ participant Application
+
+ HostAgent->>AppAgent: Delegate subtask "Extract table from Word"
+ AppAgent->>CONTINUE: Set state
+
+ rect rgb(230, 240, 255)
+ Note over CONTINUE, Application: Step 1: Capture and analyze
+ CONTINUE->>Application: Capture screenshot
+ Application-->>CONTINUE: Screenshot + 50 controls
+ CONTINUE->>CONTINUE: LLM: "Click Export [12]"
+ CONTINUE->>Application: click_input(12)
+ Application-->>CONTINUE: Dialog opened
+ CONTINUE->>SCREENSHOT: Status: "SCREENSHOT"
+ end
+
+ rect rgb(255, 250, 230)
+ Note over SCREENSHOT, Application: Step 2: Re-annotate
+ SCREENSHOT->>Application: Re-capture screenshot
+ Application-->>SCREENSHOT: Screenshot + 30 dialog controls
+ SCREENSHOT->>SCREENSHOT: LLM: "Select CSV [5]"
+ SCREENSHOT->>Application: click_input(5)
+ Application-->>SCREENSHOT: Format selected
+ SCREENSHOT->>CONTINUE: Re-annotation done
+ end
+
+ rect rgb(230, 255, 240)
+ Note over CONTINUE, Application: Step 3: Complete export
+ CONTINUE->>Application: Capture screenshot
+ Application-->>CONTINUE: Screenshot + updated controls
+ CONTINUE->>CONTINUE: LLM: "Click OK [1]"
+ CONTINUE->>Application: click_input(1)
+ Application-->>CONTINUE: Export complete
+ CONTINUE->>FINISH: Status: "FINISH"
+ end
+
+ FINISH->>HostAgent: Return control (subtask result in Blackboard)
+```
+
+---
+
+## Related Documentation
+
+**Architecture:**
+
+- **[AppAgent Overview](overview.md)**: High-level architecture and responsibilities
+- **[Processing Strategy](strategy.md)**: 4-phase processing pipeline details
+- **[HostAgent State Machine](../host_agent/state.md)**: Parent agent FSM
+
+**Design Patterns:**
+
+- **[State Layer Design](../../infrastructure/agents/design/state.md)**: FSM design principles
+- **[Processor Framework](../../infrastructure/agents/design/processor.md)**: Processing architecture
+
+---
+
+## API Reference
+
+:::agents.states.app_agent_state.AppAgentState
+:::agents.states.app_agent_state.AppAgentStateManager
+
+---
+
+## Summary
+
+**AppAgent State Machine Key Features:**
+
+- ✅ **7-State FSM**: CONTINUE, SCREENSHOT, FINISH, FAIL, PENDING, CONFIRM, ERROR
+- ✅ **LLM-Driven**: Most transitions controlled by LLM's `Status` field
+- ✅ **UI Re-annotation**: SCREENSHOT state handles dynamic UI changes
+- ✅ **User Interaction**: PENDING and CONFIRM states for human input
+- ✅ **Error Handling**: ERROR and FAIL states for graceful failure recovery
+- ✅ **HostAgent Integration**: FINISH/FAIL/ERROR return control to parent agent
+- ✅ **Subtask Archiving**: Execution history tracked in `previous_subtasks`
+
+**Next Steps:**
+
+1. **Understand Processing**: Read [Processing Strategy](strategy.md) for pipeline details
+2. **Learn Commands**: Check [Command System](commands.md) for available actions
+3. **Explore Patterns**: Review [State Layer Design](../../infrastructure/agents/design/state.md) for FSM principles
diff --git a/documents/docs/ufo2/app_agent/strategy.md b/documents/docs/ufo2/app_agent/strategy.md
new file mode 100644
index 000000000..15f778f41
--- /dev/null
+++ b/documents/docs/ufo2/app_agent/strategy.md
@@ -0,0 +1,1031 @@
+# AppAgent Processing Strategy
+
+AppAgent executes a **4-phase processing pipeline** in **CONTINUE** and **SCREENSHOT** states. Each phase handles a specific aspect of application-level automation: **data collection** (screenshot + controls), **LLM reasoning**, **action execution**, and **memory recording**. This document details the implementation of each strategy based on the actual codebase.
+
+---
+
+## Strategy Assembly
+
+Processing strategies are **assembled and orchestrated** by the `AppAgentProcessor` class defined in `ufo/agents/processors/app_agent_processor.py`. The processor acts as the **coordinator** that initializes, configures, and executes the 4-phase pipeline for application-level automation.
+
+### AppAgentProcessor Overview
+
+The `AppAgentProcessor` extends `ProcessorTemplate` and serves as the main orchestrator for AppAgent workflows:
+
+```python
+class AppAgentProcessor(ProcessorTemplate):
+    """
+    App Agent Processor - Modern, extensible App Agent processing implementation.
+
+    Processing Pipeline:
+    1. Data Collection: Screenshot capture and UI control information (composed strategy)
+    2. LLM Interaction: Context-aware prompting and response parsing
+    3. Action Execution: UI automation and control interaction
+    4. Memory Update: Agent memory and blackboard synchronization
+
+    Middleware Stack:
+    - Structured logging and debugging middleware
+    """
+
+    processor_context_class = AppAgentProcessorContext
+
+    def __init__(self, agent: "AppAgent", global_context: "Context"):
+        super().__init__(agent, global_context)
+```
+
+### Strategy Registration
+
+During initialization, `AppAgentProcessor._setup_strategies()` registers all four processing strategies:
+
+```python
+def _setup_strategies(self) -> None:
+    """Setup processing strategies for App Agent."""
+
+    # Phase 1: Data collection (COMPOSED: Screenshot + Control Info)
+    self.strategies[ProcessingPhase.DATA_COLLECTION] = ComposedStrategy(
+        strategies=[
+            AppScreenshotCaptureStrategy(),
+            AppControlInfoStrategy(),
+        ],
+        name="AppDataCollectionStrategy",
+        fail_fast=True,  # Data collection is critical
+    )
+
+    # Phase 2: LLM interaction (critical - fail_fast=True)
+    self.strategies[ProcessingPhase.LLM_INTERACTION] = (
+        AppLLMInteractionStrategy(
+            fail_fast=True  # LLM failure should trigger recovery
+        )
+    )
+
+    # Phase 3: Action execution (graceful - fail_fast=False)
+    self.strategies[ProcessingPhase.ACTION_EXECUTION] = (
+        AppActionExecutionStrategy(
+            fail_fast=False  # Action failures can be handled gracefully
+        )
+    )
+
+    # Phase 4: Memory update (graceful - fail_fast=False)
+    self.strategies[ProcessingPhase.MEMORY_UPDATE] = (
+        AppMemoryUpdateStrategy(
+            fail_fast=False  # Memory update failures shouldn't stop process
+        )
+    )
+```
+
+| Phase | Strategy Class | fail_fast | Composition | Rationale |
+|-------|---------------|-----------|-------------|-----------|
+| **DATA_COLLECTION** | `ComposedStrategy` (Screenshot + Control Info) | ✓ True | ✓ Composed | Screenshot and control detection are critical for LLM context |
+| **LLM_INTERACTION** | `AppLLMInteractionStrategy` | ✓ True | ✗ Single | LLM response failure requires immediate recovery |
+| **ACTION_EXECUTION** | `AppActionExecutionStrategy` | ✗ False | ✗ Single | Action failures can be gracefully handled and retried |
+| **MEMORY_UPDATE** | `AppMemoryUpdateStrategy` | ✗ False | ✗ Single | Memory failures shouldn't block the main execution flow |
+
+**Composed Strategy Pattern:**
+Phase 1 uses **ComposedStrategy** to execute two sub-strategies sequentially:
+
+1. **AppScreenshotCaptureStrategy**: Captures application window + desktop screenshots
+2. **AppControlInfoStrategy**: Detects UI controls via UIA/OmniParser and creates annotations
+
+This ensures both screenshot and control data are available together for the LLM analysis phase.
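
The sequential composition described above can be sketched as follows. This is an illustration under stated assumptions, not the `ComposedStrategy` source: `Result`, `run_composed`, `capture_screenshot` and `collect_controls` are hypothetical names, and sub-strategies are modeled as plain async functions that receive the data produced so far.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, List

# In this sketch a sub-strategy is an async function from context to new data.
SubStrategy = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]


@dataclass
class Result:
    success: bool
    data: Dict[str, Any] = field(default_factory=dict)
    error: str = ""


async def run_composed(steps: List[SubStrategy], fail_fast: bool = True) -> Result:
    """Run sub-strategies in order; each one sees the data produced before it."""
    context: Dict[str, Any] = {}
    errors: List[str] = []
    for step in steps:
        try:
            context.update(await step(context))
        except Exception as exc:
            if fail_fast:
                # Stop at the first failure and report it.
                return Result(False, context, str(exc))
            errors.append(str(exc))
    return Result(not errors, context, "; ".join(errors))


async def capture_screenshot(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {"clean_screenshot_path": "logs/action_step5.png"}


async def collect_controls(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Depends on the screenshot path written by the previous step.
    return {"control_info": [f"controls from {ctx['clean_screenshot_path']}"]}


result = asyncio.run(run_composed([capture_screenshot, collect_controls]))
```

Running the two steps in the opposite order fails immediately in this sketch, because the control step cannot find the screenshot path; that ordering dependency is why the sub-strategies run sequentially.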
+
+### Middleware Configuration
+
+The processor configures specialized logging middleware:
+
+```python
+def _setup_middleware(self) -> None:
+    """Setup middleware pipeline for App Agent."""
+    self.middleware_chain = [AppAgentLoggingMiddleware()]
+```
+
+**AppAgentLoggingMiddleware** provides:
+
+- Subtask and application context tracking
+- Rich Panel displays with color coding
+- Action execution logging
+- Performance metrics and cost tracking
+
+---
+
+## Processing Pipeline Architecture
+
+```mermaid
+graph TB
+ subgraph "Phase 1: DATA_COLLECTION (ComposedStrategy)"
+ SS[AppScreenshotCaptureStrategy Capture Screenshots]
+ CI[AppControlInfoStrategy Detect & Annotate Controls]
+ SS --> CI
+ end
+
+ subgraph "Phase 2: LLM_INTERACTION"
+ LLM[AppLLMInteractionStrategy LLM Reasoning]
+ end
+
+ subgraph "Phase 3: ACTION_EXECUTION"
+ AE[AppActionExecutionStrategy Execute UI Action]
+ end
+
+ subgraph "Phase 4: MEMORY_UPDATE"
+ MU[AppMemoryUpdateStrategy Record in Memory & Blackboard]
+ end
+
+ CI --> LLM
+ LLM --> AE
+ AE --> MU
+
+ style SS fill:#e1f5ff
+ style CI fill:#e1f5ff
+ style LLM fill:#fff4e6
+ style AE fill:#e8f5e9
+ style MU fill:#fce4ec
+```
+
+---
+
+## Phase 1: DATA_COLLECTION
+
+### Strategy: `ComposedStrategy` (Screenshot + Control Info)
+
+**Purpose**: Gather comprehensive application UI context including screenshots and control information for LLM decision making.
+
+```python
+# Composed strategy combines two sub-strategies
+self.strategies[ProcessingPhase.DATA_COLLECTION] = ComposedStrategy(
+    strategies=[
+        AppScreenshotCaptureStrategy(),
+        AppControlInfoStrategy(),
+    ],
+    name="AppDataCollectionStrategy",
+    fail_fast=True,
+)
+```
+
+### Sub-Strategy 1: AppScreenshotCaptureStrategy
+
+**Purpose**: Capture application window and desktop screenshots.
+
+```python
+@depends_on("app_root", "log_path", "session_step")
+@provides(
+    "clean_screenshot_path",
+    "annotated_screenshot_path",
+    "desktop_screenshot_path",
+    "ui_tree_path",
+    "clean_screenshot_url",
+    "desktop_screenshot_url",
+    "application_window_info",
+    "screenshot_saved_time",
+)
+class AppScreenshotCaptureStrategy(BaseProcessingStrategy):
+    """Strategy for capturing application screenshots and desktop screenshots."""
+
+    async def execute(self, agent, context) -> ProcessingResult:
+        # 1. Capture application window screenshot
+        clean_screenshot_url = await self._capture_app_screenshot(
+            clean_screenshot_path, command_dispatcher
+        )
+
+        # 2. Capture desktop screenshot if needed
+        if ufo_config.system.save_full_screen:
+            desktop_screenshot_url = await self._capture_desktop_screenshot(
+                desktop_screenshot_path, command_dispatcher
+            )
+
+        # 3. Capture UI tree if needed
+        if ufo_config.system.save_ui_tree:
+            await self._capture_ui_tree(ui_tree_path, command_dispatcher)
+
+        # 4. Get application window information
+        application_window_info = await self._get_application_window_info(
+            command_dispatcher
+        )
+
+        return ProcessingResult(success=True, data={...})
+```
+
+**Execution Steps**:
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant CommandDispatcher
+ participant Application
+
+ Strategy->>CommandDispatcher: capture_window_screenshot()
+ CommandDispatcher->>Application: Screenshot app window
+ Application-->>Strategy: clean_screenshot_url
+ Strategy->>Strategy: Save to log_path/action_stepN.png
+
+ alt save_full_screen=True
+ Strategy->>CommandDispatcher: capture_desktop_screenshot(all_screens=True)
+ CommandDispatcher-->>Strategy: desktop_screenshot_url
+ Strategy->>Strategy: Save to log_path/desktop_stepN.png
+ end
+
+ alt save_ui_tree=True
+ Strategy->>CommandDispatcher: get_ui_tree()
+ CommandDispatcher-->>Strategy: ui_tree JSON
+ Strategy->>Strategy: Save to log_path/ui_trees/ui_tree_stepN.json
+ end
+
+ Strategy->>CommandDispatcher: get_app_window_info()
+ CommandDispatcher-->>Strategy: application_window_info
+```
+
+**Key Outputs**:
+
+| Output | Type | Description | Example |
+|--------|------|-------------|---------|
+| `clean_screenshot_url` | str | Base64 image of app window | `data:image/png;base64,iVBORw0K...` |
+| `clean_screenshot_path` | str | File path to screenshot | `logs/action_step5.png` |
+| `desktop_screenshot_url` | str | Base64 image of desktop | `data:image/png;base64,iVBORw0K...` |
+| `application_window_info` | TargetInfo | Window metadata (name, rect, type) | `TargetInfo(name="Word", rect=[0,0,1920,1080])` |
+| `screenshot_saved_time` | float | Performance timing (seconds) | `0.324` |
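
The `@depends_on` and `@provides` declarations on the strategy above describe which context keys a strategy reads and writes. A minimal sketch of how such declarations could be recorded and checked is shown below. The attribute names `dependencies` and `outputs`, the helper `missing_inputs`, and the class `ControlInfoStep` are assumptions made for illustration; the decorators in the UFO codebase may store their metadata differently.

```python
from typing import Callable, Set, Type, TypeVar

T = TypeVar("T")


def depends_on(*keys: str) -> Callable[[Type[T]], Type[T]]:
    """Record the context keys a strategy reads (hypothetical attribute name)."""

    def wrap(cls: Type[T]) -> Type[T]:
        cls.dependencies = set(keys)
        return cls

    return wrap


def provides(*keys: str) -> Callable[[Type[T]], Type[T]]:
    """Record the context keys a strategy writes (hypothetical attribute name)."""

    def wrap(cls: Type[T]) -> Type[T]:
        cls.outputs = set(keys)
        return cls

    return wrap


@depends_on("clean_screenshot_path", "application_window_info")
@provides("control_info", "annotation_dict")
class ControlInfoStep:
    """Stand-in for a strategy class; it has no behavior in this sketch."""


def missing_inputs(step: type, available: Set[str]) -> Set[str]:
    """Return declared dependencies that earlier steps have not produced."""
    return set(getattr(step, "dependencies", set())) - available


missing = missing_inputs(ControlInfoStep, {"clean_screenshot_path"})
```

A check of this kind lets a pipeline report a missing input before a strategy runs, rather than failing partway through execution.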
+
+### Sub-Strategy 2: AppControlInfoStrategy
+
+**Purpose**: Detect, filter, and annotate UI controls using UIA and/or OmniParser.
+
+```python
+@depends_on("clean_screenshot_path", "application_window_info")
+@provides(
+    "control_info",
+    "annotation_dict",
+    "control_filter_time",
+    "control_recorder",
+    "annotated_screenshot_path",
+    "annotated_screenshot_url",
+)
+class AppControlInfoStrategy(BaseProcessingStrategy):
+    """Strategy for collecting and filtering UI control information."""
+
+    def __init__(self, fail_fast: bool = True):
+        super().__init__(name="app_control_info", fail_fast=fail_fast)
+        self.control_detection_backend = ufo_config.system.control_backend
+        self.photographer = PhotographerFacade()
+
+        if "omniparser" in self.control_detection_backend:
+            self.grounding_service = OmniparserGrounding(...)
+```
+
+**Execution Steps**:
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant UIA
+ participant OmniParser
+ participant Photographer
+
+ alt UIA Backend Enabled
+ Strategy->>UIA: get_app_window_controls_target_info()
+ UIA-->>Strategy: api_control_list (50 controls)
+ end
+
+ alt OmniParser Backend Enabled
+ Strategy->>OmniParser: screen_parsing(screenshot)
+ OmniParser-->>Strategy: grounding_control_list (12 controls)
+ end
+
+ Strategy->>Strategy: Merge UIA + OmniParser lists (deduplicate by IoU overlap)
+ Strategy->>Strategy: Create annotation_dict {id: TargetInfo}
+
+ Strategy->>Photographer: capture_with_target_list() (draw labels [1], [2], [3]...)
+ Photographer-->>Strategy: annotated_screenshot_url
+```
+
+**Control Detection Backends**:
+
+**UIA (UI Automation):**
+
+```python
+async def _collect_uia_controls(self, command_dispatcher) -> List[TargetInfo]:
+    """Collect UIA controls from the application window."""
+    result = await command_dispatcher.execute_commands([
+        Command(
+            tool_name="get_app_window_controls_target_info",
+            parameters={"field_list": ["id", "name", "type", "rect", ...]},
+        )
+    ])
+
+    target_info_list = [TargetInfo(**control) for control in result[0].result]
+    return target_info_list
+```
+
+- **Advantages**: Fast, accurate, native Windows controls
+- **Limitations**: May miss custom controls, web content, icons
+
+**OmniParser (Visual):**
+
+```python
+async def _collect_grounding_controls(
+    self, clean_screenshot_path, application_window_info
+) -> List[TargetInfo]:
+    """Collect controls using grounding service."""
+    grounding_controls = self.grounding_service.screen_parsing(
+        clean_screenshot_path, application_window_info
+    )
+    return grounding_controls
+```
+
+- **Advantages**: Detects visual elements (icons, images, custom controls)
+- **Limitations**: Slower, requires external service
+
+**Hybrid (UIA + OmniParser):**
+
+```python
+def _collect_merged_control_list(
+    self, api_control_list, grounding_control_list
+) -> List[TargetInfo]:
+    """Merge UIA and grounding sources with IoU deduplication."""
+    merged_controls = self.photographer.merge_target_info_list(
+        api_control_list,
+        grounding_control_list,
+        iou_overlap_threshold=ufo_config.system.iou_threshold_for_merge,
+    )
+    return merged_controls
+```
+
+**Advantage**: Maximum coverage - native + visual elements
+
+**Annotation Process**:
+
+```python
+# Create annotation dictionary mapping IDs to controls
+annotation_dict = {
+    "1": TargetInfo(id="1", name="Export", type="Button", rect=[100, 200, 150, 230]),
+    "2": TargetInfo(id="2", name="Save", type="Button", rect=[160, 200, 210, 230]),
+    # ... more controls
+}
+
+# Draw labels on screenshot
+annotated_screenshot_url = self._save_annotated_screenshot(
+    application_window_info,
+    clean_screenshot_path,
+    merged_control_list,
+    annotated_screenshot_path,
+)
+```
+
+!!!example "Control Detection Example"
+ ```
+ UIA detects: 45 controls (buttons, textboxes, menus)
+ OmniParser detects: 12 visual elements (icons, images)
+ IoU deduplication removes: 3 overlapping controls
+ Final merged list: 54 annotated controls [1] to [54]
+ ```
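
In the example above, 45 UIA controls plus 12 visual elements minus 3 overlapping ones gives 45 + 12 - 3 = 54 annotated controls. The deduplication step can be sketched with a plain intersection-over-union (IoU) test. This is an illustrative sketch only: `iou` and `merge_controls` are hypothetical helpers working on bare rectangles, the threshold of `0.1` is an arbitrary example value, and the actual `merge_target_info_list` method may compare and prioritize controls differently.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)


def iou(a: Rect, b: Rect) -> float:
    """Intersection-over-union of two axis-aligned rectangles."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, right - left) * max(0, bottom - top)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_controls(primary: List[Rect], extra: List[Rect], threshold: float) -> List[Rect]:
    """Keep every primary rect; add an extra rect only if it overlaps none of them.

    Treating the first list as authoritative is an assumption of this sketch.
    """
    merged = list(primary)
    for rect in extra:
        if all(iou(rect, kept) <= threshold for kept in primary):
            merged.append(rect)
    return merged


uia = [(100, 200, 150, 230), (160, 200, 210, 230)]
# The first visual rect nearly coincides with a UIA control; the second is new.
visual = [(101, 201, 150, 230), (300, 300, 340, 340)]
merged = merge_controls(uia, visual, threshold=0.1)
```

Here the near-duplicate visual rectangle is dropped and the non-overlapping one is appended, so `merged` holds three rectangles.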
+
+---
+
+## Phase 2: LLM_INTERACTION
+
+### Strategy: `AppLLMInteractionStrategy`
+
+**Purpose**: Build context-aware prompts with app-specific data and get LLM reasoning for next action.
+
+```python
+@provides(
+    "parsed_response",
+    "response_text",
+    "llm_cost",
+    "prompt_message",
+    "save_screenshot",
+    "comment",
+    "concat_screenshot_path",
+    "plan",
+    "observation",
+    "last_control_screenshot_path",
+    "action",
+    "thought",
+)
+class AppLLMInteractionStrategy(BaseProcessingStrategy):
+    """Strategy for LLM interaction with App Agent specific prompting."""
+
+    async def execute(self, agent, context) -> ProcessingResult:
+        # 1. Collect image strings (last step + current clean + annotated)
+        image_string_list = self._collect_image_strings(...)
+
+        # 2. Retrieve knowledge from RAG system
+        knowledge_retrieved = self._knowledge_retrieval(agent, subtask)
+
+        # 3. Build comprehensive prompt
+        prompt_message = await self._build_app_prompt(...)
+
+        # 4. Get LLM response with retry logic
+        response_text, llm_cost = await self._get_llm_response(agent, prompt_message)
+
+        # 5. Parse and validate response
+        parsed_response = self._parse_app_response(agent, response_text)
+
+        return ProcessingResult(success=True, data={...})
+```
+
+**Execution Flow**:
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant Photographer
+ participant RAG
+ participant LLM
+
+ Strategy->>Photographer: Collect image strings
+ Note over Strategy: - Last step screenshot (selected control)<br/>- Clean screenshot<br/>- Annotated screenshot<br/>- Concatenated clean+annotated
+ Photographer-->>Strategy: image_string_list
+
+ Strategy->>RAG: Retrieve knowledge for subtask
+ Note over RAG: - Experience examples<br/>- Demonstration examples<br/>- Offline docs<br/>- Online search results
+ RAG-->>Strategy: knowledge_retrieved
+
+ Strategy->>Strategy: Build comprehensive prompt (images + controls + knowledge + history)
+
+ Strategy->>LLM: Get response with retry (max 3 attempts)
+ LLM-->>Strategy: response_text
+
+ Strategy->>Strategy: Parse JSON response to AppAgentResponse
+ Strategy-->>Strategy: Return parsed_response
+```
+
+**Prompt Construction**:
+
+```python
+async def _build_app_prompt(
+    self,
+    agent,
+    control_info,  # List of detected controls
+    image_string_list,  # Screenshots
+    knowledge_retrieved,  # RAG results
+    request,  # User request
+    subtask,  # Current subtask
+    plan,  # Previous plan
+    prev_subtask,  # Previous subtasks
+    application_process_name,
+    host_message,  # Message from HostAgent
+    session_step,
+    request_logger,
+) -> List[Dict]:
+    """Build comprehensive prompt for App Agent."""
+
+    # Get blackboard context
+    blackboard_prompt = agent.blackboard.blackboard_to_prompt()
+
+    # Get last successful actions
+    last_success_actions = self._get_last_success_actions(agent)
+
+    # Extract knowledge
+    retrieved_examples = (
+        knowledge_retrieved["experience_examples"] +
+        knowledge_retrieved["demonstration_examples"]
+    )
+    retrieved_knowledge = (
+        knowledge_retrieved["offline_docs"] +
+        knowledge_retrieved["online_docs"]
+    )
+
+    # Build prompt using agent's message constructor
+    prompt_message = agent.message_constructor(
+        dynamic_examples=retrieved_examples,
+        dynamic_knowledge=retrieved_knowledge,
+        image_list=image_string_list,
+        control_info=control_info,
+        prev_subtask=prev_subtask,
+        plan=plan,
+        request=request,
+        subtask=subtask,
+        current_application=application_process_name,
+        host_message=host_message,
+        blackboard_prompt=blackboard_prompt,
+        last_success_actions=last_success_actions,
+    )
+
+    return prompt_message
+```
+
+**LLM Response Parsing**:
+
+```python
+def _parse_app_response(self, agent, response_text: str) -> AppAgentResponse:
+    """Parse LLM response into structured AppAgentResponse."""
+    response_dict = agent.response_to_dict(response_text)
+    parsed_response = AppAgentResponse.model_validate(response_dict)
+    return parsed_response
+```
+
+**AppAgentResponse Schema**:
+
+```json
+{
+  "Observation": "Word document with Export button at label [12]",
+  "Thought": "I should click Export to extract table data",
+  "ControlLabel": "12",
+  "ControlText": "Export",
+  "Function": "click_input",
+  "Args": {"button": "left", "double": false},
+  "Status": "SCREENSHOT",
+  "Plan": ["Click Export", "Select CSV format", "Choose save location"],
+  "Comment": "Clicking Export will open a dialog",
+  "SaveScreenshot": {"save": false, "reason": ""}
+}
+```
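
To make the parsing step concrete, here is a minimal sketch that decodes a response of the shape shown above using only the standard library. It is not the `AppAgentResponse` model: `ParsedResponse` and `parse_response` are hypothetical, only a subset of the fields is covered, and treating `Function` and `Status` as the required keys is an assumption made for this example.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ParsedResponse:
    """Hypothetical stand-in for AppAgentResponse, covering a subset of fields."""

    function: str
    status: str
    control_label: str = ""
    args: Dict[str, Any] = field(default_factory=dict)
    plan: List[str] = field(default_factory=list)


def parse_response(response_text: str) -> ParsedResponse:
    """Decode the JSON text and fail loudly when a required key is absent."""
    raw = json.loads(response_text)
    missing = [key for key in ("Function", "Status") if key not in raw]
    if missing:
        raise ValueError(f"response is missing required keys: {missing}")
    return ParsedResponse(
        function=raw["Function"],
        status=raw["Status"],
        control_label=raw.get("ControlLabel", ""),
        args=raw.get("Args", {}),
        plan=raw.get("Plan", []),
    )


text = (
    '{"ControlLabel": "12", "Function": "click_input", '
    '"Args": {"button": "left", "double": false}, "Status": "SCREENSHOT"}'
)
parsed = parse_response(text)
```

Raising on a missing key, rather than silently defaulting, gives a retry loop a clear signal that the model's output was malformed.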
+
+!!!tip "Retry Logic"
+ LLM interaction includes automatic retry (configurable, default 3 attempts) to handle transient failures or JSON parsing errors.
+
+---
+
+## Phase 3: ACTION_EXECUTION
+
+### Strategy: `AppActionExecutionStrategy`
+
+**Purpose**: Execute UI actions on selected controls based on LLM response.
+
+```python
+@depends_on("parsed_response", "log_path", "session_step")
+@provides(
+    "execution_result",
+    "action_info",
+    "control_log",
+    "status",
+    "selected_control_screenshot_path",
+)
+class AppActionExecutionStrategy(BaseProcessingStrategy):
+    """Strategy for executing App Agent actions."""
+
+    async def execute(self, agent, context) -> ProcessingResult:
+        # 1. Extract parsed response
+        parsed_response = context.get_local("parsed_response")
+
+        # 2. Execute the action via command dispatcher
+        execution_results = await self._execute_app_action(
+            command_dispatcher,
+            parsed_response.action
+        )
+
+        # 3. Create action info for memory
+        actions = self._create_action_info(
+            annotation_dict,
+            parsed_response.action,
+            execution_results,
+        )
+
+        # 4. Save annotated screenshot with selected control highlighted
+        self._save_annotated_screenshot(...)
+
+        return ProcessingResult(success=True, data={...})
+```
+
+**Execution Flow**:
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant CommandDispatcher
+ participant Application
+ participant Photographer
+
+ Strategy->>Strategy: Extract action from parsed_response
+ Note over Strategy: ControlLabel: "12"<br/>Function: "click_input"<br/>Args: {"button": "left"}
+
+ Strategy->>Strategy: Convert action to Command
+ Note over Strategy: Command(tool_name="click_input", parameters={"id": "12", "button": "left"})
+
+ Strategy->>CommandDispatcher: execute_commands([command])
+ CommandDispatcher->>Application: Perform UI automation
+ Application-->>CommandDispatcher: Result (status, message)
+ CommandDispatcher-->>Strategy: execution_results
+
+ Strategy->>Strategy: Create action_info (merge control, action, result)
+ Strategy->>Strategy: Print action to console
+
+ Strategy->>Photographer: Save screenshot with selected control
+ Photographer-->>Strategy: selected_control_screenshot_path
+```
+
+**Action to Command Conversion**:
+
+```python
+def _action_to_command(self, action: ActionCommandInfo) -> Command:
+    """Convert ActionCommandInfo to Command for execution."""
+    return Command(
+        tool_name=action.function,  # e.g., "click_input"
+        parameters=action.arguments or {},  # e.g., {"id": "12", "button": "left"}
+        tool_type="action",
+    )
+```
+
+**Action Info Creation**:
+
+```python
+def _create_action_info(
+    self,
+    annotation_dict,
+    actions,
+    execution_results,
+) -> List[ActionCommandInfo]:
+    """Create action information for memory tracking."""
+
+    # Handle single or multiple actions
+    if isinstance(actions, ActionCommandInfo):
+        actions = [actions]
+
+    # Merge control info with action results
+    for i, action in enumerate(actions):
+        if action.arguments and "id" in action.arguments:
+            control_id = action.arguments["id"]
+            target_control = annotation_dict.get(control_id)
+            action.target = target_control  # Link to TargetInfo
+
+        action.result = execution_results[i]  # Link to execution result
+
+    return actions
+```
+
+**Example Action Execution**:
+
+```
+Input: ControlLabel="12", Function="click_input", Args={"button": "left"}
+↓
+Command: Command(tool_name="click_input", parameters={"id": "12", "button": "left"})
+↓
+Execution: Click control [12] (Export button) with left mouse button
+↓
+Result: ResultStatus.SUCCESS, message="Clicked control successfully"
+↓
+Action Info: ActionCommandInfo(
+ function="click_input",
+ target=TargetInfo(name="Export", type="Button"),
+ result=Result(status=SUCCESS),
+ action_string="click_input on [12]Export"
+)
+```
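
The first arrow in the example above, from the LLM's fields to a command, can be sketched as a small conversion function. The `Command` dataclass here is a hypothetical stand-in with the three fields shown in this section, and folding `ControlLabel` into the parameters under the key `id` is inferred from the example rather than taken from the source code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Command:
    """Hypothetical stand-in for the Command type passed to the dispatcher."""

    tool_name: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    tool_type: str = "action"


def response_to_command(response: Dict[str, Any]) -> Command:
    """Fold the selected control label into the arguments as `id`."""
    parameters = dict(response.get("Args") or {})
    label = response.get("ControlLabel")
    if label:
        parameters["id"] = label
    return Command(tool_name=response["Function"], parameters=parameters)


command = response_to_command(
    {"ControlLabel": "12", "Function": "click_input", "Args": {"button": "left"}}
)
```

Copying `Args` before adding `id` keeps the parsed response unchanged, which matters if the same response is later written to memory.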
+
+!!!warning "Error Handling"
+ Action execution uses `fail_fast=False`, allowing graceful handling of failures. Failed actions are logged but don't halt the pipeline.
+
+---
+
+## Phase 4: MEMORY_UPDATE
+
+### Strategy: `AppMemoryUpdateStrategy`
+
+**Purpose**: Record execution history in agent memory and update shared Blackboard.
+
+```python
+@depends_on("session_step", "parsed_response")
+@provides("additional_memory", "memory_item", "updated_blackboard")
+class AppMemoryUpdateStrategy(BaseProcessingStrategy):
+    """Strategy for updating App Agent memory and blackboard."""
+
+    async def execute(self, agent, context) -> ProcessingResult:
+        # 1. Create additional memory data
+        additional_memory = self._create_additional_memory_data(agent, context)
+
+        # 2. Create and populate memory item
+        memory_item = self._create_and_populate_memory_item(
+            parsed_response,
+            additional_memory
+        )
+
+        # 3. Add memory to agent
+        agent.add_memory(memory_item)
+
+        # 4. Update blackboard
+        self._update_blackboard(agent, save_screenshot, ...)
+
+        # 5. Update structural logs
+        self._update_structural_logs(context, memory_item)
+
+        return ProcessingResult(success=True, data={...})
+```
+
+**Execution Flow**:
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant Memory
+ participant Blackboard
+ participant Logs
+
+ Strategy->>Strategy: Create additional_memory (step, cost, actions, results)
+ Strategy->>Strategy: Create memory_item (merge response + additional data)
+
+ Strategy->>Memory: agent.add_memory(memory_item)
+ Memory-->>Strategy: Memory updated
+
+ alt save_screenshot=True
+ Strategy->>Blackboard: add_image(screenshot, metadata)
+ Blackboard-->>Strategy: Image saved
+ end
+
+ Strategy->>Blackboard: add_trajectories(memorized_action)
+ Blackboard-->>Strategy: Trajectories updated
+
+ Strategy->>Logs: Update structural logs
+ Logs-->>Strategy: Logs updated
+```
+
+**Memory Item Creation**:
+
+```python
+def _create_and_populate_memory_item(
+    self,
+    parsed_response: AppAgentResponse,
+    additional_memory: AppAgentProcessorContext,
+) -> MemoryItem:
+    """Create and populate memory item."""
+    memory_item = MemoryItem()
+
+    # Add LLM response data
+    if parsed_response:
+        memory_item.add_values_from_dict(parsed_response.model_dump())
+
+    # Add additional context data
+    memory_item.add_values_from_dict(additional_memory.to_dict(selective=True))
+
+    return memory_item
+```
+
+**Additional Memory Data**:
+
+```python
+def _create_additional_memory_data(self, agent, context):
+    """Create additional memory data for App Agent."""
+    app_context = AppAgentProcessorContext()
+
+    # Action information
+    action_info = context.get("action_info")
+    if action_info:
+        app_context.function_call = action_info.get_function_calls()
+        app_context.action = action_info.to_list_of_dicts()
+        app_context.action_success = action_info.to_list_of_dicts(success_only=True)
+        app_context.action_type = [action.result.namespace for action in action_info.actions]
+        app_context.action_representation = action_info.to_representation()
+
+    # Step information
+    app_context.session_step = context.get_global("SESSION_STEP", 0)
+    app_context.round_step = context.get_global("CURRENT_ROUND_STEP", 0)
+    app_context.round_num = context.get_global("CURRENT_ROUND_ID", 0)
+    app_context.agent_step = agent.step
+
+    # Task information
+    app_context.subtask = context.get("subtask", "")
+    app_context.request = context.get("request", "")
+    app_context.app_root = context.get("app_root", "")
+
+    # Cost and results
+    app_context.cost = context.get("llm_cost", 0.0)
+    app_context.results = context.get("execution_result", [])
+
+    return app_context
+```
+
+**Blackboard Update**:
+
+```python
+def _update_blackboard(
+    self,
+    agent,
+    save_screenshot,
+    save_reason,
+    screenshot_path,
+    memory_item,
+    application_process_name,
+):
+    """Update agent blackboard with screenshots and actions."""
+
+    # Add action trajectories
+    history_keys = ufo_config.system.history_keys
+    if history_keys:
+        memory_dict = memory_item.to_dict()
+        memorized_action = {
+            key: memory_dict.get(key)
+            for key in history_keys
+            if key in memory_dict
+        }
+        if memorized_action:
+            agent.blackboard.add_trajectories(memorized_action)
+
+    # Add screenshot if requested
+    if save_screenshot:
+        metadata = {
+            "screenshot application": application_process_name,
+            "saving reason": save_reason,
+        }
+        agent.blackboard.add_image(screenshot_path, metadata)
+```
+
+**Memory Item Example**:
+
+```python
+{
+ "observation": "Word document with Export button at [12]",
+ "thought": "Click Export to extract table",
+ "control_label": "12",
+ "function_call": ["click_input"],
+ "action": [{"function": "click_input", "target": {...}, "result": {...}}],
+ "action_success": [{"action_string": "click_input on [12]Export", ...}],
+ "status": "SCREENSHOT",
+ "plan": ["Click Export", "Select CSV", "Save file"],
+ "cost": 0.0023,
+ "session_step": 5,
+ "round_step": 2,
+ "subtask": "Extract table from Word document",
+}
+```
+
+!!!info "Selective Memory"
+ The `history_keys` configuration controls which fields are added to Blackboard trajectories. This prevents information overload while maintaining essential context for cross-agent communication.
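
The selective filtering described in the note above reduces to a dictionary comprehension over the configured keys. The sketch below uses a trimmed copy of the memory item example; the helper name `select_trajectory_fields` and the particular `history_keys` list are hypothetical values chosen for illustration, not the project's defaults.

```python
from typing import Any, Dict, List


def select_trajectory_fields(memory: Dict[str, Any], history_keys: List[str]) -> Dict[str, Any]:
    """Keep only the configured keys that are actually present in the memory item."""
    return {key: memory[key] for key in history_keys if key in memory}


memory_item = {
    "observation": "Word document with Export button at [12]",
    "function_call": ["click_input"],
    "status": "SCREENSHOT",
    "cost": 0.0023,
    "subtask": "Extract table from Word document",
}

# Hypothetical configuration; "results" is absent from the item and is skipped.
trajectory = select_trajectory_fields(
    memory_item, ["subtask", "function_call", "status", "results"]
)
```

Fields such as `cost` and `observation` stay in the agent's own memory but are left out of the shared trajectory in this sketch.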
+
+---
+
+## Complete Execution Example
+
+### Single Action Cycle
+
+```mermaid
+sequenceDiagram
+ participant AppAgent
+ participant DC as DATA_COLLECTION
+ participant LLM as LLM_INTERACTION
+ participant AE as ACTION_EXECUTION
+ participant MU as MEMORY_UPDATE
+ participant Application
+
+ rect rgb(230, 240, 255)
+ Note over AppAgent, DC: Phase 1: Data Collection
+ AppAgent->>DC: Start processing
+ DC->>Application: capture_window_screenshot()
+ Application-->>DC: clean_screenshot_url
+ DC->>Application: get_app_window_controls_target_info()
+ Application-->>DC: 50 controls detected
+ DC->>DC: Annotate screenshot [1] to [50]
+ DC-->>AppAgent: Screenshots + Controls ready
+ end
+
+ rect rgb(255, 250, 230)
+ Note over AppAgent, LLM: Phase 2: LLM Interaction
+ AppAgent->>LLM: Process with controls + images
+ LLM->>LLM: Build prompt (RAG + history)
+ LLM->>LLM: Get LLM response
+ LLM->>LLM: Parse JSON response
+ LLM-->>AppAgent: Action: click_input([12], left)
+ end
+
+ rect rgb(230, 255, 240)
+ Note over AppAgent, AE: Phase 3: Action Execution
+ AppAgent->>AE: Execute action
+ AE->>Application: click_input(id="12")
+ Application-->>AE: SUCCESS: Clicked Export button
+ AE->>AE: Create action_info
+ AE-->>AppAgent: Action completed
+ end
+
+ rect rgb(255, 240, 245)
+ Note over AppAgent, MU: Phase 4: Memory Update
+ AppAgent->>MU: Update memory
+ MU->>MU: Create memory_item
+ MU->>MU: Add to agent.memory
+ MU->>MU: Update blackboard
+ MU-->>AppAgent: Memory updated
+ end
+```
+
+---
+
+## Error Handling
+
+### Fail-Fast vs Graceful
+
+```python
+# DATA_COLLECTION: fail_fast=True
+# Critical failure stops pipeline immediately
+try:
+    result = await screenshot_strategy.execute(agent, context)
+except Exception as e:
+    # Propagate immediately - cannot proceed without screenshots
+    raise ProcessingError(f"Data collection failed: {e}")
+
+# ACTION_EXECUTION: fail_fast=False
+# Failures are logged but don't stop pipeline
+try:
+    result = await action_strategy.execute(agent, context)
+except Exception as e:
+    # Log error, return partial result, continue to memory phase
+    logger.error(f"Action execution failed: {e}")
+    return ProcessingResult(success=False, error=str(e), data={})
+```
+
+### Retry Mechanisms
+
+**LLM Interaction Retry**:
+
+```python
+async def _get_llm_response(self, agent, prompt_message):
+    """Get response from LLM with retry logic."""
+    max_retries = ufo_config.system.json_parsing_retry  # Default: 3
+
+    for retry_count in range(max_retries):
+        try:
+            # Run LLM call in thread executor to avoid blocking
+            loop = asyncio.get_event_loop()
+            response_text, cost = await loop.run_in_executor(
+                None,
+                agent.get_response,
+                prompt_message,
+                AgentType.APP,
+                True,  # use_backup_engine
+            )
+
+            # Validate response can be parsed
+            agent.response_to_dict(response_text)
+            return response_text, cost
+
+        except Exception as e:
+            if retry_count < max_retries - 1:
+                logger.warning(f"LLM retry {retry_count + 1}/{max_retries}: {e}")
+            else:
+                raise
+```
+
+---
+
+## Performance Optimization
+
+### Composed Strategy Benefits
+
+```python
+# Sequential execution with shared context
+self.strategies[ProcessingPhase.DATA_COLLECTION] = ComposedStrategy(
+ strategies=[
+ AppScreenshotCaptureStrategy(), # Provides: screenshots, window_info
+ AppControlInfoStrategy(), # Depends on: screenshots, window_info
+ ],
+ name="AppDataCollectionStrategy",
+ fail_fast=True,
+)
+```
+
+**Benefits**:
+
+- **Context Sharing**: Screenshot output immediately available to Control Info strategy
+- **Atomic Failure**: If screenshot fails, control detection is skipped
+- **Performance**: Avoids redundant window queries
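+
+The sequential, shared-context behaviour described above can be illustrated with a small standalone sketch. It is not UFO's `ComposedStrategy`: the `run_composed` helper, the two toy strategies, and the `errors` key are hypothetical names introduced only for this example, and strategies are modelled as plain async functions.
+
+```python
+import asyncio
+from typing import Any, Awaitable, Callable, Dict, List
+
+# Hypothetical stand-in for a strategy: an async callable that reads the
+# shared context and returns the new keys it provides.
+Strategy = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]
+
+
+async def run_composed(
+    strategies: List[Strategy], context: Dict[str, Any], fail_fast: bool = True
+) -> Dict[str, Any]:
+    """Run strategies in order, merging each result into the shared context."""
+    for strategy in strategies:
+        try:
+            context.update(await strategy(context))
+        except Exception as exc:
+            if fail_fast:
+                raise  # remaining strategies are skipped
+            context.setdefault("errors", []).append(str(exc))
+    return context
+
+
+async def capture_screenshot(context: Dict[str, Any]) -> Dict[str, Any]:
+    # Toy data standing in for a real screenshot capture.
+    return {
+        "clean_screenshot_path": "step1.png",
+        "application_window_info": {"title": "Word"},
+    }
+
+
+async def collect_controls(context: Dict[str, Any]) -> Dict[str, Any]:
+    # Reuses the window info produced by the previous strategy.
+    window = context["application_window_info"]
+    return {"control_info": [{"id": "12", "window": window["title"]}]}
+
+
+result = asyncio.run(run_composed([capture_screenshot, collect_controls], {}))
+print(result["control_info"])
+```
+
+Because the second strategy reads `application_window_info` from the shared dictionary, it never has to query the window again, and a failure in the first strategy with `fail_fast=True` prevents it from running at all.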
+
+### Dependency Injection
+
+```python
+@depends_on("clean_screenshot_path", "application_window_info")
+@provides("control_info", "annotation_dict", "annotated_screenshot_url")
+class AppControlInfoStrategy(BaseProcessingStrategy):
+    # Automatically receives dependencies from previous strategies
+    pass
+```
+
+**Benefits**:
+
+- Type-safe dependency declaration
+- Automatic data flow between strategies
+- Easy to add new strategies without refactoring
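+
+One way such decorators could work is sketched below. This is an assumption for illustration, not the framework's implementation: the decorators here only record key names as class attributes, and `ControlInfoSketch` and `missing_dependencies` are hypothetical names.
+
+```python
+from typing import Tuple, Type
+
+
+def depends_on(*names: str):
+    """Record the context keys a strategy expects (illustrative only)."""
+    def decorator(cls: Type) -> Type:
+        cls.dependencies = tuple(names)
+        return cls
+    return decorator
+
+
+def provides(*names: str):
+    """Record the context keys a strategy produces (illustrative only)."""
+    def decorator(cls: Type) -> Type:
+        cls.provided = tuple(names)
+        return cls
+    return decorator
+
+
+@depends_on("clean_screenshot_path", "application_window_info")
+@provides("control_info", "annotation_dict", "annotated_screenshot_url")
+class ControlInfoSketch:
+    pass
+
+
+def missing_dependencies(strategy_cls: Type, context: dict) -> Tuple[str, ...]:
+    """Return declared dependencies that are absent from the context."""
+    return tuple(k for k in strategy_cls.dependencies if k not in context)
+
+
+missing = missing_dependencies(ControlInfoSketch, {"clean_screenshot_path": "step1.png"})
+print(missing)
+```
+
+A pipeline could run a check like `missing_dependencies` before executing each strategy, which is what makes a new strategy easy to slot in: it only needs to declare its inputs and outputs.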
+
+---
+
+## Related Documentation
+
+**Architecture:**
+
+- **[AppAgent Overview](overview.md)**: High-level architecture and responsibilities
+- **[State Machine](state.md)**: State machine that invokes this pipeline
+- **[Command System](commands.md)**: MCP command details
+- **[HostAgent Processing Strategy](../host_agent/strategy.md)**: Parent agent pipeline
+
+**Core Features:**
+
+- **[Hybrid Actions](../core_features/hybrid_actions.md)**: MCP command system
+- **[Control Detection](../core_features/control_detection/overview.md)**: UIA + OmniParser backends
+- **[Knowledge Substrate](../core_features/knowledge_substrate/overview.md)**: RAG system integration
+
+**Design Patterns:**
+
+- **[Processor Framework](../../infrastructure/agents/design/processor.md)**: ProcessorTemplate architecture
+- **[Strategy Pattern](../../infrastructure/agents/design/processor.md)**: BaseProcessingStrategy design
+
+---
+
+## Summary
+
+**AppAgent Processing Pipeline Key Features:**
+
+- ✅ **4-Phase Pipeline**: DATA_COLLECTION → LLM_INTERACTION → ACTION_EXECUTION → MEMORY_UPDATE
+- ✅ **Composed Strategy**: Phase 1 combines Screenshot + Control Info strategies
+- ✅ **Multi-Backend Control Detection**: UIA + OmniParser with hybrid merging
+- ✅ **Knowledge-Enhanced Prompting**: RAG integration from docs, demos, and search
+- ✅ **Retry Logic**: Automatic LLM retry with configurable attempts
+- ✅ **Memory & Blackboard**: Comprehensive execution tracking and inter-agent communication
+- ✅ **Graceful Error Handling**: fail_fast configuration per phase
+
+**Next Steps:**
+
+1. **Study Commands**: Read [Command System](commands.md) for MCP command details
+2. **Explore States**: Review [State Machine](state.md) for FSM that invokes pipeline
+3. **Learn Patterns**: Check [Processor Framework](../../infrastructure/agents/design/processor.md) for architecture details
diff --git a/documents/docs/ufo2/as_galaxy_device.md b/documents/docs/ufo2/as_galaxy_device.md
new file mode 100644
index 000000000..b5289c4dd
--- /dev/null
+++ b/documents/docs/ufo2/as_galaxy_device.md
@@ -0,0 +1,1034 @@
+# UFO² as UFO³ Galaxy Device
+
+Integrate **UFO² (Windows Desktop Automation Agent)** into the **UFO³ Galaxy framework** as a managed sub-agent device. This enables Galaxy to orchestrate complex cross-platform workflows combining Windows desktop automation with Linux server operations and other heterogeneous devices.
+
+## Overview
+
+UFO² can function as a **device agent** within the UFO³ Galaxy multi-tier orchestration framework. When configured as a Galaxy device, UFO² operates in **server-client mode**, allowing the Galaxy ConstellationAgent to:
+
+- Dispatch Windows automation subtasks to UFO² devices
+- Coordinate cross-platform workflows (Windows desktop + Linux servers)
+- Leverage UFO²'s HostAgent and AppAgent capabilities at scale
+- Manage multiple Windows devices from a unified control plane
+- Dynamically select devices based on capabilities and installed applications
+
+UFO² integration follows the **server-client architecture** pattern where the UFO² Server manages task orchestration and state machines, the UFO² Client executes Windows automation commands via MCP tools, and the Galaxy ConstellationAgent acts as the top-level orchestrator. Communication is enabled through the Agent Interaction Protocol (AIP). For detailed architecture information, see [Server-Client Architecture](../infrastructure/agents/server_client_architecture.md).
+
+## Galaxy Integration Architecture
+
+```mermaid
+graph TB
+ User[User Request]
+ Galaxy[Galaxy ConstellationAgent Top-Level Orchestrator]
+
+ subgraph "Device Pool"
+ subgraph "Windows Devices (UFO²)"
+ Win1[UFO² Device 1 Office Desktop]
+ Win2[UFO² Device 2 Dev Workstation]
+ Win3[UFO² Device 3 Test Machine]
+ end
+
+ subgraph "Linux Devices"
+ Linux1[Linux Agent 1 Web Server]
+ Linux2[Linux Agent 2 Database Server]
+ end
+
+ subgraph "Other Devices"
+ Mobile1[Mobile Device]
+ Cloud1[Cloud Service]
+ end
+ end
+
+ User -->|Complex Cross-Platform Task| Galaxy
+
+ Galaxy -->|Windows Automation Subtask| Win1
+ Galaxy -->|Desktop Application Task| Win2
+ Galaxy -->|Testing Task| Win3
+
+ Galaxy -->|Server Management Task| Linux1
+ Galaxy -->|Database Query Task| Linux2
+
+ Galaxy -->|Mobile Automation| Mobile1
+ Galaxy -->|API Integration| Cloud1
+
+ style Galaxy fill:#ffe1e1
+ style Win1 fill:#e1f5ff
+ style Win2 fill:#e1f5ff
+ style Win3 fill:#e1f5ff
+ style Linux1 fill:#f0ffe1
+ style Linux2 fill:#f0ffe1
+```
+
+**Example Multi-Device Workflow:**
+
+> **User Request:** "Generate a sales report from the database, create an Excel dashboard, and email it to the team"
+
+**Galaxy orchestrates:**
+
+1. **Linux DB Server**: Extract sales data from PostgreSQL → CSV export
+2. **UFO² Desktop**: Open Excel, import CSV, create visualizations and pivot tables
+3. **UFO² Desktop**: Open Outlook, compose email with Excel attachment
+4. **UFO² Desktop**: Send email to distribution list
+
+## Prerequisites
+
+Before configuring UFO² as a Galaxy device, ensure you have:
+
+| Component | Requirement | Verification |
+|-----------|-------------|--------------|
+| **UFO Repository** | Cloned and up-to-date | `git pull origin main` |
+| **Python** | 3.10+ installed | `python --version` |
+| **Dependencies** | All packages installed | `pip install -r requirements.txt` |
+| **LLM Configuration** | API keys configured | Check `config/ufo/agents.yaml` |
+| **Network** | Server-client connectivity | `ping <server-ip>` |
+| **Windows Machine** | UFO² will run here | Windows 10/11 |
+
+### Configure Agent Configuration
+
+**Before proceeding with Galaxy integration**, you must configure your agent settings in `config/ufo/agents.yaml`:
+
+1. Copy the template file:
+    ```powershell
+    Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml
+    ```
+
+2. Configure your LLM provider (OpenAI, Azure OpenAI, etc.) and add API keys
+
+Without proper agent configuration, UFO² cannot function as a Galaxy device. See [Agents Configuration Guide](../configuration/system/agents_config.md) for detailed setup instructions.
+
+## Server-Client Mode Setup
+
+UFO² **must** operate in **server-client mode** when integrated into Galaxy. This architecture separates orchestration (server) from execution (client), enabling Galaxy to manage multiple UFO² devices efficiently. Unlike standalone UFO² usage (local mode), Galaxy integration requires distributed server-client mode so that:
+
+- Galaxy can communicate with UFO² via the Agent Interaction Protocol (AIP)
+- Multiple UFO² clients can be managed by a single server
+- Task state is managed server-side for reliability
+- Clients remain stateless execution endpoints
+
+## Step 1: Start UFO² Server
+
+The **UFO² Server** handles task orchestration, state management, and LLM-driven decision-making. It communicates with Galaxy and dispatches commands to UFO² clients.
+
+### Basic Server Startup
+
+Launch the UFO² Server on the machine that will host it (any Windows or Linux machine):
+
+```powershell
+python -m ufo.server.app --port 5000
+```
+
+**Expected Output:**
+
+```console
+2025-11-06 10:30:22 - ufo.server.app - INFO - Starting UFO Server on 0.0.0.0:5000
+INFO: Started server process [12345]
+INFO: Waiting for application startup.
+INFO: Application startup complete.
+INFO: Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
+```
+
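+Once you see "Uvicorn running", the server is listening on all interfaces at port 5000. Clients connect to `ws://<server-ip>:5000/ws`, substituting the server machine's reachable IP address for `0.0.0.0`.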
+
+### Server Configuration Options
+
+| Argument | Default | Description | Example |
+|----------|---------|-------------|---------|
+| `--port` | `5000` | Server listening port | `--port 5000` |
+| `--host` | `0.0.0.0` | Bind address (0.0.0.0 = all interfaces) | `--host 192.168.1.100` |
+| `--log-level` | `WARNING` | Logging verbosity | `--log-level DEBUG` |
+| `--local` | `False` | Run server in local mode | `--local` |
+
+**Examples:**
+
+Specific port:
+```powershell
+python -m ufo.server.app --port 5000
+```
+
+Specific IP binding:
+```powershell
+python -m ufo.server.app --host 192.168.1.100 --port 5000
+```
+
+Debug mode:
+```powershell
+python -m ufo.server.app --port 5000 --log-level DEBUG
+```
+
+### Verify Server Health
+
+```powershell
+# Test server health endpoint
+curl http://localhost:5000/api/health
+```
+
+**Expected Response:**
+
+```json
+{
+ "status": "healthy",
+ "online_clients": []
+}
+```
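+
+The same check can be scripted, for example in a startup script that waits for the server. The sketch below uses only the Python standard library and assumes the response shape shown above; `parse_health` and `check_server` are hypothetical helper names, not part of UFO.
+
+```python
+import json
+import urllib.request
+
+
+def parse_health(body: str) -> bool:
+    """Return True when the payload reports a healthy server."""
+    payload = json.loads(body)
+    return payload.get("status") == "healthy"
+
+
+def check_server(base_url: str = "http://localhost:5000", timeout: float = 5.0) -> bool:
+    """Fetch /api/health and interpret it; False on network or parse errors."""
+    try:
+        with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
+            return parse_health(resp.read().decode("utf-8"))
+    except (OSError, ValueError):
+        # URLError and timeouts are OSError subclasses; bad JSON is a ValueError.
+        return False
+
+
+# Parsing the documented sample response, without touching the network.
+sample = '{"status": "healthy", "online_clients": []}'
+print(parse_health(sample))
+```
+
+Calling `check_server("http://192.168.1.100:5000")` from a client machine verifies both that the server is up and that the network path to it is open.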
+
+## Step 2: Start UFO² Client (Windows Machine)
+
+The **UFO² Client** runs on the Windows machine where you want to perform desktop automation. It connects to the UFO² server via WebSocket and executes automation commands through MCP tools.
+
+### Basic Client Startup
+
+Start the UFO² Client on the **Windows machine** where you want to run desktop automation and point it at the server:
+
+```powershell
+python -m ufo.client.client `
+ --ws `
+ --ws-server ws://192.168.1.100:5000/ws `
+ --client-id ufo2_desktop_1 `
+ --platform windows
+```
+
+**Note:** In PowerShell, use backtick `` ` `` for line continuation. In Command Prompt, use `^`.
+
+### Client Parameters Explained
+
+| Parameter | Required | Description | Example |
+|-----------|----------|-------------|---------|
+| `--ws` | ✅ Yes | Enable WebSocket mode | `--ws` |
+| `--ws-server` | ✅ Yes | Server WebSocket URL | `ws://192.168.1.100:5000/ws` |
+| `--client-id` | ✅ Yes | **Unique** device identifier | `ufo2_desktop_1` |
+| `--platform` | ✅ Yes | Platform type (must be `windows` for UFO²) | `--platform windows` |
+
+**Important:**
+
+- `--client-id` must be globally unique: no two devices can share the same ID
+- `--platform windows` is mandatory: without this flag, UFO² won't work correctly
+- The server address must be correct: replace `192.168.1.100:5000` with your actual server IP and port
+
+### Understanding the WebSocket URL
+
+The `--ws-server` parameter format is:
+
+```
+ws://<server-ip>:<port>/ws
+```
+
+Examples:
+
+| Scenario | WebSocket URL | Description |
+|----------|---------------|-------------|
+| **Localhost** | `ws://localhost:5000/ws` | Server and client on same machine |
+| **Same Network** | `ws://192.168.1.100:5000/ws` | Server on local network |
+| **Remote Server** | `ws://203.0.113.50:5000/ws` | Server on internet (public IP) |
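+
+A small helper can build and sanity-check the URL before it is passed to `--ws-server`. This is a sketch using the standard library; `build_ws_url` and `is_valid_ws_url` are hypothetical names, and the check encodes only the format described above (scheme `ws`, a host, a port, and the `/ws` path).
+
+```python
+from urllib.parse import urlparse
+
+
+def build_ws_url(host: str, port: int = 5000) -> str:
+    """Assemble a WebSocket URL in the ws://<server-ip>:<port>/ws format."""
+    return f"ws://{host}:{port}/ws"
+
+
+def is_valid_ws_url(url: str) -> bool:
+    """Check scheme, host, port and path against the documented format."""
+    try:
+        parts = urlparse(url)
+        port = parts.port  # raises ValueError for a non-numeric port
+    except ValueError:
+        return False
+    return (
+        parts.scheme == "ws"
+        and bool(parts.hostname)
+        and port is not None
+        and parts.path == "/ws"
+    )
+
+
+url = build_ws_url("192.168.1.100", 5000)
+print(url, is_valid_ws_url(url))
+```
+
+A check like this catches the most common mistakes early, such as pasting the `http://` health URL or omitting the `/ws` path.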
+
+### Connection Success Indicators
+
+**Client Logs:**
+
+```log
+INFO - Platform detected/specified: windows
+INFO - UFO Client initialized for platform: windows
+INFO - [WS] Connecting to ws://192.168.1.100:5000/ws (attempt 1/5)
+INFO - [WS] [AIP] Successfully registered as ufo2_desktop_1
+INFO - [WS] Heartbeat loop started (interval: 30s)
+```
+
+**Server Logs:**
+
+```log
+INFO - [WS] ✅ Registered device client: ufo2_desktop_1
+INFO - [WS] Device ufo2_desktop_1 platform: windows
+```
+
+When you see "Successfully registered", the UFO² client is connected and ready to receive tasks.
+
+### Verify Connection
+
+```powershell
+# Check connected clients on server
+curl http://192.168.1.100:5000/api/clients
+```
+
+**Expected Response:**
+
+```json
+{
+ "online_clients": [
+ {
+ "client_id": "ufo2_desktop_1",
+ "type": "device",
+ "platform": "windows",
+ "connected_at": 1730899822.0,
+ "uptime_seconds": 45
+ }
+ ]
+}
+```
+
+## Step 3: Configure MCP Services
+
+UFO² relies on **MCP (Model Context Protocol) servers** to provide Windows automation capabilities. Unlike Linux agents that may require separate HTTP MCP servers, UFO² MCP servers are primarily **local** and start automatically with the client.
+
+UFO² uses **local MCP servers** that run in-process with the client:
+
+- **UI Automation MCP**: Click, type, screenshot, control detection
+- **File Operations MCP**: Read, write, copy, delete files
+- **Application Control MCP**: Launch apps, switch windows, close processes
+
+These are **automatically initialized** when the UFO² client starts.
+
+### Default MCP Configuration
+
+By default, UFO² client automatically starts all necessary **local MCP servers**. No additional configuration is required for standard Windows automation.
+
+When you start the UFO² client, it automatically initializes UI automation tools, registers file operation handlers, configures application control interfaces, and sets up screenshot and OCR capabilities.
+
+### Optional: HTTP MCP Server (Advanced)
+
+For specialized scenarios requiring **remote MCP access** (e.g., hardware automation via external tools), you can optionally start HTTP-based MCP servers. The codebase does not include a `windows_mcp_server.py`; the available HTTP MCP servers are:
+
+- `hardware_mcp_server.py` - For hardware-level operations
+- `linux_mcp_server.py` - For Linux-specific operations
+
+Start an HTTP MCP server if needed:
+
+```powershell
+python -m ufo.client.mcp.http_servers.hardware_mcp_server
+```
+
+**Note:** For standard Galaxy integration with UFO², local MCP servers are sufficient and HTTP MCP servers are not required.
+
+## Step 4: Configure as Galaxy Device
+
+To integrate UFO² into the Galaxy framework, register it in the Galaxy device configuration file.
+
+### Device Configuration File
+
+The Galaxy device pool is configured in `config/galaxy/devices.yaml`.
+
+### Add UFO² Device Configuration
+
+Edit `config/galaxy/devices.yaml` and add your UFO² device(s) under the `devices` section:
+
+```yaml
+devices:
+  - device_id: "ufo2_desktop_1"
+    server_url: "ws://192.168.1.100:5000/ws"
+    os: "windows"
+    capabilities:
+      - "desktop_automation"
+      - "office_applications"
+      - "web_browsing"
+      - "email"
+      - "file_management"
+    metadata:
+      os: "windows"
+      version: "11"
+      performance: "high"
+      installed_apps:
+        - "Microsoft Excel"
+        - "Microsoft Word"
+        - "Microsoft PowerPoint"
+        - "Microsoft Outlook"
+        - "Google Chrome"
+        - "Adobe Acrobat"
+      description: "Primary office workstation for document automation"
+    auto_connect: true
+    max_retries: 5
+```
+
+### Configuration Fields Explained
+
+| Field | Required | Type | Description | Example |
+|-------|----------|------|-------------|---------|
+| `device_id` | ✅ Yes | string | **Must match client `--client-id`** | `"ufo2_desktop_1"` |
+| `server_url` | ✅ Yes | string | **Must match server WebSocket URL** | `"ws://192.168.1.100:5000/ws"` |
+| `os` | ✅ Yes | string | Operating system | `"windows"` |
+| `capabilities` | ❌ Optional | list | Device capabilities (for task routing) | `["desktop_automation", "office"]` |
+| `metadata` | ❌ Optional | dict | Custom metadata for task context | See below |
+| `auto_connect` | ❌ Optional | boolean | Auto-connect on Galaxy startup | `true` |
+| `max_retries` | ❌ Optional | integer | Connection retry attempts | `5` |
+
+### Capabilities-Based Task Routing
+
+Galaxy uses the `capabilities` field to intelligently route subtasks to appropriate UFO² devices. Define capabilities based on application categories (e.g., `"office_applications"`, `"web_browsing"`), task types (e.g., `"desktop_automation"`, `"data_entry"`), specific software (e.g., `"excel"`, `"outlook"`), and user workflows (e.g., `"email"`, `"reporting"`).
+
+**Example capability configurations:**
+
+**Office Workstation:**
+```yaml
+capabilities:
+ - "desktop_automation"
+ - "office_applications"
+ - "excel"
+ - "word"
+ - "powerpoint"
+ - "outlook"
+ - "email"
+ - "reporting"
+```
+
+**Web Development Machine:**
+```yaml
+capabilities:
+ - "desktop_automation"
+ - "web_browsing"
+ - "chrome"
+ - "visual_studio_code"
+ - "git"
+ - "development"
+```
+
+**Testing Workstation:**
+```yaml
+capabilities:
+ - "desktop_automation"
+ - "ui_testing"
+ - "web_browsing"
+ - "screenshot_comparison"
+ - "quality_assurance"
+```
+
+**Media Production:**
+```yaml
+capabilities:
+ - "desktop_automation"
+ - "media_editing"
+ - "photoshop"
+ - "premiere"
+ - "video_processing"
+ - "image_manipulation"
+```
+
+The `metadata` field provides **contextual information** that the LLM can use when generating automation commands.
+
+**Metadata Examples:**
+
+**Office Workstation Metadata:**
+```yaml
+metadata:
+  os: "windows"
+  version: "11"
+  performance: "high"
+  installed_apps:
+    - "Microsoft Excel"
+    - "Microsoft Word"
+    - "Microsoft Outlook"
+    - "Adobe Acrobat Reader"
+  default_paths:
+    documents: "C:\\Users\\user\\Documents"
+    downloads: "C:\\Users\\user\\Downloads"
+    desktop: "C:\\Users\\user\\Desktop"
+  email_account: "user@company.com"
+  description: "Primary office workstation"
+```
+
+**Development Workstation Metadata:**
+```yaml
+metadata:
+  os: "windows"
+  version: "11"
+  performance: "high"
+  installed_apps:
+    - "Visual Studio Code"
+    - "Google Chrome"
+    - "Git"
+    - "Node.js"
+    - "Python"
+  default_paths:
+    projects: "C:\\Users\\dev\\Projects"
+    repos: "C:\\Users\\dev\\Repos"
+  git_username: "developer"
+  description: "Development environment"
+```
+
+**Testing Workstation Metadata:**
+```yaml
+metadata:
+  os: "windows"
+  version: "10"
+  performance: "medium"
+  installed_apps:
+    - "Google Chrome"
+    - "Microsoft Edge"
+    - "Firefox"
+    - "Selenium"
+  test_data_path: "C:\\TestData"
+  screenshot_path: "C:\\Screenshots"
+  description: "Automated testing environment"
+```
+
+**How Metadata is Used:**
+
+The LLM receives metadata in the system prompt, enabling context-aware automation:
+
+```
+System Context:
+- Device: ufo2_desktop_1
+- OS: Windows 11
+- Installed Apps: Microsoft Excel, Microsoft Word, Microsoft Outlook
+- Documents Path: C:\Users\user\Documents
+
+User Request: "Create a new Excel spreadsheet and save it as Q4_Report.xlsx"
+
+UFO² Output:
+1. Launch Microsoft Excel
+2. Create new workbook
+3. Save as C:\Users\user\Documents\Q4_Report.xlsx
+```
+
+## Step 5: Multiple UFO² Devices Configuration
+
+Galaxy can manage **multiple UFO² devices** simultaneously, enabling parallel Windows automation across different machines.
+
+**Multi-Device Galaxy Configuration Example:**
+
+```yaml
+devices:
+  # UFO² Office Desktop 1
+  - device_id: "ufo2_office_1"
+    server_url: "ws://192.168.1.100:5000/ws"
+    os: "windows"
+    capabilities:
+      - "desktop_automation"
+      - "office_applications"
+      - "excel"
+      - "word"
+      - "outlook"
+      - "email"
+    metadata:
+      os: "windows"
+      version: "11"
+      installed_apps: ["Microsoft Excel", "Microsoft Word", "Microsoft Outlook"]
+      description: "Primary office desktop"
+    auto_connect: true
+    max_retries: 5
+
+  # UFO² Office Desktop 2
+  - device_id: "ufo2_office_2"
+    server_url: "ws://192.168.1.101:5001/ws"
+    os: "windows"
+    capabilities:
+      - "desktop_automation"
+      - "office_applications"
+      - "excel"
+      - "powerpoint"
+      - "web_browsing"
+    metadata:
+      os: "windows"
+      version: "11"
+      installed_apps: ["Microsoft Excel", "Microsoft PowerPoint", "Google Chrome"]
+      description: "Secondary office desktop"
+    auto_connect: true
+    max_retries: 5
+
+  # UFO² Development Workstation
+  - device_id: "ufo2_dev_1"
+    server_url: "ws://192.168.1.102:5002/ws"
+    os: "windows"
+    capabilities:
+      - "desktop_automation"
+      - "development"
+      - "web_browsing"
+      - "code_editing"
+    metadata:
+      os: "windows"
+      version: "11"
+      installed_apps: ["Visual Studio Code", "Google Chrome", "Git"]
+      description: "Development workstation"
+    auto_connect: true
+    max_retries: 5
+
+  # Linux Database Server (for cross-platform workflows)
+  - device_id: "linux_db_server"
+    server_url: "ws://192.168.1.200:5010/ws"
+    os: "linux"
+    capabilities:
+      - "database_server"
+      - "postgresql"
+      - "data_export"
+    metadata:
+      os: "linux"
+      logs_file_path: "/var/log/postgresql/postgresql.log"
+      description: "Production database server"
+    auto_connect: true
+    max_retries: 5
+```
+
+## Step 6: Launch Galaxy with UFO² Devices
+
+Once all components are configured, launch Galaxy to begin orchestrating multi-device workflows.
+
+### Prerequisites Checklist
+
+Ensure all components are running **before** starting Galaxy:
+
+1. ✅ **UFO² Server(s)** running on configured ports
+2. ✅ **UFO² Client(s)** connected to their respective servers
+3. ✅ **MCP Services** initialized (automatic with UFO² client)
+4. ✅ **LLM configured** in `config/ufo/agents.yaml`
+5. ✅ **Network connectivity** between all components
+
+### Launch Sequence
+
+**Step 1: Start all UFO² Servers**
+
+```powershell
+# On first Windows machine (192.168.1.100)
+python -m ufo.server.app --port 5000
+
+# On second Windows machine (192.168.1.101)
+python -m ufo.server.app --port 5001
+
+# On third Windows machine (192.168.1.102)
+python -m ufo.server.app --port 5002
+```
+
+**Step 2: Start all UFO² Clients**
+
+```powershell
+# On first Windows desktop
+python -m ufo.client.client `
+ --ws `
+ --ws-server ws://192.168.1.100:5000/ws `
+ --client-id ufo2_office_1 `
+ --platform windows
+
+# On second Windows desktop
+python -m ufo.client.client `
+ --ws `
+ --ws-server ws://192.168.1.101:5001/ws `
+ --client-id ufo2_office_2 `
+ --platform windows
+
+# On development workstation
+python -m ufo.client.client `
+ --ws `
+ --ws-server ws://192.168.1.102:5002/ws `
+ --client-id ufo2_dev_1 `
+ --platform windows
+```
+
+**Step 3: Launch Galaxy**
+
+```powershell
+# On your control machine (interactive mode)
+python -m galaxy --interactive
+```
+
+**Or launch with a specific request:**
+
+```powershell
+python -m galaxy "Your task description here"
+```
+
+Galaxy will automatically connect to all configured UFO² devices (based on `config/galaxy/devices.yaml`) and display the orchestration interface.
+
+## Example Multi-Device Workflows
+
+### Workflow 1: Cross-Platform Report Generation
+
+**User Request:**
+> "Generate a weekly sales report: extract data from PostgreSQL, create Excel dashboard, and email to management"
+
+**Galaxy Orchestration:**
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant Galaxy
+ participant LinuxDB as Linux DB Server
+ participant UFO2 as UFO² Desktop
+
+ User->>Galaxy: Request sales report
+ Galaxy->>Galaxy: Decompose task
+
+ Note over Galaxy,LinuxDB: Subtask 1: Extract data
+ Galaxy->>LinuxDB: "Export sales data from PostgreSQL to CSV"
+ LinuxDB->>LinuxDB: Execute SQL query
+ LinuxDB->>LinuxDB: Generate CSV file
+ LinuxDB-->>Galaxy: CSV file location
+
+ Note over Galaxy,UFO2: Subtask 2: Create Excel report
+ Galaxy->>UFO2: "Create Excel dashboard from CSV"
+ UFO2->>UFO2: Open Excel
+ UFO2->>UFO2: Import CSV data
+ UFO2->>UFO2: Create pivot tables
+ UFO2->>UFO2: Add charts and formatting
+ UFO2-->>Galaxy: Excel file created
+
+ Note over Galaxy,UFO2: Subtask 3: Send email
+ Galaxy->>UFO2: "Email report to management"
+ UFO2->>UFO2: Open Outlook
+ UFO2->>UFO2: Compose email with attachment
+ UFO2->>UFO2: Send email
+ UFO2-->>Galaxy: Email sent
+
+ Galaxy-->>User: Task completed
+```
+
+### Workflow 2: Parallel Document Processing
+
+**User Request:**
+> "Process all invoices in the shared folder: convert PDFs to Excel, categorize by vendor, and summarize totals"
+
+**Galaxy Orchestration:**
+
+1. **UFO² Desktop 1**: Process invoices A-M (parallel batch 1)
+2. **UFO² Desktop 2**: Process invoices N-Z (parallel batch 2)
+3. **UFO² Desktop 1**: Consolidate results into master Excel file
+4. **UFO² Desktop 1**: Generate summary report
+5. **UFO² Desktop 1**: Send notification email
+
+### Workflow 3: Development Workflow Automation
+
+**User Request:**
+> "Pull latest code, run tests, and create deployment package"
+
+**Galaxy Orchestration:**
+
+1. **UFO² Dev Workstation**: Open VS Code, pull from Git repository
+2. **UFO² Dev Workstation**: Run automated tests, capture results
+3. **Linux Build Server**: Build deployment package
+4. **UFO² Dev Workstation**: Open browser, upload to staging server
+5. **UFO² Desktop**: Send deployment notification email
+
+---
+
+## Task Assignment Behavior
+
+### How Galaxy Routes Tasks to UFO² Devices
+
+Galaxy's ConstellationAgent uses several factors to select the appropriate UFO² device for each subtask:
+
+| Factor | Description | Example |
+|--------|-------------|---------|
+| **Capabilities** | Match subtask requirements to device capabilities | `"excel"` → Office workstation |
+| **OS Requirement** | Platform-specific tasks routed to correct OS | Windows automation → UFO² devices |
+| **Metadata Context** | Use device-specific apps and configurations | Email task → device with Outlook |
+| **Device Status** | Only assign to online, healthy devices | Skip offline or failing devices |
+| **Load Balancing** | Distribute tasks across similar devices | Round-robin across office desktops |
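+
+To make the first, second and fourth factors concrete, the sketch below filters a device list by online status, OS, and required capabilities. It is a deliberately simple, deterministic illustration and an assumption on our part: how the ConstellationAgent actually weighs these factors is not specified here, and `pick_device` is a hypothetical function.
+
+```python
+from typing import Dict, List, Optional
+
+
+def pick_device(
+    devices: List[Dict], required_os: str, needed: List[str], online: List[str]
+) -> Optional[str]:
+    """Return the first online device whose OS and capabilities match."""
+    for device in devices:
+        if device["device_id"] not in online:
+            continue  # Device Status: skip offline devices
+        if device["os"] != required_os:
+            continue  # OS Requirement
+        if set(needed) <= set(device.get("capabilities", [])):
+            return device["device_id"]  # Capabilities: all requirements met
+    return None
+
+
+devices = [
+    {"device_id": "ufo2_office_1", "os": "windows",
+     "capabilities": ["excel", "outlook", "email"]},
+    {"device_id": "ufo2_office_2", "os": "windows",
+     "capabilities": ["excel", "powerpoint"]},
+    {"device_id": "linux_db_server", "os": "linux",
+     "capabilities": ["database_server", "postgresql"]},
+]
+online = ["ufo2_office_2", "linux_db_server"]
+
+choice = pick_device(devices, "windows", ["excel"], online)
+print(choice)
+```
+
+Here `ufo2_office_1` also offers `"excel"` but is offline, so the Excel subtask falls to `ufo2_office_2`; a request for `"outlook"` would find no eligible device.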
+
+### Example Task Decomposition
+
+**User Request:**
+> "Prepare quarterly financial reports and distribute to stakeholders"
+
+**Galaxy Decomposition:**
+
+```yaml
+Task 1:
+ Description: "Extract financial data from database"
+ Target: linux_db_server
+ Reason: Has "database_server" capability
+
+Task 2:
+ Description: "Create Excel financial dashboard"
+ Target: ufo2_office_1
+ Reason: Has "excel" capability, device is idle
+
+Task 3:
+ Description: "Generate PowerPoint presentation"
+ Target: ufo2_office_2
+ Reason: Has "powerpoint" capability
+
+Task 4:
+ Description: "Email reports to stakeholders"
+ Target: ufo2_office_1
+ Reason: Has "outlook" and "email" capabilities
+```
+
+## Critical Configuration Requirements
+
+!!!danger "Configuration Validation Checklist"
+    Ensure these match **exactly** or Galaxy cannot control the UFO² device:
+
+    **Device ID Match:**
+
+    - In `devices.yaml`: `device_id: "ufo2_desktop_1"`
+    - In client command: `--client-id ufo2_desktop_1`
+
+    **Server URL Match:**
+
+    - In `devices.yaml`: `server_url: "ws://192.168.1.100:5000/ws"`
+    - In client command: `--ws-server ws://192.168.1.100:5000/ws`
+
+    **Platform Specification:**
+
+    - Must include `--platform windows` for UFO² devices
+
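+These checks are easy to automate before launching Galaxy. The sketch below validates one device entry, given as a plain dictionary, against the values passed on the client command line. `check_device` and `REQUIRED_FIELDS` are hypothetical names; in practice the entry could be loaded from `config/galaxy/devices.yaml` with a YAML parser such as PyYAML's `yaml.safe_load`.
+
+```python
+from typing import Dict, List
+
+# Fields marked as required in the configuration table.
+REQUIRED_FIELDS = ("device_id", "server_url", "os")
+
+
+def check_device(entry: Dict, client_id: str, ws_server: str) -> List[str]:
+    """Return a list of problems; an empty list means the entry is consistent."""
+    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
+    if entry.get("device_id") != client_id:
+        problems.append("device_id does not match --client-id")
+    if entry.get("server_url") != ws_server:
+        problems.append("server_url does not match --ws-server")
+    return problems
+
+
+entry = {
+    "device_id": "ufo2_desktop_1",
+    "server_url": "ws://192.168.1.100:5000/ws",
+    "os": "windows",
+}
+
+ok = check_device(entry, "ufo2_desktop_1", "ws://192.168.1.100:5000/ws")
+bad = check_device(entry, "ufo2_desktop_2", "ws://192.168.1.100:5000/ws")
+print(ok, bad)
+```
+
+The comparison is an exact string match on purpose: a trailing slash or a different port in either place is enough for Galaxy to miss the device.
+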
+## Monitoring & Debugging
+
+### Verify Device Registration
+
+Check if clients are connected to UFO² server:
+
+```powershell
+curl http://192.168.1.100:5000/api/clients
+```
+
+**Expected response:**
+
+```json
+{
+ "online_clients": [
+ {
+ "client_id": "ufo2_office_1",
+ "type": "device",
+ "platform": "windows",
+ "connected_at": 1730899822.0,
+ "uptime_seconds": 45
+ },
+ {
+ "client_id": "ufo2_office_2",
+ "type": "device",
+ "platform": "windows",
+ "connected_at": 1730899850.0,
+ "uptime_seconds": 17
+ }
+ ]
+}
+```
+
+### View Task Assignments
+
+Galaxy logs show task routing decisions:
+
+```log
+INFO - [Galaxy] Task decomposition: 3 subtasks created
+INFO - [Galaxy] Subtask 1 → linux_db_server (capability match: database_server)
+INFO - [Galaxy] Subtask 2 → ufo2_office_1 (capability match: excel)
+INFO - [Galaxy] Subtask 3 → ufo2_office_1 (capability match: email)
+```
+
+### Troubleshooting Device Connection
+
+**Issue**: UFO² device not appearing in Galaxy device pool
+
+**Diagnosis:**
+
+1. Check if client is connected to server:
+    ```powershell
+    curl http://192.168.1.100:5000/api/clients
+    ```
+
+2. Verify `devices.yaml` configuration matches client parameters
+
+3. Check Galaxy logs for connection errors
+
+4. Ensure `auto_connect: true` in `devices.yaml`
+
+5. Verify UFO² server is running and accessible
+
+## Common Issues & Troubleshooting
+
+### Issue 1: UFO² Client Cannot Connect to Server
+
+!!!bug "Error: Connection Refused"
+    **Symptoms:**
+    ```log
+    ERROR - [WS] Failed to connect to ws://192.168.1.100:5000/ws
+    Connection refused
+    ```
+
+    **Diagnosis Checklist:**
+
+    - [ ] Is the UFO² server running? (`curl http://192.168.1.100:5000/api/health`)
+    - [ ] Is the port correct? (Check server startup logs)
+    - [ ] Can the client reach the server IP? (`ping 192.168.1.100`)
+    - [ ] Is Windows Firewall blocking port 5000?
+    - [ ] Is the WebSocket URL correct? (should start with `ws://`)
+
+    **Solutions:**
+
+    **Verify Server:**
+    ```powershell
+    # On server machine
+    curl http://localhost:5000/api/health
+
+    # From client machine
+    curl http://192.168.1.100:5000/api/health
+    ```
+
+    **Check Network:**
+    ```powershell
+    # Test connectivity
+    ping 192.168.1.100
+
+    # Test port accessibility (no telnet client needed)
+    Test-NetConnection -ComputerName 192.168.1.100 -Port 5000
+    ```
+
+    **Check Windows Firewall:**
+    ```powershell
+    # Allow port through firewall
+    New-NetFirewallRule -DisplayName "UFO Server" `
+        -Direction Inbound `
+        -LocalPort 5000 `
+        -Protocol TCP `
+        -Action Allow
+    ```
+
+### Issue 2: Missing `--platform windows` Flag
+
+!!!bug "Error: Incorrect Agent Type"
+    **Symptoms:**
+
+    - Client connects but cannot execute Windows automation
+    - Server logs show wrong platform type
+    - Tasks fail with "unsupported operation" errors
+
+    **Cause:**
+    The `--platform windows` flag was omitted when starting the client.
+
+    **Solution:**
+    ```powershell
+    # Wrong (missing platform)
+    python -m ufo.client.client --ws --ws-server ws://192.168.1.100:5000/ws --client-id ufo2_desktop_1
+
+    # Correct
+    python -m ufo.client.client `
+        --ws `
+        --ws-server ws://192.168.1.100:5000/ws `
+        --client-id ufo2_desktop_1 `
+        --platform windows
+    ```
+
+### Issue 3: Duplicate Client ID
+
+!!!bug "Error: Registration Failed"
+    **Symptoms:**
+    ```log
+    ERROR - [WS] Registration failed: client_id already exists
+    ERROR - Another device is using ID 'ufo2_desktop_1'
+    ```
+
+    **Cause:**
+    Multiple UFO² clients are trying to use the same `client_id`.
+
+    **Solutions:**
+
+    1. **Use unique client IDs:**
+        ```powershell
+        # Device 1
+        --client-id ufo2_desktop_1
+
+        # Device 2
+        --client-id ufo2_desktop_2
+
+        # Device 3
+        --client-id ufo2_dev_1
+        ```
+
+    2. **Check currently connected clients:**
+        ```powershell
+        curl http://192.168.1.100:5000/api/clients
+        ```
+
+### Issue 4: Galaxy Cannot Find UFO² Device
+
+!!!bug "Error: Device Not Configured"
+    **Symptoms:**
+    ```log
+    ERROR - Device 'ufo2_desktop_1' not found in configuration
+    WARNING - Cannot dispatch task to unknown device
+    ```
+
+    **Cause:**
+    Mismatch between `devices.yaml` configuration and actual client setup.
+
+    **Diagnosis:**
+
+    Check that these match **exactly**:
+
+    | Location | Field | Example |
+    |----------|-------|---------|
+    | `devices.yaml` | `device_id` | `"ufo2_desktop_1"` |
+    | Client command | `--client-id` | `ufo2_desktop_1` |
+    | `devices.yaml` | `server_url` | `"ws://192.168.1.100:5000/ws"` |
+    | Client command | `--ws-server` | `ws://192.168.1.100:5000/ws` |
+
+    **Solution:**
+
+    Update `devices.yaml` to match your client configuration, or vice versa.
+
+### Issue 5: MCP Tools Not Available
+
+!!!bug "Error: Tool Execution Failed"
+    **Symptoms:**
+    ```log
+    ERROR - MCP tool 'click' not found
+    ERROR - Cannot execute Windows automation command
+    ```
+
+    **Diagnosis:**
+
+    - [ ] Is UFO² client running properly?
+    - [ ] Are local MCP servers initialized?
+    - [ ] Check client startup logs for MCP initialization errors
+
+    **Solution:**
+
+    Restart UFO² client and verify MCP initialization:
+
+    ```powershell
+    python -m ufo.client.client `
+        --ws `
+        --ws-server ws://192.168.1.100:5000/ws `
+        --client-id ufo2_desktop_1 `
+        --platform windows
+    ```
+
+    Look for:
+    ```log
+    INFO - MCP servers initialized: ui_automation, file_operations, app_control
+    INFO - UFO Client ready with 15 available tools
+    ```
+
+---
+
+## Comparison with Standalone UFO²
+
+| Aspect | Standalone UFO² | UFO² as Galaxy Device |
+|--------|----------------|----------------------|
+| **Architecture** | Local mode (single process) | Server-client mode (distributed) |
+| **Control** | Direct user interaction | Galaxy orchestration |
+| **Multi-Device** | Single device only | Multiple UFO² devices |
+| **Cross-Platform** | Windows only | Windows + Linux + others |
+| **Task Distribution** | Manual | Automatic (capabilities-based) |
+| **Scalability** | Limited to one machine | Scales to device pool |
+| **Use Case** | Individual automation tasks | Enterprise multi-tier workflows |
+| **Configuration** | Simple (no server/client setup) | Requires server-client + Galaxy config |
+
+**When to use Standalone UFO²:**
+
+- Simple, single-device Windows automation
+- Development and testing
+- Personal productivity tasks
+- No need for cross-platform workflows
+
+**When to use UFO² as Galaxy Device:**
+
+- Enterprise-scale automation
+- Multi-device orchestration
+- Cross-platform workflows (Windows + Linux)
+- Centralized management and monitoring
+- Parallel task execution across multiple machines
+
+## Related Documentation
+
+- **[UFO² Overview](overview.md)** - Architecture and core concepts
+- **[HostAgent](host_agent/overview.md)** - Desktop-level automation
+- **[AppAgent](app_agent/overview.md)** - Application-specific automation
+- **[Galaxy Overview](../galaxy/overview.md)** - Multi-tier orchestration framework
+- **[Server-Client Architecture](../infrastructure/agents/server_client_architecture.md)** - Distributed agent design
+- **[Linux as Galaxy Device](../linux/as_galaxy_device.md)** - Linux agent integration (similar pattern)
+- **[Quick Start Linux](../getting_started/quick_start_linux.md)** - Similar server-client setup for Linux
+
+## Summary
+
+Integrating UFO² into UFO³ Galaxy enables:
+
+- **Multi-tier orchestration** - Galaxy coordinates UFO² + Linux + other devices
+- **Cross-platform workflows** - Seamlessly combine Windows desktop + Linux servers
+- **Capability-based routing** - Intelligent task assignment to appropriate devices
+- **Scalable automation** - Manage multiple UFO² devices from unified control plane
+- **Enterprise-ready** - Centralized monitoring, fault isolation, load balancing
+- **Server-client architecture** - Separation of orchestration and execution
+- **Local MCP services** - Automatic initialization, no manual setup required
+
+**Next Steps:**
+
+1. Start with a single UFO² device to verify the setup
+2. Add more UFO² devices as needed for parallel execution
+3. Integrate Linux agents for cross-platform workflows
+4. Define custom capabilities for your specific use cases
+5. Monitor Galaxy logs to understand task routing decisions
diff --git a/documents/docs/ufo2/core_features/control_detection/hybrid_detection.md b/documents/docs/ufo2/core_features/control_detection/hybrid_detection.md
new file mode 100644
index 000000000..67a9ea47c
--- /dev/null
+++ b/documents/docs/ufo2/core_features/control_detection/hybrid_detection.md
@@ -0,0 +1,94 @@
+# Hybrid Control Detection
+
+Hybrid control detection combines both UIA and OmniParser to provide comprehensive UI coverage. It merges standard Windows controls detected via UIA with visual elements detected through OmniParser, removing duplicates based on Intersection over Union (IoU) overlap.
+
+
+
+## How It Works
+
+The hybrid detection process follows these steps:
+
+```mermaid
+graph LR
+    A[Screenshot] --> B[UIA Detection]
+    A --> C[OmniParser Detection]
+    B --> D["UIA Controls<br/>Standard UI Elements"]
+    C --> E["Visual Controls<br/>Icons, Images, Custom UI"]
+    D --> F["Merge & Deduplicate<br/>IoU Threshold: 0.1"]
+    E --> F
+    F --> G["Final Control List<br/>Annotated [1] to [N]"]
+
+ style D fill:#e3f2fd
+ style E fill:#fff3e0
+ style F fill:#e8f5e9
+ style G fill:#f3e5f5
+```
+
+**Deduplication Algorithm:**
+
+1. Keep all UIA-detected controls (main list)
+2. For each OmniParser-detected control (additional list):
+    - Calculate IoU with all UIA controls
+    - If IoU > threshold (default 0.1), discard as duplicate
+    - Otherwise, add to merged list
+3. Result: Maximum coverage with minimal duplicates
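+
+The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind `PhotographerFacade.merge_target_info_list()`: the `Box` tuple layout `(left, top, right, bottom)` and the helper names `iou` and `merge_controls` are assumptions made for the example.
+
+```python
+from typing import List, Tuple
+
+# Hypothetical box layout for this sketch: (left, top, right, bottom) in pixels.
+Box = Tuple[float, float, float, float]
+
+
+def iou(a: Box, b: Box) -> float:
+    """Intersection over Union of two axis-aligned boxes."""
+    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
+    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
+    inter = inter_w * inter_h
+    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
+    return inter / union if union > 0 else 0.0
+
+
+def merge_controls(main: List[Box], additional: List[Box], threshold: float = 0.1) -> List[Box]:
+    """Keep every box in `main`; add a box from `additional` only if its IoU
+    with every main box is at or below `threshold`."""
+    merged = list(main)
+    for candidate in additional:
+        if all(iou(candidate, kept) <= threshold for kept in main):
+            merged.append(candidate)
+    return merged
+
+
+uia_boxes = [(0, 0, 100, 40)]
+vision_boxes = [(5, 2, 98, 38), (200, 0, 240, 40)]  # first overlaps the UIA box, second is new
+print(merge_controls(uia_boxes, vision_boxes))
+```
+
+Here the first vision box overlaps the UIA control heavily (IoU of about 0.84) and is dropped, while the second has no overlap and is appended.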
+
+## Benefits
+
+- **Maximum Coverage**: Detects both standard and custom UI elements
+- **No Gaps**: Visual detection fills in UIA blind spots
+- **Efficiency**: Deduplication prevents redundant annotations
+- **Flexibility**: Works across diverse application types
+
+## Configuration
+
+### Prerequisites
+
+Before enabling hybrid detection, you must deploy and configure OmniParser. See [Visual Detection - Deployment](./visual_detection.md#deployment) for instructions.
+
+### Enable Hybrid Mode
+
+Configure both backends in `config/ufo/system.yaml`:
+
+```yaml
+# Enable hybrid detection
+CONTROL_BACKEND: ["uia", "omniparser"]
+
+# IoU threshold for merging (controls with IoU > threshold are considered duplicates)
+IOU_THRESHOLD_FOR_MERGE: 0.1 # Default: 0.1
+
+# OmniParser configuration
+OMNIPARSER:
+  ENDPOINT: ""
+  BOX_THRESHOLD: 0.05
+  IOU_THRESHOLD: 0.1
+  USE_PADDLEOCR: True
+  IMGSZ: 640
+```
+
+### Configuration Options
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `CONTROL_BACKEND` | List[str] | `["uia"]` | List of detection backends to use |
+| `IOU_THRESHOLD_FOR_MERGE` | float | `0.1` | IoU threshold for duplicate detection (0.0-1.0) |
+
+**Tuning Guidelines:**
+
+- **Lower threshold (< 0.1)**: More aggressive deduplication, may miss some controls
+- **Higher threshold (> 0.1)**: Keep more overlapping controls, may have duplicates
+- **Recommended**: Keep default 0.1 for optimal balance
+
+See [System Configuration](../../../configuration/system/system_config.md#control-backend) for complete configuration details.
+
+## Implementation
+
+The hybrid detection is implemented through:
+
+- **`AppControlInfoStrategy`**: Orchestrates control collection from multiple backends
+- **`PhotographerFacade.merge_target_info_list()`**: Performs IoU-based deduplication
+- **`OmniparserGrounding`**: Handles visual detection and parsing
+
+## Reference
+
+:::automator.ui_control.grounding.omniparser.OmniparserGrounding
\ No newline at end of file
diff --git a/documents/docs/ufo2/core_features/control_detection/overview.md b/documents/docs/ufo2/core_features/control_detection/overview.md
new file mode 100644
index 000000000..11be9056a
--- /dev/null
+++ b/documents/docs/ufo2/core_features/control_detection/overview.md
@@ -0,0 +1,29 @@
+# Control Detection
+
+UFO² supports several control detection methods, covering both standard controls (via UIA) and custom controls (via visual detection).
+
+## Detection Methods
+
+| Method | Description | Use Case |
+|--------|-------------|----------|
+| [**UIA**](./uia_detection.md) | Uses Windows UI Automation framework to detect standard controls. Provides APIs to access and manipulate UI elements in Windows applications. | Standard Windows applications with native controls |
+| [**Visual (OmniParser)**](./visual_detection.md) | Uses OmniParser vision-based detection to identify custom controls through computer vision techniques based on visual appearance. | Applications with custom controls, icons, or visual elements not accessible via UIA |
+| [**Hybrid**](./hybrid_detection.md) | Combines both UIA and OmniParser detection methods. Merges results from both approaches, removing duplicates based on IoU overlap. | Maximum coverage for applications with both standard and custom controls |
+
+## Configuration
+
+Configure the control detection method by setting the `CONTROL_BACKEND` parameter in `config/ufo/system.yaml`:
+
+```yaml
+# Use UIA only (default, recommended)
+CONTROL_BACKEND: ["uia"]
+
+# Use OmniParser only
+CONTROL_BACKEND: ["omniparser"]
+
+# Use hybrid mode (UIA + OmniParser)
+CONTROL_BACKEND: ["uia", "omniparser"]
+```
+
+See [System Configuration](../../../configuration/system/system_config.md#control-backend) for detailed configuration options.
+
diff --git a/documents/docs/ufo2/core_features/control_detection/uia_detection.md b/documents/docs/ufo2/core_features/control_detection/uia_detection.md
new file mode 100644
index 000000000..6df4737e2
--- /dev/null
+++ b/documents/docs/ufo2/core_features/control_detection/uia_detection.md
@@ -0,0 +1,38 @@
+# UIA Control Detection
+
+UIA control detection uses the Windows UI Automation (UIA) framework to detect and interact with standard controls in Windows applications. It provides a robust set of APIs to access and manipulate UI elements programmatically.
+
+## Features
+
+- **Fast and Reliable**: Native Windows API with optimal performance
+- **Standard Controls**: Works with most Windows applications using standard controls
+- **Rich Metadata**: Provides detailed control information (type, name, position, state, etc.)
+
+## Limitations
+
+UIA control detection may not detect non-standard controls, custom-rendered UI elements, or visual components that don't expose UIA interfaces (e.g., canvas-based controls, game UIs, some web content).
+
+## Configuration
+
+UIA is the default control detection backend. Configure it in `config/ufo/system.yaml`:
+
+```yaml
+CONTROL_BACKEND: ["uia"]
+```
+
+For applications with custom controls, consider using [hybrid detection](./hybrid_detection.md), which combines UIA with visual detection.
+
+## Implementation
+
+UFO² uses the `ControlInspectorFacade` class to interact with the UIA framework. The facade pattern provides a simplified interface to:
+
+- Enumerate desktop windows
+- Find control elements in window hierarchies
+- Filter controls by type, visibility, and state
+- Extract control metadata and positions
+
+See [System Configuration](../../../configuration/system/system_config.md#control-backend) for additional options.
+
+## Reference
+
+:::automator.ui_control.inspector.ControlInspectorFacade
\ No newline at end of file
diff --git a/documents/docs/ufo2/core_features/control_detection/visual_detection.md b/documents/docs/ufo2/core_features/control_detection/visual_detection.md
new file mode 100644
index 000000000..e84b6fba6
--- /dev/null
+++ b/documents/docs/ufo2/core_features/control_detection/visual_detection.md
@@ -0,0 +1,69 @@
+# Visual Control Detection (OmniParser)
+
+Visual control detection uses [OmniParser-v2](https://github.com/microsoft/OmniParser), a vision-based grounding model that detects UI elements through computer vision. This method is particularly effective for custom controls, icons, images, and visual elements that may not be accessible through standard UIA.
+
+## Use Cases
+
+- **Custom Controls**: Detects proprietary or non-standard UI elements
+- **Visual Elements**: Icons, images, and graphics-based controls
+- **Web Content**: Elements within browser windows or web views
+- **Canvas-based UIs**: Applications that render custom graphics
+
+## Deployment
+
+### 1. Clone the OmniParser Repository
+
+On your remote GPU server:
+
+```bash
+git clone https://github.com/microsoft/OmniParser.git
+cd OmniParser/omnitool/omniparserserver
+```
+
+### 2. Start the OmniParser Service
+
+```bash
+python gradio_demo.py
+```
+
+This will generate output similar to:
+
+```
+* Running on local URL: http://0.0.0.0:7861
+* Running on public URL: https://xxxxxxxxxxxxxxxxxx.gradio.live
+```
+
+For detailed deployment instructions, refer to the [OmniParser README](https://github.com/microsoft/OmniParser/tree/master/omnitool).
+
+## Configuration
+
+### OmniParser Settings
+
+Configure the OmniParser endpoint and parameters in `config/ufo/system.yaml`:
+
+```yaml
+OMNIPARSER:
+  ENDPOINT: ""           # The endpoint URL from deployment
+  BOX_THRESHOLD: 0.05    # Bounding box confidence threshold
+  IOU_THRESHOLD: 0.1     # IoU threshold for non-max suppression
+  USE_PADDLEOCR: True    # Enable OCR for text detection
+  IMGSZ: 640             # Input image size for the model
+```
+
+### Enable Visual Detection
+
+Set `CONTROL_BACKEND` to use OmniParser:
+
+```yaml
+# Use OmniParser only
+CONTROL_BACKEND: ["omniparser"]
+
+# Or use hybrid mode (recommended for maximum coverage)
+CONTROL_BACKEND: ["uia", "omniparser"]
+```
+
+See [Hybrid Detection](./hybrid_detection.md) for combining UIA and OmniParser, or [System Configuration](../../../configuration/system/system_config.md#control-backend) for detailed options.
+
+## Reference
+
+:::automator.ui_control.grounding.omniparser.OmniparserGrounding
\ No newline at end of file
diff --git a/documents/docs/ufo2/core_features/hybrid_actions.md b/documents/docs/ufo2/core_features/hybrid_actions.md
new file mode 100644
index 000000000..b916f4c5e
--- /dev/null
+++ b/documents/docs/ufo2/core_features/hybrid_actions.md
@@ -0,0 +1,370 @@
+# Hybrid GUI–API Action Layer
+
+UFO² introduces a **hybrid action layer** that seamlessly combines traditional GUI automation with native application APIs, enabling agents to dynamically select the optimal execution method for each task. This design bridges the gap between universal GUI availability and high-fidelity API control, achieving both robustness and efficiency.
+
+## The Two-Interface Problem
+
+Application environments typically expose two complementary classes of interfaces, each with distinct trade-offs:
+
+### GUI Frontends (Traditional Approach)
+
+**Characteristics:**
+
+- ✅ **Universally Available**: Works with any application, even without API documentation
+- ✅ **Visual Compatibility**: Follows the actual UI layout users see
+- ✅ **No Integration Required**: Works out-of-the-box with UI Automation
+
+**Limitations:**
+
+- ❌ **Brittle to UI Changes**: Layout modifications break automation
+- ❌ **Slow Execution**: Requires screenshot capture, OCR, and simulated input
+- ❌ **Limited Precision**: Pixel-based targeting is prone to errors
+- ❌ **High Cognitive Load**: LLMs must interpret visual information at each step
+
+### Native APIs (Preferred Approach)
+
+**Characteristics:**
+
+- ✅ **High-Fidelity Control**: Direct manipulation of application state
+- ✅ **Fast Execution**: No screenshot analysis or UI rendering delays
+- ✅ **Precise Operations**: Programmatic access to exact data structures
+- ✅ **Robust to UI Changes**: API contracts remain stable across versions
+
+**Limitations:**
+
+- ❌ **Requires Explicit Integration**: Must implement API wrappers for each app
+- ❌ **Limited Availability**: Not all applications expose comprehensive APIs
+- ❌ **Maintenance Overhead**: API changes require code updates
+- ❌ **Documentation Dependency**: Requires accurate API references
+
+!!! info "Research Finding"
+    Studies show that **API-based agents outperform GUI-only agents** by 15–30% on tasks where APIs are available, but **GUI fallback is essential** for broad application coverage and handling edge cases where APIs are insufficient.
+
+    📄 Reference: [API Agents vs. GUI Agents](https://arxiv.org/abs/2501.05446)
+
+## UFO²'s Hybrid Solution
+
+UFO² addresses this dilemma through a **unified action layer** that:
+
+1. **Dynamically selects** between GUI and API execution based on availability and task requirements
+2. **Composes hybrid workflows** that mix GUI and API actions within a single task
+3. **Provides graceful fallback** from API to GUI when APIs are unavailable or insufficient
+4. **Leverages MCP servers** for extensible, modular integration of application-specific APIs
+
+
+*UFO²'s hybrid action architecture powered by Model Context Protocol (MCP) servers. Agents dynamically select between GUI automation (via UI Automation/Win32 APIs) and native application APIs (via MCP servers like Excel COM, Outlook API, PowerPoint), enabling optimal execution strategies for each task.*
+
+## MCP-Powered Action Execution
+
+UFO² implements the hybrid action layer through the **Model Context Protocol (MCP)** framework:
+
+### Architecture Components
+
+| Component | Role | Examples |
+|-----------|------|----------|
+| **MCP Servers** | Expose application-specific APIs as standardized tools | Excel COM Server, Outlook API Server, PowerPoint Server |
+| **GUI Automation Servers** | Provide universal UI interaction commands | UICollector, HostUIExecutor, AppUIExecutor |
+| **Command Dispatcher** | Routes agent requests to appropriate MCP server | Selects Excel API for cell operations, GUI for unlabeled buttons |
+| **Action Strategies** | Determine execution method based on context | Prefer API for bulk operations, GUI for visual verification |
+
+### Execution Flow
+
+```mermaid
+graph TB
+ Agent[AppAgent Action Decision] --> Decision{API Available & Preferred?}
+
+ Decision -->|Yes| API[MCP API Server]
+ Decision -->|No/Fallback| GUI[GUI Automation Server]
+
+ API --> ExcelAPI[Excel COM]
+ API --> OutlookAPI[Outlook COM]
+ API --> PowerPointAPI[PowerPoint COM]
+
+ GUI --> UIA[UI Automation]
+ GUI --> Win32[Win32 APIs]
+
+ ExcelAPI --> Result[Execution Result]
+ OutlookAPI --> Result
+ PowerPointAPI --> Result
+ UIA --> Result
+ Win32 --> Result
+
+ style API fill:#e8f5e9
+ style GUI fill:#fff3e0
+ style Result fill:#e3f2fd
+```
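+
+As a rough illustration of the flow above, the sketch below tries an API-backed handler first and falls back to a GUI handler when the API path is missing or fails. All names here (`Handler`, `run_with_fallback`, `failing_api`, the lambda handlers) are hypothetical and are not part of the UFO² codebase; in UFO² the choice is made by the agent's reasoning over the tools its MCP servers expose.
+
+```python
+from typing import Callable, Optional
+
+# Hypothetical handler type for this sketch: takes a task description, returns a result string.
+Handler = Callable[[str], str]
+
+
+def run_with_fallback(task: str, api: Optional[Handler], gui: Handler) -> str:
+    """Prefer the API handler; fall back to the GUI handler if it is absent or raises."""
+    if api is not None:
+        try:
+            return api(task)
+        except Exception as exc:  # broad on purpose: any API failure triggers the fallback
+            print(f"API path failed ({exc}); falling back to GUI")
+    return gui(task)
+
+
+def failing_api(task: str) -> str:
+    """Stand-in for an API call that is configured but unusable at runtime."""
+    raise RuntimeError("no COM object available")
+
+
+print(run_with_fallback("create chart", lambda t: f"api:{t}", lambda t: f"gui:{t}"))
+print(run_with_fallback("create chart", failing_api, lambda t: f"gui:{t}"))
+print(run_with_fallback("click button", None, lambda t: f"gui:{t}"))
+```
+
+The fallback is deliberately coarse: any exception on the API path hands the task to the GUI path, which mirrors the "No/Fallback" edge in the diagram.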
+
+### Example: Excel Chart Creation
+
+**Scenario:** Create a column chart from data in cells A1:B10
+
+**API-First Execution:**
+
+```python
+# Agent decision: Use Excel API (fast, precise)
+command = ExcelCreateChartCommand(
+    data_range="A1:B10",
+    chart_type="column",
+    chart_title="Sales Data"
+)
+
+# MCP Server: Excel COM
+result = mcp_server.execute(command)
+# → Direct API call: workbook.charts.add(...)
+# → Execution time: ~0.5s
+```
+
+**GUI Fallback Execution:**
+
+```python
+# Agent decision: API unavailable, use GUI
+commands = [
+    SelectControlCommand(control="A1:B10"),
+    ClickCommand(control="Insert > Chart"),
+    SelectChartTypeCommand(type="Column"),
+    SetTextCommand(control="Chart Title", text="Sales Data"),
+    ClickCommand(control="OK")
+]
+
+# MCP Server: AppUIExecutor (GUI actions; UICollector only captures screenshots and UI trees)
+for cmd in commands:
+    result = mcp_server.execute(cmd)
+# → UI Automation: capture, annotate, click sequence
+# → Execution time: ~8s
+```
+
+**Hybrid Execution:**
+
+```python
+# Agent decision: Mix API + GUI for optimal workflow
+
+# Step 1: API for data manipulation (fast)
+api_command = ExcelSetRangeCommand(
+    range="A1:B10",
+    values=processed_data
+)
+mcp_api_server.execute(api_command)
+
+# Step 2: GUI for chart insertion (visual verification)
+gui_commands = [
+    SelectControlCommand(control="A1:B10"),
+    ClickCommand(control="Insert > Recommended Charts"),
+    # Visual confirmation before finalizing
+    ScreenshotCommand(),
+    ClickCommand(control="OK")
+]
+for cmd in gui_commands:
+    mcp_gui_server.execute(cmd)
+```
+
+---
+
+## Dynamic Action Selection
+
+UFO²'s agents use a **strategy-based decision process** to select execution methods:
+
+### Selection Criteria
+
+UFO² agents dynamically select between GUI and API execution based on:
+
+| Factor | API Preference | GUI Preference |
+|--------|---------------|---------------|
+| **Operation Type** | Bulk data operations, calculations | Visual layout, custom UI elements |
+| **Performance Requirement** | Time-critical tasks | Tasks requiring visual verification |
+| **API Availability** | Application has MCP server configured | Application only has GUI automation |
+| **Precision Requirement** | Exact data manipulation | Approximate interactions (e.g., scrolling) |
+| **Error Handling** | Predictable state changes | Exploratory interactions |
+
+**How Agents Decide:**
+
+The agent **reasoning process** determines execution method based on:
+
+1. **Available MCP servers** — Check if application has API-based MCP servers configured
+2. **Task characteristics** — Bulk operations favor API, visual tasks favor GUI
+3. **Tool availability** — Each MCP server exposes specific capabilities as tools
+4. **LLM decision** — Agent reasons about which available tool best fits the task
+
+**Real-World Decision Examples:**
+
+**Task: "Fill 1000 Excel cells with sequential numbers"**
+→ **Decision: ExcelCOMExecutor** (COM API bulk operation ~2s vs. GUI 1000 clicks ~300s)
+
+**Task: "Click the blue 'Submit' button in custom dialog"**
+→ **Decision: AppUIExecutor** (No API for custom dialogs, visual grounding needed)
+
+**Task: "Create presentation from Excel data, verify slide layout"**
+→ **Decision: Both servers** (PowerPointCOMExecutor for data, AppUIExecutor for verification)
+
+## MCP Server Configuration
+
+UFO² agents discover available MCP servers through the `config/ufo/mcp.yaml` configuration:
+
+### Server Registration
+
+```yaml
+# config/ufo/mcp.yaml
+# MCP servers are organized by agent type and application
+
+AppAgent:
+  # Default configuration for all applications
+  default:
+    data_collection:
+      - namespace: UICollector          # Screenshot capture, UI tree extraction
+        type: local                     # Local in-memory server
+        start_args: []
+        reset: false
+    action:
+      - namespace: AppUIExecutor        # GUI automation (click, type, scroll)
+        type: local
+        start_args: []
+        reset: false
+      - namespace: CommandLineExecutor  # Command-line execution
+        type: local
+        start_args: []
+        reset: false
+
+  # Excel-specific configuration (adds COM API)
+  EXCEL.EXE:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        start_args: []
+        reset: false
+    action:
+      - namespace: AppUIExecutor        # GUI fallback
+        type: local
+        start_args: []
+        reset: false
+      - namespace: ExcelCOMExecutor     # Excel COM API
+        type: local
+        start_args: []
+        reset: true                     # Reset when switching apps
+
+  # Word-specific configuration
+  WINWORD.EXE:
+    action:
+      - namespace: WordCOMExecutor      # Word COM API
+        type: local
+        start_args: []
+        reset: true
+
+  # PowerPoint-specific configuration
+  POWERPNT.EXE:
+    action:
+      - namespace: PowerPointCOMExecutor  # PowerPoint COM API
+        type: local
+        start_args: []
+        reset: true
+
+HostAgent:
+  default:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        start_args: []
+        reset: false
+    action:
+      - namespace: HostUIExecutor       # Desktop-level GUI automation
+        type: local
+        start_args: []
+        reset: false
+      - namespace: CommandLineExecutor
+        type: local
+        start_args: []
+        reset: false
+```
+
+### How Agents Load MCP Servers
+
+When an agent is initialized for a specific application, the system:
+
+1. **Matches application** — Uses process name (e.g., `EXCEL.EXE`) to find configuration
+2. **Creates MCP servers** — Initializes servers via `MCPServerManager.create_or_get_server()`
+3. **Registers tools** — Each MCP server exposes tools (e.g., `excel_write_cell`, `ui_click`)
+4. **Agent discovers capabilities** — LLM sees available tools in system prompt
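+
+Step 1 can be pictured as a dictionary lookup keyed by process name. The sketch below is an illustration under stated assumptions: it hard-codes a small excerpt of the configuration as a Python dict instead of reading `config/ufo/mcp.yaml`, the function name `resolve_action_servers` is hypothetical, and both the case-insensitive match and the fallback to `default` for unlisted processes are assumed behaviors that this page does not confirm.
+
+```python
+from typing import Dict, List
+
+# Excerpt of the AppAgent action servers from the configuration above, as a plain dict.
+APP_AGENT_ACTIONS: Dict[str, List[str]] = {
+    "default": ["AppUIExecutor", "CommandLineExecutor"],
+    "EXCEL.EXE": ["AppUIExecutor", "ExcelCOMExecutor"],
+    "WINWORD.EXE": ["WordCOMExecutor"],
+    "POWERPNT.EXE": ["PowerPointCOMExecutor"],
+}
+
+
+def resolve_action_servers(process_name: str) -> List[str]:
+    """Return the action-server namespaces for a process, using `default` when it is not listed."""
+    key = process_name.upper()  # assumed: process names are matched case-insensitively
+    return APP_AGENT_ACTIONS.get(key, APP_AGENT_ACTIONS["default"])
+
+
+print(resolve_action_servers("excel.exe"))
+print(resolve_action_servers("notepad.exe"))
+```
+
+With this excerpt, Excel resolves to both the GUI executor and the COM executor, while an unlisted application such as Notepad gets only the default GUI and command-line servers.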
+
+**Example: Available Tools for Excel**
+
+When AppAgent opens Excel, it gets tools from:
+
+**ExcelCOMExecutor (API):**
+
+- `excel_write_cell`: Write to specific cell
+- `excel_read_range`: Read cell range
+- `excel_create_chart`: Create chart
+- `excel_run_macro`: Run VBA macro
+
+**AppUIExecutor (GUI):**
+
+- `ui_click`: Click UI element
+- `ui_type_text`: Type text
+- `ui_select`: Select from dropdown
+
+**UICollector (Data):**
+
+- `capture_screenshot`: Capture screen
+- `get_ui_tree`: Get UI element tree
+
+For complete MCP documentation, see:
+
+- [MCP Overview](../../mcp/overview.md) — Model Context Protocol architecture
+- [MCP Configuration Reference](../../configuration/system/mcp_reference.md) — Complete configuration options
+- [MCP Server Documentation](../../mcp/local_servers.md) — All available MCP servers
+
+## Best Practices
+
+### When to Use API
+
+- ✅ **Bulk data operations**: Filling cells, processing records
+- ✅ **Precise calculations**: Formula application, data transformations
+- ✅ **Programmatic workflows**: Email automation, calendar scheduling
+- ✅ **Time-critical tasks**: High-volume operations with strict SLAs
+
+### When to Use GUI
+
+- ✅ **Visual verification**: Layout checking, color validation
+- ✅ **Custom UI elements**: Application-specific dialogs, unlabeled controls
+- ✅ **Exploratory tasks**: Navigating unfamiliar applications
+- ✅ **Legacy applications**: Apps without accessible APIs
+
+### When to Use Hybrid
+
+- ✅ **Complex workflows**: Combine API efficiency with GUI verification
+- ✅ **Partial API coverage**: Use API where available, GUI for gaps
+- ✅ **User-facing demos**: API for backend, GUI for visible interactions
+- ✅ **Debugging**: API for state setup, GUI for manual inspection
+
+!!! warning "Common Pitfalls"
+    - **Over-relying on APIs**: Some UI states are only visible through screenshots
+    - **Ignoring API errors**: Always implement GUI fallback for resilience
+    - **Static execution plans**: Use dynamic selection based on runtime context
+    - **Inadequate verification**: Combine API execution with screenshot validation
+
+## Related Documentation
+
+### Core Concepts
+
+- [**MCP Overview**](../../mcp/overview.md) — Model Context Protocol architecture
+- [**AppAgent**](../app_agent/overview.md) — Application-level agent implementation
+- [**HostAgent**](../host_agent/overview.md) — Desktop-level agent implementation
+
+### Configuration
+
+- [**MCP Configuration Reference**](../../configuration/system/mcp_reference.md) — Complete MCP server configuration options
+- [**Configuration Guide**](../../configuration/system/overview.md) — System configuration overview
+
+### MCP Servers
+
+- [**UICollector**](../../mcp/servers/ui_collector.md) — Screenshot and UI tree capture
+- [**AppUIExecutor**](../../mcp/servers/app_ui_executor.md) — GUI automation server
+- [**ExcelCOMExecutor**](../../mcp/servers/excel_com_executor.md) — Excel COM API integration
+- [**WordCOMExecutor**](../../mcp/servers/word_com_executor.md) — Word COM API integration
+- [**PowerPointCOMExecutor**](../../mcp/servers/ppt_com_executor.md) — PowerPoint COM API integration
+- [**CommandLineExecutor**](../../mcp/servers/command_line_executor.md) — Command-line execution
+
+---
+
+## Next Steps
+
+1. **Explore MCP Architecture**: Read [MCP Overview](../../mcp/overview.md) to understand the protocol design
+2. **Configure MCP Servers**: Review [MCP Configuration](../../configuration/system/mcp_reference.md) for setup options
+3. **Study MCP Servers**: Check built-in implementations in [MCP Server Documentation](../../mcp/local_servers.md)
+4. **Build Custom Agents**: Follow [Creating AppAgent](../../tutorials/creating_app_agent/overview.md) to use hybrid actions
+
+Want to see hybrid actions in practice?
+
+- [Quick Start Guide](../../getting_started/quick_start_ufo2.md) — Run UFO² with default MCP servers
+- [Creating AppAgent Tutorial](../../tutorials/creating_app_agent/overview.md) — Build custom agents with hybrid actions
+- [Speculative Multi-Action Execution](multi_action.md) — Optimize performance with batch action prediction
diff --git a/documents/docs/ufo2/core_features/knowledge_substrate/experience_learning.md b/documents/docs/ufo2/core_features/knowledge_substrate/experience_learning.md
new file mode 100644
index 000000000..d79a1ecf1
--- /dev/null
+++ b/documents/docs/ufo2/core_features/knowledge_substrate/experience_learning.md
@@ -0,0 +1,61 @@
+# Learning from Self-Experience
+
+When UFO successfully completes a task, users can save the successful experience to enhance the AppAgent's future performance. The AppAgent learns from its own successful experiences to improve task execution.
+
+## Mechanism
+
+```mermaid
+graph TD
+ A[Complete Session] --> B[Prompt User to Save Experience]
+ B --> C{User Saves?}
+ C -->|Yes| D[Summarize with ExperienceSummarizer]
+ C -->|No| I[End]
+ D --> E[Save to Experience Database]
+ F[AppAgent Encounters Similar Task] --> G[Retrieve Saved Experience]
+ G --> H[Generate Plan Using Retrieved Experience]
+```
+
+### Workflow Steps
+
+1. **Complete a Session**: UFO finishes executing a task successfully
+
+2. **Prompt User to Save**: The system asks whether to save the experience
+
+ 
+
+3. **Summarize Experience**: If the user chooses to save, the `ExperienceSummarizer` processes the session:
+    - Extracts key information from the execution trajectory
+    - Summarizes the experience into a structured demonstration example
+    - Saves it to the experience database at the configured path
+    - The demonstration example includes fields similar to those in the [AppAgent's prompt examples](../../prompts/examples_prompts.md)
+
+4. **Retrieve and Utilize**: When encountering similar tasks in the future:
+    - The AppAgent queries the experience database
+    - Retrieves relevant past experiences
+    - Uses them to inform plan generation
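+
+Retrieval in step 4 amounts to a top-k similarity search. The sketch below shows the idea with plain Python lists and cosine similarity; it is not the `ExperienceRetriever` implementation, and the `cosine` and `top_k_experiences` helpers, the toy embedding vectors, and the example summaries are all made up for illustration. The `k` argument plays the role of `RAG_EXPERIENCE_RETRIEVED_TOPK`.
+
+```python
+import math
+from typing import List, Sequence, Tuple
+
+
+def cosine(a: Sequence[float], b: Sequence[float]) -> float:
+    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
+    dot = sum(x * y for x, y in zip(a, b))
+    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
+    return dot / norm if norm else 0.0
+
+
+def top_k_experiences(
+    query: Sequence[float],
+    store: List[Tuple[Sequence[float], str]],
+    k: int = 5,
+) -> List[str]:
+    """Return the summaries of the k stored experiences most similar to the query vector."""
+    ranked = sorted(store, key=lambda item: cosine(query, item[0]), reverse=True)
+    return [summary for _, summary in ranked[:k]]
+
+
+# Toy store of (embedding, summary) pairs; real embeddings would come from an embedding model.
+store = [
+    ([1.0, 0.0], "Insert a chart in Excel"),
+    ([0.0, 1.0], "Send an email in Outlook"),
+    ([0.9, 0.1], "Format cells in Excel"),
+]
+print(top_k_experiences([1.0, 0.0], store, k=2))
+```
+
+With the toy vectors above, the two Excel experiences rank ahead of the Outlook one, so only they would be passed on to plan generation.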
+
+## Configuration
+
+Configure the following parameters in `config.yaml` to enable self-experience learning:
+
+| Configuration Option | Description | Type | Default |
+|---------------------|-------------|------|---------|
+| `RAG_EXPERIENCE` | Enable experience-based learning | Boolean | `False` |
+| `RAG_EXPERIENCE_RETRIEVED_TOPK` | Number of top experiences to retrieve | Integer | `5` |
+| `EXPERIENCE_SAVED_PATH` | Database path for storing experiences | String | `"vectordb/experience/"` |
+
+For more details on RAG configuration, see the [RAG Configuration Guide](../../../configuration/system/rag_config.md).
+
+## API Reference
+
+### Experience Summarizer
+
+The `ExperienceSummarizer` class in `ufo/experience/summarizer.py` handles experience summarization:
+
+:::experience.summarizer.ExperienceSummarizer
+
+### Experience Retriever
+
+The `ExperienceRetriever` class in `ufo/rag/retriever.py` handles experience retrieval:
+
+:::rag.retriever.ExperienceRetriever
diff --git a/documents/docs/ufo2/core_features/knowledge_substrate/learning_from_bing_search.md b/documents/docs/ufo2/core_features/knowledge_substrate/learning_from_bing_search.md
new file mode 100644
index 000000000..837e83628
--- /dev/null
+++ b/documents/docs/ufo2/core_features/knowledge_substrate/learning_from_bing_search.md
@@ -0,0 +1,38 @@
+# Learning from Bing Search
+
+UFO can enhance the AppAgent by searching for information on Bing to obtain up-to-date knowledge for niche tasks or applications beyond the AppAgent's existing knowledge base.
+
+## Mechanism
+
+When processing a request, the AppAgent:
+
+1. Constructs a Bing search query based on the request context
+2. Retrieves top-k search results from Bing
+3. Extracts relevant information from the search results
+4. Generates a plan informed by the retrieved information
+
+This mechanism is particularly useful for:
+- Tasks requiring current information (e.g., latest software features, current events)
+- Applications or domains not covered by the agent's training data
+- Dynamic information that changes frequently
+
+## Configuration
+
+To enable Bing search integration:
+
+1. **Obtain Bing API Key**: Get your API key from [Microsoft Azure Bing Search API](https://www.microsoft.com/en-us/bing/apis/bing-web-search-api)
+
+2. **Configure Parameters**: Set the following options in `config.yaml`:
+
+| Configuration Option | Description | Type | Default |
+|---------------------|-------------|------|---------|
+| `RAG_ONLINE_SEARCH` | Enable Bing search integration | Boolean | `False` |
+| `BING_API_KEY` | Bing Search API key | String | `""` |
+| `RAG_ONLINE_SEARCH_TOPK` | Number of search results to retrieve | Integer | `5` |
+| `RAG_ONLINE_RETRIEVED_TOPK` | Number of retrieved results to include in prompt | Integer | `5` |
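+
+A hedged example of the resulting `config.yaml` entries is shown below. The key value is a placeholder, not a real credential, and the fragment assumes the options are top-level keys as listed in the table.
+
+```yaml
+# Illustrative fragment: key names and top-k defaults are taken from the table above.
+RAG_ONLINE_SEARCH: true               # Enable Bing search integration (default: False)
+BING_API_KEY: "YOUR_BING_API_KEY"     # Placeholder: replace with your own key
+RAG_ONLINE_SEARCH_TOPK: 5             # Number of search results to retrieve
+RAG_ONLINE_RETRIEVED_TOPK: 5          # Number of retrieved results to include in the prompt
+```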
+
+For more details on RAG configuration, see the [RAG Configuration Guide](../../../configuration/system/rag_config.md).
+
+## API Reference
+
+:::rag.retriever.OnlineDocRetriever
\ No newline at end of file
diff --git a/documents/docs/ufo2/core_features/knowledge_substrate/learning_from_demonstration.md b/documents/docs/ufo2/core_features/knowledge_substrate/learning_from_demonstration.md
new file mode 100644
index 000000000..db5aab079
--- /dev/null
+++ b/documents/docs/ufo2/core_features/knowledge_substrate/learning_from_demonstration.md
@@ -0,0 +1,49 @@
+# Learning from User Demonstration
+
+For complex tasks, users can demonstrate the task execution process to help UFO learn effective action patterns. UFO uses Windows [Step Recorder](https://support.microsoft.com/en-us/windows/record-steps-to-reproduce-a-problem-46582a9b-620f-2e36-00c9-04e25d784e47) to capture user action trajectories, which are then processed and stored for future reference.
+
+## Mechanism
+
+UFO leverages the Windows Step Recorder tool to capture task demonstrations. The workflow operates as follows:
+
+1. **Record**: User performs the task while Step Recorder captures the action sequence
+2. **Process**: The `DemonstrationSummarizer` extracts and summarizes the recorded demonstration from the zip file
+3. **Store**: Summarized demonstrations are saved to the configured demonstration database
+4. **Retrieve**: When encountering similar tasks, the `DemonstrationRetriever` queries relevant demonstrations
+5. **Apply**: Retrieved demonstrations guide the AppAgent's plan generation
+
+See the [User Demonstration Provision](../../../tutorials/creating_app_agent/demonstration_provision.md) guide for detailed recording instructions.
+
+**Demo Video:**
+
+
+## Configuration
+
+To enable learning from user demonstrations:
+
+1. **Provide Demonstrations**: Follow the [User Demonstration Provision](../../../tutorials/creating_app_agent/demonstration_provision.md) guide to record demonstrations
+
+2. **Configure Parameters**: Set the following options in `config.yaml`:
+
+| Configuration Option | Description | Type | Default |
+|---------------------|-------------|------|---------|
+| `RAG_DEMONSTRATION` | Enable demonstration-based learning | Boolean | `False` |
+| `RAG_DEMONSTRATION_RETRIEVED_TOPK` | Number of top demonstrations to retrieve | Integer | `5` |
+| `RAG_DEMONSTRATION_COMPLETION_N` | Number of completion choices for demonstration results | Integer | `3` |
+| `DEMONSTRATION_SAVED_PATH` | Database path for storing demonstrations | String | `"vectordb/demonstration/"` |
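+
+For illustration, the corresponding `config.yaml` entries could be written as follows. This is a sketch that assumes top-level keys as in the table; every value except `RAG_DEMONSTRATION` repeats the listed default.
+
+```yaml
+# Illustrative fragment: key names and values are taken from the table above.
+RAG_DEMONSTRATION: true                              # Enable demonstration-based learning (default: False)
+RAG_DEMONSTRATION_RETRIEVED_TOPK: 5                  # Number of top demonstrations to retrieve
+RAG_DEMONSTRATION_COMPLETION_N: 3                    # Number of completion choices for demonstration results
+DEMONSTRATION_SAVED_PATH: "vectordb/demonstration/"  # Database path for storing demonstrations
+```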
+
+For more details on RAG configuration, see the [RAG Configuration Guide](../../../configuration/system/rag_config.md).
+
+## API Reference
+
+### Demonstration Summarizer
+
+The `DemonstrationSummarizer` class in `record_processor/summarizer/summarizer.py` handles demonstration summarization:
+
+:::summarizer.summarizer.DemonstrationSummarizer
+
+### Demonstration Retriever
+
+The `DemonstrationRetriever` class in `ufo/rag/retriever.py` handles demonstration retrieval:
+
+:::rag.retriever.DemonstrationRetriever
\ No newline at end of file
diff --git a/documents/docs/ufo2/core_features/knowledge_substrate/learning_from_help_document.md b/documents/docs/ufo2/core_features/knowledge_substrate/learning_from_help_document.md
new file mode 100644
index 000000000..7e306e02b
--- /dev/null
+++ b/documents/docs/ufo2/core_features/knowledge_substrate/learning_from_help_document.md
@@ -0,0 +1,34 @@
+# Learning from Help Documents
+
+Users or applications can provide help documents to enhance the AppAgent's capabilities. The AppAgent retrieves relevant knowledge from these documents to improve task understanding, plan quality, and application interaction efficiency.
+
+For instructions on providing help documents, see the [Help Document Provision](../../../tutorials/creating_app_agent/help_document_provision.md) guide.
+
+## Mechanism
+
+Help documents are structured as **task-solution pairs**. When processing a request, the AppAgent:
+
+1. Retrieves relevant help documents by matching the request against task descriptions
+2. Uses the retrieved solutions as references for plan generation
+3. Adapts the solutions to the specific context
+
+Since retrieved documents may not be perfectly relevant, the AppAgent treats them as references rather than strict instructions, allowing for flexible adaptation to the actual task requirements.
+
+## Configuration
+
+To enable learning from help documents:
+
+1. **Provide Help Documents**: Follow the [Help Document Provision](../../../tutorials/creating_app_agent/help_document_provision.md) guide to prepare and index help documents
+
+2. **Configure Parameters**: Set the following options in `config.yaml`:
+
+| Configuration Option | Description | Type | Default |
+|---------------------|-------------|------|---------|
+| `RAG_OFFLINE_DOCS` | Enable offline help document retrieval | Boolean | `False` |
+| `RAG_OFFLINE_DOCS_RETRIEVED_TOPK` | Number of top documents to retrieve | Integer | `1` |
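+
+A minimal `config.yaml` sketch for these two options, assuming they are top-level keys as the table suggests, might be:
+
+```yaml
+# Illustrative fragment: key names and values are taken from the table above.
+RAG_OFFLINE_DOCS: true               # Enable offline help document retrieval (default: False)
+RAG_OFFLINE_DOCS_RETRIEVED_TOPK: 1   # Number of top documents to retrieve
+```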
+
+For more details on RAG configuration, see the [RAG Configuration Guide](../../../configuration/system/rag_config.md).
+
+## API Reference
+
+:::rag.retriever.OfflineDocRetriever
\ No newline at end of file
diff --git a/documents/docs/ufo2/core_features/knowledge_substrate/overview.md b/documents/docs/ufo2/core_features/knowledge_substrate/overview.md
new file mode 100644
index 000000000..e10d71503
--- /dev/null
+++ b/documents/docs/ufo2/core_features/knowledge_substrate/overview.md
@@ -0,0 +1,76 @@
+# Knowledge Substrate
+
+UFO provides versatile mechanisms to enhance the AppAgent's capabilities through RAG (Retrieval-Augmented Generation) and other knowledge retrieval techniques. These mechanisms improve the AppAgent's task understanding, plan quality, and interaction efficiency with applications.
+
+## Supported Knowledge Sources
+
+UFO currently supports the following knowledge retrieval methods:
+
+| Knowledge Source | Description |
+|------------------|-------------|
+| [Help Documents](./learning_from_help_document.md) | Retrieve knowledge from offline help documentation indexed for specific applications. |
+| [Bing Search](./learning_from_bing_search.md) | Search online information via Bing to obtain up-to-date knowledge. |
+| [Self-Experience](./experience_learning.md) | Learn from the agent's own successful task execution history. |
+| [User Demonstrations](./learning_from_demonstration.md) | Learn from action trajectories demonstrated by users. |
+
+## Context Provision
+
+UFO provides knowledge to the AppAgent through the `context_provision` method defined in the `AppAgent` class:
+
+```python
+async def context_provision(
+ self, request: str = "", context: Context = None
+) -> None:
+ """
+ Provision the context for the app agent.
+ :param request: The request sent to the Bing search retriever.
+ """
+
+ ufo_config = get_ufo_config()
+
+ # Load the offline document indexer for the app agent if available.
+ if ufo_config.rag.offline_docs:
+ console.print(
+ f"📚 Loading offline help document indexer for {self._process_name}...",
+ style="magenta",
+ )
+ self.build_offline_docs_retriever()
+
+ # Load the online search indexer for the app agent if available.
+
+ if ufo_config.rag.online_search and request:
+ console.print("🔍 Creating a Bing search indexer...", style="magenta")
+ self.build_online_search_retriever(
+ request, ufo_config.rag.online_search_topk
+ )
+
+ # Load the experience indexer for the app agent if available.
+ if ufo_config.rag.experience:
+ console.print("📖 Creating an experience indexer...", style="magenta")
+ experience_path = ufo_config.rag.experience_saved_path
+ db_path = os.path.join(experience_path, "experience_db")
+ self.build_experience_retriever(db_path)
+
+ # Load the demonstration indexer for the app agent if available.
+ if ufo_config.rag.demonstration:
+ console.print("🎬 Creating an demonstration indexer...", style="magenta")
+ demonstration_path = ufo_config.rag.demonstration_saved_path
+ db_path = os.path.join(demonstration_path, "demonstration_db")
+ self.build_human_demonstration_retriever(db_path)
+
+ await self._load_mcp_context(context)
+```
+
+The `context_provision` method loads various knowledge retrievers based on the configuration settings in `config.yaml`:
+
+- **Offline document retriever**: Loads indexed help documentation for the target application
+- **Online search retriever**: Creates a Bing search indexer when a search request is provided
+- **Experience retriever**: Loads the agent's historical successful experiences
+- **Demonstration retriever**: Loads user-demonstrated action trajectories
+- **MCP context**: Loads Model Context Protocol tool information for the current application
+
+## Retriever API Reference
+
+UFO employs the `Retriever` class located in `ufo/rag/retriever.py` to retrieve knowledge from various sources. For detailed API documentation, see:
+
+:::rag.retriever.Retriever
diff --git a/documents/docs/ufo2/core_features/multi_action.md b/documents/docs/ufo2/core_features/multi_action.md
new file mode 100644
index 000000000..0b773c566
--- /dev/null
+++ b/documents/docs/ufo2/core_features/multi_action.md
@@ -0,0 +1,140 @@
+# Speculative Multi-Action Execution
+
+UFO² introduces **Speculative Multi-Action Execution**, a feature that allows agents to bundle multiple predicted steps into a single LLM call and validate them against the live application state. This approach can reduce LLM queries by up to **51%** compared to inferring each action separately.
+
+## Overview
+
+Traditional agent execution follows a sequential pattern: **think → act → observe → think → act → observe**. Each cycle requires a separate LLM inference, making complex tasks slow and expensive.
+
+Speculative multi-action execution optimizes this by predicting a **batch of likely actions** upfront, then validating them against the live UI Automation state in a single execution pass.
+
+
+
+**Key Benefits:**
+
+- **Reduced LLM Calls**: Up to 51% fewer inference requests for multi-step tasks
+- **Faster Execution**: Batch prediction eliminates per-action round-trips
+- **Lower Costs**: Fewer API calls reduce operational expenses
+- **Maintained Accuracy**: Live validation ensures actions remain correct
+
+## How It Works
+
+When enabled, the agent:
+
+1. **Predicts Action Sequence**: Uses contextual understanding to forecast likely next steps (e.g., "Open Excel → Navigate to cell A1 → Enter value → Save")
+2. **Validates Against Live State**: Checks each predicted action against current UI Automation state
+3. **Executes Valid Actions**: Runs all validated actions in sequence
+4. **Handles Failures Gracefully**: Falls back to single-action mode if predictions fail validation
+
+## Configuration
+
+Enable speculative multi-action execution in `config/ufo/system.yaml`:
+
+```yaml
+# Action Configuration
+ACTION_SEQUENCE: true # Enable multi-action prediction and execution
+```
+
+**Configuration Location**: `config/ufo/system.yaml` (migrated from legacy `config_dev.yaml`)
+
+For configuration migration details, see [Configuration Migration Guide](../../configuration/system/migration.md).
+
+## Implementation Details
+
+The multi-action system is implemented through two core classes in `ufo/agents/processors/schemas/actions.py`:
+
+### ActionCommandInfo
+
+Represents a single action with execution metadata:
+
+:::agents.processors.schemas.actions.ActionCommandInfo
+
+**Key Properties:**
+
+- `function`: Action name (e.g., `click`, `type_text`)
+- `arguments`: Action parameters
+- `target`: UI element information
+- `result`: Execution result with status and error details
+- `action_string`: Human-readable representation
+
+### ListActionCommandInfo
+
+Manages sequences of multiple actions:
+
+:::agents.processors.schemas.actions.ListActionCommandInfo
+
+**Key Methods:**
+
+- `add_action()`: Append action to sequence
+- `to_list_of_dicts()`: Serialize for logging/debugging
+- `to_representation()`: Generate human-readable summary
+- `count_repeat_times()`: Track repeated actions for loop detection
+- `get_results()`: Extract execution outcomes
+
+## Example Scenarios
+
+**Scenario 1: Excel Data Entry**
+
+Without multi-action:
+```
+Think → Open Excel → Observe → Think → Click A1 → Observe → Think → Type "Sales" → Observe → Think → Save → Observe
+```
+**4 LLM calls** (one per "Think" step)
+
+With multi-action:
+```
+Think → [Open Excel, Click A1, Type "Sales", Save] → Observe
+```
+**1 LLM call** (75% reduction, assuming every predicted action passes validation)
+
+**Scenario 2: Email Composition**
+
+Single-action mode:
+```
+Think → Open Outlook → Think → Click New → Think → Enter recipient → Think → Enter subject → Think → Type body → Think → Send
+```
+**6 LLM calls** (one per "Think" step)
+
+Multi-action mode:
+```
+Think → [Open Outlook, Click New, Enter recipient, Enter subject, Type body, Send] → Observe
+```
+**1 LLM call** (about 83% reduction, assuming every predicted action passes validation)
+
+## When to Use
+
+**Best for:**
+
+✅ Predictable workflows with clear action sequences
+✅ Repetitive tasks (data entry, form filling)
+✅ Applications with stable UI structures
+✅ Cost-sensitive deployments requiring fewer LLM calls
+
+**Not recommended for:**
+
+❌ Highly dynamic UIs with frequent state changes
+❌ Exploratory tasks requiring frequent observation
+❌ Error-prone applications where validation is critical per step
+❌ Tasks requiring user confirmation between actions
+
+## Related Documentation
+
+- [AppAgent Processing Strategy](../app_agent/strategy.md) — How agents process and execute actions
+- [Hybrid GUI-API Actions](hybrid_actions.md) — Combining GUI automation with native APIs
+- [System Configuration Reference](../../configuration/system/system_config.md) — Complete `system.yaml` options
+- [Configuration Migration](../../configuration/system/migration.md) — Migrating from legacy `config_dev.yaml`
+
+## Performance Considerations
+
+**Trade-offs:**
+
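+- **Accuracy vs. Speed**: Multi-action trades a fresh LLM observation after every step for batch efficiency; predicted actions are still validated against the live UI state, but the model does not re-plan between them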
+- **Memory Usage**: Larger context windows needed to predict action sequences
+- **Failure Recovery**: Invalid predictions require full sequence rollback and retry
+
+**Optimization Tips:**
+
+1. **Start Conservative**: Test with `ACTION_SEQUENCE: false` before enabling
+2. **Monitor Validation Rates**: High rejection rates indicate poor prediction quality
+3. **Combine with Hybrid Actions**: Use [API-based execution](hybrid_actions.md) where possible for fastest performance
+4. **Tune MAX_STEP**: Set appropriate `MAX_STEP` limits in `system.yaml` to prevent runaway sequences
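+
+A conservative starting point for tips 1 and 4 might look like the fragment below. The `MAX_STEP` value is an illustrative assumption, not a documented default; check the System Configuration Reference for the actual one.
+
+```yaml
+# config/ufo/system.yaml (illustrative fragment)
+ACTION_SEQUENCE: false  # Establish a single-action baseline first, then switch to true
+MAX_STEP: 50            # Assumed value for illustration: upper bound on executed steps
+```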
diff --git a/documents/docs/dataflow/execution.md b/documents/docs/ufo2/dataflow/execution.md
similarity index 100%
rename from documents/docs/dataflow/execution.md
rename to documents/docs/ufo2/dataflow/execution.md
diff --git a/documents/docs/dataflow/instantiation.md b/documents/docs/ufo2/dataflow/instantiation.md
similarity index 100%
rename from documents/docs/dataflow/instantiation.md
rename to documents/docs/ufo2/dataflow/instantiation.md
diff --git a/documents/docs/dataflow/overview.md b/documents/docs/ufo2/dataflow/overview.md
similarity index 100%
rename from documents/docs/dataflow/overview.md
rename to documents/docs/ufo2/dataflow/overview.md
diff --git a/documents/docs/dataflow/result.md b/documents/docs/ufo2/dataflow/result.md
similarity index 100%
rename from documents/docs/dataflow/result.md
rename to documents/docs/ufo2/dataflow/result.md
diff --git a/documents/docs/dataflow/windows_app_env.md b/documents/docs/ufo2/dataflow/windows_app_env.md
similarity index 100%
rename from documents/docs/dataflow/windows_app_env.md
rename to documents/docs/ufo2/dataflow/windows_app_env.md
diff --git a/documents/docs/benchmark/osworld.md b/documents/docs/ufo2/evaluation/benchmark/osworld.md
similarity index 100%
rename from documents/docs/benchmark/osworld.md
rename to documents/docs/ufo2/evaluation/benchmark/osworld.md
diff --git a/documents/docs/benchmark/overview.md b/documents/docs/ufo2/evaluation/benchmark/overview.md
similarity index 100%
rename from documents/docs/benchmark/overview.md
rename to documents/docs/ufo2/evaluation/benchmark/overview.md
diff --git a/documents/docs/benchmark/windows_agent_arena.md b/documents/docs/ufo2/evaluation/benchmark/windows_agent_arena.md
similarity index 100%
rename from documents/docs/benchmark/windows_agent_arena.md
rename to documents/docs/ufo2/evaluation/benchmark/windows_agent_arena.md
diff --git a/documents/docs/ufo2/evaluation/evaluation_agent.md b/documents/docs/ufo2/evaluation/evaluation_agent.md
new file mode 100644
index 000000000..de8adc09d
--- /dev/null
+++ b/documents/docs/ufo2/evaluation/evaluation_agent.md
@@ -0,0 +1,105 @@
+# EvaluationAgent
+
+The `EvaluationAgent` evaluates whether a `Session` or `Round` has been successfully completed by assessing the performance of the `HostAgent` and `AppAgent` in fulfilling user requests. Configuration options are available in `config/ufo/system.yaml`. For more details, refer to the [System Configuration Guide](../../configuration/system/system_config.md).
+
+The `EvaluationAgent` is fully LLM-driven and conducts evaluations based on action trajectories and screenshots. Since LLM-based evaluation may not be 100% accurate, the results should be used as guidance rather than absolute truth.
+
+
+
+## Configuration
+
+Configure the `EvaluationAgent` in `config/ufo/system.yaml`:
+
+| Configuration Option | Description | Type | Default Value |
+|---------------------------|-----------------------------------------------|---------|---------------|
+| `EVA_SESSION` | Whether to evaluate the entire session. | Boolean | True |
+| `EVA_ROUND` | Whether to evaluate each round. | Boolean | False |
+| `EVA_ALL_SCREENSHOTS` | Whether to include all screenshots in evaluation. If `False`, only the first and last screenshots are used. | Boolean | True |
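+
+As a sketch, the fragment below enables per-round evaluation in addition to the session-level one and restricts the input to the first and last screenshots. The key names come from the table above; whether they sit at the top level of `system.yaml` is an assumption.
+
+```yaml
+# config/ufo/system.yaml (illustrative fragment)
+EVA_SESSION: true            # Evaluate the entire session (default: True)
+EVA_ROUND: true              # Also evaluate each round (default: False)
+EVA_ALL_SCREENSHOTS: false   # Use only the first and last screenshots (default: True)
+```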
+
+## Evaluation Process
+
+The `EvaluationAgent` uses a Chain-of-Thought (CoT) mechanism to:
+
+1. Decompose the evaluation into multiple sub-goals based on the user request
+2. Evaluate each sub-goal separately
+3. Aggregate the sub-scores to determine the overall completion status
+
+```mermaid
+graph TD
+ A[User Request] --> B[EvaluationAgent]
+ C[Action Trajectories] --> B
+ D[Screenshots] --> B
+ E[APIs Description] --> B
+
+ B --> F[CoT: Decompose into Sub-goals]
+ F --> G[Evaluate Sub-goal 1]
+ F --> H[Evaluate Sub-goal 2]
+ F --> I[Evaluate Sub-goal N]
+
+ G --> J[Aggregate Sub-scores]
+ H --> J
+ I --> J
+
+ J --> K{Overall Completion Status}
+ K -->|yes| L[Task Completed]
+ K -->|no| M[Task Failed]
+ K -->|unsure| N[Uncertain Result]
+
+ B --> O[Generate Detailed Reason]
+ O --> P[Evaluation Report]
+ J --> P
+```
+
+### Inputs
+
+The `EvaluationAgent` takes the following inputs:
+
+| Input | Description | Type |
+| --- | --- | --- |
+| User Request | The user's request to be evaluated. | String |
+| APIs Description | Description of the APIs (tools) used during execution. | String |
+| Action Trajectories | Action trajectories executed by the `HostAgent` and `AppAgent`, including subtask, step, observation, thought, plan, comment, action, and application. | List of Dictionaries |
+| Screenshots | Screenshots captured during execution. | List of Images |
+
+The input construction is handled by the `EvaluationAgentPrompter` class in `ufo/prompter/eva_prompter.py`.
+
+### Outputs
+
+The `EvaluationAgent` generates the following outputs:
+
+| Output | Description | Type |
+| --- | --- | --- |
+| reason | Detailed reasoning for the judgment based on screenshot analysis and execution trajectory. | String |
+| sub_scores | List of sub-scoring points evaluating different aspects of the task. Each sub-score contains a name and evaluation result. | List of Dictionaries |
+| complete | Overall completion status: `yes`, `no`, or `unsure`. | String |
+
+Example output:
+
+```json
+{
+    "reason": "The agent successfully completed the task of sending 'hello' to Zac on Microsoft Teams. The initial screenshot shows the Microsoft Teams application with the chat window of Chaoyun Zhang open. The agent then focused on the chat window, input the message 'hello', and clicked the Send button. The final screenshot confirms that the message 'hello' was sent to Zac.",
+ "sub_scores": [
+ { "name": "correct application focus", "evaluation": "yes" },
+ { "name": "correct message input", "evaluation": "yes" },
+ { "name": "message sent successfully", "evaluation": "yes" }
+ ],
+ "complete": "yes"
+}
+```
+
+Evaluation logs are saved in `logs/{task_name}/evaluation.log`.
+
+## See Also
+
+- [System Configuration](../../configuration/system/system_config.md) - Configure evaluation settings
+- [Evaluation Logs](logs/evaluation_logs.md) - Understanding evaluation logs structure
+- [Logs Overview](logs/overview.md) - Complete guide to UFO logging system
+- [Benchmark Overview](benchmark/overview.md) - Benchmarking UFO performance using evaluation results
+
+## Reference
+
+:::agents.agent.evaluation_agent.EvaluationAgent
+
diff --git a/documents/docs/ufo2/evaluation/logs/evaluation_logs.md b/documents/docs/ufo2/evaluation/logs/evaluation_logs.md
new file mode 100644
index 000000000..a34413611
--- /dev/null
+++ b/documents/docs/ufo2/evaluation/logs/evaluation_logs.md
@@ -0,0 +1,48 @@
+# Evaluation Logs
+
+The evaluation log stores task completion assessment results from the `EvaluationAgent`. The log is saved as `evaluation.log` in JSON format, containing a single entry that evaluates the entire session.
+
+## Log Structure
+
+The evaluation log contains the following fields:
+
+| Field | Description | Type |
+| --- | --- | --- |
+| `complete` | Overall completion status: `yes`, `no`, or `unsure` | String |
+| `sub_scores` | Breakdown of evaluation into sub-goals, each with name and evaluation status | List of Dictionaries |
+| `reason` | Detailed justification based on screenshots and execution trajectory | String |
+| `level` | Evaluation scope (e.g., `session`) | String |
+| `request` | Original user request being evaluated | String |
+| `type` | Log entry type, set to `evaluation_result` | String |
+
+## Sub-score Structure
+
+Each item in `sub_scores` contains:
+
+| Field | Description | Type |
+| --- | --- | --- |
+| `name` | Name of the sub-goal being evaluated | String |
+| `evaluation` | Completion status: `yes`, `no`, or `unsure` | String |
+
+## Example
+
+```json
+{
+ "complete": "yes",
+ "sub_scores": [
+ {
+ "name": "Open application",
+ "evaluation": "yes"
+ },
+ {
+ "name": "Complete data entry",
+ "evaluation": "yes"
+ }
+ ],
+ "reason": "All sub-tasks completed successfully. Screenshots show the application was opened and data was correctly entered.",
+ "level": "session",
+ "request": "Open the application and enter data",
+ "type": "evaluation_result"
+}
+```
+
diff --git a/documents/docs/ufo2/evaluation/logs/markdown_log_viewer.md b/documents/docs/ufo2/evaluation/logs/markdown_log_viewer.md
new file mode 100644
index 000000000..cbd749fce
--- /dev/null
+++ b/documents/docs/ufo2/evaluation/logs/markdown_log_viewer.md
@@ -0,0 +1,37 @@
+# Markdown Log Viewer
+
+UFO provides a Markdown-formatted log viewer that consolidates all execution data into a readable, structured document. This format is ideal for debugging, analysis, and documentation.
+
+## Configuration
+
+Enable Markdown log generation in `config_dev.yaml`:
+
+```yaml
+LOG_TO_MARKDOWN: true
+```
+
+## Output
+
+**File location:** `logs/{task_name}/output.md`
+
+The generated Markdown file includes:
+
+- Session overview and metadata
+- Step-by-step execution timeline
+- Agent responses and reasoning
+- Screenshots embedded inline
+- Evaluation results
+
+## Use Cases
+
+**Debugging:** Quickly trace through execution flow with visual context
+
+**Documentation:** Share execution logs with human-readable formatting
+
+**Analysis:** Review agent decision-making process with screenshots
+
+**Reporting:** Generate execution reports for evaluation or review
+
+## Implementation
+
+The Markdown log is automatically generated at session end by the `Trajectory` class (located in `ufo/trajectory/parser.py`), which parses `response.log` and combines it with screenshots and other artifacts.
diff --git a/documents/docs/ufo2/evaluation/logs/overview.md b/documents/docs/ufo2/evaluation/logs/overview.md
new file mode 100644
index 000000000..b8dbbd379
--- /dev/null
+++ b/documents/docs/ufo2/evaluation/logs/overview.md
@@ -0,0 +1,15 @@
+# UFO Logs
+
+UFO generates comprehensive logs for debugging, analysis, and evaluation. Understanding these logs is essential for diagnosing issues and improving agent performance.
+
+## Log Types
+
+| Log Type | Description | Location |
+| --- | --- | --- |
+| [Request Log](./request_logs.md) | LLM prompt requests at each step | `logs/{task_name}/request.log` |
+| [Step Log](./step_logs.md) | Agent responses and execution details | `logs/{task_name}/response.log` |
+| [Evaluation Log](./evaluation_logs.md) | Task evaluation results | `logs/{task_name}/evaluation.log` |
+| [Screenshots](./screenshots_logs.md) | UI screenshots and visual captures | `logs/{task_name}/` |
+| [UI Tree](./ui_tree_logs.md) | Application UI structure data | `logs/{task_name}/ui_tree/` |
+
+All logs are stored in the `logs/{task_name}` directory, where `{task_name}` is auto-generated based on timestamp.
\ No newline at end of file
diff --git a/documents/docs/ufo2/evaluation/logs/request_logs.md b/documents/docs/ufo2/evaluation/logs/request_logs.md
new file mode 100644
index 000000000..7f925489f
--- /dev/null
+++ b/documents/docs/ufo2/evaluation/logs/request_logs.md
@@ -0,0 +1,35 @@
+# Request Logs
+
+The request log stores all prompt messages sent to LLMs during execution. Each line is a JSON entry representing one LLM request at a specific step.
+
+## Location
+
+```
+logs/{task_name}/request.log
+```
+
+## Log Fields
+
+| Field | Description | Type |
+| --- | --- | --- |
+| `step` | Step number in the session | Integer |
+| `prompt` | Complete prompt message sent to the LLM | Dictionary/List |
+
+## Reading Request Logs
+
+```python
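+import json
+
+task_name = "my_task"  # Placeholder: replace with your task's log directory name
+
+with open(f"logs/{task_name}/request.log", "r", encoding="utf-8") as f:
+    for line in f:
+        if not line.strip():
+            continue  # Skip blank lines
+        log = json.loads(line)  # Each line is one JSON entry
+        print(f"Step {log['step']}: {log['prompt']}")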
+```
+
+The request log is useful for:
+
+- Debugging LLM interactions
+- Understanding what context was provided at each step
+- Analyzing prompt effectiveness
+- Reproducing agent behavior
+
\ No newline at end of file
diff --git a/documents/docs/ufo2/evaluation/logs/screenshots_logs.md b/documents/docs/ufo2/evaluation/logs/screenshots_logs.md
new file mode 100644
index 000000000..ce80a8966
--- /dev/null
+++ b/documents/docs/ufo2/evaluation/logs/screenshots_logs.md
@@ -0,0 +1,64 @@
+# Screenshot Logs
+
+UFO captures screenshots at every step for debugging and evaluation purposes. All screenshots are stored in the `logs/{task_name}/` directory.
+
+## Screenshot Types
+
+### 1. Clean Screenshots
+
+Unmodified screenshots of the desktop or application window.
+
+**File naming:**
+
+- Step screenshots: `action_step{step_number}.png`
+- Subtask completion: `action_round_{round_id}_sub_round_{sub_task_id}_final.png`
+- Round completion: `action_round_{round_id}_final.png`
+- Session completion: `action_step_final.png`
+
+**Example:**
+
+
+
+
+
+### 2. Annotated Screenshots
+
+Screenshots with UI controls labeled using the [Set-of-Mark](https://arxiv.org/pdf/2310.11441) paradigm. Each interactive control is marked with a number for reference.
+
+**File naming:** `action_step{step_number}_annotated.png`
+
+**Example:**
+
+
+
+
+
+Only control types configured in `CONTROL_LIST` (in `config_dev.yaml`) are annotated. Different control types use different colors, configurable via `ANNOTATION_COLORS`.
+
+### 3. Concatenated Screenshots
+
+Clean and annotated screenshots placed side-by-side for comparison.
+
+**File naming:** `action_step{step_number}_concat.png`
+
+**Example:**
+
+
+
+
+
+Configure whether to feed concatenated or separate screenshots to LLMs using `CONCAT_SCREENSHOT` in `config_dev.yaml`.
+
+### 4. Selected Control Screenshots
+
+Close-up view of the control element selected for interaction in the previous step.
+
+**File naming:** `action_step{step_number}_selected_controls.png`
+
+**Example:**
+
+
+
+
+
+Enable/disable sending selected control screenshots to LLM using `INCLUDE_LAST_SCREENSHOT` in `config_dev.yaml`.
\ No newline at end of file
diff --git a/documents/docs/ufo2/evaluation/logs/step_logs.md b/documents/docs/ufo2/evaluation/logs/step_logs.md
new file mode 100644
index 000000000..39195768f
--- /dev/null
+++ b/documents/docs/ufo2/evaluation/logs/step_logs.md
@@ -0,0 +1,97 @@
+# Step Logs
+
+The step log captures agent responses and execution details at every step. Each line in `response.log` is a JSON entry representing one agent action.
+
+## Location
+
+```
+logs/{task_name}/response.log
+```
+
+## HostAgent Logs
+
+### LLM Response Fields
+
+| Field | Description | Type |
+| --- | --- | --- |
+| `observation` | Desktop screenshot analysis and current state | String |
+| `thought` | Reasoning process for task decomposition | String |
+| `current_subtask` | Subtask to be executed by AppAgent | String |
+| `message` | Instructions and context for AppAgent | List of Strings |
+| `control_label` | Index of selected application | String |
+| `control_text` | Name of selected application | String |
+| `plan` | Future subtasks after current one | List of Strings |
+| `status` | Agent state: `FINISH`, `CONTINUE`, `PENDING`, or `ASSIGN` | String |
+| `comment` | User-facing summary or progress update | String |
+| `questions` | Questions requiring user clarification | List of Strings |
+| `function` | System command to execute (optional) | String |
+
+### Additional Metadata
+
+| Field | Description | Type |
+| --- | --- | --- |
+| `step` | Global step number in session | Integer |
+| `round_step` | Step number within current round | Integer |
+| `agent_step` | Step number for this agent instance | Integer |
+| `round_num` | Current round number | Integer |
+| `request` | Original user request | String |
+| `agent_type` | Set to `HostAgent` | String |
+| `agent_name` | Agent instance name | String |
+| `application` | Application process name | String |
+| `cost` | LLM cost for this step | Float |
+| `result` | Execution results | String |
+| `screenshot_clean` | Clean desktop screenshot path | String |
+| `screenshot_annotated` | Annotated screenshot path | String |
+| `screenshot_concat` | Concatenated screenshot path | String |
+| `screenshot_selected_control` | Selected control screenshot path | String |
+| `time_cost` | Time spent on each processing phase | Dictionary |
+
+## AppAgent Logs
+
+### LLM Response Fields
+
+| Field | Description | Type |
+| --- | --- | --- |
+| `observation` | Application UI analysis and status | String |
+| `thought` | Reasoning for next action | String |
+| `control_label` | Index of selected control element | String |
+| `control_text` | Name of selected control element | String |
+| `action` | Action details including function and arguments | Dictionary or List |
+| `status` | Agent state (`CONTINUE`, `FINISH`, etc.) | String |
+| `plan` | Planned steps after current action | List of Strings |
+| `comment` | Progress summary or completion notes | String |
+| `save_screenshot` | Screenshot save configuration | Dictionary |
+
+### Additional Metadata
+
+| Field | Description | Type |
+| --- | --- | --- |
+| `step` | Global step number in session | Integer |
+| `round_step` | Step number within current round | Integer |
+| `agent_step` | Step number for this agent instance | Integer |
+| `round_num` | Current round number | Integer |
+| `subtask` | Subtask assigned by HostAgent | String |
+| `subtask_index` | Index of subtask in current round | Integer |
+| `action_type` | Type of action performed | String |
+| `request` | Original user request | String |
+| `agent_type` | Set to `AppAgent` | String |
+| `agent_name` | Agent instance name | String |
+| `application` | Application process name | String |
+| `cost` | LLM cost for this step | Float |
+| `result` | Execution results | String |
+| `screenshot_clean` | Clean application screenshot path | String |
+| `screenshot_annotated` | Annotated screenshot path | String |
+| `screenshot_concat` | Concatenated screenshot path | String |
+| `time_cost` | Time spent on each processing phase | Dictionary |
+
+## Reading Step Logs
+
+```python
+import json
+
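+task_name = "my_task"  # replace with the name of your task
+
+with open(f"logs/{task_name}/response.log", "r", encoding="utf-8") as f:
+    for line in f:
+        log = json.loads(line)  # one JSON object per line
+        print(f"Step {log['step']} - Agent: {log['agent_type']}")
+        print(f"Thought: {log['thought']}")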
+```
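+
+Because every line is an independent JSON object, per-agent statistics can be computed in a single pass. The following is a minimal sketch, assuming only the `agent_type` and `cost` fields from the tables above; the helper name `summarize_costs` and the sample lines are hypothetical and not part of UFO.
+
+```python
+import json
+from collections import defaultdict
+
+
+def summarize_costs(lines):
+    """Sum the `cost` field per `agent_type` over JSON-lines log entries."""
+    totals = defaultdict(float)
+    for line in lines:
+        line = line.strip()
+        if not line:
+            continue  # tolerate blank lines
+        entry = json.loads(line)
+        # Treat a missing or null cost as zero rather than failing.
+        totals[entry.get("agent_type", "unknown")] += entry.get("cost") or 0.0
+    return dict(totals)
+
+
+# Hypothetical sample entries standing in for lines of response.log
+sample = [
+    '{"step": 0, "agent_type": "HostAgent", "cost": 0.25}',
+    '{"step": 1, "agent_type": "AppAgent", "cost": 0.5}',
+    '{"step": 2, "agent_type": "AppAgent", "cost": 0.25}',
+]
+print(summarize_costs(sample))
+```
+
+To run it on a real log, pass the open file object instead of `sample`.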
diff --git a/documents/docs/ufo2/evaluation/logs/ui_tree_logs.md b/documents/docs/ufo2/evaluation/logs/ui_tree_logs.md
new file mode 100644
index 000000000..175e14290
--- /dev/null
+++ b/documents/docs/ufo2/evaluation/logs/ui_tree_logs.md
@@ -0,0 +1,110 @@
+# UI Tree Logs
+
+UFO can capture the complete UI control tree of application windows at every step. This structured data represents the hierarchical UI layout and is useful for analysis and debugging.
+
+## Configuration
+
+Enable UI tree logging by setting `SAVE_UI_TREE: true` in `config_dev.yaml`.
+
+**Location:** `logs/{task_name}/ui_tree/`
+
+**File naming:** `step_{step_number}.json`
+
+## Example
+
+```json
+{
+ "id": "node_0",
+ "name": "Mail - Chaoyun Zhang - Outlook",
+ "control_type": "Window",
+ "rectangle": {
+ "left": 628,
+ "top": 258,
+ "right": 3508,
+ "bottom": 1795
+ },
+ "adjusted_rectangle": {
+ "left": 0,
+ "top": 0,
+ "right": 2880,
+ "bottom": 1537
+ },
+ "relative_rectangle": {
+ "left": 0.0,
+ "top": 0.0,
+ "right": 1.0,
+ "bottom": 1.0
+ },
+ "level": 0,
+ "children": [
+ {
+ "id": "node_1",
+ "name": "",
+ "control_type": "Pane",
+ "rectangle": {
+ "left": 3282,
+ "top": 258,
+ "right": 3498,
+ "bottom": 330
+ },
+ "adjusted_rectangle": {
+ "left": 2654,
+ "top": 0,
+ "right": 2870,
+ "bottom": 72
+ },
+ "relative_rectangle": {
+ "left": 0.9215277777777777,
+ "top": 0.0,
+ "right": 0.9965277777777778,
+ "bottom": 0.0468445022771633
+ },
+ "level": 1,
+ "children": []
+ }
+ ]
+}
+```
+
+
+## Field Reference
+
+| Field | Description | Type |
+| --- | --- | --- |
+| `id` | Unique node identifier in the tree | String |
+| `name` | Control element name/text | String |
+| `control_type` | UI element type (Window, Button, Edit, etc.) | String |
+| `rectangle` | Absolute screen coordinates | Dictionary |
+| `adjusted_rectangle` | Coordinates relative to window | Dictionary |
+| `relative_rectangle` | Normalized coordinates (0.0-1.0) | Dictionary |
+| `level` | Depth in the UI tree hierarchy | Integer |
+| `children` | Child UI elements | List |
+
+### Rectangle Structure
+
+All rectangle fields contain:
+
+```json
+{
+ "left": 0,
+ "top": 0,
+ "right": 100,
+ "bottom": 100
+}
+```
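+
+The example tree suggests how the three rectangle variants relate: `adjusted_rectangle` is the absolute rectangle shifted by the window's top-left corner, and `relative_rectangle` divides the adjusted values by the window's width and height. The sketch below reproduces the numbers of `node_1` from the example under that assumption; the function names are hypothetical and this is not UFO's implementation.
+
+```python
+def adjust(rect, window):
+    """Shift an absolute rectangle so the window's top-left corner is (0, 0)."""
+    return {
+        "left": rect["left"] - window["left"],
+        "top": rect["top"] - window["top"],
+        "right": rect["right"] - window["left"],
+        "bottom": rect["bottom"] - window["top"],
+    }
+
+
+def normalize(adjusted, window):
+    """Scale a window-relative rectangle by the window size."""
+    width = window["right"] - window["left"]
+    height = window["bottom"] - window["top"]
+    return {
+        "left": adjusted["left"] / width,
+        "top": adjusted["top"] / height,
+        "right": adjusted["right"] / width,
+        "bottom": adjusted["bottom"] / height,
+    }
+
+
+# Absolute rectangles of node_0 (window) and node_1 (pane) from the example
+window = {"left": 628, "top": 258, "right": 3508, "bottom": 1795}
+pane = {"left": 3282, "top": 258, "right": 3498, "bottom": 330}
+
+adjusted = adjust(pane, window)
+relative = normalize(adjusted, window)
+print(adjusted)
+print(relative)
+```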
+
+## Usage
+
+UI tree logs enable:
+
+- Understanding application structure
+- Analyzing control element hierarchy
+- Debugging control selection issues
+- Training ML models on UI data
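+
+For the analysis tasks listed above, a logged tree can be walked recursively through its `children` lists. A minimal sketch, assuming the JSON structure shown in the example; `iter_nodes` and the inline sample tree are hypothetical.
+
+```python
+from collections import Counter
+
+
+def iter_nodes(node):
+    """Yield a node and all of its descendants in depth-first order."""
+    yield node
+    for child in node.get("children", []):
+        yield from iter_nodes(child)
+
+
+# Trimmed stand-in for the content of one step_{step_number}.json file
+tree = {
+    "id": "node_0", "control_type": "Window", "level": 0,
+    "children": [
+        {"id": "node_1", "control_type": "Pane", "level": 1, "children": []},
+        {"id": "node_2", "control_type": "Pane", "level": 1, "children": []},
+    ],
+}
+
+# Count how often each control type occurs in the tree
+counts = Counter(n["control_type"] for n in iter_nodes(tree))
+print(counts)
+```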
+
+!!! note "Performance Impact"
+    Saving UI trees increases execution latency. Disable it when it is not needed for data collection.
+
+## Reference
+
+:::automator.ui_control.ui_tree.UITree
\ No newline at end of file
diff --git a/documents/docs/ufo2/host_agent/commands.md b/documents/docs/ufo2/host_agent/commands.md
new file mode 100644
index 000000000..bd0dd61a0
--- /dev/null
+++ b/documents/docs/ufo2/host_agent/commands.md
@@ -0,0 +1,254 @@
+# HostAgent Command System
+
+HostAgent executes desktop-level commands through the **MCP (Model Context Protocol)** system. Commands are dynamically provided by MCP servers and executed through the `CommandDispatcher` interface. This document describes the MCP configuration for HostAgent commands.
+
+---
+
+## Command Execution Architecture
+
+```mermaid
+graph TB
+ HostAgent[HostAgent] --> Dispatcher[CommandDispatcher]
+ Dispatcher --> MCPClient[MCP Client]
+ MCPClient --> UICollector[UICollector Server]
+ MCPClient --> HostUIExecutor[HostUIExecutor Server]
+ MCPClient --> CLIExecutor[CommandLine Executor]
+
+ UICollector --> DataCollection[Desktop Screenshot<br/>Window Info]
+ HostUIExecutor --> DesktopActions[Window Selection<br/>App Launch]
+ CLIExecutor --> ShellActions[Shell Commands]
+
+ style HostAgent fill:#e3f2fd
+ style Dispatcher fill:#fff3e0
+ style MCPClient fill:#f1f8e9
+ style UICollector fill:#c8e6c9
+ style HostUIExecutor fill:#fff9c4
+ style CLIExecutor fill:#d1c4e9
+```
+
+!!!note "Dynamic Commands"
+    HostAgent commands are **not hardcoded**. They are discovered dynamically from the configured MCP servers. The available commands depend on the MCP server configuration in `config/ufo/mcp.yaml`, the installed MCP servers, and the active MCP connections.
+
+---
+
+## MCP Server Configuration
+
+### Configuration File
+
+HostAgent commands are configured in **`config/ufo/mcp.yaml`**:
+
+```yaml
+HostAgent:
+  default:
+    data_collection:
+      - namespace: UICollector
+        type: local
+        start_args: []
+        reset: false
+    action:
+      - namespace: HostUIExecutor
+        type: local
+        start_args: []
+        reset: false
+      - namespace: CommandLineExecutor
+        type: local
+        start_args: []
+        reset: false
+```
+
+### MCP Servers Used by HostAgent
+
+| Server | Namespace | Type | Purpose | Command Categories |
+|--------|-----------|------|---------|-------------------|
+| **UICollector** | `UICollector` | Local | Data collection | Desktop screenshot, window enumeration |
+| **HostUIExecutor** | `HostUIExecutor` | Local | Desktop actions | Window selection, application launch |
+| **CommandLineExecutor** | `CommandLineExecutor` | Local | Shell execution | PowerShell, Bash commands |
+
+---
+
+## Command Discovery
+
+### Listing Available Commands
+
+HostAgent dynamically discovers available commands from MCP servers:
+
+```python
+from aip.messages import Command
+
+# Get all available tools from MCP servers (run inside an async function)
+result = await command_dispatcher.execute_commands([
+    Command(tool_name="list_tools", parameters={})
+])
+
+tools = result[0].result
+# `tools` holds all available commands with their schemas
+```
+
+### Command Categories
+
+Commands are categorized by purpose:
+
+| Category | Server | Examples |
+|----------|--------|----------|
+| **Data Collection** | UICollector | `capture_desktop_screenshot`, `get_desktop_app_target_info`, `get_desktop_window_info` |
+| **Window Management** | HostUIExecutor | `select_application_window`, `launch_application` |
+| **Process Control** | HostUIExecutor | `close_application`, `get_process_info` |
+| **Shell Execution** | CommandLineExecutor | `execute_command` |
+| **Tool Discovery** | All Servers | `list_tools` |
+
+---
+
+## Command Execution
+
+### Execution Flow
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant Executor as ActionExecutor
+ participant Dispatcher as CommandDispatcher
+ participant MCP as MCP Server
+
+ Strategy->>Executor: execute(action_info)
+ Executor->>Dispatcher: execute_commands([Command(...)])
+ Dispatcher->>MCP: Invoke tool
+ MCP->>MCP: Execute command logic
+ MCP-->>Dispatcher: Result
+ Dispatcher-->>Executor: Result
+ Executor-->>Strategy: Success/Error
+```
+
+### Example: Capture Desktop Screenshot
+
+```python
+from aip.messages import Command
+
+# Create command
+command = Command(
+    tool_name="capture_desktop_screenshot",
+    parameters={"all_screens": True},
+    tool_type="data_collection",
+)
+
+# Execute command
+results = await command_dispatcher.execute_commands([command])
+
+# Access result
+screenshot_data = results[0].result # Base64-encoded image
+```
+
+### Example: Select Application Window
+
+```python
+# Select and focus application window
+command = Command(
+    tool_name="select_application_window",
+    parameters={
+        "id": "0",
+        "name": "Microsoft Word - Document1"
+    },
+    tool_type="action",
+)
+
+results = await command_dispatcher.execute_commands([command])
+app_info = results[0].result
+```
+
+---
+
+## Configuration Resources
+
+For detailed MCP configuration, server setup, and command reference:
+
+**Quick References:**
+
+- **[MCP Configuration Reference](../../configuration/system/mcp_reference.md)** - Quick MCP settings reference
+- **[MCP Overview](../../mcp/overview.md)** - MCP architecture and concepts
+
+**Configuration Guides:**
+
+- **[MCP Configuration Guide](../../mcp/configuration.md)** - Complete configuration documentation
+- **[Local Servers](../../mcp/local_servers.md)** - Built-in MCP servers
+- **[Remote Servers](../../mcp/remote_servers.md)** - HTTP and stdio servers
+- **[Creating MCP Servers](../../tutorials/creating_mcp_servers.md)** - Creating custom MCP servers
+
+**Server Type Documentation:**
+
+- **[Action Servers](../../mcp/action.md)** - Action server documentation
+- **[Data Collection Servers](../../mcp/data_collection.md)** - Data collection server documentation
+
+### Detailed Server Documentation
+
+Each MCP server has comprehensive documentation:
+
+| Server | Documentation | Command Details |
+|--------|--------------|----------------|
+| UICollector | [UICollector Server](../../mcp/servers/ui_collector.md) | Screenshot, window info, control detection commands |
+| HostUIExecutor | [HostUIExecutor Server](../../mcp/servers/host_ui_executor.md) | Window management and desktop automation commands |
+| CommandLineExecutor | [CommandLine Executor](../../mcp/servers/command_line_executor.md) | Shell command execution |
+
+!!!warning "Command Details Subject to Change"
+    Specific command parameters, names, and behaviors may change as MCP servers evolve. Always refer to the server-specific documentation for the most up-to-date command reference.
+
+---
+
+## Agent Configuration Settings
+
+### HostAgent Configuration
+
+```yaml
+# config/ufo/host_agent_config.yaml
+system:
+  # Control detection backend
+  control_backend:
+    - "uia"         # Windows UI Automation
+    - "omniparser"  # Vision-based detection
+
+  # Screenshot settings
+  save_full_screen: true         # Capture desktop screenshots
+  save_ui_tree: true             # Save UI tree JSON
+  include_last_screenshot: true  # Include previous step
+  concat_screenshot: true        # Concatenate clean + annotated
+
+  # Window behavior
+  maximize_window: false               # Maximize on selection
+  show_visual_outline_on_screen: true  # Draw red outline
+```
+
+See **[Configuration Overview](../../configuration/system/overview.md)** and **[System Configuration](../../configuration/system/system_config.md)** for complete configuration options.
+
+---
+
+## Related Documentation
+
+**Architecture & Design:**
+
+- **[HostAgent Overview](overview.md)** - High-level HostAgent architecture
+- **[State Machine](state.md)** - 7-state FSM documentation
+- **[Processing Strategy](strategy.md)** - 4-phase processing pipeline
+- **[AppAgent Commands](../app_agent/commands.md)** - Application-level commands
+
+**Core Features:**
+
+- **[Hybrid Actions](../core_features/hybrid_actions.md)** - MCP command system architecture
+- **[Control Detection](../core_features/control_detection/overview.md)** - UIA and OmniParser backends
+- **[Command Dispatcher](../../infrastructure/modules/dispatcher.md)** - Command routing
+
+---
+
+## Summary
+
+**Key Takeaways:**
+
+- **MCP-Based**: All commands provided by MCP servers configured in `mcp.yaml`
+- **Dynamic Discovery**: Commands discovered at runtime via `list_tools`
+- **Desktop-Level**: System-wide operations (screenshots, window management)
+- **Configurable**: Extensive MCP server configuration options
+- **Documented**: Each server has detailed command reference
+
+!!!warning
+    Command details are subject to change. Refer to the server documentation for the latest information.
+
+**Next Steps:**
+
+1. **Review MCP Configuration**: [MCP Configuration Reference](../../configuration/system/mcp_reference.md)
+2. **Explore Server Documentation**: Click server links above for command details
+3. **Understand Processing**: [Processing Strategy](strategy.md) shows commands in action
+4. **Learn State Machine**: [State Machine](state.md) explains when commands execute
diff --git a/documents/docs/ufo2/host_agent/overview.md b/documents/docs/ufo2/host_agent/overview.md
new file mode 100644
index 000000000..a738674b1
--- /dev/null
+++ b/documents/docs/ufo2/host_agent/overview.md
@@ -0,0 +1,196 @@
+# HostAgent: Desktop Orchestrator
+
+**HostAgent** serves as the centralized control plane of UFO². It interprets user-specified goals, decomposes them into structured subtasks, instantiates and dispatches AppAgent modules, and coordinates their progress across the system. HostAgent provides system-level services for introspection, planning, application lifecycle management, and multi-agent synchronization.
+
+---
+
+## Architecture Overview
+
+Operating atop the native Windows substrate, HostAgent monitors active applications, issues shell commands to spawn new processes as needed, and manages the creation and teardown of application-specific AppAgent instances. All coordination occurs through a persistent state machine, which governs the transitions across execution phases.
+
+
+ 
+ Figure: HostAgent architecture showing the finite state machine, processing pipeline, and interactions with AppAgents through the Blackboard pattern.
+
+
+---
+
+## Core Responsibilities
+
+### Task Decomposition
+
+Given a user's natural language input, HostAgent identifies the underlying task goal and decomposes it into a dependency-ordered subtask graph.
+
+**Example:** User request "Extract data from Word and create an Excel chart" becomes:
+
+1. Extract table from Word document
+2. Create chart in Excel with extracted data
+
+
+ 
+ Figure: HostAgent decomposes user requests into sequential subtasks, assigns each to the appropriate application, and orchestrates AppAgents to complete them in dependency order.
+
+
+### Application Lifecycle Management
+
+For each subtask, HostAgent inspects system process metadata (via UIA APIs) to determine whether the target application is running. If not, it launches the program and registers it with the runtime.
+
+### AppAgent Instantiation
+
+HostAgent spawns the corresponding AppAgent for each active application, providing it with task context, memory references, and relevant toolchains (e.g., APIs, documentation).
+
+### Task Scheduling and Control
+
+The global execution plan is serialized into a finite state machine (FSM), allowing HostAgent to enforce execution order, detect failures, and resolve dependencies across agents. See **[State Machine Details](state.md)** for the FSM architecture.
+
+### Shared State Communication
+
+HostAgent reads from and writes to a global blackboard, enabling inter-agent communication and system-level observability for debugging and replay.
+
+---
+
+## Key Characteristics
+
+- **Scope**: Desktop-level orchestrator (system-wide, not application-specific)
+- **Lifecycle**: Single instance per session, persists throughout task execution
+- **Hierarchy**: Parent agent that manages multiple child AppAgents
+- **Communication**: Owns and coordinates the shared Blackboard
+- **Control**: 7-state finite state machine with 4-phase processing pipeline
+
+---
+
+## Execution Workflow
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant HostAgent
+ participant Blackboard
+ participant AppAgent1
+ participant AppAgent2
+
+ User->>HostAgent: "Extract Word table, create Excel chart"
+ HostAgent->>HostAgent: Decompose into subtasks
+ HostAgent->>Blackboard: Write subtask 1
+ HostAgent->>AppAgent1: Create/Get Word AppAgent
+ AppAgent1->>AppAgent1: Execute Word task
+ AppAgent1->>Blackboard: Write result 1
+ AppAgent1-->>HostAgent: Return FINISH
+
+ HostAgent->>Blackboard: Read result 1
+ HostAgent->>Blackboard: Write subtask 2
+ HostAgent->>AppAgent2: Create/Get Excel AppAgent
+ AppAgent2->>Blackboard: Read result 1
+ AppAgent2->>AppAgent2: Execute Excel task
+ AppAgent2->>Blackboard: Write result 2
+ AppAgent2-->>HostAgent: Return FINISH
+
+ HostAgent->>HostAgent: Verify completion
+ HostAgent-->>User: Task completed
+```
+
+---
+
+## Deep Dive Topics
+
+- **[State Machine](state.md)**: 7-state FSM architecture and transitions
+- **[Processing Strategy](strategy.md)**: 4-phase processing pipeline
+- **[Command System](commands.md)**: Desktop-level MCP commands
+
+---
+
+## Input and Output
+
+### HostAgent Input
+
+| Input | Description | Type |
+|-------|-------------|------|
+| User Request | Natural language task description | String |
+| Application Information | Active application metadata | List of Dicts |
+| Desktop Screenshots | Visual context of desktop state | Image |
+| Previous Sub-Tasks | Completed subtask history | List of Dicts |
+| Previous Plan | Planned future subtasks | List of Strings |
+| Blackboard | Shared memory space | Dictionary |
+
+### HostAgent Output
+
+| Output | Description | Type |
+|--------|-------------|------|
+| Observation | Desktop screenshot analysis | String |
+| Thought | Reasoning process | String |
+| Current Sub-Task | Active subtask description | String |
+| Message | Information for AppAgent | String |
+| ControlLabel | Selected application index | String |
+| ControlText | Selected application name | String |
+| Plan | Future subtask sequence | List of Strings |
+| Status | Agent state (CONTINUE/ASSIGN/FINISH/etc.) | String |
+| Comment | User-facing information | String |
+| Questions | Clarification requests | List of Strings |
+| Bash | System command to execute | String |
+
+**Example Output:**
+```json
+{
+ "Observation": "Desktop shows Microsoft Word with document open containing a table",
+ "Thought": "User wants to extract data from Word first",
+ "Current Sub-Task": "Extract the table data from the document",
+ "Message": "Starting data extraction from Word document",
+ "ControlLabel": "0",
+ "ControlText": "Microsoft Word - Document1",
+ "Plan": ["Extract table from Word", "Create chart in Excel"],
+ "Status": "ASSIGN",
+ "Comment": "Delegating table extraction to Word AppAgent",
+ "Questions": [],
+ "Bash": ""
+}
+```
+
+---
+
+## Related Documentation
+
+**Architecture & Design:**
+
+- **[Windows Agent Overview](../overview.md)**: Module architecture and hierarchy
+- **[AppAgent](../app_agent/overview.md)**: Application automation agent
+- **[Blackboard](../../infrastructure/agents/design/blackboard.md)**: Inter-agent communication
+- **[Memory System](../../infrastructure/agents/design/memory.md)**: Execution history
+
+**Configuration:**
+
+- **[Configuration System Overview](../../configuration/system/overview.md)**: System configuration structure
+- **[Agents Configuration](../../configuration/system/agents_config.md)**: LLM and agent settings
+- **[System Configuration](../../configuration/system/system_config.md)**: Runtime and execution settings
+- **[MCP Reference](../../configuration/system/mcp_reference.md)**: MCP server configuration
+
+**System Integration:**
+
+- **[Session Management](../../infrastructure/modules/session.md)**: Session lifecycle
+- **[Round Management](../../infrastructure/modules/round.md)**: Execution rounds
+
+---
+
+## API Reference
+
+:::agents.agent.host_agent.HostAgent
+
+---
+
+## Summary
+
+HostAgent is the desktop-level orchestrator that:
+
+- Decomposes tasks and coordinates AppAgents
+- Operates at system level, not application level
+- Uses a 7-state FSM (CONTINUE, ASSIGN, FINISH, FAIL, ERROR, PENDING, CONFIRM); a typical flow is CONTINUE → ASSIGN → AppAgent → CONTINUE → FINISH
+- Executes a 4-phase pipeline: DATA_COLLECTION → LLM → ACTION → MEMORY
+- Creates, caches, and reuses AppAgent instances
+- Provides shared Blackboard memory for all agents
+- Maintains single instance per session managing multiple AppAgents
+
+**Next Steps:**
+
+1. Read [State Machine](state.md) for FSM details
+2. Read [Processing Strategy](strategy.md) for pipeline architecture
+3. Read [Command System](commands.md) for available desktop operations
+4. Read [AppAgent](../app_agent/overview.md) for application-level execution
diff --git a/documents/docs/ufo2/host_agent/state.md b/documents/docs/ufo2/host_agent/state.md
new file mode 100644
index 000000000..2c55479a7
--- /dev/null
+++ b/documents/docs/ufo2/host_agent/state.md
@@ -0,0 +1,597 @@
+# HostAgent State Machine
+
+!!!abstract "Overview"
+    HostAgent uses a **7-state finite state machine (FSM)** to manage task orchestration flow. The state machine controls task decomposition, application selection, AppAgent delegation, and completion verification. States transition based on LLM decisions and system events.
+
+---
+
+## State Machine Architecture
+
+### State Enumeration
+
+```python
+class HostAgentStatus(Enum):
+    """Store the status of the host agent"""
+    ERROR = "ERROR"        # Unhandled exception or system error
+    FINISH = "FINISH"      # Task completed successfully
+    CONTINUE = "CONTINUE"  # Active processing state
+    ASSIGN = "ASSIGN"      # Delegate to AppAgent
+    FAIL = "FAIL"          # Task failed, cannot proceed
+    PENDING = "PENDING"    # Await external event or user input
+    CONFIRM = "CONFIRM"    # Request user approval
+```
+
+### State Management
+
+HostAgent states are managed by `HostAgentStateManager`, which implements a singleton registry pattern:
+
+```python
+class HostAgentStateManager(AgentStateManager):
+    """Manages the states of the host agent"""
+    _state_mapping: Dict[str, Type[HostAgentState]] = {}
+
+    @property
+    def none_state(self) -> AgentState:
+        return NoneHostAgentState()
+```
+
+All HostAgent states are registered using the `@HostAgentStateManager.register` decorator, enabling dynamic state lookup by name.
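+
+The registration mechanism can be pictured as a class-level dictionary that the decorator fills in. The sketch below is a simplified stand-in, not UFO's `AgentStateManager`: the names `StateRegistry`, `DemoState`, and `get_state` are hypothetical.
+
+```python
+from typing import Dict, Type
+
+
+class DemoState:
+    """Hypothetical base class; each state reports its own name."""
+
+    @classmethod
+    def name(cls) -> str:
+        return ""
+
+
+class StateRegistry:
+    """Maps state names to state classes."""
+
+    _state_mapping: Dict[str, Type[DemoState]] = {}
+
+    @classmethod
+    def register(cls, state_class: Type[DemoState]) -> Type[DemoState]:
+        # Used as a decorator: store the class under its name, return it unchanged.
+        cls._state_mapping[state_class.name()] = state_class
+        return state_class
+
+    @classmethod
+    def get_state(cls, status: str) -> DemoState:
+        # Look up the class by status string and instantiate it.
+        return cls._state_mapping[status]()
+
+
+@StateRegistry.register
+class ContinueDemoState(DemoState):
+    @classmethod
+    def name(cls) -> str:
+        return "CONTINUE"
+
+
+state = StateRegistry.get_state("CONTINUE")
+print(type(state).__name__)
+```
+
+Keying the mapping by the status string lets a `Status` value from an LLM response be turned into a state object without a chain of `if` statements.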
+
+---
+
+## State Definitions
+
+### 1. CONTINUE State
+
+**Purpose**: Active orchestration state where HostAgent executes its 4-phase processing pipeline.
+
+```python
+@HostAgentStateManager.register
+class ContinueHostAgentState(HostAgentState):
+    """The class for the continue host agent state"""
+
+    async def handle(self, agent: "HostAgent", context: Optional["Context"] = None):
+        """Execute the 4-phase processing pipeline"""
+        await agent.process(context)
+
+    def is_round_end(self) -> bool:
+        return False  # Round continues
+
+    @classmethod
+    def name(cls) -> str:
+        return HostAgentStatus.CONTINUE.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Active |
+| **Processor Executed** | ✓ Yes (4 phases) |
+| **Round Ends** | No |
+| **Duration** | Single round |
+| **Next States** | CONTINUE, ASSIGN, FINISH, PENDING, CONFIRM, ERROR |
+
+**Behavior**:
+
+1. Captures desktop screenshot
+2. LLM analyzes desktop and selects application
+3. Updates context with selected application
+4. Records orchestration step in memory
+
+**Example Usage:**
+
+```python
+# HostAgent in CONTINUE state
+agent.status = HostAgentStatus.CONTINUE.value
+state = ContinueHostAgentState()
+agent.set_state(state)
+
+# State executes 4-phase pipeline
+await state.handle(agent, context)
+
+# LLM sets next status in response
+# {"Status": "ASSIGN", "ControlText": "Microsoft Word"}
+```
+
+---
+
+### 2. ASSIGN State
+
+**Purpose**: Create or retrieve AppAgent for the selected application and delegate execution.
+
+```python
+@HostAgentStateManager.register
+class AssignHostAgentState(HostAgentState):
+    """The class for the assign host agent state"""
+
+    async def handle(self, agent: "HostAgent", context: Optional["Context"] = None):
+        """Create/get AppAgent for selected application"""
+        agent.create_subagent(context)
+
+    def next_state(self, agent: "HostAgent") -> "AppAgentState":
+        """Transition to AppAgent's CONTINUE state"""
+        next_agent = self.next_agent(agent)
+
+        if type(next_agent) == OpenAIOperatorAgent:
+            return ContinueOpenAIOperatorState()
+        else:
+            return ContinueAppAgentState()
+
+    def next_agent(self, agent: "HostAgent") -> "AppAgent":
+        """Get the active AppAgent for delegation"""
+        return agent.get_active_appagent()
+
+    @classmethod
+    def name(cls) -> str:
+        return HostAgentStatus.ASSIGN.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Transition |
+| **Processor Executed** | ✗ No |
+| **Round Ends** | No |
+| **Duration** | Immediate |
+| **Next States** | AppAgent.CONTINUE |
+| **Next Agent** | AppAgent (switched) |
+
+**Behavior**:
+
+1. Checks if AppAgent for application already exists (cache)
+2. Creates new AppAgent if not cached
+3. Sets parent-child relationship (`app_agent.host = self`)
+4. Shares Blackboard (`app_agent.blackboard = self.blackboard`)
+5. Transitions to `AppAgent.CONTINUE` state
+
+**AppAgent Caching:**
+
+```python
+# HostAgent maintains a cache of created AppAgents
+agent_key = f"{app_root}/{process_name}"
+
+if agent_key in self.appagent_dict:
+    # Reuse existing AppAgent
+    self._active_appagent = self.appagent_dict[agent_key]
+else:
+    # Create new AppAgent
+    app_agent = AgentFactory.create_agent(**config)
+    self.appagent_dict[agent_key] = app_agent
+    self._active_appagent = app_agent
+```
+
+---
+
+### 3. FINISH State
+
+**Purpose**: Task completed successfully, terminate session.
+
+```python
+@HostAgentStateManager.register
+class FinishHostAgentState(HostAgentState):
+    """The class for the finish host agent state"""
+
+    def is_round_end(self) -> bool:
+        return True  # Round ends
+
+    @classmethod
+    def name(cls) -> str:
+        return HostAgentStatus.FINISH.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Terminal |
+| **Processor Executed** | ✗ No |
+| **Round Ends** | Yes |
+| **Duration** | Permanent |
+| **Next States** | None |
+
+**Behavior**:
+
+- Session terminates successfully
+- All subtasks completed
+- Results available in Blackboard
+
+---
+
+### 4. FAIL State
+
+**Purpose**: Task failed, cannot proceed further.
+
+```python
+@HostAgentStateManager.register
+class FailHostAgentState(HostAgentState):
+    """The class for the fail host agent state"""
+
+    def is_round_end(self) -> bool:
+        return True  # Round ends
+
+    def next_state(self, agent: "HostAgent") -> AgentState:
+        return FinishHostAgentState()  # Transition to FINISH for cleanup
+
+    @classmethod
+    def name(cls) -> str:
+        return HostAgentStatus.FAIL.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Terminal |
+| **Processor Executed** | ✗ No |
+| **Round Ends** | Yes |
+| **Duration** | Permanent |
+| **Next States** | FINISH (for cleanup) |
+
+**Behavior**:
+
+- Task cannot be completed
+- May result from user rejection or irrecoverable error
+- Transitions to FINISH for graceful shutdown
+
+---
+
+### 5. ERROR State
+
+**Purpose**: Unhandled exception or critical system error.
+
+```python
+@HostAgentStateManager.register
+class ErrorHostAgentState(HostAgentState):
+    """The class for the error host agent state"""
+
+    def is_round_end(self) -> bool:
+        return True  # Round ends
+
+    def next_state(self, agent: "HostAgent") -> AgentState:
+        return FinishHostAgentState()  # Transition to FINISH for cleanup
+
+    @classmethod
+    def name(cls) -> str:
+        return HostAgentStatus.ERROR.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Terminal |
+| **Processor Executed** | ✗ No |
+| **Round Ends** | Yes |
+| **Duration** | Permanent |
+| **Next States** | FINISH (for cleanup) |
+
+**Behavior**:
+
+- Critical system error occurred
+- Unhandled exception during processing
+- Automatically triggers graceful shutdown
+
+**Error vs Fail:**
+
+- **ERROR**: System/code errors (exceptions, crashes)
+- **FAIL**: Logical task failures (user rejection, impossible task)
+
+---
+
+### 6. PENDING State
+
+**Purpose**: Await external event or user input before continuing.
+
+```python
+@HostAgentStateManager.register
+class PendingHostAgentState(HostAgentState):
+    """The class for the pending host agent state"""
+
+    async def handle(self, agent: "HostAgent", context: Optional["Context"] = None):
+        """Ask the user questions to help the agent proceed"""
+        agent.process_asker(ask_user=ufo_config.system.ask_question)
+
+    def next_state(self, agent: "HostAgent") -> AgentState:
+        """Return to CONTINUE after receiving input"""
+        agent.status = HostAgentStatus.CONTINUE.value
+        return ContinueHostAgentState()
+
+    @classmethod
+    def name(cls) -> str:
+        return HostAgentStatus.PENDING.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Waiting |
+| **Processor Executed** | ✗ No |
+| **Round Ends** | No |
+| **Duration** | Until event/timeout |
+| **Next States** | CONTINUE, FAIL |
+
+**Behavior**:
+
+- Requests additional information from user
+- Waits for external event (async operation)
+- Transitions to CONTINUE after receiving input
+- May timeout and transition to FAIL
+
+---
+
+### 7. CONFIRM State
+
+**Purpose**: Request user approval before proceeding with action.
+
+```python
+@HostAgentStateManager.register
+class ConfirmHostAgentState(HostAgentState):
+    """The class for the confirm host agent state"""
+
+    async def handle(self, agent: "HostAgent", context: Optional["Context"] = None):
+        """Request user confirmation"""
+        # Confirmation logic handled by processor
+        pass
+
+    @classmethod
+    def name(cls) -> str:
+        return HostAgentStatus.CONFIRM.value
+```
+
+| Property | Value |
+|----------|-------|
+| **Type** | Waiting |
+| **Processor Executed** | ✓ Yes (collect confirmation) |
+| **Round Ends** | No |
+| **Duration** | Until user responds |
+| **Next States** | CONTINUE (approved), FAIL (rejected) |
+
+**Behavior**:
+
+- Displays confirmation request to user
+- Waits for user approval/rejection
+- CONTINUE if approved
+- FAIL if rejected
+
+**Safety Check:**
+
+CONFIRM state provides a safety mechanism for sensitive operations such as application launches, file deletions, and system configuration changes.
+
+---
+
+## State Transition Diagram
+
+
+ 
+ HostAgent State Machine: Visual representation of the 7-state FSM with transitions and conditions
+
+
+---
+
+## State Transition Control
+
+### LLM-Driven Transitions
+
+Most state transitions are controlled by the LLM through the `Status` field in its response:
+
+```json
+{
+ "Observation": "Desktop shows Word and Excel. User wants to extract data from Word.",
+ "Thought": "I should start with Word to extract the table data first.",
+ "Current Sub-Task": "Extract table data from Word document",
+ "ControlLabel": "0",
+ "ControlText": "Microsoft Word - Document1",
+ "Status": "ASSIGN",
+ "Comment": "Delegating data extraction to Word AppAgent"
+}
+```
+
+**Transition Flow**:
+
+1. HostAgent in `CONTINUE` state executes processor
+2. LLM analyzes desktop and decides next action
+3. LLM sets `Status: "ASSIGN"` in response
+4. Processor updates `agent.status = "ASSIGN"`
+5. State machine transitions: `CONTINUE` → `ASSIGN`
+6. `ASSIGN` state creates/gets AppAgent
+7. Transitions to `AppAgent.CONTINUE`
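+
+The flow above can be sketched in a few lines. This is only a minimal illustration, assuming a hypothetical `ToyAgent` and `apply_llm_status` helper rather than the real `HostAgent` and processor classes: the LLM's `Status` field is copied onto the agent, and that stored value is what the state machine consults to pick the next state.
+
+```python
+from enum import Enum
+
+
+class Status(Enum):
+    """Subset of status values used in this sketch."""
+
+    CONTINUE = "CONTINUE"
+    ASSIGN = "ASSIGN"
+    FINISH = "FINISH"
+
+
+class ToyAgent:
+    """Hypothetical stand-in for HostAgent; it only tracks a status string."""
+
+    def __init__(self) -> None:
+        self.status = Status.CONTINUE.value
+
+
+def apply_llm_status(agent: ToyAgent, llm_response: dict) -> str:
+    """Copy the LLM's Status field onto the agent (step 4 of the flow)."""
+    agent.status = llm_response.get("Status", Status.CONTINUE.value).upper()
+    return agent.status
+
+
+agent = ToyAgent()
+previous = agent.status
+current = apply_llm_status(agent, {"Status": "ASSIGN", "ControlLabel": "0"})
+print(f"{previous} -> {current}")
+```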
+
+### System-Driven Transitions
+
+Some transitions are automatic and controlled by the system:
+
+| From State | To State | Trigger | Controller |
+|------------|----------|---------|------------|
+| ASSIGN | AppAgent.CONTINUE | AppAgent created | System |
+| AppAgent.CONTINUE | CONTINUE | AppAgent returns | System |
+| PENDING | FAIL | Timeout | System |
+| CONFIRM | CONTINUE | User approved | User Input |
+| CONFIRM | FAIL | User rejected | User Input |
+| ERROR | FINISH | Exception caught | System |
+| FAIL | FINISH | Cleanup needed | System |
+
+---
+
+## Complete Execution Flow Example
+
+### Multi-Application Task
+
+**User Request**: "Extract sales table from Word and create bar chart in Excel"
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant HostAgent
+ participant WordAppAgent
+ participant ExcelAppAgent
+
+ Note over HostAgent: State: CONTINUE
+ User->>HostAgent: "Extract Word table, create Excel chart"
+ HostAgent->>HostAgent: Phase 1: Capture desktop Phase 2: LLM analyzes
+ Note over HostAgent: LLM Decision: Status=ASSIGN
+ HostAgent->>HostAgent: Phase 3: Update context Phase 4: Record memory
+
+ Note over HostAgent: State: ASSIGN
+ HostAgent->>WordAppAgent: create_subagent("Word")
+ Note over HostAgent,WordAppAgent: Agent Handoff
+
+ Note over WordAppAgent: State: AppAgent.CONTINUE
+ WordAppAgent->>WordAppAgent: Capture Word UI Select table Execute copy
+ WordAppAgent->>HostAgent: Return Status=FINISH
+
+ Note over HostAgent: State: CONTINUE
+ HostAgent->>HostAgent: Phase 2: LLM sees Word result Decides Excel next
+ Note over HostAgent: LLM Decision: Status=ASSIGN
+
+ Note over HostAgent: State: ASSIGN
+ HostAgent->>ExcelAppAgent: create_subagent("Excel")
+ Note over HostAgent,ExcelAppAgent: Agent Handoff
+
+ Note over ExcelAppAgent: State: AppAgent.CONTINUE
+ ExcelAppAgent->>ExcelAppAgent: Paste data Insert chart Format
+ ExcelAppAgent->>HostAgent: Return Status=FINISH
+
+ Note over HostAgent: State: CONTINUE
+ HostAgent->>HostAgent: Phase 2: LLM confirms complete
+ Note over HostAgent: LLM Decision: Status=FINISH
+
+ Note over HostAgent: State: FINISH
+ HostAgent->>User: Task completed!
+```
+
+### Step-by-Step State Transitions
+
+| Step | Agent | State | Action | Next State |
+|------|-------|-------|--------|------------|
+| 1 | HostAgent | CONTINUE | Analyze desktop, select Word | ASSIGN |
+| 2 | HostAgent | ASSIGN | Create WordAppAgent | AppAgent.CONTINUE |
+| 3 | WordAppAgent | CONTINUE | Extract table | FINISH |
+| 4 | HostAgent | CONTINUE | Analyze result, select Excel | ASSIGN |
+| 5 | HostAgent | ASSIGN | Create ExcelAppAgent | AppAgent.CONTINUE |
+| 6 | ExcelAppAgent | CONTINUE | Create chart | FINISH |
+| 7 | HostAgent | CONTINUE | Verify completion | FINISH |
+| 8 | HostAgent | FINISH | Session ends | - |
+
+---
+
+## Implementation Details
+
+### State Class Hierarchy
+
+```python
+# Base state interface
+class HostAgentState(AgentState):
+    """Abstract class for host agent states"""
+
+    async def handle(self, agent: "HostAgent", context: Optional["Context"] = None):
+        """Execute state-specific logic"""
+        pass
+
+    def next_state(self, agent: "HostAgent") -> AgentState:
+        """Determine next state based on agent status"""
+        status = agent.status
+        return HostAgentStateManager().get_state(status)
+
+    def next_agent(self, agent: "HostAgent") -> "HostAgent":
+        """Get agent for next step (usually same agent)"""
+        return agent
+
+    def is_round_end(self) -> bool:
+        """Check if round should end"""
+        return False
+
+    @classmethod
+    def agent_class(cls) -> Type["HostAgent"]:
+        from ufo.agents.agent.host_agent import HostAgent
+        return HostAgent
+```
+
+### State Registration Pattern
+
+```python
+# Registration decorator adds state to manager
+@HostAgentStateManager.register
+class ContinueHostAgentState(HostAgentState):
+    @classmethod
+    def name(cls) -> str:
+        return HostAgentStatus.CONTINUE.value
+
+# Manager can look up states by name
+state = HostAgentStateManager().get_state("CONTINUE")
+# Returns: ContinueHostAgentState instance
+```
+
+**Lazy Loading:**
+
+States are loaded lazily by `HostAgentStateManager` only when needed, reducing initialization overhead.
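+
+As a rough illustration of the registration and lazy-loading ideas, the sketch below uses a hypothetical `ToyStateManager`. It is not the real `HostAgentStateManager`, whose internals are not shown here. The decorator records only the class, and an instance is created on first lookup and then reused.
+
+```python
+from typing import Dict, Type
+
+
+class ToyState:
+    """Hypothetical base state; subclasses report their status name."""
+
+    @classmethod
+    def name(cls) -> str:
+        raise NotImplementedError
+
+
+class ToyStateManager:
+    """Hypothetical registry used only for this sketch."""
+
+    _classes: Dict[str, Type[ToyState]] = {}
+    _instances: Dict[str, ToyState] = {}
+
+    @classmethod
+    def register(cls, state_class: Type[ToyState]) -> Type[ToyState]:
+        """Class decorator: record the class under its status name."""
+        cls._classes[state_class.name()] = state_class
+        return state_class
+
+    @classmethod
+    def get_state(cls, status: str) -> ToyState:
+        """Instantiate on first lookup, then return the cached instance."""
+        if status not in cls._instances:
+            cls._instances[status] = cls._classes[status]()
+        return cls._instances[status]
+
+
+@ToyStateManager.register
+class ToyContinueState(ToyState):
+    @classmethod
+    def name(cls) -> str:
+        return "CONTINUE"
+
+
+first = ToyStateManager.get_state("CONTINUE")
+second = ToyStateManager.get_state("CONTINUE")
+print(first is second)
+```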
+
+### State Transition in Round Execution
+
+```python
+# In Round.run() method
+while not state.is_round_end():
+    # Execute current state
+    await state.handle(agent, context)
+
+    # Get next state based on agent.status
+    state = state.next_state(agent)
+
+    # Check if agent switched (HostAgent → AppAgent)
+    agent = state.next_agent(agent)
+```
+
+---
+
+## State Transition Table
+
+### Complete Transition Matrix
+
+| From \ To | CONTINUE | ASSIGN | FINISH | FAIL | ERROR | PENDING | CONFIRM | AppAgent.CONTINUE |
+|-----------|----------|--------|--------|------|-------|---------|---------|-------------------|
+| **CONTINUE** | ✓ LLM | ✓ LLM | ✓ LLM | ✗ | ✓ System | ✓ LLM | ✓ LLM | ✗ |
+| **ASSIGN** | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ System |
+| **FINISH** | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
+| **FAIL** | ✗ | ✗ | ✓ System | ✗ | ✗ | ✗ | ✗ | ✗ |
+| **ERROR** | ✗ | ✗ | ✓ System | ✗ | ✗ | ✗ | ✗ | ✗ |
+| **PENDING** | ✓ User | ✗ | ✗ | ✓ Timeout | ✗ | ✗ | ✗ | ✗ |
+| **CONFIRM** | ✓ User | ✗ | ✗ | ✓ User | ✗ | ✗ | ✗ | ✗ |
+| **AppAgent.CONTINUE** | ✓ System | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
+
+**Legend**:
+
+- ✓ LLM: Transition controlled by LLM decision
+- ✓ System: Automatic system transition
+- ✓ User: User input required
+- ✓ Timeout: Timeout triggers transition
+- ✗: Transition not allowed
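+
+The matrix can also be read as a lookup table. The snippet below is a hedged sketch that transcribes the rows above into a plain dictionary; the names `ALLOWED` and `controller` are illustrative and do not come from the UFO codebase.
+
+```python
+from typing import Dict, Optional
+
+# Each source state maps to its allowed targets and who controls the transition.
+ALLOWED: Dict[str, Dict[str, str]] = {
+    "CONTINUE": {
+        "CONTINUE": "LLM",
+        "ASSIGN": "LLM",
+        "FINISH": "LLM",
+        "ERROR": "System",
+        "PENDING": "LLM",
+        "CONFIRM": "LLM",
+    },
+    "ASSIGN": {"AppAgent.CONTINUE": "System"},
+    "FINISH": {},
+    "FAIL": {"FINISH": "System"},
+    "ERROR": {"FINISH": "System"},
+    "PENDING": {"CONTINUE": "User", "FAIL": "Timeout"},
+    "CONFIRM": {"CONTINUE": "User", "FAIL": "User"},
+    "AppAgent.CONTINUE": {"CONTINUE": "System"},
+}
+
+
+def controller(src: str, dst: str) -> Optional[str]:
+    """Return who controls the src -> dst transition, or None if not allowed."""
+    return ALLOWED.get(src, {}).get(dst)
+
+
+print(controller("CONTINUE", "ASSIGN"))
+print(controller("CONTINUE", "FAIL"))
+```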
+
+---
+
+## Related Documentation
+
+**Architecture & Design:**
+
+- **[Overview](overview.md)**: HostAgent high-level architecture
+- **[Processing Strategy](strategy.md)**: 4-phase processing pipeline
+- **[State Design Pattern](../../infrastructure/agents/design/state.md)**: General state framework
+- **[AppAgent State Machine](../app_agent/state.md)**: AppAgent FSM comparison
+
+**System Integration:**
+
+- **[Round Management](../../infrastructure/modules/round.md)**: How states execute in rounds
+- **[Session Management](../../infrastructure/modules/session.md)**: Session lifecycle
+
+---
+
+## Summary
+
+**Key Takeaways:**
+
+- **7 States**: CONTINUE, ASSIGN, FINISH, FAIL, ERROR, PENDING, CONFIRM
+- **LLM Control**: Most transitions driven by LLM's `Status` field
+- **Agent Handoff**: ASSIGN state transitions to AppAgent.CONTINUE
+- **Terminal States**: FINISH, FAIL, and ERROR end the session (FAIL and ERROR pass through FINISH for cleanup)
+- **Safety Checks**: CONFIRM and PENDING provide user control
+- **State Pattern**: Implements Gang of Four State design pattern
+- **Singleton Registry**: HostAgentStateManager manages all states
+
+**Next Steps:**
+
+- Read [Processing Strategy](strategy.md) to understand what happens in CONTINUE state
+- Read [Command System](commands.md) for available desktop operations
+- Read [AppAgent State Machine](../app_agent/state.md) for comparison
diff --git a/documents/docs/ufo2/host_agent/strategy.md b/documents/docs/ufo2/host_agent/strategy.md
new file mode 100644
index 000000000..8966da50d
--- /dev/null
+++ b/documents/docs/ufo2/host_agent/strategy.md
@@ -0,0 +1,1167 @@
+# HostAgent Processing Strategy
+
+HostAgent executes a **4-phase processing pipeline** in the **CONTINUE** and **CONFIRM** states. Each phase handles one aspect of desktop orchestration: **data collection**, **LLM decision making**, **action execution**, and **memory recording**. This document describes how each strategy is implemented in the codebase.
+
+---
+
+## Strategy Assembly
+
+Processing strategies are **assembled and orchestrated** by the `HostAgentProcessor` class defined in `ufo/agents/processors/host_agent_processor.py`. The processor acts as the **coordinator** that initializes, configures, and executes the 4-phase pipeline.
+
+### HostAgentProcessor Overview
+
+The `HostAgentProcessor` extends `ProcessorTemplate` and serves as the main orchestrator for HostAgent workflows:
+
+```python
+class HostAgentProcessor(ProcessorTemplate):
+    """
+    Enhanced processor for Host Agent with comprehensive functionality.
+
+    Manages the complete workflow including:
+    - Desktop environment analysis and screenshot capture
+    - Application window detection and registration
+    - Third-party agent integration and management
+    - LLM-based decision making with context-aware prompting
+    - Action execution including application selection and command dispatch
+    - Memory management with detailed logging and state tracking
+    """
+
+    processor_context_class = HostAgentProcessorContext
+
+    def __init__(self, agent: "HostAgent", global_context: Context):
+        super().__init__(agent, global_context)
+```
+
+### Strategy Registration
+
+During initialization, `HostAgentProcessor._setup_strategies()` registers all four processing strategies:
+
+```python
+def _setup_strategies(self) -> None:
+    """Configure processing strategies with error handling and logging."""
+
+    # Phase 1: Desktop data collection (critical - fail_fast=True)
+    self.strategies[ProcessingPhase.DATA_COLLECTION] = (
+        DesktopDataCollectionStrategy(
+            fail_fast=True  # Desktop data collection is critical
+        )
+    )
+
+    # Phase 2: LLM interaction (critical - fail_fast=True)
+    self.strategies[ProcessingPhase.LLM_INTERACTION] = (
+        HostLLMInteractionStrategy(
+            fail_fast=True  # LLM failure should trigger recovery
+        )
+    )
+
+    # Phase 3: Action execution (graceful - fail_fast=False)
+    self.strategies[ProcessingPhase.ACTION_EXECUTION] = (
+        HostActionExecutionStrategy(
+            fail_fast=False  # Action failures can be handled gracefully
+        )
+    )
+
+    # Phase 4: Memory update (graceful - fail_fast=False)
+    self.strategies[ProcessingPhase.MEMORY_UPDATE] = (
+        HostMemoryUpdateStrategy(
+            fail_fast=False  # Memory update failures shouldn't stop process
+        )
+    )
+```
+
+| Phase | Strategy Class | fail_fast | Rationale |
+|-------|---------------|-----------|-----------|
+| **DATA_COLLECTION** | `DesktopDataCollectionStrategy` | ✓ True | Desktop screenshot and window info are critical for LLM context |
+| **LLM_INTERACTION** | `HostLLMInteractionStrategy` | ✓ True | LLM response failure requires immediate recovery mechanism |
+| **ACTION_EXECUTION** | `HostActionExecutionStrategy` | ✗ False | Action failures can be gracefully handled and reported |
+| **MEMORY_UPDATE** | `HostMemoryUpdateStrategy` | ✗ False | Memory failures shouldn't block the main execution flow |
+
+**Fail-Fast vs Graceful:**
+
+The `fail_fast` parameter controls error propagation behavior:
+
+- **fail_fast=True**: Errors immediately halt the pipeline and trigger recovery (used for critical phases)
+- **fail_fast=False**: Errors are logged but don't stop execution (used for non-critical phases)
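+
+To make the two behaviors concrete, here is a minimal sketch of a pipeline runner. It assumes a hypothetical `run_pipeline` helper that takes `(name, callable, fail_fast)` tuples; the real `ProcessorTemplate` is more involved and its recovery logic is not reproduced here.
+
+```python
+from typing import Callable, List, Tuple
+
+Phase = Tuple[str, Callable[[], None], bool]  # (name, run, fail_fast)
+
+
+def run_pipeline(phases: List[Phase]) -> List[str]:
+    """Run phases in order; stop early only when a fail_fast phase raises."""
+    log: List[str] = []
+    for name, run, fail_fast in phases:
+        try:
+            run()
+            log.append(f"{name}: ok")
+        except Exception as exc:
+            log.append(f"{name}: failed ({exc})")
+            if fail_fast:
+                log.append("pipeline halted")
+                break
+    return log
+
+
+def ok() -> None:
+    """Phase that succeeds."""
+
+
+def boom() -> None:
+    """Phase that always fails, to exercise both code paths."""
+    raise RuntimeError("simulated failure")
+
+
+# A failure in a graceful phase is logged and the pipeline keeps going.
+graceful = run_pipeline([
+    ("DATA_COLLECTION", ok, True),
+    ("LLM_INTERACTION", ok, True),
+    ("ACTION_EXECUTION", boom, False),
+    ("MEMORY_UPDATE", ok, False),
+])
+
+# A failure in a critical phase halts everything after it.
+critical = run_pipeline([
+    ("DATA_COLLECTION", boom, True),
+    ("LLM_INTERACTION", ok, True),
+])
+
+print(graceful)
+print(critical)
+```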
+
+### Middleware Configuration
+
+The processor also configures specialized logging middleware:
+
+```python
+def _setup_middleware(self) -> None:
+    """Set up enhanced middleware chain with comprehensive monitoring."""
+    self.middleware_chain = [
+        HostAgentLoggingMiddleware(),  # Specialized logging for Host Agent
+    ]
+```
+
+**HostAgentLoggingMiddleware** provides:
+
+- Round and step progress tracking
+- Rich Panel displays with color coding
+- Application selection logging
+- Detailed error context reporting
+
+---
+
+## Processing Pipeline Architecture
+
+```mermaid
+graph LR
+ DC[Phase 1: DATA_COLLECTION DesktopDataCollectionStrategy] --> LLM[Phase 2: LLM_INTERACTION HostLLMInteractionStrategy]
+ LLM --> AE[Phase 3: ACTION_EXECUTION HostActionExecutionStrategy]
+ AE --> MU[Phase 4: MEMORY_UPDATE HostMemoryUpdateStrategy]
+
+ style DC fill:#e1f5ff
+ style LLM fill:#fff4e6
+ style AE fill:#e8f5e9
+ style MU fill:#fce4ec
+```
+
+Each phase is implemented as a separate **strategy class** inheriting from `BaseProcessingStrategy`. Strategies declare their dependencies and outputs using `@depends_on` and `@provides` decorators for automatic data flow management.
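+
+One way such decorators could work is sketched below. This is an assumption for illustration only: the real `@depends_on` and `@provides` live in the UFO codebase and may behave differently. Here they simply attach key lists to the class, and a hypothetical `run` helper checks inputs before execution and copies declared outputs back into a shared context.
+
+```python
+from typing import Any, Callable, Dict
+
+
+def depends_on(*keys: str) -> Callable[[type], type]:
+    """Attach the context keys a strategy needs before it can run."""
+    def decorator(cls: type) -> type:
+        cls.dependencies = list(keys)
+        return cls
+    return decorator
+
+
+def provides(*keys: str) -> Callable[[type], type]:
+    """Attach the context keys a strategy promises to produce."""
+    def decorator(cls: type) -> type:
+        cls.outputs = list(keys)
+        return cls
+    return decorator
+
+
+@depends_on("target_info_list")
+@provides("status")
+class ToyLLMStrategy:
+    """Hypothetical strategy: assign work if any target exists."""
+
+    def execute(self, ctx: Dict[str, Any]) -> Dict[str, Any]:
+        return {"status": "ASSIGN" if ctx["target_info_list"] else "FINISH"}
+
+
+def run(strategy: Any, ctx: Dict[str, Any]) -> Dict[str, Any]:
+    """Validate dependencies, execute, then merge declared outputs."""
+    missing = [k for k in strategy.dependencies if k not in ctx]
+    if missing:
+        raise KeyError(f"missing dependencies: {missing}")
+    result = strategy.execute(ctx)
+    ctx.update({k: result[k] for k in strategy.outputs})
+    return ctx
+
+
+ctx = run(ToyLLMStrategy(), {"target_info_list": [{"id": "0"}]})
+print(ctx["status"])
+```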
+
+---
+
+## Phase 1: DATA_COLLECTION
+
+### Strategy: `DesktopDataCollectionStrategy`
+
+**Purpose**: Gather comprehensive desktop environment context for LLM decision making.
+
+```python
+@depends_on("command_dispatcher", "log_path", "session_step")
+@provides(
+    "desktop_screenshot_url",
+    "desktop_screenshot_path",
+    "application_windows_info",
+    "target_registry",
+    "target_info_list",
+)
+class DesktopDataCollectionStrategy(BaseProcessingStrategy):
+    """Enhanced strategy for collecting desktop environment data"""
+
+    def __init__(self, fail_fast: bool = True):
+        super().__init__(name="desktop_data_collection", fail_fast=fail_fast)
+```
+
+### Execution Steps
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant CommandDispatcher
+ participant Desktop
+ participant TargetRegistry
+
+ Strategy->>CommandDispatcher: capture_desktop_screenshot
+ CommandDispatcher->>Desktop: Screenshot all screens
+ Desktop-->>Strategy: screenshot_url
+ Strategy->>Strategy: Save to log_path
+
+ Strategy->>CommandDispatcher: get_desktop_app_target_info
+ CommandDispatcher->>Desktop: Query windows
+ Desktop-->>Strategy: app_windows_info[]
+
+ Strategy->>TargetRegistry: Register applications
+ Strategy->>TargetRegistry: Register third-party agents
+ TargetRegistry-->>Strategy: target_registry
+
+ Strategy->>Strategy: Prepare target_info_list
+ Strategy-->>Strategy: Return ProcessingResult
+```
+
+### Step 1: Capture Desktop Screenshot
+
+**Code**:
+```python
+async def _capture_desktop_screenshot(
+    self,
+    command_dispatcher: BasicCommandDispatcher,
+    save_path: str,
+) -> str:
+    """Capture desktop screenshot with error handling"""
+    result = await command_dispatcher.execute_commands([
+        Command(
+            tool_name="capture_desktop_screenshot",
+            parameters={"all_screens": True},
+            tool_type="data_collection",
+        )
+    ])
+
+    desktop_screenshot_url = result[0].result
+    utils.save_image_string(desktop_screenshot_url, save_path)
+    return desktop_screenshot_url
+```
+
+**Outputs**:
+
+- `desktop_screenshot_url`: Base64 encoded screenshot for LLM
+- `desktop_screenshot_path`: File path for logging (`action_step{N}.png`)
+
+**Multi-Screen Support:**
+
+The `all_screens: True` parameter captures all connected monitors in a single composite image, providing complete desktop context.
+
+### Step 2: Collect Application Window Information
+
+**Code**:
+```python
+async def _get_desktop_application_info(
+    self, command_dispatcher: BasicCommandDispatcher
+) -> List[TargetInfo]:
+    """Get comprehensive desktop application information"""
+    result = await command_dispatcher.execute_commands([
+        Command(
+            tool_name="get_desktop_app_target_info",
+            parameters={
+                "remove_empty": True,
+                "refresh_app_windows": True
+            },
+            tool_type="data_collection",
+        )
+    ])
+
+    app_windows_info = result[0].result or []
+    target_info = [TargetInfo(**control_info) for control_info in app_windows_info]
+    return target_info
+```
+
+**Outputs**:
+
+- List of `TargetInfo` objects containing:
+    - `id`: Unique identifier (index-based)
+    - `name`: Window title or process name
+    - `kind`: Target type (APPLICATION, PROCESS, etc.)
+    - `type`: Detailed type information
+    - Additional metadata (position, size, state)
+
+**Window Filtering:**
+
+`remove_empty: True` filters out windows without valid handles or titles, reducing noise for LLM decision making.
+
+### Step 3: Register Applications and Third-Party Agents
+
+**Code**:
+```python
+def _register_applications_and_agents(
+    self,
+    app_windows_info: List[TargetInfo],
+    target_registry: TargetRegistry = None,
+) -> TargetRegistry:
+    """Register desktop applications and third-party agents"""
+    if not target_registry:
+        target_registry = TargetRegistry()
+
+    # Register desktop application windows
+    target_registry.register(app_windows_info)
+
+    # Register third-party agents
+    third_party_count = self._register_third_party_agents(
+        target_registry, len(app_windows_info)
+    )
+
+    return target_registry
+
+def _register_third_party_agents(
+    self, target_registry: TargetRegistry, start_index: int
+) -> int:
+    """Register enabled third-party agents"""
+    third_party_agent_names = ufo_config.system.enabled_third_party_agents
+
+    third_party_agent_list = []
+    for i, agent_name in enumerate(third_party_agent_names):
+        agent_id = str(i + start_index + 1)
+        third_party_agent_list.append(
+            TargetInfo(
+                kind=TargetKind.THIRD_PARTY_AGENT.value,
+                id=agent_id,
+                type="ThirdPartyAgent",
+                name=agent_name,
+            )
+        )
+
+    target_registry.register(third_party_agent_list)
+    return len(third_party_agent_list)
+```
+
+**Target Registry**:
+
+| Component | Purpose |
+|-----------|---------|
+| **TargetRegistry** | Centralized registry of all selectable targets |
+| **Applications** | Desktop windows (Word, Excel, browser, etc.) |
+| **Third-Party Agents** | Custom agents from configuration |
+| **Indexing** | Sequential IDs for LLM selection (0, 1, 2, ...) |
+
+**Target Registry Example:**
+
+```json
+[
+ {"id": "0", "name": "Microsoft Word - Document1", "kind": "APPLICATION"},
+ {"id": "1", "name": "Microsoft Excel - Workbook1", "kind": "APPLICATION"},
+ {"id": "2", "name": "Chrome - GitHub", "kind": "APPLICATION"},
+ {"id": "3", "name": "HardwareAgent", "kind": "THIRD_PARTY_AGENT"}
+]
+```
+
+### Processing Result
+
+**Outputs**:
+```python
+ProcessingResult(
+ success=True,
+ data={
+ "desktop_screenshot_url": "data:image/png;base64,...",
+ "desktop_screenshot_path": "C:/logs/action_step1.png",
+ "application_windows_info": [TargetInfo(...), ...],
+ "target_registry": TargetRegistry(...),
+ "target_info_list": [{"id": "0", "name": "Word", "kind": "APPLICATION"}, ...]
+ },
+ phase=ProcessingPhase.DATA_COLLECTION
+)
+```
+
+---
+
+## Phase 2: LLM_INTERACTION
+
+### Strategy: `HostLLMInteractionStrategy`
+
+**Purpose**: Construct context-aware prompts and obtain LLM decisions for application selection and task decomposition.
+
+```python
+@depends_on("target_info_list", "desktop_screenshot_url")
+@provides(
+    "parsed_response",
+    "response_text",
+    "llm_cost",
+    "prompt_message",
+    "subtask",
+    "plan",
+    "result",
+    "host_message",
+    "status",
+    "question_list",
+    "function_name",
+    "function_arguments",
+)
+class HostLLMInteractionStrategy(BaseProcessingStrategy):
+    """Enhanced LLM interaction strategy for Host Agent"""
+
+    def __init__(self, fail_fast: bool = True):
+        super().__init__(name="host_llm_interaction", fail_fast=fail_fast)
+```
+
+### Execution Steps
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant HostAgent
+ participant Blackboard
+ participant Prompter
+ participant LLM
+
+ Strategy->>HostAgent: Get previous plan from memory
+ Strategy->>Blackboard: Get blackboard context
+ Blackboard-->>Strategy: blackboard_prompt[]
+
+ Strategy->>Prompter: Build comprehensive prompt
+ Prompter->>Prompter: Construct system message
+ Prompter->>Prompter: Construct user message
+ Prompter-->>Strategy: prompt_message
+
+ Strategy->>Strategy: Log request data
+
+ Strategy->>LLM: Send prompt with retry logic
+ LLM-->>Strategy: response_text, cost
+
+ Strategy->>Strategy: Parse & validate response
+ Strategy->>HostAgent: print_response()
+
+ Strategy->>Strategy: Extract structured data
+ Strategy-->>Strategy: Return ProcessingResult
+```
+
+### Step 1: Build Comprehensive Prompt
+
+**Code**:
+```python
+async def _build_comprehensive_prompt(
+    self,
+    agent: "HostAgent",
+    target_info_list: List[Any],
+    desktop_screenshot_url: str,
+    prev_plan: List[Any],
+    previous_subtasks: List[Any],
+    request: str,
+    session_step: int,
+    request_logger,
+) -> Dict[str, Any]:
+    """Build comprehensive prompt message"""
+    host_agent: "HostAgent" = agent
+
+    # Get blackboard context if available
+    blackboard_prompt = []
+    if not host_agent.blackboard.is_empty():
+        blackboard_prompt = host_agent.blackboard.blackboard_to_prompt()
+
+    # Build complete prompt message
+    prompt_message = host_agent.message_constructor(
+        image_list=[desktop_screenshot_url] if desktop_screenshot_url else [],
+        os_info=target_info_list,
+        plan=prev_plan,
+        prev_subtask=previous_subtasks,
+        request=request,
+        blackboard_prompt=blackboard_prompt,
+    )
+
+    return prompt_message
+```
+
+**Prompt Components**:
+
+| Component | Source | Purpose |
+|-----------|--------|---------|
+| **System Message** | Prompter template | Define agent role and capabilities |
+| **Desktop Screenshot** | Phase 1 | Visual context |
+| **Target List** | Phase 1 | Available applications |
+| **User Request** | Session context | Original task description |
+| **Previous Subtasks** | Session context | Completed steps |
+| **Previous Plan** | Agent memory | Future steps from last round |
+| **Blackboard** | Shared memory | Inter-agent communication |
+
+**Blackboard Integration:**
+
+The Blackboard provides inter-agent communication by including results from AppAgents in the prompt:
+
+```python
+blackboard_prompt = [
+ {"role": "user", "content": "Previous result from Word AppAgent: Table data extracted"}
+]
+```
+
+### Step 2: Get LLM Response with Retry
+
+**Code**:
+```python
+async def _get_llm_response_with_retry(
+    self, host_agent: "HostAgent", prompt_message: Dict[str, Any]
+) -> tuple[str, float]:
+    """Get LLM response with retry logic for JSON parsing failures"""
+    max_retries = ufo_config.system.json_parsing_retry
+
+    for retry_count in range(max_retries):
+        try:
+            # Run synchronous LLM call in thread executor
+            loop = asyncio.get_event_loop()
+            response_text, cost = await loop.run_in_executor(
+                None,
+                host_agent.get_response,
+                prompt_message,
+                AgentType.HOST,
+                True,  # use_backup_engine
+            )
+
+            # Validate response can be parsed as JSON
+            host_agent.response_to_dict(response_text)
+
+            return response_text, cost
+
+        except Exception as e:
+            if retry_count < max_retries - 1:
+                self.logger.warning(f"Retry {retry_count + 1}/{max_retries}: {e}")
+            else:
+                raise Exception(f"Failed after {max_retries} attempts: {e}")
+```
+
+!!!note "WebSocket Timeout Fix"
+    The code uses `run_in_executor` to prevent blocking the event loop during long LLM responses, which could cause WebSocket ping/pong timeouts in MCP connections.
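+
+The effect of `run_in_executor` can be seen in a small standalone experiment. The sketch below uses only the standard library; `slow_llm_call` and `heartbeat` are hypothetical stand-ins for the blocking LLM request and the WebSocket keep-alive, and the sketch calls `asyncio.get_running_loop()` rather than `get_event_loop()` as a matter of preference inside a coroutine. The heartbeat keeps ticking while the blocking call runs in a worker thread.
+
+```python
+import asyncio
+import time
+from typing import List, Tuple
+
+
+def slow_llm_call(prompt: str) -> str:
+    """Blocking call standing in for a synchronous LLM request."""
+    time.sleep(0.2)
+    return f"response to {prompt}"
+
+
+async def heartbeat(ticks: List[float]) -> None:
+    """Stand-in for keep-alive traffic that needs the event loop to stay free."""
+    for _ in range(3):
+        await asyncio.sleep(0.05)
+        ticks.append(time.monotonic())
+
+
+async def main() -> Tuple[str, int]:
+    loop = asyncio.get_running_loop()
+    ticks: List[float] = []
+    hb = asyncio.create_task(heartbeat(ticks))
+    # The blocking call runs in the default thread pool, so the loop stays responsive.
+    response = await loop.run_in_executor(None, slow_llm_call, "hello")
+    await hb
+    return response, len(ticks)
+
+
+response, tick_count = asyncio.run(main())
+print(response, tick_count)
+```
+
+Calling `slow_llm_call` directly inside the coroutine would instead block the loop for the full duration, and no heartbeat could run until it returned.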
+
+### Step 3: Parse and Validate Response
+
+**Code**:
+```python
+def _parse_and_validate_response(
+    self, host_agent: "HostAgent", response_text: str
+) -> HostAgentResponse:
+    """Parse and validate LLM response"""
+    # Parse response to dictionary
+    response_dict = host_agent.response_to_dict(response_text)
+
+    # Create structured response object
+    parsed_response = HostAgentResponse.model_validate(response_dict)
+
+    # Validate required fields
+    self._validate_response_fields(parsed_response)
+
+    # Print response for user feedback
+    host_agent.print_response(parsed_response)
+
+    return parsed_response
+
+def _validate_response_fields(self, response: HostAgentResponse):
+    """Validate response contains required fields"""
+    if not response.observation:
+        raise ValueError("Response missing required 'observation' field")
+    if not response.thought:
+        raise ValueError("Response missing required 'thought' field")
+    if not response.status:
+        raise ValueError("Response missing required 'status' field")
+
+    valid_statuses = ["CONTINUE", "FINISH", "CONFIRM", "ERROR", "ASSIGN"]
+    if response.status.upper() not in valid_statuses:
+        self.logger.warning(f"Unexpected status value: {response.status}")
+```
+
+**HostAgentResponse Structure**:
+
+```python
+class HostAgentResponse(BaseModel):
+    observation: str  # What the agent sees
+    thought: str  # Reasoning process
+    current_subtask: str  # Current subtask description
+    message: str  # Message for AppAgent
+    control_label: str  # Selected target ID
+    control_text: str  # Selected target name
+    plan: List[str]  # Future subtasks
+    status: str  # Next state (ASSIGN/CONTINUE/FINISH/etc.)
+    comment: str  # User-facing comment
+    questions: List[str]  # Clarification questions
+    function: str  # Command to execute
+    arguments: Dict[str, Any]  # Command arguments
+    result: str  # Result description
+```
+
+### Processing Result
+
+**Outputs**:
+```python
+ProcessingResult(
+ success=True,
+ data={
+ "parsed_response": HostAgentResponse(...),
+ "response_text": '{"Observation": "...", ...}',
+ "llm_cost": 0.025,
+ "prompt_message": [...],
+ "subtask": "Extract table from Word",
+ "plan": ["Create chart in Excel"],
+ "host_message": "Starting extraction",
+ "status": "ASSIGN",
+ "result": "",
+ "question_list": [],
+ "function_name": "select_application_window",
+ "function_arguments": {"id": "0"}
+ },
+ phase=ProcessingPhase.LLM_INTERACTION
+)
+```
+
+!!!example "LLM Response Example"
+    ```json
+    {
+        "Observation": "Desktop shows Word with table and Excel empty",
+        "Thought": "Need to extract table from Word first before creating chart",
+        "Current Sub-Task": "Extract sales table from Word document",
+        "Message": "Please extract the table data for chart creation",
+        "ControlLabel": "0",
+        "ControlText": "Microsoft Word - Sales Report",
+        "Plan": ["Extract table", "Create bar chart in Excel"],
+        "Status": "ASSIGN",
+        "Comment": "Starting data extraction from Word",
+        "Questions": [],
+        "Function": "select_application_window",
+        "Args": {"id": "0"}
+    }
+    ```
+
+---
+
+## Phase 3: ACTION_EXECUTION
+
+### Strategy: `HostActionExecutionStrategy`
+
+**Purpose**: Execute LLM-decided actions including application selection, third-party agent assignment, and generic command execution.
+
+```python
+@depends_on("target_registry", "command_dispatcher")
+@provides(
+    "execution_result",
+    "action_info",
+    "selected_target_id",
+    "selected_application_root",
+    "assigned_third_party_agent",
+    "target",
+)
+class HostActionExecutionStrategy(BaseProcessingStrategy):
+    """Enhanced action execution strategy for Host Agent"""
+
+    SELECT_APPLICATION_COMMAND: str = "select_application_window"
+
+    def __init__(self, fail_fast: bool = False):
+        super().__init__(name="host_action_execution", fail_fast=fail_fast)
+```
+
+### Execution Flow
+
+```mermaid
+graph TD
+ Start[Start Action Execution] --> CheckFunc{Function Name?}
+
+ CheckFunc -->|select_application_window| SelectApp[Execute Application Selection]
+ CheckFunc -->|Other Command| Generic[Execute Generic Command]
+ CheckFunc -->|None| NoAction[No Action]
+
+ SelectApp --> CheckKind{Target Kind?}
+
+ CheckKind -->|THIRD_PARTY_AGENT| ThirdParty[Assign Third-Party Agent]
+ CheckKind -->|APPLICATION| RegularApp[Select Regular Application]
+
+ ThirdParty --> CreateAction[Create Action Info]
+ RegularApp --> MCP[Execute MCP Command]
+ MCP --> CreateAction
+ Generic --> CreateAction
+ NoAction --> CreateAction
+
+ CreateAction --> Return[Return ProcessingResult]
+
+ style SelectApp fill:#e3f2fd
+ style ThirdParty fill:#fff3e0
+ style RegularApp fill:#f1f8e9
+ style Generic fill:#fce4ec
+```
+
+### Application Selection
+
+**Code**:
+```python
+async def _execute_application_selection(
+    self,
+    parsed_response: HostAgentResponse,
+    target_registry: TargetRegistry,
+    command_dispatcher: BasicCommandDispatcher,
+) -> List[Result]:
+    """Execute application selection"""
+    target_id = parsed_response.arguments.get("id")
+    target = target_registry.get(target_id)
+
+    # Handle third-party agent selection
+    if target.kind == TargetKind.THIRD_PARTY_AGENT:
+        return await self._select_third_party_agent(target)
+    # Handle regular application selection
+    else:
+        return await self._select_regular_application(target, command_dispatcher)
+```
+
+#### Third-Party Agent Selection
+
+**Code**:
+```python
+async def _select_third_party_agent(self, target: TargetInfo) -> List[Result]:
+    """Handle third-party agent selection"""
+    self.logger.info(f"Assigned third-party agent: {target.name}")
+
+    return [
+        Result(
+            status="success",
+            result={
+                "id": target.id,
+                "name": target.name,
+                "type": "third_party_agent",
+            },
+        )
+    ]
+```
+
+!!!info "Third-Party Agents"
+    Third-party agents are custom agents registered in configuration:
+
+    ```yaml
+    enabled_third_party_agents:
+      - HardwareAgent
+      - NetworkAgent
+    ```
+
+    They are selected like applications but don't require window management.
+
+#### Regular Application Selection
+
+**Code**:
+```python
+async def _select_regular_application(
+    self, target: TargetInfo, command_dispatcher: BasicCommandDispatcher
+) -> List[Result]:
+    """Handle regular application selection"""
+    execution_result = await command_dispatcher.execute_commands([
+        Command(
+            tool_name="select_application_window",
+            parameters={"id": str(target.id), "name": target.name},
+            tool_type="action",
+        )
+    ])
+
+    if execution_result and execution_result[0].result:
+        app_root = execution_result[0].result.get("root_name", "")
+        self.logger.info(f"Selected application: {target.name}, root: {app_root}")
+
+    return execution_result
+```
+
+**Window Selection Actions**:
+
+1. Focuses application window
+2. Brings window to foreground
+3. Retrieves application root name (for AppAgent configuration)
+4. Updates global context with window information
+
+### Generic Command Execution
+
+**Code**:
+```python
+async def _execute_generic_command(
+    self,
+    parsed_response: HostAgentResponse,
+    command_dispatcher: BasicCommandDispatcher,
+) -> List[Result]:
+    """Execute generic command"""
+    function_name = parsed_response.function
+    arguments = parsed_response.arguments or {}
+
+    execution_result = await command_dispatcher.execute_commands([
+        Command(
+            tool_name=function_name,
+            parameters=arguments,
+            tool_type="action",
+        )
+    ])
+
+    return execution_result
+```
+
+**Generic Commands:**
+
+- `launch_application`: Start new application
+- `close_application`: Terminate application
+- `bash_command`: Execute shell command
+- Custom MCP tools
+
+### Action Info Creation
+
+**Code**:
+```python
+def _create_action_info(
+    self,
+    parsed_response: HostAgentResponse,
+    execution_result: List[Result],
+    target_registry: TargetRegistry,
+    selected_target_id: str,
+) -> ActionCommandInfo:
+    """Create action information object for memory"""
+    target_object = None
+    if target_registry and selected_target_id:
+        target_object = target_registry.get(selected_target_id)
+
+    action_info = ActionCommandInfo(
+        function=parsed_response.function,
+        arguments=parsed_response.arguments or {},
+        target=target_object,
+        status=parsed_response.status,
+        result=execution_result[0] if execution_result else Result(status="none"),
+    )
+
+    return action_info
+```
+
+**ActionCommandInfo Structure**:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `function` | str | Command name executed |
+| `arguments` | Dict | Command parameters |
+| `target` | TargetInfo | Selected target object |
+| `status` | str | Agent status after execution |
+| `result` | Result | Execution result |
+
+### Processing Result
+
+**Outputs**:
+```python
+ProcessingResult(
+ success=True,
+ data={
+ "execution_result": [Result(...)],
+ "action_info": ActionCommandInfo(...),
+ "target": TargetInfo(...),
+ "selected_target_id": "0",
+ "selected_application_root": "WINWORD",
+ "assigned_third_party_agent": "",
+ },
+ phase=ProcessingPhase.ACTION_EXECUTION
+)
+```
+
+---
+
+## Phase 4: MEMORY_UPDATE
+
+### Strategy: `HostMemoryUpdateStrategy`
+
+**Purpose**: Record orchestration step in agent memory, update structural logs, and maintain Blackboard trajectories.
+
+```python
+@depends_on("session_step")
+@provides("additional_memory", "memory_item", "memory_keys_count")
+class HostMemoryUpdateStrategy(BaseProcessingStrategy):
+    """Enhanced memory update strategy for Host Agent"""
+
+    def __init__(self, fail_fast: bool = False):
+        super().__init__(name="host_memory_update", fail_fast=fail_fast)
+```
+
+### Execution Steps
+
+```mermaid
+sequenceDiagram
+ participant Strategy
+ participant Context
+ participant MemoryItem
+ participant AgentMemory
+ participant StructuralLogs
+ participant Blackboard
+
+ Strategy->>Context: Extract all processing data
+ Strategy->>Strategy: Create additional_memory
+
+ Strategy->>MemoryItem: new MemoryItem()
+ Strategy->>MemoryItem: add_values_from_dict(response)
+ Strategy->>MemoryItem: add_values_from_dict(additional_memory)
+
+ Strategy->>AgentMemory: add_memory(memory_item)
+ Strategy->>StructuralLogs: add_to_structural_logs(memory_dict)
+
+ Strategy->>Blackboard: add_trajectories(memorized_action)
+
+ Strategy-->>Strategy: Return ProcessingResult
+```
+
+### Step 1: Create Additional Memory Data
+
+**Code**:
+```python
+def _create_additional_memory_data(
+    self, agent: "HostAgent", context: ProcessingContext
+) -> "HostAgentProcessorContext":
+    """Create comprehensive additional memory data"""
+    host_context: HostAgentProcessorContext = context.local_context
+
+    # Update context with current state
+    host_context.session_step = context.get_global(ContextNames.SESSION_STEP.name, 0)
+    host_context.round_step = context.get_global(ContextNames.CURRENT_ROUND_STEP.name, 0)
+    host_context.round_num = context.get_global(ContextNames.CURRENT_ROUND_ID.name, 0)
+    host_context.agent_step = agent.step if agent else 0
+
+    action_info: ActionCommandInfo = host_context.action_info
+
+    # Update action information
+    if action_info:
+        host_context.action = [action_info.model_dump()]
+        host_context.function_call = action_info.function or ""
+        host_context.arguments = action_info.arguments
+        host_context.action_representation = action_info.to_representation()
+
+        if action_info.result and action_info.result.result:
+            host_context.results = str(action_info.result.result)
+
+    # Update application and agent names
+    host_context.application = host_context.selected_application_root or ""
+    host_context.agent_name = agent.name
+
+    return host_context
+```
+
+**Additional Memory Fields**:
+
+| Field | Description |
+|-------|-------------|
+| `session_step` | Global session step counter |
+| `round_step` | Step within current round |
+| `round_num` | Current round number |
+| `agent_step` | HostAgent's own step counter |
+| `action` | Executed action details |
+| `function_call` | Command name |
+| `arguments` | Command parameters |
+| `action_representation` | Human-readable action description |
+| `results` | Execution results |
+| `application` | Selected application root |
+| `agent_name` | "HostAgent" |
+
+### Step 2: Create and Populate Memory Item
+
+**Code**:
+```python
+def _create_and_populate_memory_item(
+    self,
+    parsed_response: HostAgentResponse,
+    additional_memory: "HostAgentProcessorContext",
+) -> MemoryItem:
+    """Create and populate memory item"""
+    memory_item = MemoryItem()
+
+    # Add response data
+    if parsed_response:
+        memory_item.add_values_from_dict(parsed_response.model_dump())
+
+    # Add additional memory data
+    memory_item.add_values_from_dict(additional_memory.to_dict(selective=True))
+
+    return memory_item
+```
+
+**MemoryItem Contents**:
+
+```python
+{
+    # From HostAgentResponse
+    "observation": "Desktop shows Word and Excel...",
+    "thought": "Need to extract table first...",
+    "current_subtask": "Extract table from Word",
+    "plan": ["Create chart in Excel"],
+    "status": "ASSIGN",
+
+    # From Additional Memory
+    "session_step": 1,
+    "round_num": 0,
+    "round_step": 0,
+    "agent_step": 0,
+    "action": [{"function": "select_application_window", ...}],
+    "application": "WINWORD",
+    "agent_name": "HostAgent",
+    ...
+}
+```
+
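The two-stage population described in Step 2 can be sketched with a plain dictionary wrapper. `SimpleMemoryItem` below is a hypothetical stand-in for `MemoryItem`; in particular, the overwrite-on-duplicate-key behavior is an assumption of this sketch, not something the source states.

```python
from typing import Any, Dict


class SimpleMemoryItem:
    """Hypothetical stand-in for MemoryItem: a flat key/value record."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}

    def add_values_from_dict(self, values: Dict[str, Any]) -> None:
        # Assumption: later calls overwrite earlier keys with the same name.
        self._values.update(values)

    def to_dict(self) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate the stored record.
        return dict(self._values)


response = {"observation": "Desktop shows Word and Excel", "status": "ASSIGN"}
additional = {"session_step": 1, "application": "WINWORD", "agent_name": "HostAgent"}

item = SimpleMemoryItem()
item.add_values_from_dict(response)    # fields from the parsed LLM response
item.add_values_from_dict(additional)  # fields from the processor context
merged = item.to_dict()
```

The merged record is a single flat dictionary, which is what makes it easy to log and to filter by key in the later steps.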
+### Step 3: Update Structural Logs
+
+**Code**:
+```python
+def _update_structural_logs(self, memory_item: MemoryItem, global_context):
+    """Update structural logs for debugging"""
+    global_context.add_to_structural_logs(memory_item.to_dict())
+```
+
+**Structural Logs:**
+
+Structural logs provide machine-readable JSON logs of every agent step for debugging and analysis, replay and reproduction, performance monitoring, and training data collection.
+
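To make "machine-readable JSON logs of every agent step" concrete, here is a minimal sketch that writes one JSON object per line and reads it back. The JSON Lines layout and the helper names `append_step` and `read_steps` are assumptions of this sketch; the source does not specify the on-disk format used by `add_to_structural_logs`.

```python
import io
import json
from typing import Any, Dict, List


def append_step(log: io.TextIOBase, step: Dict[str, Any]) -> None:
    """Write one agent step as a single JSON line.

    default=str stringifies values that json cannot serialize directly.
    """
    log.write(json.dumps(step, default=str) + "\n")


def read_steps(log_text: str) -> List[Dict[str, Any]]:
    """Parse the log text back into a list of step dictionaries."""
    return [json.loads(line) for line in log_text.splitlines() if line.strip()]


# An in-memory buffer stands in for a log file.
buffer = io.StringIO()
append_step(buffer, {"session_step": 0, "status": "ASSIGN", "application": "WINWORD"})
append_step(buffer, {"session_step": 1, "status": "FINISH", "application": ""})
steps = read_steps(buffer.getvalue())
```

A line-per-step layout keeps the log appendable and lets replay or analysis tools process steps one at a time.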
+### Step 4: Update Blackboard Trajectories
+
+**Code**:
+```python
+def _update_blackboard_trajectories(
+    self,
+    host_agent: "HostAgent",
+    memory_item: MemoryItem,
+):
+    """Update blackboard trajectories"""
+    history_keys = ufo_config.system.history_keys
+
+    memory_dict = memory_item.to_dict()
+    memorized_action = {
+        key: memory_dict.get(key) for key in history_keys if key in memory_dict
+    }
+
+    if memorized_action:
+        host_agent.blackboard.add_trajectories(memorized_action)
+```
+
+**Blackboard Trajectories**:
+
+```python
+# Configuration
+history_keys = ["observation", "thought", "current_subtask", "status", "result"]
+
+# Stored in Blackboard
+{
+    "step_0": {
+        "observation": "Desktop shows Word and Excel",
+        "thought": "Extract table first",
+        "current_subtask": "Extract table",
+        "status": "ASSIGN",
+        "result": ""
+    },
+    "step_1": {
+        "observation": "Word AppAgent extracted table",
+        "thought": "Now create chart in Excel",
+        "current_subtask": "Create bar chart",
+        "status": "ASSIGN",
+        "result": "Table data: [...]"
+    }
+}
+```
+
+**Inter-Agent Communication:**
+
+Blackboard trajectories enable AppAgents to access HostAgent's orchestration history, providing context for their execution.
+
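The key filtering in Step 4 can be tried in isolation. In the sketch below, `SimpleBlackboard` and `select_history` are hypothetical stand-ins; only the `history_keys` list and the filtering rule (keep configured keys that are present) come from the text above.

```python
from typing import Any, Dict, List


class SimpleBlackboard:
    """Hypothetical stand-in for the shared Blackboard."""

    def __init__(self) -> None:
        self.trajectories: List[Dict[str, Any]] = []

    def add_trajectories(self, step: Dict[str, Any]) -> None:
        self.trajectories.append(step)


def select_history(memory: Dict[str, Any], history_keys: List[str]) -> Dict[str, Any]:
    """Keep only the configured keys that are present in the memory dict."""
    return {key: memory[key] for key in history_keys if key in memory}


history_keys = ["observation", "thought", "current_subtask", "status", "result"]
memory = {
    "observation": "Desktop shows Word and Excel",
    "thought": "Extract table first",
    "status": "ASSIGN",
    "session_step": 0,  # not in history_keys, so it is dropped
}

blackboard = SimpleBlackboard()
step = select_history(memory, history_keys)
if step:  # skip empty records, as the original method does
    blackboard.add_trajectories(step)
```

Because only whitelisted keys are shared, bookkeeping fields such as step counters stay in agent memory and do not reach the shared trajectory.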
+### Processing Result
+
+**Outputs**:
+```python
+ProcessingResult(
+    success=True,
+    data={
+        "additional_memory": HostAgentProcessorContext(...),
+        "memory_item": MemoryItem(...),
+        "memory_keys_count": 25
+    },
+    phase=ProcessingPhase.MEMORY_UPDATE
+)
+```
+
+---
+
+## Complete Processing Flow
+
+### Multi-Step Example
+
+**User Request**: "Extract table from Word and create chart in Excel"
+
+**Round 1**: Select Word
+
+| Phase | Key Operations | Outputs |
+|-------|----------------|---------|
+| DATA_COLLECTION | Capture desktop, list windows | screenshot, [Word, Excel] |
+| LLM_INTERACTION | Analyze, select Word | Status=ASSIGN, target_id=0 |
+| ACTION_EXECUTION | Select Word window | app_root="WINWORD" |
+| MEMORY_UPDATE | Record step | memory_item added |
+
+**Round 2**: Create Excel Chart
+
+| Phase | Key Operations | Outputs |
+|-------|----------------|---------|
+| DATA_COLLECTION | Capture desktop, list windows | screenshot, [Word, Excel] |
+| LLM_INTERACTION | Analyze Word result, select Excel | Status=ASSIGN, target_id=1 |
+| ACTION_EXECUTION | Select Excel window | app_root="EXCEL" |
+| MEMORY_UPDATE | Record step | memory_item added |
+
+**Round 3**: Verify Completion
+
+| Phase | Key Operations | Outputs |
+|-------|----------------|---------|
+| DATA_COLLECTION | Capture desktop | screenshot |
+| LLM_INTERACTION | Verify chart created | Status=FINISH |
+| ACTION_EXECUTION | No action | - |
+| MEMORY_UPDATE | Record completion | memory_item added |
+
+---
+
+## Error Handling
+
+### Strategy-Level Error Handling
+
+Each strategy implements robust error handling:
+
+```python
+async def execute(self, agent, context) -> ProcessingResult:
+    try:
+        # Execute strategy logic
+        return ProcessingResult(success=True, data={...})
+    except Exception as e:
+        error_msg = f"{self.name} failed: {str(e)}"
+        self.logger.error(error_msg)
+        return self.handle_error(e, self.phase, context)
+```
+
+**Error Handling Modes**:
+
+| Phase | `fail_fast` | Behavior |
+|----------|-------------|----------|
+| DATA_COLLECTION | True | Stop immediately on failure |
+| LLM_INTERACTION | True | Stop immediately on failure |
+| ACTION_EXECUTION | False | Log error, continue |
+| MEMORY_UPDATE | False | Log error, continue |
+
+!!!warning "Critical vs Non-Critical Failures"
+    - **Critical** (fail_fast=True): Desktop capture, LLM interaction
+    - **Non-Critical** (fail_fast=False): Action execution, memory update
+
+    Critical failures prevent further processing, while non-critical failures are logged but don't stop the pipeline.
+
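The difference between the two modes is easiest to see in a toy runner. The sketch below is not the processor framework itself: `run_pipeline` and the handler functions are hypothetical, and only the phase names and `fail_fast` flags are taken from the table above. A failure in a `fail_fast` phase stops the run, while a failure in any other phase is logged and the next phase still executes.

```python
from typing import Callable, Dict, List, Tuple

# (phase name, fail_fast flag) pairs taken from the table above.
PHASES: List[Tuple[str, bool]] = [
    ("DATA_COLLECTION", True),
    ("LLM_INTERACTION", True),
    ("ACTION_EXECUTION", False),
    ("MEMORY_UPDATE", False),
]


def run_pipeline(handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Run the phases in order and return a log of what happened."""
    log: List[str] = []
    for name, fail_fast in PHASES:
        try:
            handlers[name]()
            log.append(f"{name}: ok")
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            if fail_fast:
                break  # critical failure: skip the remaining phases
    return log


def ok() -> None:
    pass


def boom() -> None:
    raise RuntimeError("simulated failure")


non_critical = run_pipeline(
    {"DATA_COLLECTION": ok, "LLM_INTERACTION": ok, "ACTION_EXECUTION": boom, "MEMORY_UPDATE": ok}
)
critical = run_pipeline(
    {"DATA_COLLECTION": ok, "LLM_INTERACTION": boom, "ACTION_EXECUTION": ok, "MEMORY_UPDATE": ok}
)
```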
+---
+
+## Performance Considerations
+
+### Async Execution
+
+All strategies use async/await for non-blocking I/O:
+
+```python
+# Non-blocking screenshot capture
+result = await command_dispatcher.execute_commands([...])
+
+# Non-blocking LLM call (with thread executor)
+loop = asyncio.get_event_loop()
+response = await loop.run_in_executor(None, llm_call, ...)
+```
+
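The executor pattern can be exercised without a real LLM client. In this sketch `blocking_llm_call` and `heartbeat` are hypothetical, and it uses `asyncio.get_running_loop()`, the call the Python documentation recommends inside coroutines, rather than `get_event_loop()`. The point it illustrates is that a second coroutine keeps making progress while the blocking call runs in a worker thread.

```python
import asyncio
import time


def blocking_llm_call(prompt: str) -> str:
    """Hypothetical stand-in for a synchronous LLM client call."""
    time.sleep(0.05)  # simulate network latency
    return f"response to: {prompt}"


async def heartbeat(ticks: list) -> None:
    """A second coroutine that runs while the blocking call is in flight."""
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def main() -> tuple:
    loop = asyncio.get_running_loop()
    ticks: list = []
    # The blocking call runs in the default thread pool, so the event loop stays free.
    response, _ = await asyncio.gather(
        loop.run_in_executor(None, blocking_llm_call, "select an application"),
        heartbeat(ticks),
    )
    return response, len(ticks)


response, tick_count = asyncio.run(main())
```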
+### Retry Logic
+
+LLM interaction includes automatic retry for transient failures:
+
+```python
+max_retries = ufo_config.system.json_parsing_retry # Default: 3
+
+for retry_count in range(max_retries):
+    try:
+        response = await get_llm_response(...)
+        validate_json(response)
+        return response
+    except Exception as e:
+        if retry_count < max_retries - 1:
+            continue
+        raise
+```
+
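A self-contained variant of the retry loop, runnable without an LLM backend, might look like the following. The helper `call_with_json_retry` and the `fake_llm` coroutine are hypothetical; the sketch also narrows the caught exception to `ValueError`, which is a choice made here rather than something the snippet above does.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Dict


async def call_with_json_retry(
    get_response: Callable[[], Awaitable[str]], max_retries: int = 3
) -> Dict[str, Any]:
    """Call get_response until its output parses as a JSON object, up to max_retries attempts."""
    last_error: Exception = ValueError("max_retries must be at least 1")
    for _ in range(max_retries):
        try:
            parsed = json.loads(await get_response())
            if not isinstance(parsed, dict):
                raise ValueError("expected a JSON object")
            return parsed
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            last_error = exc
    raise last_error


# Fake LLM that returns malformed JSON once, then a valid object.
replies = iter(['{"status": ', '{"status": "ASSIGN"}'])
calls = []


async def fake_llm() -> str:
    calls.append(1)  # count attempts
    return next(replies)


result = asyncio.run(call_with_json_retry(fake_llm))
```

Re-raising the last error after the final attempt preserves the original failure for the caller, which matches the bare `raise` in the snippet above.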
+### Caching
+
+Target registry can be reused across rounds:
+
+```python
+existing_target_registry = context.get_local("target_registry")
+target_registry = self._register_applications_and_agents(
+ app_windows_info, existing_target_registry
+)
+```
+
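One way to picture registry reuse is a function that keeps entries from earlier rounds and only assigns ids to windows it has not seen. This is a sketch under stated assumptions: `register_windows` is hypothetical, matching windows by name is a simplification, and the real `_register_applications_and_agents` may behave differently.

```python
from typing import Dict, List, Optional


def register_windows(
    windows: List[Dict[str, str]],
    existing: Optional[Dict[str, Dict[str, str]]] = None,
) -> Dict[str, Dict[str, str]]:
    """Return a registry keyed by target id, keeping entries that already exist."""
    registry = dict(existing) if existing else {}  # copy: the input is not mutated
    known_names = {entry["name"] for entry in registry.values()}
    for window in windows:
        if window["name"] in known_names:
            continue  # already registered in an earlier round
        new_id = str(len(registry))
        registry[new_id] = {"id": new_id, "name": window["name"]}
        known_names.add(window["name"])
    return registry


round_1 = register_windows([{"name": "Document1 - Word"}])
round_2 = register_windows(
    [{"name": "Document1 - Word"}, {"name": "Book1 - Excel"}], existing=round_1
)
```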
+---
+
+## Related Documentation
+
+**Architecture & Design:**
+
+- **[Overview](overview.md)**: HostAgent high-level architecture
+- **[State Machine](state.md)**: When strategies are executed
+- **[Processor Framework](../../infrastructure/agents/design/processor.md)**: General processor architecture
+
+**System Integration:**
+
+- **[Command System](commands.md)**: Available desktop commands
+- **[Blackboard](../../infrastructure/agents/design/blackboard.md)**: Inter-agent communication
+- **[Memory System](../../infrastructure/agents/design/memory.md)**: Memory management
+
+---
+
+## Summary
+
+**Key Takeaways:**
+
+- **4 Phases**: DATA_COLLECTION → LLM_INTERACTION → ACTION_EXECUTION → MEMORY_UPDATE
+- **Desktop Context**: Capture screenshot + application list
+- **LLM Decision**: Select application, decompose task, set status
+- **Action Types**: Application selection, third-party agent assignment, generic commands
+- **Memory Persistence**: Record every step for context and replay
+- **Blackboard Integration**: Share trajectories with AppAgents
+- **Error Resilience**: Retry logic, fail-fast configuration, graceful degradation
+
+**Next Steps:**
+
+- Read [Command System](commands.md) for available desktop operations
+- Read [State Machine](state.md) to understand when processing occurs
+- Read [Blackboard](../../infrastructure/agents/design/blackboard.md) for inter-agent communication
+- Follow [Creating Third-Party Agents](../../tutorials/creating_third_party_agents.md) to build custom agents
diff --git a/documents/docs/ufo2/overview.md b/documents/docs/ufo2/overview.md
new file mode 100644
index 000000000..521b53b17
--- /dev/null
+++ b/documents/docs/ufo2/overview.md
@@ -0,0 +1,412 @@
+# UFO² — Windows AgentOS
+
+[arXiv paper](https://arxiv.org/abs/2504.14603) •
+[MIT License](https://opensource.org/licenses/MIT) •
+[GitHub repository](https://github.com/microsoft/UFO) •
+[YouTube demo](https://www.youtube.com/watch?v=QT_OhygMVXU)
+
+
+**UFO²** is a Windows AgentOS that reimagines desktop automation as a first-class operating system abstraction. Unlike traditional Computer-Using Agents (CUAs) that rely on screenshots and simulated inputs, UFO² deeply integrates with Windows OS through UI Automation APIs, application-specific introspection, and hybrid GUI–API execution—enabling robust, efficient, and non-disruptive automation across 20+ real-world applications.
+
+---
+
+## What is UFO²?
+
+UFO² addresses fundamental limitations of existing desktop automation solutions:
+
+**Traditional RPA (UiPath, Power Automate):**
+
+- ❌ Fragile scripts that break with UI changes
+- ❌ Requires extensive manual maintenance
+- ❌ Limited adaptability to dynamic environments
+
+**Current CUAs (Claude, Operator):**
+
+- ❌ Visual-only inputs with high cognitive overhead
+- ❌ Miss native OS APIs and application internals
+- ❌ Lock users out during automation (poor UX)
+
+**UFO² AgentOS:**
+
+- ✅ **Deep OS Integration**: Windows UIA, Win32, WinCOM APIs
+- ✅ **Hybrid GUI–API Actions**: Native APIs + fallback GUI automation
+- ✅ **Continuous Knowledge Learning**: RAG-enhanced from docs & execution history
+- ✅ **Picture-in-Picture Desktop**: Parallel automation without user disruption
+- ✅ **10%+ better success rate** than state-of-the-art CUAs
+
+
+*Figure 1: Comparison between (a) traditional CUAs that rely on screenshots and simulated inputs, and (b) UFO² AgentOS that deeply integrates with OS APIs, application internals, and hybrid GUI–API execution.*
+
+
+---
+
+## Core Architecture
+
+UFO² implements a **hierarchical multi-agent system** optimized for Windows desktop automation:
+
+
+*Figure 2: UFO² system architecture featuring the two-tier agent hierarchy (HostAgent + AppAgents), hybrid control detection pipeline, continuous knowledge substrate integration, and unified GUI–API action layer coordinated through MCP servers.*
+
+
+
+### Two-Tier Agent Hierarchy
+
+| Agent Type | Role | Key Capabilities |
+|------------|------|------------------|
+| **HostAgent** | Desktop Orchestrator | Task decomposition • Application selection • Cross-app coordination • AppAgent lifecycle management |
+| **AppAgent** | Application Executor | UI element interaction • Hybrid GUI–API execution • Application-specific automation • Result reporting |
+
+**Design Philosophy:**
+
+- **HostAgent** handles **WHAT** (which application) and **WHEN** (task sequencing)
+- **AppAgent** handles **HOW** (UI/API interaction) and **WHERE** (control targeting)
+- **Blackboard** facilitates inter-agent communication without tight coupling
+- **State Machines** ensure deterministic execution flow and error recovery
+
+!!!info "Learn More"
+    - [**HostAgent Documentation**](host_agent/overview.md): 7-state FSM, desktop orchestration, AppAgent lifecycle
+    - [**AppAgent Documentation**](app_agent/overview.md): 6-state FSM, UI automation, hybrid action execution
+    - [**Agent Architecture**](../infrastructure/agents/overview.md): Three-layer design principles
+
+---
+
+## Key Innovations
+
+### 1. Deep OS Integration 🔧
+
+UFO² embeds directly into Windows OS infrastructure:
+
+- **UI Automation (UIA):** Introspects accessibility trees for standard controls
+- **Win32 APIs:** Low-level window management and process control
+- **WinCOM:** Interacts with Office applications (Excel, Word, Outlook)
+- **Hybrid Detection:** Fuses UIA metadata + visual grounding for non-standard UI elements
+
+!!!tip "Hybrid Control Detection"
+    Combines Windows UIA APIs with vision models ([OmniParser](https://arxiv.org/abs/2408.00203)) to detect both standard and custom UI controls, bridging structured accessibility trees and pixel-level perception.
+
+    📖 [Control Detection Guide](core_features/control_detection/overview.md)
+
+### 2. Unified GUI–API Action Layer ⚡
+
+Traditional CUAs simulate mouse/keyboard only. UFO² chooses the best execution method:
+
+**GUI Actions** (fallback): `click`, `type`, `select`, `scroll`. These work for any application.
+
+**Native APIs** (preferred):
+
+- Excel: `xlwings` for direct cell/chart manipulation
+- Outlook: `win32com` for email operations
+- PowerPoint: `python-pptx` for slide editing
+
+Paired with speculative multi-action execution (item 4 in this section), UFO² issues up to **51% fewer LLM calls**.
+
+**Model Context Protocol (MCP) Servers:**
+Extensible framework for adding application-specific APIs without modifying agent code.
+
+!!!info "Learn More"
+    📖 [Hybrid Actions Guide](core_features/hybrid_actions.md) • [MCP Integration](../mcp/overview.md)
+
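The "native API first, GUI as fallback" policy can be sketched as a small dispatcher. Everything named here (`run_hybrid`, the handler functions, the action names) is hypothetical and does not reflect the actual UFO² action layer or its MCP servers; the sketch only shows the selection logic.

```python
from typing import Callable, Dict, Optional, Tuple


def run_hybrid(
    action: str,
    api_handlers: Dict[str, Callable[[], str]],
    gui_fallback: Callable[[str], str],
) -> Tuple[str, str]:
    """Try a native API handler first; use GUI automation if none exists or it fails."""
    handler: Optional[Callable[[], str]] = api_handlers.get(action)
    if handler is not None:
        try:
            return "api", handler()
        except Exception:
            pass  # fall through to the GUI path
    return "gui", gui_fallback(action)


def set_cell_via_api() -> str:
    return "cell A1 set through the API"


def broken_api() -> str:
    raise RuntimeError("COM object unavailable")


def click_and_type(action: str) -> str:
    return f"performed '{action}' with clicks and keystrokes"


handlers = {"set_cell": set_cell_via_api, "send_mail": broken_api}
via_api = run_hybrid("set_cell", handlers, click_and_type)
via_fallback = run_hybrid("send_mail", handlers, click_and_type)
via_gui_only = run_hybrid("scroll", handlers, click_and_type)
```

Returning the path taken alongside the result makes it easy to log how often the faster API route was available.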
+### 3. Continuous Knowledge Substrate 📚
+
+UFO² learns from three knowledge sources without model retraining:
+
+| Source | Content | Integration Method |
+|--------|---------|-------------------|
+| **Help Documents** | Official app documentation, API references | Vectorized retrieval (RAG) |
+| **Bing Search** | Real-time web knowledge for latest features | Dynamic query expansion |
+| **Execution History** | Past successful/failed action sequences | Experience replay & pattern mining |
+
+**Result:** Agents improve autonomously by retrieving relevant context at execution time.
+
+!!!info "Knowledge Integration"
+    - 📖 [Knowledge Substrate Overview](core_features/knowledge_substrate/overview.md)
+    - 📖 [Learning from Help Documents](core_features/knowledge_substrate/learning_from_help_document.md)
+    - 📖 [Experience Learning](core_features/knowledge_substrate/experience_learning.md)
+
+### 4. Speculative Multi-Action Execution 🚀
+
+Reduce LLM latency by predicting and validating action sequences:
+
+**Traditional Approach:**
+1 LLM call → 1 action → observe → repeat → **High latency**
+
+**UFO² Speculative Execution:**
+1 LLM call → predict N actions → validate with UI state → execute all → **51% fewer queries**
+
+**Validation Mechanism:**
+Lightweight control-state checks ensure predicted actions remain valid before execution.
+
+!!!example "Efficiency Gain"
+    **Task:** "Fill form fields A1–A10 with sequential numbers"
+
+    - **Traditional CUA:** 10 LLM calls (1 per field) → ~30 seconds
+    - **UFO² Speculative:** 1 LLM call predicts all 10 actions → ~8 seconds
+
+    📖 [Multi-Action Execution Guide](core_features/multi_action.md)
+
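The predict-then-validate loop can be illustrated with a toy model of the UI. The source only says that "lightweight control-state checks" guard each predicted action, so the check used below (is the target control still enabled?) and the names `execute_speculative` and `Action` are assumptions of this sketch.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str]  # (control name, text to enter); hypothetical shape


def execute_speculative(
    predicted: List[Action], ui_state: Dict[str, bool]
) -> Tuple[List[Action], List[Action]]:
    """Execute predicted actions in order until one targets a control that is not enabled.

    Returns (executed, remaining); the remaining actions would need a fresh LLM call.
    """
    executed: List[Action] = []
    for index, (control, text) in enumerate(predicted):
        if not ui_state.get(control, False):
            return executed, predicted[index:]  # stop at the first stale prediction
        executed.append((control, text))
    return executed, []


predicted = [("A1", "1"), ("A2", "2"), ("A3", "3")]
all_valid = execute_speculative(predicted, {"A1": True, "A2": True, "A3": True})
stale = execute_speculative(predicted, {"A1": True, "A2": False, "A3": True})
```

When every check passes, the whole batch costs one LLM call; when a check fails, only the unexecuted tail is sent back for replanning.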
+### 5. Picture-in-Picture Desktop 🖼️
+
+**Problem:** Existing CUAs lock users out during automation (poor UX).
+
+**UFO² Solution:** Nested virtual desktop via Windows Remote Desktop loopback:
+
+- **User Desktop:** Continue working normally
+- **Agent Desktop (PiP):** Automation runs in parallel sandboxed environment
+- **Zero Interference:** User and agent don't compete for mouse/keyboard
+
+**Implementation:**
+Built on Windows native remote desktop infrastructure—secure, isolated, non-disruptive.
+
+!!!success "User Experience"
+    Users can continue email, browsing, or coding while UFO² automates Excel reports in the background PiP desktop.
+
+---
+
+## System Components
+
+### Processing Pipeline
+
+Both HostAgent and AppAgent execute a **4-phase processing cycle**:
+
+| Phase | Purpose | HostAgent Strategy | AppAgent Strategy |
+|-------|---------|-------------------|------------------|
+| **1. Data Collection** | Gather environment state | Desktop screenshot, app list | App screenshot, UI tree, control annotations |
+| **2. LLM Interaction** | Decide next action | Select application, plan subtask | Select control, plan action sequence |
+| **3. Action Execution** | Execute commands | Launch app, create AppAgent | Execute GUI/API actions |
+| **4. Memory Update** | Record execution | Save orchestration step | Save interaction step, update blackboard |
+
+!!!info "Processing Details"
+    - 📖 [Strategy Layer](../infrastructure/agents/design/processor.md): Processing framework and dependency chain
+    - 📖 [State Layer](../infrastructure/agents/design/state.md): FSM design principles
+
+### Command System
+
+Commands are dispatched through **MCP (Model Context Protocol)** servers:
+
+**HostAgent Commands:**
+
+- **Desktop Capture:** `capture_desktop_screenshot`
+- **Window Management:** `get_desktop_app_info`, `get_app_window`
+- **Process Control:** `launch_application`, `close_application`
+
+**AppAgent Commands:**
+
+- **Screenshot:** `capture_screenshot`, `annotate_screenshot`
+- **UI Inspection:** `get_control_info`, `get_ui_tree`
+- **UI Interaction:** `click`, `set_edit_text`, `wheel_mouse_input`
+- **Control Selection:** `select_control_by_index`, `select_control_by_name`
+
+!!!info "Command Architecture"
+    - 📖 [Command Layer](../infrastructure/agents/design/command.md): MCP integration and command dispatch
+    - 📖 [MCP Servers](../mcp/overview.md): Server architecture and custom server creation
+
+---
+
+
+## Configuration
+
+UFO² integrates with a centralized YAML-based configuration system:
+
+```yaml
+# config/ufo/host_agent_config.yaml
+host_agent:
+  visual_mode: true             # Enable screenshot-based reasoning
+  max_subtasks: 10              # Maximum subtasks per session
+  llm_config:
+    model: "gpt-4o"
+    temperature: 0.0
+
+# config/ufo/app_agent_config.yaml
+app_agent:
+  visual_mode: true             # Enable UI screenshot analysis
+  control_backend: "uia"        # UI Automation (uia) or Win32 (win32)
+  max_steps: 20                 # Maximum steps per subtask
+```
+
+!!!tip "Complete Configuration Guide"
+    For detailed configuration options, model setup, and advanced customization:
+
+    - 📖 **[Configuration & Setup](../configuration/system/overview.md)**: Complete system configuration reference
+    - 📖 **[Model Setup](../configuration/models/overview.md)**: LLM provider configuration (OpenAI, Azure, Gemini, Claude, etc.)
+    - 📖 **[MCP Configuration](../configuration/system/mcp_reference.md)**: MCP server and extension configuration
+
+---
+
+## Quick Start
+
+### Basic Usage
+
+UFO² is designed to be run from the command line:
+
+**Interactive Mode:**
+```powershell
+# Start UFO² in interactive mode
+python -m ufo --task <task_name>
+```
+
+**Example:**
+```powershell
+python -m ufo --task excel_demo
+```
+
+This will prompt you to enter your request interactively:
+```
+Welcome to use UFO🛸, A UI-focused Agent for Windows OS Interaction.
+Please enter your request to be completed🛸: Create a chart from Sheet1 data in Excel
+```
+
+**Direct Request Mode:**
+```powershell
+# Execute with a specific request directly
+python -m ufo --task <task_name> -r "<your_request>"
+```
+
+**Example:**
+```powershell
+python -m ufo --task excel_demo -r "Open Excel and create a chart from Sheet1 data"
+```
+
+!!!tip "Complete Setup Guide"
+    For detailed installation, configuration, and advanced usage options, see the **[Quick Start Guide](../getting_started/quick_start_ufo2.md)**.
+
+### What Happens Under the Hood
+
+1. **Session** creates **HostAgent** with user request
+2. **HostAgent** captures desktop, selects "Microsoft Excel", launches app
+3. **HostAgent** creates **AppAgent** for Excel, delegates subtask
+4. **AppAgent** captures Excel UI, identifies chart insertion control
+5. **AppAgent** executes hybrid action (API if available, GUI fallback)
+6. **AppAgent** reports completion to **HostAgent**
+7. **HostAgent** verifies task, returns success to **Session**
+
+!!!tip "Next Steps"
+    - 📖 [Getting Started Guide](../getting_started/quick_start_ufo2.md)
+    - 📖 [Creating Your AppAgent](../tutorials/creating_app_agent/overview.md)
+
+---
+
+## Documentation Navigation
+
+### Core Concepts
+
+- [**HostAgent**](host_agent/overview.md) — Desktop orchestrator with 7-state FSM
+- [**AppAgent**](app_agent/overview.md) — Application executor with 6-state FSM
+- [**Agent Types**](../infrastructure/agents/agent_types.md) — Platform-specific implementations
+- [**Evaluation Agent**](evaluation/evaluation_agent.md) — Automated testing and benchmarking
+
+### Advanced Features
+
+- [**Hybrid Actions**](core_features/hybrid_actions.md) — GUI–API execution layer
+- [**Control Detection**](core_features/control_detection/overview.md) — UIA + visual grounding
+- [**Knowledge Substrate**](core_features/knowledge_substrate/overview.md) — RAG-enhanced learning
+- [**Multi-Action Execution**](core_features/multi_action.md) — Speculative action planning
+- [**Follower Mode**](advanced_usage/follower_mode.md) — Human-in-the-loop execution
+- [**Batch Mode**](advanced_usage/batch_mode.md) — Bulk task processing
+
+### System Architecture
+
+- [**Device Agent Overview**](../infrastructure/agents/overview.md) — Three-layer architecture
+- [**State Layer**](../infrastructure/agents/design/state.md) — FSM design principles
+- [**Strategy Layer**](../infrastructure/agents/design/processor.md) — Processing framework
+- [**Command Layer**](../infrastructure/agents/design/command.md) — MCP integration
+
+### Development
+
+- [**Creating AppAgent**](../tutorials/creating_app_agent/overview.md) — Custom agent development
+- [**MCP Servers**](../mcp/overview.md) — Building custom MCP servers
+- [**Configuration**](../configuration/system/overview.md) — System configuration reference
+- [**Prompts**](prompts/overview.md) — Prompt engineering guide
+
+### Benchmarking & Logs
+
+- [**Benchmark Overview**](evaluation/benchmark/overview.md) — WindowsAgentArena, OSWorld
+- [**Performance Logs**](evaluation/logs/overview.md) — Execution logs and debugging
+
+---
+
+## Research Impact
+
+UFO² demonstrates that **system-level integration** and **architectural design** matter more than model size alone:
+
+!!!success "Key Findings"
+    - **10%+ improvement** over Claude/Operator on WindowsAgentArena
+    - **51% fewer LLM calls** via speculative multi-action execution
+    - **Robust to UI changes** through hybrid UIA + visual detection
+    - **Continuous learning** without model retraining via RAG
+    - **Non-disruptive UX** via Picture-in-Picture desktop
+
+**Research Paper:**
+📄 [UFO²: A Grounded OS Agent for Windows](https://arxiv.org/abs/2504.14603)
+
+---
+
+## Get Started
+
+Ready to explore UFO²? Choose your path:
+
+!!!info "Learning Paths"
+    - **🚀 New Users:** Start with [Quick Start Guide](../getting_started/quick_start_ufo2.md)
+    - **🔧 Developers:** Read [Creating AppAgent](../tutorials/creating_app_agent/overview.md)
+    - **🏗️ System Architects:** Study [Device Agent Architecture](../infrastructure/agents/overview.md)
+    - **📊 Researchers:** Check [Benchmark Results](evaluation/benchmark/overview.md)
+
+**Next:** [HostAgent Deep Dive](host_agent/overview.md) → Understand desktop orchestration
+
+---
+
+## 🌐 Media Coverage
+
+Check out our official deep dive into UFO in [this YouTube video](https://www.youtube.com/watch?v=QT_OhygMVXU).
+
+UFO sightings have garnered attention from various media outlets, including:
+
+- [微软正式开源UFO²,Windows桌面迈入「AgentOS 时代」](https://www.jiqizhixin.com/articles/2025-05-06-13)
+- [Microsoft's UFO abducts traditional user interfaces for a smarter Windows experience](https://the-decoder.com/microsofts-ufo-abducts-traditional-user-interfaces-for-a-smarter-windows-experience/)
+- [🚀 UFO & GPT-4-V: Sit back and relax, mientras GPT lo hace todo🌌](https://www.linkedin.com/posts/gutierrezfrancois_ai-ufo-microsoft-activity-7176819900399652865-pLoo?utm_source=share&utm_medium=member_desktop)
+- [The AI PC - The Future of Computers? - Microsoft UFO](https://www.youtube.com/watch?v=1k4LcffCq3E)
+- [下一代Windows系统曝光:基于GPT-4V,Agent跨应用调度,代号UFO](https://baijiahao.baidu.com/s?id=1790938358152188625&wfr=spider&for=pc)
+- [下一代智能版 Windows 要来了?微软推出首个 Windows Agent,命名为 UFO!](https://blog.csdn.net/csdnnews/article/details/136161570)
+- [Microsoft発のオープンソース版「UFO」登場! Windowsを自動操縦するAIエージェントを試す](https://internet.watch.impress.co.jp/docs/column/shimizu/1570581.html)
+
+---
+
+## 📚 Citation
+
+If you build on this work, please cite the AgentOS framework:
+
+**UFO² – The Desktop AgentOS (2025)**
+
+
+```bibtex
+@article{zhang2025ufo2,
+ title = {{UFO2: The Desktop AgentOS}},
+ author = {Zhang, Chaoyun and Huang, He and Ni, Chiming and Mu, Jian and Qin, Si and He, Shilin and Wang, Lu and Yang, Fangkai and Zhao, Pu and Du, Chao and Li, Liqun and Kang, Yu and Jiang, Zhao and Zheng, Suzhen and Wang, Rujia and Qian, Jiaxu and Ma, Minghua and Lou, Jian-Guang and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei},
+ journal = {arXiv preprint arXiv:2504.14603},
+ year = {2025}
+}
+```
+
+**UFO – A UI‑Focused Agent for Windows OS Interaction (2024)**
+
+
+```bibtex
+@article{zhang2024ufo,
+ title = {{UFO: A UI-Focused Agent for Windows OS Interaction}},
+ author = {Zhang, Chaoyun and Li, Liqun and He, Shilin and Zhang, Xu and Qiao, Bo and Qin, Si and Ma, Minghua and Kang, Yu and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei and Zhang, Qi},
+ journal = {arXiv preprint arXiv:2402.07939},
+ year = {2024}
+}
+```
+
+---
+
+## 🎨 Related Projects
+
+- **TaskWeaver**: a code-first LLM agent for data analytics
+- **LLM-Brained GUI Agents: A Survey**: [GitHub](https://github.com/vyokky/LLM-Brained-GUI-Agents-Survey) • [Interactive site](https://vyokky.github.io/LLM-Brained-GUI-Agents-Survey/)
+
+---
+
+## ❓Get Help
+
+- ❔GitHub Issues (preferred)
+- For other communications, please contact [ufo-agent@microsoft.com](mailto:ufo-agent@microsoft.com)
+
diff --git a/documents/docs/ufo2/prompts/basic_template.md b/documents/docs/ufo2/prompts/basic_template.md
new file mode 100644
index 000000000..946df861a
--- /dev/null
+++ b/documents/docs/ufo2/prompts/basic_template.md
@@ -0,0 +1,22 @@
+# Basic Prompt Template
+
+The basic prompt template is a fixed format used to generate prompts for the `HostAgent`, `AppAgent`, and `EvaluationAgent`. It includes templates for the `system` and `user` roles to construct each agent's prompt.
+
+Default file paths for basic prompt templates:
+
+| Agent | File Path |
+| --- | --- |
+| HostAgent | [ufo/prompts/share/base/host_agent.yaml](https://github.com/microsoft/UFO/blob/main/ufo/prompts/share/base/host_agent.yaml) |
+| AppAgent | [ufo/prompts/share/base/app_agent.yaml](https://github.com/microsoft/UFO/blob/main/ufo/prompts/share/base/app_agent.yaml) |
+| EvaluationAgent | [ufo/prompts/evaluation/evaluate.yaml](https://github.com/microsoft/UFO/blob/main/ufo/prompts/evaluation/evaluate.yaml) |
+
+You can configure the prompt template in the system configuration files. See the [System Configuration Guide](../../configuration/system/system_config.md) for details.
+
+## Template Structure
+
+Each YAML template contains structured sections for the `system` and `user` roles:
+
+- **System role**: Contains agent instructions, capabilities, and output format requirements
+- **User role**: Defines the structure for runtime context injection (observations, tasks, etc.)
+
+These templates are loaded and populated by the agent's `Prompter` class at runtime. Learn how templates are processed and combined with dynamic content in the [Prompter documentation](../../infrastructure/agents/design/prompter.md).
diff --git a/documents/docs/ufo2/prompts/examples_prompts.md b/documents/docs/ufo2/prompts/examples_prompts.md
new file mode 100644
index 000000000..6c97ed28e
--- /dev/null
+++ b/documents/docs/ufo2/prompts/examples_prompts.md
@@ -0,0 +1,97 @@
+# Example Prompts
+
+Example prompts provide demonstration examples for in-context learning. They are stored in the `ufo/prompts/examples` directory with the following subdirectories:
+
+| Directory | Description |
+| --- | --- |
+| `nonvisual` | Examples for non-visual LLMs |
+| `visual` | Examples for visual LLMs |
+
+You can configure which example prompts to use in the system configuration files. See the [System Configuration Guide](../../configuration/system/system_config.md) for details.
+
+## How Examples Are Used
+
+Example prompts serve as in-context learning demonstrations that help the LLM understand the expected output format and reasoning process. The agent's `Prompter` class:
+
+1. Loads examples from YAML files based on the model type (visual/nonvisual)
+2. Formats them into the system prompt using `examples_prompt_helper()`
+3. Combines them with API documentation and base instructions
+
+See the [Prompter documentation](../../infrastructure/agents/design/prompter.md) for details on how examples are loaded and formatted into the final prompt.
+
+
+## Example Structure
+
+Below are examples for the `HostAgent` and `AppAgent`:
+
+### HostAgent Example
+
+```yaml
+Request: |-
+  My name is Zac. Please send a email to jack@outlook.com to thanks his contribution on the open source.
+Response:
+  observation: |-
+    I observe that the outlook application is visible in the screenshot, with the title of 'Mail - Outlook - Zac'. I can see a list of emails in the application.
+  thought: |-
+    The user request can be solely complete on the outlook application. I need to open the outlook application for the current sub-task. If successful, no further sub-tasks are needed.
+  current_subtask: |-
+    Compose an email to send to Jack (jack@outlook.com) to thank him for his contribution to the open source project on the outlook application, using the name Zac.
+  message:
+    - (1) The name of the sender is Zac.
+    - (2) The email composed should be detailed and professional.
+  status: |-
+    ASSIGN
+  plan: []
+  function: select_application_window
+  arguments:
+    id: "12"
+    name: "Mail - Outlook - Zac"
+  comment: |-
+    It is time to open the outlook application!
+  questions: []
+  result: |-
+    User request in ASSIGN state. Target window 'Mail - Outlook - Zac' (id:12) identified; will call select_application_window to focus Outlook and begin composing.
+```
+
+### AppAgent Example
+
+```yaml
+Request: |-
+  My name is Zac. Please send an email to jack@outlook.com to thank him for his contribution to the open source project.
+Sub-task: |-
+  Compose an email to send to Jack (jack@outlook.com) to thank him for his contribution to the open source project on the Outlook application, using the name Zac.
+Response:
+  observation: |-
+    The screenshot shows that I am on the Main Page of Outlook. The Main Page has a list of control items and received emails. The new email editing window is not opened.
+  thought: |-
+    Based on the screenshots and the control item list, I need to click the New Email button to open a New Email window for the one-step action.
+  action:
+    function: |-
+      click_input
+    arguments:
+      {"id": "1", "name": "New Email", "button": "left", "double": false}
+  status: |-
+    CONTINUE
+  plan:
+    - (1) Input the email address of the receiver.
+    - (2) Input the title of the email.
+    - (3) Input the content of the email.
+    - (4) Click the Send button to send the email.
+  comment: |-
+    After I click the New Email button, the New Email window will be opened and available for composing the email.
+  save_screenshot:
+    {"save": false, "reason": ""}
+  result: |-
+    Successfully clicked the 'New Email' button in Outlook to initiate email composition.
+Tips:
+  - Sending an email is a sensitive action that needs to be confirmed by the user before execution.
+  - You need to draft the content of the email and send it to the receiver.
+```
+
+These examples define the output format the agent is expected to follow and serve as demonstrations for in-context learning.
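+
+As a rough illustration, an example could be sanity-checked before it is added to the example pool. The sketch below verifies that an already-parsed AppAgent example carries the response fields shown above. `REQUIRED_RESPONSE_KEYS` and `missing_response_keys` are hypothetical names, not part of UFO, and the key list is simply read off the example on this page.
+
+```python
+# Hypothetical helper for illustration; not part of the UFO code base.
+# The required keys are an assumption taken from the AppAgent example above.
+REQUIRED_RESPONSE_KEYS = {
+    "observation", "thought", "action", "status",
+    "plan", "comment", "save_screenshot", "result",
+}
+
+
+def missing_response_keys(example: dict) -> set:
+    """Return the AppAgent response fields that the example lacks."""
+    response = example.get("Response") or {}
+    return REQUIRED_RESPONSE_KEYS - set(response)
+
+
+# A parsed example, shortened from the AppAgent example above.
+example = {
+    "Request": "Send an email to jack@outlook.com.",
+    "Sub-task": "Compose the email in Outlook.",
+    "Response": {
+        "observation": "Outlook main page is visible.",
+        "thought": "Click the New Email button.",
+        "action": {"function": "click_input", "arguments": {"id": "1"}},
+        "status": "CONTINUE",
+        "plan": ["(1) Input the email address of the receiver."],
+        "comment": "The New Email window will open.",
+        "save_screenshot": {"save": False, "reason": ""},
+        "result": "Clicked 'New Email'.",
+    },
+}
+
+print(missing_response_keys(example))  # prints set()
+```
+
+A non-empty result names the fields to add before the example is used as a demonstration.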
+
+## Related Documentation
+
+- **[Prompter Design](../../infrastructure/agents/design/prompter.md)** - Learn how examples are loaded and formatted
+- **[Basic Template](./basic_template.md)** - Understand the YAML template structure
+- **[System Configuration](../../configuration/system/system_config.md)** - Configure which examples to use
\ No newline at end of file
diff --git a/documents/docs/ufo2/prompts/overview.md b/documents/docs/ufo2/prompts/overview.md
new file mode 100644
index 000000000..4c5239929
--- /dev/null
+++ b/documents/docs/ufo2/prompts/overview.md
@@ -0,0 +1,53 @@
+# Prompts
+
+All prompts used in UFO are stored in the `ufo/prompts` directory. The folder structure is as follows:
+
+```
+📦prompts
+ ┣ 📂demonstration     # Prompts for summarizing human demonstrations
+ ┣ 📂evaluation        # Prompts for the EvaluationAgent
+ ┣ 📂examples          # Demonstration examples for in-context learning
+ ┃ ┣ 📂nonvisual       # Examples for non-visual LLMs
+ ┃ ┗ 📂visual          # Examples for visual LLMs
+ ┣ 📂experience        # Prompts for summarizing agent self-experience
+ ┣ 📂share             # Shared prompt templates
+ ┃ ┗ 📂base            # Basic version of shared prompts
+ ┃   ┣ 📜api.yaml          # Basic API prompt
+ ┃   ┣ 📜app_agent.yaml    # Basic AppAgent prompt template
+ ┃   ┗ 📜host_agent.yaml   # Basic HostAgent prompt template
+ ┗ 📂third_party       # Third-party integration prompts (e.g., Linux agents)
+```
+
+Visual LLMs can process screenshots while non-visual LLMs rely on text-only control information.
+
+## Agent Prompts
+
+Agent prompts are constructed from the following components:
+
+| Component | Description | Source |
+| --- | --- | --- |
+| **Basic Template** | Base template with system and user roles | YAML files in `share/base/` |
+| **API Documentation** | Skills and APIs available to the agent | Dynamically generated from MCP tools |
+| **Examples** | In-context learning demonstrations | YAML files in `examples/visual/` or `examples/nonvisual/` |
+
+You can find the base templates in the `share/base` directory.
+
+## How Prompts Are Constructed
+
+The agent's `Prompter` class is responsible for:
+
+1. **Loading** YAML templates from the file system
+2. **Formatting** API documentation from available tools
+3. **Selecting** appropriate examples based on model type (visual/nonvisual)
+4. **Combining** all components into a structured message list for the LLM
+5. **Injecting** runtime context (observations, screenshots, retrieved knowledge)
+
+Each agent type has its own specialized Prompter:
+
+- **HostAgentPrompter**: Desktop-level orchestration with third-party agent support
+- **AppAgentPrompter**: Application-level interactions with multi-action capabilities
+- **EvaluationAgentPrompter**: Task evaluation and success assessment
+
+For comprehensive details about the Prompter class architecture, template loading, and prompt construction workflow, see the [Prompter documentation](../../infrastructure/agents/design/prompter.md).
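+
+The construction steps above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the actual `Prompter` implementation: `build_messages`, its parameters, and the `{apis}` / `{examples}` placeholders are hypothetical names chosen for the example.
+
+```python
+from typing import Dict, List
+
+
+def build_messages(
+    system_template: str,
+    api_docs: List[str],
+    examples: List[str],
+    user_context: str,
+) -> List[Dict[str, str]]:
+    """Combine a template, API docs, and examples into a chat message list."""
+    # Fill the (assumed) placeholders of the base template.
+    system_prompt = system_template.format(
+        apis="\n".join(api_docs),
+        examples="\n\n".join(examples),
+    )
+    # Runtime context (request, observations) goes into the user message.
+    return [
+        {"role": "system", "content": system_prompt},
+        {"role": "user", "content": user_context},
+    ]
+
+
+messages = build_messages(
+    system_template="You control apps.\n[APIs]\n{apis}\n[Examples]\n{examples}",
+    api_docs=["click_input(id, name, button, double)"],
+    examples=["Request: open Outlook\nResponse: ..."],
+    user_context="Request: send an email to Jack",
+)
+```
+
+Keeping the template, the API list, and the examples as separate inputs mirrors the component table above: each part can change independently, for instance when switching between visual and non-visual examples.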
+
+
diff --git a/documents/mkdocs.yml b/documents/mkdocs.yml
index 2a7eae41f..ddad6b286 100644
--- a/documents/mkdocs.yml
+++ b/documents/mkdocs.yml
@@ -1,100 +1,240 @@
-site_name: UFO Documentation
+site_name: UFO³ Documentation
nav:
- Home: index.md
+ - Choose Your Path: choose_path.md
- Project Directory Structure: project_directory_structure.md
- - Getting Started:
- - Quick Start: getting_started/quick_start.md
- - More Guidance: getting_started/more_guidance.md
- - Basic Modules:
- - Session: modules/session.md
- - Round: modules/round.md
- - Context: modules/context.md
- - Configurations:
- - User Configuration: configurations/user_configuration.md
- - Developer Configuration: configurations/developer_configuration.md
- - Model Pricing: configurations/pricing_configuration.md
- - Supported Models:
- - Overview: supported_models/overview.md
- - OpenAI: supported_models/openai.md
- - Azure OpenAI: supported_models/azure_openai.md
- - OpenAI CUA (Operator): supported_models/operator.md
- - Gemini: supported_models/gemini.md
- - Claude: supported_models/claude.md
- - Qwen: supported_models/qwen.md
- - DeepSeek: supported_models/deepseek.md
- - Ollama: supported_models/ollama.md
- - Custom Model: supported_models/custom_model.md
- - Agents:
- - Overview: agents/overview.md
- - Agent Design:
- - Memory: agents/design/memory.md
- - Blackboard: agents/design/blackboard.md
- - State: agents/design/state.md
- - Prompter: agents/design/prompter.md
- - Processor: agents/design/processor.md
- - HostAgent: agents/host_agent.md
- - AppAgent: agents/app_agent.md
- - FollowerAgent: agents/follower_agent.md
- - EvaluationAgent: agents/evaluation_agent.md
- - Prompts:
- - Overview: prompts/overview.md
- - Basic Prompts: prompts/basic_template.md
- - Examples Prompts: prompts/examples_prompts.md
- - API Prompts: prompts/api_prompts.md
- - Puppeteer:
- - Overview: automator/overview.md
- - GUI Automator: automator/ui_automator.md
- - API Automator: automator/wincom_automator.md
- - Web Automator: automator/web_automator.md
- - Bash Automator: automator/bash_automator.md
- - AI Tool: automator/ai_tool_automator.md
- - Logs:
- - Overview: logs/overview.md
- - Markdown Log Viewer: logs/markdown_log_viewer.md
- - Request Logs: logs/request_logs.md
- - Step Logs: logs/step_logs.md
- - Evaluation Logs: logs/evaluation_logs.md
- - Screenshots: logs/screenshots_logs.md
- - UI Tree: logs/ui_tree_logs.md
- - Advanced Usage:
- - Continuous Knowledge Substrate:
- - Overview: advanced_usage/reinforce_appagent/overview.md
- - Learning from Help Document: advanced_usage/reinforce_appagent/learning_from_help_document.md
- - Learning from Bing Search: advanced_usage/reinforce_appagent/learning_from_bing_search.md
- - Experience Learning: advanced_usage/reinforce_appagent/experience_learning.md
- - Learning from User Demonstration: advanced_usage/reinforce_appagent/learning_from_demonstration.md
- - Follower Mode: advanced_usage/follower_mode.md
- - Batch Mode: advanced_usage/batch_mode.md
- - Speculative Multi-Action Execution: advanced_usage/multi_action.md
- - Operator-as-a-AppAgent: advanced_usage/operator_as_app_agent.md
- - Control Filtering:
- - Overview: advanced_usage/control_filtering/overview.md
- - Text Filtering: advanced_usage/control_filtering/text_filtering.md
- - Semantic Filtering: advanced_usage/control_filtering/semantic_filtering.md
- - Icon Filtering: advanced_usage/control_filtering/icon_filtering.md
- - Control Detection:
- - Overview: advanced_usage/control_detection/overview.md
- - UIA Detection: advanced_usage/control_detection/uia_detection.md
- - Visual Detection: advanced_usage/control_detection/visual_detection.md
- - Hybrid Detection: advanced_usage/control_detection/hybrid_detection.md
- - Customization: advanced_usage/customization.md
- - Creating Your AppAgent:
- - Overview: creating_app_agent/overview.md
- - Help Document Provision: creating_app_agent/help_document_provision.md
- - Demonstration Provision: creating_app_agent/demonstration_provision.md
- - Warpping App-Native API: creating_app_agent/warpping_app_native_api.md
- - Benchmark:
- - Overview: benchmark/overview.md
- - Windows Agent Arena: benchmark/windows_agent_arena.md
- - OSWorld (Windows): benchmark/osworld.md
- - Dataflow:
- - Overview: dataflow/overview.md
- - Instantiation: dataflow/instantiation.md
- - Execution: dataflow/execution.md
- - Windows App Environment: dataflow/windows_app_env.md
- - Result: dataflow/result.md
+ - Getting Started:
+ - Quick Start (UFO³ Agent Galaxy): getting_started/quick_start_galaxy.md
+ - Quick Start (UFO²): getting_started/quick_start_ufo2.md
+ - Quick Start (Linux Agent): getting_started/quick_start_linux.md
+ - Quick Start (Mobile Agent): getting_started/quick_start_mobile.md
+ - Migration UFO² → UFO³: getting_started/migration_ufo2_to_galaxy.md
+ - More Guidance: getting_started/more_guidance.md
+ - Configuration & Setup:
+ - Configuration System:
+ - Overview: configuration/system/overview.md
+ - Agent Configuration: configuration/system/agents_config.md
+ - System Configuration: configuration/system/system_config.md
+ - RAG Configuration: configuration/system/rag_config.md
+ - Pricing Configuration: configuration/system/prices_config.md
+ - Third-Party Configuration: configuration/system/third_party_config.md
+ - MCP Reference: configuration/system/mcp_reference.md
+ - Migration Guide: configuration/system/migration.md
+ - Extending Configuration: configuration/system/extending.md
+ - Galaxy Configuration:
+ - Devices: configuration/system/galaxy_devices.md
+ - Constellation: configuration/system/galaxy_constellation.md
+ - Agent: configuration/system/galaxy_agent.md
+ - Model Setup:
+ - Overview: configuration/models/overview.md
+ - OpenAI: configuration/models/openai.md
+ - Azure OpenAI: configuration/models/azure_openai.md
+ - OpenAI CUA (Operator): configuration/models/operator.md
+ - Gemini: configuration/models/gemini.md
+ - Claude: configuration/models/claude.md
+ - Qwen: configuration/models/qwen.md
+ - DeepSeek: configuration/models/deepseek.md
+ - Ollama: configuration/models/ollama.md
+ - Custom Model: configuration/models/custom_model.md
+ - UFO³ Agent Galaxy:
+ - Overview: galaxy/overview.md
+ - WebUI: galaxy/webui.md
+ - Galaxy Client:
+ - Overview: galaxy/client/overview.md
+ - ConstellationClient: galaxy/client/constellation_client.md
+ - DeviceManager: galaxy/client/device_manager.md
+ - Components: galaxy/client/components.md
+ - AIP Integration: galaxy/client/aip_integration.md
+ - GalaxyClient: galaxy/client/galaxy_client.md
+ - Agent Registration:
+ - Overview: galaxy/agent_registration/overview.md
+ - Agent Profile: galaxy/agent_registration/agent_profile.md
+ - Device Registry: galaxy/agent_registration/device_registry.md
+ - Registration Flow: galaxy/agent_registration/registration_flow.md
+ - Task Constellation (DAG):
+ - Overview: galaxy/constellation/overview.md
+ - TaskStar: galaxy/constellation/task_star.md
+ - TaskStarLine: galaxy/constellation/task_star_line.md
+ - TaskConstellation: galaxy/constellation/task_constellation.md
+ - ConstellationEditor: galaxy/constellation/constellation_editor.md
+ - Constellation Agent:
+ - Overview: galaxy/constellation_agent/overview.md
+ - State Machine: galaxy/constellation_agent/state.md
+ - Strategy Pattern: galaxy/constellation_agent/strategy.md
+ - MCP Commands: galaxy/constellation_agent/command.md
+ - Constellation Orchestrator:
+ - Overview: galaxy/constellation_orchestrator/overview.md
+ - Event-Driven Coordination: galaxy/constellation_orchestrator/event_driven_coordination.md
+ - Asynchronous Scheduling: galaxy/constellation_orchestrator/asynchronous_scheduling.md
+ - Safe Assignment Locking: galaxy/constellation_orchestrator/safe_assignment_locking.md
+ - Consistency Guarantees: galaxy/constellation_orchestrator/consistency_guarantees.md
+ - Batched Editing: galaxy/constellation_orchestrator/batched_editing.md
+ - Constellation Manager: galaxy/constellation_orchestrator/constellation_manager.md
+ - API Reference: galaxy/constellation_orchestrator/api_reference.md
+ - Observer System:
+ - Overview: galaxy/observer/overview.md
+ - Event System: galaxy/observer/event_system.md
+ - Progress Observer: galaxy/observer/progress_observer.md
+ - Agent Output Observer: galaxy/observer/agent_output_observer.md
+ - Synchronizer: galaxy/observer/synchronizer.md
+ - Metrics Observer: galaxy/observer/metrics_observer.md
+ - Visualization Observer: galaxy/observer/visualization_observer.md
+ - Evaluation & Logging:
+ - Trajectory Report: galaxy/evaluation/trajectory_report.md
+ - Performance Metrics: galaxy/evaluation/performance_metrics.md
+ - Result JSON Reference: galaxy/evaluation/result_json.md
+ - UFO² Desktop AgentOS:
+ - Overview: ufo2/overview.md
+ - Using as Galaxy Device: ufo2/as_galaxy_device.md
+ - HostAgent:
+ - Overview: ufo2/host_agent/overview.md
+ - State Machine: ufo2/host_agent/state.md
+ - Processing Strategy: ufo2/host_agent/strategy.md
+ - Command System: ufo2/host_agent/commands.md
+ - AppAgent:
+ - Overview: ufo2/app_agent/overview.md
+ - State Machine: ufo2/app_agent/state.md
+ - Processing Strategy: ufo2/app_agent/strategy.md
+ - Command System: ufo2/app_agent/commands.md
+ - Core Features:
+ - Hybrid GUI–API Actions: ufo2/core_features/hybrid_actions.md
+ - Control Detection:
+ - Overview: ufo2/core_features/control_detection/overview.md
+ - UIA Detection: ufo2/core_features/control_detection/uia_detection.md
+ - Visual Detection: ufo2/core_features/control_detection/visual_detection.md
+ - Hybrid Detection: ufo2/core_features/control_detection/hybrid_detection.md
+ - Knowledge Substrate:
+ - Overview: ufo2/core_features/knowledge_substrate/overview.md
+ - Help Documents: ufo2/core_features/knowledge_substrate/learning_from_help_document.md
+ - Bing Search: ufo2/core_features/knowledge_substrate/learning_from_bing_search.md
+ - Experience Learning: ufo2/core_features/knowledge_substrate/experience_learning.md
+ - Demos: ufo2/core_features/knowledge_substrate/learning_from_demonstration.md
+ - Speculative Multi-Action: ufo2/core_features/multi_action.md
+ - Advanced Usage:
+ - Follower Mode: ufo2/advanced_usage/follower_mode.md
+ - Batch Mode: ufo2/advanced_usage/batch_mode.md
+ - Operator Integration: ufo2/advanced_usage/operator_as_app_agent.md
+ - Customization: ufo2/advanced_usage/customization.md
+ - Prompts:
+ - Overview: ufo2/prompts/overview.md
+ - Basic Template: ufo2/prompts/basic_template.md
+ - Examples: ufo2/prompts/examples_prompts.md
+ - Evaluation:
+ - EvaluationAgent: ufo2/evaluation/evaluation_agent.md
+ - Benchmark Overview: ufo2/evaluation/benchmark/overview.md
+ - Windows Agent Arena: ufo2/evaluation/benchmark/windows_agent_arena.md
+ - OSWorld (Windows): ufo2/evaluation/benchmark/osworld.md
+ - Performance Logs:
+ - Overview: ufo2/evaluation/logs/overview.md
+ - Evaluation Logs: ufo2/evaluation/logs/evaluation_logs.md
+ - Markdown Log Viewer: ufo2/evaluation/logs/markdown_log_viewer.md
+ - Request Logs: ufo2/evaluation/logs/request_logs.md
+ - Screenshots Logs: ufo2/evaluation/logs/screenshots_logs.md
+ - Step Logs: ufo2/evaluation/logs/step_logs.md
+ - UI Tree Logs: ufo2/evaluation/logs/ui_tree_logs.md
+ - Dataflow:
+ - Overview: ufo2/dataflow/overview.md
+ - Instantiation: ufo2/dataflow/instantiation.md
+ - Execution: ufo2/dataflow/execution.md
+ - Windows App Environment: ufo2/dataflow/windows_app_env.md
+ - Result: ufo2/dataflow/result.md
+ - Linux Agent:
+ - Overview: linux/overview.md
+ - Using as Galaxy Device: linux/as_galaxy_device.md
+ - State Machine: linux/state.md
+ - Processing Strategy: linux/strategy.md
+ - MCP Commands: linux/commands.md
+ - Mobile Agent:
+ - Overview: mobile/overview.md
+ - Using as Galaxy Device: mobile/as_galaxy_device.md
+ - State Machine: mobile/state.md
+ - Processing Strategy: mobile/strategy.md
+ - MCP Commands: mobile/commands.md
+ - Tutorials & Development:
+ - Creating Custom MCP Servers: tutorials/creating_mcp_servers.md
+ - Creating Custom Third-Party Agents: tutorials/creating_third_party_agents.md
+ - Creating Custom Device Agents:
+ - Overview: tutorials/creating_device_agent/overview.md
+ - Index: tutorials/creating_device_agent/index.md
+ - Client Setup: tutorials/creating_device_agent/client_setup.md
+ - Core Components: tutorials/creating_device_agent/core_components.md
+ - MCP Server: tutorials/creating_device_agent/mcp_server.md
+ - Configuration: tutorials/creating_device_agent/configuration.md
+ - Testing: tutorials/creating_device_agent/testing.md
+ - Example Mobile Agent: tutorials/creating_device_agent/example_mobile_agent.md
+ - Enhancing AppAgent Capabilities:
+ - Overview: tutorials/creating_app_agent/overview.md
+ - Help Document Provision: tutorials/creating_app_agent/help_document_provision.md
+ - Demonstration Provision: tutorials/creating_app_agent/demonstration_provision.md
+ - Wrapping App-Native APIs: tutorials/creating_app_agent/warpping_app_native_api.md
+ - Infrastructure:
+ - Basic Modules:
+ - Overview: infrastructure/modules/overview.md
+ - Session: infrastructure/modules/session.md
+ - Round: infrastructure/modules/round.md
+ - Context: infrastructure/modules/context.md
+ - Dispatcher: infrastructure/modules/dispatcher.md
+ - Session Factory & Pool: infrastructure/modules/session_pool.md
+ - Platform Sessions: infrastructure/modules/platform_sessions.md
+ - Device Agent Architecture:
+ - Overview: infrastructure/agents/overview.md
+ - Agent Types & Implementation: infrastructure/agents/agent_types.md
+ - Server-Client Architecture: infrastructure/agents/server_client_architecture.md
+ - Architecture Layers:
+ - State Layer (Level-1): infrastructure/agents/design/state.md
+ - Strategy Layer (Level-2): infrastructure/agents/design/processor.md
+ - Strategy Components: infrastructure/agents/design/strategy.md
+ - Command Layer (Level-3): infrastructure/agents/design/command.md
+ - Supporting Systems:
+ - Memory System: infrastructure/agents/design/memory.md
+ - Blackboard: infrastructure/agents/design/blackboard.md
+ - Prompter: infrastructure/agents/design/prompter.md
+ - Agent Interaction Protocol (AIP):
+ - Overview: aip/overview.md
+ - Message Reference: aip/messages.md
+ - Protocol Guide: aip/protocols.md
+ - Transport Layer: aip/transport.md
+ - Endpoints: aip/endpoints.md
+ - Resilience: aip/resilience.md
+ - Agent Server:
+ - Overview: server/overview.md
+ - Quick Start: server/quick_start.md
+ - Session Manager: server/session_manager.md
+ - WebSocket Handler: server/websocket_handler.md
+ - Client Connection Manager: server/client_connection_manager.md
+ - HTTP API: server/api.md
+ - Monitoring: server/monitoring.md
+ - Agent Client:
+ - Overview: client/overview.md
+ - Quick Start: client/quick_start.md
+ - WebSocket Client: client/websocket_client.md
+ - UFO Client: client/ufo_client.md
+ - Computer Manager: client/computer_manager.md
+ - Computer: client/computer.md
+ - Device Info Provider: client/device_info.md
+ - MCP Integration: client/mcp_integration.md
+ - MCP (Model Context Protocol):
+ - Overview: mcp/overview.md
+ - Data Collection Servers: mcp/data_collection.md
+ - Action Servers: mcp/action.md
+ - Configuration Guide: mcp/configuration.md
+ - Local Servers: mcp/local_servers.md
+ - Remote Servers: mcp/remote_servers.md
+ - Server Reference:
+ - UICollector: mcp/servers/ui_collector.md
+ - HostUIExecutor: mcp/servers/host_ui_executor.md
+ - AppUIExecutor: mcp/servers/app_ui_executor.md
+ - CommandLineExecutor: mcp/servers/command_line_executor.md
+ - WordCOMExecutor: mcp/servers/word_com_executor.md
+ - ExcelCOMExecutor: mcp/servers/excel_com_executor.md
+ - PowerPointCOMExecutor: mcp/servers/ppt_com_executor.md
+ - PDFReaderExecutor: mcp/servers/pdf_reader_executor.md
+ - ConstellationEditor: mcp/servers/constellation_editor.md
+ - HardwareExecutor: mcp/servers/hardware_executor.md
+ - BashExecutor: mcp/servers/bash_executor.md
+ - MobileExecutor: mcp/servers/mobile_executor.md
- About:
- Contributing: about/CONTRIBUTING.md
- License: about/LICENSE.md
@@ -107,19 +247,37 @@ markdown_extensions:
- pymdownx.tasklist
- admonition
-# theme:
-# name: material
-# palette:
-# primary: blue
-# accent: light-blue
-# font:
-# text: Roboto
-# code: Roboto Mono
-
theme:
name: readthedocs
analytics:
- gtag: G-FX17ZGJYGC
+ favicon: ./assets/ufo_blue.png
+ features:
+ - content.code.annotate
+ - content.code.copy
+ - content.code.select
+ - content.tooltips
+ - content.tabs.link
+
+extra_javascript:
+ - https://unpkg.com/mermaid@10.6.1/dist/mermaid.min.js
+ - javascripts/mermaid-init.js
+ - https://polyfill.io/v3/polyfill.min.js?features=es6
+ - https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js
+
+
+markdown_extensions:
+ - admonition
+ - attr_list
+ - md_in_html
+ - pymdownx.arithmatex:
+ generic: true
+ - pymdownx.superfences:
+ custom_fences:
+ - name: mermaid
+ class: mermaid
+ format: !!python/name:pymdownx.superfences.fence_div_format
+
plugins:
@@ -127,15 +285,9 @@ plugins:
- mkdocstrings:
handlers:
python:
- paths: ["../ufo", "../record_processor", "../dataflow"]
+ paths: ["../ufo", "../record_processor", "../dataflow", "../config", "../aip", ".."]
options:
docstring_style: sphinx
docstring_section_style: list
merge_init_into_class: true
show_docstring_returns: true
-
-
-
-
- # logo: ./assets/ufo_blue.png
-favicon: ./assets/ufo_blue.png
diff --git a/galaxy/README.md b/galaxy/README.md
new file mode 100644
index 000000000..860659b1c
--- /dev/null
+++ b/galaxy/README.md
@@ -0,0 +1,751 @@
+# UFO³: Weaving the Digital Agent Galaxy
+
+*Cross-Device Orchestration Framework for Ubiquitous Intelligent Automation*
+
+---
+
+## 🌟 What is UFO³ Galaxy?
+
+**UFO³ Galaxy** is a revolutionary **cross-device orchestration framework** that transforms isolated device agents into a unified digital ecosystem. It models complex user requests as **Task Constellations** (星座) — dynamic distributed DAGs where nodes represent executable subtasks and edges capture dependencies across heterogeneous devices.
+
+### 🎯 The Vision
+
+Building truly ubiquitous intelligent agents requires moving beyond single-device automation. UFO³ Galaxy addresses four fundamental challenges in cross-device agent orchestration:
+
+
+
+
+
+**🔄 Asynchronous Parallelism**
+Enabling concurrent task execution across multiple devices while maintaining correctness through event-driven coordination and safe concurrency control
+
+**⚡ Dynamic Adaptation**
+Real-time workflow evolution in response to intermediate results, transient failures, and runtime observations without workflow abortion
+
+
+
+
+**🌐 Distributed Coordination**
+Reliable, low-latency communication across heterogeneous devices via WebSocket-based Agent Interaction Protocol with fault tolerance
+
+**🛡️ Safety Guarantees**
+Formal invariants ensuring DAG consistency during concurrent modifications and parallel execution, verified through rigorous proofs
+
+
+
+
+
+---
+
+## ✨ Key Innovations
+
+UFO³ Galaxy realizes cross-device orchestration through three tightly integrated design principles:
+
+---
+
+### 🌟 Declarative Decomposition into Dynamic DAG
+
+User requests are decomposed by the **ConstellationAgent** into a structured DAG of **TaskStars** (nodes) and **TaskStarLines** (edges) encoding workflow logic, dependencies, and device assignments.
+
+**Key Benefits:** Declarative structure for automated scheduling • Runtime introspection • Dynamic rewriting • Cross-device orchestration
+
+
+
+
+
+---
+
+
+
+
+
+### 🔄 Continuous Result-Driven Graph Evolution
+
+The **TaskConstellation** evolves dynamically in response to execution feedback, intermediate results, and failures through controlled DAG rewrites.
+
+**Adaptation Mechanisms:**
+- 🩺 Diagnostic TaskStars for debugging
+- 🛡️ Fallback creation for error recovery
+- 🔗 Dependency rewiring for optimization
+- ✂️ Node pruning after completion
+
+Enables resilient adaptation instead of workflow abortion.
+
+
+
+
+### ⚡ Heterogeneous, Asynchronous & Safe Orchestration
+
+Tasks are matched to optimal devices via **AgentProfiles** (OS, hardware, tools) and executed asynchronously in parallel.
+
+**Safety Guarantees:**
+- 🔒 Safe assignment locking (no race conditions)
+- 📅 Event-driven scheduling (DAG readiness)
+- ✅ DAG consistency checks (structural integrity)
+- 🔄 Batched edits (atomicity)
+- 📐 Formal verification (provable correctness)
+
+Ensures high efficiency with reliability.
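+
+To make "DAG readiness" concrete, the sketch below groups tasks into waves in which every task has all of its dependencies satisfied, using Python's standard `graphlib` module. It is a deliberate simplification: the real orchestrator is described as event-driven rather than wave-based, and `execution_waves` and the task names are hypothetical.
+
+```python
+from graphlib import TopologicalSorter
+
+
+def execution_waves(dependencies: dict) -> list:
+    """Group tasks into waves; tasks in one wave have no unmet dependencies."""
+    sorter = TopologicalSorter(dependencies)
+    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
+    waves = []
+    while sorter.is_active():
+        # Everything returned here could run in parallel on different devices.
+        ready = sorted(sorter.get_ready())
+        waves.append(ready)
+        sorter.done(*ready)
+    return waves
+
+
+# Each key is a task; each value is the set of tasks it depends on.
+deps = {
+    "extract_excel": set(),
+    "collect_logs": set(),
+    "analyze": {"extract_excel", "collect_logs"},
+    "report": {"analyze"},
+}
+waves = execution_waves(deps)
+# waves == [['collect_logs', 'extract_excel'], ['analyze'], ['report']]
+```
+
+The first wave contains two independent tasks, which is where parallel execution across devices becomes possible; a cycle in the dependencies is rejected up front instead of deadlocking at run time.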
+
+
+
+ 🎯 Together, these designs enable UFO³ to decompose, schedule, execute, and adapt distributed tasks efficiently while maintaining safety and consistency across heterogeneous devices.
+
+
+---
+
+## 🎥 Demo Video
+
+See UFO³ Galaxy in action with this comprehensive demonstration of cross-device orchestration:
+
+*🎬 Demo video: multi-device workflow orchestration with UFO³ Galaxy*
+
+
+---
+
+## 🏗️ Architecture Overview
+
+
+
+*Figure: UFO³ Galaxy layered architecture, from natural language to distributed execution*
+
+
+### Hierarchical Design
+
+
+
+
+
+#### 🎛️ Control Plane
+
+| Component | Role |
+|-----------|------|
+| **🌐 ConstellationClient** | Global device registry with capability profiles |
+| **🖥️ Device Agents** | Local orchestration with unified MCP tools |
+| **🔒 Clean Separation** | Global policies & device independence |
+
+
+
+
+#### 🔄 Execution Workflow
+
+
+
+
+
+
+
+
+
+---
+
+## 🚀 Quick Start
+
+### 🛠️ Step 1: Installation
+
+```powershell
+# Clone repository
+git clone https://github.com/microsoft/UFO.git
+cd UFO
+
+# Create environment (recommended)
+conda create -n ufo3 python=3.10
+conda activate ufo3
+
+# Install dependencies
+pip install -r requirements.txt
+```
+
+### ⚙️ Step 2: Configure ConstellationAgent LLM
+
+UFO³ Galaxy uses a **ConstellationAgent** that orchestrates all device agents. Configure its LLM settings:
+
+```powershell
+# Create configuration from template
+copy config\galaxy\agent.yaml.template config\galaxy\agent.yaml
+notepad config\galaxy\agent.yaml
+```
+
+**Configuration File Location:**
+```
+config/galaxy/
+├── agent.yaml.template # Template - COPY THIS
+├── agent.yaml # Your config with API keys (DO NOT commit)
+└── devices.yaml # Device pool configuration (Step 4)
+```
+
+**OpenAI Configuration:**
+```yaml
+CONSTELLATION_AGENT:
+  REASONING_MODEL: false
+  API_TYPE: "openai"
+  API_BASE: "https://api.openai.com/v1/chat/completions"
+  API_KEY: "sk-YOUR_KEY_HERE"
+  API_VERSION: "2025-02-01-preview"
+  API_MODEL: "gpt-5-chat-20251003"
+  # ... (prompt configurations use defaults)
+```
+
+**Azure OpenAI Configuration:**
+```yaml
+CONSTELLATION_AGENT:
+  REASONING_MODEL: false
+  API_TYPE: "aoai"
+  API_BASE: "https://YOUR_RESOURCE.openai.azure.com"
+  API_KEY: "YOUR_AOAI_KEY"
+  API_VERSION: "2024-02-15-preview"
+  API_MODEL: "gpt-5-chat-20251003"
+  API_DEPLOYMENT_ID: "YOUR_DEPLOYMENT_ID"
+  # ... (prompt configurations use defaults)
+```
+
+### 🖥️ Step 3: Configure Device Agents
+
+Each device agent (Windows/Linux) needs its own LLM configuration to execute tasks.
+
+```powershell
+# Configure device agent LLMs
+copy config\ufo\agents.yaml.template config\ufo\agents.yaml
+notepad config\ufo\agents.yaml
+```
+
+**Configuration File Location:**
+```
+config/ufo/
+├── agents.yaml.template # Template - COPY THIS
+└── agents.yaml # Device agent LLM config (DO NOT commit)
+```
+
+**Example Configuration:**
+```yaml
+HOST_AGENT:
+  VISUAL_MODE: true
+  API_TYPE: "openai"  # or "aoai" for Azure OpenAI
+  API_BASE: "https://api.openai.com/v1/chat/completions"
+  API_KEY: "sk-YOUR_KEY_HERE"
+  API_MODEL: "gpt-4o"
+
+APP_AGENT:
+  VISUAL_MODE: true
+  API_TYPE: "openai"
+  API_BASE: "https://api.openai.com/v1/chat/completions"
+  API_KEY: "sk-YOUR_KEY_HERE"
+  API_MODEL: "gpt-4o"
+```
+
+> **💡 Tip:** You can use the same API key and model for both ConstellationAgent (Step 2) and device agents (Step 3).
+
+### 🌐 Step 4: Configure Device Pool
+
+```powershell
+# Configure available devices
+copy config\galaxy\devices.yaml.template config\galaxy\devices.yaml
+notepad config\galaxy\devices.yaml
+```
+
+**Example Device Configuration:**
+```yaml
+devices:
+  # Windows Device (UFO²)
+  - device_id: "windows_device_1"          # Must match --client-id
+    server_url: "ws://localhost:5000/ws"   # Must match server WebSocket URL
+    os: "windows"
+    capabilities:
+      - "desktop_automation"
+      - "office_applications"
+      - "excel"
+      - "word"
+      - "outlook"
+      - "email"
+      - "web_browsing"
+    metadata:
+      os: "windows"
+      version: "11"
+      performance: "high"
+      installed_apps:
+        - "Microsoft Excel"
+        - "Microsoft Word"
+        - "Microsoft Outlook"
+        - "Google Chrome"
+      description: "Primary Windows desktop for office automation"
+    auto_connect: true
+    max_retries: 5
+
+  # Linux Device
+  - device_id: "linux_device_1"            # Must match --client-id
+    server_url: "ws://localhost:5001/ws"   # Must match server WebSocket URL
+    os: "linux"
+    capabilities:
+      - "server_management"
+      - "log_analysis"
+      - "file_operations"
+      - "database_operations"
+    metadata:
+      os: "linux"
+      performance: "medium"
+      logs_file_path: "/var/log/myapp/app.log"
+      dev_path: "/home/user/projects/"
+      warning_log_pattern: "WARN"
+      error_log_pattern: "ERROR|FATAL"
+      description: "Development server for backend operations"
+    auto_connect: true
+    max_retries: 5
+```
+
+> **⚠️ Critical: IDs and URLs Must Match**
+> - `device_id` **must exactly match** the `--client-id` flag
+> - `server_url` **must exactly match** the server WebSocket URL
+> - Otherwise, Galaxy cannot control the device!
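+
+A mismatch between these values is easy to miss, so a small pre-flight check can help. The sketch below compares one device entry (as a plain dictionary, for example after parsing `devices.yaml`) with the values you plan to pass on the command line. `check_device` is a hypothetical helper and not part of Galaxy; comparing only the port of `server_url` is a simplifying assumption.
+
+```python
+from urllib.parse import urlparse
+
+
+def check_device(entry: dict, client_id: str, server_port: int) -> list:
+    """Return human-readable mismatches between a device entry and launch flags."""
+    problems = []
+    if entry.get("device_id") != client_id:
+        problems.append(
+            f"device_id {entry.get('device_id')!r} != --client-id {client_id!r}"
+        )
+    url = urlparse(entry.get("server_url", ""))
+    if url.scheme not in ("ws", "wss"):
+        problems.append(f"server_url scheme is {url.scheme!r}, expected ws or wss")
+    if url.port != server_port:
+        problems.append(f"server_url port {url.port} != server --port {server_port}")
+    return problems
+
+
+entry = {"device_id": "windows_device_1", "server_url": "ws://localhost:5000/ws"}
+print(check_device(entry, client_id="windows_device_1", server_port=5000))  # prints []
+```
+
+An empty list means the entry is consistent with the flags used in the next step; any returned message points at the value to fix.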
+
+### 🖥️ Step 5: Start Device Agents
+
+Galaxy orchestrates **device agents** that execute tasks on individual machines. You need to start the appropriate device agents based on your needs.
+
+#### Example: Quick Windows Device Setup
+
+**On your Windows machine:**
+
+```powershell
+# Terminal 1: Start UFO² Server
+python -m ufo.server.app --port 5000
+
+# Terminal 2: Start UFO² Client (connect to server)
+python -m ufo.client.client `
+  --ws `
+  --ws-server ws://localhost:5000/ws `
+  --client-id windows_device_1 `
+  --platform windows
+```
+
+> **⚠️ Important: Platform Flag Required**
+> Always include `--platform windows` for Windows devices and `--platform linux` for Linux devices!
+
+#### Example: Quick Linux Device Setup
+
+**On your Linux machine:**
+
+```bash
+# Terminal 1: Start Device Agent Server
+python -m ufo.server.app --port 5001
+
+# Terminal 2: Start Linux Client (connect to server)
+python -m ufo.client.client \
+  --ws \
+  --ws-server ws://localhost:5001/ws \
+  --client-id linux_device_1 \
+  --platform linux
+
+# Terminal 3: Start HTTP MCP Server (for Linux tools)
+python -m ufo.client.mcp.http_servers.linux_mcp_server
+```
+
+**📖 Detailed Setup Instructions:**
+- **For Windows devices (UFO²):** See [UFO² as Galaxy Device](../documents/docs/ufo2/as_galaxy_device.md)
+- **For Linux devices:** See [Linux as Galaxy Device](../documents/docs/linux/as_galaxy_device.md)
+
+### 🌌 Step 6: Launch Galaxy Client
+
+#### 🎨 Interactive WebUI Mode (Recommended)
+
+Launch Galaxy with an interactive web interface for real-time constellation visualization and monitoring:
+
+```powershell
+python -m galaxy --webui
+```
+
+This will start the Galaxy server with WebUI and open your browser to the interactive interface:
+
+
+
+*🎨 Galaxy WebUI: interactive constellation visualization and chat interface*
+
+
+**WebUI Features:**
+- 🗣️ **Chat Interface**: Submit requests and interact with ConstellationAgent in real-time
+- 📊 **Live DAG Visualization**: Watch task constellation formation and execution
+- 🎯 **Task Status Tracking**: Monitor each TaskStar's progress and completion
+- 🔄 **Dynamic Updates**: See constellation evolution as tasks complete
+- 📱 **Responsive Design**: Works on desktop and tablet devices
+
+**Default URL:** `http://localhost:8000` (automatically finds next available port if 8000 is occupied)
+
+---
+
+#### 💬 Interactive Terminal Mode
+
+For command-line interaction:
+
+```powershell
+python -m galaxy --interactive
+```
+
+---
+
+#### ⚡ Direct Request Mode
+
+Execute a single request and exit:
+
+```powershell
+python -m galaxy --request "Extract data from Excel on Windows, process with Python on Linux, and generate visualization report"
+```
+
+---
+
+#### 🔧 Programmatic API
+
+Embed Galaxy in your Python applications:
+
+```python
+import asyncio
+
+from galaxy.galaxy_client import GalaxyClient
+
+
+async def main():
+    # Initialize client
+    client = GalaxyClient(session_name="data_pipeline")
+    await client.initialize()
+
+    try:
+        # Execute cross-device workflow
+        result = await client.process_request(
+            "Download sales data, analyze trends, generate executive summary"
+        )
+
+        # Access constellation details
+        constellation = client.session.constellation
+        print(f"Tasks executed: {len(constellation.tasks)}")
+        print(f"Devices used: {set(t.assigned_device for t in constellation.tasks)}")
+    finally:
+        # Always release device connections, even if the request fails
+        await client.shutdown()
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+
+---
+
+## 🎯 Use Cases
+
+### 🖥️ Software Development & CI/CD
+
+**Request:**
+*"Clone repository on Windows, build Docker image on Linux GPU server, deploy to staging, and run test suite on CI cluster"*
+
+**Constellation Workflow:**
+```
+Clone (Windows) → Build (Linux GPU) → Deploy (Linux Server) → Test (Linux CI)
+```
+
+**Benefit:** Parallel execution reduces pipeline time by 60%
+
+---
+
+### 📊 Data Science Workflows
+
+**Request:**
+*"Fetch dataset from cloud storage, preprocess on Linux workstation, train model on A100 node, visualize results on Windows"*
+
+**Constellation Workflow:**
+```
+Fetch (Any) → Preprocess (Linux) → Train (Linux GPU) → Visualize (Windows)
+```
+
+**Benefit:** Automatic GPU detection and optimal device assignment
+
+---
+
+### 📝 Cross-Platform Document Processing
+
+**Request:**
+*"Extract data from Excel on Windows, process with Python on Linux, generate PDF report, and email summary"*
+
+**Constellation Workflow:**
+```
+Extract (Windows) → Process (Linux) ┬→ Generate PDF (Windows)
+                                    └→ Send Email (Windows)
+```
+
+**Benefit:** Parallel report generation and email delivery
+
+---
+
+### 🔬 Distributed System Monitoring
+
+**Request:**
+*"Collect server logs from all Linux machines, analyze for errors, generate alerts, create consolidated report"*
+
+**Constellation Workflow:**
+```
+┌→ Collect (Linux 1) ┐
+├→ Collect (Linux 2) ├→ Analyze (Any) → Report (Windows)
+└→ Collect (Linux 3) ┘
+```
+
+**Benefit:** Parallel log collection with automatic aggregation
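
The fan-in shape above can be written down as a plain dependency map. The following is a minimal sketch, not Galaxy's own scheduler: it uses Python's standard `graphlib` module to group tasks into stages whose members could run in parallel. The task names and the `parallel_stages` helper are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
LOG_PIPELINE = {
    "collect_linux_1": set(),
    "collect_linux_2": set(),
    "collect_linux_3": set(),
    "analyze": {"collect_linux_1", "collect_linux_2", "collect_linux_3"},
    "report": {"analyze"},
}


def parallel_stages(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into stages; tasks within a stage have no unmet dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage as finished
    return stages


if __name__ == "__main__":
    for index, stage in enumerate(parallel_stages(LOG_PIPELINE), start=1):
        print(f"Stage {index}: {stage}")
```

The three collection tasks land in the first stage because none of them has a prerequisite, while `analyze` and `report` each form a later stage of their own.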
+
+---
+
+## 🌐 System Capabilities
+
+Building on the five design principles, UFO³ Galaxy delivers powerful capabilities for distributed automation:
+
+
+
+
+
+### ⚡ Efficient Parallel Execution
+- **Event-driven scheduling** monitors the DAG for ready tasks
+- **Non-blocking execution** with Python `asyncio`
+- **Dynamic task integration** without workflow interruption
+- **Result:** Up to 70% reduction in end-to-end latency compared to sequential execution
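
To make the event-driven idea concrete, here is a small sketch of dependency-aware scheduling with `asyncio`. It is an illustration under simplifying assumptions, not the TaskConstellationOrchestrator implementation: `run_dag`, the task names, and the `fake_task` coroutine are hypothetical, and failure handling is reduced to re-raising the first error.

```python
import asyncio
from collections.abc import Callable, Coroutine
from typing import Any


async def run_dag(
    deps: dict[str, set[str]],
    run_task: Callable[[str], Coroutine[Any, Any, None]],
) -> list[str]:
    """Start each task as soon as its prerequisites finish; return completion order."""
    remaining = {name: set(prereqs) for name, prereqs in deps.items()}
    running: dict[asyncio.Task, str] = {}
    completed: list[str] = []

    while remaining or running:
        # Launch every task whose prerequisites have all completed.
        for name in [n for n, prereqs in remaining.items() if not prereqs]:
            del remaining[name]
            running[asyncio.create_task(run_task(name))] = name

        if not running:
            raise ValueError("cycle or unknown prerequisite in task graph")

        # Suspend until at least one running task finishes (the "event").
        done, _ = await asyncio.wait(set(running), return_when=asyncio.FIRST_COMPLETED)
        for finished in done:
            name = running.pop(finished)
            finished.result()  # re-raise any exception from the task
            completed.append(name)
            for prereqs in remaining.values():
                prereqs.discard(name)

    return completed


async def fake_task(name: str) -> None:
    await asyncio.sleep(0.01)  # stand-in for real work on a device


if __name__ == "__main__":
    graph = {"fetch": set(), "clean": {"fetch"}, "plot": {"fetch"}, "report": {"clean", "plot"}}
    print(asyncio.run(run_dag(graph, fake_task)))
```

Because the loop wakes up on each completion rather than waiting for a whole stage, `clean` and `plot` run concurrently and `report` starts the moment both have finished.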
+
+---
+
+### 🛡️ Formal Safety Guarantees
+- **Three formal invariants (I1-I3)** ensure DAG correctness
+- **Safe assignment locking** prevents race conditions
+- **Acyclicity validation** eliminates circular dependencies
+- **State merging** preserves progress during runtime modifications
+- **Formally verified** through rigorous mathematical proofs
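
As an illustration of acyclicity validation, the sketch below rejects any new dependency that would close a cycle. It relies on the standard library's `graphlib.TopologicalSorter` and is only an assumption about how such a check could look; `add_dependency` and the task names are hypothetical, and the code does not reproduce the project's formal invariants I1-I3.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(deps: dict[str, set[str]]) -> bool:
    """Return True if the dependency map contains no cycle."""
    try:
        TopologicalSorter(deps).prepare()  # prepare() raises CycleError on a cycle
    except CycleError:
        return False
    return True


def add_dependency(
    deps: dict[str, set[str]], task: str, prerequisite: str
) -> dict[str, set[str]]:
    """Return a copy of deps with one more edge, or raise if it would create a cycle."""
    candidate = {name: set(prereqs) for name, prereqs in deps.items()}
    candidate.setdefault(task, set()).add(prerequisite)
    candidate.setdefault(prerequisite, set())
    if not is_acyclic(candidate):
        raise ValueError(f"adding {prerequisite!r} -> {task!r} would create a cycle")
    return candidate


if __name__ == "__main__":
    graph = {"clone": set(), "build": {"clone"}}
    graph = add_dependency(graph, "test", "build")  # accepted
    try:
        add_dependency(graph, "clone", "test")  # clone -> build -> test -> clone
    except ValueError as error:
        print(f"rejected: {error}")
```

Validating a copy before committing the change means a rejected edit leaves the original graph untouched.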
+
+
+
+
+### 🔄 Intelligent Adaptation
+- **Dual-mode ConstellationAgent** (creation/editing) with FSM control
+- **Result-driven evolution** based on execution feedback
+- **LLM-powered reasoning** via ReAct architecture
+- **Automatic error recovery** through diagnostic tasks and fallbacks
+- **Workflow optimization** via dynamic rewiring and pruning
+
+---
+
+### 👁️ Comprehensive Observability
+- **Real-time visualization** of constellation structure and execution
+- **Event-driven updates** via publish-subscribe pattern
+- **Rich execution logs** with markdown trajectories
+- **Status tracking** for each TaskStar and dependency
+- **Interactive WebUI** for monitoring and control
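
The publish-subscribe pattern mentioned above can be sketched in a few lines. The `Event` and `EventBus` classes and the `task_completed` event name are hypothetical and do not mirror Galaxy's actual event API; the sketch only shows how observers such as a WebUI or a logger could react to status changes without the publisher knowing about them.

```python
from collections import defaultdict
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    kind: str  # hypothetical event name, e.g. "task_completed"
    task_id: str
    payload: dict = field(default_factory=dict)


class EventBus:
    """Minimal synchronous publish-subscribe hub."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, kind: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[kind].append(handler)

    def publish(self, event: Event) -> int:
        """Deliver the event to every handler for its kind; return the handler count."""
        handlers = list(self._subscribers.get(event.kind, ()))
        for handler in handlers:
            handler(event)
        return len(handlers)


if __name__ == "__main__":
    bus = EventBus()
    bus.subscribe("task_completed", lambda e: print(f"[log] {e.task_id} finished"))
    bus.subscribe("task_completed", lambda e: print(f"[ui] refresh node {e.task_id}"))
    bus.publish(Event(kind="task_completed", task_id="analyze", payload={"status": "ok"}))
```

New observers are added with another `subscribe` call, so the code that publishes events does not change when a new view or log sink is introduced.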
+
+
+
+
+
+---
+
+### 🔌 Extensibility & Platform Independence
+
+UFO³ is designed as a **universal orchestration framework** that seamlessly integrates heterogeneous device agents across platforms.
+
+**Multi-Platform Support:**
+- 🪟 **Windows** — Desktop automation via UFO²
+- 🐧 **Linux** — Server management, DevOps, data processing
+- 📱 **Android** — Mobile device automation via MCP
+- 🌐 **Web** — Browser-based agents (coming soon)
+- 🍎 **macOS** — Desktop automation (coming soon)
+- 🤖 **IoT/Embedded** — Edge devices and sensors (coming soon)
+
+**Developer-Friendly:**
+- 📦 **Lightweight template** for rapid agent development
+- 🧩 **MCP integration** for plug-and-play tool extension
+- 📖 **Comprehensive tutorials** and API documentation
+- 🔌 **AIP protocol** for seamless ecosystem integration
+
+**📖 Want to build your own device agent?** See our [Creating Custom Device Agents tutorial](../documents/docs/tutorials/creating_device_agent/overview.md) to learn how to extend UFO³ to new platforms.
+
+---
+
+## 📚 Documentation
+
+| Component | Description | Link |
+|-----------|-------------|------|
+| **Galaxy Client** | Device coordination and ConstellationClient API | [Learn More](../documents/docs/galaxy/client/overview.md) |
+| **Constellation Agent** | LLM-driven task decomposition and DAG evolution | [Learn More](../documents/docs/galaxy/constellation_agent/overview.md) |
+| **Task Orchestrator** | Asynchronous execution and safety guarantees | [Learn More](../documents/docs/galaxy/constellation_orchestrator/overview.md) |
+| **Task Constellation** | DAG structure and constellation editor | [Learn More](../documents/docs/galaxy/constellation/overview.md) |
+| **Agent Registration** | Device registry and agent profiles | [Learn More](../documents/docs/galaxy/agent_registration/overview.md) |
+| **AIP Protocol** | WebSocket messaging and communication patterns | [Learn More](../documents/docs/aip/overview.md) |
+| **Configuration** | Device pools and orchestration policies | [Learn More](../documents/docs/configuration/system/galaxy_devices.md) |
+| **Creating Device Agents** | Tutorial for building custom device agents | [Learn More](../documents/docs/tutorials/creating_device_agent/overview.md) |
+
+---
+
+## 📊 System Architecture
+
+### Core Components
+
+| Component | Location | Responsibility |
+|-----------|----------|----------------|
+| **GalaxyClient** | `galaxy/galaxy_client.py` | Session management, user interaction |
+| **ConstellationClient** | `galaxy/client/constellation_client.py` | Device registry, connection lifecycle |
+| **ConstellationAgent** | `galaxy/agents/constellation_agent.py` | DAG synthesis and evolution |
+| **TaskConstellationOrchestrator** | `galaxy/constellation/orchestrator/` | Asynchronous execution, safety enforcement |
+| **TaskConstellation** | `galaxy/constellation/task_constellation.py` | DAG data structure and validation |
+| **DeviceManager** | `galaxy/client/device_manager.py` | WebSocket connections, heartbeat monitoring |
+
+### Technology Stack
+
+| Layer | Technologies |
+|-------|-------------|
+| **Language** | Python 3.10+, asyncio, dataclasses |
+| **Communication** | WebSockets, JSON-RPC |
+| **LLM** | OpenAI, Azure OpenAI, Gemini, Claude |
+| **Tools** | Model Context Protocol (MCP) |
+| **Config** | YAML, Pydantic validation |
+| **Logging** | Rich console, Markdown trajectories |
+
+---
+
+## 🌟 From Devices to Galaxy
+
+UFO³ represents a paradigm shift in intelligent automation:
+
+```mermaid
+%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#E8F4F8','primaryTextColor':'#1A1A1A','primaryBorderColor':'#7CB9E8','lineColor':'#A8D5E2','secondaryColor':'#B8E6F0','tertiaryColor':'#D4F1F4','fontSize':'16px','fontFamily':'Segoe UI, Arial, sans-serif'}}}%%
+graph LR
+ A["🎈 UFO<br/>February 2024<br/>GUI Agent for Windows"]
+ B["🖥️ UFO²<br/>April 2025<br/>Desktop AgentOS"]
+ C["🌌 UFO³ Galaxy<br/>November 2025<br/>Multi-Device Orchestration"]
+
+ A -->|Evolve| B
+ B -->|Scale| C
+
+ style A fill:#E8F4F8,stroke:#7CB9E8,stroke-width:2.5px,color:#1A1A1A,rx:15,ry:15
+ style B fill:#C5E8F5,stroke:#5BA8D0,stroke-width:2.5px,color:#1A1A1A,rx:15,ry:15
+ style C fill:#A4DBF0,stroke:#3D96BE,stroke-width:2.5px,color:#1A1A1A,rx:15,ry:15
+```
+
+Over time, multiple constellations interconnect, forming a self-organizing **Digital Agent Galaxy** where devices, agents, and capabilities weave together into adaptive, resilient, and intelligent ubiquitous computing systems.
+
+---
+
+## 📄 Citation
+
+If you use UFO³ Galaxy in your research, please cite:
+
+**UFO³ Galaxy Framework:**
+```bibtex
+@article{zhang2025ufo3,
+ title = {{UFO$^3$: Weaving the Digital Agent Galaxy}},
+ author = {Zhang, Chaoyun and Li, Liqun and Huang, He and Ni, Chiming and Qiao, Bo and Qin, Si and Kang, Yu and Ma, Minghua and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei},
+ journal = {arXiv preprint arXiv:2511.11332},
+ year = {2025},
+}
+```
+
+**UFO² Desktop AgentOS:**
+```bibtex
+@article{zhang2025ufo2,
+ title = {{UFO2: The Desktop AgentOS}},
+ author = {Zhang, Chaoyun and Huang, He and Ni, Chiming and Mu, Jian and Qin, Si and He, Shilin and Wang, Lu and Yang, Fangkai and Zhao, Pu and Du, Chao and Li, Liqun and Kang, Yu and Jiang, Zhao and Zheng, Suzhen and Wang, Rujia and Qian, Jiaxu and Ma, Minghua and Lou, Jian-Guang and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei},
+ journal = {arXiv preprint arXiv:2504.14603},
+ year = {2025}
+}
+```
+
+**First UFO:**
+```bibtex
+@article{zhang2024ufo,
+ title = {{UFO: A UI-Focused Agent for Windows OS Interaction}},
+ author = {Zhang, Chaoyun and Li, Liqun and He, Shilin and Zhang, Xu and Qiao, Bo and Qin, Si and Ma, Minghua and Kang, Yu and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei and Zhang, Qi},
+ journal = {arXiv preprint arXiv:2402.07939},
+ year = {2024}
+}
+```
+
+---
+
+## 🤝 Contributing
+
+We welcome contributions, whether you are building new device agents, improving orchestration algorithms, or enhancing the protocol:
+
+- 🐛 [Report Issues](https://github.com/microsoft/UFO/issues)
+- 💡 [Request Features](https://github.com/microsoft/UFO/discussions)
+- 📝 [Improve Documentation](https://github.com/microsoft/UFO/pulls)
+- 🧪 [Submit Pull Requests](../../CONTRIBUTING.md)
+
+---
+
+## 📬 Contact & Support
+
+- 📖 **Documentation**: [https://microsoft.github.io/UFO/](https://microsoft.github.io/UFO/)
+- 💬 **Discussions**: [GitHub Discussions](https://github.com/microsoft/UFO/discussions)
+- 🐛 **Issues**: [GitHub Issues](https://github.com/microsoft/UFO/issues)
+- 📧 **Email**: [ufo-agent@microsoft.com](mailto:ufo-agent@microsoft.com)
+
+---
+
+## ⚖️ License
+
+UFO³ Galaxy is released under the [MIT License](../../LICENSE).
+
+See [DISCLAIMER.md](../../DISCLAIMER.md) for privacy and safety notices.
+
+---
+
+
+
+Transform your distributed devices into a unified digital collective.
+
+UFO³ Galaxy: where every device is a star, and every task is a constellation.