Perry-Multi-LLM · alinarojas · May 5, 2026 · May 5, 2026
diff --git a/app/src/traceability/README.md b/app/src/traceability/README.md
@@ -0,0 +1,170 @@
+# Traceability & Observability
+
+<br>
+
+<div align="center">
+  <a href="https://logfire.pydantic.dev/docs/" target="_blank">
+    <img src="https://img.shields.io/badge/Pydantic_Logfire-FF5A5F?style=for-the-badge&logo=pydantic&logoColor=white" alt="Logfire Logo">
+  </a>
+</div>
+
+<br>
+
+This project uses **[Pydantic Logfire](https://logfire.pydantic.dev/docs/)** for observability.
+
+Logfire traces the complete lifecycle of our LLM calls, internal agent logic and data validations, and provides a **live dashboard** for debugging and performance monitoring.
+
+<div align="center">
+  <img src="./img/logfire-dashboard.jpeg" alt="Logfire General Dashboard">
+  <p><em>Overview of the Logfire tracing dashboard</em></p>
+</div>
+
+---
+
+## Configuration Setup
+
+Logfire is configured centrally in `src/traceability/logfire_config.py`. The `configure_logfire()` function checks a module-level `_configured` flag, so telemetry is initialized only once per process even if the function is called more than once.
+
+```python
+import os
+import logfire
+
+_configured = False
+
+def configure_logfire() -> None:
+    global _configured
+
+    if _configured:
+        return
+
+    logfire.configure(
+        token=os.getenv("LOGFIRE_TOKEN"),
+        service_name=os.getenv("LOGFIRE_SERVICE_NAME", "perry-ai"),
+        environment=os.getenv("LOGFIRE_ENVIRONMENT", "development"),
+    )
+
+    # Automatic instrumentation for Pydantic AI (LLM Calls)
+    logfire.instrument_pydantic_ai()
+
+    # Automatic instrumentation for all Pydantic validations
+    logfire.instrument_pydantic(record="all")
+
+    _configured = True
+```
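+
+For local development without a write token, a variant of this helper could pass `send_to_logfire="if-token-present"`, so telemetry is exported only when `LOGFIRE_TOKEN` is set. This sketch is an assumption, not the project's current configuration:
+
+*   The function name `configure_logfire_local` is hypothetical.
+*   The two `instrument_*` calls are omitted to keep it self-contained.
+*   `console=False` silences Logfire's terminal output.
+
+```python
+import os
+
+import logfire
+
+_configured = False
+
+
+def configure_logfire_local() -> None:
+    """Hypothetical local-friendly variant of configure_logfire()."""
+    global _configured
+
+    # Guard: every call after the first returns immediately.
+    if _configured:
+        return
+
+    logfire.configure(
+        # Export only when a token is available; otherwise stay local.
+        send_to_logfire="if-token-present",
+        token=os.getenv("LOGFIRE_TOKEN"),
+        service_name=os.getenv("LOGFIRE_SERVICE_NAME", "perry-ai"),
+        environment=os.getenv("LOGFIRE_ENVIRONMENT", "development"),
+        console=False,  # assumption: no console output wanted in this sketch
+    )
+    _configured = True
+
+
+configure_logfire_local()
+configure_logfire_local()  # no-op: the guard returns early
+```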
+
+### Environment variables
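+Define the following variables in the `.env` file. `LOGFIRE_TOKEN` is required to send data to Logfire. The other two are optional and fall back to the defaults set in `configure_logfire()`: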
+
+*   `LOGFIRE_TOKEN`: This is the authentication token generated from the Logfire dashboard. To generate one:
+    1. Log in to your Pydantic Logfire account and select your project.
+    2. Navigate to **Project settings** > **Write tokens** > **New**.
+    3. Enter a **Description** to identify your token.
+    4. Set the **Expiration** (e.g., "No expiration").
+    5. Click the **Create token** button and copy the generated value.
+
+*   `LOGFIRE_SERVICE_NAME`: Identifies the service that emits the telemetry. Defaults to `perry-ai`.
+
+*   `LOGFIRE_ENVIRONMENT`: Identifies the deployment context (`development`, `staging`, `production`). Defaults to `development`.
+
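+The code falls back to defaults silently, so a missing token or a misspelled environment name is easy to overlook. A startup check like the hypothetical `read_logfire_settings()` helper can surface such problems early. It mirrors the defaults used in `configure_logfire()`. Treating only the three environments listed above as valid is an assumption.
+
+```python
+import os
+from typing import Mapping, Optional
+
+# Defaults mirror the ones passed to logfire.configure() in configure_logfire().
+LOGFIRE_DEFAULTS = {
+    "LOGFIRE_SERVICE_NAME": "perry-ai",
+    "LOGFIRE_ENVIRONMENT": "development",
+}
+# Assumption: only these deployment contexts are expected.
+KNOWN_ENVIRONMENTS = {"development", "staging", "production"}
+
+
+def read_logfire_settings(env: Optional[Mapping[str, str]] = None) -> dict:
+    """Hypothetical helper: resolve the Logfire settings and list likely problems."""
+    env = os.environ if env is None else env
+    settings = {name: env.get(name, default) for name, default in LOGFIRE_DEFAULTS.items()}
+    settings["LOGFIRE_TOKEN"] = env.get("LOGFIRE_TOKEN")
+
+    problems = []
+    if not settings["LOGFIRE_TOKEN"]:
+        problems.append("LOGFIRE_TOKEN is not set; traces may not reach the dashboard")
+    if settings["LOGFIRE_ENVIRONMENT"] not in KNOWN_ENVIRONMENTS:
+        problems.append(f"unexpected LOGFIRE_ENVIRONMENT: {settings['LOGFIRE_ENVIRONMENT']!r}")
+    settings["problems"] = problems
+    return settings
+```
+
+For example, `read_logfire_settings({})` resolves the two defaults and reports only the missing token.
+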
+---
+
+## How to use Logfire in the code
+
+Beyond the automatic instrumentation, we add explicit Logfire calls to trace specific flows, such as the steps each agent performs.
+
+Here is how the different logging methods should be used when developing new features:
+
+### 1. Contextual spans (`logfire.span`)
+Use spans to measure the duration and track the internal steps of a specific block of code (like an API call or an LLM execution). 
+```python
+# The span automatically measures how long the block takes to execute
+with logfire.span(
+    "FinanceAgent act",
+    agent_name=self.state.agent_name,
+    sender=last_msg.sender
+):
+    # Code executed inside this context will be grouped in the Logfire dashboard
+    result = await self.agent.run(last_msg.content)
+```
+<div align="center">
+  <img src="./img/logfire-span.jpeg" alt="Logfire Open Span">
+  <p><em>Example of an open span showing execution details and metadata</em></p>
+</div>
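+
+Spans nest: any span or log emitted inside the `with` block becomes its child in the trace tree. Logfire also provides the `@logfire.instrument` decorator, which opens a span around every call of a function and records its arguments as attributes. This sketch combines both in a hypothetical `act` coroutine. `fetch_rate` and its exchange rate are stand-ins, and `send_to_logfire=False` keeps everything local.
+
+```python
+import asyncio
+
+import logfire
+
+logfire.configure(send_to_logfire=False, console=False)  # local-only sketch
+
+
+# One span per call; "{currency}" is filled from the function argument.
+@logfire.instrument("Fetch exchange rate for {currency}")
+async def fetch_rate(currency: str) -> float:
+    await asyncio.sleep(0.01)  # stand-in for a network call
+    return 1.08 if currency == "EUR" else 1.0
+
+
+async def act(amount: float, currency: str) -> float:
+    # Parent span for the whole step; the spans opened below become its children.
+    with logfire.span("DemoAgent act", amount=amount, currency=currency):
+        rate = await fetch_rate(currency)
+        with logfire.span("Convert amount", rate=rate):
+            return round(amount * rate, 2)
+
+
+print(asyncio.run(act(100.0, "EUR")))
+```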
+
+### 2. Informational events (`logfire.info`)
+Use this for standard events that indicate the normal flow of the application. Always include relevant metadata.
+```python
+logfire.info(
+    "Pending payment created successfully",
+    agent_name=self.state.agent_name,
+    pending_id=pending_id,
+    amount=data.amount
+)
+```
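+
+Logfire treats the first argument as a message template. `{name}` placeholders are filled from the keyword arguments of the same name, which are also stored as attributes. The message stays readable in the Live view while each value remains searchable on its own. Here is a minimal sketch with a hypothetical `record_pending_payment` helper and local-only configuration:
+
+```python
+import logfire
+
+logfire.configure(send_to_logfire=False, console=False)  # local-only sketch
+
+
+def record_pending_payment(pending_id: str, amount: float, agent_name: str) -> str:
+    # The message is rendered with the values filled in; agent_name is not in
+    # the template but is still attached as an attribute.
+    logfire.info(
+        "Pending payment {pending_id} created for {amount}",
+        agent_name=agent_name,
+        pending_id=pending_id,
+        amount=amount,
+    )
+    return pending_id
+
+
+record_pending_payment("PP-1", 12.5, agent_name="FinanceAgent")
+```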
+
+### 3. Warnings (`logfire.warning`)
+Use this for non-fatal errors or unexpected states that do not stop the execution but require attention.
+```python
+logfire.warning(
+    "FinanceAgent called without memory",
+    agent_name=self.state.agent_name
+)
+```
+
+### 4. Errors without exceptions (`logfire.error`)
+Use this when a business logic failure occurs (e.g., an external API returns a controlled `{"success": false}` payload) but no Python exception has been raised.
+```python
+logfire.error(
+    "Bank balance retrieval failed",
+    agent_name=self.state.agent_name,
+    message=res.get("message")
+)
+```
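+
+The branching around such a controlled failure could look like this sketch. It assumes, hypothetically, that the balance endpoint returns a dict with `success`, `message` and `balance` keys. `handle_balance_response` is not an existing project function.
+
+```python
+from typing import Optional
+
+import logfire
+
+logfire.configure(send_to_logfire=False, console=False)  # local-only sketch
+
+
+def handle_balance_response(res: dict, agent_name: str) -> Optional[float]:
+    """Hypothetical handler: log a controlled API failure without raising."""
+    if not res.get("success", False):
+        # No exception exists here, so logfire.error (not .exception) is used.
+        logfire.error(
+            "Bank balance retrieval failed",
+            agent_name=agent_name,
+            message=res.get("message"),
+        )
+        return None  # the caller decides how to report the failure
+    return res.get("balance")
+```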
+
+### 5. Caught exceptions (`logfire.exception`)
+Use this specifically inside `except` blocks. Logfire will automatically capture the full Python stack trace and send it to the dashboard.
+```python
+try:
+    res = await self.mcp.get_bank_balance()
+except Exception as balance_err:
+    logfire.exception(
+        "Odoo balance check error",
+        agent_name=self.state.agent_name,
+        error=str(balance_err)
+    )
+```
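+
+`logfire.exception` reads the exception currently being handled, so passing `error=str(...)` is optional. The exception type, message and traceback are attached either way. An exception that propagates out of a `with logfire.span(...)` block is also recorded on that span. In this sketch, `BalanceError` and `get_bank_balance` are hypothetical stand-ins for the MCP client:
+
+```python
+from typing import Optional
+
+import logfire
+
+logfire.configure(send_to_logfire=False, console=False)  # local-only sketch
+
+
+class BalanceError(Exception):
+    """Hypothetical error raised by the balance client."""
+
+
+def get_bank_balance() -> float:
+    raise BalanceError("connection refused")  # simulated failure
+
+
+def guarded_balance(agent_name: str) -> Optional[float]:
+    try:
+        return get_bank_balance()
+    except BalanceError:
+        # The active exception and its traceback are captured automatically.
+        logfire.exception("Odoo balance check error", agent_name=agent_name)
+        return None
+
+
+def unguarded_balance() -> float:
+    # No try/except: the error propagates and the span records it on exit.
+    with logfire.span("Odoo balance check"):
+        return get_bank_balance()
+```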
+
+---
+
+## Maintaining observability in new features
+
+When adding new functionalities (like a new Agent or a new MCP Tool), follow these strict guidelines to ensure end-to-end traceability:
+
+1.  **Instrument the initialization:** Add a `logfire.span` or `logfire.info` in the `__init__` method of any new major class or agent.
+
+2.  **Wrap main logic blocks in spans:** Any method that involves network requests, LLM calls or heavy processing should be enclosed in a `with logfire.span("Context Name"):` block.
+
+3.  **Inject context attributes:** Do not log bare strings. Always pass relevant key-value pairs to the Logfire functions (e.g., `agent_name`, `user_id`, `record_id`, `execution_time`). These become attributes you can filter and search on in the dashboard.
+
+4.  **Handle exceptions gracefully:** Make sure every major `try/except` block calls `logfire.exception()` before yielding or returning a `Failed` state to the user, so that no error goes unrecorded.
+
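+The four guidelines above can be combined in a skeleton like this one. `ReportAgent`, `_fetch_records` and the `{"status": ...}` return value are hypothetical. Only the Logfire calls come from the library, and `send_to_logfire=False` keeps the sketch local.
+
+```python
+import asyncio
+
+import logfire
+
+logfire.configure(send_to_logfire=False, console=False)  # local-only sketch
+
+
+class ReportAgent:
+    """Hypothetical agent skeleton following the four guidelines."""
+
+    def __init__(self, agent_name: str) -> None:
+        self.agent_name = agent_name
+        # 1. Instrument the initialization.
+        logfire.info("ReportAgent initialized", agent_name=agent_name)
+
+    async def _fetch_records(self, user_id: str) -> list:
+        await asyncio.sleep(0)  # stand-in for a network or MCP call
+        if user_id == "missing":
+            raise LookupError(f"no records for {user_id}")
+        return [{"record_id": 1}, {"record_id": 2}]
+
+    async def run(self, user_id: str) -> dict:
+        # 2. Wrap the main logic in a span; 3. attach key-value context to it.
+        with logfire.span("ReportAgent run", agent_name=self.agent_name, user_id=user_id):
+            try:
+                records = await self._fetch_records(user_id)
+            except LookupError:
+                # 4. Record the exception before returning a failed state.
+                logfire.exception("ReportAgent fetch failed", agent_name=self.agent_name, user_id=user_id)
+                return {"status": "Failed", "records": 0}
+            logfire.info("Records fetched", agent_name=self.agent_name, record_count=len(records))
+            return {"status": "Done", "records": len(records)}
+```
+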
+---
+
+## Navigating the Logfire Dashboard
+
+Once the local application is running and generating traffic, you can view the traces in real time:
+
+<div align="center">
+  <img src="./img/logfire-live.png" alt="Logfire Live Dashboard">
+  <p><em>The 'Live' view in Logfire showing a timeline and captured spans</em></p>
+</div>
+
+To inspect the logs effectively:
+
+1. Log in to your **[Pydantic Logfire Dashboard](https://logfire.pydantic.dev/)** and select the active project.
+
+2. In the left sidebar, under the **OBSERVE** section, click on **Live** (to see logs coming in right now) or **Explore** (to search through past logs).
+
+3. **Locate spans:** Look for entries in the list that have a blue `+ [number]` badge next to them (e.g., `+ 95`). This indicates a **Span** containing multiple nested steps.
+
+4. **Inspect details:** Click anywhere on that row to expand it. A side panel will open showing the complete execution tree, exact LLM prompts, model responses and all the contextual variables (like `agent_name` or `amount`) that are injected in the code.
diff --git a/app/src/traceability/img/logfire-dashboard.jpeg b/app/src/traceability/img/logfire-dashboard.jpeg
diff --git a/app/src/traceability/img/logfire-live.png b/app/src/traceability/img/logfire-live.png
diff --git a/app/src/traceability/img/logfire-span.jpeg b/app/src/traceability/img/logfire-span.jpeg