170 changes: 170 additions & 0 deletions app/src/traceability/README.md
# Traceability & Observability

<br>

<div align="center">
<a href="https://logfire.pydantic.dev/docs/" target="_blank">
<img src="https://img.shields.io/badge/Pydantic_Logfire-FF5A5F?style=for-the-badge&logo=pydantic&logoColor=white" alt="Logfire Logo">
</a>
</div>

<br>

This project implements robust observability using **[Pydantic Logfire](https://logfire.pydantic.dev/docs/)**.

Logfire allows us to trace the complete lifecycle of our LLM calls, internal agent logic and data validations, providing a clear **live dashboard** for debugging and performance monitoring.

<div align="center">
<img src="./img/logfire-dashboard.jpeg" alt="Logfire General Dashboard">
<p><em>Overview of the Logfire tracing dashboard</em></p>
</div>

---

## Configuration Setup

Logfire is configured centrally in `src/traceability/logfire_config.py`. The `configure_logfire()` function uses a module-level `_configured` flag so that every call after the first one is a no-op and telemetry is initialized only once per process.

```python
import os

import logfire

_configured = False


def configure_logfire() -> None:
    global _configured

    if _configured:
        return

    logfire.configure(
        token=os.getenv("LOGFIRE_TOKEN"),
        service_name=os.getenv("LOGFIRE_SERVICE_NAME", "perry-ai"),
        environment=os.getenv("LOGFIRE_ENVIRONMENT", "development"),
    )

    # Automatic instrumentation for Pydantic AI (LLM calls)
    logfire.instrument_pydantic_ai()

    # Automatic instrumentation for all Pydantic validations
    logfire.instrument_pydantic(record="all")

    _configured = True
```
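The `_configured` flag is enough when `configure_logfire()` is called once from the main thread at startup. If it could ever be called concurrently (for example from several worker threads), two callers may both see `_configured` as `False` and configure Logfire twice. A minimal sketch of a lock-guarded alternative follows; `RunOnce` is a hypothetical helper, and its `setup` callable stands in for the `logfire.configure(...)` and `instrument_*` calls above so the guard can be exercised without a Logfire token.

```python
import threading
from typing import Callable


class RunOnce:
    """Run a setup function at most once, even under concurrent calls.

    Hypothetical helper: `setup` stands in for the Logfire configuration
    calls shown above.
    """

    def __init__(self, setup: Callable[[], None]) -> None:
        self._setup = setup
        self._lock = threading.Lock()
        self._done = False

    def __call__(self) -> None:
        # Fast path: skip the lock once setup has completed.
        if self._done:
            return
        with self._lock:
            # Re-check inside the lock: another thread may have finished first.
            if self._done:
                return
            self._setup()
            # Mark as done only after setup succeeds, so a failed attempt can be retried.
            self._done = True


# Hypothetical wiring: move the Logfire calls into a private function and expose
#     configure_logfire = RunOnce(_configure_logfire_impl)
```

Marking the guard as done only after `setup` returns means an exception during configuration (for example a bad token) propagates to the caller and the next call tries again, instead of leaving the process silently uninstrumented.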

### Environment variables
To enable Logfire, set the following variables in the `.env` file. `LOGFIRE_TOKEN` is needed to send data to your Logfire project; the other two are optional and fall back to the defaults used in `configure_logfire()`.

* `LOGFIRE_TOKEN`: the write token, generated from the Logfire dashboard, that authenticates the application. To generate one:
  1. Log in to your Pydantic Logfire account and select your project.
  2. Navigate to **Project settings** > **Write tokens** > **New**.
  3. Enter a **Description** to identify your token.
  4. Set the **Expiration** (e.g., "No expiration").
  5. Click the **Create token** button and copy the generated value.

* `LOGFIRE_SERVICE_NAME`: identifies the service that emits the telemetry. Defaults to `perry-ai`.

* `LOGFIRE_ENVIRONMENT`: identifies the deployment context (`development`, `staging`, `production`). Defaults to `development`.
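Because `os.getenv` silently returns whatever is set, a typo such as `LOGFIRE_ENVIRONMENT=prod` would be sent to the dashboard unnoticed. A minimal sketch, assuming you want to fail fast at startup, reads the same three variables with the same defaults and rejects unknown environment names. `LogfireSettings`, `load_logfire_settings` and `ALLOWED_ENVIRONMENTS` are hypothetical names, not part of this codebase or of Logfire.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: these are the only deployment contexts the project uses (per the list above).
ALLOWED_ENVIRONMENTS = frozenset({"development", "staging", "production"})


@dataclass(frozen=True)
class LogfireSettings:
    token: Optional[str]
    service_name: str
    environment: str


def load_logfire_settings(env: Mapping[str, str] = os.environ) -> LogfireSettings:
    """Read the Logfire variables with the same defaults as configure_logfire()."""
    environment = env.get("LOGFIRE_ENVIRONMENT", "development")
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(
            f"LOGFIRE_ENVIRONMENT must be one of {sorted(ALLOWED_ENVIRONMENTS)}, "
            f"got {environment!r}"
        )
    return LogfireSettings(
        # Design choice: treat an empty LOGFIRE_TOKEN the same as an unset one.
        token=env.get("LOGFIRE_TOKEN") or None,
        service_name=env.get("LOGFIRE_SERVICE_NAME", "perry-ai"),
        environment=environment,
    )
```

The resulting fields could then be passed to `logfire.configure(token=..., service_name=..., environment=...)` in place of the direct `os.getenv` calls. Accepting the mapping as a parameter keeps the function easy to test without touching the real process environment.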

---

## How to use Logfire in the code

Beyond the automatic instrumentation configured above, we call Logfire explicitly to trace application-specific flows, such as the steps each agent executes.

Here is how the different logging methods should be used when developing new features:

### 1. Contextual spans (`logfire.span`)
Use spans to measure the duration and track the internal steps of a specific block of code (like an API call or an LLM execution).
```python
# The span automatically measures how long the block takes to execute
with logfire.span(
    "FinanceAgent act",
    agent_name=self.state.agent_name,
    sender=last_msg.sender,
):
    # Code executed inside this context is grouped under the span in the Logfire dashboard
    result = await self.agent.run(last_msg.content)
```
<div align="center">
<img src="./img/logfire-span.jpeg" alt="Logfire Open Span">
<p><em>Example of an open span showing execution details and metadata</em></p>
</div>

### 2. Informational events (`logfire.info`)
Use this for standard events that indicate the normal flow of the application. Always include relevant metadata.
```python
logfire.info(
    "Pending payment created successfully",
    agent_name=self.state.agent_name,
    pending_id=pending_id,
    amount=data.amount,
)
```

### 3. Warnings (`logfire.warning`)
Use this for non-fatal errors or unexpected states that do not stop the execution but require attention.
```python
logfire.warning(
    "FinanceAgent called without memory",
    agent_name=self.state.agent_name,
)
```

### 4. Errors without exceptions (`logfire.error`)
Use this when a business logic failure occurs (e.g., an external API returns a controlled `{success: false}` payload) but no Python exception has been raised.
```python
logfire.error(
    "Bank balance retrieval failed",
    agent_name=self.state.agent_name,
    message=res.get("message"),
)
```

### 5. Caught exceptions (`logfire.exception`)
Use this specifically inside `except` blocks. Logfire will automatically capture the full Python stack trace and send it to the dashboard.
```python
try:
    res = await self.mcp.get_bank_balance()
except Exception as balance_err:
    logfire.exception(
        "Odoo balance check error",
        agent_name=self.state.agent_name,
        error=str(balance_err),
    )
```

---

## Maintaining observability in new features

When adding new functionalities (like a new Agent or a new MCP Tool), follow these strict guidelines to ensure end-to-end traceability:

1. **Instrument the initialization:** Add a `logfire.span` or `logfire.info` in the `__init__` method of any new major class or agent.

2. **Wrap main logic blocks in spans:** Any method that involves network requests, LLM calls or heavy processing should be enclosed in a `with logfire.span("Context Name"):` block.

3. **Inject context metrics:** Do not just log strings. Always pass relevant key-value pairs to the logfire functions (e.g., `agent_name`, `user_id`, `record_id`, `execution_time`). This is crucial for filtering and searching logs in the dashboard.

4. **Handle exceptions gracefully:** In every major `try/except` block, call `logfire.exception()` inside the `except` clause before yielding or returning a `Failed` state to the user, so that no error is swallowed silently.
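Guidelines 3 and 4 can be combined in a small wrapper. `logfire.exception()` records the exception that is currently being handled, so it has to be called inside the `except` block, before the error is turned into a user-facing state. In this sketch `Succeeded`, `Failed` and `run_step` are hypothetical names, and the logging function is passed in as `log_exception` (in the application you would pass `logfire.exception`) so the pattern can run without a Logfire project.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Union


@dataclass
class Succeeded:
    value: Any


@dataclass
class Failed:
    """Hypothetical failure state returned to the user instead of raising."""
    reason: str


StepResult = Union[Succeeded, Failed]


async def run_step(
    step_name: str,
    step: Callable[[], Awaitable[Any]],
    log_exception: Callable[..., None],
    /,
    **context: Any,
) -> StepResult:
    """Await `step`; on error, log it with context and return a Failed state."""
    try:
        return Succeeded(await step())
    except Exception as err:
        # Must run inside `except` so logfire.exception can capture the active stack trace.
        # The message is a template; key-value context is forwarded as attributes.
        log_exception("{step_name} failed", step_name=step_name, **context)
        return Failed(reason=str(err))
```

Inside an agent method, a call could look like `result = await run_step("Odoo balance check", self.mcp.get_bank_balance, logfire.exception, agent_name=self.state.agent_name)`.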

---

## Navigating the Logfire Dashboard

Once the local application is running and generating traffic, you can view the traces in real time:

<div align="center">
<img src="./img/logfire-live.png" alt="Logfire Live Dashboard">
<p><em>The 'Live' view in Logfire showing a timeline and captured spans</em></p>
</div>

To inspect the logs effectively:

1. Log in to your **[Pydantic Logfire Dashboard](https://logfire.pydantic.dev/)** and select the active project.

2. In the left sidebar, under the **OBSERVE** section, click on **Live** (to see logs coming in right now) or **Explore** (to search through past logs).

3. **Locate Spans:** Look for entries in the list that have a blue `+ [number]` badge next to them (e.g., `+ 95`). This indicates a **Span** containing multiple nested steps.

4. **Inspect details:** Click anywhere on that row to expand it. A side panel opens showing the complete execution tree, the exact LLM prompts, the model responses, and all the contextual attributes passed in the code (like `agent_name` or `amount`).