# Traceability & Observability

This project uses **[Pydantic Logfire](https://logfire.pydantic.dev/docs/)** for observability.

Logfire traces the complete lifecycle of our LLM calls, internal agent logic, and data validations. It also provides a **live dashboard** for debugging and performance monitoring.

![Logfire General Dashboard](img/logfire-dashboard.jpeg)

*Overview of the Logfire tracing dashboard*

---

## Configuration Setup

Logfire is configured centrally in `src/traceability/logfire_config.py`. The module-level `_configured` flag ensures that telemetry is initialized only once during the application's runtime, even if `configure_logfire()` is called more than once.

```python
import os

import logfire

# Module-level guard: set to True after the first successful configuration.
_configured = False


def configure_logfire() -> None:
    """Configure Logfire and its automatic instrumentations exactly once."""
    global _configured

    if _configured:
        return

    logfire.configure(
        token=os.getenv("LOGFIRE_TOKEN"),
        service_name=os.getenv("LOGFIRE_SERVICE_NAME", "perry-ai"),
        environment=os.getenv("LOGFIRE_ENVIRONMENT", "development"),
    )

    # Automatic instrumentation for Pydantic AI (LLM calls)
    logfire.instrument_pydantic_ai()

    # Automatic instrumentation for all Pydantic validations
    logfire.instrument_pydantic(record="all")

    _configured = True
```

### Environment variables
To enable Logfire, define the following variables in the `.env` file. `LOGFIRE_SERVICE_NAME` and `LOGFIRE_ENVIRONMENT` fall back to the defaults shown in the code above. `LOGFIRE_TOKEN` has no default.

* `LOGFIRE_TOKEN`: The write token generated in the Logfire dashboard. To create one:
  1. Log in to your Pydantic Logfire account and select your project.
  2. Navigate to **Project settings** > **Write tokens** > **New**.
  3. Enter a **Description** to identify the token.
  4. Set the **Expiration** (e.g., "No expiration").
  5. Click **Create token** and copy the generated value.

* `LOGFIRE_SERVICE_NAME`: Identifies the service that emits the telemetry. Defaults to `perry-ai`.

* `LOGFIRE_ENVIRONMENT`: Identifies the deployment context (`development`, `staging`, or `production`). Defaults to `development`.

---

## How to use Logfire in the code

We use Logfire to explicitly trace individual flows, such as the flow of each specific agent.

When developing new features, use the logging methods as follows:

### 1. Contextual spans (`logfire.span`)
Use spans to measure the duration of a specific block of code (such as an API call or an LLM execution) and to track its internal steps.
```python
# Inside an async agent method: the span measures how long the block takes to execute
with logfire.span(
    "FinanceAgent act",
    agent_name=self.state.agent_name,
    sender=last_msg.sender,
):
    # Code executed inside this block is grouped under the span in the Logfire dashboard
    result = await self.agent.run(last_msg.content)
```

![Logfire Open Span](img/logfire-span.jpeg)

*Example of an open span showing execution details and metadata*

### 2. Informational events (`logfire.info`)
Use this for standard events that indicate the normal flow of the application. Always include relevant metadata as keyword arguments.
```python
logfire.info(
    "Pending payment created successfully",
    agent_name=self.state.agent_name,
    pending_id=pending_id,
    amount=data.amount,
)
```

### 3. Warnings (`logfire.warning`)
Use this for non-fatal errors or unexpected states that do not stop execution but require attention.
```python
logfire.warning(
    "FinanceAgent called without memory",
    agent_name=self.state.agent_name,
)
```

### 4. Errors without exceptions (`logfire.error`)
Use this when a business-logic failure occurs but no Python exception has been raised. For example, an external API may return a controlled `{success: false}` payload.
```python
logfire.error(
    "Bank balance retrieval failed",
    agent_name=self.state.agent_name,
    message=res.get("message"),
)
```

### 5. Caught exceptions (`logfire.exception`)
Use this specifically inside `except` blocks. Logfire automatically captures the full Python stack trace and sends it to the dashboard.
```python
try:
    res = await self.mcp.get_bank_balance()
except Exception as balance_err:
    # logfire.exception records the active exception and its traceback;
    # `error` adds the message as an extra searchable attribute.
    logfire.exception(
        "Odoo balance check error",
        agent_name=self.state.agent_name,
        error=str(balance_err),
    )
```

---

## Maintaining observability in new features

When adding new functionality (such as a new agent or a new MCP tool), follow these guidelines to ensure end-to-end traceability:

1. **Instrument the initialization:** Add a `logfire.span` or `logfire.info` call in the `__init__` method of any new major class or agent.

2. **Wrap main logic blocks in spans:** Enclose any method that involves network requests, LLM calls, or heavy processing in a `with logfire.span("Context Name"):` block.

3. **Inject context attributes:** Do not log bare strings. Always pass relevant key-value pairs to the Logfire functions (e.g., `agent_name`, `user_id`, `record_id`, `execution_time`). These attributes are what make logs filterable and searchable in the dashboard.

4. **Handle exceptions gracefully:** Every major `try/except` block must call `logfire.exception()` before yielding or returning a `Failed` state to the user. This ensures that no error goes unrecorded.

---

## Navigating the Logfire Dashboard

Once the application is running locally and generating traffic, you can view the traces in real time:

![Logfire Live Dashboard](img/logfire-live.png)

*The 'Live' view in Logfire showing a timeline and captured spans*

To inspect the logs effectively:

1. Log in to your **[Pydantic Logfire Dashboard](https://logfire.pydantic.dev/)** and select the active project.

2. In the left sidebar, under the **OBSERVE** section, click **Live** to see logs as they arrive, or **Explore** to search past logs.

3. **Locate spans:** Look for entries in the list that have a blue `+ [number]` badge next to them (e.g., `+ 95`). This indicates a **span** containing multiple nested steps.

4. **Inspect details:** Click anywhere on that row to expand it. A side panel opens with the following:
   - the complete execution tree
   - the exact LLM prompts
   - the model responses
   - all the contextual attributes attached in the code (like `agent_name` or `amount`)