diff --git a/app/src/guardrails.zip b/app/src/guardrails.zip
new file mode 100644
index 0000000..95dddea
Binary files /dev/null and b/app/src/guardrails.zip differ
diff --git a/app/src/guardrails/README.md b/app/src/guardrails/README.md
new file mode 100644
index 0000000..9ea44f1
--- /dev/null
+++ b/app/src/guardrails/README.md
@@ -0,0 +1,19 @@
+# Guardrails
+
+This module contains the guardrails layer used by the application to validate agent responses before they are returned to the user.
+
+The current implementation focuses on output validation.
+
+## Structure
+
+- `output_guard.py`: Defines the output guard and exposes the main validation entry point.
+- `validators.py`: Defines the custom validators used by the guardrails layer.
+- `docs/`: Contains detailed documentation for the guardrails module.
+
+## Documentation
+
+### Guardrails documentation
+
+- [Guardrails overview](docs/guardrails-overview.md)
+- [Output validation](docs/output-validation.md)
+- [Add a new guardrail](docs/add-new-guardrail.md)
diff --git a/app/src/guardrails/docs/add-new-guardrail.md b/app/src/guardrails/docs/add-new-guardrail.md
new file mode 100644
index 0000000..163973f
--- /dev/null
+++ b/app/src/guardrails/docs/add-new-guardrail.md
@@ -0,0 +1,114 @@
+# Add a new guardrail
+
+This document explains how to add a new guardrail to the guardrails module.
+
+Guardrails are used to validate final agent responses before they are returned to the user. Add a new guardrail when a new output pattern needs to be detected, blocked or sanitized.
+
+## Location
+
+Validators are defined in:
+
+```text
+src/guardrails/validators.py
+````
+
+The output guard is configured in:
+
+```text
+src/guardrails/output_guard.py
+```
+
+## Add the validator
+
+Create a new validator in `validators.py`.
+
+Example:
+
+```python
+import re
+
+from guardrails import register_validator
+from guardrails.validator_base import (
+    FailResult,
+    PassResult,
+    ValidationResult,
+    Validator,
+)
+
+
+@register_validator(name="no_internal_example", data_type="string")
+class NoInternalExample(Validator):
+    def validate(self, value: str, metadata: dict) -> ValidationResult:
+        pattern = r"internal_example_[a-zA-Z0-9_]+"
+
+        if re.search(pattern, value):
+            fixed_value = re.sub(
+                pattern,
+                "[hidden internal field]",
+                value,
+            )
+
+            return FailResult(
+                error_message="Internal example field detected.",
+                fix_value=fixed_value,
+            )
+
+        return PassResult()
+```
+
+## Add it to the output guard
+
+After creating the validator, include it in the output guard used by:
+
+```python
+validate_agent_output(...)
+```
+
+This is configured in:
+
+```text
+src/guardrails/output_guard.py
+```
+
+## Expected behavior
+
+A guardrail should:
+
+* Check the final agent response
+* Return `PassResult()` when the response is valid
+* Return `FailResult(...)` when the response must be sanitized
+* Provide a `fix_value` when automatic replacement is possible
+
+## Test it
+
+Test at least one valid response and one response that should be sanitized.
+
+Example:
+
+```python
+from src.guardrails.output_guard import validate_agent_output
+
+
+validate_agent_output(
+    "The payment was created successfully.",
+    agent_name="finance_agent",
+)
+
+validate_agent_output(
+    "The payment was created with internal_example_id 4.",
+    agent_name="finance_agent",
+)
+```
+
+The second response should replace the internal field with:
+
+```text
+[hidden internal field]
+```
+
+## Update documentation
+
+After adding a new guardrail, update:
+
+* `src/guardrails/docs/output-validation.md`
+* `src/guardrails/docs/add-new-guardrail.md` if the process changes
diff --git a/app/src/guardrails/docs/guardrails-overview.md b/app/src/guardrails/docs/guardrails-overview.md
new file mode 100644
index 0000000..33b1cde
--- /dev/null
+++ b/app/src/guardrails/docs/guardrails-overview.md
@@ -0,0 +1,107 @@
+# Guardrails overview
+
+This document explains the role of the guardrails layer in the project.
+
+Guardrails validate agent responses before they are returned to the user. The current implementation focuses on output validation: it sanitizes final text responses to avoid exposing internal Odoo fields, technical identifiers, credentials or traceback details.
+
+## Role in the architecture
+
+The guardrails layer runs after an agent has generated a response.
+
+```text
+Agent response
+    ↓
+validate_agent_output(...)
+    ↓
+Output guard
+    ↓
+Sanitized response
+````
+
+This keeps the agent focused on task handling, while the guardrails layer focuses on cleaning the final user-facing output.
+
+## Current implementation
+
+The current output guard is defined in:
+
+```text
+src/guardrails/output_guard.py
+```
+
+The main validation entry point is:
+
+```python
+validate_agent_output(text, agent_name="finance_agent")
+```
+
+The validator is defined in:
+
+```text
+src/guardrails/validators.py
+```
+
+Current validator:
+
+```text
+NoInternalOdooFields
+```
+
+It detects internal or sensitive patterns in the generated response and replaces them with:
+
+```text
+[hidden internal field]
+```
+
+## Protected patterns
+
+The current validator checks for patterns related to:
+
+* Internal Odoo field names
+* Technical record identifiers
+* Credential-related terms
+* API keys
+* Passwords
+* Python traceback output
+
+Examples of protected terms include:
+
+```text
+partner_id
+journal_id
+payment_method_line_id
+create_uid
+write_uid
+access_token
+api_key
+password
+Traceback
+```
+
+## Scope
+
+Guardrails are responsible for final response validation only.
+
+They should handle:
+
+* Sanitizing user-facing agent responses
+* Removing internal technical fields from final text
+* Hiding credential-like strings
+* Preventing traceback details from being exposed
+
+They should not replace:
+
+* Agent intent detection
+* Required field collection
+* Payment validation
+* Odoo record lookup
+* MCP tool schema validation
+* XML-RPC error handling
+* Database auditing
+
+Those responsibilities belong to the agent, MCP or service layers.
+
+## Relationship with MCP
+
+MCP tools validate structured operations before execution.
+
+Guardrails validate the final natural language response after the agent or tool flow.
diff --git a/app/src/guardrails/docs/output-validation.md b/app/src/guardrails/docs/output-validation.md
new file mode 100644
index 0000000..c8b05e2
--- /dev/null
+++ b/app/src/guardrails/docs/output-validation.md
@@ -0,0 +1,143 @@
+# Output validation
+
+This document explains how output validation is currently handled in the guardrails layer.
+
+Output validation sanitizes the final text generated by agents before it is returned to the user. Its purpose is to prevent internal Odoo fields, technical identifiers, credentials and traceback details from appearing in user-facing responses.
+
+## Location
+
+Output validation is implemented in:
+
+```text
+src/guardrails/output_guard.py
+````
+
+The custom validator is defined in:
+
+```text
+src/guardrails/validators.py
+```
+
+## Entry point
+
+The main entry point is:
+
+```python
+validate_agent_output(text, agent_name="finance_agent")
+```
+
+This function receives the final agent response as text and returns a validated string.
+
+Example:
+
+```python
+from src.guardrails.output_guard import validate_agent_output
+
+safe_text = validate_agent_output(
+    text=agent_response,
+    agent_name="finance_agent",
+)
+```
+
+## Validation flow
+
+```text
+Agent response
+    ↓
+validate_agent_output(...)
+    ↓
+Output guard
+    ↓
+NoInternalOdooFields
+    ↓
+Sanitized response
+```
+
+If the response does not contain protected patterns, the original text is returned.
+
+If protected patterns are detected, they are replaced before the response is returned.
+
+## Current validator
+
+The current validator is:
+
+```text
+NoInternalOdooFields
+```
+
+It checks the response text for internal or sensitive patterns and replaces matches with:
+
+```text
+[hidden internal field]
+```
+
+## Protected patterns
+
+The validator currently protects against terms related to:
+
+* Internal Odoo fields
+* Technical identifiers
+* Credentials
+* API keys
+* Passwords
+* Traceback output
+
+Examples include:
+
+```text
+partner_id
+journal_id
+payment_method_line_id
+create_uid
+write_uid
+access_token
+api_key
+password
+Traceback
+```
+
+## Example
+
+Input response:
+
+```text
+The payment was created with journal_id 8 and payment_method_line_id 3.
+```
+
+Sanitized response:
+
+```text
+The payment was created with [hidden internal field] 8 and [hidden internal field] 3.
+```
+
+## Usage expectations
+
+Output validation should be applied to final agent responses, not to internal tool outputs.
+
+Use it when:
+
+* A response is about to be returned to the user
+* The response may include information from Odoo
+* The response may include MCP tool output
+* The response may include exception or traceback details
+
+Do not use it as a replacement for:
+
+* Pydantic validation
+* MCP tool validation
+* Odoo service error handling
+* Business rule checks
+
+## Current limitation
+
+The current implementation sanitizes text by replacing detected patterns.
+
+It does not:
+
+* Validate structured tool input
+* Check business correctness
+* Verify whether an Odoo operation should be allowed
+* Rewrite the full answer semantically
+* Decide whether the agent should call a tool
+
+Those responsibilities belong to the agent, MCP and service layers.
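
Note on the replacement behavior described in `output-validation.md`: the patch does not include the body of `NoInternalOdooFields`, so the following is only a minimal sketch of pattern-based sanitization, assuming whole-word matching against the example terms listed in the docs. The names `PROTECTED_TERMS`, `REPLACEMENT` and `sanitize_text` are hypothetical and are not part of this change; the real validator may use different patterns.

```python
import re

# Hypothetical term list, copied from the examples in output-validation.md.
# The actual NoInternalOdooFields validator may match different patterns.
PROTECTED_TERMS = [
    "partner_id",
    "journal_id",
    "payment_method_line_id",
    "create_uid",
    "write_uid",
    "access_token",
    "api_key",
    "password",
    "Traceback",
]

REPLACEMENT = "[hidden internal field]"

# Build one alternation wrapped in word boundaries so only whole terms match.
# Longer terms come first so the alternation tries them before shorter ones.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(
        re.escape(term)
        for term in sorted(PROTECTED_TERMS, key=len, reverse=True)
    )
    + r")\b"
)


def sanitize_text(text: str) -> str:
    """Replace every protected term in ``text`` with the placeholder."""
    return _PATTERN.sub(REPLACEMENT, text)


if __name__ == "__main__":
    raw = "The payment was created with journal_id 8 and payment_method_line_id 3."
    print(sanitize_text(raw))
```

Under these assumptions the sketch reproduces the documented example: the field names are replaced while the numeric values `8` and `3` remain in the text, which matches the "Sanitized response" shown in `output-validation.md`. Text without protected terms is returned unchanged.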