feat: Add A2A/MCP integration with OTEL tracing support by yoavkatz · Pull Request #187 · Exgentic/exgentic

yoavkatz · 2026-04-15T14:27:54Z

Overview

This PR adds comprehensive Agent-to-Agent (A2A) and Model Context Protocol (MCP) integration to Exgentic, along with OpenTelemetry (OTEL) tracing support for distributed agent execution.

Key Features

1. A2A Protocol Integration

New CLI Command: exgentic a2a - Expose exgentic agents via the A2A protocol
A2A Executor (src/exgentic/adapters/agents/a2a_executor.py): Full agent-to-agent execution support with OTEL tracing
MCP Wrapper (src/exgentic/adapters/agents/mcp_wrapper.py): Wrapper for MCP server integration with A2A agents

2. MCP Server Enhancements

New CLI Command: exgentic mcp - Expose benchmark actions as MCP tools
Dynamic Session Management: Support for multiple sessions with different tasks
Evaluation Endpoint: Added evaluate_session endpoint to MCP server
Configurable Options: DNS rebinding protection toggle, benchmark-specific parameters via --set

3. OpenTelemetry Tracing

A2A Span Filtering: Filter SDK spans and prevent invalid parent span IDs
Root Span Metadata: Add OTEL root span metadata to A2A executor
Enhanced OTEL Utils (src/exgentic/utils/otel.py): Improved tracing utilities with 183+ lines of additions

4. Performance Testing & Monitoring

A2A Test Harness (misc/performance/test_a2a_agent.py): Comprehensive test harness with 934 lines for A2A agent evaluation
MCP Memory Test (misc/performance/test_mcp_memory_harness.py): Performance testing for MCP memory usage (545 lines)
Parallel Session Timing (misc/performance/time_parallel_gsm8k_create_session.py): Timing script for parallel MCP session creation (223 lines)

5. Bug Fixes & Improvements

Session Management: Fixed session closure issues in evaluate_session
Serialization: Fixed A2A serialization errors with custom Pydantic pickle handler
Process Cleanup: Improved process cleanup with kill() fallback
Lock Contention: Reduced MCP session lock contention
HTTP Client: Handle closed HTTP client in evaluate_session
Timeout Handling: Extended timeout for MCP client to prevent connection closure

Documentation

CLAUDE.md: Added comprehensive guide for Claude Code integration (116 lines)
Updated architecture documentation with A2A/MCP adapter details

Files Changed

16 files changed: 4,041 insertions(+), 643 deletions(-)
New files: 7 major new files including CLI commands, adapters, and test harnesses
Modified files: Enhanced OTEL handlers, transport layer, and MCP server

Testing

Added comprehensive test harnesses for A2A and MCP performance
Memory monitoring and evaluation capabilities
Parallel session creation timing tests

Breaking Changes

None - all changes are additive and backward compatible.

Checklist

Code follows project conventions (CLAUDE.md)
All new files have proper SPDX headers
Dependencies have version caps
Tests added for new functionality
Documentation updated
Commits signed off (DCO)

- Created new 'exgentic mcp' CLI command
- Accepts --benchmark, --task-id, --subset, --host, --port options
- Dynamically generates tool signatures from action argument schemas
- Includes timeout protection for tool execution (30s)
- Calls session.start() for proper initialization
- Registered command in CLI main.py under Tools category

Usage: exgentic mcp --benchmark <name> --task-id <id>

Signed-off-by: Yoav Katz <katz@il.ibm.com>

…command Signed-off-by: Yoav Katz <katz@il.ibm.com> Signed-off-by: Yoav Katz <katz@il.ibm.com>

- Remove --task-id command line parameter
- Add list_tasks endpoint to return available benchmark tasks
- Add create_session(task_id) endpoint to create sessions on-demand
- Add delete_session(task_id) endpoint to close and delete sessions
- Store and propagate context to tool functions for thread safety
- Update call_mcp_tool.py example to demonstrate new workflow

Sessions are now managed dynamically via MCP endpoints instead of being created at server startup. This allows clients to create and destroy sessions as needed during runtime.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

- Added --set option to allow passing benchmark-specific parameters
- Only benchmark.* parameters are allowed (e.g., benchmark.user_simulator_model)
- Validates parameters against benchmark's accepted kwargs
- Example: --set benchmark.user_simulator_model='openai/Azure/gpt-4o'

Signed-off-by: Yoav Katz <katz@il.ibm.com>
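
The --set validation described in this commit could look roughly like the sketch below. The helper name `parse_benchmark_overrides` and the `accepted_kwargs` argument are hypothetical and not taken from the PR; the sketch only assumes that overrides arrive as `benchmark.<key>=<value>` strings.

```python
def parse_benchmark_overrides(pairs, accepted_kwargs):
    """Parse --set values, keeping only valid benchmark.* parameters.

    Hypothetical helper: the name, arguments, and error messages are
    illustrative and do not come from the exgentic code base.
    """
    overrides = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        prefix, _, name = key.partition(".")
        if prefix != "benchmark" or not name:
            raise ValueError(f"only benchmark.* parameters are allowed: {key!r}")
        if name not in accepted_kwargs:
            raise ValueError(f"benchmark does not accept {name!r}")
        # Strip optional shell-style quotes around the value.
        overrides[name] = value.strip("'\"")
    return overrides


if __name__ == "__main__":
    print(parse_benchmark_overrides(
        ["benchmark.user_simulator_model='openai/Azure/gpt-4o'"],
        accepted_kwargs={"user_simulator_model"},
    ))
```

Rejecting unknown keys up front keeps a typo in a parameter name from being silently ignored at benchmark construction time.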

Added *.svc.cluster.local:* to the allowed hosts and origins lists (lines 63 and 71). This will allow the MCP server to accept connections from Kubernetes services with hostnames like exgentic-mcp-tau2-mcp.team1.svc.cluster.local:8000.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Add --disable-dns-rebinding-protection flag to allow MCP server to accept connections from any host, including Kubernetes services. This is useful for deployments in trusted environments like Kubernetes clusters where the MCP library's wildcard hostname matching doesn't work.

When enabled (default), DNS rebinding protection validates Host and Origin headers. When disabled, all hosts are allowed, fixing 'Invalid Host header' warnings for Kubernetes service DNS names.

Changes:
- Add enable_dns_rebinding_protection parameter to MCPServer
- Add --disable-dns-rebinding-protection CLI flag to mcp command
- Remove specific Kubernetes hostname patterns (too specific)
- Keep Docker/container hostname patterns when protection is enabled

Signed-off-by: Yoav Katz <katz@il.ibm.com>
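
To make the toggle concrete, here is a minimal sketch of Host-header validation with an on/off switch. It uses Python's `fnmatch` as a stand-in for pattern matching and is not the MCP library's implementation; the function name `host_allowed` and the patterns in `ALLOWED_HOSTS` are assumptions chosen for illustration.

```python
from fnmatch import fnmatch

# Illustrative allow-list only; the patterns actually used in the PR are not shown here.
ALLOWED_HOSTS = ["localhost:*", "127.0.0.1:*", "host.docker.internal:*"]


def host_allowed(host_header, protection_enabled=True, allowed=ALLOWED_HOSTS):
    """Return True if a request with this Host header should be accepted."""
    if not protection_enabled:
        # Protection disabled: every host is accepted (trusted environments only).
        return True
    return any(fnmatch(host_header, pattern) for pattern in allowed)


if __name__ == "__main__":
    k8s_host = "exgentic-mcp-tau2-mcp.team1.svc.cluster.local:8000"
    print(host_allowed("localhost:8000"))                 # accepted by the allow-list
    print(host_allowed(k8s_host))                         # rejected while protection is on
    print(host_allowed(k8s_host, protection_enabled=False))
```

Disabling the check trades safety for convenience, which is why the commit frames it as an option for trusted environments.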

- Changed MCP server to use UUID-based session_id instead of task_id
- Updated action tools to accept session_id parameter
- Updated delete_session to use session_id parameter
- Modified sessions dictionary to be keyed by session_id (UUID)
- Updated example code to extract and use session_id
- Allows multiple concurrent sessions per task
- Tested and verified with call_mcp_tool.py

Signed-off-by: Yoav Katz <katz@il.ibm.com>
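
A sessions dictionary keyed by UUID, as described in this commit, can be sketched as follows. The class `SessionRegistry` and the `session_factory` callable are hypothetical; the real server stores benchmark sessions, which are stood in for here by any object with a `close()` method.

```python
import threading
import uuid


class SessionRegistry:
    """Hypothetical registry: maps UUID session ids to session objects."""

    def __init__(self, session_factory):
        self._factory = session_factory
        self._sessions = {}
        self._lock = threading.Lock()

    def create_session(self, task_id):
        # Build the session outside the lock so slow setup does not block others.
        session = self._factory(task_id)
        session_id = str(uuid.uuid4())
        with self._lock:
            self._sessions[session_id] = session
        return session_id

    def get(self, session_id):
        with self._lock:
            return self._sessions[session_id]

    def delete_session(self, session_id):
        with self._lock:
            session = self._sessions.pop(session_id)
        session.close()
```

Because the key is a fresh UUID rather than the task id, two calls to `create_session` for the same task yield two independent sessions.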

- Added evaluate_session tool that calls session.score()
- Returns success status and score from session evaluation
- Updated example to test evaluate_session functionality
- Tool accepts session_id parameter and returns evaluation results

Signed-off-by: Yoav Katz <katz@il.ibm.com>

- SessionScore is a Pydantic model, not a dict
- Access fields directly (score_result.success, score_result.score)
- Automatically closes session if not done before evaluating
- Returns all SessionScore fields in response

Signed-off-by: Yoav Katz <katz@il.ibm.com>

… method

- Add kill() fallback after terminate() timeout in RemoteProcessExecuter
- Add kill() fallback in RemoteProcess.close() and _cleanup_resources()
- Enhance RemoteSession.close() with proper validation before calling remote close()
- Restore and document RemoteSession.shutdown() method for forceful termination
- Ensure processes are properly cleaned up even when unresponsive to terminate()

This prevents zombie processes and resource leaks when remote processes become unresponsive during shutdown.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
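
The terminate-then-kill pattern from this commit can be sketched with the standard library alone. The helper names `stop_process` and `spawn_sleeper` are hypothetical; the PR applies the same idea inside its own process classes, whose internals are not shown here.

```python
import subprocess
import sys


def stop_process(proc, timeout=5.0):
    """Terminate politely, then fall back to kill() if the process lingers."""
    if proc.poll() is not None:
        return proc.returncode  # already exited
    proc.terminate()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The process ignored terminate(); force it down and reap it.
        proc.kill()
        return proc.wait()


def spawn_sleeper(seconds=60):
    """Start a child process that only sleeps (used for the demo below)."""
    return subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({seconds})"]
    )


if __name__ == "__main__":
    child = spawn_sleeper()
    stop_process(child, timeout=2.0)
    print("exited:", child.poll() is not None)
```

Calling `wait()` after `kill()` matters: it reaps the child so it does not linger as a zombie.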

Resolved conflicts:
- Merged CLI command groups: kept both 'Tools' (mcp) and 'Infrastructure' (serve)
- Added all commands from both branches: install_cmd, uninstall_cmd, mcp_cmd, serve_cmd
- Removed deprecated executor files (executer.py, remote_process_class.py) in favor of new runners architecture
- Updated __all__ exports to include both mcp_cmd and new commands from main

Signed-off-by: Yoav Katz <katz@il.ibm.com>

- Use benchmark.get_evaluator() to access list_tasks() method
- Use evaluator.get_session_kwargs() and benchmark.get_session() to create sessions
- Fixes 'GSM8kBenchmark' object has no attribute 'list_tasks' error
- Fixes 'GSM8kBenchmark' object has no attribute 'create_session' error

Tested successfully with gsm8k benchmark - server starts and loads 1319 tasks with 2 action types

Signed-off-by: Yoav Katz <katz@il.ibm.com>

- Add new 'exgentic a2a' CLI command with --agent, --mcp, --host, --port, --set parameters
- Create mcp_wrapper.py to connect to external MCP servers and extract tools
- Create a2a_executor.py to run exgentic agents via A2A protocol (JSON-RPC 2.0)
- Implement A2A server with agent card, task execution, and event streaming
- Add a2a-sdk>=0.2.16 as optional dependency in pyproject.toml
- Integrate a2a command into main CLI interface

The command allows exgentic agents to be exposed as A2A-compatible agents, enabling agent-to-agent communication. It connects to external MCP servers to extract tools, creates an exgentic agent instance with those tools, and exposes it via the A2A protocol for other agents to interact with.

Tested and verified:
- Agent card discovery at /.well-known/agent-card.json
- JSON-RPC 2.0 request/response handling
- Task execution with proper context management
- Error reporting via A2A protocol with artifacts
- Event streaming for task status updates

Signed-off-by: Yoav Katz <katz@il.ibm.com>

- Add new 'exgentic a2a' CLI command to expose exgentic agents as A2A servers
- Implement MCP wrapper to extract tool metadata from external MCP servers
- Create A2A executor that converts MCP tools to ActionTypes and handles execution
- Use ThreadPoolExecutor to run synchronous agent.react() without blocking
- Handle both tool calls and message actions (agent's final responses)
- Add a2a-sdk>=0.2.16 as core dependency
- Remove dead code (_dynamic_actions.py) and unused imports

The command syntax: exgentic a2a --agent <name> --mcp <address> [--host <host>] [--port <port>] [--set <key=value>]

Tested successfully with LiteLLM Tool Calling agent connecting to external MCP server.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

- Created misc/performance/test_a2a_agent.py test harness
  - Connects to MCP server to fetch tasks and create sessions
  - Calls A2A agent server to solve tasks using proper A2A client API
  - Monitors memory consumption of A2A server process
  - Tracks task execution time and success/failure rates
  - Calls evaluate_session after each successful task completion
  - Pretty-prints evaluation results (success, score, metrics)
  - Provides summary statistics including evaluation metrics
- Enhanced A2A executor logging
  - Prints log location when each request arrives
  - Shows: outputs/a2a_<timestamp>/ directory path
- Improved debug output
  - Pretty-prints JSON responses with indentation
  - For status updates, shows only the text message (not full JSON)
  - For other events, shows full details
- Configurable httpx client timeout
- Proper resource cleanup (closes httpx client)

Usage:
  python misc/performance/test_a2a_agent.py \
    --mcp-url http://localhost:8000/mcp \
    --a2a-url http://localhost:9000 \
    --limit 5 --timeout 600

Signed-off-by: Yoav Katz <katz@il.ibm.com>

- Fixed A2A executor to handle ParallelAction properly
  - Use to_action_list() for all action types (works uniformly)
  - Execute all actions in the list
  - Return SingleObservation for single actions
  - Return MultiObservation for multiple actions (follows core/actions.py pattern)
  - Each SingleObservation in MultiObservation has its own invoking action
- Enhanced task input with session_id instructions
  - Explicitly tells agent to use session_id in all tool calls
  - Reminds agent to call submit MCP tool when needed
  - Improves task completion rates

Fixes error: 'ParallelAction' object has no attribute 'name'

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Extract context from create_session response and append it to the task input sent to the A2A agent. This provides the agent with additional context information that may be needed to solve the task.

Format:
- Task description
- Context: (if available)
  - key: value
  - key: value
- Session ID instructions

Signed-off-by: Yoav Katz <katz@il.ibm.com>

- Created misc/performance/test_a2a_agent.py: comprehensive test harness for A2A agents
  - Connects to MCP server and A2A agent server
  - Creates sessions and calls A2A agent to solve tasks
  - Evaluates results and tracks success/failure metrics
  - Monitors memory consumption and execution time
  - Added --debug flag for detailed output including evaluation responses
- Fixed src/exgentic/adapters/agents/a2a_executor.py:
  - Mark message action with is_message=True flag for proper agent handling
  - Auto-inject session_id parameter when calling message tool
  - Extract session_id from task context instead of generating random UUID
  - Added error handling for missing session_id in task context
- Updated src/exgentic/adapters/agents/mcp_wrapper.py:
  - Filter out admin tools (create_session, delete_session, list_tasks, evaluate_session)
  - Agents now only see task-specific tools, preventing session management conflicts

These changes ensure proper session management and enable comprehensive testing of A2A agents solving MCP server tasks.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

The MCP HTTP client was timing out during long-running A2A agent tasks, causing 'Cannot send a request, as the client has been closed' errors when trying to evaluate sessions after task completion.

Solution: Create MCP client with extended timeout (2x task timeout) to ensure the connection stays alive throughout the entire task execution and evaluation process.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Make memory monitoring optional when the A2A server PID cannot be detected, which commonly occurs when the server runs in a container. The test now:
- Attempts to auto-detect the server PID by port
- Falls back to running without memory monitoring if PID not found
- Adds conditional checks before all monitor method calls
- Provides clear warnings when memory monitoring is disabled

This allows the test harness to work with both local and containerized A2A servers, gracefully degrading functionality when process monitoring is not available.

Changes:
- Initialize monitor to None by default
- Skip PID detection errors and continue without monitoring
- Wrap all monitor.measure() and monitor.print_*() calls with 'if monitor:' checks
- Update status messages to indicate when monitoring is disabled

Signed-off-by: Yoav Katz <katz@il.ibm.com>

…st harness

This commit fixes the 'Cannot send a request, as the client has been closed' error that occurred when evaluating sessions after A2A agent execution.

Root cause:
- A2A executor creates its own MCP session with an HTTP client
- When A2A agent completes, it closes its MCP session and HTTP client
- Benchmark session stores reference to A2A's HTTP client
- evaluate_session tries to close the session, which attempts to use the already-closed HTTP client from the A2A executor

Changes:
1. MCP Server (src/exgentic/interfaces/cli/commands/mcp.py):
   - Wrap sess.close() in try-except to handle already-closed clients
   - Add full traceback to error messages for better debugging
2. Test Script (misc/performance/test_a2a_agent.py):
   - Move httpx client creation outside try block for proper lifecycle
   - Set timeout=None for read operations to prevent premature closure
   - Add finally block to ensure HTTP client cleanup
   - Rename 'session' to 'mcp_session' for clarity
   - Improve error handling and debug output for evaluation
3. A2A Command (src/exgentic/interfaces/cli/commands/a2a.py):
   - Move imports to top of file (code cleanup)

The evaluation can now complete successfully even when the A2A agent has already closed its connection to the benchmark session.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

…client closure

Root cause analysis:
- Tau2 benchmark uses service runner which wraps sessions in HTTP transport
- When sess.close() is called, it closes the HTTPTransport's httpx.Client
- Calling sess.score() after close() fails with 'client has been closed' error
- The session's score() method needs the HTTP transport to be open to make RPC calls

Solution:
- Reorder operations in evaluate_session_tool to call score() BEFORE close()
- This ensures the HTTP transport is still available when score() needs it
- Add detailed comment explaining why this order is critical

This fixes the evaluation error for benchmarks using service runner (tau2, etc.) while maintaining backward compatibility with local sessions (gsm8k, etc.).

Signed-off-by: Yoav Katz <katz@il.ibm.com>
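
The ordering constraint in this commit (score first, close second, tolerate a failing close) can be illustrated with a small stand-in session. `FakeRemoteSession` and this `evaluate_session` function are hypothetical sketches, not the code in mcp.py.

```python
class FakeRemoteSession:
    """Hypothetical stand-in: score() needs the transport that close() shuts down."""

    def __init__(self):
        self.transport_open = True

    def score(self):
        if not self.transport_open:
            raise RuntimeError("Cannot send a request, as the client has been closed")
        return {"success": True, "score": 1.0}

    def close(self):
        self.transport_open = False


def evaluate_session(sess):
    # Score first: scoring may issue RPC calls over the session's transport.
    result = sess.score()
    try:
        sess.close()
    except Exception as exc:
        # An already-closed client should not turn a valid score into an error.
        print(f"warning: closing session failed: {exc}")
    return result


if __name__ == "__main__":
    print(evaluate_session(FakeRemoteSession()))
```

Reversing the two calls in `evaluate_session` reproduces the 'client has been closed' failure with this stand-in.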

Refactored the A2A agent test harness for better maintainability:

Changes:
- Split large test_a2a_agent() function into smaller, focused functions:
  * fetch_tasks() - Fetch tasks from MCP server
  * create_mcp_session() - Create MCP session for a task
  * build_enhanced_task_input() - Build task input with context
  * evaluate_mcp_session() - Evaluate session results
  * delete_mcp_session() - Delete session
  * print_task_results_summary() - Print results summary
- Simplified call_a2a_agent():
  * Removed unnecessary timeout parameter from httpx client
  * Removed verbose debug logging
  * Cleaner error handling with finally block
  * Simplified response processing
- Removed unnecessary timeout settings added during debugging
- Improved code organization and readability
- Maintained all functionality while reducing complexity
- Better separation of concerns

The refactored code is easier to understand, test, and maintain.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

…agent.py

- Replace all 'if debug:' print statements with logger calls
- Remove debug parameters from all functions (call_a2a_agent, fetch_tasks, create_mcp_session, evaluate_mcp_session, delete_mcp_session, test_a2a_agent)
- Configure module-specific logger that only affects this file
- Use appropriate log levels: debug, info, warning, error, exception
- Restore comprehensive debug output for A2A agent communication
- Clean up function signatures by removing debug parameter pollution

Benefits:
- Standard Python logging best practices
- Module-specific logging (doesn't affect other loggers)
- Proper log levels and automatic exception tracebacks
- Cleaner, more maintainable code

Signed-off-by: Yoav Katz <katz@il.ibm.com>

…valuation

This commit addresses two issues in the A2A agent execution flow:

1. A2A Executor - Terminate on Finish Action:
   - Added logic to detect when an action has is_finish=True by matching action instances to their ActionType objects by name
   - After executing a finish action and building the observation, the agent loop now terminates immediately instead of continuing
   - This prevents unnecessary iterations after task completion
   - Implementation in a2a_executor.py lines 299-378
2. MCP Session Evaluation - Fix Database Closed Error:
   - Fixed 'Cannot operate on a closed database' error during session evaluation in AppWorld and other benchmarks
   - Moved sess.done() check BEFORE sess.score() call, as some benchmarks close database connections during scoring
   - Added try-except around done() check to gracefully handle failures
   - This prevents evaluation errors when the database is already closed
   - Implementation in mcp.py lines 250-265

The finish action termination ensures agents complete tasks efficiently without extra steps, while the evaluation fix prevents spurious errors during session scoring.

- Mark 'submit' action type as finish action alongside existing 'finish' action
- Update mock session score to reflect failure state (success=False, score=-1.0)
- Maintains is_finished=True flag for proper session termination

Signed-off-by: Yoav Katz <katz@il.ibm.com>

- Made create_session_tool and delete_session_tool async with run_in_executor to prevent blocking the event loop during I/O operations
- Added request timing middleware to MCPServer for performance monitoring
- Updated uvicorn config with limit_concurrency=None and backlog=2048
- Modified test script to use separate MCP sessions per parallel call

This enables true concurrent processing of MCP tool requests. Before this change, tool calls were processed sequentially even with multiple client sessions. Now multiple create_session calls complete in ~6s instead of 18s for 3 parallel calls.

Fixes sequential processing bottleneck in MCP server tool execution.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
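
The run_in_executor change can be sketched in isolation. In this hypothetical example, `create_session_blocking` stands in for slow session setup with a short sleep; the function names mirror the commit's wording, but the bodies are assumptions.

```python
import asyncio
import time


def create_session_blocking(task_id):
    """Stand-in for slow, synchronous session setup."""
    time.sleep(0.2)
    return f"session-for-{task_id}"


async def create_session_tool(task_id):
    # Offload the blocking call to the default thread pool so the event
    # loop stays free to serve other requests in the meantime.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, create_session_blocking, task_id)


async def create_many(task_ids):
    return await asyncio.gather(*(create_session_tool(t) for t in task_ids))


if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(create_many(["a", "b", "c"])))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

With the blocking call made directly inside the coroutine, the three calls would run back to back; offloaded to threads, they overlap, which is the effect the commit reports.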

- Add profiling context manager to docker, process, service, thread, and venv runners
- Enable performance monitoring and tracing for runner operations
- Consistent profiling implementation across all runner types
- Fix line length issue in process.py

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Replace per-session datasets load_dataset calls with a cached parquet-backed loader using huggingface_hub and pyarrow. This avoids repeated GSM8K metadata resolution overhead while keeping row access thread-safe within the process. Only the benchmark implementation change is included in this commit; the local performance experiment file is intentionally left uncommitted. Signed-off-by: Yoav Katz <katz@il.ibm.com>

Add trace context extraction in ExgenticAgentExecutor to support distributed tracing across A2A agent calls. The executor now:
- Extracts trace_id and span_id from the current OpenTelemetry span
- Stores parent trace context in exgentic Context via OtelContext
- Enables trace propagation from HTTP requests to exgentic spans

This allows A2A agents to receive trace context from POST requests and maintain trace continuity across service boundaries.

Technical details:
- Uses otel_trace.get_current_span() to access active span context
- Formats trace_id as 32-char hex and span_id as 16-char hex
- Stores context in exgentic Context for SessionSpanManager access
- Only extracts context when a valid span is present

Related: Distributed tracing implementation for A2A agents

Signed-off-by: Yoav Katz <katz@il.ibm.com>
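
The hex formatting mentioned under "Technical details" follows from how OpenTelemetry's Python API exposes trace_id and span_id as integers. The sketch below shows only the formatting step in pure Python; the helper name `format_trace_context` is hypothetical, and the traceparent layout follows the W3C Trace Context format (version, trace id, parent id, flags).

```python
def format_trace_context(trace_id: int, span_id: int, sampled: bool = True):
    """Format integer ids as fixed-width hex and build a W3C traceparent value."""
    trace_hex = f"{trace_id:032x}"  # 128-bit trace id -> 32 hex chars
    span_hex = f"{span_id:016x}"    # 64-bit span id  -> 16 hex chars
    flags = "01" if sampled else "00"
    return {
        "trace_id": trace_hex,
        "span_id": span_hex,
        "traceparent": f"00-{trace_hex}-{span_hex}-{flags}",
    }


if __name__ == "__main__":
    ctx = format_trace_context(
        0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7
    )
    print(ctx["traceparent"])
```

Zero-padding matters here: an id with leading zero bytes would otherwise produce a shorter string that downstream parsers reject.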

Modify SessionSpanManager to check for parent trace context stored in exgentic Context and use it when creating root spans. This enables distributed tracing by continuing traces from upstream services.

Changes:
- Check exgentic Context for otel_context before creating spans
- Create NonRecordingSpan with parent trace_id and span_id
- Set span context with is_remote=True and trace_flags=0x01
- Use parent context when starting new spans via set_span_in_context
- Add logging to track trace continuation vs new trace creation

This allows exgentic spans to be properly linked to parent spans from A2A HTTP requests, maintaining trace continuity across service boundaries.

Technical details:
- NonRecordingSpan acts as placeholder referencing parent trace
- SpanContext created with trace_id and span_id from parent
- Context propagation uses OpenTelemetry's set_span_in_context
- Only applies to root spans (when parent_span is None)

Related: Distributed tracing implementation for A2A agents

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Add OpenTelemetry ASGI instrumentation to A2A server to extract W3C trace context from incoming HTTP requests. This enables the server to receive and propagate distributed traces.

Changes:
- Import OpenTelemetryMiddleware from opentelemetry.instrumentation.asgi
- Setup ASGI instrumentation with excluded URLs for health checks
- Wrap Starlette app with middleware after routes are configured
- Add success logging when middleware is applied

The middleware automatically:
- Extracts traceparent headers from incoming HTTP requests
- Sets OpenTelemetry context for the request handler
- Enables trace propagation to downstream exgentic operations
- Creates server spans for incoming requests

Technical details:
- Middleware must be applied AFTER routes are added to avoid errors
- Excluded URLs prevent health check endpoints from creating spans
- Uses W3C Trace Context propagation standard
- Integrates with existing OpenTelemetry tracing infrastructure

Requires: opentelemetry-instrumentation-asgi package

Related: Distributed tracing implementation for A2A agents

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Disable span filtering in FilteredSpanExporter to see full trace hierarchy during distributed tracing development and debugging.

Changes:
- Comment out A2A span filtering logic
- Allow all spans to be exported regardless of service name
- Add TODO comment to re-enable filtering after verification

This is a temporary change to help verify that distributed tracing is working correctly across all services. The filtering should be re-enabled once trace propagation is confirmed working.

The original filtering logic excluded spans from:
- exgentic-a2a service
- exgentic-a2a-runner service

This helped reduce noise in traces but needs to be disabled during initial distributed tracing implementation to verify all spans are properly connected.

Related: Distributed tracing implementation for A2A agents

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Add HTTPX instrumentation and OpenTelemetry tracing to test_a2a_agent script to enable distributed tracing verification. The test now creates spans that propagate trace context to the A2A server.

Changes:
- Import OpenTelemetry tracing and OTLP exporter dependencies
- Initialize OpenTelemetry with OTLP exporter configuration
- Instrument HTTPX client to inject W3C traceparent headers
- Wrap call_a2a_agent() with OpenTelemetry span
- Add tracer initialization in main() function
- Configure OTLP endpoint from environment variable

The instrumentation enables:
- Automatic injection of traceparent headers in HTTP requests
- Creation of client spans for A2A agent calls
- Trace context propagation to downstream services
- End-to-end distributed tracing verification

Technical details:
- Uses OTLPSpanExporter with HTTP protocol
- Configures BatchSpanProcessor for efficient export
- HTTPXClientInstrumentor automatically adds trace headers
- Tracer name: 'test_a2a_agent' for service identification
- Supports OTEL_EXPORTER_OTLP_ENDPOINT environment variable

Requires:
- opentelemetry-api
- opentelemetry-sdk
- opentelemetry-exporter-otlp-proto-http
- opentelemetry-instrumentation-httpx

Related: Distributed tracing implementation for A2A agents

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Add required OpenTelemetry instrumentation packages to support distributed tracing in A2A agents.

Added dependencies:
- opentelemetry-instrumentation-asgi>=0.48b0,<1
  Required for A2A server to extract trace context from HTTP headers
- opentelemetry-instrumentation-httpx>=0.48b0,<1
  Required for test clients to inject trace context into HTTP requests

These packages enable:
- Automatic W3C trace context propagation via HTTP headers
- ASGI middleware for server-side trace extraction
- HTTPX client instrumentation for trace injection
- End-to-end distributed tracing across services

The uv.lock file has been updated with the new dependencies and their transitive dependencies:
- asgiref v3.11.1
- opentelemetry-instrumentation v0.59b0
- opentelemetry-instrumentation-asgi v0.59b0
- opentelemetry-instrumentation-httpx v0.59b0
- opentelemetry-util-http v0.59b0
- wrapt v1.17.3

Related: Distributed tracing implementation for A2A agents

Signed-off-by: Yoav Katz <katz@il.ibm.com>

- Capture exception details to distinguish race conditions from other errors
- Add debug logging for expected race conditions
- Add warning logging with stack trace for unexpected RuntimeErrors
- Inspect exception message to identify benign vs serious issues
- Replace asyncio.ensure_future() with asyncio.create_task() and store references

Addresses functionality-error-handling-review issue at line 44

Signed-off-by: Yoav Katz <katz@il.ibm.com>

- Add ThreadPoolExecutor.shutdown() in signal and exception handlers to prevent resource leaks
- Document max_workers=200 rationale for high concurrency benchmarks
- Fix race condition in signal handler by holding lock during session cleanup
- Replace silent exception handling with logger.warning() for better debugging
- Ensure proper cleanup order: stop server, shutdown executor, close sessions

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Remove the custom TimingMiddleware that logged request durations. This simplifies the server implementation and reduces logging verbosity. Signed-off-by: Yoav Katz <katz@il.ibm.com>

- Add _fire_and_forget() method to properly track background tasks
- Move blocking operations to thread pool executor to prevent event loop blocking
  - agent_instance.get_instance()
  - agent_instance.start()
  - agent_instance.close()
  - flush_traces()
- Replace 13 instances of direct asyncio.create_task() calls with _fire_and_forget()
- Alphabetize __all__ exports for consistency

Signed-off-by: Yoav Katz <katz@il.ibm.com>
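
Tracking background tasks matters because the event loop keeps only weak references to tasks, so an untracked task can be garbage-collected before it finishes. A minimal sketch of the pattern follows; the class `BackgroundTasks` and its `drain()` method are hypothetical and not the executor's actual `_fire_and_forget()` implementation.

```python
import asyncio


class BackgroundTasks:
    """Hypothetical holder that keeps strong references to in-flight tasks."""

    def __init__(self):
        self._tasks = set()

    def fire_and_forget(self, coro):
        task = asyncio.create_task(coro)
        self._tasks.add(task)  # strong reference until the task completes
        task.add_done_callback(self._tasks.discard)
        return task

    async def drain(self):
        """Wait for all tracked tasks, e.g. during shutdown."""
        if self._tasks:
            await asyncio.gather(*list(self._tasks), return_exceptions=True)


async def demo():
    background = BackgroundTasks()
    seen = []

    async def work(n):
        await asyncio.sleep(0)
        seen.append(n)

    background.fire_and_forget(work(1))
    background.fire_and_forget(work(2))
    await background.drain()
    return sorted(seen)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The done-callback removes each task from the set as soon as it completes, so the set does not grow over the lifetime of the server.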

- Add contextvars.copy_context() to properly propagate context across thread boundaries
- Wrap run_in_executor calls with ctx.run() to ensure OTEL tracing context is available
- Remove set_context_fallback() calls as contextvars approach provides proper isolation
- Add detailed documentation explaining the threading/context propagation pattern
- Fixes race condition where concurrent sessions would overwrite module-level context
- Ensures correct OTEL span attribution in multi-session scenarios

Signed-off-by: Yoav Katz <katz@il.ibm.com>
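
The copy_context() plus ctx.run() pattern from this commit is needed because `loop.run_in_executor` does not carry context variables into the worker thread on its own. A self-contained sketch, with the hypothetical variable `current_session` standing in for the OTEL/exgentic context:

```python
import asyncio
import contextvars

# Hypothetical context variable standing in for per-session tracing context.
current_session = contextvars.ContextVar("current_session", default=None)


def blocking_step():
    """Runs in a worker thread and reads the caller's context."""
    return current_session.get()


async def handle_request(session_id):
    current_session.set(session_id)
    # Snapshot this task's context and run the blocking call inside it.
    ctx = contextvars.copy_context()
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, ctx.run, blocking_step)


async def main():
    # Each gathered coroutine runs in its own task with its own context copy,
    # so concurrent sessions do not overwrite each other's value.
    return await asyncio.gather(handle_request("s1"), handle_request("s2"))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Without `ctx.run`, `blocking_step` would see the worker thread's own context and return the default value rather than the session id.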

…refixes

MCP gateways can add prefixes to action names, so we need to check the ending of the action name rather than exact matches for 'message', 'finish', and 'submit' actions.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
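
A suffix check of the kind this commit describes could be sketched as follows. The function name `special_action_kind` and the example prefix `gateway__` are hypothetical; the real prefix format depends on the gateway.

```python
SPECIAL_ACTIONS = ("message", "finish", "submit")


def special_action_kind(action_name):
    """Return which special action a (possibly prefixed) name refers to, if any."""
    for kind in SPECIAL_ACTIONS:
        if action_name.endswith(kind):
            return kind
    return None


if __name__ == "__main__":
    for name in ("submit", "gateway__submit", "search_flights"):
        print(name, "->", special_action_kind(name))
```

A plain suffix test is permissive: a tool named `resubmit` would also match, so a stricter variant might require a separator before the suffix.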

Changed LiteLLMToolCallingAgentInstance to search for and use a message action with is_message=True from self._all_actions instead of directly instantiating MessageAction.

Changes:
- Added search logic in start() to find and store message action type
- Updated _observe() to check against dynamic message action type name
- Modified react() to use stored message action type with build_action()

This makes the agent more flexible by not hardcoding the message action.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

- Change default timeout from 60s to 30s in check_model_accessible_sync()
- Disable retries by setting _HEALTH_MIN_RETRIES from 7 to 0
- This prevents long hangs when model endpoints are misconfigured or unreachable
- Particularly affects tau2 benchmark initialization which checks user simulator model

The tau2 benchmark was hanging at 'Initializing action types...' because the health check for the user simulator model (openai/Azure/gpt-4.1) was timing out after 60+ seconds with multiple retries. With these changes, failures occur within 30 seconds, providing faster feedback to users.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

- Add _remove_session_id_from_action_types() method to strip session_id from action schemas
- Pass cleaned action types (without session_id) to agent initialization
- Inject session_id into all MCP tool calls at execution time
- This allows session_id to be managed separately from agent input while maintaining compatibility with MCP tools that require it

Signed-off-by: Yoav Katz <katz@il.ibm.com>
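
The strip-then-inject approach can be sketched on a plain JSON-Schema-style dictionary. The helper names `remove_session_id` and `inject_session_id` and the example schema are assumptions for illustration; the PR operates on its own action type objects.

```python
import copy


def remove_session_id(schema):
    """Return a copy of a tool's argument schema without the session_id field."""
    cleaned = copy.deepcopy(schema)
    cleaned.get("properties", {}).pop("session_id", None)
    if "required" in cleaned:
        cleaned["required"] = [r for r in cleaned["required"] if r != "session_id"]
    return cleaned


def inject_session_id(arguments, session_id):
    """Add session_id back just before the MCP tool call is issued."""
    return {**arguments, "session_id": session_id}


if __name__ == "__main__":
    schema = {
        "type": "object",
        "properties": {"session_id": {"type": "string"}, "answer": {"type": "string"}},
        "required": ["session_id", "answer"],
    }
    print(remove_session_id(schema))
    print(inject_session_id({"answer": "42"}, "abc-123"))
```

The agent therefore never has to produce the session id itself, which removes one way for a tool call to go wrong.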

Instead of always returning the generic "Session completed" string, extract the actual action arguments when the completing action is a message or finish action. Also bump max_iterations from 50 to 100. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Yoav Katz <katz@il.ibm.com>

zeroasterisk · 2026-06-11T13:56:43Z

Hi @yoavkatz — we've been working on the consume-side A2A adapter (PR #232) while your PR covers the serve-side. @elronbandel suggested we coordinate, and we agree — together these make Exgentic a full A2A citizen (both producer and consumer).

A few areas where alignment would help:

Shared A2A types/utilities — both PRs translate between Exgentic types and A2A types. Could share translation helpers (action→tool, observation→message, etc.) rather than duplicating.
MCP handshake convention — we've opened issue #236 (Design: Task-scoped MCP provisioning convention for A2A agents), proposing a structured-metadata convention for passing MCP endpoints in A2A task messages. This affects how consume-side agents discover their tools. We would value your input, since the serve-side would need to understand this convention too.
Agent Card extensions — for declaring A2A+MCP capabilities (e.g., "I accept MCP tool provisioning"). Relevant to both directions.

Happy to align on shared structure however works best for you — issue discussion, shared module, or a quick call.

- Add agent_a2a_images: Docker wrapper for Exgentic agents using A2A protocol
  - Dockerfile with build-time agent installation
  - Build script with docker/podman auto-detection and GHCR push support
  - Entrypoint with runtime configuration via environment variables
  - Comprehensive README with usage examples
- Add benchmark_mcp_images: Docker wrapper for Exgentic benchmarks using MCP
  - Dockerfile with build-time benchmark installation
  - Build script with docker/podman auto-detection and GHCR push support
  - Entrypoint with runtime configuration via environment variables
  - Comprehensive README with usage examples

Both implementations include:
- Non-root user execution (UID 1001/1000)
- Flexible runtime configuration via EXGENTIC_SET_* environment variables
- Support for pushing to GitHub Container Registry
- Production-ready error handling and logging
- .dockerignore for optimized builds
- Example environment files

Signed-off-by: Yoav Katz <katz@il.ibm.com>
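One way an entrypoint could turn EXGENTIC_SET_* variables into `--set` options is sketched below. The mapping is purely an assumption for illustration (strip the prefix, lowercase the remainder, emit `--set key=value`); the commit message does not specify how the actual entrypoint interprets these variables, and `env_to_set_args` is a hypothetical helper.

```python
import os
from typing import Mapping

PREFIX = "EXGENTIC_SET_"


def env_to_set_args(env: Mapping[str, str]) -> list[str]:
    """Translate EXGENTIC_SET_<KEY>=<value> entries into `--set key=value` pairs.

    The lowercase key mapping is an assumed convention, not the PR's.
    """
    args: list[str] = []
    for name in sorted(env):  # sorted for a deterministic argument order
        if name.startswith(PREFIX) and len(name) > len(PREFIX):
            key = name[len(PREFIX):].lower()
            args += ["--set", f"{key}={env[name]}"]
    return args


if __name__ == "__main__":
    # Print the arguments derived from the current process environment.
    print(env_to_set_args(os.environ))
```

Keeping the translation in one pure function makes it easy to test without starting a container.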

yoavkatz added 30 commits March 9, 2026 10:33

Added option for multiple session with different tasks in mcp server command

63215cc

Signed-off-by: Yoav Katz <katz@il.ibm.com> Signed-off-by: Yoav Katz <katz@il.ibm.com>

Merge remote-tracking branch 'origin/main' into feature/mcp-command

c5a4b7d

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Merge remote-tracking branch 'origin/main' into feature/mcp-command

b4d3b8c

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Add task and context to create_session tool return value

28f2940

Signed-off-by: Yoav Katz <katz@il.ibm.com> Signed-off-by: Yoav Katz <katz@il.ibm.com>

Remove call_mcp_tool.py

fd45c59

Signed-off-by: Yoav Katz <katz@il.ibm.com> Signed-off-by: Yoav Katz <katz@il.ibm.com>

Make mcp_cmd fail if unable to create session for task initialization

0084b3a

Signed-off-by: Yoav Katz <katz@il.ibm.com> Signed-off-by: Yoav Katz <katz@il.ibm.com>

Add MCP memory test harness for performance testing

fc79054

Signed-off-by: Yoav Katz <katz@il.ibm.com>

yoavkatz and others added 25 commits April 19, 2026 16:19

refactor(mcp): remove timing middleware from MCP server

6a88493

Remove the custom TimingMiddleware that logged request durations. This simplifies the server implementation and reduces logging verbosity. Signed-off-by: Yoav Katz <katz@il.ibm.com>

Adjust A2A executor session handling

254082b

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Update A2A performance test harness

8e2edc9

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Add request and session lifecycle logging

d60605a

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Remove optional cleanup flag and always delete A2A test sessions

48037b1

Signed-off-by: Yoav Katz <katz@il.ibm.com>

This was referenced Jun 8, 2026

feat: A2A agent adapter for consuming external A2A agents #232

Open

Design: Task-scoped MCP provisioning convention for A2A agents #236

Open

elronbandel mentioned this pull request Jun 15, 2026

Design: A2A consume-side adapter — agent layering + task-scoped MCP handshake #237

Open

yoavkatz wants to merge 66 commits into main from feature/mcp-command
