Multimodal AI Agent

A production-ready AI agent built with Agno that parses and reasons over user-provided files (PDF, CSV, etc.) in conversational interactions. The project showcases best practices for agent development with the Better Agents framework, including version-controlled prompt management with LangWatch, end-to-end testing with Scenario, and full instrumentation.

Features

  • File Parsing: Supports parsing PDF, CSV, and other file formats provided by users in conversations
  • Conversational AI: Maintains context across multi-turn conversations
  • Structured Responses: Uses Pydantic schemas for consistent output formatting
  • Comprehensive Testing: End-to-end Scenario tests ensure reliability
  • Prompt Management: Version-controlled prompts using LangWatch CLI
  • Instrumentation: Full LangWatch integration for monitoring and analytics

Project Structure

├── app/                    # Main application code
│   └── main.py            # Agent implementation
├── prompts/               # Version-controlled prompt files
│   └── *.yaml
├── tests/
│   ├── evaluations/       # Component evaluation notebooks
│   └── scenarios/         # End-to-end scenario tests
├── .env                   # Environment variables (copy from .env.example)
├── prompts.json           # Prompt registry
└── AGENTS.md              # Development guidelines

Setup

Prerequisites

  • Python 3.8+
  • uv package manager
  • API keys for OpenAI and LangWatch (configured in .env during setup)

Installation

  1. Install uv (if not already installed):

    curl -LsSf https://astral.sh/uv/install.sh | sh
  2. Initialize the project:

    uv init
  3. Install dependencies:

    uv add agno langwatch pytest python-dotenv
    uv add --dev pytest-asyncio
  4. Install LangWatch CLI:

    uv tool install langwatch
  5. Set up environment:

    cp .env.example .env
    # Edit .env with your API keys
  6. Install Scenario for testing:

    uv add scenario-langwatch

Usage

Running the Agent

Execute the main application:

uv run python app/main.py

The agent will start and be ready to handle conversational interactions with file uploads.
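
As a point of reference, a minimal app/main.py could look roughly like the sketch below. This is an illustrative outline, not the actual implementation: it assumes Agno's Agent and OpenAIChat classes, uses the multimodal_agent prompt created later in this README, and passes the managed prompt text as the agent's instructions.

# Illustrative sketch only -- the real app/main.py may differ.
import langwatch
from agno.agent import Agent
from agno.models.openai import OpenAIChat

langwatch.setup()  # picks up LANGWATCH_API_KEY from .env / the environment

# Fetch the version-controlled prompt from LangWatch
prompt = langwatch.prompts.get("multimodal_agent")

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    instructions=prompt.prompt,  # assumption: pass the prompt text as instructions
    markdown=True,
)

if __name__ == "__main__":
    agent.print_response("Hi! What kinds of files can you read?", stream=True)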

Development Server

To run a development server (if the agent is exposed as an ASGI app; see the sketch below):

uv run python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Access the agent at: http://localhost:8000
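
The repository does not have to ship a server, but if you want the uvicorn command above to work, a thin FastAPI wrapper along these lines is enough (you would also need to uv add fastapi uvicorn). The /chat route and request shape here are illustrative assumptions, not part of the existing codebase.

# Illustrative sketch: expose the agent as an ASGI app for uvicorn.
from fastapi import FastAPI
from pydantic import BaseModel

from agno.agent import Agent
from agno.models.openai import OpenAIChat

app = FastAPI()
agent = Agent(model=OpenAIChat(id="gpt-4o"))

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
def chat(request: ChatRequest) -> dict:
    # Agent.run returns a RunResponse; .content holds the model's reply
    result = agent.run(request.message)
    return {"reply": result.content}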

Testing

Scenario Tests

Run end-to-end scenario tests:

uv run pytest tests/scenarios/ -v
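
A Scenario test wraps the agent in an adapter and lets a simulated user and a judge drive a short conversation. The sketch below is a rough outline assuming the scenario package's AgentAdapter, UserSimulatorAgent, and JudgeAgent APIs; the test name, criteria, and conversation topic are made up for illustration.

# tests/scenarios/test_file_parsing.py -- illustrative sketch
import pytest
import scenario

from agno.agent import Agent
from agno.models.openai import OpenAIChat

scenario.configure(default_model="openai/gpt-4o-mini")

class MultimodalAgentAdapter(scenario.AgentAdapter):
    def __init__(self):
        self.agent = Agent(model=OpenAIChat(id="gpt-4o"))

    async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
        result = self.agent.run(input.last_new_user_message_str())
        return result.content

@pytest.mark.asyncio
async def test_csv_summary():
    result = await scenario.run(
        name="csv summary",
        description="The user shares a CSV file and asks for a summary of its columns.",
        agents=[
            MultimodalAgentAdapter(),
            scenario.UserSimulatorAgent(),
            scenario.JudgeAgent(criteria=["The agent summarizes the CSV contents accurately"]),
        ],
    )
    assert result.success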

Evaluations

Run evaluation notebooks in tests/evaluations/ for component-level testing.
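
A component-level evaluation can be as small as a notebook cell that checks the agent's structured output against a Pydantic schema. A minimal sketch, where InvoiceSummary and its fields are hypothetical stand-ins for your actual response schema:

# Illustrative evaluation cell; InvoiceSummary and its fields are hypothetical.
from pydantic import BaseModel

from agno.agent import Agent
from agno.models.openai import OpenAIChat

class InvoiceSummary(BaseModel):
    vendor: str
    total: float

agent = Agent(model=OpenAIChat(id="gpt-4o"), response_model=InvoiceSummary)
result = agent.run("Invoice from ACME Corp. Total due: $1,200.00.")

# With response_model set, result.content is a parsed InvoiceSummary instance
assert isinstance(result.content, InvoiceSummary)
assert result.content.total == 1200.0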

Prompt Management

Creating Prompts

Use LangWatch CLI to create and manage prompts:

langwatch prompt create multimodal_agent
# Edit the created YAML file in prompts/
langwatch prompt sync
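
The generated YAML file holds the model and messages for the prompt. The exact schema depends on your LangWatch CLI version; an illustrative example might look like:

# prompts/multimodal_agent.yaml -- illustrative only; the exact fields
# depend on the LangWatch CLI version that generated the file.
model: openai/gpt-4o
messages:
  - role: system
    content: |
      You are a multimodal assistant. Parse any PDF or CSV files the user
      shares and answer questions about their contents.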

Using Prompts in Code

import langwatch
from agno.agent import Agent

prompt = langwatch.prompts.get("multimodal_agent")
agent = Agent(prompt=prompt.prompt)

File Handling

The agent supports the following file types (see the sketch after this list):

  • PDF: Text extraction and analysis
  • CSV: Data parsing and summarization
  • Images: OCR and description (if supported)
  • Other formats: As needed for specific use cases
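
For example, a PDF can be attached to a message roughly as sketched below. The agno.media.File class and the files= parameter are assumptions about the Agno version in use, and the file path is made up for illustration.

# Illustrative sketch: attaching a user-provided file to a message.
from agno.agent import Agent
from agno.media import File
from agno.models.openai import OpenAIChat

agent = Agent(model=OpenAIChat(id="gpt-4o"))
agent.print_response(
    "Summarize the key figures in this report.",
    files=[File(filepath="reports/q3.pdf")],
)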

Development Guidelines

Follow the guidelines in AGENTS.md for:

  • Prompt management best practices
  • Testing strategies (Scenario tests first)
  • Code organization
  • Performance optimization

Contributing

  1. Follow the development workflow in AGENTS.md
  2. Create Scenario tests for new features
  3. Use LangWatch for prompt versioning
  4. Ensure all tests pass before submitting
