harisgulzar1/hf_agent

Multi-Input LangGraph Agent and Its MCP Client-Server Implementation

This repository contains a LangGraph-based multi-input AI agent that can handle text, image, audio, and video inputs. It uses OpenAI models bound to multiple tools for web search, image analysis, audio transcription, and video summarization, executing them dynamically based on the user's query. Both a plain LangGraph implementation and an MCP client-server implementation are provided.

LangGraph Workflow with MCP Implementation

Features

  • Dynamic Tool Execution: The agent intelligently decides when to call tools via tool_calls.
  • Multi-Modal Support:
    • Web search for up-to-date information.
    • Image analysis using OpenAI vision models.
    • Audio transcription using Whisper.
    • Video summarization (YouTube placeholder implementation).
  • Stateful Workflow with LangGraph managing transitions between tool usage and final output.
  • Batch Processing: Capable of fetching tasks from an API, running them through the agent, and submitting results.

Tools Used

  • LangChain – Framework for building LLM-powered applications.
  • LangGraph – Graph-based orchestration for multi-step reasoning and tool use.
  • OpenAI Python Client – For GPT models, vision, and Whisper.
  • python-dotenv – Environment variable management.
  • pandas – Data handling utilities.
  • requests – HTTP client for API interaction.
  • langsmith – Tracing and debugging for LangChain apps.

Main Workflow

LangGraph Workflow

The core workflow follows this structure:

1. Main Node (main_node)

The LLM is invoked (with tools bound) and may emit tool_calls.
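As an illustration, a node along these lines could be written as follows (a minimal sketch; the factory-function shape and the state keys are assumptions, not the repository's exact code):

```python
def make_main_node(llm_with_tools):
    """Build the graph node that invokes the tool-bound LLM.

    `llm_with_tools` is expected to be a chat model with `.bind_tools(...)`
    already applied, so its responses may carry `tool_calls`.
    """
    def main_node(state):
        # Invoke the model on the running message history; the returned
        # message may include tool_calls, which the graph routes onward.
        response = llm_with_tools.invoke(state["messages"])
        return {"messages": state["messages"] + [response]}
    return main_node
```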

2. Tool Node (ToolNode)

Inside tool_node, the following tools are implemented as Python functions decorated with @tool:

🛠️ compile_code – For compiling and running code snippets.

🌐 openai_web_search – For performing web searches using OpenAI’s integration.

🖼️ image_analyzer_tool – For analyzing images and extracting useful details.

🎙️ audio_transcription_tool – For converting speech/audio into text.

🎥 video_analysis_tool – For analyzing video content.
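In the repository these are LangChain @tool functions; stripped of the decorator, a compile_code-style helper might be sketched like this (the name and behavior are assumptions inferred from the description above, not the repository's code):

```python
import subprocess
import sys
import tempfile

def compile_code(code: str) -> str:
    """Run a Python snippet in a subprocess and return its output or error text."""
    # Write the snippet to a temporary file so arbitrary multi-line code works.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    # Execute with the current interpreter; a timeout guards against hangs.
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=30
    )
    return result.stdout if result.returncode == 0 else result.stderr
```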

3. Conditional Routing

If further tool calls are needed, return to main_node; otherwise, proceed to output.
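This routing predicate can be sketched as a small pure function (the state shape and branch names here are assumptions):

```python
def route_after_main(state):
    """Decide the next edge after main_node.

    If the latest message carries tool_calls, loop through the tool node;
    otherwise the model is done and we move on to the output node.
    """
    last_message = state["messages"][-1]
    if getattr(last_message, "tool_calls", None):
        return "tools"
    return "output"
```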

4. Output Node (output_node)

Produces the final answer.

This flow is visualized in graph_diagram.png.

How to Run

  1. Clone the repository:
    git clone <repo_url>
    cd <repo_dir>
  2. Install dependencies:
    pip install -r requirements.txt
  3. Create a .env file with your API keys:
    OPENAI_API_KEY=your_key_here
    HF_USERNAME=your_hf_username
    API_URL=https://your.api.url/
  4. Run the agent:
    python multi_input_agent.py

Agent Evaluation

The evaluation is performed on a subset of GAIA: A Benchmark for General AI Assistants, originally released by Meta. The agent is expected to process the files attached to each task, e.g. pictures, videos, audio recordings, Excel files, and code snippets.

API Batch Mode

The script’s run_and_submit_all() function can fetch multiple tasks from a specified API, run them, and submit answers back.
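In outline, such a batch loop could be sketched as follows (the endpoint paths, payload fields, and the injectable `http` parameter are all assumptions for illustration, not the script's actual signature):

```python
import requests

def run_and_submit_all(api_url: str, username: str, agent, http=requests):
    """Fetch tasks, answer each one with the agent, and submit the results.

    `agent` is any callable mapping a question string to an answer string;
    `http` is injectable so the flow can be exercised without a network.
    """
    # Fetch the list of tasks from the API.
    tasks = http.get(f"{api_url}/questions", timeout=30).json()
    # Run every question through the agent.
    answers = [
        {"task_id": t["task_id"], "submitted_answer": agent(t["question"])}
        for t in tasks
    ]
    # Submit all answers back in one request.
    payload = {"username": username, "answers": answers}
    return http.post(f"{api_url}/submit", json=payload, timeout=30).json()
```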

HuggingFace Course

This GitHub repo was built as the final project for the Hugging Face AI Agents Course.

After finishing the course, you can obtain the official Course Completion Certificate from Hugging Face.

License

This project is licensed under the Apache License 2.0.
