Skip to content

bg-l2norm/awesome-ai-agent-papers--

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

3 Commits
Β 
Β 
Β 
Β 

Repository files navigation

πŸ€– Awesome Autonomous AI Agents

Awesome GitHub stars Last Updated License PRs Welcome

A curated list of truly exceptional autonomous AI agent frameworks, tools, benchmarks, and papers.

Frameworks Β· OpenClaw Ecosystem Β· Coding Agents Β· Multi-Agent Β· Web & Browser Β· Chinese Contributions Β· Benchmarks Β· Papers


Why this list?

The AI agent space exploded in 2023 and hasn't stopped. Hundreds of repos claim to be "autonomous agent frameworks." Most are weekend wrappers around API calls. This list only includes projects that moved the field forward β€” frameworks with real architectural ideas, benchmarks that changed how we evaluate agents, and papers that introduced paradigms now used everywhere.

Every entry was verified for active development status, genuine technical contribution, and community adoption. Chinese-origin projects are marked with πŸ‡¨πŸ‡³ β€” the Chinese AI ecosystem has produced some of the most impactful agent work and deserves first-class visibility.

Selection criteria: Novel architecture OR significant community adoption (1,000+ β˜…) OR foundational paper (100+ citations) OR production deployment at scale. No vaporware, no thin wrappers.


Contents


πŸ—οΈ General-Purpose Frameworks

Project Stars Description Key Differentiator
AutoGPT ~183k β˜… The original autonomous GPT-4 agent (April 2023), now evolved into a visual agent-builder platform with marketplace and cloud deployment. Pioneered autonomous agents β€” defined the category; most-starred agent repo on all of GitHub.
LangGraph ~27k β˜… Low-level graph-based orchestration for stateful agents with durable execution, human-in-the-loop, and streaming. Most downloaded agent framework (~34.5M monthly PyPI downloads); Pregel/Apache Beam–inspired execution model.
AutoGen / AG2 ~54k β˜… Microsoft's multi-agent conversation framework; AG2 (ag2ai/ag2) is the community fork continuing active development. Best Paper at ICLR 2024 LLM Agents Workshop; defined the multi-agent conversation paradigm. Now merging into Microsoft Agent Framework with Semantic Kernel.
CrewAI ~44k β˜… Standalone framework for orchestrating role-playing AI agents with "Crews" (collaborative teams) and "Flows" (event-driven pipelines). 1.4B agentic automations claimed; 60% of Fortune 500 as users. Fully independent of LangChain.
Semantic Kernel ~27k β˜… Microsoft's enterprise SDK for .NET, Python, and Java with planners, function-calling, and deep Azure integration. Enterprise-first β€” merging with AutoGen into unified Microsoft Agent Framework (GA Q1 2026).
smolagents ~25k β˜… HuggingFace's minimal agent library where agents write Python code as actions. ~1,000 lines of core logic. Radical simplicity β€” code-as-action paradigm with Hub integrations for sharing tools. Sandboxed execution via E2B/Docker.
Agno ~26k β˜… High-performance multi-modal agent runtime (formerly Phidata) with memory, knowledge graphs, and MCP support. Claims 5,000Γ— faster than LangGraph; levels 0–3 of progressive agent autonomy.
PydanticAI ~15k β˜… Type-safe, code-first agent framework using Python type hints β€” the "FastAPI for agents." Zero magic β€” built for senior Python teams wanting explicit control over every agent decision.
Open Interpreter ~62k β˜… Natural-language interface that lets LLMs run code locally β€” open-source alternative to OpenAI Code Interpreter. Full local access β€” no internet restrictions, file size limits, or time limits. Supports --os mode for computer control.
OpenAI Agents SDK ~19k β˜… Lightweight, production-ready Python SDK (evolution from Swarm) with built-in web search, file search, and computer use tools. Official OpenAI agent primitive β€” designed as the canonical way to build agents with OpenAI models.
Google ADK ~17k β˜… Code-first agent toolkit optimized for Gemini with Agent-to-Agent (A2A) protocol support and MCP Toolbox for Databases. A2A-native β€” first framework with built-in support for Google's agent interoperability protocol. Model-agnostic via LiteLLM.
Mastra ~19k β˜… TypeScript-first agent framework from the Gatsby team with workflows, RAG, evals, and tool calling. 300k+ weekly npm downloads. Fills the JS/TS gap β€” native OpenTelemetry, workflow engine, and first-class TypeScript support.
Strands Agents Growing AWS's model-agnostic agent framework with optional deep Bedrock integrations and first-class OpenTelemetry tracing. AWS-native β€” production-ready with deep AWS service integrations while remaining model-agnostic.
DSPy ~33k β˜… Stanford's framework for programming (not prompting) LLMs β€” compiles declarative calls into self-improving pipelines with optimizers. Paradigm-shifting β€” replaces prompt engineering with compiled, optimizable programs. ICLR 2024 paper. MIPROv2 optimizer.
Haystack ~23k β˜… Production-ready orchestration framework for NLP/AI pipelines with retrieval, generation, and agent components. Mature production tooling β€” battle-tested since pre-LLM era; comprehensive pipeline abstraction for RAG and agents.

🐾 The OpenClaw Ecosystem

OpenClaw is the fastest-growing open-source project in GitHub history, spawning 52+ derivatives across 9 programming languages as of March 2026. Community directory: shelldex.com.

Core Projects

Project Stars Description Key Differentiator
OpenClaw ~337k β˜… Messaging-native autonomous personal AI agent connecting through 20+ platforms (WhatsApp, Telegram, Slack, Discord, Signal, iMessage, WeChat, etc.). 8 core agents, SOUL.md personality config, ClawHub skill marketplace (13,700+ skills). Defined a new category β€” always-on, messaging-native autonomous agents. Beat React's 10-year star record in ~60 days. Created by Peter Steinberger; transferred to independent foundation after he joined OpenAI.
NemoClaw ~16k β˜… NVIDIA's enterprise security wrapper for OpenClaw β€” adds OS-level sandboxing (Landlock + seccomp + network namespaces), default-deny networking, YAML policy engine, and privacy routing to local Nemotron models. Enterprise-grade security layer β€” not a standalone agent but the missing security infrastructure for OpenClaw. Launched at GTC 2026. Hardware-agnostic despite NVIDIA origin.
Hermes Agent ~14k β˜… NousResearch's self-improving personal AI agent with built-in RL training via Atropos, GEPA-based skill evolution, and cross-session memory. Multi-platform gateway (Telegram, Discord, Slack, WhatsApp, Signal). Only agent with a closed-loop learning system β€” uses GEPA (ICLR 2026 Oral) + DSPy to optimize its own skills from execution traces. hermes claw migrate imports from OpenClaw.

Major Rewrites & Alternatives

Project Stars Language Key Differentiator
Nanobot ~33k β˜… Python Ultra-lightweight (~4,000 lines vs OpenClaw's ~430,000); research-ready with minimal dependencies.
ZeroClaw ~27k β˜… Rust 99% less RAM than OpenClaw; 10ms cold boot; targets edge/IoT devices.
NanoClaw ~26k β˜… TypeScript Container-first security β€” every agent runs in an isolated Linux container by default.
PicoClaw ~13k β˜… Go Runs on $10 hardware with <10MB RAM; built for embedded and edge deployments.
OpenFang Growing Rust "Agent Operating System" with 7 autonomous modules, 38 tools, and 40 messaging channels.
Moltis ~2k β˜… Rust Enterprise Rust β€” 150K lines with zero unsafe blocks; native Prometheus/Grafana observability.

Ecosystem Tools

Project Description
awesome-openclaw-agents 187 production-ready agent templates across 24 categories.
hermes-agent-self-evolution GEPA-based self-improvement module for Hermes Agent.
claw0 Educational 10-section tutorial for building an OpenClaw-compatible agent from scratch.
ClawWork Agents that earn income from professional tasks and pay for their own token usage β€” earned $15K in 11 hours.

πŸ’» Coding Agents

Project Stars Description Key Differentiator
OpenHands ~69k β˜… Open-source AI software development platform β€” modifies code, runs terminals, browses the web, executes multi-step dev tasks in Docker sandboxes. Most-starred coding agent on GitHub. 77.6% on SWE-bench Verified. Model-agnostic. $18.8M Series A (All Hands AI).
Aider ~41k β˜… AI pair programming in your terminal β€” every AI edit auto-committed to git. Builds a map of your entire codebase for context. Git-native workflow β€” terminal-first, BYOM, top SWE-bench scores. Aider wrote 21% of its own recent code.
GPT-Engineer ~55k β˜… Specify what you want built and it generates the entire codebase. Asks clarifying questions, writes specs, then codes. Precursor to Lovable.dev β€” one of the first "prompt-to-codebase" tools. Now mostly archived; commercial product spun out.
GPT-Pilot ~33k β˜… AI developer with "95% AI / 5% human" paradigm β€” writes full features while keeping human-in-the-loop for review. Agent roles: Architect, Developer, etc. Human-in-the-loop by design β€” YC W24 backed. Powers the Pythagora VS Code extension.
SWE-agent ~19k β˜… Takes a GitHub issue and automatically fixes it. Introduced the Agent-Computer Interface (ACI) concept. Also handles cybersecurity CTFs via EnIGMA mode. NeurIPS 2024 paper. SoTA on SWE-bench with Claude 3.7. Open-weights SWE-agent-LM-32b. Princeton + Stanford.
Devin N/A First widely-publicized fully autonomous AI software engineer β€” plans, codes, debugs, deploys with its own shell, editor, and browser. Proprietary but category-defining. 13.86% on SWE-bench at launch (vs 1.96% prior SOTA). Cognition valued at $billions. Acquired Windsurf.
Cursor N/A AI-native code editor (VS Code fork) with Tab completion, Composer multi-file edits, and agent mode. $1B ARR in <24 months, $29.3B valuation. 50%+ of Fortune 500. Proprietary but sets the standard for AI-assisted coding.

πŸ‘₯ Multi-Agent Systems

Project Stars Description Key Differentiator
MetaGPT πŸ‡¨πŸ‡³ ~58k β˜… Simulates an entire software company β€” product managers, architects, and engineers collaborate via SOPs to turn one-line requirements into working code. ICLR 2024 Oral (top 1.2%). Philosophy: "Code = SOP(Team)." Also produced AFlow (ICLR 2025 oral). DeepWisdom (Chinese company).
ChatDev πŸ‡¨πŸ‡³ ~32k β˜… Chat-powered virtual software company. Now ChatDev 2.0: zero-code multi-agent orchestration platform for "Developing Everything." NeurIPS 2025 paper on evolving orchestration. From Tsinghua/OpenBMB. Bilingual docs (CN/EN).
CAMEL πŸ‡¨πŸ‡³ ~15k β˜… Pioneering role-playing framework for multi-agent cooperation β€” agents assume roles and collaborate via inception prompting. First multi-agent framework (March 2023). NeurIPS 2023. Training data used in Microsoft Phi and OpenHermes. OWL subproject hit #1 on GAIA benchmark.
AgentVerse πŸ‡¨πŸ‡³ ~5k β˜… Dual-framework design: task-solving mode (collaborative problem-solving) and simulation mode (observing emergent behaviors). Unique simulation capabilities β€” observe and study emergent multi-agent behaviors. OpenBMB/Tsinghua.
Generative Agents ~21k β˜… 25 AI agents inhabiting "Smallville" β€” they form relationships, gossip, coordinate events, all emerging from memory + reflection + planning. Landmark UIST 2023 paper. Stanford (Park, Liang, Bernstein). Introduced "believable simulacra of human behavior."
Agency Swarm Growing Framework for creating collaborative AI agent swarms with role-based organization and inter-agent communication. Swarm-native β€” agents self-organize with customizable communication topologies.

🌐 Web & Browser Agents

Project Stars Description Key Differentiator
Browser Use ~55k β˜… Open-source framework enabling LLMs to control browsers for web automation. Claims 89% on WebVoyager (SOTA). Y Combinator backed. Released open-source bu-30b model. MIT licensed. Cloud offering available.
GPT Researcher ~26k β˜… Autonomous deep research agent β€” scrapes and synthesizes 20+ web sources to produce factual, cited research reports. Original open-source deep research agent (May 2023, predating the "deep research" trend by ~2 years). Created by Tavily founder.
BrowserGym Growing Unified gym-like environment for web agent research β€” standardized observation/action spaces across WebArena, MiniWoB, WorkArena, etc. TMLR 2025 paper. ServiceNow. De facto standard for web agent evaluation infrastructure.
MindSearch πŸ‡¨πŸ‡³ ~5k β˜… LLM-based multi-agent web search engine (like Perplexity Pro). Supports DuckDuckGo, Bing, Brave, Google, Tencent search backends. Open-source Perplexity alternative from Shanghai AI Lab. Bilingual (CN/EN) query support.

πŸ–₯️ Computer Use & GUI Agents

Project Stars Description Key Differentiator
Agent TARS / UI-TARS πŸ‡¨πŸ‡³ ~10k β˜… ByteDance's multimodal AI agent stack β€” CLI and Web UI for browser automation + native GUI automation with remote computer/browser operators. Pioneering GUI agent paper (arXiv:2501.12326). Ships as both CLI agent and desktop application.
AppAgent πŸ‡¨πŸ‡³ ~5k β˜… Multimodal agent that operates smartphone apps through tap/swipe β€” two-phase approach: exploration (learning) then deployment (executing). Mobile-first agent. Tencent. Tested on 50 tasks across 10 apps. Supports GPT-4V and Qwen-VL-Max.
Voyager ~7k β˜… First LLM-powered embodied lifelong learning agent in Minecraft β€” continuously explores, acquires skills, and makes discoveries without human intervention. NeurIPS 2023 paper. NVIDIA (Jim Fan). 3.3Γ— more unique items, 15.3Γ— faster tech tree milestones vs prior SOTA. Code as action space.
OSWorld Research First scalable real computer environment benchmark for multimodal agents β€” 369 tasks across Ubuntu, Windows, macOS. NeurIPS 2024. Best agents achieve ~30% vs much higher human baselines. The hardest computer-use benchmark.

πŸ”¬ Research & Science Agents

Project Stars Description Key Differentiator
BabyAGI ~20k β˜… Pioneering AI task management system (140 lines of code) β€” creates, prioritizes, and executes tasks in a loop. Now on BabyAGI 3 with scheduling. Defined task-driven agents (March 2023). Created by VC Yohei Nakajima. One of the first viral autonomous agent concepts.
Tongyi DeepResearch πŸ‡¨πŸ‡³ Growing Leading open-source deep research agent using on-policy RL with Group Relative Policy Optimization. End-to-end RL approach to deep research β€” not prompt-engineered. Alibaba NLP. Compatible with ReAct and IterResearch paradigms.
OpenAgents ~5k β˜… Three specialized agents: Data Agent, Plugins Agent (200+ tools), Web Agent β€” open-source ChatGPT Plus alternative. COLM 2024 paper. XLANG Lab (HKU). First attempt at replicating full ChatGPT Plus as open source.

🧠 Memory & Knowledge Systems

Project Stars Description Key Differentiator
Letta (MemGPT) ~22k β˜… Platform for stateful agents with self-editing memory inspired by OS virtual memory β€” two-tier architecture: core memory + archival memory. Foundational memory paper ("MemGPT: Towards LLMs as Operating Systems"). UC Berkeley. $10M seed (Felicis). Transitioning to Letta V1 for GPT-5/Claude 4.5.
Mem0 Growing Memory layer for AI agents with personalization β€” short-term + long-term memory via conversation history and user preferences. Plug-and-play memory β€” flexible embedding and search configs. Works with any agent framework.
Zep Growing Session-based memory with automatic summarization, graph-based knowledge extraction, and fact extraction. Graph-based knowledge memory β€” maintains conversation transcripts as memory blocks with automated summarization.

πŸ”§ Tool-Use & Function Calling

Project Stars Description Key Differentiator
Gorilla ~12k β˜… LLM fine-tuned for API/function calling. Hosts the Berkeley Function Calling Leaderboard (BFCL V4 Agentic). GoEx runtime for safe execution. BFCL is the industry standard for evaluating function-calling in LLMs. Apache 2.0. UC Berkeley.
ToolBench / ToolLLM πŸ‡¨πŸ‡³ Growing Training framework for LLMs to master 16,000+ real-world APIs with DFSDT (Depth-First Search Decision Tree) reasoning. ICLR 2024. Introduced backtracking in tool-use reasoning. Three-stage pipeline: API collection β†’ instruction gen β†’ solution annotation.
Composio Growing Integration platform: 90+ tool connections for AI agents with managed authentication and hierarchical task execution. Managed auth + 90 integrations β€” solves the "connecting agents to real tools" problem out of the box.
AgentLego πŸ‡¨πŸ‡³ ~500 β˜… Versatile tool API library extending agents with multimodal capabilities β€” vision, image gen/edit, speech, VL reasoning. Multimodal tool library from Shanghai AI Lab. Integrates with LangChain, Transformers Agents, and Lagent.

🧩 Planning & Reasoning Frameworks

Project Stars Description Key Differentiator
DeerFlow 2.0 πŸ‡¨πŸ‡³ ~45k β˜… ByteDance's "SuperAgent harness" for research, coding, and creative tasks. v2.0 is a complete rewrite with sandboxed execution, persistent memory, sub-agent orchestration. Hit #1 GitHub Trending and reached 45k stars in weeks. Built on LangGraph. MIT license. Supports Doubao, DeepSeek, Kimi, GPT-4o, Claude.
XAgent πŸ‡¨πŸ‡³ ~8k β˜… Autonomous agent with three-component architecture: Dispatcher (routing), Planner (milestones), Actor (execution). Proactive human collaboration via AskForHumanHelp tool. Dispatcher-Planner-Actor paradigm. OpenBMB/Tsinghua. Docker-sandboxed. ToolServer with file editor, Python notebooks, shell, web browser.
TaskWeaver ~6k β˜… Microsoft's code-first agent for data analytics β€” converts natural language to executable Python with two-layer planning (Planner β†’ Code Generator β†’ Executor). Data analytics specialist β€” unique focus on rich data structures (pandas DataFrames). Stateful conversations. ⚠️ Archived.
SuperAGI ~17k β˜… Full-featured agent platform with toolkit marketplace, GUI dashboard, Docker deployment, and multi-agent workflows. Early marketplace approach for agent toolkits and templates. Dev has slowed significantly since mid-2024.
AgentGPT ~36k β˜… First browser-based autonomous agent UI β€” name a custom AI and set it on any goal. YC-backed. Pioneered browser-based agent UX. ⚠️ Archived β€” Reworkd team pivoted to data extraction.

πŸ‡¨πŸ‡³ Chinese Contributions

China has produced some of the most impactful agent research and production frameworks. This section highlights projects not already featured above.

Alibaba Ecosystem

Project Stars Description Key Differentiator
AgentScope ~20k β˜… Production-ready multi-agent framework with ReAct agents, MCP/A2A support, real-time voice, memory management, and evaluation. Broadest Chinese agent ecosystem β€” AgentScope-Java (2.2k β˜…), ReMe memory kit, CoPaw assistant, Trinity-RFT for agentic RL. Biweekly community meetings.
CoPaw ~13k β˜… Personal AI assistant workstation β€” multi-channel (DingTalk, Feishu, QQ, Discord, iMessage), local LLM support via llama.cpp/MLX, cron scheduling. Chinese-platform native β€” first-class support for DingTalk, Feishu, QQ, WeCom. Built on AgentScope.
Qwen-Agent ~13k β˜… Official agent framework for the Qwen model family β€” function calling, MCP, Code Interpreter, RAG (1M+ token context), Chrome extension. Qwen ecosystem's agent layer. Supports Qwen3, Qwen3-Coder, QwQ. Includes DeepPlanning evaluation benchmark.
ModelScope-Agent ~4k β˜… Lightweight framework with MCP support, deep research (55.43 on DeepResearch Bench), code generation, and AgentFabric for custom agent creation. Alibaba's model-hub-native agent with Anthropic Agent Skills protocol support.
Qwen Code Growing Terminal AI agent for coding (CLI), forked from Gemini CLI and optimized for Qwen models. Qwen's answer to Claude Code β€” terminal-first coding agent with Qwen optimization.

ByteDance Ecosystem

Project Stars Description Key Differentiator
Trae Agent ~2k β˜… LLM-based agent for general software engineering β€” features "Lakeview" for concise step summarization, trajectory recording, YAML config. Research-friendly design with trajectory recording for academic analysis. Multi-LLM support including Doubao.

Shanghai AI Lab / InternLM

Project Stars Description Key Differentiator
Lagent ~2k β˜… Lightweight, PyTorch-inspired agent framework with imperative Pythonic style, async/sync dual interface, ReAct/ReWOO agents. PyTorch-like API for agent building. InternLM ecosystem integration. HTTP deployment for distributed multi-agent apps.
HuixiangDou Growing Domain-specific QA agent β€” technical group chat assistant for answering questions in specialized domains. Domain QA specialist from Shanghai AI Lab's InternLM team.

Tencent

Project Stars Description Key Differentiator
C3-Benchmark Recent Comprehensive agent evaluation covering all action spaces β€” bilingual (CN/EN) data generation, controllable task generation. Most analysis dimensions among agent eval frameworks. Tencent Hunyuan. Multi-agent data generation.

Community & Education

Project Description
Hello-Agents Chinese-language educational tutorial for building AI agents from scratch. Datawhale community.
awesome-hermes-agent Curated resource list for the Hermes Agent ecosystem.

πŸ“Š Benchmarks & Evaluation

Benchmark GitHub / Link Paper Key Differentiator
SWE-bench arXiv:2310.06770 ICLR 2024 Oral. 2,294 real GitHub issue-PR pairs. Gold standard for coding agents. Variants: Lite, Verified (500 human-validated), Multimodal. The benchmark every coding agent cites.
AgentBench πŸ‡¨πŸ‡³ arXiv:2308.03688 ICLR 2024. First comprehensive multi-domain agent benchmark β€” 8 environments (OS, DB, KG, web, games). Revealed massive gap between commercial and open-source LLMs on agent tasks.
Ο„-bench arXiv:2406.12045 Emulates user-agent-tool conversations with domain-specific APIs and policy guidelines. pass^k reliability metric. Even GPT-4o succeeds on <50% of tasks. Sierra AI (Shunyu Yao). Expanded to τ²-bench and τ³-bench.
WebArena arXiv:2307.13854 Realistic web environment with functional website replicas. 812 tasks. Spawned VisualWebArena, WorkArena, TheAgentCompany. BrowserGym integration.
GAIA Meta AI 466 real-world questions requiring reasoning, multimodality, web browsing, tool use. Humans: 92%, GPT-4: 15%. Conceptually simple for humans yet devastatingly hard for AI.
ToolBench πŸ‡¨πŸ‡³ arXiv:2307.16789 ICLR 2024. 16,000+ real-world APIs from RapidAPI with DFSDT reasoning. Introduced backtracking reasoning for tool use. ToolLLaMA model.
OSWorld arXiv:2404.07972 NeurIPS 2024. Real computer environment (Ubuntu/Windows/macOS). 369 tasks. Best agents ~30%. Most challenging computer-use benchmark. Requires actual VM execution.
MLAgentBench Stanford Evaluates agents on ML research tasks: running experiments, analyzing results, writing code. Tests the "AI researcher" use case end-to-end.
Mind2Web NeurIPS 2023 2,000+ crowd-sourced web tasks across 137 real websites. Large-scale real-website benchmark.
BFCL UC Berkeley / Gorilla Berkeley Function Calling Leaderboard β€” V4 Agentic evaluates tool-calling in real-world agentic settings. Industry standard for function-calling evaluation.
ScienceAgentBench ICLR 2025 102 tasks for data-driven scientific discovery. First rigorous benchmark for science agents.

πŸ”— Protocols & Standards

Protocol Origin Description
MCP (Model Context Protocol) Anthropic (Nov 2024) β†’ Linux Foundation (Dec 2025) Open standard for connecting AI models to tools and data sources. 97M+ monthly SDK downloads. Supported by ChatGPT, Claude, Gemini, Cursor, VS Code.
A2A (Agent-to-Agent Protocol) Google β†’ Linux Foundation Protocol for agent discovery, capability advertisement, and collaboration across frameworks. 150+ supporting organizations.

πŸ“„ Foundational Papers

These papers introduced paradigms now embedded in virtually every agent framework.

Paper Year Venue Key Contribution
ReAct: Synergizing Reasoning and Acting 2023 ICLR 2023 The agent paradigm. Interleaved reasoning traces and actions β€” the foundation of modern LLM agents. Shunyu Yao et al. (Princeton).
Chain-of-Thought Prompting 2022 NeurIPS 2022 Step-by-step reasoning. The foundational technique underlying all agent reasoning. Jason Wei et al. (Google).
Reflexion: Verbal Reinforcement Learning 2023 NeurIPS 2023 Self-correction without weight updates. Agents learn from verbal self-reflection. 91% pass@1 on HumanEval. Noah Shinn et al. (Princeton).
Tree of Thoughts 2023 NeurIPS 2023 Search-based reasoning. Extends CoT to explore multiple reasoning paths with BFS/DFS. Shunyu Yao et al. (Princeton).
Toolformer 2023 NeurIPS 2023 Self-supervised tool learning. LLMs learn when/how to call tools. 6B model outperforms GPT-3 175B on math. Meta AI.
HuggingGPT 2023 NeurIPS 2023 LLM as model orchestrator. ChatGPT plans and dispatches tasks to expert models from HuggingFace Hub.
LATS: Language Agent Tree Search 2024 ICML 2024 Unifies reasoning + acting + planning via MCTS. 92.7% pass@1 on HumanEval with GPT-4. Andy Zhou et al. (UIUC).
GEPA: Reflective Prompt Evolution 2026 ICLR 2026 Oral Self-improving agents. Outperforms GRPO by 6% with 35Γ— fewer rollouts. Powers Hermes Agent's self-evolution.
AutoGen 2024 ICLR 2024 Workshop (Best Paper) Multi-agent conversation paradigm. Defined how agents collaborate through structured dialogue. Microsoft.
MetaGPT 2024 ICLR 2024 Oral SOP-driven multi-agent collaboration. "Code = SOP(Team)" β€” encodes human workflows as agent prompts.
CAMEL 2023 NeurIPS 2023 Role-playing agents. First framework for multi-agent cooperation via inception prompting.
Generative Agents 2023 UIST 2023 Believable simulacra of human behavior. Memory streams + reflection + planning = emergent social behaviors. Stanford.
OpenHands 2024 β€” AI software developer as generalist agent. Defines the platform architecture for coding agents.
SWE-agent (ACI) 2024 NeurIPS 2024 Agent-Computer Interfaces. How interface design changes agent performance. Princeton/Stanford.
Voyager 2023 NeurIPS 2023 Embodied lifelong learning. Auto-curriculum + skill library + self-verification in Minecraft. NVIDIA.
MemGPT 2023 β€” LLMs as operating systems. Self-editing memory with virtual context management. UC Berkeley.
A Survey on LLM-based Autonomous Agents 2023 Updated Mar 2025 Most-cited agent survey. Unified framework: Profile + Memory + Planning + Action modules.

πŸ“š Surveys & Curated Lists

Resource Description
awesome-ai-agents 1,500+ resources on AI agents β€” the largest general-purpose collection.
awesome-ai-agents 300+ agentic resources with star counts and categorization.
AI Agent Benchmark Compendium 50+ benchmarks categorized and compared.
shelldex.com Community directory tracking 52+ OpenClaw derivatives across 9 languages. Weekly growth rankings.
Agentic AI Survey Dual-paradigm framework (Symbolic vs Neural); PRISMA-based review of 90 studies.
LLM Agent Methodology Survey March 2025 methodology-centered taxonomy of agent architectures.

Contributing

Contributions welcome! Please read the guidelines below before submitting a PR.

What belongs here

  • Open-source frameworks with a real GitHub repo, >500 stars or a published paper
  • Academic papers that introduced a technique now widely used in agent systems
  • Benchmarks that are actively used by the research community
  • Production deployments that are open source and demonstrate novel architecture

What doesn't belong

  • Thin API wrappers with no novel architecture
  • Closed-source products without open-source components (exceptions: category-defining products like Devin and Cursor)
  • Projects with no activity for 12+ months and <1,000 stars
  • Tutorials, courses, or blog posts (except as supplementary links)

How to contribute

  1. Fork this repository
  2. Add your entry in the appropriate category with: [Project Name](URL) | stars | One-line description | Key differentiator
  3. Verify the GitHub URL is live and the project is active
  4. Submit a PR with a brief explanation of why this project is notable

⭐ Star this repo if you found it useful!

Last updated: March 2026 Β· Maintained with care Β· CC0 1.0 Universal

About

A curated list of autonomous AI agent frameworks, tools, benchmarks, and papers

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages