
<h1 align="center"> Best of Agent Harnesses and Harness Techniques </h1>

🏆 Curated list of AI agent harnesses, orchestration frameworks, and harness techniques for reliable agentic systems.

<a href="https://best-of.org" title="Best-of Badge"><img src="http://bit.ly/3o3EHNN"></a> <a href="#contents" title="Project Count"><img src="https://img.shields.io/badge/projects-110-blue.svg?color=5ac4bf"></a> <a href="#contribution" title="Contributions welcome"><img src="https://img.shields.io/badge/contributions-welcome-green.svg"></a> <a href="https://github.com/RyanAlberts/best-of-Agent-Harnesses/releases" title="Updates"><img src="https://img.shields.io/github/release-date/RyanAlberts/best-of-Agent-Harnesses?color=green&label=updated"></a>

## What is an agent harness?

A model answers; an agent acts. An agent harness is the runtime that turns one into the other — the model thinks; the harness decides what that thinking is allowed to touch.

Every prior wave of automation was constrained by brittleness: you scripted exact behavior, and when the world deviated, the system broke. Foundation models inverted that problem: they're flexible but directionless, stateless, and disconnected from anything real. The agent harness exists to bridge that gap. It is the orchestration infrastructure that converts a model's per-turn reasoning into sustained, tool-using, error-recovering, goal-directed behavior across time.

Architecturally, it plays the role the kernel plays in an operating system or the controller plays in industrial robotics, mediating between raw capability and a messy environment. The critical difference is that the "capability" it governs is general-purpose cognition, so the harness is simultaneously a scheduler, a permission system, a memory manager, and a policy enforcement layer, all under-specified and evolving in real time.
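To make those roles concrete, here is a minimal sketch of a harness loop in Python. It is not drawn from any project in this list: every name (`Harness`, `Action`, `approve`, `max_steps`, `max_retries`) is hypothetical, and the "model" is a scripted stand-in for an LLM call. The loop shows four of the jobs named above in miniature: a step budget (scheduling and policy), an approval gate on risky tools (permissions), a transcript fed back each turn (memory), and bounded retries that surface failures instead of crashing (error recovery).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """One model decision: call a named tool with an argument, or finish."""
    tool: str      # tool name, or "finish" to end the run
    arg: str = ""  # tool argument, or the final answer when tool == "finish"


def deny_all(action: Action) -> bool:
    """Default approval hook (hypothetical): refuse every gated action."""
    return False


@dataclass
class Harness:
    model: Callable[[list[str]], Action]    # per-turn reasoning: transcript -> next action
    tools: dict[str, Callable[[str], str]]  # everything the model is allowed to touch
    needs_approval: set[str] = field(default_factory=set)  # tools behind a permission gate
    approve: Callable[[Action], bool] = deny_all            # human or policy approval hook
    max_steps: int = 10   # step budget: the harness, not the model, decides when to stop
    max_retries: int = 2  # extra attempts per tool call before reporting failure
    transcript: list[str] = field(default_factory=list)    # memory carried across turns

    def run(self, goal: str) -> str:
        self.transcript.append(f"goal: {goal}")
        for _ in range(self.max_steps):
            action = self.model(self.transcript)
            if action.tool == "finish":
                return action.arg
            if action.tool not in self.tools:
                # Policy: unknown tools are reported back to the model, never executed.
                self.transcript.append(f"error: unknown tool {action.tool!r}")
                continue
            if action.tool in self.needs_approval and not self.approve(action):
                self.transcript.append(f"denied: {action.tool}")
                continue
            self.transcript.append(self._call_with_retry(action))
        return "stopped: step budget exhausted"

    def _call_with_retry(self, action: Action) -> str:
        attempts = 1 + self.max_retries
        last_error = ""
        for _ in range(attempts):
            try:
                return f"{action.tool} -> {self.tools[action.tool](action.arg)}"
            except Exception as exc:  # error recovery: retry, then surface the failure
                last_error = f"{type(exc).__name__}: {exc}"
        return f"{action.tool} failed after {attempts} attempts: {last_error}"


# Usage: a scripted stand-in for the model, a flaky tool, and a gated destructive tool.
def scripted_model(script: list[Action]) -> Callable[[list[str]], Action]:
    steps = iter(script)
    return lambda transcript: next(steps)


failures_left = [1]  # the first read fails with a transient error


def flaky_read(path: str) -> str:
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise OSError("transient")
    return f"contents of {path}"


harness = Harness(
    model=scripted_model([Action("read", "README.md"),
                          Action("delete", "README.md"),
                          Action("finish", "done")]),
    tools={"read": flaky_read, "delete": lambda p: f"deleted {p}"},
    needs_approval={"delete"},
)
print(harness.run("summarize README"))  # → done
print(harness.transcript)
# → ['goal: summarize README', 'read -> contents of README.md', 'denied: delete']
```

Production harnesses layer much more on top (sandboxing, context compaction, persisted checkpoints for resumable or durable runs), but this sketch keeps the core division of labor described above: the model proposes actions, and the harness decides which ones run.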
## Why harnesses matter

Better models make harnesses more important: more capabilities mean more failure modes, and production needs retry logic, fallbacks, and validation. Harness quality—not just model quality—determines whether agents actually ship. This list ranks projects by relevance to harness concerns (environment, orchestration, lifecycle, guardrails) and by stars/activity.

## The landscape at a glance

[![The Agent Harness Landscape — all projects plotted by adoption surface area against GitHub stars](assets/landscape.svg)](assets/landscape.svg)

_Every project in the list, plotted by adoption surface area (the [simplicity ↔ capability axis](#guide-to-rankings)) against GitHub stars. Colors are categories; the largest projects in each tier are labeled._

[![Autonomy × Recovery — every loop-owning project placed by designed autonomy regime and failure-recovery tier](assets/axes-grid.svg)](assets/axes-grid.svg)

_The same projects placed by how much unsupervised rope they're designed to give (autonomy) and what happens when a run dies (recovery). In the tables below, ★ marks headless-ready projects and ✱ marks durable ones. Both charts regenerate from the list data on every refresh._

## How to Pick a Harness

_Start with the guide, then the head-to-head decision pages — grounded in the same data as the tables below:_

- [**How to pick a harness**](comparisons/how-to-pick-a-harness.md) — six questions that turn this list into a decision, including the post–June 2026 billing reality
- [**OpenClaw vs Hermes**](comparisons/openclaw-vs-hermes.md) — the always-on personal-agent debate: presence vs discipline, plus what the field reports actually say
- [**Terminal coding agents** — opencode vs Codex vs Gemini CLI vs crush vs goose](comparisons/terminal-coding-agents.md)
- [**Multi-agent orchestration** — OpenAI Agents SDK vs CrewAI vs AutoGen vs LangGraph](comparisons/multi-agent-orchestration.md)
- [**Agent memory layers** — Mem0 vs Letta vs claude-mem](comparisons/memory-layers.md)

## Pick by use case

_Reader's index: pick by what you want to do, not by category. Tag chips (e.g. `mcp` · `memory`) next to each row let you cross-filter by capability — see [TAGS.md](TAGS.md) for the full cross-reference._

- **I want a turnkey coding agent today** — [opencode](https://github.com/anomalyco/opencode), [Cline](https://github.com/cline/cline), [Codex](https://github.com/openai/codex), [Gemini CLI](https://github.com/google-gemini/gemini-cli), [OpenHands](https://github.com/OpenHands/OpenHands), [crush](https://github.com/charmbracelet/crush), [Roo Code](https://github.com/RooCodeInc/Roo-Code) · see [Coding agent products (IDEs, CLIs, full suites)](#coding-agent-products-ides-clis-full-suites)
- **I want an always-on personal agent that lives in my chat apps** — [OpenClaw](https://github.com/openclaw/openclaw), [Hermes](https://github.com/NousResearch/hermes-agent), [Khoj](https://github.com/khoj-ai/khoj), [Agent Zero](https://github.com/agent0ai/agent-zero), [OpenHarness (HKUDS)](https://github.com/HKUDS/OpenHarness) · see [Personal agent runtimes](#personal-agent-runtimes)
- **I want to extend Claude Code, Codex, or OpenCode with skills and slash commands** — [Anthropic Skills](https://github.com/anthropics/skills), [everything-claude-code](https://github.com/affaan-m/ECC), [superpowers](https://github.com/obra/superpowers), [GStack](https://github.com/garrytan/gstack), [pmstack](https://github.com/RyanAlberts/pmstack) · see [Coding harness configs and SDKs](#coding-harness-configs-and-sdks)
- **I want to build my own coding harness from scratch** — [Claude Agent SDK](https://github.com/anthropics/claude-agent-sdk-python), [Google ADK](https://github.com/google/adk-python), [AutoHarness](https://github.com/aiming-lab/AutoHarness), [SWE-agent](https://github.com/SWE-agent/SWE-agent), [RepoMaster](https://github.com/QuantaAlpha/RepoMaster), [claw-code-agent](https://github.com/HarnessLab/claw-code-agent) · see [Coding harness configs and SDKs](#coding-harness-configs-and-sdks)
- **I want a drop-in memory layer for agents** — [Mem0](https://github.com/mem0ai/mem0), [claude-mem](https://github.com/thedotmack/claude-mem), [agentlog](https://github.com/RyanAlberts/agentlog), [agno](https://github.com/agno-agi/agno), [letta](https://github.com/letta-ai/letta) · see [Plugins, MCPs, CLI tools](#plugins-mcps-cli-tools)
- **I want to plug in hundreds to thousands of tools without context bloat** — [MCP-Zero](https://github.com/xfey/MCP-Zero), [ToolGen](https://github.com/Reason-Wang/ToolGen), [ToolRAG](https://github.com/antl3x/ToolRAG), [langgraph-bigtool](https://github.com/langchain-ai/langgraph-bigtool), [spring-ai-tool-search-tool](https://github.com/spring-ai-community/spring-ai-tool-search-tool) · see [Progressive disclosure harnesses](#progressive-disclosure-harnesses)
- **I want multi-agent orchestration** — [openai-agents-python](https://github.com/openai/openai-agents-python), [crewAI](https://github.com/crewAIInc/crewAI), [autogen](https://github.com/microsoft/autogen), [Microsoft Agent Framework](https://github.com/microsoft/agent-framework), [PraisonAI](https://github.com/MervinPraison/PraisonAI), [agent-squad](https://github.com/2FastLabs/agent-squad) · see [Multi-agent and orchestration](#multi-agent-and-orchestration)
- **I want a general LLM app framework** — [langgraph](https://github.com/langchain-ai/langgraph), [langchain](https://github.com/langchain-ai/langchain), [llama-index](https://github.com/run-llama/llama_index), [pydantic-ai](https://github.com/pydantic/pydantic-ai), [agno](https://github.com/agno-agi/agno) · see [Frameworks](#frameworks)
- **I want low-code / visual workflows** — [langflow](https://github.com/langflow-ai/langflow), [Flowise](https://github.com/FlowiseAI/Flowise), [Dify](https://github.com/langgenius/dify), [n8n](https://github.com/n8n-io/n8n) · see [Frameworks](#frameworks)
- **I want browser-using agents** — [browser-use](https://github.com/browser-use/browser-use), [WebVoyager](https://github.com/MinorJerry/WebVoyager), [puppeteer-real-browser-mcp](https://github.com/withLinda/puppeteer-real-browser-mcp-server) · see [Plugins, MCPs, CLI tools](#plugins-mcps-cli-tools)
- **I want sandboxed code execution for agent-generated code** — [E2B](https://github.com/e2b-dev/E2B), [Daytona](https://github.com/daytonaio/daytona), [smolagents](https://github.com/huggingface/smolagents), [OpenHands](https://github.com/OpenHands/OpenHands) · see [Libraries and SDKs](#libraries-and-sdks)
- **I want to evaluate or benchmark agents** — [SWE-bench](https://github.com/SWE-bench/SWE-bench), [AgencyBench](https://github.com/GAIR-NLP/AgencyBench), [inspect_ai](https://github.com/UKGovernmentBEIS/inspect_ai), [WebArena](https://github.com/web-arena-x/webarena), [ARC-AGI-2](https://github.com/arcprize/ARC-AGI-2), [VitaBench](https://github.com/meituan-longcat/vitabench) · see [Evaluation and benchmarking harnesses](#evaluation-and-benchmarking-harnesses)
- **I want a deep research / autonomous research agent** — [deepagents](https://github.com/langchain-ai/deepagents), [gpt-researcher](https://github.com/assafelovic/gpt-researcher), [openagents](https://github.com/OpenAgentsInc/openagents) · see [Research and task-specific harnesses](#research-and-task-specific-harnesses)
- **I want a provider-agnostic LLM pipe (not a framework)** — [LiteLLM](https://github.com/BerriAI/litellm), [vercel/ai](https://github.com/vercel/ai) · see [Libraries and SDKs](#libraries-and-sdks)

## For agents

This list is also published in machine-readable form, so coding agents and research agents can recommend harnesses — not just humans browsing GitHub:

- [**harnesses.json**](harnesses.json) — every project with category, complexity tier, capability tags, stars, license signal, and a concrete example link, plus the full use-case index.
- [**llms.txt**](llms.txt) — the entire list in one agent-readable file. Point any agent at the [raw URL](https://raw.githubusercontent.com/RyanAlberts/best-of-Agent-Harnesses/main/llms.txt).
- [**MCP server**](mcp/) — `pick_harness` (with complexity/autonomy/recovery filters), `search_harnesses`, `get_harness`, `list_categories`, plus `list_comparisons`/`get_comparison` for the decision guides. Published to PyPI and the [official MCP registry](https://registry.modelcontextprotocol.io) as `io.github.RyanAlberts/agent-harnesses`.

One-line install (needs [uv](https://docs.astral.sh/uv/)):

```sh
claude mcp add agent-harnesses -- uvx agent-harnesses-mcp
```

## Contents

- [The landscape at a glance](#the-landscape-at-a-glance)
- [How to Pick a Harness](#how-to-pick-a-harness)
- [Pick by use case](#pick-by-use-case)
- [For agents: harnesses.json, llms.txt, MCP server](#for-agents)
- [Progressive disclosure harnesses](#progressive-disclosure-harnesses) _7 projects_
- [Coding agent products (IDEs, CLIs, full suites)](#coding-agent-products-ides-clis-full-suites) _11 projects_
- [Coding harness configs and SDKs](#coding-harness-configs-and-sdks) _10 projects_
- [Personal agent runtimes](#personal-agent-runtimes) _7 projects_
- [Frameworks](#frameworks) _23 projects_
- [Multi-agent and orchestration](#multi-agent-and-orchestration) _8 projects_
- [Plugins, MCPs, CLI tools](#plugins-mcps-cli-tools) _12 projects_
- [Evaluation and benchmarking harnesses](#evaluation-and-benchmarking-harnesses) _16 projects_
- [Research and task-specific harnesses](#research-and-task-specific-harnesses) _2 projects_
- [Libraries and SDKs](#libraries-and-sdks) _14 projects_

## Guide to rankings

- ⭐ **Stars** — GitHub star count, captured 2026-06-14; tables sort by stars descending.
- ⚖️ **Simplicity ↔ capability** — adoption surface, 4 tiers: **super simple** (a format, one concept) → **mostly simple** (thin layer) → **slightly complex** (real SDK) → **complex** (product suite).
- ★ **Headless-ready** — designed for unattended runs, batches, and fleets (the top of the autonomy scale: step-gated → checkpoint-gated → bounded → headless).
- ✱ **Durable** — persisted execution state survives restarts mid-task (the top of the recovery scale: none → retry → resumable → durable).
- ✅ **Open source** — ✅ standard OSS license · ⚠️ source-available/restricted · ❓ no or unclear license.
- 🏷️ **Tags** — capability chips auto-derived from descriptions; full cross-reference in [TAGS.md](TAGS.md).
- 🎯 **Examples** — one concrete "show me it in action" link per project, not a docs root.

Every project's full autonomy and recovery tier is plotted in the [grid above](#the-landscape-at-a-glance) and carried in [harnesses.json](harnesses.json) and [llms.txt](llms.txt); scores are editorial, from public docs — maintainer corrections via issue/PR are merged fast.

## Progressive disclosure harnesses

<a href="#contents"><img align="right" width="15" height="15" src="https://git.io/JtehR" alt="Back to top"></a>

_Formats, runtimes, and patterns that reveal context, tools, or instructions in layers—index first, details on demand—to control tokens and improve agent focus (the "map, not encyclopedia" principle)._

| # | Project | ⭐ Stars | Description | Open source | Simplicity ↔ capability | Examples |
|---|---------|---------|-------------|-------------|-------------------------|----------|
| 1 | [**awesome-cursorrules**](https://github.com/PatrickJS/awesome-cursorrules) | [40k](https://github.com/PatrickJS/awesome-cursorrules/stargazers) | Curated .cursorrules and skills that leverage Cursor's index-then-load model; the canonical collection for rules-as-progressive-disclosure in the IDE. `ide` | ✅ | super simple (content bundle) | [PyTorch cursorrules](https://github.com/PatrickJS/awesome-cursorrules/blob/main/rules/pytorch-scikit-learn-cursorrules-prompt-file.mdc) |
| 2 | [**agents.md**](https://github.com/agentsmd/agents.md) | [22.2k](https://github.com/agentsmd/agents.md/stargazers) | Open format for repo-scoped agent briefings; v1.1 adds hierarchical scope and progressive disclosure so agents get a map of what exists, then load only what's relevant. `typescript` | ✅ | super simple (format only) | [Self-hosting AGENTS.md](https://github.com/agentsmd/agents.md/blob/main/AGENTS.md) |
| 3 | [**langgraph-bigtool**](https://github.com/langchain-ai/langgraph-bigtool) ✱ | [543](https://github.com/langchain-ai/langgraph-bigtool/stargazers) | Build LangGraph agents with large tool sets; retrieval and on-demand tool loading so agents scale beyond context without stuffing every schema upfront. `tool-discovery` · `python` | ✅ | slightly complex (large tool sets) | [Math-library tool agent](https://github.com/langchain-ai/langgraph-bigtool#quickstart) |
| 4 | [**MCP-Zero**](https://github.com/xfey/MCP-Zero) | [488](https://github.com/xfey/MCP-Zero/stargazers) | Active tool discovery for autonomous agents: the model requests tools by requirement; hierarchical semantic routing over 308 servers / 2,797 tools with ~98% token reduction (APIBank). `tool-discovery` | ✅ | complex (3k tools, full routing) | [APIBank experiment](https://github.com/xfey/MCP-Zero/blob/master/MCP-zero/experiment_apibank.py) |
| 5 | [**ToolGen**](https://github.com/Reason-Wang/ToolGen) | [179](https://github.com/Reason-Wang/ToolGen/stargazers) | ICLR 2025: unified tool retrieval and calling via generation; 47k+ tools without context stuffing—retrieval and invocation in one generative step. `tool-discovery` · `python` | ❓ | complex (47k+ tools) | [Full eval pipeline](https://github.com/Reason-Wang/ToolGen/blob/master/scripts/eval_full_pipeline.sh) |
| 6 | [**spring-ai-tool-search-tool**](https://github.com/spring-ai-community/spring-ai-tool-search-tool) | [74](https://github.com/spring-ai-community/spring-ai-tool-search-tool/stargazers) | Dynamic tool discovery for Spring AI: the model gets a search tool first, then pulls definitions for relevant tools; 34–64% token reduction across providers. `tool-discovery` | ✅ | mostly simple (search-then-load) | [Tool Search demo app](https://github.com/spring-ai-community/spring-ai-tool-search-tool/tree/main/examples/tool-search-tool-demo) |
| 7 | [**ToolRAG**](https://github.com/antl3x/ToolRAG) | [28](https://github.com/antl3x/ToolRAG/stargazers) | Semantic tool retrieval for LLMs; serves only the tools the user query demands (MCP-compatible), unlimited tool sets with zero context penalty. `mcp` · `tool-discovery` | ✅ | mostly simple (query-driven retrieval) | [MCP server retrieval](https://github.com/antl3x/ToolRAG/blob/main/packages/%40antl3x-toolrag/README.md) |

## Coding agent products (IDEs, CLIs, full suites)

<a href="#contents"><img align="right" width="15" height="15" src="https://git.io/JtehR" alt="Back to top"></a>

_Turnkey coding agents you install and run: IDE extensions, terminal CLIs, Dockerized workspaces. Each entry notes which part is the harness (the agent loop, tool wiring, approval model) versus the UI shell (VS Code extension, TUI, browser client)._

| # | Project | ⭐ Stars | Description | Open source | Simplicity ↔ capability | Examples |
|---|---------|---------|-------------|-------------|-------------------------|----------|
| 1 | [**opencode**](https://github.com/anomalyco/opencode) ★ | [174k](https://github.com/anomalyco/opencode/stargazers) | Open-source terminal coding agent (formerly `sst/opencode`; transferred to anomalyco). The **harness** is a multi-provider tool-call loop (Claude, OpenAI, Gemini, local) with strong plugin and MCP support; the TUI is the shell. 100% OSS, very actively shipped. `mcp` · `provider-agnostic` · `cli` · `tui` · `typescript` | ✅ | slightly complex (multi-provider, plugins, MCP) | [Agent system page](https://opencode.ai/docs/agents/) |
| 2 | [**Gemini CLI**](https://github.com/google-gemini/gemini-cli) | [105k](https://github.com/google-gemini/gemini-cli/stargazers) | Google's first-party terminal agent for Gemini. The **harness** is the plugin/MCP tool-call loop; the terminal is the shell—Google's parallel to Claude Code / Codex, not just an API. `mcp` · `cli` · `typescript` | ✅ | slightly complex (official CLI, plugins, MCP) | [MCP server setup](https://github.com/google-gemini/gemini-cli/blob/main/docs/tools/mcp-server.md) |
| 3 | [**Codex**](https://github.com/openai/codex) | [91k](https://github.com/openai/codex/stargazers) | OpenAI's terminal coding agent. The **harness** is the sandboxed tool-call loop with multi-provider support; the CLI is the shell. Reference implementation for "official CLI that ships code." `sandbox` · `provider-agnostic` · `cli` | ✅ | slightly complex (reference CLI, sandboxed) | [Sandboxing concept](https://developers.openai.com/codex/concepts/sandboxing) |
| 4 | [**OpenHands**](https://github.com/OpenHands/OpenHands) ★ | [77k](https://github.com/OpenHands/OpenHands/stargazers) | Dockerized software-engineering agent. The **harness** is the bash/editor/browser toolset with micro-agents and event-stream session bridging; Docker is the sandbox. Main OSS choice for teams self-hosting autonomous repo work. `memory` · `browser` · `sandbox` · `python` | ⚠️ (multi-license) | complex (Docker runtime, multi-surface agent — product suite) | [Repository microagents](https://docs.all-hands.dev/usage/prompting/microagents-repo) |
| 5 | [**Open Interpreter**](https://github.com/openinterpreter/openinterpreter) | [63.9k](https://github.com/openinterpreter/openinterpreter/stargazers) | Lightweight terminal coding agent oriented to open models (DeepSeek, Kimi, Qwen). The **harness** is a code-execution loop — the model writes code, the harness executes it with confirmation gates; the CLI is the shell. The original "let the LLM run code on my machine" project, reborn for open weights. `cli` · `python` | ✅ | mostly simple (lean code-exec loop) | [Quick start](https://github.com/openinterpreter/openinterpreter#readme) |
| 6 | [**Cline**](https://github.com/cline/cline) | [63.3k](https://github.com/cline/cline/stargazers) | VS Code extension whose **harness** is a plan-then-act loop with per-step human approval and cost transparency; the VS Code integration is the UI shell. Open-source counterweight to Cursor. `ide` · `typescript` | ✅ | slightly complex (plan-then-act, approval gates) | [Plan & Act mode](https://docs.cline.bot/features/plan-and-act) |
| 7 | [**goose**](https://github.com/aaif-goose/goose) ★ | [49.3k](https://github.com/aaif-goose/goose/stargazers) | Block-originated Rust agent, now stewarded by the Linux Foundation's Agentic AI Foundation (`aaif-goose/goose`). The **harness** is the MCP/ACP extension model with recipes and provider choice; there's no fixed UI slot—you bolt it into whatever shell you use. `mcp` · `rust` | ✅ | slightly complex (extensions, MCP/ACP) | [Goose recipes guide](https://block.github.io/goose/docs/guides/recipes/) |
| 8 | [**crush**](https://github.com/charmbracelet/crush) | [25.3k](https://github.com/charmbracelet/crush/stargazers) | Charm's terminal coding agent (a fork of the original OpenCode). The **harness** is the tool-calling loop with session persistence; the Bubble Tea TUI is the shell. `memory` · `cli` · `tui` | ⚠️ FSL-1.1-MIT | slightly complex (terminal agent, TUI) | [Crush launch post](https://charm.land/blog/crush-comes-home/) |
| 9 | [**Roo Code**](https://github.com/RooCodeInc/Roo-Code) | [24.2k](https://github.com/RooCodeInc/Roo-Code/stargazers) | VS Code/Cursor extension in the Cline lineage. The **harness** is the approval-gated agent with custom modes and a strong MCP story; the IDE is the UI. Popular community fork when you want that workflow without the upstream extension. `mcp` · `workflow` · `ide` · `typescript` | ✅ | slightly complex (IDE extension, MCP-first) | [Custom modes guide](https://docs.roocode.com/features/custom-modes) |
| 10 | [**claw-code-agent**](https://github.com/HarnessLab/claw-code-agent) | [508](https://github.com/HarnessLab/claw-code-agent/stargazers) | Python reimplementation of the Claude Code agent architecture with zero external dependencies; interactive chat, streaming, plugin runtime, nested agent delegation, cost tracking, MCP transport—portable harness without the Rust/TS toolchain. `mcp` · `rust` · `python` · `typescript` | ❓ | slightly complex (pure Python, plugin runtime) | [Quick Start guide](https://github.com/HarnessLab/claw-code-agent#-quick-start) |
| 11 | [**coderClaw**](https://github.com/SeanHogg/BuilderForceAgents) | [3](https://github.com/SeanHogg/BuilderForceAgents/stargazers) | Self-hosted multi-role coding system (Creator, Reviewer, Test, Refactor, etc.) with AST and semantic maps; IDE-agnostic, chat-channel triggers. `ide` · `typescript` | ❓ | slightly complex (multi-role, AST/semantic) | [Multi-agent README](https://github.com/SeanHogg/BuilderForceAgents#readme) |

## Coding harness configs and SDKs

<a href="#contents"><img align="right" width="15" height="15" src="https://git.io/JtehR" alt="Back to top"></a>

_Skill packs, slash-command libraries, meta-prompting frameworks, and official SDKs that give you the harness (the agent loop, planning, memory, hooks) without bundling a specific IDE or CLI shell._

| # | Project | ⭐ Stars | Description | Open source | Simplicity ↔ capability | Examples |
|---|---------|---------|-------------|-------------|-------------------------|----------|
| 1 | [**superpowers**](https://github.com/obra/superpowers) | [228k](https://github.com/obra/superpowers/stargazers) | Performance-oriented harness pack for Claude Code, Codex, OpenCode, Cursor: skills, instincts, memory, security, research-first workflows. Treats harness engineering itself as the performance lever. `memory` · `ide` | ✅ | complex (multi-IDE skill stack — product suite) | [TDD skill](https://github.com/obra/superpowers/blob/main/skills/test-driven-development/SKILL.md) |
| 2 | [**everything-claude-code**](https://github.com/affaan-m/ECC) | [215k](https://github.com/affaan-m/ECC/stargazers) | The breakout 2026 harness pack for Claude Code: 28 specialized subagents, 119 reusable skills, 60 slash commands, 34 rules, 20+ automated hooks. Ships a full "AI engineering team" as config. `multi-agent` | ✅ | complex (subagents + skills + hooks — product suite) | [autonomous-agent-harness skill](https://github.com/affaan-m/ECC/blob/main/skills/autonomous-agent-harness/SKILL.md) |
| 3 | [**Anthropic Skills**](https://github.com/anthropics/skills) | [151k](https://github.com/anthropics/skills/stargazers) | Anthropic's official Agent Skills repository: SKILL.md-based folders (instructions, scripts, resources) Claude dynamically loads on Claude Code, Claude.ai, and the API. The reference for progressive-disclosure skill packs in 2026. | ✅ | mostly simple (official skills format) | [docx skill](https://github.com/anthropics/skills/blob/main/skills/docx/SKILL.md) |
| 4 | [**GStack**](https://github.com/garrytan/gstack) | [110k](https://github.com/garrytan/gstack/stargazers) | Garry Tan's Claude Code skill stack: 23 slash-command modes (CEO/eng/design review, QA, ship, browse, retro, …) that structure one assistant as a virtual engineering team. Tan's daily driver while running YC. `typescript` | ✅ | slightly complex (multi-role slash-command harness) | [/ship SKILL.md](https://github.com/garrytan/gstack/blob/main/ship/SKILL.md) |
| 5 | [**get-shit-done**](https://github.com/gsd-build/get-shit-done) | [64.2k](https://github.com/gsd-build/get-shit-done/stargazers) | Goal-backward planning and wave-based execution over fresh context windows; avoids context rot by design. Python/JS meta-prompting for Claude Code, OpenCode, Gemini CLI. `cli` · `python` | ✅ | mostly simple (meta-prompting, you own the stack) | [gsd:ship command](https://github.com/gsd-build/get-shit-done/blob/main/commands/gsd/ship.md) |
| 6 | [**SWE-agent**](https://github.com/SWE-agent/SWE-agent) ★ | [19.5k](https://github.com/SWE-agent/SWE-agent/stargazers) | LM-driven harness built for SWE-bench: edit state, command execution, and issue-focused loop—the reference agent stack next to the benchmark itself. `memory` · `evals` · `python` | ✅ | slightly complex (SWE-bench pairing, stateful edits) | [Default agent config](https://github.com/SWE-agent/SWE-agent/blob/main/config/default.yaml) |
| 7 | [**Claude Agent SDK**](https://github.com/anthropics/claude-agent-sdk-python) ★ | [7.3k](https://github.com/anthropics/claude-agent-sdk-python/stargazers) | Official Anthropic SDK (Python + [TypeScript](https://github.com/anthropics/claude-agent-sdk-typescript), [demos](https://github.com/anthropics/claude-agent-sdk-demos), [quickstarts](https://github.com/anthropics/claude-quickstarts)): built-in tools, MCP, long-running coding agents with session bridging. `mcp` · `memory` · `python` · `typescript` | ✅ | complex (full SDK, session bridging — product suite) | [Research agent demo](https://github.com/anthropics/claude-agent-sdk-demos/blob/main/research-agent/research_agent/agent.py) |
| 8 | [**RepoMaster**](https://github.com/QuantaAlpha/RepoMaster) ★ | [528](https://github.com/QuantaAlpha/RepoMaster/stargazers) | Repo-scoped research harness: builds function-call and module-dependency graphs to explore only what's needed; large relative gains on MLE-bench and GitTaskBench with lower token use. `workflow` · `python` | ❓ | slightly complex (graph-based exploration) | [PDF-parse case study](https://github.com/QuantaAlpha/RepoMaster/blob/main/example/pdf_parse.md) |
| 9 | [**AutoHarness**](https://github.com/aiming-lab/AutoHarness) | [324](https://github.com/aiming-lab/AutoHarness/stargazers) | Lightweight governance harness: wraps any LLM client in ~2 lines for automated harness engineering—6–14 step pipeline, YAML constitution, risk-pattern matching, session persistence with cost tracking, multi-agent profiles. `memory` · `multi-agent` · `provider-agnostic` · `python` | ✅ | super simple (2-line wrapper, YAML gov) | [Full pipeline demo](https://github.com/aiming-lab/AutoHarness/blob/main/examples/full_pipeline_demo.py) |
| 10 | [**pmstack**](https://github.com/RyanAlberts/pmstack) | [2](https://github.com/RyanAlberts/pmstack/stargazers) | Claude Code config for AI product managers: CLAUDE.md plus skills for competitive analysis, PRD-from-signal, metric frameworks, stakeholder briefs, and agent eval design. "GStack for PMs." `evals` | ✅ | super simple (skills bundle, PM-focused) | [PRD-from-signal skill](https://github.com/RyanAlberts/pmstack/blob/main/skills/prd-from-signal.md) |

## Personal agent runtimes

<a href="#contents"><img align="right" width="15" height="15" src="https://git.io/JtehR" alt="Back to top"></a>

_Always-on, self-hosted agents you run as a daemon and talk to from chat apps: gateway runtimes, second brains, and self-improving assistants. The agent as a product you operate, not a library you build with._

| # | Project | ⭐ Stars | Description | Open source | Simplicity ↔ capability | Examples |
|---|---------|---------|-------------|-------------|-------------------------|----------|
| 1 | [**OpenClaw**](https://github.com/openclaw/openclaw) ★ | [379k](https://github.com/openclaw/openclaw/stargazers) | Self-hosted, always-on personal agent (formerly Clawdbot/Moltbot): a gateway + event-loop runtime that treats messages, heartbeats, crons, and webhooks as one input queue, persists state to local files, and lives in your chat apps (WhatsApp, Telegram, Slack, Discord). 13,700+ community skills; the fastest-growing repo in GitHub history. `typescript` · `multi-agent` | ✅ | complex (always-on runtime, channels, skill ecosystem — product suite) | [Agent runtime architecture](https://github.com/openclaw/openclaw/blob/main/docs/agent-runtime-architecture.md) |
| 2 | [**Hermes**](https://github.com/NousResearch/hermes-agent) ★ | [193k](https://github.com/NousResearch/hermes-agent/stargazers) | Nous Research's self-improving agent: a learning loop turns experience into reusable skills, builds a persistent user model across sessions, and checkpoints state to disk with rollback; lean enough for a $5 VPS, driven from chat, and model-agnostic (Nous Portal, OpenRouter, OpenAI, or any endpoint). `memory` · `python` · `provider-agnostic` | ✅ | slightly complex (lean runtime, learning loop, disk-first memory) | [Built-in skills](https://github.com/NousResearch/hermes-agent/tree/main/skills) |
| 3 | [**Khoj**](https://github.com/khoj-ai/khoj) ★ | [35.1k](https://github.com/khoj-ai/khoj/stargazers) | Self-hostable "AI second brain": answers over your docs and the web, custom agents, scheduled automations, and multi-client reach (web, Obsidian, Emacs, WhatsApp). A personal-agent harness with retrieval at the core. `python` | ✅ | complex (server + clients — product suite) | [Feature tour](https://github.com/khoj-ai/khoj#readme) |
| 4 | [**Eliza**](https://github.com/elizaOS/eliza) ★ | [18.6k](https://github.com/elizaOS/eliza/stargazers) | Open "agentic operating system" (elizaOS): persistent multi-agent runtime with character files, a plugin ecosystem, and social/platform integrations — the harness behind a large share of autonomous social agents. `memory` · `multi-agent` · `typescript` | ✅ | complex (runtime + plugin ecosystem — product suite) | [Agent quickstart](https://github.com/elizaOS/eliza#readme) |
| 5 | [**Agent Zero**](https://github.com/agent0ai/agent-zero) | [18.1k](https://github.com/agent0ai/agent-zero/stargazers) | Organic, prompt-defined personal agent framework: hierarchical sub-agents, persistent memory, browser and code tools, and self-modifying behavior; runs in Docker with a web UI. `memory` · `multi-agent` · `browser` · `sandbox` · `python` | ❓ | slightly complex (prompt-defined, Docker + web UI) | [Framework tour](https://github.com/agent0ai/agent-zero#readme) |
| 6 | [**OpenHarness (HKUDS)**](https://github.com/HKUDS/OpenHarness) | [13.8k](https://github.com/HKUDS/OpenHarness/stargazers) | Open agent harness with a built-in personal agent ("Ohmo") that runs across Feishu, Slack, Telegram, and Discord; core tool-use, skills, memory, multi-agent coordination with auto-compaction for multi-day sessions. `memory` · `multi-agent` | ✅ | complex (personal agent + multi-channel — product suite) | [harness-eval skill](https://github.com/HKUDS/OpenHarness/blob/main/.claude/skills/harness-eval/SKILL.md) |
| 7 | [**AIlice**](https://github.com/myshell-ai/AIlice) | [1.4k](https://github.com/myshell-ai/AIlice/stargazers) | Fully autonomous general-purpose agent; one binary, Docker-ready, for when you want "set goal and walk away" without a framework. `sandbox` · `python` | ✅ | slightly complex (autonomous, one binary) | [Task showcase](https://github.com/myshell-ai/AIlice#cool-things-we-can-do) |

## Frameworks

<a href="#contents"><img align="right" width="15" height="15" src="https://git.io/JtehR" alt="Back to top"></a>

_General-purpose agent and LLM application frameworks (the app layer, not harnesses per se)._

| # | Project | ⭐ Stars | Description | Open source | Simplicity ↔ capability | Examples |
|---|---------|---------|-------------|-------------|-------------------------|----------|
| 1 | [**n8n**](https://github.com/n8n-io/n8n) ★ ✱ | [192k](https://github.com/n8n-io/n8n/stargazers) | Fair-code workflow engine with 400+ nodes and native AI nodes; the self-hosted Zapier that actually does agents and LangChain. `workflow` · `local` · `typescript` | ⚠️ Fair-code | complex (400+ nodes, workflow engine — product suite) | [Agent vs chain workflow](https://github.com/n8n-io/n8n-docs/blob/main/docs/advanced-ai/examples/agent-chain-comparison.md) |
| 2 | [**AutoGPT**](https://github.com/Significant-Gravitas/AutoGPT) ★ | [185k](https://github.com/Significant-Gravitas/AutoGPT/stargazers) | The original autonomous loop: goal in, agent iterates with tools and memory; Forge is the dev framework, Benchmark the eval harness. `memory` · `evals` · `python` | ⚠️ Polyform-SU | complex (autonomous loop, tools, memory — product suite) | [Medium blogger graph](https://github.com/Significant-Gravitas/AutoGPT/blob/master/autogpt_platform/graph_templates/Medium%20Blogger_v28.json) |
| 3 | [**langflow**](https://github.com/langflow-ai/langflow) ★ | [150k](https://github.com/langflow-ai/langflow/stargazers) | Low-code UI to build and deploy LangChain/LangGraph flows; visual DAG editor and one-click run. `low-code` · `python` | ✅ | complex (low-code, visual — product suite) | [Chat with RAG flow](https://github.com/langflow-ai/langflow/blob/main/docs/docs/Tutorials/chat-with-rag.mdx) |
| 4 | [**Dify**](https://github.com/langgenius/dify) ★ | [145k](https://github.com/langgenius/dify/stargazers) | One-stop LLM app platform: visual workflows, RAG pipeline, 50+ tools, model management; "ship from prototype to prod" in a single UI. `low-code` · `rag` · `python` | ⚠️ Fair-code | complex (one-stop platform — product suite) | [Customer-service bot](https://github.com/langgenius/dify-docs/blob/main/en/use-dify/tutorials/customer-service-bot.mdx) |
| 5 | [**langchain**](https://github.com/langchain-ai/langchain) | [139k](https://github.com/langchain-ai/langchain/stargazers) | Chains, tools, retrievers, and agents; the usual entry point for "add tools to an LLM" in Python/JS. `python` | ✅ | complex (kitchen-sink ecosystem — product suite) | [Build an agent notebook](https://github.com/langchain-ai/langchain-academy/blob/main/module-1/agent.ipynb) |
| 6 | [**browser-use**](https://github.com/browser-use/browser-use) | [98.8k](https://github.com/browser-use/browser-use/stargazers) | Python layer over Playwright: natural-language goals become browser actions—web-agent loop without hand-rolling MCP or a custom driver for every site. `mcp` · `browser` · `python` | ✅ | slightly complex (LLM + browser, Playwright) | [Grocery shopping agent](https://github.com/browser-use/browser-use/blob/main/examples/use-cases/shopping.py) |
| 7 | [**Flowise**](https://github.com/FlowiseAI/Flowise) ★ | [53.6k](https://github.com/FlowiseAI/Flowise/stargazers) | Drag-and-drop LangChain UI; deploy flows without code. The low-code sibling to Langflow, with a different component and hosting story.
`low-code` · `typescript` | ⚠️ Apache+CLA | complex (low-code, drag-drop — product suite) | [Agentic RAG flow](https://github.com/FlowiseAI/Flowise/blob/main/packages/server/marketplaces/agentflowsv2/Agentic%20RAG.json) | | 8 | [**llama-index**](https://github.com/run-llama/llama_index) | [50.1k](https://github.com/run-llama/llama_index/stargazers) | Data-centric: indexing, RAG, and query engines; agent abstractions sit on top of your data pipelines. `rag` · `python` | ✅ | complex (RAG + agents — product suite) | [Research assistant workflow](https://github.com/run-llama/llama_index/blob/main/docs/examples/agent/agent_workflow_research_assistant.ipynb) | | 9 | [**agno**](https://github.com/agno-agi/agno) | [40.7k](https://github.com/agno-agi/agno/stargazers) | Python agents with memory, knowledge bases, tools, and structured outputs; continues the PhiData-era product line under the Agno name—production apps, evals, and pipelines. `memory` · `evals` · `python` | ✅ | complex (memory, KB, observability — product suite) | [Agent with tools](https://github.com/agno-agi/agno/blob/main/cookbook/02_agents/01_quickstart/agent_with_tools.py) | | 10 | [**langgraph**](https://github.com/langchain-ai/langgraph) ★ ✱ | [34.7k](https://github.com/langchain-ai/langgraph/stargazers) | State-machine graphs over LLM steps; checkpointing, human-in-the-loop, and durable execution so workflows survive restarts. `workflow` · `python` | ✅ | slightly complex (graphs, checkpointing, durable exec) | [Customer support agent](https://github.com/langchain-ai/langgraph/blob/main/examples/customer-support/customer-support.ipynb) | | 11 | [**semantic-kernel**](https://github.com/microsoft/semantic-kernel) | [28.1k](https://github.com/microsoft/semantic-kernel/stargazers) | Microsoft's plugin and planner layer for LLMs; C#, Python, Java; strong on enterprise auth and orchestration. 
`python` | ✅ | complex (enterprise, multi-language — product suite) | [Chat completion agent](https://github.com/microsoft/semantic-kernel/blob/main/python/samples/getting_started_with_agents/chat_completion/step01_chat_completion_agent_simple.py) | | 12 | [**mastra**](https://github.com/mastra-ai/mastra) ✱ | [25.1k](https://github.com/mastra-ai/mastra/stargazers) | TypeScript-first; agents, tools, and workflows with a single runtime and minimal boilerplate. `typed` · `typescript` | ⚠️ Elastic-2.0 | slightly complex (TS-first, minimal boilerplate) | [Durable research agent](https://github.com/mastra-ai/mastra/tree/main/examples/durable-agents) | | 13 | [**letta**](https://github.com/letta-ai/letta) ★ ✱ | [23.3k](https://github.com/letta-ai/letta/stargazers) | Python agent runtime with tool use and control flow; lean API; stateful agents with long-horizon memory. `memory` · `python` | ✅ | mostly simple (lean API) | [Loop .af agent file](https://github.com/letta-ai/agent-file/tree/main/agents/%40letta-ai/loop) | | 14 | [**rasa**](https://github.com/RasaHQ/rasa) ★ | [21.2k](https://github.com/RasaHQ/rasa/stargazers) | Conversational AI stack (NLU, dialogue, actions); long-standing OSS choice for chat and voice bots. `voice` · `python` | ✅ | complex (full stack — product suite) | [Sara conversational demo](https://github.com/RasaHQ/rasa-demo) | | 15 | [**Google ADK**](https://github.com/google/adk-python) ★ | [20.1k](https://github.com/google/adk-python/stargazers) | Google's official Agent Development Kit: code-first Python toolkit for building, evaluating, and deploying agents. Optimized for Gemini but model-agnostic; deploys to Cloud Run / Vertex AI; ships a dev UI with eval and a code-execution sandbox. 
`evals` · `sandbox` · `python` | ✅ | complex (official Google SDK, eval, deploy — product suite) | [Travel concierge agent](https://github.com/google/adk-samples/tree/main/python/agents/travel-concierge) | | 16 | [**botpress**](https://github.com/botpress/botpress) ★ | [14.7k](https://github.com/botpress/botpress/stargazers) | Visual bot builder and runtime; multi-channel, open-source alternative to commercial bot platforms. `low-code` · `typescript` | ✅ | complex (visual builder, multi-channel — product suite) | [Inter-bot delegation](https://github.com/botpress/v12/tree/master/examples/interbot) | | 17 | [**R2R**](https://github.com/SciPhi-AI/R2R) ★ | [7.9k](https://github.com/SciPhi-AI/R2R/stargazers) | RAG-first: hybrid search, knowledge graphs, multimodal; the framework for "production RAG" when you care more about retrieval than chat UI. `vision` · `rag` · `workflow` · `python` | ✅ | complex (production RAG — product suite) | [hello_r2r RAG example](https://github.com/SciPhi-AI/R2R/blob/main/py/core/examples/hello_r2r.py) | | 18 | [**agent-squad**](https://github.com/2FastLabs/agent-squad) | [7.7k](https://github.com/2FastLabs/agent-squad/stargazers) | AWS-originated orchestrator (now under 2FastLabs): intent classification, streaming, SupervisorAgent; "agent-as-tools" so one agent delegates to a squad. `multi-agent` | ✅ | slightly complex (squad orchestration) | [E-commerce support sim](https://github.com/2FastLabs/agent-squad/tree/main/examples/ecommerce-support-simulator) | | 19 | [**AgentVerse**](https://github.com/OpenBMB/AgentVerse) ★ | [5.1k](https://github.com/OpenBMB/AgentVerse/stargazers) | Task-solving and simulation envs for multi-LLM agents; deploy many agents in custom environments without building infra from scratch. 
`multi-agent` · `python` | ✅ | complex (simulation envs, multi-agent — product suite) | [NLP classroom sim](https://github.com/OpenBMB/AgentVerse/blob/main/agentverse/tasks/simulation/nlp_classroom_9players/config.yaml) | | 20 | [**Bee Agent Framework**](https://github.com/i-am-bee/beeai-framework) | [3.3k](https://github.com/i-am-bee/beeai-framework/stargazers) | Python + TypeScript, LF AI–backed; MCP/ACP, workflows, Requirement Agent; the one that pushes "production multi-agent" without LangChain. `mcp` · `multi-agent` · `python` · `typescript` | ✅ | complex (production multi-agent — product suite) | [ReAct agent example](https://github.com/i-am-bee/beeai-framework/blob/main/python/examples/agents/react.py) | | 21 | [**AgentStack**](https://github.com/agentstack-ai/AgentStack) | [2.2k](https://github.com/agentstack-ai/AgentStack/stargazers) | Scaffolds full agent projects; plugs in CrewAI, LangGraph, OpenAI Swarm, LlamaStack and wires AgentOps observability from day one. | ✅ | slightly complex (scaffold, multi-backend) | [Research assistant crew](https://github.com/agentstack-ai/AgentStack/tree/main/examples/research_assistant) | | 22 | [**AgentSilex**](https://github.com/howl-anderson/agentsilex) | [451](https://github.com/howl-anderson/agentsilex/stargazers) | ~300 lines of readable agent code on top of LiteLLM; the "I want to see the whole loop" option for learning or minimal production. `python` | ✅ | super simple (~300 LOC) | [Simple weather agent](https://github.com/howl-anderson/agentsilex/blob/main/demo/simple_agent.py) | | 23 | [**SuperAgentX**](https://github.com/superagentxai/superagentx) | [200](https://github.com/superagentxai/superagentx/stargazers) | Lightweight multi-agent orchestrator with an AGI-angle; minimal surface, docs-first, for teams that want orchestration without the kitchen sink. 
`multi-agent` · `python` | ✅ | mostly simple (minimal surface) | [Parallel marketing agents](https://github.com/superagentxai/superagentx/blob/master/examples/agents/parallel_agents.py) | ## Multi-agent and orchestration <a href="#contents"><img align="right" width="15" height="15" src="https://git.io/JtehR" alt="Back to top"></a> _Harnesses and patterns for multi-agent coordination and handoffs._ | # | Project | ⭐ Stars | Description | Open source | Simplicity ↔ capability | Examples | |---|---------|---------|-------------|-------------|-------------------------|----------| | 1 | [**MetaGPT**](https://github.com/FoundationAgents/MetaGPT) ★ | [68.8k](https://github.com/FoundationAgents/MetaGPT/stargazers) | The "AI software company" multi-agent framework: role-played PM, architect, and engineer agents turn a one-line requirement into specs, designs, and code along an SOP assembly line. The landmark of the genre; development pace has slowed in 2026. `multi-agent` · `python` | ✅ | complex (role pipeline, SOPs — product suite) | [Build a customized agent](https://github.com/FoundationAgents/MetaGPT/blob/main/examples/build_customized_agent.py) | | 2 | [**autogen**](https://github.com/microsoft/autogen) | [58.9k](https://github.com/microsoft/autogen/stargazers) | Conversable agents and group chats; code execution and human-in-the-loop; Microsoft origin, AG2 ecosystem. `multi-agent` · `python` | ✅ CC-BY | complex (group chat, code exec, AG2 — product suite) | [Distributed group chat](https://github.com/microsoft/autogen/tree/main/python/samples/core_distributed-group-chat) | | 3 | [**crewAI**](https://github.com/crewAIInc/crewAI) | [53.5k](https://github.com/crewAIInc/crewAI/stargazers) | Role-based agents (roles, goals, backstories) in Crews; Flows add event-driven and hierarchical control for production. 
`python` | ✅ | complex (roles, Flows, production — product suite) | [Trip planner crew](https://github.com/crewAIInc/crewAI-examples/blob/main/crews/trip_planner/trip_agents.py) | | 4 | [**ChatDev**](https://github.com/OpenBMB/ChatDev) ★ | [33.4k](https://github.com/OpenBMB/ChatDev/stargazers) | Multi-agent software-company simulation (CEO, CTO, programmer, tester) built on chat chains with communicative dehallucination; ChatDev 2.0 continues the line. MetaGPT's conversational sibling. `python` | ✅ | slightly complex (chat-chain simulation) | [Company simulation quickstart](https://github.com/OpenBMB/ChatDev#readme) | | 5 | [**openai-agents-python**](https://github.com/openai/openai-agents-python) | [27.1k](https://github.com/openai/openai-agents-python/stargazers) | Handoffs, guardrails, and multi-LLM routing; minimal surface so you own the loop. `python` | ✅ | mostly simple (minimal surface) | [Airline customer service handoffs](https://github.com/openai/openai-agents-python/blob/main/examples/customer_service/main.py) | | 6 | [**Microsoft Agent Framework**](https://github.com/microsoft/agent-framework) | [11.3k](https://github.com/microsoft/agent-framework/stargazers) | Microsoft's convergence of AutoGen and Semantic Kernel: build, orchestrate, and deploy agents and multi-agent workflows in Python and .NET, with graph-based workflows and checkpointing — the designated successor harness for both lines. `multi-agent` · `workflow` · `python` | ✅ | slightly complex (Python/.NET SDK, graph workflows) | [Python samples](https://github.com/microsoft/agent-framework/tree/main/python/samples) | | 7 | [**PraisonAI**](https://github.com/MervinPraison/PraisonAI) | [8.1k](https://github.com/MervinPraison/PraisonAI/stargazers) | Autonomous multi-agent teams with a single entry point; emphasis on minimal config. 
`multi-agent` · `python` | ✅ | mostly simple (single entry, minimal config) | [Orchestrator-workers pattern](https://github.com/MervinPraison/PraisonAI/blob/main/examples/python/general/orchestrator-workers.py) | | 8 | [**AgentRL**](https://github.com/THUDM/AgentRL) ★ | [298](https://github.com/THUDM/AgentRL/stargazers) | Multitask, multiturn RL for LLM agents; Ray-based scaling, rollout/actor workers—for teams that want to train agents, not just run them. `training` · `python` | ✅ | complex (RL, Ray, train agents — product suite) | [Async GRPO trainer](https://github.com/THUDM/AgentRL/blob/main/examples/training/async_trainer.py) | ## Plugins, MCPs, CLI tools <a href="#contents"><img align="right" width="15" height="15" src="https://git.io/JtehR" alt="Back to top"></a> _IDE plugins, concrete MCP servers, and CLI tools that give agents tools and context._ | # | Project | ⭐ Stars | Description | Open source | Simplicity ↔ capability | Examples | |---|---------|---------|-------------|-------------|-------------------------|----------| | 1 | [**claude-mem**](https://github.com/thedotmack/claude-mem) | [82.2k](https://github.com/thedotmack/claude-mem/stargazers) | Claude Code plugin that captures everything an agent does during a session, AI-compresses it (via claude-agent-sdk), and injects the relevant context into future sessions—session-to-session memory as a drop-in. `memory` | ✅ | slightly complex (session capture + compression) | [Lifecycle hooks config](https://github.com/thedotmack/claude-mem/blob/main/plugin/hooks/hooks.json) | | 2 | [**aider**](https://github.com/Aider-AI/aider) | [46.2k](https://github.com/Aider-AI/aider/stargazers) | Git-aware CLI pair programmer; edits in-repo, supports multiple models and MCP so agents see version control and tools. 
`mcp` · `cli` · `python` | ✅ | slightly complex (CLI, git-aware, MCP) | [Repo map source](https://github.com/Aider-AI/aider/blob/main/aider/repomap.py) | | 3 | [**continue**](https://github.com/continuedev/continue) | [33.7k](https://github.com/continuedev/continue/stargazers) | Open-source IDE extension (VS Code, JetBrains); in-editor completion and chat with local or API models. `ide` · `typescript` | ✅ | complex (IDE extension, multi-editor — product suite) | [VS Code extension demos](https://github.com/continuedev/continue/blob/main/extensions/vscode/README.md) | | 4 | [**github-mcp-server**](https://github.com/github/github-mcp-server) | [30.7k](https://github.com/github/github-mcp-server/stargazers) | GitHub's official MCP server (Go): repos, issues, PRs, code search, Actions. Replaces the older community `cyanheads/github-mcp-server` as the canonical way to give agents GitHub access. `mcp` | ✅ | slightly complex (official GitHub MCP) | [Remote server toolsets](https://github.com/github/github-mcp-server/blob/main/docs/remote-server.md) | | 5 | [**MCP Python SDK**](https://github.com/modelcontextprotocol/python-sdk) | [23.3k](https://github.com/modelcontextprotocol/python-sdk/stargazers) | Official SDK to build and consume MCP servers/clients in Python; stdio and SSE transports. `mcp` · `python` | ✅ | mostly simple (SDK only) | [Website fetcher server](https://github.com/modelcontextprotocol/python-sdk/blob/main/examples/servers/simple-tool/mcp_simple_tool/server.py) | | 6 | [**MCP TypeScript SDK**](https://github.com/modelcontextprotocol/typescript-sdk) | [12.7k](https://github.com/modelcontextprotocol/typescript-sdk/stargazers) | Official MCP implementation for Node/TS; reference for the protocol. 
`mcp` · `typescript` | ✅ | mostly simple (protocol reference) | [Streamable HTTP server](https://github.com/modelcontextprotocol/typescript-sdk/blob/main/examples/server/src/simpleStreamableHttp.ts) | | 7 | [**MCP Inspector**](https://github.com/modelcontextprotocol/inspector) | [10.1k](https://github.com/modelcontextprotocol/inspector/stargazers) | GUI to test and debug MCP servers; inspect tools, resources, and prompts. `mcp` · `typescript` | ✅ | super simple (debug GUI) | [Inspector UI walkthrough](https://github.com/modelcontextprotocol/inspector/blob/main/README.md) | | 8 | [**MCP Registry**](https://github.com/modelcontextprotocol/registry) | [6.9k](https://github.com/modelcontextprotocol/registry/stargazers) | Official, community-driven registry for MCP servers—the "app store" MCP clients use to discover servers. Maintained by Anthropic + ecosystem maintainers; v0.1 API frozen, production-grade. `mcp` | ✅ | slightly complex (official discovery layer) | [Registry seed entries](https://github.com/modelcontextprotocol/registry/blob/main/data/seed.json) | | 9 | [**Docker MCP Gateway**](https://github.com/docker/mcp-gateway) | [1.5k](https://github.com/docker/mcp-gateway/stargazers) | Docker's official MCP CLI plugin / gateway; container-aware MCP tooling from Docker (replaces deprecated `docker/mcp-servers` path). `mcp` · `sandbox` · `cli` | ✅ | slightly complex (Docker-aware MCPs) | [Gateway usage walkthrough](https://github.com/docker/mcp-gateway/blob/main/docs/mcp-gateway.md) | | 10 | [**puppeteer-real-browser-mcp**](https://github.com/withLinda/puppeteer-real-browser-mcp-server) | [23](https://github.com/withLinda/puppeteer-real-browser-mcp-server/stargazers) | Puppeteer MCP with real-browser and anti-detection; for agents that need to drive sites that block headless. 
`mcp` · `browser` · `typescript` | ❓ | mostly simple (real browser, anti-detect) | [11 anti-detection tools](https://github.com/withLinda/puppeteer-real-browser-mcp-server/blob/main/README.md) | | 11 | [**Better-OpenCodeMCP**](https://github.com/ajhcs/Better-OpenCodeMCP) | [7](https://github.com/ajhcs/Better-OpenCodeMCP/stargazers) | MCP server for OpenCode/Crush: async task execution, model bridging (e.g. Claude→Gemini), process pooling. `mcp` · `typescript` | ✅ | mostly simple (MCP server, model bridging) | [opencode delegate tool](https://github.com/ajhcs/Better-OpenCodeMCP/blob/main/src/tools/opencode.tool.ts) | | 12 | [**agentlog**](https://github.com/RyanAlberts/agentlog) | [0](https://github.com/RyanAlberts/agentlog/stargazers) | Persistent decision memory for any project: `remember`, `recall`, `reflect`. Single-file Python CLI that stores decisions as JSONL and uses Claude or Gemini to retrieve and synthesize patterns—Karpathy's LLM Wiki concept as a CLI. `memory` · `cli` · `python` | ✅ | super simple (one file, three commands) | [Sample decisions.jsonl](https://github.com/RyanAlberts/agentlog/blob/main/example-log/decisions.jsonl) | ## Evaluation and benchmarking harnesses <a href="#contents"><img align="right" width="15" height="15" src="https://git.io/JtehR" alt="Back to top"></a> _Agentic eval systems, reasoning benchmarks, and open agent benchmarks._ | # | Project | ⭐ Stars | Description | Open source | Simplicity ↔ capability | Examples | |---|---------|---------|-------------|-------------|-------------------------|----------| | 1 | [**Agent Lightning**](https://github.com/microsoft/agent-lightning) ★ | [17.3k](https://github.com/microsoft/agent-lightning/stargazers) | Microsoft's training-oriented harness: optimization loops for agent behavior—when you need to improve policies over rollouts, not only score a fixed prompt. 
`evals` · `training` · `python` | ✅ | complex (agent training, Microsoft stack — product suite) | [APO room-booking example](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/README.md) | | 2 | [**SWE-bench**](https://github.com/SWE-bench/SWE-bench) ★ | [5.2k](https://github.com/SWE-bench/SWE-bench/stargazers) | LMs resolve real GitHub issues; Docker harness, instance IDs; standard for code-agent evals. `evals` · `sandbox` · `python` | ✅ | slightly complex (real GitHub issues, standard) | [SWE-bench Verified leaderboard](https://www.swebench.com/verified.html) | | 3 | [**AgentBench**](https://github.com/THUDM/AgentBench) ★ | [3.5k](https://github.com/THUDM/AgentBench/stargazers) | ICLR'24 benchmark: agents across AlfWorld, DB, knowledge graphs, OS, webshop; Docker Compose, function-calling interface. `evals` · `sandbox` · `rag` · `workflow` · `python` | ✅ | complex (multi-env, Docker Compose — product suite) | [AgentBench ICLR'24 paper](https://arxiv.org/abs/2308.03688) | | 4 | [**inspect_ai**](https://github.com/UKGovernmentBEIS/inspect_ai) ★ | [2.2k](https://github.com/UKGovernmentBEIS/inspect_ai/stargazers) | Inspect AI core: composable eval tasks, sandboxes, scorers, and multi-model runs; the framework behind inspect_evals, not just the task bundle. `evals` · `sandbox` · `python` | ✅ | complex (eval framework, AISI stack — product suite) | [Inspect tutorial example](https://inspect.aisi.org.uk/tutorial.html) | | 5 | [**WebArena**](https://github.com/web-arena-x/webarena) ★ | [1.5k](https://github.com/web-arena-x/webarena/stargazers) | Realistic web env (e.g. e‑commerce, CMS, dev tools); 812 tasks; measures end-to-end web agent success. 
`python` | ✅ | complex (812 tasks, web env — product suite) | [WebArena leaderboard](https://docs.google.com/spreadsheets/d/1M801lEpBbKSNwP-vDBkC_pF7LdyGU1f_ufZb_NWNBZQ/edit) | | 6 | [**WebVoyager**](https://github.com/MinorJerry/WebVoyager) ★ | [1.1k](https://github.com/MinorJerry/WebVoyager/stargazers) | End-to-end web agent with LMMs: screenshots + actions on real sites; benchmark on 15 sites, GPT-4V for automatic eval. `evals` · `vision` | ✅ | slightly complex (LMMs, screenshots, 15 sites) | [643 web tasks dataset](https://github.com/MinorJerry/WebVoyager/blob/main/data/WebVoyager_data.jsonl) | | 7 | [**ARC-AGI-2**](https://github.com/arcprize/ARC-AGI-2) | [712](https://github.com/arcprize/ARC-AGI-2/stargazers) | ARC Prize task set: grid-based abstraction/reasoning; public and private splits for generalization. | ✅ | super simple (task set) | [ARC Prize leaderboard](https://arcprize.org/leaderboard) | | 8 | [**SWE-Gym**](https://github.com/SWE-Gym/SWE-Gym) ★ | [689](https://github.com/SWE-Gym/SWE-Gym/stargazers) | Training and evaluation for SWE agents and verifiers (ICML 2025). `evals` · `training` · `python` | ✅ | slightly complex (training + eval, ICML) | [SWE-Gym ICML 2025 paper](https://arxiv.org/abs/2412.21139) | | 9 | [**swe-smith**](https://github.com/SWE-bench/SWE-smith) ★ | [676](https://github.com/SWE-bench/SWE-smith/stargazers) | Data generation for SWE agents; 50k+ instances across 128 repos; used for SWE-agent-LM training. `training` · `python` | ✅ | slightly complex (50k+ instances, data gen) | [SWE-smith trajectories](https://huggingface.co/datasets/SWE-bench/SWE-smith-trajectories) | | 10 | [**inspect_evals**](https://github.com/UKGovernmentBEIS/inspect_evals) ★ | [536](https://github.com/UKGovernmentBEIS/inspect_evals/stargazers) | UK AISI/Arcadia/Vector: GAIA and other evals in Inspect AI; level 1–3, sandboxed, tool-calling solvers. 
`evals` · `sandbox` | ✅ | slightly complex (Inspect AI, UK gov) | [inspect SWE-bench eval](https://github.com/UKGovernmentBEIS/inspect_evals/blob/main/src/inspect_evals/swe_bench/README.md) | | 11 | [**arc-agi-benchmarking**](https://github.com/arcprize/arc-agi-benchmarking) ★ | [350](https://github.com/arcprize/arc-agi-benchmarking/stargazers) | Runner for ARC-AGI: multi-provider (OpenAI, Anthropic, Gemini, etc.), rate limits, retries, and scoring. `evals` · `provider-agnostic` · `python` | ✅ | mostly simple (runner, multi-provider) | [o3 prompt example](https://github.com/arcprize/arc-agi-benchmarking/blob/main/docs/examples/prompt_example_o3.md) | | 12 | [**VitaBench**](https://github.com/meituan-longcat/vitabench) ★ | [145](https://github.com/meituan-longcat/vitabench/stargazers) | ICLR'26: 66 tools, real-world apps (delivery, travel, retail); 100 cross-scenario + 300 single-scenario tasks; adopted by Qwen/Seed. | ✅ | complex (66 tools, cross-scenario — product suite) | [VitaBench paper](https://arxiv.org/abs/2509.26490) | | 13 | [**AgencyBench**](https://github.com/GAIR-NLP/AgencyBench) ★ | [87](https://github.com/GAIR-NLP/AgencyBench/stargazers) | Long-horizon agent benchmark: 32 scenarios, 138 tasks, ~1M tokens and ~90 tool calls; Docker sandbox and rubric-based + LLM judges. `evals` · `sandbox` · `python` | ✅ | complex (32 scenarios, Docker, judges — product suite) | [AgencyBench leaderboard](https://github.com/GAIR-NLP/AgencyBench#leaderboard) | | 14 | [**letta-evals**](https://github.com/letta-ai/letta-evals) ★ | [72](https://github.com/letta-ai/letta-evals/stargazers) | Eval harness for stateful Letta agents; configurable suites and grading (LLM or rule-based) so you can measure what you ship. 
`memory` · `python` | ✅ | mostly simple (Letta-specific harness) | [LoCoMo memory benchmark](https://github.com/letta-ai/letta-leaderboard/blob/main/leaderboard/locomo/locomo_benchmark.py) | | 15 | [**SUPER**](https://github.com/allenai/super-benchmark) ★ | [53](https://github.com/allenai/super-benchmark/stargazers) | Agents that set up and run ML/NLP from GitHub repos; 45 expert problems, 152 masked tasks, 602 AutoGen tasks; Docker-based. `sandbox` · `python` | ✅ | slightly complex (ML/NLP repos, Docker) | [SUPER EMNLP paper](https://arxiv.org/abs/2409.07440) | | 16 | [**TRAIL**](https://github.com/patronus-ai/trail-benchmark) | [18](https://github.com/patronus-ai/trail-benchmark/stargazers) | Trace reasoning and agentic issue localization; 148 long-context traces, 841 errors, 20+ error types; Hugging Face dataset. | ✅ | mostly simple (traces, Hugging Face) | [TRAIL dataset card](https://huggingface.co/datasets/PatronusAI/TRAIL) | ## Research and task-specific harnesses <a href="#contents"><img align="right" width="15" height="15" src="https://git.io/JtehR" alt="Back to top"></a> _Deep research, document QA, and domain-specific agent loops._ | # | Project | ⭐ Stars | Description | Open source | Simplicity ↔ capability | Examples | |---|---------|---------|-------------|-------------|-------------------------|----------| | 1 | [**gpt-researcher**](https://github.com/assafelovic/gpt-researcher) | [27.7k](https://github.com/assafelovic/gpt-researcher/stargazers) | Autonomous deep-research agent: web + local sources, citation-grounded reports, multi-agent and deep-research modes. The reference open-source research harness. 
`multi-agent` · `python` | ✅ | complex (deep research, multi-agent — product suite) | [Multi-agent LangGraph walkthrough](https://github.com/assafelovic/gpt-researcher/blob/master/docs/blog/2024-05-19-gptr-langgraph/index.md) |
| 2 | [**openagents**](https://github.com/OpenAgentsInc/openagents) ★ | [424](https://github.com/OpenAgentsInc/openagents/stargazers) | Platform for autonomous agents and autopilot-style workflows; decentralized and Nostr-oriented (Pylon runtime, actively shipped in 2026). | ✅ | complex (platform, decentralized — product suite) | [Production earning proof](https://github.com/OpenAgentsInc/openagents/blob/main/docs/reports/nexus/2026-04-23-autopilot-pylon-production-earning-proof.md) |

## Libraries and SDKs

<a href="#contents"><img align="right" width="15" height="15" src="https://git.io/JtehR" alt="Back to top"></a>

_Lightweight runtimes, tool loops, and provider-agnostic harness primitives._

| # | Project | ⭐ Stars | Description | Open source | Simplicity ↔ capability | Examples |
|---|---------|---------|-------------|-------------|-------------------------|----------|
| 1 | [**Daytona**](https://github.com/daytonaio/daytona) | [72.4k](https://github.com/daytonaio/daytona/stargazers) | Elastic dev environments for AI-generated code: workspaces, Git, and previews. The infra harness between "the model wrote a patch" and "it ran on a real machine." `sandbox` | ✅ | slightly complex (dev env API, isolation) | [Charts in sandbox](https://github.com/daytonaio/daytona/tree/main/examples/python/charts) |
| 2 | [**Mem0**](https://github.com/mem0ai/mem0) | [58.5k](https://github.com/mem0ai/mem0/stargazers) | Universal memory layer for AI agents: stores user, org, and session memory and retrieves it on demand. Apache-2.0; the de facto memory primitive paired with many harnesses in 2026. `memory` · `python` | ✅ | slightly complex (memory layer, multi-platform) | [Next.js memory demo](https://github.com/mem0ai/mem0/tree/main/examples/mem0-demo) |
| 3 | [**LiteLLM**](https://github.com/BerriAI/litellm) | [50.3k](https://github.com/BerriAI/litellm/stargazers) | One interface to 100+ LLMs with routing, caching, and budgets. Not an agent framework, but the pipe many agent frameworks use. `provider-agnostic` · `python` | ✅ | mostly simple (LLM pipe only) | [Anthropic Agent SDK gateway](https://github.com/BerriAI/litellm/blob/main/cookbook/anthropic_agent_sdk/main.py) |
| 4 | [**Composio**](https://github.com/ComposioHQ/composio) | [28.8k](https://github.com/ComposioHQ/composio/stargazers) | 1,000+ toolkits with auth, tool search, and a sandboxed workbench: a drop-in tool layer so agents stop reinventing OAuth and integrations. Python and TypeScript. `sandbox` · `tool-discovery` · `python` · `typescript` | ✅ | complex (1k+ tools, auth, search — product suite) | [HackerNews agent quickstart](https://github.com/ComposioHQ/composio#quick-start) |
| 5 | [**smolagents**](https://github.com/huggingface/smolagents) | [27.8k](https://github.com/huggingface/smolagents/stargazers) | Code-as-action agents: the model writes Python that is executed in a sandbox (E2B, Modal, etc.); ~1k LOC core. `sandbox` · `python` | ✅ | mostly simple (code-as-action, ~1k LOC) | [RAG code agent](https://github.com/huggingface/smolagents/blob/main/examples/rag.py) |
| 6 | [**vercel/ai**](https://github.com/vercel/ai) | [24.9k](https://github.com/vercel/ai/stargazers) | React and Node SDK for streaming, tool calls, and agent-style UIs; provider-agnostic. `provider-agnostic` · `typescript` | ✅ | slightly complex (React/Node SDK, provider-agnostic) | [Next.js agent example](https://github.com/vercel/ai/tree/main/examples/next-agent) |
| 7 | [**deepagents**](https://github.com/langchain-ai/deepagents) ✱ | [24.6k](https://github.com/langchain-ai/deepagents/stargazers) | LangChain's Python and TypeScript agent harness built on LangGraph: planning tool, virtual filesystem, shell sandbox, and sub-agent spawning. In effect, a "Claude Code-style" harness packaged as a reusable library. `multi-agent` · `sandbox` · `python` · `typescript` | ✅ | slightly complex (planning, files, sub-agents) | [Deep research agent](https://github.com/langchain-ai/deepagents/tree/main/examples/deep_research) |
| 8 | [**pydantic-ai**](https://github.com/pydantic/pydantic-ai) ✱ | [17.7k](https://github.com/pydantic/pydantic-ai/stargazers) | Type-safe Python agents with Pydantic I/O; multi-provider, MCP, Logfire observability, and human-in-the-loop. `mcp` · `typed` · `provider-agnostic` · `python` | ✅ | slightly complex (type-safe, MCP, Logfire) | [Bank support agent](https://github.com/pydantic/pydantic-ai/blob/main/examples/pydantic_ai_examples/bank_support.py) |
| 9 | [**E2B**](https://github.com/e2b-dev/E2B) | [12.6k](https://github.com/e2b-dev/E2B/stargazers) | Firecracker sandboxes for executing agent-generated code; the hosted isolation layer many tool-calling demos use instead of running arbitrary LLM output on your laptop. `sandbox` · `python` | ✅ | slightly complex (sandbox API, code execution) | [Claude Code in sandbox](https://github.com/e2b-dev/e2b-cookbook/tree/main/examples/anthropic-claude-code-in-sandbox-python) |
| 10 | [**strands-agents**](https://github.com/strands-agents/harness-sdk) | [6.1k](https://github.com/strands-agents/harness-sdk/stargazers) | Model-driven Python SDK: decorators for tools, native MCP, and multi-agent support; "minimal code" without sacrificing provider choice. `mcp` · `multi-agent` · `typed` · `python` | ✅ | mostly simple (decorators, MCP, minimal code) | [First agent tutorial](https://github.com/strands-agents/samples/tree/main/python/01-learn/01-first-agent) |
| 11 | [**Cloudflare Agents**](https://github.com/cloudflare/agents) ★ ✱ | [5.1k](https://github.com/cloudflare/agents/stargazers) | Persistent, stateful agents on Durable Objects, with state, WebSockets, scheduling, and AI chat built in. The serverless answer to "where does the agent live?" `memory` · `typescript` | ✅ | slightly complex (Durable Objects, stateful) | [SDK playground app](https://github.com/cloudflare/agents/tree/main/examples/playground) |
| 12 | [**openai-agents-js**](https://github.com/openai/openai-agents-js) | [3.2k](https://github.com/openai/openai-agents-js/stargazers) | Official OpenAI Agents SDK for Node/TS: handoffs, guardrails, and voice; the JS counterpart to openai-agents-python. `multi-agent` · `voice` · `typescript` | ✅ | slightly complex (handoffs, guardrails, voice) | [Financial research agent](https://github.com/openai/openai-agents-js/tree/main/examples/financial-research-agent) |
| 13 | [**open-harness**](https://github.com/MaxGfeller/open-harness) | [566](https://github.com/MaxGfeller/open-harness/stargazers) | TypeScript Agent class built on the Vercel AI SDK: streaming events, filesystem and bash tools, MCP, and subagent delegation. `mcp` · `multi-agent` · `typescript` | ✅ | slightly complex (streaming, tools, subagents) | [Terminal CLI agent](https://github.com/MaxGfeller/open-harness/tree/main/examples/cli) |
| 14 | [**Community-curated agent lists**](https://github.com/brandonhimpfen/awesome-ai-agents) | [11](https://github.com/brandonhimpfen/awesome-ai-agents/stargazers) | Broader directories, e.g. [brandonhimpfen/awesome-ai-agents](https://github.com/brandonhimpfen/awesome-ai-agents), [axioma-ai-labs/awesome-ai-agent-frameworks](https://github.com/axioma-ai-labs/awesome-ai-agent-frameworks), and [mb-mal/awesome-ai-agents-frameworks](https://github.com/mb-mal/awesome-ai-agents-frameworks); they differ in scope and update cadence. | ❓ | super simple (curated lists) | [Frameworks section](https://github.com/brandonhimpfen/awesome-ai-agents#frameworks) |

---

## Related Resources

- [**Awesome**](https://github.com/sindresorhus/awesome): Awesome lists on many topics
- [**OpenAI – Harness engineering**](https://openai.com/index/harness-engineering/): Environment design, intent, feedback loops, repo-as-system-of-record
- [**Anthropic – Effective harnesses for long-running agents**](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents): Session bridging, feature lists, incremental progress, testing
- [**Aakash Gupta (Medium) – 2026 is agent harnesses**](https://aakashgupta.medium.com/2025-was-agents-2026-is-agent-harnesses-heres-why-that-changes-everything-073e9877655e): Harness as moat, minimal intervention, progressive disclosure
- [**LangChain**](https://python.langchain.com/), [**Anthropic**](https://docs.anthropic.com/), [**OpenAI**](https://platform.openai.com/docs): Official docs for major agent platforms

## Contribution

Contributions are welcome. To add or suggest projects:

- Open an [issue](https://github.com/RyanAlberts/best-of-Agent-Harnesses/issues) with the repo URL, category, and a short description.
- Or submit a [pull request](https://github.com/RyanAlberts/best-of-Agent-Harnesses/pulls) editing [projects.yaml](https://github.com/RyanAlberts/best-of-Agent-Harnesses/blob/main/projects.yaml) (and optionally README.md).

For contribution guidelines, see [CONTRIBUTING.md](https://github.com/RyanAlberts/best-of-Agent-Harnesses/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/RyanAlberts/best-of-Agent-Harnesses/blob/main/.github/CODE_OF_CONDUCT.md).

### Show your listing

If your project is in this list, you're welcome to show it in your README:

[![Best of Agent Harnesses](https://img.shields.io/badge/%F0%9F%8F%86_Best_of-Agent_Harnesses-5ac4bf)](https://github.com/RyanAlberts/best-of-Agent-Harnesses)

```md
[![Best of Agent Harnesses](https://img.shields.io/badge/%F0%9F%8F%86_Best_of-Agent_Harnesses-5ac4bf)](https://github.com/RyanAlberts/best-of-Agent-Harnesses)
```

## License

[![CC BY-SA 4.0](https://mirrors.creativecommons.org/presskit/buttons/88x31/svg/by-sa.svg)](https://creativecommons.org/licenses/by-sa/4.0/)