# Agent Architecture Overview: From Prompt to Context Engineering for Building AI Agents
Over the past two to three years, I have run Agent development training for senior developers at several companies, and in the past month I have been designing AI Agent training for new graduates. It was only this week, after an AI-powered simulation project showcase wrapped up, that I finally saw clearly how to build a systematic learning path for Agent development for developers at different stages. The process also made me acutely aware of the "curse of knowledge": what I take for granted may be the biggest obstacle for beginners.
We can simply divide the learning process into four parts:
- Structured Prompt Engineering - How to engineer efficient and reusable prompts.
- Context Engineering and Knowledge Retrieval - Knowledge retrieval, generation, and compression of context information to produce high-quality knowledge background.
- Systematic Design of Tool Functions - Design and implement tools and interfaces that can be called by the Agent.
- Agent Planning and Multi-Agent - Construct task planning and execution paths to achieve closed-loop automation.
Before we begin, we should briefly define AI Agents, since many competing definitions exist. The real-world framing Anthropic gives in "Building effective agents" is a useful reference:
> Some customers define them as fully autonomous systems that can operate independently for long periods of time and use a variety of tools to complete complex tasks; others use the term to describe systems that follow predefined workflows.
Therefore, a simple task around a prompt can also be considered an AI Agent; and a task around a complex system, multiple tools, and multiple steps is also an AI Agent.
## Structured Prompt Engineering
Although Context Engineering is now the popular term, writing good prompts is still the entry point. There is already plenty of content about prompts online, but in my experience we can focus on three parts:
- Structuring prompt input and output
- Chain and modular design of complex problems
- Prompt routing and distribution tasks
Combined with some necessary AI frameworks or tools, we can complete our tasks very well.
### Structuring Prompt Input and Output
In current Agent development, although the model can help generate prompts, tuning prompts is still the core of the work. We want the model's output to be JSON, XML, or even mapped Java classes, so it can be consumed by the rest of our code.
> Prompts are the art and science of input design used to guide AI models to generate **specific outputs**. By carefully designing and wording the input, the model's response direction and results can be effectively influenced and controlled, so that the AI generates the expected output.
We can directly look at the Structured Output Converter in the Spring AI documentation as an example:

The yellow parts in the figure are the two cores:
**Formatted Input Instructions**
Generally speaking, we need structured prompt templates to dynamically generate prompts, and use structured text to design input:
- Dynamic Prompt Template (PromptTemplate). It adopts a classic template engine to dynamically combine context, such as Jinja2 in LangChain, and StringTemplate in Spring AI. This method allows injecting context, user input, system status and other information at runtime to achieve flexible Prompt construction.
- Structured text structure. In order to ensure the reliability and parsability of AI output, the prompt needs to be structured, including role positioning (Role), task description (Task), constraints (Constraints), output format, etc.
- Example-driven. Providing example inputs (Few-shots) and expected outputs can significantly improve the stability and consistency of model output. For example, when implementing QA, implementation examples for different scenarios will be given.
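The three points above can be sketched as a small template helper. This is a minimal illustration, not any particular framework's API; all names here (`build_prompt`, the slot names) are made up for the example:

```python
# A minimal structured prompt template: role, task, constraints,
# output format, and few-shot examples are separate, reusable slots.
PROMPT_TEMPLATE = """\
# Role
{role}

# Task
{task}

# Constraints
{constraints}

# Output Format
Return a JSON object with the fields: {output_fields}

# Examples
{examples}

# Input
{user_input}
"""

def build_prompt(role, task, constraints, output_fields, examples, user_input):
    """Fill the template at runtime with context and user input."""
    return PROMPT_TEMPLATE.format(
        role=role,
        task=task,
        constraints="\n".join(f"- {c}" for c in constraints),
        output_fields=", ".join(output_fields),
        examples="\n".join(examples),
        user_input=user_input,
    )

prompt = build_prompt(
    role="You are a customer-support triage assistant.",
    task="Classify the user's message into a support category.",
    constraints=["Answer only with the JSON object", "Do not invent categories"],
    output_fields=["category", "confidence"],
    examples=['Input: "I cannot log in" -> {"category": "account", "confidence": 0.9}'],
    user_input="My invoice amount looks wrong.",
)
```

The same structure is what template engines like Jinja2 or StringTemplate give you, with the added benefit of escaping and partial templates.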
**Convert Model Output Results**
That is, adopt an appropriate output format for each scenario, and implement the corresponding parsing and **exception handling**.
- Domain-specific output formats. We choose among JSON, XML, YAML, or Markdown based on the scenario, picking whichever renders best for the user. For example: JSON can be directly serialized and transmitted, but its real-time (streaming) rendering experience is poor and fragile; YAML handles streaming better and has lower transmission cost.
- Parsing implementation. Parse code blocks from plain text, and then perform deserialization and object mapping. Use Schema validation (JSON Schema, XSD) to ensure that the model output field types and structures meet the conventions.
- Exception handling. Due to the uncertainty of model generation, the output may have missing, incorrect types, or non-compliant formats. For example: when fields are missing, use default values or fallback strategies, which can trigger the model to retry generating specific fields.
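A minimal sketch of such parsing with exception handling, assuming the model wraps JSON in a fenced code block; `parse_model_output` and its defaults are illustrative, not a library API:

```python
import json
import re

def parse_model_output(raw: str, required: dict) -> dict:
    """Extract a JSON object from model output, validate required fields,
    and fall back to defaults when fields are missing or mistyped."""
    # Models often wrap JSON in prose or a fenced code block; find the object.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        return dict(required)  # nothing parsable: return all defaults
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return dict(required)
    result = {}
    for field, default in required.items():
        value = data.get(field, default)
        # Wrong type -> fall back to the default value (or trigger a retry).
        result[field] = value if isinstance(value, type(default)) else default
    return result

raw = 'Here is the result:\n```json\n{"category": "billing", "confidence": "high"}\n```'
parsed = parse_model_output(raw, {"category": "unknown", "confidence": 0.0})
# "confidence" came back as a string, so it falls back to the 0.0 default
```

In production you would validate against a full JSON Schema and retry the model for the specific missing fields rather than silently defaulting.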
When capability and data allow, you can also fine-tune the model on existing examples to improve its structured-output ability.
### Prompt Routing and Distribution Tasks
In complex AI systems, especially in scenarios where multiple Agents or multiple modules collaborate, a single prompt often cannot complete all tasks. So we need prompt routing:

> Prompt Routing is an engineering pattern in multi-task, multi-Agent, or complex AI processes that splits tasks, analyzes input, and intelligently assigns them to the most appropriate model or sub-task prompt.
The core idea is to dynamically determine the information processing path, which prompt to use, or which tool or sub-Agent to call by analyzing the input and context, so as to achieve non-linear, conditional task execution. Take a typical QA scenario as an example:
- Non-system related questions → Directly tell the user that this type of question is not supported
- Basic knowledge questions → Call document retrieval and QA model
- Complex analysis questions → Call data analysis tools and then generate a summary
- …
Through prompt routing, the system can intelligently select the most appropriate processing path for each question type while remaining modular and scalable. Some AI frameworks provide this capability, such as RouterChain in LangChain, along with approaches like [Routing by semantic similarity](https://python.langchain.com/docs/how_to/routing/#routing-by-semantic-similarity).
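A toy sketch of the routing idea; a real router would classify with an LLM or embedding similarity (as in LangChain's semantic-similarity routing) rather than the keyword rules used here, and the pipeline names are hypothetical:

```python
def classify_question(question: str) -> str:
    """Toy intent classifier; a real system would use an LLM call or
    embedding similarity instead of keyword rules."""
    q = question.lower()
    if any(k in q for k in ("analyze", "trend", "compare")):
        return "complex_analysis"
    if any(k in q for k in ("how", "what", "where")):
        return "basic_knowledge"
    return "unsupported"

# Each route maps an intent to a handler (a stand-in for a sub-prompt/pipeline).
ROUTES = {
    "basic_knowledge": lambda q: f"[doc-QA pipeline] {q}",
    "complex_analysis": lambda q: f"[analysis + summary pipeline] {q}",
    "unsupported": lambda q: "Sorry, this type of question is not supported.",
}

def route(question: str) -> str:
    return ROUTES[classify_question(question)](question)
```

The value of the pattern is that each branch owns its own prompt and tools, so branches can be added or replaced without touching the others.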
### Chain and Modular Design of Complex Problems
With prompt routing as a prerequisite, complex problems can be systematically broken down through **Prompt Chaining**. Prompt chains allow a large task to be broken down into multiple sub-tasks, each corresponding to different prompts or model calls, and the results are finally integrated. This method is more suitable for fixed processes, and some steps can be skipped.

This can achieve better modular design:
- Each sub-task focuses on processing specific stage tasks
- You can rewrite a sub-task as needed, adding or replacing prompts
- Dynamically adjust subsequent prompts based on the output of the previous stage
Taking common software requirements as an example, the ideas proposed by the product manager can be broken down into:

1. Idea collection: Collect product ideas and preliminary requirements
2. Requirement logic sorting: Sort out requirement logic and function priorities
3. Requirement pre-scheduling: Form a preliminary requirement document or task list
4. Requirement finalization: Confirm the final requirements and generate formal documents
Each link can be handled by a different prompt or sub-Agent. For example, idea collection can be done with an AI Agent that has search capabilities, and requirement logic sorting with tools such as Dify and Copilot 365. The links then execute as a chain, while the modular design stays flexible: sub-tasks can be adjusted or replaced at any time as needed.
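The four-stage chain above can be sketched as a pipeline of functions, each standing in for one prompt/model call; all function names here are hypothetical:

```python
# Each stage stands in for one prompt / model call; in practice the body
# of each function would invoke an LLM with a stage-specific prompt.
def collect_ideas(raw_input: str) -> dict:
    return {"ideas": [line.strip() for line in raw_input.splitlines() if line.strip()]}

def sort_requirements(state: dict) -> dict:
    state["prioritized"] = sorted(state["ideas"])  # placeholder for LLM-based ranking
    return state

def pre_schedule(state: dict) -> dict:
    state["tasks"] = [{"title": idea, "status": "draft"} for idea in state["prioritized"]]
    return state

def finalize(state: dict) -> dict:
    state["document"] = "\n".join(f"- {t['title']}" for t in state["tasks"])
    return state

PIPELINE = [collect_ideas, sort_requirements, pre_schedule, finalize]

def run_chain(raw_input: str, stages=PIPELINE) -> dict:
    """Run stages in order; a stage can be replaced or skipped as needed."""
    state = raw_input
    for stage in stages:
        state = stage(state)
    return state

result = run_chain("dark mode\nexport to PDF")
```

Because each stage only reads and writes the shared state, swapping one stage for a different prompt (or skipping it) does not affect the rest of the chain.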
## Context Engineering and Knowledge Retrieval
Generally speaking, there are NoCode and ProCode approaches to building context for Agent development:
- NoCode solution (suitable for rapid verification): Use low-code platforms (such as Dify, N8N, Coze, etc.) and pre-configured RAG pipelines to quickly configure retrieval strategies through the UI.
- ProCode solution (suitable for customized needs): Use frameworks (LangChain, Spring AI) to customize retrieval processes and optimization strategies, which can realize multi-stage HyDE + hybrid retrieval + reordering pipelines.

The context itself is also part of the prompt. Before automation, we usually copied it manually from documents into an AI chat tool; but as we go deeper with models, we need to think about automated construction, that is, to approach the problem from an engineering perspective. Before we begin, we again need a definition. Here we can quote Anthropic's "Effective context engineering for AI agents" (which also calls it both science and art):
> Context engineering is the art and science of carefully selecting the most relevant content from the ever-changing universe of information and placing it in a limited context window.
### Context Window
Simply put: within a limited context window, focus on selecting the most critical information so that model understanding and reasoning are as efficient as possible. Below are six common context-engineering techniques summarized by [Drew Breunig](https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html), as [illustrated by LangChain](https://github.com/langchain-ai/how_to_fix_your_context):

Here, I will briefly describe it as: RAG plus the engineering of the context window. A complete context window (i.e., the full prompt) should usually include:
- System prompt section:
  - Input instruction context: tells the model "who you are" and "what to do", including system prompts, user input, and role definitions.
  - Formatted output context: specifies the structured pattern of the output, such as requiring JSON, to ensure the output is usable.
- Function call section:
  - Tool-related context: gives the model the ability to interact with the outside world, including the definitions of available tools or functions and the responses returned after calling them.
- Dynamic context section:
  - Time and memory context: Short-Term Memory and Long-Term Memory.
  - External knowledge context: facts retrieved through techniques such as Retrieval-Augmented Generation (RAG) from external repositories such as documents and databases, providing factual grounding and reducing hallucinations.
  - Global state/staging area: temporary storage the model uses while processing complex tasks; effectively its "working memory".
Beyond the fixed system prompt section, the acquisition of **external knowledge** and **memory** has the greatest impact on the whole window, so designing and optimizing these two is the top priority of context engineering.
### Knowledge Retrieval Augmented Generation

> RAG (Retrieval-Augmented Generation) is one of the core technologies for building Agents. It enhances the generation capabilities of large language models by retrieving relevant information from external knowledge bases. In complex scenarios such as code base question answering, simple vector retrieval is often not accurate enough, and multiple retrieval strategies need to be combined to improve accuracy.
Simply put, it is to enrich the context through search. According to the implementation complexity and scenario requirements, we can divide the retrieval strategies into the following categories:
- **Keyword Search**. The most basic retrieval method, suitable for exact-match scenarios. For example, when searching for specific function, class, or variable names in a code base, keyword retrieval is often more effective than semantic retrieval. Common implementations include:
  - **Full-text search**: search engines such as Elasticsearch and Solr, using algorithms like BM25 and TF-IDF.
  - **Regular expression matching**: tools such as ripgrep and grep. Cursor uses a hybrid of ripgrep + vector retrieval.
- **Semantic Search**. Understands the semantic meaning of the query through vector embeddings, rather than just literal matching. This is especially important for natural-language queries:
  - Use pre-trained embedding models (such as OpenAI text-embedding-3-large or Jina embeddings v3) to convert text into vectors.
  - Calculate the similarity between the query and documents in vector space (usually cosine similarity or dot product).
- **Graph-based Search**. Graph retrieval focuses not only on "content similarity" but also on relationships and contextual dependencies.
  - Code scenario: build call graphs and dependency graphs, and use the AST (abstract syntax tree) to extract structures such as methods, classes, and constructors.
  - Examples: Microsoft's [GraphRAG](https://github.com/microsoft/graphrag), Aider's repo map, or infrastructure such as Joern and CodeQL.
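As a minimal illustration of the semantic-search step, here is cosine similarity over toy three-dimensional "embeddings"; a real system would obtain high-dimensional vectors from an embedding model, and the documents and numbers below are made up:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot product over norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy 3-dimensional "embeddings"; a real system would call an embedding
# model (e.g. text-embedding-3-large) to produce vectors with thousands
# of dimensions.
documents = {
    "def parse_json(text): ...": [0.9, 0.1, 0.0],
    "class HttpClient: ...":     [0.1, 0.8, 0.2],
}
query_vector = [0.85, 0.2, 0.05]  # pretend embedding of "how do I parse JSON?"

best = max(documents, key=lambda d: cosine_similarity(query_vector, documents[d]))
```

The query never shares a keyword with the winning snippet; the match comes entirely from vector proximity, which is exactly what keyword search cannot do.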
Before retrieval, to make the results reliable, we need to introduce **Query Rewriting**: gradually transforming the user's vague intent into a precise query the database can execute efficiently. Rewriting the user's original query improves its relevance to the documents in the knowledge base and resolves the "impedance mismatch" between natural-language questions and stored data chunks.
#### RAG Example in Code Scenario
Generally speaking, multiple different retrieval strategies can be used in combination to improve the retrieval effect. The following is a [Codebase RAG implementation](https://blog.lancedb.com/rag-codebase-1/) officially given by the vector database LanceDB:

In addition to using TreeSitter to generate knowledge in the indexing stage, the [retrieval stage](https://blog.lancedb.com/building-rag-on-codebases-part-2/) will also use:
- HyDE (Hypothetical Document Embedding): First, let the model generate a "hypothetical" document or code snippet based on the query, and then use the generated content for vector search, which makes it easier to find semantically related code.
- BM25 (Keyword Search): A traditional keyword search algorithm, which is good at finding code containing precise terms or API names, and can also be used in conjunction with vector search.
- Hybrid Search: Combine BM25 and semantic search, which can accurately match keywords and understand code semantics, and obtain better results by adjusting the weights of the two.
- Re-ranking: After the vector search obtains preliminary results, use the cross-attention mechanism to reorder the results to improve the relevance and accuracy of the final answer.
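The hybrid-search step can be sketched as a weighted fusion of two normalized scores; the weight `alpha`, the score values, and the snippet names below are illustrative assumptions, not LanceDB's implementation:

```python
def hybrid_score(keyword_score: float, vector_score: float, alpha: float = 0.4) -> float:
    """Weighted fusion of a BM25-style keyword score and a semantic
    vector score, both assumed normalized to [0, 1]."""
    return alpha * keyword_score + (1 - alpha) * vector_score

# Hypothetical normalized scores per candidate code snippet.
candidates = {
    "snippet_a": {"keyword": 0.9, "vector": 0.3},   # exact API-name match
    "snippet_b": {"keyword": 0.2, "vector": 0.95},  # semantically close
}

ranked = sorted(
    candidates,
    key=lambda c: hybrid_score(candidates[c]["keyword"], candidates[c]["vector"]),
    reverse=True,
)
# A cross-encoder re-ranker would then re-score only the top-k of `ranked`.
```

Tuning `alpha` shifts the system between "precise term matching" and "semantic understanding"; re-ranking then fixes the residual ordering errors on the short list.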
Of course, in the indexing stage this example also generates **meta-feature data**: for each code element or snippet, it first generates a textual description of the code, then embeds that description to obtain the code's meta-features, with the features extracted by a fine-tuned LLM.
### Engineering of Context Window

Two years ago, the context system [GitHub Copilot](https://code.visualstudio.com/docs/copilot/chat/prompt-crafting) built for code completion was the one most worth studying in the industry, bar none:
- Continuous signal monitoring. The Copilot plug-in continuously monitors a series of signals from the IDE to dynamically adjust context priority: character insertions and deletions, changes to the current file and language, cursor movement, scroll-position changes, and files opening and closing.
- [Priority sorting](https://github.com/mengjian-github/copilot-analysis) of context sources. In the final prompt sent to the model, content is sorted and filtered by priority:
  - Highest priority: the code around the cursor, including the content before and after it; the most direct context.
  - High priority: the rest of the file currently being edited.
  - Medium priority: other files or tabs opened in the IDE (i.e., "neighboring files").
  - Auxiliary context: other information such as file paths, repository URLs, import statements, and code retrieved by RAG.
- Prompt assembly under context-length constraints. "Score" each piece of information according to the priorities above, then assemble an optimal prompt.
This can provide us with a very good reference:
- Freshness first. Recently edited or accessed content gets higher priority, and the weight of outdated content gradually decays.
- Signal fusion and dynamic scoring. Fuse multiple editing signals (such as cursor movement, file switching, import changes, etc.) to dynamically adjust context weights.
- Sliding window and incremental update. Adopt a sliding window mechanism to only incrementally update the changed parts to avoid full reconstruction.
- Budget awareness and automatic truncation. Real-time estimate of token usage, automatically crop or summarize low-priority content when approaching the limit.
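These points can be condensed into a budget-aware assembly sketch: greedy selection by priority under a rough token estimate. This illustrates the pattern only; it is not Copilot's actual implementation, and the 4-characters-per-token estimate is a common rule of thumb, not exact:

```python
def assemble_context(items, token_budget: int) -> list:
    """Greedy, priority-ordered assembly: highest-scored items first,
    dropping anything that would exceed the token budget."""
    chosen, used = [], 0
    for item in sorted(items, key=lambda i: i["priority"], reverse=True):
        cost = len(item["text"]) // 4  # rough token estimate: ~4 chars/token
        if used + cost <= token_budget:
            chosen.append(item["text"])
            used += cost
    return chosen

items = [
    {"text": "code around the cursor " * 10, "priority": 3},
    {"text": "rest of the current file " * 40, "priority": 2},
    {"text": "content of a neighboring tab " * 40, "priority": 1},
]
context = assemble_context(items, token_budget=120)
# Only the highest-priority item fits the budget; the rest are dropped.
```

A production system would summarize rather than drop low-priority items, and would update scores incrementally as editing signals arrive.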
Of course, this is a very complex design, worth adopting only in systems with sufficiently high value. Combined with the currently popular Cursor Rules/Specs, you can use AGENTS.md to persist key cross-session information in a memory system, providing long-term background for subsequent queries.
### Agentic Retrieval

> Agentic refers to a characteristic that enables AI systems to have autonomous perception, dynamic decision-making, and goal-oriented execution capabilities, enabling them to actively optimize context, generate retrieval strategies, and continuously self-iterate during the task process.
In AI coding tools such as Cursor and Claude Code, we can observe the running process, which is essentially an Agent executing RAG. Compared with plain RAG, this makes it easier to obtain rich context and helps ensure nothing is missing throughout the process. Some relatively mature AI applications illustrate this:
- Cursor prefers direct code retrieval via `file + ripgrep`; when the results are insufficient, it falls back to vector retrieval, Git history, and other related searches.
- Google Deep Research completes a research run through a similar process: identify the mainstream context-engineering tools, build a preliminary understanding of their functions and differences, and then plan the next step of exploring tool details.
Simply put, for complex retrieval we can build the retrieval itself into an Agent: the Agent decides which retrieval tools and strategies to use, and when the context is insufficient, it keeps calling the tools with new parameters until it has enough.
#### DeepResearch Example
The following is an example of the [Open DeepResearch](https://github.com/langchain-ai/open_deep_research) process built by Langchain AI:

The Deep Research Agent demonstrates a more systematic Agentic retrieval method:
1. Divide the task into a planning stage (Manager Agent) and an execution stage (Execution Agent)
- Manager Agent is responsible for task understanding, sub-task decomposition, and retrieval strategy design
- Execution Agent is responsible for actual search, web page or document crawling, and content parsing
2. During the retrieval process, the Agent will maintain the status of the topic structure, covered sub-problems, and information gaps to determine the next exploration direction
3. User review (HITL mode) can be inserted at key stages to enhance control and accuracy
4. Finally, the Agent will integrate the collected fragmented information into a structured report with source citations
Usually, observing their interaction and thinking process helps us understand this better. On this basis, Agentic Context Engineering goes further by letting the LLM autonomously generate, organize, and iterate on context, enabling intelligent, scalable context management and optimizing retrieval and reasoning for complex tasks.

That is, it optimizes how to retrieve based on conversation history and accumulated experience, making the Agent better fit the scenario.
## Engineering Design of Agent Tool Systems
In the process of building an Agent, the design of the Tool System is the most engineering-oriented aspect. It determines what the Agent can do, how well it can do it, and whether it can efficiently collaborate with the external world.
A tool can be any API, such as data query (e.g., database access), real-world operations (e.g., sending emails, booking meetings), or interfaces for collaborating with other services. As we mentioned earlier, RAG is also a type of tool in Agentic, and LlamaIndex provides this explicit encapsulation:
- FunctionTool: Can easily encapsulate any Python function into a tool available to the agent.
- QueryEngineTool: Can convert any data query engine (e.g., a vector index) into a tool, enabling the agent to query and reason on it.
This data-centric approach can simplify our understanding of tools.
### Semantic Tools: Function Interfaces Designed for Agents
**Tools** are essentially a class of semantically understandable function interfaces. They not only contain logical execution capabilities but also carry meta-information that allows the model to understand:
- Name: The unique identifier of the tool, usually the function name, such as `getWeather`.
- Description: A natural language description of the tool's function, purpose, and applicable scenarios. This is a crucial aspect because the model mainly relies on this description to determine when and how to use the tool.
- Parameters: An object defining the tool's input parameters, including the name, data type (e.g., string, number), description, and whether each parameter is required.
In terms of execution mechanisms, the two common paradigms are:
- ReAct Framework (Reasoning + Acting): The core of the ReAct paradigm is to have the LLM alternately generate "thoughts" (reasoning traces) and "actions" (tool calls), forming an explicit think-act-observe loop.
- Direct Function Calling: This is a more structured approach. The LLM determines in a single step of reasoning that the user's query is best answered by calling one or more predefined functions. It then outputs a structured JSON object clearly indicating the function name to call and its parameters.
We need to decide which method to use based on the model's support and the designed interaction and intent.
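As an illustration of direct function calling, here is a tool definition in the widely used OpenAI-style JSON-schema shape (other providers use slightly different field names), plus a tiny dispatcher for the structured call the model emits; the `getWeather` tool and its registry are hypothetical:

```python
# Tool metadata in the widely used OpenAI-style function-calling shape;
# other providers (Anthropic, Gemini) differ slightly in field names.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "getWeather",
        "description": "Get the current weather for a city. "
                       "Use this whenever the user asks about weather conditions.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Berlin'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"],
                         "description": "Temperature unit"},
            },
            "required": ["city"],
        },
    },
}

def dispatch(tool_call: dict, registry: dict):
    """Execute the structured tool call the model emitted."""
    return registry[tool_call["name"]](**tool_call["arguments"])

# Stub implementation standing in for a real weather API.
registry = {"getWeather": lambda city, unit="celsius": f"18 {unit} in {city}"}
result = dispatch({"name": "getWeather", "arguments": {"city": "Berlin"}}, registry)
```

Note how the `description` fields do double duty as the "micro-prompt" the model reads when deciding whether and how to call the tool.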
### Design Principles of Tools
Generally speaking, when building a Coding Agent, we follow the following principles:
- Semantic Clarity: The name, description, and parameter names of the tool must be extremely clear, descriptive, and unambiguous to the LLM. The tool's description field should be treated as an AI-oriented "micro-prompt" and written carefully.
- Stateless **Objective** Functions: Only encapsulate complex technical logic or domain knowledge, and avoid making strategic or subjective decisions.
- Atomicity and Single Responsibility: Each tool should only be responsible for one and only one clearly defined function, that is, performing an atomic operation. If the Agent is used as a tool, it should also follow similar principles and only complete one thing.
- Minimum Permissions: Each tool should only be granted the minimum permissions and capabilities necessary to complete its clearly defined task.
#### Workflow-Based Tool Orchestration: Task Chain Design
These principles apply equally to AI Agents outside programming. Based on them, we can decompose "Plan my trip to Beijing next week" into a set of discrete, single-responsibility tools:
- search_flights(origin: str, destination: str, outbound_date: str, return_date: str): Search for flight information.
- search_hotels(location: str, check_in_date: str, check_out_date: str, adults: int): Search for hotel information.
- get_local_events(query: str, date: str): Get local events or attractions information for a specific date.
- book_cruise(passenger_name: str, itinerary_id: str): Book a cruise itinerary.
- lookup_vacation_packages(query: str): Query vacation packages
The key features of this orchestration method are strong predictability, clear logic, and ease of modeling as a visual process (such as a DAG) on a platform. It is well suited to Agents with stable processes and dependent tasks (such as travel, customer service, and data-pipeline scenarios).
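A minimal sketch of such a fixed task chain, with stub implementations standing in for the real flight/hotel/event APIs behind the tools listed above:

```python
# Stub implementations of the tools above; real versions would call
# flight/hotel/event APIs. The chain mirrors a simple linear DAG.
def search_flights(origin, destination, outbound_date, return_date):
    return {"flight": f"{origin}->{destination} on {outbound_date}"}

def search_hotels(location, check_in_date, check_out_date, adults):
    return {"hotel": f"{location} hotel for {adults} adult(s)"}

def get_local_events(query, date):
    return {"events": [f"{query} event on {date}"]}

def plan_trip(origin: str, destination: str, start: str, end: str) -> dict:
    """Fixed orchestration: each step's output merges into the shared plan."""
    plan = {}
    plan.update(search_flights(origin, destination, start, end))
    plan.update(search_hotels(destination, start, end, adults=1))
    plan.update(get_local_events("sightseeing", start))
    return plan

trip = plan_trip("Shanghai", "Beijing", "2025-11-03", "2025-11-07")
```

Because each tool is atomic and single-responsibility, the orchestrator (or an LLM planner) can reorder, parallelize, or skip steps without changing any tool.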
#### Classification-Based Tool Calling: Dynamic Intent Decision
The Copilot orchestrator introduced in [Understanding GitHub Copilot’s Internal Architecture](https://iitsnl.com/blog/understanding-github-copilots-internal-architecture/) decides, based on an "Intent Classifier" analysis of the user request, to call one or more internal tools to complete the task:
- File operations: including `read_file` (read file), `edit_file` (edit file), and `create_file` (create new file), so that Copilot can interact directly with the user's code base.
- Code execution: Through the `run_in_terminal` tool, Copilot can execute commands in the user's terminal, such as running tests or building scripts.
- Search and analysis: one of the most critical tool sets, including traditional `grep_search` (text search), `list_code_usages` (list code references), and the most powerful, `semantic_search` **(semantic search)**.
The key features of this model are: high flexibility and strong scalability, but it relies on a good classification system and semantic matching capabilities. It is more suitable for dynamic scenarios, such as code generation, debugging, and document Q&A.
### Using the MCP Protocol to Build a Composable Tool Network
When the number of tools and Agents continues to grow, we need a mechanism to standardize the description, dynamically register, and cross-Agent call tools.
MCP (Model Context Protocol) is a universal protocol layer designed for this purpose. Through MCP, AI Agents no longer rely on hard-coded interfaces or specific systems; they can call tools, fetch data, and collaborate with other Agents in a unified format. The core value of MCP lies in standardization, dynamism, and composability:
- Standardization: a unified tool-call format, so different Agents can share tool sets.
- Dynamism: tools can be registered and accessed at runtime, and Agents can choose the most suitable ones for the task.
- Composability: different Agents and tools can be combined like building blocks, enabling the decomposition and collaborative execution of complex tasks.
Combined with the atomic tool function designed earlier, MCP can integrate these tools into a reusable and collaborative tool network, making Agents more flexible and efficient in solving complex problems.
#### Other Tool Networks
In addition, the emergence of GitHub Copilot Extensions and Claude Code Plugins suggests that even with protocols such as MCP and A2A, the AI Agent ecosystem will not be as unified as we expected. For example, the https://github.com/wshobson/agents project describes itself as (2025-10-14):
> A comprehensive production-oriented system consisting of 84 dedicated AI Agents, 15 multi-Agent workflow orchestrators, and 44 development tools, organized into 62 focused, single-responsibility plugins for Claude Code.
## Agent Planning and Transcending Monolithic Agents
> Agents are software systems that use AI to achieve goals and complete tasks on behalf of users. They exhibit reasoning, planning, and memory, have a degree of autonomy, and can learn, adapt, and make decisions on their own. - Google Cloud
Agents are goal-oriented. To achieve goals, they usually need to **perceive**, **plan**, and **act**, and they need memory. Complex AI Agent systems also include capabilities such as **collaboration** and **self-improvement**. The previous sections covered several basic capabilities:
- Through **structured prompts and prompt chains**, the Agent has a thinking structure for planning and decision-making;
- Through **context engineering**, the Agent has the ability to "perceive the world" and can capture information from external knowledge and the environment;
- Through the engineering design of the **tool system**, the Agent has the ability to interact with the external world and perform tasks.
Based on this, the further development direction of Agent lies in:
- **Collaboration** —— Multiple Agents collaborate through A2A (Agent-to-Agent) communication protocols or task allocation mechanisms to achieve role division and information sharing;
- **Self-improvement** —— The Agent accumulates experience through the memory system and reflection mechanism, optimizes its own prompt words and planning strategies, so as to have continuous learning and self-evolution capabilities.
And because this is a rapidly developing field, the concrete techniques behind both directions are still changing quickly.
### Modular System Prompts: Agent's Thinking Blueprint
The first step in building an effective Agent is to define its "thinking blueprint": the System Prompt. A good system prompt defines not only what the Agent should do but also what it should not do. In the Coding Agent field, system prompts are often extremely complex; for example, [Cursor](https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools/blob/main/Cursor%20Prompts/Agent%20Prompt%202025-09-03.txt)'s system prompt contains detailed specifications for roles, tool calls, security boundaries, task planning, and more.
Combining tools such as Cursor, Claude Code, Augment, and Junie, we can summarize a series of modular design practices:
- **Structural Layering and Modularization**: Organize prompts with clear layers (role/communication/tool/security/task) to avoid "unified" text, which is convenient for maintenance and dynamic loading.
- **Tool Priority and Parallelization**: Prioritize dedicated tools and parallelize them if possible, which significantly reduces latency and costs (such as calling `read_file` in parallel to read multiple files, and using `search_replace` instead of sed for editing).
- **Security Boundaries and Permission Model**: Default to a minimum-permission sandbox; dangerous operations require explicit authorization (such as `required_permissions: ["network"|"git_write"|"all"]`), and high-risk actions such as force-pushing to `main/master` are prohibited.
- **Task Management Minimum Sufficiency**: Use TODO management for multi-step complex tasks (mark the first item in_progress right after creating the list, and mark items completed as soon as they finish); execute simple, direct tasks immediately.
- **Context Uniqueness and Safe Modification**: Code editing requires a uniquely locatable context (`old_string` is unique in the file, about 3–5 lines before and after), and multiple modifications are performed in stages to avoid accidental modifications.
- **Communication Specifications and User Experience**: Hide internal tool names; use a natural-language "say, do, summarize" pattern, keeping output concise and scannable; mark file/function names with backticks, and provide a minimal working example when necessary.
This design, which evolves from monolithic prompts to modular, hierarchical, and dynamic prompts, is like the transformation from monolithic applications to microservice architectures, providing structural support for Agent's advanced reasoning, system scalability, and maintainability.
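The layering practice can be sketched as assembling the system prompt from independently maintained modules; the module texts below are condensed paraphrases of the practices above, not any tool's actual prompt:

```python
# Each module is maintained separately and can be loaded dynamically;
# the final system prompt is assembled from whichever layers apply.
PROMPT_MODULES = {
    "role": "You are a coding agent working inside the user's repository.",
    "communication": "Hide internal machinery; keep replies concise and scannable.",
    "tooling": "Prefer dedicated tools; read multiple files in parallel when possible.",
    "security": "Run in a sandbox with minimum permissions; never force-push to main.",
    "tasks": "Use a TODO list for multi-step work; execute trivial tasks directly.",
}

def build_system_prompt(modules: dict, enabled: list) -> str:
    """Concatenate the enabled layers into one structured system prompt."""
    return "\n\n".join(f"## {name}\n{modules[name]}" for name in enabled)

# A lightweight session might drop the task-management layer entirely.
prompt = build_system_prompt(PROMPT_MODULES, ["role", "communication", "security"])
```

This is the "microservice" benefit in miniature: each layer can be versioned, tested, and swapped without rewriting the whole prompt.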
### From Retrieval to Planning: Use Prompt to Let Agent Decompose Goals
It is not enough to simply tell the Agent to "make a plan". We must guide its decomposition process with a clear set of principles, just as we write specifications for software modules.
The intelligence ceiling of a monolithic Agent often depends on its planning ability: whether it can decompose vague goals into clear, executable subtasks.
This involves two core strategies:
- **Pre-decomposition**: This strategy is also known as static planning. It completely decomposes the entire complex task into a sequence of subtasks or plans before the task execution begins.
- **Interleaved Decomposition**: This strategy is also known as dynamic planning. It does not formulate a complete plan at the beginning of the task, but dynamically determines the next subtask during the execution process.
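The contrast between the two strategies can be sketched schematically. Here `plan`, `decide_next`, and `execute` are hypothetical stand-ins for LLM calls; the point is only where the planning decision happens.

```python
def run_static(goal, plan, execute):
    """Pre-decomposition: the full plan is fixed before execution starts."""
    results = []
    for subtask in plan(goal):           # one planning call up front
        results.append(execute(subtask))
    return results

def run_dynamic(goal, decide_next, execute):
    """Interleaved decomposition: each next step is chosen from the latest state."""
    results, state = [], {"goal": goal, "done": []}
    while (subtask := decide_next(state)) is not None:  # re-plan every step
        results.append(execute(subtask))
        state["done"].append(subtask)
    return results
```

Static planning is cheaper and easier to audit; dynamic planning can react to intermediate results (e.g., a failed test changing the next step) at the cost of more LLM calls.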
For example, the architecture of BabyAGI embodies this "task-driven" planning. It contains three core Agents, `task_creation_agent` (task generation), `execution_agent` (task execution), and `prioritization_agent` (task prioritization), which together form a continuously cycling system of task creation, prioritization, and execution.
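That cycle can be condensed into a few lines. This is a heavily simplified sketch of BabyAGI's loop, not its actual code; the three agent functions are stubs standing in for LLM calls.

```python
from collections import deque

def babyagi_loop(objective, execution_agent, task_creation_agent,
                 prioritization_agent, max_iterations=5):
    """Simplified task-driven loop: execute, create new tasks, reprioritize."""
    tasks = deque(["Make an initial plan"])
    results = []
    for _ in range(max_iterations):
        if not tasks:
            break
        task = tasks.popleft()                      # take the highest-priority task
        result = execution_agent(objective, task)   # execute it
        results.append(result)
        new = task_creation_agent(objective, result, list(tasks))
        tasks = deque(prioritization_agent(list(tasks) + new))
    return results
```

The `max_iterations` cap matters in practice: without it, a task-creation agent that keeps generating work produces an unbounded loop.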
In modern systems (such as Augment and Claude Code), the planning logic is often embedded in the system prompt as a `todo_spec`, with the following characteristics:
- **Atomic and Action-Oriented**: Each to-do item should be an independent, indivisible "atomic" task.
- **Meaningful Level of Abstraction**: To-do items should not be trivial operational actions (such as "read file a.txt" or "fix linter errors") but high-level, meaningful, non-trivial tasks.
- **Appropriate Scope**: The specification favors "fewer, larger to-do items" over a lengthy list of tiny steps.
- **Implementation-Centric**: If the user's request is to implement a feature, then the to-do list the Agent generates is the final plan.
Through this structured planning, the Agent can transform "user needs" into "system plans", laying a semantic interface for multi-Agent collaboration.
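To make the abstraction-level rule concrete, here is a hypothetical contrast for the request "add CSV export to the report page"; the item texts and fields are illustrative, not any tool's actual `todo_spec` format.

```python
# Too fine-grained: trivial operational steps that a todo_spec would reject.
bad_plan = [
    "read report.py",
    "fix linter errors",
    "save the file",
]

# Fewer, larger, implementation-centric items; the first is in_progress
# immediately after the list is created.
good_plan = [
    {"id": 1, "title": "Add a CSV serializer for report rows", "state": "in_progress"},
    {"id": 2, "title": "Expose an export endpoint and wire it to the UI", "state": "pending"},
]
```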
### Multi-Agent Collaboration System: From Individuals to Organizations

A monolithic Agent's capabilities are limited, and the Multi-Agent System (MAS) is the natural engineering direction for scaling beyond that limit.
Just as microservices achieve high cohesion and low coupling by breaking up monolithic applications, a multi-Agent system achieves horizontal scaling of intelligence by splitting responsibilities across Agents.
Letting multiple Agents collaborate toward a more complex goal is analogous to how a "team" collaborates in software development.
Common collaboration topologies (refer to [LangGraph](https://langchain-ai.github.io/langgraph/concepts/multi_agent/), AutoGen, etc.):
- Supervisor-Expert Mode (Hierarchical Structure): A "Supervisor Agent" or "Coordinator Agent" is responsible for receiving high-level user goals, decomposing them into a series of subtasks, and then assigning them to the corresponding "Expert Agent" according to the nature of each subtask.
- Parallel Mode (Swarm Intelligence): Also known as "concurrent mode" or "swarm mode". Multiple Agents execute the same task or different parts of the task simultaneously and independently, and then synthesize their outputs.
- Sequential Mode (Pipeline): Agents work in a predefined order like on a pipeline. The output of the previous Agent becomes the input of the next Agent.
- Network Mode (Conversational/Dynamic Mode): Agents can communicate freely in a many-to-many network without a fixed hierarchical structure. The next acting Agent is usually dynamically determined according to the flow of the conversation.
The choice of multi-Agent topology directly reflects the underlying structure of the problem to be solved. The architecture is not arbitrarily chosen, but attempts to create a "cognitive model" that can mirror the problem dependency graph.
Of course, a multi-Agent system also inevitably runs into problems analogous to the complexity of microservice architectures.
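As an illustration, the supervisor-expert topology can be sketched as below. The decomposition and routing here are hard-coded stand-ins for what would be LLM calls in a real system (such as a LangGraph supervisor graph); the expert names and outputs are hypothetical.

```python
# Hypothetical expert agents; each would be its own LLM-backed agent in practice.
EXPERTS = {
    "code": lambda task: f"[coder] patched: {task}",
    "docs": lambda task: f"[writer] documented: {task}",
}

def supervisor(goal: str) -> list[str]:
    """Decompose the goal, route each subtask to an expert, collect results."""
    subtasks = [                       # a real supervisor would derive these via an LLM
        ("code", f"implement {goal}"),
        ("docs", f"describe {goal}"),
    ]
    return [EXPERTS[kind](task) for kind, task in subtasks]

outputs = supervisor("rate limiting")
```

The other topologies differ only in the control flow around the same expert calls: sequential mode feeds each output into the next expert, while parallel mode fans the same task out to all experts and merges the results.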
#### A2A Protocol: Building an Agent Network to Accelerate Intelligent Capability Sharing
A2A (Agent2Agent) is designed specifically for Agent-to-Agent communication and complements standards such as the Model Context Protocol (MCP), which handles Agent-to-Tool communication. It plays the role of a public internet protocol, allowing different Agent systems to connect and interoperate.
That said, we do not always need to introduce a full A2A architecture. For example, in AutoDev we expose A2A-protocol Agents to the main Agent in the form of MCP tools, realizing Agent-to-Agent collaboration without adding system complexity.
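The wrapping idea can be sketched conceptually: a remote A2A agent is described to the main Agent as just another tool. The function names, tool schema, and stubbed transport below are hypothetical illustrations, not AutoDev's actual implementation.

```python
def send_a2a_task(agent_url: str, message: str) -> str:
    # In a real system this would send an A2A task to the remote agent and
    # await its result; stubbed here for illustration.
    return f"(reply from {agent_url}) handled: {message}"

def make_mcp_tool(agent_name: str, agent_url: str) -> dict:
    """Describe the remote agent as an ordinary tool the main Agent can call."""
    return {
        "name": f"ask_{agent_name}",
        "description": f"Delegate a task to the remote {agent_name} agent.",
        "handler": lambda message: send_a2a_task(agent_url, message),
    }

tool = make_mcp_tool("reviewer", "https://agents.example.com/reviewer")
reply = tool["handler"]("review this diff")
```

From the main Agent's perspective, delegating to another Agent is then indistinguishable from calling `read_file`: same tool-call interface, no new protocol to reason about.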
#### Self-Improvement: Reflection, Memory, and Evaluation Closed Loop
The real power of an evolving Agent comes from the close integration of reflection loops and persistent memory systems.
- Reflection Mechanism: The Agent reviews its own output, identifies errors and generates improvement suggestions;
- Memory Storage: Persist task experience and context (such as `AGENTS.md`, Knowledge Graph) to provide long-term reference for subsequent tasks.
For memory, there should be a mechanism for weighted retrieval of memories based on recency, relevance, and importance, as well as a reflective memory management system that can autonomously decide what to remember, what to forget, and how to organize information.
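Weighted retrieval of this kind can be sketched as a scoring function over stored memories; the exponential decay constant, equal weights, and word-overlap similarity below are illustrative choices (a real system would typically use embedding similarity), not a fixed specification.

```python
import math
import time

def score(memory, query_terms, now,
          w_recency=1.0, w_relevance=1.0, w_importance=1.0):
    """Combine recency, relevance, and importance into one retrieval score."""
    hours_old = (now - memory["created_at"]) / 3600
    recency = math.exp(-0.1 * hours_old)           # exponential time decay
    terms = set(memory["text"].lower().split())
    relevance = len(terms & query_terms) / max(len(query_terms), 1)
    importance = memory["importance"] / 10          # agent-assigned, 1-10
    return w_recency * recency + w_relevance * relevance + w_importance * importance

def retrieve(memories, query, k=2):
    """Return the k memories with the highest combined score for the query."""
    now = time.time()
    terms = set(query.lower().split())
    return sorted(memories, key=lambda m: score(m, terms, now), reverse=True)[:k]
```

Tuning the three weights changes the Agent's character: a high recency weight makes it forget quickly, while a high importance weight makes it cling to pivotal events regardless of age.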
> The ultimate goal of an advanced Agent architecture is to create a self-reinforcing flywheel: actions generate experience, reflection refines experience into knowledge, and memory stores knowledge to improve future actions. This transforms the Agent from a static program into a dynamic learning entity.
## Summary
> The System Prompt's role in an Agent system is far more than a simple set of instructions; it is effectively the Agent's core "operating system", and prompts and context engineering deserve to be treated with the rigor of system architecture design.
Using markup languages such as Markdown or XML to build structured instruction modules can significantly improve the LLM's ability to understand and follow complex rules.
Through explicit role activation, detailed behavior specifications, and context engineering techniques such as just-in-time loading of data, developers can create a stable and predictable "cognitive environment" for the Agent,
thereby guiding its behavior onto the desired track. Excellent context engineering is the foundation of reliable Agent behavior.
Related resources:
- Structured Output Converter (Spring AI): https://docs.spring.io/spring-ai/reference/api/structured-output-converter.html
- Agentic Design Patterns: https://docs.google.com/document/d/1rsaK53T3Lg5KoGwvf8ukOUvbELRtH-V0LnOIFDxBryE/edit?tab=t.0
- Agentic Context Engineering: https://www.arxiv.org/pdf/2510.04618
- A Survey on Large Language Model based Autonomous Agents: https://arxiv.org/pdf/2308.11432
- [Effective context engineering for AI agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
- [AGENTIC RETRIEVAL-AUGMENTED GENERATION: A SURVEY ON AGENTIC RAG](https://arxiv.org/pdf/2501.09136)
- [How to build reliable AI workflows with agentic primitives and context engineering](https://github.blog/ai-and-ml/github-copilot/how-to-build-reliable-ai-workflows-with-agentic-primitives-and-context-engineering/)