# Overview of Agent Architecture: Building AI Agents from Prompt to Context Engineering
Over the past two to three years, I have run Agent development training for senior developers at several companies; over the last month, I have also been designing AI Agent training for graduates. It wasn't until this week, after completing a showcase project that integrates AI capabilities, that I truly worked out how to systematically construct a learning path for Agents tailored to developers at different stages. The process also made me acutely aware of the "curse of knowledge": what I take for granted may be the biggest obstacle for beginners.
We can simply divide the learning process into four parts:
- Structured Prompt Engineering — How to engineer efficient and reusable prompts.
- Context Engineering and Knowledge Retrieval — Retrieving, generating, and compressing contextual information to produce high-quality knowledge backgrounds.
- Systematic Design of Tool Functions — Designing and implementing tools and interfaces that can be called by the Agent.
- Agent Planning and Multi-Agent — Constructing task planning and execution paths to achieve closed-loop automation.
Before we begin, we should briefly define AI Agents. Considering that there are numerous definitions of Agents, we can refer to the real-world examples provided by Anthropic in "Building effective agents":
> Some customers define agents as fully autonomous systems that operate independently over long periods, using various tools to complete complex tasks; others use the term to describe systems that follow predefined workflows.
Therefore, a simple prompt-centered task can be viewed as an AI Agent, and so can a complex system involving multiple tools and multiple steps.
## Structured Prompt Engineering
Although Context Engineering is a very popular term, how to write effective Prompts remains a key focus for us to get started. There is already a wealth of content related to prompts available online, but from my experience, we can focus on three main areas:
- Structuring the input and output of prompts
- Chain and modular design for complex problems
- Prompt routing and distribution tasks
With the addition of some necessary AI frameworks or tools, we can accomplish our tasks quite effectively.
### Structured Input and Output of Prompts
In current Agent development, although the model can help generate prompts, tuning them remains a central part of the work. We want the model's output to be structured as JSON, XML, or typed classes (e.g., Java) so it can be integrated with other code.
> Prompts are the art and science of designing inputs to guide AI models to generate **specific outputs**. By carefully designing and wording the inputs, we can effectively influence and control the direction and results of the model's responses, enabling the AI to generate outputs that meet expectations.
We can directly refer to the Structured Output Converter in the Spring AI documentation as an example:

The yellow part in the diagram represents the two core components:
**Formatted Input Instructions**
Generally, we need structured prompt templates to dynamically generate prompts, using structured text to design the inputs:
- Dynamic Prompt Template (PromptTemplate). This uses classic template engines to dynamically combine context, such as Jinja2 in LangChain and StringTemplate in Spring AI. This approach allows for injecting context, user input, system state, and other information at runtime, enabling flexible prompt construction.
- Structured Text Structure. To ensure the reliability and parseability of AI outputs, prompts need to be designed structurally, including role positioning (Role), task description (Task), constraints (Constraints), output format, etc.
- Example-driven. By providing example inputs (Few-shots) and expected outputs, we can significantly improve the stability and consistency of the model's outputs. For instance, when implementing QA, different scenario examples are provided.
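The three points above can be combined into a single template. Below is a minimal sketch using Python's standard-library `string.Template` (a stand-in for engines like Jinja2 or StringTemplate); the QA scenario, field names, and placeholder variables are illustrative:

```python
from string import Template

# Structured prompt template (sketch): role, task, constraints,
# output format, and a few-shot example are assembled into one prompt.
# $product, $context, and $question are runtime-injected placeholders.
QA_TEMPLATE = Template("""\
# Role
You are a customer-support assistant for $product.

# Task
Answer the user's question using only the provided context.

# Constraints
- If the context is insufficient, answer "I don't know".
- Respond in JSON: {"answer": "...", "confidence": "high|medium|low"}

# Example
Q: How do I reset my password?
A: {"answer": "Use the 'Forgot password' link on the login page.", "confidence": "high"}

# Context
$context

# Question
$question
""")

def build_prompt(product: str, context: str, question: str) -> str:
    # substitute() fails loudly if a placeholder is missing, which is
    # preferable to silently sending an incomplete prompt to the model.
    return QA_TEMPLATE.substitute(product=product, context=context, question=question)

prompt = build_prompt(
    "AcmeCloud",
    "Passwords are reset via the account settings page.",
    "How do I reset my password?",
)
```

In practice the same structure maps directly onto `PromptTemplate` in LangChain or Spring AI; only the template syntax changes.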
**Transforming Model Output Results**
This involves using appropriate output formats for different scenarios and implementing corresponding parsing and **exception handling**.
- Domain-specific Output Formats. Depending on the scenario, we use JSON, XML, YAML, or Markdown to present results in a user-friendly manner. For example, JSON can be serialized directly for transmission, but it renders poorly while streaming, hurting both user experience and robustness; YAML handles streaming output better and has lower transmission costs.
- Parsing Implementation. Extracting code blocks from plain text, followed by deserialization and object mapping. Using Schema validation (JSON Schema, XSD) to ensure that the model's output field types and structures conform to the agreed specifications.
- Exception Handling. Due to the uncertainty in model generation, outputs may have missing fields, type errors, or fail to conform to the expected format. For instance, when fields are missing, default values or fallback strategies can be used, and the model can be triggered to retry generating specific fields.
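The parsing and exception-handling steps above can be sketched as follows. This is a minimal example, not a production parser; the field names and fallback policy are illustrative assumptions:

```python
import json
import re

def parse_model_output(raw: str, required: dict) -> dict:
    """Extract a JSON code block from model text, deserialize it,
    and fill any missing fields with defaults (sketch)."""
    # Prefer a fenced ```json block; fall back to the first {...} span.
    match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", raw, re.DOTALL)
    if match:
        payload = match.group(1)
    else:
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end <= start:
            return dict(required)  # nothing parseable: all defaults, caller may retry
        payload = raw[start:end + 1]
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return dict(required)      # malformed JSON: fall back, caller may retry
    # Missing-field handling: apply defaults instead of failing hard.
    for field, default in required.items():
        data.setdefault(field, default)
    return data

result = parse_model_output(
    'Here you go:\n```json\n{"title": "Q3 report"}\n```',
    {"title": "", "summary": "(not provided)"},
)
# result == {"title": "Q3 report", "summary": "(not provided)"}
```

For stricter guarantees, the `json.loads` step would be followed by JSON Schema validation, and a validation failure would trigger a targeted regeneration request to the model.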
Where data and resources permit, the model can also be fine-tuned on existing examples to strengthen its structured-output abilities.
### Prompt Routing and Task Distribution
In complex AI systems, especially in scenarios involving multiple agents or modules working together, a single prompt often cannot accomplish all tasks. Therefore, we need prompt routing:

> Prompt Routing is an engineering model that splits tasks, analyzes inputs, and intelligently allocates them to the most suitable model or sub-task prompts in multi-task, multi-agent, or complex AI processes.
The core idea is: by analyzing inputs and context, dynamically determine the information processing path, which prompt to use, or which tool or sub-agent to call, thereby achieving non-linear and conditional task execution. Taking a typical QA scenario as an example:
- Non-system-related questions → Directly inform the user that this type of question is not supported
- Basic knowledge questions → Call document retrieval and QA models
- Complex analysis questions → Call data analysis tools, then generate a summary
- ……
Through prompt routing, the system can intelligently select the most appropriate processing method based on the type of question while maintaining modularity and scalability. Some AI frameworks support similar capabilities, such as RouterChain in LangChain, along with methods like [Routing by semantic similarity](https://python.langchain.com/docs/how_to/routing/#routing-by-semantic-similarity).
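A routing layer can be sketched in a few lines. Here `classify()` is a keyword stand-in for what would normally be an LLM call or a semantic-similarity check, and the three routes mirror the QA example above (all handler names are hypothetical):

```python
# Minimal prompt-routing sketch: a classification step picks the
# downstream prompt/handler for each question.
def classify(question: str) -> str:
    q = question.lower()
    if any(k in q for k in ("weather", "sports")):
        return "unsupported"           # non-system-related questions
    if any(k in q for k in ("analyze", "trend", "compare")):
        return "analysis"              # complex analysis questions
    return "knowledge"                 # basic knowledge questions

ROUTES = {
    "unsupported": lambda q: "Sorry, this type of question is not supported.",
    "knowledge":   lambda q: f"[retrieval+QA] {q}",       # would call document retrieval
    "analysis":    lambda q: f"[analysis+summary] {q}",   # would call analysis tools
}

def route(question: str) -> str:
    return ROUTES[classify(question)](question)

answer = route("Compare Q2 and Q3 revenue trends")  # dispatched to the analysis route
```

Swapping the keyword check for an embedding-similarity comparison against route descriptions yields the semantic-similarity routing linked above.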
### Chained and Modular Design for Complex Problems
Under the premise of prompt routing, complex problems can be systematically decomposed through **Prompt Chaining**. Prompt chaining allows a large task to be broken down into multiple subtasks, each corresponding to different prompts or model calls, and finally integrates the results. This approach is particularly suitable for processes with relatively fixed steps, where individual steps can be skipped when not needed.

This enables better modular design:
- Each subtask focuses on handling a specific phase of the task.
- A subtask can be rewritten as needed, adding or replacing prompts.
- Subsequent prompts can be dynamically adjusted based on the output of the previous phase.
Taking a common software requirement as an example, the ideas proposed by a product manager can be decomposed into a prompt chain as follows:

1. Idea Collection: Gather product ideas and initial requirements.
2. Requirement Logic Sorting: Clarify requirement logic and functional priorities.
3. Preliminary Requirement Scheduling: Formulate a preliminary requirement document or task list.
4. Final Requirement Confirmation: Confirm the final requirements and generate formal documentation.
Each stage can be handled by different prompts or sub-Agents. For instance, idea collection can utilize an AI Agent with search capabilities, while requirement logic sorting can be accomplished using tools like Dify or Copilot 365. Ultimately, each stage is executed in a chained process while maintaining the flexibility of modular design, allowing for adjustments or replacements of subtasks as needed.
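The four-stage requirement chain above can be sketched as a list of composable stage functions. `call_llm` is a hypothetical placeholder for a real model call; each stage can be swapped out independently, which is the modularity point made earlier:

```python
# Prompt-chaining sketch for the four requirement stages.
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    return f"<llm output for: {prompt[:40]}...>"

def idea_collection(idea: str) -> str:
    return call_llm(f"Collect and expand product ideas from: {idea}")

def logic_sorting(ideas: str) -> str:
    return call_llm(f"Sort requirement logic and priorities for: {ideas}")

def scheduling(logic: str) -> str:
    return call_llm(f"Draft a preliminary requirement document from: {logic}")

def confirmation(draft: str) -> str:
    return call_llm(f"Produce final confirmed requirements from: {draft}")

STAGES = [idea_collection, logic_sorting, scheduling, confirmation]

def run_chain(idea: str) -> str:
    result = idea
    for stage in STAGES:   # each stage can be replaced, reordered, or skipped
        result = stage(result)
    return result

final_doc = run_chain("offline mode for the mobile app")
```

Because subsequent prompts receive the previous stage's output as input, a stage can also inspect that output and branch, which is where chaining meets the routing pattern from the previous section.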
## Context Engineering and Knowledge Retrieval
Generally speaking, we have NoCode and ProCode to support the development of context-aware Agents.
- NoCode Solution (suitable for rapid validation): Quickly configure retrieval strategies through low-code platforms (such as Dify, N8N, Coze, etc.) and pre-configured RAG pipelines via UI.
- ProCode Solution (suitable for customized needs): Customize retrieval processes and optimization strategies using frameworks (LangChain, Spring AI), enabling multi-stage HyDE + hybrid retrieval + re-ranking pipelines.

Context itself is also part of the prompt. Before achieving automation, we typically copy content manually from documents into AI chat tools. However, as we go deeper, we need to start thinking about how to build that automation and approach the problem from an engineering perspective. Before we begin, we still need to define context engineering. Here, we can reference the definition provided by Anthropic in their official document "Effective context engineering for AI agents" (as it encompasses both science and art):
> Context engineering is the art and science of carefully selecting and placing the most relevant content from an ever-changing universe of information into a limited context window.
### Context Window
In simple terms: focus on selecting the most critical information within a limited context window to enable the model to understand and reason more efficiently. Below are the 6 common context engineering techniques summarized by [Drew Breunig](https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html) as illustrated by [Langchain](https://github.com/langchain-ai/how_to_fix_your_context):

Here, I will summarize it as: the engineering of RAG (Retrieval-Augmented Generation) and context windows. The content of a complete context window (i.e., prompts) should typically include:
- System Prompt Section:
- Input Instruction Context: Telling it "who you are" and "what you want to do," including system prompts, user inputs, and role definitions.
- Formatted Output Context: Specifying a structured pattern for the model's output format, such as requesting a return in JSON format to ensure the usability of the output.
- Function Call Section:
- Tool-Related Context: This grants the model the ability to interact with the external world. It includes definitions of available tools or functions, as well as the responses returned after calling these tools.
- Dynamic Context Section:
- Time and Memory Context: Short-Term Memory and Long-Term Memory.
- External Knowledge Context: Facts retrieved from external repositories such as documents and databases, typically via Retrieval-Augmented Generation (RAG), giving the model a factual basis and reducing hallucinations.
- Global State/Buffer: Temporary storage for the model when handling complex tasks, akin to its "working memory."
In addition to the fixed system prompt section, **the acquisition of external knowledge** and **memory** will have the greatest impact on the entire window, making the design and optimization of these two aspects the top priority in context engineering.
### Retrieval-Augmented Generation of Knowledge

> RAG (Retrieval-Augmented Generation) is one of the core technologies for building Agents. It enhances the generative capabilities of large language models by retrieving relevant information from external knowledge bases. In complex scenarios such as codebase question answering, simple vector retrieval is often not precise enough, necessitating the combination of multiple retrieval strategies to improve accuracy.
In simple terms, it enriches the context through search. Depending on the complexity of implementation and the requirements of the scenario, we can categorize retrieval strategies into the following types:
- **Keyword Search**. The most basic retrieval method, suitable for exact match scenarios. For instance, when searching for specific function names, class names, or variable names in a codebase, keyword search is often more effective than semantic search. Common implementations include:
- **Full-text Search**: Using search engines like Elasticsearch, Solr, employing algorithms such as BM25, TF-IDF.
- **Regular Expression Matching**: Tools like ripgrep, grep, etc. Cursor adopts a hybrid approach of ripgrep + vector retrieval.
- **Semantic Search**. Understanding the semantic meaning of queries through vector embeddings, rather than just literal matching. This is particularly important for natural language queries:
- Using pre-trained embedding models (such as OpenAI text-embedding-3-large, Jina embeddings v3) to convert text into vectors.
- Calculating the similarity between queries and documents in vector space (usually using cosine similarity or dot product).
- **Graph-based Search**. Graph retrieval focuses not only on "content similarity" but also on relationships and contextual dependencies.
- In code scenarios: Constructing code call graphs, dependency graphs, utilizing AST (Abstract Syntax Tree) to extract structures like methods, classes, constructors, etc.
- Examples: Microsoft's [GraphRAG](https://github.com/microsoft/graphrag), Aider's repomap, or infrastructures like Joern and CodeQL.
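The scoring step at the heart of semantic search is a similarity computation between embeddings. Below is a minimal sketch using toy 4-dimensional vectors; real embeddings come from models like those named above and have hundreds or thousands of dimensions:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy document embeddings (file names and values are illustrative).
docs = {
    "auth.py":  [0.9, 0.1, 0.0, 0.2],
    "utils.py": [0.1, 0.8, 0.3, 0.0],
}
query = [0.85, 0.15, 0.05, 0.1]   # embedding of "how does login work?"

# Rank documents by similarity to the query, highest first.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
# ranked[0] == "auth.py", the semantically closest document
```

In production this brute-force scan is replaced by an approximate-nearest-neighbor index (e.g., in a vector database), but the ranking criterion is the same.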
Before retrieval, to ensure reliable results, **Query Rewriting** should be introduced: it gradually transforms the user's vague intent into precise queries that the database can execute efficiently. Rewriting the user's original query increases its relevance to the documents in the knowledge base, addressing the "impedance mismatch" between natural-language queries and stored data chunks.
#### RAG Example in Code Scenarios
Generally speaking, various retrieval strategies can be combined to enhance retrieval effectiveness. Below is an official [Codebase RAG implementation](https://blog.lancedb.com/rag-codebase-1/) provided by the vector database LanceDB:

In addition to using TreeSitter to generate knowledge during the indexing phase, the [retrieval phase](https://blog.lancedb.com/building-rag-on-codebases-part-2/) will also utilize:
- HyDE (Hypothetical Document Embedding): The model first generates a "hypothetical" document or code snippet based on the query, and then uses this generated content for vector search, making it easier to find semantically relevant code.
- BM25 (Keyword Search): A traditional keyword search algorithm that excels at finding code containing exact terms or API names, which can also be used in conjunction with vector search.
- Hybrid Search: Combines BM25 and semantic search, allowing for precise keyword matching while also understanding code semantics, achieving better results by adjusting the weights of both.
- Re-ranking: After obtaining preliminary results from vector search, a cross-attention mechanism is used to reorder the results, improving the relevance and accuracy of the final answers.
Of course, in the earlier indexing phase, this example also generates **meta-feature data**: for each code element or snippet, a fine-tuned LLM first produces a textual description of the code, and that description is then vector-embedded so the code's meta-features are captured.
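The hybrid-search step described above fuses a keyword score with a vector score under adjustable weights. Here is a minimal sketch; the scores are illustrative numbers, not real BM25 or embedding output, and the weighting scheme is one common choice among several (reciprocal rank fusion is another):

```python
# Hybrid-search sketch: fuse keyword (BM25-style) and vector-similarity
# scores per document with a weight alpha, then re-rank.
def hybrid_rank(keyword_scores: dict, vector_scores: dict, alpha: float = 0.4):
    """alpha weights keyword matching; (1 - alpha) weights semantics."""
    fused = {
        doc: alpha * keyword_scores.get(doc, 0.0)
             + (1 - alpha) * vector_scores.get(doc, 0.0)
        for doc in set(keyword_scores) | set(vector_scores)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

keyword_scores = {"parser.py": 0.9, "index.py": 0.2}   # exact API-name hits
vector_scores  = {"parser.py": 0.3, "index.py": 0.8}   # semantic closeness

ranked = hybrid_rank(keyword_scores, vector_scores, alpha=0.4)
# parser.py: 0.4*0.9 + 0.6*0.3 = 0.54;  index.py: 0.4*0.2 + 0.6*0.8 = 0.56
```

Tuning `alpha` per scenario is exactly the "adjusting the weights of both" mentioned above: code search with exact API names favors a higher keyword weight, natural-language questions favor the semantic side. A re-ranking model would then reorder the top-k of this fused list.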
### Engineering Context Windows

Two years ago, [GitHub Copilot](https://code.visualstudio.com/docs/copilot/chat/prompt-crafting) established one of the most noteworthy context systems in the industry for code completion:
- Continuous signal monitoring. The Copilot plugin continuously monitors a series of signals from the IDE to dynamically adjust the priority of the context. This includes character insertion or deletion, changes in the currently edited file and language, cursor movement, scroll position changes, and file opening and closing.
- [Priority sorting](https://github.com/mengjian-github/copilot-analysis) of context sources. In the final prompt sent to the model, the information is sorted and filtered based on optimization levels:
- Highest priority: Code around the cursor position, including content before and after the cursor, which is the most direct context.
- High priority: The rest of the currently edited file.
- Medium priority: Other files or tabs open in the IDE (i.e., "neighboring files").
- Auxiliary context: Other information is also considered, including file paths, repository URLs, import statements in the code, and code information retrieved by RAG.
- Prompt assembly under context length constraints. Each piece of information is "scored" based on the above priorities, and then an optimal prompt is assembled.
This provides us with a very good reference:
- Freshness priority. Recently edited or accessed content receives higher priority, while outdated content gradually loses weight.
- Signal fusion and dynamic scoring. Multiple editing signals (such as cursor movement, file switching, import changes, etc.) are fused to dynamically adjust context weight.
- Sliding window and incremental updates. A sliding window mechanism is used to perform incremental updates only on changed parts, avoiding full reconstruction.
- Budget-aware and automatic truncation. Real-time estimation of token usage, with automatic trimming or summarization of low-priority content as it approaches limits.
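The priority-plus-budget idea above can be sketched as a greedy assembly loop. This is a simplification of what a system like Copilot does: the snippets, priorities, and word-count "tokenizer" are all illustrative stand-ins (real systems score dynamically and count tokens with the model's tokenizer):

```python
# Budget-aware prompt assembly sketch: each candidate snippet carries a
# priority score; items are added highest-priority-first until the
# token budget is exhausted, and the rest are truncated away.
def assemble_context(snippets: list[tuple[int, str]], budget: int) -> str:
    """snippets: list of (priority, text); higher priority wins."""
    chosen, used = [], 0
    for priority, text in sorted(snippets, key=lambda s: -s[0]):
        cost = len(text.split())     # crude stand-in for a real token count
        if used + cost > budget:
            continue                 # over budget: drop this lower-priority item
        chosen.append(text)
        used += cost
    return "\n".join(chosen)

snippets = [
    (100, "code around the cursor ..."),          # highest priority
    (80,  "rest of the current file ..."),
    (50,  "neighboring open tabs ..."),
    (20,  "repository URL and import list ..."),  # auxiliary context
]
prompt_context = assemble_context(snippets, budget=12)
```

A production version would add the other ideas from the list above: re-scoring on each editing signal, sliding-window incremental updates, and summarizing (rather than dropping) low-priority content near the limit.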
Of course, this is a very complex design, and such an approach is only worth adopting in systems of sufficiently high value. Additionally, combining various popular Cursor Rules/Specs with persistent memory (Memory System) as outlined in AGENTS.md can store key information across sessions, providing long-term background information for subsequent queries.
### Agentic Retrieval

> Agentic refers to a characteristic that enables AI systems to possess autonomous perception, dynamic decision-making, and goal-oriented execution capabilities, allowing them to actively optimize context, generate retrieval strategies, and continuously self-iterate during task processes.
In the field of AI Coding, such as with Cursor and Claude Code, we can observe their operational processes, which essentially involve Agents executing RAG. Compared to traditional RAG, it is easier for them to obtain rich context, ensuring that context is not lost throughout the process. We can see examples of some relatively mature AI applications:
- Cursor optimizes the retrieval of code using `file + ripgrep`, and when the results are insufficient, it calls for vectorized checks or relevant searches like Git history.
- Google DeepResearch follows a similar process when generating research: it identifies mainstream tools for context engineering, gains a preliminary understanding of their functions and differences, and then decides on the next step, such as digging deeper into tool details.
In simple terms, for complex retrievals we can build the retrieval itself as an Agent that decides which retrieval tools and strategies to use; when context is insufficient, it keeps calling tools with new parameters until it has gathered enough.
#### DeepResearch Example
Below is an example of the process of building [Open DeepResearch](https://github.com/langchain-ai/open_deep_research) by Langchain AI:

The Deep Research Agent demonstrates a more systematic Agentic retrieval approach:
1. The task is divided into a planning phase (Manager Agent) and an execution phase (Execution Agent)
- The Manager Agent is responsible for task understanding, sub-task decomposition, and retrieval strategy design
- The Execution Agent is responsible for actual searching, web or document scraping, and content parsing
2. During the retrieval process, the Agent maintains the state of the topic structure, covered sub-questions, and information gaps to determine the next exploration direction
3. User review can be inserted at critical stages (HITL mode) to enhance control and accuracy
4. Ultimately, the Agent integrates the collected fragmented information into a structured report, accompanied by source citations
Observing their interactions and thought processes typically helps us better understand this process. On this basis, we can also see that Agentic Context Engineering allows LLM to autonomously generate, organize, and iterate context, achieving intelligent and scalable context management, thereby optimizing the retrieval and reasoning efficiency of complex tasks.

In other words, the Agent optimizes how it retrieves based on historical conversations and accumulated experience, making itself better suited to the scenario.
## Engineering Design of the Agent Tool System
In the process of building the Agent, the design of the Tool System is the aspect that best reflects engineering thinking. It determines what the Agent can do, how well it can perform, and whether it can efficiently collaborate with the external world.
Tools can be any API, such as data queries (like database access), real-world operations (like sending emails, booking meetings), or interfaces that collaborate with other services. As we mentioned earlier, RAG (Retrieval-Augmented Generation) is also a type of tool under Agentic, and LlamaIndex provides such explicit encapsulation:
- FunctionTool: Easily encapsulates any Python function into a tool that can be used by the agent.
- QueryEngineTool: Converts any data query engine (for example, a vector index) into a tool, enabling the agent to perform queries and reasoning on it.
This data-centric approach simplifies our understanding of tools.
### Semantic Tools: Function Interfaces Designed for Agents
**Tools** are essentially a class of semantically understandable function interfaces. They not only contain logical execution capabilities but also carry meta-information that allows the model to understand:
- Name: The unique identifier of the tool, usually the function name, such as `getWeather`.
- Description: A natural language description of the tool's functionality, purpose, and applicable scenarios. This is a crucial element, as the model primarily relies on this description to determine when and how to use the tool.
- Parameters: An object defining the input parameters for the tool, including the name of each parameter, data type (e.g., string, number), description, and whether it is a required parameter.
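The three pieces of meta-information map directly onto the JSON-Schema style used by common function-calling APIs. Below is a generic sketch of the `getWeather` example; the exact wrapper format varies by provider, and the `unit` parameter is an illustrative addition:

```python
# Tool definition carrying name, description, and parameters.
# The description doubles as an AI-oriented "micro prompt" telling the
# model when to use the tool.
get_weather_tool = {
    "name": "getWeather",
    "description": (
        "Get the current weather for a city. Use this whenever the user "
        "asks about weather conditions or temperature."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "City name, e.g. 'Berlin'",
            },
            "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit",
            },
        },
        "required": ["city"],   # unit is optional; city is mandatory
    },
}
```

Note how much of the definition is natural language: the model never sees the implementation, only this schema, so the quality of the descriptions largely determines whether the tool is called correctly.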
In terms of execution mechanisms, the two common paradigms are:
- ReAct Framework (Reasoning + Acting): The core of the ReAct paradigm is to allow the LLM to interleave generating "thoughts" (reasoning traces) and "actions" (tool calls), thereby forming an explicit think-act-observe loop.
- Direct Function Calling: This is a more structured approach. The LLM determines in a single-step reasoning process that the user's query can be best answered by calling one or more predefined functions. It then outputs a structured JSON object that clearly specifies the function names to be called and their parameters.
We need to decide which method to use based on the model's support, as well as the designed interactions and intentions.
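The direct function-calling path can be sketched as a dispatch over a local registry. The model's structured output below is hand-written for illustration, and `get_weather` is a stub (a real implementation would call a weather API):

```python
import json

# Direct function-calling sketch: the model emits a JSON object naming
# a function and its arguments; the host program dispatches it.
def get_weather(city: str, unit: str = "celsius") -> str:
    """Stub implementation of the tool."""
    return f"22° {unit} in {city}"

REGISTRY = {"getWeather": get_weather}

# What the LLM would return after single-step reasoning (hand-written here).
model_output = '{"name": "getWeather", "arguments": {"city": "Berlin"}}'

call = json.loads(model_output)
result = REGISTRY[call["name"]](**call["arguments"])
# result == "22° celsius in Berlin"
```

A ReAct loop wraps this same dispatch in a cycle: the tool's return value is appended to the context as an "observation," and the model reasons again before the next action.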
### Design Principles of the Tool
Generally speaking, when building the Coding Agent, we adhere to the following principles:
- Clear semantics: The names, descriptions, and parameter names of the tools must be extremely clear, descriptive, and unambiguous for the LLM. The description field of the tool is crafted as a form of AI-oriented "micro prompt."
- Stateless **objective** functions: Only encapsulate complex technical logic or domain knowledge, avoiding strategic or subjective decision-making.
- Atomicity and single responsibility: Each tool should be responsible for one and only one clearly defined function, which is to perform an atomic operation. If the Agent acts as a tool, it should also follow similar principles and accomplish only one task.
- Least privilege: Each tool should be granted only the minimum permissions and capabilities necessary to complete its clearly defined tasks.
#### Workflow-based Tool Orchestration: Task Chain Design
The same principles also apply to AI Agents in non-programming domains. Following them, we can break down "Plan my trip to Beijing next week" into a set of discrete, single-responsibility tools:
- search_flights(origin: str, destination: str, outbound_date: str, return_date: str): Search for flight information.
- search_hotels(location: str, check_in_date: str, check_out_date: str, adults: int): Search for hotel information.
- get_local_events(query: str, date: str): Get information about local events or attractions for a specific date.
- book_cruise(passenger_name: str, itinerary_id: str): Book a cruise itinerary.
- lookup_vacation_packages(query: str): Query vacation packages.
The key features of this orchestration method are: high predictability, clear logic, and ease of modeling as a visual workflow (such as a DAG) within the platform. It is particularly suitable for Agents with stable processes and task dependencies (such as travel, customer service, and data pipeline scenarios).
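A fixed task chain over these tools can be sketched as follows. The tool bodies are stubs; real implementations would call flight, hotel, and event APIs, and the DAG is flattened here into a simple sequence where each step feeds the next:

```python
# Workflow-based orchestration sketch for the trip-planning tools above.
def search_flights(origin: str, destination: str,
                   outbound_date: str, return_date: str) -> dict:
    return {"flight": f"{origin}->{destination} on {outbound_date}"}

def search_hotels(location: str, check_in_date: str,
                  check_out_date: str, adults: int) -> dict:
    return {"hotel": f"{location} hotel, {adults} adult(s)"}

def get_local_events(query: str, date: str) -> dict:
    return {"events": f"{query} on {date}"}

def plan_trip(origin: str, destination: str, start: str, end: str) -> dict:
    """Fixed task chain: flights, then hotel, then local events."""
    plan = {}
    plan.update(search_flights(origin, destination, start, end))
    plan.update(search_hotels(destination, start, end, adults=1))
    plan.update(get_local_events(f"attractions in {destination}", start))
    return plan

itinerary = plan_trip("Shanghai", "Beijing", "2025-11-03", "2025-11-07")
```

Because each tool is atomic and stateless, the same set can be re-wired into a different workflow (or handed to a classifier-driven Agent, as in the next section) without changing the tools themselves.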
#### Classification-Based Tool Invocation: Dynamic Intent Decision Making
The Copilot orchestrator, as described in [Understanding GitHub Copilot’s Internal Architecture](https://iitsnl.com/blog/understanding-github-copilots-internal-architecture/), decides to invoke one or more internal tools to complete tasks based on the analysis results from the "Intent Classifier":
- File Operations: This includes `read_file`, `edit_file`, and `create_file`, allowing Copilot to interact directly with the user's codebase.
- Code Execution: With the `run_in_terminal` tool, Copilot can execute commands in the user's terminal, such as running tests or building scripts.
- Search and Analysis: This is one of the most critical toolsets, including traditional `grep_search`, `list_code_usages`, and the most powerful `semantic_search`.
The key features of this model are high flexibility and strong scalability, but it relies on a good classification system and semantic matching capabilities. It is more suitable for dynamic scenarios such as code generation, debugging, and documentation Q&A.
### Building a Composable Tool Network with the MCP Protocol
As the number of tools and Agents continues to grow, we need a mechanism to standardize the description, dynamic registration, and cross-Agent invocation of tools. The MCP (Model Context Protocol) is a universal protocol layer designed for this purpose. With MCP, AI Agents no longer rely on hard-coded interfaces or specific systems, but can call tools, retrieve data, or collaborate with other Agents in a unified format. The core values of MCP lie in standardization, dynamism, and composability:
- **Standardization**: A unified tool invocation format allows different Agents to share a common set of tools.
- **Dynamism**: Supports runtime registration and access to tools, enabling Agents to select the most suitable tool based on task requirements.
- **Composability**: Different Agents and tools can be combined like building blocks, facilitating the decomposition and collaborative execution of complex tasks.
By integrating the atomic tool functions designed earlier, MCP can consolidate these tools into a reusable and collaborative tool network, allowing Agents to be more flexible and efficient in solving complex problems.
#### Other Tool Networks
Additionally, the emergence of tools like GitHub Copilot Extensions and Claude Code Plugins suggests that even with protocols like MCP and A2A, the AI Agent ecosystem may not become as unified as we expected. For example, as of 2025.10.14, the project at https://github.com/wshobson/agents describes itself as:
> A comprehensive system for production environments, consisting of 84 dedicated AI Agents, 15 multi-Agent workflow orchestrators, and 44 development tools, organized into 62 focused and single-responsibility plugins for Claude Code.
## Agent Planning and Beyond Monolithic Agents
> An Agent is a software system that uses AI to achieve goals and perform tasks on behalf of users. It exhibits reasoning, planning, and memory capabilities, and possesses a degree of autonomy, allowing it to learn, adapt, and make decisions independently. - Google Cloud
Agents are goal-oriented and typically require **perception** - **planning** - **action**, along with memory, to achieve their objectives. More complex AI Agent systems may also include capabilities such as **collaboration** and **self-improvement**. In the previous sections, we have introduced several fundamental capabilities:
- Through **structured prompts and prompt chains**, Agents have developed a cognitive structure for planning and decision-making;
- Through **context engineering**, Agents have gained the ability to "perceive the world," capturing information from external knowledge and environments;
- Through the engineering design of **tool systems**, Agents have acquired the ability to interact with the external world and execute tasks.
Building on this foundation, the further development directions for Agents include:
- **Collaboration** — Multiple Agents work together through A2A (Agent-to-Agent) communication protocols or task allocation mechanisms, achieving role division and information sharing;
- **Self-improvement** — Agents accumulate experience through memory systems and reflection mechanisms, optimizing their prompts and planning strategies, thereby acquiring the ability for continuous learning and self-evolution.
As this is a rapidly evolving field, the practices below are best read as a snapshot of current approaches rather than settled answers.
### Modular System Prompts: The Thinking Blueprint of an Agent
The first step in building an effective Agent is to define its "thinking blueprint"—the System Prompt. A well-designed System Prompt not only defines what the Agent should do but also clarifies what it should not do. In the realm of Coding Agents, a System Prompt can often be quite complex. For example, the System Prompt of [Cursor](https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools/blob/main/Cursor%20Prompts/Agent%20Prompt%202025-09-03.txt) includes detailed specifications regarding roles, tool invocation, safety boundaries, task planning, and more.
By combining tools like Cursor, Claude Code, Augment, and Junie, we can summarize a series of modular design practices:
- **Structural Layering and Modularity**: Organize prompts with clear hierarchies (roles/communication/tools/safety/tasks) to avoid "one-size-fits-all" text, making it easier to maintain and dynamically load.
- **Tool Prioritization and Parallelization**: Prioritize specialized tools and parallelize when possible to significantly reduce latency and costs (e.g., parallel calls to `read_file` to read multiple files, using `search_replace` for editing instead of sed).
- **Safety Boundaries and Permission Models**: Default to a sandbox with minimal permissions, requiring explicit authorization for dangerous operations (e.g., `required_permissions: ["network"|"git_write"|"all"]`), and prohibit high-risk actions like force-pushing to `main/master`.
- **Minimal Sufficient Task Management**: Manage multi-step complex tasks with TODOs (the first created is marked as in_progress, and upon completion, it is immediately marked as completed), while simple and direct tasks are executed right away.
- **Context Uniqueness and Safe Modifications**: Code editing requires uniquely identifiable contexts (e.g., `old_string` must be unique in the file, with 3–5 lines before and after), and multiple modifications should be executed in batches to avoid erroneous changes.
- **Communication Norms and User Experience**: Hide internal tool names, use natural language to "say-do-summarize," and keep it concise and skimmable; use backticks to denote file/function names and provide minimal usable examples when necessary.
This evolution from monolithic prompts to modular, hierarchical, and dynamic designs mirrors the transition from monolithic applications to microservices architecture, providing structural support for advanced reasoning, system scalability, and maintainability of Agents.
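The layered design above can be sketched as a small loader that assembles a System Prompt from named modules. The module names and contents here are hypothetical placeholders for illustration, not the actual prompt of Cursor or any other product:

```python
# Minimal sketch of modular System Prompt assembly; module names and
# contents are illustrative placeholders, not any product's real prompt.
PROMPT_MODULES = {
    "role": "You are a careful coding assistant.",
    "communication": "Hide internal tool names; wrap file names in backticks.",
    "tools": "Prefer specialized tools; batch independent reads in parallel.",
    "safety": "Never force-push to main/master; ask before network access.",
    "tasks": "Track multi-step work with a TODO list; do simple tasks directly.",
}

def build_system_prompt(modules: dict,
                        order=("role", "communication", "tools", "safety", "tasks")) -> str:
    """Assemble a layered prompt; omitting a key dynamically loads
    only the layers a given session actually needs."""
    sections = [f"## {name.title()}\n{modules[name]}" for name in order if name in modules]
    return "\n\n".join(sections)

# Full prompt for a complex coding session:
print(build_system_prompt(PROMPT_MODULES))
# A lightweight Q&A session might load only two layers:
print(build_system_prompt(PROMPT_MODULES, order=("role", "safety")))
```

Keeping each layer as an independent module is what makes the prompt maintainable: safety rules can be audited and versioned separately from communication style, just as microservices are deployed independently.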
### From Retrieval to Planning: Using Prompts to Enable Agents to Decompose Goals
Simply telling an Agent to "make a plan" is far from sufficient; we must guide its decomposition process through a set of clear principles, much like establishing specifications for software modules. The intelligence ceiling of a monolithic Agent often depends on its "planning capability"—whether it can break down vague goals into clear, executable sub-tasks.
This involves two core strategies:
- **Pre-decomposition**: This strategy, also known as static planning, involves breaking down the entire complex task into a sequence of sub-tasks or a plan before the task execution begins.
- **Interleaved decomposition**: This strategy, also known as dynamic planning, does not create a complete plan at the start of the task but dynamically decides the next sub-task during execution.
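The contrast between the two strategies can be shown in a few lines. The `full_plan` and `next_step` functions below stand in for LLM calls and are stubbed with a fixed task list, so this is a structural sketch rather than a working planner:

```python
# Contrast of static vs. dynamic planning; `full_plan` / `next_step`
# stand in for LLM calls and are stubbed with a fixed task list.
STEPS = ["analyze requirements", "implement feature", "verify with tests"]

def full_plan(goal: str) -> list:
    """Pre-decomposition (static): emit the whole plan before execution."""
    return list(STEPS)  # in reality: one LLM call producing every sub-task

def next_step(goal: str, done: list):
    """Interleaved (dynamic): decide only the next sub-task from progress so far."""
    return STEPS[len(done)] if len(done) < len(STEPS) else None  # None = finished

def run_static(goal: str) -> list:
    return [f"executed: {s}" for s in full_plan(goal)]

def run_dynamic(goal: str, max_steps: int = 10) -> list:
    done = []
    while len(done) < max_steps:
        step = next_step(goal, done)
        if step is None:  # the planner signals completion
            break
        done.append(step)
    return [f"executed: {s}" for s in done]
```

The trade-off mirrors the definitions above: static planning is cheap and auditable but cannot react to surprises mid-task, while dynamic planning re-decides at every step at the cost of an extra model call per step.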
For example, the architecture of BabyAGI embodies this "task-driven" planning: it consists of three core Agents—task_creation_agent (task generation), execution_agent (task execution), and prioritization_agent (task prioritization), forming a continuously looping system of task updates and execution.
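BabyAGI's loop can be sketched as follows. The three agent functions keep the original names but are stubbed here; in the real system each is an LLM call:

```python
from collections import deque

# Sketch of BabyAGI's task-driven loop; the three agents are stubs
# standing in for the LLM-backed agents of the same names.
def execution_agent(task: str) -> str:
    return f"result of {task}"  # stub: really an LLM completing the task

def task_creation_agent(result: str, objective: str) -> list:
    # stub: really an LLM proposing new tasks based on the last result
    if "follow-up" in result:
        return []
    return ["follow-up A", "follow-up B"]

def prioritization_agent(tasks: deque) -> deque:
    return deque(sorted(tasks))  # stub: really an LLM reordering by priority

def babyagi_loop(objective: str, max_iterations: int = 5) -> list:
    tasks, results = deque(["seed task"]), []
    for _ in range(max_iterations):
        if not tasks:
            break
        result = execution_agent(tasks.popleft())   # 1. execute
        results.append(result)
        tasks.extend(task_creation_agent(result, objective))  # 2. create
        tasks = prioritization_agent(tasks)          # 3. prioritize
    return results
```

The essential point is the cycle itself: execution feeds task creation, task creation feeds prioritization, and the reordered queue feeds the next execution.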
In modern systems (such as Augment and Claude Code), planning logic is often embedded in the system prompts in the form of todo_spec, featuring the following characteristics:
- **Atomicity and Action-Oriented**: Each to-do item should be an independent, indivisible "atomic" task.
- **Meaningful Abstraction Levels**: To-do items should not be trivial operational actions (such as "read file a.txt" or "fix linter errors"), but rather high-level, meaningful, and non-trivial tasks.
- **Appropriate Scope**: Specifications tend to favor "fewer, larger to-do items" rather than a lengthy list of minor steps.
- **Implementation-Centric**: If the user's request is to implement a certain feature, then the to-do list generated by the Agent itself constitutes the final plan.
Through this structured planning, Agents can transform "user needs" into "system plans," establishing the semantic interface for multi-Agent collaboration.
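A todo_spec-style list can be sketched as a small state machine. The field and status names below are illustrative, not the exact schema of any particular product:

```python
from dataclasses import dataclass, field

# Sketch of a todo_spec-style task list; field and status names are
# illustrative, not the exact schema of any particular product.
@dataclass
class TodoItem:
    description: str          # high-level and action-oriented, not "read a.txt"
    status: str = "pending"   # pending -> in_progress -> completed

@dataclass
class TodoList:
    items: list = field(default_factory=list)

    def add(self, description: str) -> TodoItem:
        item = TodoItem(description)
        self.items.append(item)
        if len(self.items) == 1:      # the first item created starts in_progress
            item.status = "in_progress"
        return item

    def complete_current(self) -> None:
        """Mark the in-progress item completed and promote the next one."""
        for i, item in enumerate(self.items):
            if item.status == "in_progress":
                item.status = "completed"
                if i + 1 < len(self.items):
                    self.items[i + 1].status = "in_progress"
                return

todos = TodoList()
todos.add("Implement weighted memory retrieval")
todos.add("Wire retrieval into the planning loop")
todos.complete_current()  # first item completed, second promoted
```

Note that both items are at a meaningful abstraction level: each could plausibly take many tool calls to finish, which is exactly the "fewer, larger to-do items" scope the specifications favor.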
### Multi-Agent Collaboration System: From Individuals to Organizations

The capabilities of a single Agent are limited; Multi-Agent Systems (MAS) offer an engineering path beyond that limit. Just as microservices architecture achieves high cohesion and low coupling by decomposing monolithic applications, a Multi-Agent System scales intelligence horizontally by splitting responsibilities across agents. Multiple Agents collaborating toward a complex goal work much like a "team" in software development.
Common collaboration topologies (refer to [LangGraph](https://langchain-ai.github.io/langgraph/concepts/multi_agent/), AutoGen, etc.):
- Supervisor-Expert Model (Hierarchical Structure): A "Supervisor Agent" or "Coordinator Agent" is responsible for receiving high-level user goals, breaking them down into a series of subtasks, and then assigning them to the appropriate "Expert Agents" based on the nature of each subtask.
- Parallel Model (Collective Intelligence): Also known as "Concurrent Model" or "Swarm Model." Multiple Agents independently execute the same task or different parts of a task simultaneously, and then their outputs are synthesized.
- Sequential Model (Pipeline): Agents work in a predefined order, like on an assembly line. The output of the previous Agent becomes the input for the next Agent.
- Network Model (Conversational/Dynamic Model): Agents can freely communicate in a many-to-many network without a fixed hierarchical structure. The next acting Agent is usually determined dynamically based on the flow of the conversation.
The choice of Multi-Agent topology directly reflects the underlying structure of the problem to be solved. The architecture is not arbitrary: it aims to create a "cognitive model" that mirrors the problem's dependency graph. Of course, such systems also inherit familiar issues from microservices architecture, such as added complexity and coordination overhead.
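The supervisor-expert topology, the most common of the four, can be sketched as follows. Routing here is hard-coded for illustration, where a real supervisor would be an LLM decomposing the goal and classifying each sub-task:

```python
# Sketch of the supervisor-expert topology; decomposition and routing are
# hard-coded stubs, where a real supervisor would be an LLM.
def code_expert(task: str) -> str:
    return f"[code] {task}"   # stub: really a coding-specialized Agent

def docs_expert(task: str) -> str:
    return f"[docs] {task}"   # stub: really a documentation-specialized Agent

EXPERTS = {"code": code_expert, "docs": docs_expert}

def supervisor(goal: str) -> list:
    # stub decomposition: a real supervisor asks an LLM for typed sub-tasks
    subtasks = [("code", f"implement {goal}"), ("docs", f"document {goal}")]
    # route each sub-task to the matching expert and collect results
    return [EXPERTS[kind](task) for kind, task in subtasks]
```

Frameworks such as LangGraph express the same pattern as a graph whose supervisor node routes control to worker nodes, but the structure (decompose, route, collect) is identical.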
#### A2A Protocol: Building Agent Networks to Accelerate Intelligent Capability Sharing
The A2A protocol is designed specifically for Agent-to-Agent communication, complementing standards such as the Model Context Protocol (MCP), which handles Agent-to-Tool communication. It is an open protocol that allows different Agent systems to connect and interoperate with each other.
However, we do not necessarily need to adopt the full A2A architecture: in AutoDev, we expose A2A Agents as MCP tools for other Agents to call, enabling Agent-to-Agent collaboration without adding system complexity.
#### Self-Improvement: Reflection, Memory, and Evaluation Loop
The true power of an evolving Agent comes from the close integration of the reflection loop and a persistent memory system.
- Reflection Mechanism: The Agent reviews its outputs, identifies errors, and generates improvement suggestions;
- Memory Storage: Persistent storage of task experiences and contexts (such as `AGENTS.md`, Knowledge Graph) provides long-term references for subsequent tasks.
For memory, there should be a mechanism for weighted retrieval based on recency, relevance, and importance, as well as a reflective memory management system that can autonomously decide what to remember, what to forget, and how to organize information.
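The weighted retrieval described above can be sketched as a scoring function over recency, relevance, and importance. The half-life, weights, and hard-coded relevance values below are illustrative; in practice relevance would come from embedding similarity:

```python
import math
import time

# Sketch of weighted memory retrieval: score = w1*recency + w2*relevance
# + w3*importance. Half-life, weights, and relevance values are illustrative.
def recency(age_seconds: float, half_life: float = 3600.0) -> float:
    """Exponential decay: 1.0 when fresh, 0.5 after one half-life."""
    return math.exp(-math.log(2) * age_seconds / half_life)

def score(memory: dict, query_relevance: float, now: float,
          weights=(1.0, 1.0, 1.0)) -> float:
    w_rec, w_rel, w_imp = weights
    return (w_rec * recency(now - memory["timestamp"])
            + w_rel * query_relevance
            + w_imp * memory["importance"])

now = time.time()
memories = [
    {"text": "user prefers Kotlin", "timestamp": now - 7200, "importance": 0.9},
    {"text": "ran tests at 10:03",  "timestamp": now - 60,   "importance": 0.2},
]
# relevance would come from embedding similarity against the query; stubbed here
relevance = {"user prefers Kotlin": 0.8, "ran tests at 10:03": 0.1}
ranked = sorted(memories,
                key=lambda m: score(m, relevance[m["text"]], now),
                reverse=True)
```

Note how the weighting lets an old but important preference outrank a fresh but trivial log line; tuning the three weights is how the memory system trades off "what just happened" against "what matters."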
> The ultimate goal of an advanced Agent architecture is to create a self-reinforcing flywheel: actions generate experiences, reflection distills experiences into knowledge, and memory stores knowledge to improve future actions. This transforms the Agent from a static program into a dynamic learning entity.
## Summary
> The System Prompt holds a position in the Agent system that goes far beyond a simple instruction set; it is, in essence, the Agent's core "operating system," and prompt and context engineering deserve the same rigor as system architecture design.
Utilizing markup languages such as Markdown or XML to construct structured instruction modules can significantly enhance the LLM's (Large Language Model) understanding and adherence to complex rules. Through clear role activation, detailed behavior specifications, and context engineering techniques such as "on-the-fly" data loading, developers can shape a stable and predictable "cognitive environment" for the Agent, guiding its behavior onto the desired track. Excellent context engineering is the foundation for achieving reliability in Agent behavior.
Related Resources:
- [Spring AI: Structured Output Converter](https://docs.spring.io/spring-ai/reference/api/structured-output-converter.html)
- [Agentic Design Patterns](https://docs.google.com/document/d/1rsaK53T3Lg5KoGwvf8ukOUvbELRtH-V0LnOIFDxBryE/edit?tab=t.0)
- [Agentic Context Engineering](https://www.arxiv.org/pdf/2510.04618)
- [A Survey on Large Language Model based Autonomous Agents](https://arxiv.org/pdf/2308.11432)
- [Effective context engineering for AI agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
- [Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG](https://arxiv.org/pdf/2501.09136)
- [How to build reliable AI workflows with agentic primitives and context engineering](https://github.blog/ai-and-ml/github-copilot/how-to-build-reliable-ai-workflows-with-agentic-primitives-and-context-engineering/)