Ai Agent
You may already be accustomed to this kind of interaction: you ask a question, and the AI gives an answer.\n\n* You ask it to write an article, it writes an article.\n\n* You ask it to translate a sentence, it translates a sentence.\n\nIn this mode, the AI is more like a consultantβit gives you advice, but doesn't directly do things.\n\nBut what if the task is slightly more complex? For example: check tomorrow's weather in Beijing for me, and if the temperature exceeds 25 degrees, recommend a short-sleeve shirt within a 300 yuan budget, then compile the results into an email to my boss.\n\nWith ordinary conversational AI, you'd have to break it into several steps:\n\n* Step 1: Check tomorrow's weather in Beijing.\n\n* Step 2: If it exceeds 25 degrees, recommend a short-sleeve shirt within 300 yuan.\n\n* Step 3: Write an email to my boss with the content...\n\nEach step requires you to manually push forward.\n\nThe idea behind AI Agent is: you only need to say it once, and it will do the rest itself.\n\nIt will automatically determine what needs to be done, what tools to call, in what order to execute, and how to adjust when problems arise, until the task is completed.\n\n!(#)\n\n> Simple understanding: ordinary LLM is like a military advisor, helping you strategize; AI Agent is the executor, taking the goal and getting things done itself.\n\n* * *\n\n## What is AI Agent\n\nAI Agent is an autonomous system that can perceive the environment, make decisions, and take actions.\n\nA more academic definition is: an Agent is an entity situated in some environment that can perceive the environment through sensors and act upon the environment through actuators to achieve a set of goals.\n\nThis definition sounds a bit abstract, so let's break it down in plain language.\n\n### Agent vs Ordinary LLM\n\nFirst, look at a comparison table:\n\n| Feature | Ordinary LLM | AI Agent |\n| --- | --- | --- |\n| Interaction Mode | Question and answer, user-driven | Autonomous execution, goal-driven |\n| Capability Boundary | Only built-in model capabilities | Capabilities can be extended through tools |\n| Memory | Conversation context (limited) | Short-term + long-term memory |\n| Planning | None (or requires user guidance) | Autonomous task decomposition and planning |\n| Feedback Loop | None (single generation) | Observation-thought-action loop |\n\nA concrete example: "Help me book a flight from Shanghai to Beijing tomorrow, under 1500 yuan."\n\nAn ordinary LLM might respond like this: "Sure, I can help you write a piece of code to query flights, or tell you which website to check." But it won't actually check flights, let alone book one for you.\n\nA qualified Agent would:\n\n* 1. Understand the goal: book a flight from Shanghai to Beijing tomorrow, under 1500 yuan.\n\n* 2. Decide on action: I need to call a flight query API.\n\n* 3. Execute action: call the API, get the flight list.\n\n* 4. Observe results: got 10 flights, 3 of which are under 1500 yuan.\n\n* 5. Think again: which one to choose? Might need to check departure/arrival times, or ask user preference.\n\n* 6. Continue action: might call another API to check on-time rate, or directly recommend options to the user.\n\n* 7. Complete task: lock in a seat for the user, generate an order link.\n\nSee the difference? Ordinary LLM gives you answers; Agent gives you results.\n\n### Core Capabilities of Agent\n\nA complete Agent typically has the following three core capabilities:\n\n| Capability | Description | Analogy |\n| --- | --- | --- |\n| Perception | Obtain environmental information and feedback | Eyes, ears |\n| Decision | Think about what to do and how to do it | Brain |\n| Action | Execute specific operations | Hands, feet |\n\n* **Perception** is what the Agent can see happening. For example, a user sent a message, an API returned a result, a new file appeared in the file systemβthese are all perceptual inputs.\n\n* **Decision** is the Agent determining what to do next based on perceived information. Answer the user directly? Call a tool? Break a big task into smaller tasks? This is all decision-making.\n\n* **Action** is the Agent actually executing decisions. Calling a search API, reading files, sending emails, operating databasesβthese are all actions.\n\n* * *\n\n## Basic Architecture of Agent\n\nNow let's look at the standard "four-piece puzzle" of an Agent: brain, tools, memory, and planning.\n\nFirst, an architecture diagram:\n\n\n\n### Brain: LLM as the Reasoning Core\n\nThe brain of an Agent is usually a large language model, such as GPT-4, Claude, Llama, etc.\n\nThis LLM is responsible for:\n\n* 1. Understanding user intent: when a user says help me book a ticket, what do they really want?\n\n* 2. Making decisions: what should I do next?\n\n* 3. Generating tool call parameters: to call a flight query API, what parameters are needed?\n\n* 4. Organizing final answers: now that enough information is collected, how to give the user a clear summary?\n\nThe LLM is the core of the Agent, but not the wholeβjust like the human brain is important, but also needs hands and feet to get things done.\n\n### Tools: Expanding AI's Capability Boundaries\n\nNative LLMs have two obvious limitations:\n\n* 1. Knowledge cutoff: it doesn't know what happened after training ended.\n\n* 2. Unable to interact: it cannot directly read files, query databases, or call APIs.\n\nTools are used to solve these problems.\n\nCommon tool types:\n\n| Tool Type | Function | Examples |\n| --- | --- | --- |\n| Search Tools | Get latest information | Google Search, Bing Search |\n| Calculation Tools | Perform mathematical operations | Calculator, Wolfram Alpha |\n| File Operations | Read/write local files | Read PDF, write CSV |\n| API Calls | Interact with external systems | Book flights, send emails, check weather |\n| Database Queries | Store and retrieve structured data | SQL queries, vector retrieval |\n| Code Execution | Run code to solve problems | Python REPL, Jupyter |\n\nGiving tools to an Agent is like giving a person a computerβinstantly transforming from only being able to think to being able to do.\n\n### Memory: Short-term vs Long-term Memory\n\nLLMs are naturally amnesiacβevery conversation is brand new unless you stuff context into it.\n\nAnd Agents need to remember many things:\n\n* Short-term memory: what did the user just say? What tool did I call in the previous step? What result did I get?\n\n* Long-term memory: what are the user's preferences? How were similar tasks handled in the past? What lessons were learned?\n\nCommon implementations of memory systems:\n\n| Memory Type | Stored Content | Implementation |\n| --- | --- | --- |\n| Short-term Memory | Current conversation history, intermediate steps | Directly put in LLM context window |\n| Long-term Memory | User preferences, historical tasks, knowledge documents | Vector database + similarity retrieval |\n| Summary Memory | Compressed historical summaries | Have LLM periodically generate summaries |\n\nA good memory system can make an Agent seem like it remembers you, rather than feeling like meeting for the first time every time.\n\n### Planning: Task Decomposition\n\nComplex tasks won't be completed in one step; Agents need to be able to break big goals into small steps.\n\nFor example: help me plan a 10-person birthday party.\n\nAn Agent might plan like this:\n\n* 1. First ask for clarification: what's the budget? When? Where? What are the birthday person's preferences?\n\n* 2. Then: check nearby venues.\n\n* 3. Next: design the menu.\n\n* 4. Then: create a shopping list.\n\n* 5. Finally: generate a schedule.\n\nCommon planning strategies:\n\n* Chain-of-Thought: think step by step, what to do first, what to do after.\n\n* Tree-of-Thought: consider multiple possible paths simultaneously, choose the optimal one.\n\n* Reflection: after completing a step, look back to see if there are problems, whether adjustments are needed.\n\n* * *\n\n## Tool Use: Tool Use / Function Calling\n\nTool calling is the most basic and important capability of an Agent.\n\nLet's start with what it is.\n\n### What is Function Calling\n\nSimply put: Function Calling is when the LLM outputs a structured JSON telling you which function it wants to call and what parameters to pass.\n\nIt's not that the LLM actually executes this functionβit just tells you it wants to call it; you still have to do the execution.\n\nThe typical flow is:\n\n* 1. You tell the LLM: "Here are some functions you can call, each function's name, parameters, and purpose are..."\n\n* 2. User sends message: "Help me check tomorrow's weather in Beijing."\n\n* 3. LLM responds: "I want to call the get_weather function, parameters are city='Beijing', date='tomorrow'."\n\n* 4. You call this function (actually query the weather API), get results.\n\n* 5. You feed the result back to the LLM: "The previous function call returned: temperature 25 degrees, sunny."\n\n* 6. LLM based on this result, gives the user a natural language answer: "Tomorrow in Beijing is sunny, temperature 25 degrees, very comfortable."\n\nSee? The LLM does the thinking, you do the executing.\n\n### Defining Tool JSON Schema\n\nTo let the LLM call tools, you first need to tell it what tools are available.\n\nThis telling process is describing functions with JSON Schema.\n\nHere's a standard function definition format:\n\n## Example\n\n# This is a typical tool definition (represented as Python dict, will be converted to JSON eventually)\n\n tool_definition ={\n\n"type": "function",\n\n"function": {\n\n"name": "get_weather",# function name\n\n"description": "Query the weather for a specified city on a specified date",# function purpose description, LLM will read this\n\n"parameters": {\n\n"type": "object",\n\n"properties": {\n\n"city": {\n\n"type": "string",\n\n"description": "City name, such as 'Beijing', 'Shanghai', 'Shenzhen'",\n\n},\n\n"date": {\n\n"type": "string",\n\n"description": "Date, format as YYYY-MM-DD, such as '2024-06-18'",\n\n},\n\n"unit": {\n\n"type": "string",\n\n"enum": ["celsius","fahrenheit"],\n\n"description": "Temperature unit, Celsius or Fahrenheit, default is Celsius",\n\n},\n\n},\n\n"required": ["city","date"],# required parameters\n\n},\n\n},\n\n}\n\n# Another tool: search\n\n search_tool ={\n\n"type": "function",\n\n"function": {\n\n"name": "web_search",\n\n"description": "Search the internet for latest information, suitable for news, real-time data, unknown knowledge",\n\n"parameters": {\n\n"type": "object",\n\n"properties": {\n\n"query": {\n\n"type": "string",\n\n"description": "Search keyword or question",\n\n},\n\n"num_results": {\n\n"type": "integer",\n\n"description": "How many results to return, default 5",\n\n"default": 5,\n\n},\n\n},\n\n"required": ,\n\n},\n\n},\n\n}\n\n# Another tool: calculator\n\n calculator_tool ={\n\n"type": "function",\n\n"function": {\n\n"name": "calculate",\n\n"description": "Perform mathematical calculations, supports addition, subtraction, multiplication, division, power operations, etc.",\n\n"parameters": {\n\n"type": "object",\n\n"properties": {\n\n"expression": {\n\n"type": "string",\n\n"description": "Mathematical expression, such as '25 * 4 + 10', 'sqrt(16)'",\n\n},\n\n},\n\n"required": ,\n\n},\n\n},\n\n}\n\nprint(f"Defined {len([tool_definition, search_tool, calculator_tool])} tools: get_weather, web_search, calculate")\n\n# Output: Defined 3 tools: get_weather, web_search, calculate\n\nKey points:\n\n* description is very importantβthe LLM relies on this description to understand what this tool does and when to use it.\n\n* Parameters should also be clearly describedβfor example, unit has enum constraints, so the LLM knows it can only choose from these two values.\n\n* required marks mandatory parametersβthe LLM will ensure these parameters have values.\n\n### Code Practice: Adding Search and Calculation Tools to AI\n\nNow let's write a complete, runnable example.\n\nFor demonstration, we use Python to simulate the complete Function Calling flow.\n\n## Example\n\n# ============================================\n\n# A simplified Agent tool calling demonstration\n\n# No real API Key needed, using simulated data for demonstration\n\n# ============================================\n\nimport json\n\nimport random\n\nfrom typing import Dict, Any, List, Optional\n\nclass SimpleAgent:\n\n"""A simple Agent demonstration class"""\n\ndef __init__ (self):\n\n# Register available tools\n\nself.tools=self._define_tools()\n\n# Conversation history (memory)\n\nself.messages: List=[]\n\ndef _define_tools(self) -> List:\n\n"""Define all available tools"""\n\nreturn[\n\n{\n\n"name": "web_search",\n\n"description": "Search the internet to get latest information",\n\n"parameters": {\n\n"query": {"type": "string","description": "Search keyword"},\n\n},\n\n"required": ,\n\n},\n\n{\n\n"name": "calculate",\n\n"description": "Perform mathematical calculations",\n\n"parameters": {\n\n"expression": {"type": "string","description": "Mathematical expression"},\n\n},\n\n"required": ,\n\n},\n\n{\n\n"name": "get_weather",\n\n"description": "Query weather",\n\n"parameters": {\n\n"city": {"type": "string","description": "City name"},\n\n"date": {"type": "string","description": "Date"},\n\n},\n\n"required": ["city","date"],\n\n},\n\n]\n\ndef _call_tool(self, tool_name: str, parameters: Dict[str, Any]) ->str:\n\n"""(Simulated) call tool and return result"""\n\nprint(f" {tool_name}({parameters})")\n\nif tool_name =="web_search":\n\n query = parameters\n\n# Simulate search results\n\n results ={\n\n"tutorial": "Tutorial () is a programming learning website providingA large number of programming tutorials and examples.",\n\n"2024Year6Month18Daily Beijing Weather": "Beijing 2024-06-18: Sunny, 25Β°C, humidity 45%.",\n\n"Shanghai population": "Shanghai 2024 permanent resident population about 24.89 million.",\n\n}\n\nreturn results.get(query, f"Search results: No exact information found for '{query}', this is a simulated result.")\n\nelif tool_name =="calculate":\n\n expr = parameters\n\ntry:\n\n# Note: don't use eval in production, this is just for demonstration\n\n result =eval(expr,{"__builtins__": None},{\n\n"sqrt": lambda x: x**0.5,\n\n"pow": pow,\n\n})\n\nreturn f"Calculation result: {expr} = {result}"\n\nexcept Exception as e:\n\nreturn f"Calculation error: {e}"\n\nelif tool_name =="get_weather":\n\n city = parameters\n\n date = parameters\n\n# Simulate weather data\n\n weathers =["Sunny","Cloudy","Light Rain","Overcast"]\n\n temp =random.randint(15,35)\n\nreturn f"{city} {date}: {random.choice(weathers)}, {temp}Β°C"\n\nelse:\n\nreturn f"Unknown tool: {tool_name}"\n\ndef _decide_action(self, user_input: str) -> Dict[str, Any]:\n\n"""\n\n (Simulated) LLM decision process\n\n In actual projects this would call a real LLM API\n\n """\n\n# Here using simple rules for simulation, real scenarios should use LLM\n\nif"Search"in user_input or"Look it up"in user_input or"What is"in user_input:\n\n# Extract search term (simple simulation)\n\n query = user_input.replace("Search","").replace("Look it up","").replace("What is","").strip()\n\nif not query:\n\n query ="tutorial"\n\nreturn{\n\n"action":
YouTip