Deep dive into the technical foundations: planning, memory, tool use, and the reasoning loop that makes agents work at scale. Understand how to build robust, production-ready agent systems.
Every functional agent rests on four core components. These aren't optional — without any one of them, your agent will fail or behave unpredictably. Let's build a mental model of how they interact.
Planning & Reasoning: the agent's ability to break a complex goal into subtasks and decide the next action. This is where the LLM's reasoning shines. It's the "thinking" phase.
Memory: short-term (conversation), long-term (vector database), and working (scratchpad). Agents need memory to avoid infinite loops and maintain context.
Tools: external functions the agent can invoke, such as APIs, databases, calculators, and web search. Tools ground the agent in reality.
Action Execution: the framework that calls tools, parses results, and feeds them back to the agent. It's the "doing" phase.
User Input → Planning (LLM decides what to do) → Tool Selection → Action Execution → Result Processing → Memory Update → Loop back to Planning (if goal not met). This cycle is the agent.
ReAct stands for "Reasoning + Acting". It's a proven pattern where the agent generates a reasoning step (what to do), then an action step (which tool to use), then observes the result. This cycle is remarkably effective.
def react_loop(user_input: str) -> str:
    # Build the context with tools available
    available_tools = ["web_search", "calculator", "code_executor"]
    # Keep track of the thought-action-observation history
    history = []
    goal_reached = False
    while not goal_reached:
        # 1. THOUGHT: What should I do?
        thought = llm.generate(
            f"Given {user_input} and my tools {available_tools}, what's my next step?"
        )
        history.append({"type": "thought", "content": thought})
        # 2. ACTION: Execute the tool
        tool_name, tool_args = parse_action(thought)
        result = execute_tool(tool_name, tool_args)
        history.append({"type": "action", "tool": tool_name})
        # 3. OBSERVATION: What happened?
        history.append({"type": "observation", "content": result})
        # Check if we're done
        goal_reached = check_goal(result)
    return synthesize_final_answer(history)
By separating thought from action, the model becomes more explicit about its reasoning. This makes debugging easier and helps the model avoid mistakes. It's slower than single-shot responses but far more reliable for complex tasks.
Agents need different types of memory to function effectively. Think of it like human memory: short-term (what you're thinking right now), long-term (facts you learned long ago), and working (your notepad).
Short-term: the conversation history. Recent messages and tool results. Limited to the context window (4K-200K tokens). Cleared each session.
Long-term: persistent knowledge. Vector databases for semantic search. Facts learned across sessions. Can be enormous (millions of documents).
Working: a scratchpad for the current task. Intermediate results, reasoning chains, state variables. Kept in short-term but separate from the conversation.
Example: Short: last 10 messages. Long: customer history in a vector DB. Working: current subtask list and iteration count.
class AgentMemory:
    def __init__(self):
        # Short-term: conversation history
        self.conversation = []
        # Long-term: vector database
        self.vector_db = VectorDatabase()
        # Working: current task state
        self.scratchpad = {
            "current_task": None,
            "subtasks_remaining": [],
            "iteration_count": 0,
            "max_iterations": 10
        }

    def add_observation(self, text: str):
        self.conversation.append({"role": "observation", "content": text})
        self.vector_db.add(text)  # Also store for long-term retrieval

    def get_context(self, query: str) -> str:
        # Combine short-term + relevant long-term
        recent = self.conversation[-10:]  # Last 10 messages
        relevant = self.vector_db.search(query, top_k=5)
        return format_context(recent + relevant)
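The memory class above leans on an external `VectorDatabase`. To see the hybrid short-term + long-term lookup run end to end, here is a toy stand-in (purely illustrative: real systems rank by embedding similarity, not keyword overlap):

```python
# Toy stand-in for a vector database, so the retrieval step can run
# without external dependencies. Keyword overlap is only a placeholder
# for real semantic (embedding-based) search.
class ToyVectorDB:
    def __init__(self):
        self.docs = []

    def add(self, text: str) -> None:
        self.docs.append(text)

    def search(self, query: str, top_k: int = 5) -> list[str]:
        # Score each stored document by how many query words it shares
        q = set(query.lower().split())
        scored = [(len(q & set(d.lower().split())), d) for d in self.docs]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [d for score, d in scored[:top_k] if score > 0]

db = ToyVectorDB()
db.add("Quantum computers use qubits instead of bits")
db.add("Paris is the capital of France")
print(db.search("how do quantum computers work", top_k=1))
```

Swapping `ToyVectorDB` for a real vector database changes the ranking quality, but not the shape of the `add`/`search` interface the agent depends on.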
The agent needs to know what tools are available and when to use each one. This is solved with a tool registry pattern where each tool is described with a JSON schema.
TOOL_REGISTRY = {
    "web_search": {
        "description": "Search the web for current information",
        "parameters": {
            "query": {"type": "string", "description": "Search terms"}
        },
        "function": web_search_impl
    },
    "execute_code": {
        "description": "Run Python code safely",
        "parameters": {
            "code": {"type": "string", "description": "Python code to execute"}
        },
        "function": execute_code_impl
    }
}

def execute_tool(tool_name: str, args: dict) -> str:
    if tool_name not in TOOL_REGISTRY:
        raise ValueError(f"Tool {tool_name} not found")
    func = TOOL_REGISTRY[tool_name]["function"]
    try:
        result = func(**args)
        return result
    except Exception as e:
        return f"Error: {str(e)}"
By defining tools in JSON, the agent can read the descriptions and decide which tool to use. This is how Claude's tool use works — the model sees the schema and chooses intelligently.
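One way to hand those descriptions to the model is to serialize the registry into a list of JSON schemas. The helper below is a sketch (the `registry_to_schemas` name and the exact schema shape are assumptions, not a specific provider's API):

```python
import json

# Hypothetical helper: turn a tool registry into the JSON-schema list
# a model reads when choosing a tool. Field names are illustrative.
def registry_to_schemas(registry: dict) -> list[dict]:
    schemas = []
    for name, spec in registry.items():
        schemas.append({
            "name": name,
            "description": spec["description"],
            "input_schema": {
                "type": "object",
                "properties": spec["parameters"],
                "required": list(spec["parameters"].keys()),
            },
        })
    return schemas

# Minimal copy of a registry entry so this snippet is self-contained
TOOL_REGISTRY = {
    "web_search": {
        "description": "Search the web for current information",
        "parameters": {"query": {"type": "string", "description": "Search terms"}},
        "function": lambda query: f"results for {query}",
    },
}

print(json.dumps(registry_to_schemas(TOOL_REGISTRY), indent=2))
```

Note that the `function` callable stays server-side: only the name, description, and parameter schema are sent to the model.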
As an agent works through a task, it needs to track state: what's been done, what's pending, error counts, etc. This is critical for resumability and debugging.
agent_state = {
    "goal": "Write a report on quantum computing",
    "status": "in_progress",  # pending, in_progress, completed, failed
    "iteration": 3,
    "max_iterations": 10,
    "current_step": "searching_for_recent_papers",
    "completed_steps": ["plan_outline", "search_quantum_basics"],
    "pending_steps": ["synthesize_findings", "write_draft", "review"],
    "errors": {
        "search_failed": 1,
        "rate_limited": 0
    },
    "results": [],
    "memory_tokens_used": 3456,
    "started_at": "2025-02-17T10:30:00Z",
    "updated_at": "2025-02-17T10:35:22Z"
}
Long-running agents should persist state to disk or database. If the agent crashes, it can resume from where it left off. This is crucial for production systems handling expensive operations.
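A minimal checkpointing sketch, assuming JSON-serializable state and a local file (paths and function names here are illustrative):

```python
import json
import os

def save_state(state: dict, path: str) -> None:
    # Write to a temp file first, then rename: os.replace is atomic,
    # so a crash mid-write never leaves a half-written checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, path)

def load_state(path: str, default: dict) -> dict:
    # No checkpoint yet means this is a fresh run
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return json.load(f)
```

Calling `save_state` after every iteration means a restarted agent can `load_state` and pick up at the recorded `current_step` instead of repeating expensive tool calls.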
Production agents will fail. Tools fail, networks are unreliable, models make mistakes. Great agents have strategies to recover gracefully.
Retry: tool fails → wait (exponential backoff) → retry up to 3 times → fall back to a different tool or a human.
Replan: tool returns an unexpected result → agent re-reasons → chooses a different tool or approach.
Escalate: tool fails 3+ times → mark the step as failed → skip it → try the next step or ask a human.
Circuit breaker: tool fails repeatedly → disable it temporarily → use fallback tools → monitor for recovery.
import time

def execute_with_recovery(tool_name: str, args, max_retries=3):
    retry_count = 0
    wait_time = 1  # seconds, doubled after each failure
    while retry_count < max_retries:
        try:
            result = execute_tool(tool_name, args)
            return {"success": True, "result": result}
        except ToolError as e:
            retry_count += 1
            if retry_count < max_retries:
                time.sleep(wait_time)
                wait_time *= 2  # exponential backoff
                continue
            else:
                return {
                    "success": False,
                    "error": str(e),
                    "fallback_tool": "manual_review"
                }
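The circuit-breaker strategy can be sketched as a small class that disables a tool after repeated failures and re-enables it once a cooldown passes (class and method names are illustrative, not a standard library API):

```python
import time

# Minimal circuit breaker sketch: after `threshold` consecutive failures
# a tool is disabled for `cooldown` seconds, then allowed to retry.
class CircuitBreaker:
    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = {}    # tool_name -> consecutive failure count
        self.opened_at = {}   # tool_name -> time the breaker tripped

    def available(self, tool: str) -> bool:
        if tool not in self.opened_at:
            return True
        if time.monotonic() - self.opened_at[tool] >= self.cooldown:
            # Cooldown elapsed: close the breaker and allow one retry
            del self.opened_at[tool]
            self.failures[tool] = 0
            return True
        return False

    def record_failure(self, tool: str) -> None:
        self.failures[tool] = self.failures.get(tool, 0) + 1
        if self.failures[tool] >= self.threshold:
            self.opened_at[tool] = time.monotonic()

    def record_success(self, tool: str) -> None:
        # Any success resets the consecutive-failure count
        self.failures[tool] = 0
```

Before calling `execute_with_recovery`, the agent checks `breaker.available(tool_name)` and routes to a fallback tool when the breaker is open, which stops a flaky dependency from eating the whole retry budget.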
Good agents degrade gracefully. If a search tool fails, try a cached version. If vision fails, ask the user. If code execution times out, simplify the problem. Graceful degradation keeps agents useful even when things break.
Let's see how all four pillars connect in a complete, functional example. This is the mental model you need to build production agents.
class ReasoningAgent:
    def __init__(self, llm):
        self.llm = llm  # The model that drives reasoning and goal-checking
        self.memory = AgentMemory()
        self.tools = TOOL_REGISTRY
        self.state = {"status": "idle"}
        self.iteration_count = 0

    def run(self, user_goal: str):
        self.state["goal"] = user_goal
        self.state["status"] = "in_progress"
        while (self.iteration_count < 10
               and self.state["status"] == "in_progress"):
            # 1. PERCEIVE: Gather context
            context = self.memory.get_context(user_goal)
            # 2. REASON: What to do next?
            thought = self.llm.generate(
                f"Goal: {user_goal}\nContext: {context}\nWhat's next?"
            )
            # 3. ACT: Execute tool
            tool_name, tool_args = parse_tool(thought)
            result = execute_with_recovery(tool_name, tool_args)
            # 4. LEARN: Update memory and state
            self.memory.add_observation(str(result))  # store outcome (success or error) as text
            self.iteration_count += 1
            # Check if goal is reached
            done = self.llm.judge(
                f"Is this goal achieved? {user_goal}\nEvidence: {result}"
            )
            if done:
                self.state["status"] = "completed"
        return synthesize_answer(self.memory)
This simple architecture includes everything: planning (thought), tool use (action), error recovery, state tracking, and goal checking. Real production systems add more layers (logging, monitoring, caching), but this core loop is universal.
1. What does ReAct stand for?
2. Which memory type is best for semantic search across millions of documents?
3. What is a tool registry used for?
4. In the state management pattern, what does "completed_steps" track?
Agent architecture rests on four pillars: Planning & Reasoning (the ReAct pattern), Memory (short-term/long-term/working), Tools (a registry of available functions), and Action Execution (the framework that calls tools safely). ReAct (Reasoning + Acting) separates the thinking phase from the doing phase, making agents more reliable. Memory systems prevent infinite loops and enable knowledge reuse. Tool registries let agents choose what to do next intelligently. State management enables resumability and debugging. Error recovery strategies (retry, replan, escalate, circuit breaker) keep agents functional under stress.
Next up → Topic 14: Agent Frameworks
Now you'll explore existing frameworks (LangChain, CrewAI, Claude SDK) that implement these architectures for you.