The first figure is structural anatomy (modules you can split across prompts and services). The second is the runtime control loop — how those pieces turn over time in one agent. The third is multi-agent coordination.
Most "agent" conversations stop at the model. That misses the architecture.
An agent is a loop: plan, act, remember, and reason — often with tools in the middle. A *system* of agents adds another layer: who owns which slice of the problem, how context moves between actors, and what happens when something breaks.
This post is a reading-friendly synthesis of that stack, with sources inline so you can chase the originals.
The single-agent core
Planning loop
Agents tackle work by decomposing goals, ordering subtasks, executing, and re-planning when the world pushes back. Two patterns show up everywhere:
- ReAct-style interleaving — reasoning traces and actions alternate so the model can adjust as new observations arrive. Formalized in Yao et al., *ReAct: Synergizing Reasoning and Acting in Language Models* (ICLR 2023).
- Plan–then–execute — lay out a longer sequence up front, then run it with checkpoints. Fits approval gates and auditable workflows; complements ReAct rather than replacing it.
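To make the interleaving concrete, here is a minimal sketch of a ReAct-style loop. It is an illustration under stated assumptions, not the paper's implementation: `scripted_policy` is a hypothetical stand-in for a model call, and the `TOOLS` registry and its `lookup` tool are invented for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: tool name -> function of one string argument.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup": lambda q: {"capital of France": "Paris"}.get(q, "unknown"),
}

OBS_PREFIX = "Observation: "
Policy = Callable[[List[str]], Tuple[str, str, str]]


def scripted_policy(history: List[str]) -> Tuple[str, str, str]:
    """Stand-in for a model call; returns (thought, action, argument)."""
    observations = [h for h in history if h.startswith(OBS_PREFIX)]
    if not observations:
        return ("I should look this up.", "lookup", "capital of France")
    return ("I have what I need.", "finish", observations[-1][len(OBS_PREFIX):])


def react_loop(question: str, policy: Policy, max_steps: int = 5) -> Tuple[str, List[str]]:
    """Alternate thought -> action -> observation until the policy finishes."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = policy(history)
        history.append(f"Thought: {thought}")
        if action == "finish":
            return arg, history
        tool = TOOLS.get(action)
        # An unknown tool becomes an observation the policy can react to.
        observation = tool(arg) if tool else f"unknown tool {action!r}"
        history.append(f"Action: {action}[{arg}]")
        history.append(f"{OBS_PREFIX}{observation}")
    return "no answer within step budget", history


answer, trace = react_loop("What is the capital of France?", scripted_policy)
print(answer)
for line in trace:
    print(line)
```

The `max_steps` budget is the piece worth keeping in any real version: without it, a policy that never emits `finish` loops forever.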
Microsoft's public framing of factory-style agent patterns — decomposition, tooling, guardrails — is a useful industry anchor: *"Agent Factory: The new era of agentic AI — common use cases and design patterns"* (Azure blog).
Memory is not one thing
Roughly four layers show up in real designs:
| Layer | What it is | Why it matters |
|---|---|---|
| Parametric knowledge | Weights | Static "facts of training"; slow to change. |
| Working context | Current messages / state | What the model sees *right now*. |
| Session memory | Task variables, scratchpad | Keeps one run coherent. |
| Persistent memory | Stores you control (vector DB, logs, CRM) | Personalization and long-horizon tasks. |
When teams blur those layers, handoffs get expensive and reasoning drifts. Architectures like MemGPT (Packer et al., 2023) make the hierarchy explicit — treating context management almost like an OS — which is a good mental model even if you do not ship their exact design.
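One way to keep the layers from blurring is to give each its own field and its own lifetime. The sketch below is a simplification loosely inspired by the paging idea, not MemGPT's actual design; the class name `AgentMemory`, the `archived:` key scheme, and the tiny context limit are all assumptions for illustration. Parametric knowledge lives in the model weights, so it is deliberately absent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentMemory:
    # Working context: what the model sees right now (bounded).
    working_context: List[str] = field(default_factory=list)
    # Session memory: task variables that keep one run coherent.
    session: Dict[str, str] = field(default_factory=dict)
    # Persistent memory: a store you control (a dict stands in for a real DB).
    persistent: Dict[str, str] = field(default_factory=dict)
    max_context_items: int = 3

    def observe(self, message: str) -> None:
        """Add a message; page the oldest items out to persistent storage."""
        self.working_context.append(message)
        while len(self.working_context) > self.max_context_items:
            evicted = self.working_context.pop(0)
            self.persistent[f"archived:{len(self.persistent)}"] = evicted

    def end_session(self) -> None:
        """Working context and session state expire; persistent memory survives."""
        self.working_context.clear()
        self.session.clear()


memory = AgentMemory()
memory.session["ticket_id"] = "T-1"
for i in range(5):
    memory.observe(f"message {i}")
memory.end_session()
```

After the run, the two oldest messages sit in `persistent`, while the working context and session variables are gone, which is the boundary the table describes.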
Tools and the outer world
Tooling is the bridge from language to systems: APIs, databases, browsers, human approvals. Good tool layers add schemas, timeouts, retries, cost limits, and structured return formats so the reasoning loop can recover instead of hallucinating success. Vellum's write-up on workflow shapes is a practical catalog: *Agentic workflows: emerging architectures and design patterns*.
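As a rough sketch of such a tool layer, the wrapper below adds a timeout, bounded retries, and a structured result, so the reasoning loop sees an explicit failure rather than guessing. `ToolResult` and `call_tool` are hypothetical names, and the thread-based timeout is one assumption among several possible approaches: it stops waiting on a slow call but does not kill the worker thread.

```python
import concurrent.futures
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToolResult:
    ok: bool
    value: Any = None
    error: Optional[str] = None
    attempts: int = 0


def call_tool(fn: Callable[[], Any], timeout_s: float = 1.0, max_attempts: int = 3) -> ToolResult:
    """Run fn with a per-attempt timeout; return a structured result either way."""
    last_error = "not attempted"
    for attempt in range(1, max_attempts + 1):
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(fn)
        try:
            value = future.result(timeout=timeout_s)
            return ToolResult(ok=True, value=value, attempts=attempt)
        except concurrent.futures.TimeoutError:
            last_error = f"timed out after {timeout_s}s"
        except Exception as exc:  # the tool itself raised
            last_error = f"{type(exc).__name__}: {exc}"
        finally:
            # Do not block on a stuck worker; it keeps running in the background.
            pool.shutdown(wait=False)
    return ToolResult(ok=False, error=last_error, attempts=max_attempts)


calls = {"n": 0}


def flaky_tool() -> dict:
    """Fails twice, then succeeds, to exercise the retry path."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("upstream unavailable")
    return {"status": "ok"}


result = call_tool(flaky_tool)
print(result)
```

Cost limits and schema validation would sit in the same wrapper; they are left out here to keep the sketch short.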
Reasoning mechanisms
- Chain-of-thought — elicit intermediate steps before an answer. Wei et al., 2022 showed strong gains on arithmetic and reasoning-style benchmarks by prompting for step-by-step work.
- Retrieve–then–reason — generate sub-questions, pull evidence, iterate. This is the honest version of "RAG done right": reasoning grounded in sources, not vibes.
- Explicit decision scaffolding — goals, options, and constraints written so the model (and humans) can audit the path.
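The retrieve-then-reason pattern can be sketched in a few lines. Everything here is a toy under stated assumptions: the two-document `CORPUS`, the keyword-overlap `retrieve` function, and the hand-written sub-questions stand in for a real index, a real retriever, and model-generated decomposition. What the sketch shows is that every piece of evidence keeps its source id, and questions with no support are reported instead of being answered anyway.

```python
from typing import Dict, List, Tuple

# Toy corpus; a real system would query a search index or vector store.
CORPUS: Dict[str, str] = {
    "doc-1": "ReAct interleaves reasoning traces with actions.",
    "doc-2": "MemGPT manages context with an OS-like memory hierarchy.",
}


def retrieve(query: str, corpus: Dict[str, str], k: int = 1) -> List[Tuple[str, str]]:
    """Rank documents by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in corpus.items():
        words = set(text.lower().replace(".", "").split())
        overlap = len(terms & words)
        if overlap:
            scored.append((overlap, doc_id, text))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(doc_id, text) for _, doc_id, text in scored[:k]]


def gather_evidence(sub_questions: List[str]) -> Dict[str, list]:
    """Collect attributed evidence per sub-question; track what found nothing."""
    evidence, unanswered = [], []
    for question in sub_questions:
        hits = retrieve(question, CORPUS)
        if hits:
            evidence.extend((question, doc_id, text) for doc_id, text in hits)
        else:
            unanswered.append(question)
    return {"evidence": evidence, "unanswered": unanswered}


report = gather_evidence([
    "what does react interleave",
    "how does memgpt manage context",
    "quantum gravity",
])
print(report)
```

The `unanswered` list is the useful part for auditing: it tells the next step which claims have no source behind them.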
Multi-agent systems: where leverage actually lives
Frontier models will keep improving. Coordination — roles, protocols, degradation — is the part that does not commoditize overnight.
A useful mental model from product positioning: most multi-agent stacks are just tools running in parallel until you add role clarity, handoffs, and failure handling. With those three in place, they behave like one system instead of a chatty pile of automations.
Three pillars
1. Role clarity: each agent needs a non-overlapping job, with defined constraints, interfaces, and expertise boundaries. Overlap creates contradictory advice and duplicated work.
2. Handoffs: the next agent must receive *scoped* context (summaries, structured payloads, attributed tool results), not a raw dump and not nothing. The CrewAI community thread *Passing context between agents* shows a concrete pattern.
3. Failure handling: define what happens on low confidence, tool errors, or stale data, whether that is an alternate model, a different specialist, a cached answer, or human escalation. Without this, one weak step stalls the entire workflow.
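Handoffs and failure handling can be sketched together: a structured handoff payload, plus an ordered fallback chain that ends in human escalation. The names `Handoff`, `StepResult`, `run_with_fallbacks`, and the three example handlers are hypothetical, and the 0.7 confidence floor is an arbitrary assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Handoff:
    from_agent: str
    to_agent: str
    summary: str  # scoped context, not the raw transcript
    tool_results: List[Tuple[str, str]] = field(default_factory=list)  # (tool, result)


@dataclass
class StepResult:
    answer: Optional[str]
    confidence: float
    handled_by: str


Handler = Callable[[Handoff], Tuple[str, float]]


def run_with_fallbacks(handoff: Handoff, handlers: List[Tuple[str, Handler]],
                       min_confidence: float = 0.7) -> StepResult:
    """Try handlers in order; skip errors and low-confidence answers."""
    for name, handler in handlers:
        try:
            answer, confidence = handler(handoff)
        except Exception:
            continue  # tool error: move on to the next fallback
        if confidence >= min_confidence:
            return StepResult(answer, confidence, name)
    # Nothing cleared the bar, so hand the case to a person.
    return StepResult(None, 0.0, "human_escalation")


def primary(h: Handoff) -> Tuple[str, float]:
    raise TimeoutError("tool error")


def specialist(h: Handoff) -> Tuple[str, float]:
    return ("possibly a refund issue", 0.4)


def cached(h: Handoff) -> Tuple[str, float]:
    return (f"cached answer for: {h.summary}", 0.9)


handoff = Handoff("triage", "billing", "customer disputes invoice",
                  [("crm_lookup", "account in good standing")])
result = run_with_fallbacks(handoff, [("primary_model", primary),
                                      ("specialist", specialist),
                                      ("cached_answer", cached)])
print(result)
```

The `handled_by` field matters as much as the answer: it records which fallback actually produced the result, so a run that quietly degraded to a cache is visible afterwards.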
Routing as traffic control
Routers map task type, confidence, and capacity to the right specialist. Patterns include rules, learned policies, and event-driven triggers — with logging so you can audit *why* a route happened.
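A rules-based router with an audit trail might look like the sketch below. It is illustrative only: the `Router` class, the `billing_agent` and `generalist` names, and the 0.6 confidence threshold are assumptions, and a learned policy or event-driven trigger would replace the `if` chain in a real system.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RouteDecision:
    task_type: str
    chosen: str
    reason: str


class Router:
    def __init__(self, specialists: Dict[str, str], capacity: Dict[str, int],
                 fallback: str, min_confidence: float = 0.6) -> None:
        self.specialists = specialists  # task type -> specialist name
        self.capacity = capacity        # specialist name -> free slots
        self.fallback = fallback
        self.min_confidence = min_confidence
        self.audit_log: List[RouteDecision] = []

    def route(self, task_type: str, confidence: float) -> str:
        """Pick a specialist and record why that choice was made."""
        target = self.specialists.get(task_type)
        if target is None:
            chosen, reason = self.fallback, "no rule for task type"
        elif confidence < self.min_confidence:
            chosen, reason = self.fallback, "classifier confidence too low"
        elif self.capacity.get(target, 0) <= 0:
            chosen, reason = self.fallback, f"{target} has no free capacity"
        else:
            self.capacity[target] -= 1
            chosen, reason = target, "matched rule"
        self.audit_log.append(RouteDecision(task_type, chosen, reason))
        return chosen


router = Router({"billing": "billing_agent"}, {"billing_agent": 1}, fallback="generalist")
choices = [
    router.route("billing", 0.9),  # matches the rule, takes the only slot
    router.route("billing", 0.9),  # same rule, but capacity is now exhausted
    router.route("legal", 0.9),    # no rule exists
    router.route("billing", 0.3),  # low classifier confidence
]
for decision in router.audit_log:
    print(decision)
```

Every call appends to `audit_log`, including fallbacks, so the question "why did this go to the generalist?" has a recorded answer.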
RopMura is a recent academic example: a multi-hop QA stack with routing and specialist agents; the authors report large improvements over single-agent baselines on their multi-hop settings (see their experiments for exact numbers).
Orchestration patterns
Salesforce documents a mainstream enterprise version of the orchestrator idea: a primary agent routes to secondary specialists via the Atlas reasoning engine. See *Agentforce multi-agent orchestration* and the deeper pattern guide *Enterprise agentic architecture and design patterns*. Their materials also emphasize open protocols (A2A, MCP) for interoperability, which target the same question raised above: how do handoffs stay standard across vendors?
Production-shaped examples
Security operations — ContraForce's Security Delivery Agents automate investigation steps (context from logs, enrichment, governed response) with phased adoption and human-in-the-loop controls. Product overview: *Introducing Security Delivery Agents*. Platform framing: agentic security delivery.
Multi-hop QA — RopMura: router + planner + specialists + synthesis; useful if you want a paper trail for *why* routing beats a monolithic prompt on complex questions.
Enterprise orchestration — Salesforce Agentforce: primary agent, delegated specialists, Atlas routing — a reference for how vendors talk about governance and scale.
Design principles worth enforcing
Separability — agents should be testable in isolation; failures become diagnosable.
Transparency — log handoffs, routes, and tool decisions. Audits are not optional in production.
Graceful degradation — partial results beat total stalls; define fallbacks explicitly.
Feedback loops — measure outcomes and feed them back into routing and prompts.
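The feedback-loop principle can start very small: record an outcome per route and flag routes whose success rate drops. The sketch below assumes a hypothetical `OutcomeTracker` class, a binary success signal, and arbitrary thresholds; real systems would likely use richer outcome measures.

```python
from collections import defaultdict
from typing import Dict, List


class OutcomeTracker:
    def __init__(self) -> None:
        self._outcomes: Dict[str, List[bool]] = defaultdict(list)

    def record(self, route: str, success: bool) -> None:
        self._outcomes[route].append(success)

    def success_rate(self, route: str) -> float:
        results = self._outcomes.get(route, [])
        return sum(results) / len(results) if results else 0.0

    def routes_needing_review(self, threshold: float = 0.8, min_samples: int = 3) -> List[str]:
        """Routes with enough data and a success rate below the threshold."""
        return sorted(
            route for route, results in self._outcomes.items()
            if len(results) >= min_samples and sum(results) / len(results) < threshold
        )


tracker = OutcomeTracker()
for ok in (True, True, False):
    tracker.record("billing_agent", ok)
for ok in (True, True, True):
    tracker.record("research_agent", ok)
flagged = tracker.routes_needing_review()
print(flagged)
```

The `min_samples` guard keeps a single early failure from flagging a route before there is enough data to judge it.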
Why this matters for builders
Models will keep getting cheaper and sharper. The durable craft is systems: memory boundaries, tool contracts, routing, and the social contract between agents (who decides, who escalates, what is logged).
If you are implementing, start by drawing roles, data passed at handoff, and three failure modes — then reach for the patterns above and the linked papers and vendor docs when you need receipts.
Sources and further reading
- Yao, S. et al. (2023). *ReAct: Synergizing Reasoning and Acting in Language Models.* arXiv:2210.03629
- Wei, J. et al. (2022). *Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.* arXiv:2201.11903
- Packer, C. et al. (2023). *MemGPT: Towards LLMs as Operating Systems.* arXiv:2310.08560
- Microsoft Azure (blog). *Agent Factory: The new era of agentic AI — common use cases and design patterns.* Azure announcement
- Vellum. *Agentic workflows: emerging architectures and design patterns.* Vellum article
- CrewAI community. *Passing context between agents.* Forum thread
- RopMura (multi-agent QA). arXiv HTML
- Salesforce. *Agentforce multi-agent orchestration.* Salesforce
- Salesforce Architects. *Enterprise agentic architecture and design patterns.* Architect guide
- ContraForce. *Introducing Security Delivery Agents.* Product blog
- ContraForce. *Agentic Security Delivery Platform.* Product page
Draft expanded from internal research notes (2026-04-10); this published version prioritizes citable public sources.