Definitions & Mental Models
Agentic AI
An autonomous reasoning loop where an LLM plans, takes actions, observes outcomes, and iterates toward a goal — without a fixed, pre-specified execution path.
- 🔁Think → Act → Observe loop
- 🎯Goal-directed, not instruction-directed
- 🛠️Decides what tools to use and when
- 📐Can decompose multi-step tasks
- 🧩Framework: LangGraph, AutoGen, CrewAI
MCP (Model Context Protocol)
A standardized protocol (by Anthropic) that defines how LLMs connect to external tools, data sources, and services — a universal plug-in interface.
- 🔌Protocol, not a framework
- 📦Exposes: Tools, Resources, Prompts
- 🌐Transport: stdio, HTTP/SSE
- 📋JSON-RPC 2.0 message format
- 🏗️Hosts: Claude Desktop, custom apps
The Key Mental Model
Agentic AI is the behavior — MCP is the plumbing. Agentic AI describes how an LLM reasons autonomously across multiple steps. MCP describes how the LLM accesses external capabilities. You can have agentic AI without MCP (raw function calls), and MCP without agentic AI (single-turn tool use).
Core Primitives
Agent
An LLM + instructions + tools that can autonomously execute multi-step workflows. Has its own memory, reasoning chain, and decision loop.
MCP Tool
A callable function exposed by an MCP server. The LLM can discover, invoke, and get results from tools. Tools have JSON Schema inputs.
MCP Resource
Read-only data the model can access (files, DB rows, API responses). Unlike tools, resources don't perform actions — they provide context.
MCP Prompt
Pre-built, parameterized prompt templates exposed by an MCP server. Useful for standardizing how an LLM interacts with a specific domain.
Tool Call (native)
Raw function-calling in OpenAI/Anthropic APIs. Not standardized — each integration is bespoke. Predecessor pattern to MCP.
Orchestrator
The system that manages agent state, routes between agents, and aggregates results. Can be LangGraph, custom state machine, or a parent agent.
Core Differences
| Dimension | Agentic AI | MCP |
|---|---|---|
| Nature | Behavioral paradigm / system design | Communication protocol / standard |
| Scope | End-to-end task completion, reasoning loops | How LLMs connect to external capabilities |
| Decision-making | LLM decides what to do and when | Protocol delivers how to access a capability |
| State | Maintains task state, memory, context across steps | Stateless per call (server can be stateful) |
| Composability | Agents can spawn sub-agents (multi-agent) | Servers can be composed by listing multiple |
| Failure handling | Agent retries, replans, backtracks | Protocol errors → caller handles |
| Auth | Embedded in agent design (API keys, tokens) | OAuth 2.1 / bearer tokens in protocol headers |
| Portability | Framework-specific (LangGraph ≠ CrewAI) | Any MCP host can use any MCP server |
| Latency | Multi-step = higher end-to-end latency | Single server roundtrip per tool call |
| Cost | Multiple LLM calls accumulate fast | N/A (protocol layer, not LLM calls) |
Architecture Patterns
MCP Architecture
Agentic Loop Architecture (ReAct Pattern)
(plan step)
(function call)
(result)
(done?)
Loop continues
If "done?" → No, the arrow from "LLM Evaluates" loops back to "LLM Thinks" with the new observation appended to context. This is the fundamental agentic loop.
Multi-Agent Architecture
# Orchestrator → Specialist Agent pattern from langgraph.graph import StateGraph, END from langgraph.prebuilt import create_react_agent # Specialist agents (each has focused tools) researcher = create_react_agent(llm, tools=[web_search, arxiv_search]) writer = create_react_agent(llm, tools=[draft_doc, format_output]) validator = create_react_agent(llm, tools=[fact_check, cite_sources]) def router(state): # Orchestrator decides next node if state["phase"] == "research": return "researcher" if state["phase"] == "write": return "writer" if state["phase"] == "validate": return "validator" return END graph = StateGraph(AgentState) graph.add_node("researcher", researcher) graph.add_node("writer", writer) graph.add_node("validator", validator) graph.add_conditional_edges("orchestrator", router)
When to Use Which
| Scenario | Use | Why |
|---|---|---|
| Simple Q&A with a database lookup | MCP Only | Single tool call, no planning needed. MCP server wraps the DB query. |
| Research a topic and write a report | Agentic | Requires iterative search → synthesize → draft → refine loops. |
| Standardize tool access across teams | MCP Only | Protocol portability. One MCP server, many host applications. |
| Execute a multi-step manufacturing workflow | Both | Agent orchestrates logic; MCP servers expose ERP/MES APIs. |
| IDE code completion + context | MCP Only | File system + repo resources via MCP. No autonomous loop needed. |
| Autonomous bug fix with PR submission | Both | Agent plans fix; MCP tools for GitHub, file I/O, test runners. |
| LLM-as-Judge evaluation pipeline | Agentic | Judge LLM needs to reason across criteria, aggregate scores, decide. |
| Real-time customer support bot | Both | Lightweight agent loop + MCP tools for CRM, ticketing, KB search. |
| BOM data enrichment pipeline | Agentic | Multi-stage transform (classify → enrich → validate) with branching. |
| Expose internal data to Claude Desktop | MCP Only | MCP server is the right primitive for exposing context to a host. |
Decision Flowchart
// Ask these questions in order:
The Hybrid Pattern: Agentic AI + MCP
The most production-ready systems combine both. The agent provides reasoning and orchestration; MCP provides standardized, portable, discoverable tool access.
from langchain_mcp_adapters.client import MultiServerMCPClient from langgraph.prebuilt import create_react_agent from langchain_anthropic import ChatAnthropic # Connect to MCP servers (tools discovered automatically) client = MultiServerMCPClient({ "filesystem": {"command": "npx", "args": ["@modelcontextprotocol/server-filesystem", "/data"]}, "postgres": {"url": "http://localhost:8001/sse"}, "slack": {"url": "http://localhost:8002/sse"}, }) async def run_agent(task: str): # Tools are auto-discovered from MCP servers tools = await client.get_tools() agent = create_react_agent( ChatAnthropic(model="claude-opus-4-5"), tools=tools, state_modifier="You are a factory planning assistant..." ) async for chunk in agent.astream({"messages": [("user", task)]}): print(chunk) # streamed agent steps
Why this pattern wins
Agent logic stays in your code. Tool implementations stay in MCP servers. Swapping an MCP server (e.g., upgrading your Postgres MCP) doesn't change the agent. Swapping the agent (e.g., from LangGraph to custom) doesn't change the tools.
Agentic AI Implementation Patterns
The most common pattern. Interleaves Reasoning and Acting. The LLM generates a thought, picks an action, gets an observation, repeats.
Thought: I need to find the TAKT time for Station 3.
Action: query_database(table="stations", filter="id=3")
Observation: {"id":3, "cycle_time":27, "unit":"seconds"}
Thought: Cycle time is 27s. I need to compare with demand.
Action: calculate_takt(demand=1000, shift_hours=8)
Observation: {"takt_time": 28.8, "unit": "seconds"}
Thought: Station 3 (27s) is under TAKT (28.8s). OK.
Final Answer: Station 3 is within capacity.
Two-phase: an LLM Planner creates a structured task list, then an Executor works through each step. Better for complex, predictable workflows.
# Phase 1: Planner generates structured steps plan = planner_llm.invoke({ "task": "Validate machine selection for bearing line" }) # plan.steps = ["1. Fetch BOM", "2. Get TAKT", "3. Match machines", "4. Validate"] # Phase 2: Executor runs each step with tools results = [] for step in plan.steps: result = executor_agent.run(step, context=results) results.append(result) if result.should_replan: plan = planner_llm.replan(plan, results) # replan if stuck
An Orchestrator agent routes subtasks to Specialist agents. Each specialist has a focused system prompt and tool set. Great for domain isolation.
Trust Boundary Warning
In multi-agent systems, the orchestrator should not blindly trust sub-agent outputs. Validate outputs structurally (Pydantic) before feeding to the next agent. An adversarial tool result could inject malicious instructions.
Supervisor Pattern
One orchestrator LLM decides which agent to invoke next. Clean but bottlenecked on the orchestrator.
Swarm / Handoff
Agents pass control peer-to-peer. More flexible, harder to reason about execution paths.
Hierarchical
Multiple layers of orchestrators. Use when tasks decompose into deeply nested subtasks.
Agent reflects on its own failures, generates critique, and retries. Particularly powerful for code generation and factual tasks.
for attempt in range(max_retries): output = agent.generate(task) evaluation = evaluator_llm.score(output, rubric) if evaluation.score >= threshold: break reflection = reflector_llm.critique( task=task, output=output, score=evaluation ) task = task + "\n\nPrevious attempt critique:\n" + reflection
MCP Implementation Guide
Minimal MCP Server (Python / FastMCP)
from fastmcp import FastMCP from pydantic import BaseModel mcp = FastMCP("factory-tools", description="NeoFAB manufacturing tools") # ── Tool: callable action ── @mcp.tool() def calculate_takt_time(demand_per_day: int, shift_hours: float = 8.0) -> dict: """Calculate TAKT time in seconds given daily demand and shift hours.""" available_seconds = shift_hours * 3600 takt = available_seconds / demand_per_day return {"takt_seconds": round(takt, 2), "demand": demand_per_day} # ── Resource: read-only data ── @mcp.resource("machines://catalog") def get_machine_catalog() -> str: """Returns the full machine database as JSON.""" return load_machine_db().to_json() # ── Prompt template ── @mcp.prompt() def validate_line_prompt(line_name: str, takt: float) -> str: return f"Validate machine selections for {line_name} with TAKT={takt}s..." if __name__ == "__main__": mcp.run(transport="stdio") # or transport="sse" for remote
MCP Server via HTTP/SSE (remote / multi-tenant)
# Production: run as a service, connect over HTTP mcp.run( transport="sse", host="0.0.0.0", port=8001, # Add auth middleware in production! )
Client Registration (claude_desktop_config.json)
{
"mcpServers": {
"factory-tools": {
"command": "python",
"args": ["/path/to/server.py"]
},
"remote-db": {
"url": "http://localhost:8001/sse"
}
}
}
Tool Description is a First-Class Citizen
The docstring of your tool IS the prompt the LLM uses to decide when to call it. Be precise: what it does, what inputs it expects, and what it returns. Vague docstrings = wrong tool invocations.
Evaluation & Observability
LLM-as-Judge for Agentic Pipelines
from pydantic import BaseModel class AgentEvalResult(BaseModel): correctness: float # 0-1: did it get the right answer? tool_usage: float # 0-1: did it use the right tools? efficiency: float # 0-1: fewest steps needed? faithfulness: float # 0-1: no hallucinations? reasoning: str # explanation of scores # Run judge against agent trace judge_result = judge_llm.with_structured_output(AgentEvalResult).invoke({ "task": original_task, "agent_trace": tool_call_history, "final_answer": agent_output, "ground_truth": expected_answer # optional })
Key Metrics to Track
Step Count
Average tool calls per task. Proxy for efficiency. Spikes indicate prompt drift or tool confusion.
Token Cost / Task
Agentic loops accumulate context fast. Track tokens per task, not per call. Set hard budgets.
Retry Rate
How often does the agent fail and retry? High retry rate = poor tool design or ambiguous prompts.
End-to-End Latency
Multi-step latency can be 10-50× single-call latency. Use streaming and parallelism where possible.
Task Success Rate
Fraction of tasks completed correctly. Define "correct" explicitly with a rubric, not vibes.
Hallucination Rate
MCP tool results are ground truth. Any agent claim contradicting tool results = hallucination.
Observability Stack
Recommended Tools
LangSmith — trace every LangGraph step, inspect tool inputs/outputs. Weights & Biases — log eval metrics, compare runs. OpenTelemetry — for custom spans in production. Arize / Phoenix — drift detection over time. Always log: model, version, tool_calls, latency_ms, token_count, success/fail.
Pitfalls, Loopholes & Anti-Patterns
Agentic AI Pitfalls
🔥 Infinite Loop / Runaway Agent
Agent gets stuck in a loop, making tool calls without converging. Always set max_iterations and a budget cap. Use LangGraph's interrupt/checkpoint mechanism.
🔥 Prompt Injection via Tool Results
A malicious tool result contains instructions like "Ignore previous instructions...". Sanitize all tool outputs before re-injecting into context. Never trust external data as instructions.
⚡ Context Window Explosion
Agentic loops accumulate tool results in context. A 20-step task with verbose tool outputs can exceed 128K tokens. Summarize observations, prune history, or use external memory (Redis).
⚡ Over-Reliance on the LLM's Plan
The model's plan is often wrong, especially for novel domains. Validate intermediate outputs structurally (Pydantic schemas). Use human-in-the-loop checkpoints for critical actions.
⚡ Tool Overload
Giving an agent 40+ tools degrades performance. LLMs struggle with large tool sets. Keep per-agent tool count under 10-15. Use routing to assign agents specialized subsets.
🔵 Non-Determinism in Evaluations
Two runs of the same task may take different paths. Don't evaluate on single runs. Use statistical aggregates over N runs (typically 10-50) for reliable metrics.
🔵 "Works in Dev, Fails in Prod" Drift
Dev runs on clean, small datasets; prod encounters messy, unexpected data. Test agents against adversarial inputs. Add fallback/default behaviors for unrecognized inputs.
MCP Pitfalls
🔥 No Auth on SSE Servers
An HTTP/SSE MCP server without auth is a public API. Always add bearer token middleware or mTLS in production. Never expose internal tools to the internet without auth.
🔥 Overly Permissive Tool Permissions
A execute_sql tool with write access is a footgun. Design tools with least privilege. Separate read tools from write tools. Gate destructive operations with confirmation.
⚡ Schema Drift
MCP server tool schema changes without notifying the LLM. The model's in-context tool list gets stale. Use versioning in tool names (v2_get_machine) and test after schema changes.
⚡ Poor Tool Descriptions
Vague docstrings cause wrong tool selection. The model is doing semantic matching on descriptions. Write descriptions as if explaining to a smart intern who doesn't know your system.
🔵 Long-running Tool Calls with No Timeout
An MCP tool that calls a slow API can hang the agent indefinitely. Add timeouts (e.g., 30s) to all tool calls. Return structured errors that the agent can reason about.
✅ Mitigation: Error Types Matter
Return structured error types (TIMEOUT, NOT_FOUND, AUTH_FAILED) so the agent can decide whether to retry, escalate, or skip. A generic "error" string is useless for recovery.
Best Practices
Agentic AI
- ✓Always define a maximum iteration limit and a token budget per task run.
- ✓Use structured outputs (Pydantic) for all inter-agent communication. Never pass raw strings between agents.
- ✓Design a human-in-the-loop checkpoint for any irreversible action (write to DB, send email, trigger API with side effects).
- ✓Keep agent system prompts short and precise. Bloated system prompts dilute the signal. Use structured role + constraint + tool guidance.
- ✓Build and run an offline eval harness before deploying changes. Track task success rate across versions.
- ✓Use checkpointing / persistence (LangGraph Checkpointer) so long-running agents survive restarts.
- ✓Instrument every tool call with structured logging: tool_name, args_hash, latency_ms, success, token_cost.
- ✓For multi-agent: define a clear handoff contract (schema) for what each agent passes to the next.
- ✓Test agents on adversarial inputs: empty results, malformed data, contradictory tool responses.
- ✓Prefer small, focused agents over one monolithic agent with 50 tools.
MCP
- ✓Write tool docstrings as LLM-first documentation: what the tool does, when to use it, what it returns.
- ✓Use Pydantic models as tool input schemas — they auto-generate JSON Schema and validate inputs.
- ✓Keep tools single-responsibility. One tool = one action. Don't build a Swiss army knife tool.
- ✓Return structured, typed data (not raw strings) so the LLM can reliably parse results.
- ✓Add request timeouts and circuit breakers for any tool that calls external APIs.
- ✓Version your tools. Use semantic versioning in the server name. Deprecate old versions explicitly.
- ✓Implement idempotency for write tools. The agent may call a tool multiple times on retry.
- ✓Log every MCP invocation with client_id, tool_name, input_hash, duration_ms, status.
Security Considerations
🔥 Prompt Injection Attacks
External data (web pages, DB rows, emails) fed to the agent can contain adversarial instructions. Always clearly delimit user data from instructions in prompts. Use XML tags: <data>...</data> and instruct the model never to follow instructions found inside data tags.
🔥 Credential Leakage via Tools
Agents can be prompted to call tools with credentials as arguments. Never expose credentials in tool inputs. Use server-side secret injection; tools should retrieve secrets from vault, not accept them as params.
⚡ Confused Deputy Problem
The LLM acts on behalf of the user but may be tricked into using elevated permissions for unauthorized tasks. Design tools with the permission level of the calling user, not the system.
⚡ MCP Server Spoofing
A malicious MCP server can return crafted tool descriptions to hijack agent behavior. Validate MCP servers against a known-good registry. Use mTLS for server identity verification.
✅ Sandbox Tool Execution
Run code-execution tools in isolated containers (Docker, gVisor). Limit filesystem access, network egress, and CPU time per tool execution.
✅ Audit Trail
Log every tool call in an immutable append-only log. In regulated environments (manufacturing, finance), this is non-negotiable for compliance and incident investigation.
Scaling Patterns
Scale Agentic Throughput
- ⚡Parallel sub-tasks: Use
asyncio.gatherfor independent agent steps - 🗂️External memory: Redis / Postgres for agent state (not in-context)
- 📊Batch processing: Anthropic Batch API for offline eval at low cost
- 💾Prompt caching: Cache system prompt + tool schemas (large prefix)
- 🔀Model routing: Use Haiku for simple tool selection, Opus for reasoning
Scale MCP Servers
- 🐳Containerize: Each MCP server in its own Docker service
- ⚖️Load balance: Multiple server instances behind a proxy
- 📦Connection pooling: Pool DB connections inside MCP servers
- 🔒Rate limiting: Per-client rate limits at the MCP layer
- 🌐Caching layer: Cache read-heavy tool results (Redis TTL)
Cost Optimization
# 1. Prompt caching — static prefix cached at ~10% cost client.messages.create( system=[{"type": "text", "text": LARGE_SYSTEM_PROMPT, "cache_control": {"type": "ephemeral"}}], # cache this! ... ) # 2. Model tiering — route by complexity model = "claude-haiku-4-5" if task.complexity == "low" else "claude-opus-4-5" # 3. Observation truncation — don't re-inject full tool result truncated_obs = tool_result[:2000] + "... [truncated]" if len(tool_result) > 2000 else tool_result # 4. Batch API for offline evals (50% cost savings) batch = client.beta.messages.batches.create(requests=[...])
Tool Design Principles
The SMART Tool Framework
S — Specific
One tool = one action. get_machine_by_id not manage_machines. Specific names → correct LLM selection.
M — Minimal Input
Require only what's necessary. Optional params with sensible defaults. The LLM will hallucinate unknown required params.
A — Atomic
Tools should not call other tools internally. Composition is the orchestrator's job. Atomic tools are independently testable.
R — Rich Error Returns
Return structured errors with codes, not exceptions or empty strings. The agent needs actionable error info to recover.
T — Typed Outputs
Return typed data (dict with known keys) not raw strings. LLMs parse structured data much more reliably.
Tool Description Template
def get_station_cycle_time(station_id: str, product_line: str) -> dict: """ Retrieve the measured cycle time for a manufacturing station. Use this when you need the actual (measured) cycle time for a station to compare against TAKT time. Do NOT use for theoretical times. Args: station_id: Station identifier (e.g., "ST-003", "WASH-01") product_line: Product line code (e.g., "2kWh", "12kWh", "HeroPack") Returns: { "station_id": str, "cycle_time_seconds": float, "last_measured": ISO8601 date, "measurement_count": int } Errors: STATION_NOT_FOUND — station_id does not exist LINE_NOT_FOUND — product_line does not exist NO_DATA — station exists but no measurements yet """
Quick Reference Cheatsheet
| Topic | Key Fact |
|---|---|
| MCP Transport (local) | stdio — subprocess communication, zero network overhead |
| MCP Transport (remote) | HTTP + SSE — stateful connection per session, supports auth headers |
| MCP Message Format | JSON-RPC 2.0 — {"jsonrpc":"2.0","method":"tools/call","params":{},"id":1} |
| MCP Lifecycle | initialize → list capabilities → call tools → terminate |
| ReAct max steps | Default: 10-15. For complex tasks: 25-50. Always set explicitly. |
| LangGraph state | TypedDict. All agent data lives in state. Nodes read + write state. |
| Prompt caching savings | ~90% cost reduction on cached tokens. Min 1024 tokens to cache. |
| Best tool count per agent | 5-15 tools. Over 20 → significant performance degradation. |
| Structured output reliability | Pydantic + with_structured_output() > regex parsing > raw string parsing |
| Agent memory types | In-context (ephemeral), External DB (persistent), Semantic (vector store) |
| FastMCP install | uv add fastmcp (Python 3.10+) |
| LangGraph install | uv pip install langgraph langchain-anthropic langchain-mcp-adapters |
The 3-Layer Stack for Production
Layer 1 — Tool Layer: MCP servers expose capabilities (domain tools, data access). Layer 2 — Agent Layer: LangGraph agents orchestrate multi-step reasoning using MCP tools. Layer 3 — Eval Layer: LLM-as-Judge + W&B track quality over time and catch regressions.