Model Context Protocol (MCP) is Anthropic's open standard for connecting LLMs to external tools and data sources. After building several MCP servers in production — wiring Claude to internal procurement APIs, PostgreSQL databases, and third-party services — here's a practical walkthrough of what actually matters.
What MCP Actually Is
MCP is built on JSON-RPC 2.0. The client application hosting your LLM connects to an MCP server; the server exposes tools, resources, and prompts. The model decides when to call a tool, the client sends the call, structured results come back, and the model continues reasoning.
Think of it as a standardised way to give an LLM hands — the ability to read from and write to the world beyond its context window.
Client (Claude) <-> MCP Server <-> Your API / Database / Service
The key insight: the model doesn't call your API directly. It calls the MCP server, which validates, authenticates, and executes the action. You control what the model can and can't do at the server layer.
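To make the message flow concrete, here is a rough sketch of the JSON-RPC 2.0 envelope for a single tool call. The `tools/call` method and the `content` result shape follow the MCP specification as I understand it; `buildToolCall`, the vendor name, and the empty payload are illustrative assumptions, not part of any SDK.

```typescript
// Sketch of the JSON-RPC 2.0 envelope for an MCP tool call.
// buildToolCall is a hypothetical helper, not part of the MCP SDK.
interface ToolCallRequest {
  jsonrpc: '2.0'
  id: number
  method: 'tools/call'
  params: { name: string; arguments: Record<string, unknown> }
}

interface ToolCallResponse {
  jsonrpc: '2.0'
  id: number
  result: { content: Array<{ type: 'text'; text: string }>; isError?: boolean }
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } }
}

// What the client might send when the model chooses query_contracts.
// 'Acme' is a made-up vendor name.
const request = buildToolCall(1, 'query_contracts', { vendor: 'Acme', status: 'active' })

// The shape of a matching server reply; the payload text here is a placeholder.
const response: ToolCallResponse = {
  jsonrpc: '2.0',
  id: request.id, // responses are matched to requests by id
  result: { content: [{ type: 'text', text: '[]' }] }
}
```

In practice the SDK builds and parses these envelopes for you; seeing them spelled out mainly helps when reading logs or debugging a transport.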
Building Your First MCP Server
The TypeScript SDK makes this straightforward. A minimal server that exposes a database query tool:
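import { Server } from '@modelcontextprotocol/sdk/server/index.js'
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
import {
  CallToolRequestSchema,
  ListToolsRequestSchema
} from '@modelcontextprotocol/sdk/types.js'

// `db` is assumed to be an already-configured database client exposing an
// async query() method; its setup is not shown here.
const server = new Server({ name: 'procurement-db', version: '1.0.0' }, {
  capabilities: { tools: {} }
})

// Advertise the available tools and the JSON Schema for their inputs.
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [{
    name: 'query_contracts',
    description: 'Query active contracts by vendor or status',
    inputSchema: {
      type: 'object',
      properties: {
        vendor: { type: 'string' },
        status: { type: 'string', enum: ['active', 'expired', 'pending'] }
      }
    }
  }]
}))

// Dispatch tool calls by name.
server.setRequestHandler(CallToolRequestSchema, async (req) => {
  if (req.params.name === 'query_contracts') {
    const rows = await db.query(req.params.arguments)
    return { content: [{ type: 'text', text: JSON.stringify(rows) }] }
  }
  // Fail loudly instead of returning undefined for an unrecognised tool.
  throw new Error(`Unknown tool: ${req.params.name}`)
})

// Serve over stdio so a local MCP client can spawn this process.
const transport = new StdioServerTransport()
await server.connect(transport)
Production Considerations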
The toy implementation above works. Production adds three hard problems: authentication, rate limiting, and tool design.
Authentication
Never expose raw database access through MCP. Your server is the security boundary. We use short-lived JWT tokens per session, scoped to what the user's role permits. The MCP server validates the token on every tool call, not just at connection time.
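The per-call check described above might look roughly like this. It is a sketch under assumptions: the claims are taken to come from a JWT whose signature has already been verified by a proper library (not shown), and `SessionClaims`, `ROLE_TOOLS`, the role names, and `authorizeToolCall` are all hypothetical names introduced for illustration.

```typescript
// Claims assumed to come from an already signature-verified JWT.
interface SessionClaims {
  sub: string
  exp: number // expiry, in seconds since the Unix epoch
  role: string
}

// Hypothetical role-to-tool mapping; real scopes would come from your own IAM.
const ROLE_TOOLS: Record<string, readonly string[]> = {
  viewer: ['query_contracts', 'get_vendor_info'],
  approver: ['query_contracts', 'get_vendor_info', 'create_approval_request', 'send_notification']
}

type AuthResult = { ok: true } | { ok: false; reason: string }

// Run on every tool call, not only when the session connects.
function authorizeToolCall(claims: SessionClaims, tool: string, nowSeconds: number): AuthResult {
  if (claims.exp <= nowSeconds) {
    return { ok: false, reason: 'token expired' }
  }
  const allowed = ROLE_TOOLS[claims.role] ?? []
  if (!allowed.includes(tool)) {
    return { ok: false, reason: `role ${claims.role} may not call ${tool}` }
  }
  return { ok: true }
}
```

Passing `nowSeconds` in explicitly keeps the check deterministic in tests; a real handler would pass `Math.floor(Date.now() / 1000)`.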
Rate Limiting
LLMs are enthusiastic tool-callers. Without limits, a single agentic workflow can hammer your API with dozens of calls in seconds. We enforce per-session limits at the MCP server layer — 20 tool calls per minute, with exponential backoff signalled back to the model via error messages it can reason about.
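One way to sketch a per-session limit is a sliding window that returns a retry hint the model can read. Note the swap: this uses a fixed retry-after hint rather than the exponential backoff mentioned above, to keep the example short. `SessionRateLimiter` and `rateLimitError` are hypothetical names; the 20-calls-per-minute default comes from the text, and `isError` is the tool-result flag from the MCP specification.

```typescript
interface RateDecision {
  allowed: boolean
  retryAfterMs: number
}

// Sliding-window limiter keyed by session id (hypothetical helper).
class SessionRateLimiter {
  private calls = new Map<string, number[]>()
  private limit: number
  private windowMs: number

  constructor(limit = 20, windowMs = 60_000) {
    this.limit = limit
    this.windowMs = windowMs
  }

  check(sessionId: string, nowMs: number): RateDecision {
    // Keep only the timestamps that are still inside the window.
    const recent = (this.calls.get(sessionId) ?? []).filter(t => nowMs - t < this.windowMs)
    if (recent.length >= this.limit) {
      this.calls.set(sessionId, recent)
      // The oldest call leaves the window at oldest + windowMs.
      const oldest = recent[0] ?? nowMs
      return { allowed: false, retryAfterMs: oldest + this.windowMs - nowMs }
    }
    recent.push(nowMs)
    this.calls.set(sessionId, recent)
    return { allowed: true, retryAfterMs: 0 }
  }
}

// Turn a rejection into a tool result the model can reason about.
function rateLimitError(decision: RateDecision) {
  const seconds = Math.ceil(decision.retryAfterMs / 1000)
  return {
    isError: true,
    content: [{ type: 'text', text: `Rate limit reached. Retry after ${seconds} seconds.` }]
  }
}
```

Returning the limit as a readable tool error, instead of dropping the connection, gives the model a chance to wait or to finish the task with the data it already has.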
Tool Design
This is where most implementations go wrong. Tool descriptions are prompts — they directly influence when and how the model uses them. Vague descriptions produce erratic tool use.
- Be specific about what the tool does AND doesn't do — "Returns contracts by vendor name. Does not support partial matches."
- Include examples in the description — the model pattern-matches on them.
- Return structured, minimal data — don't dump 50-field DB rows into context. Project only what the model needs.
A tool with a bad description is worse than no tool. The model will call it at the wrong time and blame you.
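Applied to query_contracts, those three points could look something like the sketch below. The column names in `ContractRow`, the example vendor, and the `projectContract` helper are assumptions made up for illustration; only the tool name and the status values come from the server shown earlier.

```typescript
// Hypothetical shape of a full contract row; a real table would have many more columns.
interface ContractRow {
  id: string
  vendor_name: string
  status: 'active' | 'expired' | 'pending'
  end_date: string
  [extra: string]: unknown
}

const queryContractsTool = {
  name: 'query_contracts',
  // Says what the tool does, what it does not do, and shows one example call.
  description: [
    'Returns contracts filtered by exact vendor name and/or status.',
    'Does not support partial matches or free-text search.',
    'Example: { "vendor": "Acme Ltd", "status": "active" }'
  ].join(' '),
  inputSchema: {
    type: 'object',
    properties: {
      vendor: { type: 'string', description: 'Exact vendor name' },
      status: { type: 'string', enum: ['active', 'expired', 'pending'] }
    }
  }
}

// Project only the fields the model needs instead of the whole row.
function projectContract(row: ContractRow) {
  return {
    id: row.id,
    vendor: row.vendor_name,
    status: row.status,
    endDate: row.end_date
  }
}
```

A handler would map `projectContract` over the query results before serialising them, so the context window only ever sees four fields per contract.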
Agentic Workflows in Production
Our procurement assistant uses four tools: query_contracts, get_vendor_info, create_approval_request, and send_notification. A single user request can trigger a 6-step chain: query, check vendor, assess risk, draft approval, notify stakeholders, log audit trail.
Key learnings from running this in production:
- Human-in-the-loop checkpoints for any write operation. The model drafts; a human approves before create_approval_request fires.
- Idempotent tools: agents retry on failure. If create_approval_request isn't idempotent, you get duplicate approvals.
- Structured logging per tool call: when something goes wrong you need to trace exactly what the model did and why.
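The first two points can be sketched together: a write that refuses to run without a named human approver, and that returns the existing record when the same idempotency key is retried. This is a minimal in-memory illustration, not the production implementation; `ApprovalStore`, the key format, and the field names are hypothetical.

```typescript
interface ApprovalRequest {
  id: string
  contractId: string
  approvedBy: string
}

// In-memory stand-in for whatever system actually stores approvals.
class ApprovalStore {
  private byKey = new Map<string, ApprovalRequest>()
  private nextId = 1

  create(idempotencyKey: string, contractId: string, approvedBy: string | null): ApprovalRequest {
    // Human-in-the-loop gate: no approver, no write.
    if (approvedBy === null) {
      throw new Error('human approval required before write')
    }
    // Idempotency: a retry with the same key returns the original record.
    const existing = this.byKey.get(idempotencyKey)
    if (existing) {
      return existing
    }
    const created: ApprovalRequest = { id: `apr-${this.nextId}`, contractId, approvedBy }
    this.nextId += 1
    this.byKey.set(idempotencyKey, created)
    return created
  }

  count(): number {
    return this.byKey.size
  }
}
```

The key would typically be derived from something stable across retries, such as the session id plus the contract id, so that a retried agent step maps to the same record.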
Latency
Multi-step agentic workflows are inherently slower than single-shot queries. Our 6-step workflow runs in 3-8 seconds. Acceptable for async tasks (drafting an approval); not acceptable for real-time queries.
The rule: use agents for tasks where accuracy and completeness matter more than speed. Use direct LLM calls for conversational, low-stakes interactions.
MCP is still early. The tooling is good, the ecosystem is growing fast, and the pattern of "LLM + structured tool access" is genuinely the right abstraction. The hard part isn't the protocol — it's designing tools that make your agent reliably useful rather than impressively unpredictable.