What Is an MCP Server? Definition, Architecture, and When to Build One

A Model Context Protocol (MCP) server is a small service that exposes tools, prompts, and resources to AI clients (Claude, Cursor, ChatGPT desktop, internal agents) through a standard JSON-RPC interface. Instead of writing one-off integrations for every model and every assistant, you publish one MCP server and any compliant client can call it.

That is the practical definition. The rest of this post is the architecture, the transport, the capability surface, and when an MCP server is the right call versus a plain HTTP API.

What problem MCP actually solves

Every team that builds AI features ends up writing the same plumbing. The model needs to read a file, query a database, call an internal API, fetch a customer record, or trigger a workflow. Each of those becomes a custom tool implementation per assistant: one for Claude desktop, another for Cursor, another for the in-house agent, another for ChatGPT. The contracts drift. The auth drifts. The error handling drifts.

MCP collapses that into one shape. The server defines tools, prompts, and resources once. Any compliant client discovers and invokes them through the same protocol. The integration cost goes from N times M (clients times tools) to N plus M.

The three primitives

MCP servers expose three kinds of capabilities.

Tools are the action surface. A tool has a name, a JSON Schema describing its inputs, and an implementation that runs server-side. When a model decides to call a tool, the client sends a tools/call request, the server runs the action, and the result comes back as structured content. Tools are the only thing the model can invoke directly.

Resources are read-only data the model can pull into context. A resource has a URI, a MIME type, and content. The model can list resources, subscribe to changes, and read specific URIs. Resources are good for things like file contents, database records, or anything else the model needs to read but should not modify.

Prompts are reusable templates the user (not the model) invokes. They show up as slash commands or menu items in the client. A prompt can take arguments, expand into a templated message, and bundle pre-fetched resources. Prompts are how you ship a "do this common workflow" experience without writing UI.

The current spec also adds a tasks capability for long-running work, where a tool kicks off an async task and the client polls or subscribes for status.

The transport

The current MCP transport is Streamable HTTP. The client opens an HTTP POST to a single endpoint and either gets a single JSON response or upgrades to a streaming response (server-sent events) for long-lived sessions. Older specs used HTTP plus a separate SSE channel. That pattern is deprecated. Streamable HTTP is the one to build against.

Authentication is OAuth 2.1 with PKCE. The client presents a bearer token on every request. The server validates the token and resolves the caller identity before any tool runs. This matters because tools usually need to act as a specific user against a downstream system (a CRM, a database, a billing platform), and you want the identity binding to be explicit and audited.

What "compliant" means in practice

A compliant MCP server speaks JSON-RPC 2.0 over Streamable HTTP, advertises its capabilities at handshake time, validates tool inputs against the declared schemas, and returns structured errors. Capability advertisement matters because not every server needs to support every primitive. A server that only exposes tools should not pretend to support resources. Clients adapt their UI based on what the server advertises.

The newest spec versions also formalize logging, progress, sampling (where the server can request a model completion from the client), and elicitation (where the server can ask the user a question mid-tool-call). Most production servers do not need every capability; pick the ones that match your use case.

When to build an MCP server

An MCP server is the right shape when:

Multiple AI clients (Claude, ChatGPT, an internal agent, Cursor) need to hit the same backend.
You want a single audit log of every action AI took against your systems.
The tool surface is stable enough to publish a contract, but the consuming clients will keep changing.
You need per-user authorization at the tool boundary, not just at the API gateway.

An MCP server is the wrong shape when:

Only one assistant will ever call it. A direct integration is simpler.
The action is a one-time RPC with no auth, no state, and no observability needs.
You are still in throwaway-prototype territory. Build the prototype, then promote the stable tools into an MCP surface.

The architecture in one diagram

`` AI client (Claude, ChatGPT, agent) │ ▼ JSON-RPC 2.0 over Streamable HTTP, OAuth 2.1 bearer MCP server │ ├── tools → handlers that call your APIs / DB ├── resources → adapters that expose files, records, knowledge └── prompts → templates the user invokes from the client UI │ ▼ Your existing systems (CRM, billing, vector store, internal APIs) ``

The MCP server sits between the AI client and your existing backend. It is not a replacement for your APIs. It is a standardized, auditable, identity-bound entry point that lets any compliant assistant talk to those APIs.

Common build mistakes

A few things teams get wrong early.

Treating MCP as another REST API. The schema and the streaming behavior matter. Tool inputs should be validated against the declared JSON Schema before the handler runs. Errors should follow the structured error format, not be raw exception strings.

Skipping identity propagation. The bearer token represents a user. If a tool calls a downstream system as a service account instead of impersonating the user, you have just lost every per-user permission your backend enforces.

Exposing too many tools. Each tool is a surface the model can invoke. The fewer, sharper tools you publish, the better the model performs. Resist the urge to wrap every API endpoint.

Forgetting to handle long-running work. A tool that takes 90 seconds will time out on most clients. Use the tasks capability or break the work into a kickoff plus polling.

Where to go next

If you are deciding between an MCP server and a custom integration, the deeper write-up is in MCP vs API: when to use each. If you want a worked example of a production MCP server with auth, observability, and scoped tool surfaces, see the MCP server development pillar.

If you want CloudNSite to design and build the server, the MCP server development expertise page outlines how we approach scoping, transport, identity, and rollout. We build and operate the server. We do not hand over a tarball and hope it stays maintained.

What Is an MCP Server? A Practical Definition for Engineering Teams

What problem MCP actually solves

The three primitives

The transport

What "compliant" means in practice

When to build an MCP server

The architecture in one diagram

Common build mistakes

Where to go next

Need Help with Architecture?

Related Articles

MCP vs API: When to Use Each (and When You Need Both)

RAG Chatbot Architecture: What Production Actually Looks Like

Solutions for this work

Custom AI Agents

Private AI Deployment

Sales AI Automation

Consulting for this category

SaaS Consulting

Healthcare Consulting

Decision Guides

How to Switch from Manual Workflows to AI Agents

Alternatives to Generic Chatbots for Business Operations

Best AI Agents for Small Medical Practices