    Technical guide from CloudNSite engineering

    MCP Server Development for Production AI Agents

    CloudNSite designs, builds, and operates Model Context Protocol servers that expose your internal tools, data sources, and approval workflows to Claude, GPT-5.5, Cursor, and custom LLM hosts. Each build uses Streamable HTTP transport, OAuth 2.1, scoped tool surfaces, and an evaluation harness that runs before any production traffic. Most servers ship in 3 to 6 weeks, followed by ongoing tuning and monitoring.

    Figure: MCP Server Architecture. System diagram showing clients (App, Agent, IDE, CLI), transport (HTTP, SSE, WebSocket), authorization (API Key, OAuth, JWT), capability negotiation (Tools, Resources, Prompts), observability (Tracing, Metrics, Health), and a central logging, metrics, and audit boundary.

    Direct answer

    A Model Context Protocol (MCP) server is a process that exposes tools, resources, and prompts to LLM clients through a standardized JSON-RPC interface. It lets one server work with Claude, GPT, Cursor, and custom hosts without rewriting glue code. Production MCP servers add OAuth 2.1, scoped permissions, structured error envelopes, and evaluation before any tool reaches a model.
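To make the JSON-RPC framing concrete, here is a minimal, transport-agnostic sketch of how a server might dispatch the MCP `tools/list` and `tools/call` methods. It uses only the Python standard library; the `lookup_ticket` tool and its canned reply are hypothetical, input validation is left out, and a production server would build on an official SDK rather than hand-rolling this. Following the spec's example, the sketch reports an unknown tool as a JSON-RPC `-32602` error.

```python
import json
from typing import Any, Callable, Optional

def lookup_ticket(args: dict) -> str:
    # Hypothetical stand-in for a call to an internal ticketing system.
    return f"Ticket {args['ticket_id']} is open"

# Tool registry: name -> (definition advertised via tools/list, handler).
TOOLS: dict[str, tuple[dict, Callable[[dict], str]]] = {
    "lookup_ticket": (
        {
            "name": "lookup_ticket",
            "description": "Fetch the status of one support ticket.",
            "inputSchema": {
                "type": "object",
                "properties": {"ticket_id": {"type": "string"}},
                "required": ["ticket_id"],
            },
        },
        lookup_ticket,
    ),
}

def error(msg_id: Any, code: int, message: str) -> dict:
    # JSON-RPC 2.0 error object using the standard error codes.
    return {"jsonrpc": "2.0", "id": msg_id, "error": {"code": code, "message": message}}

def handle_message(msg: dict) -> Optional[dict]:
    if "id" not in msg:
        return None  # JSON-RPC notifications never get a response.
    msg_id = msg["id"]
    if msg.get("jsonrpc") != "2.0":
        return error(msg_id, -32600, "Invalid Request")
    method = msg.get("method")
    params = msg.get("params") or {}
    if method == "tools/list":
        tools = [definition for definition, _ in TOOLS.values()]
        return {"jsonrpc": "2.0", "id": msg_id, "result": {"tools": tools}}
    if method == "tools/call":
        entry = TOOLS.get(params.get("name"))
        if entry is None:
            return error(msg_id, -32602, f"Unknown tool: {params.get('name')}")
        _, handler = entry
        # Validation against inputSchema is omitted in this sketch.
        text = handler(params.get("arguments") or {})
        return {"jsonrpc": "2.0", "id": msg_id,
                "result": {"content": [{"type": "text", "text": text}], "isError": False}}
    return error(msg_id, -32601, f"Method not found: {method}")

request = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
           "params": {"name": "lookup_ticket", "arguments": {"ticket_id": "T-42"}}}
print(json.dumps(handle_message(request)))
```

The same dispatcher can sit behind stdio or Streamable HTTP, because the transport only moves these JSON objects back and forth.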

    Key definitions

    MCP host
    The LLM application that consumes one or more MCP servers. Claude Desktop, Cursor, and custom agent apps are all hosts.
    MCP server
    The process that exposes capabilities (tools, resources, prompts, tasks) to a host over JSON-RPC.
    Streamable HTTP
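    The current MCP remote transport per the 2025-11-25 spec. A single MCP endpoint (conventionally /mcp) accepts JSON-RPC messages over POST. The server can answer with a single JSON response or a Server-Sent Events stream, and clients may also open an SSE stream with GET to receive server-initiated messages.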
    Tool
    A model-callable function with a JSON Schema input, structured response, and an idempotency contract. The server validates inputs and authorizes per call.
    Resource
    Addressable read-only data identified by URIs, with URI templates for parameterized resources. Hosts can list and read resources and, where the server supports it, subscribe to changes.
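Resource templates use RFC 6570 URI templates. The following is a minimal sketch of how a server might match an incoming `resources/read` URI against its templates. It assumes only simple `{var}` expressions (RFC 6570 level 1), one path segment per variable, and a hypothetical `crm://` scheme; per-call authorization would run after the match.

```python
import re
from typing import Optional
from urllib.parse import unquote

def compile_template(template: str) -> re.Pattern:
    # re.split with a capture group alternates literal text and variable names:
    # "crm://accounts/{account_id}" -> ["crm://accounts/", "account_id", ""]
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)           # literal segment
        else:
            pattern += f"(?P<{part}>[^/]+)"      # one path segment per variable
    return re.compile(pattern + r"\Z")

# Hypothetical templates a CRM-backed server might advertise.
TEMPLATES = {
    "account": compile_template("crm://accounts/{account_id}"),
    "contacts": compile_template("crm://accounts/{account_id}/contacts"),
}

def match_resource(uri: str) -> Optional[tuple[str, dict]]:
    for name, regex in TEMPLATES.items():
        m = regex.match(uri)
        if m:
            # Decode percent-encoded values, e.g. "acme%20co" -> "acme co".
            return name, {k: unquote(v) for k, v in m.groupdict().items()}
    return None

print(match_resource("crm://accounts/acme%20co/contacts"))
```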

    Anatomy of a production MCP server

    A production MCP server has six layers, each reviewed independently before shipping. CloudNSite uses the same skeleton across every engagement, with the tool and resource surface scoped to the actual workflow rather than a wide-open API mirror.

    • Transport layer

      Streamable HTTP from a single /mcp endpoint with POST plus an optional SSE stream. The server validates the Origin header on every request to block DNS rebinding attacks, and locally running servers bind only to localhost rather than to all network interfaces.

    • Authorization layer

      OAuth 2.1 with PKCE for remote servers, bearer tokens scoped per tool group, and mTLS or VPN-only deployment for regulated environments.

    • Capability negotiation

      Initialize handshake declaring logging, prompts, resources, tools, and (in the 2025-11-25 spec) tasks support, with the protocol version pinned to the spec date the server was built against.

    • Tool surface

      Each tool ships with a JSON Schema input, an idempotency contract, a side-effect classification, and a structured error envelope. Each server is scoped to one workflow, not a generic API mirror.

    • Resource layer

      URI-templated read endpoints with pagination, change subscriptions where the underlying system supports them, and per-call authorization enforced server-side.

    • Observability and evaluation

      OpenTelemetry traces per JSON-RPC call, request and response logging with secrets redacted, plus an evaluation harness that exercises every tool against fixtures before any client connection.
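The tool-surface contract above can be sketched as a small data structure plus a validator. This is an illustration under stated assumptions: the `ToolSpec` fields, the `SideEffect` classes, the `issue_refund` tool, and the error-envelope shape (`code`, `message`, `retryable`) are hypothetical conventions, not part of the MCP spec. The validator covers only a tiny JSON Schema subset; a production server would use a complete JSON Schema validator. The classification does map onto the spec's tool annotations (`readOnlyHint`, `destructiveHint`, `idempotentHint`), which hosts treat as hints rather than guarantees.

```python
import json
from dataclasses import dataclass
from enum import Enum

class SideEffect(Enum):
    READ = "read"                # no external state change
    WRITE = "write"              # changes state
    DESTRUCTIVE = "destructive"  # deletes or irreversibly changes state

@dataclass(frozen=True)
class ToolSpec:
    name: str
    input_schema: dict
    side_effect: SideEffect
    idempotent: bool

    def definition(self) -> dict:
        # Annotations are hints for hosts, not a security boundary.
        return {
            "name": self.name,
            "inputSchema": self.input_schema,
            "annotations": {
                "readOnlyHint": self.side_effect is SideEffect.READ,
                "destructiveHint": self.side_effect is SideEffect.DESTRUCTIVE,
                "idempotentHint": self.idempotent,
            },
        }

TYPES = {"string": str, "integer": int, "boolean": bool, "object": dict}

def validate(args: dict, schema: dict) -> list[str]:
    """Check required keys, property types, and additionalProperties: false."""
    problems = [f"missing required field '{k}'" for k in schema.get("required", []) if k not in args]
    props = schema.get("properties", {})
    for key, value in args.items():
        if key not in props:
            if schema.get("additionalProperties") is False:
                problems.append(f"unexpected field '{key}'")
            continue
        expected = TYPES[props[key]["type"]]
        # bool is a subclass of int in Python, so reject it for "integer".
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            problems.append(f"field '{key}' must be {props[key]['type']}")
    return problems

def error_result(code: str, message: str, retryable: bool) -> dict:
    # Tool-level failure returned as a normal result with isError set,
    # so the model can read the envelope and correct its next call.
    envelope = {"error": {"code": code, "message": message, "retryable": retryable}}
    return {"content": [{"type": "text", "text": json.dumps(envelope)}], "isError": True}

REFUND = ToolSpec(
    name="issue_refund",  # hypothetical write tool
    input_schema={
        "type": "object",
        "properties": {"invoice_id": {"type": "string"}, "amount_cents": {"type": "integer"}},
        "required": ["invoice_id", "amount_cents"],
        "additionalProperties": False,
    },
    side_effect=SideEffect.WRITE,
    idempotent=True,  # assumes the handler deduplicates on invoice_id
)

problems = validate({"invoice_id": "INV-7", "amount_cents": "500"}, REFUND.input_schema)
if problems:
    print(error_result("INVALID_INPUT", "; ".join(problems), retryable=False))
```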

    When to use this

    • You need the same toolset to work across Claude, GPT, Cursor, and your own agent host without rewriting glue per client.
    • You are exposing internal systems (CRMs, ticketing, databases, billing, EHRs) to an LLM and need scoped permissions, structured errors, and audit logs.
    • You want one place to update tool behavior rather than redeploying every agent that calls it.
    • You need server-side authorization checks per tool call because the data is regulated or multi-tenant.
    • You want to ship resources and prompts alongside tools, with change subscriptions for live data.

    When not to use this

    • You only have one LLM client and a couple of internal API calls. A direct tool-use SDK is simpler than running an MCP server.
    • The work is a single rule-based workflow with no LLM in the loop. An orchestrator like Temporal or n8n is a better fit.
    • Your data sits behind a single SaaS that already ships an official MCP server. Use theirs rather than rebuilding it.
    • The integration is one-shot batch processing without a chat or agent surface. A scheduled job is cheaper to operate.

    How CloudNSite implements it

    1. Scope the tool surface

      Interview the workflow owners, list the model-callable actions, and prune to the smallest set that finishes the workflow. Generic API mirrors blow up context budgets and confuse models.

    2. Design the auth and transport

      Choose Streamable HTTP for remote servers, stdio for local desktop integration. Pin OAuth 2.1 with PKCE, define scope groups per tool category, and lock down Origin and DNS rebinding posture before anything is exposed to a host.

    3. Build tools and resources against fixtures

      Every tool gets a JSON Schema, a structured error envelope, and a fixture-driven test before it ever sees a real model. Resources get URI templates and pagination contracts written before implementation.

    4. Wire evaluation and observability

      Connect OpenTelemetry traces, structured logs with secrets redacted, and an evaluation harness that scores tool calls on accuracy, refusals, and side-effect correctness. Regressions block deploys.

    5. Roll out behind capability negotiation

      Ship to one host first, watch the JSON-RPC traffic, then expand to additional MCP clients once tool behavior is stable. CloudNSite continues to tune and operate the server after launch.
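The fixture-driven testing and "regressions block deploys" gate from the build and evaluation steps can be sketched as follows. The `Fixture` format, the `lookup_ticket` stub, and the `baseline` pass rate are hypothetical. A real harness would also replay model-generated calls and score refusals and side-effect correctness, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Fixture:
    tool: str
    arguments: dict
    expected_text: str
    expects_error: bool = False

def lookup_ticket(args: dict) -> dict:
    # Hypothetical tool stub backed by a fixture "database".
    tickets = {"T-1": "open", "T-2": "closed"}
    status = tickets.get(args.get("ticket_id"))
    if status is None:
        return {"content": [{"type": "text", "text": "ticket not found"}], "isError": True}
    return {"content": [{"type": "text", "text": status}], "isError": False}

TOOLS: dict[str, Callable[[dict], dict]] = {"lookup_ticket": lookup_ticket}

FIXTURES = [
    Fixture("lookup_ticket", {"ticket_id": "T-1"}, "open"),
    Fixture("lookup_ticket", {"ticket_id": "T-2"}, "closed"),
    Fixture("lookup_ticket", {"ticket_id": "T-9"}, "ticket not found", expects_error=True),
]

def run_fixture(fx: Fixture) -> bool:
    # A fixture passes only if both the text and the error flag match.
    result = TOOLS[fx.tool](fx.arguments)
    text = result["content"][0]["text"]
    return text == fx.expected_text and result["isError"] == fx.expects_error

def gate(fixtures: list[Fixture], baseline: float) -> int:
    """Return a process exit code: 0 if the pass rate meets the baseline, else 1."""
    passed = sum(run_fixture(fx) for fx in fixtures)
    rate = passed / len(fixtures)
    print(f"{passed}/{len(fixtures)} fixtures passed ({rate:.0%}), baseline {baseline:.0%}")
    return 0 if rate >= baseline else 1

# A CI step would pass this value to sys.exit() so a regression fails the deploy.
exit_code = gate(FIXTURES, baseline=1.0)
```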

    Tools and standards we use

    Protocol spec

    Model Context Protocol 2025-11-25

    Current normative reference. Defines Streamable HTTP transport, the tasks capability, and the authorization profile we pin to.

    Wire protocol

    JSON-RPC 2.0

    Every MCP request, response, and notification rides this contract.

    Input validation

    JSON Schema

    Required for every tool input definition. Validated server-side before any tool body runs.

    Authorization

    OAuth 2.1 with PKCE

    Standard for remote MCP servers per the MCP authorization profile. We add per-tool scopes for regulated environments.

    Observability

    OpenTelemetry

    Distributed tracing across host, server, and downstream systems on every engagement we operate.

    Reference SDKs

    @modelcontextprotocol/sdk (TypeScript) and mcp (Python)

    Official SDKs we extend rather than fork, pinned to the latest spec version.
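PKCE is one concrete piece of the OAuth 2.1 profile that is easy to get subtly wrong. This sketch shows the S256 method from RFC 7636: the code challenge is the unpadded base64url encoding of the SHA-256 digest of the code verifier. In practice the MCP host and your authorization server handle this exchange, so the code is illustrative only.

```python
import base64
import hashlib
import secrets

def make_code_verifier() -> str:
    # 32 random bytes -> 43 base64url characters, within RFC 7636's 43-128 range.
    return secrets.token_urlsafe(32)

def s256_challenge(verifier: str) -> str:
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # base64url without "=" padding, as RFC 7636 requires.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

def verify(verifier: str, stored_challenge: str) -> bool:
    # Authorization-server side: recompute and compare in constant time.
    return secrets.compare_digest(s256_challenge(verifier), stored_challenge)

verifier = make_code_verifier()
challenge = s256_challenge(verifier)
# The client sends code_challenge=<challenge>&code_challenge_method=S256 on the
# authorization request, then the raw verifier on the token request.
print(len(verifier), len(challenge), verify(verifier, challenge))
```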

    From the field

    Internal agentic RAG with scoped MCP tools

    Our agentic RAG connectors case study describes a CloudNSite-built MCP server that exposes a vector store, a connector index, and approval-gated write tools to a Claude-based agent. The same server backs an internal tooling host and a customer-facing agent without changes to the tool layer.

    Frequently asked questions

    Is MCP a standard or a product?

    MCP is an open protocol maintained at modelcontextprotocol.io. The current spec version is 2025-11-25. The protocol is implementation-agnostic, so a single MCP server works with any compliant client including Claude, GPT-5.5, Cursor, and custom agent hosts.
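Version pinning shows up in the initialize handshake. Below is a sketch of the server side, assuming a server that supports two spec revisions. Under the MCP lifecycle rules, the server echoes the client's requested `protocolVersion` when it supports it; otherwise it answers with a version it does support, and the client disconnects if it cannot use that version. The `serverInfo` values and `SUPPORTED_VERSIONS` are hypothetical, and the `tasks` capability is omitted here.

```python
SUPPORTED_VERSIONS = ("2025-11-25", "2025-06-18")  # newest first

def negotiate_version(requested: str) -> str:
    # Echo the client's version if supported; otherwise offer our latest.
    return requested if requested in SUPPORTED_VERSIONS else SUPPORTED_VERSIONS[0]

def initialize_result(params: dict) -> dict:
    return {
        "protocolVersion": negotiate_version(params.get("protocolVersion", "")),
        "capabilities": {
            "logging": {},
            "prompts": {"listChanged": False},
            "resources": {"subscribe": True, "listChanged": False},
            "tools": {"listChanged": False},
        },
        "serverInfo": {"name": "example-crm-server", "version": "1.0.0"},  # hypothetical
    }

request = {
    "jsonrpc": "2.0", "id": 1, "method": "initialize",
    "params": {"protocolVersion": "2024-11-05", "capabilities": {},
               "clientInfo": {"name": "example-host", "version": "0.1"}},
}
response = {"jsonrpc": "2.0", "id": request["id"], "result": initialize_result(request["params"])}
print(response["result"]["protocolVersion"])  # -> 2025-11-25
```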

    How is an MCP server different from a REST API?

    A REST API is consumed by application code. An MCP server is consumed by an LLM through a host process, with capability negotiation, tool and resource discovery, JSON Schema input definitions, and structured errors built into the protocol. You can wrap a REST API in an MCP server, but the framing, scope, and contracts are different.

    When should we build a custom MCP server instead of using an existing one?

    Build custom when the workflow touches private systems, regulated data, multi-tenant authorization, or internal approvals that no off-the-shelf server covers. Use an official server when the integration is to a single SaaS that already ships one and the off-the-shelf scope matches what you need.

    Which transport should we use: Streamable HTTP, stdio, or HTTP+SSE?

    Streamable HTTP is the current remote transport in the 2025-11-25 spec. Use stdio for local desktop integrations where the host launches the server as a subprocess. The older HTTP+SSE transport from the 2024-11-05 spec is deprecated, and we do not ship it in new builds.
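Whichever transport you choose, a Streamable HTTP endpoint must validate the `Origin` header to defend against DNS rebinding. The following framework-neutral sketch is configured for a server running locally and bound to 127.0.0.1. The allowlists are hypothetical, and accepting requests that carry no `Origin` header is a policy choice, since non-browser clients such as CLIs typically omit it.

```python
from typing import Mapping, Optional

ALLOWED_ORIGINS = {"https://agent.example.com", "http://localhost:3000"}  # hypothetical
ALLOWED_HOSTS = {"127.0.0.1:8765", "localhost:8765"}  # local server bound to 127.0.0.1

def check_request(headers: Mapping[str, str]) -> Optional[int]:
    """Return an HTTP status to reject with, or None to let the request through."""
    lowered = {k.lower(): v for k, v in headers.items()}
    origin = lowered.get("origin")
    if origin is not None and origin not in ALLOWED_ORIGINS:
        return 403  # browser page from an unexpected site, e.g. a rebinding attack
    if lowered.get("host") not in ALLOWED_HOSTS:
        return 403  # rebound hostname pointing at our loopback port
    return None

print(check_request({"Host": "localhost:8765", "Origin": "http://evil.example"}))  # -> 403
```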

    How does authentication work for production MCP servers?

    Remote MCP servers use OAuth 2.1 with PKCE per the MCP authorization profile. Bearer tokens are sent in the Authorization header and validated server-side on every JSON-RPC call. For regulated environments we add per-tool scope checks, IP allowlists, and where required mTLS or VPN-only deployment.
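The per-tool scope check can be sketched independently of how the token is validated. This sketch assumes token verification (signature, issuer, audience, expiry) has already produced a claims dict with a space-delimited OAuth `scope` claim; the scope names and the tool-to-scope map are hypothetical. The status codes follow RFC 6750 conventions: 401 when no valid token is present, 403 with `insufficient_scope` when the token lacks the required scope.

```python
from typing import Optional

# Hypothetical mapping from tool name to the OAuth scope it requires.
TOOL_SCOPES = {
    "search_tickets": "tickets:read",
    "close_ticket": "tickets:write",
    "issue_refund": "billing:write",
}

def bearer_token(authorization: Optional[str]) -> Optional[str]:
    # Expect "Authorization: Bearer <token>"; the scheme name is case-insensitive.
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer":
        return None
    token = token.strip()
    return token or None

def authorize_tool_call(claims: Optional[dict], tool_name: str) -> tuple[int, str]:
    """Return (HTTP status, reason) for a tools/call made with these claims."""
    if claims is None:
        return 401, "missing or invalid bearer token"
    required = TOOL_SCOPES.get(tool_name)
    if required is None:
        return 403, f"tool '{tool_name}' is not exposed"  # deny by default
    granted = set(claims.get("scope", "").split())
    if required not in granted:
        return 403, f"insufficient_scope: requires {required}"
    return 200, "ok"

claims = {"sub": "agent-1", "scope": "tickets:read tickets:write"}
print(authorize_tool_call(claims, "issue_refund"))
```

Keeping the map server-side means a host cannot widen its own permissions by editing tool definitions.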

    How do we keep an MCP server from blowing out the model's context budget?

    Cap the tool count per server to the workflow scope, paginate resource reads, return structured summaries instead of full payloads, and put detail behind a follow-up tool call. CloudNSite reviews token-budget behavior in evaluation before any production traffic.
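MCP list operations paginate with an opaque `cursor` in the request and a `nextCursor` in the result. The sketch below applies that convention to a hypothetical ticket-listing tool and combines it with summary projections. It assumes an in-memory ticket list, a page size of 2, and a base64-encoded offset as the cursor. Clients must treat cursors as opaque, so the encoding is the server's choice.

```python
import base64
from typing import Optional

TICKETS = [  # hypothetical records; real ones would carry long bodies and histories
    {"id": f"T-{n}", "title": f"Issue {n}", "body": "x" * 5000} for n in range(1, 6)
]

def encode_cursor(offset: int) -> str:
    return base64.urlsafe_b64encode(str(offset).encode()).decode()

def decode_cursor(cursor: Optional[str]) -> int:
    # A real server would map a malformed cursor to JSON-RPC -32602.
    return int(base64.urlsafe_b64decode(cursor.encode())) if cursor else 0

def list_tickets(params: dict, page_size: int = 2) -> dict:
    start = decode_cursor(params.get("cursor"))
    page = TICKETS[start:start + page_size]
    # Return only a summary per record; full bodies sit behind a detail tool.
    result: dict = {"tickets": [{"id": t["id"], "title": t["title"]} for t in page]}
    if start + page_size < len(TICKETS):
        result["nextCursor"] = encode_cursor(start + page_size)
    return result

cursor = None
while True:
    page = list_tickets({"cursor": cursor} if cursor else {})
    print([t["id"] for t in page["tickets"]])
    cursor = page.get("nextCursor")
    if cursor is None:
        break
```

The loop prints `['T-1', 'T-2']`, `['T-3', 'T-4']`, then `['T-5']`, and no page ever carries the 5,000-character bodies.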

    Who maintains the MCP server after launch?

    CloudNSite. We build the server, operate it inside your infrastructure or ours per the engagement contract, monitor JSON-RPC traffic and tool accuracy, and ship updates as the spec evolves and your workflow changes.

    Can a single MCP server expose tools to multiple agent hosts at once?

    Yes. That is the point of the protocol. Once capability negotiation and authorization scopes are correctly modeled, the same server can back a Claude Desktop host, a GPT-5.5 agent, an internal app, and a Cursor IDE integration without per-client code.