
A lot of agent systems still talk about MCP as if adding a new server were just another integration step. Point the agent at a different server, let it discover one more tool, and assume the role surface is effectively unchanged.
That is the wrong shape for production AI engineering. A live MCP server publishes executable capability. If a candidate profile adds a destructive tool or changes the required arguments for an existing tool, that is a contract change, not just a wiring detail.
In this issue, we build a local C# gate that launches a real MCP server over STDIO, discovers both a baseline and a candidate profile through the MCP client, compares their live tool contracts, probes the required tools through real MCP calls, and deterministically decides whether the candidate should be promoted or rolled back.
What You Are Building
You are building a production-shaped MCP promotion workflow that keeps the live tool contract explicit:
- Load runtime configuration from `appsettings.json` and `MCPGATE_` environment overrides
- Launch a real MCP server profile over STDIO using `dotnet <server.dll> baseline` or `candidate`
- Discover the live MCP tools with `ListToolsAsync()`
- Extract actual tool names, input schemas, and MCP annotations from the protocol response
- Diff baseline and candidate profiles for added tools, removed tools, schema breakage, read-only hint regressions, and newly added destructive tools
- Replay frozen role cases by calling the live MCP tools with structured arguments
- Fail when forbidden tools leak into the role surface or when a required MCP schema no longer matches
- Apply deterministic promotion rules to decide `Promote`, `Hold`, or `Rollback`
- Persist the full report as JSON for later review
This is the control layer that becomes necessary once MCP servers can evolve independently of the agent prompt and application code.
System Structure
The architecture is intentionally small. One MCP server host exposes either a baseline or candidate tool profile. The gate client launches that server twice over STDIO, performs live MCP discovery against both profiles, compares the discovered tool contracts, replays frozen role probes through real MCP calls, applies the promotion gate, and saves a report.
The diagram below shows the high-level control flow:
Runtime Configuration First
The app starts by loading the gate profile before any MCP session is opened:
```json
{
  "Experiment": {
    "ServerCommand": "dotnet",
    "BaselineProfile": "baseline",
    "CandidateProfile": "candidate",
    "DatasetPath": "data/mcp_tool_gate_eval.json",
    "ReportDirectory": "data/reports"
  },
  "Promotion": {
    "BlockOnRemovedTools": true,
    "BlockOnSchemaBreakage": true,
    "BlockOnReadOnlyHintRegression": true,
    "BlockOnNewDestructiveTools": true,
    "MaxNewDestructiveTools": 0,
    "MaxBrokenCases": 0
  }
}
```

This matters because the promotion boundary is operational. Which server command to launch, which profile is baseline, which profile is candidate, which dataset defines the role contract, and which changes trigger rollback are visible system controls rather than hidden release assumptions.
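The loading code itself is not shown. A common way to get this behavior is `AddJsonFile("appsettings.json")` followed by `AddEnvironmentVariables("MCPGATE_")` from Microsoft.Extensions.Configuration, where the prefix is stripped and a double underscore in a variable name stands for a section separator. The dependency-free sketch below imitates that key layout so the override rule is easy to see. `Flatten` and `ApplyOverrides` are hypothetical helpers, and the environment is passed in as a dictionary to keep the example deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: flatten appsettings-style JSON into case-insensitive
// "Section:Key" entries, the key shape Microsoft.Extensions.Configuration uses.
Dictionary<string, string> Flatten(string json)
{
    var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    using var document = JsonDocument.Parse(json);

    void Walk(JsonElement element, string path)
    {
        if (element.ValueKind == JsonValueKind.Object)
        {
            foreach (var property in element.EnumerateObject())
                Walk(property.Value, path.Length == 0 ? property.Name : $"{path}:{property.Name}");
        }
        else
        {
            // Strings keep their value; numbers and booleans keep their raw JSON text.
            result[path] = element.ValueKind == JsonValueKind.String ? element.GetString()! : element.GetRawText();
        }
    }

    Walk(document.RootElement, "");
    return result;
}

// Hypothetical helper: MCPGATE_Promotion__MaxBrokenCases overrides Promotion:MaxBrokenCases,
// following the "__" section separator convention of AddEnvironmentVariables.
void ApplyOverrides(Dictionary<string, string> target, IReadOnlyDictionary<string, string> environment, string prefix)
{
    foreach (var (variable, value) in environment)
    {
        if (variable.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            target[variable[prefix.Length..].Replace("__", ":")] = value;
    }
}

const string appsettings = """
    {
      "Experiment": { "BaselineProfile": "baseline", "CandidateProfile": "candidate" },
      "Promotion": { "BlockOnSchemaBreakage": true, "MaxBrokenCases": 0 }
    }
    """;

var settings = Flatten(appsettings);

// In a real run these pairs would come from the process environment.
var overrides = new Dictionary<string, string> { ["MCPGATE_Promotion__MaxBrokenCases"] = "1" };
ApplyOverrides(settings, overrides, "MCPGATE_");

var maxBrokenCases = int.Parse(settings["Promotion:MaxBrokenCases"]);
var blockOnSchemaBreakage = bool.Parse(settings["Promotion:BlockOnSchemaBreakage"]);
Console.WriteLine($"MaxBrokenCases={maxBrokenCases}, BlockOnSchemaBreakage={blockOnSchemaBreakage}");
```

Because the environment layer is applied last, an operator can loosen or tighten a single gate rule for one run without editing the checked-in file.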
The Server Is Real MCP
This project does not fake tool discovery. The server is a real MCP host using STDIO transport:
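```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

// Assumed host setup: launched as `dotnet <server.dll> <profile>`, so args[0] is the profile.
var profile = args.Length > 0 ? args[0] : string.Empty;
var builder = Host.CreateApplicationBuilder(args);

var toolTypes = profile switch
{
    "baseline" => new[] { typeof(BaselineSupportOpsTools) },
    "candidate" => new[] { typeof(CandidateSupportOpsTools) },
    _ => throw new InvalidOperationException($"Unknown MCP server profile '{profile}'.")
};

// Publish only the selected profile's tools over the STDIO transport.
builder.Services
    .AddMcpServer()
    .WithStdioServerTransport()
    .WithTools(toolTypes);
await builder.Build().RunAsync();
```

That matters because the gate is reading live MCP protocol output, not a hand-maintained mirror of what the server was supposed to expose.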
Live Tool Discovery Defines the Contract
The client launches the server and discovers the tool surface through MCP:
```csharp
var transport = new StdioClientTransport(
    new StdioClientTransportOptions
    {
        Name = $"Support Ops MCP Server ({profile})",
        Command = serverCommand,
        Arguments = [serverDllPath, profile]
    },
    loggerFactory);

var client = await McpClientFactory.CreateAsync(
    transport,
    new McpClientOptions
    {
        ClientInfo = new() { Name = "McpToolContractGates", Version = "1.0.0" }
    },
    loggerFactory);

var tools = await client.ListToolsAsync();
```

From each discovered tool, the gate extracts:
- the live MCP tool name
- the live JSON input schema
- the required arguments derived from that schema
- the MCP annotations such as `ReadOnlyHint` and `DestructiveHint`
That means the contract comes from the server the agent would actually talk to, not from a separate documentation layer.
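How the required arguments and hints are read off a discovered tool is not shown. The sketch below works one level below the SDK, on the raw JSON shape of a `tools/list` entry as defined by the MCP specification (`name`, `inputSchema.required`, `annotations.readOnlyHint`, `annotations.destructiveHint`), so it runs with System.Text.Json alone. `ExtractContract` and `ReadHint` are hypothetical helpers. The fallbacks follow the specification's defaults, `readOnlyHint = false` and `destructiveHint = true`, so a tool that omits annotations is treated as potentially destructive.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical helper: read one entry of an MCP tools/list result into a contract tuple.
(string Name, string[] RequiredArguments, bool ReadOnlyHint, bool DestructiveHint) ExtractContract(JsonElement tool)
{
    var name = tool.GetProperty("name").GetString()!;

    // Required arguments come straight from the JSON Schema "required" array of inputSchema.
    var requiredArguments =
        tool.TryGetProperty("inputSchema", out var schema)
        && schema.TryGetProperty("required", out var requiredList)
        && requiredList.ValueKind == JsonValueKind.Array
            ? requiredList.EnumerateArray()
                .Select(item => item.GetString()!)
                .OrderBy(argument => argument, StringComparer.Ordinal)
                .ToArray()
            : Array.Empty<string>();

    // Spec defaults: readOnlyHint = false, destructiveHint = true.
    var readOnly = ReadHint(tool, "readOnlyHint", fallback: false);
    // destructiveHint is only meaningful when the tool is not read-only.
    var destructive = !readOnly && ReadHint(tool, "destructiveHint", fallback: true);

    return (name, requiredArguments, readOnly, destructive);
}

// Hypothetical helper: return a boolean annotation, or the fallback when it is absent.
bool ReadHint(JsonElement tool, string hint, bool fallback) =>
    tool.TryGetProperty("annotations", out var annotations)
    && annotations.TryGetProperty(hint, out var value)
    && value.ValueKind is JsonValueKind.True or JsonValueKind.False
        ? value.GetBoolean()
        : fallback;

const string toolJson = """
    {
      "name": "Incident.Declare",
      "inputSchema": {
        "type": "object",
        "properties": { "serviceName": { "type": "string" }, "summary": { "type": "string" } },
        "required": ["serviceName", "summary"]
      },
      "annotations": { "destructiveHint": true }
    }
    """;

using var document = JsonDocument.Parse(toolJson);
var contract = ExtractContract(document.RootElement);
Console.WriteLine($"{contract.Name}: [{string.Join(", ", contract.RequiredArguments)}] " +
                  $"readOnly={contract.ReadOnlyHint} destructive={contract.DestructiveHint}");
```

Sorting the required arguments with an ordinal comparer keeps the extracted contract stable, so two discoveries of an unchanged server produce identical records.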
Candidate Drift Is Visible in the Live Surface
The candidate profile intentionally introduces two changes that should block promotion:
```csharp
[McpServerTool(Name = "Incident.Declare", Destructive = true)]
public static string DeclareIncident(string serviceName, string summary) =>
    $"Incident declared for {serviceName}: {summary}.";

[McpServerTool(Name = "Deployments.Rollback", Destructive = true)]
public static string RollbackDeployment(string serviceName, string environment, string releaseId) =>
    $"Rollback started for {serviceName} in {environment} release {releaseId}.";
```

The first change breaks the baseline schema of `Incident.Declare` by dropping the required `severity` argument. The second change exposes a destructive deployment rollback tool to a support role that should not have it at all.
Diffing Live MCP Schemas
The comparison layer works directly against the discovered tool contracts:
```csharp
var missingRequiredArguments = baselineTool.RequiredArguments
    .Except(candidateTool.RequiredArguments, StringComparer.Ordinal)
    .ToArray();

if (missingRequiredArguments.Length > 0)
{
    schemaBreakages.Add(new ToolContractChange(
        baselineTool.Name,
        string.Join(", ", baselineTool.RequiredArguments),
        string.Join(", ", candidateTool.RequiredArguments)));
}

if (baselineTool.ReadOnlyHint && !candidateTool.ReadOnlyHint)
{
    readOnlyHintRegressions.Add(new ToolContractChange(
        baselineTool.Name,
        "ReadOnlyHint=true",
        "ReadOnlyHint=false"));
}
```

This is where the system stops treating MCP as a vague interoperability label and starts treating it as a real contract boundary. The tool name, schema, and annotations are all part of what the client is allowed to rely on.
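The snippet above covers two of the five diff categories. A compact sketch of all five, written against in-memory contract tuples rather than the project's `ToolContractChange` type, could look like this. The tuple shape and the `Tickets.Lookup` tool are hypothetical; the `Incident.Declare` and `Deployments.Rollback` changes mirror the candidate drift described earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical contract tuples: tool name, required arguments, and the two MCP hints.
var baseline = new[]
{
    (Name: "Incident.Declare", Required: new[] { "serviceName", "severity", "summary" }, ReadOnly: false, Destructive: true),
    (Name: "Tickets.Lookup", Required: new[] { "ticketId" }, ReadOnly: true, Destructive: false),
};
var candidate = new[]
{
    (Name: "Incident.Declare", Required: new[] { "serviceName", "summary" }, ReadOnly: false, Destructive: true),
    (Name: "Tickets.Lookup", Required: new[] { "ticketId" }, ReadOnly: true, Destructive: false),
    (Name: "Deployments.Rollback", Required: new[] { "serviceName", "environment", "releaseId" }, ReadOnly: false, Destructive: true),
};

var baselineByName = baseline.ToDictionary(tool => tool.Name, StringComparer.Ordinal);
var candidateByName = candidate.ToDictionary(tool => tool.Name, StringComparer.Ordinal);

// Structural drift: tools that appeared or disappeared between profiles.
var addedTools = candidateByName.Keys.Except(baselineByName.Keys, StringComparer.Ordinal).ToList();
var removedTools = baselineByName.Keys.Except(candidateByName.Keys, StringComparer.Ordinal).ToList();

// Schema breakage: a baseline tool still exists but lost a required argument callers rely on.
var schemaBreakages = baseline
    .Where(b => candidateByName.TryGetValue(b.Name, out var c) && b.Required.Except(c.Required, StringComparer.Ordinal).Any())
    .Select(b => b.Name)
    .ToList();

// Hint regression: a tool advertised as read-only in the baseline no longer is.
var readOnlyHintRegressions = baseline
    .Where(b => b.ReadOnly && candidateByName.TryGetValue(b.Name, out var c) && !c.ReadOnly)
    .Select(b => b.Name)
    .ToList();

// New destructive capability that the baseline role surface never had.
var addedDestructiveTools = addedTools.Where(name => candidateByName[name].Destructive).ToList();

Console.WriteLine($"Added: {addedTools.Count}, Removed: {removedTools.Count}, Schema breakages: {schemaBreakages.Count}, " +
                  $"Read-only regressions: {readOnlyHintRegressions.Count}, Added destructive: {addedDestructiveTools.Count}");
```

Like the snippet above, this sketch counts only required arguments that disappear as breakage. A newly required argument would also break existing callers, so a stricter gate could check both directions.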
Frozen Role Cases Probe the Live Tools
The frozen dataset does not just name tools. It also includes the structured arguments used to probe those tools through real MCP calls.
```json
{
  "id": "TG-003",
  "userTask": "Declare a P1 checkout incident with explicit severity.",
  "requiredTools": [
    {
      "toolName": "Incident.Declare",
      "requiredArguments": [
        "serviceName",
        "severity",
        "summary"
      ],
      "requireReadOnlyHint": false,
      "requireDestructiveHint": true,
      "probeArguments": {
        "serviceName": "checkout-api",
        "severity": "P1",
        "summary": "error rate is rising"
      },
      "expectedOutputContains": "severity P1"
    }
  ],
  "forbiddenTools": [
    "Deployments.Rollback"
  ]
}
```

The probe layer then calls the real MCP tool:
```csharp
var response = await tool.CallAsync(
    arguments.ToDictionary(pair => pair.Key, pair => (object?)pair.Value),
    cancellationToken: cancellationToken);

// Join the text blocks of the MCP response into one string for the expectation check.
var output = string.Join("\n", response.Content.Select(content => content.Text));
```

That is the key difference from a static diff. The gate is not only inspecting what the tool surface claims to be. It is also exercising the live tools that the agent would actually call.
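The dataset shape and the probe call are shown, but not the checks in between. A minimal sketch, assuming the discovered surface has already been reduced to a map from tool name to required arguments, and with a stand-in function replacing the live `CallAsync`, could evaluate TG-003 like this. `EvaluateCase` and `FakeProbe` are hypothetical, the case JSON is trimmed to the fields the sketch reads (the hint requirements are ignored here), and the failure wording copies the messages from the run shown later in this issue. Skipping the probe once the schema no longer matches is an assumption that is consistent with the single schema failure reported for TG-003 in that run.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical candidate surface: live tool name -> required arguments from its input schema.
var candidateSurface = new Dictionary<string, string[]>(StringComparer.Ordinal)
{
    ["Incident.Declare"] = new[] { "serviceName", "summary" },
    ["Deployments.Rollback"] = new[] { "serviceName", "environment", "releaseId" },
};

// Stand-in for tool.CallAsync(...): echoes the arguments as "key value" text.
string FakeProbe(string probedTool, IReadOnlyDictionary<string, object?> probeArguments) =>
    $"{probedTool}: {string.Join(", ", probeArguments.Select(pair => $"{pair.Key} {pair.Value}"))}";

List<string> EvaluateCase(JsonElement roleCase, IReadOnlyDictionary<string, string[]> surface,
    Func<string, IReadOnlyDictionary<string, object?>, string> probe)
{
    var failures = new List<string>();
    foreach (var toolSpec in roleCase.GetProperty("requiredTools").EnumerateArray())
    {
        var toolName = toolSpec.GetProperty("toolName").GetString()!;
        if (!surface.TryGetValue(toolName, out var liveRequired))
        {
            failures.Add($"Tool {toolName} is not exposed to the role surface.");
            continue;
        }

        // The frozen case pins the arguments callers rely on; the live schema must still require them.
        var missing = toolSpec.GetProperty("requiredArguments").EnumerateArray()
            .Select(argument => argument.GetString()!)
            .Except(liveRequired, StringComparer.Ordinal)
            .ToArray();
        if (missing.Length > 0)
        {
            failures.Add($"Tool {toolName} is missing required arguments: {string.Join(", ", missing)}.");
            continue; // no probe once the schema no longer matches
        }

        // Call the tool with the frozen structured arguments and check the expected fragment.
        var arguments = toolSpec.GetProperty("probeArguments").EnumerateObject()
            .ToDictionary(property => property.Name, property => (object?)property.Value.GetString());
        var output = probe(toolName, arguments);
        var expected = toolSpec.GetProperty("expectedOutputContains").GetString()!;
        if (!output.Contains(expected, StringComparison.Ordinal))
            failures.Add($"Tool {toolName} output did not contain '{expected}'.");
    }

    foreach (var forbidden in roleCase.GetProperty("forbiddenTools").EnumerateArray())
    {
        var forbiddenName = forbidden.GetString()!;
        if (surface.ContainsKey(forbiddenName))
            failures.Add($"Forbidden tool {forbiddenName} is exposed to the role surface.");
    }

    return failures;
}

const string tg003 = """
    {
      "id": "TG-003",
      "requiredTools": [
        {
          "toolName": "Incident.Declare",
          "requiredArguments": ["serviceName", "severity", "summary"],
          "probeArguments": { "serviceName": "checkout-api", "severity": "P1", "summary": "error rate is rising" },
          "expectedOutputContains": "severity P1"
        }
      ],
      "forbiddenTools": ["Deployments.Rollback"]
    }
    """;

using var caseDocument = JsonDocument.Parse(tg003);
var caseFailures = EvaluateCase(caseDocument.RootElement, candidateSurface, FakeProbe);
Console.WriteLine($"TG-003: {string.Join("; ", caseFailures)}");
```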
Promotion Gate Is Explicit
The promotion policy stays small and inspectable:
```csharp
if (config.BlockOnSchemaBreakage && diff.SchemaBreakages.Count > 0)
{
    reasons.Add("Candidate broke required MCP input schemas on baseline tools.");
}

if (config.BlockOnNewDestructiveTools && diff.AddedDestructiveTools.Count > config.MaxNewDestructiveTools)
{
    reasons.Add("Candidate introduced new destructive MCP tools into the role surface.");
}

if (candidateSummary.BrokenCaseCount > config.MaxBrokenCases)
{
    reasons.Add($"Candidate failed {candidateSummary.BrokenCaseCount} frozen role cases.");
}

if (reasons.Count > 0)
{
    return new GateRecommendation(GateDecision.Rollback, reasons);
}
```

The gate is boring on purpose. That is a strength. You can test it, inspect it, and explain exactly why a candidate MCP profile was blocked.
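Only three of the configured rules appear in the snippet. A self-contained sketch that also wires `BlockOnRemovedTools` and `BlockOnReadOnlyHintRegression`, using plain counts instead of the project's diff and summary types, might look like this. The `Decide` helper and the wording of the two extra reasons are hypothetical, and the sketch only returns `Rollback` or `Promote`, because the post does not say which findings produce `Hold`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical gate: counts come from the diff and probe layers; flags mirror the "Promotion" config section.
(string Decision, List<string> Reasons) Decide(
    int removedTools, int schemaBreakages, int readOnlyHintRegressions, int addedDestructiveTools, int brokenCases,
    bool blockOnRemovedTools = true, bool blockOnSchemaBreakage = true,
    bool blockOnReadOnlyHintRegression = true, bool blockOnNewDestructiveTools = true,
    int maxNewDestructiveTools = 0, int maxBrokenCases = 0)
{
    var reasons = new List<string>();
    if (blockOnRemovedTools && removedTools > 0)
        reasons.Add("Candidate removed MCP tools that the baseline exposed.");
    if (blockOnSchemaBreakage && schemaBreakages > 0)
        reasons.Add("Candidate broke required MCP input schemas on baseline tools.");
    if (blockOnReadOnlyHintRegression && readOnlyHintRegressions > 0)
        reasons.Add("Candidate dropped ReadOnlyHint on tools that were read-only in the baseline.");
    if (blockOnNewDestructiveTools && addedDestructiveTools > maxNewDestructiveTools)
        reasons.Add("Candidate introduced new destructive MCP tools into the role surface.");
    if (brokenCases > maxBrokenCases)
        reasons.Add($"Candidate failed {brokenCases} frozen role cases.");

    // Any reason is enough to roll back; an empty list means the candidate is promotable.
    return (reasons.Count > 0 ? "Rollback" : "Promote", reasons);
}

// Counts taken from the live run shown later in this issue.
var (decision, gateReasons) = Decide(removedTools: 0, schemaBreakages: 1, readOnlyHintRegressions: 0,
    addedDestructiveTools: 1, brokenCases: 3);

Console.WriteLine($"Decision: {decision}");
foreach (var reason in gateReasons)
    Console.WriteLine($"- {reason}");
```

Keeping every rule as a flat `if` with its own reason string is what makes the decision explainable: the report can list exactly which thresholds were crossed.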
Walking a Real Live Run
A deterministic local run at 2026-05-29 23:16 UTC produced the following output:
```text
MCP Tool Contract Gates
Baseline profile: baseline
Candidate profile: candidate
Transport: stdio
Dataset: 3 frozen role cases
baseline
- Discovered tools: 5
- Pass rate: 100%
- Cases passed: 3/3
candidate
- Discovered tools: 6
- Pass rate: 0%
- Cases passed: 0/3
Live MCP diff
- Added tools: 1
- Removed tools: 0
- Schema breakages: 1
- Read-only hint regressions: 0
- Added destructive tools: 1
candidate sample failures:
- TG-001: Forbidden tool Deployments.Rollback is exposed to the role surface.
- TG-002: Forbidden tool Deployments.Rollback is exposed to the role surface.
- TG-003: Tool Incident.Declare is missing required arguments: severity.; Forbidden tool Deployments.Rollback is exposed to the role surface.
Decision: Rollback
- Candidate broke required MCP input schemas on baseline tools.
- Candidate introduced new destructive MCP tools into the role surface.
- Candidate failed 3 frozen role cases.
Report: McpToolContractGates\bin\Debug\net10.0\data\reports\20260529T231611Z-tool-gate-baseline-to-candidate.json
```

How to interpret this:
- `baseline` passed every live role probe, so the current MCP surface still fits the intended support role
- `candidate` discovered one extra tool, which already changed the live role surface before any model reasoning happened
- The diff layer found one real schema break and one newly added destructive tool from the actual MCP discovery results
- The probe layer confirmed the operational consequence: the candidate profile both leaks a forbidden rollback tool and no longer satisfies the baseline incident schema
- The rollback happened for deterministic contract reasons, not because the candidate merely looked riskier
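The report path in the run encodes a UTC timestamp and both profile names. A sketch of producing that file name and persisting a report with System.Text.Json could look like this. Only the file-name pattern is taken from the run output; `ReportFileName`, `WriteReport`, and the fields of the anonymous report object are hypothetical.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text.Json;

// Hypothetical helper: build a name like 20260529T231611Z-tool-gate-baseline-to-candidate.json.
string ReportFileName(DateTimeOffset timestamp, string baselineProfile, string candidateProfile) =>
    $"{timestamp.UtcDateTime.ToString("yyyyMMdd'T'HHmmss'Z'", CultureInfo.InvariantCulture)}" +
    $"-tool-gate-{baselineProfile}-to-{candidateProfile}.json";

// Hypothetical helper: serialize the report as indented JSON and return the path written.
string WriteReport(string directory, string fileName, object report)
{
    Directory.CreateDirectory(directory);
    var path = Path.Combine(directory, fileName);
    File.WriteAllText(path, JsonSerializer.Serialize(report, new JsonSerializerOptions { WriteIndented = true }));
    return path;
}

var runAt = new DateTimeOffset(2026, 5, 29, 23, 16, 11, TimeSpan.Zero);
var runFileName = ReportFileName(runAt, "baseline", "candidate");

var gateReport = new
{
    BaselineProfile = "baseline",
    CandidateProfile = "candidate",
    Decision = "Rollback",
    Reasons = new[] { "Candidate broke required MCP input schemas on baseline tools." },
};

var reportPath = WriteReport(Path.Combine(Path.GetTempPath(), "mcp-gate-reports"), runFileName, gateReport);
Console.WriteLine($"Report: {reportPath}");
```

Writing to the system temp directory keeps the example self-contained; the gate itself writes under the configured `ReportDirectory`. A sortable UTC prefix means a plain directory listing already orders reports chronologically.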
Why This Architecture Works
The gate works because the MCP server and the agent reasoning layer are treated as different responsibilities:
- The MCP server publishes the tool surface
- The discovery layer records what that surface actually is right now
- The diff layer catches structural drift in tool names, schemas, and hints
- The probe layer exercises the tools the agent would actually call
- The promotion gate converts those findings into a small explicit decision
- The saved report keeps the whole decision inspectable after the run ends
That is the real boundary here. The probabilistic layer may choose tools inside the allowed surface. The deterministic layer owns whether that live MCP surface is promotable in the first place.
Potential Enhancements
To extend this project further, you can consider:
- Add streamable HTTP MCP transport in addition to STDIO so the same gate can evaluate remote server deployments
- Split frozen role probes by agent role such as support, finance, and incident commander
- Persist longitudinal MCP diff history so you can detect tool-surface drift over time
- Add policy exceptions for intentionally approved tool additions while keeping the rest of the surface gated
- Extend the sample to compare multi-server MCP bundles instead of one profile at a time
Final Notes
MCP makes it easier to compose agents with tool servers, but it also makes tool drift a first-class operational risk.
If the tool surface is part of the system contract, then the server has to be tested and gated as a server, not just described in docs or assumed from configuration.
Explore the source code at the GitHub repository.
See you in the next issue.
Stay curious.