Kubernetes-MCP-Guard Server

AI-safe, approval-plan-gated Kubernetes operations through MCP with OAuth, RBAC, audit, and guardrails.


🛡️ Kubernetes MCP Guard

Human-approved, AI-driven Kubernetes remediation through a guarded MCP gateway.

Remediation, by design:

  • Observer detects anomalies.
  • Planner proposes an evidence-backed plan.
  • Human reviewer approves out-of-band.
  • Executor runs only the approved digest-bound plan.
  • Everything is auditable.




📝 TL;DR

When something breaks, the system can collect evidence, propose a bounded fix, dry-run it, package it into a reviewable plan, and wait for a human to approve.

It is a security-first bridge between AI agents and Kubernetes, with out-of-band, OAuth-authenticated, human-in-the-loop (HITL), plan-based approval for every gateway-exposed mutation.

Why?

AI agents can help diagnose infrastructure problems, but giving them direct mutation access is risky. Kubernetes MCP Guard explores a safer pattern: agents may observe, dry-run, and propose bounded remediations, while humans approve the exact digest-bound plan before any Kubernetes write occurs.

🎬 Demo

https://github.com/user-attachments/assets/4e06b4ee-db80-4d74-96cc-38dfbb413042

[!NOTE] Demo scenario:

  1. A Deployment is intentionally broken.
  2. The Observer detects the unhealthy workload.
  3. The Planner proposes a bounded remediation.
  4. An approval access code is sent to the configured operator by email.
  5. An authenticated human approves the exact plan in the browser.
  6. The Executor applies the approved mutation.

The walkthrough in docs/demo-failing-deployment.md shows the full flow against a deliberately broken Deployment.

🧠 Core Ideas

Kubernetes MCP Guard explores a practical safety pattern for AI-assisted operations:

  • Plan before mutate: every gateway-exposed write starts as a request_* plan built from Kubernetes server-side dry-run evidence.
  • Separate review channel: the MCP client receives an approval URL, while approval happens through /approvals/* in a browser OAuth session.
  • Digest-bound approval: execution is bound to an Intent Digest for the executable mutation and a Review Digest for the human-reviewed snapshot.
  • Durable grant model: an approved Approval Challenge records a Challenge Outcome and issues an Approval Grant consumed by pre-execution gates.
  • Narrow Kubernetes scope: namespace allow-lists, namespace-scoped RBAC, supported-kind checks, and bounded read tools keep the operational surface small.
  • Auditable controls: guardrail and approval events are written as JSONL streams with identity, digest, grant, and execution context.
  • Structured multi-agent coordination: Observer, Planner, and Executor are independent processes (agents) that communicate over the A2A protocol (via a2a-dotnet), each with a separate OAuth service identity and a narrow gateway scope. The Planner owns a durable per-anomaly Task that persists across restarts and enforces one-remediation-per-anomaly without cross-service locking.
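
The digest-bound approval idea above can be illustrated with a small sketch. This is not the repository's digest scheme: the canonical-JSON encoding, the SHA-256 choice, the `sha256:` prefix, and all field names below are assumptions made only to show why binding approval to a digest detects a changed mutation.

```python
import hashlib
import json


def canonical_digest(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(encoded).hexdigest()


# Hypothetical plan fields, for illustration only.
intent = {"operation": "scale_deployment", "namespace": "demo", "name": "web", "replicas": 3}
review = {"summary": "Scale web from 1 to 3 replicas", "diff": "-replicas: 1\n+replicas: 3"}

intent_digest = canonical_digest(intent)  # binds the executable mutation
review_digest = canonical_digest(review)  # binds what the human actually saw

# Key order does not change the digest, but any change to the mutation does.
reordered = {"replicas": 3, "name": "web", "namespace": "demo", "operation": "scale_deployment"}
tampered = dict(intent, replicas=30)
print(canonical_digest(reordered) == intent_digest)  # True
print(canonical_digest(tampered) == intent_digest)   # False
```

Keeping two digests separates "what will run" from "what was reviewed", so a mismatch in either one can block execution independently.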

The repository also separates the generic approval lifecycle from the Kubernetes adapter, so the core language is not tied to one infrastructure domain.

See CONTEXT.md, docs/mutation-approval-profile.md, docs/mutation-approval-flow.md.

🗺️ Architecture

---
title: Security Boundaries
---
flowchart TB
    subgraph outer["🌐 Internet / Operator"]
        Human["👤 Operator\nbrowser · OAuth PKCE"]
        McpClient["🤖 MCP Client\nCodex · Claude Code"]
    end

    subgraph gateway["🛡️ Gateway —  OAuth JWT required"]
        direction LR
        Guard["🔍 Guardrails\n+ ToolScopeGuard"]
        ApprovalUI["📋 Approval UI\n/approvals/*"]
        ApprovalCore["🔐 Approval Core\nplan · challenge · grant · digest"]
    end

    subgraph agents["🤖 Agent Tier  —  client_credentials · narrow scopes"]
        direction LR
        Obs["🔎 Observer\nmcp:tools.readonly"]
        Plan["📋 Planner\nmcp:tools.propose + readonly"]
        Exec["🛠️ Executor\nmcp:tools.execute"]
        Obs <-->|"A2A"| Plan <-->|"A2A"| Exec
    end

    subgraph private["🔒 Private Subprocess  —  no public port"]
        McpServer["⚙️ McpServer\nKubernetes tools"]
    end

    K8s(("☸️ Kubernetes API\n(namespace-scoped RBAC)"))

    Human -->|"review snapshot · approve/deny"| ApprovalUI --> ApprovalCore
    McpClient -->|"Bearer JWT · mcp:tools scope"| Guard -->|"scope-filtered tool call"| ApprovalCore
    agents -->|"Bearer JWT · service identity"| Guard
    ApprovalCore -->|"stdio · service token"| McpServer -->|"KubernetesClient"| K8s

The Observer notifies the Planner. The Planner then dispatches to the Executor synchronously and waits for the outcome.

The Planner's internal remediation pipeline is a concurrent DAG built on Microsoft.Agents.AI.Workflows. It fans each incoming anomaly through its own independent executor chain: Filter → Dedupe → LLM-Decide → Validate → Propose.
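
As a rough illustration of the per-anomaly fan-out described above, the following sketch runs one chain per anomaly concurrently with `asyncio`. It is not the Microsoft.Agents.AI.Workflows API: the filter rule (drop Low severity), the fixed decision standing in for the LLM step, and every function and field name are assumptions.

```python
import asyncio

# Operation menu taken from the Planner section of this README.
ALLOWED_OPERATIONS = {"restart_deployment", "scale_deployment", "set_deployment_image"}


async def run_chain(anomaly: dict, seen: set):
    """One independent Filter -> Dedupe -> Decide -> Validate -> Propose chain."""
    # Filter: hypothetical rule that ignores low-severity anomalies.
    if anomaly["severity"] == "Low":
        return None
    # Dedupe: skip anomalies that already have a chain in flight.
    if anomaly["id"] in seen:
        return None
    seen.add(anomaly["id"])
    # Decide: stand-in for the LLM call; always picks a restart here.
    await asyncio.sleep(0)
    operation = "restart_deployment"
    # Validate: only operations from the bounded menu may be proposed.
    if operation not in ALLOWED_OPERATIONS:
        return None
    # Propose: the real Planner would call propose_plan at this point.
    return {"anomaly": anomaly["id"], "operation": operation}


async def run_pipeline(anomalies: list) -> list:
    seen = set()
    results = await asyncio.gather(*(run_chain(a, seen) for a in anomalies))
    return [r for r in results if r is not None]


batch = [
    {"id": "a1", "severity": "High"},
    {"id": "a1", "severity": "High"},  # duplicate, dropped by Dedupe
    {"id": "a2", "severity": "Low"},   # dropped by Filter
]
proposals = asyncio.run(run_pipeline(batch))
```

Here `proposals` holds a single restart proposal for `a1`, because the duplicate and the low-severity anomaly never reach the Propose step.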

Full request-flow diagrams live in docs/architecture.md.

🔐 Approval Flow

The central safety property is that approval is necessary but not sufficient. A human approval creates execution authorization, but execution still has to pass the pre-execution gates immediately before Kubernetes is mutated.

| Phase | What happens | What can block it |
| --- | --- | --- |
| Plan | A human-driven MCP client calls request_*, or the Planner calls propose_plan; the Kubernetes adapter gathers dry-run, diff, and policy evidence; the generic core stores a Plan Envelope with Intent and Review Digests. | Namespace rejection, manifest allow-list rejection, dry-run failure, domain policy denial, unsupported legacy plan format. |
| Approve | The client calls execute_approved_plan; the gateway creates or reuses a short-lived Approval Challenge and returns a browser URL. The browser renders the stored review snapshot, not model-supplied approval text. | Expired challenge, wrong authenticated subject, anti-forgery failure, changed digest binding, denied/rejected/canceled Challenge Outcome. |
| Execute | After approval, the client retries execute_approved_plan; the gateway validates the Approval Grant, digests, validity window, reuse policy, freshness checks, and domain policy checks before the adapter writes. | Missing/expired/mismatched grant, digest mismatch, already-applied Single-Execution Plan, second dry-run failure, policy failure, live-state drift. |
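
The "necessary but not sufficient" property in the Execute phase can be sketched as a gate function that runs after approval and before any write. The `Plan` and `Grant` shapes and the reason strings below are hypothetical, and the sketch covers only a subset of the gates listed in the table (it omits the second dry-run, policy, and drift checks).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Plan:
    plan_id: str
    intent_digest: str
    review_digest: str
    already_applied: bool = False


@dataclass(frozen=True)
class Grant:
    plan_id: str
    intent_digest: str
    review_digest: str
    expires_at: datetime


def check_gates(plan: Plan, grant: Optional[Grant], now: datetime) -> list:
    """Return blocking reasons; an empty list means execution may proceed."""
    if grant is None:
        return ["missing grant"]
    reasons = []
    if grant.plan_id != plan.plan_id:
        reasons.append("mismatched grant")
    if grant.expires_at <= now:
        reasons.append("expired grant")
    if (grant.intent_digest, grant.review_digest) != (plan.intent_digest, plan.review_digest):
        reasons.append("digest mismatch")
    if plan.already_applied:
        reasons.append("already applied")
    return reasons


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
plan = Plan("plan-1", "sha256:aaa", "sha256:bbb")
good = Grant("plan-1", "sha256:aaa", "sha256:bbb", now + timedelta(minutes=5))
stale = Grant("plan-1", "sha256:aaa", "sha256:ccc", now - timedelta(minutes=1))
```

With these values, `check_gates(plan, good, now)` returns an empty list, while the `stale` grant is blocked for both expiry and a digest mismatch even though a human approved it earlier.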

Current implementation notes are tracked in docs/mutation-approval-profile.md#current-repository-fit.

🧰 Current Capabilities

🤖🔎 Anomaly Observer

The InfraGate.Observer is an LLM-driven agent that periodically inspects the cluster through the gateway's read-only tools and emits structured Anomaly Reports.

| Capability | Description |
| --- | --- |
| Scheduled observation | Background IHostedService runs cycles on a configurable cadence (default 60s). |
| On-demand trigger | POST /observe-now returns a synchronous AnomalyReport[] with a 30s timeout. |
| Anomaly detection | LLM-assisted classification across four categories: Pod unhealthy, Deployment unavailable, Service no endpoints, Warning events. |
| Severity classification | Rules-derived High/Medium/Low with LLM disagreement telemetry. |
| Deduplication & resolution | In-memory dedupe window suppresses repeat reports; automatic Resolved emission when anomalies clear. |
| Handoff | Log sink always on; JSON file sink and Planner A2A handoff are opt-in; see docs/configuration.md. |
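
The deduplication and resolution row can be pictured with a minimal in-memory window. This is only a sketch under assumptions: the class name, the string anomaly keys, and the numeric timestamps are hypothetical and do not reflect InfraGate.Observer's actual types.

```python
class DedupeWindow:
    """Suppress repeat reports inside a time window and detect cleared anomalies."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._last_reported = {}  # anomaly key -> time of last emitted report

    def should_report(self, key: str, now: float) -> bool:
        last = self._last_reported.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # still inside the window: suppress the repeat
        self._last_reported[key] = now
        return True

    def resolve_missing(self, active_keys: set) -> list:
        """Return keys that were reported earlier but are no longer active."""
        resolved = sorted(k for k in self._last_reported if k not in active_keys)
        for key in resolved:
            del self._last_reported[key]
        return resolved


window = DedupeWindow(window_seconds=300)
first = window.should_report("demo/web:DeploymentUnavailable", now=0)
repeat = window.should_report("demo/web:DeploymentUnavailable", now=60)
cleared = window.resolve_missing(active_keys=set())
```

In this run the first report is emitted, the repeat 60 seconds later is suppressed, and the anomaly is returned as resolved once it disappears from the active set.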

🤖📋 Remediation Planner

The InfraGate.Planner consumes Anomaly Reports, chooses a bounded remediation operation, and creates approval-pending Operator Approval Policy plans through propose_plan.

| Capability | Description |
| --- | --- |
| Anomaly intake | Receives AnomalyHandoffBatch payloads from the Observer over A2A; each anomaly is processed independently through a concurrent DAG pipeline: Filter → Dedupe → LLM-Decide → Validate → Propose. |
| Operation menu | Chooses only restart_deployment, scale_deployment, or set_deployment_image in v1. |
| Plan proposal | Calls propose_plan to create a digest-bound Plan Envelope for operator approval. |
| Approval notification | propose_plan creates an Approval Access Code and sends the configured operator email through the gateway SMTP sender when configured. |
| Durable task lifecycle | One A2A Task per anomaly (keyed by contextId) tracks state from Submitted through Working, AuthRequired (awaiting operator approval), to Completed/Failed/Rejected. Persisted to PostgreSQL when InfraGate__Planner__AuditConnectionString is set; otherwise in-memory. |
| Scope boundary | Planner can propose plans and use read-only inspection tools; it cannot execute plans. |
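
The durable task lifecycle row reads naturally as a small state machine. The state names below come from the table, but the allowed-transition map and the `TaskRegistry` class are assumptions for illustration; the real Planner persists tasks through A2A and PostgreSQL rather than a dictionary.

```python
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "Submitted"
    WORKING = "Working"
    AUTH_REQUIRED = "AuthRequired"
    COMPLETED = "Completed"
    FAILED = "Failed"
    REJECTED = "Rejected"


# Hypothetical transition map; terminal states have no outgoing edges.
ALLOWED = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.FAILED, TaskState.REJECTED},
    TaskState.WORKING: {TaskState.AUTH_REQUIRED, TaskState.COMPLETED,
                        TaskState.FAILED, TaskState.REJECTED},
    TaskState.AUTH_REQUIRED: {TaskState.WORKING, TaskState.COMPLETED,
                              TaskState.FAILED, TaskState.REJECTED},
}


class TaskRegistry:
    """One task per anomaly context; a second submit for the same id is refused."""

    def __init__(self):
        self._tasks = {}

    def submit(self, context_id: str) -> bool:
        if context_id in self._tasks:
            return False  # one remediation per anomaly
        self._tasks[context_id] = TaskState.SUBMITTED
        return True

    def advance(self, context_id: str, new_state: TaskState) -> None:
        current = self._tasks[context_id]
        if new_state not in ALLOWED.get(current, set()):
            raise ValueError(f"illegal transition {current.value} -> {new_state.value}")
        self._tasks[context_id] = new_state

    def state(self, context_id: str) -> TaskState:
        return self._tasks[context_id]
```

Keying the registry by context id is what lets a single owner enforce one remediation per anomaly without any lock shared between services.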

🤖🛠️ Remediation Executor

The InfraGate.Executor consumes Planner proposals, waits for approval, and executes only after the gateway reports that an Approval Grant exists.

| Capability | Description |
| --- | --- |
| Proposal intake | Receives plan ids from the Planner over synchronous A2A dispatch. |
| Approval wait | Calls wait_for_plan_approval for each plan id until approval, timeout, or terminal status. |
| Approved execution | Calls execute_approved_plan only after approval is reported. |
| Scope boundary | Executor can wait and execute approved plans; it cannot create plans or call read-only inspection tools. |
| Gateway gates | The gateway still enforces approval grants, digests, freshness, policy checks, and single execution. |
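
The wait-then-execute behavior in the table can be sketched as a short loop. The two tool names match the gateway tools, but here they are passed in as plain callables, and the status strings (`approved`, `denied`, `expired`, `canceled`, `pending`) and the `max_waits` limit are assumptions rather than the gateway's actual status JSON.

```python
# Hypothetical terminal statuses; the real gateway's status values may differ.
TERMINAL_STATUSES = {"denied", "expired", "canceled"}


def run_approved_plan(plan_id, wait_for_plan_approval, execute_approved_plan, max_waits=5):
    """Wait for approval, then execute; never execute on any other status."""
    for _ in range(max_waits):
        status = wait_for_plan_approval(plan_id)
        if status == "approved":
            return execute_approved_plan(plan_id)
        if status in TERMINAL_STATUSES:
            return status
    return "timeout"


# Fake gateway calls so the loop can be exercised without a cluster.
statuses = iter(["pending", "pending", "approved"])
executed = []


def fake_wait(plan_id):
    return next(statuses)


def fake_execute(plan_id):
    executed.append(plan_id)
    return "applied"


result = run_approved_plan("plan-1", fake_wait, fake_execute)
```

The loop only decides when to call the gateway. As the last table row notes, the gateway still re-checks grants and digests on the execute call itself.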

🛡️ Gateway Protections

| Layer | Current behavior |
| --- | --- |
| MCP transport | HTTP MCP endpoint at /mcp using Streamable HTTP. |
| Authentication | OAuth JWT validation for MCP calls; browser OAuth cookie for approval pages. |
| OAuth discovery | Protected-resource metadata and insufficient-scope challenges for MCP clients. |
| Approval authority | Browser approval endpoints under /approvals/* with same-subject binding and anti-forgery checks. |
| Guardrails | Warn on suspicious request patterns and redact suspicious response content before it returns to the MCP client. |
| Audit | Separate JSONL streams for guardrail findings and approval lifecycle events. |
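
Scope-filtered tool access, as drawn in the architecture diagram, might look roughly like the sketch below. The scope names come from the diagram, but the tool-to-scope mapping and both helper functions are assumptions; the gateway's ToolScopeGuard may assign tools to scopes differently.

```python
# Hypothetical tool-to-scope mapping; the real gateway's mapping may differ.
TOOL_SCOPES = {
    "get_k8s_status": "mcp:tools.readonly",
    "get_pod_logs": "mcp:tools.readonly",
    "propose_plan": "mcp:tools.propose",
    "wait_for_plan_approval": "mcp:tools.execute",
    "execute_approved_plan": "mcp:tools.execute",
}


def visible_tools(granted_scopes: set) -> list:
    """Tools a caller may list, given the scopes in its validated token."""
    return sorted(tool for tool, scope in TOOL_SCOPES.items() if scope in granted_scopes)


def authorize(tool: str, granted_scopes: set) -> bool:
    """Deny unknown tools and tools whose required scope was not granted."""
    required = TOOL_SCOPES.get(tool)
    return required is not None and required in granted_scopes


planner_scopes = {"mcp:tools.propose", "mcp:tools.readonly"}
executor_scopes = {"mcp:tools.execute"}
```

Under this mapping the Planner identity can list read-only tools and `propose_plan` but is refused `execute_approved_plan`, while the Executor identity sees only the wait and execute tools.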

🔎 Read-Only Observability

| Tool | Purpose |
| --- | --- |
| get_allowed_namespaces | Return the namespace allow-list configured for the server. |
| get_k8s_status | Summarize Deployments, Services, ConfigMaps, Pods, and ReplicaSets in a namespace. |
| get_k8s_events | Read bounded events.k8s.io/v1 diagnostics. |
| get_pod_logs | Read bounded Pod logs with tail-line and byte caps. |
| get_k8s_resource | Return a focused resource summary without Secret values, ConfigMap data, or raw manifests. |
| get_deployment_diagnostics | Inspect Deployment health, related Pods, ReplicaSets, and Events. |
| get_pod_diagnostics | Inspect Pod status, conditions, container state, and Events. |
| get_service_diagnostics | Inspect Service endpoints, backing Pods, and Events. |
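
To make "bounded" concrete for a tool such as get_pod_logs, here is a minimal sketch of applying a tail-line cap and then a byte cap to log text. The function name, parameter names, and the order in which the caps are applied are assumptions, not the server's actual implementation.

```python
def bounded_tail(log_text: str, max_lines: int, max_bytes: int) -> str:
    """Keep at most the last max_lines lines, then at most the last max_bytes bytes."""
    lines = log_text.splitlines()
    # Tail-line cap: keep only the newest lines.
    lines = lines[max(0, len(lines) - max_lines):]
    data = "\n".join(lines).encode("utf-8")
    # Byte cap: keep only the newest bytes, even if that cuts a line short.
    data = data[max(0, len(data) - max_bytes):]
    # A cut may land inside a multi-byte character; drop the broken fragment.
    return data.decode("utf-8", errors="ignore")


sample = "line 1\nline 2\nline 3\nline 4"
last_two = bounded_tail(sample, max_lines=2, max_bytes=1024)
tiny = bounded_tail(sample, max_lines=2, max_bytes=6)
```

Applying both caps means a single very long line cannot blow past the response budget, which matters when the output is fed back into an LLM context.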

✅ Gateway Approval Tools

| Tool | Purpose |
| --- | --- |
| request_apply_manifest | Dry-run and plan server-side apply for Deployment, Service, or ConfigMap. |
| request_delete_manifest | Dry-run and plan deletion for supported manifest kinds. |
| request_scale_deployment | Dry-run and plan a Deployment replica-count change. |
| request_restart_deployment | Dry-run and plan a Deployment rollout restart. |
| request_set_deployment_image | Dry-run and plan a Deployment container image update. |
| propose_plan | Create an approval-pending Operator Approval Policy plan for the autonomous Planner operation menu. |
| execute_approved_plan | Create the browser approval challenge or execute an approved, digest-bound plan after gates pass. |
| get_plan_status | Read the current approval status for a plan. |
| wait_for_plan_approval | Wait briefly for an out-of-band browser approval and return status JSON without applying the plan. |

Direct Kubernetes mutation tools exist inside the private server surface for the adapter executor. The HTTP gateway exposes request_* wrappers plus execute_approved_plan instead of exposing raw destructive tools to MCP clients.

⚡ Quick Start

Prerequisites: Docker Compose v2, kubectl, minikube, and git.

Review docs/configuration.md before changing runtime settings.

📦 From Packages

The default quickstart uses published images and committed local-demo defaults.

git clone https://github.com/mirusser/Kubernetes-MCP-Guard.git
cd Kubernetes-MCP-Guard

export InfraGate__OpenRouter__ApiKey="<openrouter-api-key>"
make quickstart

make quickstart starts the local Keycloak-backed OAuth path, PostgreSQL approval store, and published gateway image with TAG=latest. Pin a release with TAG=v0.1.0 make quickstart. The committed no-SDK defaults come from the smoke-release Run Profile: deploy/local-oauth/release.env.example supplies both Compose interpolation and InfraGate__... runtime settings.

🛠️ From Source

Use source mode when you want the gateway, Observer, Planner, and Executor built from local code. This path also requires the .NET 10 SDK and an OpenRouter API key for the LLM-backed agents.

export InfraGate__OpenRouter__ApiKey="<openrouter-api-key>"
make quickstart-source

The source quickstart generates deploy/generated/local-compose.env (default configuration) from deploy/run-profiles.yaml and starts the gateway, Observer, Planner, and Executor from local source builds.

Useful follow-up commands:

make quickstart-logs
make quickstart-down

Other run modes and full setup details are in docs/setup-guide.md.

⌨️ Connect Codex CLI

Add this to ~/.codex/config.toml:

[mcp_servers.infra-gate]
url = "http://127.0.0.1:3001/mcp"
oauth_resource = "http://127.0.0.1:3001/mcp"
scopes = ["mcp:tools"]

Then authenticate and start Codex:

codex mcp login infra-gate
codex

💬 Connect Claude Code

claude mcp add-json --scope user infra-gate \
  '{"type":"http","url":"http://127.0.0.1:3001/mcp","oauth":{"scopes":"mcp:tools"}}'

claude
/mcp

📦 Container Images

Release images are built by the Docker workflow and published to GHCR and Docker Hub.

| Registry | Gateway image |
| --- | --- |
| GitHub Container Registry | ghcr.io/mirusser/kubernetes-mcp-guard-gateway:<tag> |
| Docker Hub | mirusser/kubernetes-mcp-guard-gateway:<tag> |

Use specific release tags for stable demos. The :dev tag tracks the development branch, and :latest tracks the most recent stable release.

🧩 Compatibility

| Area | Supported / tested |
| --- | --- |
| .NET | .NET 10 |
| Kubernetes | minikube / local cluster initially |
| MCP transport | HTTP MCP endpoint at /mcp |
| OIDC | Keycloak local/dev path; external OIDC providers by configuration |
| Container registries | GHCR, Docker Hub |
| Platforms | linux/amd64 initially |

🧭 Project Map

⚖️ Boundaries And Non-Goals

[!IMPORTANT]

  • The project is experimental and not production-certified.
  • The local Keycloak realm runs in development mode over HTTP and is not a production identity provider.
  • Prompt-injection guardrails are defense-in-depth, not a guaranteed hard security boundary.
  • The tool surface does not expose shell execution, kubectl passthrough, exec, attach, port-forward, namespace creation, RBAC manipulation, Secret reads, raw manifest reads, or cluster-scoped writes.
  • This is not a full Kubernetes policy engine and not an MCP standard.

See docs/security-model.md for the full threat model.

Kubernetes MCP Guard is a working reference implementation of a possible MCP mutation-approval profile. It is designed for early technical evaluation in local or tightly controlled environments, not for production infrastructure.

The codebase uses InfraGate as the internal project name.

📜 Governance


Built with ❤️, ☕ and careful little guardrails 🛡️✨