Assay
The firewall for MCP tool calls. Block unsafe calls, audit every decision, replay anything. Deterministic policy enforcement with replayable evidence bundles.
Evidence compiler for agent review artifacts
Portable evidence receipts, verifiable bundles, and bounded Trust Basis claims for agent systems.
See It Work · Promptfoo JSONL · OpenFeature · CycloneDX ML-BOM · Quick Start · CI Guide · Discussions
Use Assay if you already have machine-readable AI outcomes or agent tool-call tests and want a small reviewable artifact boundary in CI.
Start with the path that matches what you already have:
| You have | Use this when | What you get | Next click |
|---|---|---|---|
| Promptfoo JSONL from CI evals | You want smaller PR evidence than a full eval export | Eval outcome receipts, verified bundle, Trust Basis diff | Promptfoo JSONL |
| OpenFeature boolean EvaluationDetails | You want CI evidence for a runtime flag decision boundary | Decision receipt, verified bundle, Trust Basis diff | OpenFeature EvaluationDetails |
| CycloneDX ML-BOM model component | You want CI evidence for the model inventory/provenance boundary that existed | Inventory receipt, verified bundle, Trust Basis diff | CycloneDX ML-BOM |
| MCP tool calls | You are ready to put a policy file around tool execution | Allow/deny audit trail and evidence for observed tool behavior | MCP Quick Start |
| A GitHub PR gate | You want CI to block regressions from checked artifacts | Trust Basis diff, gate status, SARIF/JUnit-ready output | CI Guide |
The core workflow is intentionally small: import or record a bounded outcome, bundle and verify it, compile trust-basis.json, then gate the Trust Basis diff. Assay does not make the upstream tool the source of truth; it makes the evidence boundary inspectable.
Trust Basis Gate
Status: OK
Bundles verified: 1
Regressed claims: 0
Assay is not a trust-score engine, a generic eval dashboard, or a hosted observability product. See What Assay is and is not for the boundary.
Is This For Me?
Yes, if you:
- already have eval output, runtime decisions, inventory artifacts, or MCP tool-call tests
- want a CI review artifact instead of a dashboard-only result
- need bounded auditability, not a scalar trust badge
Not yet, if you:
- need Assay to judge model correctness or policy quality for you
- want a hosted dashboard as the primary product
- want a compliance claim instead of a bounded evidence boundary
Install
cargo install assay-cli
CI: GitHub Action. Python SDK: pip install assay-it.
No hosted backend. No API keys for core flows. Deterministic: same input, same decision.
Evidence levels and non-goals
Trust claims use explicit epistemology, not a single “safety score”:
| Level | Meaning |
|---|---|
| verified | Backed by direct evidence or offline verification in the bundle/path |
| self_reported | Emitted by the system without stronger independent corroboration |
| inferred | Derived from bounded, documented rules |
| absent | No trustworthy evidence supports the claim |
Assay does not ship a primary aggregate trust score or a safe/unsafe badge as the main output. See ADR-033.
What ships today
| Output | Role |
|---|---|
| Policy gate | MCP wrap — deterministic allow/deny before tools run (see CLI note below the diagram). |
| Evidence bundle | Offline-verifiable, tamper-evident archive for audit and replay. |
| External receipts | Selected eval outcomes, runtime decision details, and inventory/provenance surfaces as bounded evidence receipts with JSON Schema contracts. |
| Trust Basis | Canonical trust-basis.json — bounded claim classification from verified bundles. |
| Trust Card | trustcard.json / trustcard.md / trustcard.html — same claims, review-friendly artifacts. |
| SARIF / CI | GitHub Action, Security tab integration, policy gates on PRs. |
Repository truth: release notes and CHANGELOG.md remain the authority for what is actually public.
main may carry release-prep commits before a tag is cut; crates.io publication is separate from repository merge state.
Agent ──► Assay ──► MCP Server
            │
            ├─ ✅ ALLOW / ❌ DENY (policy)
            ├─► 📋 Evidence bundle (verifiable)
            └─► 📊 Trust Basis → Trust Card → SARIF / CI
CLI: The mcp command group is hidden from top-level assay --help while the surface stabilizes; it is supported. Use assay mcp --help, assay mcp wrap …, or follow the MCP Quickstart.
Wedge, not category. “MCP firewall” describes the control plane; trust compilation describes the outcome: reviewable claims backed by evidence. See ADR-033 and RFC-005.
See It Work
cargo install assay-cli
mkdir -p /tmp/assay-demo && echo "safe content" > /tmp/assay-demo/safe.txt
assay mcp wrap --policy examples/mcp-quickstart/policy.yaml \
-- npx @modelcontextprotocol/server-filesystem /tmp/assay-demo
✅ ALLOW read_file path=/tmp/assay-demo/safe.txt reason=policy_allow
✅ ALLOW list_dir path=/tmp/assay-demo/ reason=policy_allow
❌ DENY read_file path=/tmp/outside-demo.txt reason=path_constraint_violation
❌ DENY exec cmd=ls reason=tool_denied
Inspect the audit artifact:
assay evidence show demo/fixtures/bundle.tar.gz
The bundle is tamper-evident and cryptographically verifiable. Signed mandate events can include an Ed25519-backed authorization trail for high-risk actions.
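To make "tamper-evident" concrete, here is a generic Python sketch of a digest manifest: each file's SHA-256 is recorded, and verification recomputes it. This illustrates the general idea only. It is not Assay's bundle format, and `build_manifest`, `verify_manifest`, and the `events.jsonl` file name are hypothetical.

```python
import hashlib

def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Record a SHA-256 digest for every file in the (hypothetical) bundle."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}

def verify_manifest(files: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return names of files that are missing, unexpected, or whose content changed."""
    problems = []
    for name in sorted(set(files) | set(manifest)):
        if name not in files or name not in manifest:
            problems.append(name)
        elif hashlib.sha256(files[name]).hexdigest() != manifest[name]:
            problems.append(name)
    return problems

# Hypothetical bundle content, for illustration only.
bundle = {"events.jsonl": b'{"tool":"read_file","decision":"allow"}\n'}
manifest = build_manifest(bundle)

# Flipping a recorded decision changes the digest and is detected.
tampered = {"events.jsonl": b'{"tool":"read_file","decision":"deny"}\n'}
print(verify_manifest(bundle, manifest))
print(verify_manifest(tampered, manifest))
```

A bare manifest only helps if the digests themselves cannot be rewritten, which is where signatures come in.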
Trust artifacts from a verified bundle
After a bundle verifies, compile the claim artifact:
# Machine-readable claim basis (deterministic, claim-first)
assay trust-basis generate demo/fixtures/bundle.tar.gz > trust-basis.json
trust-basis.json is the canonical output for CI and review. Claim id values are stable across runs; consumers should key by id, not row count or order. It is not a scalar trust score.
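Keying by id can be sketched in a few lines of Python. The layout used here (a top-level `claims` list whose entries carry `id` and `level`) is an assumption for illustration, not the documented trust-basis.json schema. The claim ids `a` and `b`, the helper names, and the rule "a claim regresses when it stops being verified" are also hypothetical.

```python
import json

def index_by_id(trust_basis: dict) -> dict[str, str]:
    """Map claim id -> level, ignoring row order (field names are assumed)."""
    return {claim["id"]: claim["level"] for claim in trust_basis["claims"]}

def regressed_claims(base: dict, head: dict) -> list[str]:
    """Ids that were 'verified' in base but are missing or no longer verified in head."""
    before, after = index_by_id(base), index_by_id(head)
    return sorted(
        cid for cid, level in before.items()
        if level == "verified" and after.get(cid) != "verified"
    )

# Two hypothetical documents; note the rows appear in a different order.
base = json.loads(
    '{"claims": [{"id": "a", "level": "verified"}, {"id": "b", "level": "inferred"}]}'
)
head = json.loads(
    '{"claims": [{"id": "b", "level": "inferred"}, {"id": "a", "level": "self_reported"}]}'
)
print(regressed_claims(base, head))  # ['a']
```

Because the comparison goes through a dictionary keyed by id, reordering rows or adding new claims does not change the result.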
The current claim-visible receipt families are Promptfoo assertion-component results, OpenFeature boolean EvaluationDetails, and CycloneDX ML-BOM model components. See the receipt-family matrix, the three-family note, and Evidence Receipts in Action.
Trust Card details
assay trustcard generate demo/fixtures/bundle.tar.gz --out-dir ./trust-out
# -> trust-out/trustcard.json , trust-out/trustcard.md , trust-out/trustcard.html
The Trust Card is a deterministic render of the same claim rows plus frozen non-goals; trustcard.json is canonical, while Markdown and static HTML are reviewer projections. Contract versions, pack floors, and release checklist: MIGRATION — Trust Compiler 3.2, receipt-family matrix. Release history belongs in CHANGELOG.md.
Add to Cursor in 30 Seconds
Assay ships a helper that finds your local Cursor MCP config path and prints a ready-to-paste entry:
assay mcp config-path cursor
It generates JSON like:
{
  "filesystem-secure": {
    "command": "assay",
    "args": [
      "mcp",
      "wrap",
      "--policy",
      "/path/to/policy.yaml",
      "--",
      "npx",
      "-y",
      "@modelcontextprotocol/server-filesystem",
      "/Users/you"
    ]
  }
}
The same wrapped command works in other MCP clients — see MCP Quick Start.
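If you would rather script the paste step, the Python sketch below merges such an entry into a config dictionary. It assumes the client keeps its servers under a top-level `mcpServers` key, so check your client's config format before relying on it. `add_server` is a hypothetical helper, not part of Assay.

```python
import json

def add_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with entry registered under mcpServers[name]."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # assumed top-level key
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged

# Same entry as the JSON above; paths are placeholders.
entry = {
    "command": "assay",
    "args": [
        "mcp", "wrap", "--policy", "/path/to/policy.yaml", "--",
        "npx", "-y", "@modelcontextprotocol/server-filesystem", "/Users/you",
    ],
}

config = add_server({"mcpServers": {}}, "filesystem-secure", entry)
print(json.dumps(config, indent=2))
```

The helper copies rather than mutates, so an existing config with other servers is left untouched until you write the result back yourself.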
Policy Is Simple
version: "2.0"
name: "my-policy"
tools:
  allow: ["read_file", "list_dir"]
  deny: ["exec", "shell", "write_file"]
schemas:
  read_file:
    type: object
    additionalProperties: false
    properties:
      path:
        type: string
        pattern: "^/app/.*"
        minLength: 1
    required: ["path"]
Legacy policies that use constraints: still work. Use assay policy migrate to convert them to the v2 JSON Schema form, or assay init --from-trace trace.jsonl to generate a policy from observed behavior.
See Policy Files.
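To see what the read_file schema above accepts and rejects, here is a small Python sketch that hand-checks the same four rules (required, additionalProperties, minLength, pattern). It is a deliberately tiny subset of JSON Schema written for illustration and makes no claim about how Assay evaluates policies internally; `check_args` is a hypothetical name.

```python
import re

# The read_file argument schema from the policy above, as a Python dict.
READ_FILE_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "path": {"type": "string", "pattern": "^/app/.*", "minLength": 1},
    },
    "required": ["path"],
}

def check_args(args: dict, schema: dict) -> list[str]:
    """Return a list of violations; an empty list means the arguments pass."""
    errors = []
    props = schema["properties"]
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required argument: {key}")
    for key, value in args.items():
        if key not in props:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected argument: {key}")
            continue
        rule = props[key]
        if rule.get("type") == "string":
            if not isinstance(value, str):
                errors.append(f"{key}: expected string")
                continue
            if len(value) < rule.get("minLength", 0):
                errors.append(f"{key}: shorter than minLength")
            # JSON Schema patterns are unanchored searches, hence re.search.
            if "pattern" in rule and not re.search(rule["pattern"], value):
                errors.append(f"{key}: does not match {rule['pattern']}")
    return errors

print(check_args({"path": "/app/data.txt"}, READ_FILE_SCHEMA))  # []
print(check_args({"path": "/etc/passwd"}, READ_FILE_SCHEMA))
```

In practice a full JSON Schema validator would replace this function; the point is only that each rule in the policy maps to one concrete, deterministic check on the tool-call arguments.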
Other import paths and protocol adapters
OpenTelemetry in, canonical evidence out
Assay ingests OpenTelemetry JSONL, builds replayable traces, and exports canonical evidence — OTel is a bridge, not the sole semantic authority.
assay trace ingest-otel \
--input otel-export.jsonl \
--db .eval/eval.db \
--out-trace traces/otel.v2.jsonl
Protocol adapters
Assay ships adapters that map protocol events into canonical evidence:
| Protocol | Adapter | What it maps |
|---|---|---|
| ACP (OpenAI/Stripe) | assay-adapter-acp | Checkout events, payment intents, tool calls |
| A2A (Google) | assay-adapter-a2a | Agent capabilities, task delegation, artifacts |
| UCP (Google/Shopify) | assay-adapter-ucp | Discover/buy/post-purchase state transitions |
Adapter crates are workspace / binary-driven, not published as separate crates.io packages.
Add to CI
# .github/workflows/assay.yml
name: Assay Gate
on: [push, pull_request]
permissions:
  contents: read
  security-events: write
jobs:
  assay:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Rul1an/assay-action@v2
PRs that violate policy get blocked; SARIF can surface in the Security tab.
Why Assay
| Property | What it means |
|---|---|
| Canonical evidence | Assay’s evidence model is the stable contract; OTel and adapters map into it. |
| Deterministic | Same input, same decision — not probabilistic. |
| Portable artifacts | Bundles, Trust Basis, Trust Card, SARIF — for CI, review, audit. |
| Bounded claims | Explicit about what is verified vs visible vs absent — no score-first UX. |
| MCP-native wedge | assay mcp wrap is the fast path (the mcp group is hidden from assay --help; use assay mcp --help). Adapters extend the same engine. |
| Offline-first | No backend required for core enforcement and bundle verification. |
Measured latency
On the M1 Pro/macOS fragmented-IPI harness, protected tool-decision path:
- Main protection run: 0.771 ms p50 / 1.913 ms p95
- Fast-path scenario: 0.345 ms p50 / 1.145 ms p95
These are tool-decision timings, not end-to-end model latency. (See Research & experiments for methodology context.)
Learn More
- Promptfoo JSONL to Evidence Receipts — smallest adoption path for existing eval artifacts
- OpenFeature EvaluationDetails to CI Review Artifact — runtime decision receipt path
- CycloneDX ML-BOM Model to Inventory Receipt — model inventory/provenance receipt path
- MCP Quickstart — filesystem server walkthrough
- Policy Files: YAML schema for assay mcp wrap
- OpenTelemetry & Langfuse: traces → replay and evidence
- CI Guide — GitHub Action
- Evidence Store — S3, B2, MinIO
- ADR-033: Trust compiler positioning
- RFC-005: Trust compiler MVP & Trust Card
Research, mappings & experiments
Bounded context: numbers below support mapping and experiments, not a product “security score.”
- OWASP MCP Top 10 Mapping — how Assay relates to each risk category (coverage is not a scalar guarantee).
- Third-party survey: popular MCP servers often show weak defaults — Assay adds policy + evidence; see discussion in the mapping doc.
- Security experiments — attack vectors and harness notes (methodology matters more than headline counts).
Contributing
cargo test --workspace
cargo clippy --workspace --all-targets -- -D warnings
See CONTRIBUTING.md. Discussions: GitHub Discussions — seed topics for pinned threads live in docs/community/DISCUSSIONS.md.
License