AgentTrust

Challenge-response quality verification for AI agents and MCP servers.

AgentTrust

Challenge-response quality verification for AI agents and MCP servers.

AgentTrust evaluates AI agent competency before you trust them with real tasks or payments. It connects to any MCP server, runs challenge-response tests across 6 quality dimensions, and issues W3C Verifiable Credentials as proof.

Why

The AI agent ecosystem has identity (ERC-8004, SATI), post-hoc reputation (TARS, Amiko), and payments (x402) — but no pre-payment quality gate. AgentTrust fills this gap: verify competency first, then trust.

Features

Evaluation Engine

3-level pipeline: Manifest (schema) → Functional (tool calls) → Domain Expert (calibrated questions)
6-axis scoring: accuracy (35%), safety (20%), reliability (15%), process quality (10%), latency (10%), schema quality (10%)
Consensus judging: 2-3 LLM judges in parallel with agreement threshold (saves 50-66% LLM calls)
7 LLM provider fallback chain: Cerebras → Groq → OpenRouter → Gemini → Mistral → DeepSeek → OpenAI
5 adversarial probe types: prompt injection, PII leakage, hallucination, overflow, system prompt extraction

Battle Arena

Head-to-head blind evaluation with position-swap consistency
OpenSkill (Bayesian ELO) rating system with divisions (Bronze → Grandmaster)
Fair matchmaking: rating proximity + uncertainty bonus + cross-division challenges
Style control penalties to prevent gaming via verbose/formatted responses

IRT Adaptive Testing

Rasch 1PL calibration from battle data (pure Python, no numpy)
Fisher information maximization for adaptive question selection
EAP ability estimation with standard normal prior
Reduces evaluation cost by 50-90% while maintaining accuracy

Standards

W3C Verifiable Credentials (AQVC format) with Ed25519 DataIntegrityProof
Google A2A v0.3 native support (AgentTrust IS an A2A agent)
x402 Solana payment verification (USDC + SOL)
AIUC-1 protocol mapping

Quick Start

Docker (recommended)

cp .env.example .env
# Add at least one LLM key (GROQ_API_KEY, CEREBRAS_API_KEY, etc.)
docker compose up -d

Services:

Local Development

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Add LLM keys to .env

unset GROQ_API_KEY  # Shell env overrides .env rotation pool
python -m uvicorn src.main:app --host 0.0.0.0 --port 8002 --reload

MCP Server (for Claude, Cursor, Windsurf)

Add to your MCP client config:

{
  "mcpServers": {
    "agenttrust": {
      "command": "python",
      "args": ["-m", "src.standards.mcp_server"],
      "env": {
        "GROQ_API_KEY": "your-key"
      }
    }
  }
}

Or connect to a running instance via SSE:

http://localhost:8003/sse

Available MCP tools:

Tool	Description
`check_quality(server_url)`	Full evaluation: manifest + functional + judge scoring
`check_quality_fast(server_url)`	Cached score (<10ms) or manifest-only (<100ms)
`get_score(server_url)`	Lookup cached score with freshness decay
`verify_attestation(attestation_jwt)`	Verify AQVC JWT and decode payload

API Endpoints

Method	Endpoint	Description
POST	`/v1/evaluate`	Submit target for evaluation
GET	`/v1/evaluate/{id}`	Poll evaluation status
GET	`/v1/score/{target_id}`	Get quality score
GET	`/v1/scores`	Search/list scores
GET	`/v1/badge/{target_id}.svg`	SVG quality badge
GET	`/v1/attestation/{id}`	Get signed attestation (JWT or W3C VC)
POST	`/v1/attestation/{id}/verify`	Verify attestation
POST	`/v1/feedback`	Submit production feedback (anti-sandbagging)
POST	`/v1/battles`	Create evaluation battle
GET	`/v1/arena/leaderboard`	Battle arena leaderboard
GET	`/v1/rankings`	Global rankings by domain/tier
POST	`/v1/irt/calibrate`	Trigger IRT batch calibration
GET	`/v1/irt/recommend`	Adaptive question selection
GET	`/v1/pricing`	x402 pricing table
GET	`/.well-known/agent.json`	A2A Agent Card

Architecture

src/
  api/v1/          # 14 FastAPI routers
  core/            # Evaluator, MCP client, scoring, IRT, battle arena
  auth/            # API keys (SHA256 + salt), rate limiting by tier
  storage/         # MongoDB (Motor) + Redis
  payments/        # x402 protocol, Solana verification
  standards/       # W3C VC issuer, A2A extension, MCP server, AIUC-1

Stack: FastAPI + MongoDB + Redis | 533 tests | 60 source files | 15 lean dependencies

Tests

python -m pytest tests/ -q
# 533 passed in ~2s

Configuration

See .env.example for all 60+ configuration options including:

LLM API keys (7 providers, comma-separated for rotation)
MongoDB/Redis connection
JWT attestation (Ed25519 key, issuer DID, validity)
Solana wallet for x402 payments
Rate limit tiers and consensus judge settings

License

MIT

เซิร์ฟเวอร์ที่เกี่ยวข้อง

Zcash MCP

MCP server for Zcash with 22 tools for shielded transactions, attestation verification, and wallet management

NBA MCP Server

Provides NBA statistics and live game data using the Model Context Protocol (MCP).

USA Spending MCP

Track government spending, search government spending be agency, explore government spending to communities, and much more.

Janee API Security

MCP server that sits between AI agents and APIs. Agents request access, Janee makes the call with the real credentials, agents never see the secrets.

WAF (ModSecurity)

Monitor WAF events, analyze attacks, tune rules and whitelist IPs for OWASP ModSecurity CRS via Docker

Business Helper

AI-powered Business Helper that analyzes thousands of YouTube videos to extract precise insights, timestamps, and actionable strategies. Instantly find the most relevant moments from podcasts, interviews, and lectures—turning long-form content into targeted business intelligence.

Strider Labs Uber MCP

MCP server for Uber - let AI agents book rides, get fare estimates, and track trips

MCP Internet Speed Test

Perform internet speed measurements including download/upload speed, latency, and jitter.

Microsoft Ads MCP

Bring your Microsoft Ads data into Claude or ChatGPT with Two Minute Reports MCP to analyze keyword performance, search queries, CTR, CPC, and conversion trends.

MCP-HA-Connect

A production-ready Model Context Protocol (MCP) server for Home Assistant integration with AI assistants like Claude.

AgentTrust

AgentTrust

Why

Features

Quick Start

Docker (recommended)

Local Development

MCP Server (for Claude, Cursor, Windsurf)

API Endpoints

Architecture

Tests

Configuration

License

Links

เซิร์ฟเวอร์ที่เกี่ยวข้อง

Zcash MCP

NBA MCP Server

USA Spending MCP

Janee API Security

WAF (ModSecurity)

Business Helper

Strider Labs Uber MCP

MCP Internet Speed Test

Microsoft Ads MCP

MCP-HA-Connect

NotebookLM Web Importer