AgentTrust

Challenge-response quality verification for AI agents and MCP servers.

AgentTrust evaluates an AI agent's competency before you trust it with real tasks or payments. It connects to any MCP server, runs challenge-response tests across 6 quality dimensions, and issues W3C Verifiable Credentials as proof.

Why

The AI agent ecosystem has identity (ERC-8004, SATI), post-hoc reputation (TARS, Amiko), and payments (x402) — but no pre-payment quality gate. AgentTrust fills this gap: verify competency first, then trust.

Features

Evaluation Engine

3-level pipeline: Manifest (schema) → Functional (tool calls) → Domain Expert (calibrated questions)
6-axis scoring: accuracy (35%), safety (20%), reliability (15%), process quality (10%), latency (10%), schema quality (10%)
Consensus judging: 2-3 LLM judges run in parallel with an agreement threshold (saves 50-66% of LLM calls)
7-provider LLM fallback chain: Cerebras → Groq → OpenRouter → Gemini → Mistral → DeepSeek → OpenAI
5 adversarial probe types: prompt injection, PII leakage, hallucination, overflow, system prompt extraction
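
As a rough illustration of the 6-axis weighting listed above, the overall score can be read as a weighted mean of per-axis scores. This is a minimal sketch, not AgentTrust's actual scoring code: the function name, the axis keys, and the 0-100 score range are assumptions made for the example; only the weights come from the list above.

```python
# Axis weights as listed in the feature list; they sum to 1.0.
WEIGHTS = {
    "accuracy": 0.35,
    "safety": 0.20,
    "reliability": 0.15,
    "process_quality": 0.10,
    "latency": 0.10,
    "schema_quality": 0.10,
}


def overall_score(axis_scores: dict) -> float:
    """Weighted mean of per-axis scores (each assumed to lie in [0, 100])."""
    missing = WEIGHTS.keys() - axis_scores.keys()
    if missing:
        raise ValueError(f"missing axes: {sorted(missing)}")
    return sum(WEIGHTS[axis] * axis_scores[axis] for axis in WEIGHTS)


# Hypothetical per-axis scores for one evaluated server.
example = {
    "accuracy": 90,
    "safety": 80,
    "reliability": 70,
    "process_quality": 60,
    "latency": 50,
    "schema_quality": 100,
}
print(round(overall_score(example), 2))  # 79.0
```

Because accuracy and safety together carry 55% of the weight, a server with a clean schema and low latency still scores poorly if its answers are wrong or unsafe.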

Battle Arena

Head-to-head blind evaluation with position-swap consistency
OpenSkill (Bayesian, Elo-style) rating system with divisions (Bronze → Grandmaster)
Fair matchmaking: rating proximity + uncertainty bonus + cross-division challenges
Style control penalties to prevent gaming via verbose/formatted responses
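
Position-swap consistency can be sketched as follows: the same judge sees the two responses in both orders, and a winner is recorded only if the verdict survives the swap. The `Judge` signature, the `"first"`/`"second"` return values, and the toy judges are hypothetical and exist only for this illustration; the real arena judges are LLM calls.

```python
from typing import Callable, Optional

# Hypothetical judge signature: (prompt, first_response, second_response)
# returns "first" or "second".
Judge = Callable[[str, str, str], str]


def swap_consistent_winner(
    judge: Judge, prompt: str, resp_a: str, resp_b: str
) -> Optional[str]:
    """Return "A" or "B" if the judge picks the same agent in both orders, else None."""
    forward = judge(prompt, resp_a, resp_b)  # agent A shown first
    reverse = judge(prompt, resp_b, resp_a)  # agent B shown first
    winner_forward = "A" if forward == "first" else "B"
    winner_reverse = "B" if reverse == "first" else "A"
    return winner_forward if winner_forward == winner_reverse else None


# Toy judges standing in for LLM calls.
def longer_wins(prompt: str, first: str, second: str) -> str:
    return "first" if len(first) >= len(second) else "second"


def always_first(prompt: str, first: str, second: str) -> str:
    return "first"  # a purely position-biased judge


print(swap_consistent_winner(longer_wins, "q", "short", "a much longer answer"))
print(swap_consistent_winner(always_first, "q", "short", "a much longer answer"))
```

The position-biased judge yields `None` because its verdict flips when the order flips, so that battle would not count toward ratings under this scheme.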

IRT Adaptive Testing

Rasch 1PL calibration from battle data (pure Python, no numpy)
Fisher information maximization for adaptive question selection
EAP ability estimation with standard normal prior
Reduces evaluation cost by 50-90% while maintaining accuracy
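
The three IRT pieces named above fit together roughly as follows. This is a pure-Python sketch under the standard Rasch model, not the project's calibration code: the function names, the quadrature grid, and the sample difficulties are assumptions chosen for the example.

```python
import math


def p_correct(theta: float, b: float) -> float:
    """Rasch (1PL) probability that ability theta answers an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def fisher_information(theta: float, b: float) -> float:
    """Item information under the Rasch model: P * (1 - P)."""
    p = p_correct(theta, b)
    return p * (1.0 - p)


def next_item(theta: float, difficulties: list, asked: set) -> int:
    """Index of the unasked item with maximum Fisher information at theta."""
    candidates = [i for i in range(len(difficulties)) if i not in asked]
    return max(candidates, key=lambda i: fisher_information(theta, difficulties[i]))


def eap_estimate(responses: list, grid_size: int = 81,
                 lo: float = -4.0, hi: float = 4.0) -> float:
    """Expected a posteriori ability with a standard normal prior.

    responses is a list of (difficulty, correct) pairs; the posterior mean
    is approximated on an evenly spaced grid over [lo, hi].
    """
    step = (hi - lo) / (grid_size - 1)
    numerator = denominator = 0.0
    for k in range(grid_size):
        theta = lo + k * step
        weight = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for b, correct in responses:
            p = p_correct(theta, b)
            weight *= p if correct else (1.0 - p)
        numerator += theta * weight
        denominator += weight
    return numerator / denominator


difficulties = [-2.0, 0.1, 1.5]          # hypothetical calibrated difficulties
first = next_item(0.0, difficulties, set())
ability = eap_estimate([(difficulties[first], True)])
print(first, round(ability, 3))
```

Information peaks where difficulty matches the current ability estimate, which is why adaptive selection needs fewer questions than a fixed test to reach the same precision.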

Standards

W3C Verifiable Credentials (AQVC format) with Ed25519 DataIntegrityProof
Google A2A v0.3 native support (AgentTrust IS an A2A agent)
x402 Solana payment verification (USDC + SOL)
AIUC-1 protocol mapping

Quick Start

Docker (recommended)

cp .env.example .env
# Add at least one LLM key (GROQ_API_KEY, CEREBRAS_API_KEY, etc.)
docker compose up -d

Local Development

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Add LLM keys to .env

unset GROQ_API_KEY  # A key exported in the shell overrides the rotation pool in .env
python -m uvicorn src.main:app --host 0.0.0.0 --port 8002 --reload

MCP Server (for Claude, Cursor, Windsurf)

Add to your MCP client config:

{
  "mcpServers": {
    "agenttrust": {
      "command": "python",
      "args": ["-m", "src.standards.mcp_server"],
      "env": {
        "GROQ_API_KEY": "your-key"
      }
    }
  }
}

Or connect to a running instance via SSE:

http://localhost:8003/sse

Available MCP tools:

Tool	Description
`check_quality(server_url)`	Full evaluation: manifest + functional + judge scoring
`check_quality_fast(server_url)`	Cached score (<10ms) or manifest-only (<100ms)
`get_score(server_url)`	Lookup cached score with freshness decay
`verify_attestation(attestation_jwt)`	Verify AQVC JWT and decode payload

API Endpoints

Method	Endpoint	Description
POST	`/v1/evaluate`	Submit target for evaluation
GET	`/v1/evaluate/{id}`	Poll evaluation status
GET	`/v1/score/{target_id}`	Get quality score
GET	`/v1/scores`	Search/list scores
GET	`/v1/badge/{target_id}.svg`	SVG quality badge
GET	`/v1/attestation/{id}`	Get signed attestation (JWT or W3C VC)
POST	`/v1/attestation/{id}/verify`	Verify attestation
POST	`/v1/feedback`	Submit production feedback (anti-sandbagging)
POST	`/v1/battles`	Create evaluation battle
GET	`/v1/arena/leaderboard`	Battle arena leaderboard
GET	`/v1/rankings`	Global rankings by domain/tier
POST	`/v1/irt/calibrate`	Trigger IRT batch calibration
GET	`/v1/irt/recommend`	Adaptive question selection
GET	`/v1/pricing`	x402 pricing table
GET	`/.well-known/agent.json`	A2A Agent Card
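
A submit-then-poll client for the first two endpoints might look like the sketch below. The endpoint paths come from the table above and the port from the Local Development command; the request body field `target_url`, the helper names, and the absence of an auth header are assumptions, so check the running instance's API schema before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8002"  # port taken from the uvicorn command above


def build_evaluate_request(target_url: str) -> urllib.request.Request:
    """Build POST /v1/evaluate. The "target_url" field name is an assumption."""
    body = json.dumps({"target_url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/evaluate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_status_request(evaluation_id: str) -> urllib.request.Request:
    """Build GET /v1/evaluate/{id} for polling."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/evaluate/{evaluation_id}", method="GET"
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the requests does not touch the network.
req = build_evaluate_request("http://localhost:9000/mcp")
print(req.get_method(), req.full_url)
```

With a server running, `send(build_evaluate_request(...))` would submit the target and `send(build_status_request(evaluation_id))` would poll it; the shape of both responses is not documented here.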

Architecture

src/
  api/v1/          # 14 FastAPI routers
  core/            # Evaluator, MCP client, scoring, IRT, battle arena
  auth/            # API keys (SHA256 + salt), rate limiting by tier
  storage/         # MongoDB (Motor) + Redis
  payments/        # x402 protocol, Solana verification
  standards/       # W3C VC issuer, A2A extension, MCP server, AIUC-1

Stack: FastAPI + MongoDB + Redis | 533 tests | 60 source files | 15 lean dependencies
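
The `auth/` entry mentions API keys stored as SHA256 + salt. One common way to do that is sketched below, so that only the salt and digest are persisted and the raw key is never stored. The function names, the 16-byte salt, and the salt-then-key concatenation order are assumptions, not the project's implementation.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple


def hash_api_key(api_key: str, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Return (salt, hex digest) of SHA-256 over salt + key.

    A fresh random salt is generated when none is supplied.
    """
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.sha256(salt + api_key.encode("utf-8")).hexdigest()
    return salt, digest


def verify_api_key(api_key: str, salt: bytes, expected_digest: str) -> bool:
    """Recompute the digest and compare in constant time."""
    _, digest = hash_api_key(api_key, salt)
    return hmac.compare_digest(digest, expected_digest)


salt, stored = hash_api_key("example-key")  # hypothetical key value
print(verify_api_key("example-key", salt, stored))
print(verify_api_key("wrong-key", salt, stored))
```

`hmac.compare_digest` avoids leaking how many leading characters matched through response timing.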

Tests

python -m pytest tests/ -q
# 533 passed in ~2s

Configuration

See .env.example for all 60+ configuration options, including:

LLM API keys (7 providers, comma-separated for rotation)
MongoDB/Redis connection
JWT attestation (Ed25519 key, issuer DID, validity)
Solana wallet for x402 payments
Rate limit tiers and consensus judge settings
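
Comma-separated keys for rotation could be handled along these lines. This sketch assumes simple round-robin rotation; the actual rotation policy, the helper names, and the sample values are not taken from the project.

```python
import itertools
import os
from typing import Iterator, List, Mapping


def load_key_pool(env_var: str, environ: Mapping[str, str] = os.environ) -> List[str]:
    """Split a comma-separated env value into a list of non-empty keys."""
    raw = environ.get(env_var, "")
    return [key.strip() for key in raw.split(",") if key.strip()]


def key_rotator(keys: List[str]) -> Iterator[str]:
    """Cycle through the pool forever (round-robin is an assumption)."""
    if not keys:
        raise ValueError("no API keys configured")
    return itertools.cycle(keys)


# Hypothetical environment with three keys for one provider.
fake_env = {"GROQ_API_KEY": "key-a, key-b,key-c"}
pool = load_key_pool("GROQ_API_KEY", fake_env)
rotation = key_rotator(pool)
print([next(rotation) for _ in range(4)])
```

Passing the environment as a parameter keeps the parsing testable and makes it explicit which source wins when both the shell and `.env` define the same variable.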

License

MIT
