AgentTrust
Challenge-response quality verification for AI agents and MCP servers.
AgentTrust
Challenge-response quality verification for AI agents and MCP servers.
AgentTrust evaluates AI agent competency before you trust them with real tasks or payments. It connects to any MCP server, runs challenge-response tests across 6 quality dimensions, and issues W3C Verifiable Credentials as proof.
Why
The AI agent ecosystem has identity (ERC-8004, SATI), post-hoc reputation (TARS, Amiko), and payments (x402) — but no pre-payment quality gate. AgentTrust fills this gap: verify competency first, then trust.
Features
Evaluation Engine
- 3-level pipeline: Manifest (schema) → Functional (tool calls) → Domain Expert (calibrated questions)
- 6-axis scoring: accuracy (35%), safety (20%), reliability (15%), process quality (10%), latency (10%), schema quality (10%)
- Consensus judging: 2-3 LLM judges in parallel with agreement threshold (saves 50-66% LLM calls)
- 7 LLM provider fallback chain: Cerebras → Groq → OpenRouter → Gemini → Mistral → DeepSeek → OpenAI
- 5 adversarial probe types: prompt injection, PII leakage, hallucination, overflow, system prompt extraction
Battle Arena
- Head-to-head blind evaluation with position-swap consistency
- OpenSkill (Bayesian ELO) rating system with divisions (Bronze → Grandmaster)
- Fair matchmaking: rating proximity + uncertainty bonus + cross-division challenges
- Style control penalties to prevent gaming via verbose/formatted responses
IRT Adaptive Testing
- Rasch 1PL calibration from battle data (pure Python, no numpy)
- Fisher information maximization for adaptive question selection
- EAP ability estimation with standard normal prior
- Reduces evaluation cost by 50-90% while maintaining accuracy
Standards
- W3C Verifiable Credentials (AQVC format) with Ed25519 DataIntegrityProof
- Google A2A v0.3 native support (AgentTrust IS an A2A agent)
- x402 Solana payment verification (USDC + SOL)
- AIUC-1 protocol mapping
Quick Start
Docker (recommended)
cp .env.example .env
# Add at least one LLM key (GROQ_API_KEY, CEREBRAS_API_KEY, etc.)
docker compose up -d
Services:
- API: http://localhost:8002
- MCP Server: http://localhost:8003
- Health: http://localhost:8002/health
Local Development
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Add LLM keys to .env
unset GROQ_API_KEY # Shell env overrides .env rotation pool
python -m uvicorn src.main:app --host 0.0.0.0 --port 8002 --reload
MCP Server (for Claude, Cursor, Windsurf)
Add to your MCP client config:
{
"mcpServers": {
"agenttrust": {
"command": "python",
"args": ["-m", "src.standards.mcp_server"],
"env": {
"GROQ_API_KEY": "your-key"
}
}
}
}
Or connect to a running instance via SSE:
http://localhost:8003/sse
Available MCP tools:
| Tool | Description |
|---|---|
check_quality(server_url) | Full evaluation: manifest + functional + judge scoring |
check_quality_fast(server_url) | Cached score (<10ms) or manifest-only (<100ms) |
get_score(server_url) | Lookup cached score with freshness decay |
verify_attestation(attestation_jwt) | Verify AQVC JWT and decode payload |
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/evaluate | Submit target for evaluation |
| GET | /v1/evaluate/{id} | Poll evaluation status |
| GET | /v1/score/{target_id} | Get quality score |
| GET | /v1/scores | Search/list scores |
| GET | /v1/badge/{target_id}.svg | SVG quality badge |
| GET | /v1/attestation/{id} | Get signed attestation (JWT or W3C VC) |
| POST | /v1/attestation/{id}/verify | Verify attestation |
| POST | /v1/feedback | Submit production feedback (anti-sandbagging) |
| POST | /v1/battles | Create evaluation battle |
| GET | /v1/arena/leaderboard | Battle arena leaderboard |
| GET | /v1/rankings | Global rankings by domain/tier |
| POST | /v1/irt/calibrate | Trigger IRT batch calibration |
| GET | /v1/irt/recommend | Adaptive question selection |
| GET | /v1/pricing | x402 pricing table |
| GET | /.well-known/agent.json | A2A Agent Card |
Architecture
src/
api/v1/ # 14 FastAPI routers
core/ # Evaluator, MCP client, scoring, IRT, battle arena
auth/ # API keys (SHA256 + salt), rate limiting by tier
storage/ # MongoDB (Motor) + Redis
payments/ # x402 protocol, Solana verification
standards/ # W3C VC issuer, A2A extension, MCP server, AIUC-1
Stack: FastAPI + MongoDB + Redis | 533 tests | 60 source files | 15 lean dependencies
Tests
python -m pytest tests/ -q
# 533 passed in ~2s
Configuration
See .env.example for all 60+ configuration options including:
- LLM API keys (7 providers, comma-separated for rotation)
- MongoDB/Redis connection
- JWT attestation (Ed25519 key, issuer DID, validity)
- Solana wallet for x402 payments
- Rate limit tiers and consensus judge settings
License
MIT
Links
- Architecture — Full system design (845 lines)
- Distribution Roadmap — Partner and integration plan
- A2A Agent Card — Machine-readable capabilities
Built by Assisterr
相關伺服器
CS2 RCON MCP
A server for managing Counter-Strike 2 servers using the RCON protocol.
CryptoScholar
Live crypto technical analysis MCP server — EMA, RSI, MACD, ATR, Bollinger Bands, TSS scoring, and Claude AI bull/bear debate via CoinGecko free API
Mighty Bills
Interact with bills and ledger transactions tracked by Mighty Bills.
Trading MCP Server
An intelligent trading assistant that fetches live stock prices using the Yahoo Finance API.
MCP HUB
The Ultimate Control Plane for MCP Unlock the full power of Model Context Protocol with zero friction. One-Click GPT Integration: Bridge the gap between MCP servers and ChatGPT/LLMs instantly. No more manual config hunting. Pro-Level Orchestration: Manage, monitor, and toggle multiple MCP tools from a single, intuitive dashboard. Secure by Design: Built-in support for complex auth flows and 2FA, making enterprise-grade tool integration seamless. Streamlined Debugging: Test queries and inspect tool responses in real-time without leaving the hub. Stop wrestling with JSON configs. Start building agentic workflows that actually work.
AgentPay
x402 payment gateway for AI agents — 12 crypto data tools (price, whale activity, gas, TVL, Fear & Greed, Dune queries) paid per-call in USDC on Stellar or Base. No API keys, no subscriptions.
Lovie
The Company Formation MCP for AI coding tools.
RequestRepo MCP
A MCP for RequestRepo
Natural Disaster Intel MCP
FEMA disaster declarations, NOAA severe weather alerts, and USGS earthquake data. 4 MCP tools for real-time disaster monitoring.
Earnings Feed
SEC filings and insider trades in real-time. 10-K, 10-Q, 8-K, Form 4, and company lookup.