Advanced evaluation tools for AI safety, alignment, and performance using the Trustwise API.
The Trustwise MCP Server is a Model Context Protocol (MCP) server that provides a suite of advanced evaluation tools for AI safety, alignment, and performance. It enables developers and AI tools to programmatically assess the quality, safety, and cost of LLM outputs using Trustwise's industry-leading metrics.
To connect the Trustwise MCP Server to Claude Desktop, add the following configuration to your Claude Desktop settings:
```json
{
  "mcpServers": {
    "trustwise": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-e",
        "TW_API_KEY",
        "ghcr.io/trustwiseai/trustwise-mcp-server:latest"
      ],
      "env": {
        "TW_API_KEY": "<YOUR_TRUSTWISE_API_KEY>"
      }
    }
  }
}
```
To point to a specific Trustwise instance, also set the following optional environment variable under `env`:

`"TW_BASE_URL": "<YOUR_TRUSTWISE_INSTANCE_URL>"`, e.g. `"TW_BASE_URL": "https://api.yourdomain.ai"`
To connect the Trustwise MCP Server to Cursor, add the following configuration to your Cursor settings:
```json
{
  "mcpServers": {
    "trustwise": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-e",
        "TW_API_KEY",
        "-e",
        "TW_BASE_URL",
        "ghcr.io/trustwiseai/trustwise-mcp-server:latest"
      ],
      "env": {
        "TW_API_KEY": "<YOUR_TRUSTWISE_API_KEY>"
      }
    }
  }
}
```
Replace `<YOUR_TRUSTWISE_API_KEY>` with your actual Trustwise API key.
The Trustwise MCP Server exposes the following tools (metrics). Each tool can be called with its required arguments to evaluate a model response.
| Tool Name | Description |
| --- | --- |
| `faithfulness_metric` | Evaluate the faithfulness of a response to its context |
| `answer_relevancy_metric` | Evaluate relevancy of a response to the query |
| `context_relevancy_metric` | Evaluate relevancy of context to the query |
| `pii_metric` | Detect PII in a response |
| `prompt_injection_metric` | Detect prompt injection risk |
| `summarization_metric` | Evaluate summarization quality |
| `clarity_metric` | Evaluate clarity of a response |
| `formality_metric` | Evaluate formality of a response |
| `helpfulness_metric` | Evaluate helpfulness of a response |
| `sensitivity_metric` | Evaluate sensitivity of a response |
| `simplicity_metric` | Evaluate simplicity of a response |
| `tone_metric` | Evaluate tone of a response |
| `toxicity_metric` | Evaluate toxicity of a response |
| `refusal_metric` | Detect refusal to answer or comply with the query |
| `completion_metric` | Evaluate completion of the query's instruction |
| `adherence_metric` | Evaluate adherence to a given policy or instruction |
| `stability_metric` | Evaluate stability (consistency) of multiple responses |
| `carbon_metric` | Estimate carbon footprint of a response |
| `cost_metric` | Estimate cost of a response |
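As a sketch of how a client might invoke one of these tools programmatically, the snippet below uses the official MCP Python SDK to launch the Dockerized server (as configured above) and call `faithfulness_metric`. The argument names (`query`, `response`, `context`) are illustrative assumptions; inspect the schema returned by `list_tools()` for the exact parameters each tool expects.

```python
import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch the Trustwise MCP Server in Docker, mirroring the configs above.
    server_params = StdioServerParameters(
        command="docker",
        args=[
            "run", "-i", "--rm",
            "-e", "TW_API_KEY",
            "ghcr.io/trustwiseai/trustwise-mcp-server:latest",
        ],
        env={"TW_API_KEY": os.environ["TW_API_KEY"]},
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover the evaluation tools exposed by the server.
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Call one metric. The argument names below are assumptions for
            # illustration; consult each tool's input schema for the real ones.
            result = await session.call_tool(
                "faithfulness_metric",
                arguments={
                    "query": "What is the capital of France?",
                    "response": "The capital of France is Paris.",
                    "context": "Paris is the capital and largest city of France.",
                },
            )
            print(result.content)


asyncio.run(main())
```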
For more examples and advanced usage, see the official Trustwise SDK.
This project is licensed under the terms of the MIT open source license. See LICENSE for details.