MCP Test Utils
Desktop UI automation for AI agents: screenshots, window management, mouse, keyboard, UI Automation tree, OCR
100% AI Code · Human Reviewed
MCP server for automated desktop UI testing. A single binary — no runtime, no dependencies, no installation.
Windows x64 only. macOS and Linux support is planned.
Gives AI agents eyes and hands: screenshots, window management, mouse, keyboard, UI Automation, OCR, file search.
Why
AI agents can trigger actions in applications but can't see the screen. This server bridges that gap:
Agent triggers action → takes screenshot → sees the result →
switches window → clicks a button → verifies → writes report
Fully autonomous, no user involvement required.
Demo
10 tasks, one take. Watch on YouTube.
MCP Test Utils vs Anthropic Computer Use
Claude Cowork now includes built-in Computer Use — Claude takes screenshots and clicks through interfaces visually. It works with zero setup. MCP Test Utils takes a different approach: instead of guessing where to click from a screenshot, it reads the actual UI structure through Windows APIs.
| Aspect | MCP Test Utils | Computer Use |
|---|---|---|
| Click precision | Exact — UI Automation API | Visual estimate from screenshot |
| Speed & token cost | Fast, low cost — text responses | Slower, costly — image on every step |
| UI structure | Full tree: roles, states, coordinates | Not available |
| OCR | Word-level coordinates, multi-language | Not available (model vision only) |
| Window management | API-based, window-relative coords | Visual navigation |
| File search | Ripgrep engine built-in | Not available |
| Session logging | JSONL + screenshots | Not available |
| Visual analysis | ✅ Same Claude model, full-res 1:1 | ✅ Same Claude model |
| Setup | Download binary, add to config | Built-in, one toggle |
| Mobile / Dispatch | — | ✅ Tasks from phone |
| Cross-platform | Windows (macOS/Linux planned) | macOS + Windows |
MCP Test Utils is faster, more precise, and cheaper per action. Computer Use is easier to start and works across platforms. They complement each other.
Platforms
| Platform | Status |
|---|---|
| Windows x64 | ✅ Full support |
| macOS arm64 | ⏳ Planned |
| Linux x64 | ⏳ Planned |
Tools (19)
Vision
| Tool | Description |
|---|---|
| take_screenshot | Screenshot of the entire desktop with configurable quality |
| take_window_screenshot | Screenshot of a specific window (screen or window capture mode) |
| read_screen_text | OCR the entire screen (Windows.Media.Ocr) |
| read_region_text | OCR a screen region with precise word coordinates |
Window Management
| Tool | Description |
|---|---|
| list_windows | List windows with id, title, app, position, size, minimized, focused |
| focus_window | Bring a window to front, restore if minimized |
Input
| Tool | Description |
|---|---|
| mouse_click | Click (left / right / middle) at screen or window-relative coordinates |
| mouse_move | Move cursor to a point |
| mouse_drag | Drag from point A to point B |
| mouse_scroll | Scroll the mouse wheel |
| keyboard_type | Type text (full Unicode: Latin, Cyrillic, CJK, emoji) |
| keyboard_press | Press a key (Enter, Tab, F1–F12, arrows, etc.) |
| keyboard_shortcut | Key combinations (Ctrl+S, Alt+F4, Ctrl+Shift+P, etc.) |
Structured UI Access
| Tool | Description |
|---|---|
| list_ui_elements | UI Automation tree: buttons, fields, menus with exact coordinates |
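list_ui_elements reports coordinates for each element, and a common way to turn an element's bounding rectangle into a click target is to take its center. The sketch below shows that arithmetic plus the conversion between screen and window-relative coordinates. It assumes bounds are given as left, top, width and height in screen pixels, and that "window-relative" means an offset from the window's top-left corner as reported by list_windows; the actual field names and origin used by list_ui_elements and mouse_click are not documented here, and the Rect type and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical rectangle type: top-left corner plus size, in screen pixels."""
    left: int
    top: int
    width: int
    height: int


def center(rect: Rect) -> tuple[int, int]:
    """Integer center of a rectangle, a reasonable default click target."""
    return (rect.left + rect.width // 2, rect.top + rect.height // 2)


def to_window_relative(point: tuple[int, int], window: Rect) -> tuple[int, int]:
    """Screen point -> offset from the window's top-left corner (assumed origin)."""
    x, y = point
    return (x - window.left, y - window.top)


def to_screen(point: tuple[int, int], window: Rect) -> tuple[int, int]:
    """Window-relative point -> screen point; inverse of to_window_relative."""
    x, y = point
    return (x + window.left, y + window.top)


# Example values, not real tool output.
window = Rect(left=1000, top=500, width=800, height=600)
ok_button = Rect(left=1100, top=640, width=80, height=24)

target = center(ok_button)                     # (1140, 652) on screen
relative = to_window_relative(target, window)  # (140, 152) inside the window
assert to_screen(relative, window) == target
```

Window-relative coordinates have one practical advantage: if the window is moved but its internal layout is unchanged, the same relative point still lands on the same control.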
File Search
| Tool | Description |
|---|---|
| search_in_files | Search text or regex in files within allowed directories (like VS Code Find in Files) |
| find_files | Find files and directories by name pattern (glob), like "Go to File" |
Agent Guide
| Tool | Description |
|---|---|
| get_usage_guide | Compact workflow guide for LLM agents: precision clicking, coordinate metadata, quality tips |
Session Logging
| Tool | Description |
|---|---|
| enable_logging | Start recording tool calls to JSONL + screenshots (opt-in) |
| disable_logging | Stop recording, get session stats |
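Because sessions are recorded as JSONL (one JSON object per line), a saved log can be summarized with a few lines of code. The sketch below counts calls per tool. The log record schema is not documented here, so the "tool" field name is an assumption; adjust it to whatever the log files actually contain.

```python
import json
from collections import Counter
from typing import Iterable


def tool_call_counts(lines: Iterable[str], field: str = "tool") -> Counter:
    """Count records per tool name in a JSONL session log.

    `field` is a hypothetical key; the real log schema may differ.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        counts[record.get(field, "<unknown>")] += 1
    return counts


# Usage with a real file (the path and file name are placeholders):
# with open(r"D:\path\to\logs\session.jsonl", encoding="utf-8") as f:
#     print(tool_call_counts(f).most_common())
```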
Installation
- Download the binary from Releases.
- Add it to your MCP client config. The example below is for Claude Desktop; for other clients, refer to their documentation.
Claude Desktop: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "test-utils": {
      "command": "D:\\path\\to\\mcp-test-utils.exe"
    }
  }
}
- Restart Claude Desktop.
- In chat, try: "Take a screenshot" — the agent will return an image of your desktop.
With Logging and File Search (optional)
{
  "mcpServers": {
    "test-utils": {
      "command": "D:\\path\\to\\mcp-test-utils.exe",
      "env": {
        "MCP_LOG_DIR": "D:\\path\\to\\logs",
        "MCP_LOG_MAX_MB": "500",
        "MCP_LOG_RETAIN_DAYS": "30",
        "MCP_SEARCH_DIRS": "D:\\Projects\\app1;D:\\Projects\\app2"
      }
    }
  }
}
Quality Presets
Screenshots support configurable quality to balance detail and token cost:
| Preset | Scale | Format | Use Case |
|---|---|---|---|
| full | 100% | JPEG q90 | Maximum detail |
| standard | 50% | JPEG q70 | Balanced (default) |
| compact | 50% | PNG | When PNG is needed |
| minimal | 25% | Grayscale | Lowest token cost |
| custom | 10–100% | JPEG / PNG / Grayscale | Full control |
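Scale applies to both width and height, so the pixel count falls with the square of the scale: the 50% presets carry a quarter of the pixels of full, and minimal at 25% carries one sixteenth. The sketch below works this out for a 1920x1080 display, used here only as an example resolution. Rounding down to whole pixels is an assumption about how the server sizes its output.

```python
# Scale factors copied from the Quality Presets table (percent of full resolution).
PRESET_SCALE = {"full": 100, "standard": 50, "compact": 50, "minimal": 25}


def scaled_size(width: int, height: int, scale_percent: int) -> tuple[int, int]:
    """Output dimensions for a scale in the documented 10-100% range.

    Integer division (rounding down) is an assumption, not documented behavior.
    """
    if not 10 <= scale_percent <= 100:
        raise ValueError("scale must be between 10 and 100 percent")
    return (width * scale_percent // 100, height * scale_percent // 100)


for name, pct in PRESET_SCALE.items():
    w, h = scaled_size(1920, 1080, pct)
    print(f"{name:<9} {w}x{h}  {w * h:>9,} px")
```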
Environment Variables
| Variable | Description | Default |
|---|---|---|
| MCP_LOG_DIR | Directory where session logs are written. If unset, the logging tools are hidden | not set |
| MCP_LOG_MAX_MB | Per-session size limit in MB; a warning is issued when it is exceeded | 500 |
| MCP_LOG_RETAIN_DAYS | Auto-delete sessions older than N days; 0 disables auto-deletion | 30 |
| MCP_SEARCH_DIRS | Allowed directories for search_in_files, separated by ; on Windows or : on macOS/Linux. If unset, the tool is hidden | not set |
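The variables combine in a simple way: the logging tools appear only when MCP_LOG_DIR is set, search_in_files appears only when MCP_SEARCH_DIRS is set, and the two numeric limits fall back to their defaults. The sketch below restates those rules as code so they can be checked at a glance. It is not the server's implementation; the Config type, its methods and load_config are hypothetical names. Python's os.pathsep is ; on Windows and : elsewhere, which matches the separator rule in the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Config:
    """Hypothetical view of the settings in the Environment Variables table."""
    log_dir: Optional[str]
    log_max_mb: int
    log_retain_days: int
    search_dirs: list[str]

    @property
    def logging_tools_visible(self) -> bool:
        return self.log_dir is not None  # no MCP_LOG_DIR -> logging tools hidden

    @property
    def search_tool_visible(self) -> bool:
        return bool(self.search_dirs)  # no MCP_SEARCH_DIRS -> search_in_files hidden

    def session_expired(self, age_days: float) -> bool:
        """Retention rule: 0 disables auto-delete; otherwise expire when older than N days."""
        return self.log_retain_days > 0 and age_days > self.log_retain_days


def load_config(env: Mapping[str, str], sep: str = os.pathsep) -> Config:
    raw_dirs = env.get("MCP_SEARCH_DIRS", "")
    return Config(
        log_dir=env.get("MCP_LOG_DIR") or None,
        log_max_mb=int(env.get("MCP_LOG_MAX_MB", "500")),
        log_retain_days=int(env.get("MCP_LOG_RETAIN_DAYS", "30")),
        search_dirs=[d for d in raw_dirs.split(sep) if d],  # drop empty entries
    )


cfg = load_config(os.environ)
print(cfg.logging_tools_visible, cfg.search_tool_visible)
```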
How It Works
MCP Test Utils is a JSON-RPC 2.0 server communicating over stdin/stdout. Any MCP-compatible client launches the binary, sends tool calls, and receives structured responses (text, base64 images). Tested with Claude Desktop.
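To make that exchange concrete, here is a minimal client sketch. It follows the MCP stdio transport (one UTF-8 JSON-RPC message per line) and the initialize, notifications/initialized, tools/call sequence from the MCP specification. The protocol version string, client name and helper names are assumptions for illustration, and it passes an empty arguments object because the tools' argument names are not listed in this README. A real client such as Claude Desktop also handles errors, capability negotiation and concurrent requests.

```python
import json
import subprocess
from typing import Any, Iterable, Optional


def message(method: str, params: Optional[dict] = None, msg_id: Optional[int] = None) -> str:
    """One JSON-RPC 2.0 message as a single newline-terminated line."""
    msg: dict[str, Any] = {"jsonrpc": "2.0", "method": method}
    if msg_id is not None:
        msg["id"] = msg_id  # requests carry an id; notifications do not
    if params is not None:
        msg["params"] = params
    return json.dumps(msg, ensure_ascii=False) + "\n"  # json.dumps emits no raw newlines


def read_response(lines: Iterable[str], msg_id: int) -> dict:
    """Return the response with the given id, skipping notifications and server requests."""
    for line in lines:
        if not line.strip():
            continue
        msg = json.loads(line)
        if msg.get("id") == msg_id and "method" not in msg:
            return msg
    raise EOFError(f"stream closed before response {msg_id}")


def call_tool(exe_path: str, tool: str, arguments: dict) -> dict:
    """Launch the server, perform the MCP handshake, call one tool, return the response."""
    proc = subprocess.Popen(
        [exe_path], stdin=subprocess.PIPE, stdout=subprocess.PIPE, encoding="utf-8"
    )
    assert proc.stdin is not None and proc.stdout is not None
    try:
        proc.stdin.write(message("initialize", {
            "protocolVersion": "2024-11-05",  # assumed version; match your client
            "capabilities": {},
            "clientInfo": {"name": "sketch-client", "version": "0.1"},
        }, msg_id=1))
        proc.stdin.flush()
        read_response(proc.stdout, 1)  # server capabilities, ignored in this sketch
        proc.stdin.write(message("notifications/initialized"))
        proc.stdin.write(message("tools/call", {"name": tool, "arguments": arguments}, msg_id=2))
        proc.stdin.flush()
        return read_response(proc.stdout, 2)
    finally:
        proc.stdin.close()  # closing stdin is how MCP stdio clients begin shutdown
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
```

For example, call_tool(r"D:\path\to\mcp-test-utils.exe", "take_screenshot", {}) would return a JSON-RPC response whose result.content list holds the tool output; in MCP, image items carry base64 data in a data field alongside a mimeType.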
The server uses native Windows APIs directly — Win32 GDI for screenshots, SendInput for mouse and keyboard, UI Automation COM API for element inspection, WinRT Windows.Media.Ocr for text recognition. File search uses the ripgrep engine (grep-regex, grep-searcher, ignore) — cross-platform, no external dependencies. No PowerShell, no external tools, no network access.
Use Cases
- Automated QA — agent navigates the app, clicks through flows, takes screenshots at each step, writes a test report
- Desktop automation — fill forms, copy data between windows, run workflows
- Accessibility audit — scan UI Automation tree for missing labels or roles
- Visual regression — screenshot comparison across releases
- Data extraction — OCR text from applications that don't expose APIs
- Code search — find patterns across multiple projects without leaving the agent session
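For the accessibility-audit case, the core check is small once the UI Automation tree has been fetched with list_ui_elements: flag interactive controls whose accessible name is empty. The sketch assumes each element is a mapping with "role" and "name" keys in a flat list; those keys, that shape, and the set of roles treated as interactive are assumptions, not the tool's documented output. The role strings are UI Automation control-type names.

```python
from typing import Iterable, Mapping

# UI Automation control types that usually need an accessible name.
# Which roles to include is a judgment call; extend as needed.
INTERACTIVE_ROLES = {"Button", "CheckBox", "ComboBox", "Edit", "Hyperlink", "MenuItem", "RadioButton"}


def unlabeled_controls(elements: Iterable[Mapping]) -> list[Mapping]:
    """Interactive elements whose name is missing or whitespace-only."""
    return [
        el for el in elements
        if el.get("role") in INTERACTIVE_ROLES and not str(el.get("name") or "").strip()
    ]


# Made-up sample in the assumed shape.
sample = [
    {"role": "Button", "name": "Save"},
    {"role": "Button", "name": ""},  # icon-only button without a label
    {"role": "Edit"},                # text field with no name at all
    {"role": "Text", "name": ""},    # static text: not flagged
]
for el in unlabeled_controls(sample):
    print(el)
```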
Security
- Responds only to requests from the MCP client
- Opens no network ports
- Writes nothing to disk (except opt-in logging)
- Sends no data externally
- Screenshots capture the entire screen — make sure no sensitive information is visible
- File search is sandboxed: only directories listed in MCP_SEARCH_DIRS are accessible
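A directory sandbox needs more than a string-prefix test: D:\Projects\app10 starts with the string D:\Projects\app1, and a path such as app1\..\secret.txt points outside the allowed tree. The sketch below shows one common way to do the check, resolving the path first and then comparing whole path components. It illustrates the technique under the assumption of a simple containment rule; it is not the server's actual code, and the function name is hypothetical.

```python
from pathlib import Path
from typing import Iterable


def is_inside_allowed(candidate: str, allowed_dirs: Iterable[str]) -> bool:
    """True if candidate resolves to an allowed directory or something beneath it.

    resolve() collapses ".." segments and follows symlinks, so neither can be
    used to step outside the tree. Comparing Path objects (not strings) means
    "app10" is not mistaken for a child of "app1".
    """
    target = Path(candidate).resolve()
    for root in allowed_dirs:
        base = Path(root).resolve()
        if target == base or base in target.parents:
            return True
    return False
```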
Support us
Free and unrestricted. If you find it useful, visit jeenyjai.github.io.
License
Copyright 2026 JeenyJAI. All rights reserved.
🚀 Created with Claude
