omniparser-autogui-mcp
An MCP server that analyzes the screen with OmniParser to automate GUI operations.
omniparser-autogui-mcp
(日本語版はこちら)
This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI.
Confirmed on Windows.
License notes
This is MIT license, but Excluding submodules and sub packages.
OmniParser's repository is CC-BY-4.0.
Each OmniParser model has a different license (reference).
Installation
- Please do the following:
git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
uv sync
set OCR_LANG=en
uv run download_models.py
(Other than Windows, use export instead of set.)
(If you want langchain_example.py to work, uv sync --extra langchain instead.)
- Add this to your
claude_desktop_config.json:
{
"mcpServers": {
"omniparser_autogui_mcp": {
"command": "uv",
"args": [
"--directory",
"D:\\CLONED_PATH\\omniparser-autogui-mcp",
"run",
"omniparser-autogui-mcp"
],
"env": {
"PYTHONIOENCODING": "utf-8",
"OCR_LANG": "en"
}
}
}
}
(Replace D:\\CLONED_PATH\\omniparser-autogui-mcp with the directory you cloned.)
env allows for the following additional configurations:
-
OMNI_PARSER_BACKEND_LOAD
If it does not work with other clients (such as LibreChat), specify1. -
TARGET_WINDOW_NAME
If you want to specify the window to operate, please specify the window name.
If not specified, operates on the entire screen. -
OMNI_PARSER_SERVER
If you want OmniParser processing to be done on another device, specify the server's address and port, such as127.0.0.1:8000.
The server can be started withuv run omniparserserver. -
SSE_HOST,SSE_PORT
If specified, communication will be done via SSE instead of stdio. -
SOM_MODEL_PATH,CAPTION_MODEL_NAME,CAPTION_MODEL_PATH,OMNI_PARSER_DEVICE,BOX_TRESHOLD
These are for OmniParser configuration.
Usually, they are not necessary.
Usage Examples
- Search for "MCP server" in the on-screen browser.
etc.
Похожие серверы
Synter Ads
Cross-platform ad campaign management for AI agents across Google, Meta, LinkedIn, Reddit, TikTok, and more. 140+ tools with read/write access.
Procesio MCP Server
Interact with the Procesio automation platform API.
MCP Resume Server
Fetches resume data from a GitHub gist to provide professional background context to LLMs.
Vedit-MCP
Perform basic video editing operations using natural language commands. Requires ffmpeg to be installed.
HireBase
Interact with the HireBase Job API to manage job listings and applications.
Calculator
This server enables LLMs to use calculator for precise numerical calculations.
Obsidian MCP Server
An MCP server that allows AI assistants to read from and write to your local Obsidian vault.
Cal.com Calendar
Integrates with the Cal.com Calendar API for appointment scheduling.
flomo
flomo 官方 MCP 服务
Memory Graph
A graph-based Model Context Protocol (MCP) server that gives AI coding agents persistent memory. Originally built for Claude Code, MemoryGraph works with any MCP-enabled coding agent. Store development patterns, track relationships, and retrieve contextual knowledge across sessions and projects.