omniparser-autogui-mcp
An MCP server that analyzes the screen with OmniParser to automate GUI operations.
(A Japanese version is available here.)
This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI.
Operation has been confirmed on Windows.
License notes
This project is released under the MIT license, excluding submodules and sub-packages.
OmniParser's repository is CC-BY-4.0.
Each OmniParser model has a different license (reference).
Installation
- Please do the following:
git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
uv sync
set OCR_LANG=en
uv run download_models.py
(On platforms other than Windows, use export instead of set.)
(If you want langchain_example.py to work, run uv sync --extra langchain instead of uv sync.)
- Add this to your claude_desktop_config.json:
{
  "mcpServers": {
    "omniparser_autogui_mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "D:\\CLONED_PATH\\omniparser-autogui-mcp",
        "run",
        "omniparser-autogui-mcp"
      ],
      "env": {
        "PYTHONIOENCODING": "utf-8",
        "OCR_LANG": "en"
      }
    }
  }
}
(Replace D:\\CLONED_PATH\\omniparser-autogui-mcp with the directory you cloned.)
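Because JSON requires every backslash in a Windows path to be doubled, it can be less error-prone to generate this entry than to type it by hand. The following is a minimal sketch, not part of the project: the helper name build_server_entry is hypothetical, and the script only prints JSON for you to paste into claude_desktop_config.json.

```python
import json


def build_server_entry(clone_dir: str, ocr_lang: str = "en") -> dict:
    """Return the mcpServers entry shown above for a given clone directory.

    Hypothetical helper for illustration; not part of omniparser-autogui-mcp.
    """
    return {
        "mcpServers": {
            "omniparser_autogui_mcp": {
                "command": "uv",
                "args": ["--directory", clone_dir, "run", "omniparser-autogui-mcp"],
                "env": {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang},
            }
        }
    }


if __name__ == "__main__":
    # A raw string holds the real path with single backslashes;
    # json.dumps doubles them when it serializes the entry.
    entry = build_server_entry(r"D:\CLONED_PATH\omniparser-autogui-mcp")
    print(json.dumps(entry, indent=2))
```

If the config file already lists other servers, merge only the omniparser_autogui_mcp key into the existing mcpServers object rather than replacing the whole file.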
env allows for the following additional configurations:
- OMNI_PARSER_BACKEND_LOAD
  If it does not work with other clients (such as LibreChat), specify 1.
- TARGET_WINDOW_NAME
  If you want to specify the window to operate, please specify the window name.
  If not specified, operates on the entire screen.
- OMNI_PARSER_SERVER
  If you want OmniParser processing to be done on another device, specify the server's address and port, such as 127.0.0.1:8000.
  The server can be started with uv run omniparserserver.
- SSE_HOST, SSE_PORT
  If specified, communication will be done via SSE instead of stdio.
- SOM_MODEL_PATH, CAPTION_MODEL_NAME, CAPTION_MODEL_PATH, OMNI_PARSER_DEVICE, BOX_TRESHOLD
  These are for OmniParser configuration.
  Usually, they are not necessary.
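As an illustration of how these settings combine, the sketch below merges optional variables into an env block. It is an assumption-laden example rather than project code: the helper with_optional_env is hypothetical, the values used (window name, port) are made up, and it assumes the client expects every env value to be a string, as environment variables normally are. It also rejects names that are not in the list above, which catches slips such as writing BOX_THRESHOLD for the variable this README spells BOX_TRESHOLD.

```python
# Variable names copied from the list above (spelling as given in this README).
KNOWN_OPTIONS = {
    "OMNI_PARSER_BACKEND_LOAD", "TARGET_WINDOW_NAME", "OMNI_PARSER_SERVER",
    "SSE_HOST", "SSE_PORT", "SOM_MODEL_PATH", "CAPTION_MODEL_NAME",
    "CAPTION_MODEL_PATH", "OMNI_PARSER_DEVICE", "BOX_TRESHOLD",
}


def with_optional_env(base_env: dict, **options) -> dict:
    """Return a copy of base_env with optional settings added as strings.

    Hypothetical helper for illustration; not part of omniparser-autogui-mcp.
    """
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    env = dict(base_env)
    # Convert every value to a string, e.g. the integer 8000 becomes "8000".
    env.update({name: str(value) for name, value in options.items()})
    return env


if __name__ == "__main__":
    base = {"PYTHONIOENCODING": "utf-8", "OCR_LANG": "en"}
    # Example values only: restrict operation to one window and use SSE.
    print(with_optional_env(base, TARGET_WINDOW_NAME="Notepad",
                            SSE_HOST="127.0.0.1", SSE_PORT=8000))
```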
Usage Examples
- Search for "MCP server" in the on-screen browser.
etc.