omniparser-autogui-mcp
An MCP server that analyzes the screen with OmniParser to automate GUI operations.
(A Japanese version is available here)
This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI.
Operation has been confirmed on Windows.
License notes
This project is released under the MIT license, excluding submodules and sub-packages.
OmniParser's repository is CC-BY-4.0.
Each OmniParser model has a different license (reference).
Installation
- Run the following commands:
git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
uv sync
set OCR_LANG=en
uv run download_models.py
(On platforms other than Windows, use export instead of set.)
(If you want langchain_example.py to work, run uv sync --extra langchain instead of uv sync.)
- Add this to your claude_desktop_config.json:
{
  "mcpServers": {
    "omniparser_autogui_mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "D:\\CLONED_PATH\\omniparser-autogui-mcp",
        "run",
        "omniparser-autogui-mcp"
      ],
      "env": {
        "PYTHONIOENCODING": "utf-8",
        "OCR_LANG": "en"
      }
    }
  }
}
(Replace D:\\CLONED_PATH\\omniparser-autogui-mcp with the directory you cloned.)
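The doubled backslashes in the path are JSON escaping, which is easy to get wrong when editing by hand. As a minimal sketch (the helper name make_server_entry is hypothetical and not part of this repository), the entry can be generated with Python's json module, which applies the escaping automatically:

```python
import json


def make_server_entry(clone_dir: str, ocr_lang: str = "en") -> dict:
    """Build the mcpServers entry from the README for a given clone directory.

    Hypothetical helper for illustration; not part of omniparser-autogui-mcp.
    """
    return {
        "mcpServers": {
            "omniparser_autogui_mcp": {
                "command": "uv",
                "args": ["--directory", clone_dir, "run", "omniparser-autogui-mcp"],
                "env": {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang},
            }
        }
    }


# A raw string holds single backslashes; json.dumps writes each one as "\\".
entry = make_server_entry(r"D:\CLONED_PATH\omniparser-autogui-mcp")
print(json.dumps(entry, indent=2))
```

The printed text can be pasted into claude_desktop_config.json, or merged into an existing mcpServers object if other servers are already configured.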
env allows for the following additional configurations:
- OMNI_PARSER_BACKEND_LOAD
  If the server does not work with other clients (such as LibreChat), specify 1.
- TARGET_WINDOW_NAME
  To operate on a specific window, specify its window name.
  If not specified, the server operates on the entire screen.
- OMNI_PARSER_SERVER
  To have OmniParser processing done on another device, specify that server's address and port, such as 127.0.0.1:8000.
  The server can be started with uv run omniparserserver.
- SSE_HOST, SSE_PORT
  If specified, communication is done via SSE instead of stdio.
- SOM_MODEL_PATH, CAPTION_MODEL_NAME, CAPTION_MODEL_PATH, OMNI_PARSER_DEVICE, BOX_TRESHOLD
  These configure OmniParser itself and are usually not necessary.
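The optional variables go into the same env object as PYTHONIOENCODING and OCR_LANG. The following is a rough sketch, assuming a hypothetical helper extend_env and an illustrative window title; it rejects names that are not in the list above and converts values to strings, since environment variables are always strings:

```python
import json

# Optional variable names exactly as listed in this README.
OPTIONAL_ENV = {
    "OMNI_PARSER_BACKEND_LOAD", "TARGET_WINDOW_NAME", "OMNI_PARSER_SERVER",
    "SSE_HOST", "SSE_PORT", "SOM_MODEL_PATH", "CAPTION_MODEL_NAME",
    "CAPTION_MODEL_PATH", "OMNI_PARSER_DEVICE", "BOX_TRESHOLD",
}


def extend_env(base_env: dict, **options) -> dict:
    """Return a copy of base_env with optional settings added.

    Hypothetical helper for illustration; not part of omniparser-autogui-mcp.
    """
    unknown = set(options) - OPTIONAL_ENV
    if unknown:
        raise KeyError(f"not a documented variable: {sorted(unknown)}")
    merged = dict(base_env)
    # Store every value as a string so the result is valid as an env block.
    merged.update({name: str(value) for name, value in options.items()})
    return merged


env = extend_env(
    {"PYTHONIOENCODING": "utf-8", "OCR_LANG": "en"},
    TARGET_WINDOW_NAME="Untitled - Notepad",  # illustrative window title
    OMNI_PARSER_SERVER="127.0.0.1:8000",      # address format from the README
)
print(json.dumps({"env": env}, indent=2))
```

Checking names against a fixed set catches misspellings early, which matters here because BOX_TRESHOLD is spelled without the second "H" and a "corrected" spelling would not match the name documented above.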
Usage Examples
- Search for "MCP server" in the on-screen browser.
etc.