omniparser-autogui-mcp
An MCP server that analyzes the screen with OmniParser to automate GUI operations.
(Japanese version here)
This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI.
Operation has been confirmed on Windows.
License notes
This project is released under the MIT license, excluding submodules and sub-packages.
OmniParser's repository is CC-BY-4.0.
Each OmniParser model has a different license (reference).
Installation
- Run the following commands:
git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
uv sync
set OCR_LANG=en
uv run download_models.py
(On platforms other than Windows, use export instead of set.)
(If you want langchain_example.py to work, run uv sync --extra langchain instead of uv sync.)
- Add this to your claude_desktop_config.json:
{
"mcpServers": {
"omniparser_autogui_mcp": {
"command": "uv",
"args": [
"--directory",
"D:\\CLONED_PATH\\omniparser-autogui-mcp",
"run",
"omniparser-autogui-mcp"
],
"env": {
"PYTHONIOENCODING": "utf-8",
"OCR_LANG": "en"
}
}
}
}
(Replace D:\\CLONED_PATH\\omniparser-autogui-mcp with the directory you cloned.)
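The setup commands from the first step can also be scripted so that OCR_LANG is set the same way on every platform, without choosing between set and export. The following is a minimal sketch, not part of this repository: the function names `install_steps` and `run_install` are hypothetical, and it assumes `uv` is on your PATH and that the repository has already been cloned.

```python
import os
import subprocess


def install_steps(ocr_lang="en", with_langchain=False):
    """Return the commands and environment for the setup steps (hypothetical helper)."""
    sync = ["uv", "sync"]
    if with_langchain:
        # Needed only if langchain_example.py should work.
        sync += ["--extra", "langchain"]
    commands = [sync, ["uv", "run", "download_models.py"]]
    # Copy the current environment and add OCR_LANG, replacing `set` / `export`.
    env = {**os.environ, "OCR_LANG": ocr_lang}
    return commands, env


def run_install(repo_dir, ocr_lang="en", with_langchain=False):
    """Run the setup commands inside the cloned repository (hypothetical helper)."""
    commands, env = install_steps(ocr_lang, with_langchain)
    for cmd in commands:
        # check=True stops at the first failing command.
        subprocess.run(cmd, cwd=repo_dir, env=env, check=True)
```

Calling `run_install("omniparser-autogui-mcp")` from the directory that contains the clone would then run `uv sync` followed by `uv run download_models.py` with OCR_LANG set to en.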
env allows for the following additional configurations:
- OMNI_PARSER_BACKEND_LOAD
  If it does not work with other clients (such as LibreChat), specify 1.
- TARGET_WINDOW_NAME
  If you want to specify the window to operate, please specify the window name.
  If not specified, operates on the entire screen.
- OMNI_PARSER_SERVER
  If you want OmniParser processing to be done on another device, specify the server's address and port, such as 127.0.0.1:8000.
  The server can be started with uv run omniparserserver.
- SSE_HOST, SSE_PORT
  If specified, communication will be done via SSE instead of stdio.
- SOM_MODEL_PATH, CAPTION_MODEL_NAME, CAPTION_MODEL_PATH, OMNI_PARSER_DEVICE, BOX_TRESHOLD
  These are for OmniParser configuration.
  Usually, they are not necessary.
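To illustrate how these optional variables fit into the configuration shown earlier, here is a small sketch that builds the same mcpServers entry and merges extra settings into env. The helper name `build_server_entry` is hypothetical, and the window name "Notepad" is only an example value. All env values are written as strings.

```python
import json


def build_server_entry(clone_dir, extra_env=None):
    """Build the mcpServers entry from the README, plus optional env settings."""
    # Base environment taken from the configuration example above.
    env = {"PYTHONIOENCODING": "utf-8", "OCR_LANG": "en"}
    env.update(extra_env or {})
    return {
        "mcpServers": {
            "omniparser_autogui_mcp": {
                "command": "uv",
                "args": ["--directory", clone_dir, "run", "omniparser-autogui-mcp"],
                "env": env,
            }
        }
    }


if __name__ == "__main__":
    config = build_server_entry(
        r"D:\CLONED_PATH\omniparser-autogui-mcp",
        {
            "TARGET_WINDOW_NAME": "Notepad",  # example window name (assumption)
            "OMNI_PARSER_SERVER": "127.0.0.1:8000",
        },
    )
    # json.dumps escapes the backslashes in the Windows path for you.
    print(json.dumps(config, indent=2))
```

The printed JSON can be merged into claude_desktop_config.json by hand; the sketch does not write to the file itself.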
Usage Examples
- Search for "MCP server" in the on-screen browser.
etc.