An MCP server that analyzes the screen with OmniParser to automate GUI operations.
(日本語版はこちら)
This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI.
Confirmed on Windows.
This is MIT license, but Excluding submodules and sub packages.
OmniParser's repository is CC-BY-4.0.
Each OmniParser model has a different license (reference).
git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
uv sync
set OCR_LANG=en
uv run download_models.py
(Other than Windows, use export
instead of set
.)
(If you want langchain_example.py
to work, uv sync --extra langchain
instead.)
claude_desktop_config.json
:{
"mcpServers": {
"omniparser_autogui_mcp": {
"command": "uv",
"args": [
"--directory",
"D:\\CLONED_PATH\\omniparser-autogui-mcp",
"run",
"omniparser-autogui-mcp"
],
"env": {
"PYTHONIOENCODING": "utf-8",
"OCR_LANG": "en"
}
}
}
}
(Replace D:\\CLONED_PATH\\omniparser-autogui-mcp
with the directory you cloned.)
env
allows for the following additional configurations:
OMNI_PARSER_BACKEND_LOAD
If it does not work with other clients (such as LibreChat), specify 1
.
TARGET_WINDOW_NAME
If you want to specify the window to operate, please specify the window name.
If not specified, operates on the entire screen.
OMNI_PARSER_SERVER
If you want OmniParser processing to be done on another device, specify the server's address and port, such as 127.0.0.1:8000
.
The server can be started with uv run omniparserserver
.
SSE_HOST
, SSE_PORT
If specified, communication will be done via SSE instead of stdio.
SOM_MODEL_PATH
, CAPTION_MODEL_NAME
, CAPTION_MODEL_PATH
, OMNI_PARSER_DEVICE
, BOX_TRESHOLD
These are for OmniParser configuration.
Usually, they are not necessary.
etc.
Tools for PostHog analytics, annotations, and project management.
Interact with the Confluence API to manage spaces, pages, and content. Supports searching, creating, and updating pages.
Access personal and team knowledge repositories, including documents and Slack discussions.
Interact with the Salesforce API using jsforce, requiring username and password for authentication.
Interact with data in Attio, the AI-native CRM, enabling AI assistants to access and manage your customer relationship information.
A CLI tool to synchronize MCP (Model Context Protocol) settings across multiple AI coding tools.
Manages work memories and shares context between AI tools using a local SQLite database.
Integrate the SpotDraft API into agentic workflows. Requires SpotDraft API credentials.
A server for querying Jira issues, requiring a Jira token for authentication.
Connect AI agents with the Feishu/Lark platform for automation, including document processing, conversation management, and calendar scheduling.