Advanced computer vision and object detection MCP server powered by DINO-X, enabling AI agents to analyze images, detect objects, identify keypoints, and perform visual understanding tasks.
Enables large language models to perform fine-grained object detection and image understanding, powered by the DINO-X and Grounding DINO 1.6 APIs.
Although multimodal models can understand and describe images, they often lack precise localization and high-quality structured outputs for visual content.
With DINO-X MCP, you can:
🧠 Achieve fine-grained image understanding: both full-scene recognition and targeted detection based on natural language.
🎯 Accurately obtain object count, position, and attributes, enabling tasks such as visual question answering.
🧩 Integrate with other MCP Servers to build multi-step visual workflows.
🛠️ Build natural language-driven visual agents for real-world automation scenarios.
Scenario | Input | Output |
---|---|---|
Detection & Localization | Prompt: Detect and visualize the fire areas in the forest. Input image: (not shown) | (image not shown) |
Object Counting | Prompt: Please analyze this warehouse image, detect all the cardboard boxes, count the total number. Input image: (not shown) | (image not shown) |
Feature Detection | Prompt: Find all red cars in the image. Input image: (not shown) | (image not shown) |
Attribute Reasoning | Prompt: Find the tallest person in the image, describe their clothing. Input image: (not shown) | (image not shown) |
Full Scene Detection | Prompt: Find the fruit with the highest vitamin C content in the image. Input image: (not shown) | (image not shown) |
Pose Analysis | Prompt: Please analyze what yoga pose this is. Input image: (not shown) | (image not shown) |
You can install Node.js using one of the following methods:
# For MacOS or Linux
# 1. Install nvm (Node Version Manager)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
# OR
wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
# 2. Add these lines to your profile (~/.bash_profile, ~/.zshrc, ~/.profile, or ~/.bashrc)
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"
# 3. Activate nvm in current shell
source ~/.bashrc
# Or
source ~/.zshrc
# 4. Verify nvm installation
command -v nvm
# 5. Install and use LTS version of Node.js
nvm install --lts
nvm use --lts
# For Windows
winget install OpenJS.NodeJS.LTS
# Or using Chocolatey (run PowerShell as Administrator)
iwr -useb https://raw.githubusercontent.com/chocolatey/chocolatey/master/chocolateyInstall/InstallChocolatey.ps1 | iex
choco install nodejs-lts -y
Alternatively, download the installer from nodejs.org.
You also need an AI assistant or application that can act as an MCP client.
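You can use the DINO-X MCP server in two ways: through the published npm package with npx, or from a local build of the source.
To use the npm package, add the following configuration to your MCP client: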
{
  "mcpServers": {
    "dinox-mcp": {
      "command": "npx",
      "args": ["-y", "@deepdataspace/dinox-mcp"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}
To run from source, first clone and build the project:
# Clone the project
git clone https://github.com/IDEA-Research/DINO-X-MCP.git
cd DINO-X-MCP
# Install dependencies
pnpm install
# Build the project
pnpm run build
Then configure your MCP client:
{
  "mcpServers": {
    "dinox-mcp": {
      "command": "node",
      "args": ["/path/to/DINO-X-MCP/build/index.js"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}
Get your API key from the DINO-X Platform (a free quota is available for new users).
Replace your-api-key-here in the configuration above with your actual API key.
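If you generate the client configuration from a script, a small guard can catch the placeholder key before it reaches the MCP client. The sketch below is only an illustration: `buildDinoxConfig` is a hypothetical helper, not part of the DINO-X MCP package, and it reproduces the npx-based configuration shown above.

```typescript
// Hypothetical helper (not part of @deepdataspace/dinox-mcp): builds the
// npx-based client configuration and rejects an empty or placeholder key.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

interface McpClientConfig {
  mcpServers: Record<string, McpServerEntry>;
}

function buildDinoxConfig(apiKey: string, imageDir?: string): McpClientConfig {
  if (apiKey.trim() === "" || apiKey === "your-api-key-here") {
    throw new Error("DINOX_API_KEY must be set to a real API key");
  }
  const env: Record<string, string> = { DINOX_API_KEY: apiKey };
  // IMAGE_STORAGE_DIRECTORY is optional, so it is only written when provided.
  if (imageDir !== undefined) {
    env["IMAGE_STORAGE_DIRECTORY"] = imageDir;
  }
  return {
    mcpServers: {
      "dinox-mcp": {
        command: "npx",
        args: ["-y", "@deepdataspace/dinox-mcp"],
        env,
      },
    },
  };
}

// Serialize with two-space indentation, ready to paste into a client config file.
const exampleConfigJson: string = JSON.stringify(
  buildDinoxConfig("abc123", "/path/to/your/image/directory"),
  null,
  2,
);
```

Where the resulting JSON belongs depends on the MCP client you use, so consult that client's documentation for the location of its configuration file.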
The DINO-X MCP server supports the following environment variables:
Variable Name | Description | Required | Default Value | Example |
---|---|---|---|---|
DINOX_API_KEY | Your DINO-X API key for authentication | Required | - | your-api-key-here |
IMAGE_STORAGE_DIRECTORY | Directory where generated visualization images will be saved | Optional | macOS/Linux: /tmp/dinox-mcp; Windows: %TEMP%\dinox-mcp | /Users/admin/Downloads/dinox-images |
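The defaults in the table can be read as a simple fallback rule. The following sketch mirrors that rule under stated assumptions: `resolveImageStorageDirectory` is a hypothetical name, the server's actual resolution logic is not documented here and may differ, and the fallback used when `TEMP` is unset on Windows is a guess.

```typescript
// Illustrative only: mirrors the defaults listed in the table above.
type Env = Record<string, string | undefined>;

function resolveImageStorageDirectory(env: Env, platform: string): string {
  const configured = env["IMAGE_STORAGE_DIRECTORY"];
  // An explicitly configured directory always wins.
  if (configured !== undefined && configured.trim() !== "") {
    return configured;
  }
  if (platform === "win32") {
    // Assumption: fall back to C:\Windows\Temp if TEMP is not defined.
    const temp = env["TEMP"] ?? "C:\\Windows\\Temp";
    return temp + "\\dinox-mcp";
  }
  // macOS and Linux default.
  return "/tmp/dinox-mcp";
}
```

In a Node.js script you would typically call it as `resolveImageStorageDirectory(process.env, process.platform)`; the environment and platform are passed in as parameters here so the function is easy to test.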
Restart your MCP client, and you should be able to use the following tools:
Method Name | Description | Input | Output |
---|---|---|---|
detect-all-objects | Detects and localizes all recognizable objects in an image. | Image | Category names + bounding boxes + captions |
object-detection-by-text | Detects and localizes objects in an image based on a natural language prompt. | Image + Text prompt | Bounding boxes + object captions |
detect-human-pose-keypoints | Detects 17 human body keypoints per person in an image for pose estimation. | Image | Keypoint coordinates and captions |
visualize-detections | Visualizes detection results by drawing bounding boxes and labels on the image. | Image + Detection results | Annotated image saved to storage directory |
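MCP clients normally issue tool calls for you, so you rarely write them by hand. For readers curious about what goes over the wire, here is a minimal sketch of the JSON-RPC `tools/call` request defined by the Model Context Protocol. The argument names `imageFileUri` and `textPrompt` are assumptions for illustration; check the tool schemas reported by your client for the real parameter names.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

function buildToolCall(
  id: number,
  toolName: string,
  toolArgs: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: toolArgs },
  };
}

// Example: ask for red cars in a local image.
// NOTE: imageFileUri and textPrompt are hypothetical argument names.
const exampleToolCall: ToolCallRequest = buildToolCall(1, "object-detection-by-text", {
  imageFileUri: "file:///path/to/street.jpg",
  textPrompt: "red car",
});
```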
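Supported image sources: remote URLs starting with https:// and local files starting with file://.
Supported image formats: jpg, jpeg, png, webp.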
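A client-side check against the https:// and file:// schemes and the jpg, jpeg, png, and webp formats listed above can reject unsupported inputs before a request is sent. This is a sketch under assumptions: `isSupportedImageUri` is a hypothetical helper, and it judges the format by file extension only, which the server itself may not do (for example, a remote URL without an extension is rejected here).

```typescript
const SUPPORTED_SCHEMES: string[] = ["https://", "file://"];
const SUPPORTED_EXTENSIONS: string[] = ["jpg", "jpeg", "png", "webp"];

// Hypothetical pre-flight check; matching is case-insensitive.
function isSupportedImageUri(uri: string): boolean {
  const lower = uri.toLowerCase();
  const hasScheme = SUPPORTED_SCHEMES.some((scheme) => lower.indexOf(scheme) === 0);
  if (!hasScheme) {
    return false;
  }
  // Ignore any query string or fragment when looking for the extension.
  const path = lower.split(/[?#]/)[0] ?? lower;
  const dot = path.lastIndexOf(".");
  if (dot === -1) {
    return false;
  }
  return SUPPORTED_EXTENSIONS.indexOf(path.slice(dot + 1)) !== -1;
}
```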
Please refer to DINO-X Platform for API usage limits and pricing information.
During development, you can use watch mode for automatic rebuilding:
pnpm run watch
Use MCP Inspector to debug the server:
pnpm run inspector
Apache License 2.0