Advanced computer vision and object detection MCP server powered by DINO-X, enabling AI agents to analyze images, detect objects, identify keypoints, and perform visual understanding tasks.
Enables large language models to perform fine-grained object detection and image understanding, powered by DINO-X and Grounding DINO 1.6 API.
Although multimodal models can understand and describe images, they often lack precise localization and high-quality structured outputs for visual content.
With DINO-X MCP, you can:
🧠 Achieve fine-grained image understanding: both full-scene recognition and targeted detection based on natural language.
🎯 Accurately obtain object count, position, and attributes, enabling tasks such as visual question answering.
🧩 Integrate with other MCP Servers to build multi-step visual workflows.
🛠️ Build natural language-driven visual agents for real-world automation scenarios.
🎯 Scenario | 📝 Input | ✨ Output |
---|---|---|
Detection & Localization | 💬 Prompt: Detect and visualize the fire areas in the forest 🖼️ Input Image: (image not shown) | |
Object Counting | 💬 Prompt: Please analyze this warehouse image, detect all the cardboard boxes, count the total number 🖼️ Input Image: (image not shown) | |
Feature Detection | 💬 Prompt: Find all red cars in the image 🖼️ Input Image: (image not shown) | |
Attribute Reasoning | 💬 Prompt: Find the tallest person in the image, describe their clothing 🖼️ Input Image: (image not shown) | |
Full Scene Detection | 💬 Prompt: Find the fruit with the highest vitamin C content in the image 🖼️ Input Image: (image not shown) | Answer: Kiwi fruit (93mg/100g) |
Pose Analysis | 💬 Prompt: Please analyze what yoga pose this is 🖼️ Input Image: (image not shown) | |
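Scenarios such as object counting and attribute reasoning come down to simple post-processing of detection results. The sketch below illustrates that idea only. The Detection shape (a category plus an [xmin, ymin, xmax, ymax] box in pixels) is an assumption made for illustration, not the server's documented output format, and box height is only a rough proxy for a person's real height.

```typescript
// ASSUMPTION: hypothetical result shape; check the actual tool output
// before relying on these field names.
interface Detection {
  category: string;
  bbox: [number, number, number, number]; // assumed [xmin, ymin, xmax, ymax]
}

// Count how many detections were reported for each category.
function countByCategory(detections: Detection[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const d of detections) {
    counts[d.category] = (counts[d.category] ?? 0) + 1;
  }
  return counts;
}

// Return the detection whose box is tallest in pixels, if any.
function tallestBox(detections: Detection[]): Detection | undefined {
  let best: Detection | undefined;
  for (const d of detections) {
    if (!best || d.bbox[3] - d.bbox[1] > best.bbox[3] - best.bbox[1]) {
      best = d;
    }
  }
  return best;
}

// Made-up sample data for illustration.
const sample: Detection[] = [
  { category: "cardboard box", bbox: [10, 10, 60, 50] },
  { category: "cardboard box", bbox: [70, 12, 120, 55] },
  { category: "person", bbox: [200, 20, 240, 180] },
];
const boxCount = countByCategory(sample)["cardboard box"];
```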
You can install Node.js using one of the following methods:
# For MacOS or Linux
# 1. Install nvm (Node Version Manager)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
# OR
wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
# 2. Add these lines to your profile (~/.bash_profile, ~/.zshrc, ~/.profile, or ~/.bashrc)
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"
# 3. Activate nvm in current shell
source ~/.bashrc
# Or
source ~/.zshrc
# 4. Verify nvm installation
command -v nvm
# 5. Install and use LTS version of Node.js
nvm install --lts
nvm use --lts
# For Windows
winget install OpenJS.NodeJS.LTS
# Or install Chocolatey from PowerShell (Administrator), then install Node.js
Set-ExecutionPolicy Bypass -Scope Process -Force
iwr -useb https://community.chocolatey.org/install.ps1 | iex
choco install nodejs-lts -y
Alternatively, download the installer from nodejs.org.
You also need an AI assistant or application that supports the MCP Client.
You can use the DINO-X MCP server in two ways: run the published package with npx, or build and run it from source.
To use the npx package, add the following configuration in your MCP client:
{
  "mcpServers": {
    "dinox-mcp": {
      "command": "npx",
      "args": ["-y", "@deepdataspace/dinox-mcp"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}
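If your MCP client already has other servers configured, the dinox-mcp entry needs to be merged into the existing mcpServers map rather than replacing it. The following is a minimal sketch of that merge. The helper name withDinoxServer and the two interfaces are hypothetical, and where the configuration file lives depends on your client.

```typescript
// Shape of one server entry, taken from the configuration snippet above.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

interface McpClientConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical helper: returns a new config with the dinox-mcp entry added,
// leaving existing servers and the input object untouched.
function withDinoxServer(
  config: McpClientConfig,
  apiKey: string,
  imageDir?: string,
): McpClientConfig {
  const env: Record<string, string> = { DINOX_API_KEY: apiKey };
  if (imageDir) {
    env["IMAGE_STORAGE_DIRECTORY"] = imageDir; // optional, see the variables table
  }
  return {
    ...config,
    mcpServers: {
      ...config.mcpServers,
      "dinox-mcp": {
        command: "npx",
        args: ["-y", "@deepdataspace/dinox-mcp"],
        env,
      },
    },
  };
}

const merged = withDinoxServer({ mcpServers: {} }, "your-api-key-here");
const mergedJson = JSON.stringify(merged, null, 2); // ready to write to the client's config file
```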
Alternatively, to run the server from source, first clone and build the project:
# Clone the project
git clone https://github.com/IDEA-Research/DINO-X-MCP.git
cd DINO-X-MCP
# Install dependencies
pnpm install
# Build the project
pnpm run build
Then configure your MCP client:
{
  "mcpServers": {
    "dinox-mcp": {
      "command": "node",
      "args": ["/path/to/DINO-X-MCP/build/index.js"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}
Get your API key from the DINO-X Platform (a free quota is available for new users).
Replace your-api-key-here in the configuration above with your actual API key.
The DINO-X MCP server supports the following environment variables:
Variable Name | Description | Required | Default Value | Example |
---|---|---|---|---|
DINOX_API_KEY | Your DINO-X API key for authentication | Required | - | your-api-key-here |
IMAGE_STORAGE_DIRECTORY | Directory where generated visualization images will be saved | Optional | macOS/Linux: /tmp/dinox-mcp; Windows: %TEMP%\dinox-mcp | /Users/admin/Downloads/dinox-images |
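The rules in the table can be read as a small resolution step: fail if the API key is missing, and fall back to a platform-specific directory when no storage directory is set. The sketch below mirrors the table only and is not taken from the server's source. The function resolveSettings and its types are hypothetical, and the environment and platform are passed in as parameters so the logic can be checked in isolation.

```typescript
type Platform = "win32" | "darwin" | "linux";

interface DinoxSettings {
  apiKey: string;
  imageStorageDirectory: string;
}

// Hypothetical helper that applies the documented "Required" and "Default Value" columns.
function resolveSettings(
  env: Record<string, string | undefined>,
  platform: Platform,
): DinoxSettings {
  const apiKey = env["DINOX_API_KEY"]?.trim();
  if (!apiKey) {
    throw new Error("DINOX_API_KEY is required");
  }
  // Documented defaults: /tmp/dinox-mcp on macOS/Linux, %TEMP%\dinox-mcp on Windows.
  // If TEMP is not set, the literal %TEMP% placeholder is kept unexpanded.
  const fallback =
    platform === "win32"
      ? `${env["TEMP"] ?? "%TEMP%"}\\dinox-mcp`
      : "/tmp/dinox-mcp";
  const configured = env["IMAGE_STORAGE_DIRECTORY"]?.trim();
  return { apiKey, imageStorageDirectory: configured || fallback };
}

const linuxDefaults = resolveSettings({ DINOX_API_KEY: "your-api-key-here" }, "linux");
```

In a real Node.js process you would pass process.env and process.platform as the two arguments.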
Restart your MCP client, and you should be able to use the following tools:
Method Name | Description | Input | Output |
---|---|---|---|
detect-all-objects | Detects and localizes all recognizable objects in an image. | Image | Category names + bounding boxes + captions |
object-detection-by-text | Detects and localizes objects in an image based on a natural language prompt. | Image + Text prompt | Bounding boxes + object captions |
detect-human-pose-keypoints | Detects 17 human body keypoints per person in an image for pose estimation. | Image | Keypoint coordinates and captions |
visualize-detections | Visualizes detection results by drawing bounding boxes and labels on the image. | Image + Detection results | Annotated image saved to storage directory |
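MCP clients invoke these tools with a JSON-RPC 2.0 request whose method is tools/call, normally built for you by the client. The sketch below shows what such a request could look like for object-detection-by-text. The argument names imageFileUri and textPrompt are placeholders assumed for illustration; the real input schema is whatever the server reports in response to tools/list.

```typescript
// The four tool names listed in the table above.
type DinoxTool =
  | "detect-all-objects"
  | "object-detection-by-text"
  | "detect-human-pose-keypoints"
  | "visualize-detections";

// JSON-RPC 2.0 envelope for an MCP tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: DinoxTool; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: DinoxTool,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// ASSUMPTION: imageFileUri and textPrompt are hypothetical argument names.
const request = buildToolCall(1, "object-detection-by-text", {
  imageFileUri: "https://example.com/warehouse.jpg",
  textPrompt: "cardboard box",
});
const wireMessage = JSON.stringify(request);
```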
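Supported image sources: remote URLs starting with https:// and local files starting with file://.
Supported image formats: jpg, jpeg, png, webp.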
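A client-side check along these lines can reject unsupported inputs before calling a tool. This is a minimal sketch, not the server's own validation: the function name isSupportedImageSource is hypothetical, and it judges the format by file extension only, so a remote URL that serves an image without an extension would be rejected here.

```typescript
// Scheme must be https:// or file://, and the path must end in a supported
// extension. A trailing query string or fragment is tolerated.
const SUPPORTED_SOURCE = /^(https|file):\/\/[^?#]+\.(jpg|jpeg|png|webp)(?:[?#].*)?$/i;

// Hypothetical helper: true when the source looks like a supported image.
function isSupportedImageSource(source: string): boolean {
  return SUPPORTED_SOURCE.test(source.trim());
}

const remoteOk = isSupportedImageSource("https://example.com/cat.png");
const localOk = isSupportedImageSource("file:///Users/admin/Downloads/photo.JPG");
const plainHttp = isSupportedImageSource("http://example.com/cat.png");
```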
Please refer to DINO-X Platform for API usage limits and pricing information.
During development, you can use watch mode for automatic rebuilding:
pnpm run watch
Use MCP Inspector to debug the server:
pnpm run inspector
Apache License 2.0