Data scraping, conversion, and extraction tools from Dumpling AI.
A Model Context Protocol (MCP) server implementation that integrates with Dumpling AI for data scraping, content processing, knowledge management, AI agents, and code execution capabilities.
To install mcp-server-dumplingai for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @Dumpling-AI/mcp-server-dumplingai --client claude
To run the server directly with npx:
env DUMPLING_API_KEY=your_api_key npx -y mcp-server-dumplingai
To install it globally with npm:
npm install -g mcp-server-dumplingai
Configuring Cursor 🖥️
Note: Requires Cursor version 0.45.6 or later.
To configure Dumpling AI MCP in Cursor, add the following server entry to your MCP configuration:
{
"mcpServers": {
"dumplingai": {
"command": "npx",
"args": ["-y", "mcp-server-dumplingai"],
"env": {
"DUMPLING_API_KEY": "<your-api-key>"
}
}
}
}
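The configuration above can also be generated rather than typed by hand. The following is a minimal sketch, assuming the API key is available in the DUMPLING_API_KEY environment variable; the helper name build_cursor_config is hypothetical and is not part of mcp-server-dumplingai.

```python
import json
import os


def build_cursor_config(api_key: str) -> dict:
    """Return the mcpServers entry shown above, filled in with api_key.

    Hypothetical helper for illustration; not part of the package.
    """
    if not api_key:
        raise ValueError("DUMPLING_API_KEY must not be empty")
    return {
        "mcpServers": {
            "dumplingai": {
                "command": "npx",
                "args": ["-y", "mcp-server-dumplingai"],
                "env": {"DUMPLING_API_KEY": api_key},
            }
        }
    }


if __name__ == "__main__":
    # Fall back to the placeholder from the README if the variable is unset.
    key = os.environ.get("DUMPLING_API_KEY", "<your-api-key>")
    print(json.dumps(build_cursor_config(key), indent=2))
```

Running the script prints the JSON document, which can then be pasted into the Cursor MCP settings.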
If you are using Windows and are running into issues, try:
cmd /c "set DUMPLING_API_KEY=your-api-key&& npx -y mcp-server-dumplingai"
Replace your-api-key with your Dumpling AI API key.
DUMPLING_API_KEY: Your Dumpling AI API key (required)
get-youtube-transcript
Extract transcripts from YouTube videos with optional timestamps.
{
"name": "get-youtube-transcript",
"arguments": {
"videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
"includeTimestamps": true,
"timestampsToCombine": 3,
"preferredLanguage": "en"
}
}
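Each example in this document is a name/arguments pair. In MCP, such a pair is carried as the params of a JSON-RPC 2.0 request with the method tools/call. An MCP client normally builds this envelope for you; the sketch below only illustrates its shape, and the helper name make_tool_call is hypothetical.

```python
import itertools
import json

# Monotonic request ids; JSON-RPC only requires that each id is unique per session.
_request_ids = itertools.count(1)


def make_tool_call(name: str, arguments: dict) -> dict:
    """Wrap a tool name and its arguments in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    request = make_tool_call(
        "get-youtube-transcript",
        {
            "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
            "includeTimestamps": True,
        },
    )
    print(json.dumps(request, indent=2))
```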
search
Perform Google web searches and optionally scrape content from results.
{
"name": "search",
"arguments": {
"query": "machine learning basics",
"country": "us",
"language": "en",
"dateRange": "pastMonth",
"scrapeResults": true,
"numResultsToScrape": 3,
"scrapeOptions": {
"format": "markdown",
"cleaned": true
}
}
}
get-autocomplete
Get Google search autocomplete suggestions for a query.
{
"name": "get-autocomplete",
"arguments": {
"query": "how to learn",
"country": "us",
"language": "en",
"location": "New York"
}
}
search-maps
Search Google Maps for locations and businesses.
{
"name": "search-maps",
"arguments": {
"query": "coffee shops",
"gpsPositionZoom": "37.7749,-122.4194,14z",
"language": "en",
"page": 1
}
}
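Judging only from the example value above, gpsPositionZoom appears to be a latitude, a longitude, and a zoom level followed by the letter z, joined with commas. The sketch below builds such a string under that assumption; the function name and the coordinate range checks are illustrative and are not taken from the Dumpling AI API.

```python
def gps_position_zoom(lat: float, lon: float, zoom: int) -> str:
    """Format coordinates as "<lat>,<lon>,<zoom>z" (format inferred from the example)."""
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    return f"{lat},{lon},{zoom}z"


if __name__ == "__main__":
    # San Francisco, matching the search-maps example.
    print(gps_position_zoom(37.7749, -122.4194, 14))
```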
search-places
Search for places with more detailed information.
{
"name": "search-places",
"arguments": {
"query": "hotels in paris",
"country": "fr",
"language": "en",
"page": 1
}
}
search-news
Search for news articles with customizable parameters.
{
"name": "search-news",
"arguments": {
"query": "climate change",
"country": "us",
"language": "en",
"dateRange": "pastWeek"
}
}
get-google-reviews
Retrieve Google reviews for businesses or places.
{
"name": "get-google-reviews",
"arguments": {
"businessName": "Eiffel Tower",
"location": "Paris, France",
"limit": 10,
"sortBy": "relevance"
}
}
scrape
Extract content from a web page with formatting options.
{
"name": "scrape",
"arguments": {
"url": "https://example.com",
"format": "markdown",
"cleaned": true,
"renderJs": true
}
}
crawl
Recursively crawl websites and extract content with customizable parameters.
{
"name": "crawl",
"arguments": {
"baseUrl": "https://example.com",
"maxPages": 10,
"crawlBeyondBaseUrl": false,
"depth": 2,
"scrapeOptions": {
"format": "markdown",
"cleaned": true,
"renderJs": true
}
}
}
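How the server applies maxPages and depth is not documented here. One common reading is a breadth-first traversal that stops following links past a given depth and stops entirely once a page budget is reached. The sketch below illustrates that reading on an in-memory link graph; it is an assumption for illustration, not the server's actual crawl strategy, and crawl_order is a hypothetical name.

```python
from collections import deque


def crawl_order(links: dict, base_url: str, max_pages: int, depth: int) -> list:
    """Return pages in breadth-first order, bounded by max_pages and depth.

    links maps each URL to the URLs it links to (a stand-in for real fetching).
    """
    seen = {base_url}
    order = []
    queue = deque([(base_url, 0)])
    while queue and len(order) < max_pages:
        url, level = queue.popleft()
        order.append(url)
        if level == depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, level + 1))
    return order


if __name__ == "__main__":
    site = {"/": ["/a", "/b"], "/a": ["/a/1"], "/a/1": ["/a/1/x"]}
    print(crawl_order(site, "/", max_pages=10, depth=2))
```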
screenshot
Capture screenshots of web pages with customizable viewport and format options.
{
"name": "screenshot",
"arguments": {
"url": "https://example.com",
"width": 1280,
"height": 800,
"fullPage": true,
"format": "png",
"waitFor": 1000
}
}
extract
Extract structured data from web pages using AI-powered instructions.
{
"name": "extract",
"arguments": {
"url": "https://example.com/products",
"instructions": "Extract all product names, prices, and descriptions from this page",
"schema": {
"products": [
{
"name": "string",
"price": "number",
"description": "string"
}
]
},
"renderJs": true
}
}
doc-to-text
Convert documents to plaintext with optional OCR.
{
"name": "doc-to-text",
"arguments": {
"url": "https://example.com/document.pdf",
"options": {
"ocr": true,
"language": "en"
}
}
}
convert-to-pdf
Convert various file formats to PDF.
{
"name": "convert-to-pdf",
"arguments": {
"url": "https://example.com/document.docx",
"format": "docx",
"options": {
"quality": 90,
"pageSize": "A4",
"margin": 10
}
}
}
merge-pdfs
Combine multiple PDFs into a single document.
{
"name": "merge-pdfs",
"arguments": {
"urls": ["https://example.com/doc1.pdf", "https://example.com/doc2.pdf"],
"options": {
"addPageNumbers": true,
"addTableOfContents": true
}
}
}
trim-video
Extract a specific clip from a video.
{
"name": "trim-video",
"arguments": {
"url": "https://example.com/video.mp4",
"startTime": 30,
"endTime": 60,
"output": "mp4",
"options": {
"quality": 720,
"fps": 30
}
}
}
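A trim request only makes sense when the end of the clip comes after its start. The sketch below checks this before building the arguments. It assumes startTime and endTime are offsets in seconds, which the example suggests but does not state; trim_video_call is a hypothetical helper.

```python
def trim_video_call(url: str, start_time: float, end_time: float, output: str = "mp4") -> dict:
    """Build a trim-video call after checking that the clip range is valid."""
    if start_time < 0:
        raise ValueError("startTime must not be negative")
    if end_time <= start_time:
        raise ValueError("endTime must be greater than startTime")
    return {
        "name": "trim-video",
        "arguments": {
            "url": url,
            "startTime": start_time,
            "endTime": end_time,
            "output": output,
        },
    }


if __name__ == "__main__":
    call = trim_video_call("https://example.com/video.mp4", 30, 60)
    args = call["arguments"]
    print("clip length:", args["endTime"] - args["startTime"])
```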
extract-document
Extract specific content from documents in various formats.
{
"name": "extract-document",
"arguments": {
"url": "https://example.com/document.pdf",
"format": "structured",
"options": {
"ocr": true,
"language": "en",
"includeMetadata": true
}
}
}
extract-image
Extract text and information from images.
{
"name": "extract-image",
"arguments": {
"url": "https://example.com/image.jpg",
"extractionType": "text",
"options": {
"language": "en",
"detectOrientation": true
}
}
}
extract-audio
Transcribe and extract information from audio files.
{
"name": "extract-audio",
"arguments": {
"url": "https://example.com/audio.mp3",
"language": "en",
"options": {
"model": "enhanced",
"speakerDiarization": true,
"wordTimestamps": true
}
}
}
extract-video
Extract content from videos including transcripts, scenes, and objects.
{
"name": "extract-video",
"arguments": {
"url": "https://example.com/video.mp4",
"extractionType": "transcript",
"options": {
"language": "en",
"speakerDiarization": true
}
}
}
read-pdf-metadata
Extract metadata from PDF files.
{
"name": "read-pdf-metadata",
"arguments": {
"url": "https://example.com/document.pdf",
"includeExtended": true
}
}
write-pdf-metadata
Update metadata in PDF files.
{
"name": "write-pdf-metadata",
"arguments": {
"url": "https://example.com/document.pdf",
"metadata": {
"title": "New Title",
"author": "John Doe",
"keywords": ["keyword1", "keyword2"]
}
}
}
generate-agent-completion
Get AI agent completions with optional tool definitions.
{
"name": "generate-agent-completion",
"arguments": {
"prompt": "How can I improve my website's SEO?",
"model": "gpt-4",
"temperature": 0.7,
"maxTokens": 500,
"context": ["The website is an e-commerce store selling handmade crafts."]
}
}
search-knowledge-base
Search a knowledge base for relevant information.
{
"name": "search-knowledge-base",
"arguments": {
"kbId": "kb_12345",
"query": "How to optimize database performance",
"limit": 5,
"similarityThreshold": 0.7
}
}
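The limit and similarityThreshold parameters suggest that results are scored, filtered by a minimum score, and then truncated. The sketch below shows that pattern using cosine similarity over toy vectors. The metric and the filtering order are assumptions made for illustration; the source does not say how the Dumpling AI knowledge base computes similarity.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(query_vec: list, entries: list, limit: int, threshold: float) -> list:
    """Return up to `limit` entry texts scoring at least `threshold`, best first.

    entries is a list of (text, vector) pairs standing in for a knowledge base.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in entries]
    kept = [item for item in scored if item[0] >= threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in kept[:limit]]


if __name__ == "__main__":
    kb = [("indexes", [1.0, 0.0]), ("backups", [0.0, 1.0]), ("query plans", [1.0, 1.0])]
    print(top_matches([1.0, 0.0], kb, limit=5, threshold=0.7))
```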
add-to-knowledge-base
Add entries to a knowledge base.
{
"name": "add-to-knowledge-base",
"arguments": {
"kbId": "kb_12345",
"entries": [
{
"text": "MongoDB is a document-based NoSQL database.",
"metadata": {
"source": "MongoDB documentation",
"category": "databases"
}
}
],
"upsert": true
}
}
generate-ai-image
Generate images using AI models.
{
"name": "generate-ai-image",
"arguments": {
"prompt": "A futuristic city with flying cars and neon lights",
"width": 1024,
"height": 1024,
"numImages": 1,
"quality": "hd",
"style": "photorealistic"
}
}
generate-image
Generate images using various AI providers.
{
"name": "generate-image",
"arguments": {
"prompt": "A golden retriever in a meadow of wildflowers",
"provider": "dalle",
"width": 1024,
"height": 1024,
"numImages": 1
}
}
run-js-code
Execute JavaScript code with optional dependencies.
{
"name": "run-js-code",
"arguments": {
"code": "const result = [1, 2, 3, 4].reduce((sum, num) => sum + num, 0); console.log(`Sum: ${result}`); return result;",
"dependencies": {
"lodash": "^4.17.21"
},
"timeout": 5000
}
}
run-python-code
Execute Python code with optional dependencies.
{
"name": "run-python-code",
"arguments": {
"code": "import numpy as np\narr = np.array([1, 2, 3, 4, 5])\nmean = np.mean(arr)\nprint(f'Mean: {mean}')\nreturn mean",
"dependencies": ["numpy", "pandas"],
"timeout": 10000,
"saveOutputFiles": true
}
}
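In the example above, the code argument is a single JSON string with newlines written as \n. Rather than escaping by hand, the string can be produced from ordinary multi-line source and left to the JSON encoder. This is a minimal sketch; python_code_call is a hypothetical helper, and the default timeout simply mirrors the example value.

```python
import json
import textwrap


def python_code_call(source: str, dependencies=None, timeout: int = 10000) -> dict:
    """Build a run-python-code call from multi-line source text."""
    code = textwrap.dedent(source).strip()  # remove common indentation and outer blank lines
    return {
        "name": "run-python-code",
        "arguments": {
            "code": code,
            "dependencies": list(dependencies or []),
            "timeout": timeout,
        },
    }


if __name__ == "__main__":
    call = python_code_call(
        """
        import numpy as np
        print(np.mean(np.array([1, 2, 3, 4, 5])))
        """,
        dependencies=["numpy"],
    )
    # json.dumps escapes the embedded newline as \n automatically.
    print(json.dumps(call, indent=2))
```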
When a tool call fails, the server returns the error message as a text content item and sets isError to true.
Example error response:
{
"content": [
{
"type": "text",
"text": "Error: Failed to fetch YouTube transcript: 404 Not Found"
}
],
"isError": true
}
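A caller can turn this response shape into an exception so that failures are not mistaken for ordinary text output. The sketch below relies only on the fields visible in the example (content, type, text, isError); ToolError and result_text are hypothetical names.

```python
class ToolError(Exception):
    """Raised when a tool result has isError set to true."""


def result_text(result: dict) -> str:
    """Join the text items of a tool result, raising ToolError on failure."""
    text = "\n".join(
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    )
    if result.get("isError"):
        raise ToolError(text)
    return text


if __name__ == "__main__":
    failed = {
        "content": [
            {"type": "text", "text": "Error: Failed to fetch YouTube transcript: 404 Not Found"}
        ],
        "isError": True,
    }
    try:
        result_text(failed)
    except ToolError as exc:
        print("tool failed:", exc)
```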
# Install dependencies
npm install
# Build
npm run build
MIT License - see LICENSE file for details