ShopGraph
Structured product data from the open web — where platform APIs don't reach. Schema.org + AI extraction. Pay per call via Stripe MPP.
The extraction API that shows its work. Send a URL or raw HTML and get structured JSON with per-field confidence scoring and extraction provenance: every field shows which method produced it (Schema.org, LLM inference, or headless browser) and how confident the system is. Set strict_confidence_threshold, and fields scoring below that threshold are removed server-side before they reach your agent. The free tier includes 50 calls/month.
Website: https://shopgraph.dev | API: https://shopgraph.dev/api/enrich/basic | MCP: https://shopgraph.dev/mcp
UCP output validated with ucp-schema v1.1.0 — the official Universal Commerce Protocol schema validator.
Quick Start
# Free — no API key, no signup
curl -X POST https://shopgraph.dev/api/enrich/basic \
-H "Content-Type: application/json" \
-d '{"url": "https://www.allbirds.com/products/mens-tree-runners"}'
Returns structured JSON with per-field confidence scores:
{
  "product": {
    "product_name": "Men's Tree Runners",
    "brand": "Allbirds",
    "price": { "amount": 100, "currency": "USD" },
    "availability": "in_stock",
    "categories": ["Shoes", "Running"],
    "confidence": { "overall": 0.95 },
    "_shopgraph": {
      "field_confidence": {
        "product_name": 0.97,
        "brand": 0.95,
        "price": 0.98,
        "availability": 0.90
      }
    }
  },
  "free_tier": { "used": 1, "limit": 50 }
}
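The per-field scores also make it possible to apply a threshold on the client. The sketch below, in Python, works on the sample response above. `fields_below` and `drop_fields_below` are hypothetical helper names, and treating a score strictly below the threshold as uncertain is an assumption: the exact comparison the server applies for strict_confidence_threshold is not stated here.

```python
import copy

# Sample response from the Quick Start above, trimmed to the relevant keys.
SAMPLE = {
    "product": {
        "product_name": "Men's Tree Runners",
        "brand": "Allbirds",
        "price": {"amount": 100, "currency": "USD"},
        "availability": "in_stock",
        "_shopgraph": {
            "field_confidence": {
                "product_name": 0.97,
                "brand": 0.95,
                "price": 0.98,
                "availability": 0.90,
            }
        },
    }
}


def fields_below(response, threshold):
    """Return the names of fields whose confidence is strictly below threshold."""
    scores = response["product"]["_shopgraph"]["field_confidence"]
    return sorted(name for name, score in scores.items() if score < threshold)


def drop_fields_below(response, threshold):
    """Return a copy of the response with low-confidence fields removed."""
    cleaned = copy.deepcopy(response)  # leave the caller's data untouched
    for name in fields_below(response, threshold):
        cleaned["product"].pop(name, None)
    return cleaned


if __name__ == "__main__":
    print(fields_below(SAMPLE, 0.95))
    print(sorted(drop_fields_below(SAMPLE, 0.95)["product"]))
```

With a threshold of 0.95, only `availability` (0.90) falls below the cut in this sample; `brand` sits exactly at 0.95 and is kept under the strict comparison assumed here.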
Tools / Endpoints
| Tool | REST Endpoint | Price | What It Does |
|---|---|---|---|
| enrich_basic | POST /api/enrich/basic | Free (shared quota) | Schema.org extraction only. Fast, zero LLM cost. |
| enrich_product | POST /api/enrich | Free 50/mo, then subscription or $0.02/call | Full pipeline with per-field confidence scoring and extraction provenance. |
| enrich_html | POST /api/enrich/html | Subscription or $0.02/call | Bring your own HTML. Works with Bright Data, Firecrawl, or any fetch/proxy tool. |
Pricing: Free (50/mo) | Starter $99/mo (10K calls) | Growth $299/mo (50K calls) | Enterprise (custom). Pay-per-call via Stripe MPP still available for agents. Cached results (24h) are free. No charge for failed extractions.
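To compare pay-per-call against the subscription tiers, the arithmetic can be sketched as follows. This is a rough estimate under stated assumptions: the 50 free calls are deducted before pay-per-call billing starts, every call is billable (cached results and failed extractions, which are free, are ignored), and plan overage is not modeled because it is not described here. Amounts are kept in integer cents to avoid floating-point rounding.

```python
FREE_CALLS = 50          # free calls per month
PAY_PER_CALL_CENTS = 2   # $0.02 per call
# Plan name -> (monthly price in cents, included calls), from the pricing line.
PLANS_CENTS = {"Starter": (9_900, 10_000), "Growth": (29_900, 50_000)}


def pay_per_call_cents(calls):
    """Monthly pay-per-call cost in cents after the free allowance."""
    billable = max(0, calls - FREE_CALLS)
    return billable * PAY_PER_CALL_CENTS


def cheapest_option(calls):
    """Name of the cheapest option that covers the given monthly volume."""
    options = {"Pay-per-call": pay_per_call_cents(calls)}
    for name, (price, included) in PLANS_CENTS.items():
        if calls <= included:  # only consider plans that cover the volume
            options[name] = price
    return min(options, key=options.get)


if __name__ == "__main__":
    for volume in (1_000, 5_001, 12_000, 20_000):
        print(volume, cheapest_option(volume))
```

Under these assumptions, pay-per-call matches the Starter price at 5,000 calls per month ((5,000 − 50) × $0.02 = $99), and Growth only becomes cheaper than pay-per-call above 15,000 calls.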
How It Works
Your agent sends a URL (or raw HTML)
  → Tier 1: Schema.org/JSON-LD parsing (0.93 baseline confidence, instant)
  → Tier 2: LLM extracts from page text when structured data is absent (0.70 baseline)
  → Tier 3: Headless Playwright renders JavaScript, then extracts (additional inference step)
  → Returns ProductData with per-field confidence scores and extraction provenance (which tier produced each field) in _shopgraph.field_confidence
  → Set strict_confidence_threshold to remove low-confidence fields server-side before they reach your agent
  → Add format=ucp for Universal Commerce Protocol output
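As an illustration of what a Tier 1 pass involves, here is a minimal Python sketch that pulls a Schema.org `Product` out of JSON-LD using only the standard library. It is not ShopGraph's implementation: the function names and the `value`/`source`/`confidence` output shape are hypothetical, and only the 0.93 baseline comes from the flow above. Real pages may also nest products under `@graph`, which this sketch does not handle.

```python
import json
from html.parser import HTMLParser

TIER1_BASELINE = 0.93  # Schema.org baseline confidence quoted in the flow above


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None  # list of text chunks while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def _to_fields(node):
    """Map Schema.org Product properties to per-field records (hypothetical shape)."""
    brand = node.get("brand")
    offers = node.get("offers") or {}
    if isinstance(offers, list):
        offers = offers[0] if offers else {}
    if not isinstance(offers, dict):
        offers = {}
    values = {
        "product_name": node.get("name"),
        "brand": brand.get("name") if isinstance(brand, dict) else brand,
        "price": offers.get("price"),
        "currency": offers.get("priceCurrency"),
    }
    return {
        name: {"value": value, "source": "schema_org", "confidence": TIER1_BASELINE}
        for name, value in values.items()
        if value is not None
    }


def extract_schema_org(html):
    """Return per-field records for the first Product found, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD blocks
        for node in data if isinstance(data, list) else [data]:
            if isinstance(node, dict) and node.get("@type") == "Product":
                return _to_fields(node)
    return None  # in the flow above, this is where Tier 2 would take over
```

Returning `None` when no `Product` is present mirrors the fallback described above, where the LLM tier handles pages without structured data.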
Authentication: API key (sg_live_ keys) for subscription tiers, or Stripe MPP for pay-per-call agents.
ShopGraph is a structuring layer, not a fetcher. It's complementary to Bright Data, Firecrawl, and other fetch/proxy tools. They handle retrieval. ShopGraph handles extraction provenance and per-field confidence scoring.
REST API
POST /api/enrich/basic (Free tier)
curl -X POST https://shopgraph.dev/api/enrich/basic \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/product"}'
Schema.org only. Shares the free-tier quota with /api/enrich. No signup needed.
POST /api/enrich (Full extraction)
# With API key (subscription)
curl -X POST https://shopgraph.dev/api/enrich \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sg_live_..." \
-d '{"url": "https://example.com/product", "strict_confidence_threshold": 0.8, "format": "ucp"}'
# With Stripe MPP (pay-per-call)
curl -X POST https://shopgraph.dev/api/enrich \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/product", "payment_method_id": "pm_..."}'
Full pipeline: Schema.org → LLM inference → headless browser. 50 free calls/month. Authenticate with API key (sg_live_) or Stripe MPP for higher limits.
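The two authentication modes differ only in where the credential goes: an API key travels in the Authorization header, while a Stripe MPP payment method travels in the JSON body. A minimal Python sketch of that difference follows. `build_enrich_request` is a hypothetical helper, `sg_live_example` and `pm_example` are placeholders rather than real credentials, and the request is only constructed, not sent.

```python
import json
import urllib.request

ENRICH_URL = "https://shopgraph.dev/api/enrich"


def build_enrich_request(product_url, api_key=None, payment_method_id=None,
                         strict_confidence_threshold=None, output_format=None):
    """Build (but do not send) a POST request for /api/enrich."""
    body = {"url": product_url}
    if strict_confidence_threshold is not None:
        body["strict_confidence_threshold"] = strict_confidence_threshold
    if output_format is not None:
        body["format"] = output_format
    if payment_method_id is not None:
        # Pay-per-call: the payment method is part of the JSON body.
        body["payment_method_id"] = payment_method_id

    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # Subscription: the API key is sent as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"

    return urllib.request.Request(
        ENRICH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    req = build_enrich_request(
        "https://example.com/product",
        api_key="sg_live_example",  # placeholder key
        strict_confidence_threshold=0.8,
        output_format="ucp",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access or valid credentials.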
POST /api/enrich/html (Bring your own HTML)
curl -X POST https://shopgraph.dev/api/enrich/html \
-H "Content-Type: application/json" \
-d '{"html": "<html>...</html>", "url": "https://example.com/product", "payment_method_id": "pm_..."}'
Already fetched the page? Pipe the HTML to ShopGraph for structuring.
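Real HTML is full of quotes and newlines, which makes a hand-written `-d '...'` string fragile. One way around that is to let a JSON library do the escaping. The Python sketch below builds the request body for /api/enrich/html with the field names from the curl example above; `build_html_payload` is a hypothetical helper name.

```python
import json


def build_html_payload(html, source_url, payment_method_id=None):
    """Serialize already-fetched HTML into a JSON request body."""
    payload = {"html": html, "url": source_url}
    if payment_method_id is not None:
        payload["payment_method_id"] = payment_method_id
    # json.dumps escapes quotes, backslashes, and newlines inside the HTML.
    return json.dumps(payload)


if __name__ == "__main__":
    page = '<html><body><h1 class="title">Men\'s Tree Runners</h1>\n</body></html>'
    print(build_html_payload(page, "https://example.com/product"))
```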
MCP Configuration
{
  "mcpServers": {
    "shopgraph": {
      "type": "url",
      "url": "https://shopgraph.dev/mcp"
    }
  }
}
Works with Claude, Claude Code, Cursor, Windsurf, CrewAI, LangGraph, AutoGen, and any MCP client.
Extracted Data
Every response includes:
| Field | Description |
|---|---|
| product_name | Product title |
| brand | Manufacturer or brand |
| price | Amount + currency + sale price |
| availability | in_stock, out_of_stock, preorder, unknown |
| categories | Product taxonomy |
| image_urls | Product images (enrich_product/enrich_html only) |
| color | Available colors |
| material | Materials/fabrics |
| dimensions | Size/weight info |
| confidence | Overall + per-field scores (0-1) |
| _shopgraph.field_confidence | Per-field confidence with field-type modifiers |
Self-Hosted Setup
git clone https://github.com/laundromatic/shopgraph.git
cd shopgraph
npm install
Required .env:
| Variable | Purpose |
|---|---|
| STRIPE_TEST_SECRET_KEY | Stripe secret key (test or live) |
| GOOGLE_API_KEY | Gemini API key for Tier 2 (LLM) inference |
| UPSTASH_REDIS_REST_URL | Upstash Redis for stats/monitoring (optional) |
| UPSTASH_REDIS_REST_TOKEN | Upstash Redis token (optional) |
npm run build # Compile TypeScript
npm start # Run MCP server (stdio)
npm run start:http # Run HTTP server
npm run dev # Dev mode (no build needed)
npm run test:run # Run 118 tests
Monitoring
ShopGraph runs 118 automated tests across 22 product verticals. The pipeline is self-healing, with a circuit breaker, URL verification, and health alerts.
- Health: https://shopgraph.dev/health
- Stats: https://shopgraph.dev/api/stats
- Dashboard: Live on shopgraph.dev homepage
License
Apache 2.0
Built By
Krishna Brown | Los Angeles, CA