PDF to Markdown MCP Server
Hosted MCP server that converts PDFs into clean, LLM-ready Markdown with tables, formulas (LaTeX) and OCR. It runs its own conversion engines (MinerU + Docling) rather than wrapping an LLM.
Documentation
Developer hub
Build with the PDF to Markdown API
One predictable job lifecycle over a REST API and an equivalent hosted MCP – create a job, wait for ready, fetch the Markdown, free the slot. The API and MCP never bypass product limits.
At a glance
Two surfaces, one engine
Pick the integration that fits. Both call the same conversion engine and obey the same slots, limits and retention.
REST API
HTTPS endpoints with a bearer API key. Stable DTOs, predictable errors, idempotent create.
Hosted MCP
A managed Model Context Protocol endpoint exposing conversion as agent tools – a thin wrapper over the same API.
Custom GPT Actions
Import a reduced OpenAPI spec into a ChatGPT Custom GPT so it can convert PDFs as a built-in tool.
Authentication
Bearer API keys over HTTPS
The API and MCP use bearer API keys – distinct from the device-signed path the Chrome extension uses. A free Google account is required to generate keys.
Get a key
- Sign in with Google (free account).
- Generate an API key in your account; it is shown once.
- Send it as Authorization: Bearer p2m_… on every request.
- Keys are secrets: store them server-side, and rotate or revoke them at any time.
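Because keys must stay server-side, a minimal sketch for loading one from the environment follows. The variable name P2M_API_KEY and the helper auth_headers are hypothetical; use whatever your secret store provides.

```python
import os

# Hypothetical variable name: the docs do not prescribe one.
KEY_ENV_VAR = "P2M_API_KEY"


def auth_headers(env=None):
    """Build the bearer header from a server-side secret (never ship it to clients)."""
    env = os.environ if env is None else env
    key = env.get(KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{KEY_ENV_VAR} is not set; generate a key in your account")
    return {"Authorization": f"Bearer {key}"}


if __name__ == "__main__":
    # Example with an explicit mapping instead of the real environment.
    print(auth_headers({"P2M_API_KEY": "p2m_example"}))
```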
Honest defaults
Keys, not passwords. The extension stays anonymous and device-signed; API/MCP keys are a separate, account-bound credential.
HTTPS only. Always send keys over TLS; never embed a key in client-side code shipped to users.
Idempotent create. An optional Idempotency-Key on create lets you retry safely without duplicate jobs.
Scopes. Each API key carries scopes: jobs:create, jobs:read, jobs:download, jobs:delete (the defaults), plus settings:read / settings:write. Mint least-privilege keys; the REST API and MCP tools both enforce the key's scopes.
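A minimal sketch of a safe retry follows. It assumes Idempotency-Key travels as an HTTP header, which is the usual convention, so check the OpenAPI spec. The code only builds requests with urllib.request. The important part is that the key is generated once per logical job and reused on every retry of that create.

```python
import json
import urllib.request
import uuid

API = "https://pdf2md.dev/api/v2"


def build_create_request(pdf_url, api_key, idempotency_key):
    """Build (but do not send) a create-job request carrying an Idempotency-Key."""
    body = json.dumps({"url": pdf_url}).encode()
    return urllib.request.Request(
        f"{API}/jobs",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Assumption: sent as a header. urllib stores it as "Idempotency-key",
            # which is fine because HTTP header names are case-insensitive.
            "Idempotency-Key": idempotency_key,
        },
    )


# One key per logical job: a retry after a timeout reuses it, so the server
# can recognize the duplicate instead of creating a second job.
key = str(uuid.uuid4())
first = build_create_request("https://example.com/report.pdf", "p2m_…", key)
retry = build_create_request("https://example.com/report.pdf", "p2m_…", key)
```

Send either request with urllib.request.urlopen (or your HTTP client of choice). Never generate a fresh key inside the retry loop.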
REST API
Create a job, wait, fetch Markdown, clean the slot
One predictable lifecycle, two ways to drive it: call the REST API from your own code, or use the equivalent hosted MCP tools. Never claim a result before status=ready.
1
Create the job
POST a PDF URL or upload bytes. Get back a job id and slot. Idempotency-Key is honored but optional.
POST /api/v2/jobs · MCP: pdf_to_markdown_create_job_from_url
2
Check status
Poll the job until ready or error, or register a signed webhook on paid tiers instead of polling.
GET /api/v2/jobs/{id} · MCP: pdf_to_markdown_get_job
3
Fetch the Markdown
Download the result once ready. Check the truncated and pages fields to tell whether a long document was returned only partially.
GET /api/v2/jobs/{id}/download · MCP: pdf_to_markdown_get_markdown
4
Delete / clean the slot
Free a slot when you're done. Deleting queued or processing jobs is destructive – confirm it in user-facing clients.
DELETE /api/v2/jobs/{id} · MCP: pdf_to_markdown_delete_job
# 1. create a job from a PDF URL
curl -X POST https://pdf2md.dev/api/v2/jobs \
-H "Authorization: Bearer p2m_…" \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com/report.pdf"}'
# → { "job_id": "job_9f3c…", "status": "queued" }
# 2. poll status
curl https://pdf2md.dev/api/v2/jobs/job_9f3c… \
-H "Authorization: Bearer p2m_…"
# → { "status": "ready", "pages": 24, "truncated": false }
# 3. fetch the Markdown
curl https://pdf2md.dev/api/v2/jobs/job_9f3c…/download \
-H "Authorization: Bearer p2m_…"
# 4. free the slot
curl -X DELETE https://pdf2md.dev/api/v2/jobs/job_9f3c… \
-H "Authorization: Bearer p2m_…"
Errors. Responses use stable shapes and predictable HTTP codes (400 bad input, 401 auth, 404 unknown job, 409 no free slot / slots_full, 413 too large, 429 rate limited). The full schema lives in the OpenAPI spec.
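One way to act on these codes is a lookup table, shown below as a hypothetical sketch. The suggested reactions are one reading of the list above, with only 409 and 429 treated as worth retrying. This is not an official client policy. Codes outside the list fall back to the OpenAPI error schema.

```python
# Documented status codes, with a suggested client reaction for each.
ERROR_ACTIONS = {
    400: ("bad input", "fix the request; do not retry it unchanged"),
    401: ("auth", "check, rotate or replace the API key"),
    404: ("unknown job", "check the job id"),
    409: ("no free slot (slots_full)", "delete a finished job or wait, then retry"),
    413: ("too large", "split the PDF or use a higher tier"),
    429: ("rate limited", "wait Retry-After seconds, then retry"),
}

# Only these are transient from the client's point of view.
RETRYABLE = {409, 429}


def classify(status):
    """Return (meaning, action, retryable) for an HTTP status from the API."""
    if 200 <= status < 300:
        return ("ok", "continue", False)
    meaning, action = ERROR_ACTIONS.get(
        status, ("unexpected", "see the error schema in the OpenAPI spec"))
    return (meaning, action, status in RETRYABLE)
```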
More job endpoints
Create from file (multipart): POST /api/v2/jobs
List your jobs: GET /api/v2/jobs
Batch create (paid): POST /api/v2/jobs/batch
Create accepts a JSON url or a multipart file, plus optional file_name, external_id, tags and callback_url / callback_secret for a per-job webhook. Batch create is all-or-nothing and must fit your free slots.
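A sketch of assembling the JSON create body with the optional fields, plus a pre-check for all-or-nothing batches, follows. The helper names are hypothetical, and so is passing tags as a list. This version also requires callback_url and callback_secret together so that per-job deliveries can always be verified. That rule is a choice made for this sketch, not a documented constraint.

```python
def build_create_body(url, file_name=None, external_id=None, tags=None,
                      callback_url=None, callback_secret=None):
    """JSON body for POST /api/v2/jobs; unset optional fields are omitted."""
    if (callback_url is None) != (callback_secret is None):
        # Sketch policy: a per-job webhook without a secret cannot be verified.
        raise ValueError("set callback_url and callback_secret together")
    body = {
        "url": url,
        "file_name": file_name,
        "external_id": external_id,
        "tags": tags,  # assumed to be a list of strings
        "callback_url": callback_url,
        "callback_secret": callback_secret,
    }
    return {k: v for k, v in body.items() if v is not None}


def batch_fits(num_jobs, free_slots):
    """Batch create is all-or-nothing, so every job must fit in a free slot."""
    return num_jobs <= free_slots
```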
The job object
job_id: string
status: queued · processing · ready · error
pages · output_size: integer
truncated: boolean
error_code · error_message: reason (when status is error)
download_url: string (when ready)
external_id · tags: your metadata
slot_usage · tier: quota context
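To work with the job object in typed code, a minimal dataclass sketch follows. Field names come from the list above. The concrete Python types for error_code, tier and slot_usage are assumptions, because the docs only describe them as a reason and as quota context. Unknown keys are dropped, so newer fields do not break parsing.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class Job:
    """Typed view of the job object; keys not declared here are ignored."""
    job_id: str
    status: str  # queued | processing | ready | error
    pages: Optional[int] = None
    output_size: Optional[int] = None
    truncated: bool = False
    error_code: Optional[str] = None      # assumed string
    error_message: Optional[str] = None
    download_url: Optional[str] = None    # present when ready
    external_id: Optional[str] = None
    tags: list = field(default_factory=list)
    slot_usage: Any = None                # shape not documented here
    tier: Optional[str] = None            # assumed string

    @classmethod
    def from_dict(cls, data):
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

    @property
    def finished(self):
        """True once polling can stop."""
        return self.status in ("ready", "error")
```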
Account & usage. Check your tier, limits and usage at runtime with GET /api/v2/me, /api/v2/limits and /api/v2/usage; manage keys at /api/v2/api-keys and webhooks at /api/v2/webhooks.
Quickstart
Convert a PDF to Markdown in Python
The same four calls from any language. Here it is with requests. For a full step-by-step walkthrough with error handling, see the Python tutorial.
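# pip install requests
import time
import requests
API = "https://pdf2md.dev/api/v2"
H = {"Authorization": "Bearer p2m_…"}
# 1. create a job from a PDF URL (or post a file with files={"file": ...})
r = requests.post(f"{API}/jobs", headers=H,
                  json={"url": "https://example.com/report.pdf"}, timeout=30)
r.raise_for_status()  # 409 = no free slot, 429 = rate limited (see Errors)
jid = r.json()["job_id"]
# 2. poll until ready or error (or register a webhook instead)
while True:
    j = requests.get(f"{API}/jobs/{jid}", headers=H, timeout=30).json()
    if j["status"] in ("ready", "error"):
        break
    time.sleep(3)
# 3. download the Markdown only once the job is ready
if j["status"] == "ready":
    md = requests.get(f"{API}/jobs/{jid}/download", headers=H, timeout=60).text
    if j.get("truncated"):
        print("warning: partial result, pages =", j.get("pages"))
    print(md)
else:
    print("conversion failed:", j.get("error_code"), j.get("error_message"))
# 4. free the slot
requests.delete(f"{API}/jobs/{jid}", headers=H, timeout=30)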
More recipes: file upload & Node
Create from a local file (curl)
# multipart upload of a local PDF
curl -X POST https://pdf2md.dev/api/v2/jobs \
-H "Authorization: Bearer p2m_…" \
-F "file=@document.pdf" \
-F "file_name=document.pdf"
Node 18+ (global fetch)
// run as an ES module (.mjs) for top-level await: create from URL, poll, download, free the slot
const API = "https://pdf2md.dev/api/v2";
const H = { Authorization: "Bearer p2m_…" };
let job = await (await fetch(`${API}/jobs`, {
  method: "POST",
  headers: { ...H, "Content-Type": "application/json" },
  body: JSON.stringify({ url: "https://example.com/report.pdf" })
})).json();
if (!job.job_id) throw new Error(`create failed: ${JSON.stringify(job)}`);
while (job.status === "queued" || job.status === "processing") {
  await new Promise(s => setTimeout(s, 2000));
  job = await (await fetch(`${API}/jobs/${job.job_id}`, { headers: H })).json();
}
if (job.status === "ready") {
  const md = await (await fetch(`${API}/jobs/${job.job_id}/download`, { headers: H })).text();
  console.log(md);
}
await fetch(`${API}/jobs/${job.job_id}`, { method: "DELETE", headers: H });
Webhook signature verification is in the Webhooks section; the MCP client config is in the MCP section. Full schema: OpenAPI.
Hosted MCP
Conversion as agent tools
Connect a compatible agent to our managed MCP endpoint. The tools are a thin wrapper over the REST API, so every call obeys the same slots, limits and retention.
1
Point the agent at the endpoint
JSON-RPC 2.0 over Streamable HTTP with your API key as the bearer token. No local server to run. Methods: initialize, tools/list, tools/call, ping.
POST https://pdf2md.dev/api/v2/mcp
2
Call the tools
The same lifecycle plus limits, exposed as seven tools. Each respects the key's scopes; tools/call responses include slot_usage and tier.
create_job_from_url · create_job_from_upload (jobs:create)
list_jobs · get_job (jobs:read)
get_markdown (jobs:download) · delete_job (jobs:delete)
get_limits
3
Respect the rules
Wait for ready before using output; confirm before deleting queued/processing jobs; handle truncated and 429 Retry-After. (Tool names are prefixed pdf_to_markdown_.)
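Every tool checks the key's scopes, so you can derive the smallest scope set for an agent from the tools it actually uses. The helper below is a hypothetical sketch. It leaves out get_limits because the tool list above names no scope for it.

```python
# Scope each MCP tool needs, taken from the tool list above.
TOOL_SCOPES = {
    "pdf_to_markdown_create_job_from_url": "jobs:create",
    "pdf_to_markdown_create_job_from_upload": "jobs:create",
    "pdf_to_markdown_list_jobs": "jobs:read",
    "pdf_to_markdown_get_job": "jobs:read",
    "pdf_to_markdown_get_markdown": "jobs:download",
    "pdf_to_markdown_delete_job": "jobs:delete",
}


def scopes_for(tools):
    """Smallest scope set covering the given tools (least privilege)."""
    unknown = [t for t in tools if t not in TOOL_SCOPES]
    if unknown:
        raise KeyError(f"no documented scope for: {unknown}")
    return sorted({TOOL_SCOPES[t] for t in tools})
```

For example, a read-only agent that only converts and reads results never needs jobs:delete, so a key minted for it should not carry that scope.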
// MCP client config (hosted, no local process)
{
"mcpServers": {
"pdf2md": {
"url": "https://pdf2md.dev/api/v2/mcp",
"headers": {
"Authorization": "Bearer p2m_…"
}
}
}
}
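The following minimal sketch shows what travels over the wire. It builds the JSON-RPC 2.0 messages for initialize and for a tools/call. Several values are assumptions: the protocol version string, the client name, and the url argument name of the create tool. Read the real input schema from tools/list. A complete client also sends the notifications/initialized notification after initialize and echoes any Mcp-Session-Id header the server returns. An MCP SDK handles these steps for you.

```python
import itertools
import json

_ids = itertools.count(1)

# Assumption: an MCP protocol revision the server accepts; the initialize
# reply states the version actually negotiated.
PROTOCOL_VERSION = "2025-03-26"

# Headers for POST https://pdf2md.dev/api/v2/mcp; Streamable HTTP clients
# accept both plain JSON and SSE responses.
HEADERS = {
    "Authorization": "Bearer p2m_…",
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}


def rpc(method, params=None):
    """One JSON-RPC 2.0 request object with a fresh numeric id."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


initialize = rpc("initialize", {
    "protocolVersion": PROTOCOL_VERSION,
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.1"},  # hypothetical
})
create = rpc("tools/call", {
    "name": "pdf_to_markdown_create_job_from_url",
    "arguments": {"url": "https://example.com/report.pdf"},  # argument name assumed
})
print(json.dumps(create))
```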
OpenAPI & Custom GPT Actions
Import the spec, get a built-in tool
We publish two specs: the full OpenAPI for developers, and a reduced action spec with the safe, minimal subset for AI clients and ChatGPT Custom GPT Actions.
Full OpenAPI
The complete contract: every endpoint, parameter, DTO and error. Generate clients or explore it in your tooling.
Reduced spec for Custom GPT
A minimal action subset (create, status, fetch) for ChatGPT Custom GPT Actions. Import the URL, set your API key as the auth, and your GPT converts PDFs natively.
The reduced spec is a convenience for AI clients, not a security boundary – the same auth, scopes and limits apply as on the full API.
Limits & rate limits
Per-tier limits, applied to API and MCP alike
Limits come from your tier and apply identically across every surface. Live values are on the pricing page.
Free tier (with an account)
Active slots (queue depth): 3
Max PDF size: 10 MB
Time budget per document: 15 min
Ready result retention: 1 hour
Paid tiers raise slots, file size, time budget, retention and rate limits, and add webhooks and batch create.
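A local size check avoids spending a request on a 413. This minimal sketch assumes the free-tier 10 MB means 10 * 1024 * 1024 bytes. If the server counts decimal megabytes, lower max_bytes accordingly. Pass your own tier's limit on paid plans.

```python
import os

# Free-tier cap from the table above. Assumption: 1 MB = 1024 * 1024 bytes.
FREE_MAX_BYTES = 10 * 1024 * 1024


def fits_tier(num_bytes, max_bytes=FREE_MAX_BYTES):
    """True if an upload of num_bytes is within the tier's max PDF size."""
    return 0 <= num_bytes <= max_bytes


def check_upload(path, max_bytes=FREE_MAX_BYTES):
    """Raise before uploading a PDF that the API would reject with 413."""
    size = os.path.getsize(path)
    if not fits_tier(size, max_bytes):
        raise ValueError(f"{path} is {size} bytes; the limit is {max_bytes}")
    return size
```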
Rate limits & backpressure
Per-tier rate limits. Requests are rate limited per key; exceed them and you get 429 with a Retry-After header – back off and retry.
Slot pressure. If all slots are busy, create returns 409. Free a slot with delete, or wait for a job to finish.
Priority on paid. Paid jobs run with higher queue priority on a dedicated paid conversion pool, so they don't wait behind the free backlog.
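The backpressure rules can be wrapped around any request. This minimal sketch waits Retry-After seconds on 429 and backs off while slots are full on 409. Retry-After is parsed in both forms HTTP allows: a number of seconds or an HTTP date. Several details are assumptions for illustration: the send callable, which returns a status, a plain headers dict and a body; the 5-second fallback; and the linear slot wait.

```python
import email.utils
import time
from datetime import datetime, timezone


def retry_after_seconds(value, default=5.0):
    """Parse Retry-After as delay-seconds or an HTTP-date; fall back to default."""
    if value is None:
        return default
    try:
        return max(0.0, float(value))
    except ValueError:
        pass
    try:
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def call_with_backoff(send, max_attempts=5, slot_wait=10.0, sleep=time.sleep):
    """Retry send() on 429 (honoring Retry-After) and 409 (waiting for a slot).

    send() returns (status_code, headers, body); headers is a plain dict here.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status not in (409, 429) or attempt == max_attempts:
            return status, headers, body
        if status == 429:
            sleep(retry_after_seconds(headers.get("Retry-After")))
        else:
            sleep(slot_wait * attempt)  # linear backoff while slots are full
```

Waiting on 409 only helps if something else is freeing slots. In an interactive client, deleting a finished job is usually the faster fix.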
Webhooks
Get notified instead of polling
On paid tiers, register a signed webhook (or pass a per-job callback_url), and we POST to your endpoint on every notable event: job.ready, job.error, job.truncated and job.deleted. Each event is a notification only: it carries no document content, so fetch the Markdown over the API after you receive it.
1
Register an endpoint
POST an HTTPS URL (SSRF-guarded) and an optional events filter. The signing secret whsec_… is returned once. Or set callback_url + callback_secret on a single job.
POST /api/v2/webhooks · GET /api/v2/webhooks/deliveries
2
Receive the event
We POST JSON with headers X-P2M-Event, X-P2M-Timestamp, X-P2M-Delivery and X-P2M-Signature.
3
Verify, then act
Recompute the signature, acknowledge with a 2xx, and handle events idempotently, because deliveries may be retried with backoff. Then download the Markdown.
# delivery → your endpoint
X-P2M-Event: job.ready
X-P2M-Timestamp: 1718900000
X-P2M-Signature: sha256=9a8b7c…
{
"event": "job.ready",
"job": {
"job_id": "job_9f3c…",
"status": "ready",
"pages": 24, "truncated": false,
"download_url": "/api/v2/jobs/job_9f3c…/download"
}
}
# verify (Python): signature = sha256= + hex(HMAC(secret, "ts.rawbody"))
import hmac, hashlib
def verify(secret, ts, raw_body, sig):
    # raw_body: the exact request bytes, before any JSON parsing
    expected = "sha256=" + hmac.new(
        secret.encode(), f"{ts}.".encode() + raw_body,
        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
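The following sketch extends verification into a small handler that also rejects stale timestamps and deduplicates retried deliveries by X-P2M-Delivery. The verify function from the previous snippet is repeated so the block runs on its own. Several details are assumptions, not documented values: the 300-second replay window, the in-memory set of seen delivery ids, the plain dict of headers (real frameworks give case-insensitive header maps), and the 401 reply.

```python
import hashlib
import hmac
import time

SEEN_DELIVERIES = set()  # use a persistent store in production
MAX_SKEW_SECONDS = 300   # assumed replay window, not a documented value


def verify(secret, ts, raw_body, sig):
    expected = "sha256=" + hmac.new(
        secret.encode(), f"{ts}.".encode() + raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)


def handle_delivery(secret, headers, raw_body, now=None):
    """Return the HTTP status to answer with; any 2xx acknowledges the delivery."""
    ts = headers.get("X-P2M-Timestamp", "")
    sig = headers.get("X-P2M-Signature", "")
    if not ts.isdigit() or not verify(secret, ts, raw_body, sig):
        return 401
    now = time.time() if now is None else now
    if abs(now - int(ts)) > MAX_SKEW_SECONDS:
        return 401  # stale or replayed
    delivery = headers.get("X-P2M-Delivery")
    if delivery is not None:
        if delivery in SEEN_DELIVERIES:
            return 200  # a retry of something already handled: ack, do nothing
        SEEN_DELIVERIES.add(delivery)
    # Enqueue the real work here, e.g. download the Markdown on job.ready.
    return 200
```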
Prompts & safety
Portable instructions for agents
Drop these rules into an agent's system prompt so it drives the tools correctly and never invents results.
Wait for ready
Never claim or summarize a result before status=ready. While queued or processing, keep polling or wait for the webhook.
Confirm deletes
Deleting a queued or processing job is destructive. Ask the user before calling pdf_to_markdown_delete_job on a non-finished job.
Handle truncation
If truncated=true, tell the user the document was returned partially up to the tier time budget, and offer a higher tier or splitting the file.
Respect 429
On 429, wait for Retry-After seconds before retrying. Don't hammer the queue.
Clean slots
Delete finished jobs you no longer need so you don't exhaust your slots.
Read discovery
Start from /llms.txt and the OpenAPI spec rather than guessing endpoints from prose.
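If your agent runtime lets you enforce rules in code as well as in the prompt, the first three rules reduce to simple checks on the job object. The helper names below are hypothetical, and the notice wording is only an example.

```python
def output_usable(job):
    """Wait for ready: never use or summarize output before status=ready."""
    return job.get("status") == "ready"


def delete_needs_confirmation(job):
    """Confirm deletes: removing a queued or processing job is destructive."""
    return job.get("status") in ("queued", "processing")


def truncation_notice(job):
    """Handle truncation: return a user-facing notice, or None if complete."""
    if not job.get("truncated"):
        return None
    pages = job.get("pages")
    detail = f" (pages: {pages})" if pages is not None else ""
    return ("The document was returned partially, up to the tier time budget"
            + detail + ". A higher tier or splitting the file may help.")
```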
Machine-readable discovery
Everything an agent needs to integrate without reading source code: a compact capabilities file, a detailed context file and the OpenAPI spec.
llms.txt (compact capabilities) · llms-full.txt (detailed context) · OpenAPI spec