Dokumen-Pintar

62 format-aware tools for AI to read and edit Word, Excel, and PDF files with precision — plus academic document linting for Indonesian standards.

Universal MCP server for cross-format document CRUD

Read, write, search, and manage text, Office, and PDF files
from any AI agent that supports the Model Context Protocol.

Read in Bahasa Indonesia

Why Dokumen-Pintar

Most filesystem MCP servers stop at "read this file, write that file." Dokumen-Pintar treats documents as structured data the AI can navigate. An agent can update a paragraph by index, set a cell at Sheet1!B2, walk a JSON tree with JSONPath, or generate a DOCX from Markdown, and the workflow is the same across every supported format.

Every mutating tool snapshots the file first. Every action lands in an audit log. Every path is sandboxed to the roots you opt into. The default config is sensible; the profiles cover the rest.

Features

Multi-root Sandbox — Define multiple workspace roots with per-root writable control. All paths outside the sandbox are rejected.

10 Formats — Plain text, Markdown, LaTeX, JSON / YAML, CSV / TSV, XML / SVG, DOCX, XLSX, PPTX, PDF.

62 MCP Tools - File & content CRUD, structured access, batch operations, search, versioning, metadata, authoring, image extraction, sections, templates, TOC, bibliography, document compare, and lint, all exposed as callable tools.

Authoring — Generate DOCX or PDF from a JSON spec or Markdown source via compose_docx / compose_pdf / compose_from_markdown.

Structured Access — JSONPath for JSON / YAML, XPath for XML, cell / range / sheet for XLSX, paragraph / table for DOCX, slide for PPTX, page for PDF.

Automatic Versioning — Copy-on-write snapshots on every write. Undo, diff, restore, and purge anytime.

Metadata Layer — Read, write, delete, or strip EXIF, OOXML core properties, and PDF docinfo through one consistent API.

Audit Trail — Every mutation logged to JSONL with timestamp and operation details.

2 Transports — stdio (Claude Desktop, Cursor, VS Code, Windsurf) and HTTP / SSE.
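The multi-root sandbox described above can be illustrated with a minimal sketch. This is not Dokumen-Pintar's actual PathGuard code; the function name `resolve_in_sandbox`, the `SandboxError` exception, and the shape of the `roots` mapping are all hypothetical. It only shows the general technique of resolving a `root:/relative/path` URI and rejecting anything that lands outside the root or targets a read-only root.

```python
from pathlib import Path


class SandboxError(Exception):
    """Raised when a path leaves its root or targets a read-only root."""


def resolve_in_sandbox(
    roots: dict[str, tuple[Path, bool]], uri: str, write: bool = False
) -> Path:
    """Resolve a 'root:/relative/path' URI against configured roots.

    `roots` maps a root name to (directory, writable). Hypothetical layout.
    """
    name, sep, rel = uri.partition(":/")
    if not sep or name not in roots:
        raise SandboxError(f"unknown root in {uri!r}")
    base, writable = roots[name]
    base = base.resolve()
    # resolve() collapses '..' segments and follows symlinks, so the
    # containment check below runs against the real target location.
    target = (base / rel).resolve()
    if target != base and base not in target.parents:
        raise SandboxError(f"{uri!r} escapes root {name!r}")
    if write and not writable:
        raise SandboxError(f"root {name!r} is read-only")
    return target
```

With `roots = {"documents": (Path.home() / "Documents", True)}`, a URI such as `documents:/reports/q1.docx` resolves normally, while `documents:/../secrets.txt` raises `SandboxError`.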

Supported Formats

Format	Read	Write	Structured Query	Search
Plain text / Markdown	✅	✅	—	✅
JSON	✅	✅	JSONPath `$.key`	✅
YAML	✅	✅	JSONPath `$.key`	✅
CSV / TSV	✅	✅	`row:N` · `col:NAME` · `cell:row:N,col:NAME`	✅
XML / SVG	✅	✅	XPath `//node`	✅
DOCX	✅	✅	`paragraph:N` · `table:N`	✅
XLSX	✅	✅	`cell:Sheet!A1` · `range:` · `sheet:`	✅
PPTX	✅	✅	`slide:N` · `slide_title:N`	✅
PDF	✅	—	`page:N` · `outline` · `metadata`	✅
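To make the structured query column more concrete, here is a rough sketch of how expressions such as `cell:Sheet1!B2` or `paragraph:3` could be parsed. The helper names `parse_cell` and `parse_indexed` and the `CellRef` type are hypothetical and not part of the server's API. Whether `N` is 0-based or 1-based is not stated in this README, so the sketch returns the number unchanged.

```python
import re
from typing import NamedTuple


class CellRef(NamedTuple):
    sheet: str
    column: int  # 1-based column index (A -> 1, Z -> 26, AA -> 27)
    row: int     # row number exactly as written in the expression


_CELL = re.compile(r"^cell:(?P<sheet>[^!]+)!(?P<col>[A-Za-z]+)(?P<row>[1-9][0-9]*)$")
_INDEXED = re.compile(r"^(?P<kind>paragraph|table|slide|page|row):(?P<n>[0-9]+)$")


def parse_cell(expr: str) -> CellRef:
    """Parse 'cell:Sheet!A1' into sheet name, column index, and row."""
    m = _CELL.match(expr)
    if m is None:
        raise ValueError(f"not a cell expression: {expr!r}")
    col = 0
    for ch in m["col"].upper():
        # Column letters are a bijective base-26 number.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return CellRef(m["sheet"], col, int(m["row"]))


def parse_indexed(expr: str) -> tuple[str, int]:
    """Parse 'kind:N' expressions such as 'paragraph:3' or 'page:12'."""
    m = _INDEXED.match(expr)
    if m is None:
        raise ValueError(f"not an indexed expression: {expr!r}")
    return m["kind"], int(m["n"])
```

For example, `parse_cell("cell:Sheet1!B2")` yields `CellRef(sheet="Sheet1", column=2, row=2)`.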

Quick Start

1. Install

pip install dokumen-pintar

From source (development)

git clone https://github.com/firdausmntp/Dokumen-Pintar.git
cd Dokumen-Pintar
pip install -e ".[dev]"

With semantic search

pip install dokumen-pintar[semantic]

2. Create a Config

dokumen-pintar-init

Or create one manually:

{
  "roots": [
    { "name": "documents", "path": "~/Documents", "writable": true },
    { "name": "projects",  "path": "~/Projects",  "writable": true }
  ]
}

All other fields are optional with sensible defaults. See docs/CONFIG.md.
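If you prefer to generate the config from a script, something like the following sketch would produce the same minimal structure shown above. The helper names `build_config` and `write_config` are hypothetical, and only the `roots`, `name`, `path`, and `writable` keys are taken from this README.

```python
import json
from pathlib import Path


def build_config(roots: dict[str, tuple[str, bool]]) -> dict:
    """Build a minimal config from {root_name: (path, writable)}."""
    if not roots:
        raise ValueError("at least one root is required")
    return {
        "roots": [
            {"name": name, "path": path, "writable": writable}
            for name, (path, writable) in roots.items()
        ]
    }


def write_config(config: dict, destination: Path) -> None:
    """Write the config as indented JSON."""
    destination.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
```

Calling `write_config(build_config({"documents": ("~/Documents", True)}), Path("dokumen-pintar.config.json"))` writes a file equivalent to the manual example with a single root.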

3. Run

dokumen-pintar --config dokumen-pintar.config.json

Ad-hoc roots without a config file

Override or replace config roots from the command line — handy for one-off sessions or scripting:

# Single writable root, no config file required
dokumen-pintar --root docs:/path/to/folder

# Multiple roots, mix read-only and writable, choose stdio transport
dokumen-pintar \
  --root project:/repo:rw \
  --root refs:/library:ro \
  --transport stdio

# Force every root to read-only (overrides config + --root)
dokumen-pintar --config myconfig.json --read-only

# Path-only shorthand (root name derived from basename)
dokumen-pintar --root /home/me/Documents
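The `--root` forms above follow a small grammar: an optional name, a path, and an optional `rw` / `ro` suffix. The sketch below shows one way such a spec could be parsed; it is an illustration under assumptions, not the CLI's real parser. `parse_root` and `RootSpec` are hypothetical names, the default of writable is inferred from the "single writable root" example, and only POSIX-style paths are handled.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class RootSpec(NamedTuple):
    name: str
    path: str
    writable: bool


def parse_root(spec: str) -> RootSpec:
    """Parse 'name:/path', 'name:/path:rw|ro', or a bare '/path'."""
    writable = True  # assumed default, based on the first example above
    # Peel off an optional trailing access mode.
    head, sep, mode = spec.rpartition(":")
    if sep and mode in ("rw", "ro"):
        writable = mode == "rw"
        spec = head
    name, sep, path = spec.partition(":")
    if not sep:
        # Path-only shorthand: derive the root name from the basename.
        path = spec
        name = PurePosixPath(path).name
    if not name or not path:
        raise ValueError(f"invalid root spec: {spec!r}")
    return RootSpec(name, path, writable)
```

Under these assumptions, `parse_root("refs:/library:ro")` gives a read-only root named `refs`, and `parse_root("/home/me/Documents")` gives a writable root named `Documents`.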

Health check

dokumen-pintar-doctor --config dokumen-pintar.config.json

Verifies config validity, root existence, .mcpdocs snapshot writability, registered handlers, and optional semantic-search dependencies.

4. Connect to an AI Client

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "dokumen-pintar": {
      "command": "dokumen-pintar",
      "args": ["--config", "/path/to/dokumen-pintar.config.json"]
    }
  }
}
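If the client config already lists other MCP servers, the entry has to be merged rather than pasted over the file. A small sketch of that merge follows; `register_server` is a hypothetical helper, and the location of `claude_desktop_config.json` differs by platform, so it is passed in rather than guessed.

```python
import json
from pathlib import Path


def register_server(client_config: Path, server_config: str) -> dict:
    """Add or update the 'dokumen-pintar' entry, keeping other servers intact."""
    if client_config.exists():
        data = json.loads(client_config.read_text(encoding="utf-8"))
    else:
        data = {}
    servers = data.setdefault("mcpServers", {})
    servers["dokumen-pintar"] = {
        "command": "dokumen-pintar",
        "args": ["--config", server_config],
    }
    client_config.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return data
```

Back up the client config before running a script like this, since it rewrites the file in place.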

Cursor / VS Code / Windsurf

Use the same stdio transport. Point your IDE's MCP settings to the dokumen-pintar command and config path.

HTTP/SSE (remote or multi-client)

{
  "transport": {
    "stdio": false,
    "http": { "enabled": true, "port": 7878 }
  }
}

Start the server and connect your client to http://127.0.0.1:7878.

Usage Examples

# List available workspace roots
workspace_list_roots()

# Read a Word document
content_read(path="documents:/reports/q1.docx")

# Create a new file
file_create(path="documents:/notes/todo.txt", content="Hello World")

# Find & replace inside a file
content_replace(path="documents:/notes/todo.txt", old="World", new="Everyone")

# Full-text search across all PDFs
search_content(query="budget 2024", format="pdf")

# Read an Excel cell
struct_get(path="documents:/data.xlsx", expr="cell:Sheet1!B2")

# Update a JSON key
struct_set(path="documents:/config.json", expr="$.database.port", value=5432)

# Delete an XML node
struct_delete(path="documents:/data.xml", expr="//item[@id='old']")

# Batch rename (dry-run first)
batch_rename(glob="*.txt", pattern="draft_", replacement="final_", dry_run=true)

# Undo last change
version_undo(path="documents:/reports/q1.docx")

Full guide with recipes: docs/USAGE.md
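The call syntax in the examples above is shorthand. On the wire, an MCP client sends each tool invocation as a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request; `tool_call` is a hypothetical helper, and a real session also performs the MCP initialization handshake first, which is omitted here.

```python
import itertools
import json

# Monotonic request ids, as JSON-RPC requires an id on each request.
_ids = itertools.count(1)


def tool_call(name: str, **arguments) -> str:
    """Serialize one MCP tool invocation as a JSON-RPC 2.0 request."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)
```

For instance, `tool_call("content_replace", path="documents:/notes/todo.txt", old="World", new="Everyone")` produces the request body for the find-and-replace example.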

Tools Overview

62 MCP tools organized by category:

Category	Tools
Workspace	`workspace_list_roots` · `workspace_stat` · `workspace_tree` · `workspace_diagnose`
File CRUD	`file_create` · `file_delete` · `file_rename` · `file_copy` · `file_move`
Content	`content_read` · `content_write` · `content_append` · `content_insert` · `content_replace` · `content_delete_range` · `content_patch` · `content_diff`
Structured	`struct_get` · `struct_set` · `struct_delete` · `struct_meta`
Metadata	`metadata_read` · `metadata_write` · `metadata_delete` · `metadata_strip` · `metadata_read_batch`
Authoring	`validate_spec` · `compose_docx` · `compose_pdf` · `compose_from_markdown` · `compose_to_markdown`
Sections	`section_extract` · `section_merge`
Images	`image_list` · `image_extract` · `image_extract_all` · `image_replace`
Templates	`template_list` · `template_install` · `template_render` · `template_render_named`
TOC & Bibliography	`toc_generate` · `bibliography_check` · `bibliography_format`
Compare & Lint	`document_compare` · `document_lint` · `document_lint_fix`
Batch	`batch_rename` · `batch_replace_content` · `batch_replace_structured` · `batch_delete`
Search	`search_filename` · `search_content` · `search_in_format`
Versioning	`version_list` · `version_diff` · `version_restore` · `version_undo` · `version_purge`
Semantic *	`search_semantic` · `semantic_index_path` · `semantic_stats`

* Only registered when semantic_search.enabled = true and the [semantic] extra is installed.

Full parameter reference: docs/TOOLS.md

Architecture

flowchart TD
    Client["AI Client\n(Claude, Cursor, VS Code, ...)"]
    Client -->|"MCP protocol\n(stdio or HTTP/SSE)"| Server

    subgraph Server["dokumen-pintar server"]
        PG["PathGuard\nsandboxed multi-root"]
        H["Handlers\n9 format parsers"]
        V["Versions\ncopy-on-write snapshots"]
        A["AuditLog\nJSONL mutation log"]
        S["Search\nfilename + content"]
        SE["Semantic\nvector index (optional)"]
    end

    Server --> FS["Filesystem\n(sandboxed workspace roots)"]

Full details: docs/ARCHITECTURE.md
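The Versions and AuditLog boxes in the diagram correspond to a snapshot-then-write-then-log sequence. A minimal sketch of that pattern follows, assuming a flat snapshot directory and a simple JSONL schema. The function names, the snapshot naming scheme, and the log fields are hypothetical and do not describe the server's real on-disk layout.

```python
import json
import shutil
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def snapshot(path: Path, store: Path) -> Optional[Path]:
    """Copy the current file into `store` before it is modified."""
    if not path.exists():
        return None
    store.mkdir(parents=True, exist_ok=True)
    copy = store / f"{path.name}.{time.time_ns()}"
    shutil.copy2(path, copy)
    return copy


def audited_write(path: Path, content: str, store: Path, log: Path) -> None:
    """Snapshot, write, then append one JSONL audit entry."""
    snap = snapshot(path, store)
    path.write_text(content, encoding="utf-8")
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "op": "content_write",
        "path": str(path),
        "snapshot": str(snap) if snap else None,
    }
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def undo(path: Path, store: Path) -> bool:
    """Restore the most recent snapshot of `path`, if one exists."""
    snaps = sorted(store.glob(path.name + ".*"), key=lambda p: int(p.suffix[1:]))
    if not snaps:
        return False
    shutil.copy2(snaps[-1], path)
    snaps[-1].unlink()
    return True
```

Taking the snapshot before the write means a failed or unwanted mutation can always be rolled back to the last known state.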

Testing

pip install -e ".[dev]"
pytest

1403 tests passed · 100% line + branch coverage · 100% minimum threshold · parallel via xdist (-n auto)

HTML coverage report: htmlcov/index.html

Performance numbers and methodology: docs/BENCHMARK.md

Documentation

Document	Contents
USAGE.md	Workspace URIs, tool examples, practical recipes
CONFIG.md	All config fields with types, defaults, and notes
TOOLS.md	Full reference for all 62 tools
ARCHITECTURE.md	Module map, request flow, versioning, safety
BENCHMARK.md	Performance baselines and methodology
profiles/	Six pre-tuned config profiles (personal, developer, research, ...)
AGENTS.md	Contributor guide: conventions, dev workflow, PR process

Contributing

git clone https://github.com/firdausmntp/Dokumen-Pintar.git
cd Dokumen-Pintar
pip install -e ".[dev]"

ruff check src/             # lint
mypy src/dokumen_pintar/    # type check
pytest                      # test + coverage

PRs welcome. All tests must pass and coverage must stay at 100%. Read AGENTS.md before submitting; it covers conventions, the handler protocol, and how to add a new format or tool.

License

MIT — 2026 firdausmntp

Built by firdausmntp
