Dokumen-Pintar
62 format-aware tools for AI to read and edit Word, Excel, and PDF files with precision — plus academic document linting for Indonesian standards.
Universal MCP server for cross-format document CRUD
Read, write, search, and manage text, Office, and PDF files
from any AI agent that supports the Model Context Protocol.
Why Dokumen-Pintar
Most filesystem MCP servers stop at "read this file, write that file." Dokumen-Pintar treats documents as structured data the AI can navigate. An agent can update a paragraph by index, set a cell by Sheet1!B2, walk a JSON tree with JSONPath, or generate a DOCX from Markdown, and it works the same way across every supported format.
Every mutating tool snapshots the file before changing it. Every action lands in an audit log. Every path is sandboxed to the roots you opt into. The default config is sensible; the profiles cover the rest.
Features
| Feature | Description |
|---|---|
| Multi-root Sandbox | Define multiple workspace roots with per-root … |
| 10 Formats | Plain text, Markdown, LaTeX, JSON / YAML, CSV / TSV, XML / SVG, DOCX, XLSX, PPTX, PDF. |
| 62 MCP Tools | File & content CRUD, structured access, batch operations, search, versioning, metadata, authoring, image extraction, sections, templates, TOC, bibliography, document compare, lint; all exposed as callable tools. |
| Authoring | Generate DOCX or PDF from a JSON spec or Markdown source via … |
| Structured Access | JSONPath for JSON / YAML, XPath for XML, cell / range / sheet for XLSX, paragraph / table for DOCX, slide for PPTX, page for PDF. |
| Automatic Versioning | Copy-on-write snapshots on every write. Undo, diff, restore, and purge anytime. |
| Metadata Layer | Read, write, delete, or strip EXIF, OOXML core properties, and PDF docinfo through one consistent API. |
| Audit Trail | Every mutation logged to JSONL with timestamp and operation details. |
| 2 Transports | stdio (Claude Desktop, Cursor, VS Code, Windsurf) and HTTP / SSE. |
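To make the versioning and audit-trail features concrete, here is a minimal Python sketch of the general pattern: copy the file aside, write the change, append one JSON line describing it. It is not the project's implementation. The function names, the snapshot file naming, and the audit entry fields are all assumptions made for illustration; only the .mcpdocs directory name and the JSONL format come from this README.

```python
import json
import shutil
import tempfile
import time
from pathlib import Path
from typing import Optional


def write_with_snapshot(target: Path, new_text: str,
                        snapshot_dir: Path, audit_log: Path) -> Optional[Path]:
    """Snapshot `target` if it exists, write the new text, then log the mutation."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    snapshot = None
    if target.exists():
        # Copy the current bytes aside before touching the file.
        # The naming scheme here is hypothetical.
        snapshot = snapshot_dir / f"{target.name}.{time.time_ns()}.bak"
        shutil.copy2(target, snapshot)
    target.write_text(new_text, encoding="utf-8")
    # One JSON object per line (JSONL); the field names are hypothetical.
    entry = {
        "ts": time.time(),
        "op": "content_write",
        "path": str(target),
        "snapshot": str(snapshot) if snapshot else None,
    }
    with audit_log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return snapshot


def undo(target: Path, snapshot: Path) -> None:
    """Restore `target` from a previously taken snapshot."""
    shutil.copy2(snapshot, target)


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    doc = root / "todo.txt"
    doc.write_text("Hello World", encoding="utf-8")
    snap = write_with_snapshot(doc, "Hello Everyone",
                               root / ".mcpdocs", root / "audit.jsonl")
    after_write = doc.read_text(encoding="utf-8")
    undo(doc, snap)
    after_undo = doc.read_text(encoding="utf-8")
    log_lines = (root / "audit.jsonl").read_text(encoding="utf-8").splitlines()
```

Because the snapshot is taken before the write, an undo only needs the snapshot path, and the audit log stays append-only.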
Supported Formats
| Format | Read | Write | Structured Query | Search |
|---|---|---|---|---|
| Plain text / Markdown | ✅ | ✅ | — | ✅ |
| JSON | ✅ | ✅ | JSONPath $.key | ✅ |
| YAML | ✅ | ✅ | JSONPath $.key | ✅ |
| CSV / TSV | ✅ | ✅ | row:N · col:NAME · cell:row:N,col:NAME | ✅ |
| XML / SVG | ✅ | ✅ | XPath //node | ✅ |
| DOCX | ✅ | ✅ | paragraph:N · table:N | ✅ |
| XLSX | ✅ | ✅ | cell:Sheet!A1 · range: · sheet: | ✅ |
| PPTX | ✅ | ✅ | slide:N · slide_title:N | ✅ |
| PDF | ✅ | — | page:N · outline · metadata | ✅ |
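The structured-query column uses small address expressions such as cell:Sheet1!B2 or paragraph:N. As a rough illustration of how such expressions can be split into their parts, the following Python sketch parses two of the shapes from the table. The parse_expr function and CellRef type are hypothetical, not part of Dokumen-Pintar, and the sketch makes no claim about whether the server's indices are zero- or one-based.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CellRef:
    sheet: str
    column: str
    row: int


# cell:Sheet1!B2 -> sheet "Sheet1", column "B", row 2
_CELL = re.compile(r"^cell:(?P<sheet>[^!]+)!(?P<col>[A-Za-z]+)(?P<row>[1-9][0-9]*)$")
# paragraph:3, table:0, slide:2, page:10, row:5
_INDEXED = re.compile(r"^(?P<kind>paragraph|table|slide|page|row):(?P<n>[0-9]+)$")


def parse_expr(expr: str):
    """Return a CellRef or a (kind, index) tuple; raise ValueError otherwise."""
    m = _CELL.match(expr)
    if m:
        return CellRef(m["sheet"], m["col"].upper(), int(m["row"]))
    m = _INDEXED.match(expr)
    if m:
        return (m["kind"], int(m["n"]))
    raise ValueError(f"unsupported expression: {expr!r}")


cell = parse_expr("cell:Sheet1!B2")
para = parse_expr("paragraph:3")
```

JSONPath and XPath expressions are left out on purpose; in practice those would be handed to a dedicated library rather than a regular expression.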
Quick Start
1. Install
pip install dokumen-pintar
From source (development)
git clone https://github.com/firdausmntp/Dokumen-Pintar.git
cd Dokumen-Pintar
pip install -e ".[dev]"
With semantic search
pip install "dokumen-pintar[semantic]"
2. Create a Config
dokumen-pintar-init
Or create one manually:
{
"roots": [
{ "name": "documents", "path": "~/Documents", "writable": true },
{ "name": "projects", "path": "~/Projects", "writable": true }
]
}
All other fields are optional with sensible defaults. See docs/CONFIG.md.
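Before starting the server, a hand-written config can be sanity-checked with a few lines of Python. The sketch below only looks at the three fields shown above (name, path, writable). The check_roots helper and its rules, including the uniqueness check on root names, are assumptions for illustration; dokumen-pintar-doctor remains the authoritative check.

```python
import json
from typing import List

CONFIG_TEXT = """
{
  "roots": [
    { "name": "documents", "path": "~/Documents", "writable": true },
    { "name": "projects", "path": "~/Projects", "writable": true }
  ]
}
"""


def check_roots(config: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    roots = config.get("roots")
    if not isinstance(roots, list) or not roots:
        return ["'roots' must be a non-empty list"]
    problems = []
    seen = set()
    for i, root in enumerate(roots):
        if not isinstance(root, dict):
            problems.append(f"roots[{i}]: expected an object")
            continue
        for key in ("name", "path"):
            if not isinstance(root.get(key), str) or not root[key]:
                problems.append(f"roots[{i}]: missing or empty '{key}'")
        if "writable" in root and not isinstance(root["writable"], bool):
            problems.append(f"roots[{i}]: 'writable' must be true or false")
        name = root.get("name")
        if name in seen:
            # Assumed rule: root names should be unique.
            problems.append(f"roots[{i}]: duplicate name {name!r}")
        seen.add(name)
    return problems


problems = check_roots(json.loads(CONFIG_TEXT))
```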
3. Run
dokumen-pintar --config dokumen-pintar.config.json
Ad-hoc roots without a config file
Override or replace config roots from the command line — handy for one-off sessions or scripting:
# Single writable root, no config file required
dokumen-pintar --root docs:/path/to/folder
# Multiple roots, mix read-only and writable, choose stdio transport
dokumen-pintar \
--root project:/repo:rw \
--root refs:/library:ro \
--transport stdio
# Force every root to read-only (overrides config + --root)
dokumen-pintar --config myconfig.json --read-only
# Path-only shorthand (root name derived from basename)
dokumen-pintar --root /home/me/Documents
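The --root values above follow three shapes: name:path, name:path:rw or name:path:ro, and a bare path. A small Python sketch of how such a value could be split is shown here for scripting purposes. It is not the server's parser: parse_root and RootSpec are hypothetical names, POSIX-style paths are assumed, and treating a bare path as writable is an assumption, since the examples only state that docs:/path/to/folder is writable.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class RootSpec(NamedTuple):
    name: str
    path: str
    writable: bool


def parse_root(spec: str) -> RootSpec:
    """Split a --root value into name, path, and access mode."""
    writable = True  # a root without a suffix is treated as writable
    if spec.endswith((":rw", ":ro")):
        writable = spec.endswith(":rw")
        spec = spec[:-3]
    name, sep, path = spec.partition(":")
    if not sep:
        # Path-only shorthand: derive the root name from the basename.
        path = spec
        name = PurePosixPath(path).name
    return RootSpec(name, path, writable)


specs = [parse_root(s) for s in (
    "docs:/path/to/folder",
    "project:/repo:rw",
    "refs:/library:ro",
    "/home/me/Documents",
)]
```

Stripping the mode suffix first keeps the remaining split unambiguous, because the path itself never needs to be scanned for a second colon.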
Health check
dokumen-pintar-doctor --config dokumen-pintar.config.json
Verifies config validity, root existence, .mcpdocs snapshot writability,
registered handlers, and optional semantic-search dependencies.
4. Connect to an AI Client
Claude Desktop
Add to your claude_desktop_config.json:
{
"mcpServers": {
"dokumen-pintar": {
"command": "dokumen-pintar",
"args": ["--config", "/path/to/dokumen-pintar.config.json"]
}
}
}
Cursor / VS Code / Windsurf
Use the same stdio transport. Point your IDE's MCP settings to the dokumen-pintar command and config path.
HTTP/SSE (remote or multi-client)
{
"transport": {
"stdio": false,
"http": { "enabled": true, "port": 7878 }
}
}
Start the server and connect your client to http://127.0.0.1:7878.
Usage Examples
# List available workspace roots
workspace_list_roots()
# Read a Word document
content_read(path="documents:/reports/q1.docx")
# Create a new file
file_create(path="documents:/notes/todo.txt", content="Hello World")
# Find & replace inside a file
content_replace(path="documents:/notes/todo.txt", old="World", new="Everyone")
# Full-text search across all PDFs
search_content(query="budget 2024", format="pdf")
# Read an Excel cell
structured_get(path="documents:/data.xlsx", expr="cell:Sheet1!B2")
# Update a JSON key
structured_set(path="documents:/config.json", expr="$.database.port", value=5432)
# Delete an XML node
structured_delete(path="documents:/data.xml", expr="//item[@id='old']")
# Batch rename (dry-run first)
batch_rename(glob="*.txt", pattern="draft_", replacement="final_", dry_run=true)
# Undo last change
version_undo(path="documents:/reports/q1.docx")
Full guide with recipes: docs/USAGE.md
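The calls above are written in shorthand. On the wire, an MCP client sends each one as a JSON-RPC 2.0 request with the method tools/call. The Python sketch below builds that message for the content_replace example. The tool_call helper is hypothetical, and a real client library would also handle the initialization handshake and the transport, which are not shown here.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids


def tool_call(name: str, **arguments) -> dict:
    """Build an MCP tools/call request for the given tool name and arguments."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call(
    "content_replace",
    path="documents:/notes/todo.txt",
    old="World",
    new="Everyone",
)
wire = json.dumps(request)  # the string a stdio client would write, one message per line
```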
Tools Overview
62 MCP tools organized by category:
| Category | Tools |
|---|---|
| Workspace | workspace_list_roots · workspace_stat · workspace_tree · workspace_diagnose |
| File CRUD | file_create · file_delete · file_rename · file_copy · file_move |
| Content | content_read · content_write · content_append · content_insert · content_replace · content_delete_range · content_patch · content_diff |
| Structured | struct_get · struct_set · struct_delete · struct_meta |
| Metadata | metadata_read · metadata_write · metadata_delete · metadata_strip · metadata_read_batch |
| Authoring | validate_spec · compose_docx · compose_pdf · compose_from_markdown · compose_to_markdown |
| Sections | section_extract · section_merge |
| Images | image_list · image_extract · image_extract_all · image_replace |
| Templates | template_list · template_install · template_render · template_render_named |
| TOC & Bibliography | toc_generate · bibliography_check · bibliography_format |
| Compare & Lint | document_compare · document_lint · document_lint_fix |
| Batch | batch_rename · batch_replace_content · batch_replace_structured · batch_delete |
| Search | search_filename · search_content · search_in_format |
| Versioning | version_list · version_diff · version_restore · version_undo · version_purge |
| Semantic * | search_semantic · semantic_index_path · semantic_stats |
* Only registered when semantic_search.enabled = true and the [semantic] extra is installed.
Full parameter reference: docs/TOOLS.md
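The footnote on the Semantic row describes a two-part condition: the config flag must be on and the optional dependencies must be importable. One common way to express that kind of gate in Python is sketched below. The function name is hypothetical, the probe module is a parameter because this README does not name the packages behind the [semantic] extra, and treating a missing semantic_search section as disabled is an assumption.

```python
import importlib.util


def semantic_tools_enabled(config: dict, probe_module: str) -> bool:
    """True only if the config flag is set and `probe_module` can be imported."""
    enabled = bool(config.get("semantic_search", {}).get("enabled", False))
    # find_spec returns None for a top-level module that is not installed.
    return enabled and importlib.util.find_spec(probe_module) is not None


on = {"semantic_search": {"enabled": True}}
off = {"semantic_search": {"enabled": False}}

# "json" stands in for an installed dependency; the second name should not exist.
with_dependency = semantic_tools_enabled(on, "json")
without_dependency = semantic_tools_enabled(on, "no_such_module_xyz_123")
flag_disabled = semantic_tools_enabled(off, "json")
```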
Architecture
flowchart TD
Client["AI Client\n(Claude, Cursor, VS Code, ...)"]
Client -->|"MCP protocol\n(stdio or HTTP/SSE)"| Server
subgraph Server["dokumen-pintar server"]
PG["PathGuard\nsandboxed multi-root"]
H["Handlers\n9 format parsers"]
V["Versions\ncopy-on-write snapshots"]
A["AuditLog\nJSONL mutation log"]
S["Search\nfilename + content"]
SE["Semantic\nvector index (optional)"]
end
Server --> FS["Filesystem\n(sandboxed workspace roots)"]
Full details: docs/ARCHITECTURE.md
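The PathGuard box in the diagram is where workspace paths such as documents:/reports/q1.docx get mapped to real files. As an illustration of the general technique (resolve the path, then verify it is still under the root), here is a minimal Python sketch. It is not the project's PathGuard: resolve_uri and SandboxError are hypothetical names, and the root:/relative/path shape is inferred from the usage examples.

```python
import tempfile
from pathlib import Path


class SandboxError(ValueError):
    """Raised when a workspace path names an unknown root or escapes its root."""


def resolve_uri(uri: str, roots: dict) -> Path:
    """Map 'root:/relative/path' to an absolute path inside that root."""
    name, sep, rel = uri.partition(":/")
    if not sep or name not in roots:
        raise SandboxError(f"unknown root in {uri!r}")
    base = Path(roots[name]).expanduser().resolve()
    # resolve() collapses '..' segments and follows existing symlinks.
    candidate = (base / rel).resolve()
    if candidate != base and base not in candidate.parents:
        raise SandboxError(f"{uri!r} escapes root {name!r}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    roots = {"documents": tmp}
    resolved = resolve_uri("documents:/reports/q1.docx", roots)
    inside = resolved == Path(tmp).resolve() / "reports" / "q1.docx"
    try:
        resolve_uri("documents:/../outside.txt", roots)
        escaped = True
    except SandboxError:
        escaped = False
```

Checking the resolved path rather than the raw string is what catches traversal attempts such as ../ segments or an absolute path smuggled in after the root name.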
Testing
pip install -e ".[dev]"
pytest
| Metric | Value |
|---|---|
| Tests passed | 1403 |
| Line + branch coverage | 100% |
| Minimum threshold | 100% |
| Parallel via xdist | -n auto |
HTML coverage report: htmlcov/index.html
Performance numbers and methodology: docs/BENCHMARK.md
Documentation
| Document | Contents |
|---|---|
| USAGE.md | Workspace URIs, tool examples, practical recipes |
| CONFIG.md | All config fields with types, defaults, and notes |
| TOOLS.md | Full reference for all 62 tools |
| ARCHITECTURE.md | Module map, request flow, versioning, safety |
| BENCHMARK.md | Performance baselines and methodology |
| profiles/ | Six pre-tuned config profiles (personal, developer, research, ...) |
| AGENTS.md | Contributor guide: conventions, dev workflow, PR process |
Contributing
git clone https://github.com/firdausmntp/Dokumen-Pintar.git
cd Dokumen-Pintar
pip install -e ".[dev]"
ruff check src/ # lint
mypy src/dokumen_pintar/ # type check
pytest # test + coverage
PRs welcome. All tests must pass and coverage must stay at 100%. Read AGENTS.md before submitting; it covers conventions, the handler protocol, and how to add a new format or tool.
License
MIT — 2026 firdausmntp
Built by firdausmntp