Dokumen-Pintar

62 format-aware tools for AI to read and edit Word, Excel, and PDF files with precision — plus academic document linting for Indonesian standards.

Universal MCP server for cross-format document CRUD

Read, write, search, and manage text, Office, and PDF files
from any AI agent that supports the Model Context Protocol.

PyPI · Python 3.10+ · MIT License · 1403 tests passed · 100% coverage

Read in Indonesian


Why Dokumen-Pintar

Most filesystem MCP servers stop at "read this file, write that file." Dokumen-Pintar treats documents as structured data the AI can navigate. Tell an agent to update a paragraph by index, set a cell by Sheet1!B2, walk a JSON tree with JSONPath, or generate a DOCX from Markdown, and it works the same way across every supported format.

Every mutating tool snapshots the file first. Every action lands in an audit log. Every path is sandboxed to roots you opt into. The default config is sensible; the profiles cover the rest.


Features

Multi-root Sandbox — Define multiple workspace roots with per-root writable control. All paths outside the sandbox are rejected.
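
To make the sandbox rule concrete, here is a minimal sketch of how a `root:/relative/path` URI could be checked against a set of roots. It is not the project's PathGuard implementation; the `ROOTS` table, the `resolve` function, and the example paths are all hypothetical.

```python
from pathlib import Path

# Hypothetical root table; in practice these values would come from the config file.
ROOTS = {
    "documents": {"path": Path("/home/me/Documents"), "writable": True},
    "refs": {"path": Path("/home/me/Library"), "writable": False},
}


def resolve(uri: str, *, for_write: bool = False) -> Path:
    """Map 'root:/relative/path' to a filesystem path, rejecting sandbox escapes."""
    name, sep, rel = uri.partition(":/")
    if not sep or name not in ROOTS:
        raise PermissionError(f"unknown root in {uri!r}")
    root = ROOTS[name]
    base = root["path"].resolve()
    target = (base / rel).resolve()
    # After '..' segments and symlinks are resolved, the target must still sit under the root.
    if not target.is_relative_to(base):
        raise PermissionError(f"{uri!r} escapes root {name!r}")
    if for_write and not root["writable"]:
        raise PermissionError(f"root {name!r} is read-only")
    return target
```

Resolving both the root and the target before comparing them is what catches `..` segments and symlinks that point outside the workspace.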

10 Formats — Plain text, Markdown, LaTeX, JSON / YAML, CSV / TSV, XML / SVG, DOCX, XLSX, PPTX, PDF.

62 MCP Tools - File & content CRUD, structured access, batch operations, search, versioning, metadata, authoring, image extraction, sections, templates, TOC, bibliography, document compare, lint - all exposed as callable tools.

Authoring — Generate DOCX or PDF from a JSON spec or Markdown source via compose_docx / compose_pdf / compose_from_markdown.

Structured Access — JSONPath for JSON / YAML, XPath for XML, cell / range / sheet for XLSX, paragraph / table for DOCX, slide for PPTX, page for PDF.

Automatic Versioning — Copy-on-write snapshots on every write. Undo, diff, restore, and purge anytime.
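
The snapshot-then-write idea can be sketched in a few lines. This is an illustration only, assuming a plain copy-before-write into a snapshot directory; the `snapshot` and `undo` helpers and the file naming scheme are hypothetical, not the server's actual storage layout.

```python
import shutil
import tempfile
import time
from pathlib import Path


def snapshot(path: Path, store: Path) -> Path:
    """Copy `path` into `store` under a timestamped name before a write."""
    store.mkdir(parents=True, exist_ok=True)
    snap = store / f"{path.name}.{time.time_ns()}.bak"
    shutil.copy2(path, snap)  # copy2 also preserves file timestamps
    return snap


def undo(path: Path, store: Path) -> None:
    """Restore the most recent snapshot of `path`."""
    snaps = sorted(
        store.glob(f"{path.name}.*.bak"),
        key=lambda p: int(p.name.rsplit(".", 2)[1]),  # order by embedded timestamp
    )
    if not snaps:
        raise FileNotFoundError(f"no snapshots for {path.name}")
    shutil.copy2(snaps[-1], path)


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "todo.txt"
    doc.write_text("Hello World")
    snapshot(doc, Path(tmp) / ".mcpdocs")  # snapshot first ...
    doc.write_text("Hello Everyone")       # ... then mutate
    undo(doc, Path(tmp) / ".mcpdocs")
    restored = doc.read_text()
```

After the round trip, `restored` holds the pre-write content again.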

Metadata Layer — Read, write, delete, or strip EXIF, OOXML core properties, and PDF docinfo through one consistent API.

Audit Trail — Every mutation logged to JSONL with timestamp and operation details.
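
A JSONL audit log is just one JSON object per line, appended on every mutation. The sketch below shows the general shape; the field names `ts` and `op` and the `audit` helper are assumptions for illustration, not the server's real log schema.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def audit(log: Path, operation: str, **details) -> None:
    """Append one mutation record to a JSONL file."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),  # timezone-aware UTC timestamp
        "op": operation,
        **details,
    }
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, ensure_ascii=False) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "audit.jsonl"
    audit(log_file, "content_replace", path="documents:/notes/todo.txt")
    audit(log_file, "file_delete", path="documents:/notes/old.txt")
    entries = [json.loads(line) for line in log_file.read_text(encoding="utf-8").splitlines()]
```

Because each line is an independent JSON document, the log can be tailed, filtered, or truncated without parsing the whole file.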

2 Transports — stdio (Claude Desktop, Cursor, VS Code, Windsurf) and HTTP / SSE.


Supported Formats

Format | Read | Write | Structured Query | Search
Plain text / Markdown |  |  |  |
JSON |  |  | JSONPath $.key |
YAML |  |  | JSONPath $.key |
CSV / TSV |  |  | row:N · col:NAME · cell:row:N,col:NAME |
XML / SVG |  |  | XPath //node |
DOCX |  |  | paragraph:N · table:N |
XLSX |  |  | cell:Sheet!A1 · range: · sheet: |
PPTX |  |  | slide:N · slide_title:N |
PDF |  |  | page:N · outline · metadata |
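
As an illustration of the structured-query expressions in the table, here is one way the `cell:Sheet!A1` form could be parsed. The grammar is an assumption (uppercase column letters, no `!` in sheet names), and `CellRef`, `parse_cell`, and `column_index` are hypothetical names.

```python
import re
from typing import NamedTuple


class CellRef(NamedTuple):
    sheet: str
    column: str
    row: int


# Assumed grammar: cell:<sheet>!<COLUMN LETTERS><row number>
_CELL = re.compile(r"^cell:(?P<sheet>[^!]+)!(?P<col>[A-Z]+)(?P<row>[1-9][0-9]*)$")


def parse_cell(expr: str) -> CellRef:
    """Split an expression such as 'cell:Sheet1!B2' into sheet, column, and row."""
    match = _CELL.match(expr)
    if match is None:
        raise ValueError(f"not a cell expression: {expr!r}")
    return CellRef(match["sheet"], match["col"], int(match["row"]))


def column_index(column: str) -> int:
    """Convert spreadsheet column letters to a 1-based index (A is 1, AA is 27)."""
    index = 0
    for ch in column:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


ref = parse_cell("cell:Sheet1!B2")
```

Here `ref` is `CellRef(sheet="Sheet1", column="B", row=2)`, and `column_index(ref.column)` gives the numeric column a spreadsheet library would expect.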

Quick Start

1. Install

pip install dokumen-pintar

From source (development)

git clone https://github.com/firdausmntp/Dokumen-Pintar.git
cd Dokumen-Pintar
pip install -e ".[dev]"

With semantic search

pip install "dokumen-pintar[semantic]"

2. Create a Config

dokumen-pintar-init

Or create one manually:

{
  "roots": [
    { "name": "documents", "path": "~/Documents", "writable": true },
    { "name": "projects",  "path": "~/Projects",  "writable": true }
  ]
}

All other fields are optional with sensible defaults. See docs/CONFIG.md.
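
If you generate or validate the config from a script, the `roots` list shown above is simple to check. The sketch below is a hypothetical validator, not the server's loader; in particular, treating a missing `writable` key as read-only is an assumption, so consult docs/CONFIG.md for the real defaults.

```python
import json
from pathlib import Path


def load_roots(config_text: str) -> dict[str, dict]:
    """Parse the 'roots' list from config JSON into a name -> settings mapping."""
    config = json.loads(config_text)
    roots: dict[str, dict] = {}
    for entry in config.get("roots", []):
        name = entry["name"]
        if name in roots:
            raise ValueError(f"duplicate root name: {name!r}")
        roots[name] = {
            "path": Path(entry["path"]).expanduser(),  # expand a leading '~'
            # Assumption: a root without 'writable' is treated as read-only.
            "writable": bool(entry.get("writable", False)),
        }
    if not roots:
        raise ValueError("config must define at least one root")
    return roots


example = """
{
  "roots": [
    { "name": "documents", "path": "~/Documents", "writable": true },
    { "name": "projects",  "path": "~/Projects",  "writable": true }
  ]
}
"""
roots = load_roots(example)
```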

3. Run

dokumen-pintar --config dokumen-pintar.config.json

Ad-hoc roots without a config file

Override or replace config roots from the command line — handy for one-off sessions or scripting:

# Single writable root, no config file required
dokumen-pintar --root docs:/path/to/folder

# Multiple roots, mix read-only and writable, choose stdio transport
dokumen-pintar \
  --root project:/repo:rw \
  --root refs:/library:ro \
  --transport stdio

# Force every root to read-only (overrides config + --root)
dokumen-pintar --config myconfig.json --read-only

# Path-only shorthand (root name derived from basename)
dokumen-pintar --root /home/me/Documents
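
The `--root` forms above (`name:path`, `name:path:rw`, `name:path:ro`, and a bare path) follow a small grammar. Here is a rough sketch of a parser for it, assuming POSIX-style paths; `RootSpec` and `parse_root` are hypothetical names, and the real CLI may handle edge cases such as Windows drive letters differently.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class RootSpec(NamedTuple):
    name: str
    path: str
    writable: bool


def parse_root(spec: str) -> RootSpec:
    """Parse 'name:path[:rw|:ro]' or a bare path into a RootSpec."""
    writable = True  # the examples above describe a plain name:path root as writable
    if spec.endswith((":rw", ":ro")):
        spec, mode = spec[:-3], spec[-2:]
        writable = mode == "rw"
    name, sep, path = spec.partition(":")
    if not sep or not name or "/" in name:
        # Path-only shorthand: derive the root name from the basename.
        path = spec
        name = PurePosixPath(spec).name
    if not name or not path:
        raise ValueError(f"cannot parse root spec: {spec!r}")
    return RootSpec(name, path, writable)
```

For example, `parse_root("refs:/library:ro")` yields a read-only root named `refs`, and `parse_root("/home/me/Documents")` yields a writable root named `Documents`.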

Health check

dokumen-pintar-doctor --config dokumen-pintar.config.json

Verifies config validity, root existence, .mcpdocs snapshot writability, registered handlers, and optional semantic-search dependencies.

4. Connect to an AI Client

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "dokumen-pintar": {
      "command": "dokumen-pintar",
      "args": ["--config", "/path/to/dokumen-pintar.config.json"]
    }
  }
}
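
If claude_desktop_config.json already lists other servers, the entry should be merged in rather than pasted over the file. A small sketch of that merge follows; `add_server` is a hypothetical helper, and the `other` server in the example is made up for illustration.

```python
import json


def add_server(config_text: str, config_path: str) -> str:
    """Return config JSON with a dokumen-pintar entry added, keeping existing servers."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["dokumen-pintar"] = {
        "command": "dokumen-pintar",
        "args": ["--config", config_path],
    }
    return json.dumps(config, indent=2)


# Hypothetical existing config with one unrelated server already registered.
existing = '{"mcpServers": {"other": {"command": "other-server", "args": []}}}'
merged = json.loads(add_server(existing, "/path/to/dokumen-pintar.config.json"))
```

The existing `other` entry survives the merge, and an empty file is treated as an empty config.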

Cursor / VS Code / Windsurf

Use the same stdio transport. Point your IDE's MCP settings to the dokumen-pintar command and config path.

HTTP/SSE (remote or multi-client)

{
  "transport": {
    "stdio": false,
    "http": { "enabled": true, "port": 7878 }
  }
}

Start the server and connect your client to http://127.0.0.1:7878.


Usage Examples

# List available workspace roots
workspace_list_roots()

# Read a Word document
content_read(path="documents:/reports/q1.docx")

# Create a new file
file_create(path="documents:/notes/todo.txt", content="Hello World")

# Find & replace inside a file
content_replace(path="documents:/notes/todo.txt", old="World", new="Everyone")

# Full-text search across all PDFs
search_content(query="budget 2024", format="pdf")

# Read an Excel cell
struct_get(path="documents:/data.xlsx", expr="cell:Sheet1!B2")

# Update a JSON key
struct_set(path="documents:/config.json", expr="$.database.port", value=5432)

# Delete an XML node
struct_delete(path="documents:/data.xml", expr="//item[@id='old']")

# Batch rename (dry-run first)
batch_rename(glob="*.txt", pattern="draft_", replacement="final_", dry_run=true)

# Undo last change
version_undo(path="documents:/reports/q1.docx")
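
The calls above are written in shorthand. On the wire, an MCP client sends each one as a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request in Python; the `tool_call` helper is hypothetical, and the argument names are copied from the examples above rather than from a verified schema.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; any unique value per request works


def tool_call(name: str, **arguments) -> str:
    """Serialize one MCP tool invocation as a JSON-RPC 2.0 request."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


payload = json.loads(
    tool_call("content_replace", path="documents:/notes/todo.txt", old="World", new="Everyone")
)
```

In practice an MCP client library handles this framing for you; the sketch only shows what travels over stdio or HTTP.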

Full guide with recipes: docs/USAGE.md


Tools Overview

62 MCP tools organized by category:

Category | Tools
Workspace | workspace_list_roots · workspace_stat · workspace_tree · workspace_diagnose
File CRUD | file_create · file_delete · file_rename · file_copy · file_move
Content | content_read · content_write · content_append · content_insert · content_replace · content_delete_range · content_patch · content_diff
Structured | struct_get · struct_set · struct_delete · struct_meta
Metadata | metadata_read · metadata_write · metadata_delete · metadata_strip · metadata_read_batch
Authoring | validate_spec · compose_docx · compose_pdf · compose_from_markdown · compose_to_markdown
Sections | section_extract · section_merge
Images | image_list · image_extract · image_extract_all · image_replace
Templates | template_list · template_install · template_render · template_render_named
TOC & Bibliography | toc_generate · bibliography_check · bibliography_format
Compare & Lint | document_compare · document_lint · document_lint_fix
Batch | batch_rename · batch_replace_content · batch_replace_structured · batch_delete
Search | search_filename · search_content · search_in_format
Versioning | version_list · version_diff · version_restore · version_undo · version_purge
Semantic * | search_semantic · semantic_index_path · semantic_stats

* Only registered when semantic_search.enabled = true and the [semantic] extra is installed.

Full parameter reference: docs/TOOLS.md


Architecture

flowchart TD
    Client["AI Client\n(Claude, Cursor, VS Code, ...)"]
    Client -->|"MCP protocol\n(stdio or HTTP/SSE)"| Server

    subgraph Server["dokumen-pintar server"]
        PG["PathGuard\nsandboxed multi-root"]
        H["Handlers\n9 format parsers"]
        V["Versions\ncopy-on-write snapshots"]
        A["AuditLog\nJSONL mutation log"]
        S["Search\nfilename + content"]
        SE["Semantic\nvector index (optional)"]
    end

    Server --> FS["Filesystem\n(sandboxed workspace roots)"]

Full details: docs/ARCHITECTURE.md


Testing

pip install -e ".[dev]"
pytest

Tests passed: 1403
Line + branch coverage: 100%
Minimum threshold: 100%
Parallel via xdist: -n auto

HTML coverage report: htmlcov/index.html

Performance numbers and methodology: docs/BENCHMARK.md


Documentation

Document | Contents
USAGE.md | Workspace URIs, tool examples, practical recipes
CONFIG.md | All config fields with types, defaults, and notes
TOOLS.md | Full reference for all 62 tools
ARCHITECTURE.md | Module map, request flow, versioning, safety
BENCHMARK.md | Performance baselines and methodology
profiles/ | Six pre-tuned config profiles (personal, developer, research, ...)
AGENTS.md | Contributor guide: conventions, dev workflow, PR process

Contributing

git clone https://github.com/firdausmntp/Dokumen-Pintar.git
cd Dokumen-Pintar
pip install -e ".[dev]"

ruff check src/             # lint
mypy src/dokumen_pintar/    # type check
pytest                      # test + coverage

PRs welcome. All tests must pass and coverage must stay at 100%. Read AGENTS.md before submitting; it covers conventions, the handler protocol, and how to add a new format or tool.


License

MIT — 2026 firdausmntp


Built by firdausmntp
