KnowledgeBaseMCP
Extract text content from local PDF, DOCX, and PPTX files to build a knowledge base.
A Model Context Protocol (MCP) server that extracts text from PDF, DOCX, PPTX, and XLSX files. It lets AI assistants such as Claude read and analyze documents in your local knowledge base, and create new Word documents and Excel spreadsheets.
Features
Document Reading
- Multi-format support: Extract text from PDF, DOCX, PPTX, and XLSX files
- Directory processing: Process entire directories of documents
- Recursive scanning: Optionally scan subdirectories
- File metadata: Get detailed information about document files
- Error handling: Robust error handling with clear error messages
- Async processing: Efficient asynchronous document processing
Excel Spreadsheet Creation
- XLSX workbook creation: Create Excel files with multiple sheets
- DataFrame support: Convert pandas DataFrames to Excel
- Data formatting: Apply professional formatting and styling
- Report generation: Create structured reports with summaries
- Data appending: Add data to existing Excel files
- Template support: Use predefined templates for consistent formatting
Integration
- Easy integration: Simple setup with Claude Desktop
- MCP protocol: Built on the Model Context Protocol standard
Supported File Types
Reading Support
- PDF (.pdf) - Portable Document Format (using pdfplumber)
- DOCX (.docx) - Microsoft Word documents
- PPTX (.pptx) - Microsoft PowerPoint presentations
- XLSX (.xlsx) - Microsoft Excel spreadsheets (using openpyxl and pandas)
Writing Support
- DOCX (.docx) - Create Word documents with formatting
- XLSX (.xlsx) - Create Excel workbooks with multiple sheets, formatting, and charts
Installation
Prerequisites
- Python 3.8 or higher
- Claude Desktop application
Setup
- Clone the repository
git clone https://github.com/mehmetozcan-zz/KnowledgeBaseMCP.git
cd KnowledgeBaseMCP
- Install dependencies
pip install -r requirements.txt
- Test the server
python test.py
Configuration
Claude Desktop Integration
Add this server to your Claude Desktop configuration file:
Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "knowledgebase": {
      "command": "python",
      "args": ["path/to/KnowledgeBaseMCP/launch_mcp.py"]
    }
  }
}
Replace path/to/KnowledgeBaseMCP with your actual installation path.
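If you prefer not to edit the JSON by hand, the entry shown above can also be merged into the config file with a short script. The following is a minimal sketch, not part of this project: the function name add_server_entry is hypothetical, and it assumes the config file either does not exist yet or contains a JSON object.

```python
import json
from pathlib import Path


def add_server_entry(config_path: Path, launcher_path: str) -> dict:
    """Merge a "knowledgebase" entry into a Claude Desktop config file.

    Hypothetical helper for illustration; it is not shipped with the project.
    """
    # Start from the existing config, if any, so other servers are preserved.
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}

    # Add or overwrite only the "knowledgebase" entry.
    servers = config.setdefault("mcpServers", {})
    servers["knowledgebase"] = {"command": "python", "args": [launcher_path]}

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Call it with the config path for your platform and the absolute path to launch_mcp.py, then restart Claude Desktop so the change is picked up.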
Usage
Once configured, you can use these tools in Claude:
Available Tools
Document Reading Tools
extract_text_from_file
Extract text content from a single document file.
Parameters:
- file_path (string): Path to the document file
extract_text_from_directory
Extract text content from all supported documents in a directory.
Parameters:
- directory_path (string): Path to the directory containing documents
- recursive (boolean, optional): Whether to search subdirectories recursively
list_supported_files
List all supported document files in a directory with metadata.
Parameters:
- directory_path (string): Path to the directory to scan
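To make the directory tools more concrete, here is a minimal sketch of how a scan for supported files might work, with and without recursion. It is an illustration only: the function name find_supported_files, the constant READABLE_EXTENSIONS, and the metadata fields are assumptions, not the server's actual implementation or output format.

```python
from datetime import datetime, timezone
from pathlib import Path

# Extensions from the "Reading Support" list; the constant name is illustrative.
READABLE_EXTENSIONS = {".pdf", ".docx", ".pptx", ".xlsx"}


def find_supported_files(directory_path: str, recursive: bool = False) -> list:
    """Return basic metadata for each supported document in a directory."""
    root = Path(directory_path)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {directory_path}")

    # "**/*" walks subdirectories as well; "*" stays at the top level.
    pattern = "**/*" if recursive else "*"
    results = []
    for path in sorted(root.glob(pattern)):
        if path.is_file() and path.suffix.lower() in READABLE_EXTENSIONS:
            stat = path.stat()
            results.append({
                "path": str(path),
                "extension": path.suffix.lower(),
                "size_bytes": stat.st_size,
                "modified": datetime.fromtimestamp(
                    stat.st_mtime, tz=timezone.utc
                ).isoformat(),
            })
    return results
```

Matching on the lower-cased suffix means files such as REPORT.PDF are picked up as well.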
DOCX Creation Tools
create_docx_document
Create a new Word document with text content.
Parameters:
- content (string): Document content
- file_path (string): Output file path (.docx extension)
- title (string, optional): Document title
create_structured_report
Create a structured Word report with formatting.
Parameters:
- report_data (object): Report data structure
- file_path (string): Output file path (.docx extension)
XLSX Creation Tools
create_xlsx_workbook
Create a new Excel workbook with multiple sheets.
Parameters:
- data (object): Dictionary with sheet names as keys and data as values
- file_path (string): Output file path (.xlsx extension)
- apply_formatting (boolean, optional): Apply default formatting
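As an illustration of the parameters above, the sketch below builds an argument payload for create_xlsx_workbook. The README does not specify how each sheet's data is laid out, so the list-of-rows layout (header row first) is an assumption, the helper build_workbook_args is hypothetical, and the sheet contents are placeholder values.

```python
import json


def build_workbook_args(sheets: dict, file_path: str, apply_formatting: bool = True) -> dict:
    """Assemble arguments for create_xlsx_workbook (hypothetical helper)."""
    if not file_path.lower().endswith(".xlsx"):
        raise ValueError("file_path must end with .xlsx")
    if not sheets:
        raise ValueError("at least one sheet is required")
    return {
        "data": sheets,
        "file_path": file_path,
        "apply_formatting": apply_formatting,
    }


# Assumed layout: each sheet is a list of rows, with the header row first.
# All values here are placeholders for illustration.
example_args = build_workbook_args(
    {
        "Summary": [["Quarter", "Revenue"], ["Q1", 1200]],
        "Transactions": [["Date", "Amount"], ["2025-01-15", 450]],
    },
    "reports/q1_sales.xlsx",
)
print(json.dumps(example_args, indent=2))
```

In normal use Claude assembles these arguments itself from your request; the payload is shown only to clarify what the three parameters carry.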
create_xlsx_from_dataframe
Create Excel workbook from pandas DataFrames.
Parameters:
- dataframes (object): Dictionary with sheet names and DataFrame data
- file_path (string): Output file path (.xlsx extension)
- include_index (boolean, optional): Include DataFrame index
append_to_xlsx
Append data to existing Excel workbook.
Parameters:
- file_path (string): Path to existing XLSX file
- sheet_name (string): Target sheet name
- data (any): Data to append (list, dict, or DataFrame)
create_xlsx_report
Create a formatted Excel report with multiple sections.
Parameters:
- report_data (object): Report structure with title, description, and data sections
- file_path (string): Output file path (.xlsx extension)
Example Usage in Claude
Reading Documents
Please analyze all the documents in my Documents/Reports folder using your KnowledgeBaseMCP tools.
Creating Excel Reports
Create an Excel report with sales data for Q1 2025. Include a summary sheet and detailed transaction data.
Data Analysis and Export
Read the data from 'financial_report.xlsx' and create a new Excel file with a summary analysis.
Document Conversion
Extract content from all PDF files in my research folder and create a consolidated Excel workbook with the findings.
Claude will then use the MCP server to extract and analyze the content from your documents or create new Excel files as requested.
Project Structure
KnowledgeBaseMCP/
├── src/
│   ├── __init__.py        # Package initialization
│   ├── main.py            # Main MCP server
│   ├── extractors.py      # Document reading classes
│   ├── docx_writer.py     # Word document creation
│   └── xlsx_writer.py     # Excel spreadsheet creation
├── requirements.txt       # Python dependencies
├── setup.py               # Package setup
├── README.md              # This file
├── LICENSE                # MIT License
├── launch_mcp.py          # Server launcher
├── run_server.py          # Alternative launcher
├── test.py                # Basic test script
└── test_xlsx.py           # XLSX functionality tests
Development
Running Tests
python test.py
Adding New File Formats
To add support for additional document formats:
- Add the file extension to SUPPORTED_EXTENSIONS in extractors.py
- Install the required library
- Add the library check to check_dependencies()
- Implement the extraction method (e.g., _extract_xlsx())
- Add the format handling to extract_from_file()
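The steps above follow a common extension-dispatch pattern, sketched below with plain text as the new format. This is an illustration only: SUPPORTED_EXTENSIONS and extract_from_file are named in this README, but their real shape in extractors.py may differ (for example, SUPPORTED_EXTENSIONS may be a plain list rather than a mapping), and _extract_txt is a hypothetical handler.

```python
from pathlib import Path
from typing import Callable, Dict


def _extract_txt(path: Path) -> str:
    # Hypothetical handler for a new format; plain text needs no extra library.
    return path.read_text(encoding="utf-8")


# Assumed shape: each supported extension maps to the function that extracts it.
SUPPORTED_EXTENSIONS: Dict[str, Callable[[Path], str]] = {
    ".txt": _extract_txt,
}


def extract_from_file(file_path: str) -> str:
    """Dispatch to the handler registered for the file's extension."""
    path = Path(file_path)
    handler = SUPPORTED_EXTENSIONS.get(path.suffix.lower())
    if handler is None:
        raise ValueError(f"Unsupported file type: {path.suffix or '(none)'}")
    return handler(path)
```

With a mapping like this, supporting another format comes down to writing one handler and adding one dictionary entry.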
Debugging
For debugging MCP connection issues:
- Check Claude Desktop logs
- Ensure the server starts without errors: python launch_mcp.py
- Verify the config file path and format
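The last check can be partly automated. The sketch below parses the config file and reports common problems; the function name check_config is hypothetical, and it assumes the entry is named "knowledgebase" as in the configuration example and that the first item in "args" is the launcher script.

```python
import json
from pathlib import Path


def check_config(config_path: str, server_name: str = "knowledgebase") -> list:
    """Return a list of problems found; an empty list means none were detected."""
    path = Path(config_path)
    if not path.is_file():
        return [f"Config file not found: {path}"]
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"Config is not valid JSON: {exc}"]

    # Tolerate unexpected shapes instead of raising on them.
    servers = config.get("mcpServers") if isinstance(config, dict) else None
    entry = servers.get(server_name) if isinstance(servers, dict) else None
    if not isinstance(entry, dict):
        return [f'No "{server_name}" entry under "mcpServers"']

    problems = []
    if not entry.get("command"):
        problems.append('Entry has no "command"')
    args = entry.get("args") or []
    if not args:
        problems.append('Entry has no "args"')
    elif not Path(args[0]).is_file():
        # Assumes the first argument is the launcher script path.
        problems.append(f"Launcher script not found: {args[0]}")
    return problems
```

A relative path in "args" is resolved against the current working directory here, which may differ from the directory Claude Desktop starts the server in, so an absolute path is the safer choice.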
Dependencies
Core Dependencies
- mcp>=0.9.0 - Model Context Protocol framework
Document Reading
- python-docx>=1.1.0 - For DOCX file processing
- pdfplumber>=0.9.0 - For PDF file processing
- python-pptx>=0.6.23 - For PPTX file processing
- openpyxl>=3.1.0 - For XLSX file reading/writing
- pandas>=2.0.0 - For advanced data manipulation and analysis
Additional
- lxml>=4.9.0 - XML processing support
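A quick way to confirm that these packages are importable is sketched below. It is not the project's check_dependencies() function; missing_requirements is a hypothetical name, and the check only tests whether each module can be found, not whether the installed version meets the minimum.

```python
from importlib.util import find_spec

# Import name -> requirement string from the dependency list.
# Note that python-docx imports as "docx" and python-pptx as "pptx".
REQUIRED_MODULES = {
    "mcp": "mcp>=0.9.0",
    "docx": "python-docx>=1.1.0",
    "pdfplumber": "pdfplumber>=0.9.0",
    "pptx": "python-pptx>=0.6.23",
    "openpyxl": "openpyxl>=3.1.0",
    "pandas": "pandas>=2.0.0",
    "lxml": "lxml>=4.9.0",
}


def missing_requirements(modules: dict = REQUIRED_MODULES) -> list:
    """Return the requirement strings whose modules cannot be found."""
    return [req for name, req in modules.items() if find_spec(name) is None]
```

An empty result from missing_requirements() suggests the environment is ready; anything it returns can be passed to pip install.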
Contributing
Contributions are welcome! Please feel free to submit pull requests or open issues.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built with the Model Context Protocol
- Uses pdfplumber for PDF processing
- Uses python-docx for Word documents
- Uses python-pptx for PowerPoint presentations
Support
If you encounter any issues or have questions, please open an issue on GitHub.
Made with ❤️ for the Claude AI community