KnowledgeBaseMCP
Extract text content from local PDF, DOCX, and PPTX files to build a knowledge base.
A Model Context Protocol (MCP) server for extracting text content from document formats including PDF, DOCX, PPTX, and XLSX files. It lets AI assistants such as Claude read and analyze documents in your local knowledge base, and it can also create new Word documents and Excel spreadsheets.
Features
Document Reading
- Multi-format support: Extract text from PDF, DOCX, PPTX, and XLSX files
- Directory processing: Process entire directories of documents
- Recursive scanning: Optionally scan subdirectories
- File metadata: Get detailed information about document files
- Error handling: Robust error handling with clear error messages
- Async processing: Efficient asynchronous document processing
Excel Spreadsheet Creation
- XLSX workbook creation: Create Excel files with multiple sheets
- DataFrame support: Convert pandas DataFrames to Excel
- Data formatting: Apply professional formatting and styling
- Report generation: Create structured reports with summaries
- Data appending: Add data to existing Excel files
- Template support: Use predefined templates for consistent formatting
Integration
- Easy integration: Simple setup with Claude Desktop
- MCP protocol: Built on the Model Context Protocol standard
Supported File Types
Reading Support
- PDF (.pdf) - Portable Document Format (using pdfplumber)
- DOCX (.docx) - Microsoft Word documents
- PPTX (.pptx) - Microsoft PowerPoint presentations
- XLSX (.xlsx) - Microsoft Excel spreadsheets (using openpyxl and pandas)
Writing Support
- DOCX (.docx) - Create Word documents with formatting
- XLSX (.xlsx) - Create Excel workbooks with multiple sheets, formatting, and charts
Installation
Prerequisites
- Python 3.8 or higher
- Claude Desktop application
Setup
- Clone the repository
git clone https://github.com/mehmetozcan-zz/KnowledgeBaseMCP.git
cd KnowledgeBaseMCP
- Install dependencies
pip install -r requirements.txt
- Test the server
python test.py
Configuration
Claude Desktop Integration
Add this server to your Claude Desktop configuration file:
Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
{
"mcpServers": {
"knowledgebase": {
"command": "python",
"args": ["path/to/KnowledgeBaseMCP/launch_mcp.py"]
}
}
}
Replace path/to/KnowledgeBaseMCP with your actual installation path.
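If you prefer not to edit the JSON by hand, you can generate the entry instead. The sketch below is minimal and is not part of the project. The helper name `add_knowledgebase_server` is hypothetical, and the code assumes the launcher is `launch_mcp.py` at the root of the cloned repository, as in the example above. It resolves the install path to an absolute one and leaves any other servers in the config untouched.

```python
import json
from pathlib import Path


def add_knowledgebase_server(config: dict, install_dir: str, python_cmd: str = "python") -> dict:
    """Return a copy of a Claude Desktop config with a 'knowledgebase' entry added.

    Hypothetical helper, not part of KnowledgeBaseMCP. Assumes the launcher
    script is launch_mcp.py in the root of the install directory.
    """
    # An absolute path avoids depending on the directory the server is started from.
    launcher = Path(install_dir).expanduser().resolve() / "launch_mcp.py"
    updated = json.loads(json.dumps(config))  # deep copy of plain JSON data
    servers = updated.setdefault("mcpServers", {})
    servers["knowledgebase"] = {"command": python_cmd, "args": [str(launcher)]}
    return updated


if __name__ == "__main__":
    existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
    print(json.dumps(add_knowledgebase_server(existing, "~/KnowledgeBaseMCP"), indent=2))
```

`json.dumps` escapes backslashes automatically, which also avoids a common mistake on Windows. A hand-written path such as `C:\Users\me\KnowledgeBaseMCP` must appear as `C:\\Users\\me\\KnowledgeBaseMCP` inside the JSON file.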
Usage
Once configured, you can use these tools in Claude:
Available Tools
Document Reading Tools
extract_text_from_file
Extract text content from a single document file.
Parameters:
- file_path (string): Path to the document file
extract_text_from_directory
Extract text content from all supported documents in a directory.
Parameters:
- directory_path (string): Path to the directory containing documents
- recursive (boolean, optional): Whether to search subdirectories recursively
list_supported_files
List all supported document files in a directory with metadata.
Parameters:
- directory_path (string): Path to the directory to scan
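The README does not document how the directory tools select files or what metadata they return. The sketch below shows one plausible approach using only the standard library. The function name `list_supported` and the metadata fields (name, path, extension, size, modification time) are assumptions, not the server's actual output format. Setting `recursive=True` switches from `Path.glob` to `Path.rglob`, which also descends into subdirectories.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List

# Assumed extension set, mirroring the "Reading Support" list above.
SUPPORTED = {".pdf", ".docx", ".pptx", ".xlsx"}


def list_supported(directory: str, recursive: bool = False) -> List[Dict[str, object]]:
    """Hypothetical sketch: collect supported files and basic metadata."""
    root = Path(directory)
    # rglob("*") walks subdirectories; glob("*") stays at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    results = []
    for path in sorted(candidates):
        # Compare extensions case-insensitively so "Report.PDF" is included.
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            stat = path.stat()
            results.append({
                "name": path.name,
                "path": str(path),
                "extension": path.suffix.lower(),
                "size_bytes": stat.st_size,
                "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
            })
    return results
```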
DOCX Creation Tools
create_docx_document
Create a new Word document with text content.
Parameters:
- content (string): Document content
- file_path (string): Output file path (.docx extension)
- title (string, optional): Document title
create_structured_report
Create a structured Word report with formatting.
Parameters:
- report_data (object): Report data structure
- file_path (string): Output file path (.docx extension)
XLSX Creation Tools
create_xlsx_workbook
Create a new Excel workbook with multiple sheets.
Parameters:
- data (object): Dictionary with sheet names as keys and data as values
- file_path (string): Output file path (.xlsx extension)
- apply_formatting (boolean, optional): Apply default formatting
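The README does not specify the value format for each sheet in `data`. The hypothetical helper below assumes three common shapes: column-oriented dicts, lists of record dicts, and lists of rows. It normalizes each sheet to a header row followed by value rows. This is an illustration of the idea, not the server's actual parsing logic.

```python
from typing import Any, Dict, List


def to_rows(sheet_data: Any) -> List[List[Any]]:
    """Hypothetical: normalize one sheet's data into a header row plus value rows."""
    if isinstance(sheet_data, dict):
        # Column-oriented input, e.g. {"Region": ["North"], "Sales": [100]}.
        headers = list(sheet_data.keys())
        columns = [list(values) for values in sheet_data.values()]
        length = max((len(col) for col in columns), default=0)
        # Pad shorter columns with None so every row has the same width.
        rows = [[col[i] if i < len(col) else None for col in columns] for i in range(length)]
        return [headers] + rows
    if isinstance(sheet_data, list) and sheet_data and all(isinstance(r, dict) for r in sheet_data):
        # Record-oriented input: the header is the union of keys, in first-seen order.
        headers: List[Any] = []
        for record in sheet_data:
            for key in record:
                if key not in headers:
                    headers.append(key)
        return [headers] + [[record.get(h) for h in headers] for record in sheet_data]
    # Otherwise assume the input is already a list of rows.
    return [list(row) for row in sheet_data]


def normalize_workbook_data(data: Dict[str, Any]) -> Dict[str, List[List[Any]]]:
    """Apply to_rows to every sheet in a {sheet_name: data} dictionary."""
    return {sheet: to_rows(value) for sheet, value in data.items()}
```

Rows in this form could then be written with openpyxl's `Worksheet.append()`, one call per row.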
create_xlsx_from_dataframe
Create an Excel workbook from pandas DataFrames.
Parameters:
- dataframes (object): Dictionary with sheet names as keys and DataFrame data as values
- file_path (string): Output file path (.xlsx extension)
- include_index (boolean, optional): Include the DataFrame index
append_to_xlsx
Append data to an existing Excel workbook.
Parameters:
- file_path (string): Path to the existing XLSX file
- sheet_name (string): Target sheet name
- data (any): Data to append (list, dict, or DataFrame)
create_xlsx_report
Create a formatted Excel report with multiple sections.
Parameters:
- report_data (object): Report structure with title, description, and data sections
- file_path (string): Output file path (.xlsx extension)
Example Usage in Claude
Reading Documents
Please analyze all the documents in my Documents/Reports folder using your KnowledgeBaseMCP tools.
Creating Excel Reports
Create an Excel report with sales data for Q1 2025. Include a summary sheet and detailed transaction data.
Data Analysis and Export
Read the data from 'financial_report.xlsx' and create a new Excel file with a summary analysis.
Document Conversion
Extract content from all PDF files in my research folder and create a consolidated Excel workbook with the findings.
Claude will then use the MCP server to extract and analyze the content from your documents or create new Excel files as requested.
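Behind these prompts, Claude Desktop invokes the tools listed above through MCP's `tools/call` JSON-RPC method. The sketch below only builds a request message to show its shape. The helper `make_tool_call` is hypothetical. In practice, Claude Desktop (or the `mcp` SDK) performs the `initialize` handshake and message framing for you. Over the stdio transport, each message is a single line of JSON, which is why the payload ends in exactly one newline.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Hypothetical helper: build one MCP 'tools/call' request as a line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps emits no newlines by default, so the message stays on one line.
    return json.dumps(message) + "\n"


if __name__ == "__main__":
    print(make_tool_call(1, "extract_text_from_directory",
                         {"directory_path": "Documents/Reports", "recursive": True}), end="")
```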
Project Structure
KnowledgeBaseMCP/
├── src/
│   ├── __init__.py        # Package initialization
│   ├── main.py            # Main MCP server
│   ├── extractors.py      # Document reading classes
│   ├── docx_writer.py     # Word document creation
│   └── xlsx_writer.py     # Excel spreadsheet creation
├── requirements.txt       # Python dependencies
├── setup.py               # Package setup
├── README.md              # This file
├── LICENSE                # MIT License
├── launch_mcp.py          # Server launcher
├── run_server.py          # Alternative launcher
├── test.py                # Basic test script
└── test_xlsx.py           # XLSX functionality tests
Development
Running Tests
python test.py
Adding New File Formats
To add support for additional document formats:
- Add the file extension to SUPPORTED_EXTENSIONS in extractors.py
- Install the required library
- Add the library check to check_dependencies()
- Implement the extraction method (e.g., _extract_xlsx())
- Add the format handling to extract_from_file()
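The steps above can be illustrated with a simplified extractor. This is a hypothetical stand-in, not the actual class in `extractors.py`. Only the names `SUPPORTED_EXTENSIONS` and `extract_from_file()` come from the list above. The class name `MiniExtractor` and the `_extract_txt()` method are invented for the example. It adds plain-text `.txt` and `.md` support, which needs no third-party library, so the install and dependency-check steps do not apply here.

```python
from pathlib import Path
from typing import Callable, Dict


class MiniExtractor:
    """Hypothetical, simplified stand-in for the extractor in extractors.py."""

    # Step 1: register the new extensions.
    SUPPORTED_EXTENSIONS = {".txt", ".md"}

    def __init__(self) -> None:
        # Step 5: route each extension to its extraction method.
        self._handlers: Dict[str, Callable[[Path], str]] = {
            ".txt": self._extract_txt,
            ".md": self._extract_txt,
        }

    def extract_from_file(self, file_path: str) -> str:
        path = Path(file_path)
        ext = path.suffix.lower()
        if ext not in self.SUPPORTED_EXTENSIONS:
            raise ValueError(f"Unsupported file type: {ext or '(none)'}")
        if not path.is_file():
            raise FileNotFoundError(file_path)
        return self._handlers[ext](path)

    # Step 4: one method per format. Plain text only needs the standard library.
    @staticmethod
    def _extract_txt(path: Path) -> str:
        return path.read_text(encoding="utf-8", errors="replace")
```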
Debugging
For debugging MCP connection issues:
- Check Claude Desktop logs
- Ensure the server starts without errors: python launch_mcp.py
- Verify the config file path and format
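For the last step, a small script can catch the most common configuration mistakes before you restart Claude Desktop. This is a hypothetical helper, not part of the repository. It assumes the config layout shown in the Configuration section, with the launcher script as the first entry in `args`.

```python
import json
from pathlib import Path
from typing import List


def check_config(config_path: str, server_name: str = "knowledgebase") -> List[str]:
    """Hypothetical helper: return problems found in a Claude Desktop config (empty if none)."""
    path = Path(config_path).expanduser()
    if not path.is_file():
        return [f"Config file not found: {path}"]
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    if not isinstance(config, dict):
        return ["Top-level JSON value must be an object"]
    servers = config.get("mcpServers")
    if not isinstance(servers, dict) or server_name not in servers:
        return [f"No '{server_name}' entry under 'mcpServers'"]
    server = servers[server_name]
    problems = []
    if not server.get("command"):
        problems.append("Missing 'command'")
    args = server.get("args", [])
    if not args:
        problems.append("Missing 'args'")
    elif not Path(args[0]).is_file():
        # Assumes args[0] is the launcher script, as in the example config.
        problems.append(f"Launcher script not found: {args[0]}")
    return problems
```

A trailing comma is a frequent cause of the "Invalid JSON" result, because JSON does not allow trailing commas.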
Dependencies
Core Dependencies
- mcp>=0.9.0: Model Context Protocol framework
Document Reading
- python-docx>=1.1.0: DOCX file processing
- pdfplumber>=0.9.0: PDF file processing
- python-pptx>=0.6.23: PPTX file processing
- openpyxl>=3.1.0: XLSX file reading and writing
- pandas>=2.0.0: Data manipulation and analysis
Additional
- lxml>=4.9.0: XML processing support
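Some import names differ from their pip package names. For example, `python-docx` is imported as `docx` and `python-pptx` as `pptx`, which is easy to trip over when writing a dependency check such as `check_dependencies()`. The sketch below shows one way such a check could work; it is an assumption, not the project's implementation. It uses `importlib.util.find_spec` to test whether each parser is importable without actually importing it. The mapping `FORMAT_MODULES` and the function `available_formats` are hypothetical names.

```python
import importlib.util
from typing import Dict, Optional

# Hypothetical mapping from file extension to the module that parses it.
FORMAT_MODULES = {
    ".pdf": "pdfplumber",
    ".docx": "docx",      # installed by the python-docx package
    ".pptx": "pptx",      # installed by the python-pptx package
    ".xlsx": "openpyxl",
}


def available_formats(modules: Optional[Dict[str, str]] = None) -> Dict[str, bool]:
    """Report, per extension, whether its (top-level) parser module can be found."""
    mapping = FORMAT_MODULES if modules is None else modules
    # find_spec returns None for a missing top-level module instead of raising.
    return {ext: importlib.util.find_spec(name) is not None for ext, name in mapping.items()}
```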
Contributing
Contributions are welcome! Please feel free to submit pull requests or open issues.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built with the Model Context Protocol
- Uses pdfplumber for PDF processing
- Uses python-docx for Word documents
- Uses python-pptx for PowerPoint presentations
Support
If you encounter any issues or have questions, please open an issue on GitHub.
Made with ❤️ for the Claude AI community