MCP Server
Automate data science stages using your own CSV data files.
Auto ML - Automated Machine Learning Platform
An intelligent automated machine learning platform that provides comprehensive data analysis, preprocessing, model selection, and hyperparameter tuning capabilities through Model Context Protocol (MCP) tools.
Features
Data Analysis & Exploration
- Data Information: Get comprehensive dataset statistics including shape, memory usage, data types, and missing values
- CSV Reading: Efficient CSV file reading with pandas and pyarrow support
- Correlation Analysis: Visualize correlation matrices for numerical and categorical variables
- Outlier Detection: Identify and visualize outliers in your datasets
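As a rough illustration of what such a dataset summary involves, the sketch below computes the statistics listed above with pandas. The function name `summarize` and the inline sample data are hypothetical, and this is not the code in utils/details.py.

```python
import io

import pandas as pd

# Hypothetical inline sample standing in for a CSV file on disk.
SAMPLE_CSV = """age,chol,sex
63,233,M
37,,F
41,204,F
"""


def summarize(df: pd.DataFrame) -> dict:
    """Collect shape, memory usage, dtypes, missing values, and duplicate rows."""
    return {
        "shape": df.shape,
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "missing": {col: int(n) for col, n in df.isna().sum().items()},
        "duplicate_rows": int(df.duplicated().sum()),
    }


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = summarize(df)
print(summary)
```

Returning a plain dictionary keeps the result easy to serialize, which is convenient when a tool has to hand its output back to an MCP client.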
Data Preprocessing
- Automated Preprocessing: Handle missing values, encode categorical variables, and scale numerical features
- Feature Engineering: Prepare features for both regression and classification problems
- Data Validation: Check for duplicates and data quality issues
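The three preprocessing steps named above (filling missing values, encoding categories, scaling numbers) can be sketched with pandas alone. The column names and the choice of median/mode imputation and standardization are assumptions made for illustration; the project's own utils/preprocessing.py may do this differently.

```python
import pandas as pd

# Hypothetical toy frame with one missing number and one missing category.
df = pd.DataFrame(
    {
        "age": [20.0, None, 40.0],
        "city": ["Ankara", "Izmir", None],
        "target": [0, 1, 0],
    }
)

features = df.drop(columns="target")
num_cols = features.select_dtypes(include="number").columns
cat_cols = features.columns.difference(num_cols)

# Fill gaps: median for numbers, most frequent value for categories.
features[num_cols] = features[num_cols].fillna(features[num_cols].median())
for col in cat_cols:
    features[col] = features[col].fillna(features[col].mode().iloc[0])

# Standardize numeric columns to zero mean and unit variance.
num = features[num_cols]
features[num_cols] = (num - num.mean()) / num.std(ddof=0)

# One-hot encode the categorical columns.
prepared = pd.get_dummies(features, columns=list(cat_cols), dtype=int)
print(prepared)
```

In a real pipeline the imputation and scaling statistics would be computed on the training split only and then reused on test data, to avoid leaking information.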
Machine Learning Models
- Multiple Algorithms: Support for various ML algorithms including:
  - Regression: Linear Regression, Ridge, Lasso, ElasticNet, Random Forest, XGBoost, SVR, KNN, CatBoost
  - Classification: Logistic Regression, Ridge Classifier, Random Forest, XGBoost, SVM, KNN, Decision Tree, Naive Bayes, CatBoost
Model Evaluation & Visualization
- Performance Metrics:
  - Regression: R², MAE, MSE
  - Classification: Accuracy, F1-Score
- Confusion Matrix Visualization: For classification problems
- Model Comparison: Compare multiple models side-by-side
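For readers unfamiliar with the metrics listed above, here is a minimal pure-Python sketch of how they are defined. The function names are hypothetical and the F1 shown is the binary variant; the project itself most likely relies on library implementations.

```python
def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, and MSE for two equal-length sequences.

    Assumes y_true is not constant, otherwise R^2 is undefined.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {"r2": 1.0 - ss_res / ss_tot, "mae": mae, "mse": ss_res / n}


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy and binary F1-score for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"accuracy": accuracy, "f1": f1}


reg = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 8.0])
clf = classification_metrics([1, 0, 1, 1], [1, 0, 0, 1])
print(reg)
print(clf)
```

Comparing models side by side then amounts to computing these dictionaries for each model on the same held-out data.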
Hyperparameter Tuning
- Automated Tuning: Optimize model hyperparameters using advanced search algorithms
- Customizable Scoring: Choose from various evaluation metrics
- Trial Management: Control the number of optimization trials
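The README does not say which search algorithm is used, so the sketch below substitutes plain random search to show how a trial count, a scoring function, and a random seed typically interact. `toy_score` and the parameter ranges are hypothetical stand-ins for a real cross-validated model score.

```python
import random


def toy_score(params):
    """Hypothetical scoring function; peaks at depth=6, learning_rate=0.1."""
    return -((params["depth"] - 6) ** 2) - 100 * (params["learning_rate"] - 0.1) ** 2


def random_search(score_fn, n_trials, random_state):
    """Sample n_trials parameter sets and keep the best-scoring one."""
    rng = random.Random(random_state)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "depth": rng.randint(2, 12),
            "learning_rate": rng.uniform(0.01, 0.3),
        }
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(toy_score, n_trials=100, random_state=42)
print(best_params, best_score)
```

Fixing the seed makes a run repeatable, and raising the number of trials can only keep or improve the best score found for that seed, at the cost of more compute.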
Project Structure
AutoML/
├── data/  # Sample datasets
│   ├── Ai.csv
│   ├── Calories.csv
│   ├── Cost.csv
│   ├── Digital.csv
│   ├── Electricity.csv
│   ├── ford.csv
│   ├── Habits.csv
│   ├── heart.csv
│   ├── Lifestyle.csv
│   ├── Mobiles.csv
│   ├── Personality.csv
│   ├── Salaries.csv
│   ├── Shopper.csv
│   ├── Sleep.csv
│   ├── cat.csv
│   ├── test.csv
│   └── train.csv
├── tools/
│   └── all_tools.py  # MCP tool definitions
├── utils/
│   ├── before_model.py  # Feature preparation
│   ├── details.py  # Data information
│   ├── external_test.py  # External data test with XGBoost
│   ├── feature_importance.py  # Feature importance analysis
│   ├── hyperparameter.py  # Hyperparameter tuning
│   ├── model_selection.py  # Model selection and evaluation
│   ├── prediction.py  # Prediction utilities
│   ├── preprocessing.py  # Data preprocessing
│   ├── read_csv_file.py  # CSV reading utilities
│   └── visualize_data.py  # Visualization functions
├── main.py  # Application entry point
├── server.py  # MCP server configuration
├── requirements.txt  # Python dependencies
└── README.md  # This file
Installation
Prerequisites
- Python 3.11 or higher (the numpy>=2.3.1 requirement listed under Dependencies does not support older versions)
- pip or uv package manager
Setup
- Clone the repository
git clone https://github.com/emircansoftware/AutoML.git
cd AutoML
- Install dependencies
# Using pip
pip install -r requirements.txt
pip install uv
Using with Claude Desktop
1. Data Path Setting
In utils/read_csv_file.py, update the path variable to match your own project directory on your computer:
# Example (raw string, so single backslashes are enough):
path = r"C:\YOUR\PROJECT\PATH\AutoML\data"
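A hard-coded absolute path has to be edited on every machine. One possible alternative, assuming read_csv_file.py lives in utils/ and data/ sits next to it as shown in the project structure, is to derive the directory from the module's own location. The helper name `data_dir_for` is hypothetical.

```python
from pathlib import Path


def data_dir_for(module_file: str) -> Path:
    """Return <project root>/data for a module located in <project root>/utils/."""
    return Path(module_file).resolve().parent.parent / "data"


# Inside utils/read_csv_file.py this would be called as data_dir_for(__file__).
# The path below is a hypothetical location used only for illustration.
path = data_dir_for("/home/user/AutoML/utils/read_csv_file.py")
print(path)
```

If the rest of the module builds file names by string concatenation, wrap the result in `str()` before assigning it to `path`.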
2. Claude Desktop Configuration
In Claude Desktop, add the following block to your claude_desktop_config.json file and adjust the paths to match your own system:
{
  "mcpServers": {
    "AutoML": {
      "command": "uv",
      "args": [
        "--directory",
        "C:\\YOUR\\PROJECT\\PATH\\AutoML",
        "run",
        "main.py"
      ]
    }
  }
}
You can now start your project from Claude Desktop.
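Escaping Windows backslashes by hand in JSON is easy to get wrong. As a small sketch, the configuration block above can be generated with Python's json module, which handles the escaping. The function name `automl_server_entry` is hypothetical.

```python
import json


def automl_server_entry(project_dir: str) -> dict:
    """Build the mcpServers entry for a given project directory."""
    return {
        "mcpServers": {
            "AutoML": {
                "command": "uv",
                "args": ["--directory", project_dir, "run", "main.py"],
            }
        }
    }


# Hypothetical project location; a raw string keeps the backslashes literal.
config = automl_server_entry(r"C:\YOUR\PROJECT\PATH\AutoML")
text = json.dumps(config, indent=2)
print(text)
```

If claude_desktop_config.json already lists other servers, merge the "AutoML" entry into the existing "mcpServers" object rather than overwriting the file.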
Dependencies
- MCP Framework: mcp[cli]>=1.9.4 (Model Context Protocol for tool integration)
- Data Processing: pandas>=2.3.0, pyarrow>=20.0.0, numpy>=2.3.1
- Machine Learning: scikit-learn>=1.3.0, xgboost>=2.0.0, lightgbm>=4.3.0
- Additional ML: catboost (for CatBoost models)
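Before starting the server it can help to confirm that the packages above are actually installed in the active environment. The following sketch uses only the standard library; it reports installed versions but does not compare them against the minimums, and the helper name is hypothetical.

```python
from importlib import metadata

# Distribution names taken from the dependency list above.
REQUIRED = [
    "mcp",
    "pandas",
    "pyarrow",
    "numpy",
    "scikit-learn",
    "xgboost",
    "lightgbm",
    "catboost",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(REQUIRED)
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```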
Usage
Starting the MCP Server
from server import mcp
# Run the server
mcp.run()
Available Tools
The platform provides the following MCP tools:
Data Analysis Tools
- information_about_data(file_name): Give detailed information about the data
- reading_csv(file_name): Read the CSV file
- visualize_correlation_num(file_name): Visualize the correlation matrix for numerical columns
- visualize_correlation_cat(file_name): Visualize the correlation matrix for categorical columns
- visualize_correlation_final(file_name, target_column): Visualize the correlation matrix after preprocessing
- visualize_outliers(file_name): Visualize outliers in the data
- visualize_outliers_final(file_name, target_column): Visualize outliers after preprocessing
Preprocessing Tools
- preprocessing_data(file_name, target_column): Preprocess the data (remove outliers, fill nulls, etc.)
- prepare_data(file_name, target_column, problem_type): Prepare the data for models (encoding, scaling, etc.)
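The README says preprocessing removes outliers but not by which rule. A common choice is the interquartile range (IQR) fence, sketched below with the standard library. Treat the rule, the factor 1.5, and the function names as assumptions rather than a description of preprocessing_data.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences using the interquartile range rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


# Hypothetical readings with one obvious outlier.
readings = [10, 12, 11, 13, 12, 11, 95]
print(drop_outliers(readings))
```

A larger `k` keeps more extreme values; whether dropping rows is appropriate at all depends on the dataset.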
Model Training & Evaluation
- models(problem_type, file_name, target_column): Select and evaluate models based on problem type
- visualize_accuracy_matrix(file_name, target_column, problem_type): Visualize the confusion matrix for predictions
- best_model_hyperparameter(model_name, file_name, target_column, problem_type, n_trials, scoring, random_state): Tune the hyperparameters of the best model
- test_external_data(main_file_name, target_column, problem_type, test_file_name): Test external data with the best model and return predictions
- predict_value(model_name, file_name, target_column, problem_type, n_trials, scoring, random_state, input): Predict the value of the target column for new input
- feature_importance_analysis(file_name, target_column, problem_type): Analyze the feature importance of the data using XGBoost
Example Workflow
# 1. Analyze your data
info = information_about_data("data/heart.csv")
# 2. Preprocess the data
preprocessed = preprocessing_data("data/heart.csv", "target")
# 3. Prepare features for classification
features = prepare_data("data/heart.csv", "target", "classification")
# 4. Train and evaluate models
results = models("classification", "data/heart.csv", "target")
# 5. Visualize results
confusion_matrix = visualize_accuracy_matrix("data/heart.csv", "target", "classification")
# 6. Optimize best model
best_model = best_model_hyperparameter("RandomForestClassifier", "data/heart.csv", "target", "classification", 100, "accuracy", 42)
Sample Datasets (All CSV datasets are from Kaggle.)
The project includes various sample datasets for testing:
- heart.csv: Heart disease prediction dataset
- Salaries.csv: Salary prediction dataset
- Calories.csv: Calorie prediction dataset
- Personality.csv: Personality analysis dataset
- Digital.csv: Digital behavior dataset
- Lifestyle.csv: Lifestyle analysis dataset
- Mobiles.csv: Mobile phone dataset
- Habits.csv: Habit analysis dataset
- Sleep.csv: Sleep pattern dataset
- Cost.csv: Cost analysis dataset
- ford.csv: Ford car dataset
- Ai.csv: AI-related dataset
- cat.csv: Cat-related dataset
Configuration
Environment Variables
- Set your preferred random seed for reproducible results
- Configure MCP server settings in server.py
Customization
- Add new ML algorithms in utils/model_selection.py
- Extend preprocessing steps in utils/preprocessing.py
- Create custom visualization functions in utils/visualize_data.py
Contributing
We welcome contributions! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Contributing Guidelines
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Model Context Protocol for the MCP framework
- scikit-learn for machine learning algorithms
- XGBoost for gradient boosting
- CatBoost for categorical boosting
- pandas for data manipulation
Support
If you encounter any issues or have questions:
- Check the Issues page
- Create a new issue with detailed information
- Contact the maintainers