AC to Automation Converter
An AI-powered system that converts Acceptance Criteria (AC) from QA specifications into automated browser testing workflows.
AI-Powered Browser Automation with Vision
Convert natural language into real browser automation using AI Vision and execute tests immediately with live process logs!
Inspired by Skyvern, this system combines traditional DOM automation with AI Vision that "sees" web pages like a human.
Key Features
AI Vision Mode
- Visual Understanding: AI analyzes screenshots to find elements visually
- Robust Automation: Works even when websites change their HTML structure
- Natural Descriptions: Use "click the blue login button" instead of CSS selectors
- Human-like Interaction: Sees pages exactly like humans do
Three Automation Modes
- DOM Mode: Traditional CSS selector-based (fast)
- Vision Mode: AI visual understanding (robust)
- Hybrid Mode: Smart fallback - tries DOM first, uses Vision if needed
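The Hybrid fallback described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: `locate_hybrid`, the `Strategy` enum, and the two locator closures are hypothetical names introduced here, and the locators simply return click coordinates.

```rust
/// Which strategy ended up locating the element.
#[derive(Debug, PartialEq)]
enum Strategy {
    Dom,
    Vision,
}

/// Try the DOM locator first; call the vision locator only if the DOM one fails.
/// Both locators are hypothetical stand-ins that return click coordinates.
fn locate_hybrid<D, V>(dom: D, vision: V) -> Result<(Strategy, (u32, u32)), String>
where
    D: FnOnce() -> Result<(u32, u32), String>,
    V: FnOnce() -> Result<(u32, u32), String>,
{
    match dom() {
        Ok(point) => Ok((Strategy::Dom, point)),
        Err(dom_err) => vision()
            .map(|point| (Strategy::Vision, point))
            .map_err(|vision_err| {
                format!("DOM failed: {}; vision failed: {}", dom_err, vision_err)
            }),
    }
}
```

Because the vision closure is only invoked on a DOM failure, the slower and costlier AI call is skipped whenever a selector still matches.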
Real-Time Process Logs
- Live Execution Logs: Watch automation steps in real-time
- Floating Log Panel: See progress without scrolling
- Color-coded Messages: Easy to spot successes, warnings, and errors
- Detailed Timestamps: Track execution timing precisely
Immediate Execution
- Real Browser Testing: Uses ChromeDriver for actual browser interaction
- AI-Powered Generation: OpenRouter AI converts natural language to automation
- Multiple Script Formats: Generate MCP Browser, Selenium, and Playwright scripts
- Visual Feedback: Screenshots and detailed execution reports
What You Can Automate
User Workflows
- Login/registration flows
- E-commerce checkout processes
- Form submissions and validations
- Multi-step wizards
Visual Interactions
- Click buttons by description ("red submit button")
- Find inputs by visual context ("email field in top-right")
- Navigate by visual landmarks ("menu button with hamburger icon")
- Verify visual states ("success message appears")
Content Testing
- Text presence verification
- Element visibility checks
- Page state validation
- Dynamic content testing
Setup & Installation
Prerequisites
- Install Rust:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- Install ChromeDriver:
macOS with Homebrew:
brew install chromedriver
Ubuntu/Debian:
sudo apt-get install chromium-chromedriver
Windows: Download from https://chromedriver.chromium.org/
- Get an OpenRouter API key (recommended):
  - Sign up at openrouter.ai
  - Get your API key (starts with sk-or-v1-...)
  - One key works for both AI generation and vision models
Quick Start
- Clone and build:
git clone <repository-url>
cd ai-ac-automation
cargo build --release
- Configure environment variables (recommended):
Create a .env file with your OpenRouter API key
echo "OPENROUTER_API_KEY=sk-or-v1-your-actual-key-here" > .env
- Start ChromeDriver (in a separate terminal):
chromedriver --port=9515
- Start the web interface:
cargo run --bin automation-ui
- Open your browser at http://localhost:3001
API Key Configuration
You have two options for configuring your OpenRouter API key:
Option 1: Environment Variables (Recommended)
Create .env file in project root
echo "OPENROUTER_API_KEY=sk-or-v1-your-key-here" > .env
Start the server - API key loaded automatically!
cargo run --bin automation-ui
Benefits:
- Secure: API key never appears in UI or logs
- Convenient: No need to enter key every time
- Universal: Works for both AI generation and vision
- Safe: .env is in .gitignore, so it won't be committed
Option 2: Web Form
- Leave .env empty or don't create it
- Enter the API key directly in the web interface forms
- Works for individual sessions
Pro Tip: Use Option 1 for development, Option 2 for sharing/demos!
Environment File (.env) Format
Your .env file should contain:
Required: OpenRouter API key for all AI features
OPENROUTER_API_KEY=sk-or-v1-your-actual-key-here
Optional: Default models (can be changed in UI)
AI_MODEL=anthropic/claude-3.5-sonnet
VISION_MODEL=openai/gpt-4o
Optional: Browser settings
HEADLESS=false
BROWSER_WIDTH=1920
BROWSER_HEIGHT=1080
Optional: Server port
PORT=3001
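To make the file format above concrete, here is a small standard-library sketch that reads KEY=VALUE lines and applies the defaults shown in the example. It is only an illustration: the project may well load the file differently (for instance through a dotenv-style crate), and `parse_env`, `Settings`, and `settings_from` are hypothetical names. Skipping lines that start with `#` is also an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Parse KEY=VALUE lines, skipping blank lines and `#` comments.
fn parse_env(contents: &str) -> HashMap<String, String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// A subset of the settings from the example file; illustrative only.
#[derive(Debug, PartialEq)]
struct Settings {
    api_key: Option<String>,
    headless: bool,
    port: u16,
}

/// Build settings, falling back to the example values when a key is
/// missing or cannot be parsed.
fn settings_from(vars: &HashMap<String, String>) -> Settings {
    Settings {
        api_key: vars.get("OPENROUTER_API_KEY").cloned(),
        headless: vars.get("HEADLESS").map(|v| v == "true").unwrap_or(false),
        port: vars.get("PORT").and_then(|v| v.parse().ok()).unwrap_or(3001),
    }
}
```

Calling `settings_from(&parse_env(&std::fs::read_to_string(".env")?))` would then give the server its configuration, with `api_key` left as `None` when the web form is expected to supply the key instead.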
Security Notes:
- Never commit .env to version control
- Keep your API keys secure and rotate them regularly
- Use different keys for development and production
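One way to keep keys out of logs, in the spirit of the notes above, is to redact them before printing. The sketch below is an assumption about how that could look, not code from this project; `looks_like_openrouter_key` and `redact` are hypothetical helpers, and the only fact taken from this README is the `sk-or-v1-` prefix.

```rust
/// Expected prefix of OpenRouter keys, as stated in this README.
const KEY_PREFIX: &str = "sk-or-v1-";

/// Cheap format check: correct prefix followed by at least one character.
fn looks_like_openrouter_key(key: &str) -> bool {
    key.starts_with(KEY_PREFIX) && key.len() > KEY_PREFIX.len()
}

/// Return a log-safe form of the key: the prefix plus the last four
/// characters. Anything short or unrecognized is hidden entirely.
fn redact(key: &str) -> String {
    let chars: Vec<char> = key.chars().collect();
    if looks_like_openrouter_key(key) && chars.len() >= KEY_PREFIX.len() + 8 {
        let tail: String = chars[chars.len() - 4..].iter().collect();
        format!("{}...{}", KEY_PREFIX, tail)
    } else {
        "<redacted>".to_string()
    }
}
```

The format check only catches typos and pasted keys from other providers; it says nothing about whether the key is actually valid on OpenRouter.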
How to Use
Basic DOM Automation
- Enter URL: https://google.com
- Test Scenario:
  - Click on search box
  - Type "browser automation"
  - Press Enter
  - Verify results appear
  - Click on first result
- Execute: Check "Execute immediately", then click "Generate & Execute"
AI Vision Mode (Recommended)
- Enable Vision: Check "Use AI Vision Mode (Like Skyvern)"
- Configure:
  - Mode: Hybrid (tries DOM first, falls back to Vision)
  - API Key: Auto-loaded from .env or enter manually
  - Model: GPT-4 Omni (recommended)
- Natural Test Scenario:
Website: https://example.com/login
Test:
- Find the email input field
- Type [email protected]
- Find the password field
- Type mypassword123
- Click the blue login button
- Verify the dashboard appears
AI-Powered Test Generation
- Enable AI: Check "Use AI-Powered Automation"
- API Key: Auto-loaded from .env or enter your OpenRouter key
- Describe Naturally:
Test the login functionality:
- User should be able to log in with valid credentials
- After login, dashboard should be visible
- User profile should show correct information
- Logout should work properly
Example with Environment Variables:
If you have OPENROUTER_API_KEY in your .env file:
- No API key entry needed; it is loaded automatically
- The same key works for both AI generation and vision
- Secure: the key never appears in forms or logs
- Fast: instant access to all AI features
Supported AI Models
All models are available through OpenRouter with a single API key.
Vision Models (Real AI Vision Integration)
| OpenRouter Model ID | Provider | Vision Quality | Speed | Best For |
|---|---|---|---|---|
| openai/gpt-4o | OpenAI | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Overall best choice |
| openai/gpt-4-vision-preview | OpenAI | ⭐⭐⭐⭐ | ⭐⭐⭐ | Detailed analysis |
| anthropic/claude-3.5-sonnet | Anthropic | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Complex reasoning |
| google/gemini-2.0-flash-001 | Google | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Fastest option |
| google/gemini-pro-vision | Google | ⭐⭐⭐ | ⭐⭐⭐⭐ | Cost-effective |
Text Generation Models (AI Test Creation)
| OpenRouter Model ID | Provider | Quality | Speed | Best For |
|---|---|---|---|---|
| anthropic/claude-3.5-sonnet | Anthropic | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Best reasoning |
| openai/gpt-4o | OpenAI | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Complex automation |
| openai/gpt-3.5-turbo | OpenAI | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Fast & affordable |
| google/gemini-pro | Google | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Good alternative |
Single API Key Benefits:
- One account for all AI providers
- Unified billing and usage tracking
- Rate limiting across all models
- Easy model switching in the UI
- No separate API keys to manage
Real-Time Execution Logs
Live Log Display
[12:09:15.234] INFO: Initializing Chrome WebDriver...
[12:09:15.456] SUCCESS: Chrome WebDriver initialized successfully
[12:09:15.567] INFO: Running in HYBRID mode (DOM + Vision)
[12:09:15.678] INFO: Navigating to: https://example.com
[12:09:17.123] SUCCESS: Navigated to https://example.com
[12:09:17.234] INFO: AI Vision: Looking for 'email input field' to click
[12:09:17.456] INFO: Analyzing screenshot with AI Vision
[12:09:18.789] SUCCESS: AI Vision found coordinates: (450, 320)
[12:09:18.890] INFO: Clicking at coordinates (450, 320)
[12:09:19.123] SUCCESS: Vision-clicked at coordinates (450, 320)
[12:09:19.234] INFO: Vision-typing '[email protected]' at coordinates (450, 320)
[12:09:19.567] SUCCESS: Vision-typed '[email protected]' at coordinates (450, 320)
Color-Coded Messages
- SUCCESS (green): Operations completed successfully
- INFO (blue): General information and progress
- WARN (yellow): Warnings and fallback actions
- ERROR (red): Failures and issues
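For anyone post-processing these logs, the line shape shown above (`[HH:MM:SS.mmm] LEVEL: message`) is straightforward to parse. The following sketch assumes exactly that shape and the four level names listed; `Level`, `LogEntry`, and `parse_log_line` are hypothetical names, not types exported by the project.

```rust
/// The four severity levels listed in this README.
#[derive(Debug, PartialEq)]
enum Level {
    Info,
    Success,
    Warn,
    Error,
}

/// One parsed log line.
#[derive(Debug, PartialEq)]
struct LogEntry {
    timestamp: String,
    level: Level,
    message: String,
}

/// Parse a line of the form "[HH:MM:SS.mmm] LEVEL: message".
/// Returns None when the line does not match or the level is unknown.
fn parse_log_line(line: &str) -> Option<LogEntry> {
    let rest = line.strip_prefix('[')?;
    let (timestamp, rest) = rest.split_once("] ")?;
    let (level, message) = rest.split_once(": ")?;
    let level = match level {
        "INFO" => Level::Info,
        "SUCCESS" => Level::Success,
        "WARN" => Level::Warn,
        "ERROR" => Level::Error,
        _ => return None,
    };
    Some(LogEntry {
        timestamp: timestamp.to_string(),
        level,
        message: message.to_string(),
    })
}
```

Splitting on the first `": "` only keeps messages such as `Navigating to: https://example.com` intact.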
Advanced Configuration
Programmatic Usage
use automation_browser::{AutomationExecutor, AutomationMode};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create executor with Vision Mode
    let mut executor = AutomationExecutor::new()?
        .with_vision_mode("sk-your-openai-key".to_string(), Some("gpt-4o".to_string()))
        .with_headless(false);

    // Execute workflow (`workflow` is assumed to have been built beforehand)
    let (report, logs) = executor.execute_workflow(&workflow).await?;
    println!("Success rate: {:.1}%", report.success_rate() * 100.0);
    println!("Logs captured: {}", logs.len());
    Ok(())
}
Automation Modes Comparison
| Feature | DOM Mode | Vision Mode | Hybrid Mode |
|---|---|---|---|
| Speed | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Reliability | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Setup Complexity | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Website Changes | ⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Natural Language | ⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
Web Interface Features
Smart Forms
- Live Examples: Click examples to auto-fill forms
- Vision Configuration: Easy setup for AI Vision mode
- Real-time Validation: Immediate feedback on inputs
- Progress Tracking: Live execution status
Enhanced Results
- Execution Statistics: Success rates, timing, step counts
- Visual Logs: Floating panels and detailed terminals
- Screenshot Gallery: Automatic screenshots during execution
- Script Export: Download generated automation scripts
Debug Features
- Step-by-step Breakdown: See each action executed
- Error Highlighting: Clear error messages and solutions
- Retry Logic: Automatic retries with exponential backoff
- Fallback Options: Hybrid mode switches strategies automatically
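The retry behaviour mentioned above is not specified further in this README, so the following is only a sketch of exponential backoff in general: the delay doubles after each failure and is capped at a maximum. The function names, the cap, and the choice to pass the attempt number to the operation are all assumptions made here.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(base: Duration, max: Duration, attempt: u32) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Run `op` until it succeeds or `max_attempts` calls have been made,
/// sleeping with exponential backoff between failures.
/// `op` is always called at least once and receives the 0-based attempt number.
fn retry<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                std::thread::sleep(backoff_delay(base, max, attempt));
                attempt += 1;
            }
        }
    }
}
```

With a base of 100 ms and a cap of 1 s, the waits are 100 ms, 200 ms, 400 ms, 800 ms, and then 1 s for every later attempt. In async code the blocking sleep would be replaced by the runtime's own timer.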
Security & Best Practices
API Key Security
- Store API keys securely (never commit to version control)
- Use environment variables for production
- Rotate keys regularly
- Monitor API usage and costs
Testing Environment
- Use test accounts and sandbox environments
- Avoid testing on production systems
- Set up dedicated test data
- Use headless mode for CI/CD
Website Considerations
- Respect robots.txt and website terms
- Add delays between actions to avoid rate limiting
- Handle dynamic content and loading states
- Consider website anti-automation measures
Troubleshooting
ChromeDriver Issues
Check ChromeDriver status
curl http://localhost:9515/status
Restart ChromeDriver
pkill chromedriver
chromedriver --port=9515
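The same reachability check can be done from Rust before starting a run. This sketch uses only the standard library and merely tests whether something accepts TCP connections on the port; unlike the curl call above, it does not query the /status endpoint. `port_is_open` and `chromedriver_reachable` are hypothetical helper names.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns true if something accepts TCP connections at `addr` within `timeout`.
fn port_is_open(addr: SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}

/// Convenience wrapper for the ChromeDriver port used in this README (9515).
fn chromedriver_reachable() -> bool {
    let addr = SocketAddr::from(([127, 0, 0, 1], 9515));
    port_is_open(addr, Duration::from_secs(1))
}
```

A `false` result usually means ChromeDriver is not running or was started on a different port.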
Environment Variable Issues
Check if .env file exists and has correct format
cat .env
Verify environment variable is loaded
echo $OPENROUTER_API_KEY
Check server status for API key
curl http://localhost:3001/api/env-status
Vision Mode Issues
- API Key: Verify OpenRouter key is valid (sk-or-v1-...)
- Environment: Check .env file or form input
- Model Access: Ensure you have access to vision models on OpenRouter
- Rate Limits: Check API usage quotas on OpenRouter dashboard
- Fallback: Use Hybrid mode for automatic DOM fallback
Common Automation Issues
- Element Not Found: Try Vision mode for robust element detection
- Timing Issues: Add waits for dynamic content
- Website Changes: Vision mode adapts automatically
- Anti-bot Detection: Use realistic delays and human-like patterns
Architecture
System Components
+-----------------+     +------------------+     +-----------------+
|     Web UI      |     |    OpenRouter    |     |  ChromeDriver   |
|   (Axum/HTML)   |<--->| (Unified AI API) |     |    (Browser)    |
+-----------------+     +------------------+     +-----------------+
         |                       |                        |
         v                       v                        v
+-----------------+     +------------------+     +-----------------+
| Automation API  |     |  Vision Engine   |     | Browser Actions |
|   (Workflow)    |<--->|  (Screenshots)   |<--->|  (Click/Type)   |
+-----------------+     +------------------+     +-----------------+
Integration Benefits:
- Single API endpoint for all AI models
- Environment variable configuration (.env)
- Automatic failover between providers
- Cost optimization through unified billing
Crate Structure
- automation-ui: Web interface and server
- automation-browser: Chrome automation with Vision support
- automation-api: Core workflow and data structures
- automation-integration: Pipeline orchestration
- automation-ai: AI model integration
NEW: Enhanced OpenRouter LLM Vision Demo
Experience the latest OpenRouter integration with cutting-edge vision models:
Set your API key
export OPENROUTER_API_KEY=sk-or-v1-your-key-here
Run the comprehensive computer vision demo
cargo run --example computer_vision_demo -p automation-browser
NEW: Run the enhanced OpenRouter LLM vision demo
cargo run --example openrouter_vision_demo -p automation-browser
Run the enhanced Google search demo
cargo run --example enhanced_google_search -p automation-browser
Latest OpenRouter Features
Enhanced Model Support (Updated 2024):
- anthropic/claude-3-5-sonnet-20241022: Latest Claude 3.5 Sonnet
- openai/gpt-4o-2024-11-20: Latest GPT-4o
- openai/gpt-4o-mini-2024-07-18: Budget-friendly vision
- google/gemini-pro-1.5: Google's vision model
- anthropic/claude-3-5-haiku-20241022: Fast HTML analysis
Auto-Optimization Features:
- Automatic model selection for each strategy
- Model-specific prompt engineering
- Enhanced error handling with detailed messages
- Performance benchmarks and comparisons
Smart Vision Strategies:
- DOM Inspection: AI analyzes HTML (faster, cheaper)
- Coordinate-Based: AI analyzes screenshots (more robust)
- Adaptive: Tries DOM first, falls back to coordinates
Example Usage:
let mut engine = ChromeAutomationEngine::new(false)
    .with_vision_mode(api_key, None)
    .with_vision_strategy(VisionStrategy::Adaptive);

// Use convenient model shortcuts
engine.set_vision_model("claude");      // -> claude-3-5-sonnet-20241022
engine.set_vision_model("gpt-4o");      // -> gpt-4o-2024-11-20
engine.set_vision_model("gpt-4o-mini"); // -> gpt-4o-mini-2024-07-18

// Auto-optimize for strategy
engine.with_optimal_model_for_strategy(&VisionStrategy::CoordinateBased);
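How the shortcuts resolve internally is not shown in this README. A plausible sketch is a simple lookup from alias to the full model IDs listed under Enhanced Model Support. `resolve_model` is a hypothetical function, and both the provider prefixes on the returned IDs and the pass-through of unknown names are assumptions of this sketch.

```rust
/// Map a short alias to a full model ID from the list in this section.
/// Names that are not known aliases are returned unchanged, so a caller
/// can still pass a complete model ID.
fn resolve_model(alias: &str) -> &str {
    match alias {
        "claude" => "anthropic/claude-3-5-sonnet-20241022",
        "gpt-4o" => "openai/gpt-4o-2024-11-20",
        "gpt-4o-mini" => "openai/gpt-4o-mini-2024-07-18",
        other => other,
    }
}
```

Keeping the table in one function means a new dated model release only needs a one-line change.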
Contributing
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Add tests for your changes
- Update documentation as needed
- Submit a pull request
Contribution Areas
- Vision Model Support: Add new AI vision providers
- Browser Support: Firefox, Safari automation
- UI Enhancements: Better visual design
- Performance: Optimization and caching
- Testing: More comprehensive test coverage
License
MIT License - see LICENSE file for details.
Acknowledgments
- Skyvern: Inspiration for AI Vision automation
- OpenAI: GPT-4 Vision capabilities
- Anthropic: Claude vision and reasoning
- Selenium: Browser automation foundation
- Rust Community: Amazing ecosystem and support
Built with ❤️ using Rust, AI Vision, and Real Browser Automation
Star this repo • Report Issues • Request Features