AC to Automation Converter
An AI-powered system that converts Acceptance Criteria (AC) from QA specifications into automated browser testing workflows.
AI-Powered Browser Automation with Vision
Convert natural language into real browser automation using AI Vision and execute tests immediately with live process logs!
Inspired by Skyvern, this system combines traditional DOM automation with AI Vision that "sees" web pages like a human.
Key Features
AI Vision Mode
- Visual Understanding: AI analyzes screenshots to find elements visually
- Robust Automation: Works even when websites change their HTML structure
- Natural Descriptions: Use "click the blue login button" instead of CSS selectors
- Human-like Interaction: Locates elements from the rendered page, the way a human user would
Three Automation Modes
- DOM Mode: Traditional CSS selector-based (fast)
- Vision Mode: AI visual understanding (robust)
- Hybrid Mode: Smart fallback - tries DOM first, uses Vision if needed
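The Hybrid fallback can be sketched in a few lines of Rust. This is a minimal illustration, not the project's actual implementation: `Strategy`, `Point`, and `locate_hybrid` are hypothetical names, and the two locators are passed in as closures so the sketch stays independent of any browser or AI client.

```rust
/// Which strategy ended up locating the element (hypothetical type).
#[derive(Debug, PartialEq)]
enum Strategy {
    Dom,
    Vision,
}

/// Hypothetical locator result: screen coordinates of the element.
type Point = (u32, u32);

/// Try the DOM locator first; call the vision locator only if DOM fails.
/// If both fail, the error message reports both causes.
fn locate_hybrid<D, V>(dom: D, vision: V) -> Result<(Strategy, Point), String>
where
    D: FnOnce() -> Result<Point, String>,
    V: FnOnce() -> Result<Point, String>,
{
    match dom() {
        Ok(point) => Ok((Strategy::Dom, point)),
        Err(dom_err) => vision()
            .map(|point| (Strategy::Vision, point))
            .map_err(|vision_err| {
                format!("DOM failed ({dom_err}); Vision failed ({vision_err})")
            }),
    }
}
```

When the DOM closure succeeds, the vision closure is never called, which is why a hybrid approach can stay close to DOM speed on pages where selectors still match.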
Real-Time Process Logs
- Live Execution Logs: Watch automation steps in real-time
- Floating Log Panel: See progress without scrolling
- Color-coded Messages: Easy to spot successes, warnings, and errors
- Detailed Timestamps: Track execution timing precisely
Immediate Execution
- Real Browser Testing: Uses ChromeDriver for actual browser interaction
- AI-Powered Generation: OpenRouter AI converts natural language to automation
- Multiple Script Formats: Generate MCP Browser, Selenium, and Playwright scripts
- Visual Feedback: Screenshots and detailed execution reports
What You Can Automate
User Workflows
- Login/registration flows
- E-commerce checkout processes
- Form submissions and validations
- Multi-step wizards
Visual Interactions
- Click buttons by description ("red submit button")
- Find inputs by visual context ("email field in top-right")
- Navigate by visual landmarks ("menu button with hamburger icon")
- Verify visual states ("success message appears")
Content Testing
- Text presence verification
- Element visibility checks
- Page state validation
- Dynamic content testing
Setup & Installation
Prerequisites
- Install Rust:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- Install ChromeDriver:
macOS with Homebrew
brew install chromedriver
Ubuntu/Debian
sudo apt-get install chromium-chromedriver
Windows: Download from https://chromedriver.chromium.org/
- Get OpenRouter API Key (Recommended):
- Sign up at openrouter.ai
- Get your API key (starts with sk-or-v1-...)
- One key works for both AI generation and vision models!
Quick Start
- Clone and Build:
git clone <repository-url>
cd ai-ac-automation
cargo build --release
- Configure Environment Variables (Recommended):
Create .env file with your OpenRouter API key
echo "OPENROUTER_API_KEY=sk-or-v1-your-actual-key-here" > .env
- Start ChromeDriver (in separate terminal):
chromedriver --port=9515
- Start the Web Interface:
cargo run --bin automation-ui
- Open Browser: Go to http://localhost:3001
API Key Configuration
You have two options for configuring your OpenRouter API key:
Option 1: Environment Variables (Recommended)
Create .env file in project root
echo "OPENROUTER_API_KEY=sk-or-v1-your-key-here" > .env
Start the server - API key loaded automatically!
cargo run --bin automation-ui
Benefits:
- Secure: API key never appears in UI or logs
- Convenient: No need to enter key every time
- Universal: Works for both AI generation and vision
- Safe: .env is in .gitignore, so it won't be committed
Option 2: Web Form
- Leave .env empty or don't create it
- Enter the API key directly in the web interface forms
- Works for individual sessions
Pro Tip: Use Option 1 for development, Option 2 for sharing/demos!
Environment File (.env) Format
Your .env file should contain:
Required: OpenRouter API key for all AI features
OPENROUTER_API_KEY=sk-or-v1-your-actual-key-here
Optional: Default models (can be changed in UI)
AI_MODEL=anthropic/claude-3.5-sonnet
VISION_MODEL=openai/gpt-4o
Optional: Browser settings
HEADLESS=false
BROWSER_WIDTH=1920
BROWSER_HEIGHT=1080
Optional: Server port
PORT=3001
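For reference, a file in this format can be read with a few lines of standard-library Rust. The sketch below is an assumption about how such parsing could look, not the project's own loader (many Rust projects use a crate such as dotenvy for this); `parse_env` and `port` are hypothetical helpers, and the 3001 fallback simply mirrors the port used in the Quick Start.

```rust
use std::collections::HashMap;

/// Parse KEY=VALUE lines, skipping blank lines and `#` comments.
fn parse_env(contents: &str) -> HashMap<String, String> {
    let mut vars = HashMap::new();
    for line in contents.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Split on the first '=' only, so values may themselves contain '='.
        if let Some((key, value)) = line.split_once('=') {
            let value = value.trim().trim_matches('"');
            vars.insert(key.trim().to_string(), value.to_string());
        }
    }
    vars
}

/// Read PORT, falling back to 3001 (assumed default) when missing or invalid.
fn port(vars: &HashMap<String, String>) -> u16 {
    vars.get("PORT")
        .and_then(|p| p.parse().ok())
        .unwrap_or(3001)
}
```

Lines without an `=` are ignored rather than treated as errors, which keeps the sketch tolerant of stray text in the file.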
Security Notes:
- Never commit .env to version control
- Keep your API keys secure and rotate them regularly
- Use different keys for development and production
How to Use
Basic DOM Automation
- Enter URL: https://google.com
- Test Scenario:
- Click on search box
- Type "browser automation"
- Press Enter
- Verify results appear
- Click on first result
- Execute: Check "Execute immediately" → Click "Generate & Execute"
AI Vision Mode (Recommended)
- Enable Vision: Check "Use AI Vision Mode (Like Skyvern)"
- Configure:
- Mode: Hybrid (tries DOM first, falls back to Vision)
- API Key: Auto-loaded from .env or enter manually
- Model: GPT-4 Omni (recommended)
- Natural Test Scenario:
Website: https://example.com/login
Test:
- Find the email input field
- Type [email protected]
- Find the password field
- Type mypassword123
- Click the blue login button
- Verify the dashboard appears
AI-Powered Test Generation
- Enable AI: Check "Use AI-Powered Automation"
- API Key: Auto-loaded from .env or enter your OpenRouter key
- Describe Naturally:
Test the login functionality:
- User should be able to log in with valid credentials
- After login, dashboard should be visible
- User profile should show correct information
- Logout should work properly
Example with Environment Variables:
If you have OPENROUTER_API_KEY in your .env file:
- No API key entry needed; it works automatically
- Same key works for both AI generation and vision
- Secure: the key never appears in forms or logs
- Fast: instant access to all AI features
Supported AI Models
All models are available through OpenRouter with a single API key!
Vision Models (Real AI Vision Integration)
| OpenRouter Model ID | Provider | Vision Quality | Speed | Best For |
|---|---|---|---|---|
| openai/gpt-4o | OpenAI | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Overall best choice |
| openai/gpt-4-vision-preview | OpenAI | ⭐⭐⭐⭐ | ⭐⭐⭐ | Detailed analysis |
| anthropic/claude-3.5-sonnet | Anthropic | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Complex reasoning |
| google/gemini-2.0-flash-001 | Google | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Fastest option |
| google/gemini-pro-vision | Google | ⭐⭐⭐ | ⭐⭐⭐⭐ | Cost-effective |
Text Generation Models (AI Test Creation)
| OpenRouter Model ID | Provider | Quality | Speed | Best For |
|---|---|---|---|---|
| anthropic/claude-3.5-sonnet | Anthropic | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Best reasoning |
| openai/gpt-4o | OpenAI | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Complex automation |
| openai/gpt-3.5-turbo | OpenAI | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Fast & affordable |
| google/gemini-pro | Google | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Good alternative |
Single API Key Benefits:
- One account for all AI providers
- Unified billing and usage tracking
- Rate limiting across all models
- Easy model switching in the UI
- No separate API keys to manage
Real-Time Execution Logs
Live Log Display
[12:09:15.234] INFO: Initializing Chrome WebDriver...
[12:09:15.456] SUCCESS: Chrome WebDriver initialized successfully
[12:09:15.567] INFO: Running in HYBRID mode (DOM + Vision)
[12:09:15.678] INFO: Navigating to: https://example.com
[12:09:17.123] SUCCESS: Navigated to https://example.com
[12:09:17.234] INFO: AI Vision: Looking for 'email input field' to click
[12:09:17.456] INFO: Analyzing screenshot with AI Vision
[12:09:18.789] SUCCESS: AI Vision found coordinates: (450, 320)
[12:09:18.890] INFO: Clicking at coordinates (450, 320)
[12:09:19.123] SUCCESS: Vision-clicked at coordinates (450, 320)
[12:09:19.234] INFO: Vision-typing '[email protected]' at coordinates (450, 320)
[12:09:19.567] SUCCESS: Vision-typed '[email protected]' at coordinates (450, 320)
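Because each entry follows a `[timestamp] LEVEL: message` shape, the log is easy to post-process. A minimal sketch, assuming that shape holds for every line; `Level`, `LogEntry`, and `parse_log_line` are hypothetical names, not part of the project's API.

```rust
/// Log severity, matching the four levels listed in this README.
#[derive(Debug, PartialEq)]
enum Level {
    Info,
    Success,
    Warn,
    Error,
}

/// One parsed log entry (hypothetical structure).
#[derive(Debug, PartialEq)]
struct LogEntry {
    timestamp: String,
    level: Level,
    message: String,
}

/// Parse a line of the form `[HH:MM:SS.mmm] LEVEL: message`.
/// Returns None for lines that do not match that shape.
fn parse_log_line(line: &str) -> Option<LogEntry> {
    let rest = line.trim().strip_prefix('[')?;
    let (timestamp, rest) = rest.split_once("] ")?;
    let (level, message) = rest.split_once(": ")?;
    let level = match level {
        "INFO" => Level::Info,
        "SUCCESS" => Level::Success,
        "WARN" => Level::Warn,
        "ERROR" => Level::Error,
        _ => return None,
    };
    Some(LogEntry {
        timestamp: timestamp.to_string(),
        level,
        message: message.trim().to_string(),
    })
}
```

Splitting on the first `"] "` and then the first `": "` keeps URLs and timestamps inside the message intact, since neither contains those exact separators before the level name.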
Color-Coded Messages
- SUCCESS (green): Operations completed successfully
- INFO (blue): General information and progress
- WARN (yellow): Warnings and fallback actions
- ERROR (red): Failures and issues
Advanced Configuration
Programmatic Usage
use automation_browser::{AutomationExecutor, AutomationMode};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create an executor with Vision Mode enabled
    let mut executor = AutomationExecutor::new()?
        .with_vision_mode("sk-your-openai-key".to_string(), Some("gpt-4o".to_string()))
        .with_headless(false);

    // Execute the workflow (`workflow` is assumed to have been built beforehand)
    let (report, logs) = executor.execute_workflow(&workflow).await?;
    println!("Success rate: {:.1}%", report.success_rate() * 100.0);
    println!("Logs captured: {}", logs.len());
    Ok(())
}
Automation Modes Comparison
| Feature | DOM Mode | Vision Mode | Hybrid Mode |
|---|---|---|---|
| Speed | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Reliability | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Setup Complexity | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Website Changes | ❌ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Natural Language | ❌ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
Web Interface Features
Smart Forms
- Live Examples: Click examples to auto-fill forms
- Vision Configuration: Easy setup for AI Vision mode
- Real-time Validation: Immediate feedback on inputs
- Progress Tracking: Live execution status
Enhanced Results
- Execution Statistics: Success rates, timing, step counts
- Visual Logs: Floating panels and detailed terminals
- Screenshot Gallery: Automatic screenshots during execution
- Script Export: Download generated automation scripts
Debug Features
- Step-by-step Breakdown: See each action executed
- Error Highlighting: Clear error messages and solutions
- Retry Logic: Automatic retries with exponential backoff
- Fallback Options: Hybrid mode switches strategies automatically
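The retry behaviour can be pictured as follows. This is a sketch of generic exponential backoff, not the project's retry code: the function names, the base delay, and the cap are all assumptions, and the sleep function is injected so the logic can be exercised without real waiting.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate instead of overflowing for very large attempt numbers.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Run `op` up to `max_attempts` times (at least once), sleeping with
/// exponential backoff between failures. Returns the last error if all fail.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the most recent error.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

In real use, `sleep` would be `std::thread::sleep` (or an async equivalent), and `op` would wrap a single browser action such as a click.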
Security & Best Practices
API Key Security
- Store API keys securely (never commit to version control)
- Use environment variables for production
- Rotate keys regularly
- Monitor API usage and costs
Testing Environment
- Use test accounts and sandbox environments
- Avoid testing on production systems
- Set up dedicated test data
- Use headless mode for CI/CD
Website Considerations
- Respect robots.txt and website terms
- Add delays between actions to avoid rate limiting
- Handle dynamic content and loading states
- Consider website anti-automation measures
Troubleshooting
ChromeDriver Issues
Check ChromeDriver status
curl http://localhost:9515/status
Restart ChromeDriver
pkill chromedriver
chromedriver --port=9515
Environment Variable Issues
Check if .env file exists and has correct format
cat .env
Verify environment variable is loaded
echo $OPENROUTER_API_KEY
Check server status for API key
curl http://localhost:3001/api/env-status
Vision Mode Issues
- API Key: Verify OpenRouter key is valid (sk-or-v1-...)
- Environment: Check .env file or form input
- Model Access: Ensure you have access to vision models on OpenRouter
- Rate Limits: Check API usage quotas on OpenRouter dashboard
- Fallback: Use Hybrid mode for automatic DOM fallback
Common Automation Issues
- Element Not Found: Try Vision mode for robust element detection
- Timing Issues: Add waits for dynamic content
- Website Changes: Vision mode adapts automatically
- Anti-bot Detection: Use realistic delays and human-like patterns
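For timing issues, a common approach is to poll a condition until it holds or a timeout expires. A minimal sketch, assuming a caller-supplied check (for example "is the element visible yet?"); `wait_until` is a hypothetical helper, not a function provided by this project.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Poll `condition` every `interval` until it returns true or `timeout` elapses.
/// Returns true if the condition was met, false on timeout.
fn wait_until(
    timeout: Duration,
    interval: Duration,
    mut condition: impl FnMut() -> bool,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if condition() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}
```

The condition is always checked at least once, so a zero timeout still succeeds when the content is already present.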
Architecture
System Components
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│     Web UI      │    │    OpenRouter    │    │  ChromeDriver   │
│   (Axum/HTML)   │◄──►│ (Unified AI API) │    │    (Browser)    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                      │                       │
         ▼                      ▼                       ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ Automation API  │    │  Vision Engine   │    │ Browser Actions │
│   (Workflow)    │◄──►│  (Screenshots)   │◄──►│  (Click/Type)   │
└─────────────────┘    └──────────────────┘    └─────────────────┘
Integration Benefits:
- Single API endpoint for all AI models
- Environment variable configuration (.env)
- Automatic failover between providers
- Cost optimization through unified billing
Crate Structure
- automation-ui: Web interface and server
- automation-browser: Chrome automation with Vision support
- automation-api: Core workflow and data structures
- automation-integration: Pipeline orchestration
- automation-ai: AI model integration
NEW: Enhanced OpenRouter LLM Vision Demo
Experience the latest OpenRouter integration with cutting-edge vision models:
Set your API key
export OPENROUTER_API_KEY=sk-or-v1-your-key-here
Run the comprehensive computer vision demo
cargo run --example computer_vision_demo -p automation-browser
NEW: Run the enhanced OpenRouter LLM vision demo
cargo run --example openrouter_vision_demo -p automation-browser
Run the enhanced Google search demo
cargo run --example enhanced_google_search -p automation-browser
Latest OpenRouter Features
Enhanced Model Support (Updated 2024):
- anthropic/claude-3-5-sonnet-20241022 - Latest Claude 3.5 Sonnet
- openai/gpt-4o-2024-11-20 - Latest GPT-4o
- openai/gpt-4o-mini-2024-07-18 - Budget-friendly vision
- google/gemini-pro-1.5 - Google's vision model
- anthropic/claude-3-5-haiku-20241022 - Fast HTML analysis
Auto-Optimization Features:
- Automatic model selection for each strategy
- Model-specific prompt engineering
- Enhanced error handling with detailed messages
- Performance benchmarks and comparisons
Smart Vision Strategies:
- DOM Inspection: AI analyzes HTML (faster, cheaper)
- Coordinate-Based: AI analyzes screenshots (more robust)
- Adaptive: Tries DOM first, falls back to coordinates
Example Usage:
let mut engine = ChromeAutomationEngine::new(false)
    .with_vision_mode(api_key, None)
    .with_vision_strategy(VisionStrategy::Adaptive);

// Use convenient model shortcuts
engine.set_vision_model("claude");      // → claude-3-5-sonnet-20241022
engine.set_vision_model("gpt-4o");      // → gpt-4o-2024-11-20
engine.set_vision_model("gpt-4o-mini"); // → gpt-4o-mini-2024-07-18

// Auto-optimize for strategy
engine.with_optimal_model_for_strategy(&VisionStrategy::CoordinateBased);
Contributing
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Add tests for your changes
- Update documentation as needed
- Submit a pull request
Contribution Areas
- Vision Model Support: Add new AI vision providers
- Browser Support: Firefox, Safari automation
- UI Enhancements: Better visual design
- Performance: Optimization and caching
- Testing: More comprehensive test coverage
License
MIT License - see LICENSE file for details.
Acknowledgments
- Skyvern: Inspiration for AI Vision automation
- OpenAI: GPT-4 Vision capabilities
- Anthropic: Claude vision and reasoning
- Selenium: Browser automation foundation
- Rust Community: Amazing ecosystem and support
Built with ❤️ using Rust, AI Vision, and Real Browser Automation
Star this repo • Report Issues • Request Features