AC to Automation Converter

An AI-powered system that converts Acceptance Criteria (AC) from QA specifications into automated browser testing workflows.

🤖 AI-Powered Browser Automation with Vision

Convert natural language into real browser automation using AI Vision and execute tests immediately with live process logs!

Inspired by Skyvern, this system combines traditional DOM automation with AI Vision that "sees" web pages like a human.

✨ Key Features

👁️ AI Vision Mode

  • Visual Understanding: AI analyzes screenshots to find elements visually
  • Robust Automation: Works even when websites change their HTML structure
  • Natural Descriptions: Use "click the blue login button" instead of CSS selectors
  • Human-like Interaction: Sees pages exactly like humans do

🔧 Three Automation Modes

  • DOM Mode: Traditional CSS selector-based (fast)
  • Vision Mode: AI visual understanding (robust)
  • Hybrid Mode: Smart fallback - tries DOM first, uses Vision if needed
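The Hybrid fallback described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: `Strategy`, `locate_hybrid`, and the closure-based locators are hypothetical names chosen so the example runs without a browser or an AI client.

```rust
/// Which strategy ended up locating the element (hypothetical type).
#[derive(Debug, PartialEq)]
enum Strategy {
    Dom,
    Vision,
}

/// Try the DOM locator first; fall back to the vision locator if it fails.
/// Both locators are plain closures that return screen coordinates.
fn locate_hybrid<D, V>(dom: D, vision: V) -> Result<(Strategy, (u32, u32)), String>
where
    D: FnOnce() -> Result<(u32, u32), String>,
    V: FnOnce() -> Result<(u32, u32), String>,
{
    match dom() {
        Ok(point) => Ok((Strategy::Dom, point)),
        Err(dom_err) => vision()
            .map(|point| (Strategy::Vision, point))
            .map_err(|vision_err| {
                format!("DOM failed ({}); Vision failed ({})", dom_err, vision_err)
            }),
    }
}

fn main() {
    // Assumed scenario: the CSS selector lookup fails, the vision lookup succeeds.
    let result = locate_hybrid(
        || Err("selector #login not found".to_string()),
        || Ok((450, 320)),
    );
    println!("{:?}", result);
}
```

The vision closure is only called when the DOM lookup fails, so the slower, paid model call is skipped whenever a selector works.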

πŸ“ Real-Time Process Logs

  • Live Execution Logs: Watch automation steps in real-time
  • Floating Log Panel: See progress without scrolling
  • Color-coded Messages: Easy to spot successes, warnings, and errors
  • Detailed Timestamps: Track execution timing precisely

🚀 Immediate Execution

  • Real Browser Testing: Uses ChromeDriver for actual browser interaction
  • AI-Powered Generation: OpenRouter AI converts natural language to automation
  • Multiple Script Formats: Generate MCP Browser, Selenium, and Playwright scripts
  • Visual Feedback: Screenshots and detailed execution reports

🎯 What You Can Automate

🔄 User Workflows

  • Login/registration flows
  • E-commerce checkout processes
  • Form submissions and validations
  • Multi-step wizards

🎨 Visual Interactions

  • Click buttons by description ("red submit button")
  • Find inputs by visual context ("email field in top-right")
  • Navigate by visual landmarks ("menu button with hamburger icon")
  • Verify visual states ("success message appears")

📊 Content Testing

  • Text presence verification
  • Element visibility checks
  • Page state validation
  • Dynamic content testing

🛠️ Setup & Installation

📋 Prerequisites

  1. Install Rust:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

  2. Install ChromeDriver:

macOS with Homebrew

brew install chromedriver

Ubuntu/Debian

sudo apt-get install chromium-chromedriver

Windows: Download from https://chromedriver.chromium.org/

  3. Get OpenRouter API Key (Recommended):
    • Sign up at openrouter.ai
    • Get your API key (starts with sk-or-v1-...)
    • 🎉 One key works for both AI generation AND vision models!

🚀 Quick Start

  1. Clone and Build:

git clone <repository-url>
cd ai-ac-automation
cargo build --release

  2. Configure Environment Variables (Recommended):

Create .env file with your OpenRouter API key

echo "OPENROUTER_API_KEY=sk-or-v1-your-actual-key-here" > .env

  3. Start ChromeDriver (in a separate terminal):

chromedriver --port=9515

  4. Start the Web Interface:

cargo run --bin automation-ui

  5. Open Browser: Go to http://localhost:3001

πŸ” API Key Configuration

You have two options for configuring your OpenRouter API key:

Option 1: Environment Variables (Recommended)

Create .env file in project root

echo "OPENROUTER_API_KEY=sk-or-v1-your-key-here" > .env

Start the server - API key loaded automatically!

cargo run --bin automation-ui

✅ Benefits:

  • Secure: API key never appears in UI or logs
  • Convenient: No need to enter key every time
  • Universal: Works for both AI generation and vision
  • Safe: .env is in .gitignore - won't be committed

Option 2: Web Form

  • Leave .env empty or don't create it
  • Enter API key directly in the web interface forms
  • Works for individual sessions

💡 Pro Tip: Use Option 1 for development, Option 2 for sharing/demos!

📄 Environment File (.env) Format

Your .env file should contain:

Required: OpenRouter API key for all AI features

OPENROUTER_API_KEY=sk-or-v1-your-actual-key-here

Optional: Default models (can be changed in UI)

AI_MODEL=anthropic/claude-3.5-sonnet
VISION_MODEL=openai/gpt-4o

Optional: Browser settings

HEADLESS=false
BROWSER_WIDTH=1920
BROWSER_HEIGHT=1080

Optional: Server port

PORT=3001
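As a rough illustration of how the variables above could be read, here is a minimal sketch using only the standard library. It is not the project's actual loader: `parse_env`, `Settings`, and `settings_from` are hypothetical names, and the fallback values (`HEADLESS=false`, `PORT=3001`) are assumptions taken from the sample values shown above.

```rust
use std::collections::HashMap;

/// Parse `KEY=VALUE` lines, skipping blank lines and `#` comments.
fn parse_env(contents: &str) -> HashMap<String, String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// A subset of the settings listed above (hypothetical struct).
struct Settings {
    api_key: Option<String>,
    headless: bool,
    port: u16,
}

/// Build settings, falling back to assumed defaults for missing or invalid values.
fn settings_from(env: &HashMap<String, String>) -> Settings {
    Settings {
        api_key: env.get("OPENROUTER_API_KEY").cloned(),
        headless: env.get("HEADLESS").map(|v| v == "true").unwrap_or(false),
        port: env.get("PORT").and_then(|v| v.parse().ok()).unwrap_or(3001),
    }
}

fn main() {
    let sample = "# Required\nOPENROUTER_API_KEY=sk-or-v1-your-actual-key-here\nHEADLESS=false\nPORT=3001\n";
    let settings = settings_from(&parse_env(sample));
    // Report only whether a key is present, so the key itself never reaches the output.
    println!(
        "key present: {}, headless: {}, port: {}",
        settings.api_key.is_some(),
        settings.headless,
        settings.port
    );
}
```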

🔒 Security Notes:

  • Never commit .env to version control
  • Keep your API keys secure and rotate them regularly
  • Use different keys for development and production

🎯 How to Use

🔧 Basic DOM Automation

  1. Enter URL: https://google.com
  2. Test Scenario:
- Click on search box
- Type "browser automation"
- Press Enter
- Verify results appear
- Click on first result

  3. Execute: Check "Execute immediately" → Click "Generate & Execute"

👁️ AI Vision Mode (Recommended)

  1. Enable Vision: ✅ Check "Use AI Vision Mode (Like Skyvern)"
  2. Configure:
    • Mode: Hybrid (tries DOM first, falls back to Vision)
    • API Key: Auto-loaded from .env or enter manually
    • Model: GPT-4 Omni (recommended)
  3. Natural Test Scenario:
Website: https://example.com/login
Test:
- Find the email input field
- Type the test account's email address
- Find the password field
- Type mypassword123
- Click the blue login button
- Verify the dashboard appears

🤖 AI-Powered Test Generation

  1. Enable AI: ✅ Check "Use AI-Powered Automation"
  2. API Key: Auto-loaded from .env or enter your OpenRouter key
  3. Describe Naturally:
Test the login functionality:
- User should be able to log in with valid credentials
- After login, dashboard should be visible
- User profile should show correct information
- Logout should work properly

🎯 Example with Environment Variables:

If you have OPENROUTER_API_KEY in your .env file:

  • ✅ No API key entry needed - works automatically!
  • ✅ Same key works for both AI generation and vision
  • ✅ Secure - never appears in forms or logs
  • ✅ Fast - instant access to all AI features

🧠 Supported AI Models

πŸŽ‰ All models available through OpenRouter with a single API key!

πŸ‘οΈ Vision Models (Real AI Vision Integration)

OpenRouter Model ID | Provider | Vision Quality + Speed | Best For
openai/gpt-4o | OpenAI | ⭐⭐⭐⭐⭐⭐⭐⭐⭐ | Overall best choice 🌟
openai/gpt-4-vision-preview | OpenAI | ⭐⭐⭐⭐⭐⭐⭐ | Detailed analysis
anthropic/claude-3.5-sonnet | Anthropic | ⭐⭐⭐⭐⭐⭐⭐⭐⭐ | Complex reasoning
google/gemini-2.0-flash-001 | Google | ⭐⭐⭐⭐⭐⭐⭐⭐⭐ | Fastest option 🚀
google/gemini-pro-vision | Google | ⭐⭐⭐⭐⭐⭐⭐ | Cost-effective

🧠 Text Generation Models (AI Test Creation)

OpenRouter Model ID | Provider | Quality + Speed | Best For
anthropic/claude-3.5-sonnet | Anthropic | ⭐⭐⭐⭐⭐⭐⭐⭐⭐ | Best reasoning 🧠
openai/gpt-4o | OpenAI | ⭐⭐⭐⭐⭐⭐⭐⭐⭐ | Complex automation
openai/gpt-3.5-turbo | OpenAI | ⭐⭐⭐⭐⭐⭐⭐⭐⭐ | Fast & affordable 💰
google/gemini-pro | Google | ⭐⭐⭐⭐⭐⭐⭐⭐ | Good alternative

🔑 Single API Key Benefits:

  • One account for all AI providers
  • Unified billing and usage tracking
  • Rate limiting across all models
  • Easy model switching in the UI
  • No separate API keys to manage

📊 Real-Time Execution Logs

🎨 Live Log Display

[12:09:15.234] INFO: 🚀 Initializing Chrome WebDriver...
[12:09:15.456] SUCCESS: ✅ Chrome WebDriver initialized successfully
[12:09:15.567] INFO: 🔧 Running in HYBRID mode (DOM + Vision)
[12:09:15.678] INFO: 🌐 Navigating to: https://example.com
[12:09:17.123] SUCCESS: ✅ Navigated to https://example.com
[12:09:17.234] INFO: 👁️ AI Vision: Looking for 'email input field' to click
[12:09:17.456] INFO: 🧠 Analyzing screenshot with AI Vision
[12:09:18.789] SUCCESS: ✅ AI Vision found coordinates: (450, 320)
[12:09:18.890] INFO: 🖱️ Clicking at coordinates (450, 320)
[12:09:19.123] SUCCESS: ✅ Vision-clicked at coordinates (450, 320)
[12:09:19.234] INFO: ⌨️ Vision-typing '[email protected]' at coordinates (450, 320)
[12:09:19.567] SUCCESS: ✅ Vision-typed '[email protected]' at coordinates (450, 320)

🎨 Color-Coded Messages

  • 🟢 SUCCESS: Operations completed successfully
  • 🔵 INFO: General information and progress
  • 🟡 WARN: Warnings and fallback actions
  • 🔴 ERROR: Failures and issues
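The `[HH:MM:SS.mmm] LEVEL: message` layout and the four levels above could be produced by something like the sketch below. `Level`, `format_timestamp`, and `format_entry` are hypothetical names, and the ANSI colors are just one assumed way to render the color coding in a terminal; the web UI may well do this differently.

```rust
/// Log levels matching the four categories listed above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Level {
    Success,
    Info,
    Warn,
    Error,
}

impl Level {
    fn label(self) -> &'static str {
        match self {
            Level::Success => "SUCCESS",
            Level::Info => "INFO",
            Level::Warn => "WARN",
            Level::Error => "ERROR",
        }
    }

    /// ANSI foreground color code: green, blue, yellow, red (an assumed mapping).
    fn ansi_color(self) -> &'static str {
        match self {
            Level::Success => "32",
            Level::Info => "34",
            Level::Warn => "33",
            Level::Error => "31",
        }
    }
}

/// Format milliseconds since midnight as `HH:MM:SS.mmm`.
fn format_timestamp(ms_since_midnight: u64) -> String {
    let millis = ms_since_midnight % 1000;
    let total_secs = ms_since_midnight / 1000;
    let hours = total_secs / 3600 % 24;
    let minutes = total_secs / 60 % 60;
    let seconds = total_secs % 60;
    format!("{:02}:{:02}:{:02}.{:03}", hours, minutes, seconds, millis)
}

/// Build one log line in the layout shown in the sample output.
fn format_entry(ms_since_midnight: u64, level: Level, message: &str) -> String {
    format!("[{}] {}: {}", format_timestamp(ms_since_midnight), level.label(), message)
}

fn main() {
    let level = Level::Info;
    let line = format_entry(43_755_234, level, "Initializing Chrome WebDriver...");
    // Wrap the line in the level's color, then reset the terminal color.
    println!("\x1b[{}m{}\x1b[0m", level.ansi_color(), line);
}
```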

🔧 Advanced Configuration

🖥️ Programmatic Usage

use automation_browser::{AutomationExecutor, AutomationMode};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create an executor with Vision Mode enabled
    let mut executor = AutomationExecutor::new()?
        .with_vision_mode("sk-your-openai-key".to_string(), Some("gpt-4o".to_string()))
        .with_headless(false);

    // Execute the workflow (`workflow` must be built beforehand; its construction is not shown here)
    let (report, logs) = executor.execute_workflow(&workflow).await?;

    println!("Success rate: {:.1}%", report.success_rate() * 100.0);
    println!("Logs captured: {}", logs.len());

    Ok(())
}

⚙️ Automation Modes Comparison

Feature | DOM Mode + Vision Mode + Hybrid Mode
Speed | ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Reliability | ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Setup Complexity | ⭐⭐⭐⭐⭐⭐⭐⭐⭐
Website Changes | ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Natural Language | ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐

🎨 Web Interface Features

📋 Smart Forms

  • Live Examples: Click examples to auto-fill forms
  • Vision Configuration: Easy setup for AI Vision mode
  • Real-time Validation: Immediate feedback on inputs
  • Progress Tracking: Live execution status

📊 Enhanced Results

  • Execution Statistics: Success rates, timing, step counts
  • Visual Logs: Floating panels and detailed terminals
  • Screenshot Gallery: Automatic screenshots during execution
  • Script Export: Download generated automation scripts

🔧 Debug Features

  • Step-by-step Breakdown: See each action executed
  • Error Highlighting: Clear error messages and solutions
  • Retry Logic: Automatic retries with exponential backoff
  • Fallback Options: Hybrid mode switches strategies automatically
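Retrying with exponential backoff, as mentioned in the list above, generally looks like the sketch below. The doubling factor, the cap, and the names `backoff_delay` and `retry_with_backoff` are illustrative assumptions; the project's real retry parameters are not documented here.

```rust
use std::time::Duration;

/// Delay before the retry that follows `attempt` (0-based):
/// `base * 2^attempt`, capped at `max`.
fn backoff_delay(base: Duration, attempt: u32, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Run `action` until it succeeds or `max_attempts` is reached,
/// sleeping with exponential backoff between failed attempts.
/// The action always runs at least once.
fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut action: F,
) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut attempt = 0;
    loop {
        match action(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                std::thread::sleep(backoff_delay(base, attempt, max));
                attempt += 1;
            }
        }
    }
}

fn main() {
    let mut calls = 0;
    // Hypothetical flaky step: fails twice, then succeeds.
    let result: Result<&str, String> = retry_with_backoff(
        4,
        Duration::from_millis(10),
        Duration::from_millis(100),
        |attempt| {
            calls += 1;
            if attempt < 2 {
                Err(format!("attempt {} failed", attempt))
            } else {
                Ok("clicked")
            }
        },
    );
    println!("{:?} after {} calls", result, calls);
}
```

Capping the delay keeps a long retry chain from stalling a test run; real implementations often add random jitter as well.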

🛡️ Security & Best Practices

🔐 API Key Security

  • Store API keys securely (never commit to version control)
  • Use environment variables for production
  • Rotate keys regularly
  • Monitor API usage and costs

🧪 Testing Environment

  • Use test accounts and sandbox environments
  • Avoid testing on production systems
  • Set up dedicated test data
  • Use headless mode for CI/CD

🌐 Website Considerations

  • Respect robots.txt and website terms
  • Add delays between actions to avoid rate limiting
  • Handle dynamic content and loading states
  • Consider website anti-automation measures

🆘 Troubleshooting

🔧 ChromeDriver Issues

Check ChromeDriver status

curl http://localhost:9515/status

Restart ChromeDriver

pkill chromedriver
chromedriver --port=9515

πŸ” Environment Variable Issues

Check if .env file exists and has correct format

cat .env

Verify the variable is exported in your shell (a key defined only in .env is read by the server and will not show up here)

echo $OPENROUTER_API_KEY

Check server status for API key

curl http://localhost:3001/api/env-status

👁️ Vision Mode Issues

  • API Key: Verify OpenRouter key is valid (sk-or-v1-...)
  • Environment: Check .env file or form input
  • Model Access: Ensure you have access to vision models on OpenRouter
  • Rate Limits: Check API usage quotas on OpenRouter dashboard
  • Fallback: Use Hybrid mode for automatic DOM fallback
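A quick format check can rule out the most common key mistakes before calling the API. The sketch below only tests the `sk-or-v1-` prefix mentioned above; it cannot tell whether OpenRouter will accept the key. Both function names are hypothetical.

```rust
/// True when the key starts with `sk-or-v1-` and has something after the prefix.
/// This is a format check only, not proof that the key is valid.
fn looks_like_openrouter_key(key: &str) -> bool {
    key.trim()
        .strip_prefix("sk-or-v1-")
        .map_or(false, |rest| !rest.is_empty())
}

/// Mask a key for display so the secret part never reaches logs.
fn mask_key(key: &str) -> String {
    if looks_like_openrouter_key(key) {
        "sk-or-v1-****".to_string()
    } else {
        "<unrecognized key format>".to_string()
    }
}

fn main() {
    let key = "sk-or-v1-your-key-here";
    println!("format ok: {}", looks_like_openrouter_key(key));
    println!("masked: {}", mask_key(key));
}
```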

🚫 Common Automation Issues

  • Element Not Found: Try Vision mode for robust element detection
  • Timing Issues: Add waits for dynamic content
  • Website Changes: Vision mode adapts automatically
  • Anti-bot Detection: Use realistic delays and human-like patterns

πŸ—οΈ Architecture

🧱 System Components

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Web UI        │    │   OpenRouter     │    │  ChromeDriver   │
│  (Axum/HTML)    │◄──►│ (Unified AI API) │    │   (Browser)     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ Automation API  │    │  Vision Engine   │    │ Browser Actions │
│   (Workflow)    │◄──►│  (Screenshots)   │◄──►│ (Click/Type)    │
└─────────────────┘    └──────────────────┘    └─────────────────┘

🔧 Integration Benefits:

  • Single API endpoint for all AI models
  • Environment variable configuration (.env)
  • Automatic failover between providers
  • Cost optimization through unified billing

📦 Crate Structure

  • automation-ui: Web interface and server
  • automation-browser: Chrome automation with Vision support
  • automation-api: Core workflow and data structures
  • automation-integration: Pipeline orchestration
  • automation-ai: AI model integration

🚀 NEW: Enhanced OpenRouter LLM Vision Demo

Experience the latest OpenRouter integration with cutting-edge vision models:

Set your API key

export OPENROUTER_API_KEY=sk-or-v1-your-key-here

Run the comprehensive computer vision demo

cargo run --example computer_vision_demo -p automation-browser

🔥 NEW: Run the enhanced OpenRouter LLM vision demo

cargo run --example openrouter_vision_demo -p automation-browser

Run the enhanced Google search demo

cargo run --example enhanced_google_search -p automation-browser

✨ Latest OpenRouter Features

🎯 Enhanced Model Support (Updated 2024):

  • anthropic/claude-3-5-sonnet-20241022 - Latest Claude 3.5 Sonnet
  • openai/gpt-4o-2024-11-20 - Latest GPT-4o
  • openai/gpt-4o-mini-2024-07-18 - Budget-friendly vision
  • google/gemini-pro-1.5 - Google's vision model
  • anthropic/claude-3-5-haiku-20241022 - Fast HTML analysis

⚡ Auto-Optimization Features:

  • Automatic model selection for each strategy
  • Model-specific prompt engineering
  • Enhanced error handling with detailed messages
  • Performance benchmarks and comparisons

🧠 Smart Vision Strategies:

  • DOM Inspection: AI analyzes HTML (faster, cheaper)
  • Coordinate-Based: AI analyzes screenshots (more robust)
  • Adaptive: Tries DOM first, falls back to coordinates

Example Usage:

let mut engine = ChromeAutomationEngine::new(false)
    .with_vision_mode(api_key, None)
    .with_vision_strategy(VisionStrategy::Adaptive);

// Use convenient model shortcuts
engine.set_vision_model("claude");      // → claude-3-5-sonnet-20241022
engine.set_vision_model("gpt-4o");      // → gpt-4o-2024-11-20
engine.set_vision_model("gpt-4o-mini"); // → gpt-4o-mini-2024-07-18

// Auto-optimize for strategy
engine.with_optimal_model_for_strategy(&VisionStrategy::CoordinateBased);

🤝 Contributing

We welcome contributions! Here's how to get started:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Add tests for your changes
  4. Update documentation as needed
  5. Submit a pull request

🎯 Contribution Areas

  • Vision Model Support: Add new AI vision providers
  • Browser Support: Firefox, Safari automation
  • UI Enhancements: Better visual design
  • Performance: Optimization and caching
  • Testing: More comprehensive test coverage

📄 License

MIT License - see LICENSE file for details.

πŸ™ Acknowledgments

  • Skyvern: Inspiration for AI Vision automation
  • OpenAI: GPT-4 Vision capabilities
  • Anthropic: Claude vision and reasoning
  • Selenium: Browser automation foundation
  • Rust Community: Amazing ecosystem and support

🚀 Built with ❤️ using Rust, AI Vision, and Real Browser Automation

🌟 Star this repo • 🐛 Report Issues • 💡 Request Features
