AC to Automation Converter

An AI-powered system that converts Acceptance Criteria (AC) from QA specifications into automated browser testing workflows.

๐Ÿค– AI-Powered Browser Automation with Vision

Convert natural language into real browser automation using AI Vision and execute tests immediately with live process logs!

Inspired by Skyvern, this system combines traditional DOM automation with AI Vision that "sees" web pages like a human.

โœจ Key Features

๐Ÿ‘๏ธ AI Vision Mode

  • Visual Understanding: AI analyzes screenshots to find elements visually
  • Robust Automation: Works even when websites change their HTML structure
  • Natural Descriptions: Use "click the blue login button" instead of CSS selectors
  • Human-like Interaction: Sees pages exactly like humans do

๐Ÿ”ง Three Automation Modes

  • DOM Mode: Traditional CSS selector-based (fast)
  • Vision Mode: AI visual understanding (robust)
  • Hybrid Mode: Smart fallback - tries DOM first, uses Vision if needed

๐Ÿ“ Real-Time Process Logs

  • Live Execution Logs: Watch automation steps in real-time
  • Floating Log Panel: See progress without scrolling
  • Color-coded Messages: Easy to spot successes, warnings, and errors
  • Detailed Timestamps: Track execution timing precisely

๐Ÿš€ Immediate Execution

  • Real Browser Testing: Uses ChromeDriver for actual browser interaction
  • AI-Powered Generation: OpenRouter AI converts natural language to automation
  • Multiple Script Formats: Generate MCP Browser, Selenium, and Playwright scripts
  • Visual Feedback: Screenshots and detailed execution reports

๐ŸŽฏ What You Can Automate

๐Ÿ”„ User Workflows

  • Login/registration flows
  • E-commerce checkout processes
  • Form submissions and validations
  • Multi-step wizards

๐ŸŽจ Visual Interactions

  • Click buttons by description ("red submit button")
  • Find inputs by visual context ("email field in top-right")
  • Navigate by visual landmarks ("menu button with hamburger icon")
  • Verify visual states ("success message appears")

๐Ÿ“Š Content Testing

  • Text presence verification
  • Element visibility checks
  • Page state validation
  • Dynamic content testing

๐Ÿ› ๏ธ Setup & Installation

๐Ÿ“‹ Prerequisites

  1. Install Rust:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

  1. Install ChromeDriver:

macOS with Homebrew

brew install chromedriver

Ubuntu/Debian

sudo apt-get install chromium-chromedriver

Windows: Download from https://chromedriver.chromium.org/

  1. Get OpenRouter API Key (Recommended):
    • Sign up at openrouter.ai
    • Get your API key (starts with sk-or-v1-...)
    • ๐ŸŽ‰ One key works for both AI generation AND vision models!

๐Ÿš€ Quick Start

  1. Clone and Build:

git clone cd ai-ac-automation cargo build --release

  1. Configure Environment Variables (Recommended):

Create .env file with your OpenRouter API key

echo "OPENROUTER_API_KEY=sk-or-v1-your-actual-key-here" > .env

  1. Start ChromeDriver (in separate terminal):

chromedriver --port=9515

  1. Start the Web Interface:

cargo run --bin automation-ui

  1. Open Browser: Go to http://localhost:3001

๐Ÿ” API Key Configuration

You have two options for configuring your OpenRouter API key:

Option 1: Environment Variables (Recommended)

Create .env file in project root

echo "OPENROUTER_API_KEY=sk-or-v1-your-key-here" > .env

Start the server - API key loaded automatically!

cargo run --bin automation-ui

โœ… Benefits:

  • Secure: API key never appears in UI or logs
  • Convenient: No need to enter key every time
  • Universal: Works for both AI generation and vision
  • Safe: .env is in .gitignore - won't be committed

Option 2: Web Form

  • Leave .env empty or don't create it
  • Enter API key directly in the web interface forms
  • Works for individual sessions

๐Ÿ’ก Pro Tip: Use Option 1 for development, Option 2 for sharing/demos!

๐Ÿ“„ Environment File (.env) Format

Your .env file should contain:

Required: OpenRouter API key for all AI features

OPENROUTER_API_KEY=sk-or-v1-your-actual-key-here

Optional: Default models (can be changed in UI)

AI_MODEL=anthropic/claude-3.5-sonnet VISION_MODEL=openai/gpt-4o

Optional: Browser settings

HEADLESS=false BROWSER_WIDTH=1920 BROWSER_HEIGHT=1080

Optional: Server port

PORT=3001

๐Ÿ”’ Security Notes:

  • Never commit .env to version control
  • Keep your API keys secure and rotate them regularly
  • Use different keys for development and production

๐ŸŽฏ How to Use

๐Ÿ”ง Basic DOM Automation

  1. Enter URL: https://google.com
  2. Test Scenario:
- Click on search box
- Type "browser automation"
- Press Enter
- Verify results appear
- Click on first result

  1. Execute: Check "Execute immediately" โ†’ Click "Generate & Execute"

๐Ÿ‘๏ธ AI Vision Mode (Recommended)

  1. Enable Vision: โœ… Check "Use AI Vision Mode (Like Skyvern)"
  2. Configure:
    • Mode: Hybrid (tries DOM first, falls back to Vision)
    • API Key: Auto-loaded from .env or enter manually
    • Model: GPT-4 Omni (recommended)
  3. Natural Test Scenario:
Website: https://example.com/login
Test:
- Find the email input field
- Type [email protected]
- Find the password field
- Type mypassword123
- Click the blue login button
- Verify the dashboard appears

๐Ÿค– AI-Powered Test Generation

  1. Enable AI: โœ… Check "Use AI-Powered Automation"
  2. API Key: Auto-loaded from .env or enter your OpenRouter key
  3. Describe Naturally:
Test the login functionality:
- User should be able to log in with valid credentials
- After login, dashboard should be visible
- User profile should show correct information
- Logout should work properly

๐ŸŽฏ Example with Environment Variables:

If you have OPENROUTER_API_KEY in your .env file:

  • โœ… No API key entry needed - works automatically!
  • โœ… Same key works for both AI generation and vision
  • โœ… Secure - never appears in forms or logs
  • โœ… Fast - instant access to all AI features

๐Ÿง  Supported AI Models

๐ŸŽ‰ All models available through OpenRouter with a single API key!

๐Ÿ‘๏ธ Vision Models (Real AI Vision Integration)

OpenRouter Model IDProviderVision QualitySpeedBest For
openai/gpt-4oOpenAIโญโญโญโญโญโญโญโญโญOverall best choice ๐ŸŒŸ
openai/gpt-4-vision-previewOpenAIโญโญโญโญโญโญโญDetailed analysis
anthropic/claude-3.5-sonnetAnthropicโญโญโญโญโญโญโญโญโญComplex reasoning
google/gemini-2.0-flash-001GoogleโญโญโญโญโญโญโญโญโญFastest option ๐Ÿš€
google/gemini-pro-visionGoogleโญโญโญโญโญโญโญCost-effective

๐Ÿง  Text Generation Models (AI Test Creation)

OpenRouter Model IDProviderQualitySpeedBest For
anthropic/claude-3.5-sonnetAnthropicโญโญโญโญโญโญโญโญโญBest reasoning ๐Ÿง 
openai/gpt-4oOpenAIโญโญโญโญโญโญโญโญโญComplex automation
openai/gpt-3.5-turboOpenAIโญโญโญโญโญโญโญโญโญFast & affordable ๐Ÿ’ฐ
google/gemini-proGoogleโญโญโญโญโญโญโญโญGood alternative

๐Ÿ”‘ Single API Key Benefits:

  • One account for all AI providers
  • Unified billing and usage tracking
  • Rate limiting across all models
  • Easy model switching in the UI
  • No separate API keys to manage

๐Ÿ“Š Real-Time Execution Logs

๐ŸŽจ Live Log Display

[12:09:15.234] INFO: ๐Ÿš€ Initializing Chrome WebDriver...
[12:09:15.456] SUCCESS: โœ… Chrome WebDriver initialized successfully
[12:09:15.567] INFO: ๐Ÿ”ง Running in HYBRID mode (DOM + Vision)
[12:09:15.678] INFO: ๐ŸŒ Navigating to: https://example.com
[12:09:17.123] SUCCESS: โœ… Navigated to https://example.com
[12:09:17.234] INFO: ๐Ÿ‘๏ธ AI Vision: Looking for 'email input field' to click
[12:09:17.456] INFO: ๐Ÿง  Analyzing screenshot with AI Vision
[12:09:18.789] SUCCESS: โœ… AI Vision found coordinates: (450, 320)
[12:09:18.890] INFO: ๐Ÿ–ฑ๏ธ Clicking at coordinates (450, 320)
[12:09:19.123] SUCCESS: โœ… Vision-clicked at coordinates (450, 320)
[12:09:19.234] INFO: โŒจ๏ธ Vision-typing '[email protected]' at coordinates (450, 320)
[12:09:19.567] SUCCESS: โœ… Vision-typed '[email protected]' at coordinates (450, 320)

๐ŸŽจ Color-Coded Messages

  • ๐ŸŸข SUCCESS: Operations completed successfully
  • ๐Ÿ”ต INFO: General information and progress
  • ๐ŸŸก WARN: Warnings and fallback actions
  • ๐Ÿ”ด ERROR: Failures and issues

๐Ÿ”ง Advanced Configuration

๐Ÿ–ฅ๏ธ Programmatic Usage

use automation_browser::{AutomationExecutor, AutomationMode};

#[tokio::main] async fn main() -> Result<(), Box> { // Create executor with Vision Mode let mut executor = AutomationExecutor::new()? .with_vision_mode("sk-your-openai-key".to_string(), Some("gpt-4o".to_string())) .with_headless(false);

// Execute workflow
let (report, logs) = executor.execute_workflow(&workflow).await?;

println!("Success rate: {:.1}%", report.success_rate() * 100.0);
println!("Logs captured: {}", logs.len());

Ok(())

}

โš™๏ธ Automation Modes Comparison

FeatureDOM ModeVision ModeHybrid Mode
Speedโญโญโญโญโญโญโญโญโญโญโญโญ
Reliabilityโญโญโญโญโญโญโญโญโญโญโญโญโญ
Setup Complexityโญโญโญโญโญโญโญโญโญ
Website Changesโญโญโญโญโญโญโญโญโญโญโญ
Natural Languageโญโญโญโญโญโญโญโญโญโญ

๐ŸŽจ Web Interface Features

๐Ÿ“‹ Smart Forms

  • Live Examples: Click examples to auto-fill forms
  • Vision Configuration: Easy setup for AI Vision mode
  • Real-time Validation: Immediate feedback on inputs
  • Progress Tracking: Live execution status

๐Ÿ“Š Enhanced Results

  • Execution Statistics: Success rates, timing, step counts
  • Visual Logs: Floating panels and detailed terminals
  • Screenshot Gallery: Automatic screenshots during execution
  • Script Export: Download generated automation scripts

๐Ÿ”ง Debug Features

  • Step-by-step Breakdown: See each action executed
  • Error Highlighting: Clear error messages and solutions
  • Retry Logic: Automatic retries with exponential backoff
  • Fallback Options: Hybrid mode switches strategies automatically

๐Ÿ›ก๏ธ Security & Best Practices

๐Ÿ” API Key Security

  • Store API keys securely (never commit to version control)
  • Use environment variables for production
  • Rotate keys regularly
  • Monitor API usage and costs

๐Ÿงช Testing Environment

  • Use test accounts and sandbox environments
  • Avoid testing on production systems
  • Set up dedicated test data
  • Use headless mode for CI/CD

๐ŸŒ Website Considerations

  • Respect robots.txt and website terms
  • Add delays between actions to avoid rate limiting
  • Handle dynamic content and loading states
  • Consider website anti-automation measures

๐Ÿ†˜ Troubleshooting

๐Ÿ”ง ChromeDriver Issues

Check ChromeDriver status

curl http://localhost:9515/status

Restart ChromeDriver

pkill chromedriver chromedriver --port=9515

๐Ÿ” Environment Variable Issues

Check if .env file exists and has correct format

cat .env

Verify environment variable is loaded

echo $OPENROUTER_API_KEY

Check server status for API key

curl http://localhost:3001/api/env-status

๐Ÿ‘๏ธ Vision Mode Issues

  • API Key: Verify OpenRouter key is valid (sk-or-v1-...)
  • Environment: Check .env file or form input
  • Model Access: Ensure you have access to vision models on OpenRouter
  • Rate Limits: Check API usage quotas on OpenRouter dashboard
  • Fallback: Use Hybrid mode for automatic DOM fallback

๐Ÿšซ Common Automation Issues

  • Element Not Found: Try Vision mode for robust element detection
  • Timing Issues: Add waits for dynamic content
  • Website Changes: Vision mode adapts automatically
  • Anti-bot Detection: Use realistic delays and human-like patterns

๐Ÿ—๏ธ Architecture

๐Ÿงฑ System Components

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   Web UI        โ”‚    โ”‚   OpenRouter     โ”‚    โ”‚  ChromeDriver   โ”‚
โ”‚  (Axum/HTML)    โ”‚โ—„โ”€โ”€โ–บโ”‚ (Unified AI API) โ”‚    โ”‚   (Browser)     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚                       โ”‚                       โ”‚
         โ–ผ                       โ–ผ                       โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Automation API  โ”‚    โ”‚  Vision Engine   โ”‚    โ”‚ Browser Actions โ”‚
โ”‚   (Workflow)    โ”‚โ—„โ”€โ”€โ–บโ”‚  (Screenshots)   โ”‚โ—„โ”€โ”€โ–บโ”‚ (Click/Type)    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

๐Ÿ”ง Integration Benefits:

  • Single API endpoint for all AI models
  • Environment variable configuration (.env)
  • Automatic failover between providers
  • Cost optimization through unified billing

๐Ÿ“ฆ Crate Structure

  • automation-ui: Web interface and server
  • automation-browser: Chrome automation with Vision support
  • automation-api: Core workflow and data structures
  • automation-integration: Pipeline orchestration
  • automation-ai: AI model integration

๐Ÿš€ NEW: Enhanced OpenRouter LLM Vision Demo

Experience the latest OpenRouter integration with cutting-edge vision models:

Set your API key

export OPENROUTER_API_KEY=sk-or-v1-your-key-here

Run the comprehensive computer vision demo

cargo run --example computer_vision_demo -p automation-browser

๐Ÿ”ฅ NEW: Run the enhanced OpenRouter LLM vision demo

cargo run --example openrouter_vision_demo -p automation-browser

Run the enhanced Google search demo

cargo run --example enhanced_google_search -p automation-browser

โœจ Latest OpenRouter Features

๐ŸŽฏ Enhanced Model Support (Updated 2024):

  • anthropic/claude-3-5-sonnet-20241022 - Latest Claude 3.5 Sonnet
  • openai/gpt-4o-2024-11-20 - Latest GPT-4o
  • openai/gpt-4o-mini-2024-07-18 - Budget-friendly vision
  • google/gemini-pro-1.5 - Google's vision model
  • anthropic/claude-3-5-haiku-20241022 - Fast HTML analysis

โšก Auto-Optimization Features:

  • Automatic model selection for each strategy
  • Model-specific prompt engineering
  • Enhanced error handling with detailed messages
  • Performance benchmarks and comparisons

๐Ÿง  Smart Vision Strategies:

  • DOM Inspection: AI analyzes HTML (faster, cheaper)
  • Coordinate-Based: AI analyzes screenshots (more robust)
  • Adaptive: Tries DOM first, falls back to coordinates

Example Usage:

let mut engine = ChromeAutomationEngine::new(false) .with_vision_mode(api_key, None) .with_vision_strategy(VisionStrategy::Adaptive);

// Use convenient model shortcuts engine.set_vision_model("claude"); // โ†’ claude-3-5-sonnet-20241022 engine.set_vision_model("gpt-4o"); // โ†’ gpt-4o-2024-11-20 engine.set_vision_model("gpt-4o-mini"); // โ†’ gpt-4o-mini-2024-07-18

// Auto-optimize for strategy engine.with_optimal_model_for_strategy(&VisionStrategy::CoordinateBased);

๐Ÿค Contributing

We welcome contributions! Here's how to get started:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Add tests for your changes
  4. Update documentation as needed
  5. Submit a pull request

๐ŸŽฏ Contribution Areas

  • Vision Model Support: Add new AI vision providers
  • Browser Support: Firefox, Safari automation
  • UI Enhancements: Better visual design
  • Performance: Optimization and caching
  • Testing: More comprehensive test coverage

๐Ÿ“„ License

MIT License - see LICENSE file for details.

๐Ÿ™ Acknowledgments

  • Skyvern: Inspiration for AI Vision automation
  • OpenAI: GPT-4 Vision capabilities
  • Anthropic: Claude vision and reasoning
  • Selenium: Browser automation foundation
  • Rust Community: Amazing ecosystem and support

๐Ÿš€ Built with โค๏ธ using Rust, AI Vision, and Real Browser Automation

๐ŸŒŸ Star this repo โ€ข ๐Ÿ› Report Issues โ€ข ๐Ÿ’ก Request Features

Related Servers