WebServ CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
This is an RL (Reinforcement Learning) web agent project that enables automated browser interactions through a WebArena-like environment. The project uses Playwright for browser automation, Hydra for configuration management, and includes a proxy system for host rewriting. Development primarily happens through Jupyter notebooks.
Architecture
Core Components
- WebAgentEnv (
rl_web_agent/env.py): Main environment class that manages browser sessions, page interactions, and provides a step-based interface for RL agents - Configuration System: Single YAML config file (
rl_web_agent/conf/config.yaml) managed by Hydra with override support - JavaScript Layer: Browser-side scripts for DOM parsing (
parser.js) and event detection (initscript.js) - Proxy System: HTTP proxy client (
proxy/proxy_client_aiohttp.py) for host rewriting and API gateway communication
Key Design Patterns
- Async/Await: All browser operations are asynchronous using Playwright's async API
- Semantic IDs: DOM elements are tagged with unique
data-semantic-idattributes for reliable interaction - Action-Observation Loop: Environment provides JSON-based action interface and structured observations
- Shared Playwright Instance: ClassVar pattern for efficient resource management across multiple environment instances
Development Workflow
Running the Agent
# Basic execution with default config
python -m rl_web_agent.main
# Override specific configuration values
python -m rl_web_agent.main environment.browser.headless=true environment.proxy.enabled=false
# Development with Jupyter notebooks (primary workflow)
jupyter notebook # Use notebooks/ directory for experimentation
Dependency Management (UV)
# Install dependencies
uv sync
# Add new dependencies
uv add package_name
# Install with GPU support (includes transformers, torch, etc.)
uv sync --group gpu
# Install WebArena support
uv sync --extra webarena
Code Quality
# Linting (uses ruff configuration in pyproject.toml)
ruff check .
ruff format .
Configuration System
Single Config File Pattern
All configuration is centralized in rl_web_agent/conf/config.yaml. The project uses Hydra's single config approach rather than component configs.
Key Configuration Sections
environment.browser: Playwright launch and context optionsenvironment.proxy: Proxy server settings for host rewritingenvironment.sites: Mapping of site names to hostnames for WebArena environmentshydra.run.dir: Output directory for execution logs
Override Examples
# Disable headless mode for debugging
python -m rl_web_agent.main environment.browser.launch_options.headless=false
# Change proxy settings
python -m rl_web_agent.main environment.proxy.server=http://localhost:9090 environment.proxy.enabled=true
# Add new site mapping
python -m rl_web_agent.main environment.sites.custom_site=example.com:8080
WebAgentEnv API
Action Interface
Actions are JSON strings with specific formats:
# Click actions
await env.step('{"action": "click", "target": "login_button"}')
# Text input with optional enter
await env.step('{"action": "type", "target": "username", "text": "john_doe", "enter": true}')
# Navigation
await env.step('{"action": "goto_url", "url": "https://example.com"}')
# Tab management
await env.step('{"action": "new_tab", "url": "https://example.com"}')
await env.step('{"action": "switch_tab", "tab_id": 1}')
Observation Structure
observation = await env.observation()
# Returns:
# {
# "html": "...", # Processed DOM with semantic IDs
# "clickable_elements": ["button1", "link2", ...],
# "input_elements": [{"id": "username", "type": "text", "value": "", ...}],
# "tabs": [{"id": 0, "title": "Page Title", "url": "...", "is_active": true}]
# }
Browser Automation Details
DOM Processing Pipeline
- Initialization Script (
initscript.js): Detects hover events and marks hoverable elements - Parser Script (
parser.js): Strips DOM to essential interactive elements, assigns semantic IDs, preserves form state - Semantic ID Generation: Creates hierarchical, unique identifiers for reliable element targeting
Element Interaction Patterns
- All interactions use semantic IDs rather than CSS selectors
- Elements are automatically scrolled into view before interaction
- Form state is captured and preserved in observations
- Empty interactive elements (inputs, selects) are preserved even if visually empty
Third-Party Integration
WebArena Integration
- Located in
thirdparty/webarena/as editable dependency - Provides web-based RL environments for agent training
- Task configurations define start URLs, evaluation criteria, and required actions
VERL Integration
- Located in
thirdparty/verl/for reinforcement learning workflows - GPU dependency group includes all necessary ML packages
- Supports distributed training with Ray
Important Conventions
Code Style
- Use async/await for all browser operations
- Follow PEP 8 with 4-space indentation
- Use
pathlib.Pathinstead ofos.path - NEVER use dict.get() - Always use direct key access (dict["key"]) to fail fast
- No graceful error handling for missing keys - Let KeyErrors happen to catch bugs early
- Fail fast approach - Don't hide missing keys or data structure issues
Error Handling
- FAIL FAST - No try-catch blocks unless absolutely necessary (e.g., auto retry for LLM API calls to avoid rate limits)
- NEVER use getattr() or dict.get() - Always use direct key access to fail fast for missing keys
- No graceful error handling for missing keys - Let KeyErrors happen to catch bugs early
- Wrap browser operations in try/finally blocks only when needed
- Use structured logging with semantic context
- Return error information in step() observations
Logging and Debugging Rules
- NEVER truncate conversation history or message content - Always log full content without [:200] or similar truncation
- NEVER limit message history - Log all messages, not just recent ones (avoid [-4:] or similar slicing)
- NEVER assume content length limits - Show complete debugging information
- ALWAYS show full conversation history - No shortcuts or "last N messages" approaches
Prompt Management Rules
- NEVER hardcode prompts in code - Always load prompts from .txt files in rl_web_agent/prompts/
- Use load_prompt() function - Import from rl_web_agent.prompts and use load_prompt("prompt_name")
- Create descriptive prompt files - Name files clearly (e.g., fuzzy_match_evaluator.txt, system_prompt.txt)
- Keep prompts maintainable - Use format strings with placeholders like {objective}, {question}, {reference}
Development Notes
Testing Approach
- Primary development through Jupyter notebooks in
notebooks/directory - No formal test suite - notebooks serve as interactive testing environment
- Use
notebooks/playwright-test.ipynbfor browser automation experiments
Proxy System
- Enables host rewriting for WebArena environments
- Supports AWS SigV4 authentication for API gateway communication
- Configure target host rewrites in proxy client for local development
Performance Considerations
- Shared Playwright instance across multiple environments
- Image blocking enabled by default to speed up page loads
- Browser args optimized for automation (disable extensions, autofill, etc.)
IDE Integration Test
This is a test message added to verify IDE integration is working correctly.