Metadata-Version: 2.4
Name: openbrowser-ai
Version: 0.1.6
Summary: Agentic browser automation using LangGraph and raw CDP
License-File: LICENSE
Requires-Python: >=3.12
Requires-Dist: aiofiles
Requires-Dist: browser-use==0.9.5
Requires-Dist: bubus>=1.5.6
Requires-Dist: cdp-use
Requires-Dist: click>=8.1.8
Requires-Dist: google-genai>=0.2.0
Requires-Dist: httpx>=0.28.1
Requires-Dist: imageio-ffmpeg>=0.6.0
Requires-Dist: imageio>=2.37.2
Requires-Dist: langchain-core>=0.3.0
Requires-Dist: langchain-openai>=0.2.0
Requires-Dist: langgraph
Requires-Dist: litellm==1.80.0
Requires-Dist: markdownify>=0.11.6
Requires-Dist: numpy>=2.4.0
Requires-Dist: openai<2.0.0,>=1.99.5
Requires-Dist: pillow>=11.0.0
Requires-Dist: playwright
Requires-Dist: psutil>=7.2.0
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv
Requires-Dist: rich>=13.0.0
Requires-Dist: websockets>=15.0.1
Provides-Extra: all
Requires-Dist: anthropic>=0.68.0; extra == 'all'
Requires-Dist: boto3; extra == 'all'
Requires-Dist: groq>=0.30.0; extra == 'all'
Requires-Dist: imageio[ffmpeg]; extra == 'all'
Requires-Dist: numpy; extra == 'all'
Requires-Dist: ollama>=0.5.1; extra == 'all'
Requires-Dist: posthog>=3.7.0; extra == 'all'
Requires-Dist: pypdf; extra == 'all'
Requires-Dist: reportlab; extra == 'all'
Requires-Dist: textual>=3.2.0; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.68.0; extra == 'anthropic'
Provides-Extra: aws
Requires-Dist: boto3; extra == 'aws'
Provides-Extra: azure
Requires-Dist: openai; extra == 'azure'
Provides-Extra: cli
Requires-Dist: textual>=3.2.0; extra == 'cli'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=1.0.0; extra == 'dev'
Requires-Dist: pytest-cov>=6.0.0; extra == 'dev'
Requires-Dist: pytest>=9.0.2; extra == 'dev'
Provides-Extra: groq
Requires-Dist: groq>=0.30.0; extra == 'groq'
Provides-Extra: ollama
Requires-Dist: ollama>=0.5.1; extra == 'ollama'
Provides-Extra: pdf
Requires-Dist: pypdf; extra == 'pdf'
Requires-Dist: reportlab; extra == 'pdf'
Provides-Extra: telemetry
Requires-Dist: posthog>=3.7.0; extra == 'telemetry'
Provides-Extra: video
Requires-Dist: imageio[ffmpeg]; extra == 'video'
Requires-Dist: numpy; extra == 'video'
Description-Content-Type: text/markdown

# OpenBrowser

[![PyPI version](https://badge.fury.io/py/openbrowser-ai.svg)](https://pypi.org/project/openbrowser-ai/)
[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Tests](https://github.com/billy-enrizky/openbrowser-ai/actions/workflows/test.yml/badge.svg)](https://github.com/billy-enrizky/openbrowser-ai/actions)

**AI-powered browser automation using LangGraph and CDP (Chrome DevTools Protocol)**

OpenBrowser is a framework for intelligent browser automation. It combines direct CDP communication with LangGraph orchestration to create AI agents that can navigate, interact with, and extract information from web pages autonomously.

## Documentation

**Full documentation**: [openbrowser.mintlify.app](https://openbrowser.mintlify.app)

## Key Features

- **LangGraph-Powered Agents** - Stateful workflow orchestration with perceive-plan-execute loop
- **Raw CDP Communication** - Direct Chrome DevTools Protocol for maximum control and speed
- **Vision Support** - Screenshot analysis for visual understanding of pages
- **12+ LLM Providers** - OpenAI, Anthropic, Google, Groq, AWS Bedrock, Azure OpenAI, Ollama, and more
- **Code Agent Mode** - Jupyter notebook-like code execution for complex automation
- **MCP Server** - Model Context Protocol support for Claude Desktop integration
- **Video Recording** - Record browser sessions as video files

## Installation

```bash
pip install openbrowser-ai
```

### With Optional Dependencies

```bash
# Install with all LLM providers
pip install openbrowser-ai[all]

# Install specific providers
pip install openbrowser-ai[anthropic]  # Anthropic Claude
pip install openbrowser-ai[groq]       # Groq
pip install openbrowser-ai[ollama]     # Ollama (local models)
pip install openbrowser-ai[aws]        # AWS Bedrock
pip install openbrowser-ai[azure]      # Azure OpenAI

# Install with video recording support
pip install openbrowser-ai[video]
```

### Install Browser

```bash
uvx openbrowser install
# or
playwright install chromium
```

## Quick Start

### Basic Usage

```python
import asyncio
from openbrowser import Agent, ChatGoogle

async def main():
    agent = Agent(
        task="Go to google.com and search for 'Python tutorials'",
        llm=ChatGoogle(),
    )
    
    result = await agent.run()
    print(f"Result: {result}")

asyncio.run(main())
```

### With Different LLM Providers

```python
from openbrowser import Agent, ChatOpenAI, ChatAnthropic, ChatGoogle

# OpenAI
agent = Agent(task="...", llm=ChatOpenAI(model="gpt-4o"))

# Anthropic
agent = Agent(task="...", llm=ChatAnthropic(model="claude-sonnet-4-0"))

# Google Gemini
agent = Agent(task="...", llm=ChatGoogle(model="gemini-2.0-flash"))
```

### Using Browser Session Directly

```python
import asyncio
from openbrowser import BrowserSession, BrowserProfile

async def main():
    profile = BrowserProfile(
        headless=True,
        viewport_width=1920,
        viewport_height=1080,
    )
    
    session = BrowserSession(browser_profile=profile)
    await session.start()
    
    await session.navigate_to("https://example.com")
    screenshot = await session.screenshot()
    
    await session.stop()

asyncio.run(main())
```

## Configuration

### Environment Variables

```bash
# Google (recommended)
export GOOGLE_API_KEY="..."

# OpenAI
export OPENAI_API_KEY="sk-..."

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# Groq
export GROQ_API_KEY="gsk_..."

# AWS Bedrock
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-west-2"

# Azure OpenAI
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"

# Browser-Use LLM (external service)
export BROWSER_USE_API_KEY="..."
```

### BrowserProfile Options

```python
from openbrowser import BrowserProfile

profile = BrowserProfile(
    headless=True,
    viewport_width=1280,
    viewport_height=720,
    disable_security=False,
    extra_chromium_args=["--disable-gpu"],
    record_video_dir="./recordings",
    proxy={
        "server": "http://proxy.example.com:8080",
        "username": "user",
        "password": "pass",
    },
)
```

## Supported LLM Providers

| Provider | Class | Models |
|----------|-------|--------|
| **Google** | `ChatGoogle` | gemini-2.0-flash, gemini-1.5-pro |
| **OpenAI** | `ChatOpenAI` | gpt-4o, o3, gpt-4-turbo |
| **Anthropic** | `ChatAnthropic` | claude-sonnet-4-0, claude-3-opus |
| **Groq** | `ChatGroq` | llama-3.3-70b-versatile, mixtral-8x7b |
| **AWS Bedrock** | `ChatAWSBedrock` | claude-3, amazon.titan |
| **Azure OpenAI** | `ChatAzureOpenAI` | Any Azure-deployed model |
| **Ollama** | `ChatOllama` | llama3, mistral (local) |
| **OCI** | `ChatOCIRaw` | Oracle Cloud GenAI models |
| **Browser-Use** | `ChatBrowserUse` | External LLM service |

## MCP Server (Claude Desktop Integration)

OpenBrowser includes an MCP server for integration with Claude Desktop.

### Running the MCP Server

```bash
python -m openbrowser.mcp
```

### Claude Desktop Configuration

Add to your Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "openbrowser": {
      "command": "uvx",
      "args": ["openbrowser-ai", "mcp"],
      "env": {
        "GOOGLE_API_KEY": "..."
      }
    }
  }
}
```

## CLI Usage

```bash
# Run a browser automation task
uvx openbrowser run "Search for Python tutorials on Google"

# Install browser
uvx openbrowser install

# Run MCP server
uvx openbrowser mcp
```

## Project Structure

```
openbrowser-ai/
├── src/openbrowser/
│   ├── __init__.py          # Main exports
│   ├── cli.py                # CLI commands
│   ├── config.py             # Configuration
│   ├── actor/                # Element interaction
│   ├── agent/                # LangGraph agent
│   │   ├── graph.py          # Agent workflow
│   │   ├── service.py        # Agent class
│   │   └── views.py          # Data models
│   ├── browser/              # CDP browser control
│   │   ├── session.py        # BrowserSession
│   │   └── profile.py        # BrowserProfile
│   ├── code_use/             # Code agent
│   ├── dom/                  # DOM extraction
│   ├── llm/                  # LLM providers
│   │   ├── openai/
│   │   ├── anthropic/
│   │   ├── google/
│   │   ├── groq/
│   │   ├── aws/
│   │   ├── azure/
│   │   └── ...
│   ├── mcp/                  # MCP server
│   └── tools/                # Action registry
└── tests/                    # Test suite
```

## Testing

```bash
# Run tests
pytest tests/

# Run with verbose output
pytest tests/ -v
```

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Forked from [browser-use](https://github.com/browser-use/browser-use)
- Built with [LangGraph](https://github.com/langchain-ai/langgraph)
- Uses [Playwright](https://playwright.dev/) for browser orchestration

## Contact

- **Email**: billy.suharno@gmail.com
- **GitHub**: [@billy-enrizky](https://github.com/billy-enrizky)
- **Repository**: [github.com/billy-enrizky/openbrowser-ai](https://github.com/billy-enrizky/openbrowser-ai)
- **Documentation**: [openbrowser.mintlify.app](https://openbrowser.mintlify.app)

---

**Made with love for the AI automation community**
