Metadata-Version: 2.4
Name: pwbase
Version: 0.1.9
Summary: A lightweight async Playwright wrapper for Python that supports three browser launch strategies and can intercept authenticated HTTP sessions from live browser traffic.
Project-URL: Homepage, https://github.com/virgotagle/pwbase
Project-URL: Repository, https://github.com/virgotagle/pwbase
Project-URL: Issues, https://github.com/virgotagle/pwbase/issues
Author-email: Floyd <pagarfloyd@gmail.com>
License: MIT License
        
        Copyright (c) 2025 Floyd
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: automation,browser,cdp,http,playwright,scraping,stealth
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.12
Requires-Dist: playwright-stealth>=2.0.2
Requires-Dist: playwright>=1.58.0
Requires-Dist: python-dotenv>=1.2.1
Requires-Dist: requests>=2.32.5
Provides-Extra: dev
Requires-Dist: build>=1.4.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=1.3.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.15.1; extra == 'dev'
Requires-Dist: pytest>=9.0.2; extra == 'dev'
Requires-Dist: twine>=6.2.0; extra == 'dev'
Description-Content-Type: text/markdown

# pwbase

A lightweight async Playwright wrapper for Python that supports three browser launch strategies and can intercept authenticated HTTP sessions from live browser traffic.

## Features

- Three browser modes: plain Playwright, stealth (bot-detection evasion), and CDP attachment
- Persistent browser state (cookies + localStorage) via `save_state` / `state_path`
- `BrowserSessionExtractor` — intercepts JSON responses and converts them into authenticated `requests.Session` objects
- Fully async, context-manager-friendly API

## Requirements

- Python 3.12+
- [uv](https://github.com/astral-sh/uv) (recommended) or pip

## Installation

Available on [PyPI](https://pypi.org/project/pwbase/).

```bash
uv add pwbase
# or
pip install pwbase
```

Install Playwright browsers after installing the package:

```bash
playwright install chromium
```

## Quick Start

```python
import asyncio
from pwbase import Browser, BrowserConfig, BrowserType

async def main():
    async with Browser(BrowserConfig(type=BrowserType.STEALTH)) as browser:
        page = await browser.get_page()
        await page.goto("https://example.com")
        print(await page.title())

asyncio.run(main())
```

## Browser Modes

| Mode | `BrowserType` | Description |
|---|---|---|
| Default | `DEFAULT` | Pure Playwright, no extras |
| Stealth | `STEALTH` | Applies `playwright-stealth` to reduce bot detection signals |
| CDP | `CDP` | Attaches to an existing Chrome instance via Chrome DevTools Protocol |

### Default

```python
Browser(BrowserConfig(type=BrowserType.DEFAULT))
```

### Stealth

```python
Browser(BrowserConfig(type=BrowserType.STEALTH))
```

### CDP

Start Chrome with remote debugging enabled:

```bash
google-chrome --remote-debugging-port=9222
```

Then attach:

```python
Browser(BrowserConfig(type=BrowserType.CDP, cdp_url="http://localhost:9222"))
```
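
Attaching only works if Chrome is already listening, so it can help to probe the endpoint first. The sketch below is not part of pwbase: `cdp_endpoint_available` is a hypothetical helper that queries Chrome's `/json/version` DevTools HTTP endpoint using only the standard library, and the default URL simply mirrors the one used above.

```python
import json
import urllib.request


def cdp_endpoint_available(cdp_url: str = "http://localhost:9222", timeout: float = 2.0) -> bool:
    """Return True if a DevTools HTTP endpoint answers at cdp_url (hypothetical helper)."""
    try:
        # Chrome started with --remote-debugging-port serves browser metadata here.
        with urllib.request.urlopen(f"{cdp_url.rstrip('/')}/json/version", timeout=timeout) as resp:
            info = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, malformed URL, or a non-JSON reply.
        return False
    return isinstance(info, dict) and "webSocketDebuggerUrl" in info
```

A script could call this before constructing `Browser(...)` and print a hint to start Chrome with `--remote-debugging-port=9222` when it returns `False`.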

> **Note:** `headless`, `state_path`, `viewport`, and related options are ignored in CDP mode. `save_state()` is not available in CDP mode.

## BrowserConfig Reference

```python
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class BrowserConfig:
    type: BrowserType = BrowserType.DEFAULT
    headless: bool = True
    state_path: Path | None = None      # Load/save cookies + localStorage
    channel: str = "chrome"             # Browser channel for STEALTH mode
    cdp_url: str = "http://localhost:9222"
    viewport: tuple[int, int] = (1920, 1080)
    user_agent: str = "..."             # Windows Chrome UA by default
    locale: str = "en-US"
    timezone: str = "America/New_York"
    # Extra Chromium flags; dataclasses require a factory for mutable defaults
    args: list[str] = field(
        default_factory=lambda: [
            "--disable-blink-features=AutomationControlled",
            "--no-sandbox",
        ]
    )
```

## Saving and Restoring Browser State

```python
import asyncio
from pathlib import Path
from pwbase import Browser, BrowserConfig, BrowserType

config = BrowserConfig(
    type=BrowserType.STEALTH,
    state_path=Path("state.json"),
)

async def main():
    # First run: log in and save the session
    async with Browser(config) as browser:
        page = await browser.get_page()
        await page.goto("https://example.com/login")
        # ... perform login ...
        await browser.save_state()

    # Subsequent runs: state is restored automatically from state_path
    async with Browser(config) as browser:
        page = await browser.get_page()
        await page.goto("https://example.com/dashboard")

asyncio.run(main())
```
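
To check what was saved, the state file can be inspected directly. The sketch below assumes pwbase writes Playwright's standard storage-state JSON (a top-level `cookies` list and an `origins` list whose entries carry `localStorage` items); the README does not confirm the file layout, and `summarize_state` is a hypothetical helper, not a pwbase function.

```python
import json
from pathlib import Path


def summarize_state(path: Path) -> dict[str, int]:
    """Count cookies and localStorage entries in a storage-state file (assumed layout)."""
    data = json.loads(path.read_text(encoding="utf-8"))
    cookies = data.get("cookies", [])
    origins = data.get("origins", [])
    # Each origin entry is assumed to hold a list of {"name": ..., "value": ...} items.
    local_items = sum(len(origin.get("localStorage", [])) for origin in origins)
    return {
        "cookies": len(cookies),
        "origins": len(origins),
        "local_storage_items": local_items,
    }
```

Calling `summarize_state(Path("state.json"))` after the first run gives a quick sanity check that the login actually produced cookies before relying on the restored session.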

## Session Extraction

`BrowserSessionExtractor` extends `Browser` and intercepts JSON responses in real time. Use it to capture authenticated sessions without manually copying cookies or headers.

```python
import asyncio
from pwbase import BrowserSessionExtractor, BrowserConfig, BrowserType

async def main():
    async with BrowserSessionExtractor(BrowserConfig(type=BrowserType.STEALTH)) as browser:
        page = await browser.get_page()
        await browser.start_recording(page)

        await page.goto("https://example.com")
        # Trigger the API call you want to capture, then:

        response = browser.find_response("api/data")
        if response:
            # Reuse the captured cookies and headers outside the browser
            session = browser.to_session(response)
            r = session.get("https://example.com/api/data")
            print(r.json())

asyncio.run(main())
```

### API

| Method | Description |
|---|---|
| `start_recording(page)` | Begin intercepting JSON responses on `page` |
| `stop_recording()` | Stop intercepting; safe to call if never started |
| `find_response(url_contains)` | Return the most recent captured response matching the substring |
| `find_all_responses(url_contains)` | Return all captured responses matching the substring |
| `wait_for_response(url_contains, timeout)` | Poll until a matching response is captured |
| `to_session(response)` | Build an authenticated `requests.Session` from a `CapturedResponse` |
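
The polling behaviour described for `wait_for_response` can be pictured with a generic asyncio helper. This is a minimal sketch of the general technique, not pwbase's implementation: `poll_until`, its `interval` parameter, and its return-`None`-on-timeout behaviour are all assumptions made for illustration.

```python
import asyncio
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


async def poll_until(
    probe: Callable[[], Optional[T]],
    timeout: float = 10.0,
    interval: float = 0.1,
) -> Optional[T]:
    """Call probe() repeatedly until it returns a non-None value or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            # Give up; the caller decides how to handle a missing result.
            return None
        # Yield to the event loop so other tasks can keep running between probes.
        await asyncio.sleep(interval)
```

With a recording in progress, something like `await poll_until(lambda: browser.find_response("api/data"))` would express the same idea, assuming `find_response` returns `None` when nothing has matched yet.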

### CapturedResponse Fields

```python
@dataclass
class CapturedResponse:
    url: str
    method: str
    headers: dict[str, str]           # Response headers
    body: dict | list | None          # Parsed JSON body
    request_headers: dict[str, str]   # Request headers (HTTP/2 pseudo-headers excluded from session)
    request_post_data: str | None
    cookies: list[Cookie]
```
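
The note on `request_headers` suggests how a session is derived from these fields: HTTP/2 pseudo-headers (names starting with `:`) are dropped, and the rest is reused. The sketch below illustrates that filtering step with plain dictionaries. It is not the library's code: `session_parts` is a hypothetical helper, and it assumes each cookie is a mapping with `name` and `value` keys, as Playwright's cookie objects are.

```python
def session_parts(
    request_headers: dict[str, str],
    cookies: list[dict],
) -> tuple[dict[str, str], dict[str, str]]:
    """Split captured request data into headers and cookies an HTTP client can reuse."""
    # Pseudo-headers such as ":authority" or ":method" are not valid HTTP/1.1 header names.
    headers = {
        name: value
        for name, value in request_headers.items()
        if not name.startswith(":")
    }
    # Collapse cookie records into a simple name -> value mapping.
    cookie_map = {cookie["name"]: cookie["value"] for cookie in cookies}
    return headers, cookie_map
```

The two mappings could then be applied to a `requests.Session` through `session.headers.update(headers)` and `session.cookies.update(cookie_map)`, which is roughly the result `to_session` is described as producing.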

## Manual Lifecycle

If you prefer not to use the context manager:

```python
# Inside an async function:
browser = Browser(BrowserConfig())
await browser.start()
try:
    page = await browser.get_page()
    # ... do work ...
finally:
    await browser.stop()  # release the browser even if the work above raises
```

## Development

```bash
# Install with dev dependencies
uv sync --extra dev

# Run tests
uv run pytest

# Run tests with output
uv run pytest -v
```

### Project Structure

```
src/pwbase/
├── __init__.py                  # Public API surface
├── browser.py                   # Browser — core async Playwright wrapper
├── browser_config.py            # BrowserConfig dataclass
├── browser_type.py              # BrowserType enum
└── browser_session_extractor.py # BrowserSessionExtractor + CapturedResponse
tests/
├── conftest.py                  # Shared async mock fixtures
├── test_browser.py              # Unit tests for Browser (all three modes)
└── test_browser_session_extractor.py
```

## License

MIT
