Metadata-Version: 2.4
Name: agent_aid
Version: 1.7.0
Summary: Local Windows desktop control for AI agents — Python library and CLI.
Author: agent_aid
License: MIT
Project-URL: Documentation, https://github.com/local/agent_aid
Keywords: windows,automation,ai-agent,rpa,computer-use
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: mss>=10.1.0
Requires-Dist: uiautomation>=2.0

# agent-aid

A small Python library and CLI to control the local Windows desktop from AI
agents. Open Interpreter, GPT, Claude, or your own agent — call `agent-aid` to
read the screen, drive mouse and keyboard, manage windows, and find UI
elements by their accessibility name (no coordinate guessing). Two
dependencies: `mss` (screen capture) and `uiautomation` (UI tree).

```
pip install agent-aid
```

With `uv` (recommended — installs Python automatically if needed):

```
uv tool install agent-aid --python 3.11
```

## Quick look

```
agent-aid health
agent-aid state
agent-aid screenshot active_window=true save_path=captures/active.png include_base64=false
agent-aid click x=500 y=300
agent-aid type text="hello"
agent-aid press keys=ctrl+s
agent-aid focus_window title_fragment=Chrome
agent-aid open target=https://example.com
```

List every route:

```
agent-aid --list
```

Print the full AI-oriented usage spec to stdout (markdown):

```
agent-aid --readme
```

## Capabilities

- Screenshots: full desktop, single monitor, active window, specific `hwnd`, or rectangular region
- PNG `sha256` hash returned for every screenshot (use with `wait_screen_change`)
- Mouse: click, double-click, right-click, drag, move, scroll (vertical/horizontal), hold buttons
- Drag with optional `pre_hold_ms`, `post_hold_ms`, `steps` for finicky shape/preview UIs
- Keyboard: short text, hotkeys (`ctrl+shift+a`), modifier hold/release
- Clipboard: `clipboard text=...` + `press keys=ctrl+v` for long pastes
- Windows: find, focus, minimize/maximize/restore/close, move/resize, hide/show
- **UI Automation: find/click/read/write elements by accessibility name (no coordinate guessing)**
- System: list processes, open file/URL/`shell:` target, read pixel color, query state
- Verify: `wait_screen_change`, `wait_pixel`, `wait_window`
- `batch` for atomic multi-step flows in a single CLI call

## UI Automation (recommended over raw coordinates)

```
# Find an element by partial name match (case-insensitive)
agent-aid ui_find name="Save"
agent-aid ui_find name="Yapıştır" control_type=Button hwnd=12345

# Find every match (use index=N to pick the n-th, e.g. for repeated names)
agent-aid ui_find_all control_type=Button hwnd=12345 limit=20
agent-aid ui_click name="Delete" index=1                       # 2nd "Delete"

# Find + click in one call
agent-aid ui_click name="OK"

# Read the text/value of a control (uses ValuePattern → TextPattern → Name)
agent-aid ui_text automation_id="UrlBar" hwnd=12345

# Append text into an edit/combo via UIA (safe-by-default: does NOT clear
# existing content, inserts at cursor). Full Unicode, no key races.
agent-aid ui_set_value automation_id="UrlBar" hwnd=12345 text="https://example.com"

# Replace the entire field — opt-in destructive mode
agent-aid ui_set_value automation_id="UrlBar" hwnd=12345 text="..." clear=true

# Dump the accessibility tree (filter by control_type to keep small)
agent-aid ui_dump hwnd=12345 max_depth=4 limit=100
agent-aid ui_dump hwnd=12345 max_depth=10 control_type=ListItem
```

Works on Paint, Office, Edge, Chrome, native Win32, and most modern UWP apps.
Selector args: `name` (partial), `control_type`, `automation_id` (exact),
`class_name` (partial), `hwnd` (root scope), `index`, `max_depth`, `timeout_ms`.

## Targeting windows / coordinates

All coordinates are physical screen pixels. To work relative to a specific
window:

```
agent-aid click x=120 y=80 relative_to=active_window
agent-aid click x=120 y=80 hwnd=123456
```

## Use as a Python library

```python
from agent_aid import core

core.set_dpi_aware()
print(core.active_window())
core.click(800, 500)
core.type_text("hello")
```

Same capabilities as the CLI — just call the `core` module directly from your
Python code.

## Argument formats

```
# key=value (shortest)
agent-aid click x=500 y=300 button=left

# JSON (for nested fields)
agent-aid screenshot '{"region":{"left":0,"top":0,"width":800,"height":600},"save_path":"r.png"}'

# Pretty-print output
agent-aid --pretty state
```

## Practical AI agent flow

1. `agent-aid state` — see what you're looking at
2. `agent-aid screenshot active_window=true save_path=captures/now.png include_base64=false`
3. Inspect the image, choose target coordinates
4. Act: `click` / `type` / `press` / `clipboard`
5. Verify: another `screenshot` or `wait_screen_change`

## Safety notes

- Sends real mouse and keyboard input — types into the focused window.
- `clipboard` overwrites the user's clipboard.
- `open` launches a Windows target (same effect as a user double-click).
- `window/manage close` posts `WM_CLOSE` — apps with unsaved data may prompt.
- After any action, prefer to verify with `wait_*` or a fresh `screenshot`.

## License

MIT.
