Metadata-Version: 2.4
Name: urlps
Version: 0.2.1
Summary: Lightweight URL parsing and building helpers (RFC 3986-like).
Author: urlps maintainers
License-Expression: MIT
Project-URL: Homepage, https://github.com/micro/urlps
Project-URL: Repository, https://github.com/micro/urlps
Project-URL: Documentation, https://github.com/micro/urlps#readme
Project-URL: Changelog, https://github.com/micro/urlps/blob/main/CHANGELOG.md
Project-URL: Issues, https://github.com/micro/urlps/issues
Project-URL: Bug Tracker, https://github.com/micro/urlps/issues
Keywords: url,parser,builder,rfc3986,validation,idna,ipv6,punycode,ipv4
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: idna
Requires-Dist: idna>=3.11; extra == "idna"
Provides-Extra: dev
Requires-Dist: pytest>=7.4; extra == "dev"
Requires-Dist: mypy>=1.19; extra == "dev"
Requires-Dist: bandit>=1.9; extra == "dev"
Requires-Dist: idna>=3.11; extra == "dev"
Dynamic: license-file

# urlps

urlps is a lightweight library for parsing and building URLs according to RFC 3986. Its security protections are enabled by default: SSRF prevention, path traversal protection, and homograph attack detection. DNS rebinding detection is available as an opt-in check.

## Installation

```bash
pip install urlps
```

Development setup:
```bash
python -m venv .venv
. .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
```

## Quick Start

```python
from urlps import parse_url, build

# Secure by default - blocks SSRF, private IPs, localhost
url = parse_url("https://api.example.com/data?token=abc#section")
print(url.host)  # api.example.com
print(url.query_params)  # [("token", "abc")]

# Build URLs
url_str = build("https", "example.com", port=8443, path="/api", query="x=1")
# https://example.com:8443/api?x=1

# Immutable with functional updates
url = parse_url("https://example.com/path")
new_url = url.with_host("other.com").with_port(8080)
print(new_url)  # https://other.com:8080/path
```

### Security

`parse_url()` blocks by default:
- Private IPs (192.168.x.x, 10.x.x.x, 172.16.x.x through 172.31.x.x)
- Localhost and loopback addresses
- Link-local addresses (169.254.x.x)
- `.local` and `.internal` domains
- Path traversal patterns (`../`)
- Double-encoded characters
- Mixed Unicode scripts (homograph attacks)
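
To make the address categories above concrete, here is a minimal sketch that uses only the standard library's `ipaddress` module. It is not how urlps implements its checks; the helper name `classify_host` is hypothetical, and it handles only IP literals, not hostnames or DNS resolution.

```python
import ipaddress


def classify_host(host: str) -> str:
    """Hypothetical helper: label an IP literal with one of the categories above."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "not-an-ip"  # hostnames would need separate checks
    # Order matters: loopback and link-local addresses also report is_private.
    if addr.is_loopback:
        return "loopback"
    if addr.is_link_local:
        return "link-local"
    if addr.is_private:
        return "private"
    return "public"


for host in ("127.0.0.1", "169.254.10.1", "172.31.255.1", "8.8.8.8", "example.com"):
    print(host, "->", classify_host(host))
```

A real SSRF filter would also need to cover hostnames that resolve to these ranges, which is why the list above includes `.local` and `.internal` domains.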

Use `parse_url_unsafe()` for internal/development URLs:
```python
from urlps import parse_url_unsafe

dev_url = parse_url_unsafe("http://localhost:3000/api")
internal = parse_url_unsafe("http://192.168.1.100/metrics")
```

## Core Features

### Immutable URL Objects

```python
url = parse_url("https://user:pass@example.com:8080/path?token=abc")
print(url.netloc)         # user:pass@example.com:8080
print(url.effective_port) # 8080

# with_* methods return new URL objects
url2 = url.with_netloc("admin@example.com")
url3 = url.with_host("other.com").with_port(443).with_path("/api")
url4 = url.with_query_param("new", "value")
url5 = url.without_query_param("token")
```

### Security Checks

```python
from urlps import parse_url, InvalidURLError

# SSRF protection (enabled by default)
try:
    parse_url("http://localhost/admin")  # Blocked
except InvalidURLError as e:
    print(f"Rejected: {e}")

# DNS rebinding detection (optional)
url = parse_url("https://api.example.com/", check_dns=True)

# URL canonicalization
url = parse_url("HTTP://EXAMPLE.COM:80/path?z=1&a=2")
canonical = url.canonicalize()
print(canonical.scheme)  # "http"
print(canonical.host)    # "example.com"
print(canonical.port)    # None (default port removed)
print(canonical.query)   # "a=2&z=1" (sorted)

# Password masking
url = parse_url("https://admin:secret123@api.example.com/")
print(url.as_string(mask_password=True))  # https://admin:***@api.example.com/
```
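
The canonicalization steps shown above (lowercase scheme and host, drop the default port, sort the query) can be approximated with `urllib.parse` alone. This is a rough sketch under stated assumptions, not urlps's `canonicalize()`: the function name `canonicalize_sketch` and the `DEFAULT_PORTS` table are hypothetical, and userinfo and IPv6 literals are ignored for brevity.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed default ports, for this sketch only.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize_sketch(raw: str) -> str:
    """Hypothetical stand-in: lowercase scheme/host, drop default port, sort query."""
    parts = urlsplit(raw)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is explicit and differs from the scheme default.
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    # Sort query pairs so that equivalent URLs compare equal as strings.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((scheme, netloc, parts.path, query, parts.fragment))


print(canonicalize_sketch("HTTP://EXAMPLE.COM:80/path?z=1&a=2"))
# http://example.com/path?a=2&z=1
```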

### Audit Logging

```python
from urlps import set_audit_callback
import logging

def audit_url_parsing(raw_url, parsed_url, exception):
    if exception:
        logging.warning(f"Failed to parse URL: {exception}")
    else:
        logging.info(f"Parsed URL to host: {parsed_url.host}")

set_audit_callback(audit_url_parsing)
```

### Component Length Limits

Conservative limits to prevent DoS attacks:

| Component | Max Length |
|-----------|------------|
| URL (total) | 32 KB |
| Scheme | 16 chars |
| Host | 253 chars |
| Path | 4 KB |
| Query | 8 KB |
| Fragment | 1 KB |
| Userinfo | 128 chars |

## Environment Variables

Override length limits via environment variables:

```bash
# PowerShell
$env:URLPS_MAX_URL_LENGTH = "65536"
python -c "import urlps.constants as c; print(c.MAX_URL_LENGTH)"

# Bash
export URLPS_MAX_URL_LENGTH=65536
python -c 'import urlps.constants as c; print(c.MAX_URL_LENGTH)'
```

Supported variables:
- `URLPS_MAX_URL_LENGTH`
- `URLPS_MAX_SCHEME_LENGTH`
- `URLPS_MAX_HOST_LENGTH`
- `URLPS_MAX_PATH_LENGTH`
- `URLPS_MAX_QUERY_LENGTH`
- `URLPS_MAX_FRAGMENT_LENGTH`
- `URLPS_MAX_USERINFO_LENGTH`
- `URLPS_MAX_IPV6_STRING_LENGTH`
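
As an illustration of how such an override might be read, the sketch below parses an integer limit from the environment and falls back to a default. The helper `read_limit` and its fallback behaviour are assumptions for illustration, not the code in `urlps.constants`. The shell examples above suggest the value is picked up when `urlps.constants` is imported, so it is probably safest to set these variables before importing the package.

```python
import os


def read_limit(name: str, default: int) -> int:
    """Hypothetical helper: read a positive integer limit from the environment."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default  # assumption: malformed values fall back to the default
    return value if value > 0 else default


os.environ["URLPS_MAX_URL_LENGTH"] = "65536"
print(read_limit("URLPS_MAX_URL_LENGTH", 32 * 1024))  # 65536
```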

## API Reference

### Main Functions

| Function | Description |
| --- | --- |
| `parse_url(url, *, allow_custom_scheme=False, check_dns=False)` | Parse URL with security checks enabled (recommended) |
| `parse_url_unsafe(url, *, allow_custom_scheme=False, strict=False)` | Parse URL without security checks (trusted input only) |
| `build(*scheme_and_host, port=None, path="/", query=None, fragment=None, userinfo=None)` | Build URL string from components |
| `compose_url(components)` | Build URL from components dict |

### URL Methods

| Method | Description |
| --- | --- |
| `url.as_string(mask_password=False)` | Convert to string, optionally masking password |
| `url.canonicalize()` | Return canonicalized copy |
| `url.is_semantically_equal(other)` | Compare URLs by meaning after canonicalization |
| `url.same_origin(other)` | Check if URLs have same origin |
| `url.origin` | Return origin string (e.g., `https://example.com`) |
| `url.copy(**overrides)` | Create copy with optional component overrides |
| `url.with_*()` | Functional updates: `with_scheme`, `with_host`, `with_port`, `with_path`, `with_fragment`, `with_userinfo`, `with_netloc`, `with_query_param`, `without_query_param` |

### Cache Management

```python
from urlps import get_cache_info, clear_all_caches

# Get cache statistics
stats = get_cache_info()
print(stats['parser']['normalize_path']['hits'])

# Clear all caches (useful for long-running apps)
previous = clear_all_caches()
```

## Comparison with urllib.parse

| Feature | urllib.parse | urlps |
| --- | --- | --- |
| Basic URL parsing | ✓ | ✓ |
| RFC 3986 strict compliance | Partial | ✓ |
| SSRF protection | ✗ | ✓ |
| DNS rebinding detection | ✗ | ✓ |
| Path traversal detection | ✗ | ✓ |
| Homograph detection | ✗ | ✓ |
| Immutable URL objects | ✗ | ✓ |
| URL canonicalization | ✗ | ✓ |
| Password masking | ✗ | ✓ |
| Audit logging | ✗ | ✓ |
| Component length limits | ✗ | ✓ |

**Use urllib.parse when:** You want to stay within the standard library and basic parsing is sufficient.

**Use urlps when:** Security matters, you need RFC 3986 strict compliance, or you want immutable URL objects with ergonomic manipulation methods.
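
To see the gap the table describes, note that `urllib.parse` only splits a string into components and makes no judgment about the target. The short example below uses the standard library only and parses the same internal addresses that `parse_url()` is documented to reject.

```python
from urllib.parse import urlsplit

# urlsplit accepts these without complaint; any filtering is left to the caller.
hosts = [
    urlsplit(raw).hostname
    for raw in ("http://localhost/admin", "http://192.168.1.100/metrics")
]
print(hosts)  # ['localhost', '192.168.1.100']
```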

## Exceptions

```python
from urlps import InvalidURLError, HostValidationError, parse_url

user_input = "https://example.com/path"  # placeholder for untrusted input

try:
    url = parse_url(user_input)
except HostValidationError:
    print("Invalid hostname")
except InvalidURLError:
    print("Invalid URL")
```

Exception hierarchy:
- `InvalidURLError` — Base exception for all URL errors
- `URLParseError` — Parsing errors
- `URLBuildError` — Building errors
- `HostValidationError` / `PortValidationError` — Component validation errors
- `QueryParsingError`, `FragmentEncodingError`, `UserInfoParsingError`, `UnsupportedSchemeError` — Specific errors
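
Because `InvalidURLError` is the base for all URL errors, the more specific `except` clause has to come first, as in the example above. The sketch below shows why, using local stand-in classes (`SketchInvalidURLError`, `SketchHostValidationError`) that are hypothetical and not imported from urlps.

```python
class SketchInvalidURLError(Exception):
    """Local stand-in for a base URL error; not imported from urlps."""


class SketchHostValidationError(SketchInvalidURLError):
    """Local stand-in for a more specific error derived from the base."""


def handle(exc: Exception) -> str:
    """Raise exc and report which except clause caught it."""
    try:
        raise exc
    except SketchHostValidationError:
        return "host"  # specific clause listed first
    except SketchInvalidURLError:
        return "generic"  # the base clause catches everything else


print(handle(SketchHostValidationError("bad host")))  # host
print(handle(SketchInvalidURLError("bad url")))       # generic
```

If the two clauses were swapped, the base clause would catch the host error as well and the specific branch would never run.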

## Running Tests

```bash
pytest
pytest -v -k "test_parse"     # Run specific tests
pytest -m ipv6                # Run IPv6 tests
pytest -m idna                # Run IDNA tests
```

## License

MIT
