Metadata-Version: 2.4
Name: propalgos
Version: 0.2.0
Summary: Quant research SDK for PropAlgos — data, signals, backtesting, and visualization
Project-URL: Homepage, https://propalgos.ai
Project-URL: Documentation, https://propalgos.ai/docs
Author-email: PropAlgos <eng@propalgos.ai>
License: Proprietary
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Office/Business :: Financial :: Investment
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: httpx>=0.28
Requires-Dist: numpy>=1.26
Requires-Dist: pandas>=2.2
Requires-Dist: pydantic>=2.8
Requires-Dist: typer[all]>=0.15
Requires-Dist: yfinance>=0.2.40
Provides-Extra: dev
Requires-Dist: build>=1.2; extra == 'dev'
Requires-Dist: ipywidgets>=8.1; extra == 'dev'
Requires-Dist: mypy>=1.15; extra == 'dev'
Requires-Dist: plotly>=6.0; extra == 'dev'
Requires-Dist: pyarrow>=18.0; extra == 'dev'
Requires-Dist: pytest>=8.3; extra == 'dev'
Requires-Dist: ruff>=0.11; extra == 'dev'
Provides-Extra: keyring
Requires-Dist: keyring>=25.0; extra == 'keyring'
Provides-Extra: notebook
Requires-Dist: ipykernel>=6.29; extra == 'notebook'
Requires-Dist: jupyterlab>=4.2; extra == 'notebook'
Provides-Extra: viz
Requires-Dist: ipywidgets>=8.1; extra == 'viz'
Requires-Dist: plotly>=6.0; extra == 'viz'
Description-Content-Type: text/markdown

# propalgos

Quant research SDK for [PropAlgos](https://propalgos.ai) — data, signals, backtesting, and visualization on GPU cloud instances.

## Install

```bash
pip install propalgos

# with visualization (plotly)
pip install 'propalgos[viz]'

# with notebook support (jupyterlab)
pip install 'propalgos[notebook]'

# all optional deps
pip install 'propalgos[viz,notebook,keyring]'

# development
pip install -e ".[dev,viz]"
```

Or with uv:

```bash
uv add propalgos
uv add 'propalgos[viz]'
```

## Quickstart

```python
import propalgos as pa

prices = pa.data.prices("SPY", start="2020-01-01")
signal = pa.signals.sma_cross(prices, fast=20, slow=100)
bt = pa.backtest.run(prices, signal, initial_cash=100_000, fee_bps=1)
bt.summary()
pa.viz.tearsheet(bt)
```

## CLI

```bash
propalgos version     # print SDK version
propalgos doctor      # check deps and connectivity
propalgos env         # print runtime environment summary (GPU, frameworks, cloud)
propalgos --help      # all commands
```

The `pa` alias works too: `pa version`, `pa doctor`, `pa env`.

---

## Module Reference

### `pa.data` — Market Data

```python
# fetch OHLCV data (yfinance backend, disk-cached)
prices = pa.data.prices("SPY", start="2020-01-01", end="2024-12-31", interval="1d")

# multiple symbols → MultiIndex columns
prices = pa.data.prices(["SPY", "QQQ"], start="2023-01-01")

# returns
daily = pa.data.returns(prices)                    # simple returns
log_r = pa.data.returns(prices, method="log")      # log returns
cum   = pa.data.cumulative_returns(daily)          # cumulative

# load from file
prices = pa.data.from_csv("my_data.csv")

# bulk dataset loading (on-platform or with local parquets)
df = pa.data.load("crypto-kraken", interval="1h")  # stacked multi-symbol DataFrame
datasets = pa.data.list_datasets()                  # discover available datasets
symbols = pa.data.list_symbols("crypto-kraken", interval="1d")  # symbols in a dataset

# swap data backend
from propalgos.data.prices import DataProvider
pa.data.configure(my_custom_provider)  # must implement DataProvider protocol
pa.data.clear_cache()                  # purge disk cache
```

| Function | Parameters | Returns |
|----------|-----------|---------|
| `prices` | `symbols`, `start=None`, `end=None`, `interval="1d"`, `cache=True` | `pd.DataFrame` |
| `returns` | `prices_df`, `method="simple"` | `pd.DataFrame` |
| `cumulative_returns` | `returns_df` | `pd.DataFrame` |
| `from_csv` | `path` | `pd.DataFrame` |
| `load` | `dataset_id`, `interval="1d"` | `pd.DataFrame` |
| `list_datasets` | — | `list[dict]` |
| `list_symbols` | `dataset_id`, `interval="1d"` | `list[str]` |
| `configure` | `provider: DataProvider` | `None` |
| `clear_cache` | — | `int` (files removed) |

**DataProvider protocol** — implement `name: str` property and `fetch_prices(symbols, start, end, interval) -> pd.DataFrame`.
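
A provider satisfying this protocol can be very small. The sketch below uses a hypothetical `RandomWalkProvider` that serves deterministic synthetic data (useful for offline tests); an instance of something like this is what you would hand to `pa.data.configure()`:

```python
import numpy as np
import pandas as pd


class RandomWalkProvider:
    """Offline provider satisfying the DataProvider protocol (illustrative name)."""

    @property
    def name(self) -> str:
        return "random-walk"

    def fetch_prices(self, symbols, start, end, interval) -> pd.DataFrame:
        # Deterministic synthetic close prices so nothing hits the network
        idx = pd.date_range(start or "2024-01-01", periods=100, freq="D")
        rng = np.random.default_rng(42)
        close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(idx))))
        return pd.DataFrame({"close": close}, index=idx)


provider = RandomWalkProvider()
df = provider.fetch_prices("SPY", "2024-01-01", None, "1d")
```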

---

### `pa.signals` — Signal Generators

All signal functions accept a `pd.DataFrame` with a `close` column and return a `pd.Series` of `{-1, 0, 1}` (sell, flat, buy). Warmup periods output `0`.

```python
sig = pa.signals.sma_cross(prices, fast=20, slow=100)
sig = pa.signals.rsi_reversion(prices, period=14, overbought=70, oversold=30)
sig = pa.signals.combine([sig_a, sig_b], method="and")

# raw RSI values (0-100 float, not a signal)
rsi_values = pa.signals.rsi(prices, period=14)

# multi-symbol operations (stacked DataFrames)
df = pa.signals.rsi_stacked(stacked_df, period=14, symbol_col="symbol")  # adds 'rsi' column
df = pa.signals.scan(stacked_df, pa.signals.sma_cross, symbol_col="symbol")  # adds 'signal' column
```

| Function | Parameters | Strategy |
|----------|-----------|----------|
| `sma_cross` | `fast=20`, `slow=100` | SMA crossover — long when fast > slow |
| `ema_cross` | `fast=12`, `slow=26` | EMA crossover |
| `trend_filter` | `period=50` | Trend following — long above SMA, short below |
| `rsi_reversion` | `period=14`, `overbought=70`, `oversold=30` | RSI mean reversion |
| `rsi` | `period=14` | Raw RSI values (0–100 float, not a signal) |
| `macd` | *(default MACD params)* | MACD histogram sign |
| `bollinger_bands` | `period=20`, `num_std=2.0` | Band extremes — long at lower, short at upper |
| `zscore_reversion` | `period=20`, `entry_z=2.0`, `exit_z=0.5` | Z-score extremes |
| `combine` | `signals: list`, `method="and"\|"or"\|"majority"` | Combine multiple signals |
| `rsi_stacked` | `df`, `period=14`, `symbol_col="symbol"` | Vectorized RSI across stacked multi-symbol DataFrames |
| `scan` | `df`, `signal_fn`, `symbol_col="symbol"` | Generic groupby applicator — apply any signal function across symbols |
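
Because the contract is just "close column in, `{-1, 0, 1}` Series out", custom signals slot in next to the built-ins and work with `scan`. The `momentum_sign` function below is a hypothetical example, not part of the SDK:

```python
import pandas as pd


def momentum_sign(prices: pd.DataFrame, lookback: int = 10) -> pd.Series:
    """Long when the trailing return is positive, short when negative."""
    ret = prices["close"].pct_change(lookback)
    sig = pd.Series(0, index=prices.index)
    sig[ret > 0] = 1   # buy
    sig[ret < 0] = -1  # sell
    return sig         # warmup rows stay 0 because NaN comparisons are False
```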

---

### `pa.factors` — Performance Metrics & Risk Analytics

All functions accept a `pd.Series` of daily simple returns.

```python
pa.factors.sharpe(returns)           # 1.42
pa.factors.max_drawdown(returns)     # 0.187
pa.factors.var(returns, 0.95)        # 0.023
```

**Performance** (annualized where applicable):

| Function | Parameters | Returns |
|----------|-----------|---------|
| `sharpe` | `rf=0.0` | `float` |
| `sortino` | `rf=0.0` | `float` |
| `calmar` | — | `float` |
| `total_return` | — | `float` |
| `annualized_return` | — | `float` |
| `annualized_volatility` | — | `float` |
| `win_rate` | — | `float` |
| `profit_factor` | — | `float` |

**Risk:**

| Function | Parameters | Returns |
|----------|-----------|---------|
| `var` | `confidence=0.95` | `float` |
| `cvar` | `confidence=0.95` | `float` |
| `max_drawdown` | — | `float` |
| `drawdown_series` | — | `pd.Series` |
| `beta` | `strategy_returns, benchmark_returns` | `float` |
| `rolling_volatility` | `window=20` | `pd.Series` |

**Constant:** `pa.factors.TRADING_DAYS = 252`
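
For intuition, here is the usual annualization convention as a sketch (assuming daily statistics scale by √252; the SDK's exact formula may differ):

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252


def sharpe_sketch(returns: pd.Series, rf: float = 0.0) -> float:
    """Annualized Sharpe from daily simple returns; rf is an annual rate."""
    excess = returns - rf / TRADING_DAYS            # approximate daily risk-free rate
    return float(excess.mean() / excess.std() * np.sqrt(TRADING_DAYS))
```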

---

### `pa.backtest` — Vectorized Backtesting

```python
result = pa.backtest.run(
    prices,                    # DataFrame with 'close' column
    signal,                    # Series of {-1, 0, 1}
    initial_cash=100_000,      # starting capital
    fee_bps=5,                 # 0.05% per trade
    slippage_bps=2,            # 0.02% slippage
    benchmark="SPY",           # optional benchmark (str or DataFrame)
)

result.summary()       # formatted text output
result.to_dict()       # serializable dict
result.to_json()       # JSON string
```

**BacktestResult attributes:**

| Attribute | Type | Description |
|-----------|------|-------------|
| `equity_curve` | `pd.Series` | Dollar equity by date |
| `returns` | `pd.Series` | Daily simple returns |
| `positions` | `pd.Series` | Position at each bar (-1, 0, 1) |
| `trades` | `pd.DataFrame` | One row per position change |
| `drawdowns` | `pd.Series` | Peak-to-trough drawdown series |
| `benchmark_equity` | `pd.Series \| None` | Benchmark equity curve |
| `benchmark_returns` | `pd.Series \| None` | Benchmark daily returns |
| `metrics` | `dict[str, float]` | Sharpe, Sortino, max DD, CAGR, etc. |
| `initial_cash` | `float` | Starting capital |
| `cost_model` | `CostModel` | Fee + slippage config |

**CostModel:** `CostModel(fee_bps=0, slippage_bps=0)` — immutable cost config.
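
One common way to read the bps parameters, sketched below: costs are charged on turnover, i.e. on every bar where the position changes. This is an illustration of the convention, not necessarily the engine's exact accounting:

```python
import pandas as pd


def apply_costs(returns: pd.Series, positions: pd.Series,
                fee_bps: float = 0, slippage_bps: float = 0) -> pd.Series:
    """Deduct fee + slippage (basis points) proportional to position change."""
    cost_rate = (fee_bps + slippage_bps) / 10_000
    turnover = positions.diff().abs().fillna(positions.abs())
    return returns - turnover * cost_rate
```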

---

### `pa.viz` — Visualization

Requires `plotly` — install with `pip install 'propalgos[viz]'`.

```python
pa.viz.tearsheet(result)                          # multi-panel backtest tearsheet
pa.viz.candles(prices, indicators=["sma20", "bb20"], signals=signal)
pa.viz.monthly_returns(result)                    # years x months heatmap
pa.viz.heatmap(matrix, colorscale="rsi", fmt=".0f", zrange=(0, 100))
pa.viz.strip(data, x="timeframe", y="rsi", color="symbol",
             thresholds={"Oversold": (0, 30), "Overbought": (70, 100)})
```

| Function | Description |
|----------|-------------|
| `tearsheet(result)` | Equity curve + drawdown + monthly returns + metrics panel. Auto-shows in notebooks. |
| `candles(prices, indicators=None, signals=None, title=None)` | Candlestick chart with optional SMA/EMA/BB overlays and buy/sell markers. |
| `monthly_returns(result_or_returns)` | Monthly returns heatmap (years x months). Accepts BacktestResult or raw Series. |
| `heatmap(z, colorscale="diverging", fmt=".1%", title="", zrange=None)` | General-purpose heatmap for any DataFrame matrix. |
| `strip(data, x, y, color, thresholds=None, title="", y_range=None)` | Strip/jitter plot with optional threshold bands for signal distribution. |

**Indicator specs** for `candles()`: `sma{n}`, `ema{n}`, `bb{n}` (e.g. `"sma20"`, `"ema50"`, `"bb20"`).

**Colorscale presets**: `"diverging"` (red → green), `"sequential"` (green → red), `"rsi"` (green → red, 5-stop). Or pass a raw Plotly colorscale list.

#### Dark/Light Mode

```python
pa.viz.set_mode("light")          # switch to light theme
pa.viz.tearsheet(result)          # now renders with light palette

pa.viz.set_mode("dark")           # back to default dark theme
```

All chart functions read from the active palette automatically. The `COLORS` and `COLORSCALES` dicts are updated in place when the mode changes.

| Function/Object | Description |
|-----------------|-------------|
| `set_mode("dark" \| "light")` | Set global theme mode |
| `colors()` | Returns active color palette dict |
| `colorscales()` | Returns active colorscale presets dict |
| `COLORS` | Active palette dict (mutates with `set_mode`) |
| `COLORSCALES` | Active colorscale presets (mutates with `set_mode`) |
| `apply_theme(fig)` | Apply PropAlgos theme to any Plotly figure |

---

### `pa.env` — Environment Detection

```python
pa.env.summary()
# PropAlgos Environment
# ─────────────────────
# Runtime:    PropAlgos Cloud (AWS us-east-1, g4dn.xlarge)
# Python:     3.11.9
# GPU:        NVIDIA T4 (16 GB) — CUDA 12.4
# Frameworks: PyTorch 2.3.0 (GPU), RAPIDS cuDF 24.10 (GPU)

gpu = pa.env.gpu()          # GPUInfo(available=True, name="NVIDIA T4", ...)
fws = pa.env.frameworks()   # [FrameworkInfo(name="torch", version="2.3.0", gpu_enabled=True), ...]
rt  = pa.env.detect_runtime()  # RuntimeInfo(on_propalgos=True, cloud_provider="aws", ...)
```

| Function | Returns | Description |
|----------|---------|-------------|
| `summary()` | `None` | Print formatted environment info to stdout |
| `gpu()` | `GPUInfo` | GPU hardware detection (nvidia-smi → torch → tensorflow fallback) |
| `in_propalgos()` | `bool` | True if running on a PropAlgos-managed instance |
| `frameworks()` | `list[FrameworkInfo]` | Detected ML frameworks (torch, tensorflow, jax, RAPIDS) |
| `detect_runtime()` | `RuntimeInfo` | Cloud provider, region, instance type, Python version |

**GPUInfo fields**: `available`, `name`, `memory_mb`, `driver_version`, `cuda_version`, `count`

**RuntimeInfo fields**: `on_propalgos`, `cloud_provider`, `region`, `instance_type`, `instance_id`, `python_version`, `platform`
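
The first rung of the detection chain (`nvidia-smi`) can be sketched like this; `GPUInfoSketch` and `detect_gpu` are illustrative names, not the SDK's internals:

```python
from __future__ import annotations

import shutil
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class GPUInfoSketch:
    available: bool
    name: str | None = None
    memory_mb: int | None = None


def detect_gpu() -> GPUInfoSketch:
    # nvidia-smi first; a real implementation would fall back to torch/tensorflow
    if shutil.which("nvidia-smi") is None:
        return GPUInfoSketch(available=False)
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if out.returncode != 0 or not out.stdout.strip():
        return GPUInfoSketch(available=False)
    name, mem = out.stdout.strip().splitlines()[0].split(", ")
    return GPUInfoSketch(available=True, name=name, memory_mb=int(mem))
```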

---

### `pa.config` — Configuration

```python
from propalgos.config import get_config

cfg = get_config()
cfg.api_url          # "https://api.propalgos.ai"
cfg.token            # from PROPALGOS_TOKEN env
cfg.is_on_platform   # True if PROPALGOS_INSTANCE_ID is set
cfg.cache_dir        # ~/.cache/propalgos
```

**Environment variables:**

| Variable | Default | Description |
|----------|---------|-------------|
| `PROPALGOS_API_URL` | `https://api.propalgos.ai` | Backend API URL |
| `PROPALGOS_TOKEN` | `None` | Auth token |
| `PROPALGOS_ENV` | `production` | Environment name |
| `PROPALGOS_CACHE_DIR` | `~/.cache/propalgos` | Disk cache directory |
| `PROPALGOS_INSTANCE_ID` | `None` | Set automatically on PropAlgos instances |
| `PROPALGOS_CUDF_AUTO` | `1` | Set to `0` to disable auto GPU acceleration |
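
The precedence is a plain environment lookup with defaults; roughly equivalent to this sketch (not the actual `propalgos.config` source):

```python
import os


def load_config_sketch() -> dict:
    """Read the variables from the table above, falling back to defaults."""
    return {
        "api_url": os.environ.get("PROPALGOS_API_URL", "https://api.propalgos.ai"),
        "token": os.environ.get("PROPALGOS_TOKEN"),
        "env": os.environ.get("PROPALGOS_ENV", "production"),
        "cache_dir": os.environ.get(
            "PROPALGOS_CACHE_DIR", os.path.expanduser("~/.cache/propalgos")),
        "is_on_platform": os.environ.get("PROPALGOS_INSTANCE_ID") is not None,
        "cudf_auto": os.environ.get("PROPALGOS_CUDF_AUTO", "1") != "0",
    }
```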

---

### `pa.exceptions` — Error Hierarchy

All exceptions inherit from `PropAlgosError(message, code)`.

| Exception | Code | Description |
|-----------|------|-------------|
| `PropAlgosError` | varies | Base exception with machine-readable `code` |
| `AuthenticationError` | `auth_error` | Invalid or expired token |
| `APIError` | `api_error` | Backend API call failed (`status_code`, `response_body`) |
| `InsufficientCreditsError` | `insufficient_credits` | Balance too low (`balance`, `required`) |
| `DataError` | `data_error` | Market data fetch failed |
| `BacktestError` | `backtest_error` | Unrecoverable backtest issue |
| `ConfigError` | `config_error` | Missing or invalid configuration |
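
The base class plus `code` supports both broad and precise handling in calling code. The class bodies below are a sketch consistent with the table, not the SDK source:

```python
class PropAlgosError(Exception):
    """Base: carries a machine-readable code alongside the message."""
    def __init__(self, message: str, code: str = "error"):
        super().__init__(message)
        self.code = code


class DataError(PropAlgosError):
    def __init__(self, message: str):
        super().__init__(message, code="data_error")


def fetch_or_report() -> str:
    try:
        raise DataError("no data for symbol XYZ")
    except PropAlgosError as exc:   # catches the whole hierarchy
        return exc.code             # branch on the machine-readable code


code = fetch_or_report()
```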

---

## GPU Acceleration

On any machine with RAPIDS installed, `import propalgos` automatically activates `cudf.pandas` — all pandas operations across the SDK and your own code run on the GPU with zero code changes:

```python
import propalgos as pa
import pandas as pd

# both use GPU transparently — no %load_ext or manual setup needed
prices = pa.data.prices("SPY", start="2020-01-01")
df = pd.read_parquet("my_data.parquet")  # also GPU-accelerated
```

Auto-activation happens whenever `cudf.pandas` is importable. On PropAlgos GPU instances this is always the case (RAPIDS is pre-installed). If you're on a GPU instance and cuDF is missing, the SDK emits a visible warning.

### What runs on GPU

The SDK is optimized end-to-end for NVIDIA GPU acceleration via cudf.pandas and direct cuDF APIs:

**Data loading** — `pa.data.load()` and `CryptoParquetProvider` use `cudf.read_parquet()` directly (bypassing the cudf.pandas shim) for zero-copy NVMe-to-GPU memory loads when GPU Direct Storage (GDS) is available. Data lands on the GPU from the first instruction — no CPU-to-GPU transfer before computation.

**Data staging** — FUSE-mounted parquet files are staged to local NVMe via kvikIO (GDS zero-copy) when available, falling back to a 16 MB buffered copy (256x larger than the default 64 KB shutil buffer).
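
The fallback copy path amounts to raising `shutil`'s buffer size; a minimal sketch (the function name and demo are illustrative):

```python
import os
import shutil
import tempfile


def staged_copy(src: str, dst: str, buf_mb: int = 16) -> None:
    """Copy with a 16 MiB buffer; shutil's default copyfileobj buffer is 64 KiB."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        shutil.copyfileobj(fsrc, fdst, length=buf_mb * 1024 * 1024)


# tiny round-trip demo
with tempfile.TemporaryDirectory() as tmp:
    src, dst = os.path.join(tmp, "a.parquet"), os.path.join(tmp, "b.parquet")
    with open(src, "wb") as f:
        f.write(b"\x00" * 4096)
    staged_copy(src, dst)
    copied = os.path.getsize(dst)
```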

**Signal generation** — all signal functions (`sma_cross`, `ema_cross`, `rsi_reversion`, `macd`, `bollinger_bands`, `zscore_reversion`, `trend_filter`) use vectorized pandas operations (`.rolling()`, `.ewm()`, `.clip()`, boolean indexing) that map directly to cuDF GPU kernels. No lambdas, no row-level Python loops.

**Multi-symbol operations** — `rsi_stacked()` uses a boundary-nulling pattern to compute RSI across all symbols in a single vectorized pass (no `groupby().apply()`). `scan()` uses `groupby().apply()` which cuDF can parallelize across groups.
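
The boundary-nulling idea: run one flat vectorized pass over the stacked frame, then null out the rows where the symbol changes so values never bleed across symbols. A sketch with a groupwise diff (a hypothetical helper, not the internals of `rsi_stacked()`):

```python
import pandas as pd


def grouped_diff(df: pd.DataFrame, value_col: str,
                 symbol_col: str = "symbol") -> pd.Series:
    """Per-symbol diff without groupby().apply(): one pass plus boundary nulling."""
    d = df[value_col].diff()                            # flat diff over all rows
    boundary = df[symbol_col] != df[symbol_col].shift() # first row of each symbol
    d[boundary] = float("nan")                          # erase cross-symbol values
    return d
```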

**Backtesting** — the entire backtest engine is vectorized: `.pct_change()`, `.cumprod()`, `.diff()`, `.fillna()`, scalar arithmetic. Position tracking uses boolean indexing instead of lambdas.

**Factor computation** — all risk and performance metrics use GPU-native pandas methods (`.quantile()` instead of `np.percentile()`, `.std()`, `.mean()`, `.cumprod()`, `.cummax()`). Numpy ufuncs are only called on scalars, never on Series.

**Visualization** — reductions (min, max) and element-wise transforms (abs) run on GPU before results transfer to CPU for Plotly rendering. Monthly returns use vectorized `resample().prod()` instead of `groupby().apply(lambda)`.
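
Compounding daily returns into a month grid vectorizes cleanly. This sketch uses a period groupby, which is equivalent to the `resample().prod()` form and stable across pandas versions:

```python
import pandas as pd


def monthly_matrix(returns: pd.Series) -> pd.DataFrame:
    """Year x month grid of compounded simple returns, no lambda-apply."""
    monthly = (1 + returns).groupby(returns.index.to_period("M")).prod() - 1
    frame = monthly.to_frame("ret")
    frame["year"], frame["month"] = monthly.index.year, monthly.index.month
    return frame.pivot(index="year", columns="month", values="ret")
```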

**Cache** — cache hits use direct `cudf.read_parquet()`, bypassing shim dispatch overhead. Parquet files are written with zstd compression and write statistics enabled for optimal GPU reads.

### Writing GPU-compatible code

When extending the SDK or writing notebook code, follow these rules so cudf.pandas can accelerate everything:

1. **Use vectorized numpy ufuncs as free functions, not inside `.apply()`** — `np.log(series)` dispatches to GPU via `__array_ufunc__`. `series.apply(np.log)` forces per-row CPU fallback.
2. **Avoid lambdas in `.apply()`** — cuDF can't JIT Python lambdas. Use vectorized boolean indexing: `result[series > 0] = "a"` instead of `series.apply(lambda x: "a" if x > 0 else "b")`.
3. **Use `.quantile()` instead of `np.percentile()`** — `series.quantile(0.95)` stays on GPU; `np.percentile(series, 95)` forces CPU.
4. **Avoid `groupby().apply()` with arbitrary functions** when possible — prefer boundary-nulling patterns (see `rsi_stacked()` for the canonical example).
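
Rules 1-3 in a compact before/after sketch (plain pandas here; the same code accelerates unchanged once `cudf.pandas` is active):

```python
import numpy as np
import pandas as pd

s = pd.Series([-0.5, 0.0, 1.2, -2.0, 3.3])

# Rule 2, slow: per-row Python lambda forces CPU fallback under cudf.pandas
slow = s.apply(lambda x: "pos" if x > 0 else "nonpos")

# Rule 2, fast: boolean indexing maps to a single vectorized kernel
fast = pd.Series("nonpos", index=s.index)
fast[s > 0] = "pos"
same = slow.equals(fast)

# Rule 1: free-function ufunc dispatches via __array_ufunc__
logs = np.log(s.clip(lower=1e-9))        # not s.apply(np.log)

# Rule 3: quantile stays on device
q95 = s.quantile(0.95)                   # not np.percentile(s, 95)
```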

### Check GPU status

```python
>>> pa.gpu_status()
{
    'cudf_active': True,
    'gpu_available': True,
    'rapids_version': '24.10',
    'device': 'NVIDIA A100-SXM4-40GB',
    'instance_id': 'i-0abc123def456',
    'cloud_provider': 'aws',
    'region': 'us-east-1',
    'instance_type': 'p3.2xlarge',
    'on_platform': True,
    'python_version': '3.11.9',
    'sdk_version': '0.2.0',
    'warning': None
}
```

If `cudf_active` is `False` and `gpu_available` is `True`, the `warning` field explains why.

**Opt out** (for benchmarking or debugging):

```bash
export PROPALGOS_CUDF_AUTO=0
```

**Off-platform** (local dev, Colab, etc.) — activate manually before importing propalgos:

```python
%load_ext cudf.pandas   # or: import cudf.pandas; cudf.pandas.install()

import propalgos as pa
```

---

## Development

```bash
cd sdk/python

# install with dev + viz extras
pip install -e ".[dev,viz]"
# or with uv
uv sync --extra dev

# run tests (153 tests)
uv run -- python -m pytest tests/ -v

# lint
uv run -- ruff check src tests

# typecheck
uv run -- mypy src
```

### Test files

| File | Coverage |
|------|----------|
| `test_import.py` | Package structure, namespaces, exports |
| `test_signals.py` | All 8 signal functions + combine, raw RSI, `rsi_stacked`, `scan` |
| `test_factors.py` | Performance and risk metrics |
| `test_backtest.py` | Engine, results, costs, edge cases |
| `test_viz.py` | Heatmap, strip, theme modes, colorscales, monthly returns |
| `test_data.py` | Crypto providers, dataset registry, `load()`, `list_datasets()`, `list_symbols()` |
| `test_gpu_compat.py` | GPU optimization regression tests — log returns, signal composition, staging, cache, viz helpers, monthly matrix, CSV loading, scan groupby |

---

## License

Proprietary — PropAlgos, Inc. See [LICENSE](./LICENSE).
