Metadata-Version: 2.4
Name: ddrtree-python
Version: 0.1.6
Summary: Python port of the R DDRTree package (tracks R DDRTree 0.1.6, CRAN 2026-02-24) — learns principal graphs via reversed graph embedding.
Author-email: Jeffery Liu <jeffliu.lucky@gmail.com>
License-Expression: Artistic-2.0
Project-URL: Homepage, https://github.com/Bio-Babel/DDRTree-python
Project-URL: Repository, https://github.com/Bio-Babel/DDRTree-python
Project-URL: Issues, https://github.com/Bio-Babel/DDRTree-python/issues
Project-URL: Documentation, https://github.com/Bio-Babel/DDRTree-python#readme
Project-URL: Upstream (R package), https://cran.r-project.org/package=DDRTree
Project-URL: Original paper (KDD'15), https://dl.acm.org/doi/10.1145/2783258.2783309
Keywords: DDRTree,principal-graph,reversed-graph-embedding,dimensionality-reduction,single-cell,trajectory,pseudotime,monocle,R-port
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21
Requires-Dist: scipy>=1.7
Requires-Dist: scikit-learn>=1.0
Provides-Extra: torch
Requires-Dist: torch>=2.1; extra == "torch"
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Dynamic: license-file

# DDRTree-python

[![PyPI](https://img.shields.io/pypi/v/ddrtree-python)](https://pypi.org/project/ddrtree-python/)

AI-assisted Python port of the [DDRTree](https://cran.r-project.org/package=DDRTree) R
package — **Discriminative Dimensionality Reduction via learning a Tree** — from the
KDD'15 paper by Qi Mao, Li Wang, Steve Goodison and Yijun Sun.

Tracks the R CRAN release **DDRTree 0.1.6** (2026-02-24).

DDRTree simultaneously:

1. Reduces high-dimensional data to a low-dimensional latent space `Z`,
2. Learns an explicit smooth principal tree graph embedded in that space, and
3. Obtains a soft clustering of points onto the tree nodes.

It is the dimensionality-reduction backbone used by Monocle 2 for single-cell
pseudotime / branching-trajectory inference, but it is also a general-purpose
algorithm for any data with a tree-like intrinsic structure.

## Installation

```bash
# From PyPI (distribution: ddrtree-python, import: ddrtree)
pip install ddrtree-python             # NumPy backend only
pip install ddrtree-python[torch]      # + PyTorch backend (CPU / CUDA)
```

For local development:

```bash
git clone https://github.com/Bio-Babel/DDRTree-python.git
cd DDRTree-python
pip install -e ".[dev]"
```

The core depends on `numpy`, `scipy`, and `scikit-learn`. The optional
`torch` extra enables the GPU-friendly backend — no C/C++ extensions are
built either way.

## Quick start

```python
import numpy as np
from ddrtree import DDRTree

# X is a D x N matrix (features x samples), matching the R convention.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 200))

res = DDRTree(X, dimensions=2, max_iter=20,
              sigma=1e-3, lambda_=None, ncenter=50,
              gamma=10.0, tol=1e-3, verbose=False)

res.Z      # 2 x 200  reduced-dimension embedding
res.Y      # 2 x 50   principal-graph node coordinates
res.W      # 10 x 2   orthogonal projection basis
res.stree  # N x N scipy.sparse MST weights (first K x K block populated)
res.objective_vals  # objective at each iteration
```
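The tree edges can be pulled out of a sparse MST matrix like `res.stree` with standard SciPy calls. A minimal, self-contained sketch using a toy MST built directly with SciPy (a stand-in for a real `DDRTree` result, since the exact `stree` layout may differ):

```python
import numpy as np
import scipy.sparse.csgraph as csgraph
from scipy.spatial.distance import cdist

# Toy stand-in for res.Y / res.stree: K principal-graph nodes and the
# MST over their pairwise distances.
rng = np.random.default_rng(1)
K = 6
Y = rng.standard_normal((2, K))                      # 2 x K node coordinates
stree = csgraph.minimum_spanning_tree(cdist(Y.T, Y.T))

# Each stored nonzero is one undirected tree edge (i, j) with its weight.
rows, cols = stree.nonzero()
edges = list(zip(rows.tolist(), cols.tolist()))
print(len(edges))  # 5 edges for K = 6 nodes
```

SciPy stores each undirected edge once, so a connected tree on `K` nodes yields exactly `K - 1` stored entries.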

## Backends

`DDRTree` dispatches to one of two computational backends via the
`backend` argument. The public function signature is otherwise unchanged.
Backend selection is always explicit — there is no auto-detection.

| `backend` | Executes on | Typical use                                  |
| --------- | ----------- | -------------------------------------------- |
| `"numpy"` | CPU (NumPy) | **Default.** Reference path, aligned with R. |
| `"torch"` | CPU / CUDA  | GPU acceleration (Borůvka MST, fast BLAS).   |

```python
# GPU path — requires torch with CUDA
res = DDRTree(X, ncenter=500, backend="torch", device="cuda")

# Reduced precision (torch backend only). float32 halves memory relative
# to the float64 default, gives ~1.5–2× throughput on CUDA, and ~1e-3
# relative drift in the converged embedding.
res = DDRTree(X, ncenter=500, backend="torch", device="cuda", dtype="float32")
```

The torch backend runs a **pure-torch parallel Borůvka** (`O(log K)` rounds)
for the MST step, with no host round trip per iteration. Passing
`mst_algorithm="prim"` or `mst_algorithm="kruskal"` explicitly routes the
MST through NumPy / SciPy for strict parity testing against the R gold
standard; see `tests/test_boruvka_integration.py`.
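For intuition: each Borůvka round lets every component claim its cheapest outgoing edge, so the number of components at least halves per round, which is where the `O(log K)` bound comes from. A plain-NumPy sketch of the idea on a dense distance matrix (`boruvka_mst` is an illustrative helper, not the package's torch implementation):

```python
import numpy as np

def boruvka_mst(dist):
    """MST edges of a dense symmetric distance matrix via Boruvka rounds."""
    n = dist.shape[0]
    parent = np.arange(n)  # union-find forest over components

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    d = dist.astype(float).copy()
    np.fill_diagonal(d, np.inf)
    edges = []
    while len(edges) < n - 1:
        comp = np.array([find(i) for i in range(n)])
        # Hide edges inside a component; keep only outgoing ones.
        out = np.where(comp[:, None] != comp[None, :], d, np.inf)
        j = out.argmin(axis=1)                 # cheapest neighbour per vertex
        w = out[np.arange(n), j]
        # Cheapest outgoing edge per component.
        best = {}
        for i in range(n):
            c = comp[i]
            if np.isfinite(w[i]) and (c not in best or w[i] < best[c][2]):
                best[c] = (i, int(j[i]), w[i])
        merged = False
        for i, k, _ in best.values():
            ri, rk = find(i), find(k)
            if ri != rk:                       # skip pairs merged earlier this round
                parent[ri] = rk
                edges.append((i, k))
                merged = True
        if not merged:                         # disconnected input: stop
            break
    return edges
```

With distinct edge weights (the generic case for real-valued distances) this produces the unique MST; the torch backend applies the same per-component reduction with tensor ops instead of Python loops.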

### Known differences between backends

* **PCA initialisation.** NumPy uses `scipy.sparse.linalg.svds` (iterative,
  Lanczos, matches R's `irlba`). Torch uses a direct `torch.linalg.svd`
  for the truncated branch — identical subspace, small per-iteration
  numerical drift.
* **K-means.** Both backends call `sklearn.cluster.KMeans` on CPU (small K,
  negligible overhead). No GPU K-means yet.
* **Cholesky fallback.** When the `tmp_M` system drifts away from positive
  semi-definiteness, both backends fall back to LU and emit a
  `RuntimeWarning`. The Y-update Cholesky is strict in both backends: the
  `(λ/γ)L + Γ` system is PSD by construction, so any failure there is
  surfaced rather than masked.
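The fallback pattern described above can be sketched in SciPy terms (a generic try-Cholesky-else-LU illustration, not the package's actual internals; `solve_with_cholesky_fallback` is a hypothetical name):

```python
import warnings
import numpy as np
from scipy.linalg import cho_factor, cho_solve, lu_factor, lu_solve

def solve_with_cholesky_fallback(A, b):
    """Solve A x = b, preferring Cholesky; warn and fall back to LU
    when A has drifted away from positive definiteness."""
    try:
        return cho_solve(cho_factor(A), b)
    except np.linalg.LinAlgError:
        warnings.warn("system not PSD; falling back to LU", RuntimeWarning)
        return lu_solve(lu_factor(A), b)
```

Cholesky is roughly twice as fast as LU for symmetric systems and doubles as a cheap PSD check, which is why the failure mode is worth distinguishing: an expected-PSD system that fails factorisation signals numerical drift.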

## Numerical parity with the R package

The test suite runs the same inputs through the R `DDRTree` package and
compares the results for both backends. Eigenvector sign flips (inherent to
eigendecompositions) are handled in the tests. See
`tests/scripts/generate_gold_standard.R`, `tests/test_ddrtree.py` (NumPy)
and `tests/test_backend_torch_gold.py` (Torch).
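A sign-flip-tolerant comparison can be as simple as checking each embedding component up to a global sign (the general shape of such a check; the actual helpers live in the test files above):

```python
import numpy as np

def allclose_up_to_sign(A, B, atol=1e-8):
    """True if each row of A matches the corresponding row of B up to a
    sign flip -- the ambiguity inherent in eigen/SVD-based embeddings."""
    return all(
        np.allclose(a, b, atol=atol) or np.allclose(a, -b, atol=atol)
        for a, b in zip(A, B)
    )
```

Rows are compared independently because, in the D x N convention used here, each latent dimension of `Z` can flip sign on its own across implementations.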

## Reference

Qi Mao, Li Wang, Steve Goodison, Yijun Sun.
*Dimensionality Reduction via Graph Structure Learning*.
KDD'15. https://dl.acm.org/doi/10.1145/2783258.2783309
