Metadata-Version: 2.4
Name: simflux
Version: 0.1.1
Summary: MINFLUX-like simulator
Author: Jack Peyton
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENCE
Requires-Dist: numpy>=1.23
Requires-Dist: pandas>=1.5
Requires-Dist: plotly>=5.0
Requires-Dist: h5py>=3.7
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Dynamic: license-file

# SimFlux — MINFLUX-like simulator (NPC and DNA-Origami)

Real MINFLUX datasets, like single-molecule localisation microscopy (SMLM) datasets in general, rarely come with a perfect, known ground truth. This package generates synthetic SMLM datasets representative of NPC-like structures or of DNA-Origami, with user-configurable settings that replicate the effects of biolabelling, clutter, and the probability of oligomer chains. SimFlux aims to be explicit about what’s generated and how, so results are reproducible and assumptions are visible. The code focuses on two simple structures that cover a lot of benchmarking needs:

- **poly** — regular polygons acting as monomers, assembled into **linear oligomers** with a spacing rule that enforces a single shared interface between neighbors.
- **grid** — DNA‑origami–style **point lattices** (`rows × cols`) with a fixed node separation, randomly rotated and placed in a field of view.

Both modes share the same measurement model (label assignment, measurement noise, optional clutter) and the same I/O layout (in‑memory return or HDF5 on disk).

---

## The Simulating Process

This section spells out the generation steps and how the parts fit together.

### 1) Centroids
A **centroid** anchors each object: either a monomer in a chain (poly) or a whole grid (grid). The number of groups to place is drawn from a Poisson distribution with mean `--centroid-mean`. Each group then attempts placement inside `--xrange/--yrange` under a **hard‑core** rule that prevents different groups from getting too close (the distance threshold depends on the geometry: `2 × radius` for poly, `2 × R` for grid). Each group gets multiple attempts, up to an internal budget; if no acceptable spot is found, the group is skipped.
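
The placement loop described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name `place_groups`, the `max_attempts` value, and the use of `np.random.default_rng` are assumptions made for the example.

```python
import numpy as np

def place_groups(xrange, yrange, mean_groups, min_dist, max_attempts=200, seed=None):
    """Hypothetical helper: Poisson number of group centres with hard-core spacing."""
    rng = np.random.default_rng(seed)
    n_groups = rng.poisson(mean_groups)          # how many groups to try to place
    placed = []
    for _ in range(n_groups):
        for _ in range(max_attempts):            # per-group attempt budget (assumed value)
            candidate = np.array([rng.uniform(*xrange), rng.uniform(*yrange)])
            # accept only if far enough from every centre placed so far
            if all(np.linalg.norm(candidate - c) >= min_dist for c in placed):
                placed.append(candidate)
                break
        # if the budget runs out without a break, the group is skipped
    return np.array(placed).reshape(-1, 2)

# poly-style threshold: 2 x radius
centres = place_groups((0, 80), (0, 80), mean_groups=6, min_dist=2 * 3.5, seed=42)
```

Rejection sampling like this slows down as the field fills up, which is why a finite attempt budget (and skipping) is a reasonable design choice.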

### 2) Geometry per centroid
- **poly**: for each chain, a polygon type is sampled from `--mix`, and an **oligomer length** (number of monomers) is sampled from `--oligomers`. Adjacent monomers are spaced so that one full edge is shared; optionally, the two shared vertices are snapped to coincide exactly (`--enforce-shared-edge`).
- **grid**: a centered `rows × cols` lattice with node separation `--sep` is rotated by a random angle, then translated to the grid centroid. The effective grid radius `R` is computed from the farthest node offset.
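
The two geometries can be sketched with a few lines of NumPy. The helper names below are hypothetical, and the sketch assumes `--radius` is the circumradius of the polygon, in which case two regular polygons share one full edge when their centres are `2 × radius × cos(π / sides)` apart.

```python
import numpy as np

def polygon_vertices(center, radius, sides, theta=0.0):
    """Vertices of a regular polygon (radius assumed to be the circumradius)."""
    angles = theta + 2 * np.pi * np.arange(sides) / sides
    return np.asarray(center, float) + radius * np.column_stack([np.cos(angles), np.sin(angles)])

def shared_edge_spacing(radius, sides):
    """Centre-to-centre distance at which neighbours share one edge (twice the apothem)."""
    return 2 * radius * np.cos(np.pi / sides)

def grid_nodes(center, rows, cols, sep, theta):
    """Centered rows x cols lattice, rotated by theta, plus its effective radius R."""
    row_off = (np.arange(rows) - (rows - 1) / 2) * sep
    col_off = (np.arange(cols) - (cols - 1) / 2) * sep
    xx, yy = np.meshgrid(col_off, row_off)
    offsets = np.column_stack([xx.ravel(), yy.ravel()])
    R = np.linalg.norm(offsets, axis=1).max()    # farthest node offset
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return np.asarray(center, float) + offsets @ rot.T, R

# Two octagons along the x axis, oriented so that one edge faces the neighbour
d = shared_edge_spacing(3.5, 8)
first = polygon_vertices((0.0, 0.0), 3.5, 8, theta=np.pi / 8)
second = polygon_vertices((d, 0.0), 3.5, 8, theta=np.pi / 8)

nodes, R = grid_nodes((60.0, 60.0), rows=3, cols=3, sep=10, theta=0.3)
```

With this orientation, vertices 0 and 7 of `first` coincide with vertices 3 and 4 of `second`, which is the shared edge. For the 3×3 grid, `R` is the distance from the centre to a corner node.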

### 3) Emitters
Every retained polygon vertex (poly) or lattice node (grid) becomes an **emitter**. Each emitter is independently tagged **labelled** with probability `p` (unlabelled otherwise). Two uncertainties can be introduced:
- `gt_uncertainty` (ground‑truth jitter) perturbs the emitter anchors before any measurements are drawn.
- `ms_uncertainty` (measurement noise) is used when drawing individual measurement locations.

### 4) Measurements
Each **labelled** emitter draws a Poisson number of **measurement attempts** (`measured`). Every attempt generates a noisy coordinate by adding Gaussian scatter with standard deviation `ms_uncertainty` to the emitter location. A Bernoulli filter with probability `q` keeps or discards each noisy point. Unlabelled emitters do not generate measurements.
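
Steps 3 and 4 together amount to a short sampling pipeline. The sketch below is an illustration under assumptions: the function name `measure`, its return layout, and the use of `np.random.default_rng` are not taken from the package.

```python
import numpy as np

def measure(emitters, p, q, measured, gt_uncertainty, ms_uncertainty, seed=None):
    """Hypothetical sketch of labelling, jitter, and noisy measurements."""
    rng = np.random.default_rng(seed)
    emitters = np.asarray(emitters, dtype=float)
    # ground-truth jitter perturbs the anchors before any measurement is drawn
    anchors = emitters + rng.normal(0.0, gt_uncertainty, size=emitters.shape)
    labelled = rng.random(len(anchors)) < p               # Bernoulli(p) labelling
    points, owner = [], []
    for idx in np.flatnonzero(labelled):                  # unlabelled emitters are skipped
        n_attempts = rng.poisson(measured)                # Poisson number of attempts
        noisy = anchors[idx] + rng.normal(0.0, ms_uncertainty, size=(n_attempts, 2))
        kept = noisy[rng.random(n_attempts) < q]          # Bernoulli(q) keep/discard
        points.append(kept)
        owner.extend([idx] * len(kept))
    positions = np.vstack(points) if points else np.empty((0, 2))
    return labelled, positions, np.array(owner, dtype=np.int32)

octagon = 3.5 * np.column_stack([np.cos(np.arange(8) * np.pi / 4),
                                 np.sin(np.arange(8) * np.pi / 4)])
labelled, positions, emitter_id = measure(
    octagon, p=0.8, q=0.9, measured=5,
    gt_uncertainty=0.05, ms_uncertainty=0.5, seed=42,
)
```

Under this model, the expected number of observed points per labelled emitter is `measured × q`.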

### 5) Clutter (optional)
If `clutter_fraction` is set, a number of **clutter clusters** equal to `floor(clutter_fraction × N_labelled)` is created. Clutter cluster centers are sampled inside the bounding box of the emitters but are kept away from real centroids by an exclusion radius. Each clutter cluster then draws measurements using the same Poisson + Gaussian model (with the same `measured` and `ms_uncertainty`), but these points are tagged `type="clutter"` and have `emitter_id = -1`.
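
As a rough sketch of the clutter-centre step, assuming a simple rejection loop: the helper name, the `exclusion_radius` argument, and the attempt cap are hypothetical, since the package's internal values are not documented here.

```python
import numpy as np

def clutter_centres(emitters, centroids, n_labelled, clutter_fraction,
                    exclusion_radius, max_attempts=1000, seed=None):
    """Hypothetical sketch: clutter centres inside the emitter bounding box."""
    rng = np.random.default_rng(seed)
    n_clusters = int(np.floor(clutter_fraction * n_labelled))
    lo, hi = emitters.min(axis=0), emitters.max(axis=0)   # bounding box of the emitters
    centres, attempts = [], 0
    while len(centres) < n_clusters and attempts < max_attempts:
        attempts += 1
        candidate = rng.uniform(lo, hi)
        # keep the candidate only if it is away from every real centroid
        if np.all(np.linalg.norm(centroids - candidate, axis=1) >= exclusion_radius):
            centres.append(candidate)
    return np.array(centres).reshape(-1, 2)

demo_rng = np.random.default_rng(0)
demo_emitters = demo_rng.uniform(0, 80, size=(50, 2))
demo_centroids = np.array([[20.0, 20.0], [60.0, 60.0]])
centres = clutter_centres(demo_emitters, demo_centroids, n_labelled=40,
                          clutter_fraction=0.1, exclusion_radius=7.0, seed=1)
```

Each centre would then be fed through the same Poisson + Gaussian draw as a labelled emitter, with the resulting points tagged as clutter.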

### 6) Membrane lifting (optional, on write)
If a callable `membrane_function(x, y)` is supplied and a filename is given, 2D coordinates are lifted to 3D by appending `z = membrane_function(x, y)` **when writing the HDF5 file**. In‑memory results remain 2D unless you explicitly apply the membrane first.
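
To apply a membrane to in-memory results yourself, something along these lines should work. The helper `lift_to_3d` and the example surface `gentle_dome` are hypothetical, and the sketch assumes the membrane callable accepts NumPy arrays for `x` and `y`.

```python
import numpy as np

def lift_to_3d(xy, membrane_function=None):
    """Append z = membrane_function(x, y) to an (N, 2) array; z = 0 without a membrane."""
    xy = np.asarray(xy, dtype=float)
    if membrane_function is None:
        z = np.zeros(len(xy))
    else:
        # broadcast in case the callable returns a scalar
        z = np.broadcast_to(membrane_function(xy[:, 0], xy[:, 1]), (len(xy),))
    return np.column_stack([xy, z])

def gentle_dome(x, y):
    """Hypothetical membrane: a shallow paraboloid centred at (40, 40)."""
    return 0.01 * ((x - 40.0) ** 2 + (y - 40.0) ** 2)

xy = np.array([[40.0, 40.0], [50.0, 40.0]])
lifted = lift_to_3d(xy, gentle_dome)   # shape (2, 3)
flat = lift_to_3d(xy)                  # z column is all zeros
```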

### 7) Audits & reproducibility
After placement, a quick **hard‑core audit** checks inter‑group centroid spacing and prints a short report if any pairs violate the rule. All top‑level routines accept a `seed` and set NumPy’s RNG accordingly inside the function for deterministic runs.
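
An audit of this kind can be written as a pairwise distance check that ignores pairs within the same group. The function below is a sketch with a hypothetical name and return format, not the package's own audit.

```python
import numpy as np

def hard_core_violations(positions, group_ids, min_dist):
    """Return index pairs (i, j) of centroids from different groups closer than min_dist."""
    positions = np.asarray(positions, dtype=float)
    group_ids = np.asarray(group_ids)
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                     # full pairwise distance matrix
    different_group = group_ids[:, None] != group_ids[None, :]
    # upper triangle only, so each pair is reported once
    i, j = np.nonzero(np.triu(different_group & (dist < min_dist), k=1))
    return list(zip(i.tolist(), j.tolist()))

positions = [[0.0, 0.0], [6.0, 0.0], [3.0, 0.0], [30.0, 30.0]]
group_ids = [0, 0, 1, 2]
violations = hard_core_violations(positions, group_ids, min_dist=7.0)
print(violations)  # [(0, 2), (1, 2)]
```

Centroids 0 and 1 are closer than the threshold but belong to the same chain, so they are not reported.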

---

## Features

- Oligomer‑aware centroid placement with **hard‑core spacing** between different groups
- Grid placement with automatic **effective radius R**
- **Shared‑edge enforcement** for polygon chains
- Configurable **measurement model**: `p`, `q`, `measured`, `gt_uncertainty`, `ms_uncertainty`
- Optional **clutter** proportional to the number of labelled emitters
- Optional **membrane lifting** to 3D on write
- Compact, descriptive **auto‑naming** of output files
- **Plotly** 2D quicklook
- **pytest** test suite

---

## Requirements

- Python ≥ 3.9 (tested with 3.13)
- NumPy ≥ 1.23
- pandas ≥ 1.5
- Plotly ≥ 5.0
- h5py ≥ 3.7

---

## Installation

Create and activate a fresh environment (example with `venv`):

```bash
python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate
```

Install in editable mode (recommended for development):

```bash
pip install -e .
```

Optionally, install dev tools:

```bash
pip install -e ".[dev]"
```

Quick verification:

```bash
python -c "import simflux; print(simflux.__file__)"
```

---

## Project layout

```
simflux/
├─ simflux/               # Python package
│  ├─ __init__.py
│  └─ simflux.py          # simulator module (CLI & API)
├─ tests/                 # pytest tests
│  └─ tests.py
├─ pyproject.toml
└─ README.md
```

Run tests from the repository root so the package import resolves cleanly.

---

## Command‑line usage

Global options (e.g. `--plot`) must appear **before** the subcommand (`poly` or `grid`) due to argparse rules.

```
python -m simflux.simflux \
  --xrange XMIN XMAX \
  --yrange YMIN YMAX \
  --centroid-mean MU \
  [--p P] [--q Q] [--measured LAMBDA] \
  [--gt-uncertainty SIG_GT] [--ms-uncertainty SIG_MS] \
  [--clutter-fraction FRACTION] \
  [--plot] [--seed SEED] \
  -o OUT_PREFIX \
  {poly|grid} ...
```
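
The ordering rule follows from how argparse hands the remaining arguments to the subparser. The toy parser below is a stand-in for illustration (it is not SimFlux's actual parser) and uses `parse_known_args` so the misplaced flag can be inspected instead of triggering an exit.

```python
import argparse

# Toy parser: one global flag, one subcommand with its own option
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--plot", action="store_true")
sub = parser.add_subparsers(dest="mode", required=True)
poly = sub.add_parser("poly")
poly.add_argument("--radius", type=float)

# Global flag before the subcommand: recognised by the top-level parser
ok, extras_ok = parser.parse_known_args(["--plot", "poly", "--radius", "3.5"])

# Global flag after the subcommand: the subparser does not know it
bad, extras_bad = parser.parse_known_args(["poly", "--radius", "3.5", "--plot"])

print(ok.plot, extras_ok)    # True []
print(bad.plot, extras_bad)  # False ['--plot']
```

With `parse_args` instead of `parse_known_args`, the second call would exit with an "unrecognized arguments" error.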

### poly‑specific options

```
poly
  --radius R
  --mix "polygon:S@W[,polygon:S@W...]"
  [--oligomers "K:W[,K:W...]"]   # or short names: mono/di/tri/...
  [--enforce-shared-edge | --no-enforce-shared-edge]
  [--store-edges]
```

### grid‑specific options

```
grid
  --grid ROWS COLS
  --sep SEPARATION
```

---

## Examples

### Octagons (poly mode)

Octagonal monomers, monomer‑only (no oligomers), modest noise, write HDF5 and show a 2D plot.

```bash
python -m simflux.simflux \
  --xrange 0 80 \
  --yrange 0 80 \
  --centroid-mean 6 \
  --p 0.8 \
  --q 0.9 \
  --measured 5 \
  --gt-uncertainty 0.05 \
  --ms-uncertainty 0.5 \
  --clutter-fraction 0.1 \
  --plot \
  --seed 42 \
  -o out/sim \
  poly \
  --radius 3.5 \
  --mix "polygon:8@1.0" \
  --oligomers "mono:1.0" \
  --enforce-shared-edge
```

This produces a file named like:

```
out/sim_poly-8_olig-mono1_p-0.8_q-0.9_cl-0.1_mu-6_R-3.5_gt-0.05_ms-0.5_SE_seed-42.h5
```

The 2D plot shows centroids, emitter anchors, observed points, and clutter (if any).

---

### 3×3 DNA‑origami grid (grid mode)

A 3×3 lattice with 10‑unit separation, moderate noise.

```bash
python -m simflux.simflux \
  --xrange 0 120 \
  --yrange 0 120 \
  --centroid-mean 4 \
  --p 0.75 \
  --q 0.8 \
  --measured 6 \
  --gt-uncertainty 0.05 \
  --ms-uncertainty 0.4 \
  --clutter-fraction 0.05 \
  --plot \
  --seed 99 \
  -o out/grid \
  grid \
  --grid 3 3 \
  --sep 10
```

The `R` term in the filename is the automatically computed effective radius of the grid.

---

## Python API quickstart

```python
from simflux.simflux import (
    simulate_poly, simulate_grid, plot2d,
    parse_mix, parse_oligomers
)

# Poly: octagons
geom_mix = parse_mix("polygon:8@1.0")
oligs    = parse_oligomers("mono:1.0")

poly_data = simulate_poly(
    filename=None,
    xrange=(0, 80), yrange=(0, 80),
    centroid_mean_groups=6,
    radius=3.5,
    geom_mix=geom_mix,
    oligomer_pmf=oligs,
    p=0.8, q=0.9, measured=5,
    gt_uncertainty=0.05, ms_uncertainty=0.5,
    clutter_fraction=0.1,
    membrane_function=None,
    enforce_shared_edge=True,
    store_edges=False,
    seed=42,
)
plot2d(data=poly_data)

# Grid: 3×3
grid_data = simulate_grid(
    filename=None,
    xrange=(0, 120), yrange=(0, 120),
    centroid_mean_groups=4,
    rows=3, cols=3, sep=10,
    p=0.75, q=0.8, measured=6,
    gt_uncertainty=0.05, ms_uncertainty=0.4,
    clutter_fraction=0.05,
    membrane_function=None,
    seed=99,
)
plot2d(data=grid_data)
```

---

## Output data model (HDF5 schema)

When `filename` is provided, the following groups/datasets are written. In‑memory results mirror the same structure (positions are 2D unless a membrane is applied prior to write).

```
/centroid
  position    (C, 3) float   # x,y[,z]; z=0 if no membrane
  id          (C,)   int32
  group_id    (C,)   int32    # chain/grid id
  [poly only]
    group_index (C,) int32    # index within chain
    group_size  (C,) int32
    sides       (C,) int32    # polygon sides
    theta       (C,) float    # monomer orientation
  [grid only]
    theta      (C,) float     # grid rotation angle
    rows, cols (C,) int32
    sep, R     (C,) float

/emitter
  position    (E, 3) float
  id          (E,)   int32
  centroid_id (E,)   int32
  type        (E,)   string   # "labelled" or "unlabelled"

/observed
  position    (M, 3) float
  emitter_id  (M,)   int32
  centroid_id (M,)   int32

/clutter
  position    (K, 3) float
  emitter_id  (K,)   int32    # always -1
  type        (K,)   string   # "clutter"

/edges  [poly, optional]
  (P, 2) int32   # emitter id pairs per monomer if --store-edges
```
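
Files following this layout can be read with plain `h5py`. The sketch below first writes a tiny stand-in file with a subset of the datasets so that it is self-contained. The toy values and the byte-string encoding of `type` are assumptions; the encoding SimFlux actually uses may differ.

```python
import os
import tempfile

import h5py
import numpy as np

# Write a toy file that mimics part of the schema (stand-in for a SimFlux output)
path = os.path.join(tempfile.mkdtemp(), "toy.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("emitter/position", data=np.zeros((2, 3)))
    f.create_dataset("emitter/id", data=np.array([0, 1], dtype=np.int32))
    f.create_dataset("emitter/type", data=np.array([b"labelled", b"unlabelled"]))
    f.create_dataset("observed/position", data=np.ones((3, 3)))
    f.create_dataset("observed/emitter_id", data=np.array([0, 0, 0], dtype=np.int32))

# Read it back and count observations per emitter
with h5py.File(path, "r") as f:
    emitter_id = f["emitter/id"][:]
    emitter_type = f["emitter/type"][:].astype(str)   # bytes -> str (assumed encoding)
    observed_xyz = f["observed/position"][:]
    observed_owner = f["observed/emitter_id"][:]

counts = {int(e): int((observed_owner == e).sum()) for e in emitter_id}
print(counts)  # {0: 3, 1: 0}
```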

---

## Reproducibility and seeding

- `simulate_poly`, `simulate_grid`, and the centroid generators accept `seed` and reseed NumPy’s RNG inside the function. Two calls with the same arguments and the same `seed` produce identical results.
- For tests that need deterministic outputs, pass a `seed` to the call, so that results do not depend on whatever RNG state earlier code left behind and stay the same across separate runs.
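
The seeding pattern described above can be illustrated with a toy function. `toy_simulation` is hypothetical and only mimics the described behaviour of reseeding NumPy's global RNG inside the call.

```python
import numpy as np

def toy_simulation(n, seed=None):
    """Hypothetical stand-in: reseed the global NumPy RNG, then draw points."""
    if seed is not None:
        np.random.seed(seed)
    return np.random.normal(size=(n, 2))

a = toy_simulation(5, seed=42)
b = toy_simulation(5, seed=42)
c = toy_simulation(5, seed=43)

print(np.array_equal(a, b))  # True
print(np.array_equal(a, c))  # False
```

One consequence of this pattern is that the caller's global NumPy RNG state is reset by the call, so later `np.random` draws in the calling code are affected by the `seed` as well.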

---

## Testing

Run everything:

```bash
pytest -q
```

Common issues:
- String arrays (e.g., `"labelled"`) should be compared with exact equality, not numeric closeness.
- When running tests from an IDE, point the working directory to the repository root or ensure `pip install -e .` has been done in the active environment.
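
As an illustration of the first point, a test might separate the two kinds of comparison like this. The arrays are made up for the example and do not come from the package's test suite.

```python
import numpy as np

def test_label_arrays_match():
    expected = np.array(["labelled", "unlabelled", "labelled"])
    actual = np.array(["labelled", "unlabelled", "labelled"])
    # string arrays: exact, element-wise equality
    assert np.array_equal(actual, expected)

def test_positions_close():
    expected = np.array([[0.0, 1.0], [2.0, 3.0]])
    actual = expected + 1e-12
    # float arrays: numeric closeness with a tolerance
    np.testing.assert_allclose(actual, expected, rtol=0, atol=1e-9)
```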

---

## Troubleshooting

- **Global flags after subcommand**: argparse won’t see them. Put flags like `--plot` **before** `poly`/`grid`.
- **Too few centroids placed**: widen the field of view, reduce the hard‑core scale (smaller `--radius` for poly or `--sep` for grid), or reduce `--centroid-mean`.
- **Editable install fails**: use `pip install -e .` with the trailing dot. Running `pip install -e` without a path is an error.

---

## License

MIT License. The full text is in the plain‑text `LICENCE` file at the repository root, and `pyproject.toml` declares the same license.

---

## A note on scope

This is a simulator for **shape‑aware** point patterns under a simple measurement model. It is not a physical fluorophore simulator, a photon budget model, or a microscope aberration engine. The goal is fast, controllable synthetic data for algorithm development and validation.

Features deferred for future consideration include time-series-like spawning of emitters to replicate STORM data, and a manual input scheme that accepts coordinates to build the structures. Presets for structures that are common across SMLM are in the works, such as the three-dimensional, double-layer, 8-fold symmetric arrangement of Nup96-SNAP.
