Metadata-Version: 2.4
Name: digest2
Version: 2.7.0
Summary: NEO orbit classification from short-arc astrometric tracklets
Author: Minor Planet Center
License: Public Domain
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: Public Domain
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: C
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Astronomy
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: lxml
Provides-Extra: filters
Requires-Dist: pandas; extra == "filters"
Provides-Extra: truth
Requires-Dist: matplotlib; extra == "truth"
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: numpy; extra == "test"
Requires-Dist: lxml; extra == "test"
Requires-Dist: matplotlib; extra == "test"
Provides-Extra: all
Requires-Dist: pandas; extra == "all"
Requires-Dist: matplotlib; extra == "all"
Requires-Dist: pytest; extra == "all"
Requires-Dist: numpy; extra == "all"
Requires-Dist: lxml; extra == "all"

# digest2

NEO orbit classification from short-arc astrometric tracklets.
Available as both a C command-line tool and a pip-installable Python package.

digest2 is a fast short-arc orbit classifier for minor planets, primarily used to identify Near-Earth Object (NEO) candidates from astrometric tracklets (groups of 2+ observations of the same object over a short time). It has been in operational use at the Minor Planet Center for over 15 years and is a key component of the NEO discovery workflow.

Given a set of observations, digest2 outputs a score (0-100) for each of 14 orbit classes, representing the pseudo-probability that the object belongs to that class. Objects scoring D2 >= 65 for the NEO class are posted to the NEO Confirmation Page (NEOCP) for follow-up.

**References:**
- Keys et al. 2019, "The digest2 NEO Classification Code" (PASP 131, 064501) -- [arXiv:1904.09188](https://arxiv.org/abs/1904.09188)
- Shober, Cloete, Veres 2023, "Improvement of digest2 NEO Classification Code" -- [arXiv:2309.16407](https://arxiv.org/abs/2309.16407)

**Authors:** Sonia Keys (original), with contributions from Carl Hergenrother, Robert McNaught, David Asher. ADES support added by Richard Cloete and Peter Veres.

**License:** Public domain.

---

## Python Package (recommended)

```bash
pip install digest2
```

```python
from digest2 import Digest2

with Digest2() as d2:
    results = d2.classify_file("observations.obs")
    for r in results:
        print(r.designation, r.noid.NEO, r.noid.MB1)
```

The `digest2` Python package wraps a C scoring engine. 
Pre-built wheels are available on PyPI for common platforms 
(Linux, macOS, Windows), so most users get a ready-to-use binary with `pip install digest2`. 
If no wheel matches your platform or Python version, 
pip will automatically compile the C code from the source distribution - 
this requires a local C compiler (e.g., gcc or clang) but no external C libraries such as libxml2.

### Python Tutorials

There are demonstration notebooks in 
[digest2/notebooks/](https://github.com/Smithsonian/mpc-public/tree/main/digest2/notebooks) that show how to use the Python API to classify tracklets, 
parse input files, and apply NEOCP filters. 
These are a great starting point for new users.

### Python API

- **`Digest2(model_path=None, config_path=None, obscodes_path=None, repeatable=True, no_threshold=False)`** -- Stateful classifier; auto-discovers bundled model data. Set `no_threshold=True` to disable per-observation RMS ceiling clamping.
- **`d2.classify_tracklet(observations)`** -- Classify a list of `Observation` objects. Returns `ClassificationResult`.
- **`d2.classify_file(filepath)`** -- Classify all tracklets in an MPC 80-col or ADES XML file. Returns `List[ClassificationResult]`.
- **`classify(input, ...)`** -- One-shot convenience function.
- **`parse_mpc80(line)`** / **`parse_mpc80_file(path)`** -- Parse MPC 80-column observations.
- **`parse_ades_xml(path)`** -- Parse ADES XML observations.
- **`digest2.filters`** -- NEOCP threshold filtering tools (requires `pip install digest2[filters]`).

All classification methods accept `collect_orbits=True` to return individual trial orbit elements alongside scores (see Trial Orbit Collection below).

### Development Setup

```bash
cd digest2
pip install -e '.[test]'

# Download observatory codes (one-time)
curl -o digest2/digest2.obscodes https://minorplanetcenter.net/iau/lists/ObsCodes.html

# Run tests
pytest tests/ -v
```

### Releasing to PyPI

Three GitHub Actions workflows handle CI and publishing for the Python package:

1. **Tests** (`digest2-python-test.yml`) -- Runs `pytest` on every pull request and on pushes to `main` that modify `digest2/` files.

2. **TestPyPI** (`digest2-python-testpypi.yml`) -- On every push to a PR branch that modifies `digest2/`, wheels and an sdist are built for Linux, macOS, and Windows, version-suffixed with `.devN`, and published to [TestPyPI](https://test.pypi.org/project/digest2/). Reviewers can install a pre-release build with:
   ```bash
   pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ digest2
   ```

3. **PyPI release** (`digest2-python-release.yml`) -- Triggered when a GitHub Release is published with a tag matching `digest2-v*`. Builds production wheels and sdist, then publishes to [PyPI](https://pypi.org/project/digest2/).

**To release a new version:**

1. Update the version in `pyproject.toml` (e.g., `version = "2.2.0"`).
2. Open a PR with the version bump (and any other changes) and merge it into `main`.
3. Go to the repository's [Releases page](https://github.com/Smithsonian/mpc-public/releases) and create a new release:
   - **Tag:** `digest2-v2.2.0` (must start with `digest2-v`)
   - **Target:** `main`
   - **Title/notes:** Describe what changed in this release.
4. Publishing the release automatically triggers the workflow, which builds wheels for all platforms and uploads them to PyPI.

### Python Package Structure

```
src/digest2/
├── __init__.py           # Package init, exports Digest2, classify, Scores, ClassificationResult, TrialOrbit
├── _extension.c          # CPython C extension (score + score_orbits bindings to d2lib)
├── core.py               # High-level API: Digest2 class, classify() function, collect_orbits support
├── result.py             # ClassificationResult, Scores, TrialOrbit dataclasses
├── observation.py        # Observation dataclass, MPC80/ADES parsers
├── model.py              # Model/obscodes/config path resolution
├── filters.py            # NEOCP filter tools (from NEOCP_filters/)
└── data/
    └── MPC.config        # Bundled per-site observatory errors
```

### C Library API (d2lib)

The files `d2lib.h` / `d2lib.c` provide a clean, non-threaded library interface used by the Python extension:

- `d2_init(model_csv, obscodes)` -- Load model and observatory data
- `d2_score_observations(obs, n, classes, n_classes, is_ades)` -- Score a tracklet
- `d2_score_observations_ext(obs, n, classes, n_classes, is_ades)` -- Score a tracklet and collect trial orbit elements (returns `d2_result_ext`)
- `d2_free_result_ext(result)` -- Free memory from extended result
- `d2_cleanup()` -- Release resources
- `d2_configure()` -- Set observatory errors, repeatable mode, noThreshold flag
- No libxml2 or pthreads dependency (XML parsing done in Python)

### Key Design Decisions

- **Existing CLI is unchanged.** `make` in `digest2/digest2/` still builds the same binary.
- **Preprocessor guards** (`D2_NO_LIBXML`, `D2_NO_REGEX`) allow building without libxml2/regex.
- **Global state is managed** by `d2_init()`/`d2_cleanup()` -- single-threaded library mode.
- **Band correction** matches C code exactly (common.c `updateMagnitude`).
- **MJD calculation** uses C-style truncation-toward-zero integer division.

### Trial Orbit Collection

The `collect_orbits=True` parameter exposes the individual trial orbits generated by the statistical ranging engine. Normally these orbits are immediately discarded after binning; this feature captures them for analysis.

> **Memory warning:** `collect_orbits=True` can use substantial RAM because it stores every collected trial orbit.
> In typical runs this can be hundreds of MB per tracklet, and for difficult/large solution spaces can approach ~1 GB.
> We recommend to keep `collect_orbits=False` (default) for production scoring unless you explicitly need orbit-level diagnostics.

**Python usage:**

```python
from digest2 import Digest2

with Digest2() as d2:
    results = d2.classify_file("observations.obs", collect_orbits=True)
    r = results[0]
    print(f"Orbits collected: {len(r.trial_orbits)}")
    elems = r.orbit_elements  # dict of numpy arrays
    print(f"q range: {elems['q'].min():.2f} - {elems['q'].max():.2f}")
```

**Data returned per trial orbit (`TrialOrbit` dataclass):**

| Field | Type | Description |
|-------|------|-------------|
| `q` | float | Perihelion distance (AU) |
| `e` | float | Eccentricity |
| `i` | float | Inclination (degrees) |
| `H` | float | Absolute magnitude |
| `d` | float | Geocentric distance used to generate the orbit (AU) |
| `an` | float | Angle parameter in orbit solution (radians) |
| `iq` | int | q bin index (0..28) |
| `ie` | int | e bin index (0..7) |
| `ii` | int | i bin index (0..10) |
| `ih` | int | H bin index (0..17) |
| `new_tag` | bool | True if this orbit tagged a previously unvisited bin |

**`ClassificationResult.orbit_elements`** property returns a dict of numpy arrays keyed by field name, for convenient vectorized analysis.

**Design notes:**
- **Opt-in via `collect_orbits=False` default** -- zero performance/memory impact when not used (NULL pointer check in C)
- **All valid orbits collected** (those passing stability tests and within model bounds), not just those tagging new bins
- **Scores are deterministic** -- `score()` and `score_orbits()` produce identical scores for the same input
- **Score reconstruction** -- digest2 scores can be exactly reconstructed from trial orbits + population model data, since scores are based on which (q,e,i,H) bins are reachable, weighted by model populations. See the trial orbits notebook for a worked example.

**C implementation:**
- `d2_trial_orbit` struct (11 fields, ~72 bytes/orbit) defined in `d2lib.h`
- `d2_orbit_buffer` (dynamic array with geometric growth, initial capacity 1024) attached to `tracklet.orbit_buf`
- Orbit capture happens in `d2math.c:tagAngle()` after `qeiToBin()` succeeds -- guarded by `if (tk->orbit_buf)` so the CLI path is unaffected
- `d2_score_observations_ext()` in `d2lib.c` allocates the buffer, calls `score(tk)`, and transfers ownership to the result
- `_extension.c:py_score_orbits()` converts to Python tuples and is registered as `score_orbits` in the module

**Files involved:**

| File | Role |
|------|------|
| `digest2/d2lib.h` | `d2_trial_orbit`, `d2_orbit_buffer`, `d2_result_ext` types |
| `digest2/digest2.h` | Forward-declares `d2_orbit_buffer`; `orbit_buf` pointer in `tracklet` struct |
| `digest2/d2math.c` | ~20 lines in `tagAngle()` to capture orbits when buffer is active |
| `digest2/d2lib.c` | `d2_score_observations_ext()` and `d2_free_result_ext()` |
| `src/digest2/_extension.c` | `py_score_orbits()` C-to-Python bridge |
| `src/digest2/result.py` | `TrialOrbit` dataclass; `trial_orbits` field and `orbit_elements` property on `ClassificationResult` |
| `src/digest2/core.py` | `collect_orbits` parameter threaded through all API methods |
| `src/digest2/__init__.py` | Exports `TrialOrbit` |

**Tutorial notebook:** `notebooks/mpc_tutorial_digest2_trial_orbits.ipynb` demonstrates orbit collection, element distributions, correlations, bin occupancy analysis, and exact score reconstruction from trial orbits.

---

## C Command-Line Tool

### Building from Source

Requirements:
- C99-capable C compiler (e.g., `gcc` or `clang`)
- `make`
- `libxml2` development headers (`libxml2-dev` or `libxml2-devel`)
- Optional: internet connection to download latest observatory codes from the MPC website.

```bash
cd digest2/digest2
make                          # Produces the `digest2` executable
```

Ensure the runtime data files are alongside the executable:
- `digest2.model` (or `digest2.model.csv`), and `MPC.config` (found under `digest2/population/`; updated versions can be copied in when available).

Platform notes:
- The provided `Makefile` is minimal and may require small tweaks for non-Linux platforms (see `digest2/digest2/BUILDING.md` for details).
- **libxml2** is required for ADES XML parsing. Install the development package for your platform:
  - Debian/Ubuntu: `sudo apt-get install libxml2 libxml2-dev`
  - RHEL/CentOS/Fedora: `sudo dnf install libxml2 libxml2-devel`
  - macOS: `brew install libxml2` (ensure headers are on your include path via `pkg-config`)

### Usage

```bash
# MPC 80-column format
./digest2 sample.obs

# ADES XML format
./digest2 sample.xml

# With config file (per-observatory errors)
./digest2 -c MPC.config sample.obs

# From stdin
cat sample.obs | ./digest2 -

# Generate binary model from CSV (faster subsequent loads)
./digest2 -m digest2.model
```

Run `./digest2 sample.obs` to verify the build. Expected output (small differences may occur):
```
Desig.    RMS Int NEO N22 N18 Other Possibilities
K16S99K  0.73   0   2   1   0 (MC 2) (MB1 93) (MB2 3) (JFC <1)
```

Notes:
- If you have an internet connection, `digest2` will download the latest observatory parallax data from the Minor Planet Center. Otherwise, download `ObsCodes.html` and place it in the current directory as `digest2.obscodes`.
- To execute from a different directory, use the `-p` option: `./digest2 some.obs -p '/path/to/data'`.
- Run `./digest2 --help` for full command-line help.
- More detailed options can be found in [digest2/OPERATION.md](digest2/OPERATION.md).

### Input Formats

1. **MPC 80-column** (`.obs`): Fixed-width format with packed designation, date, RA/Dec, magnitude, observatory code
2. **ADES XML** (`.xml`): Rich format with per-observation uncertainties (rmsRA, rmsDec), roving/satellite observer support

### Config File Keywords

| Keyword | Description |
|---------|-------------|
| `headings`/`noheadings` | Toggle column headers in output |
| `rms` | Output great-circle RMS of tracklet |
| `rmsPrime` | Output RMS from ADES-provided uncertainties |
| `raw` | Output raw population scores |
| `noid` | Output no-ID scores (default) |
| `repeatable`/`random` | Deterministic vs stochastic Monte Carlo seeding |
| `obserr` | Default observational error (arcsec), default 1.0 |
| `obserrXXX` | Per-observatory error (e.g., `obserrF51=0.3`) |
| `poss` | Show "Other Possibilities" column |
| `noThreshold` | Accept ADES uncertainties without floor/ceiling clamping |

---

## Algorithm Overview

1. **Motion vector & photometry:** Computes a motion vector from the first and last observation, and a composite V magnitude from available photometry (default 21 if none).

2. **Statistical ranging:** Generates many trial orbits consistent with the observed motion, each with an absolute magnitude (H) consistent with the apparent brightness.

3. **Histogram binning:** Each trial orbit is located in a 4D binned model of the Solar System (dimensions: q, e, i, H). Bins are tagged as "reachable" for each orbit class.

4. **Iterative search:** Orbits are generated to explore the full solution space, terminating when diminishing returns in discovering new bins.

5. **Population scoring (raw):** The population of tagged bins for a given class, divided by the total population of all tagged bins, gives the raw score (x100).

6. **No-ID scoring (noid):** Same as raw, but uses the estimated *undiscovered* population (total minus known objects with sky uncertainty < 1 arcmin). This is the default score and represents the probability that an *unidentified* tracklet belongs to a class.

### Model Dimensions

The population histogram is binned in 4 dimensions:
- **q** (perihelion): 29 bins
- **e** (eccentricity): 8 bins
- **i** (inclination): 11 bins
- **H** (absolute magnitude): 18 bins

Total: 29 x 8 x 11 x 18 = 45,936 bins per class, times 15 classes.

### Orbit Classes (14 + "MPC Interest")

| Abbr | Description |
|------|-------------|
| Int  | MPC Interest (q<1.3 OR e>0.5 OR i>=40 OR Q>10) |
| NEO  | Near-Earth Object (q < 1.3 AU) |
| N22  | NEO with H <= 22 |
| N18  | NEO with H <= 18 |
| MC   | Mars Crosser |
| Hun  | Hungaria group |
| Pho  | Phocaea group |
| MB1  | Inner Main Belt |
| Pal  | Pallas group |
| Han  | Hansa group |
| MB2  | Middle Main Belt |
| MB3  | Outer Main Belt |
| Hil  | Hilda group |
| JTr  | Jupiter Trojan |
| JFC  | Jupiter Family Comet |

---

## NEOCP Filters

Post-processing tools that apply threshold-based filtering to digest2 output to separate likely NEOs from non-NEOs, using the methods documented in [Veres et al. (2025)](https://arxiv.org/abs/2505.11910).

**Requirements:** Python 3.6+, pandas (`pip install pandas` or `pip install digest2[filters]`)

**Tools:**
- `find_filter.py` -- Trains optimal per-class thresholds from labeled NEOCP data. Reads a digest2 CSV and produces `optimal_thresholds.json`.
- `neocp_filter.py` -- Applies thresholds to new digest2 output to flag likely non-NEOs for removal from NEOCP.

**Sample data:**
- `digest_data_19-24.csv` -- Training data (2019-2023 NEOCP)
- `digest_data_24.csv` -- Test data (2024 NEOCP)
- `optimal_thresholds.json` -- Pre-computed thresholds

### End-to-End Example

1. Create a config file `digest2.conf` in the directory where the `digest2` binary lives:

    ```
    repeatable
    norms
    raw
    noid
    Int
    NEO
    MC
    Hun
    Pho
    MB1
    Pal
    Han
    MB2
    MB3
    Hil
    JTr
    JFC
    ```

2. Run `digest2` on an ADES XML file and save the output:

    ```bash
    ./digest2 sample.xml -c digest2.conf > sample.digest2
    ```

3. Convert the output to the CSV format expected by the filter tools:

    ```python
    with open("sample.digest2", "r") as infile:
        lines = infile.readlines()[2:]  # skip header lines

    output_lines = []
    for line in lines:
        fields = line.strip().split()
        fields.append("0")  # Add class column (placeholder)
        output_lines.append(",".join(fields))

    with open("sample.digest2.csv", "w") as outfile:
        outfile.write("trksub,Int1,Int2,Neo1,Neo2,MC1,MC2,Hun1,Hun2,Pho1,Pho2,"
                      "MB1_1,MB1_2,Pal1,Pal2,Han1,Han2,MB2_1,MB2_2,MB3_1,MB3_2,"
                      "Hil1,Hil2,JTr1,JTr2,JFC1,JFC2,class\n")
        for line in output_lines:
            outfile.write(line + "\n")
    ```

4. Apply the filter:

    ```bash
    python NEOCP_filters/neocp_filter.py sample.digest2.csv NEOCP_filters/optimal_thresholds.json
    ```

    If the input contains non-NEO tracklets, `neocp_filter.py` will output those flagged as likely non-NEOs. NEO tracklets produce no output.

---

## Directory Structure

```
digest2/
├── digest2/                 # C source code and build system
│   ├── Makefile             # Build: `make` produces `digest2` executable
│   ├── digest2.c            # Main program: CLI, threading, I/O orchestration
│   ├── d2math.c             # Core algorithm: orbit generation, scoring, statistical ranging
│   ├── d2cli.c              # Command-line parsing and config file reading
│   ├── d2model.c            # Population model: orbit class definitions, bin partitions
│   ├── d2modelio.c          # Model I/O: read CSV, read/write binary model
│   ├── d2mpc.c              # MPC 80-column format parser, observatory code loading
│   ├── d2ades.c             # ADES XML format parser (uses libxml2)
│   ├── common.c             # Shared utilities (obscode parsing, magnitude conversion)
│   ├── digest2.h            # Main header: observation, tracklet, perClass structs
│   ├── d2model.h            # Model dimensions: QX=29, EX=8, IX=11, HX=18; 15 classes
│   ├── d2ades.h             # ADES optical observation struct
│   ├── common.h             # Shared function declarations
│   ├── ALGORITHM.md         # Algorithm description
│   ├── BUILDING.md          # Build instructions
│   ├── OPERATION.md         # Usage, config file format, orbit class list
│   ├── sample.obs           # Sample MPC 80-column input
│   ├── sample.xml           # Sample ADES XML input
│   └── MPC.config -> ../population/MPC.config
│
├── population/              # Solar System population model data
│   ├── digest2.model.csv    # 4D histogram of SS populations (q, e, i, H bins)
│   ├── MPC.config           # Config with per-observatory observational errors
│   ├── README.md
│   └── make_population/     # Tools to regenerate model from SSM + astorb.dat
│       ├── s3mbin.c         # Processes Pan-STARRS Synthetic Solar System Model
│       ├── muk.c            # Combines s3m.dat + astorb.dat -> digest2.model.csv
│       └── README.md
│
├── NEOCP_filters/           # Post-processing filters for NEO/non-NEO classification
│   ├── find_filter.py       # Derives optimal thresholds from labeled digest2 output
│   ├── neocp_filter.py      # Applies thresholds to flag likely non-NEOs
│   ├── optimal_thresholds.json  # Pre-computed thresholds
│   ├── digest_data_19-24.csv    # Training data (2019-2023 NEOCP)
│   ├── digest_data_24.csv       # Test data (2024 NEOCP)
│   ├── MPC.config
│   └── README.md
│
├── src/digest2/             # Python package source
├── tests/                   # Python package tests
├── pyproject.toml           # Python package build configuration
└── README.md                # This file
```

## Key Data Structures (C)

- **`observation`** -- Single astrometric observation: MJD, RA/Dec (radians), V magnitude, site index, rmsRA/rmsDec
- **`tracklet`** -- Group of observations of one object: designation, observation list, motion vector, scoring arrays, per-class results. Contains `orbit_buf` pointer (NULL unless collecting trial orbits)
- **`perClass`** -- Per-class scoring data: raw/noID scores, tagged bin arrays, population sums
- **`site`** -- Observatory parallax constants and observational error
- **`d2_trial_orbit`** -- Single trial orbit: q, e, i, H, geocentric distance, angle, bin indices, new_tag flag (defined in `d2lib.h`)
- **`d2_orbit_buffer`** -- Dynamic array of `d2_trial_orbit` with geometric growth (defined in `d2lib.h`)

## Dependencies

- **C CLI:** C99 compiler, `libxml2` (for ADES XML parsing), pthreads, math library
- **Python package:** Python >= 3.8, C99 compiler for building from source (no libxml2 or pthreads needed)
- **Filters (optional):** pandas
- **Runtime data:** `digest2.model.csv`, `MPC.config`, `digest2.obscodes` (auto-downloaded from MPC)
