Metadata-Version: 2.1
Name: evoseer-utils
Version: 0.2.0
Summary: Shared library for mutation management across modules
Author: benoît de Witte
Requires-Python: >=3.9,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: pydantic (>=2.0,<3.0)
Description-Content-Type: text/markdown

# Mutation Library

Shared library for mutation management across modules.

## Components

### `DbConnection` - Singleton DB connection
```python
from libs import DbConnection

DbConnection.set_db_path("mutations.db")
conn = DbConnection.get_connection()
```
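
As a rough illustration of the singleton pattern behind this API, the sketch below caches one connection at class level. It assumes a SQLite backend (suggested by the `.db` path and `INSERT OR REPLACE`, but not confirmed here); `SketchDbConnection` is a hypothetical stand-in, not the library's actual implementation.

```python
import sqlite3
from typing import Optional


class SketchDbConnection:
    """Hypothetical stand-in for DbConnection; not the library's code."""

    _db_path: Optional[str] = None
    _conn: Optional[sqlite3.Connection] = None

    @classmethod
    def set_db_path(cls, path: str) -> None:
        # Changing the path invalidates any connection opened earlier.
        if cls._conn is not None:
            cls._conn.close()
            cls._conn = None
        cls._db_path = path

    @classmethod
    def get_connection(cls) -> sqlite3.Connection:
        if cls._db_path is None:
            raise RuntimeError("call set_db_path() before get_connection()")
        # Open lazily on first use, then hand back the same object every time.
        if cls._conn is None:
            cls._conn = sqlite3.connect(cls._db_path)
        return cls._conn


# Assumption: an in-memory SQLite database is enough for the demonstration.
SketchDbConnection.set_db_path(":memory:")
first = SketchDbConnection.get_connection()
second = SketchDbConnection.get_connection()
```

Both calls return the same connection object, which is the property a shared connection relies on.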

### `Mutation` - Pydantic model with DB integration

#### States
- `"full"`: Has both `id` and coordinates (`chrom`, `pos`, `ref`, `alt`)
- `"miss_id"`: Has coordinates, missing `id`
- `"miss_attributes"`: Has `id`, missing coordinates
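
The three states can be pictured as a function of which fields are set. The sketch below is only an illustration: `mutation_state` is a hypothetical helper, and treating partial coordinates as missing and rejecting a mutation with neither `id` nor coordinates are assumptions, not documented behavior.

```python
from typing import Optional


def mutation_state(
    mutation_id: Optional[int] = None,
    chrom: Optional[int] = None,
    pos: Optional[int] = None,
    ref: Optional[str] = None,
    alt: Optional[str] = None,
) -> str:
    """Hypothetical helper: derive the state name from the fields that are set."""
    has_id = mutation_id is not None
    # Assumption: coordinates count only when all four parts are present.
    has_coords = all(value is not None for value in (chrom, pos, ref, alt))

    if has_id and has_coords:
        return "full"
    if has_coords:
        return "miss_id"
    if has_id:
        return "miss_attributes"
    # Assumption: a mutation with neither id nor coordinates is invalid.
    raise ValueError("a mutation needs an id, coordinates, or both")


state_from_coords = mutation_state(chrom=17, pos=7577548, ref="C", alt="T")
state_from_id = mutation_state(mutation_id=123)
```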

#### Creation patterns

```python
# With coordinates (lazy load id)
mut = Mutation(chrom=17, pos=7577548, ref="C", alt="T")

# With id (lazy load attributes)
mut = Mutation(id=123)

# With both
mut = Mutation(id=123, chrom=17, pos=7577548, ref="C", alt="T")
```

#### Methods

**Instance methods:**
```python
mut.fetch_id_from_db()           # Get id from coordinates
mut.fetch_attributes_from_db()   # Get coordinates from id
mut.ensure_in_db()               # Create if missing, return id
```

**Class methods (batch):**
```python
Mutation.fetch_ids_from_db_batch(mutations)
Mutation.fetch_attributes_from_db_batch(mutations)
Mutation.ensure_in_db_batch(mutations)
```

## Usage in modules with `OutputDescription` (fully automatic)

`OutputDescription` is a base class that provides automatic DB insertion for module outputs.

```python
from pydantic import Field
from typing import ClassVar, List
from libs import OutputDescription, DbConnection, Mutation

class MyModuleOutput(OutputDescription):
    table_name: ClassVar[str] = "tool_mymodule"
    db_fields: ClassVar[List[str]] = ["my_score", "my_prediction"]

    my_score: float = Field(..., description="Module score")
    my_prediction: str = Field(..., description="Prediction")

# Setup
DbConnection.set_db_path("mutations.db")

# Single insertion (automatic table creation + mutation insertion)
output = MyModuleOutput(
    mutation=Mutation(chrom=17, pos=7577548, ref="C", alt="T"),
    version="1.0.0",  # Required field (free text)
    my_score=0.85,
    my_prediction="pathogenic"
)
output.insert_to_db()  # Creates table if needed, ensures mutation exists, inserts

# Batch insertion
outputs = [...]
MyModuleOutput.insert_batch_to_db(outputs)
```

**What happens automatically:**
- Table creation with correct SQL types (inferred from Python types)
- Mutation insertion/lookup
- Index creation on mutation_id
- `version` column automatically added to the table and included in every insertion
- Rows written with `INSERT OR REPLACE`, so repeating an insertion is idempotent

**Note:** The `version` field is required in all `OutputDescription` subclasses. Its format is free text.
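
To make the automatic steps more concrete, here is a minimal sketch of type-driven table creation and idempotent insertion using the standard `sqlite3` module. It assumes a SQLite backend; the type map, the `create_table_sql` helper, the index name, and the primary key on `(mutation_id, version)` are illustrative guesses rather than the library's actual schema.

```python
import sqlite3

# Assumed mapping from Python annotations to SQLite column types.
SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT", bool: "INTEGER"}


def create_table_sql(table_name: str, fields: dict) -> str:
    """Hypothetical helper: build a CREATE TABLE statement from field types."""
    columns = ["mutation_id INTEGER NOT NULL", "version TEXT NOT NULL"]
    columns += [f"{name} {SQL_TYPES[py_type]}" for name, py_type in fields.items()]
    # Assumption: REPLACE needs a uniqueness rule to decide what to overwrite.
    columns.append("PRIMARY KEY (mutation_id, version)")
    return f"CREATE TABLE IF NOT EXISTS {table_name} ({', '.join(columns)})"


conn = sqlite3.connect(":memory:")
fields = {"my_score": float, "my_prediction": str}
conn.execute(create_table_sql("tool_mymodule", fields))
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_tool_mymodule_mutation_id "
    "ON tool_mymodule (mutation_id)"
)

insert_sql = (
    "INSERT OR REPLACE INTO tool_mymodule "
    "(mutation_id, version, my_score, my_prediction) VALUES (?, ?, ?, ?)"
)
row = (1, "1.0.0", 0.85, "pathogenic")
conn.execute(insert_sql, row)
conn.execute(insert_sql, row)  # Same key again: replaces instead of duplicating.

row_count = conn.execute("SELECT COUNT(*) FROM tool_mymodule").fetchone()[0]
```

Under these assumptions, inserting the same row twice leaves a single row in the table, which is what makes re-running a module safe.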

## Chromosome encoding

- Autosomes: `1-22`
- X: `23`
- Y: `24`

Helper functions:

```python
from libs.src.mutations import chrom_to_int, int_to_chrom

chrom_to_int("chr17")  # 17
chrom_to_int("chrX")   # 23
int_to_chrom(23)       # "chrX"
```
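
For readers who want to see the encoding spelled out, the sketch below reimplements it with hypothetical names (`sketch_chrom_to_int`, `sketch_int_to_chrom`). Accepting names without the `chr` prefix and raising `ValueError` on out-of-range input are assumptions; the library's helpers may behave differently.

```python
def sketch_chrom_to_int(chrom: str) -> int:
    """Hypothetical re-implementation of the encoding described above."""
    # Assumption: the "chr" prefix is optional and case-insensitive.
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    name = name.upper()
    if name == "X":
        return 23
    if name == "Y":
        return 24
    value = int(name)  # Raises ValueError for non-numeric names.
    if not 1 <= value <= 22:
        raise ValueError(f"unsupported chromosome: {chrom!r}")
    return value


def sketch_int_to_chrom(value: int) -> str:
    """Inverse mapping: 1-22 are autosomes, 23 is X, 24 is Y."""
    if value == 23:
        return "chrX"
    if value == 24:
        return "chrY"
    if 1 <= value <= 22:
        return f"chr{value}"
    raise ValueError(f"unsupported chromosome code: {value}")


encoded = sketch_chrom_to_int("chrX")
decoded = sketch_int_to_chrom(encoded)
```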

## Tests

```bash
# From project root
.venv/bin/python3 libs/tests/test_mutations_lib.py

# Or use the test runner
libs/tests/run_tests.sh
```

## Examples

```bash
python3 example_mutations_lib.py
python3 modules/boostdm/output_description_example.py
```

