Metadata-Version: 2.4
Name: emic
Version: 0.5.0
Summary: Epsilon Machine Inference & Characterization - A framework for computational mechanics
Project-URL: Homepage, https://github.com/johnazariah/emic
Project-URL: Documentation, https://johnazariah.github.io/emic/
Project-URL: Repository, https://github.com/johnazariah/emic
Project-URL: Issues, https://github.com/johnazariah/emic/issues
Author-email: John Azariah <john.azariah@student.uts.edu.au>
License-Expression: MIT
License-File: LICENSE
Keywords: causal-states,complexity,computational-mechanics,emergence,epsilon-machine,stochastic-processes
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: graphviz>=0.21
Requires-Dist: numpy>=2.4.1
Requires-Dist: pandas>=3.0.0
Requires-Dist: scipy>=1.17.0
Provides-Extra: all
Requires-Dist: graphviz>=0.20; extra == 'all'
Requires-Dist: mkdocs-material>=9.0; extra == 'all'
Requires-Dist: mkdocs>=1.5; extra == 'all'
Requires-Dist: mkdocstrings[python]>=0.24; extra == 'all'
Requires-Dist: pyarrow>=14.0; extra == 'all'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.0; extra == 'docs'
Requires-Dist: mkdocs>=1.5; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.24; extra == 'docs'
Provides-Extra: experiments
Requires-Dist: pyarrow>=14.0; extra == 'experiments'
Provides-Extra: viz
Requires-Dist: graphviz>=0.20; extra == 'viz'
Description-Content-Type: text/markdown

# emic

[![CI](https://github.com/johnazariah/emic/actions/workflows/ci.yml/badge.svg)](https://github.com/johnazariah/emic/actions/workflows/ci.yml)
[![Docs](https://github.com/johnazariah/emic/actions/workflows/docs.yml/badge.svg)](https://johnazariah.github.io/emic/)
[![Coverage](https://img.shields.io/badge/coverage-90%25-brightgreen)](https://github.com/johnazariah/emic)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)

**E**psilon **M**achine **I**nference & **C**haracterization

A Python framework for constructing and analyzing epsilon-machines based on computational mechanics.

📚 **[Documentation](https://johnazariah.github.io/emic/)** | 🚀 **[Getting Started](https://johnazariah.github.io/emic/getting-started/)**

## What is an Epsilon-Machine?

An **epsilon-machine** (ε-machine) is the minimal, optimal predictor of a stochastic process. Introduced by James Crutchfield and collaborators, ε-machines capture the intrinsic computational structure hidden in sequential data.

Key concepts:
- **Causal states**: Equivalence classes of histories that yield identical predictions
- **Statistical complexity** (Cμ): The entropy of the causal state distribution — a measure of structural complexity
- **Entropy rate** (hμ): The irreducible randomness per symbol, i.e. the uncertainty about the next symbol that remains even when the entire past is known
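
As a worked example of these two measures, the sketch below computes them by hand for a two-state Golden Mean machine. It uses only the standard library and does not call emic. The state labels `A` and `B`, the helper names, and the reading of `p` as the probability of emitting a 1 from the unconstrained state are assumptions made for illustration; they may not match the parameterization used by emic's own source.

```python
from math import log2


def entropy_bits(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(q * log2(q) for q in probs if q > 0)


def golden_mean_measures(p):
    """Return (C_mu, h_mu) in bits for a two-state Golden Mean machine.

    Assumed machine: state A emits 1 with probability p (moving to B) or
    0 with probability 1 - p (staying in A); state B must emit 0 and
    return to A.
    """
    # Stationary distribution over the causal states:
    # pi_A = 1 / (1 + p), pi_B = p / (1 + p)
    pi_a = 1.0 / (1.0 + p)
    pi_b = p / (1.0 + p)

    # Statistical complexity: entropy of the causal-state distribution
    c_mu = entropy_bits([pi_a, pi_b])

    # Entropy rate: state-averaged entropy of the next symbol.
    # State B is deterministic, so only state A contributes.
    h_mu = pi_a * entropy_bits([p, 1.0 - p]) + pi_b * entropy_bits([1.0])

    return c_mu, h_mu


c_mu, h_mu = golden_mean_measures(0.5)
print(f"C_mu = {c_mu:.4f} bits")  # C_mu = 0.9183 bits
print(f"h_mu = {h_mu:.4f} bits")  # h_mu = 0.6667 bits
```

Under these assumptions, a machine inferred from enough Golden Mean data with `p = 0.5` should report values close to these.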

ε-machines reveal the *emic* structure of a process — the computational organization that exists within the system itself, not imposed from outside.

## Features

- 🔮 **Inference**: Reconstruct ε-machines using multiple algorithms (CSSR, CSM, BSI, Spectral, NSD)
- 📊 **Analysis**: Compute complexity measures (Cμ, hμ, excess entropy, crypticity)
- 🎲 **Sources**: Built-in stochastic process generators (Golden Mean, Even Process, Biased Coin, Periodic)
- 🔗 **Pipeline**: Composable `>>` operator for source → inference → analysis workflows
- 🧪 **Experiments**: CLI and framework for reproducible algorithm benchmarking
- 📈 **Visualization**: State diagram rendering with Graphviz
- 📝 **Export**: LaTeX tables, TikZ diagrams, DOT, Mermaid, and JSON formats
- 🧩 **Extensible**: Protocol-based architecture for custom algorithms and sources

## Installation

```bash
pip install emic
```

Or install from source with [uv](https://github.com/astral-sh/uv):

```bash
git clone https://github.com/johnazariah/emic.git
cd emic
uv sync --dev
```

## Quick Start

```python
from emic.sources import GoldenMeanSource, TakeN
from emic.inference import CSSR, CSSRConfig
from emic.analysis import analyze

# Generate data from the Golden Mean process (no consecutive 1s)
source = GoldenMeanSource(p=0.5, _seed=42)
data = TakeN(10_000)(source)

# Infer the epsilon-machine using CSSR
config = CSSRConfig(max_history=5, significance=0.001)
result = CSSR(config).infer(data)

# Analyze the inferred machine
summary = analyze(result.machine)
print(f"States: {len(result.machine.states)}")
print(f"Statistical Complexity: Cμ = {summary.statistical_complexity:.4f}")
print(f"Entropy Rate: hμ = {summary.entropy_rate:.4f}")
```

### Pipeline Composition

Chain operations using the `>>` operator. Here `source >> TakeN(10_000)` replaces the `TakeN(10_000)(source)` call from the Quick Start:

```python
from emic.sources import GoldenMeanSource, TakeN
from emic.inference import CSSR, CSSRConfig
from emic.analysis import analyze

# Compose source and transforms
source = GoldenMeanSource(p=0.5, _seed=42)
data = source >> TakeN(10_000)

# Infer and analyze
config = CSSRConfig(max_history=5, significance=0.001)
result = CSSR(config).infer(data)
summary = analyze(result.machine)

print(summary)
```
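
One common way to support this kind of syntax in Python is the `__rshift__` special method. The sketch below is not emic's implementation: `Source`, `Alternating`, and `Take` are hypothetical toy classes that only show how `source >> transform` can be made equivalent to `transform(source)`.

```python
from itertools import islice


class Source:
    """Hypothetical base class: an iterable that supports `source >> transform`."""

    def __rshift__(self, transform):
        # `source >> transform` simply applies the transform to the source
        return transform(self)


class Alternating(Source):
    """Toy source that yields 0, 1, 0, 1, ... forever."""

    def __iter__(self):
        symbol = 0
        while True:
            yield symbol
            symbol = 1 - symbol


class Take:
    """Toy transform: keep only the first n symbols of a source."""

    def __init__(self, n):
        self.n = n

    def __call__(self, source):
        return list(islice(source, self.n))


piped = Alternating() >> Take(6)
called = Take(6)(Alternating())
print(piped)            # [0, 1, 0, 1, 0, 1]
print(piped == called)  # True
```

Because the operator only forwards to the transform's `__call__`, any callable that accepts a source can take part in a pipeline in this toy design.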

## Built-in Sources

| Process | Description | True States |
|---------|-------------|-------------|
| **Golden Mean** | No consecutive 1s allowed | 2 |
| **Even Process** | Even number of 1s between 0s | 2 |
| **Biased Coin** | i.i.d. Bernoulli process | 1 |
| **Periodic** | Deterministic repeating pattern | n (period length) |
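
To make the first two descriptions concrete, here is a minimal standard-library sketch of both processes. The function names, the `p` and `seed` parameters, and the choice of starting state are assumptions made for illustration, not emic's source API. Each generator tracks a single yes/no condition, which corresponds to the two states listed in the table.

```python
import random
from itertools import groupby


def golden_mean(n, p=0.5, seed=0):
    """Generate n symbols with no two consecutive 1s."""
    rng = random.Random(seed)
    out, prev = [], 0
    for _ in range(n):
        # After a 1 the next symbol is forced to 0; otherwise emit 1 with probability p
        symbol = 0 if prev == 1 else int(rng.random() < p)
        out.append(symbol)
        prev = symbol
    return out


def even_process(n, p=0.5, seed=0):
    """Generate n symbols in which 1s are emitted in pairs.

    Runs of 1s enclosed by 0s therefore have even length. The final run
    may be cut short if the sequence ends in the middle of a pair.
    """
    rng = random.Random(seed)
    out, pending_one = [], False
    for _ in range(n):
        if pending_one:
            # Second half of a pair: a 1 must follow to keep the run even
            symbol, pending_one = 1, False
        else:
            symbol = int(rng.random() < p)
            pending_one = symbol == 1
        out.append(symbol)
    return out


def one_run_lengths(symbols):
    """Lengths of the maximal runs of 1s, in order of appearance."""
    return [len(list(group)) for key, group in groupby(symbols) if key == 1]


gm = golden_mean(1_000)
ev = even_process(1_000)
print(sorted(set(one_run_lengths(gm))))       # only runs of length 1
print(sorted(set(one_run_lengths(ev)[:-1])))  # only even lengths (last run may be truncated)
```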

## Experiments

Run reproducible experiments to evaluate algorithm performance:

```bash
# Run all experiments with parallel execution
emic-experiment --all --parallel 4

# Quick mode for development
emic-experiment --quick

# List available experiments
emic-experiment --list
```

### Algorithm Accuracy (January 2026)

| Algorithm | State Count Accuracy | Cμ Error |
|-----------|---------------------|----------|
| **Spectral** | 85% (100% at N≥10K) | 0.15 |
| **CSSR** | 82% | **0.05** |
| NSD | 73% | 0.12 |
| CSM | 39% | 0.10 |
| BSI | 32% | 0.53 |

See the [Experiments Guide](https://johnazariah.github.io/emic/guide/experiments/) for full details.

## Project Status

✅ **Core implementation complete** — The framework is functional with:
- Multiple inference algorithms: CSSR, CSM, BSI, Spectral, NSD
- Full analysis suite (Cμ, hμ, excess entropy, crypticity)
- Synthetic and empirical data sources
- Pipeline composition
- 194 tests with 90% coverage

📚 **[Full documentation available](https://johnazariah.github.io/emic/)**

## Etymology

The name **emic** works on multiple levels:

1. **Acronym**: **E**psilon **M**achine **I**nference & **C**haracterization
2. **Linguistic**: In linguistics/anthropology, *emic* refers to analysis from within the system — understanding structure on its own terms. This resonates with computational mechanics: ε-machines reveal the intrinsic structure of a process.
3. **Phonetic**: Pronounced "EE-mik" or "EH-mik", a nod to "ε-machine"

## References

- Crutchfield, J.P. (1994). ["The Calculus of Emergence: Computation, Dynamics, and Induction"](https://doi.org/10.1016/0167-2789(94)90273-9). *Physica D*.
- Shalizi, C.R. & Crutchfield, J.P. (2001). ["Computational Mechanics: Pattern and Prediction, Structure and Simplicity"](https://arxiv.org/abs/cond-mat/9907176). *Journal of Statistical Physics*.
- Crutchfield, J.P. & Young, K. (1989). "Inferring Statistical Complexity". *Physical Review Letters*.

## Contributing

Contributions are welcome! See the [Contributing Guide](https://johnazariah.github.io/emic/contributing/) for details.

## License

MIT License — see [LICENSE](LICENSE) for details.

## Author

John Azariah ([@johnazariah](https://github.com/johnazariah))
