Metadata-Version: 2.4
Name: pyrevealed
Version: 0.3.1
Summary: Behavioral Signal Analysis for User Understanding - Detect bots, shared accounts, and UI confusion
Project-URL: Homepage, https://github.com/pyrevealed/pyrevealed
Project-URL: Documentation, https://pyrevealed.readthedocs.io
Project-URL: Repository, https://github.com/pyrevealed/pyrevealed
Author: PyRevealed Contributors
License: MIT
Keywords: anomaly-detection,behavioral-analysis,bot-detection,consistency-check,fraud-detection,machine-learning,revealed-preference,user-analytics
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: networkx>=3.0
Requires-Dist: numba>=0.58.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: scipy>=1.10.0
Provides-Extra: all
Requires-Dist: jupyter>=1.0.0; extra == 'all'
Requires-Dist: matplotlib>=3.7.0; extra == 'all'
Requires-Dist: mypy>=1.0; extra == 'all'
Requires-Dist: pandas>=2.0.0; extra == 'all'
Requires-Dist: pydata-sphinx-theme>=0.14; extra == 'all'
Requires-Dist: pytest-cov>=4.0; extra == 'all'
Requires-Dist: pytest>=7.0; extra == 'all'
Requires-Dist: ruff>=0.1.0; extra == 'all'
Requires-Dist: seaborn>=0.12.0; extra == 'all'
Requires-Dist: sphinx-autodoc-typehints>=1.25; extra == 'all'
Requires-Dist: sphinx-copybutton>=0.5; extra == 'all'
Requires-Dist: sphinx>=7.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: pydata-sphinx-theme>=0.14; extra == 'docs'
Requires-Dist: sphinx-autodoc-typehints>=1.25; extra == 'docs'
Requires-Dist: sphinx-copybutton>=0.5; extra == 'docs'
Requires-Dist: sphinx>=7.0; extra == 'docs'
Provides-Extra: notebooks
Requires-Dist: jupyter>=1.0.0; extra == 'notebooks'
Requires-Dist: matplotlib>=3.7.0; extra == 'notebooks'
Requires-Dist: pandas>=2.0.0; extra == 'notebooks'
Requires-Dist: seaborn>=0.12.0; extra == 'notebooks'
Provides-Extra: viz
Requires-Dist: matplotlib>=3.7.0; extra == 'viz'
Description-Content-Type: text/markdown

# PyRevealed

A Python implementation of revealed preference theory.

> Based on: Chambers, C. P., & Echenique, F. (2016). *Revealed Preference Theory*. Cambridge University Press.

## What is this?

Given a history of user choices and the options available at each choice, PyRevealed computes:

- **Consistency scores**: How internally consistent is this user's behavior? (0 = random, 1 = perfectly consistent)
- **Preference recovery**: If consistent, what utility function explains their choices?
- **Exploitability metrics**: How much could be extracted from a user via arbitrage on their inconsistencies?
- **Feature independence**: Are choices over group A independent of choices over group B?

## Installation

```bash
pip install pyrevealed
```

For visualization support:
```bash
pip install "pyrevealed[viz]"
```

## Quick Start

```python
from pyrevealed import BehaviorLog, validate_consistency, compute_integrity_score, compute_confusion_metric
import numpy as np

# Create a behavior log from observed choices
log = BehaviorLog(
    cost_vectors=np.array([      # Prices at each observation (T x N)
        [1.0, 2.0],              # Observation 0: price of good A=1, B=2
        [2.0, 1.0],              # Observation 1: price of good A=2, B=1
    ]),
    action_vectors=np.array([    # Quantities chosen (T x N)
        [3.0, 1.0],              # Observation 0: bought 3 of A, 1 of B
        [1.0, 3.0],              # Observation 1: bought 1 of A, 3 of B
    ])
)

# Test consistency (GARP)
is_consistent = validate_consistency(log)
print(f"Consistent: {is_consistent}")

# Compute integrity score (Afriat Efficiency Index)
integrity = compute_integrity_score(log)
print(f"Integrity Score: {integrity:.3f}")

# Compute confusion metric (Money Pump Index)
confusion = compute_confusion_metric(log)
print(f"Confusion Metric: {confusion:.3f}")
```
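
To make the consistency check above concrete, here is a minimal, dependency-free sketch of a GARP test applied to the Quick Start data. It is not PyRevealed's implementation: the helper names `expenditure` and `satisfies_garp` are hypothetical, and the exact comparisons are an assumption that works for this toy data (a real implementation would use a numerical tolerance).

```python
def expenditure(prices, bundles):
    """E[i][j] = cost of bundle j at the prices of observation i."""
    return [[sum(p * q for p, q in zip(p_i, x_j)) for x_j in bundles]
            for p_i in prices]


def satisfies_garp(prices, bundles):
    """Return True if the observations contain no GARP violation."""
    E = expenditure(prices, bundles)
    n = len(E)
    # Direct revealed preference: i R j if bundle j was affordable
    # at the prices and budget of observation i.
    R = [[E[i][i] >= E[i][j] for j in range(n)] for i in range(n)]
    # Transitive closure of R (Floyd-Warshall style).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                R[i][j] = R[i][j] or (R[i][k] and R[k][j])
    # Violation: i is revealed preferred to j, yet bundle i was
    # strictly cheaper than bundle j when j was chosen.
    return not any(R[i][j] and E[j][j] > E[j][i]
                   for i in range(n) for j in range(n))


# Same data as the Quick Start example.
prices = [[1.0, 2.0], [2.0, 1.0]]
bundles = [[3.0, 1.0], [1.0, 3.0]]
print(expenditure(prices, bundles))     # [[5.0, 7.0], [7.0, 5.0]]
print(satisfies_garp(prices, bundles))  # True

# Swapping the chosen bundles makes each one affordable when the
# other was chosen, which creates a preference cycle.
swapped = [[1.0, 3.0], [3.0, 1.0]]
print(satisfies_garp(prices, swapped))  # False
```

In the Quick Start data each bundle costs 5 at its own prices and 7 at the other observation's prices, so neither choice is revealed preferred to the other and no contradiction can arise.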

---

## Available Tests & Scores

### Yes/No Tests

| Method | Question it answers |
|--------|---------------------|
| `validate_consistency(log)` | Is this user rational? (no self-contradicting choices) |
| `validate_consistency_weak(log)` | Any obvious flip-flops? (picked A over B, then B over A) |
| `validate_smooth_preferences(log)` | Smooth preferences? (needed for price sensitivity analysis) |
| `validate_strict_consistency(log)` | Approximately rational? (ignores minor contradictions) |
| `validate_price_preferences(log)` | Does user prefer situations where their items are cheaper? |

### Scores (0 to 1)

| Method | What it measures |
|--------|------------------|
| `compute_integrity_score(log)` | How consistent is this user? (higher = more consistent) |
| `compute_confusion_metric(log)` | How exploitable via pricing tricks? (lower = safer) |
| `compute_minimal_outlier_fraction(log)` | Fraction of observations that must be removed to restore consistency |
| `compute_test_power(log)` | Statistical power of the consistency test |
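
The Quick Start labels the integrity score as the Afriat Efficiency Index. As an illustration of that idea, the sketch below finds the largest efficiency level `e` in [0, 1] at which the data pass a GARP test with every budget scaled down by `e`. This is a plain-Python sketch under stated assumptions, not the library's code: the function names `satisfies_garp_at` and `afriat_efficiency`, the bisection search, and the tolerance are all hypothetical choices.

```python
def satisfies_garp_at(prices, bundles, e):
    """GARP test with each budget deflated by e (e = 1 is plain GARP)."""
    E = [[sum(p * q for p, q in zip(p_i, x_j)) for x_j in bundles]
         for p_i in prices]
    n = len(E)
    # i R j if bundle j fits within the deflated budget of observation i.
    R = [[i == j or e * E[i][i] >= E[i][j] for j in range(n)]
         for i in range(n)]
    # Transitive closure of R.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                R[i][j] = R[i][j] or (R[i][k] and R[k][j])
    # Violation: i revealed preferred to j, but bundle i costs strictly
    # less than the deflated budget of observation j.
    return not any(R[i][j] and e * E[j][j] > E[j][i]
                   for i in range(n) for j in range(n))


def afriat_efficiency(prices, bundles, tol=1e-6):
    """Bisect for the largest e at which the deflated GARP test passes."""
    if satisfies_garp_at(prices, bundles, 1.0):
        return 1.0
    lo, hi = 0.0, 1.0  # invariant: the test passes at lo and fails at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if satisfies_garp_at(prices, bundles, mid):
            lo = mid
        else:
            hi = mid
    return lo


prices = [[1.0, 2.0], [2.0, 1.0]]
consistent = [[3.0, 1.0], [1.0, 3.0]]    # Quick Start data
inconsistent = [[1.0, 3.0], [3.0, 1.0]]  # bundles swapped

print(afriat_efficiency(prices, consistent))              # 1.0
print(round(afriat_efficiency(prices, inconsistent), 3))  # 0.714
```

For the swapped data each bundle costs 7 at its own prices and 5 at the other's, so the contradiction disappears once budgets shrink to 5/7 of their original size, which is why the score comes out near 0.714.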

### Preference Structure

| Method | Question it answers |
|--------|---------------------|
| `validate_proportional_scaling(log)` | Do they buy the same mix regardless of budget size? |
| `test_income_invariance(log)` | Does budget size affect what they choose? |
| `test_feature_independence(log, [a], [b])` | Are choices in group A separate from group B? |
| `test_cross_price_effect(log, item1, item2)` | Are these items substitutes or complements? |
| `transform_to_characteristics(log, A)` | Analyze by attributes (nutrition, specs) rather than by product |

---

## Case Study

See **[DUNNHUMBY.md](DUNNHUMBY.md)** for a real-world validation on 2,222 households from the Dunnhumby grocery dataset.

Key findings: 4.5% of households are fully consistent, the mean integrity score is 0.839, and the test power is 0.845.

---

## Project Structure

```
pyrevealed/
├── src/pyrevealed/
│   ├── auditor.py       # BehavioralAuditor class
│   ├── encoder.py       # PreferenceEncoder class
│   ├── lancaster.py     # Lancaster characteristics model
│   ├── algorithms/      # Core algorithms
│   ├── core/            # Data containers
│   ├── graph/           # Graph algorithms
│   └── viz/             # Visualization
├── tests/               # Unit tests
├── dunnhumby/           # Real-world validation suite
│   ├── run_all.py       # Main test runner
│   ├── extended_analysis.py  # Statistical analyses
│   ├── comprehensive_analysis.py  # MPI, WARP, separability
│   ├── advanced_analysis.py  # Complementarity, stress tests
│   ├── encoder_analysis.py  # Auto-discovery, Houtman-Maks
│   ├── predictive_analysis.py  # Split-sample LightGBM
│   ├── lancaster_analysis.py  # Lancaster characteristics model
│   └── data/            # Kaggle dataset (download required)
├── docs/images/         # README visualizations
├── notebooks/           # Tutorials
└── examples/            # Advanced usage examples
```

## License

MIT
