Metadata-Version: 2.4
Name: pycatalyst
Version: 0.0.13
Summary: A schema-aware data generation and testing platform for Python
Author-email: StatFYI <contact@statfyi.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/optophi/pycatalyst
Project-URL: Documentation, https://optophi.github.io/pycatalyst/
Project-URL: Repository, https://github.com/optophi/pycatalyst
Project-URL: Issues, https://github.com/optophi/pycatalyst/issues
Project-URL: Changelog, https://github.com/optophi/pycatalyst/blob/main/CHANGELOG.md
Keywords: data-generation,testing,synthetic-data,schema,fake-data
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Typed
Classifier: Topic :: Software Development :: Testing
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: exrex>=0.11.0
Provides-Extra: inference
Requires-Dist: pandas>=2.0.0; extra == "inference"
Provides-Extra: faker
Requires-Dist: faker>=20.0.0; extra == "faker"
Provides-Extra: parquet
Requires-Dist: pyarrow>=14.0.0; extra == "parquet"
Provides-Extra: http
Requires-Dist: httpx>=0.24.0; extra == "http"
Provides-Extra: kafka
Requires-Dist: confluent-kafka>=2.3.0; extra == "kafka"
Provides-Extra: rabbitmq
Requires-Dist: pika>=1.3.0; extra == "rabbitmq"
Provides-Extra: api
Requires-Dist: fastapi>=0.104.0; extra == "api"
Requires-Dist: uvicorn[standard]>=0.24.0; extra == "api"
Requires-Dist: pydantic>=2.0.0; extra == "api"
Requires-Dist: pydantic-settings>=2.0.0; extra == "api"
Requires-Dist: python-multipart>=0.0.6; extra == "api"
Requires-Dist: PyJWT>=2.8.0; extra == "api"
Requires-Dist: httpx>=0.24.0; extra == "api"
Requires-Dist: alembic>=1.13.0; extra == "api"
Requires-Dist: sqlalchemy>=2.0.0; extra == "api"
Provides-Extra: worker
Requires-Dist: alembic>=1.13.0; extra == "worker"
Requires-Dist: sqlalchemy>=2.0.0; extra == "worker"
Provides-Extra: ui
Requires-Dist: fastapi>=0.104.0; extra == "ui"
Requires-Dist: uvicorn[standard]>=0.24.0; extra == "ui"
Requires-Dist: httpx>=0.24.0; extra == "ui"
Requires-Dist: aiofiles>=23.0.0; extra == "ui"
Provides-Extra: workbench
Requires-Dist: pycatalyst[api,sandbox]; extra == "workbench"
Requires-Dist: ipython>=8.0.0; extra == "workbench"
Provides-Extra: sandbox
Requires-Dist: docker>=7.0.0; extra == "sandbox"
Provides-Extra: ml-core
Requires-Dist: torch>=2.0.0; extra == "ml-core"
Requires-Dist: numpy>=1.24.0; extra == "ml-core"
Requires-Dist: pandas>=2.0.0; extra == "ml-core"
Requires-Dist: matplotlib>=3.7.0; extra == "ml-core"
Requires-Dist: pydantic>=2.0.0; extra == "ml-core"
Requires-Dist: joblib>=1.3.0; extra == "ml-core"
Provides-Extra: ml-sklearn
Requires-Dist: pycatalyst[ml-core]; extra == "ml-sklearn"
Requires-Dist: scikit-learn>=1.3.0; extra == "ml-sklearn"
Provides-Extra: ml-llm
Requires-Dist: pycatalyst[ml-core]; extra == "ml-llm"
Requires-Dist: transformers>=4.36.0; extra == "ml-llm"
Requires-Dist: peft>=0.7.0; extra == "ml-llm"
Requires-Dist: datasets>=2.16.0; extra == "ml-llm"
Requires-Dist: accelerate>=0.25.0; extra == "ml-llm"
Provides-Extra: ml
Requires-Dist: pycatalyst[ml-core,ml-llm,ml-sklearn]; extra == "ml"
Provides-Extra: db
Requires-Dist: alembic>=1.13.0; extra == "db"
Requires-Dist: sqlalchemy>=2.0.0; extra == "db"
Provides-Extra: postgres
Requires-Dist: psycopg2-binary>=2.9.0; extra == "postgres"
Provides-Extra: mongodb
Requires-Dist: pymongo>=4.6.0; extra == "mongodb"
Provides-Extra: s3
Requires-Dist: boto3>=1.28.0; extra == "s3"
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5.0; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == "docs"
Provides-Extra: all
Requires-Dist: pycatalyst[api,db,docs,faker,http,inference,ml,mongodb,parquet,postgres,s3,sandbox,ui,workbench,worker]; extra == "all"
Provides-Extra: dev
Requires-Dist: pycatalyst[all]; extra == "dev"
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: hypothesis>=6.0.0; extra == "dev"
Requires-Dist: ruff>=0.9.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Requires-Dist: build>=0.10.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Requires-Dist: types-PyYAML>=6.0.0; extra == "dev"
Requires-Dist: jupyterlab>=4.0.0; extra == "dev"
Requires-Dist: ipykernel>=6.0.0; extra == "dev"
Dynamic: license-file

# PyCatalyst

Schema-aware **data generation** and testing for Python: YAML recipes, a REST API, an optional UI, schema inference, and ML hooks.

## Documentation

- **Published** (GitHub Pages): [optophi.github.io/pycatalyst](https://optophi.github.io/pycatalyst/) — start with **Getting Started → Quick Start**.
- **Local docs** (from repo root):

  ```bash
  pip install "pycatalyst[docs]"
  pycatalyst docs serve
  ```

  Serves at [http://127.0.0.1:5005](http://127.0.0.1:5005) by default. Build the static site with `pycatalyst docs build` (output in `site/`).

- **Live API** (when the server is running): [http://127.0.0.1:8005/docs](http://127.0.0.1:8005/docs) (Swagger).

The doc site's tutorial index covers Quick Start, API authentication, generate, recipes, infer schema, and the workbench.

## Installation

```bash
pip install pycatalyst
```

API server, DB, and common extras:

```bash
pip install "pycatalyst[api,db,inference,faker]"
```

From source (editable):

```bash
git clone https://github.com/optophi/pycatalyst
cd pycatalyst
pip install -e ".[dev]"
```

## Quick Start (CLI)

```bash
pycatalyst api --port 8005
pycatalyst generate --recipe path/to/recipe.yaml -n 10
```
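
The recipe format is documented in the doc site; as a rough illustration only, assuming a recipe mirrors the `name`/`fields` structure used by the REST API in the streaming example below, it might look like:

```yaml
# Hypothetical recipe sketch: the exact keys are an assumption here;
# see the "recipes" tutorial in the docs for the authoritative format.
name: events
fields:
  - name: id
    type: uuid
  - name: kind
    type: string
  - name: n
    type: int
```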

See the **Quick Start** page in the docs for database setup, UI, and REST examples.

## Streaming mock data

**Python** — for a fixed seed, streaming yields the same sequence as batch generation:

```python
from pycatalyst import GenerationEngine, SchemaBuilder

engine = GenerationEngine()
schema = SchemaBuilder("events").add_uuid("id").add_string("kind").build()
for row in engine.iter_records(schema, seed=123, limit=1000):
    print(row)  # replace with your own record handling
```
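
Because generation is seeded, re-running with the same seed reproduces the records exactly. A quick sanity check using only the API above:

```python
from pycatalyst import GenerationEngine, SchemaBuilder

engine = GenerationEngine()
schema = SchemaBuilder("events").add_uuid("id").add_string("kind").build()

# Same schema + same seed => identical record sequences.
a = list(engine.iter_records(schema, seed=123, limit=5))
b = list(engine.iter_records(schema, seed=123, limit=5))
assert a == b
```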

**HTTP (Server-Sent Events)** — `POST /api/v1/stream/sse` with the same field list as `/generate`, plus `max_records` and optional `interval_ms`. Use `curl -N` and a Bearer token when auth is enabled:

```bash
curl -N -H "Authorization: Bearer YOUR_JWT" -H "Content-Type: application/json" \
  -d '{"name":"x","fields":[{"name":"n","type":"int"}],"max_records":5,"seed":1}' \
  http://127.0.0.1:8005/api/v1/stream/sse
```
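
The same endpoint can be consumed from Python with `httpx` (the `http` extra). A minimal sketch, assuming records arrive as standard SSE `data:` lines and that `YOUR_JWT` stands in for a valid token:

```python
import json

import httpx

payload = {
    "name": "x",
    "fields": [{"name": "n", "type": "int"}],
    "max_records": 5,
    "seed": 1,
}

# Stream the response and decode each SSE "data:" line as JSON.
with httpx.stream(
    "POST",
    "http://127.0.0.1:8005/api/v1/stream/sse",
    json=payload,
    headers={"Authorization": "Bearer YOUR_JWT"},
    timeout=None,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line.startswith("data:"):
            print(json.loads(line[len("data:"):]))
```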

**CLI — NDJSON to stdout** (one JSON object per line):

```bash
pycatalyst stream ndjson --schema my.json -n 500 --seed 1
```
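
One JSON object per line means any line-oriented tool can consume the output. As an illustration, reading it from Python via a subprocess:

```python
import json
import subprocess

# Run the CLI and parse each NDJSON line as it arrives.
proc = subprocess.Popen(
    ["pycatalyst", "stream", "ndjson",
     "--schema", "my.json", "-n", "500", "--seed", "1"],
    stdout=subprocess.PIPE,
    text=True,
)
for line in proc.stdout:
    record = json.loads(line)
    print(record)
proc.wait()
```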

**CLI — Kafka** (`pip install "pycatalyst[kafka]"`):

```bash
export PYCATALYST_STREAM_KAFKA_BOOTSTRAP=localhost:9092
export PYCATALYST_STREAM_KAFKA_TOPIC=test-topic
pycatalyst stream kafka --schema my.json -n 2000 --batch 100
```
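
To verify delivery, a plain `confluent-kafka` consumer on the same topic will do. A minimal sketch (the group id is arbitrary):

```python
import json

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "pycatalyst-smoke-test",  # any unused group id works
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["test-topic"])

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue  # nothing yet; keep polling
        if msg.error():
            raise RuntimeError(msg.error())
        print(json.loads(msg.value()))
finally:
    consumer.close()
```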

Optional server-side caps (set as environment variables): `PYCATALYST_STREAM_MAX_RECORDS` and `PYCATALYST_STREAM_MAX_INTERVAL_MS`.

## Development

- **Lint & format:** `ruff check . && ruff format .`
- **Type check:** `mypy src/`
- **Tests:** `pytest`
- **Coverage:** `pytest --cov=pycatalyst --cov-report=term-missing`

## Publishing

1. Bump version in `pyproject.toml` and `CHANGELOG.md`.
2. Create a release tag: `git tag v0.1.0 && git push origin v0.1.0`.
3. The GitHub Actions workflow uses [PyPI Trusted Publishing](https://docs.pypi.org/trusted-publishers/): configure a publisher for this repo on PyPI, and the workflow publishes automatically on tag push.
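
A minimal sketch of such a workflow (the file path and job name are illustrative):

```yaml
# .github/workflows/publish.yml (illustrative)
name: publish
on:
  push:
    tags: ["v*"]
jobs:
  pypi:
    runs-on: ubuntu-latest
    permissions:
      id-token: write  # required for PyPI Trusted Publishing (OIDC)
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: python -m pip install build && python -m build
      - uses: pypa/gh-action-pypi-publish@release/v1
```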

## License

MIT
