Metadata-Version: 2.4
Name: sedpack
Version: 0.1.4
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Framework :: Jupyter
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: aiofiles
Requires-Dist: asyncstdlib
Requires-Dist: flatbuffers
Requires-Dist: lz4
Requires-Dist: numpy
Requires-Dist: pydantic
Requires-Dist: semver
Requires-Dist: tenacity
Requires-Dist: tensorflow
Requires-Dist: tqdm
Requires-Dist: xxhash
Requires-Dist: zstandard
Requires-Dist: maturin[patchelf,zig] ; platform_system != 'Windows' and extra == 'dev'
Requires-Dist: maturin[zig] ; platform_system == 'Windows' and extra == 'dev'
Requires-Dist: mypy ; extra == 'dev'
Requires-Dist: pylint ; extra == 'dev'
Requires-Dist: pytest ; extra == 'dev'
Requires-Dist: pytest-asyncio ; extra == 'dev'
Requires-Dist: pytest-cov ; extra == 'dev'
Requires-Dist: types-aiofiles ; extra == 'dev'
Requires-Dist: types-tensorflow ; extra == 'dev'
Requires-Dist: types-tqdm ; extra == 'dev'
Requires-Dist: yapf ; extra == 'dev'
Provides-Extra: dev
License-File: LICENSE
Summary: General ML dataset package
Keywords: machine learning,dataset
Author: Elie Bursztein, Karel Král, Jean-Michel Picod
License: Apache License 2.0
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/google/sedpack
Project-URL: Bug Tracker, https://github.com/google/sedpack

# Sedpack - Scalable and efficient data packing

[![Coverage Status](https://coveralls.io/repos/github/google/sedpack/badge.svg?branch=main)](https://coveralls.io/github/google/sedpack?branch=main)

[Documentation](https://google.github.io/sedpack/)

Mainly refactored from the [SCAAML](https://github.com/google/scaaml) project.

## Available components

See the documentation website:
[https://google.github.io/sedpack/](https://google.github.io/sedpack/).

## Install

### Dependencies

To use this library you need to have a working version of [TensorFlow
2.x](https://www.tensorflow.org/install).
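
One way to confirm this requirement is to query the installed package metadata. The sketch below is illustrative only: the helper name `tensorflow_major_version` is hypothetical, and it assumes TensorFlow was installed under the distribution name `tensorflow` (a differently named build would not be detected).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def tensorflow_major_version(dist_name: str = "tensorflow") -> Optional[int]:
    """Return the major version of the installed distribution, or None.

    Hypothetical helper: `dist_name` is assumed to be the name under which
    TensorFlow was installed.
    """
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        return None
    # Version strings look like "2.16.1"; the first component is the major.
    return int(installed.split(".")[0])


if __name__ == "__main__":
    major = tensorflow_major_version()
    if major is None:
        print("TensorFlow is not installed")
    elif major < 2:
        print(f"TensorFlow {major}.x found, but 2.x is required")
    else:
        print(f"TensorFlow {major}.x found")
```

Reading the metadata avoids importing TensorFlow itself, which keeps the check fast.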

Development dependencies:

-   python-dev and gcc for [xxhash](https://pypi.org/project/xxhash/)

### Dataset install

#### Development install

1.  Clone the repository: `git clone https://github.com/google/sedpack`
2.  Install dependencies: `python3 -m pip install --require-hashes -r requirements.txt`
3.  Install the package in development mode: `python3 -m pip install --editable .`
    (short form `pip install -e .`, or legacy `python setup.py develop`)

#### Rust install

-   Activate your Python virtual environment
-   [Install Rust](https://www.rust-lang.org/tools/install)
-   Run `maturin develop --release`
-   Run `python -m pytest` from the project root directory; no tests should
    be skipped
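
The "no tests should be skipped" check can be done mechanically. As a rough sketch, the hypothetical helper below extracts the skipped-test count from the final summary line that pytest prints; the sample summary strings are made up for illustration.

```python
import re


def skipped_count(summary_line: str) -> int:
    """Return the number of skipped tests reported in a pytest summary line."""
    match = re.search(r"(\d+) skipped", summary_line)
    # No "N skipped" fragment means pytest reported no skipped tests.
    return int(match.group(1)) if match else 0


# Made-up summary lines, shaped like the last line of pytest output.
print(skipped_count("===== 12 passed, 3 skipped in 1.23s ====="))  # 3
print(skipped_count("===== 15 passed in 1.10s ====="))  # 0
```

Running `python -m pytest -rs` additionally lists the reason for each skipped test, which helps to tell whether the Rust extension was picked up.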

### Update dependencies

Make sure the prerequisites are installed (`sudo apt install python3 python3-pip
python3-venv`) and that the virtual environment is activated.

Install requirements: `pip install --require-hashes -r base-tooling-requirements.txt`

Update: run `pip-compile pyproject.toml --generate-hashes --upgrade` and commit the resulting `requirements.txt`.
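
With `--require-hashes`, pip refuses to install any file whose digest does not match a `--hash=sha256:...` entry in the requirements file. To illustrate what those entries contain, the sketch below computes a digest in that format with the standard library. The helper name `pip_style_hash` is hypothetical, and the input bytes stand in for the contents of a downloaded wheel or source archive.

```python
import hashlib


def pip_style_hash(data: bytes) -> str:
    """Format the SHA-256 digest of `data` as `sha256:<hex digest>`."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Placeholder bytes; in practice these would be the bytes of a package file.
print(pip_style_hash(b"abc"))
```

For a real file, `python -m pip hash <file>` prints the same digest as a ready-to-paste `--hash=` option.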

### Package install

`pip install sedpack`

### Tutorial

A tutorial and documentation are available at
[https://google.github.io/sedpack/](https://google.github.io/sedpack/).

Code for the tutorials is available in the `docs/tutorials` directory. For a
"hello world" see
[https://google.github.io/sedpack/tutorials/mnist/](https://google.github.io/sedpack/tutorials/mnist/).

## Disclaimer

This is not an official Google product.

