Metadata-Version: 2.1
Name: metastore
Version: 1.0.0.dev15
Summary: Metastore Python SDK. Feature store and data catalog for machine learning.
Home-page: https://metastore.readthedocs.io
Author: Metastore Developers
Author-email: metastoredevelopers@gmail.com
License: BSD-3-Clause
Download-URL: https://pypi.org/project/metastore
Project-URL: Code, https://github.com/metastore-developers/metastore
Project-URL: Issue tracker, https://github.com/metastore-developers/metastore/issues
Platform: UNKNOWN
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: License :: OSI Approved :: BSD License
Requires-Python: >=3.7.0
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: dask (>=2021.10.0)
Provides-Extra: development
Requires-Dist: setuptools (>=58.2.0) ; extra == 'development'
Requires-Dist: wheel (>=0.37.0) ; extra == 'development'
Requires-Dist: autopep8 (>=1.6.0) ; extra == 'development'
Requires-Dist: pylint (>=2.12.0) ; extra == 'development'
Requires-Dist: pytest (>=6.2.0) ; extra == 'development'
Requires-Dist: pytest-cov (>=3.0.0) ; extra == 'development'
Requires-Dist: sphinx (>=4.3.0) ; extra == 'development'
Requires-Dist: myst-parser (>=0.15.0) ; extra == 'development'
Requires-Dist: pydata-sphinx-theme (>=0.7.0) ; extra == 'development'
Requires-Dist: twine (>=3.7.0) ; extra == 'development'

<p align="left">
    <a href="https://github.com/metastore-developers/metastore" title="Metastore">
        <img src="https://metastore.readthedocs.io/en/latest/_static/logo.svg" width="128px"/>
    </a>
</p>

[![Releases](https://img.shields.io/github/v/release/metastore-developers/metastore?color=blue)](https://github.com/metastore-developers/metastore/releases)
[![Issues](https://img.shields.io/github/issues/metastore-developers/metastore?color=blue)](https://github.com/metastore-developers/metastore/issues)
[![Pull requests](https://img.shields.io/github/issues-pr/metastore-developers/metastore?color=blue)](https://github.com/metastore-developers/metastore/pulls)
[![Documentation](https://img.shields.io/badge/docs-latest-blue.svg)](https://metastore.readthedocs.io)
[![License](https://img.shields.io/pypi/l/metastore?color=blue)](https://metastore.readthedocs.io/en/latest/license.html)

# Metastore

Metastore Python SDK.

Feature store and data catalog for machine learning.

## Prerequisites

* [Python (>=3.7.0)](https://www.python.org)

## Installation

### Production

Install package:

```console
pip install metastore
```

### Development

Install package:

```console
pip install -e .[development]
```

> **Note** Use the `-e, --editable` flag to install the package in development mode.

> **Note** Set up a virtual environment for development.

Format source code:

```console
autopep8 --recursive --in-place setup.py metastore/ tests/
```

Lint source code:

```console
pylint setup.py metastore/ tests/
```

Test package:

```console
pytest
```

Report test coverage:

```console
pytest --cov --cov-fail-under 80
```

> **Note** Set the `--cov-fail-under` flag to 80% to validate the code coverage metric.

Generate documentation:

```console
sphinx-apidoc -f -e -T -d 2 -o docs/metastore/api-reference/ metastore/
```

Build documentation (optional):

```console
cd docs/
sphinx-build -b html metastore/ build/
```

## Usage

### Create project definition

```yaml
# metastore.yaml

project:
    name: 'customer_transactions'
    display_name: 'Customer transactions'
    description: 'Customer transactions feature store.'
    author: 'Metastore Developers'
    tags:
      - 'customer'
      - 'transaction'
    version: '1.0.0'
credential_store:
    type: 'local'
    path: '/path/to/.env'
metadata_store:
    type: 'file'
    path: 's3://path/to/metadata.db'
    s3_endpoint:
        type: 'secret'
        name: 'S3_ENDPOINT'
    s3_access_key:
        type: 'secret'
        name: 'S3_ACCESS_KEY'
    s3_secret_key:
        type: 'secret'
        name: 'S3_SECRET_KEY'
feature_store:
    offline_store:
        type: 'file'
        path: 's3://path/to/features/'
        s3_endpoint:
            type: 'secret'
            name: 'S3_ENDPOINT'
        s3_access_key:
            type: 'secret'
            name: 'S3_ACCESS_KEY'
        s3_secret_key:
            type: 'secret'
            name: 'S3_SECRET_KEY'
    online_store:
        type: 'redis'
        hostname:
            type: 'secret'
            name: 'REDIS_HOSTNAME'
        port:
            type: 'secret'
            name: 'REDIS_PORT'
        database:
            type: 'secret'
            name: 'REDIS_DATABASE'
        password:
            type: 'secret'
            name: 'REDIS_PASSWORD'
data_sources:
  - name: 'postgresql_data_source'
    type: 'postgresql'
    hostname:
        type: 'secret'
        name: 'POSTGRESQL_HOSTNAME'
    port:
        type: 'secret'
        name: 'POSTGRESQL_PORT'
    database:
        type: 'secret'
        name: 'POSTGRESQL_DATABASE'
    username:
        type: 'secret'
        name: 'POSTGRESQL_USERNAME'
    password:
        type: 'secret'
        name: 'POSTGRESQL_PASSWORD'
```

### Create feature definitions

```python
# feature_definitions.py

from datetime import timedelta

from metastore import (
    FeatureStore,
    FeatureGroup,
    Feature,
    ValueType
)


feature_store = FeatureStore(repository='/path/to/repository/')

feature_group = FeatureGroup(
    name='customer_transactions',
    record_identifiers=['customer_id'],
    event_time_feature='timestamp',
    features=[
        Feature(name='customer_id', value_type=ValueType.INTEGER),
        Feature(name='timestamp', value_type=ValueType.STRING),
        Feature(name='daily_transactions', value_type=ValueType.FLOAT),
        Feature(name='total_transactions', value_type=ValueType.FLOAT)
    ]
)

feature_store.apply(feature_group)
```

### Ingest features

```python
# ingest_features.py

from metastore import FeatureStore


feature_store = FeatureStore(repository='/path/to/repository/')

dataframe = feature_store.read_dataframe(
    'postgresql_data_source',
    table='customer_transaction',
    index_column='customer_id',
    partitions=10
)

feature_store.ingest('customer_transactions', dataframe)
```

### Materialize features

```python
# materialize_features.py

from datetime import datetime, timedelta

from metastore import FeatureStore


feature_store = FeatureStore(repository='/path/to/repository/')

feature_store.materialize(
    'customer_transactions',
    end_date=datetime.utcnow(),
    expires_in=timedelta(days=1)
)
```

### Retrieve historical features

```python
# retrieve_historical_features.py

from datetime import datetime

import pandas as pd
from metastore import FeatureStore


feature_store = FeatureStore(repository='/path/to/repository/')

record_identifiers = pd.DataFrame({
    'customer_id': [00001],
    'timestamp': [datetime.utcnow()]
})

dataframe = feature_store.get_historical_features(
    record_identifiers=record_identifiers,
    features=[
        'customer_transactions:daily_transactions',
        'customer_transactions:total_transactions'
    ]
).compute()

metadata = dataframe.attrs['metastore']
print(metadata)
```

### Retrieve online features

```python
# retrieve_online_features.py

import pandas as pd
from metastore import FeatureStore


feature_store = FeatureStore(repository='/path/to/repository/')

record_identifiers = pd.DataFrame({
    'customer_id': [00001]
})

dataframe = feature_store.get_online_features(
    record_identifiers=record_identifiers,
    features=[
        'customer_transactions:daily_transactions',
        'customer_transactions:total_transactions'
    ]
).compute()

metadata = dataframe.attrs['metastore']
print(metadata)
```

## Documentation

Please refer to the official [Metastore Documentation](https://metastore.readthedocs.io).

## Changelog

[Changelog](https://metastore.readthedocs.io/en/latest/changelog.html) contains information about new features, improvements, known issues, and bug fixes in each release.

## Copyright and license

Copyright (c) 2022, Metastore Developers. All rights reserved.

Project developed under a [BSD-3-Clause License](https://metastore.readthedocs.io/en/latest/license.html).


