Metadata-Version: 2.1
Name: encrypted-datasets
Version: 1.0.12
Summary: Convenience functions for symmetrically encrypting/decrypting Hugging Face Datasets
Author: nvjoshi2
Author-email: nvj1300@gmail.com
Requires-Python: >=3.9,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: boto3 (>=1.35.29,<2.0.0)
Requires-Dist: cryptography (>=43.0.1,<44.0.0)
Requires-Dist: datasets (>=3.0.1,<4.0.0)
Requires-Dist: huggingface-hub (>=0.29.3,<0.30.0)
Requires-Dist: pandas (>=2.2.3,<3.0.0)
Description-Content-Type: text/markdown

## Installation

```bash
pip install encrypted-datasets
```

## Usage

### Raw string key

```python
from datasets import load_dataset
from encrypted_datasets import encrypt_dataset, decrypt_dataset

huggingface_api_token = 'API_TOKEN'
downloaded_dataset = load_dataset('organization/dataset_repo', token=huggingface_api_token)
key = 'Your symmetric encryption key'

decrypted_dataset = decrypt_dataset(downloaded_dataset, key)

# Make modifications to decrypted_dataset...

re_encrypted_dataset = encrypt_dataset(decrypted_dataset, key)

re_encrypted_dataset.push_to_hub('organization/dataset_repo', token=huggingface_api_token)
```

### AWS Key Management Service (KMS) key

With this method, an AWS KMS key encrypts the data keys, and the encrypted data keys are stored on the Hugging Face Hub alongside the data.
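The general pattern of encrypting data with a data key and then encrypting that data key with a master key is commonly called envelope encryption. The sketch below illustrates the idea only and is not this library's implementation: a locally generated Fernet key stands in for the KMS key (a real KMS key never leaves AWS), and the helper names `envelope_encrypt` and `envelope_decrypt` are hypothetical.

```python
from cryptography.fernet import Fernet

# Assumption: a local Fernet key stands in for the AWS KMS key.
# In a real setup, wrapping and unwrapping would be done by KMS.
master = Fernet(Fernet.generate_key())


def envelope_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Encrypt data with a fresh data key, then wrap that key with the master key."""
    data_key = Fernet.generate_key()
    ciphertext = Fernet(data_key).encrypt(plaintext)
    wrapped_key = master.encrypt(data_key)
    # Both values can be stored together; neither reveals the plaintext
    # without access to the master key.
    return wrapped_key, ciphertext


def envelope_decrypt(wrapped_key: bytes, ciphertext: bytes) -> bytes:
    """Unwrap the data key with the master key, then decrypt the data."""
    data_key = master.decrypt(wrapped_key)
    return Fernet(data_key).decrypt(ciphertext)


wrapped_key, ciphertext = envelope_encrypt(b"a row of the dataset")
recovered = envelope_decrypt(wrapped_key, ciphertext)
```

Because only the wrapped data key is stored with the data, anyone reading the repository still needs permission to use the master key before they can decrypt anything.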

#### Create a new encrypted dataset and upload it to the Hugging Face Hub

```python
from datasets import Dataset
from encrypted_datasets import EncryptedDataset, KMSCypher
import boto3

kms_client = boto3.client('kms')
kms_key_id = '<KMS_KEY_ID>'

cypher = KMSCypher(
    key_id=kms_key_id,
    client=kms_client
)

dataset = Dataset.from_pandas(...)

encrypted_dataset = EncryptedDataset.encrypt(dataset, cypher)

encrypted_dataset.push_to_hub('organization/repo_id', token='<ACCESS_TOKEN>')
```

#### Load an encrypted dataset, modify it, and re-upload it

```python
from encrypted_datasets import EncryptedDataset, KMSCypher
import boto3

kms_client = boto3.client('kms')
kms_key_id = '<KMS_KEY_ID>'
hf_token = '<HF_TOKEN>'

cypher = KMSCypher(
    key_id=kms_key_id,
    client=kms_client
)

encrypted_dataset = EncryptedDataset.load('organization/repo_id', token=hf_token)

dataset = encrypted_dataset.decrypt(cypher)

# Make modifications to dataset...


new_encrypted_dataset = EncryptedDataset.encrypt(dataset, cypher)

new_encrypted_dataset.push_to_hub('organization/repo_id', token=hf_token)
```

