Metadata-Version: 2.1
Name: DataSynthesizer
Version: 0.1.10
Summary: Generate synthetic data that simulate a given dataset.
Home-page: https://github.com/DataResponsibly/DataSynthesizer
Author: Data, Responsibly
Author-email: dataresponsibly@gmail.com
License: MIT license
Keywords: DataSynthesizer
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: AUTHORS.rst
Requires-Dist: numpy (>=1.18.5)
Requires-Dist: pandas (>=1.0.5)
Requires-Dist: scikit-learn (>=0.23.1)
Requires-Dist: matplotlib (>=3.2.2)
Requires-Dist: seaborn (>=0.10.1)
Requires-Dist: python-dateutil (>=2.8.1)

[![PyPi Shield](https://img.shields.io/pypi/v/DataSynthesizer.svg)](https://pypi.python.org/pypi/DataSynthesizer) [![Travis CI Shield](https://travis-ci.com/DataResponsibly/DataSynthesizer.svg?branch=master)](https://travis-ci.com/DataResponsibly/DataSynthesizer)

# DataSynthesizer

DataSynthesizer generates synthetic data that simulates a given dataset.

> It aims to facilitate collaboration between data scientists and owners of sensitive data. It applies differential privacy techniques to achieve a strong privacy guarantee.
>
> For more details, please refer to [DataSynthesizer: Privacy-Preserving Synthetic Datasets](docs/cr-datasynthesizer-privacy.pdf).

### Install DataSynthesizer

```bash
pip install DataSynthesizer
```

### Usage

##### Assumptions for the Input Dataset

1. The input dataset is a table in first normal form ([1NF](https://en.wikipedia.org/wiki/First_normal_form)).
2. When implementing differential privacy, DataSynthesizer injects noise into the statistics within the **active domain**, that is, the set of values that actually appear in the table.
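
To make the second assumption concrete, here is a minimal sketch of adding Laplace noise to a histogram computed over the active domain of a single attribute. It is an illustration only and not DataSynthesizer's own code: the function names `laplace_noise` and `noisy_distribution`, the `seed` parameter, and the noise scale `2 / (n * epsilon)` are assumptions chosen for this example.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_distribution(values, epsilon, seed=0):
    """Return a noisy probability distribution over the active domain of `values`.

    Hypothetical helper for illustration; not part of the DataSynthesizer API.
    """
    if not values:
        raise ValueError("values must not be empty")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")

    rng = random.Random(seed)
    n = len(values)
    # The active domain is exactly the set of values observed in the column,
    # so values that never occur get no bin and no noise.
    counts = Counter(values)

    # Assumption: changing one row moves at most 2/n of probability mass,
    # so the Laplace scale is taken as 2 / (n * epsilon).
    scale = 2.0 / (n * epsilon)

    # Add noise to each relative frequency and clip negatives to zero.
    noisy = {v: max(c / n + laplace_noise(scale, rng), 0.0) for v, c in counts.items()}

    total = sum(noisy.values())
    if total == 0.0:
        # Every bin was clipped to zero: fall back to a uniform distribution.
        return {v: 1.0 / len(noisy) for v in noisy}
    # Renormalize so the probabilities sum to 1.
    return {v: p / total for v, p in noisy.items()}


if __name__ == "__main__":
    column = ["a", "a", "b", "c", "a", "b"]
    print(noisy_distribution(column, epsilon=1.0, seed=42))
```

Because only observed values receive a bin, a value that is valid for the attribute but absent from the input table can never appear in data sampled from this distribution.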

##### Use Jupyter Notebook

After installing DataSynthesizer and [Jupyter Notebook](https://jupyter.org/install), open and try the demos in `./notebooks/`:

- [DataSynthesizer__random_mode.ipynb](notebooks/DataSynthesizer__random_mode.ipynb)
- [DataSynthesizer__independent_attribute_mode.ipynb](notebooks/DataSynthesizer__independent_attribute_mode.ipynb)
- [DataSynthesizer__correlated_attribute_mode.ipynb](notebooks/DataSynthesizer__correlated_attribute_mode.ipynb)

##### Use Web UI

The [dataResponsiblyUI](https://github.com/DataResponsibly/dataResponsiblyUI) is a Django project that includes DataSynthesizer. Please follow the steps in [Run the Web UIs locally](https://github.com/DataResponsibly/dataResponsiblyUI#run-the-web-uis-locally) and run DataSynthesizer by visiting http://127.0.0.1:8000/synthesizer in a browser.



# History

## 0.1.0 - 2020-06-11

* First release on PyPI.

## 0.1.1 - 2020-07-05

### Bugs Fixed

* Numpy error when synthesising data with unique identifiers. - [Issue #23](https://github.com/DataResponsibly/DataSynthesizer/issues/23) by @raids

## 0.1.2 - 2020-07-19

### Bugs Fixed

* infer_distribution() for string attributes fails to sort index of varying types. - [Issue #24](https://github.com/DataResponsibly/DataSynthesizer/issues/24) by @raids

## 0.1.3 - 2020-09-13

### Bugs Fixed

* The dataframes are not appended into the full space in get_noisy_distribution_of_attributes(). - [Issue #26](https://github.com/DataResponsibly/DataSynthesizer/issues/26) by @zjroth

## 0.1.4 - 2021-01-14

### Bugs Fixed

* Fix a bug in candidate key identification.

## 0.1.5 - 2021-03-11

### What's New

* Downgrade required Python from >=3.8 to >=3.7.

## 0.1.6 - 2021-03-11

### What's New

* Update example notebooks.

## 0.1.7 - 2021-03-31

### Bugs Fixed

* Fixed an error in Laplace noise parameter. - [Issue #34](https://github.com/DataResponsibly/DataSynthesizer/issues/34) by @ganevgv

## 0.1.8 - 2021-04-09

### Bugs Fixed

* The randomness seeding is effective across the entire project now.

## 0.1.9 - 2021-07-18

### Bugs Fixed

* Optimized the datetime datatype detection.

## 0.1.10 - 2021-11-15

### Bugs Fixed

* Seed the randomness in `greedy_bayes()`.


