Metadata-Version: 2.1
Name: analyticsdf
Version: 0.0.7
Summary: Analytic generation of datasets with specified statistical characteristics.
Home-page: https://github.com/Faye-yufan/analytics-dataset
Author: Fei, Eli
Author-email: yufanfei@usc.edu
License: MIT
Platform: unix
Platform: linux
Platform: osx
Platform: cygwin
Platform: win32
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: scipy
Requires-Dist: scikit-learn
Provides-Extra: testing
Requires-Dist: pytest (>=6.0) ; extra == 'testing'
Requires-Dist: pytest-cov (>=2.0) ; extra == 'testing'
Requires-Dist: mypy (>=0.910) ; extra == 'testing'
Requires-Dist: flake8 (>=3.9) ; extra == 'testing'
Requires-Dist: tox (>=3.24) ; extra == 'testing'

Analytic generation of datasets with specified statistical characteristics.

# Introduction
analytics-dataset provides functionality for specifying and generating a wide range of datasets with specified statistical characteristics. A specification covers both the predictor matrix and the response vector. See the [analyticsdf documentation](https://faye-yufan.github.io/analytics-dataset/) for more details.
Examples of such characteristics include:
* High correlation and multi-collinearity among predictor variables
* Interaction effects between variables
* Skewed distributions of predictor and response variables
* Nonlinear relationships between predictor and response variables
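
To make the characteristics listed above concrete, here is a minimal standard-library sketch of a toy dataset that shows each of them. It does not use the analyticsdf API: `make_toy_dataset`, `pearson`, and every coefficient are hypothetical names and values chosen for illustration only. See the linked documentation for the package's actual interface.

```python
import math
import random


def make_toy_dataset(n=1000, rho=0.9, seed=0):
    """Return (X, y): X is a list of [x1, x2, x3] rows, y a list of responses.

    Hypothetical helper for illustration; not part of analyticsdf.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        # x2 is built from x1, so the two predictors have correlation near rho.
        x2 = rho * x1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # A log-normal draw gives a right-skewed predictor.
        x3 = rng.lognormvariate(0.0, 0.75)
        # Response: linear term, x1*x2 interaction, squared (nonlinear) term, noise.
        target = 2.0 * x1 + 1.5 * x1 * x2 + 0.5 * x3 ** 2 + rng.gauss(0.0, 0.1)
        X.append([x1, x2, x3])
        y.append(target)
    return X, y


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    X, y = make_toy_dataset(n=2000, rho=0.9, seed=1)
    x1 = [row[0] for row in X]
    x2 = [row[1] for row in X]
    print("corr(x1, x2):", round(pearson(x1, x2), 3))
```

Raising `rho` toward 1 makes the first two predictors nearly collinear, which is the situation the first bullet describes.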

## Research on existing automated dataset functionality
* Sklearn [Make Datasets](https://scikit-learn.org/stable/datasets/sample_generators.html) functionality
* MIT Synthetic Data Vault project
  * [MIT Data to AI Lab](https://dai.lids.mit.edu/)
  * [datacebo](https://datacebo.com/)
  * 2016 IEEE conference paper, The Synthetic Data Vault. 

## Public Package
This repo has published beta packages on both [PyPI](https://pypi.org/project/analyticsdf/) and [Anaconda](https://anaconda.org/faye-yufan/analyticsdf).
