Metadata-Version: 2.0
Name: dbcollection
Version: 0.1.11
Summary: A collection of popular datasets for deep learning.
Home-page: https://github.com/dbcollection/dbcollection
Author: M. Farrajota
Author-email: UNKNOWN
License: MIT License
Download-URL: https://github.com/dbcollection/dbcollection/archive/0.1.11.tar.gz
Platform: any
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: MacOS X
Classifier: Environment :: Win32 (MS Windows)
Classifier: Environment :: X11 Applications
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Requires-Dist: Pillow
Requires-Dist: h5py
Requires-Dist: numpy
Requires-Dist: patool
Requires-Dist: progressbar2
Requires-Dist: requests
Requires-Dist: scipy
Requires-Dist: xmltodict


dbcollection is a library for downloading/parsing/managing datasets via simple methods.
It was built from the ground up to be cross-platform (Windows, Linux, macOS) and
cross-language (Python, Lua, Matlab, etc.). This is achieved by using the popular HDF5
file format to store (meta)data of manually parsed datasets and Python for scripting.
By doing so, this library can target any platform that supports Python and any language
that has bindings for HDF5.

This package makes it easy to manage and load datasets by using HDF5 files as
metadata storage. Because all the necessary metadata is stored on disk, huge datasets
can be used on systems with limited memory. Also, once a dataset is set up, it is set up
forever! Users can reuse it as many times as they want/need for a myriad of tasks
without having to set up the dataset again each time they hack some code. This lets
users focus on more important tasks, such as fast prototyping, without having to spend
time managing datasets or creating/modifying scripts to load/fetch data from disk.
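
The idea of keeping metadata on disk and reading only what is needed can be
sketched with plain h5py, which is already one of this package's dependencies.
This is a minimal illustration and not dbcollection's own API or file layout: the
file name, the ``train/labels`` dataset and both helper functions are hypothetical
and exist only for this example.

```python
import os
import tempfile

import h5py
import numpy as np


def write_metadata(path, num_samples=1000):
    """Write a toy metadata file (hypothetical layout, not dbcollection's)."""
    with h5py.File(path, "w") as f:
        train = f.create_group("train")
        # One integer class label per sample, cycling through 10 classes.
        labels = np.arange(num_samples, dtype=np.int32) % 10
        train.create_dataset("labels", data=labels)


def read_labels(path, start, stop):
    """Read only the requested slice of labels from disk."""
    with h5py.File(path, "r") as f:
        # Slicing an h5py dataset loads just this range into memory.
        return f["train/labels"][start:stop]


# Create the file once, then fetch a small batch from it.
path = os.path.join(tempfile.mkdtemp(), "toy_metadata.h5")
write_metadata(path)
batch = read_labels(path, 20, 25)
print(batch)
```

The file is written once and can then be reopened from any script, or from any
other language with HDF5 bindings, without repeating the setup step.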

Main features
-------------

Here are some of the key features dbcollection provides:

- Simple API to load/download/setup/manage datasets
- Simple API to fetch data of a dataset
- All data is stored on disk, resulting in reduced RAM usage (useful for large datasets)
- Datasets only need to be set up once
- Cross-platform (Windows, Linux, macOS)
- Easily extensible to other languages that have support for HDF5 files
- Concurrent/parallel data access is possible thanks to the HDF5 file format
- A diverse list of popular datasets is available for use
- All datasets were manually parsed by someone, meaning that some of their quirks have
  already been resolved for you


