Metadata-Version: 2.1
Name: esdocs
Version: 0.4.1
Summary: Serialization & bulk indexing package for Elasticsearch; based on elasticsearch-dsl.py, supports multi-processing, Django
Home-page: https://github.com/jaddison/esdocs
Author: jaddison
Author-email: addi00+github.com@gmail.com
License: MIT
Keywords: elasticsearch django multiprocessing gevent gipc asynchronous bulk index serialization
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Web Environment
Classifier: Framework :: Django
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Database
Classifier: Topic :: System :: Networking
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Requires-Python: >=3.4
Provides-Extra: gevent
Requires-Dist: elasticsearch-dsl (>6.2.1)
Requires-Dist: gevent; extra == 'gevent'
Requires-Dist: gipc; extra == 'gevent'

This is a modern replacement for [django-simple-elasticsearch](https://github.com/jaddison/django-simple-elasticsearch/) (DSE). Both Django
and Elasticsearch have changed substantially over the years; this project is an effort to keep up.

##### Why not just update django-simple-elasticsearch?

* DSE is Django-specific; I wanted to build a solution that could be used in a broader range of applications
* To start fresh and avoid assumptions made in the DSE project
* To drop support for Python 2

##### Details

* Flexible and modular; e.g. Django support is available via a 'contrib' module
* Supports multi-process indexing and asynchronous IO via `gevent`
* Depends on elasticsearch-dsl-py rather than the lower-level elasticsearch-py package
  * You get a lot of functionality for free!
* Python 3 only

##### Installation

```
pip install esdocs
```

If multi-process indexing is desired, install esdocs together with the necessary `gevent` dependencies:

```
pip install esdocs[gevent]
```

##### Command Line Usage

```
$ esdocs -h
usage: esdocs [-h] [-v] [--version] [--no_input] [--indexes INDEXES]
              [--using USING] [--multi [MULTI]]
              {list,init,update,rebuild,cleanup} ...

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         increase output verbosity
  --version             show program's version number and exit
  --no_input, --noinput
                        Do not prompt for user input (assumes 'Yes' for
                        actions)
  --indexes INDEXES     Comma-separate list of index names to target
  --using USING         Elasticsearch named connection to use
  --multi [MULTI]       Enable multiple processes and optionally set number of
                        CPU cores to use (defaults to all cores)

commands:
  {list,init,update,rebuild,cleanup}
    list                List indexes
    init                Initialize indexes
    update              Update indexes
    rebuild             Rebuild indexes
    cleanup             Delete unaliased indexes
```
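
The usage text above has the shape that `argparse` produces for a parser with global options and subcommands. The following is a minimal sketch of a parser with that shape. It is a hypothetical reconstruction for illustration only, not the esdocs source; the `action`, `type`, and `const` choices are assumptions. It shows that global options such as `--indexes` and `--multi` are given before the command name.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction of the interface; not taken from esdocs itself.
    parser = argparse.ArgumentParser(prog="esdocs")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("--no_input", "--noinput", action="store_true")
    parser.add_argument("--indexes", help="comma-separated list of index names")
    parser.add_argument("--using", help="Elasticsearch named connection to use")
    # nargs="?" allows a bare --multi or an explicit count; const=0 is an
    # assumed sentinel meaning "use all cores".
    parser.add_argument("--multi", nargs="?", type=int, const=0, default=None)

    commands = parser.add_subparsers(dest="command")
    for name in ("list", "init", "update", "rebuild", "cleanup"):
        commands.add_parser(name)
    return parser


args = build_parser().parse_args(
    ["--indexes", "articles,authors", "--multi=4", "rebuild"]
)
index_names = args.indexes.split(",")
```

In this sketch, writing the count as `--multi=4` keeps the value unambiguous when a command name follows it.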

To rebuild indexes specified by document serializers in `ESDOCS_SERIALIZER_MODULES`:

```
export ESDOCS_SERIALIZER_MODULES="mypackage.module1,myotherpackage.module2"
export ESDOCS_SERIALIZER_COMPATIBILITY_HOOKS="esdocs.contrib.postgresql.compatibility.range_field"

esdocs rebuild
```
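
As an illustration of how a comma-separated setting such as `ESDOCS_SERIALIZER_MODULES` can be consumed, here is a minimal sketch that splits the value and imports each dotted module path. The helper name `load_serializer_modules` is hypothetical, and whether esdocs loads modules this way internally is an assumption; standard-library modules stand in for real serializer modules so the snippet runs anywhere.

```python
import importlib
import os


def load_serializer_modules(env_var="ESDOCS_SERIALIZER_MODULES"):
    """Import every dotted module path listed in a comma-separated variable."""
    raw = os.environ.get(env_var, "")
    # Tolerate stray whitespace and empty entries (e.g. a trailing comma).
    names = [name.strip() for name in raw.split(",") if name.strip()]
    return [importlib.import_module(name) for name in names]


# Standard-library modules stand in for "mypackage.module1" and friends.
os.environ["ESDOCS_SERIALIZER_MODULES"] = "json, collections.abc"
modules = load_serializer_modules()
print([module.__name__ for module in modules])
```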

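Multi-process indexing (the help output above lists this option as `--multi [MULTI]`, so the `--multiproc` and `--numprocs` spellings shown here may be out of date):
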
```
export ESDOCS_GEVENT=y
export ESDOCS_SERIALIZER_MODULES="mypackage.module1,myotherpackage.module2"
export ESDOCS_SERIALIZER_COMPATIBILITY_HOOKS="esdocs.contrib.postgresql.compatibility.range_field"

# auto detect number of CPU cores to use
esdocs rebuild --multiproc

# specify the number of cores to use
esdocs rebuild --multiproc --numprocs=4
```

###### Django

You must specify `ESDOCS_SERIALIZER_MODULES` in your Django settings and add `esdocs.contrib.esdjango` to your
`INSTALLED_APPS`. You can optionally set `ESDOCS_SERIALIZER_COMPATIBILITY_HOOKS` as well:

```
INSTALLED_APPS = [
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    ...,
    'esdocs.contrib.esdjango'
]


ESDOCS_SERIALIZER_MODULES = [
    'mypackage.module1',
    'myotherpackage.module2'
]

# these are the current defaults for this setting
ESDOCS_SERIALIZER_COMPATIBILITY_HOOKS = [
    'esdocs.contrib.esdjango.compatibility.manager',
    'esdocs.contrib.esdjango.compatibility.geosgeometry',
    'esdocs.contrib.postgresql.compatibility.range_field'
]
```
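
Each hook entry above looks like a dotted path whose last segment names an object inside a module. Assuming that reading is correct, the sketch below shows one way such a path can be resolved to a callable. The helper name `resolve_dotted_path` is hypothetical and this is not necessarily how esdocs resolves hooks; in a Django project, `django.utils.module_loading.import_string` offers similar behaviour. A standard-library function stands in for a real hook.

```python
import importlib


def resolve_dotted_path(path):
    """Return the object named by the last segment of a dotted path."""
    module_path, _, attribute = path.rpartition(".")
    if not module_path:
        raise ValueError("expected 'package.module.name', got %r" % path)
    module = importlib.import_module(module_path)
    return getattr(module, attribute)


# "os.path.basename" stands in for a path such as
# "esdocs.contrib.postgresql.compatibility.range_field".
hook = resolve_dotted_path("os.path.basename")
print(hook("/tmp/example.txt"))
```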

##### Serializing Data

For esdocs to work, you need to define `Document` and `Serializer` (or `DjangoSerializer`) subclasses to index
your data. `Document` comes from the excellent elasticsearch-dsl-py, while `Serializer`/`DjangoSerializer` are
part of esdocs.

* `Document` defines the Elasticsearch field mappings
* `Serializer` is associated with a `Document`
* `Serializer` defines how to retrieve the dataset
* For each record in your dataset, the `Serializer` will attempt to retrieve a value for each field defined on the associated `Document`
  * There are a number of methods you can implement on a `Serializer` to retrieve (or construct/munge) each value

###### Examples
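
The relationship between a document and its serializer can be sketched in plain Python. The classes below are hypothetical stand-ins, not the esdocs or elasticsearch-dsl API: `SketchDocument` only lists field names, and `SketchSerializer` looks for a `get_<field>` method before falling back to an attribute on the record. The `get_queryset` and `get_<field>` names are assumptions made for illustration; consult the esdocs source for the actual method names.

```python
from types import SimpleNamespace


class SketchDocument:
    """Stand-in for a Document: only records which fields the index has."""
    fields = ("title", "slug", "word_count")


class SketchSerializer:
    """Stand-in for a Serializer associated with one document class."""
    document = SketchDocument

    def get_queryset(self):
        # Defines how to retrieve the dataset; here, an in-memory list.
        return [
            SimpleNamespace(title="Hello World", body="one two three"),
            SimpleNamespace(title="Bulk Indexing", body="four five"),
        ]

    # Per-field methods construct values the record does not carry directly.
    def get_slug(self, record):
        return record.title.lower().replace(" ", "-")

    def get_word_count(self, record):
        return len(record.body.split())

    def serialize(self, record):
        data = {}
        for name in self.document.fields:
            getter = getattr(self, "get_" + name, None)
            if getter is not None:
                data[name] = getter(record)
            else:
                # Fall back to a same-named attribute on the record.
                data[name] = getattr(record, name, None)
        return data


serializer = SketchSerializer()
docs = [serializer.serialize(record) for record in serializer.get_queryset()]
print(docs[0])
```

Here `title` is read straight from the record, while `slug` and `word_count` are computed by serializer methods.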


