Metadata-Version: 2.1
Name: sparpy
Version: 0.4.0.dev1
Summary: A Spark entry point for Python
Home-page: https://github.com/alfred82santa/sparpy
Author: Alfred Santacatalina
Author-email: UNKNOWN
License: UNKNOWN
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Dist: click

=======================================
Sparpy: A Spark entry point for Python
=======================================

---------
Changelog
---------

......
v0.4.0
......

* Added `--pre` option to allow installing pre-release packages.
* Added `--env` option to set environment variables for the Spark process.
* Added `spark-env` config section to set environment variables for the Spark process.
* Print pip output when it fails.
* Fixed problems with interactive sparpy.
* Fixed `no-self` option in config file.

* Allow plugins that don't use `click`. They must be callable with a single argument of type `Sequence[str]`
  so that command-line arguments can be passed to them.

* Added `--version` option to print the sparpy version.
* Fixed an error raised when a plugin requires a package that is already installed but whose version does not satisfy the requirement.

......
v0.3.0
......

* Enabled `--force-download` option.
* Added `--find-links` option to use a directory as a package repository.
* Added `--no-index` option to avoid using external package repositories.
* Added `--queue` option to set the YARN queue.
* Ensured the driver's Python executable is the same one that runs `sparpy`.
* Added new entry point `sparpy-download` to download packages to a specific directory.
* Added new entry point `isparpy` to start an interactive session.

......
v0.2.1
......

* Forced the `pyspark` Python executable to be the same as `sparpy`'s.
* Fixed unrecognized options.
* Fixed default configuration file names.

......
v0.2.0
......

* Added configuration file option.
* Added `--debug` option.

----------------------------
How to build a Sparpy plugin
----------------------------

In your package's `setup.py`, configure an entry point for Sparpy:

.. code-block:: python

    setup(
        name='yourpackage',
        ...

        entry_points={
            ...
            'sparpy.cli_plugins': [
                'my_command_1=yourpackage.module:command_1',
                'my_command_2=yourpackage.module:command_2',
            ]
        }
    )

.. note::

    Avoid declaring PySpark as a requirement, so that the package is not downloaded from PyPI.
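
Since v0.4.0 a plugin target does not have to be a `click` command: any callable that accepts a single
``Sequence[str]`` argument (the command-line arguments) can be registered. As a minimal sketch, assuming
hypothetical names (`command_1` and `--myparam` are illustrative, not part of sparpy):

.. code-block:: python

    from typing import Sequence


    def command_1(argv: Sequence[str]) -> int:
        """A non-click Sparpy plugin: a plain callable taking the CLI arguments."""
        import argparse

        # Parse the arguments that sparpy forwards to the plugin.
        parser = argparse.ArgumentParser(prog='my_command_1')
        parser.add_argument('--myparam', type=int, default=0)
        args = parser.parse_args(list(argv))
        print(f'my_command_1 ran with myparam={args.myparam}')
        return args.myparam

Registered as ``my_command_1=yourpackage.module:command_1``, this callable receives everything that follows
the command name on the ``sparpy`` command line.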

-------
Install
-------

It must be installed on a Spark edge node.

.. code-block:: bash

    $ pip install sparpy


----------
How to use
----------

Using default Spark submit parameters:

.. code-block:: bash

    $ sparpy --plugin "mypackage>=0.1" my_plugin_command --myparam 1
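
For context, ``my_plugin_command`` above could be an ordinary `click` command exposed through the
``sparpy.cli_plugins`` entry point. A sketch, assuming the command and option names from the example call
are placeholders:

.. code-block:: python

    import click


    @click.command(name='my_plugin_command')
    @click.option('--myparam', type=int, default=0, help='Example option.')
    def my_plugin_command(myparam: int) -> None:
        """A click-based Sparpy plugin command."""
        click.echo(f'running with myparam={myparam}')
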


-------------------
Configuration files
-------------------

`sparpy` and `sparpy-submit` accept the `--config` parameter, which sets the configuration file to use. If it is not
set, `$HOME/.sparpyrc` is tried first; if that file does not exist, `/etc/sparpy.conf` is used.

Format:

.. code-block:: ini

    [spark]

    master=yarn
    deploy-mode=client

    queue=my_queue

    spark-executable=/path/to/my-spark-submit
    conf=
        spark.conf.1=value1
        spark.conf.2=value2

    packages=
        maven:package_1:0.1.1
        maven:package_2:0.6.1

    repositories=
        https://my-maven-repository-1.com/mvn
        https://my-maven-repository-2.com/mvn

    reqs_paths=
        /path/to/dir/with/python/packages_1
        /path/to/dir/with/python/packages_2

    [spark-env]

    MY_ENV_VAR=value

    [plugins]

    extra-index-urls=
        https://my-pypi-repository-1.com/simple
        https://my-pypi-repository-2.com/simple

    cache-dir=/path/to/cache/dir

    plugins=
        my-package1
        my-package2==0.1.2

    requirements-files=
        /path/to/requirement-1.txt
        /path/to/requirement-2.txt

    find-links=
        /path/to/directory/with/packages_1
        /path/to/directory/with/packages_2

    download-dir-prefix=my_prefix_

    no-index=false
    no-self=false
    force-download=true

    [interactive]

    pyspark-executable=/path/to/pyspark
    python-interactive-driver=/path/to/interactive/driver

