How do I install PyDAP?
=======================

You need three simple steps::

    $ wget http://peak.telecommunity.com/dist/ez_setup.py
    $ sudo python ez_setup.py
    $ sudo easy_install -f http://pydap.org/ dap

This will install setuptools_, the latest version of the dap module
and fpconst_ (a module that implements constants and functions for
working with IEEE754 double-precision special values). You'll also
need a module implementing arrays, like Numeric_, numarray_ or `Scipy
Core`_; these have to be installed manually for now.

.. _setuptools: http://peak.telecommunity.com/DevCenter/setuptools
.. _fpconst: http://research.warnes.net/projects/RStatServer/fpconst/index_html
.. _Numeric: http://sourceforge.net/projects/numpy
.. _numarray: http://www.stsci.edu/resources/software_hardware/numarray
.. _`Scipy Core`: http://numeric.scipy.org/


How do I run a server?
======================

Standalone server
-----------------

PyDAP comes with a standalone server that runs from the script
``dap-server.py``. This server can be run from the command line::

    $ dap-server.py --help
    usage: dap-server.py [options]

    options:
      -h, --help            show this help message and exit
      -d PIDFILE, --pid=PIDFILE
                            pid file
      -f, --foreground      don't detach process
      -i HOST, --host=HOST  host to listen on (default 127.0.0.1)
      -p PORT, --port=PORT  port to serve on (default 8888)
      -l LOGFILE, --log=LOGFILE
                            log file
      -v LEVEL, --level=LEVEL
                            logging level: DEBUG, INFO, WARN[ING], ERROR or
                            CRITICAL/FATAL

If the data directory is omitted, the server will serve files from
the current directory (and its subdirectories). The default behaviour
is to run on port 8888, logging error messages to ``stderr``. The logging
level can be ``DEBUG``, ``INFO``, ``WARN[ING]``, ``ERROR`` or ``CRITICAL``
(or ``FATAL``).

This standalone server uses the WSGI_ server from the module
WSGIUtils_, which must be installed. This can be done with::

    $ sudo easy_install -f http://pydap.org/ dap[server]

CGI script
----------

You can also run the server as a CGI script. Just copy the file
``dap-server.cgi`` to your cgi-bin directory, and edit the ``DATADIR``
variable to point to the directory holding your data.

Paste Deploy
------------

A better approach for running a server is using `Paste Deploy`_. Install
the modules::

    $ sudo easy_install Paste
    $ sudo easy_install PasteDeploy
    $ sudo easy_install PasteScript

And create a configuration file::

    [server:main]
    use = egg:PasteScript#wsgiutils
    host = 0.0.0.0
    port = 80
 
    [app:main]
    use = egg:dap
    root = /path/to/data
    name = Data Server

Now you can run a server using `Paste Script`_::

    $ sudo paster serve <config>

Why use Paste Script? Because it makes it easy to configure your
server and use additional WSGI middleware. For example, if you want
to compress the server responses, you can use the Gzip middleware
from Paste. The config file would look like this::

    [server:main]
    ...
 
    [pipeline:main]
    pipeline = gzip dap
 
    [app:dap]
    use = egg:dap
    root = /path/to/data
    name = Data Server
 
    [filter:gzip]
    use = egg:Paste#gzip
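
To get a feeling for what a filter in the pipeline does, here's a rough
sketch of a gzip middleware in plain Python. This is *not* the code from
Paste, just an illustration of the WSGI idea; the names ``gzip_middleware``
and ``hello_app`` are made up for the example, and details like the legacy
``write()`` callable are ignored::

    import gzip


    def gzip_middleware(app):
        """Wrap a WSGI app and gzip its body when the client accepts it."""
        def wrapped(environ, start_response):
            # Pass the request through untouched if gzip is not accepted.
            if "gzip" not in environ.get("HTTP_ACCEPT_ENCODING", ""):
                return app(environ, start_response)

            captured = {}

            def capture(status, headers, exc_info=None):
                # Hold back the status and headers until the body is known.
                captured["status"] = status
                captured["headers"] = headers

            body = b"".join(app(environ, capture))
            compressed = gzip.compress(body)

            # Replace the length header and announce the encoding.
            headers = [(key, value) for key, value in captured["headers"]
                       if key.lower() != "content-length"]
            headers.append(("Content-Encoding", "gzip"))
            headers.append(("Content-Length", str(len(compressed))))
            start_response(captured["status"], headers)
            return [compressed]
        return wrapped


    def hello_app(environ, start_response):
        """Stand-in for the real application (hypothetical)."""
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"hello " * 100]

With ``pipeline = gzip dap``, the ``dap`` application plays the role of
``hello_app`` here, and the filter sits in front of it.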

PyDAP comes with middleware to generate a THREDDS_ catalog describing
the available datasets, and also a logger. Here's how to use them
both::
 
    [pipeline:main]
    pipeline = logger catalog dap
 
    [app:dap]
    ...
 
    [filter:logger]
    use = egg:dap#logger
    filename = /var/log/dapserver.log
    level = WARNING
 
    [filter:catalog]
    use = egg:dap#catalog
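
The names in ``pipeline`` refer to other sections of the same file. Here's
a small sketch that checks this, assuming (as in the examples above) that
every filter lives in a ``[filter:...]`` section and the last entry in an
``[app:...]`` section. ``missing_sections`` is a hypothetical helper, not
part of Paste Deploy::

    import configparser

    CONFIG = """
    [pipeline:main]
    pipeline = logger catalog dap

    [app:dap]
    use = egg:dap
    root = /path/to/data
    name = Data Server

    [filter:logger]
    use = egg:dap#logger
    filename = /var/log/dapserver.log
    level = WARNING

    [filter:catalog]
    use = egg:dap#catalog
    """


    def missing_sections(text):
        """Return the sections named by the pipeline but absent from the file."""
        parser = configparser.ConfigParser()
        # Strip the indentation used for display in this example.
        parser.read_string("\n".join(line.strip() for line in text.splitlines()))
        *filters, app = parser["pipeline:main"]["pipeline"].split()
        expected = ["filter:%s" % name for name in filters] + ["app:%s" % app]
        return [section for section in expected
                if not parser.has_section(section)]

    print(missing_sections(CONFIG))

An empty list means every entry in the pipeline has a matching section.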

.. _WSGI: http://www.python.org/peps/pep-0333.html
.. _WSGIUtils: http://www.owlfish.com/software/wsgiutils/
.. _Paste Deploy: http://pythonpaste.org/deploy/
.. _Paste Script: http://pythonpaste.org/script/
.. _THREDDS: http://www.unidata.ucar.edu/Projects/THREDDS/


Which files are supported by the server?
========================================

PyDAP uses a plugin architecture to handle different file formats.
It comes with plugins for netCDF_, Matlab_ and CSV (comma separated
values) files. There's also a plugin for SQL RDBMS, based on the
Python DB API 2.0 specification_.

All files can be compressed with ``gzip`` or ``bzip2``. A special plugin 
will uncompress the data to a temporary directory and call the 
appropriate plugin to handle the new file, which is deleted
after the request.
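
The idea can be sketched with the standard library alone. This is only an
illustration of the uncompress, handle, delete cycle, not the plugin that
ships with PyDAP; ``with_uncompressed`` and its ``handle`` argument are
hypothetical names::

    import bz2
    import gzip
    import os
    import shutil
    import tempfile

    OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


    def with_uncompressed(path, handle):
        """Uncompress ``path`` to a temporary file and call ``handle`` on it."""
        base, ext = os.path.splitext(path)
        opener = OPENERS[ext]
        tmpdir = tempfile.mkdtemp()
        # Keep the inner name (``data.csv`` for ``data.csv.gz``) so that the
        # right plugin can still be picked by extension.
        tmp_path = os.path.join(tmpdir, os.path.basename(base))
        try:
            with opener(path, "rb") as src, open(tmp_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
            return handle(tmp_path)
        finally:
            # The temporary copy goes away once the request is done.
            shutil.rmtree(tmpdir)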

.. _netCDF: http://www.unidata.ucar.edu/packages/netcdf/
.. _Matlab: http://www.mathworks.com/
.. _specification: http://www.python.org/peps/pep-0249.html


How do I serve data from an RDBMS like PostgreSQL_?
===================================================

Create a file with the extension ``sql`` and the following content::

    [database]
    dsn: pgsql://user:password@host:port/dbname
    name: cruise
    arbitrary-attribute: 42
    another-attribute: foo

Now create additional sections for each column you want to retrieve
data from. Suppose we have salinity and temperature data, stored
in the table ``casts`` as ``salt`` and ``temp``, respectively::

    [salinity]
    column: casts.salt
    units: psu

    [temperature]
    column: casts.temp
    units: deg C
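
The layout is the familiar INI one, so you can inspect such a file with
Python's ``configparser``. Whether the SQL plugin reads it exactly this way
is an assumption; the sketch only shows how the sections map to a sequence
name and its variables, and ``describe`` is a hypothetical helper::

    import configparser

    SQL_FILE = """
    [database]
    dsn: pgsql://user:password@host:port/dbname
    name: cruise
    arbitrary-attribute: 42
    another-attribute: foo

    [salinity]
    column: casts.salt
    units: psu

    [temperature]
    column: casts.temp
    units: deg C
    """


    def describe(text):
        """Return the sequence name and a dict of per-variable settings."""
        parser = configparser.ConfigParser()
        # Strip the indentation used for display in this example.
        parser.read_string("\n".join(line.strip() for line in text.splitlines()))
        sequence = parser["database"]["name"]
        variables = {section: dict(parser[section])
                     for section in parser.sections()
                     if section != "database"}
        return sequence, variables

    sequence, variables = describe(SQL_FILE)
    print(sequence, sorted(variables))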

If the columns are located in different tables, you need to specify
an id to join the values together. For example::

    [salinity]
    column: salt.salt
    id: salt_id
 
    [temperature]
    column: temp.temp
    id: temp_id

In this case, the server will join the values of temperature and
salinity where ``temp_id == salt_id``, so if we have::

    temp.temp   temp.temp_id
    24          1
    25          2
    26          3

And::

    salt.salt   salt.salt_id
    35          0
    34          1
    33          2

We would get the following ASCII response from the server::

    Dataset {
        Sequence {
            Int32 temperature;
            Int32 salinity;
        } cruise;
    } test.sql;
    =============================================
    cruise.temperature, cruise.salinity
    24, 34
    25, 33
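
If you want to convince yourself of the result, the same join can be
reproduced with an in-memory SQLite database. This is just a sketch of the
join condition using the example values above; the query the server
actually generates may look different::

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # "temp" is quoted because TEMP is also an SQL keyword.
    conn.executescript("""
        CREATE TABLE "temp" ("temp" INTEGER, temp_id INTEGER);
        CREATE TABLE salt (salt INTEGER, salt_id INTEGER);
        INSERT INTO "temp" VALUES (24, 1), (25, 2), (26, 3);
        INSERT INTO salt VALUES (35, 0), (34, 1), (33, 2);
    """)

    # Keep only the rows where temp_id == salt_id.
    rows = conn.execute("""
        SELECT t."temp", s.salt
        FROM "temp" AS t, salt AS s
        WHERE t.temp_id = s.salt_id
        ORDER BY t.temp_id
    """).fetchall()
    print(rows)

Only the ids 1 and 2 appear in both tables, so two rows come back.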
 
.. _PostgreSQL: http://www.postgresql.org/


How do I write plugins for new data formats?
============================================

As Mark Pilgrim would've said, "a lot of effort went into making
this effortless" (or something like that). If you want to write a
plugin for a new data format you don't need to know anything about
the binary encoding the DAP uses (called XDR_), how to generate
different responses (DDS, DAS, ASCII, etc.) or how to parse a
constraint expression (like ``sst.sst[0:1:10][0:2:7]`` or
``seq.cast&seq.lat>-30&seq.lon<300``).
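
Just to show what such an expression means, here's a small sketch that
turns a projection with hyperslabs into Python slices. It assumes the DAP
convention that a hyperslab is written ``[start:stride:stop]`` (or
``[start:stop]``, or ``[index]``) with an inclusive stop;
``hyperslab_to_slices`` is a made-up name, not a function from the dap
module::

    import re

    HYPERSLAB = re.compile(r"\[(\d+)(?::(\d+))?(?::(\d+))?\]")


    def hyperslab_to_slices(projection):
        """Return (name, slices) for something like 'sst.sst[0:1:10][0:2:7]'."""
        name = projection.split("[", 1)[0]
        slices = []
        for first, second, third in HYPERSLAB.findall(projection):
            if third:        # [start:stride:stop]
                start, step, stop = int(first), int(second), int(third)
            elif second:     # [start:stop]
                start, step, stop = int(first), 1, int(second)
            else:            # [index]
                start, step, stop = int(first), 1, int(first)
            # The DAP stop is inclusive, Python's is not.
            slices.append(slice(start, stop + 1, step))
        return name, slices

For ``sst.sst[0:1:10][0:2:7]`` this gives the name ``sst.sst`` and the
slices ``slice(0, 11, 1)`` and ``slice(0, 8, 2)``.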

A plugin is a Python module. It should have at least two things:

* a variable called ``extensions`` specifying a regular expression
  that matches the files handled by that plugin; and
 
* a class ``Handler`` with the following methods::

      from dap.server import BaseHandler
   
      class Handler(BaseHandler):
          def __init__(self, filepath, environ):
              self.environ = environ
              ...
   
          def _parseconstraints(self, constraints=None):
              ...
   
          def close(self):
              ...

The variable ``filepath`` is the full path (on disk) to the requested
file, which the plugin should handle. The method
``_parseconstraints`` should parse the constraint expression and return
a dataset object built with types from dap.dtypes_. (Take a look at
the `Matlab module`_ and things will get clearer.)

The ``close`` method is optional, and is called when the plugin is no
longer necessary. It's often used to close open files or to remove
temporary data.
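
To illustrate how the ``extensions`` pattern is meant to be used, here's a
sketch of picking a plugin for a requested file. The registry and the
patterns in it are hypothetical (the patterns used by the bundled plugins
may differ), and so is ``find_plugin``::

    import re

    # Hypothetical registry: each plugin's ``extensions`` pattern and a label.
    PLUGINS = {
        r"^.*\.int$": "integer plugin",
        r"^.*\.csv$": "CSV plugin",
    }


    def find_plugin(filepath):
        """Return the label of the first plugin whose pattern matches."""
        for pattern, plugin in PLUGINS.items():
            if re.match(pattern, filepath):
                return plugin
        return None

    print(find_plugin("/path/to/data/42.int"))

Note the trailing ``$`` in the patterns: without it, a file called
``notes.integer`` would also match the first one.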

If you don't want to parse the constraint expression, just build
the whole dataset and use the ``trim`` function available from
``dap.helper``::

    def _parseconstraints(self, constraints=None):
        # Build the dataset.
        dataset = dtypes.DatasetType(name=...)
        ...
        dataset[var] = dtypes.BaseType(name=var, data=...)
        ...
   
        return trim(dataset, constraints)

And that should do it. An example will make things easier: suppose
we want to serve single integers. Not very useful, but that's ok.
We create a file called ``42.int`` with the number 42 in it, and put
it somewhere in our PyDAP server root. This is what our plugin would
look like (untested)::

    import os.path

    from dap import dtypes
    from dap.server import BaseHandler
    from dap.helper import trim

    extensions = r"""^.*\.int$"""  # files ending in '.int'

    class Handler(BaseHandler):
        def __init__(self, filepath, environ):
            self.environ = environ
 
            self.filename = os.path.basename(filepath)

            # Read the single integer stored in the file.
            with open(filepath) as f:
                self.integer = int(f.read())
 
        def _parseconstraints(self, constraints=None):
            # Build the dataset.
            dataset = dtypes.DatasetType(name=self.filename)
 
            # Add a variable.
            dataset['integer'] = dtypes.BaseType(name='integer', data=self.integer)
            dataset['integer'].attributes['long_name'] = 'This is the number %d.' % self.integer
 
            return trim(dataset, constraints)
 
It's better to parse the constraint expression yourself, though,
to avoid the overhead of building the whole dataset just to strip
it down later. Also, the ``trim`` function is not flawless: it does
not handle names given in short notation.

You can also hire me to write new plugins. Send an email to
<rob@pydap.org> if you're interested.

.. _XDR: http://en.wikipedia.org/wiki/External_Data_Representation
.. _Matlab module: /module-dap.plugins.matlab.html
.. _dap.dtypes: /module-dap.dtypes.html
