
dispyscheduler
**************

As mentioned in *dispy* and *dispy: Distributed and Parallel Computing
with/for Python*, if at most one cluster uses a node at any time,
cluster can be created with JobCluster; scheduler included in
JobCluster manages jobs and nodes. If, however, there is a need to use
(share) nodes by more than one cluster at the same time (e.g.,
multiple client programs need to run simultaneously), then
SharedJobCluster must be used (instead of JobCluster) to create
cluster, and **dispyscheduler.py** must be running on a node. Usually
no options are needed when invoking this program.

Below are various options to invoking dispyscheduler:

* "-d" enables debug messages that show trace of execution.

* "-n node1 -n node2" or "--nodes node1 --nodes node2" etc. sends
  handshake messages when dispyscheduler starts. This is not needed if
  scheduler and nodes are on same network.

* "-i addr" or "--ip_addr=addr" directs dispyscheduler to use given
  *addr* for communication, instead of the IP address associated with
  the host name.

* "--ext_ip_addr=addr" directs dispyscheduler to announce *addr* in
  network communication so that the scheduler can be used if it is
  behind NAT firewall/gateway that is configured to use *addr*. See
  below.

* "-p n" or "--port=n" directs dispyscheduler to use given port *n*
  instead of default port 51347 for UDP and TCP communication for job
  results.

* "--scheduler_port=n" directs dispyscheduler to use given port *n*
  instead of default port 51349 for job scheduler.

* "--node_port=n" directs dispyscheduler to use given port *n*
  instead of default port 51348 where dispynodes must be running.

* "--node_secret secret" directs dispyscheduler to use 'secret' for
  hashing handshake communication with nodes; i.e., the scheduler will
  only use nodes that use same secret (see *secret* option to
  *dispynode*).

* "--cluster_secret secret" directs dispyscheduler to use 'secret'
  for hashing handshake communication with clusters; i.e., the
  scheduler will only work with SharedJobCluster clients that use same
  secret (see *secret* option to JobCluster).

* "--node_keyfile path" is file containing private key for SSL
  communication with nodes, same as 'keyfile' parameter to
  ssl.wrap_socket of Python ssl module. This key may be stored in
  'node_certfile' itself, in which case this must be *None* (default).
  Same file must be used as "--keyfile" for *dispynode*.

* "--node_certfile path" is file containing SSL certificate with

     nodes, same as 'certfile' parameter to ssl.wrap_socket of Python
     ssl module. Same file must be used as "--certfile" for
     *dispynode*.

* "--cluster_keyfile path" is file containing private key for SSL
  communication with clusters, same as 'keyfile' parameter to
  ssl.wrap_socket of Python ssl module. This key may be stored in
  *cluster_certfile* itself, in which case this must be *None*. Same
  file must be used as *keyfile* parameter for SharedJobCluster.

* "--cluster_certfile path" is file containing SSL certificate with
  clusters, same as 'certfile' parameter to ssl.wrap_socket of Python
  ssl module. Same file must be used as *certfile* parameter for
  SharedJobCluster.

* "--httpd" option starts httpd server so that all clusters using
  the scheduler can be monitored and managed with a web browser. The
  clusters are shown with their names appended with client's IP
  address as '@ <IP address>'.

* "--pulse_interval n" directs nodes it controls to send pulse
  messages every *n* seconds; if a pulse message is not received
  within 5*n, then a node is presumed dead. In that case, if a cluster
  set 'reentrant=True', then jobs scheduled on that node will be
  migrated to other node(s) if possible; if *reentrant=False*, then
  jobs are automatically cancelled. *n* must be between 1 and 600.

* "--ping_interval" is number of seconds.  Normally dispyscheduler
  can locate nodes running dispynode by broadcasting UDP ping messages
  on local network and point-to-point UDP messages to nodes on remote
  networks. However, UDP messages may get lost.  Ping interval is
  number of seconds between repeated ping messages to find any nodes
  that have missed previous ping messages.

* "--max_file_size n" specifies maximum size of any file transferred
  from clients. If size of a file transferred exceeds *n*, the file
  will be truncated. *n* can be specified as a number >= 0, with an
  optional suffix letter k (indicating *n* kilobytes), m (*n*
  megabytes), g (*n* gigabytes) or t (*n* terrabytes). If *n* is 0,
  there is no maximum limit.

* "--clean" option causes dispyscheduler to remove any files saved
  from previous runs. dispyscheduler saves any files sent by dispy
  clients and information about jobs' execution results that couldn't
  be sent to clients (because of network failures, clients crashed
  etc.). The cleaning is done once when the scheduler is starting. If
  dispyscheduler is left running for a long time, it may be advisable
  to periodically remove such files (perhaps files that were accessed
  before a certain time).

* "--msg_timeout n" specifies timeout value in seconds for socket
  I/O operations with the client / scheduler. The default value is 5
  seconds. If the network is slow, this timeout can be increased.
  Bigger timeout values than necessary will cause longer delays in
  recognizing communication failures.


NAT/Firewall Forwarding
=======================

As explained in *dispy* and *dispynode* documentation, *ext_ip_addr*
can be used to use services behind NAT firewall/gateway. This option
can be used with dispyscheduler, too. This is especially useful if
there are many nodes in a network behind NAT firewall/gateway
(otherwise, as explained in dispynode documentation, each dispynode
should be started with a different port and all those ports forwarded
appropriately). Assuming that dispyscheduler is to run on a node with
(private) IP address 192.168.20.55 and it is behind NAT
firewall/gateway at (public) IP address a.b.c.d, dispyscheduler can be
invoked as:

   dispyscheduler.py -i 192.168.20.55 --ext_ip_addr a.b.c.d

and setup NAT to forward UDP and TCP ports 51347 and TCP port 51349 to
192.168.20.55.  Then dispy clients can use nodes in this network with:

   cluster = SharedJobCluster(compute, nodes=['*'], scheduler_node='a.b.c.d')
