Metadata-Version: 2.1
Name: cloudmesh-cc
Version: 5.0.3
Summary: The cloudmesh compute coordinator
Author-email: Gregor von Laszewski <laszewski@gmail.com>
Maintainer-email: Gregor von Laszewski <laszewski@gmail.com>
License:                                  Apache License
                                   Version 2.0, January 2004
                                http://www.apache.org/licenses/
        
           Copyright 2017 Gregor von Laszewski, Indiana University
        
           Licensed under the Apache License, Version 2.0 (the "License");
           you may not use this file except in compliance with the License.
           You may obtain a copy of the License at
        
               http://www.apache.org/licenses/LICENSE-2.0
        
           Unless required by applicable law or agreed to in writing, software
           distributed under the License is distributed on an "AS IS" BASIS,
           WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
           See the License for the specific language governing permissions and
           limitations under the License.
        
Project-URL: Homepage, https://github.com/cloudmesh/cloudmesh-cc
Project-URL: Documentation, https://github.com/cloudmesh/cloudmesh-cc/blob/main/README.md
Project-URL: Repository, https://github.com/cloudmesh/cloudmesh-cc.git
Project-URL: Issues, https://github.com/cloudmesh/cloudmesh-cc/issues
Project-URL: Changelog, https://github.com/cloudmesh/cloudmesh-cc/blob/main/CHANGELOG.md
Keywords: helper library,cloudmesh
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Environment :: Other Environment
Classifier: Environment :: Plugins
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: Microsoft :: Windows :: Windows 10
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: User Interfaces
Classifier: Topic :: System
Classifier: Topic :: System :: Distributed Computing
Classifier: Topic :: System :: Shells
Classifier: Topic :: Utilities
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: toml
Requires-Dist: cloudmesh-cmd5
Requires-Dist: cloudmesh-sys
Requires-Dist: cloudmesh-inventory
Requires-Dist: cloudmesh-configuration
Requires-Dist: cloudmesh-progress
Requires-Dist: psutil
Requires-Dist: yamldb
Requires-Dist: networkx[default]
Requires-Dist: pydot
Requires-Dist: graphviz
Requires-Dist: pexpect
Requires-Dist: fastapi[all]
Requires-Dist: httpx
Requires-Dist: papermill
Requires-Dist: pandas
Requires-Dist: markdown
Requires-Dist: ipython
Requires-Dist: ipykernel

# Hybrid Multi-Cloud Analytics Services Framework

**Cloudmesh Controlled Computing through Workflows**

Gregor von Laszewski (laszewski@gmail.com)$^*$,
Jacques Fleischer

$^*$ Corresponding author

## Citation

* <https://arxiv.org/pdf/2210.16941>
*  <https://github.com/cyberaide/paper-cloudmesh-cc/raw/main/vonLaszewski-cloudmesh-cc.pdf>

```
@misc{las-2022-hybrid-cc,
  title =	 {Hybrid Reusable Computational Analytics Workflow
                  Management with Cloudmesh},
  author =	 {Gregor von Laszewski and J. P. Fleischer and
                  Geoffrey C. Fox},
  year =	 2022,
  eprint =	 {2210.16941},
  archivePrefix ={arXiv},
  primaryClass = {cs.DC},
  url =		 {https://arxiv.org/pdf/2210.16941},
  urlOPT =
                  {https://github.com/cyberaide/paper-cloudmesh-cc/raw/main/vonLaszewski-cloudmesh-cc.pdf}
}
```


## Background

High-performance computing (HPC) is for decades a very important tool
for science. Scientific tasks can be leveraging the processing power
of a supercomputer so they can run at previously unobtainable high
speeds or utilize specialized hardware for acceleration that otherwise
are not available to the user. HPC can be used for analytic programs
that leverage machine learning applied to large data sets to, for
example, predict future values or to model current states. For such
high-complexity projects, there are often multiple complex programs
that may be running repeatedly in either competition or
cooperation. Leveraging for example computational GPUs leads to
several times higher performance when applied to deep learning
algorithms. With such projects, program execution is submitted as a
job to a typically remote HPC center, where time is billed as node
hours. Such projects must have a service that lets the user manage and
execute without supervision. We have created a service that lets the
user run jobs across multiple platforms in a dynamic queue with
visualization and data storage.

See @fig:fastapi-service.

![OpenAPI Description of the REST Interface to the Workflow](images/fastapi-service.png){#fig:fastapi-service width=50%}


## Workflow Controlled Computing

This software was developed end enhancing Cloudmesh, a suite of
software to make using cloud and HPC resources easier. Specifically,
we have added a library called Cloudmesh Controlled Computing
(cloudmesh-cc) that adds workflow features to control the execution of
jobs on remote compute resources.

The goal is to provide numerous methods of specifying the workflows on
a local computer and running them on remote services such as HPC and
cloud computing resources. This includes REST services and command
line tools. The software developed is freely available and can easily
be installed with standard Python tools so integration in the Python
ecosystem using virtualenv's and Anaconda is simple.


## Workflow Functionality

A hybrid multi-cloud analytics service framework was created to manage
heterogeneous and remote workflows, queues, and jobs. It was designed
for access through both the command line and REST services
to simplify the coordination of tasks on remote computers. In
addition, this service supports multiple operating systems like macOS,
Linux, and Windows 10 and 11, on various hosts: the computer's
localhost, remote computers, and the Linux-based virtual image WSL.
Jobs can be visualized and saved as a YAML and SVG data file. This
workflow was extensively tested for functionality and reproducibility.

## Quickstart

To test the workflow program, prepare a cm directory in your home
directory by executing the following commands in a terminal:

```bash
mkdir ~/cm
cd ~/cm
pip install cloudmesh-installer -U
cloudmesh-installer get cc
cd cloudmesh-cc
pytest -v -x --capture=no tests/test_199_workflow_clean.py
```

This test runs three jobs within a singular workflow: the first job
runs a local shell script, the second runs a local Python script, and
the third runs a local Jupyter notebook.

## Application demonstration using MNIST

The Modified National Institute of Standards and Technology Database
is a machine learning database based on image processing Various MNIST
files involving different machine learning cases were modified and
tested on various local and remote machines These cases include
Multilayer Perceptron, LSTM (Long short-term memory), Auto-Encoder,
Convolutional, and Recurrent Neural Networks, Distributed Training,
and PyTorch training.

See @fig:workflow-uml.

![Design for the workflow.](images/workflow-uml.png){#fig:workflow-uml}

## Design

The hybrid multi-cloud analytics service framework was created to
ensure running jobs across many platforms. We designed a small and
streamlined number of abstractions so that jobs and workflows can be
represented easily. The design is flexible and can be expanded as each
job can contain arbitrary arguments. This made it possible to custom
design for each target type a specific job type so that execution on
local and remote compute resources including batch operating systems
can be achieved. The job types supported include: local job on Linux,
macOS, Windows 10, and Windows 11, jobs running in WSL on Windows
computers, remote jobs using ssh, and batch jobs using Slurm.



In addition, we leveraged the existing Networkx Graph framework to
allow dependencies between jobs. This greatly reduced the complexity
of the implementation while being able to leverage graphical displays
of the workflow, as well as using scheduling jobs with for example
topological sort available in Networkx. Custom schedulers can be
designed easily based on the dependencies and job types managed
through this straightforward interface. The status of the jobs is
stored in a database that can be monitored during program
execution. The creation of the jobs is done on the fly, e.g. when the
job is needed to be determined on the dependencies when all its
parents are resolved. This is especially important as it allows
dynamic workflow patterns to be implemented while results from
previous calculations can be used in later stages of the workflow.

We have developed a simple-to-use API for this so programs can be
formulated using the API in Python. However, we embedded this API also
in a prototype REST service to showcase that integration into
language-independent frameworks is possible. The obvious functions to
manage workflows are supported including graph specification through
configuration files, upload of workflows, export, adding jobs and
dependencies, and visualizing the workflow during the execution. An
important feature that we added is the monitoring of the jobs while
using progress reports through automated log file mining. This way
each job reports the progress during the execution. This is especially
of importance when we run very complex and long-running jobs.


The REST service was implemented in FastAPI to leverage a small but
fast service that features a much smaller footprint for implementation
and setup in contrast to other similar REST service frameworks using
python.

This architectural component building this framework is depicted
@fig:workflow-uml.  The code is available in this repository and
manual pages are provided on how to install it:
[cloudmesh-cc](https://github.com/cloudmesh/cloudmesh-cc).

## Summary

The main interaction with the workflow is through the command line.
With the framework, researchers and scientists should be able to
create jobs on their own, place them in the workflow, and run them on
various types of computers.

In addition, developers and users can utilize the built-in OpenAPI 
graphical user interface to manage
workflows between jobs. They can be uploaded as YAML files or individually 
added through the build-in debug framework.

Improvements to this project will include code cleanup and manual development.

## References

A poster based on a pre-alpha version of this code is available as ppt
and PDF file. However, that version is no longer valid and is
superseded by much improved efforts. The code summarized in the
pre-alpha version was mainly used to teach a number of students Python
and how to work in a team

* [Poster Presentation (PPTX)](https://github.com/cloudmesh/cloudmesh-cc/raw/main/documents/analytics-service.pptx)
* [Poster Presentation (PDF)](https://github.com/cloudmesh/cloudmesh-cc/raw/main/documents/analytics-service.pdf)

Please note also that the poster contains inaccurate statements and
descriptions and should not be used as a reference to this work.

## Acknowledgments

Continued work was in part funded by the NSF CyberTraining: CIC:
CyberTraining for Students and Technologies from Generation Z with the
award numbers 1829704 and 2200409.
We like to thank the following contributors for their help and evaluation in a 
pre-alpha version of the code: Jackson Miskill, Alex Beck, Alison Lu.
We are excited that this effort contributed significantly to their
increased understanding of Python and how to develop in a team using
the Python ecosystem.


