Metadata-Version: 2.1
Name: pyreadstore
Version: 1.1.0
Summary: PyReadStore is the Python client (SDK) for the ReadStore API
Home-page: https://github.com/EvobyteDigitalBiology/pyreadstore
Author: Jonathan Alles
Author-email: Jonathan.Alles@evo-byte.com
License: Apache-2.0 license
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: Unix
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.32.3
Requires-Dist: pydantic>=2.9
Requires-Dist: pandas>=2.2

# PyReadStore SDK

This README describes PyReadStore, the Python client (SDK) for the ReadStore API. 

PyReadStore can be used to access projects, datasets, metadata and attachment files in the ReadStore Database from Python code.
The package enables you to automate your bioinformatics pipelines, Python scripts and notebooks.

Check the [ReadStore Github repository](https://github.com/EvobyteDigitalBiology/readstore) for more information on how to get started with ReadStore and setting up your server.

More information is available on the [ReadStore website](https://evo-byte.com/readstore/).

Tutorials and Intro Videos: https://www.youtube.com/@evobytedigitalbio

Blog posts and How-Tos: https://evo-byte.com/blog/

For general questions reach out to info@evo-byte.com

Happy analysis :)


## Table of Contents
- [Description](#description)
- [Installation](#installation)
- [Usage](#usage)
    1. [Quickstart](#quickstart)
    2. [Client Config](#client_config)
    3. [Datasets](#access_datasets)
    4. [Project](#access_projects)
    5. [ProData](#access_prodata)
    6. [Download](#download_attach)
    7. [Upload FASTQ](#upload_fastq)
- [Contributing](#contributing)
- [License](#license)
- [Credits and Acknowledgments](#acknowledgments)

## The Lean Solution for Managing FASTQ and NGS Data

ReadStore is a platform for storing, managing, and integrating omics data. It speeds up analysis and offers a simple way of managing and sharing NGS omics datasets, metadata and processed data (**Pro**cessed **Data**).
Built-in project and metadata management structures your workflows and a collaborative user interface enhances teamwork — so you can focus on generating insights.

The integrated Webservice (API) enables you to directly retrieve data from ReadStore via the terminal [Command-Line-Interface (CLI)](https://github.com/EvobyteDigitalBiology/readstore-cli) or [Python](https://github.com/EvobyteDigitalBiology/pyreadstore) / [R](https://github.com/EvobyteDigitalBiology/r-readstore) SDKs.

The ReadStore Basic version provides a local webserver with simple user management. If you need an organization-wide deployment, advanced user and group management or cloud integration, please check the ReadStore Advanced versions and reach out to info@evo-byte.com.

## Description

PyReadStore is a Python client (SDK) that lets you easily connect to your ReadStore server and interact with the ReadStore API.
By importing the pyreadstore package in Python, you can quickly retrieve data from a ReadStore server.

This tool provides streamlined and standardized access to NGS datasets and metadata, helping you run analyses more efficiently and with fewer errors.
You can easily scale your pipelines, and if you need to migrate or move NGS data, updating the ReadStore database ensures all your workflows stay up-to-date.


## Security and Permissions<a id="backup"></a>

**PLEASE READ AND FOLLOW THESE INSTRUCTIONS CAREFULLY!**

### User Accounts and Token<a id="token"></a>

Using PyReadStore requires an active user account and a token (and a running ReadStore server). 

You should **never enter your user account password** when working with PyReadStore.

To retrieve your token:

1. Log in to the ReadStore app via your browser
2. Navigate to the `Settings` page and click on `Token`
3. You can regenerate your token at any time (`Reset`). This will invalidate the previous token

For uploading FASTQ files your user account needs to have `Staging Permission`.
You can check this in the `Settings` page of your account.
If you do not have `Staging Permission`, ask your ReadStore server admin to grant you permission.

### Setting Your Credentials

You need to provide the PyReadStore client with valid ReadStore credentials.

There are three options:

1. Load credentials from the ReadStore `config` file. 
The file is generated by the [ReadStore CLI](https://github.com/EvobyteDigitalBiology/readstore-cli),
by default in your home directory (`~/.readstore/`). Keep the read permissions on this file restrictive

2. Directly enter your username and token when instantiating a PyReadStore client within your Python code

3. Set username and token via environment variables (`READSTORE_USERNAME`, `READSTORE_TOKEN`). This is useful in container or cloud environments.
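The advice on restrictive file permissions for option 1 can be checked from Python. The sketch below is only an illustration and not part of PyReadStore: `is_owner_only` is a hypothetical helper, the path is the default config location mentioned above, and a Unix file system is assumed.

```python
import os
import stat


def is_owner_only(path: str) -> bool:
    """Return True if neither group nor others have any permission on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # 0o077 masks the permission bits for group and others
    return (mode & 0o077) == 0


if __name__ == "__main__":
    # Default location of the config file written by the ReadStore CLI
    config_path = os.path.expanduser("~/.readstore/config")
    if os.path.exists(config_path) and not is_owner_only(config_path):
        print(f"Consider running: chmod 600 {config_path}")
```

If the check fails, `chmod 600` on the config file limits access to your own user account.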


## Installation

`pip3 install pyreadstore`

You can perform the install in a conda or venv virtual environment to simplify package management.

A user-level install is also possible

`pip3 install --user pyreadstore`

Validate the install with a module import

```python 
import pyreadstore
```
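To additionally see which version is installed, the standard library can report it. This is a small sketch using `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of PyReadStore.

```python
from __future__ import annotations

from importlib import metadata


def installed_version(package: str) -> str | None:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None if pyreadstore is missing
print(installed_version("pyreadstore"))
```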

## Usage

Detailed tutorials, videos and explanations are found on [YouTube](https://www.youtube.com/playlist?list=PLk-WMGySW9ySUfZU25NyA5YgzmHQ7yquv) or on the [**EVO**BYTE blog](https://evo-byte.com/blog).

### Quickstart<a id="quickstart"></a>

Let's access some dataset and project data from the ReadStore database!

Make sure a ReadStore server is running and reachable (by default under `127.0.0.1:8000`).
You can enter (`http://127.0.0.1:8000/api_v1/`) in your browser and should get a response from the API.

We assume you ran `readstore configure` before to create a config file for your user.
If not, consult the [ReadStore CLI](https://github.com/EvobyteDigitalBiology/readstore-cli) README on how to set this up.

We will create a client instance and perform some operations to retrieve data from the ReadStore database.
More information on all available methods can be found below.


```python 
import pyreadstore

rs_client = pyreadstore.Client() # Create an instance of the ReadStore client

# Manage Datasets

datasets = rs_client.list()      # List all datasets and return pandas dataframe

datasets_project_1 = rs_client.list(project_id = 1) # List all datasets for project 1

datasets_id_25 = rs_client.get(dataset_id = 25)     # Get detailed data for dataset 25

# Manage Projects

projects = rs_client.list_projects()                # List all projects

project = rs_client.get_project(project_name = 'MyProject') # Get details for MyProject

fastq_data_id_25 = rs_client.get_fastq(dataset_id = 25)     # Get fastq file data for dataset 25

rs_client.download_attachment(dataset_id = 25,              # Download files attached to dataset 25
                              attachment_name = 'gene_table.tsv') 

# Manage Processed Data

rs_client.upload_pro_data(name = 'sample_1_count_matrix',      # Set name of count matrix
                            pro_data_file = 'path/to/sample_1_counts.h5',   # Set file path
                            data_type = 'count_matrix',                     # Set type to 'count_matrix'
                            dataset_id = 25)                                # Set dataset id for upload

pro_data_project_1 = rs_client.list_pro_data(project_id = 1) # Get all ProData entries for Project 1

pro_data = rs_client.get_pro_data(name = 'sample_1_count_matrix',   # Set name to sample_1_count_matrix
                                dataset_id = 25)                    # dataset_id

pro_data_id = rs_client.delete_pro_data(name = 'sample_1_count_matrix',
                                        dataset_id = 25)

# Ingest FASTQ files

rs_client.upload_fastq(fastq = ['path/to_fastq_r1.fq', 'path/to_fastq_r2.fq'], # Upload FASTQ files
                        fastq_name = ['sample_rep_1_r1', 'sample_rep_1_r2'],    # Set FASTQ names
                        read_type = ['R1', 'R2'])                               # Set individual FASTQ read types
```


### Configure the Python Client<a id="client_config"></a>

The Client is the central object and provides authentication against the ReadStore API.
By default, the client will try to read the `~/.readstore/config` credentials file.
You can change the directory if your config file is located in another folder.

If you set the `username` and `token` arguments, the client will use these credentials instead.

If your ReadStore server is not running under localhost (`127.0.0.1`) port `8000`, you can adapt the default settings.

```python 
pyreadstore.Client(config_dir: str = '~/.readstore',  # Directory containing ReadStore credentials
                  username: str | None = None,        # Username
                  token : str | None = None,          # Token
                  host: str = 'http://localhost',     # Hostname / IP of ReadStore server
                  return_type: str = 'pandas',        # Default return types, can be pandas or json
                  port: int = 8000,                   # Server Port Number
                  fastq_extensions: List[str] = ['.fastq','.fastq.gz','.fq','.fq.gz']) 
                  # Accepted FASTQ file extensions for upload validation 
```

It is possible to set username, token, server endpoint and FASTQ extensions using the environment variables listed below.
The environment variables take precedence over other client configurations.

- `READSTORE_USERNAME` (username)
- `READSTORE_TOKEN` (token)
- `READSTORE_ENDPOINT_URL` (`http://host:port`, e.g. `http://localhost:8000`)
- `READSTORE_FASTQ_EXTENSIONS` (fastq_extensions, e.g. `'.fastq,.fastq.gz,.fq,.fq.gz'`)
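How the client parses these variables internally is not documented here. As an illustration only, the sketch below reads them with the standard library, which can help when debugging which settings a container environment supplies. `read_env_settings` is a hypothetical helper, and it assumes that `READSTORE_FASTQ_EXTENSIONS` holds a comma-separated list.

```python
from __future__ import annotations

import os
from urllib.parse import urlsplit


def read_env_settings(env: dict[str, str] | None = None) -> dict:
    """Collect ReadStore settings found in the environment (hypothetical helper)."""
    env = dict(os.environ) if env is None else env
    settings: dict = {}

    for key, name in (("READSTORE_USERNAME", "username"),
                      ("READSTORE_TOKEN", "token")):
        if env.get(key):
            settings[name] = env[key]

    endpoint = env.get("READSTORE_ENDPOINT_URL")
    if endpoint:
        # Split e.g. http://localhost:8000 into host and port
        parts = urlsplit(endpoint)
        settings["host"] = f"{parts.scheme}://{parts.hostname}"
        if parts.port is not None:
            settings["port"] = parts.port

    extensions = env.get("READSTORE_FASTQ_EXTENSIONS")
    if extensions:
        # Assumption: values are separated by commas; stray quotes are stripped
        settings["fastq_extensions"] = [
            e.strip().strip("'\"") for e in extensions.split(",") if e.strip()
        ]
    return settings
```

The keys of the returned dictionary mirror the argument names of `pyreadstore.Client` shown above.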

**Possible errors**

- Connection Error: no ReadStore server was found at the provided endpoint
- Authentication Error: the provided username or token was not found
- No Permission to Upload/Delete FASTQ/ProData: the user has no `Staging Permission`
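A connection error can often be diagnosed before creating a client. The sketch below is an assumption-laden illustration, not part of PyReadStore: `server_reachable` is a hypothetical helper that only tests whether a TCP connection to the host and port can be opened. It does not verify that the process answering is a ReadStore server.

```python
import socket


def server_reachable(host: str = "127.0.0.1", port: int = 8000,
                     timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names
        return False
```

The defaults match the default server address `127.0.0.1:8000` mentioned in the Quickstart.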

### Access Datasets<a id="access_datasets"></a>

```python 
# List ReadStore Datasets

rs_client.list(project_id: int | None = None,   # Filter datasets for project with id `project_id`
              project_name: str | None = None,  # Filter datasets for project with name `project_name`
               return_type: str | None = None   # Return pd.DataFrame or JSON type
               ) -> pd.DataFrame | List[dict]

# Get ReadStore Dataset Details
# Provide dataset_id OR dataset_name

rs_client.get(dataset_id: int| None = None,     # Get dataset with id `dataset_id`
              dataset_name: str | None = None,  # Get dataset with name `dataset_name`
              return_type: str | None = None    # Return pd.Series or json(dict)
              ) -> pd.Series | dict

# Get FASTQ file data for a dataset
# Provide dataset_id OR dataset_name

rs_client.get_fastq(dataset_id: int| None = None,    # Get fastq data for dataset with id `dataset_id`
                  dataset_name: str | None = None,   # Get fastq data for dataset `dataset_name`
                  return_type: str | None = None     # Return pd.DataFrame or JSON type
                  ) -> pd.DataFrame | List[dict]
```
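As a usage sketch, the `list` method can be called once per project to collect datasets for several projects. The helper `list_datasets_for_projects` is hypothetical; it relies only on the `project_id` and `return_type` arguments shown above, and the fields of the returned records depend on your server.

```python
from __future__ import annotations

from typing import Any, Iterable


def list_datasets_for_projects(client: Any,
                               project_ids: Iterable[int]) -> dict[int, list[dict]]:
    """Map each project id to the datasets returned by client.list(...)."""
    datasets_by_project: dict[int, list[dict]] = {}
    for project_id in project_ids:
        # return_type='json' requests a list of dicts instead of a DataFrame
        datasets_by_project[project_id] = client.list(project_id=project_id,
                                                      return_type="json")
    return datasets_by_project


# Usage with a configured client (requires a running ReadStore server):
# rs_client = pyreadstore.Client()
# per_project = list_datasets_for_projects(rs_client, [1, 2])
```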


### Access Projects<a id="access_projects"></a>

```python 
# List ReadStore Projects

rs_client.list_projects(return_type: str | None = None   # Return pd.DataFrame or JSON type
                        ) -> pd.DataFrame | List[dict]

# Get ReadStore Project Details
# Provide project_id OR project_name

rs_client.get_project(project_id: int| None = None,     # Get project with id `project_id`
                      project_name: str | None = None,  # Get project with name `project_name`
                      return_type: str | None = None    # Return pd.Series or json(dict)
                      ) -> pd.Series | dict
```
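Several methods take either an ID or a name. The sketch below shows one way to make that choice explicit in your own scripts. `id_or_name` is a hypothetical helper; it is stricter than the documentation requires, since it assumes that passing both values at once is a mistake.

```python
from __future__ import annotations


def id_or_name(prefix: str, id_value: int | None = None,
               name: str | None = None) -> dict:
    """Build keyword arguments for a lookup by id or by name, but not both."""
    if (id_value is None) == (name is None):
        raise ValueError(f"Provide exactly one of {prefix}_id or {prefix}_name")
    if id_value is not None:
        return {f"{prefix}_id": id_value}
    return {f"{prefix}_name": name}


# Usage with a configured client (requires a running ReadStore server):
# project = rs_client.get_project(**id_or_name("project", name="MyProject"))
# dataset = rs_client.get(**id_or_name("dataset", id_value=25))
```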

### Access **Pro**cessed **Data**<a id="access_prodata"></a>

```python 
# Upload Processed Data

rs_client.upload_pro_data(name: str,                # Name of ProData
                        pro_data_file: str,         # Set ProData file path
                        data_type: str,             # Set ProData data type
                        description: str = '',      # Description for ProData
                        metadata: dict = {},        # MetaData
                        dataset_id: int | None = None,  # Dataset ID to assign ProData to
                        dataset_name: str | None = None)# Dataset Name to assign ProData to

# Must provide dataset_id or dataset_name

# List and filter Processed Data

rs_client.list_pro_data(project_id: int | None = None,      # Filter by Project ID
                        project_name: str | None = None,    # Filter by Project Name
                        dataset_id: int | None = None,      # Filter by Dataset ID
                        dataset_name: str | None = None,    # Filter by Dataset Name
                        name: str | None = None,            # Filter by ProData name
                        data_type: str | None = None,       # Filter by ProData data type
                        include_archived: bool = False,     # Include archived
                        return_type: str | None = None) -> pd.DataFrame | List[dict]

# Get individual ProData entry

rs_client.get_pro_data(pro_data_id: int | None = None,  # Get ProData by ID
                        dataset_id: int | None = None,  # Get ProData by Dataset ID
                        dataset_name: str | None = None, # Get ProData by Dataset Name
                        name: str | None = None,        # Get ProData by name
                        version: int | None = None,     # Get specific version, None returns latest valid version
                        return_type: str | None = None) -> pd.Series | dict

# Provide ID or Name + Dataset ID/Name

# Delete ProData entry

rs_client.delete_pro_data(pro_data_id: int | None = None,   # Delete by ProData ID
                        dataset_id: int | None = None,      # Delete by Dataset ID
                        dataset_name: str | None = None,    # Delete by Dataset Name
                        name: str | None = None,            # Delete by name
                        version: int | None = None)         # Delete specific version

# Provide ID or Name + Dataset ID/Name for delete
```
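In a pipeline, ProData files are often uploaded for many datasets in one go. The following sketch loops over a mapping of dataset IDs to file paths and calls `upload_pro_data` with the arguments documented above. `upload_count_matrices` and its naming scheme (file name without extension) are hypothetical choices for illustration.

```python
from __future__ import annotations

import os
from typing import Any


def upload_count_matrices(client: Any, files_by_dataset: dict[int, str],
                          data_type: str = "count_matrix") -> list[str]:
    """Upload one ProData file per dataset and return the ProData names used."""
    names = []
    for dataset_id, path in files_by_dataset.items():
        # Hypothetical naming scheme: file name without its extension
        name = os.path.splitext(os.path.basename(path))[0]
        client.upload_pro_data(name=name,
                               pro_data_file=path,
                               data_type=data_type,
                               dataset_id=dataset_id)
        names.append(name)
    return names


# Usage with a configured client (requires a running ReadStore server
# and an account with Staging Permission):
# upload_count_matrices(rs_client, {25: 'path/to/sample_1_counts.h5'})
```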

### Download Attachments<a id="download_attach"></a>

```python 
# Download project attachment file from ReadStore Database 

rs_client.download_project_attachment(attachment_name: str,            # name of attachment file
                                      project_id: int | None = None,   # project id with attachment
                                      project_name: str | None = None, # project name with attachment
                                      outpath: str | None = None)      # Path to download file to

# Download dataset attachment file from ReadStore Database 

rs_client.download_attachment(attachment_name: str,             # name of attachment file
                              dataset_id: int | None = None,    # dataset id with attachment
                              dataset_name: str | None = None,  # dataset name with attachment
                              outpath: str | None = None)       # Path to download file to
```
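The sketch below downloads several attachments of one dataset into a target folder. `download_dataset_attachments` is a hypothetical helper, and it assumes that `outpath` is the full path of the downloaded file rather than a directory; check this against your installation before relying on it.

```python
from __future__ import annotations

import os
from typing import Any, Iterable


def download_dataset_attachments(client: Any, dataset_id: int,
                                 attachment_names: Iterable[str],
                                 out_dir: str) -> list[str]:
    """Download several attachments of one dataset into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for attachment_name in attachment_names:
        # Assumption: outpath is the full path of the downloaded file
        outpath = os.path.join(out_dir, attachment_name)
        client.download_attachment(attachment_name=attachment_name,
                                   dataset_id=dataset_id,
                                   outpath=outpath)
        paths.append(outpath)
    return paths


# Usage with a configured client (requires a running ReadStore server):
# download_dataset_attachments(rs_client, 25, ['gene_table.tsv'], 'attachments')
```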

### Upload FASTQ files<a id="upload_fastq"></a>

Upload FASTQ files to the ReadStore server. The method checks that the FASTQ files exist and end with a valid FASTQ extension.

```python 
# Upload FASTQ files to ReadStore 

rs_client.upload_fastq(fastq : List[str] | str)  # Path of FASTQ files to upload
```
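The same two checks can be run locally before an upload, for example to report all problems in a batch at once. This is a sketch of the idea, not the client's own validation code: `check_fastq_paths` is a hypothetical helper, and the extension list is copied from the default `fastq_extensions` of the Client.

```python
from __future__ import annotations

import os

# Default extensions as listed in the Client signature
FASTQ_EXTENSIONS = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def check_fastq_paths(paths: list[str],
                      extensions: tuple[str, ...] = FASTQ_EXTENSIONS) -> list[str]:
    """Return a list of readable problems; the list is empty if all paths look valid."""
    problems = []
    for path in paths:
        if not os.path.isfile(path):
            problems.append(f"{path}: file not found")
        elif not path.endswith(tuple(extensions)):
            problems.append(f"{path}: not a FASTQ extension")
    return problems


# Usage with a configured client (requires a running ReadStore server):
# fastq_files = ['path/to_fastq_r1.fq', 'path/to_fastq_r2.fq']
# if not check_fastq_paths(fastq_files):
#     rs_client.upload_fastq(fastq = fastq_files)
```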

## Contributing

Contributions make this project better! Whether you want to report a bug, improve documentation, or add new features, any help is welcomed!

### How You Can Help
- Report Bugs
- Suggest Features
- Improve Documentation
- Code Contributions

### Contribution Workflow
1. Fork the repository and create a new branch for each contribution.
2. Write clear, concise commit messages.
3. Submit a pull request and wait for review.

Thank you for helping make this project better!

## License

pyreadstore is licensed under the Apache 2.0 open source license.
See the LICENSE file for more information.

## Credits and Acknowledgments<a id="acknowledgments"></a>

pyreadstore is built upon the following open-source Python packages. We would like to thank all contributing authors, developers and partners.

- Python (https://www.python.org/)
- requests (https://requests.readthedocs.io/en/latest/)
- pydantic (https://docs.pydantic.dev/latest/)
- pandas (https://pandas.pydata.org/)
