Metadata-Version: 2.1
Name: scsims
Version: 3.0.2
Summary: Scalable, Interpretable Deep Learning for Single-Cell RNA-seq Classification
Home-page: https://github.com/braingeneers/sims
Author: Julian Lehrer
Author-email: jmlehrer@ucsc.edu
License: MIT license
Keywords: scsims
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v2 (GPLv2)
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: aiohttp (==3.8.4)
Requires-Dist: aiosignal (==1.3.1)
Requires-Dist: anndata (==0.9.1)
Requires-Dist: anyio (==3.6.2)
Requires-Dist: appdirs (==1.4.4)
Requires-Dist: arrow (==1.2.3)
Requires-Dist: async-timeout (==4.0.2)
Requires-Dist: attrs (==23.1.0)
Requires-Dist: beautifulsoup4 (==4.12.2)
Requires-Dist: blessed (==1.20.0)
Requires-Dist: boto3 (==1.26.130)
Requires-Dist: botocore (==1.29.130)
Requires-Dist: certifi (==2023.5.7)
Requires-Dist: charset-normalizer (==3.1.0)
Requires-Dist: click (==8.1.3)
Requires-Dist: contourpy (==1.0.7)
Requires-Dist: croniter (==1.3.14)
Requires-Dist: cycler (==0.11.0)
Requires-Dist: dateutils (==0.6.12)
Requires-Dist: deepdiff (==6.3.0)
Requires-Dist: docker-pycreds (==0.4.0)
Requires-Dist: fastapi (==0.88.0)
Requires-Dist: fonttools (==4.39.3)
Requires-Dist: fortran-language-server (==1.12.0)
Requires-Dist: frozenlist (==1.3.3)
Requires-Dist: fsspec (==2023.5.0)
Requires-Dist: gitdb (==4.0.10)
Requires-Dist: GitPython (==3.1.31)
Requires-Dist: h11 (==0.14.0)
Requires-Dist: h5py (==3.8.0)
Requires-Dist: idna (==3.4)
Requires-Dist: importlib-resources (==5.12.0)
Requires-Dist: inquirer (==3.1.3)
Requires-Dist: itsdangerous (==2.1.2)
Requires-Dist: Jinja2 (==3.1.2)
Requires-Dist: jmespath (==1.0.1)
Requires-Dist: joblib (==1.2.0)
Requires-Dist: kiwisolver (==1.4.4)
Requires-Dist: lightning (==2.0.2)
Requires-Dist: lightning-cloud (==0.5.34)
Requires-Dist: lightning-utilities (==0.8.0)
Requires-Dist: llvmlite (==0.40.0)
Requires-Dist: markdown-it-py (==2.2.0)
Requires-Dist: MarkupSafe (==2.1.2)
Requires-Dist: matplotlib (==3.7.1)
Requires-Dist: mdurl (==0.1.2)
Requires-Dist: multidict (==6.0.4)
Requires-Dist: natsort (==8.3.1)
Requires-Dist: networkx (==3.1)
Requires-Dist: numba (==0.57.0)
Requires-Dist: numpy (==1.24.3)
Requires-Dist: ordered-set (==4.1.0)
Requires-Dist: packaging (==23.1)
Requires-Dist: pandas (==2.0.1)
Requires-Dist: pathtools (==0.1.2)
Requires-Dist: patsy (==0.5.3)
Requires-Dist: Pillow (==9.5.0)
Requires-Dist: protobuf (==4.23.0)
Requires-Dist: psutil (==5.9.5)
Requires-Dist: pydantic (==1.10.7)
Requires-Dist: Pygments (==2.15.1)
Requires-Dist: PyJWT (==2.6.0)
Requires-Dist: pynndescent (==0.5.10)
Requires-Dist: pyparsing (==3.0.9)
Requires-Dist: python-dateutil (==2.8.2)
Requires-Dist: python-editor (==1.0.4)
Requires-Dist: python-multipart (==0.0.6)
Requires-Dist: pytorch-lightning (==2.0.2)
Requires-Dist: pytorch-tabnet (==4.0)
Requires-Dist: pytz (==2023.3)
Requires-Dist: PyYAML (==6.0)
Requires-Dist: readchar (==4.0.5)
Requires-Dist: requests (==2.30.0)
Requires-Dist: rich (==13.3.5)
Requires-Dist: s3transfer (==0.6.1)
Requires-Dist: scanpy (==1.9.3)
Requires-Dist: scikit-learn (==1.2.2)
Requires-Dist: scipy (==1.10.1)
Requires-Dist: seaborn (==0.12.2)
Requires-Dist: sentry-sdk (==1.22.2)
Requires-Dist: session-info (==1.0.0)
Requires-Dist: setproctitle (==1.3.2)
Requires-Dist: six (==1.16.0)
Requires-Dist: smmap (==5.0.0)
Requires-Dist: sniffio (==1.3.0)
Requires-Dist: soupsieve (==2.4.1)
Requires-Dist: starlette (==0.22.0)
Requires-Dist: starsessions (==1.3.0)
Requires-Dist: statsmodels (==0.14.0)
Requires-Dist: stdlib-list (==0.8.0)
Requires-Dist: threadpoolctl (==3.1.0)
Requires-Dist: torch (==1.13.1)
Requires-Dist: torchmetrics (==0.11.4)
Requires-Dist: tqdm (==4.65.0)
Requires-Dist: traitlets (==5.9.0)
Requires-Dist: typing-extensions (==4.5.0)
Requires-Dist: tzdata (==2023.3)
Requires-Dist: umap-learn (==0.5.3)
Requires-Dist: urllib3 (==1.26.15)
Requires-Dist: uvicorn (==0.22.0)
Requires-Dist: wandb (==0.15.2)
Requires-Dist: wcwidth (==0.2.6)
Requires-Dist: websocket-client (==1.5.1)
Requires-Dist: websockets (==11.0.3)
Requires-Dist: yarl (==1.9.2)
Requires-Dist: zipp (==3.15.0)

# **SIMS**: Scalable, Interpretable Modeling for Single-Cell RNA-Seq Data Classification

SIMS is a pipeline for building interpretable and accurate classifiers that predict any target variable from single-cell RNA-seq data. The SIMS model is based on [a sequential transformer](https://arxiv.org/abs/1908.07442), a transformer model specifically built for large-scale tabular datasets.

SIMS takes in a list of arbitrarily many expression matrices along with their corresponding target variables. We assume each matrix has the form `cell x gene` (rows are cells, columns are genes), and NOT `gene x cell`, since each training sample is the transcriptome of an individual cell.
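
The orientation requirement can be checked before training. The following is a minimal, library-free sketch; the helper name `ensure_cell_by_gene` and the toy matrix are hypothetical and not part of SIMS. It compares the matrix shape against the known number of cells and genes and transposes when the data arrives as `gene x cell`.

```python
def ensure_cell_by_gene(matrix, n_cells, n_genes):
    """Return `matrix` with rows as cells and columns as genes.

    `matrix` is a list of equal-length rows. If it arrives as gene x cell,
    it is transposed. If n_cells == n_genes the orientation cannot be
    inferred from the shape alone, and the matrix is returned unchanged.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    if (n_rows, n_cols) == (n_cells, n_genes):
        return [list(row) for row in matrix]
    if (n_rows, n_cols) == (n_genes, n_cells):
        # zip(*matrix) swaps rows and columns
        return [list(col) for col in zip(*matrix)]
    raise ValueError(
        f"shape ({n_rows}, {n_cols}) matches neither "
        f"({n_cells}, {n_genes}) nor ({n_genes}, {n_cells})"
    )


# Toy example: 2 genes x 3 cells, i.e. the wrong orientation
gene_by_cell = [
    [1, 0, 3],
    [0, 2, 5],
]
cell_by_gene = ensure_cell_by_gene(gene_by_cell, n_cells=3, n_genes=2)
print(cell_by_gene)
```

For data already stored as an `AnnData` object, the `.T` property returns the transposed object, so the same check can be made against `adata.shape`.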

SIMS is a Python package. We recommend installing it in a virtual environment, for example one managed by [miniconda](https://docs.conda.io/en/latest/miniconda.html), so that you can install packages without affecting your system `python`.

## Installation
If using conda:
1. Create a new virtual environment with `conda create --name=<NAME> python=3.9`
2. Activate the virtual environment with `conda activate <NAME>`

Then, inside your virtual environment of choice:
1. Install the SIMS package with `pip install --use-pep517 git+https://github.com/braingeneers/SIMS.git`
2. Set up the model training code in a `MYFILE.py` file, and run it with `python MYFILE.py`. A tutorial on how to set up training code is shown below.

## Training and inference
To train a model, we can set up a `SIMS` object as follows:

```python 
from scsims import SIMS
from pytorch_lightning.loggers import WandbLogger
logger = WandbLogger(offline=True)

sims = SIMS(data=['my/data/file.h5ad'], class_label='class_label')
sims.setup_trainer(accelerator="gpu", devices=1, logger=logger)
sims.train()
```

This will automatically load your `.h5ad` file, where `class_label` is assumed to be a valid column in the `.obs` attribute. Alternatively, if your labels are stored in a separate CSV file, you can initialize the class as follows:
```python
sims = SIMS(data=['my/data/file.h5ad'], labelfiles=['my/label/file.csv'], class_label='class_label')
sims.train()
```
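
Before passing a label file to SIMS, it can help to confirm that the file actually contains the target column. The sketch below uses only the standard library; the helper name `read_label_column`, the `cell_id` column, and the example rows are hypothetical, and the exact layout SIMS expects for label files is not specified here.

```python
import csv
import io


def read_label_column(csv_file, class_label):
    """Return the values of the `class_label` column from an open CSV file."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or class_label not in reader.fieldnames:
        raise KeyError(
            f"column {class_label!r} not found; available columns: {reader.fieldnames}"
        )
    return [row[class_label] for row in reader]


# In-memory stand-in for 'my/label/file.csv' (hypothetical contents)
example_csv = io.StringIO(
    "cell_id,class_label\n"
    "cell_0,neuron\n"
    "cell_1,astrocyte\n"
    "cell_2,neuron\n"
)
labels = read_label_column(example_csv, "class_label")
print(labels)
```

With a real file, the same function can be called as `read_label_column(open('my/label/file.csv', newline=''), 'class_label')`.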

This will set up the underlying dataloaders, model, model checkpointing, and everything else we need. Model checkpoints will be saved every training epoch. 
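
Because a checkpoint is written every epoch, a run can leave many `.ckpt` files behind. As a rough sketch, the most recently written one can be located with the standard library. The directory name `checkpoints` and the helper `newest_checkpoint` are assumptions for illustration; where checkpoints are actually saved depends on your trainer and logger configuration.

```python
from pathlib import Path


def newest_checkpoint(checkpoint_dir):
    """Return the most recently modified .ckpt file under `checkpoint_dir`, or None."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return None
    # Search subdirectories too, since loggers often nest runs in their own folders
    candidates = sorted(root.rglob("*.ckpt"), key=lambda p: p.stat().st_mtime)
    return candidates[-1] if candidates else None


# "checkpoints" is a hypothetical location; replace it with your own output directory
latest = newest_checkpoint("checkpoints")
print(latest)
```

Note that the newest checkpoint is not necessarily the best one; selecting by validation metric is a separate choice.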

To predict cell types on an unlabeled dataset with a trained model, we load the model checkpoint, point to the label file that we originally trained on, and run the `predict` method on the new data.

```python
sims = SIMS(weights_path='myawesomemodel.ckpt', labelfiles=['my/label/file.csv'], class_label='class_label')

cell_predictions = sims.predict('my/new/unlabeled.h5ad')
```

Finally, to look at the explainability of the model, we similarly run 
```python
explainability_matrix = sims.explain('my/new/unlabeled.h5ad') # this can also be labeled data, of course 
```

## Custom training jobs / logging
To customize the underlying `pl.Trainer` and the SIMS model parameters, we can set up the SIMS model as follows:
```python 
import anndata as an
from pytorch_lightning.loggers import WandbLogger
from pytorch_lightning.callbacks import EarlyStopping, LearningRateMonitor
from scsims import SIMS

# set up the logger to log data to Weights and Biases
wandb_logger = WandbLogger(project="My Project", name="SIMS Model Training")

adata = an.read_h5ad('my/data/file.h5ad')  # load the expression matrix as an AnnData object
sims = SIMS(data=adata, class_label='class_label')
# weight the loss inversely proportional to label frequency; helps learn rare cell types (recommended)
sims.setup_model(n_a=64, n_d=64, weights=sims.weights)
sims.setup_trainer(
    logger=wandb_logger,
    callbacks=[
        EarlyStopping(
            monitor="val_loss",
            patience=50,
        ),
        LearningRateMonitor(logging_interval="epoch"),
    ],
    max_epochs=100,  # pl.Trainer's argument for the maximum number of training epochs
)
sims.train()
```
This will train the SIMS model on the given expression data, using the `class_label` column as the target variable.
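
To illustrate what weighting the loss inversely to label frequency means, here is a small sketch using the common "balanced" formula `n_samples / (n_classes * count)`. This is an assumption for illustration only: how `sims.weights` is actually computed is not shown here, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Toy labels: one common, one intermediate, and one rare cell type
labels = ["neuron"] * 6 + ["astrocyte"] * 3 + ["microglia"] * 1
weights = inverse_frequency_weights(labels)
for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label}: {weight:.3f}")
```

Rare classes receive larger weights, so misclassifying a rare cell type contributes more to the loss than misclassifying a common one.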

## Using SIMS inside GitHub Codespaces
If you only need predictions from an already trained model, GitHub Codespaces is the recommended way to use this tool. You can also train on smaller datasets there, but the compute resources offered in Codespaces are modest.
To use this tool in GitHub Codespaces, start by forking the repo into your GitHub account. Then create a new codespace with the SIMS repo as the repository of choice.
Once inside the newly created environment, pull the latest SIMS image:
```bash
docker pull jmlehrer/sims:latest
```
Run the Docker container, mounting the local folder that contains your datasets and model checkpoints into the container's filesystem:
```bash
docker run -it -v /path/to/local/folder:/path/in/container jmlehrer/sims:latest /bin/bash
```
Run `main.py` to check that the installation works. You can edit this file as shown in the sections above to perform the different tasks.
```bash
python main.py
```


