Metadata-Version: 2.1
Name: lssm
Version: 0.1.0
Summary: Modelling pipeline to develop and monitor Large Soil Spectral Models (LSSM)
Home-page: https://github.com/franckalbinet/lssm
Author: Franck Albinet
Author-email: franckalbinet@gmail.com
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: tqdm
Requires-Dist: pandas
Requires-Dist: scikit-learn
Requires-Dist: seaborn
Requires-Dist: torch
Requires-Dist: torchvision
Requires-Dist: torchaudio
Requires-Dist: timm
Requires-Dist: torcheval
Requires-Dist: fastprogress
Requires-Dist: fastdownload
Requires-Dist: palettable
Provides-Extra: dev

# Large Soil Spectral Models (LSSM)


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

This Python package makes it possible to reproduce the research carried out by
[Franck Albinet](https://www.linkedin.com/in/franckalbinet) as part of a
PhD at [KU Leuven](https://www.kuleuven.be/) titled
**“Multiscale Characterization of Exchangeable Potassium Content in Soil
to Remediate Agricultural Land Affected by Radioactive Contamination
using Machine Learning, Soil Spectroscopy and Remote Sensing”**.

**Our first paper** [Albinet, F., Peng, Y., Eguchi, T., Smolders, E.,
Dercon, G., 2022. Prediction of exchangeable potassium in soil through
mid-infrared spectroscopy and deep learning: From prediction to
explainability. Artificial Intelligence in Agriculture 6,
230–241.](https://www.sciencedirect.com/science/article/pii/S2589721722000186)
investigated whether exchangeable potassium in soil can be predicted from
large Mid-infrared soil spectral libraries using Deep Learning. The code
is available [here](https://github.com/franckalbinet/mirzai).

We are now **exploring the potential to characterize and predict
exchangeable potassium using both Near- and Mid-infrared soil
spectroscopy, with a focus on leveraging advanced Deep Learning models
such as ResNet and ViT transformers through transfer learning**.

*Our Deep Learning pipeline is primarily based on the approach described
by [Jeremy Howard](https://github.com/fastai/course22p2)*.

## Install

``` sh
pip install lssm
```

## Getting started

We demonstrate a typical workflow below to showcase our method.

``` python
from pathlib import Path
from functools import partial

from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

from torch import optim, nn

import timm

from torcheval.metrics import R2Score
from torch.optim import lr_scheduler
from lssm.loading import load_ossl
from lssm.learner import Learner
from lssm.preprocessing import ToAbsorbance, ContinuumRemoval, Log1p
from lssm.dataloaders import SpectralDataset, get_dls
from lssm.callbacks import (MetricsCB, BatchSchedCB, BatchTransformCB,
                            DeviceCB, TrainCB, ProgressCB)
from lssm.transforms import GADFTfm, _resizeTfm, StatsTfm
```

### Loading training & validation data

1.  Load a pre-trained, state-of-the-art (SOTA) Deep Learning model from
    the `timm` Python package:

``` python
model_name = 'resnet18'
model = timm.create_model(model_name, pretrained=True, in_chans=1, num_classes=1)
```

2.  Automatically download large spectral libraries developed by our
    colleagues at [WCRC](https://www.woodwellclimate.org). We focus on
    exchangeable potassium in the example below:

``` python
analytes = 'k.ext_usda.a725_cmolc.kg'
data = load_ossl(analytes, spectra_type='visnir')
X, y, X_names, smp_idx, ds_name, ds_label = data
```

    Reading & selecting data ...

3.  Preprocess the features (conversion to absorbance, continuum removal) and the target (`log1p` transform):

``` python
X = Pipeline([('to_abs', ToAbsorbance()), 
              ('cr', ContinuumRemoval(X_names))]).fit_transform(X)

y = Log1p().fit_transform(y)
```

    100%|██████████| 44489/44489 [00:15<00:00, 2850.84it/s]

4.  Split the data into training and validation sets, stratified by `ds_name`:

``` python
n_smp = 5000 # For demo purposes only (in reality we have > 50K)
X_train, X_valid, y_train, y_valid = train_test_split(X[:n_smp, :], y[:n_smp], 
                                                      test_size=0.1,
                                                      stratify=ds_name[:n_smp], 
                                                      random_state=41)
```

5.  Finally, create the custom PyTorch datasets and wrap them in dataloaders:

``` python
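# Build one dataset per split (X_, y_ avoid reusing the names X and y defined earlier)
train_ds, valid_ds = [SpectralDataset(X_, y_)
                      for X_, y_ in [(X_train, y_train), (X_valid, y_valid)]]

# Then wrap the datasets in PyTorch dataloaders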
dls = get_dls(train_ds, valid_ds, bs=32)
```
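
As a rough illustration of the preprocessing in step 3, the sketch below applies the standard reflectance-to-absorbance conversion (`A = -log10(R)`) and a `log1p` transform to toy values with plain NumPy. It assumes that `ToAbsorbance` and `Log1p` behave like these textbook formulas, which is not verified against the `lssm` source. Continuum removal is left out, and the array names and values are hypothetical.

``` python
import numpy as np

# Hypothetical reflectance values in (0, 1] for one sample, three wavenumbers
reflectance = np.array([[0.10, 0.50, 1.00]])

# Assumption: absorbance is computed as A = -log10(R)
absorbance = -np.log10(reflectance)

# Hypothetical target values (e.g. exchangeable potassium), all >= 0
y = np.array([0.0, 0.5, 3.2])

# Assumption: the target transform is log(1 + y), which compresses large values
y_log = np.log1p(y)

# The inverse transform brings values back to the original scale
y_back = np.expm1(y_log)
```

If the target is indeed transformed this way, predictions made in log space would need `np.expm1` to be expressed in the original units.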

### Training

``` python
epochs = 1
lr = 5e-3

# We use `r2` alone to assess performance
metrics = MetricsCB(r2=R2Score())

# We use the One Cycle learning rate scheduling approach
tmax = epochs * len(dls.train)
sched = partial(lr_scheduler.OneCycleLR, max_lr=lr, total_steps=tmax)

# A series of preprocessing steps performed on the GPU:
#    - move the batch to the GPU
#    - transform 1D spectra to 2D using Gramian Angular Difference Field (GADF)
#    - resize the 2D version
#    - apply pre-trained model stats
xtra = [BatchSchedCB(sched)]
gadf = BatchTransformCB(GADFTfm())
resize = BatchTransformCB(_resizeTfm)
stats = BatchTransformCB(StatsTfm(model.default_cfg))

cbs = [DeviceCB(), gadf, resize, stats, TrainCB(), 
       metrics, ProgressCB(plot=False)]

learn = Learner(model, dls, nn.MSELoss(), lr=lr, 
                cbs=cbs+xtra, opt_func=optim.AdamW)

learn.fit(epochs)
```
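
The GADF step mentioned in the comments above turns each 1D spectrum into a 2D image so that image models such as ResNet can be applied. A minimal NumPy sketch of the usual definition follows: rescale the values to [-1, 1], take their arccos as angles, and fill the matrix with `sin(phi_i - phi_j)`. The function `gadf` and the toy spectrum are hypothetical, and `GADFTfm` may differ in details such as rescaling or batching.

``` python
import numpy as np

def gadf(spectrum):
    """Gramian Angular Difference Field of a 1D, non-constant series."""
    x = np.asarray(spectrum, dtype=float)
    # Rescale to [-1, 1] so that arccos is defined (fails for a constant series)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    # Guard against tiny floating-point overshoots outside [-1, 1]
    x = np.clip(x, -1.0, 1.0)
    # Encode each value as an angle
    phi = np.arccos(x)
    # Entry (i, j) is sin(phi_i - phi_j)
    return np.sin(phi[:, None] - phi[None, :])

# Hypothetical 4-point spectrum
spectrum = np.array([0.2, 0.5, 0.9, 0.4])
image = gadf(spectrum)
print(image.shape)  # (4, 4)
```

The result is antisymmetric with a zero diagonal, and an n-point spectrum yields an n x n image, which is presumably why a resize step follows before the batch reaches the pre-trained model.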

