Metadata-Version: 2.4
Name: rschip
Version: 0.4.2
Summary: Prepare satellite images and training data for use with deep learning models
Author-email: Tom Wilson <thomaswilson81@gmail.com>
License: MIT
Keywords: satellite,deep learning,tiling,segmentation,geospatial
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: GIS
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: rasterio
Requires-Dist: numpy
Requires-Dist: geopandas
Requires-Dist: shapely
Requires-Dist: tqdm
Requires-Dist: pandas
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Provides-Extra: lint
Requires-Dist: flake8; extra == "lint"
Requires-Dist: black; extra == "lint"
Dynamic: license-file

# rschip
![PyPI version](https://img.shields.io/pypi/v/rschip)
![License](https://img.shields.io/github/license/tomwilsonsco/rs-chip)
![Build Status](https://img.shields.io/github/actions/workflow/status/tomwilsonsco/rs-chip/main.yml?branch=main)
![codecov](https://codecov.io/github/tomwilsonsco/rs-chip/branch/main/graph/badge.svg?token=W27NY55T4B)

Split satellite images into smaller fixed-size tiles for input into convolutional neural networks (CNNs) or vision 
transformers (ViTs) such as [Segment Anything](https://arxiv.org/abs/2304.02643).

## Features

- **Tile Satellite Images**: Split large satellite images into smaller chips of specified dimensions. Optionally apply 
  min-max normalisation or standard scaling before the chips are written.
- **Mask Segmentation**: Generate segmentation mask images from geopackage or shapefile features for supervised 
  segmentation, e.g. using [U-Net](https://arxiv.org/abs/1505.04597).
- **Check Background Chips**: Identify image chips containing only background. Useful when preparing training 
  and testing datasets.

## Installation

Install rschip with pip:

```bash
pip install rschip
```

Requires `rasterio`, `numpy`, `geopandas`, `shapely`, `tqdm`, and `pandas`.

## Usage

### 1. ImageChip Class
The `ImageChip` class provides functionality for creating tiles (also known as chips) from large satellite images.

```python
from rschip import ImageChip

# Initialize the ImageChip instance for 128 by 128 tiles
image_chipper = ImageChip(
    input_image_path="path/to/large_image.tif",
    output_path="path/to/output_directory_image",
    pixel_dimensions=128,
    offset=64,
)

# Set a min-max normaliser
# e.g. for 16-bit Sentinel-2 RGB you might use
image_chipper.set_normaliser(min_val=500, max_val=3000)

# Generate chips
image_chipper.chip_image()
```
Each resulting tile is named using a suffix that represents the bottom left `(x, y)`
pixel coordinate position. By default, the prefix of each tile name is taken from the input image file name 
(`input_image_path`), unless you specify `output_name`.
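
To picture how `pixel_dimensions` and `offset` could interact, here is a minimal sketch in plain Python. It assumes that `offset` acts as the step between neighbouring tile origins and that partial tiles at the image edge are dropped; neither assumption is confirmed by this README, and `tile_origins` is a hypothetical helper, not part of rschip.

```python
def tile_origins(width, height, size, step):
    """Return (x, y) pixel origins of all tiles that fit fully inside the image.

    Assumption: `step` plays the role of ImageChip's `offset`, and tiles that
    would extend past the image edge are skipped.
    """
    xs = range(0, width - size + 1, step)
    ys = range(0, height - size + 1, step)
    return [(x, y) for y in ys for x in xs]


# A 256 x 256 pixel image cut into 128-pixel tiles with a 64-pixel step
origins = tile_origins(width=256, height=256, size=128, step=64)
print(len(origins))  # prints 9
```

With a step of half the tile size, neighbouring tiles overlap by 50%, so the same image yields more tiles than a non-overlapping grid.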

Setting `use_multiprocessing=True` (the default) speeds up chipping by using multiple cores.
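
The min-max normaliser set in the example above can be pictured with the short sketch below. It illustrates the general technique only: whether rschip clips out-of-range values, and which output range or data type it writes, are assumptions here, and `min_max_normalise` is a hypothetical helper rather than a library function.

```python
def min_max_normalise(values, min_val, max_val):
    """Clip each value to [min_val, max_val], then rescale to the 0-1 range."""
    span = max_val - min_val
    return [(min(max(v, min_val), max_val) - min_val) / span for v in values]


# Example reflectance-like values, using the same limits as the example above
pixels = [250, 500, 1750, 3000, 4200]
print(min_max_normalise(pixels, min_val=500, max_val=3000))
# prints [0.0, 0.0, 0.5, 1.0, 1.0]
```

Values below `min_val` or above `max_val` saturate at 0 and 1 respectively, which is why the limits are usually chosen to bracket the useful range of the sensor data.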

### 2. SegmentationMask Class
The `SegmentationMask` class creates a segmentation mask image from geopackage or shapefile features, using an input image as the extent and pixel size reference.

Once the segmentation mask has been created, it can also be split into tiles. Some deep learning 
frameworks expect images and their corresponding masks to have the same file name in separate directories. The `output_name` argument of `ImageChip` can ensure this is the case.

```python
from rschip import SegmentationMask, ImageChip

# Initialize the SegmentationMask
seg_mask = SegmentationMask(
    input_image_path="path/to/large_image.tif",
    input_features_path="path/to/geopackage_features.gpkg",
    output_path="path/to/output_mask.tif",
    class_field="ml_class"
)

# Generate segmentation mask image
seg_mask.create_mask()

# Chip the segmentation image to match satellite image
image_chipper = ImageChip(
    input_image_path="path/to/output_mask.tif",
    output_path="path/to/output_directory_mask",
    output_name="large_image",
    pixel_dimensions=128,
    offset=64,
)
image_chipper.chip_image()
```
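
After chipping both the image and the mask, it can be worth confirming that the two output directories really do contain the same file names. The sketch below uses only the standard library; `unmatched_chips` is a hypothetical helper and the `*.tif` pattern is an assumption about the chip file extension.

```python
from pathlib import Path


def unmatched_chips(image_dir, mask_dir, pattern="*.tif"):
    """Return the file names found in only one of the two chip directories.

    The result is a pair of sorted lists: names only in `image_dir`,
    then names only in `mask_dir`.
    """
    image_names = {p.name for p in Path(image_dir).glob(pattern)}
    mask_names = {p.name for p in Path(mask_dir).glob(pattern)}
    return sorted(image_names - mask_names), sorted(mask_names - image_names)
```

Calling `unmatched_chips("path/to/output_directory_image", "path/to/output_directory_mask")` should return two empty lists when every image chip has a mask chip of the same name.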

### 3. CheckBackgroundOnly Class
The `CheckBackgroundOnly` class provides functionality to list image chips that contain only background. Filtering out images only containing background helps to prepare a dataset more suitable for training models.
 
```python
from rschip import CheckBackgroundOnly

# Initialize the CheckBackgroundOnly instance
checker = CheckBackgroundOnly(background_val=0, non_background_min=1)

# Find chips with only background
checker.check_background_chips(
    class_chips_dir="path/to/mask_directory",
    image_chips_dir="path/to/image_directory"
)
```
By default, each image chip and its mask are assumed to share the same file name, as in example 2 above. If that is
not the case, pass the `masks_prefix` and `images_prefix` arguments. These prefix strings are used to pair each image with
its mask via the bottom left (x, y) indices in the file names generated by `ImageChip.chip_image()`.
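
As a rough illustration of what "background only" means for a single mask chip, consider the sketch below. It assumes that `non_background_min` is the minimum number of non-background pixels a chip needs in order to be kept; that interpretation is an assumption, and `is_background_only` is a hypothetical helper that works on nested lists rather than on files.

```python
def is_background_only(mask_rows, background_val=0, non_background_min=1):
    """Return True if fewer than `non_background_min` pixels differ from `background_val`."""
    non_background = sum(
        1 for row in mask_rows for value in row if value != background_val
    )
    return non_background < non_background_min


empty_chip = [[0, 0], [0, 0]]
labelled_chip = [[0, 2], [0, 0]]
print(is_background_only(empty_chip))     # prints True
print(is_background_only(labelled_chip))  # prints False
```

Raising the threshold (for example `non_background_min=2`) would also flag `labelled_chip`, which is one way to discard chips that contain only a few labelled pixels.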

### 4. Create training, validation, test dataset
The `DatasetSplitter` class can be used to split a directory of image chips into a training, a validation, and (optionally) a test set. The process creates a dataset directory as follows:

```
final_dataset_output/
└── dataset/
    ├── images/
    │   ├── train/
    │   │   ├── image_chip_1.tif
    │   │   └── ...
    │   ├── val/
    │   │   ├── image_chip_2.tif
    │   │   └── ...
    │   └── test/
    │       ├── image_chip_3.tif
    │       └── ...
    └── masks/
        ├── train/
        │   ├── image_chip_1.tif
        │   └── ...
        ├── val/
        │   ├── image_chip_2.tif
        │   └── ...
        └── test/
            ├── image_chip_3.tif
            └── ...
```

By default this process uses `CheckBackgroundOnly` to first find chips containing only background and excludes them from the list of images to split. This means you can skip straight to this step without explicitly running `CheckBackgroundOnly` first if you wish.

```python
from rschip import DatasetSplitter

# Initialize the DatasetSplitter
splitter = DatasetSplitter(
    image_dir="path/to/image_chips",
    mask_dir="path/to/mask_chips",
    output_dir="path/to/final_dataset_output",
    train_ratio=0.7,
    val_ratio=0.2,
    test_ratio=0.1,
    seed=42,
    filter_background_only=True,
)

# Run the split
splitter.split()
```
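
The effect of the ratio and `seed` arguments can be pictured with a seeded shuffle followed by slicing. This is a minimal sketch of the general technique, not the `DatasetSplitter` implementation: `split_names` is a hypothetical helper, and the rounding rule (the training set receives whatever is left over) is an assumption.

```python
import random


def split_names(names, train_ratio, val_ratio, test_ratio=0.0, seed=42):
    """Shuffle file names reproducibly and divide them into train/val/test lists."""
    if abs(train_ratio + val_ratio + test_ratio - 1.0) > 1e-9:
        raise ValueError("train, val and test ratios must sum to 1")
    shuffled = sorted(names)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_ratio)
    n_test = round(len(shuffled) * test_ratio)
    n_train = len(shuffled) - n_val - n_test  # train takes the remainder
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


chip_names = [f"chip_{i}.tif" for i in range(10)]
parts = split_names(chip_names, train_ratio=0.7, val_ratio=0.2, test_ratio=0.1)
print({key: len(value) for key, value in parts.items()})
# prints {'train': 7, 'val': 2, 'test': 1}
```

Because the shuffle is seeded, repeating the call with the same names and seed gives the same split, which keeps experiments comparable across runs.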

## License
This project is licensed under the MIT License - see the LICENSE file for details.
