Metadata-Version: 2.4
Name: folderops
Version: 0.1.1
Summary: Python utilities for dataset organization and preprocessing
Author: Ahamed
License: MIT
Keywords: dataset,images,ml,data-preprocessing
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# folderops

`folderops` is a lightweight Python package for common dataset organization tasks in machine learning workflows. It is meant to be imported directly in notebooks, research code, and training scripts.

## Features

- Split image datasets into train, validation, and test folders
- Merge multiple image folders into a single directory
- Organize images into class folders using a CSV label file
- Create nested folder structures from Python dictionaries
- Support common image formats such as `.jpg`, `.jpeg`, `.png`, `.bmp`, `.gif`, `.tif`, `.tiff`, and `.webp`

## Installation

```bash
pip install folderops
```

For local development, install in editable mode from the repository root:

```bash
pip install -e .
```

## Quick Start

```python
from folderops import split_dataset, merge_folders, organize_by_labels, create_structure

split_dataset(
    source="images",
    output="dataset",
    train_ratio=0.7,
    val_ratio=0.15,
    test_ratio=0.15,
    seed=42,
)

merge_folders(
    folders=["dataset1/images", "dataset2/images", "dataset3/images"],
    output="merged_images",
)

organize_by_labels(
    image_dir="images",
    label_file="labels.csv",
    output="organized_dataset",
)

structure = {
    "dataset": {
        "train": {},
        "val": {},
        "test": {}
    }
}
create_structure(structure)
```

## Public API

### `split_dataset`

Split images from the `source` folder into `train`, `val`, and `test` subdirectories under `output`.

```python
split_dataset(
    source="images",
    output="dataset",
    train_ratio=0.7,
    val_ratio=0.15,
    test_ratio=0.15,
    seed=42,
    mode="copy",
    extensions=(".jpg", ".png"),
)
```

Key behavior:

- Shuffles files before splitting
- Produces reproducible splits when a fixed `seed` is given
- Supports `copy` and `move` modes
- Validates that split ratios sum to `1.0`
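The split behavior described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the `folderops` implementation: `plan_split` and `is_image` are hypothetical helpers, the extension check is assumed to be case-insensitive, and files left over from integer rounding are assumed to go to `test`. It only plans the split; copying or moving the files would follow.

```python
import math
import random
from pathlib import Path

# Sketch only, not the folderops implementation.
# Assumed default extensions, mirroring the formats listed under Features.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff", ".webp")


def is_image(name, extensions=IMAGE_EXTENSIONS):
    """Hypothetical helper: case-insensitive extension check (an assumption)."""
    return Path(name).suffix.lower() in {ext.lower() for ext in extensions}


def plan_split(filenames, train_ratio=0.7, val_ratio=0.15, test_ratio=0.15,
               seed=None, extensions=IMAGE_EXTENSIONS):
    """Hypothetical helper: assign image filenames to train/val/test without touching disk."""
    # Compare with a tolerance because float ratios may not sum to exactly 1.0.
    if not math.isclose(train_ratio + val_ratio + test_ratio, 1.0, abs_tol=1e-9):
        raise ValueError("train_ratio + val_ratio + test_ratio must sum to 1.0")

    # Sort first so the shuffled order depends only on the names and the seed.
    files = sorted(f for f in filenames if is_image(f, extensions))
    random.Random(seed).shuffle(files)  # local RNG leaves the global random state alone

    n_train = int(len(files) * train_ratio)
    n_val = int(len(files) * val_ratio)
    return {
        "train": files[:n_train],
        "val": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],  # assumed: rounding leftovers go to test
    }
```

Sorting before shuffling makes the result depend only on the file names and the seed, not on the order in which the operating system happens to list directory entries.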

### `merge_folders`

Merge images from multiple folders into one output directory.

```python
merge_folders(
    folders=["dataset1/images", "dataset2/images"],
    output="merged_images",
    mode="copy",
)
```

Key behavior:

- Never overwrites files: when a filename is already taken, a numeric suffix is added to the stem (for example, `image.jpg`, then `image_1.jpg`, then `image_2.jpg`)
- Supports `copy` and `move` modes
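One way to implement the duplicate-renaming rule is a counter that appends `_1`, `_2`, and so on to the file stem until the name is free. The sketch below assumes that scheme (including the exact suffix format) and uses hypothetical helpers `unique_name` and `plan_merge`; the package's own naming logic may differ.

```python
from pathlib import Path

# Sketch only: the "_1", "_2" suffix scheme is assumed from the examples above.


def unique_name(filename, taken):
    """Hypothetical helper: return filename, or stem_N.ext with the smallest free N."""
    if filename not in taken:
        return filename
    path = Path(filename)
    counter = 1
    while f"{path.stem}_{counter}{path.suffix}" in taken:
        counter += 1
    return f"{path.stem}_{counter}{path.suffix}"


def plan_merge(listings):
    """Hypothetical helper: map (folder, [names]) listings to collision-free output names."""
    taken = set()
    plan = []
    for folder, names in listings:
        for name in names:
            new_name = unique_name(name, taken)
            taken.add(new_name)  # reserve it so later files cannot reuse it
            plan.append((folder, name, new_name))
    return plan
```

In a real merge, `taken` would also be seeded with names already present in the output folder, and each planned tuple would then drive a `shutil.copy2` or `shutil.move` call depending on `mode`.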

### `organize_by_labels`

Organize images into class folders using a CSV file with `filename,label` rows.

Example `labels.csv`:

```csv
img1.jpg,cat
img2.jpg,dog
img3.jpg,cat
```

Usage:

```python
organize_by_labels(
    image_dir="images",
    label_file="labels.csv",
    output="organized_dataset",
    mode="copy",
)
```

Key behavior:

- Creates class folders automatically
- Checks that each listed image exists in `image_dir` before copying or moving it
- Supports a configurable CSV delimiter
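The label file can be read with the standard `csv` module, which accepts any iterable of strings and a `delimiter` argument. The sketch below assumes the CSV has no header row, as in the example above, and that missing images are reported rather than raising; `group_by_label` and `transfer_by_label` are hypothetical names, not the `folderops` API.

```python
import csv
import shutil
from collections import defaultdict
from pathlib import Path

# Sketch only: assumes a headerless `filename,label` CSV, as in the example above.


def group_by_label(lines, delimiter=","):
    """Hypothetical helper: group filenames by label from an iterable of CSV lines."""
    groups = defaultdict(list)
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or malformed rows
        groups[row[1].strip()].append(row[0].strip())
    return dict(groups)


def transfer_by_label(groups, image_dir, output, mode="copy"):
    """Hypothetical helper: copy or move images into output/<label>/; return missing names."""
    if mode not in ("copy", "move"):
        raise ValueError("mode must be 'copy' or 'move'")
    missing = []
    for label, names in groups.items():
        target_dir = Path(output) / label
        target_dir.mkdir(parents=True, exist_ok=True)  # create class folders on demand
        for name in names:
            src = Path(image_dir) / name
            if not src.is_file():  # validate existence before transferring
                missing.append(name)
                continue
            if mode == "move":
                shutil.move(str(src), str(target_dir / name))
            else:
                shutil.copy2(src, target_dir / name)
    return missing
```

Typical use would open the file with `open("labels.csv", newline="")`, as the `csv` module recommends, pass it to `group_by_label`, and then call `transfer_by_label(groups, "images", "organized_dataset")`.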

### `create_structure`

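Create nested directories recursively from a dictionary. Each key is a folder name and each value is a dictionary of its subfolders (an empty dictionary means no subfolders). The optional `root` argument sets the directory in which the structure is created.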

```python
structure = {
    "dataset": {
        "train": {},
        "val": {},
        "test": {}
    }
}

create_structure(structure)
create_structure(structure, root="project_data")
```
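Recursive directory creation from a nested dictionary can be sketched by walking the dictionary and calling `os.makedirs` with `exist_ok=True`. This assumes an empty dict marks a leaf folder and that `root` defaults to the current directory; `iter_dirs` and `make_dirs` are hypothetical helpers, not the package's code.

```python
import os

# Sketch only: assumes an empty dict marks a leaf folder and root defaults to ".".


def iter_dirs(structure, root="."):
    """Hypothetical helper: yield every directory path described by a nested dict."""
    for name, children in structure.items():
        path = os.path.join(root, name)
        yield path
        if children:
            yield from iter_dirs(children, path)  # recurse into subfolders


def make_dirs(structure, root="."):
    """Hypothetical helper: create the directories; existing ones are left untouched."""
    for path in iter_dirs(structure, root):
        os.makedirs(path, exist_ok=True)
```

Separating path generation from creation makes it easy to preview or test a layout before touching the disk, and `exist_ok=True` makes repeated calls harmless.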

## Project Layout

```text
folderops/
├── folderops/
│   ├── __init__.py
│   ├── merger.py
│   ├── organizer.py
│   ├── splitter.py
│   ├── structure.py
│   └── utils.py
├── LICENSE
├── pyproject.toml
└── README.md
```

## Build and Publish

Build the source distribution and wheel (requires the `build` package):

```bash
python -m build
```

Upload the built artifacts to PyPI (requires `twine`):

```bash
twine upload dist/*
```

## Development Notes

- Python 3.8+
- No command-line interface; intended for import-based use only
- Uses standard library modules only

## License

MIT License
