Metadata-Version: 2.4
Name: datasety
Version: 0.11.0
Summary: CLI tool for dataset preparation: resize, caption, and synthetic image generation
Project-URL: Homepage, https://github.com/kontextox/datasety
Project-URL: Repository, https://github.com/kontextox/datasety
Project-URL: Issues, https://github.com/kontextox/datasety/issues
Author: kontextox
License-Expression: MIT
License-File: LICENSE
Keywords: captioning,cli,dataset,diffusers,florence-2,image-editing,image-processing,machine-learning,synthetic
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Image Processing
Requires-Python: >=3.10
Requires-Dist: pillow>=9.0.0
Provides-Extra: all
Requires-Dist: accelerate; extra == 'all'
Requires-Dist: diffusers>=0.32.0; extra == 'all'
Requires-Dist: einops; extra == 'all'
Requires-Dist: sentencepiece; extra == 'all'
Requires-Dist: timm; extra == 'all'
Requires-Dist: torch>=2.0.0; extra == 'all'
Requires-Dist: transformers>=4.38.0; extra == 'all'
Provides-Extra: caption
Requires-Dist: einops; extra == 'caption'
Requires-Dist: timm; extra == 'caption'
Requires-Dist: torch>=2.0.0; extra == 'caption'
Requires-Dist: transformers>=4.38.0; extra == 'caption'
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: synthetic
Requires-Dist: accelerate; extra == 'synthetic'
Requires-Dist: diffusers>=0.32.0; extra == 'synthetic'
Requires-Dist: sentencepiece; extra == 'synthetic'
Requires-Dist: torch>=2.0.0; extra == 'synthetic'
Requires-Dist: transformers>=4.38.0; extra == 'synthetic'
Description-Content-Type: text/markdown

# datasety

CLI tool for dataset preparation: resize, caption, and synthetic image generation.

## Installation

```bash
pip install datasety
```

Install with specific features:

```bash
pip install "datasety[caption]"     # Florence-2 captioning
pip install "datasety[synthetic]"   # Qwen image editing
pip install "datasety[all]"         # All features
```

## Usage

### Resize Images

Resize and crop images to a target resolution:

```bash
datasety resize --input ./images --output ./resized --resolution 768x1024
```

**Options:**

| Option                  | Description                                               | Default             |
| ----------------------- | --------------------------------------------------------- | ------------------- |
| `--input`, `-i`         | Input directory                                           | (required)          |
| `--output`, `-o`        | Output directory                                          | (required)          |
| `--resolution`, `-r`    | Target resolution (WIDTHxHEIGHT)                          | (required)          |
| `--crop-position`       | Crop position: `top`, `center`, `bottom`, `left`, `right` | `center`            |
| `--input-format`        | Comma-separated formats                                   | `jpg,jpeg,png,webp` |
| `--output-format`       | Output format: `jpg`, `png`, `webp`                       | `jpg`               |
| `--output-name-numbers` | Rename files to 1.jpg, 2.jpg, ...                         | `false`             |

**Example:**

```bash
datasety resize \
    --input ./raw_photos \
    --output ./dataset \
    --resolution 1024x1024 \
    --crop-position top \
    --output-format jpg \
    --output-name-numbers
```

**How it works:**

1. Finds all images matching input formats
2. Skips images where either dimension is smaller than target
3. Resizes proportionally so the image just covers the target (one side matches the target exactly, the other is equal or larger)
4. Crops from the specified position to exact dimensions
5. Saves with high quality (95% for jpg/webp)
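
The geometry behind steps 2 to 4 can be sketched in a few lines of Python. This is an illustration only, not datasety's actual code: the function name `plan_resize`, the rounding, and the rule that `top`/`bottom` only affect the vertical offset while `left`/`right` only affect the horizontal one are all assumptions.

```python
def plan_resize(src_w, src_h, dst_w, dst_h, position="center"):
    """Return (resized_size, crop_box), or None if the image would be skipped.

    Hypothetical helper for illustration; not part of the datasety API.
    """
    # Step 2: skip images that are smaller than the target in either dimension.
    if src_w < dst_w or src_h < dst_h:
        return None

    # Step 3: scale proportionally so the image just covers the target.
    scale = max(dst_w / src_w, dst_h / src_h)
    new_w = max(dst_w, round(src_w * scale))
    new_h = max(dst_h, round(src_h * scale))

    # Step 4: choose the crop offset from the requested position.
    # Assumption: any axis not named by the position is centered.
    extra_w = new_w - dst_w
    extra_h = new_h - dst_h
    left = {"left": 0, "right": extra_w}.get(position, extra_w // 2)
    top = {"top": 0, "bottom": extra_h}.get(position, extra_h // 2)

    # The box uses the (left, upper, right, lower) convention.
    return (new_w, new_h), (left, top, left + dst_w, top + dst_h)
```

With Pillow, the two returned values could be passed to `Image.resize` and `Image.crop` respectively. For a 2000x3000 source, a 1024x1024 target, and `--crop-position top`, this sketch resizes to 1024x1536 and keeps the top 1024 rows.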

### Generate Captions

Generate captions for images using Microsoft's Florence-2 model:

```bash
datasety caption --input ./images --output ./captions --florence-2-large
```

**Options:**

| Option               | Description                                        | Default                   |
| -------------------- | -------------------------------------------------- | ------------------------- |
| `--input`, `-i`      | Input directory                                    | (required)                |
| `--output`, `-o`     | Output directory for .txt files                    | (required)                |
| `--device`           | `cpu` or `cuda`                                    | `cpu`                     |
| `--trigger-word`     | Text to prepend to captions                        | (none)                    |
| `--prompt`           | Florence-2 task prompt                             | `<MORE_DETAILED_CAPTION>` |
| `--model`            | Any HuggingFace model (overrides base/large flags) | (none)                    |
| `--num-beams`        | Beam search width (1 = greedy)                     | `3`                       |
| `--florence-2-base`  | Use base model (0.23B, faster)                     |                           |
| `--florence-2-large` | Use large model (0.77B, better)                    | (default)                 |

**Available prompts:**

- `<CAPTION>` - Brief caption
- `<DETAILED_CAPTION>` - Detailed caption
- `<MORE_DETAILED_CAPTION>` - Most detailed caption (default)

**Examples:**

```bash
datasety caption \
    --input ./dataset \
    --output ./dataset \
    --device cuda \
    --trigger-word "photo of sks person," \
    --florence-2-large

# Select the model explicitly by its Hugging Face ID
datasety caption \
    --input ./dataset \
    --output ./dataset \
    --device cuda \
    --model "microsoft/Florence-2-large"
```

This creates a `.txt` file for each image with the generated caption.
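
As a rough sketch of what that output might look like, the snippet below builds the caption text and the target path for one image. The helper names are hypothetical, and two details are assumptions rather than documented behavior: the `.txt` file reuses the image's base name, and the trigger word is joined to the caption with a single space.

```python
from pathlib import Path


def build_caption(caption, trigger_word=None):
    """Prepend the optional trigger word to a generated caption.

    Assumption: trigger word and caption are separated by one space.
    """
    caption = caption.strip()
    if trigger_word:
        return f"{trigger_word.strip()} {caption}"
    return caption


def caption_path(image_path, output_dir):
    """Return the .txt path for an image.

    Assumption: the caption file shares the image's base name.
    """
    return Path(output_dir) / (Path(image_path).stem + ".txt")


if __name__ == "__main__":
    text = build_caption("a man standing on a beach", "photo of sks person,")
    print(caption_path("dataset/12.jpg", "dataset"), "<-", text)
```

Under these assumptions, writing captions into the same directory as the images (as in the examples above) yields image and caption pairs such as `12.jpg` and `12.txt`, which is the layout many LoRA trainers expect.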

### Generate Synthetic Images

Generate synthetic variations of images using Qwen-Image-Edit:

```bash
datasety synthetic --input ./images --output ./synthetic --prompt "add a winter hat"
```

**Options:**

| Option              | Description                         | Default                     |
| ------------------- | ----------------------------------- | --------------------------- |
| `--input`, `-i`     | Input directory                     | (required)                  |
| `--output`, `-o`    | Output directory                    | (required)                  |
| `--prompt`, `-p`    | Edit prompt                         | (required)                  |
| `--model`           | Model to use                        | `Qwen/Qwen-Image-Edit-2511` |
| `--device`          | `cpu` or `cuda`                     | `cuda`                      |
| `--steps`           | Number of inference steps           | `40`                        |
| `--cfg-scale`       | Guidance scale                      | `1.0`                       |
| `--true-cfg-scale`  | True CFG scale                      | `4.0`                       |
| `--negative-prompt` | Negative prompt                     | `" "`                       |
| `--num-images`      | Images to generate per input        | `1`                         |
| `--seed`            | Random seed for reproducibility     | (random)                    |
| `--output-format`   | Output format: `png`, `jpg`, `webp` | `png`                       |

**Examples:**

```bash
datasety synthetic \
    --input ./dataset \
    --output ./synthetic \
    --prompt "add sunglasses to the person, keep everything else the same" \
    --device cuda \
    --steps 40 \
    --true-cfg-scale 4.0 \
    --seed 42

# Use a fine-tuned model with fewer steps
datasety synthetic \
    --input ./dataset \
    --output ./synthetic \
    --model "Phr00t/Qwen-Image-Edit-Rapid-AIO" \
    --prompt "add a winter hat" \
    --steps 4 \
    --output-format jpg
```

## Common Workflows

### Prepare a LoRA Training Dataset

```bash
# 1. Resize images to 1024x1024
datasety resize -i ./raw -o ./dataset -r 1024x1024 --crop-position center

# 2. Generate captions with trigger word
datasety caption -i ./dataset -o ./dataset --trigger-word "[trigger]" --device cuda
```
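
If you prefer to drive the same two steps from a script, a minimal Python sketch could look like the following. It only assembles and runs the CLI commands shown above; the function names and the `dry_run` switch are made up for this example, and it assumes `datasety` is on your `PATH`.

```python
import shlex
import subprocess


def lora_pipeline_commands(raw_dir, dataset_dir, trigger_word,
                           resolution="1024x1024", device="cuda"):
    """Build the resize and caption commands as argument lists."""
    resize = [
        "datasety", "resize",
        "-i", raw_dir, "-o", dataset_dir,
        "-r", resolution, "--crop-position", "center",
    ]
    caption = [
        "datasety", "caption",
        "-i", dataset_dir, "-o", dataset_dir,
        "--trigger-word", trigger_word, "--device", device,
    ]
    return [resize, caption]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline(lora_pipeline_commands("./raw", "./dataset", "[trigger]"))` prints the two commands without running them; pass `dry_run=False` to execute them in order.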

### Augment Dataset with Synthetic Variations

```bash
# Generate variations with different accessories
datasety synthetic \
    -i ./dataset \
    -o ./synthetic \
    --prompt "add a red scarf" \
    --num-images 2 \
    --device cuda
```

### Batch Process with Numbered Files

```bash
datasety resize \
    -i ./photos \
    -o ./processed \
    -r 768x1024 \
    --output-name-numbers \
    --crop-position top
```
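
To preview how `--input-format` filtering and `--output-name-numbers` might combine, here is a small Python sketch. It is not taken from datasety's source: the alphabetical ordering, the case-insensitive extension match, and the helper name `numbered_names` are assumptions, so the real numbering may differ.

```python
from pathlib import Path


def numbered_names(filenames, input_formats="jpg,jpeg,png,webp",
                   output_format="jpg"):
    """Map each matching input file to a numbered output name.

    Assumptions: extensions match case-insensitively and files are
    numbered in sorted order, starting at 1.
    """
    allowed = {fmt.strip().lower() for fmt in input_formats.split(",") if fmt.strip()}
    matching = sorted(
        name for name in filenames
        if Path(name).suffix.lower().lstrip(".") in allowed
    )
    return {
        name: f"{index}.{output_format}"
        for index, name in enumerate(matching, start=1)
    }
```

Files with other extensions are left out of the mapping entirely, which mirrors the first step of the resize process (only images matching the input formats are picked up).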

## Requirements

- Python 3.10+
- Pillow (for resize)
- PyTorch + Transformers (for caption: `pip install "datasety[caption]"`)
- PyTorch + Diffusers (for synthetic: `pip install "datasety[synthetic]"`)

## License

MIT
