Metadata-Version: 2.4
Name: datasety
Version: 0.39.0
Summary: CLI tool for dataset preparation: resize, align, caption, shuffle, synthetic, mask, filter, degrade, and character generation.
Project-URL: Homepage, https://github.com/kontextox/datasety
Project-URL: Repository, https://github.com/kontextox/datasety
Project-URL: Issues, https://github.com/kontextox/datasety/issues
Author: kontextox
License-Expression: MIT
License-File: LICENSE
Keywords: captioning,character,cli,dataset,degradation,diffusers,florence-2,image-editing,image-processing,machine-learning,masking,segmentation,synthetic,upscaling
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Image Processing
Requires-Python: >=3.10
Requires-Dist: huggingface-hub>=0.20.0
Requires-Dist: numpy>=1.20.0
Requires-Dist: pillow>=9.0.0
Provides-Extra: all
Requires-Dist: accelerate; extra == 'all'
Requires-Dist: demucs>=4.0.1; extra == 'all'
Requires-Dist: diffusers>=0.32.0; extra == 'all'
Requires-Dist: einops; extra == 'all'
Requires-Dist: faster-whisper>=1.2.1; extra == 'all'
Requires-Dist: nemo-text-processing>=1.1.0; extra == 'all'
Requires-Dist: num2words>=0.5.14; extra == 'all'
Requires-Dist: numpy>=1.20.0; extra == 'all'
Requires-Dist: peft>=0.6.0; extra == 'all'
Requires-Dist: pyyaml>=6.0; extra == 'all'
Requires-Dist: safetensors; extra == 'all'
Requires-Dist: sam2>=1.0; extra == 'all'
Requires-Dist: sentencepiece; extra == 'all'
Requires-Dist: soundfile>=0.13.1; extra == 'all'
Requires-Dist: timm; extra == 'all'
Requires-Dist: torch>=2.0.0; extra == 'all'
Requires-Dist: transformers>=4.38.0; extra == 'all'
Requires-Dist: transformers>=4.45.0; extra == 'all'
Requires-Dist: yt-dlp>=2026.3.17; extra == 'all'
Provides-Extra: audio
Requires-Dist: demucs>=4.0.1; extra == 'audio'
Requires-Dist: faster-whisper>=1.2.1; extra == 'audio'
Requires-Dist: nemo-text-processing>=1.1.0; extra == 'audio'
Requires-Dist: num2words>=0.5.14; extra == 'audio'
Requires-Dist: soundfile>=0.13.1; extra == 'audio'
Requires-Dist: yt-dlp>=2026.3.17; extra == 'audio'
Provides-Extra: caption
Requires-Dist: einops; extra == 'caption'
Requires-Dist: numpy>=1.20.0; extra == 'caption'
Requires-Dist: timm; extra == 'caption'
Requires-Dist: torch>=2.0.0; extra == 'caption'
Requires-Dist: transformers>=4.38.0; extra == 'caption'
Provides-Extra: character
Requires-Dist: accelerate; extra == 'character'
Requires-Dist: diffusers>=0.32.0; extra == 'character'
Requires-Dist: safetensors; extra == 'character'
Requires-Dist: torch>=2.0.0; extra == 'character'
Requires-Dist: transformers>=4.38.0; extra == 'character'
Provides-Extra: degrade
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: filter
Requires-Dist: torch>=2.0.0; extra == 'filter'
Requires-Dist: transformers>=4.38.0; extra == 'filter'
Provides-Extra: mask
Requires-Dist: numpy>=1.20.0; extra == 'mask'
Requires-Dist: sam2>=1.0; extra == 'mask'
Requires-Dist: torch>=2.0.0; extra == 'mask'
Requires-Dist: transformers>=4.45.0; extra == 'mask'
Provides-Extra: synthetic
Requires-Dist: accelerate; extra == 'synthetic'
Requires-Dist: diffusers>=0.32.0; extra == 'synthetic'
Requires-Dist: safetensors; extra == 'synthetic'
Requires-Dist: sentencepiece; extra == 'synthetic'
Requires-Dist: torch>=2.0.0; extra == 'synthetic'
Requires-Dist: transformers>=4.38.0; extra == 'synthetic'
Provides-Extra: train
Requires-Dist: accelerate; extra == 'train'
Requires-Dist: diffusers>=0.32.0; extra == 'train'
Requires-Dist: peft>=0.6.0; extra == 'train'
Requires-Dist: safetensors; extra == 'train'
Requires-Dist: torch>=2.0.0; extra == 'train'
Requires-Dist: transformers>=4.38.0; extra == 'train'
Provides-Extra: workflow
Requires-Dist: pyyaml>=6.0; extra == 'workflow'
Description-Content-Type: text/markdown

# datasety

<img align="right" src="https://raw.githubusercontent.com/kontextox/datasety/refs/heads/main/docs/public/mascot.png" alt="CLI tool for dataset preparation" width="120" />

[![PyPI](https://img.shields.io/pypi/v/datasety)](https://pypi.org/project/datasety/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)

CLI tool for dataset preparation — resize, caption, align, shuffle, synthetic editing, masking, degradation, character generation, LoRA training, audio TTS datasets, upload to HuggingFace, and multi-step workflows.

[Full documentation →](https://kontextox.github.io/datasety/)

## Installation

```bash
pip install datasety                 # core (resize, align, shuffle, degrade)
pip install datasety[caption]        # + Florence-2 captioning
pip install datasety[synthetic]      # + image editing (FLUX, Qwen, SDXL)
pip install datasety[mask]           # + segmentation masks (SAM 3, CLIPSeg)
pip install datasety[filter]         # + content filtering (CLIP, NudeNet)
pip install datasety[character]      # + character dataset generation
pip install datasety[workflow]       # + YAML workflow support
pip install datasety[train]          # + LoRA training (FLUX, SDXL)
pip install datasety[audio]          # + TTS audio datasets (YouTube, VAD, Piper)
pip install datasety[upload]         # + upload to HuggingFace Hub
pip install datasety[all]            # everything
```

---

## Commands

### `resize` — Resize & Crop Images

Batch resize images to exact dimensions with configurable crop positions.

<!-- screenshot: resize -->

```bash
datasety resize --input ./raw --output ./resized --resolution 768x1024 --crop-position top
```

<details>
<summary>Options</summary>

| Option                  | Description                                    | Default             |
| ----------------------- | ---------------------------------------------- | ------------------- |
| `--input`, `-i`         | Input directory                                | required\*          |
| `--output`, `-o`        | Output directory                               | required\*          |
| `--input-image`         | Single input image (alternative to dir mode)   |                     |
| `--output-image`        | Single output image (use with `--input-image`) |                     |
| `--resolution`, `-r`    | Target resolution (`WIDTHxHEIGHT`)             |                     |
| `--megapixel`           | Target megapixel count (e.g., 0.5, 1.0)        |                     |
| `--aspect-ratio`        | Aspect ratio `W:H` (e.g., 1:1, 16:9)           |                     |
| `--crop-position`       | `top`, `center`, `bottom`, `left`, `right`     | `center`            |
| `--input-format`        | Comma-separated input formats                  | `jpg,jpeg,png,webp` |
| `--output-format`       | `jpg`, `png`, `webp`                           | `jpg`               |
| `--output-name-numbers` | Rename output files to 1.jpg, 2.jpg, ...       | off                 |
| `--upscale`             | Upscale images smaller than target             | off                 |
| `--min-resolution`      | Skip images below this size (e.g., `256x256`)  |                     |
| `--workers`             | Parallel workers for processing                | `1`                 |
| `--recursive`, `-R`     | Search input directory recursively             | off                 |
| `--progress`            | Show tqdm progress bar                         | off                 |
| `--dry-run`             | Preview without modifying files                | off                 |

\* Required unless the single-image flags (`--input-image`, `--output-image`) are used.

</details>

```bash
# Single image
datasety resize --input-image photo.jpg --output-image resized.jpg -r 512x512

# Batch with sequential numbering
datasety resize -i ./photos -o ./dataset -r 1024x1024 --output-name-numbers --crop-position top
```
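
Conceptually, `--crop-position` cover-scales the image to fill the target box and then picks where to crop the overflow. A rough sketch of that geometry (illustrative only, not datasety's actual code — `crop_box` is a hypothetical helper):

```python
def crop_box(w, h, tw, th, position="center"):
    """Return the crop box (left, top, right, bottom) after cover-scaling
    a w x h image so it fully covers a tw x th target."""
    scale = max(tw / w, th / h)               # scale up until both axes cover
    nw, nh = round(w * scale), round(h * scale)
    # Horizontal positions shift left/right; vertical ones shift top/bottom;
    # everything else defaults to center.
    left = {"left": 0, "right": nw - tw}.get(position, (nw - tw) // 2)
    top = {"top": 0, "bottom": nh - th}.get(position, (nh - th) // 2)
    return (left, top, left + tw, top + th)

print(crop_box(1536, 1024, 768, 1024, "top"))  # (384, 0, 1152, 1024)
```

With `--crop-position top` the vertical overflow is taken from the bottom, which is why it keeps faces when portraits are taller than the target.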

[Full documentation →](https://kontextox.github.io/datasety/commands/resize)

---

### `caption` — Generate Image Captions

Generate captions using Florence-2 (local) or OpenAI-compatible vision APIs.

<!-- screenshot: caption -->

```bash
datasety caption --input ./images --output ./captions --trigger-word "[trigger]"
```

<details>
<summary>Options</summary>

| Option               | Description                                 | Default                   |
| -------------------- | ------------------------------------------- | ------------------------- |
| `--input`, `-i`      | Input directory                             | required\*                |
| `--output`, `-o`     | Output directory for .txt files             | required\*                |
| `--input-image`      | Single input image                          |                           |
| `--output-caption`   | Single output .txt path                     |                           |
| `--device`           | `auto`, `cpu`, `cuda`, `mps`                | `auto`                    |
| `--trigger-word`     | Text to prepend to each caption             |                           |
| `--prompt`           | Florence-2 task prompt                      | `<MORE_DETAILED_CAPTION>` |
| `--model`            | HF model name or API model ID               |                           |
| `--num-beams`        | Beam search width (1 = greedy)              | `3`                       |
| `--florence-2-base`  | Use Florence-2-base (0.23B, faster)         | default                   |
| `--florence-2-large` | Use Florence-2-large (0.77B, more accurate) |                           |
| `--llm-api`          | Use OpenAI-compatible vision API            |                           |
| `--max-tokens`       | Max response tokens (API mode)              | `300`                     |
| `--temperature`      | Temperature (API mode)                      | `0.3`                     |
| `--skip-existing`    | Skip images that already have a .txt file   | off                       |
| `--append`           | Append text to existing captions            |                           |
| `--prepend`          | Prepend text to existing captions           |                           |
| `--recursive`, `-R`  | Search input directory recursively          | off                       |
| `--progress`         | Show tqdm progress bar                      | off                       |
| `--dry-run`          | Preview without processing                  | off                       |

</details>

```bash
# Florence-2 with trigger word
datasety caption -i ./dataset -o ./dataset --trigger-word "photo of sks person," --device cuda

# OpenAI vision API (supports OPENAI_MODEL env var)
datasety caption -i ./images -o ./captions --llm-api --model gpt-5-nano
```

[Full documentation →](https://kontextox.github.io/datasety/commands/caption)

---

### `align` — Align Control/Target Pairs

Match dimensions, enforce multiples of 32, and unify formats for control/target training pairs. Includes a built-in web server for visual comparison with a compare slider, caption editing, and pair management.

<!-- screenshot: align -->

```bash
datasety align --target ./target --control ./control --dry-run
```

<details>
<summary>Options</summary>

| Option              | Description                              | Default       |
| ------------------- | ---------------------------------------- | ------------- |
| `--target`, `-t`    | Target images directory                  | required      |
| `--control`, `-c`   | Control images directory                 | required      |
| `--multiple-of`     | Align dimensions to this multiple        | `32`          |
| `--output-format`   | Convert all images: `jpg`, `png`, `webp` | keep original |
| `--recursive`, `-R` | Search input directories recursively     | off           |
| `--dry-run`         | Preview changes without modifying files  | off           |

</details>

```bash
# Preview, then apply
datasety align -t ./target -c ./control --dry-run
datasety align -t ./target -c ./control --output-format jpg
```
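
The `--multiple-of` step trims each dimension so both sides are divisible by the given value. A minimal sketch, assuming dimensions are rounded down (the exact rounding is an implementation detail of datasety):

```python
def align_dims(width, height, multiple=32):
    # Trim each dimension down to the nearest multiple of `multiple`.
    return (width - width % multiple, height - height % multiple)

print(align_dims(1000, 750))  # (992, 736)
```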

> **Visual comparison:** use `datasety server -i ./target --control ./control` to browse and compare aligned pairs in the browser.

[Full documentation →](https://kontextox.github.io/datasety/commands/align)

---

### `shuffle` — Random Caption Generation

Generate random captions by picking one variant from each text group.

<!-- screenshot: shuffle -->

```bash
datasety shuffle -i ./images -o ./captions \
    --group "A photo of a person.|Portrait of someone." \
    --group "Remove the hat.|Take off the hat."
```

<details>
<summary>Options</summary>

| Option                | Description                                | Default  |
| --------------------- | ------------------------------------------ | -------- |
| `--input`, `-i`       | Input directory containing images          | required |
| `--output`, `-o`      | Output directory for .txt files            | required |
| `--group`, `-g`       | Inline `\|`-separated, `.txt` file, or URL | required |
| `--separator`         | Separator between groups                   | `" "`    |
| `--seed`              | Random seed for reproducibility            |          |
| `--dry-run`           | Preview captions without writing           | off      |
| `--show-distribution` | Show caption distribution after generation | off      |

</details>

```bash
# Mix file, URL, and inline sources
datasety shuffle -i ./images -o ./captions \
    --group subjects.txt \
    --group "ending A|ending B" \
    --seed 42 --show-distribution
```
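
Conceptually, each caption is assembled by picking one variant from every group and joining the picks with `--separator`; a fixed `--seed` makes the picks reproducible. A minimal sketch of that logic (not the tool's actual code):

```python
import random

def shuffle_caption(groups, separator=" ", seed=None):
    """Pick one |-separated variant per group and join them in order."""
    rng = random.Random(seed)
    return separator.join(rng.choice(g.split("|")) for g in groups)

caption = shuffle_caption(
    ["A photo of a person.|Portrait of someone.",
     "Remove the hat.|Take off the hat."],
    seed=42,
)
print(caption)
```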

[Full documentation →](https://kontextox.github.io/datasety/commands/shuffle)

---

### `synthetic` — Synthetic Image Editing

Generate synthetic variations using image editing models (FLUX.2-klein FP8, FLUX.2-klein-9b-kv, Qwen-Image-Edit-2511, SDXL, LongCat, HunyuanImage). The default model `FLUX.2-klein-4b-fp8` requires no HuggingFace token and fits in ~5 GB VRAM.

<!-- screenshot: synthetic -->

```bash
datasety synthetic --input ./images --output ./synthetic --prompt "add a winter hat" --steps 4
```

<details>
<summary>Options</summary>

| Option               | Description                                                 | Default                                 |
| -------------------- | ----------------------------------------------------------- | --------------------------------------- |
| `--input`, `-i`      | Input directory                                             | required\*                              |
| `--output`, `-o`     | Output directory                                            | required\*                              |
| `--input-image`      | Single input image                                          |                                         |
| `--output-image`     | Single output image                                         |                                         |
| `--prompt`, `-p`     | Edit instruction                                            | required                                |
| `--model`            | Model (auto-detects family or API model)                    | `black-forest-labs/FLUX.2-klein-4b-fp8` |
| `--image-api`        | Use OpenAI-compatible API for generation                    | off                                     |
| `--api-aspect-ratio` | Aspect ratio for `--image-api` (e.g. `16:9`, `9:16`, `1:1`) | auto                                    |
| `--api-image-size`   | Resolution for `--image-api`: `0.5K`, `1K`, `2K`, `4K`      | `1K`                                    |
| `--weights`          | Fine-tuned weights file                                     |                                         |
| `--lora`             | LoRA adapter (repeatable, `:WEIGHT`)                        |                                         |
| `--device`           | `auto`, `cpu`, `cuda`, `mps`                                | `auto`                                  |
| `--cpu-offload`      | Force CPU offload                                           | auto                                    |
| `--steps`            | Inference steps                                             | `4`                                     |
| `--cfg-scale`        | Guidance scale                                              | `2.5`                                   |
| `--true-cfg-scale`   | True CFG (Qwen only)                                        | `4.0`                                   |
| `--negative-prompt`  | Negative prompt                                             | `" "`                                   |
| `--num-images`       | Images per input                                            | `1`                                     |
| `--seed`             | Random seed                                                 |                                         |
| `--gguf`             | GGUF path/URL for quantized loading                         |                                         |
| `--strength`         | Img2img strength (SDXL/FLUX.2, 0.0-1.0)                     | `0.7`                                   |
| `--recursive`, `-R`  | Search input directory recursively                          | off                                     |
| `--output-format`    | `png`, `jpg`, `webp`                                        | `png`                                   |
| `--skip-existing`    | Skip images with existing output                            | off                                     |
| `--batch-size`       | Flush GPU memory every N images                             | `0` (off)                               |
| `--progress`         | Show tqdm progress bar                                      | off                                     |
| `--dry-run`          | Preview without loading models                              | off                                     |

</details>

```bash
# Single image edit
datasety synthetic --input-image photo.jpg --output-image edited.png \
    --prompt "add sunglasses" --steps 4

# Cloud API — FLUX.2-flex (no GPU needed)
OPENAI_API_KEY=sk-... OPENAI_BASE_URL=https://openrouter.ai/api/v1 \
  datasety synthetic -i ./images -o ./synthetic \
  --prompt "add a winter hat" --image-api --model black-forest-labs/flux.2-flex \
  --api-aspect-ratio 1:1

# Cloud API — Gemini 2.5 Flash (text+image, supports image-to-image)
OPENAI_API_KEY=sk-... OPENAI_BASE_URL=https://openrouter.ai/api/v1 \
  datasety synthetic -i ./images -o ./synthetic \
  --prompt "transform into oil painting style" \
  --model google/gemini-2.5-flash-image --image-api \
  --api-aspect-ratio 3:4 --api-image-size 2K

# FLUX.2-klein-9b-kv (KV-cache, faster multi-reference, ~29 GB VRAM)
datasety synthetic -i ./images -o ./synthetic \
    --model "black-forest-labs/FLUX.2-klein-9b-kv" \
    --prompt "add sunglasses" --steps 4

# Qwen-Image-Edit-2511 with LoRA
datasety synthetic -i ./dataset -o ./synthetic \
    --model "Qwen/Qwen-Image-Edit-2511" \
    --lora "adapter.safetensors:0.8" \
    --prompt "add a red scarf" --steps 40
```
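
The `PATH:WEIGHT` form accepted by `--lora` is a path with an optional trailing float. A hypothetical parser sketch for that spec (`parse_lora` is illustrative, not part of datasety):

```python
def parse_lora(spec, default_weight=1.0):
    """Split 'PATH[:WEIGHT]' into (path, weight); weight defaults to 1.0."""
    path, sep, weight = spec.rpartition(":")
    if sep and weight.replace(".", "", 1).isdigit():
        return path, float(weight)
    return spec, default_weight

print(parse_lora("adapter.safetensors:0.8"))  # ('adapter.safetensors', 0.8)
print(parse_lora("adapter.safetensors"))      # ('adapter.safetensors', 1.0)
```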

[Full documentation →](https://kontextox.github.io/datasety/commands/synthetic)

---

### `mask` — Text-Prompted Segmentation Masks

Generate binary masks from images using text keywords. Supports SAM 3, SAM 2, and CLIPSeg.

<!-- screenshot: mask -->

```bash
datasety mask --input ./dataset --output ./masks --keywords "face,hair" --device cuda
```

<details>
<summary>Options</summary>

| Option              | Description                        | Default    |
| ------------------- | ---------------------------------- | ---------- |
| `--input`, `-i`     | Input directory                    | required\* |
| `--output`, `-o`    | Output directory for masks         | required\* |
| `--input-image`     | Single input image                 |            |
| `--output-image`    | Single output mask                 |            |
| `--keywords`, `-k`  | Comma-separated keywords           | required   |
| `--model`           | `sam3`, `sam2`, `clipseg`          | `sam3`     |
| `--device`          | `auto`, `cpu`, `cuda`, `mps`       | `auto`     |
| `--threshold`       | Confidence threshold (0.0-1.0)     | `0.3`      |
| `--padding`         | Pixels to expand mask (dilation)   | `0`        |
| `--blur`            | Gaussian blur radius for edges     | `0`        |
| `--invert`          | Invert mask colors                 | off        |
| `--naming`          | `folder` or `suffix` (`_mask`)     | `folder`   |
| `--output-format`   | `png`, `jpg`, `webp`               | `png`      |
| `--skip-existing`   | Skip images with existing masks    | off        |
| `--dry-run`         | Preview detections without saving  | off        |
| `--recursive`, `-R` | Search input directory recursively | off        |
| `--progress`        | Show tqdm progress bar             | off        |

</details>

```bash
# CLIPSeg (lightweight, no extra deps)
datasety mask -i ./dataset -o ./masks -k "face" --model clipseg --threshold 0.5

# SAM 2 with mask refinement
datasety mask -i ./dataset -o ./masks -k "hat,glasses" --model sam2 --padding 5 --blur 3
```

[Full documentation →](https://kontextox.github.io/datasety/commands/mask)

---

### `filter` — Filter Dataset by Content

Filter, curate, or clean datasets based on image content. Use CLIP for arbitrary text queries or NudeNet for NSFW label detection.

<!-- screenshot: filter -->

```bash
datasety filter --input ./dataset --output ./rejected --query "leg,male face" --action move
```

<details>
<summary>Options</summary>

| Option                 | Description                                             | Default  |
| ---------------------- | ------------------------------------------------------- | -------- |
| `--input`, `-i`        | Input directory                                         | required |
| `--output`, `-o`       | Output directory for matched/rejected images            |          |
| `--query`, `-q`        | Comma-separated text queries (CLIP)                     |          |
| `--labels`, `-l`       | Comma-separated NudeNet labels                          |          |
| `--model`              | `clip`, `nudenet`                                       | `clip`   |
| `--action`             | `move`, `copy`, `delete`, `keep`                        | `move`   |
| `--threshold`          | Confidence threshold (0.0-1.0)                          | `0.5`    |
| `--device`             | `auto`, `cpu`, `cuda`, `mps`                            | `auto`   |
| `--confirm`            | Required for destructive actions (`delete`, `keep`)     | off      |
| `--preserve-structure` | Keep subfolder hierarchy in output (with `--recursive`) | off      |
| `--invert`             | Invert match logic (act on non-matches)                 | off      |
| `--log`                | Write CSV log of all decisions to this path             |          |
| `--dry-run`            | Preview detections without modifying files              | off      |
| `--recursive`, `-R`    | Search input directory recursively                      | off      |
| `--progress`           | Show tqdm progress bar                                  | off      |

</details>

```bash
# Move images containing legs or male faces to a reject folder
datasety filter -i ./dataset -o ./rejected --query "leg,male face" --action move

# Delete NSFW images using NudeNet labels
datasety filter -i ./dataset --labels "FEMALE_BREAST_EXPOSED,MALE_GENITALIA_EXPOSED" \
    --action delete --model nudenet --threshold 0.6 --confirm

# Keep only images with "hat and socks", move the rest out
datasety filter -i ./dataset -o ./rejected --query "hat and socks" --action keep

# Dry-run to preview what would be filtered
datasety filter -i ./dataset --query "blurry,low quality" --action delete --dry-run -R

# Write a decision log for review
datasety filter -i ./dataset -o ./rejected --query "outdoor" --action copy --log filter_log.csv
```
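
The decision logic reduces to a threshold test per image: it matches when any query (or label) score clears `--threshold`, and `--invert` flips the outcome. A sketch assuming per-query scores in [0, 1] (illustrative, not datasety's actual code):

```python
def matches(scores, threshold=0.5, invert=False):
    """Return True when the --action should fire for this image."""
    matched = max(scores.values()) >= threshold
    return matched != invert        # --invert acts on non-matches instead

print(matches({"leg": 0.7, "male face": 0.2}))   # True  -> image is acted on
print(matches({"leg": 0.7}, invert=True))        # False -> image is left alone
```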

[Full documentation →](https://kontextox.github.io/datasety/commands/filter)

---

### `inspect` — Dataset Statistics

Scan a dataset directory and report image count, resolution distribution, format breakdown, file sizes, caption coverage, and optionally detect duplicate images via perceptual hashing.

<!-- screenshot: inspect -->

```bash
datasety inspect --input ./dataset --duplicates
```

<details>
<summary>Options</summary>

| Option              | Description                               | Default  |
| ------------------- | ----------------------------------------- | -------- |
| `--input`, `-i`     | Input directory                           | required |
| `--duplicates`      | Detect duplicate/near-duplicate images    | off      |
| `--json`            | Export report as JSON to this path        |          |
| `--csv`             | Export per-image data as CSV to this path |          |
| `--recursive`, `-R` | Search input directory recursively        | off      |

</details>

```bash
# Full report with duplicate detection
datasety inspect -i ./dataset --duplicates

# Export report to JSON
datasety inspect -i ./dataset --json report.json

# Export per-image data to CSV
datasety inspect -i ./dataset --csv images.csv -R
```
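
Perceptual-hash duplicate detection boils down to comparing compact bit fingerprints: images whose hashes differ in few bits are near-duplicates. A toy average-hash sketch on tiny grayscale grids (illustrative only; the real hasher works on downscaled images):

```python
def ahash(pixels):
    """Average hash: one bit per pixel, set when the pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(a, b):
    """Number of differing bits; small distance = near-duplicate."""
    return sum(x != y for x, y in zip(a, b))

img_a = [[10, 200], [200, 10]]
img_b = [[12, 198], [201, 9]]   # same pattern with slight noise
print(hamming(ahash(img_a), ahash(img_b)))  # 0
```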

[Full documentation →](https://kontextox.github.io/datasety/commands/inspect)

---

### `server` — Dataset Management Dashboard

<img src="https://raw.githubusercontent.com/kontextox/datasety/refs/heads/main/docs/public/demo.png" alt="Start a universal web server for managing your entire dataset from the browser.">

Start a universal web server for managing your entire dataset from the browser. Browse images in a gallery, edit and create captions, delete or compare images, view statistics, upload new images, and detect duplicates — all in one interface.

```bash
datasety server --input ./dataset
```

<details>
<summary>Options</summary>

| Option              | Description                                           | Default  |
| ------------------- | ----------------------------------------------------- | -------- |
| `--input`, `-i`     | Dataset directory to manage                           | required |
| `--control`, `-c`   | Control images directory (enables Pairs tab)          |          |
| `--port`            | Port for the web server                               | `8080`   |
| `--recursive`, `-R` | Search directories recursively for images             | off      |
| `--duplicates`      | Pre-compute perceptual hashes for duplicate detection | off      |

</details>

```bash
# Start the dashboard on the default port
datasety server -i ./dataset

# With duplicate detection pre-computed
datasety server -i ./dataset --duplicates --port 9000

# Pairs comparison (align workflow)
datasety server -i ./target --control ./control
```

The dashboard provides:

- **Gallery** — thumbnail grid with sorting and filtering; click any image for the detail panel (caption editor, file info, delete)
- **Compare** — drag-slider side-by-side comparison for any two images
- **Pairs** _(with `--control`)_ — compare control/target pairs with a drag slider; edit captions for both sides; delete pairs; arrow-key navigation
- **Stats** — live dataset overview: image count, total size, caption coverage, format and orientation breakdown
- **Upload** — drag images into the browser or use the Upload button to add images to the dataset
- **Keyboard navigation** — arrow keys to move through gallery or pairs, `Ctrl+S` to save, `T` to toggle theme, `?` for help

---

### `degrade` — Image Degradation

Create degraded versions of images for upscale/enhance training. Pure Pillow, no extra dependencies.

<!-- screenshot: degrade -->

```bash
datasety degrade --input ./originals --output ./dataset --type random --intensity-range 0.2-0.8 --paired
```

<details>
<summary>Options</summary>

| Option              | Description                           | Default    |
| ------------------- | ------------------------------------- | ---------- |
| `--input`, `-i`     | Input directory                       | required\* |
| `--output`, `-o`    | Output directory                      | required\* |
| `--input-image`     | Single input image                    |            |
| `--output-image`    | Single output image                   |            |
| `--type`, `-t`      | Degradation type(s), repeatable       | `random`   |
| `--intensity`       | Global intensity (0.0-1.0)            | `0.5`      |
| `--intensity-range` | Random range `MIN-MAX`                |            |
| `--chain`           | Apply multiple types sequentially     | off        |
| `--num-variants`    | Variants per input image              | `1`        |
| `--paired`          | Create `control/` + `target/` subdirs | off        |
| `--seed`            | Random seed                           |            |
| `--output-format`   | `png`, `jpg`, `webp`                  | `png`      |
| `--skip-existing`   | Skip images with existing output      | off        |
| `--workers`         | Parallel workers for processing       | `1`        |
| `--progress`        | Show tqdm progress bar                | off        |
| `--dry-run`         | Preview without writing files         | off        |

**Degradation types:** `lowres`, `oversharpen`, `noise`, `blur`, `jpeg`, `motion-blur`, `pixelate`, `color-bands`, `upscale-sim`, `random`

</details>

```bash
# Chain specific degradations for paired output
datasety degrade -i ./images -o ./dataset --type jpeg --type noise --chain --paired --seed 42

# Multiple random variants per image
datasety degrade -i ./images -o ./degraded --type random --num-variants 3 --intensity-range 0.3-0.8
```
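
The `--chain` / `--intensity` / `--seed` semantics can be pictured as seeded transforms applied in sequence. A toy sketch on a 1-D pixel strip (the function names and exact math are illustrative assumptions, not datasety's implementation):

```python
import random

def add_noise(pixels, intensity, rng):
    # Perturb each value by up to +/- intensity*255, clamped to [0, 255].
    amp = intensity * 255
    return [min(255, max(0, round(p + rng.uniform(-amp, amp)))) for p in pixels]

def pixelate(pixels, intensity):
    # Average runs of `block` samples to mimic block pixelation.
    block = max(1, round(intensity * 8))
    out = []
    for i in range(0, len(pixels), block):
        chunk = pixels[i:i + block]
        out.extend([round(sum(chunk) / len(chunk))] * len(chunk))
    return out

rng = random.Random(42)            # --seed 42 makes the chain reproducible
strip = list(range(0, 160, 10))    # a 16-pixel gradient
degraded = pixelate(add_noise(strip, 0.3, rng), 0.5)   # --chain: noise, then pixelate
print(degraded)
```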

[Full documentation →](https://kontextox.github.io/datasety/commands/degrade)

---

### `character` — Character Dataset Generation

Generate character datasets using LLM-generated prompts + text-to-image (FLUX.2-klein local or cloud API).

<!-- screenshot: character -->

```bash
datasety character --output ./dataset --llm-ollama qwen3.5:4b --num-images 20
```

<details>
<summary>Options</summary>

| Option                    | Description                                            | Default                                 |
| ------------------------- | ------------------------------------------------------ | --------------------------------------- |
| `--reference`, `-r`       | Reference face image(s) (optional, prompt context)     |                                         |
| `--output`, `-o`          | Output directory                                       | required                                |
| `--num-images`, `-n`      | Number of images to generate                           | `10`                                    |
| `--model`                 | Model for generation (local HF or API model ID)        | `black-forest-labs/FLUX.2-klein-4b-fp8` |
| `--gguf`                  | GGUF path/URL for quantized loading                    |                                         |
| `--image-api`             | Use OpenAI-compatible API for image generation         | off                                     |
| `--api-aspect-ratio`      | Aspect ratio for `--image-api` (e.g. `9:16`, `1:1`)    | derived from `--width`/`--height`       |
| `--api-image-size`        | Resolution for `--image-api`: `0.5K`, `1K`, `2K`, `4K` |                                         |
| `--character-description` | Text description of the character                      |                                         |
| `--style`                 | Style guidance (e.g., `photorealistic`)                |                                         |
| `--prompts-only`          | Only generate prompts, skip images                     | off                                     |
| `--prompts-file`          | Load prompts from file instead of LLM                  |                                         |
| `--llm-api`               | Use OpenAI-compatible API for prompts                  |                                         |
| `--llm-ollama MODEL`      | Use local Ollama server for prompts                    |                                         |
| `--llm-gguf PATH`         | Use local GGUF model for prompts                       |                                         |
| `--llm-model REPO`        | Use HuggingFace model for prompts                      |                                         |
| `--device`                | `auto`, `cpu`, `cuda`, `mps`                           | `auto`                                  |
| `--steps`                 | Inference steps                                        | `4`                                     |
| `--cfg-scale`             | Guidance scale                                         | `4.0`                                   |
| `--seed`                  | Random seed                                            |                                         |
| `--height`                | Output image height                                    | `1024`                                  |
| `--width`                 | Output image width                                     | `1024`                                  |
| `--output-format`         | `png`, `jpg`, `webp`                                   | `png`                                   |
| `--batch-size`            | Flush GPU memory every N images                        | `0` (off)                               |
| `--dry-run`               | Preview prompts without generating images              | off                                     |

</details>

```bash
# Generate with local pipeline + Ollama prompts
datasety character -o ./dataset --llm-ollama qwen3.5:4b --num-images 20

# Cloud API for images (no GPU needed)
OPENAI_API_KEY=sk-... OPENAI_BASE_URL=https://openrouter.ai/api/v1 \
  datasety character -o ./dataset --prompts-file prompts.txt \
  --image-api --model black-forest-labs/flux.2-flex --api-aspect-ratio 2:3

# Preview prompts only
datasety character -o ./dataset --llm-api --prompts-only
```
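When using `--prompts-file`, the file is plain text. A plausible format is one prompt per line (the exact format is an assumption here; check the full documentation), e.g.:

```text
a portrait of the character in golden-hour light, shallow depth of field
the character walking through a rainy neon-lit street, full body
close-up of the character laughing, studio lighting, white backdrop
```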

[Full documentation →](https://kontextox.github.io/datasety/commands/character)

---

### `sweep` — Parameter Grid Search

Generate workflow YAML files that sweep parameter grids for synthetic editing. The sweep expands the Cartesian product of all supplied parameter lists, producing one workflow step per combination.

<!-- screenshot: sweep -->

```bash
datasety sweep -i ./images -o ./sweep_output -p "add a winter hat" --steps 4,8,16 --cfg-scale 1.0,2.5,5.0
```
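The Cartesian product means the step count multiplies quickly: the command above sweeps 3 step values × 3 CFG values, so it emits 9 combinations. A quick sketch of that expansion:

```shell
# Enumerate the same grid the sweep above would generate: 3 × 3 = 9 combinations.
count=0
for steps in 4 8 16; do
  for cfg in 1.0 2.5 5.0; do
    echo "steps=$steps cfg-scale=$cfg"
    count=$((count + 1))
  done
done
echo "total combinations: $count"
```

Adding a third swept parameter multiplies again (e.g. two `--strength` values would yield 18 steps), so keep grids small when each step runs a full image edit.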

<details>
<summary>Options</summary>

| Option             | Description                              | Default      |
| ------------------ | ---------------------------------------- | ------------ |
| `--input`, `-i`    | Input images directory                   | required     |
| `--output`, `-o`   | Base output directory                    | required     |
| `--prompt`, `-p`   | Edit prompt                              | required     |
| `--steps`          | Comma-separated step values to sweep     |              |
| `--cfg-scale`      | Comma-separated CFG values to sweep      |              |
| `--true-cfg-scale` | Comma-separated true CFG values to sweep |              |
| `--strength`       | Comma-separated strength values to sweep |              |
| `--lora`           | Comma-separated LoRA specs to sweep      |              |
| `--model`          | Comma-separated model names to sweep     |              |
| `--seed`           | Random seed (passed through)             |              |
| `--output-file`    | Output YAML path                         | `sweep.yaml` |
| `--run`            | Generate and immediately execute         | off          |

</details>

```bash
# Generate YAML, inspect, then run
datasety sweep -i ./images -o ./sweep -p "add sunglasses" --steps 4,8,16 --cfg-scale 1.0,2.5
datasety workflow -f sweep.yaml

# Generate and run immediately
datasety sweep -i ./images -o ./sweep -p "add a hat" --steps 4,8 --cfg-scale 2.0,3.0 --run
```

[Full documentation →](https://kontextox.github.io/datasety/commands/sweep)

---

### `train` — LoRA Fine-Tuning

Train a LoRA adapter for image generation models from a local dataset of image + caption pairs.

Supported model families: **FLUX.2-klein** (flow-matching), **SDXL** (DDPM), and **Qwen** (flow-matching, image-editing).

<!-- screenshot: train -->

```bash
datasety train --input ./dataset --output lora.safetensors --steps 500
```

<details>
<summary>Options</summary>

| Option                          | Description                                               | Default                                  |
| ------------------------------- | --------------------------------------------------------- | ---------------------------------------- |
| `--input`, `-i`                 | Dataset directory (images + `.txt` captions)              | required                                 |
| `--output`, `-o`                | Output LoRA `.safetensors` path                           | `lora.safetensors`                       |
| `--model`, `-m`                 | HuggingFace repo ID (base model)                          | `black-forest-labs/FLUX.2-klein-base-4B` |
| `--family`                      | Model family: `flux`, `sdxl`, `qwen`                      | auto-detected                            |
| `--steps`                       | Number of training steps                                  | `100`                                    |
| `--lr`                          | Learning rate                                             | `1e-4`                                   |
| `--lora-rank`                   | LoRA rank                                                 | `16`                                     |
| `--lora-alpha`                  | LoRA alpha                                                | `16.0`                                   |
| `--lora-dropout`                | LoRA dropout rate                                         | `0.0`                                    |
| `--image-size`                  | Training resolution (square crop)                         | `512`                                    |
| `--device`                      | `auto`, `cpu`, `cuda`, `mps`                              | `auto`                                   |
| `--seed`                        | Random seed                                               | `42`                                     |
| `--save-every`                  | Save checkpoint every N steps                             | end only                                 |
| `--resume`                      | Resume from a LoRA checkpoint (.safetensors)              |                                          |
| `--validation-split`            | Fraction of dataset for validation (0.0-0.5)              |                                          |
| `--timestep-type`               | Timestep sampling: `sigmoid`, `lognorm`, `linear`         | `sigmoid`                                |
| `--caption-dropout`             | Probability of dropping caption (unconditional)           | `0.05`                                   |
| `--gradient-checkpointing`      | Enable gradient checkpointing (saves VRAM)                | off                                      |
| `--optimizer`                   | `adamw` or `adamw8bit` (requires bitsandbytes)            | `adamw`                                  |
| `--lr-scheduler`                | LR schedule: `constant`, `cosine`, `linear`               | `constant`                               |
| `--lr-warmup-steps`             | Linear warmup steps before target LR                      | `0`                                      |
| `--gradient-accumulation-steps` | Accumulate gradients over N steps                         | `1`                                      |
| `--min-snr-gamma`               | Min-SNR-γ loss weighting for SDXL (recommended: 5.0)      | disabled                                 |
| `--noise-offset`                | Per-channel noise offset for SDXL (recommended: 0.05–0.1) | `0.0`                                    |

</details>
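The `--input` directory pairs each image with a same-named `.txt` caption file (a common diffusers-style layout; the file names below are illustrative):

```text
dataset/
├── img_001.png
├── img_001.txt    # e.g. "ohwx person, standing in a park"
├── img_002.png
└── img_002.txt
```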

```bash
# Prepare dataset
datasety resize -i ./raw -o ./dataset -r 512x512
datasety caption -i ./dataset -o ./dataset --trigger-word "ohwx person,"

# Train FLUX.2-klein LoRA (~8 GB VRAM)
datasety train \
    --input ./dataset \
    --output lora/flux_lora.safetensors \
    --model black-forest-labs/FLUX.2-klein-base-4B \
    --steps 500 --lr 1e-4 --lora-rank 16

# Use the trained LoRA
datasety synthetic --input-image photo.jpg --output-image out.png \
    --prompt "ohwx person wearing sunglasses" \
    --lora lora/flux_lora.safetensors:0.8

# SDXL LoRA
datasety train \
    --input ./dataset \
    --output sdxl_lora.safetensors \
    --model stabilityai/stable-diffusion-xl-base-1.0 \
    --family sdxl --steps 500 --image-size 1024
```
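Two quick back-of-the-envelope calculations when tuning the flags above (the layer dimensions are illustrative assumptions, not the actual sizes of any specific base model): gradient accumulation multiplies the effective batch size, and LoRA's trainable-parameter count per adapted linear layer is `in_dim × rank + rank × out_dim`.

```shell
# Effective batch size with --gradient-accumulation-steps 4 and a micro-batch of 1:
micro_batch=1
accum_steps=4
effective_batch=$((micro_batch * accum_steps))
echo "effective batch size: $effective_batch"

# Rough LoRA parameter count for one adapted linear layer at --lora-rank 16
# (3072x3072 is a hypothetical layer size, chosen only for illustration):
in_dim=3072; out_dim=3072; rank=16
lora_params=$((in_dim * rank + rank * out_dim))
echo "LoRA params per layer: $lora_params"
```

Higher ranks add capacity at a linear cost in parameters and VRAM; rank 16 is a common starting point for character likeness.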

[Full documentation →](https://kontextox.github.io/datasety/commands/train)

---

### `audio` — Build TTS Audio Datasets

Build TTS (Text-to-Speech) audio datasets from video or audio files. Supports YouTube URLs, direct media URLs, local files, and directories of files (sorted by name). Extracts audio, transcribes with faster-whisper, normalizes text, and outputs Piper/LJSpeech-compatible datasets.

```bash
datasety audio --input ./video.mp4 --output ./dataset
datasety audio --input ./clips/ --output ./dataset
datasety audio --input "https://www.youtube.com/watch?v=..." --output ./dataset --language uk
```
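The output follows the LJSpeech convention: a folder of WAV clips plus a pipe-separated `metadata.csv` mapping each clip to its transcript. The exact chunk file names below are illustrative:

```text
dataset/
├── wavs/
│   ├── 0001.wav
│   ├── 0002.wav
│   └── ...
└── metadata.csv    # lines like: 0001|Transcribed and normalized text for the first clip.
```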

<details>
<summary>Options</summary>

| Option                | Description                                                             | Default     |
| --------------------- | ----------------------------------------------------------------------- | ----------- |
| `--input`, `-i`       | Input: local file, directory of files, YouTube URL, or direct media URL | required    |
| `--output`, `-o`      | Output directory for the dataset                                        | required    |
| `--sample-rate`       | Output audio sample rate in Hz                                          | `22050`     |
| `--demucs`            | Enable Demucs vocal isolation                                           | `false`     |
| `--demucs-model`      | Demucs model name                                                       | `htdemucs`  |
| `--whisper-model`     | Faster-Whisper model: tiny, base, small, medium, large-v3               | `base`      |
| `--language`          | Language code (e.g., en, es, fr, uk). Auto-detected if omitted          | (auto)      |
| `--device`            | Device: auto, cpu, cuda, mps                                            | `auto`      |
| `--vad`               | Enable voice activity detection (VAD) to filter non-speech              | `false`     |
| `--min-duration`      | Minimum segment duration in seconds                                     | `1.5`       |
| `--max-duration`      | Maximum segment duration in seconds                                     | `30.0`      |
| `--merge-gap`         | Merge segments closer than this many seconds                            | `0.0` (off) |
| `--normalize-numbers` | Expand digits into words                                                | `false`     |
| `--no-clean-text`     | Disable special character stripping                                     | `false`     |
| `--workers`           | Number of parallel file workers                                          | `1`         |
| `--keep-temp`         | Keep temporary audio files at this path                                 |             |
| `--resume`            | Resume a previous run (skip existing chunks, append to CSV)             | `false`     |
| `--overwrite`         | Overwrite existing output directory                                     | `false`     |
| `--dry-run`           | Print pipeline steps without executing                                  | `false`     |
| `--verbose`, `-V`     | Print detailed progress messages                                        | `false`     |

</details>

```bash
# YouTube video to TTS dataset (English)
datasety audio --input "https://www.youtube.com/watch?v=..." --output ./tts_dataset

# Directory of audio files sorted by name (1.mp3, 2.mp3, ...)
datasety audio --input ./recordings/ --output ./dataset

# Ukrainian video — always specify language for accurate transcription
datasety audio --input ./video.mp4 --output ./dataset --language uk

# Enable VAD for noisy audio (merges speech into fewer, longer segments)
datasety audio --input ./noisy_video.mp4 --output ./dataset --vad

# Local video with vocal isolation and high-quality transcription
datasety audio --input ./video.mp4 --output ./dataset --demucs --whisper-model large-v3

# Resume a previous run (skip already-processed chunks)
datasety audio --input ./video.mp4 --output ./dataset --resume

# Overwrite existing output
datasety audio --input ./video.mp4 --output ./dataset --overwrite

# Parallel processing of multiple files
datasety audio --input ./videos/ --output ./dataset --workers 4

# Preview pipeline without downloading
datasety audio --input ./video.mp4 --output ./dataset --dry-run --verbose
```

[Full documentation →](https://kontextox.github.io/datasety/commands/audio)

---

### `upload` — Upload to HuggingFace Hub

Upload datasets and model adapters to HuggingFace Hub. Auto-detects type (audio, image, video, document, model, generic) from directory structure and generates HF-compliant README dataset cards with YAML frontmatter.

```bash
datasety upload --path ./tts_dataset --repo-id user/my-voice --type audio
datasety upload --path ./lora_output --repo-id user/klein-lora --type model
datasety upload --path ./dataset --repo-id user/my-dataset --dry-run
```
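The generated README starts with Hub-compliant YAML frontmatter, which extra `--metadata` pairs are merged into. A card for an audio dataset might begin roughly like this (field values are illustrative):

```yaml
---
license: cc-by-4.0
language:
  - en
task_categories:
  - text-to-speech
---
```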

<details>
<summary>Options</summary>

| Option            | Description                                                                        | Default    |
| ----------------- | ---------------------------------------------------------------------------------- | ---------- |
| `--path`, `-p`    | Path to the dataset or model directory to upload                                   | required   |
| `--repo-id`, `-r` | HuggingFace repo ID (e.g. `username/my-dataset`). Derived from dir name if omitted | (derived)  |
| `--type`, `-t`    | Dataset or model type                                                              | `auto`     |
| `--private`       | Make the repository private                                                        | `false`    |
| `--token`         | HuggingFace API token (or set `HF_TOKEN` env var)                                  | `HF_TOKEN` |
| `--force`         | Force regenerate README.md if it already exists                                    | `false`    |
| `--dry-run`       | Show what would be uploaded without uploading                                      | `false`    |
| `--metadata`      | Extra YAML `key: value` pairs for dataset card frontmatter                         |            |
| `--yes`, `-y`     | Skip all confirmation prompts                                                      | `false`    |
| `--verbose`, `-V` | Print detailed progress messages                                                   | `false`    |

</details>

```bash
# Upload a TTS dataset (auto-generates README with TTS task card)
datasety upload --path ./tts_dataset --repo-id your-username/my-voice --private

# Upload a LoRA adapter
datasety upload --path ./lora.safetensors --repo-id your-username/klein-lora --type model

# Dry-run to verify what will be uploaded
datasety upload --path ./dataset --repo-id user/dataset --dry-run --verbose

# With extra metadata
datasety upload --path ./dataset --repo-id user/dataset \
    --metadata 'license:cc-by-4.0 language: [en,fr]'
```

[Full documentation →](https://kontextox.github.io/datasety/commands/upload)

---

### `workflow` — Multi-Step Pipelines

Run multi-step datasety pipelines from YAML or JSON files with dry-run validation.

<!-- screenshot: workflow -->

```bash
datasety workflow --file datasety.yaml --dry-run
```

<details>
<summary>Options</summary>

| Option         | Description                      | Default     |
| -------------- | -------------------------------- | ----------- |
| `--file`, `-f` | Path to workflow file            | auto-detect |
| `--dry-run`    | Validate steps without executing | off         |

</details>

Create `datasety.yaml`:

```yaml
steps:
  - command: resize
    args:
      input: ./raw
      output: ./resized
      resolution: 768x1024
  - command: caption
    args:
      input: ./resized
      output: ./resized
      llm-api: true
      model: gpt-5-nano
```
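Since the command accepts JSON as well, the same pipeline can equivalently be written as:

```json
{
  "steps": [
    {
      "command": "resize",
      "args": { "input": "./raw", "output": "./resized", "resolution": "768x1024" }
    },
    {
      "command": "caption",
      "args": { "input": "./resized", "output": "./resized", "llm-api": true, "model": "gpt-5-nano" }
    }
  ]
}
```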

```bash
# Validate first, then execute
datasety workflow --dry-run
datasety workflow
```

[Full documentation →](https://kontextox.github.io/datasety/commands/workflow)

---

## License

MIT
