hftool

A powerful CLI for running Hugging Face models: text-to-image, text-to-video, text-to-speech, speech-to-text, and more.

Features

Optimized for AMD ROCm (also supports NVIDIA CUDA, Apple MPS, and CPU).

Installation

Quick Install

pip install hftool

On first run, hftool checks whether PyTorch is missing or misconfigured and offers to install it for you:

============================================================
  hftool - First Time Setup
============================================================

Detected hardware:
  [✓] AMD GPU detected: Radeon RX 7900 XTX

Select PyTorch version to install:

  [1] NVIDIA GPU (CUDA)
  [2] AMD GPU (ROCm 6.2) (recommended)
  [3] Apple Silicon (MPS)
  [4] CPU only
  [5] Skip (install manually later)

Your choice [2]:

You can also run the setup wizard manually at any time:

hftool setup

Install with Specific Features

# Text-to-Image (Z-Image, SDXL, FLUX)
pip install "hftool[with_t2i]"

# Text-to-Video (HunyuanVideo, CogVideoX, Wan2.2)
pip install "hftool[with_t2v]"

# Text-to-Speech (VibeVoice, Bark, MMS-TTS)
pip install "hftool[with_tts]"

# Speech-to-Text (Whisper)
pip install "hftool[with_stt]"

# All features
pip install "hftool[all]"

System Requirements

Development Install

git clone https://github.com/zb-ss/hftool
cd hftool

# Install PyTorch first (see Quick Install above for your platform)
pip install torch torchvision torchaudio  # or with ROCm/CPU index

# Then install hftool in dev mode
pip install -e ".[dev]"  # Includes pytest

pipx Install (Isolated Environment)

# Install hftool
pipx install "hftool[all]"

# Then inject the correct PyTorch for your platform:
# NVIDIA:
pipx runpip hftool install torch torchvision torchaudio

# AMD ROCm:
pipx runpip hftool install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2

# CPU only:
pipx runpip hftool install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
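
If you script the PyTorch step, a small helper can pick the extra pip arguments per platform. This is a minimal sketch: the function name torch_index_args and the platform keywords are hypothetical, and the two index URLs are the ones listed above.

```shell
# Print the extra pip arguments for a platform keyword (nvidia, rocm, cpu).
# Hypothetical helper; only the URLs come from this README.
torch_index_args() {
    case "$1" in
        nvidia) ;;  # default PyPI wheels, no extra index needed
        rocm)   printf '%s\n' "--index-url https://download.pytorch.org/whl/rocm6.2" ;;
        cpu)    printf '%s\n' "--index-url https://download.pytorch.org/whl/cpu" ;;
        *)      echo "unknown platform: $1" >&2; return 1 ;;
    esac
}

# Usage (the unquoted substitution is intentional so the flag and URL split):
#   pipx runpip hftool install torch torchvision torchaudio $(torch_index_args rocm)
```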

Quick Start

# Generate an image (auto-opens when done!)
hftool -t t2i -i "A cat in space" -o cat.png

# Generate speech
hftool -t tts -i "Hello world" -o hello.wav

# Transcribe audio
hftool -t asr -i recording.wav -o transcript.txt

Auto-open: by default, generated image, audio, and video files open in your system’s default application as soon as the run completes.

When you run a task for the first time, hftool will prompt you to download the required model:

============================================================
Model not found: Z-Image Turbo
============================================================

  Task:     text-to-image
  Model:    Z-Image Turbo
  Repo:     Tongyi-MAI/Z-Image-Turbo
  Size:     ~6.0 GB
  Location: /home/user/.hftool/models/Tongyi-MAI--Z-Image-Turbo

Download this model now? [Y/n]:

Model Management

List Available Models

# List all models
hftool models

# List models for a specific task
hftool models -t text-to-image
hftool models -t t2i  # (using alias)

# Show only downloaded models
hftool models --downloaded

# Output as JSON
hftool models --json

Download Models

# Download default model for a task
hftool download -t text-to-image
hftool download -t t2i  # (using alias)

# Download specific model by short name
hftool download -t t2i -m sdxl

# Download by Hugging Face repo ID
hftool download -m openai/whisper-large-v3

# Download all default models for all tasks
hftool download --all

# Re-download (force)
hftool download -t t2i -f
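
To fetch the default models for a few tasks in one go (for example before going offline), the download command can be wrapped in a loop. This is a sketch; prefetch_models is a hypothetical name and it only uses the `hftool download -t` form shown above.

```shell
# prefetch_models TASK... : download the default model for each listed task.
# Stops at the first failed download.
prefetch_models() {
    for task in "$@"; do
        echo "Downloading default model for: $task"
        hftool download -t "$task" || return 1
    done
}

# Usage:
#   prefetch_models t2i tts asr
```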

Check Status

# Show downloaded models and disk usage
hftool status

Clean Up

# Interactive selection (default) - shows numbered list to choose from
hftool clean

# Delete specific model by name
hftool clean -m whisper-large-v3

# Delete multiple models at once
hftool clean -m whisper-large-v3 -m z-image-turbo

# Delete all downloaded models
hftool clean --all

# Skip confirmation prompts
hftool clean --all -y

Interactive selection example:

Downloaded models:
------------------------------------------------------------
  [ 1] Whisper Large v3 (automatic-speech-recognition)
       openai/whisper-large-v3 - 3.1 GB
  [ 2] Z-Image Turbo (text-to-image)
       Tongyi-MAI/Z-Image-Turbo - 6.0 GB
------------------------------------------------------------

Enter model numbers to delete (comma-separated, ranges with -, or 'all'):
Examples: 1,3,5  or  1-3  or  1,3-5,7  or  all

Selection []: 1,2
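
The selection grammar (comma-separated numbers and ranges) can be illustrated with a few lines of shell. This is only a sketch of how such input expands, not hftool's own parser; expand_selection is a hypothetical name, the `all` keyword is passed through unchanged, and duplicates are not removed.

```shell
# expand_selection "1,3-5,7" : print one model number per line.
expand_selection() {
    printf '%s\n' "$1" | tr -d ' ' | tr ',' '\n' | while IFS= read -r part; do
        case "$part" in
            *-*) seq "${part%-*}" "${part#*-}" ;;   # range such as 3-5
            *)   printf '%s\n' "$part" ;;           # single number (or "all")
        esac
    done
}

# Usage:
#   expand_selection "1,3-5,7"
```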

Custom Storage Location

By default, models are stored in ~/.hftool/models/. You can customize this:

# Set custom location via environment variable
export HFTOOL_MODELS_DIR=/path/to/models

# Or set it for a single command
HFTOOL_MODELS_DIR=/mnt/storage hftool -t t2i -i "A cat" -o cat.png
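
In scripts it can help to know where a model is expected to land. The sketch below resolves the storage directory from the environment and derives a folder name from a repo ID. Both function names are hypothetical, the "/" to "--" naming is inferred from the single example path in the download prompt rather than from hftool's code, and values set only in a .env file are not considered.

```shell
# Models directory: HFTOOL_MODELS_DIR if set, else the documented default.
hftool_models_dir() {
    printf '%s\n' "${HFTOOL_MODELS_DIR:-$HOME/.hftool/models}"
}

# Expected on-disk folder for a repo ID (assumption: "/" becomes "--").
hftool_model_path() {
    printf '%s/%s\n' "$(hftool_models_dir)" "$(printf '%s' "$1" | sed 's|/|--|g')"
}

# Usage:
#   [ -d "$(hftool_model_path openai/whisper-large-v3)" ] && echo "already downloaded"
```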

Using a .env file (recommended):

Create a .env file in your project directory or ~/.hftool/.env:

# .env
HFTOOL_MODELS_DIR=/data/models
HFTOOL_AUTO_DOWNLOAD=1
HFTOOL_AUTO_OPEN=0

hftool automatically loads .env files on startup.

Auto-Download Mode

To skip interactive prompts and auto-download models:

export HFTOOL_AUTO_DOWNLOAD=1
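
For cron jobs or CI, one option is a wrapper that enables auto-download and disables auto-open for a single invocation without touching your shell's environment. A minimal sketch; the wrapper name hftool_batch is hypothetical, and both variables are the ones documented in this README.

```shell
# Run hftool with no download prompt and no viewer window.
# The body is a subshell, so the exports do not leak into the caller.
hftool_batch() (
    export HFTOOL_AUTO_DOWNLOAD=1
    export HFTOOL_AUTO_OPEN=0
    hftool "$@"
)

# Usage:
#   hftool_batch -t tts -i "Hello world" -o hello.wav
```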

Auto-Open Output Files

By default, generated images, audio, and video files automatically open in your system’s default application when complete. Control this with:

# Always open (even text files)
hftool -t t2i -i "A cat" -o cat.png --open

# Never open
hftool -t t2i -i "A cat" -o cat.png --no-open

# Or set via environment variable
export HFTOOL_AUTO_OPEN=1    # Always open
export HFTOOL_AUTO_OPEN=0    # Never open

Default behavior: Auto-opens image, audio, and video files. Text output is printed to console.
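
When generating many files in a loop, --no-open keeps a viewer from popping up for each one. The sketch below reads one prompt per line from a file; t2i_from_file and the numbered output naming are hypothetical choices, and only flags shown above are used.

```shell
# t2i_from_file PROMPTS_FILE OUT_DIR
# Writes OUT_DIR/0001.png, 0002.png, ... for each non-empty line.
t2i_from_file() {
    mkdir -p "$2" || return 1
    n=0
    while IFS= read -r prompt || [ -n "$prompt" ]; do
        [ -n "$prompt" ] || continue              # skip blank lines
        n=$((n + 1))
        out=$(printf '%s/%04d.png' "$2" "$n")
        # </dev/null keeps hftool from reading the rest of the prompts file
        hftool -t t2i -i "$prompt" -o "$out" --no-open </dev/null || return 1
    done < "$1"
}

# Usage:
#   t2i_from_file prompts.txt renders
```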


Usage

Basic Syntax

hftool -t <task> -i <input> [-m <model>] [-o <output>] [-- extra_args]

List Available Tasks

hftool --list-tasks

Task Aliases

Alias      Full Name
t2i        text-to-image
t2v        text-to-video
tts        text-to-speech
asr, stt   automatic-speech-recognition
llm        text-generation
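
If your own scripts need the full task name (for logging or file naming, say), the alias table can be mirrored in a case statement. This is a sketch of the mapping above, not hftool's internal lookup; resolve_task is a hypothetical name, and anything not in the table is returned unchanged.

```shell
# resolve_task ALIAS : print the full task name for a documented alias.
resolve_task() {
    case "$1" in
        t2i)     echo text-to-image ;;
        t2v)     echo text-to-video ;;
        tts)     echo text-to-speech ;;
        asr|stt) echo automatic-speech-recognition ;;
        llm)     echo text-generation ;;
        *)       echo "$1" ;;   # assume it is already a full task name
    esac
}

# Usage:
#   resolve_task stt
```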

Examples

Text-to-Image

Generate images with Z-Image-Turbo (state-of-the-art open-source model):

# Basic usage (uses default model)
hftool -t t2i -i "A cat wearing a space helmet" -o cat_space.png

# With specific model
hftool -t t2i -m Tongyi-MAI/Z-Image-Turbo \
       -i "A photorealistic sunset over mountains" \
       -o sunset.png

# With custom parameters (Z-Image-Turbo uses 9 steps, guidance_scale=0)
hftool -t t2i -m Tongyi-MAI/Z-Image-Turbo \
       -i "A renaissance painting of a robot" \
       -o robot.png \
       -- --num_inference_steps 9 --guidance_scale 0.0 --height 1024 --width 1024

Other supported models:
- stabilityai/stable-diffusion-xl-base-1.0
- black-forest-labs/FLUX.1-schnell


Text-to-Video

Generate videos with HunyuanVideo-1.5:

# Basic usage (480p, ~2.5 second video)
hftool -t t2v -i "A person walking on a beach at sunset" -o beach.mp4

# With specific model and parameters
hftool -t t2v -m hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v \
       -i "A timelapse of clouds moving over a city" \
       -o clouds.mp4 \
       -- --num_frames 61 --num_inference_steps 30

# Image-to-Video (animate an image)
hftool -t i2v -m hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v \
       -i '{"image": "photo.jpg", "prompt": "The person waves hello"}' \
       -o animated.mp4

Other supported models:
- THUDM/CogVideoX-5b
- Wan-AI/Wan2.1-T2V-1.3B

Note: Requires system ffmpeg for video encoding.
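
Since video encoding depends on a system ffmpeg, a script can check for it up front rather than fail after a long generation. A minimal sketch using the standard `command -v` lookup; require_cmd is a hypothetical helper name.

```shell
# require_cmd NAME : succeed if NAME is on PATH, otherwise explain and fail.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found on PATH; install it first" >&2
    return 1
}

# Usage:
#   require_cmd ffmpeg && hftool -t t2v -i "A person walking on a beach at sunset" -o beach.mp4
```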


Text-to-Speech

Generate speech with VibeVoice:

# Basic usage
hftool -t tts -i "Hello, this is a test of the text to speech system." -o hello.wav

# With specific model
hftool -t tts -m microsoft/VibeVoice-Realtime-0.5B \
       -i "Welcome to hftool, your command-line AI assistant." \
       -o welcome.wav

# Output as MP3 (requires ffmpeg)
hftool -t tts -i "This will be saved as MP3." -o output.mp3

Other supported models:
- suno/bark-small (multi-language, sound effects)
- facebook/mms-tts-eng (lightweight)

GLM-TTS Setup (Advanced)

GLM-TTS requires manual installation:

# Clone the repository
git clone https://github.com/zai-org/GLM-TTS.git
cd GLM-TTS && pip install -r requirements.txt

# Set environment variable
export GLMTTS_PATH=/path/to/GLM-TTS

# Run
hftool -t tts -m zai-org/GLM-TTS -i "你好世界" -o hello_chinese.wav

Speech-to-Text (ASR)

Transcribe audio with Whisper:

# Basic transcription
hftool -t asr -i recording.wav -o transcript.txt

# With specific model
hftool -t asr -m openai/whisper-large-v3 -i podcast.mp3 -o transcript.txt

# With timestamps (outputs JSON)
hftool -t asr -i interview.wav -o transcript.json \
       -- --return_timestamps true

# Generate SRT subtitles
hftool -t asr -i video_audio.wav -o subtitles.srt \
       -- --return_timestamps true --format srt

Supported models:
- openai/whisper-large-v3 (best quality)
- openai/whisper-medium
- openai/whisper-small (fastest)
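
To subtitle a whole folder of recordings, the SRT command above can be run once per file. This is a sketch; transcribe_dir is a hypothetical name, it handles only .wav files, and it reuses the exact flags from the SRT example.

```shell
# transcribe_dir DIR : write NAME.srt next to every NAME.wav in DIR.
transcribe_dir() {
    for wav in "$1"/*.wav; do
        [ -e "$wav" ] || continue     # no matches: the glob stays literal
        hftool -t asr -i "$wav" -o "${wav%.wav}.srt" \
               -- --return_timestamps true --format srt || return 1
    done
}

# Usage:
#   transcribe_dir ./recordings
```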


Text Generation (LLMs)

Run language models:

# Basic generation
hftool -t llm -m meta-llama/Llama-3.2-1B-Instruct \
       -i "Explain quantum computing in simple terms:" \
       -o response.txt \
       -- --max_new_tokens 200

Other Tasks

# Image Classification
hftool -t image-classification -m google/vit-base-patch16-224 \
       -i photo.jpg -o result.json

# Object Detection
hftool -t object-detection -m facebook/detr-resnet-50 \
       -i street.jpg -o detections.json

# Summarization
hftool -t summarization -m facebook/bart-large-cnn \
       -i article.txt -o summary.txt

# Translation
hftool -t translation -m Helsinki-NLP/opus-mt-en-de \
       -i "Hello, how are you?" -o translation.txt

CLI Reference

Main Command

Usage: hftool [OPTIONS] COMMAND [ARGS]...

Options:
  -t, --task TEXT         Task to perform
  -m, --model TEXT        Model name/path (uses task default if omitted)
  -i, --input TEXT        Input data: text, file path, or URL
  -o, --output-file TEXT  Output file path (auto-generated if omitted)
  -d, --device TEXT       Device: auto, cuda, mps, cpu (default: auto)
  --dtype TEXT            Data type: bfloat16, float16, float32
  --open / --no-open      Open output with default app (auto for media files)
  --list-tasks            List all available tasks and aliases
  -v, --verbose           Show detailed progress
  --help                  Show this message and exit

Commands:
  setup     Run interactive PyTorch setup wizard
  models    List available models for tasks
  download  Download models from HuggingFace Hub
  status    Show download status and disk usage
  clean     Delete downloaded models
  run       Run a task (alternative to -t flag)

Environment Variables

Variable                  Description                                               Default
HFTOOL_MODELS_DIR         Custom models storage directory                           ~/.hftool/models/
HFTOOL_AUTO_DOWNLOAD      Auto-download models without prompting                    0 (disabled)
HFTOOL_AUTO_OPEN          Auto-open output files                                    auto (media files only)
HFTOOL_ROCM_PATH          Path to ROCm libraries (e.g., Ollama’s bundled ROCm)      (none)
HSA_OVERRIDE_GFX_VERSION  AMD GPU architecture override (e.g., 11.0.0 for RX 7900)  (none)

Passing Model-Specific Arguments

Use -- to pass additional arguments to the underlying model:

hftool -t t2i -i "A cat" -o cat.png \
       -- --num_inference_steps 20 --guidance_scale 7.5 --seed 42
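
Because everything after -- is forwarded to the model, it is easy to sweep one parameter. The sketch below renders the same prompt with several seeds; seed_sweep and the PREFIX_SEED.png naming are hypothetical, and --seed is used as in the example above.

```shell
# seed_sweep PROMPT PREFIX SEED... : one image per seed, named PREFIX_SEED.png.
seed_sweep() {
    prompt=$1
    prefix=$2
    shift 2
    for seed in "$@"; do
        hftool -t t2i -i "$prompt" -o "${prefix}_${seed}.png" --no-open \
               -- --seed "$seed" || return 1
    done
}

# Usage:
#   seed_sweep "A cat" cat 1 42 1234
```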

Hardware Recommendations

AMD ROCm (Primary Target)

hftool is optimized for AMD GPUs with ROCm 6.x:

Task            Model              VRAM Required  Notes
Text-to-Image   Z-Image-Turbo      ~10-12 GB      Comfortable on RX 7900 XTX
Text-to-Video   HunyuanVideo 480p  ~20-24 GB      Use CPU offload
Text-to-Video   HunyuanVideo 720p  ~30-40 GB      Requires multi-GPU
Text-to-Speech  VibeVoice          ~2-4 GB        Easy
Speech-to-Text  Whisper-large-v3   ~4-6 GB        Easy

ROCm Setup (Without System-Wide Installation)

If you have Ollama installed, you can use its bundled ROCm libraries instead of installing ROCm system-wide (which can interfere with gaming GPU drivers).

Step 1: Install PyTorch ROCm in your hftool environment:

# If using pipx:
pipx runpip hftool uninstall torch torchvision torchaudio -y
pipx runpip hftool install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2

# If using pip:
pip uninstall torch torchvision torchaudio -y
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2

Step 2: Add ROCm configuration to your .env file (~/.hftool/.env or project directory):

# Use Ollama's bundled ROCm libraries
HFTOOL_ROCM_PATH=/usr/local/lib/ollama/rocm

# Set your GPU architecture (required for AMD GPUs)
# RDNA3: gfx1100 (RX 7900 XTX/XT), gfx1101 (RX 7800/7700), gfx1102 (RX 7600)
# RDNA2: gfx1030 (RX 6900/6800), gfx1031 (RX 6700), gfx1032 (RX 6600)
HSA_OVERRIDE_GFX_VERSION=11.0.0
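
The comments above name gfx targets while the variable takes a dotted version, so a lookup can make the relationship explicit. Treat this as a sketch under assumptions: 11.0.0 for RDNA3 comes from this README, while 10.3.0 for RDNA2 and applying one value to the whole family follow common community practice and are not confirmed by hftool. The name gfx_override is hypothetical.

```shell
# gfx_override GFX_TARGET : print a candidate HSA_OVERRIDE_GFX_VERSION value.
gfx_override() {
    case "$1" in
        gfx110[0-2]) echo 11.0.0 ;;   # RDNA3 (value from this README)
        gfx103[0-2]) echo 10.3.0 ;;   # RDNA2 (assumed, see note above)
        *) echo "no known override for $1" >&2; return 1 ;;
    esac
}

# Usage:
#   echo "HSA_OVERRIDE_GFX_VERSION=$(gfx_override gfx1100)" >> ~/.hftool/.env
```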

Step 3: Verify GPU detection:

hftool -t t2i -i "test" -o test.png -v
# Should show "Using device: cuda" or similar (PyTorch's ROCm build reports AMD GPUs as "cuda")

NVIDIA CUDA

Works with CUDA 11.8+ and modern NVIDIA GPUs.

Apple Silicon (MPS)

Basic support for M1/M2/M3 Macs. Some models may require --dtype float32.

CPU

Works but slow. Use smaller models:
- openai/whisper-small for ASR
- suno/bark-small for TTS


Project Structure

hftool/
├── cli.py              # CLI entry point with subcommands
├── core/
│   ├── device.py       # ROCm/CUDA/MPS/CPU detection
│   ├── registry.py     # Task registry and configuration
│   ├── models.py       # Model registry with download metadata
│   └── download.py     # Model download manager
├── tasks/
│   ├── base.py         # Abstract base task class
│   ├── text_to_image.py
│   ├── text_to_video.py
│   ├── text_to_speech.py
│   ├── speech_to_text.py
│   └── transformers_generic.py
├── io/
│   ├── input_loader.py # Input handling
│   └── output_handler.py # Output handling (ffmpeg)
└── utils/
    └── deps.py         # Dependency checking

Running Tests

pip install -e ".[dev]"
pytest tests/ -v

License

MIT License


Model References