Metadata-Version: 2.4
Name: ytranscript
Version: 0.1.0
Summary: Fetch YouTube video transcripts with automatic Whisper fallback
Project-URL: Homepage, https://github.com/Decolo/ytranscript
Project-URL: Issues, https://github.com/Decolo/ytranscript/issues
License: MIT
License-File: LICENSE
Keywords: cli,subtitles,transcript,whisper,youtube
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Video
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Requires-Dist: faster-whisper>=1.2.1
Requires-Dist: youtube-transcript-api>=1.2.4
Requires-Dist: yt-dlp>=2026.3.17
Description-Content-Type: text/markdown

# ytranscript

Fetch YouTube video transcripts from the command line. `ytranscript` uses YouTube's caption API when captions are available and falls back to local [Whisper](https://github.com/SYSTRAN/faster-whisper) transcription for videos without captions.

## Installation

```bash
pip install ytranscript
```

**System requirement:** [ffmpeg](https://ffmpeg.org/download.html) must be installed for the Whisper fallback.

```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg
```

## Usage

```bash
# Basic — auto-detect language
ytranscript 'https://www.youtube.com/watch?v=VIDEO_ID'

# Markdown output with metadata
ytranscript URL -f md

# JSON output (for pipelines)
ytranscript URL -f json

# Include timestamps
ytranscript URL --timestamps

# Force Whisper transcription
ytranscript URL --whisper

# Specify language
ytranscript URL --lang zh

# Choose Whisper model (tiny/base/small/medium/large)
ytranscript URL --model small
```

## Batch processing

```bash
# Multiple URLs at once
ytranscript URL1 URL2 URL3 --output-dir ./transcripts/

# From a file (one URL per line)
ytranscript --batch urls.txt --output-dir ./transcripts/

# Entire playlist or channel
ytranscript 'https://www.youtube.com/playlist?list=PLAYLIST_ID' --output-dir ./transcripts/
```
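
The batch inputs above map naturally onto per-video output files. The following is a minimal sketch of that mapping, not the tool's actual code: the helper names, the handling of blank lines, the URL shapes recognized, and the `plain` to `txt` extension mapping are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed mapping from the -f format to a file extension; the real tool may differ.
EXTENSIONS = {"plain": "txt", "json": "json", "md": "md"}


def read_batch_file(text: str) -> list[str]:
    """Return the URLs in a batch file: one per line, blank lines skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def video_id_from_url(url: str) -> Optional[str]:
    """Extract the video ID from watch URLs and youtu.be short links."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID in the path: https://youtu.be/VIDEO_ID
        return parsed.path.lstrip("/") or None
    if host.endswith("youtube.com"):
        # Watch URLs carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


def output_path(output_dir: str, video_id: str, fmt: str = "plain") -> Path:
    """Build DIR/{video_id}.{ext} for one transcript."""
    return Path(output_dir) / f"{video_id}.{EXTENSIONS[fmt]}"
```

Playlist and channel URLs carry no single video ID, so this sketch returns `None` for them; expanding them into individual videos is left out here.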

## Options

| Flag | Description |
|------|-------------|
| `-f plain\|json\|md` | Output format (default: plain) |
| `--lang CODE` | Language code, e.g. `en`, `zh`, `ja` (default: auto-detect) |
| `--timestamps` | Prefix each line with `[MM:SS]` |
| `--whisper` | Force local Whisper transcription |
| `--model SIZE` | Whisper model: `tiny`, `base`, `small`, `medium`, `large` (default: `base`) |
| `--no-metadata` | Skip title/channel/duration fetch |
| `--no-cache` | Bypass disk cache |
| `--batch FILE` | Read URLs from a file |
| `--output-dir DIR` | Write each transcript to `DIR/{video_id}.{ext}` |
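
To illustrate the `[MM:SS]` prefix that `--timestamps` adds, here is a small sketch of one way to render it. The function names are hypothetical, and two behaviours are assumptions: fractional seconds are truncated, and minutes keep counting past 59 rather than rolling over into hours.

```python
def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as an [MM:SS] prefix."""
    total = int(seconds)  # assumption: fractional seconds are dropped
    minutes, secs = divmod(total, 60)
    # Assumption: no hours field, so a 62-minute offset prints as [62:05].
    return f"[{minutes:02d}:{secs:02d}]"


def prefix_lines(segments: list[tuple[float, str]]) -> list[str]:
    """Prefix each (start_seconds, text) segment with its timestamp."""
    return [f"{format_timestamp(start)} {text}" for start, text in segments]
```

For example, `prefix_lines([(5.2, "hello")])` produces `["[00:05] hello"]`.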

## Caching

Transcripts are cached in `~/.cache/ytranscript/` by default, so subsequent calls for the same video are served from disk instead of being fetched again. Use `--no-cache` to force a fresh fetch.

## Environment variables

| Variable | Description |
|----------|-------------|
| `YT_TRANSCRIPT_MODEL` | Default Whisper model size (e.g. `small`) |
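
One plausible way the model size could be resolved is sketched below. The precedence shown (the `--model` flag first, then `YT_TRANSCRIPT_MODEL`, then the `base` default) is an assumption, as are the `resolve_model` name and the validation step.

```python
import os
from typing import Mapping, Optional

VALID_MODELS = ("tiny", "base", "small", "medium", "large")
DEFAULT_MODEL = "base"


def resolve_model(
    cli_value: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the Whisper model size from the flag, the environment, or the default."""
    env = os.environ if env is None else env
    # Assumed precedence: explicit flag, then environment variable, then default.
    model = cli_value or env.get("YT_TRANSCRIPT_MODEL") or DEFAULT_MODEL
    if model not in VALID_MODELS:
        raise ValueError(f"unknown Whisper model size: {model!r}")
    return model
```

Accepting `env` as a parameter keeps the function easy to test without touching the real process environment.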

## How it works

1. Tries [youtube-transcript-api](https://github.com/jdepoix/youtube-transcript-api) — fast, no download needed
2. If no captions exist, downloads audio via [yt-dlp](https://github.com/yt-dlp/yt-dlp) and transcribes locally with [faster-whisper](https://github.com/SYSTRAN/faster-whisper)

Together, these two steps cover nearly all videos, including those without any captions.
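
The two-step flow above can be sketched as a simple try-then-fall-back function. To keep the example self-contained, the caption fetcher and the Whisper transcriber are passed in as plain callables; `NoCaptionsError`, `get_transcript`, and the returned `(source, text)` pair are hypothetical names, not part of ytranscript or of the libraries it depends on.

```python
from typing import Callable, Tuple


class NoCaptionsError(Exception):
    """Hypothetical signal that a video has no usable captions."""


def get_transcript(
    video_id: str,
    fetch_captions: Callable[[str], str],
    transcribe_audio: Callable[[str], str],
    force_whisper: bool = False,
) -> Tuple[str, str]:
    """Return (source, text), preferring captions over local transcription."""
    if not force_whisper:
        try:
            # Step 1: the fast path, no audio download needed.
            return "captions", fetch_captions(video_id)
        except NoCaptionsError:
            pass  # fall through to the slower local path
    # Step 2: download the audio and transcribe it locally.
    return "whisper", transcribe_audio(video_id)
```

In a real implementation, `fetch_captions` would wrap youtube-transcript-api and translate its "no transcript" errors into `NoCaptionsError`, while `transcribe_audio` would wrap the yt-dlp download and the faster-whisper call. Setting `force_whisper=True` corresponds to the `--whisper` flag.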

## License

MIT
