Metadata-Version: 2.4
Name: stt-utils
Version: 0.1.2
Summary: A utility library for working with speech-to-text transcriptions.
Keywords: stt,speech to text,whisper,alignment,utility,audio,timestamps,transcription,closed captions,subtitles
Author: Mikhail Pankin
Author-email: Mikhail Pankin <mishapankin@gmail.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: 3.15
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: pydantic>=2.10
Requires-Python: >=3.9
Project-URL: Documentation, https://github.com/mishapankin/stt-utils#readme
Project-URL: Homepage, https://github.com/mishapankin/stt-utils
Project-URL: Issues, https://github.com/mishapankin/stt-utils/issues
Project-URL: Repository, https://github.com/mishapankin/stt-utils
Description-Content-Type: text/markdown

# stt-utils

[![PyPI version](https://badge.fury.io/py/stt-utils.svg)](https://pypi.org/project/stt-utils/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A utility library for working with word-level speech-to-text timestamps. Its main goal is to simplify working with the timestamps produced by OpenAI's Whisper model when it is called with `timestamp_granularities=["word"]`.

## Features
- Realign word timestamps with the full text
- Merge several transcriptions together
- Split long audio into smaller segments at moments of silence (requires an optional dependency)

## Installation

```bash
pip install stt-utils
```

## Usage

### Python API

Align word timestamps returned by OpenAI Whisper:
```python
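from openai import OpenAI
from stt_utils import UnprocessedTranscription, Transcription

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Request word-level timestamps; "audio.mp3" is a placeholder path
with open("audio.mp3", "rb") as audio_file:
    transcription = openai_client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word"],
    )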

unprocessed_transcription = UnprocessedTranscription(**transcription.model_dump())
aligned_transcription = Transcription.from_unprocessed_transcription(unprocessed_transcription)

aligned_transcription.dump_preview()
```
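
To illustrate what realigning word timestamps with the full text means, here is a minimal standalone sketch. It is not the implementation used by stt-utils: the `Word` class and the `align_words_to_text` and `attach_punctuation` helpers are hypothetical names introduced only for this example. It assumes the word list comes without punctuation while the full text contains it, and that every word appears in the text in order.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical container for one timestamped word (not the stt-utils model)
    word: str
    start: float
    end: float


def align_words_to_text(text: str, words: list[Word]) -> list[tuple[int, int, Word]]:
    """Find each word in the full text, scanning left to right.

    Returns (start_char, end_char, word) for every word that was found.
    Matching is a plain case-insensitive substring search, so this sketch
    can be fooled by words embedded in longer words.
    """
    spans = []
    cursor = 0
    lowered = text.lower()
    for w in words:
        idx = lowered.find(w.word.lower(), cursor)
        if idx == -1:
            continue  # word not present after the cursor; skip it
        end = idx + len(w.word)
        spans.append((idx, end, w))
        cursor = end
    return spans


def attach_punctuation(text: str, spans: list[tuple[int, int, Word]]) -> list[Word]:
    """Extend each word up to the start of the next one, keeping its timestamps."""
    result = []
    for i, (start, _end, w) in enumerate(spans):
        next_start = spans[i + 1][0] if i + 1 < len(spans) else len(text)
        result.append(Word(text[start:next_start].strip(), w.start, w.end))
    return result


text = "Hello, world! How are you?"
words = [
    Word("Hello", 0.0, 0.4),
    Word("world", 0.5, 0.9),
    Word("How", 1.2, 1.4),
    Word("are", 1.4, 1.6),
    Word("you", 1.6, 1.9),
]

aligned = attach_punctuation(text, align_words_to_text(text, words))
for w in aligned:
    print(f"{w.start:.1f}-{w.end:.1f}  {w.word}")
```

Each aligned word keeps its original start and end times but now carries the punctuation from the full text, which is the form usually needed for subtitles or closed captions.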

## Development
It is recommended to use the [uv](https://docs.astral.sh/uv/) toolset for development.

## Testing
Unit tests are available in the `tests/` directory. Run them with:
```bash
uv run pytest
```


## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.