Metadata-Version: 2.4
Name: audia
Version: 0.1.2
Summary: An agentic Python package that converts ideas and documents into audio – PDF papers, reports, and regulations turned into podcast-style audio files.
Author: Yauheniya Varabyova
Maintainer: Yauheniya Varabyova
Project-URL: Changelog, https://github.com/yauheniya-ai/audia/blob/main/CHANGELOG.md
Project-URL: Documentation, https://audia.readthedocs.io
Project-URL: Repository, https://github.com/yauheniya-ai/audia
Keywords: audio,tts,stt,pdf,arxiv,research,speech,langgraph,agents
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: requests>=2.32.0
Requires-Dist: PyMuPDF>=1.22.0
Requires-Dist: fastapi>=0.110
Requires-Dist: uvicorn[standard]>=0.29
Requires-Dist: python-multipart>=0.0.9
Requires-Dist: jinja2>=3.1
Requires-Dist: aiofiles>=23.0
Requires-Dist: httpx>=0.27
Requires-Dist: typer>=0.12
Requires-Dist: rich>=13.0
Requires-Dist: langgraph>=0.2
Requires-Dist: langchain>=0.3
Requires-Dist: langchain-core>=0.3
Requires-Dist: sqlalchemy>=2.0
Requires-Dist: arxiv>=2.1
Requires-Dist: edge-tts>=7.0
Requires-Dist: soundfile>=0.12
Requires-Dist: numpy>=1.24
Requires-Dist: langchain-anthropic>=0.3
Requires-Dist: anthropic>=0.30
Requires-Dist: langchain-openai>=0.2
Requires-Dist: openai>=1.30
Requires-Dist: faster-whisper>=1.0
Requires-Dist: sounddevice>=0.4
Provides-Extra: kokoro
Requires-Dist: kokoro>=0.9; extra == "kokoro"
Requires-Dist: sounddevice>=0.4; extra == "kokoro"
Provides-Extra: docs
Requires-Dist: sphinx>=5.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.0; extra == "docs"
Requires-Dist: sphinx-copybutton>=0.5; extra == "docs"
Requires-Dist: myst-parser>=0.18; extra == "docs"
Provides-Extra: dev
Requires-Dist: twine; extra == "dev"
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: pytest-mock; extra == "dev"
Requires-Dist: black>=22.0; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Requires-Dist: pre-commit>=3.0; extra == "dev"
Requires-Dist: audia[kokoro]; extra == "dev"
Requires-Dist: audia[docs]; extra == "dev"
Dynamic: license-file

# <img src="https://api.iconify.design/streamline-freehand:help-headphones-customer-support-human.svg" width="24" height="24"> audia — turn your ideas into audio

<div align="center">

[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-purple.svg)](https://opensource.org/licenses/MIT)
[![PyPI version](https://img.shields.io/pypi/v/audia?color=blue&label=PyPI)](https://pypi.org/project/audia/)
[![PyPI Downloads](https://img.shields.io/pypi/dm/audia)](https://pypistats.org/packages/audia)
[![Tests](https://github.com/yauheniya-ai/audia/actions/workflows/tests.yml/badge.svg)](https://github.com/yauheniya-ai/audia/actions/workflows/tests.yml)
[![Coverage](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/yauheniya-ai/88593f7c590674e0f8c99c66c7b58b36/raw/coverage.json)](https://github.com/yauheniya-ai/audia/actions/workflows/tests.yml)
[![GitHub last commit](https://img.shields.io/github/last-commit/yauheniya-ai/audia)](https://github.com/yauheniya-ai/audia/commits/main)

</div>

**audia** is an agentic Python package that converts PDFs — academic papers, reports, regulations — into podcast-style audio files.
It uses an LLM to rewrite content into natural spoken language (math in plain English, tables as sentences, no citations) before passing it to a TTS engine, so the result actually sounds good when read aloud.

<p align="center">
  <img src="https://raw.githubusercontent.com/yauheniya-ai/audia/main/docs/images/Screenshot_CLI.png" width="100%" />
  <em>The audia CLI</em>
</p>

## Features

- **LLM-curated text** — mandatory LLM pass rewrites math notation, condenses tables and acknowledgements, removes citation artefacts, and ensures smooth spoken flow
- **Chunk-level stitching** — long documents are split at paragraph boundaries; each chunk receives the tail of the previous curated output as transition context
- **ArXiv research** — search papers by query and convert them to audio in one command
- **Voice input (STT)** — record a spoken query to trigger an ArXiv search
- **Multiple TTS backends** — `edge-tts` (default, free), `kokoro` (local), or OpenAI TTS
- **Multiple LLM backends** — OpenAI (`gpt-4o-mini` default) or Anthropic
- **CLI** — `audia convert`, `research`, `listen`, `serve`, `info`
- **Web UI** — FastAPI backend + SPA frontend
- **Local storage** — SQLite database for papers and audio files via SQLAlchemy
- **Debug output** — every run saves raw, preprocessed, and curated text to `~/.audia/debug/<run_id>/`
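
The chunk-level stitching above can be sketched in plain Python. This is illustrative only: the function and parameter names are assumptions, not the package's actual API, and the LLM call is replaced by an identity pass-through:

```python
def split_at_paragraphs(text: str, max_chars: int = 3800) -> list[str]:
    """Split text into chunks at paragraph boundaries, never mid-paragraph."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def curate_with_context(chunks: list[str], tail_chars: int = 300) -> list[str]:
    """Curate each chunk, handing it the tail of the previous curated output."""
    curated: list[str] = []
    for chunk in chunks:
        # The real pipeline would send (chunk, context) to the LLM here;
        # this sketch just passes the chunk through unchanged.
        context = curated[-1][-tail_chars:] if curated else ""
        curated.append(chunk)
    return curated
```

The paragraph-boundary split matters for TTS quality: cutting mid-sentence produces audible seams, while paragraph breaks give the engine natural pause points.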

## Tech Stack

**Backend**
- <img src="https://api.iconify.design/devicon:python.svg" width="16" height="16"> [Python](https://www.python.org) 3.10+ — package language
- <img src="https://api.iconify.design/devicon:fastapi.svg" width="16" height="16"> [FastAPI](https://fastapi.tiangolo.com) — backend for the web UI
- <img src="https://api.iconify.design/simple-icons:langgraph.svg" width="16" height="16"> [LangGraph](https://github.com/langchain-ai/langgraph) — agentic pipeline orchestration (PDF → preprocess → LLM curate → TTS)
- <img src="https://api.iconify.design/simple-icons:langchain.svg" width="16" height="16"> [LangChain](https://github.com/langchain-ai/langchain) — LLM abstraction (OpenAI / Anthropic)
- <img src="https://api.iconify.design/logos:microsoft-icon.svg" width="16" height="16"> [edge-tts](https://github.com/rany2/edge-tts) — default TTS backend, no API key required
- <img src="https://upload.wikimedia.org/wikipedia/commons/d/da/SYSTRAN_logo.svg" width="46" height="16"> [faster-whisper](https://github.com/SYSTRAN/faster-whisper) — STT for voice input
- <img src="https://pymupdf.readthedocs.io/en/latest/_static/sidebar-logo-dark.svg" width="16" height="16"> [PyMuPDF](https://pymupdf.readthedocs.io/) — PDF text extraction
- <img src="https://api.iconify.design/devicon:sqlite.svg" width="16" height="16"> [SQLite](https://sqlite.org/docs.html) — local database for papers and audio files

**Frontend**
- <img src="https://api.iconify.design/devicon:react.svg" width="16" height="16"> [React](https://react.dev) — interactive frontend
- <img src="https://api.iconify.design/devicon:vitejs.svg" width="16" height="16"> [Vite](https://vite.dev) — fast dev server and production bundler
- <img src="https://api.iconify.design/devicon:tailwindcss.svg" width="16" height="16"> [Tailwind CSS](https://v2.tailwindcss.com/docs) — utility-first styling
- <img src="https://api.iconify.design/devicon:typescript.svg" width="16" height="16"> [TypeScript](https://www.typescriptlang.org/docs/) — type-safe component and API code

**CLI**
- <img src="https://api.iconify.design/devicon:typer.svg" width="16" height="16"> [Typer](https://typer.tiangolo.com/) + [Rich](https://rich.readthedocs.io/) — CLI with coloured progress output

**Packaging**
- <img src="https://api.iconify.design/devicon:pypi.svg" width="16" height="16"> [PyPI](https://pypi.org/project/audia/) — distributed as an installable Python package

## Installation

```bash
pip install audia
```

For CLI usage, [pipx](https://pipx.pypa.io/) is recommended — it installs `audia` in an isolated environment while exposing the command globally:

```bash
pipx install "audia"
```

Optional extras:

| Extra | Installs |
|---|---|
| `kokoro` | local Kokoro TTS |

```bash
pip install "audia[kokoro]"
```

## Configuration

Copy `.env.example` to `.env` in your working directory and set your API key:

```bash
cp .env.example .env
```

Minimum required settings:

```dotenv
AUDIA_LLM_PROVIDER=openai           # or anthropic
AUDIA_OPENAI_API_KEY=sk-...
```

All settings use the `AUDIA_` prefix. Run `audia info` to see the active configuration.

## Quick Start

**Show active configuration:**

```bash
audia info
```

**Convert a local PDF:**

```bash
audia convert paper.pdf
```

**Convert multiple PDFs to a specific output folder:**

```bash
audia convert paper1.pdf paper2.pdf --output ~/audiobooks
```

**Search ArXiv and convert the top results:**

```bash
audia research "retrieval augmented generation" --max-results 3 --convert
```

**Start the web UI:**

```bash
audia serve
# → http://localhost:8000
```

## Pipeline

The pipeline can be entered in three ways:

| Entry point | Command |
|---|---|
| Voice input | `audia listen` — record speech, LLM distils a search query, confirm, then runs the full pipeline |
| Text query | `audia research "retrieval augmented generation"` — search ArXiv by text, select papers, run pipeline |
| Local PDF | `audia convert paper.pdf` — skip Step 0, go straight to extraction |

When starting from voice or text, the full five-step [LangGraph](https://github.com/langchain-ai/langgraph) pipeline runs. For local PDFs, Steps 1–4 run directly:

```
 [voice input]          [text query]
      │                      │
      ▼                      │
  Microphone                 │
  (faster-whisper STT)       │
      │                      │
      ▼                      │
  LLM query distillation     │        ← extracts concise ArXiv search terms
      │                      │           from natural speech
      ▼                      │
  Confirm / re-record?       │
      │  yes                 │
      ▼                      ▼
Step 0 — ArXiv search    (or use local PDF)
 │        arxiv API: fetch metadata, download PDF
 │
 ▼
Step 1 — PDF extraction       PyMuPDF: text + metadata per page
 │
 ▼
Step 2 — Heuristic pre-pass   Regex: strip citations, LaTeX commands, figure captions
 │
 ▼
Step 3 — LLM curation         Chunked LLM pass: math → English, tables → sentences,
 │                             smooth spoken transitions between chunks
 ▼
Step 4 — TTS synthesis        edge-tts (or kokoro / OpenAI): split into ~3800-char
                               chunks, synthesise, concatenate → .mp3
```
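
For local PDFs, Steps 1–4 amount to a linear chain of state transformations. A minimal plain-Python sketch of that flow (the real pipeline wires these as LangGraph nodes; the function names and state keys here are illustrative, not the package's API):

```python
from typing import Callable

# Each step takes the pipeline state (a dict) and returns an updated copy.
Step = Callable[[dict], dict]


def extract(state: dict) -> dict:      # Step 1: PyMuPDF text extraction
    return {**state, "raw": f"text of {state['pdf']}"}


def preprocess(state: dict) -> dict:   # Step 2: regex strips citations, LaTeX, captions
    return {**state, "preprocessed": state["raw"].replace("[1]", "")}


def curate(state: dict) -> dict:       # Step 3: chunked LLM rewrite (identity here)
    return {**state, "curated": state["preprocessed"]}


def synthesise(state: dict) -> dict:   # Step 4: TTS output as bytes (stands in for .mp3)
    return {**state, "audio": state["curated"].encode()}


def run_pipeline(pdf_path: str) -> dict:
    state: dict = {"pdf": pdf_path}
    for step in (extract, preprocess, curate, synthesise):
        state = step(state)
    return state
```

Because each step only adds keys to the state, intermediate artefacts (`raw`, `preprocessed`, `curated`) stay available at the end, which is what makes the per-run debug dump in `~/.audia/debug/` possible.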

Output files for a run on `2025_Xu+.pdf`:

```
~/.audia/audio/2025_Xu+_20260329_084445.mp3
~/.audia/debug/2025_Xu+_20260329_084445/
    1_raw.txt            ← PyMuPDF output
    2_preprocessed.txt   ← after heuristic pass
    3_curated.txt        ← after LLM curation
```

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/my-change`)
3. Make your changes
4. Run the test suite: `pytest --cov=src --cov-report=term-missing`
5. Submit a pull request

## License

MIT — see [LICENSE](https://raw.githubusercontent.com/yauheniya-ai/audia/main/LICENSE) for details.
