Metadata-Version: 2.4
Name: atom-audio-engine
Version: 0.1.6
Summary: A pluggable, async-first Python framework for real-time audio-to-audio conversational AI
Author-email: ATOM Group <info@atomgroup.ng>
License-Expression: MIT
Project-URL: Homepage, https://github.com/ATOM-GROUP-NG/audio-engine
Project-URL: Repository, https://github.com/ATOM-GROUP-NG/audio-engine.git
Project-URL: Issues, https://github.com/ATOM-GROUP-NG/audio-engine/issues
Keywords: audio,speech-to-text,text-to-speech,llm,conversational-ai,real-time,streaming,websocket
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: websockets>=12.0
Requires-Dist: aiohttp>=3.9.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: scipy>=1.10.0
Provides-Extra: asr
Requires-Dist: openai>=1.0.0; extra == "asr"
Requires-Dist: deepgram-sdk>=3.0.0; extra == "asr"
Requires-Dist: assemblyai>=0.20.0; extra == "asr"
Requires-Dist: cartesia>=1.0.0; extra == "asr"
Provides-Extra: llm
Requires-Dist: anthropic>=0.18.0; extra == "llm"
Requires-Dist: groq>=0.4.0; extra == "llm"
Provides-Extra: tts
Requires-Dist: cartesia>=1.0.0; extra == "tts"
Requires-Dist: elevenlabs>=1.0.0; extra == "tts"
Provides-Extra: all
Requires-Dist: openai>=1.0.0; extra == "all"
Requires-Dist: deepgram-sdk>=3.0.0; extra == "all"
Requires-Dist: assemblyai>=0.20.0; extra == "all"
Requires-Dist: cartesia>=1.0.0; extra == "all"
Requires-Dist: anthropic>=0.18.0; extra == "all"
Requires-Dist: groq>=0.4.0; extra == "all"
Requires-Dist: elevenlabs>=1.0.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"

# Audio Engine

A pluggable audio-to-audio conversational engine with real-time streaming support.

## Features

- **Pluggable Architecture**: Swap ASR, LLM, and TTS providers easily
- **Real-time Streaming**: WebSocket server for low-latency conversations
- **GeneFace++ Integration**: Optional face animation from audio
- **Simple API**: Get started with just a few lines of code

## Installation

```bash
pip install atom-audio-engine
```

For development with all optional dependencies:

```bash
pip install "atom-audio-engine[all,dev]"
```

(The quotes keep shells like zsh from treating the brackets as a glob pattern.)

## Quick Start

### Basic Usage

```python
import asyncio

from audio_engine import Pipeline
from audio_engine.asr import CartesiaASR
from audio_engine.llm import GroqLLM
from audio_engine.tts import CartesiaTTS

# Create pipeline with your providers
pipeline = Pipeline(
    asr=CartesiaASR(api_key="your-cartesia-key"),
    llm=GroqLLM(api_key="your-groq-key", model="mixtral-8x7b-32768"),
    tts=CartesiaTTS(api_key="your-cartesia-key", voice_id="your-voice-id"),
    system_prompt="You are a helpful assistant.",
)

async def main():
    async with pipeline:
        # Simple: process complete audio
        response_audio = await pipeline.process(input_audio_bytes)

        # Streaming: lower latency
        async for chunk in pipeline.stream(audio_stream):
            play_audio(chunk)

asyncio.run(main())
```

### WebSocket Server

```python
import asyncio

from audio_engine import Pipeline
from audio_engine.streaming import WebSocketServer

pipeline = Pipeline(asr=..., llm=..., tts=...)
server = WebSocketServer(pipeline, host="0.0.0.0", port=8765)

asyncio.run(server.start())
```

### With GeneFace++ Face Animation

```python
from audio_engine.integrations.geneface import GeneFacePipelineWrapper, GeneFaceConfig

wrapped = GeneFacePipelineWrapper(
    pipeline=pipeline,
    geneface_config=GeneFaceConfig(
        geneface_path="/path/to/ai-geneface-realtime"
    )
)

audio, video_path = await wrapped.process_with_video(input_audio)
```

## Architecture

```
User Audio → ASR → LLM → TTS → Response Audio
                           ↓
                    GeneFace++ (optional)
                           ↓
                    Animated Face Video
```

## Directory Structure

```
audio_engine/
├── core/           # Pipeline and configuration
├── asr/            # Speech-to-Text providers
├── llm/            # LLM providers
├── tts/            # Text-to-Speech providers
├── streaming/      # WebSocket server
├── integrations/   # GeneFace++ integration
├── utils/          # Audio utilities
└── examples/       # Example scripts
```

## Implementing a Provider

### Custom ASR

```python
from audio_engine.asr.base import BaseASR

class MyASR(BaseASR):
    @property
    def name(self) -> str:
        return "my-asr"

    async def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        # Your implementation
        pass

    async def transcribe_stream(self, audio_stream):
        # Your streaming implementation
        pass
```

### Custom LLM

```python
from audio_engine.llm.base import BaseLLM

class MyLLM(BaseLLM):
    @property
    def name(self) -> str:
        return "my-llm"

    async def generate(self, prompt: str, context=None) -> str:
        # Your implementation
        pass

    async def generate_stream(self, prompt: str, context=None):
        # Your streaming implementation
        pass
```

### Custom TTS

```python
from audio_engine.tts.base import BaseTTS

class MyTTS(BaseTTS):
    @property
    def name(self) -> str:
        return "my-tts"

    async def synthesize(self, text: str) -> bytes:
        # Your implementation
        pass

    async def synthesize_stream(self, text: str):
        # Your streaming implementation
        pass
```

## WebSocket Protocol

### Client → Server

- **Binary**: Raw audio chunks (PCM 16-bit, 16kHz mono)
- **JSON**: `{"type": "end_of_speech"}` or `{"type": "reset"}`
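If your capture pipeline produces float samples, they need to be packed into the 16-bit little-endian PCM frames the server expects before sending. A minimal stdlib-only sketch of that conversion (`float_to_pcm16` is an illustrative helper, not part of the library):

```python
import struct

def float_to_pcm16(samples):
    """Pack float samples in [-1.0, 1.0] into 16-bit little-endian PCM bytes."""
    # Clamp to the valid range, then scale to the signed 16-bit range
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)

# One 10 ms frame of 16 kHz mono audio is 160 samples -> 320 bytes
frame = float_to_pcm16([0.0] * 160)
```

Each sample becomes two bytes, so a 10 ms frame at 16 kHz is always 320 bytes on the wire.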

### Server → Client

- **Binary**: Response audio chunks
- **JSON Events**:
  - `{"type": "connected", "client_id": "..."}`
  - `{"type": "transcript", "text": "..."}`
  - `{"type": "response_text", "text": "..."}`
  - `{"type": "response_start"}`
  - `{"type": "response_end"}`
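Since every control and event frame is a JSON object keyed by `type`, a client can build and dispatch them with a couple of small helpers. A stdlib-only sketch (the helper names are illustrative; only the message shapes come from the protocol above):

```python
import json

def control_message(msg_type: str) -> str:
    """Build a client control frame, e.g. {"type": "end_of_speech"}."""
    return json.dumps({"type": msg_type})

def parse_event(raw: str):
    """Split a server event into its type and the remaining payload fields."""
    event = json.loads(raw)
    return event.pop("type"), event

# Dispatch on the event type; the payload carries the rest of the fields
kind, payload = parse_event('{"type": "transcript", "text": "hello"}')
```

Binary frames need no such handling: anything received as bytes is response audio, anything received as text is a JSON event.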

## Environment Variables

```bash
# ASR
ASR_PROVIDER=whisper
ASR_API_KEY=your-key

# LLM
LLM_PROVIDER=anthropic
LLM_API_KEY=your-key
LLM_MODEL=claude-sonnet-4-20250514

# TTS
TTS_PROVIDER=cartesia
TTS_API_KEY=your-key
TTS_VOICE_ID=your-voice-id

# Debug
DEBUG=true
```
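One way to consume these variables in your own startup code is to collect them into a plain config dict. A sketch using only the standard library (`load_engine_config` and the fallback defaults are illustrative; the engine may read these variables differently internally):

```python
import os

def load_engine_config() -> dict:
    """Collect provider settings from the environment variables listed above."""
    return {
        "asr": {
            "provider": os.getenv("ASR_PROVIDER", "whisper"),
            "api_key": os.getenv("ASR_API_KEY"),
        },
        "llm": {
            "provider": os.getenv("LLM_PROVIDER", "anthropic"),
            "api_key": os.getenv("LLM_API_KEY"),
            "model": os.getenv("LLM_MODEL"),
        },
        "tts": {
            "provider": os.getenv("TTS_PROVIDER", "cartesia"),
            "api_key": os.getenv("TTS_API_KEY"),
            "voice_id": os.getenv("TTS_VOICE_ID"),
        },
        # Treat anything other than "true" (case-insensitive) as debug off
        "debug": os.getenv("DEBUG", "false").lower() == "true",
    }
```

Pair this with `python-dotenv` (already a dependency) by calling `dotenv.load_dotenv()` first, so a local `.env` file populates the environment before the config is read.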

## License

MIT
