Metadata-Version: 2.4
Name: livekit-plugins-google
Version: 1.0.2
Summary: Agent Framework plugin for services from Google Cloud
Project-URL: Documentation, https://docs.livekit.io
Project-URL: Website, https://livekit.io/
Project-URL: Source, https://github.com/livekit/agents
Author: LiveKit
License-Expression: Apache-2.0
Keywords: audio,livekit,realtime,video,webrtc
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Multimedia :: Video
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9.0
Requires-Dist: google-auth<3,>=2
Requires-Dist: google-cloud-speech<3,>=2
Requires-Dist: google-cloud-texttospeech<3,>=2
Requires-Dist: google-genai==1.5.0
Requires-Dist: livekit-agents>=1.0.2
Description-Content-Type: text/markdown

# LiveKit Plugins Google

Agent Framework plugin for services from Google Cloud. It currently supports Google's [Speech-to-Text](https://cloud.google.com/speech-to-text) and Text-to-Speech APIs, as well as Gemini Multimodal Live.

## Installation

```bash
pip install livekit-plugins-google
```

## Pre-requisites

You'll need a Google Cloud account and valid credentials for it. Credentials can be passed directly or supplied via Application Default Credentials, as described in [How Application Default Credentials works](https://cloud.google.com/docs/authentication/application-default-credentials).
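
As a rough illustration of how Application Default Credentials (ADC) locates a credentials file, the sketch below checks the two file-based sources in the order the Google documentation describes: the `GOOGLE_APPLICATION_CREDENTIALS` environment variable first, then the file written by `gcloud auth application-default login`. The helper `find_adc_file` and the constant `GCLOUD_ADC_FILE` are hypothetical names used only for this example and are not part of this plugin or of `google-auth`. The sketch ignores the metadata server used on Google Cloud runtimes.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional

# Well-known location written by `gcloud auth application-default login`
# on Linux and macOS (Windows uses a gcloud folder under %APPDATA% instead).
GCLOUD_ADC_FILE = (
    Path.home() / ".config" / "gcloud" / "application_default_credentials.json"
)


def find_adc_file(
    env: Mapping[str, str] = os.environ,
    exists: Callable[[Path], bool] = Path.exists,
) -> Optional[Path]:
    """Return the credentials file ADC would likely pick up, or None.

    Hypothetical helper for illustration only: it covers the two
    file-based sources and does not validate the file contents.
    """
    # 1. An explicit path in the environment takes precedence.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit)
    # 2. Otherwise fall back to the gcloud user credentials file, if present.
    if exists(GCLOUD_ADC_FILE):
        return GCLOUD_ADC_FILE
    return None
```

Calling `find_adc_file()` with no arguments inspects the real environment. A `None` result suggests that neither file-based source is configured on the machine.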

To use the STT and TTS APIs, you'll need to enable the respective services for your Google Cloud project:

- Cloud Speech-to-Text API
- Cloud Text-to-Speech API


## Gemini Multimodal Live

Gemini Multimodal Live can be used with the `MultimodalAgent` class. See `examples/multimodal_agent/gemini_agent.py` for an example.

### Live Video Input (experimental)

You can push video frames to your Gemini Multimodal Live session alongside the audio, which the `MultimodalAgent` handles automatically. The basic approach is to subscribe to the video track, create a video stream, sample frames at a suitable frame rate, and push them into the `RealtimeSession`:

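```python
import asyncio

# Import paths are assumed from the livekit and livekit-agents package
# layout; adjust them to match the version you have installed.
from livekit.agents import AutoSubscribe
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import google
from livekit.rtc import Track, TrackKind, VideoStream

# The code below runs inside your async entrypoint, where `ctx` is the job context.

# Make sure you subscribe to both audio and video tracks
await ctx.connect(auto_subscribe=AutoSubscribe.SUBSCRIBE_ALL)

# Create your RealtimeModel and store a reference
model = google.beta.realtime.RealtimeModel(
    # ...
)

# Create your MultimodalAgent as usual
agent = MultimodalAgent(
    model=model,
    # ...
)

# Coroutine that reads the video track and pushes sampled frames to Gemini
async def _process_video_track(track: Track):
    video_stream = VideoStream(track)
    last_frame_time = 0.0
    loop = asyncio.get_running_loop()

    try:
        async for event in video_stream:
            current_time = loop.time()

            # Sample at 1 FPS: skip frames that arrive less than 1 s after the last one
            if current_time - last_frame_time < 1.0:
                continue

            last_frame_time = current_time

            # Push the frame into the RealtimeSession
            model.sessions[0].push_video(event.frame)
    finally:
        # Close the stream even if the loop exits with an error
        await video_stream.aclose()

# Keep references to running tasks so they are not garbage-collected
video_tasks = set()

# Subscribe to new tracks and process the video ones
@ctx.room.on("track_subscribed")
def _on_track_subscribed(track: Track, pub, participant):
    if track.kind == TrackKind.KIND_VIDEO:
        task = asyncio.create_task(_process_video_track(track))
        video_tasks.add(task)
        task.add_done_callback(video_tasks.discard)
```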
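
The frame-rate sampling used in the example can also be factored into a small, independently testable helper. The sketch below is one possible way to do that. `FrameSampler` and its `should_push` method are hypothetical names introduced here, not part of the plugin. The clock is injectable so the throttling logic can be exercised without real video.

```python
import time
from typing import Callable, Optional


class FrameSampler:
    """Accepts at most `fps` frames per second (hypothetical helper)."""

    def __init__(
        self,
        fps: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if fps <= 0:
            raise ValueError("fps must be positive")
        self._interval = 1.0 / fps  # minimum seconds between accepted frames
        self._clock = clock
        self._last: Optional[float] = None  # time of the last accepted frame

    def should_push(self) -> bool:
        """Return True if enough time has passed since the last accepted frame."""
        now = self._clock()
        if self._last is not None and now - self._last < self._interval:
            return False
        self._last = now
        return True
```

Inside the `async for` loop, the time comparison would then become `if not sampler.should_push(): continue`, with `sampler = FrameSampler(fps=1.0)` created before the loop.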



