Metadata-Version: 2.4
Name: speechbot
Version: 1.0.0
Summary: Speech-driven bots and services (e.g. Telegram) with pluggable speech matching.
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.9
Description-Content-Type: text/x-rst
License-File: LICENSE.txt
Requires-Dist: pyTelegramBotAPI
Requires-Dist: speechmatching
Dynamic: license-file

speechbot
#########

The ``speechbot`` package is a framework for building speech-first
chatbots with a block tree. It is designed for bots where a user moves between
options by speaking a word.

The bot is configured with:

- a block tree that defines steps and word labels for moving between
  blocks
- a speech engine that matches incoming voice messages to those labels

This repository also contains a Telegram service implementation so the bot can
be run as a chat bot.

At a high level, ``speechbot`` runs a voice menu. A JSON file named
``tree.json`` lists the conversation steps and the spoken words that move
between them.

CLI guide
=========

This section focuses on running the bot and configuring ``tree.json``.

Quick start
-----------

1. Install the package.
2. Set ``TELEGRAM_BOT_TOKEN``.
3. Save a minimal ``tree.json``.
4. Run the bot and complete the setup prompts for missing audio.

Minimal ``tree.json``::

    {
      "root_id": "root",
      "blocks": [
        {
          "id": "root",
          "prompt_text": "Say hello or help",
          "edges": [
            {"word": "hello", "to": "hello"},
            {"word": "help", "to": "help"}
          ]
        },
        {
          "id": "hello",
          "prompt_text": "Last word: {last_word}",
          "edges": []
        },
        {
          "id": "help",
          "prompt_text": "Help menu",
          "edges": []
        }
      ]
    }

Run from the directory that contains ``tree.json``::

    export TELEGRAM_BOT_TOKEN="123456:ABC..."
    python3 -m speechbot telegram --tree tree.json --data-dir data \
        --speech-engine speechmatching --debug-users <admin_user_id>

Expected messages (example)::

    Bot: <audio message for "Say hello or help">
    Bot: Say hello or help
    User: <voice message>
    Bot: Heard: hello
    Bot: Last word: hello

Requirements
------------

Install the package from PyPI [pypi]_ ::

    pip install speechbot

The Telegram service requires a Telegram bot token. To get a token, create a
bot with BotFather [botfather]_ in Telegram and copy the token it returns.
BotFather can be found by searching for ``@BotFather`` in Telegram. Store the
token in ``TELEGRAM_BOT_TOKEN`` before starting the bot.

The default speech engine uses the ``speechmatching`` package. Typical
requirements include:

- ``ffmpeg`` available on ``PATH`` (Telegram voice messages are compressed
  audio files);
- the Python dependencies for ``speechmatching``;
- Docker access for the default Docker-based speech model.

Docker image
------------

The CLI is available as the Docker image ``aukesch/speechbot``. This can be
used to run the bot without installing the Python package locally.

Example run::

    docker pull aukesch/speechbot
    docker run --rm \
        -e TELEGRAM_BOT_TOKEN="123456:ABC..." \
        -v "$PWD":/work \
        -w /work \
        --entrypoint speechbot \
        aukesch/speechbot \
        telegram --tree tree.json --data-dir data --speech-engine speechmatching \
            --debug-users <admin_user_id>

Example Dockerfile::

    FROM aukesch/speechbot
    COPY . /app
    WORKDIR /app
    ENTRYPOINT ["speechbot"]
    CMD ["telegram", "--tree", "tree.json", "--data-dir", "data", \
         "--speech-engine", "speechmatching"]

Run the bot
-----------

Export the token, then start the Telegram service::

    export TELEGRAM_BOT_TOKEN="123456:ABC..."
    python3 -m speechbot telegram --tree examples/basic/tree.json \
        --data-dir data --speech-engine speechmatching \
        --debug-users <admin_user_id>

On startup, ``speechbot`` checks that all required assets exist. If word
recordings, prompt recordings, or referenced media files are missing, the bot
starts a guided setup process in Telegram to collect them.

While collecting recordings and uploads, you can send multiple voice messages
for each word. For prompt recordings and media files, sending another
upload replaces the previous one. The ``/next`` command moves on, ``/skip``
moves on without saving, ``/status`` shows remaining items, and ``/done``
finishes setup.

While setup is active, the bot switches to a temporary setup tree. When
``--debug-users`` is set, setup is limited to those user identifiers. Setup is
also limited to one user per chat; other users see a busy message until
setup completes. Once all required assets exist, the bot returns to the main
tree and normal interaction continues.

**User commands**

- ``/start`` resets the state to the tree root.
- ``/undo`` restores the previous state snapshot.

Text messages cannot be used to move through the tree. They only replay the
prompt for the current block.

**Example setup transcript**

This is a short example of the prompt recording setup process::

    Bot: Prompt setup (1/2). Send a voice message (or audio file) reading this text aloud:
    Bot:
    Bot: Hello there
    Bot:
    Bot: Send /next to continue (after at least 1 recording), /skip to move on without saving, /status for progress. Sending another recording replaces the previous one.
    User: <voice message>
    Bot: Saved prompt recording. Send another recording to replace it, or /next for the next prompt.
    User: /next
    Bot: Prompt setup (2/2). Send a voice message (or audio file) reading this text aloud:
    Bot:
    Bot: Welcome
    Bot:
    Bot: Send /next to continue (after at least 1 recording), /skip to move on without saving, /status for progress. Sending another recording replaces the previous one.
    User: <voice message>
    Bot: Saved prompt recording. All required prompt recordings exist. Send another recording to replace it, or /done (or /next) to finish setup.
    User: /done
    Bot: Prompt setup complete. Continuing...

Shop builder
------------

The shop builder is an interactive admin process that runs inside Telegram and
writes a new tree to ``data/shop/tree.json``::

    python3 examples/shop_builder/main.py telegram --data-dir data

When the shop is published (``/publish``), the builder also creates a ``.zip``
package (``data/shop/shop.zip``) containing ``tree.json`` and all referenced
shop media under ``data/``. The builder sends that zip back via Telegram.

To run the generated shop directly from that zip, start ``speechbot`` without a
local ``tree.json`` and upload the zip as a document::

    python3 -m speechbot telegram --data-dir data --debug-users <admin_user_id>

The generated tree can also be run directly by referencing its path::

    python3 -m speechbot telegram --tree data/shop/tree.json \
        --data-dir data --speech-engine speechmatching

CLI reference
-------------

The CLI accepts one required argument and several optional arguments.

Required argument:

- ``service``: service name. Only ``telegram`` is supported.

Optional arguments:

- ``--token``: service token. If not set, the ``TELEGRAM_BOT_TOKEN``
  environment variable is used.
- ``--poll-timeout-s``: long polling timeout in seconds.
- ``--tree``: path to the block tree JSON file. If not set, the ``BOT_TREE``
  environment variable is used; if that is also unset, the default is
  ``tree.json``.
- ``--data-dir``: root data directory. If not set, the ``BOT_DATA``
  environment variable is used; if that is also unset, the default is
  ``data``.
- ``--debug-users``: comma-separated Telegram user identifiers that are
  allowed to run debug commands. If not set, the ``BOT_DEBUG_USERS``
  environment variable is used.
- ``--speech-engine``: matcher engine under ``speechbot/matchers``. If not
  set, the ``BOT_SPEECH_ENGINE`` environment variable is used; if that is
  also unset, the default is ``speechmatching``.
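
The lookup order described above (command-line flag, then environment
variable, then built-in default) can be sketched as follows. This is an
illustrative sketch only, not the code in ``speechbot/cli.py``: the helper
names ``resolve_option`` and ``parse_debug_users`` are hypothetical, and
treating an empty environment variable as unset is an assumption.

.. code-block:: python

    import os

    def resolve_option(cli_value, env_name, default=None, environ=None):
        """Return the CLI value, else the environment value, else the default."""
        if environ is None:
            environ = os.environ
        if cli_value is not None:
            return cli_value
        env_value = environ.get(env_name)
        if env_value:  # assumption: an empty variable counts as unset
            return env_value
        return default

    def parse_debug_users(raw):
        """Split a comma-separated identifier string such as '123,456'."""
        if not raw:
            return []
        return [part.strip() for part in raw.split(',') if part.strip()]

    # No --tree flag and no BOT_TREE variable: fall back to tree.json.
    tree_path = resolve_option(None, 'BOT_TREE', 'tree.json', environ={})
    # An explicit --debug-users value wins over the environment.
    debug_users = parse_debug_users(
        resolve_option('123,456', 'BOT_DEBUG_USERS', environ={})
    )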

Tree.json reference
-------------------

The tree JSON file defines blocks and how to move between blocks. Each block
has a list of ``edges`` that map a word label to a destination block id.

Each block is a step in the conversation. The ``prompt_text`` is shown when
the block is active. Each edge is a spoken option that moves to another
block. A recording is needed for each word label under
``data/recordings/<word>/``. A minimal example appears in Quick start.
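
How edges drive navigation can be illustrated with a small standalone sketch.
It does not use ``speechbot`` itself: ``next_block_id`` is a hypothetical
helper, and staying on the current block when no edge matches is an
assumption made for this example.

.. code-block:: python

    import json

    TREE_JSON = '''
    {
      "root_id": "root",
      "blocks": [
        {"id": "root", "prompt_text": "Say hello or help",
         "edges": [{"word": "hello", "to": "hello"},
                   {"word": "help", "to": "help"}]},
        {"id": "hello", "prompt_text": "Last word: {last_word}", "edges": []},
        {"id": "help", "prompt_text": "Help menu", "edges": []}
      ]
    }
    '''

    def next_block_id(tree, current_id, word):
        """Follow the edge labelled ``word`` from ``current_id``, if any."""
        blocks = {block['id']: block for block in tree['blocks']}
        for edge in blocks[current_id].get('edges', []):
            if edge['word'] == word:
                return edge['to']
        return current_id  # assumption: an unmatched word keeps the block

    tree = json.loads(TREE_JSON)
    after_hello = next_block_id(tree, tree['root_id'], 'hello')
    after_unknown = next_block_id(tree, tree['root_id'], 'goodbye')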

**Structure**

The top level keys are:

- ``root_id``: id of the entry block.
- ``blocks``: list of block objects.

Each block supports:

- ``id``: unique block id.
- ``prompt_text``: text shown to the user when the block is active. If empty,
  the default prompt ``Say one of the available options.`` is used.
- ``edges``: list of edges in the form ``{"word": "...", "to": "..."}``.
- ``on_enter``: optional section for actions and context updates.
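
The structural rules above (a ``root_id`` that names a block, unique block
ids, and edges that point to existing blocks) can be checked before starting
the bot. The following is a minimal sketch of such a check, not part of
``speechbot``; ``validate_tree`` is a hypothetical helper name.

.. code-block:: python

    def validate_tree(tree):
        """Return a list of human-readable problems found in a tree dict."""
        problems = []
        blocks = tree.get('blocks', [])
        ids = [block.get('id') for block in blocks]
        known = set(ids)
        if len(ids) != len(known):
            problems.append('block ids are not unique')
        if tree.get('root_id') not in known:
            problems.append('root_id does not match any block id')
        for block in blocks:
            for edge in block.get('edges', []):
                if edge.get('to') not in known:
                    problems.append(
                        'edge {!r} in block {!r} points to unknown block {!r}'
                        .format(edge.get('word'), block.get('id'),
                                edge.get('to'))
                    )
        return problems

    good_tree = {
        'root_id': 'root',
        'blocks': [
            {'id': 'root', 'prompt_text': 'Say help',
             'edges': [{'word': 'help', 'to': 'help'}]},
            {'id': 'help', 'prompt_text': 'Help menu', 'edges': []},
        ],
    }
    bad_tree = {
        'root_id': 'start',
        'blocks': [
            {'id': 'root', 'prompt_text': 'Say help',
             'edges': [{'word': 'help', 'to': 'missing'}]},
        ],
    }
    good_problems = validate_tree(good_tree)
    bad_problems = validate_tree(bad_tree)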

The ``on_enter`` section supports:

- ``text``: a text message that is sent when entering the block.
- ``photo``: list of photo paths to send.
- ``video``: list of video paths to send.
- ``audio``: list of audio paths to send.
- ``context``: map to set context keys.
- ``context_inc``: map of number increases.
- ``context_delete``: list of keys to remove from context.
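
The three context keys can be pictured as plain dictionary operations. The
sketch below is an illustration, not the ``speechbot`` implementation:
``apply_context_updates`` is a hypothetical helper, and both the order of
application (set, then increase, then delete) and starting missing counters
at ``0`` are assumptions.

.. code-block:: python

    def apply_context_updates(context, on_enter):
        """Return a new context with the ``on_enter`` updates applied."""
        updated = dict(context)
        # ``context``: set or overwrite keys.
        updated.update(on_enter.get('context', {}))
        # ``context_inc``: add to numeric keys (assumed to start at 0).
        for key, amount in on_enter.get('context_inc', {}).items():
            updated[key] = updated.get(key, 0) + amount
        # ``context_delete``: remove keys, ignoring ones that are absent.
        for key in on_enter.get('context_delete', []):
            updated.pop(key, None)
        return updated

    on_enter = {
        'context': {'item': 'apple'},
        'context_inc': {'count': 2},
        'context_delete': ['tmp'],
    }
    result = apply_context_updates({'count': 1, 'tmp': 'x'}, on_enter)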

If multiple photos or videos are provided, the service sends them as an album.
Text fields in ``prompt_text`` and ``on_enter.text`` are formatted with
``text.format(**context)``.

Media paths are resolved from the current working directory, or relative to
the tree JSON file location if the file is not found.

**Files on disk**

===============================  =============================
Path                             Purpose
===============================  =============================
tree.json                        Block tree definition
data/state/                      Per-user state JSON files
data/recordings/<word>/          Word recordings for matching
data/prompts/prompt_<sha256>/    Prompt recordings for prompts
data/inbox/                      Downloaded service media
data/media/                      Media referenced by tree.json
===============================  =============================

Prompt recordings are spoken versions of the prompt text that the bot sends
as audio. They are not used for word matching.
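
Using the layout in the table above, the words that still lack a recording
can be listed with a short script. This is a sketch that only mirrors the
documented ``data/recordings/<word>/`` layout, not the startup check that
``speechbot`` performs; ``words_missing_recordings`` is a hypothetical
helper, and counting any file in the directory as a recording is an
assumption.

.. code-block:: python

    import tempfile
    from pathlib import Path

    def words_missing_recordings(tree, data_dir):
        """Return edge words without any file under recordings/<word>/."""
        recordings_root = Path(data_dir) / 'recordings'
        words = sorted({
            edge['word']
            for block in tree['blocks']
            for edge in block.get('edges', [])
        })
        missing = []
        for word in words:
            word_dir = recordings_root / word
            has_file = word_dir.is_dir() and any(
                path.is_file() for path in word_dir.iterdir()
            )
            if not has_file:
                missing.append(word)
        return missing

    tree = {
        'root_id': 'root',
        'blocks': [
            {'id': 'root', 'prompt_text': 'Say hello or help',
             'edges': [{'word': 'hello', 'to': 'root'},
                       {'word': 'help', 'to': 'root'}]},
        ],
    }
    # Demonstrate with a throwaway data directory holding one recording.
    with tempfile.TemporaryDirectory() as data_dir:
        hello_dir = Path(data_dir) / 'recordings' / 'hello'
        hello_dir.mkdir(parents=True)
        (hello_dir / 'sample.ogg').touch()
        missing = words_missing_recordings(tree, data_dir)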

State and debug
---------------

**Context**

Per-user ``context`` is stored in the user state file and is carried across
blocks. The bot updates some keys automatically:

- ``last_word``
- ``from_block_id``
- ``block_id``

Text can include simple formatting expressions, which are applied with
``text.format(**context)``. If formatting fails (for example because a key is
missing), no user-visible error is raised and the original string is used
unchanged.
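
The fallback for failed formatting can be sketched as a small wrapper around
``str.format``. The function name ``format_with_context`` is hypothetical,
and the exact set of exceptions that ``speechbot`` catches is an assumption.

.. code-block:: python

    def format_with_context(text, context):
        """Format ``text`` with ``context``; keep the original on failure."""
        try:
            return text.format(**context)
        except (KeyError, IndexError, ValueError, AttributeError):
            # Assumption: any formatting problem falls back to the raw text.
            return text

    filled = format_with_context('Last word: {last_word}',
                                 {'last_word': 'hello'})
    unchanged = format_with_context('Cart total: {cart_total}', {})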

**Undo history**

The bot keeps a limited history of previous states. Users can restore the most
recent state using the ``/undo`` command.
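
A bounded history of this kind can be modelled with ``collections.deque``.
The sketch below illustrates the idea only: the class ``StateHistory``, its
``max_snapshots`` limit, and returning the current state when there is
nothing to undo are all assumptions, not the ``speechbot`` implementation.

.. code-block:: python

    import copy
    from collections import deque

    class StateHistory:
        """Keep at most ``max_snapshots`` previous states."""

        def __init__(self, max_snapshots=10):  # hypothetical limit
            self._snapshots = deque(maxlen=max_snapshots)

        def push(self, state):
            # Store a copy so later changes do not alter the snapshot.
            self._snapshots.append(copy.deepcopy(state))

        def undo(self, current_state):
            # Assumption: with no history, the current state is kept.
            if not self._snapshots:
                return current_state
            return self._snapshots.pop()

        def __len__(self):
            return len(self._snapshots)

    history = StateHistory(max_snapshots=2)
    for block_id in ('root', 'hello', 'help'):
        history.push({'block_id': block_id})
    kept = len(history)  # the oldest snapshot ('root') was dropped
    restored = history.undo({'block_id': 'checkout'})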

**Debug commands**

Debug commands can be enabled for specific Telegram user identifiers, for
example with::

    python3 -m speechbot telegram --tree examples/basic/tree.json \
        --data-dir data --debug-users 123,456

When enabled, the following commands are available:

- ``/debug`` shows the raw state information.
- ``/where`` shows the current block id.
- ``/context`` shows the current context map.
- ``/history`` shows the history length.

.. [pypi] https://pypi.org/project/speechbot/
.. [botfather] https://core.telegram.org/bots/features#botfather

Developer guide
===============

This section is for extending ``speechbot`` in Python.

Custom blocks
-------------

Blocks can be implemented in Python by using
``speechbot.blocks.CustomBlock`` and setting a ``block_id`` class attribute.
Custom blocks run inside the bot like normal blocks, but they should override
``handle`` to implement custom logic. The ``handle`` method receives the
incoming message, the user state, and a runtime object. When running under the
standard bot, that runtime object is the ``speechbot.bot.Bot`` instance.

Custom blocks are used by example code such as the shop builder, where the
interactive logic is written in Python rather than purely in JSON.

Example:

.. code-block:: python

    from speechbot.blocks import CustomBlock
    from speechbot.protocol import OutgoingText

    class HelloBlock(CustomBlock):
        block_id = 'hello'

        def __init__(self, prompt_text='Say hello'):
            super().__init__(prompt_text)

        def handle(self, incoming, state, runtime):
            return ([OutgoingText(
                chat_id=incoming.chat_id,
                text='Hello from Python.'
            )], None)

        def on_enter_actions(self, incoming, state, runtime):
            return [OutgoingText(
                chat_id=incoming.chat_id,
                text='Entering the hello block.'
            )]

Custom services
---------------

The CLI only uses the Telegram service. To use another service, write
a custom service that connects the platform to the bot message handler.

A custom service needs to:

- receive messages and map them to the ``Incoming`` classes from
  ``speechbot.protocol``
- include ``service``, ``chat_id``, ``user_id`` and ``message_id`` along
  with any metadata in ``meta``
- download media to disk and set ``path`` for ``IncomingVoice``,
  ``IncomingAudio``, ``IncomingPhoto``, ``IncomingVideo`` and
  ``IncomingDocument``
- call the bot message handler and run every returned ``Outgoing`` action
- map ``OutgoingMediaGroup`` to an album when supported, or send each item
  separately when it is not

Example:

.. code-block:: python

    from speechbot.protocol import IncomingText, OutgoingText

    class DummyService:
        def __init__(self):
            self._message_handler = None

        def run(self, message_handler):
            self._message_handler = message_handler
            incoming = IncomingText(
                service='dummy',
                chat_id=1,
                user_id=1,
                message_id=1,
                data='hello'
            )
            actions = message_handler(incoming)
            self._send(actions)

        def _send(self, actions):
            for action in actions:
                if isinstance(action, OutgoingText):
                    self._send_text(action.chat_id, action.text)

        def _send_text(self, chat_id, text):
            print('send to {}: {}'.format(chat_id, text))

Most services will also need a run loop like
``speechbot.services.telegram.TelegramService.run``. The built-in CLI does
not know about new services, so create a custom entrypoint or extend
``speechbot/cli.py`` to add a new service option.

Speech engines
--------------

``speechmatching`` is the default matcher, but additional engines can be added.
Create a module under ``speechbot/matchers`` that implements
``SpeechEngine`` from ``speechbot.matchers``. The engine must provide
``add_recording`` and ``match``. If debug output is needed, implement
``set_debug`` and ``get_last_debug`` similar to the speechmatching engine.

Add the engine to ``load_speech_engine`` in
``speechbot/matchers/__init__.py`` so the ``--speech-engine`` option can
find it.

Example:

.. code-block:: python

    from speechbot.matchers import SpeechEngine

    class DummyEngine(SpeechEngine):
        def __init__(self):
            self._labels = set()

        def add_recording(self, identifier, path):
            self._labels.add(identifier)

        def match(self, voice_path, identifiers=None):
            if identifiers is None:
                selected_identifiers = list(self._labels)
            else:
                selected_identifiers = [
                    i for i in identifiers if i in self._labels
                ]
            if len(selected_identifiers) == 0:
                return None
            return selected_identifiers[0]
