Metadata-Version: 2.2
Name: cjm-transcript-segment-align
Version: 0.0.2
Summary: FastHTML dual-column UI for text segmentation and VAD alignment in transcript decomposition workflows, with forced-alignment-based text splitting to match text segments to VAD chunks.
Home-page: https://github.com/cj-mills/cjm-transcript-segment-align
Author: Christian J. Mills
Author-email: 9126128+cj-mills@users.noreply.github.com
License: Apache-2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cjm-plugin-system
Requires-Dist: cjm_transcription_plugin_system
Requires-Dist: cjm_transcript_segmentation
Requires-Dist: cjm_transcript_vad_align
Provides-Extra: dev
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# cjm-transcript-segment-align


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Install

``` bash
pip install cjm_transcript_segment_align
```

## Project Structure

    nbs/
    ├── components/ (4)
    │   ├── handlers.ipynb         # Handler wrappers for cross-domain coordination (alignment status updates)
    │   ├── helpers.ipynb          # State extraction helpers for cross-domain coordination in Phase 2 combined step
    │   ├── keyboard_config.ipynb  # Shared keyboard navigation configuration for the combined Phase 2 step
    │   └── step_renderer.ipynb    # Phase 2 combined step renderer: dual-column layout for Segment & Align
    ├── routes/ (2)
    │   ├── chrome.ipynb            # Shared chrome switching route handlers for the combined Phase 2 step
    │   └── forced_alignment.ipynb  # Routes for triggering forced alignment, polling progress, and toggling between NLTK and force-aligned pre-splits
    ├── services/ (1)
    │   └── forced_alignment.ipynb  # Forced alignment service for audio-informed text pre-splitting via forced alignment plugin
    └── html_ids.ipynb  # HTML ID constants for Phase 2 Shell: Dual-Column Layout shared chrome

Total: 8 notebooks across 3 directories

## Module Dependencies

``` mermaid
graph LR
    components_handlers[components.handlers<br/>handlers]
    components_helpers[components.helpers<br/>helpers]
    components_keyboard_config[components.keyboard_config<br/>keyboard_config]
    components_step_renderer[components.step_renderer<br/>step_combined]
    html_ids[html_ids<br/>html_ids]
    routes_chrome[routes.chrome<br/>chrome]
    routes_forced_alignment[routes.forced_alignment<br/>forced_alignment]
    services_forced_alignment[services.forced_alignment<br/>forced_alignment]

    components_handlers --> components_keyboard_config
    components_handlers --> html_ids
    components_handlers --> components_step_renderer
    components_keyboard_config --> html_ids
    components_step_renderer --> components_helpers
    components_step_renderer --> components_keyboard_config
    components_step_renderer --> html_ids
    routes_chrome --> html_ids
    routes_chrome --> components_keyboard_config
    routes_chrome --> components_step_renderer
    routes_forced_alignment --> html_ids
    routes_forced_alignment --> components_step_renderer
    routes_forced_alignment --> services_forced_alignment
```

*13 cross-module dependencies detected*

## CLI Reference

No CLI commands found in this project.

## Module Overview

Detailed documentation for each module in the project:

### chrome (`chrome.ipynb`)

> Shared chrome switching route handlers for the combined Phase 2 step

#### Import

``` python
from cjm_transcript_segment_align.routes.chrome import (
    DEBUG_SWITCH_CHROME,
    init_chrome_router
)
```

#### Functions

``` python
async def _handle_switch_chrome(
    state_store:SQLiteWorkflowStateStore,  # State store instance
    workflow_id:str,  # Workflow identifier
    request,  # FastHTML request object
    sess,  # FastHTML session object
    seg_urls:SegmentationUrls,  # URL bundle for segmentation routes
    align_urls:AlignmentUrls,  # URL bundle for alignment routes
) -> tuple:  # OOB swaps for shared chrome containers
    "Switch shared chrome content based on active column."
```

``` python
def init_chrome_router(
    state_store: SQLiteWorkflowStateStore,  # State store instance
    workflow_id: str,  # Workflow identifier
    seg_urls: SegmentationUrls,  # URL bundle for segmentation routes
    align_urls: AlignmentUrls,  # URL bundle for alignment routes
    prefix: str,  # Route prefix (e.g., "/workflow/core/chrome")
) -> Tuple[APIRouter, Dict[str, Callable]]:  # (router, route_dict)
    "Initialize chrome switching routes."
```

#### Variables

``` python
DEBUG_SWITCH_CHROME = False
```

### forced_alignment (`routes/forced_alignment.ipynb`)

> Routes for triggering forced alignment, polling progress, and toggling
> between NLTK and force-aligned pre-splits

#### Import

``` python
from cjm_transcript_segment_align.routes.forced_alignment import (
    FA_CONTAINER_ID,
    FA_STATUS_ID,
    render_fa_trigger_button,
    render_fa_progress,
    render_fa_toggle,
    render_fa_controls,
    init_forced_alignment_routers
)
```

#### Functions

``` python
def render_fa_trigger_button(
    trigger_url: str,  # URL for forced alignment trigger route
    disabled: bool = False,  # Whether button is disabled
) -> Any:  # Force Align trigger button
    "Render the Force Align trigger button."
```

``` python
def render_fa_progress(
    progress_val: float,  # Progress value 0.0-1.0
    message: str,  # Progress stage message
    progress_url: str,  # URL for progress polling
) -> Any:  # Progress indicator with polling
    "Render forced alignment progress indicator with HTMX polling."
```
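The polling behaviour of `render_fa_progress` can be pictured with a plain-string sketch. This is not the package's implementation: the real function returns a FastHTML component, and the container ID, the one-second interval, and the `outerHTML` swap used here are assumptions chosen for illustration. Only the HTMX attributes themselves (`hx-get`, `hx-trigger`, `hx-swap`) are standard.

``` python
from html import escape


def progress_html_sketch(
    progress_val: float,  # Progress value 0.0-1.0
    message: str,  # Progress stage message
    progress_url: str,  # URL for progress polling
    container_id: str = "fa-progress-example",  # hypothetical ID, not from the package
) -> str:
    "Build an HTML fragment that re-requests itself while alignment runs (illustrative only)."
    # Clamp to [0, 1] and convert to a whole percentage
    pct = round(max(0.0, min(1.0, progress_val)) * 100)
    return (
        f'<div id="{escape(container_id)}" hx-get="{escape(progress_url)}" '
        f'hx-trigger="every 1s" hx-swap="outerHTML">'
        f'<progress value="{pct}" max="100"></progress>'
        f'<span>{escape(message)}</span></div>'
    )


fragment = progress_html_sketch(0.42, "Aligning words", "/fa/progress")
print(fragment)
```

Because the fragment replaces itself on every poll, the server can end polling by returning markup without the `hx-trigger` attribute once alignment finishes.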

``` python
def render_fa_toggle(
    active_presplit: str,  # "nltk" or "forced_alignment"
    toggle_url: str,  # URL for toggle route
) -> Any:  # Toggle button group
    "Render the NLTK / Force Aligned toggle button group."
```

``` python
def render_fa_controls(
    trigger_url: str = "",  # URL for trigger route
    toggle_url: str = "",  # URL for toggle route
    active_presplit: Optional[str] = None,  # Current active mode (None = no FA done yet)
    fa_available: bool = False,  # Whether FA plugin is available
    oob: bool = False,  # Whether to render as OOB swap
) -> Any:  # FA controls container
    """
    Render the forced alignment controls container.
    
    Shows either:
    - Trigger button (if FA not yet run)
    - Toggle (if FA has been run)
    - Nothing (if FA plugin not available)
    """
```
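The three cases listed in the `render_fa_controls` docstring reduce to a small decision over `fa_available` and `active_presplit`. The sketch below returns a label instead of a rendered component; the function name and the labels are hypothetical and only mirror the documented behaviour.

``` python
from typing import Optional


def choose_fa_control_sketch(
    active_presplit: Optional[str],  # None = no FA done yet, else "nltk" or "forced_alignment"
    fa_available: bool,  # Whether FA plugin is available
) -> Optional[str]:  # "trigger", "toggle", or None (render nothing)
    "Pick which control the FA container would show (illustrative only)."
    if not fa_available:
        return None  # Plugin missing: show nothing
    if active_presplit is None:
        return "trigger"  # FA has not been run yet
    return "toggle"  # FA has been run: allow switching pre-splits


print(choose_fa_control_sketch(None, True))
print(choose_fa_control_sketch("forced_alignment", True))
print(choose_fa_control_sketch("nltk", False))
```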

``` python
async def _handle_fa_trigger(
    state_store: SQLiteWorkflowStateStore,
    workflow_id: str,
    fa_service: ForcedAlignmentService,
    source_service: SourceService,
    request: Any,
    sess: Any,
    seg_urls: SegmentationUrls,
    progress_url: str,
    toggle_url: str,
) -> Any:  # OOB updates for card stack, alignment status, FA controls, mini-stats
    "Trigger forced alignment and replace working segments."
```

``` python
async def _handle_fa_toggle(
    state_store: SQLiteWorkflowStateStore,
    workflow_id: str,
    request: Any,
    sess: Any,
    seg_urls: SegmentationUrls,
    toggle_url: str,
) -> Any:  # OOB updates for card stack, alignment status, FA controls, mini-stats
    "Toggle between NLTK and force-aligned pre-splits."
```

``` python
def init_forced_alignment_routers(
    state_store: SQLiteWorkflowStateStore,  # State store instance
    workflow_id: str,  # Workflow identifier
    fa_service: ForcedAlignmentService,  # Forced alignment service
    source_service: SourceService,  # Source service for audio paths/text
    seg_urls: SegmentationUrls,  # Segmentation URL bundle
    prefix: str,  # Route prefix (e.g., "/fa")
) -> Tuple[APIRouter, Dict[str, Callable]]:  # (router, route_dict)
    "Initialize forced alignment routes."
```

#### Variables

``` python
FA_CONTAINER_ID
FA_STATUS_ID
```

### forced_alignment (`services/forced_alignment.ipynb`)

> Forced alignment service for audio-informed text pre-splitting via
> forced alignment plugin

#### Import

``` python
from cjm_transcript_segment_align.services.forced_alignment import (
    map_fa_words_to_text,
    assign_words_to_chunks,
    build_segments_from_alignment,
    ForcedAlignmentService
)
```

#### Functions

``` python
def _strip_punct(text: str) -> str:  # Text with punctuation removed
    "Strip punctuation from text for comparison with FA output."
```

``` python
def map_fa_words_to_text(
    text: str,  # Original text with punctuation
    fa_items: List[ForcedAlignItem],  # FA word-level alignment results
) -> List[Tuple[int, int]]:  # List of (start_char, end_char) spans into original text
    """
    Map forced alignment words back to character spans in the original text.
    
    Walks through the original text, matching each FA word (punctuation-stripped)
    against original text tokens. Returns character offset pairs for each FA word.
    """
```
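A minimal sketch of the token-walking idea described for `map_fa_words_to_text`, under stated assumptions: FA words are passed as plain strings rather than `ForcedAlignItem` objects, the punctuation pattern is a guess (the real `_PUNCT_RE` is not shown here), comparison is case-insensitive, and unmatched words fall back to an empty span. The package's actual matching rules may differ.

``` python
import re
from typing import List, Tuple

_PUNCT_SKETCH = re.compile(r"[^\w\s]")  # assumed pattern, not the package's _PUNCT_RE


def _normalize(token: str) -> str:
    "Strip punctuation and lowercase for comparison."
    return _PUNCT_SKETCH.sub("", token).lower()


def map_words_sketch(text: str, fa_words: List[str]) -> List[Tuple[int, int]]:
    "Map each FA word to a (start_char, end_char) span in the original text."
    # Whitespace-delimited tokens with their character offsets
    tokens = [(m.start(), m.end(), _normalize(m.group())) for m in re.finditer(r"\S+", text)]
    spans: List[Tuple[int, int]] = []
    pos = 0  # Index of the next unmatched token
    cursor = 0  # Character offset just past the last matched token
    for word in fa_words:
        target = _normalize(word)
        for i in range(pos, len(tokens)):
            start, end, norm = tokens[i]
            if norm == target:
                spans.append((start, end))
                pos, cursor = i + 1, end
                break
        else:
            # No match found: record an empty span (assumed fallback)
            spans.append((cursor, cursor))
    return spans


text = "Hello, world! It's fine."
spans = map_words_sketch(text, ["Hello", "world", "Its", "fine"])
print([text[s:e] for s, e in spans])
```

Slicing the original text with the returned spans recovers the punctuated tokens, which is what lets later steps rebuild segments without losing punctuation.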

``` python
def assign_words_to_chunks(
    fa_items: List[ForcedAlignItem],  # FA word-level alignment results
    vad_chunks: List[VADChunk],  # VAD chunks with start/end times
) -> List[int]:  # Chunk index for each FA word
    """
    Assign each FA word to a VAD chunk based on timestamp overlap.
    
    Words whose start_time falls within a chunk's [start, end] range are
    assigned to that chunk. Words in silence gaps are assigned to the
    nearest chunk by time proximity.
    """
```
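The assignment rule in the `assign_words_to_chunks` docstring can be sketched as follows. `WordSketch` and `ChunkSketch` are hypothetical stand-ins for `ForcedAlignItem` and `VADChunk`, and their field names are assumptions. "Nearest chunk" is interpreted here as the smallest distance from the word's start time to either chunk boundary, which may not match the package's exact tie-breaking.

``` python
from dataclasses import dataclass
from typing import List


@dataclass
class WordSketch:  # hypothetical stand-in for ForcedAlignItem
    text: str
    start_time: float
    end_time: float


@dataclass
class ChunkSketch:  # hypothetical stand-in for VADChunk
    start_time: float
    end_time: float


def assign_words_sketch(words: List[WordSketch], chunks: List[ChunkSketch]) -> List[int]:
    "Return one chunk index per word (illustrative only)."
    assignments: List[int] = []
    for word in words:
        best_idx, best_dist = 0, float("inf")
        for idx, chunk in enumerate(chunks):
            # Word starts inside this chunk: assign directly
            if chunk.start_time <= word.start_time <= chunk.end_time:
                best_idx = idx
                break
            # Otherwise track the chunk whose nearest edge is closest in time
            dist = min(abs(word.start_time - chunk.start_time),
                       abs(word.start_time - chunk.end_time))
            if dist < best_dist:
                best_idx, best_dist = idx, dist
        assignments.append(best_idx)
    return assignments


chunks = [ChunkSketch(0.0, 1.0), ChunkSketch(2.0, 3.0)]
words = [WordSketch("hello", 0.2, 0.5), WordSketch("um", 1.2, 1.4), WordSketch("world", 2.1, 2.6)]
print(assign_words_sketch(words, chunks))
```

In this example the word starting at 1.2 s falls in the silence gap and lands in the first chunk, whose end (1.0 s) is closer than the second chunk's start (2.0 s).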

``` python
def build_segments_from_alignment(
    text: str,  # Original text with punctuation
    spans: List[Tuple[int, int]],  # Character spans from map_fa_words_to_text
    assignments: List[int],  # Chunk index per word from assign_words_to_chunks
    num_chunks: int,  # Total number of VAD chunks
    source_id: Optional[str] = None,  # Source block ID for traceability
    source_provider_id: Optional[str] = None,  # Source provider identifier
) -> List[TextSegment]:  # One segment per VAD chunk
    """
    Build TextSegment list by grouping words by their assigned VAD chunk.
    
    Each VAD chunk gets one TextSegment whose text is the joined original
    (punctuated) words assigned to that chunk.
    """
```
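The grouping step behind `build_segments_from_alignment` can be illustrated with plain strings. This sketch returns one string per chunk instead of `TextSegment` objects, joins words with single spaces, and leaves chunks without words as empty strings; all three choices are assumptions, since the real segment construction is not shown here.

``` python
from typing import List, Tuple


def group_text_by_chunk_sketch(
    text: str,  # Original text with punctuation
    spans: List[Tuple[int, int]],  # (start_char, end_char) per word
    assignments: List[int],  # Chunk index per word
    num_chunks: int,  # Total number of VAD chunks
) -> List[str]:  # One text string per chunk
    "Group punctuated words by their assigned chunk (illustrative only)."
    buckets: List[List[str]] = [[] for _ in range(num_chunks)]
    for (start, end), chunk_idx in zip(spans, assignments):
        buckets[chunk_idx].append(text[start:end])
    return [" ".join(bucket) for bucket in buckets]


text = "Hello, world! It's fine."
spans = [(0, 6), (7, 13), (14, 18), (19, 24)]
grouped = group_text_by_chunk_sketch(text, spans, [0, 0, 1, 1], num_chunks=3)
print(grouped)
```

Returning exactly `num_chunks` entries, including empty ones, keeps the output index-aligned with the VAD chunks, which is the property the 1:1 alignment check relies on.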

#### Classes

``` python
class ForcedAlignmentService:
    "Service for audio-informed text pre-splitting via forced alignment plugin."
    
    def __init__(
            self,
            plugin_manager: PluginManager,  # Plugin manager for accessing forced alignment plugin
            plugin_name: str = "cjm-transcription-plugin-qwen3-forced-aligner",  # Name of the FA plugin
        ):
        "Initialize the forced alignment service."
    
    def is_available(self) -> bool:  # True if plugin is loaded and ready
        "Check if the forced alignment plugin is available."
        return self._manager.get_plugin(self._plugin_name) is not None
    
    def ensure_loaded(
            self,
            config: Optional[Dict[str, Any]] = None,  # Optional plugin configuration
        ) -> bool:  # True if successfully loaded
        "Ensure the forced alignment plugin is loaded."
    
    async def align_and_split_async(
            self,
            audio_path: str,  # Path to the audio file
            text: str,  # Original transcript text blob (with punctuation)
            vad_chunks: List[VADChunk],  # VAD chunks for this audio
            source_id: Optional[str] = None,  # Source block ID for traceability
            source_provider_id: Optional[str] = None,  # Source provider identifier
        ) -> List[TextSegment]:  # One segment per VAD chunk
        "Run forced alignment and split text into segments matching VAD chunks."
    
    def align_and_split(
            self,
            audio_path: str,  # Path to the audio file
            text: str,  # Original transcript text blob
            vad_chunks: List[VADChunk],  # VAD chunks for this audio
            source_id: Optional[str] = None,
            source_provider_id: Optional[str] = None,
        ) -> List[TextSegment]:  # One segment per VAD chunk
        "Run forced alignment and split text synchronously."
    
    async def align_and_split_combined_async(
            self,
            source_blocks: List[Any],  # SourceBlock objects with id, provider_id, text
            audio_paths: List[str],  # Audio file path per source block
            vad_chunks_by_source: List[List[VADChunk]],  # VAD chunks per source block
        ) -> List[TextSegment]:  # Combined segments with global indexing
        "Align and split multiple source blocks with their respective audio."
```

#### Variables

``` python
_PUNCT_RE
```

### handlers (`handlers.ipynb`)

> Handler wrappers for cross-domain coordination (alignment status
> updates)

#### Import

``` python
from cjm_transcript_segment_align.components.handlers import (
    wrapped_seg_split,
    wrapped_seg_merge,
    wrapped_seg_undo,
    wrapped_seg_reset,
    wrapped_seg_ai_split,
    wrap_seg_mutation_handler,
    wrap_align_mutation_handler,
    create_seg_init_chrome_wrapper,
    create_align_init_chrome_wrapper
)
```

#### Functions

``` python
def _find_session_id(args, kwargs):
    "Find session_id from args or kwargs."

``` python
def wrap_seg_mutation_handler(
    handler: Callable,  # Handler function to wrap
) -> Callable:  # Wrapped handler that appends alignment status OOB
    """
    Wrap a segmentation mutation handler to add alignment status OOB.
    
    The handler is expected to take (state_store, workflow_id, ...) as first params.
    """
```

``` python
def wrap_align_mutation_handler(
    handler: Callable,  # Handler function to wrap
) -> Callable:  # Wrapped handler that appends alignment status OOB
    """
    Wrap an alignment mutation handler to add alignment status OOB.
    
    The handler is expected to take (state_store, workflow_id, ...) as first params.
    """
```
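The wrapping pattern used by `wrap_seg_mutation_handler` and `wrap_align_mutation_handler` can be sketched as a decorator that appends one extra element to the handler's result. The `render_status` parameter is hypothetical: the real wrappers take only `handler` and derive the alignment status from workflow state, which is not reproduced here.

``` python
import asyncio
import functools
from typing import Any, Callable


def wrap_with_status_sketch(
    handler: Callable,  # Async handler function to wrap
    render_status: Callable[[], Any],  # hypothetical: produces the extra OOB element
) -> Callable:  # Wrapped handler that appends the status element
    "Wrap an async handler so its result gains a trailing status element (illustrative only)."
    @functools.wraps(handler)
    async def wrapped(*args, **kwargs):
        result = await handler(*args, **kwargs)
        # Normalize to a tuple so the extra element can be appended
        if not isinstance(result, tuple):
            result = (result,)
        return (*result, render_status())
    return wrapped


async def fake_split(state_store, workflow_id):
    "Stand-in mutation handler taking (state_store, workflow_id, ...) as first params."
    return ("<cards/>",)


wrapped_split = wrap_with_status_sketch(fake_split, lambda: "<status oob/>")
print(asyncio.run(wrapped_split(None, "wf-1")))
```

Appending rather than replacing keeps the wrapped handler's own response intact, so the segmentation or alignment column updates as before and the status badge is refreshed alongside it.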

``` python
def create_seg_init_chrome_wrapper(
    align_urls:AlignmentUrls,  # URL bundle for alignment routes (for KB system)
    switch_chrome_url:str,  # URL for chrome switching (for KB system)
    fa_trigger_url:str="",  # URL for forced alignment trigger (optional)
    fa_toggle_url:str="",  # URL for forced alignment toggle (optional)
    fa_available:bool=False,  # Whether forced alignment plugin is available
) -> Callable:  # Wrapped handler that builds KB system and shared chrome
    """
    Create a wrapper for seg init that builds combined KB system and shared chrome.
    
    This is a factory that captures the URLs needed for KB system assembly.
    Optionally includes forced alignment controls if FA plugin is available.
    """
```

``` python
def create_align_init_chrome_wrapper() -> Callable:  # Wrapped handler that adds alignment status
    """
    Create a wrapper for align init that adds mini-stats and alignment status.
    
    Alignment init is simpler than seg init - it doesn't need to build the
    full KB system (seg init handles that). It just updates alignment-specific
    chrome and the alignment status badge.
    """
    # Signature of the returned wrapper
    async def wrapped_align_init(
        state_store:WorkflowStateStore,
        workflow_id:str,
        source_service:SourceService,
        alignment_service:AlignmentService,
        request:Any,
        sess:Any,
        urls:AlignmentUrls,
        visible_count:int=5,
        card_width:int=40,
    ):
        ...

### helpers (`helpers.ipynb`)

> State extraction helpers for cross-domain coordination in Phase 2
> combined step

#### Import

``` python
from cjm_transcript_segment_align.components.helpers import (
    SEG_DEFAULT_VISIBLE_COUNT,
    SEG_DEFAULT_CARD_WIDTH,
    ALIGN_DEFAULT_VISIBLE_COUNT,
    ALIGN_DEFAULT_CARD_WIDTH,
    check_alignment_ready,
    extract_seg_state,
    extract_alignment_state,
    get_segment_count,
    get_chunk_count
)
```

#### Functions

``` python
def check_alignment_ready(
    segment_count:int,  # Number of text segments
    chunk_count:int,  # Number of VAD chunks
) -> bool:  # True if counts match for 1:1 alignment
    "Check if segment and VAD chunk counts match for 1:1 alignment."
```
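A minimal sketch of the readiness check, assuming it is a plain equality test on the two counts. How the real function treats the empty case (zero segments and zero chunks) is not documented here, so that behaviour is an assumption. The `count_gap_sketch` helper is hypothetical and only shows how far apart the two columns are.

``` python
def check_alignment_ready_sketch(
    segment_count: int,  # Number of text segments
    chunk_count: int,  # Number of VAD chunks
) -> bool:  # True if counts match for 1:1 alignment
    "Counts must match for a 1:1 pairing (illustrative only)."
    return segment_count == chunk_count


def count_gap_sketch(segment_count: int, chunk_count: int) -> int:
    "Hypothetical helper: positive means too many segments, negative means too few."
    return segment_count - chunk_count


print(check_alignment_ready_sketch(12, 12))
print(check_alignment_ready_sketch(10, 12), count_gap_sketch(10, 12))
```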

``` python
def extract_seg_state(
    ctx:InteractionContext,  # Interaction context with state
) -> Dict[str, Any]:  # Extracted state values
    "Extract segmentation state as explicit values for renderers."
```

``` python
def extract_alignment_state(
    ctx:InteractionContext,  # Interaction context with state
) -> Dict[str, Any]:  # Extracted state values
    "Extract alignment state as explicit values for renderers."
```

``` python
def get_segment_count(
    ctx:InteractionContext,  # Interaction context with state
) -> int:  # Number of segments
    "Get segment count from state without full extraction."
```

``` python
def get_chunk_count(
    ctx:InteractionContext,  # Interaction context with state
) -> int:  # Number of VAD chunks
    "Get VAD chunk count from state without full extraction."
```

#### Variables

``` python
SEG_DEFAULT_VISIBLE_COUNT = 3
SEG_DEFAULT_CARD_WIDTH = 80
ALIGN_DEFAULT_VISIBLE_COUNT = 5
ALIGN_DEFAULT_CARD_WIDTH = 40
```

### html_ids (`html_ids.ipynb`)

> HTML ID constants for Phase 2 Shell: Dual-Column Layout shared chrome

#### Import

``` python
from cjm_transcript_segment_align.html_ids import (
    CombinedHtmlIds
)
```

#### Classes

``` python
class CombinedHtmlIds:
    "HTML ID constants for Phase 2 Shell: Dual-Column Layout shared chrome."
    
    def as_selector(
            id_str:str  # The HTML ID to convert
        ) -> str:  # CSS selector with # prefix
        "Convert an ID to a CSS selector format."
```
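As a rough illustration of `as_selector`, the sketch below assumes it is a static helper that prefixes an ID with `#`. The class name and the ID value are hypothetical; the actual constants on `CombinedHtmlIds` are not listed in this reference.

``` python
class HtmlIdsSketch:
    "Hypothetical stand-in for CombinedHtmlIds."
    SHARED_TOOLBAR = "example-shared-toolbar"  # hypothetical ID, not from the package

    @staticmethod
    def as_selector(
        id_str: str  # The HTML ID to convert
    ) -> str:  # CSS selector with # prefix
        "Convert an ID to a CSS selector format."
        return f"#{id_str}"


selector = HtmlIdsSketch.as_selector(HtmlIdsSketch.SHARED_TOOLBAR)
print(selector)
```

A selector in this form is what HTMX attributes such as `hx-target` expect, which is the likely reason the helper exists alongside the raw ID constants.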

### keyboard_config (`keyboard_config.ipynb`)

> Shared keyboard navigation configuration for the combined Phase 2 step

#### Import

``` python
from cjm_transcript_segment_align.components.keyboard_config import (
    DEBUG_KB_SYSTEM,
    ZONE_CHANGE_CALLBACK,
    SWITCH_CHROME_BTN_ID,
    render_keyboard_hints_collapsible,
    build_combined_kb_system,
    generate_zone_change_js
)
```

#### Functions

``` python
def render_keyboard_hints_collapsible(
    manager:ZoneManager,  # Keyboard zone manager with actions configured
    container_id:str="sd-keyboard-hints",  # HTML ID for the hints container
    include_zone_switch:bool=False,  # Whether to include zone switch hints
) -> Any:  # Collapsible keyboard hints component
    "Render keyboard shortcut hints in a collapsible DaisyUI collapse."
```

``` python
def build_combined_kb_system(
    seg_urls:SegmentationUrls,  # URL bundle for segmentation routes
    align_urls:AlignmentUrls,  # URL bundle for alignment routes
) -> Tuple[ZoneManager, Any]:  # (keyboard manager, rendered keyboard system)
    "Build combined keyboard system with segmentation and alignment zones."
```

``` python
def generate_zone_change_js(
    switch_chrome_url:str="",  # URL for chrome swap handler (empty = no swap)
) -> Script:  # Script element with zone change callback and click handlers
    "Generate JavaScript for zone change handling and column click handlers."
```

#### Variables

``` python
DEBUG_KB_SYSTEM = True
ZONE_CHANGE_CALLBACK = 'onCombinedZoneChange'
SWITCH_CHROME_BTN_ID = 'sd-switch-chrome-btn'
```

### step_combined (`step_renderer.ipynb`)

> Phase 2 combined step renderer: dual-column layout for Segment & Align

#### Import

``` python
from cjm_transcript_segment_align.components.step_renderer import (
    DEBUG_COMBINED_RENDER,
    render_seg_mini_stats_badge,
    render_align_mini_stats_badge,
    render_alignment_status_text,
    render_alignment_status,
    render_footer_inner_content,
    render_combined_step
)
```

#### Functions

``` python
def _render_column_header(
    title:str,  # Column title (e.g., "Text Decomposition")
    stats_id:str,  # HTML ID for the mini-stats badge area
    header_id:str,  # HTML ID for the column header container
    initial_text:str="--",  # Initial text for the mini-stats badge
) -> Any:  # Column header component
    "Render a column header with title and mini-stats badge."
```

``` python
def render_seg_mini_stats_badge(
    segments:List[TextSegment],  # Current segments
    oob:bool=False,  # Whether to render as OOB swap
) -> Any:  # Mini-stats badge Span
    "Render the segmentation mini-stats badge for the column header."
```

``` python
def render_align_mini_stats_badge(
    chunks:List[VADChunk],  # Current VAD chunks
    oob:bool=False,  # Whether to render as OOB swap
) -> Any:  # Mini-stats badge Span
    "Render the alignment mini-stats badge for the column header."
```

``` python
def render_alignment_status_text(
    segment_count:int,  # Number of text segments
    chunk_count:int,  # Number of VAD chunks
) -> str:  # Status message text
    "Generate alignment status message based on segment and VAD chunk counts."
```

``` python
def render_alignment_status(
    segment_count:int,  # Number of text segments
    chunk_count:int,  # Number of VAD chunks
    oob:bool=False,  # Whether to render as OOB swap
) -> Any:  # Alignment status badge component
    "Render the alignment status indicator badge."
```

``` python
def render_footer_inner_content(
    column_footer:Any,  # Column-specific footer content (decomp or align)
    segment_count:int,  # Number of text segments
    chunk_count:int,  # Number of VAD chunks
) -> Any:  # Styled wrapper div with column footer and alignment status
    """
    Render the footer inner content with consistent styling.
    
    This ensures the footer layout (justify-between) is preserved across
    all OOB swaps. Both the column-specific footer content and the
    alignment status indicator are wrapped in a flex container.
    """
```

``` python
def _placeholder(
    text:str,  # Placeholder message
) -> Any:  # Styled placeholder paragraph
    "Render a placeholder text element for uninitialized chrome containers."
```

``` python
def _render_shared_chrome(
    seg_state:dict=None,  # Extracted segmentation state (None = show placeholders)
    align_state:dict=None,  # Extracted alignment state (None = no VAD data yet)
    urls:SegmentationUrls=None,  # Segmentation URL bundle (required when seg_state provided)
    kb_manager:Any=None,  # Keyboard manager (required when seg_state provided)
) -> tuple:  # (hints, toolbar, controls, footer)
    """
    Render shared chrome containers, populated with segmentation content when initialized.
    
    Takes extracted state dicts from `extract_seg_state()` and `extract_alignment_state()`
    which contain deserialized TextSegment and VADChunk objects.
    """
```

``` python
def _render_seg_column(
    is_active:bool=True,  # Whether this column is initially active
    column_body:Any=None,  # Pre-rendered column body (None = not initialized)
    mini_stats_text:str="--",  # Mini-stats badge text
    init_url:str="",  # URL for auto-trigger initialization
) -> Any:  # Left column component
    "Render the left segmentation column."
```

``` python
def _render_alignment_column(
    is_active:bool=False,  # Whether this column is initially active
    column_body:Any=None,  # Pre-rendered column body (None = not initialized)
    mini_stats_text:str="--",  # Mini-stats badge text
    init_url:str="",  # URL for auto-trigger initialization
) -> Any:  # Right column component
    "Render the right alignment column."
```

``` python
def _render_keyboard_system_container(
    kb_system:Any=None,  # Rendered keyboard system (None = empty container)
    oob:bool=False,  # Whether to render as OOB swap
) -> Any:  # Div with id=KEYBOARD_SYSTEM containing KB elements
    "Render stable container for keyboard navigation system elements."
```

``` python
def render_combined_step(
    ctx:InteractionContext,  # Interaction context with state and data
    seg_urls:SegmentationUrls=None,  # URL bundle for segmentation routes
    align_urls:AlignmentUrls=None,  # URL bundle for alignment routes
    switch_chrome_url:str="",  # URL for chrome switching route
) -> Any:  # FastHTML component with full dual-column layout
    "Render Phase 2: Combined Segment & Align step with dual-column layout."
```

#### Variables

``` python
DEBUG_COMBINED_RENDER = True
_FOOTER_INNER_CLS
_SEG_COLUMN_CLS
_ALIGNMENT_COLUMN_CLS
```
