Group semantic intervals into final product clusters.
Use every input interval at most once.
Each interval may include duration_seconds and identity_cues (cues distilled from the original host speech).
Intervals:
{intervals_text}
