Analysis Comparison Report
Analyses Included in Comparison
{{ result.name }}
{{ result.themes|length }} themes
{% for t in result.themes %}
- {{ t.name }} {% endfor %}
Theme Network (UMAP Projection)
2-D UMAP projection of theme embeddings from each analysis, shown in different colours. Each point represents a theme; proximity reflects semantic similarity in the original embedding space.
Pairwise Comparisons
Select a pair to view detailed comparison metrics.
{{ comp.a.name }} vs {{ comp.b.name }} {% if comparison.comparison_plots.embeddings_csv and comparison.comparison_plots.embeddings_csv[key] %} Embeddings CSV {% endif %}
Cosine Similarity
Cosine similarity measures the angle between theme embedding vectors. In general it ranges from -1 to 1, but for typical text embeddings values fall between 0 and 1, with 1 indicating identical direction.
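As a concrete sketch (the pipeline's own implementation may differ), cosine similarity between two embedding vectors can be computed with NumPy:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```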
Summary Statistics
Thematic analysis has no ground truth, so traditional precision and recall do not apply. Instead, we measure coverage (did themes find matches?) and fidelity (how close are the best matches?), both based on cosine similarity.
Proportion of themes with at least one match above threshold ({{ comparison.config.threshold }})
- Hit Rate A: {{ "%.1f"|format(comp.stats.hit_rate_a * 100) }}%
- Hit Rate B: {{ "%.1f"|format(comp.stats.hit_rate_b * 100) }}%
- Jaccard: {{ "%.3f"|format(comp.stats.jaccard) }}
High hit rates indicate both analyses found similar conceptual territory.
How close are the best matches? (Mean of each theme's best match similarity)
- A→B: {{ "%.3f"|format(comp.stats.mean_max_sim_a_to_b) }}
- B→A: {{ "%.3f"|format(comp.stats.mean_max_sim_b_to_a) }}
- Fidelity: {{ "%.3f"|format(comp.stats.fidelity) }}
Fidelity is the harmonic mean of directional scores. Higher = tighter semantic alignment.
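A minimal sketch of how these statistics can be derived from a precomputed similarity matrix (function and key names here are illustrative, not the pipeline's actual API):

```python
import numpy as np

def coverage_and_fidelity(S: np.ndarray, threshold: float) -> dict:
    # S[i, j] = cosine similarity between theme i of A and theme j of B
    best_a_to_b = S.max(axis=1)       # each A theme's best match in B
    best_b_to_a = S.max(axis=0)       # each B theme's best match in A
    hit_rate_a = float((best_a_to_b >= threshold).mean())
    hit_rate_b = float((best_b_to_a >= threshold).mean())
    m_ab = float(best_a_to_b.mean())  # mean-max similarity A -> B
    m_ba = float(best_b_to_a.mean())  # mean-max similarity B -> A
    fidelity = 2 * m_ab * m_ba / (m_ab + m_ba)  # harmonic mean of directions
    return {"hit_rate_a": hit_rate_a, "hit_rate_b": hit_rate_b,
            "mean_max_sim_a_to_b": m_ab, "mean_max_sim_b_to_a": m_ba,
            "fidelity": fidelity}
```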
{{ comp.stats.similarity_matrix }}
Angular Similarity
Angular similarity uses the angular distance between vectors (the arccos of cosine similarity), normalized to [0,1]. Unlike cosine similarity, angular distance is a proper metric that satisfies the triangle inequality, which makes averaging it mathematically better behaved.
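A sketch of the conversion from cosine to angular similarity, assuming the normalization `1 - arccos(cos)/pi`:

```python
import numpy as np

def angular_similarity(a: np.ndarray, b: np.ndarray) -> float:
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # clip guards against floating-point values just outside [-1, 1]
    angle = np.arccos(np.clip(cos, -1.0, 1.0))
    return float(1.0 - angle / np.pi)  # 1 = identical direction, 0 = opposite
```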
Best Matches (1:1)
The Hungarian algorithm finds the optimal one-to-one pairing that maximizes total similarity. Each theme maps to at most one theme in the other set -- no reuse allowed.
What this enables: Hungarian matching makes precision/recall well-defined by removing ambiguity about what counts as a "match". After matching: each pair = one prediction; unmatched A themes = false negatives; unmatched B themes = false positives.
Limitation: This penalises legitimate theme refinement (splitting one theme into two is treated as error). Use OT if you want to reward decomposition.
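The optimal pairing can be computed with SciPy's `linear_sum_assignment`; this is a sketch, and the report's own matching code may differ:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Similarity matrix: 2 themes in A vs 3 themes in B (illustrative values)
S = np.array([[0.9, 0.1, 0.4],
              [0.2, 0.8, 0.3]])

# maximize=True finds the 1:1 assignment with the highest total similarity
rows, cols = linear_sum_assignment(S, maximize=True)
pairs = list(zip(rows.tolist(), cols.tolist()))
# Any unmatched B column would count as a "false positive" under this scheme
```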
{{ "%.3f"|format(comp.stats.hungarian.soft_metrics.soft_precision) }}
Average similarity of optimal pairs
Interpretation: "How good are the best one-to-one correspondences?" Higher = tighter semantic alignment between the two theme sets.
{% if comp.stats.hungarian.distribution.n_pairs > 0 %}Distribution of {{ comp.stats.hungarian.distribution.n_pairs }} pairs above threshold:
- Median: {{ "%.3f"|format(comp.stats.hungarian.distribution.median) }} (Q1: {{ "%.3f"|format(comp.stats.hungarian.distribution.q1) }}, Q3: {{ "%.3f"|format(comp.stats.hungarian.distribution.q3) }})
- Range: {{ "%.3f"|format(comp.stats.hungarian.distribution.min) }} -- {{ "%.3f"|format(comp.stats.hungarian.distribution.max) }}
Based on {{ comp.stats.hungarian.distribution.n_pairs }} matched pairs above threshold ({{ comparison.config.threshold }})
{{ "%.0f"|format(comp.stats.hungarian.thresholded_metrics.recall * 100) }}%
Recall
(coverage of A)
{{ "%.0f"|format(comp.stats.hungarian.thresholded_metrics.precision * 100) }}%
Precision
(coverage of B)
{{ "%.3f"|format(comp.stats.hungarian.thresholded_metrics.true_jaccard) }}
Jaccard
(for Raza)
Caution: These metrics penalise over-splitting (hurts precision) and under-coverage (hurts recall). Only meaningful if you assume themes should map 1-to-1.
Hungarian algorithm finds the optimal one-to-one assignment.
| Theme in {{ comp.a.name }} | Theme in {{ comp.b.name }} | Angular Similarity |
|---|---|---|
| {{ theme_a.theme_name }} {{ theme_a.embedded_string }} | {{ theme_b.theme_name }} {{ theme_b.embedded_string }} | {{ "%.3f"|format(similarity) }} |
No optimal pairs found.
{% endif %}
Unbalanced Optimal Transport (Many-to-Many Alignment)
Unbalanced Optimal Transport allows themes to remain unmatched, representing genuinely novel or missing concepts. Unlike balanced OT (which forces all mass to transport), unbalanced OT permits themes to be left out when no good match exists. The reg_m (K) parameter controls the penalty for leaving mass unmatched.
This plot shows how unmatched mass changes as the K parameter varies. Lower K values are more permissive about leaving mass unmatched, so only strong matches transport, while higher K values force more themes to align even when matches are poor.
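Assuming a solver such as the POT library's unbalanced Sinkhorn produces the transport plan, the headline quantities can be summarised as follows. This is a sketch; the exact definitions of shared and unmatched mass here are assumptions, not the pipeline's guaranteed formulas:

```python
import numpy as np

def ot_summary(P: np.ndarray, C: np.ndarray):
    # P: transport plan (rows = A themes, cols = B themes), with A and B
    #    mass distributions each normalised to sum to 1 before solving.
    # C: cost matrix, e.g. angular distance between theme embeddings.
    shared_mass = float(P.sum())            # fraction of mass transported
    unmatched_mass = 1.0 - shared_mass      # mass left out by the reg_m penalty
    avg_cost = float((P * C).sum() / P.sum())  # mean cost per unit of moved mass
    return shared_mass, unmatched_mass, avg_cost
```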
| K | Shared Mass | Unmatched | Avg Cost | Splits (mean) | Joins (mean) | Relative Score |
|---|---|---|---|---|---|---|
| {{ "%.2f"|format(k_val) }}{% if k_val == comp.stats.default_k %} ★{% endif %}{% if k_val == comp.stats.elbow_k %} ◆{% endif %} | {{ "%.1f"|format(ot_k.ot.shared_mass * 100) }}% | {{ "%.1f"|format(ot_k.ot.unmatched_mass * 100) }}% | {{ "%.3f"|format(ot_k.ot.avg_cost) }} | {{ "%.2f"|format(ot_k.split_join_stats.splits_from_a.mean) }} | {{ "%.2f"|format(ot_k.split_join_stats.joins_to_b.mean) }} | {% if ot_k.ot.shared_mass_relative is defined %}{{ "%.2f"|format(ot_k.ot.shared_mass_relative) }}{% else %}-{% endif %} |
★ = Default K value; ◆ = elbow K value. Higher K = stronger penalty for unmatching = more mass forced to transport. Lower K = more permissive, allowing themes to remain unmatched.
{{ "%.1f"|format(ot_k.ot.shared_mass * 100) }}%
Shared Mass
{{ "%.2f"|format(ot_k.ot.shared_mass_relative) }}
Relative
0=random, 1=perfect
- Null: {{ "%.1f"|format(ot_k.ot.null_shared_mass_mean * 100) }}%
- Excess: +{{ "%.1f"|format(ot_k.ot.shared_mass_excess * 100) }}%
- Effect: {{ "%.1f"|format(ot_k.ot.shared_mass_effect) }}σ
{{ "%.1f"|format(ot_k.ot.shared_mass * 100) }}%
Shared Mass
Null baseline not computed for this K value.
Interpretation: Of the possible improvement beyond random alignment, what fraction did we achieve? Values > 0.3 suggest meaningful structure; > 0.5 is good; > 0.7 is strong.
{% endif %}{{ "%.2f"|format(ot_k.ot.avg_cost_relative) }}
Relative Score
0 = random, 1 = perfect
{% else %}{{ "%.3f"|format(ot_k.ot.avg_cost) }}
Average Cost
{% endif %}- Observed: {{ "%.3f"|format(ot_k.ot.avg_cost) }} {% if ot_k.ot.null_avg_cost_mean is defined %}
- Null mean: {{ "%.3f"|format(ot_k.ot.null_avg_cost_mean) }}
- Improvement: {{ "%.3f"|format(ot_k.ot.avg_cost_improvement) }} {% endif %}
Interpretation: How much lower is the transport cost compared to random? Lower cost = better semantic alignment. Values > 0.3 suggest themes are meaningfully closer than chance.
{% else %}Null baseline comparison is only computed for the default K={{ "%.1f"|format(comp.stats.default_k) }}.
{% endif %}{{ "%.1f"|format(ot_k.ot.unmatched_mass * 100) }}%
Interpretation: Proportion of theme-mass that couldn't find a good match.
Themes in A flowing to multiple themes in B
- Mean: {{ "%.2f"|format(ot_k.split_join_stats.splits_from_a.mean) }}
- Median: {{ "%.1f"|format(ot_k.split_join_stats.splits_from_a.median) }}
- Mode: {{ ot_k.split_join_stats.splits_from_a.mode }}
- Max: {{ ot_k.split_join_stats.splits_from_a.max }}
- Themes with >1 target: {{ ot_k.split_join_stats.splits_from_a.n_multiple }}/{{ ot_k.split_join_stats.splits_from_a.total }} ({{ "%.0f"|format(ot_k.split_join_stats.splits_from_a.pct_multiple * 100) }}%)
Distribution:
Themes in B receiving from multiple themes in A
- Mean: {{ "%.2f"|format(ot_k.split_join_stats.joins_to_b.mean) }}
- Median: {{ "%.1f"|format(ot_k.split_join_stats.joins_to_b.median) }}
- Mode: {{ ot_k.split_join_stats.joins_to_b.mode }}
- Max: {{ ot_k.split_join_stats.joins_to_b.max }}
- Themes with >1 source: {{ ot_k.split_join_stats.joins_to_b.n_multiple }}/{{ ot_k.split_join_stats.joins_to_b.total }} ({{ "%.0f"|format(ot_k.split_join_stats.joins_to_b.pct_multiple * 100) }}%)
Distribution:
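Split and join counts can be read off the transport plan by thresholding small flows; this sketch assumes a mass cutoff `eps` below which a flow does not count as a link:

```python
import numpy as np

def split_join_counts(P: np.ndarray, eps: float = 1e-6):
    # P: transport plan; a link exists wherever more than eps mass flows
    links = P > eps
    splits_from_a = links.sum(axis=1)  # number of B targets per A theme
    joins_to_b = links.sum(axis=0)     # number of A sources per B theme
    return splits_from_a, joins_to_b
```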
Transport Flow (Sankey)
Width of links shows amount of mass transported between themes. Colour indicates alignment quality (green = good match, pink = poor match). Hover over links for details.
Transport Plan Heatmap
Each cell shows the percentage of transported mass flowing from an A theme to a B theme. Values sum to 100%.
For each theme, how much of its mass was transported? Low coverage = theme is conceptually distinct from the other set.
| Theme | Coverage |
|---|---|
| {{ theme.theme_name }} | {{ "%.2f"|format(ot_k.ot.coverage_a[i]) }} |
| Theme | Coverage |
|---|---|
| {{ theme.theme_name }} | {{ "%.2f"|format(ot_k.ot.coverage_b[i]) }} |
Shared Mass Comparison (Default K={{ "%.1f"|format(comp.stats.default_k) }})
| Comparison | Shared Mass |
|---|---|
| A ↔ B (observed) | {{ "%.1f"|format(comp.stats.ot.shared_mass * 100) }}% |
| A ↔ Bsalad (null mean) | {{ "%.1f"|format(comp.stats.ot.null_shared_mass_mean * 100) }}% |
| Relative improvement | {{ "%.2f"|format(comp.stats.ot.shared_mass_relative) }} (0 = random, 1 = perfect) |
Symmetric null: Both A and B themes are independently scrambled into "word salad" strings. Null combines A vs Bsalad and Asalad vs B (N=50 each direction, 100 total). This tests whether both sets have real semantic content.
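One way to build such a null sample, assuming words are pooled across the set's themes before being dealt back out with the original per-theme word counts (the pipeline's scrambling may differ in detail):

```python
import random

def word_salad(themes: list[str], rng: random.Random) -> list[str]:
    """Scramble one theme set into a 'word salad' null sample.

    Pools all words across themes, shuffles them, then deals them back
    out so each scrambled theme keeps its original word count."""
    words = [w for t in themes for w in t.split()]
    rng.shuffle(words)
    out, i = [], 0
    for t in themes:
        n = len(t.split())
        out.append(" ".join(words[i:i + n]))
        i += n
    return out
```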
{% if comp.stats.word_salad_samples %}Each sample contains {{ comp.stats.word_salad_samples[0]|length }} scrambled "themes" (matching B's theme count). Words from B's themes are randomly shuffled while preserving length.
{% for sample in comp.stats.word_salad_samples %}
Secondary Metrics (Effect Sizes)
Effect sizes use MAD (median absolute deviation) for robustness. Do not compare across analyses with different embedding lengths.
- Shared mass effect: {{ "%.2f"|format(comp.stats.ot.shared_mass_effect) }} MADs above null {% if comp.stats.ot.avg_cost_effect is defined %}
- Avg cost effect: {{ "%.2f"|format(comp.stats.ot.avg_cost_effect) }} MADs better than null {% endif %}
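A sketch of the MAD-based effect size described above, a robust analogue of a z-score:

```python
import numpy as np

def mad_effect_size(observed: float, null_samples) -> float:
    # Distance from the null median, measured in median absolute
    # deviations (MADs) of the null distribution.
    null_samples = np.asarray(null_samples, dtype=float)
    med = np.median(null_samples)
    mad = np.median(np.abs(null_samples - med))
    return float((observed - med) / mad)
```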
Embedding Metadata
- Mean embedding length A: {{ "%.1f"|format(comp.stats.mean_embedding_words_a) }} words
- Mean embedding length B: {{ "%.1f"|format(comp.stats.mean_embedding_words_b) }} words
Longer embeddings → more stable null → inflated effect sizes. Use relative metrics for cross-analysis comparisons.
Best Matches (many:many)
Shows best match for each theme, allowing multiple themes to match the same target. OT columns show mass flow from optimal transport (default K={{ "%.2f"|format(comp.stats.default_k) }}).
For each theme in {{ comp.a.name }}, the most similar theme in {{ comp.b.name }}
| Theme in {{ comp.a.name }} | Best Match in {{ comp.b.name }} | Sim | Mass | % | Coverage |
|---|---|---|---|---|---|
| {{ theme_a.theme_name }} {{ theme_a.embedded_string }} | {{ theme_b.theme_name }} {{ theme_b.embedded_string }} | {{ "%.2f"|format(match.similarity) }} | {{ "%.3f"|format(match.mass_transferred) }} | {{ "%.0f"|format(match.mass_pct) }}% | {{ "%.1f"|format(match.mass_total * 100) }}% |
For each theme in {{ comp.b.name }}, the most similar theme in {{ comp.a.name }}
| Theme in {{ comp.b.name }} | Best Match in {{ comp.a.name }} | Sim | Mass | % | Coverage |
|---|---|---|---|---|---|
| {{ theme_b.theme_name }} {{ theme_b.embedded_string }} | {{ theme_a.theme_name }} {{ theme_a.embedded_string }} | {{ "%.2f"|format(match.similarity) }} | {{ "%.3f"|format(match.mass_transferred) }} | {{ "%.0f"|format(match.mass_pct) }}% | {{ "%.1f"|format(match.mass_total * 100) }}% |
Additional Similarity Metrics
Alternative similarity functions for specialized analyses.
Shepard Similarity (k={{ comp.stats.shepard_k_value }})
Exponential decay on angular distance, following Shepard's universal law of generalization; a cognitively realistic similarity function.
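A sketch of Shepard similarity as exponential decay over normalized angular distance; the decay rate `k` here stands in for the report's configured value:

```python
import numpy as np

def shepard_similarity(a: np.ndarray, b: np.ndarray, k: float = 3.0) -> float:
    # Shepard's law: similarity decays exponentially with distance;
    # here distance is the angle between embeddings, normalised to [0, 1].
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    d = np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi
    return float(np.exp(-k * d))
```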
Within-set baseline: Mean = {{ "%.3f"|format(comp.stats.within_set_stats.mean) }}, SD = {{ "%.3f"|format(comp.stats.within_set_stats.std) }}
Percentile-Normalized
Cross-set similarity relative to within-set distribution. 0.80 = more similar than 80% of within-set pairs.
Z-Score Normalized
Standard deviations above/below typical within-set similarity. Useful for identifying outliers.
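Both normalizations can be sketched against a sample of within-set similarities (the function name and signature are illustrative, not the pipeline's API):

```python
import numpy as np

def normalise_against_within_set(cross_sims, within_sims):
    within = np.asarray(within_sims, dtype=float)
    cross = np.asarray(cross_sims, dtype=float)
    # Percentile: fraction of within-set pairs each cross-set value exceeds,
    # so 0.80 means "more similar than 80% of within-set pairs"
    pct = (cross[:, None] > within[None, :]).mean(axis=1)
    # Z-score: distance from the within-set mean in within-set SDs
    z = (cross - within.mean()) / within.std()
    return pct, z
```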
Comparison Configuration
{{ comparison.config | tojson(indent=2) }}