Analysis Comparison Report


Analyses Included in Comparison

{% for result in comparison.results %}
{{ result.name }}

{{ result.themes|length }} themes

    {% for t in result.themes %}
  • {{ t.name }}
    {% endfor %}
{% endfor %}

Embedding Details

The actual strings that were embedded for similarity comparison. Labels are used in plots; embedded strings are used for calculating similarity.

{% for result in comparison.results %}
{{ result.name }}
Theme Name | Label (in plots) | Embedded String (for similarity)
{% for item in comparison.embedded_strings.get(result.name, []) %}
{{ item.theme_name }} | {{ item.label }} | {{ item.embedded_string }}
{% endfor %}
{% endfor %}

Theme Network (UMAP Projection)

2-D UMAP projection of theme embeddings from each analysis, shown in different colours. Each point represents a theme; proximity reflects semantic similarity in the original embedding space.

UMAP projection of theme embeddings
Interpreting this plot: UMAP is a non-linear dimensionality-reduction method that prioritises preserving local neighbourhood structure rather than global variance. Nearby points can be interpreted as closely related themes, while larger-scale distances and cluster shapes should be interpreted qualitatively rather than metrically. This plot is intended as an exploratory visualisation of thematic relationships and overlap between sets, not as a quantitative evaluation.

Pairwise Comparisons

Select a pair to view detailed comparison metrics.

{% for key, comp in comparison.by_comparisons().items() %}

{{ comp.a.name }} vs {{ comp.b.name }} {% if comparison.comparison_plots.embeddings_csv and comparison.comparison_plots.embeddings_csv[key] %} Embeddings CSV {% endif %}

Cosine Similarity

Cosine similarity measures the angle between theme embedding vectors. In general it ranges from -1 to 1, but for text embeddings (whose components are largely non-negative) values typically fall between 0 and 1, with 1 indicating identical direction.
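A minimal sketch of the computation (NumPy; the vectors are hypothetical, real theme embeddings are high-dimensional):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 3-d "embeddings", for illustration only.
a = np.array([1.0, 2.0, 3.0])
b = np.array([3.0, 2.0, 1.0])
sim_identical = cosine_similarity(a, a)  # identical direction -> ~1.0
sim_different = cosine_similarity(a, b)  # 10/14 ~ 0.714
```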

Continuous Values
Cosine similarity heatmap
Binary Match (threshold={{ comparison.config.threshold }})
Thresholded heatmap

Summary Statistics

Thematic analysis doesn't have ground truth, so traditional precision/recall don't apply. Instead, we measure coverage (did themes find matches?) and fidelity (how close are the best matches?). Based on cosine similarity.

Coverage (Hit Rates)

Proportion of themes with at least one match above threshold ({{ comparison.config.threshold }})

  • Hit Rate A: {{ "%.1f"|format(comp.stats.hit_rate_a * 100) }}%
  • Hit Rate B: {{ "%.1f"|format(comp.stats.hit_rate_b * 100) }}%
  • Jaccard: {{ "%.3f"|format(comp.stats.jaccard) }}

High hit rates indicate both analyses found similar conceptual territory.
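A sketch of how these coverage figures can be derived from the pairwise similarity matrix (NumPy; the exact Jaccard variant used by the report is an assumption, one plausible reading is shown):

```python
import numpy as np

def coverage_stats(S, threshold):
    """S[i, j] = similarity of theme i in A to theme j in B."""
    hit_a = S.max(axis=1) >= threshold  # A themes with >=1 match in B
    hit_b = S.max(axis=0) >= threshold  # B themes with >=1 match in A
    # Assumed Jaccard reading: matched themes over all themes.
    jaccard = (hit_a.sum() + hit_b.sum()) / (hit_a.size + hit_b.size)
    return hit_a.mean(), hit_b.mean(), jaccard

S = np.array([[0.9, 0.2],
              [0.1, 0.3]])
hr_a, hr_b, jac = coverage_stats(S, threshold=0.5)
```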

Fidelity (Match Quality)

How close are the best matches? (Mean of each theme's best match similarity)

  • A→B: {{ "%.3f"|format(comp.stats.mean_max_sim_a_to_b) }}
  • B→A: {{ "%.3f"|format(comp.stats.mean_max_sim_b_to_a) }}
  • Fidelity: {{ "%.3f"|format(comp.stats.fidelity) }}

Fidelity is the harmonic mean of directional scores. Higher = tighter semantic alignment.
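The directional means and their harmonic mean can be sketched as:

```python
import numpy as np

def fidelity(S):
    """Harmonic mean of the two directional mean-best-match scores."""
    a_to_b = S.max(axis=1).mean()  # each A theme's best match in B
    b_to_a = S.max(axis=0).mean()  # each B theme's best match in A
    return 2 * a_to_b * b_to_a / (a_to_b + b_to_a)

S = np.array([[1.0, 0.0],
              [0.0, 0.5]])
score = fidelity(S)  # both directional means are 0.75 here
```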

Similarity Matrix
{{ comp.stats.similarity_matrix }}

Angular Similarity

Angular similarity converts the angle between vectors into a score: 1 - arccos(cosine)/π, normalized to [0, 1]. Unlike cosine similarity, the underlying angular distance is a proper metric that satisfies the triangle inequality, making it mathematically sound to average.

Angular similarity heatmap
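The cosine-to-angular mapping described above, as a sketch:

```python
import numpy as np

def angular_similarity(cos_sim):
    """1 - arccos(cos_sim)/pi: maps cosine similarity into [0, 1]."""
    theta = np.arccos(np.clip(cos_sim, -1.0, 1.0))  # angular distance, radians
    return 1.0 - theta / np.pi

identical = angular_similarity(1.0)   # same direction -> 1.0
orthogonal = angular_similarity(0.0)  # right angle -> 0.5
opposite = angular_similarity(-1.0)   # opposite direction -> 0.0
```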

Best Matches (1:1)

The Hungarian algorithm finds the optimal one-to-one pairing that maximizes total similarity. Each theme maps to at most one theme in the other set -- no reuse allowed.

Intuition: "If I had to explain set B's themes to someone who only knew set A, which single theme in A would each B theme correspond to, with no reuse?"

What this enables: Hungarian matching makes precision/recall well-defined by removing ambiguity about what counts as a "match". After matching: each pair = one prediction; unmatched A themes = false negatives; unmatched B themes = false positives.

Limitation: This penalises legitimate theme refinement (splitting one theme into two is treated as error). Use OT if you want to reward decomposition.
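In practice the assignment is solved with `scipy.optimize.linear_sum_assignment`; the brute-force sketch below (stdlib + NumPy, assumes len(A) <= len(B) and small theme sets) shows the idea:

```python
from itertools import permutations
import numpy as np

def optimal_pairs(S):
    """Optimal 1:1 assignment maximizing total similarity.
    Brute force over permutations; use scipy.optimize.linear_sum_assignment
    for real workloads. Assumes S has no more rows than columns."""
    n_a, n_b = S.shape
    best, best_pairs = -np.inf, []
    for perm in permutations(range(n_b), n_a):
        pairs = list(zip(range(n_a), perm))
        total = sum(S[i, j] for i, j in pairs)
        if total > best:
            best, best_pairs = total, pairs
    return best_pairs

S = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pairs = optimal_pairs(S)  # -> [(0, 0), (1, 1)]
mean_matched = np.mean([S[i, j] for i, j in pairs])  # 0.85
```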
Mean Matched Similarity (primary metric)

{{ "%.3f"|format(comp.stats.hungarian.soft_metrics.soft_precision) }}

Average similarity of optimal pairs

Interpretation: "How good are the best one-to-one correspondences?" Higher = tighter semantic alignment between the two theme sets.

{% if comp.stats.hungarian.distribution.n_pairs > 0 %}

Distribution of {{ comp.stats.hungarian.distribution.n_pairs }} pairs above threshold:

  • Median: {{ "%.3f"|format(comp.stats.hungarian.distribution.median) }}   (Q1: {{ "%.3f"|format(comp.stats.hungarian.distribution.q1) }}, Q3: {{ "%.3f"|format(comp.stats.hungarian.distribution.q3) }})
  • Range: {{ "%.3f"|format(comp.stats.hungarian.distribution.min) }} -- {{ "%.3f"|format(comp.stats.hungarian.distribution.max) }}
{% endif %}
Precision / Recall (use with caution)

Based on {{ comp.stats.hungarian.distribution.n_pairs }} matched pairs above threshold ({{ comparison.config.threshold }})

{{ "%.0f"|format(comp.stats.hungarian.thresholded_metrics.recall * 100) }}%

Recall

(coverage of A)

{{ "%.0f"|format(comp.stats.hungarian.thresholded_metrics.precision * 100) }}%

Precision

(coverage of B)

{{ "%.3f"|format(comp.stats.hungarian.thresholded_metrics.true_jaccard) }}

Jaccard

(for Raza)


Caution: These metrics penalise over-splitting (hurts precision) and under-coverage (hurts recall). Only meaningful if you assume themes should map 1-to-1.

Optimal Matched Pairs ({{ comp.stats.hungarian.all_pairs|length }})
{% if comp.stats.hungarian.all_pairs|length > 0 %}

Hungarian algorithm finds the optimal one-to-one assignment.

Theme in {{ comp.a.name }} | Theme in {{ comp.b.name }} | Angular Similarity
{% for i, j, similarity in comp.stats.hungarian.all_pairs %}
{% set theme_a = comp.embedded_a[i] %}
{% set theme_b = comp.embedded_b[j] %}
{{ theme_a.theme_name }} ({{ theme_a.embedded_string }}) | {{ theme_b.theme_name }} ({{ theme_b.embedded_string }}) | {{ "%.3f"|format(similarity) }}
{% endfor %}
{% else %}

No optimal pairs found.

{% endif %}

Unbalanced Optimal Transport (Many-to-Many Alignment)

Unbalanced Optimal Transport allows themes to remain unmatched, representing genuinely novel or missing concepts. Unlike balanced OT (which forces all mass to transport), unbalanced OT permits themes to be left out when no good match exists. The reg_m (K) parameter controls the penalty for leaving mass unmatched.
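A minimal numerical sketch of KL-relaxed (unbalanced) Sinkhorn in plain NumPy; in practice this is typically done with the POT library's `ot.unbalanced` module, and all parameter values here are illustrative:

```python
import numpy as np

def sinkhorn_unbalanced(a, b, M, reg=0.05, reg_m=1.0, n_iter=500):
    """KL-relaxed Sinkhorn: marginals a, b are penalised, not enforced.
    reg_m plays the role of K above: higher -> more mass forced to move."""
    Kmat = np.exp(-M / reg)
    f = reg_m / (reg_m + reg)  # relaxation exponent (1 = balanced limit)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = (a / (Kmat @ v)) ** f
        v = (b / (Kmat.T @ u)) ** f
    return u[:, None] * Kmat * v[None, :]  # transport plan

a = b = np.array([0.5, 0.5])
M = np.array([[0.0, 2.0],   # theme A1 matches B1 cheaply;
              [2.0, 2.0]])  # theme A2 has no good match anywhere
low_k = sinkhorn_unbalanced(a, b, M, reg_m=0.1).sum()   # mass left unmatched
high_k = sinkhorn_unbalanced(a, b, M, reg_m=10.0).sum()  # mass forced across
```

With low reg_m the poorly matched theme's mass stays behind; raising reg_m forces it to transport despite the cost.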

Unmatched Mass vs K (Scree Plot)

This plot shows how unmatched mass changes as the K parameter varies. Lower K values allow more mass to remain unmatched (stricter matching), while higher K values force more themes to align even with poor matches.

★ Default K={{ "%.2f"|format(comp.stats.default_k) }}  |  ◆ Diminishing returns K={{ "%.2f"|format(comp.stats.elbow_k) }} (beyond this K, gains in shared mass become marginal)
Unmatched mass scree plot
Summary Table: Metrics Across K Values
K | Shared Mass | Unmatched | Avg Cost | Splits (mean) | Joins (mean) | Relative Score
{% for k_val in comp.stats.k_values %}{% set ot_k = comp.stats.ot_by_k[k_val] %}
{{ "%.2f"|format(k_val) }}{% if k_val == comp.stats.default_k %} ★{% endif %}{% if k_val == comp.stats.elbow_k %} ◆{% endif %} | {{ "%.1f"|format(ot_k.ot.shared_mass * 100) }}% | {{ "%.1f"|format(ot_k.ot.unmatched_mass * 100) }}% | {{ "%.3f"|format(ot_k.ot.avg_cost) }} | {{ "%.2f"|format(ot_k.split_join_stats.splits_from_a.mean) }} | {{ "%.2f"|format(ot_k.split_join_stats.joins_to_b.mean) }} | {% if ot_k.ot.shared_mass_relative is defined %}{{ "%.2f"|format(ot_k.ot.shared_mass_relative) }}{% else %}-{% endif %}
{% endfor %}

★ = Default K value. Higher K = stronger penalty for unmatching = more mass forced to transport. Lower K = more permissive, allowing themes to remain unmatched.

{% for k_val in comp.stats.k_values %} {% set ot_k = comp.stats.ot_by_k[k_val] %}
Shared Mass{% if ot_k.ot.shared_mass_relative is defined %} (Relative to Null){% endif %}
{% if ot_k.ot.shared_mass_relative is defined %}

{{ "%.1f"|format(ot_k.ot.shared_mass * 100) }}%

Shared Mass

{{ "%.2f"|format(ot_k.ot.shared_mass_relative) }}

Relative

0=random, 1=perfect

  • Null: {{ "%.1f"|format(ot_k.ot.null_shared_mass_mean * 100) }}%
  • Excess: +{{ "%.1f"|format(ot_k.ot.shared_mass_excess * 100) }}%
  • Effect: {{ "%.1f"|format(ot_k.ot.shared_mass_effect) }}σ
{% else %}

{{ "%.1f"|format(ot_k.ot.shared_mass * 100) }}%

Shared Mass

Null baseline not computed for this K value.

{% endif %}
{% if ot_k.ot.shared_mass_relative is defined %}

Interpretation: Of the possible improvement beyond random alignment, what fraction did we achieve? Values > 0.3 suggest meaningful structure; > 0.5 is good; > 0.7 is strong.
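The normalization implied by "0=random, 1=perfect" can be sketched as follows (the exact formula used by the report is an assumption):

```python
def relative_score(observed, null_mean):
    """Assumed normalization: fraction of the possible improvement
    over the random baseline that was achieved.
    0 = no better than random, 1 = perfect alignment."""
    return (observed - null_mean) / (1.0 - null_mean)

# E.g. observed shared mass 0.65 against a null mean of 0.30:
score = relative_score(0.65, 0.30)  # 0.35 / 0.70 = 0.5
```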

{% endif %}
Average Cost{% if ot_k.ot.avg_cost_relative is defined %} (Relative to Null){% endif %}
{% if ot_k.ot.avg_cost_relative is defined %}

{{ "%.2f"|format(ot_k.ot.avg_cost_relative) }}

Relative Score

0 = random, 1 = perfect

{% else %}

{{ "%.3f"|format(ot_k.ot.avg_cost) }}

Average Cost

{% endif %}
  • Observed: {{ "%.3f"|format(ot_k.ot.avg_cost) }}
{% if ot_k.ot.null_avg_cost_mean is defined %}
  • Null mean: {{ "%.3f"|format(ot_k.ot.null_avg_cost_mean) }}
  • Improvement: {{ "%.3f"|format(ot_k.ot.avg_cost_improvement) }}
{% endif %}
{% if ot_k.ot.avg_cost_relative is defined %}

Interpretation: How much lower is the transport cost compared to random? Lower cost = better semantic alignment. Values > 0.3 suggest themes are meaningfully closer than chance.

{% else %}

Null baseline comparison only computed for default K={{ "%.1f"|format(comp.stats.default_k) }}.

{% endif %}
Unmatched Mass

{{ "%.1f"|format(ot_k.ot.unmatched_mass * 100) }}%

Interpretation: Proportion of theme-mass that couldn't find a good match.

Splits from A

Themes in A flowing to multiple themes in B

  • Mean: {{ "%.2f"|format(ot_k.split_join_stats.splits_from_a.mean) }}
  • Median: {{ "%.1f"|format(ot_k.split_join_stats.splits_from_a.median) }}
  • Mode: {{ ot_k.split_join_stats.splits_from_a.mode }}
  • Max: {{ ot_k.split_join_stats.splits_from_a.max }}
  • Themes with >1 target: {{ ot_k.split_join_stats.splits_from_a.n_multiple }}/{{ ot_k.split_join_stats.splits_from_a.total }} ({{ "%.0f"|format(ot_k.split_join_stats.splits_from_a.pct_multiple * 100) }}%)
{% if ot_k.split_join_stats.splits_from_a.counts %}

Distribution:

{% for n, count in ot_k.split_join_stats.splits_from_a.counts.items() %} {{ n }}→{{ count }} {% endfor %}
{% endif %}
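Split counts can be read off the transport plan: for each source theme, count the targets receiving a non-trivial share of its mass (the share threshold here is an assumption):

```python
import numpy as np

def split_counts(P, share_tol=0.05):
    """For each A theme (row of transport plan P), count how many
    B themes receive a non-negligible share of its transported mass.
    Joins are the same computation on columns (P.T)."""
    row_mass = P.sum(axis=1, keepdims=True)
    share = np.divide(P, row_mass, out=np.zeros_like(P), where=row_mass > 0)
    return (share > share_tol).sum(axis=1)

P = np.array([[0.40, 0.10],   # A1 splits across B1 and B2
              [0.00, 0.50]])  # A2 flows to B2 only
counts = split_counts(P)  # -> [2, 1]; mean splits = 1.5
```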
Joins to B

Themes in B receiving from multiple themes in A

  • Mean: {{ "%.2f"|format(ot_k.split_join_stats.joins_to_b.mean) }}
  • Median: {{ "%.1f"|format(ot_k.split_join_stats.joins_to_b.median) }}
  • Mode: {{ ot_k.split_join_stats.joins_to_b.mode }}
  • Max: {{ ot_k.split_join_stats.joins_to_b.max }}
  • Themes with >1 source: {{ ot_k.split_join_stats.joins_to_b.n_multiple }}/{{ ot_k.split_join_stats.joins_to_b.total }} ({{ "%.0f"|format(ot_k.split_join_stats.joins_to_b.pct_multiple * 100) }}%)
{% if ot_k.split_join_stats.joins_to_b.counts %}

Distribution:

{% for n, count in ot_k.split_join_stats.joins_to_b.counts.items() %} {{ n }}→{{ count }} {% endfor %}
{% endif %}
Transport Visualisations (K={{ "%.2f"|format(k_val) }})
Transport Flow (Sankey)

Width of links shows amount of mass transported between themes. Colour indicates alignment quality (green = good match, pink = poor match). Hover over links for details.

Transport Plan Heatmap

Each cell shows percentage of transported mass flowing from A to B theme. Values sum to 100%.

Transport heatmap
Coverage by Theme (K={{ "%.2f"|format(k_val) }})

For each theme, how much of its mass was transported? Low coverage = theme is conceptually distinct from the other set.

{{ comp.a.name }} → {{ comp.b.name }}
Theme | Coverage
{% for i, theme in enumerate(comp.embedded_a) %}
{{ theme.theme_name }} | {{ "%.2f"|format(ot_k.ot.coverage_a[i]) }}
{% endfor %}
{{ comp.b.name }} → {{ comp.a.name }}
Theme | Coverage
{% for i, theme in enumerate(comp.embedded_b) %}
{{ theme.theme_name }} | {{ "%.2f"|format(ot_k.ot.coverage_b[i]) }}
{% endfor %}
{% endfor %}
Null Baseline Comparison
Shared Mass Comparison (Default K={{ "%.1f"|format(comp.stats.default_k) }})
A ↔ B (observed) {{ "%.1f"|format(comp.stats.ot.shared_mass * 100) }}%
A ↔ B_salad (null mean) {{ "%.1f"|format(comp.stats.ot.null_shared_mass_mean * 100) }}%
Relative improvement {{ "%.2f"|format(comp.stats.ot.shared_mass_relative) }} (0=random, 1=perfect)

Symmetric null: both A and B themes are independently scrambled into "word salad" strings. The null combines A vs B_salad and A_salad vs B (N=50 each direction, 100 total). This tests whether both sets have real semantic content.
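The scrambling step can be sketched as follows (stdlib only; theme strings and seed are illustrative):

```python
import random

def word_salad(theme_strings, seed=0):
    """Scramble the words of each theme string: vocabulary and word
    count are preserved, semantic structure is destroyed."""
    rng = random.Random(seed)
    salads = []
    for text in theme_strings:
        words = text.split()
        rng.shuffle(words)
        salads.append(" ".join(words))
    return salads

themes = ["trust in automated systems", "fear of data misuse"]
salads = word_salad(themes)  # same words per theme, random order
```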

{% if comp.stats.word_salad_samples %}

Each sample contains {{ comp.stats.word_salad_samples[0]|length }} scrambled "themes" (matching B's theme count). Words from B's themes are randomly shuffled while preserving length.

{% for sample_idx, sample in enumerate(comp.stats.word_salad_samples) %}
Sample {{ sample_idx + 1 }}:
{% for text in sample %}
{{ loop.index }} {{ text }}
{% endfor %}
{% endfor %}
{% endif %}
Secondary Metrics (Effect Sizes)

Effect sizes use MAD (median absolute deviation) for robustness. Do not compare across analyses with different embedding lengths.
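A sketch of the robust effect-size computation (NumPy; the null samples are illustrative):

```python
import numpy as np

def mad_effect(observed, null_samples):
    """Robust effect size: how many MADs the observed value sits
    above the median of the null distribution."""
    null = np.asarray(null_samples, dtype=float)
    med = np.median(null)
    mad = np.median(np.abs(null - med))
    return (observed - med) / mad

effect = mad_effect(5.0, [1, 2, 3, 4, 5])  # median 3, MAD 1 -> 2.0
```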

  • Shared mass effect: {{ "%.2f"|format(comp.stats.ot.shared_mass_effect) }} MADs above null
{% if comp.stats.ot.avg_cost_effect is defined %}
  • Avg cost effect: {{ "%.2f"|format(comp.stats.ot.avg_cost_effect) }} MADs better than null
{% endif %}
Embedding Metadata
  • Mean embedding length A: {{ "%.1f"|format(comp.stats.mean_embedding_words_a) }} words
  • Mean embedding length B: {{ "%.1f"|format(comp.stats.mean_embedding_words_b) }} words

Longer embeddings → more stable null → inflated effect sizes. Use relative metrics for cross-analysis comparisons.


Best Matches (many:many)

Shows best match for each theme, allowing multiple themes to match the same target. OT columns show mass flow from optimal transport (default K={{ "%.2f"|format(comp.stats.default_k) }}).

{{ comp.a.name }} → {{ comp.b.name }}

For each theme in {{ comp.a.name }}, the most similar theme in {{ comp.b.name }}

Theme in {{ comp.a.name }} | Best Match in {{ comp.b.name }} | Sim | Mass | % | Coverage
{% for match in comp.stats.best_matches_a_to_b %}
{% set theme_a = comp.embedded_a[match.theme_a_index] %}
{% set theme_b = comp.embedded_b[match.theme_b_index] %}
{{ theme_a.theme_name }} ({{ theme_a.embedded_string }}) | {{ theme_b.theme_name }} ({{ theme_b.embedded_string }}) | {{ "%.2f"|format(match.similarity) }} | {{ "%.3f"|format(match.mass_transferred) }} | {{ "%.0f"|format(match.mass_pct) }}% | {{ "%.1f"|format(match.mass_total * 100) }}%
{% endfor %}
{{ comp.b.name }} → {{ comp.a.name }}

For each theme in {{ comp.b.name }}, the most similar theme in {{ comp.a.name }}

Theme in {{ comp.b.name }} | Best Match in {{ comp.a.name }} | Sim | Mass | % | Coverage
{% for match in comp.stats.best_matches_b_to_a %}
{% set theme_b = comp.embedded_b[match.theme_b_index] %}
{% set theme_a = comp.embedded_a[match.theme_a_index] %}
{{ theme_b.theme_name }} ({{ theme_b.embedded_string }}) | {{ theme_a.theme_name }} ({{ theme_a.embedded_string }}) | {{ "%.2f"|format(match.similarity) }} | {{ "%.3f"|format(match.mass_transferred) }} | {{ "%.0f"|format(match.mass_pct) }}% | {{ "%.1f"|format(match.mass_total * 100) }}%
{% endfor %}

Additional Similarity Metrics

Alternative similarity functions for specialized analyses.

Shepard Similarity (k={{ comp.stats.shepard_k_value }})

Exponential decay on angular distance. Cognitively realistic similarity function.

Within-set baseline: Mean = {{ "%.3f"|format(comp.stats.within_set_stats.mean) }}, SD = {{ "%.3f"|format(comp.stats.within_set_stats.std) }}

Shepard similarity heatmap
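The exponential decay described above can be sketched as (the use of normalized angular distance as the input is an assumption):

```python
import numpy as np

def shepard_similarity(cos_sim, k=3.0):
    """Shepard-style similarity: exponential decay with distance.
    Here distance is assumed to be normalized angular distance;
    k sets the decay rate."""
    theta = np.arccos(np.clip(cos_sim, -1.0, 1.0)) / np.pi
    return np.exp(-k * theta)

same = shepard_similarity(1.0)        # zero distance -> 1.0
near = shepard_similarity(0.5)
far = shepard_similarity(0.0)         # decays monotonically
```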
Percentile-Normalized

Cross-set similarity relative to within-set distribution. 0.80 = more similar than 80% of within-set pairs.

Percentile-normalized heatmap
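The percentile normalization can be sketched as ranking each cross-set similarity against the sorted within-set values:

```python
import numpy as np

def percentile_normalize(cross_sims, within_sims):
    """Fraction of within-set pairs each cross-set similarity exceeds."""
    within = np.sort(np.asarray(within_sims))
    ranks = np.searchsorted(within, cross_sims, side="left")
    return ranks / within.size

within = [0.1, 0.2, 0.3, 0.4]
p = percentile_normalize([0.35, 0.05], within)
# 0.35 beats 3 of 4 within-set pairs -> 0.75; 0.05 beats none -> 0.0
```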
Z-Score Normalized

Standard deviations above/below typical within-set similarity. Useful for identifying outliers.

Z-score normalized heatmap
{% endfor %}

Comparison Configuration

{{ comparison.config | tojson(indent=2) }}