EvalVault Analysis Report

Run ID: run-week4-current
Generated: 2025-12-29 20:12:49

Executive Summary

NLP Analysis

Text Statistics (Questions)

  • Total Words: 103
  • Total Sentences: 20
  • Avg Word Length: 3.7
  • Vocabulary Diversity: 80.6%
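
The report does not state how Vocabulary Diversity is computed. One common definition is the type-token ratio (distinct words divided by total words); under that assumption, 80.6% of 103 words would correspond to 83 distinct words. The sketch below illustrates that definition only. The function name `text_statistics`, the whitespace tokenization, and the sample questions are assumptions made for illustration, not EvalVault's implementation.

```python
def text_statistics(questions):
    """Basic word-level statistics over a list of question strings."""
    # Whitespace tokenization is an assumption; the real tool may tokenize differently.
    words = [w for q in questions for w in q.split()]
    total = len(words)
    if total == 0:
        return {"total_words": 0, "avg_word_length": 0.0, "vocabulary_diversity": 0.0}
    return {
        "total_words": total,
        "avg_word_length": sum(len(w) for w in words) / total,
        # Type-token ratio: distinct words divided by all words.
        "vocabulary_diversity": len(set(words)) / total,
    }


# Hypothetical sample input, not taken from the evaluated dataset.
sample = ["what is the premium", "what is the refund"]
stats = text_statistics(sample)
print(f"{stats['vocabulary_diversity']:.1%}")  # prints 62.5%
```

Here 5 distinct words out of 8 give 62.5%. A ratio close to 100% means few words repeat across questions.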
Question Type Distribution

    Type          Count   Percentage
    Factual          16        80.0%
    Comparative       2        10.0%
    Reasoning         2        10.0%
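
The percentages in this table follow directly from the counts (16, 2, and 2 out of 20 questions). A minimal sketch of that arithmetic, assuming each question already carries a type label; how EvalVault assigns the labels is not shown in this report, and the helper name `type_distribution` is hypothetical.

```python
from collections import Counter


def type_distribution(labels):
    """Return (type, count, percentage) rows, most frequent type first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(t, c, 100.0 * c / total) for t, c in counts.most_common()]


# Labels rebuilt from the counts reported in the table above.
labels = ["Factual"] * 16 + ["Comparative"] * 2 + ["Reasoning"] * 2
for qtype, count, pct in type_distribution(labels):
    print(f"{qtype:<12}{count:>5}{pct:>10.1f}%")
```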

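Top Keywords

    얼마인가요 (how much is it), 무엇인가요 (what is it), 따라 (depending on), 보험료가 (insurance premium), 또는 (or), 종신보험의 (of whole life insurance), 이상 (or more), 있습니다 (there is), 가능합니다 (is possible), 받을 (to receive)

NLP Insights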

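  • High vocabulary diversity in questions (80.6%)
  • Questions are short and concise (103 words across 20 sentences, about 5 words each)
  • Dominant question type: factual (80%)
  • Top keywords: 얼마인가요 (how much is it), 무엇인가요 (what is it), 따라 (depending on)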
Causal Analysis

Significant Factor Impacts

    Factor          Metric             Direction    Strength   Correlation
    context_length  faithfulness       ↑ positive   strong     0.526
    context_length  context_precision  ↑ positive   strong     0.534
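
The Correlation column is presumably Pearson's r (the report writes "r=0.53" elsewhere), although the report does not confirm this. The sketch below shows how such a coefficient and a strength label could be derived. The data values are made up, and the 0.5 and 0.3 cutoffs in `strength` are hypothetical; EvalVault's actual thresholds are not given here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


def strength(r, strong=0.5, moderate=0.3):
    """Label |r| with hypothetical cutoffs (assumed, not EvalVault's)."""
    magnitude = abs(r)
    if magnitude >= strong:
        return "strong"
    if magnitude >= moderate:
        return "moderate"
    return "weak"


# Hypothetical per-question values, for illustration only.
context_length = [120, 340, 560, 410, 800]
faithfulness = [0.55, 0.70, 0.62, 0.90, 0.85]
r = pearson_r(context_length, faithfulness)
print(f"r={r:.3f} ({strength(r)})")
```

With cutoffs like these, both reported coefficients (0.526 and 0.534) would be labeled strong. A correlation of this size shows association only; it does not by itself establish that longer contexts cause higher scores.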

Strong Causal Relationships

  • context_length → faithfulness: higher values increase scores (confidence: 0.82)
  • context_length → context_precision: higher values increase scores (confidence: 0.83)

Root Cause Analysis

  • faithfulness: primary cause is context_length
      - context_length positively affects faithfulness (r=0.53)
  • context_precision: primary cause is context_length
      - context_length positively affects context_precision (r=0.53)

Recommended Interventions

  • 🔴 faithfulness: Provide more comprehensive context information
      - Expected: More detailed contexts may improve faithfulness
  • 🔴 context_precision: Provide more comprehensive context information
      - Expected: More detailed contexts may improve context_precision

Causal Insights

  • Found 2 significant factor-metric relationships out of 21 analyzed
  • Identified 2 strong causal relationships with confidence > 0.7
  • Most impactful factor: context_length on context_precision (r=0.53)
  • Most common root cause across metrics: context_length (appears in 2 metrics)
  • Factors with no significant impact: context_length, question_length, question_complexity
Recommendations

No critical issues found. Continue monitoring evaluation metrics.

---

*Report generated by EvalVault on 2025-12-29 20:12:49*