{% extends "base.html" %}
{% block title %}Evaluate — Fine-Tuning — ICDEV™ Dashboard{% endblock %}
{% block content %}
Automated evaluation using BLEU, ROUGE-L, and perplexity scoring, plus A/B comparison (D-FT-14, D-FT-15). Auto-promotion thresholds: BLEU >= 0.30, ROUGE-L >= 0.40, perplexity improvement >= 10%.
| Model Version | Type | Test Set | BLEU | ROUGE-L | Perplexity | Significance | Pass? | Date |
|---|---|---|---|---|---|---|---|---|
| {{ ev.model_version_id[:12] }}... | {{ ev.eval_type }} | {{ ev.test_set_size }} | {{ "%.3f"|format(ev.bleu_score) }} | {{ "%.3f"|format(ev.rouge_l_score) }} | {{ "%.1f"|format(ev.perplexity) }} | {{ "%.3f"|format(ev.statistical_significance) if ev.statistical_significance is not none else '--' }} | {% if ev.pass_threshold %} PASS {% else %} FAIL {% endif %} | {{ ev.evaluated_at }} |
| No evaluations yet. Train a model and run evaluation. | | | | | | | | |
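
The auto-promotion thresholds stated above could be checked along the following lines. This is a minimal sketch, not the dashboard's actual logic: the `EvalResult` class and both function names are hypothetical, and it assumes that all three criteria must hold at once and that "perplexity improvement" means the relative drop in perplexity against a baseline model. Neither assumption is confirmed by this page.

```python
from dataclasses import dataclass

# Thresholds quoted in the page description.
BLEU_MIN = 0.30
ROUGE_L_MIN = 0.40
PERPLEXITY_IMPROVEMENT_MIN = 0.10  # 10%


@dataclass
class EvalResult:
    """Hypothetical container; field names mirror the table columns."""
    bleu_score: float
    rouge_l_score: float
    perplexity: float


def perplexity_improvement(baseline: float, candidate: float) -> float:
    """Relative drop in perplexity versus the baseline (assumed definition)."""
    if baseline <= 0:
        raise ValueError("baseline perplexity must be positive")
    return (baseline - candidate) / baseline


def passes_thresholds(result: EvalResult, baseline_perplexity: float) -> bool:
    """Return True only if every threshold is met (assumed: logical AND)."""
    improvement = perplexity_improvement(baseline_perplexity, result.perplexity)
    return (
        result.bleu_score >= BLEU_MIN
        and result.rouge_l_score >= ROUGE_L_MIN
        and improvement >= PERPLEXITY_IMPROVEMENT_MIN
    )
```

Under these assumptions, a candidate with BLEU 0.35, ROUGE-L 0.45, and perplexity 17.0 against a baseline perplexity of 20.0 would pass, since its perplexity is 15% lower than the baseline.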