{% extends "base.html" %} {% block title %}Evaluate — Fine-Tuning — ICDEV™ Dashboard{% endblock %} {% block content %}

<h1>Model Evaluations</h1>

<p>Automated evaluation with BLEU, ROUGE-L, perplexity scoring, and A/B comparison (D-FT-14, D-FT-15).</p>

<p><strong>Auto-Promotion Thresholds (D-FT-16):</strong> BLEU &gt;= 0.30 AND ROUGE-L &gt;= 0.40 AND perplexity improvement &gt;= 10%</p>

<h2>All Evaluations</h2>

<table>
  <thead>
    <tr>
      <th>Model Version</th>
      <th>Type</th>
      <th>Test Set</th>
      <th>BLEU</th>
      <th>ROUGE-L</th>
      <th>Perplexity</th>
      <th>Significance</th>
      <th>Pass?</th>
      <th>Date</th>
    </tr>
  </thead>
  <tbody>
    {% for ev in evaluations %}
    <tr>
      <td>{{ ev.model_version_id[:12] }}...</td>
      <td>{{ ev.eval_type }}</td>
      <td>{{ ev.test_set_size }}</td>
      <td>{{ "%.3f"|format(ev.bleu_score) }}</td>
      <td>{{ "%.3f"|format(ev.rouge_l_score) }}</td>
      <td>{{ "%.1f"|format(ev.perplexity) }}</td>
      <td>{{ "%.3f"|format(ev.statistical_significance) if ev.statistical_significance else '--' }}</td>
      <td>{% if ev.pass_threshold %}PASS{% else %}FAIL{% endif %}</td>
      <td>{{ ev.evaluated_at }}</td>
    </tr>
    {% endfor %}
    {% if not evaluations %}
    <tr>
      <td colspan="9">No evaluations yet. Train a model and run evaluation.</td>
    </tr>
    {% endif %}
  </tbody>
</table>
{% endblock %}