Model Comparison Report

Comparing runs:

Baseline: baseline

Current: current

Status: PASSED

Executive Summary

Overall Changes

4 Improvements 🟢
0 Regressions 🔴
6 Unchanged ⚪

Status

PASSED
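
The overall figures appear to be the per-output counts from the sentiment and rating sections added together. A minimal sketch of that tally follows. The rule that the run passes when there are no regressions is an assumption, as is the "FAILED" label; the report only shows the PASSED case.

```python
# Per-output counts as listed in the sentiment and rating sections of this report.
per_output = {
    "sentiment": {"improvements": 2, "regressions": 0, "unchanged": 3},
    "rating": {"improvements": 2, "regressions": 0, "unchanged": 3},
}

# Sum each category across outputs to get the overall figures.
totals = {
    key: sum(counts[key] for counts in per_output.values())
    for key in ("improvements", "regressions", "unchanged")
}

# Assumed gate: pass only when nothing regressed. "FAILED" is a hypothetical label.
status = "PASSED" if totals["regressions"] == 0 else "FAILED"

print(totals, status)
```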

sentiment

Improvements: 2

Unchanged: 3

Regressions: 0

Samples

Sample  Input                                            Baseline  Current   Expected  Status
1       text='Great product, love it!'                   positive  positive  positive  ⚪
2       text='Terrible experience, would not recommend'  negative  negative  negative  ⚪
3       text="It's okay, nothing special"                neutral   neutral   neutral   ⚪
4       text='Amazing quality and service!'              neutral   positive  positive  🟢
5       text='Product broke after one use'               neutral   negative  negative  🟢
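
The status column for the sentiment samples is consistent with an exact-match rule: a sample improves when only the current run matches the expected label, and regresses when only the baseline does. The sketch below reproduces the counts under that assumed rule. The function name `classify` and the status strings are illustrative, not taken from the tool that produced this report.

```python
from collections import Counter


def classify(baseline, current, expected):
    """Label one sample by exact match against the expected value (assumed rule)."""
    baseline_ok = baseline == expected
    current_ok = current == expected
    if current_ok and not baseline_ok:
        return "improvement"
    if baseline_ok and not current_ok:
        return "regression"
    # Both right or both wrong: no change in correctness.
    return "unchanged"


# (baseline, current, expected) for the five sentiment samples in this report.
sentiment_samples = [
    ("positive", "positive", "positive"),
    ("negative", "negative", "negative"),
    ("neutral", "neutral", "neutral"),
    ("neutral", "positive", "positive"),
    ("neutral", "negative", "negative"),
]

sentiment_counts = Counter(classify(*sample) for sample in sentiment_samples)
print(sentiment_counts)
```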

rating

Improvements: 2

Unchanged: 3

Regressions: 0

Samples

Sample  Input                                            Baseline  Current   Expected  Status
1       text='Great product, love it!'                   4.5       5.0       5.0       🟢
2       text='Terrible experience, would not recommend'  1.5       1.0       1.0       🟢
3       text="It's okay, nothing special"                3.0       3.0       3.0       ⚪
4       text='Amazing quality and service!'              3.0       5.0       4.5       ⚪
5       text='Product broke after one use'               3.0       1.0       1.5       ⚪
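
Samples 4 and 5 are marked unchanged even though the current rating moved closer to the expected one, which suggests ratings are compared by exact match rather than by distance. The sketch below puts the absolute error next to an assumed exact-match status for each rating sample. The mean absolute error is computed here for illustration and is not a figure from the report; `exact_status` is a hypothetical helper.

```python
# (baseline, current, expected) for the five rating samples in this report.
rating_samples = [
    (4.5, 5.0, 5.0),
    (1.5, 1.0, 1.0),
    (3.0, 3.0, 3.0),
    (3.0, 5.0, 4.5),
    (3.0, 1.0, 1.5),
]


def exact_status(baseline, current, expected):
    """Assumed rule: only an exact match with the expected value counts as correct."""
    if current == expected and baseline != expected:
        return "improvement"
    if baseline == expected and current != expected:
        return "regression"
    return "unchanged"


rows = []
for number, (baseline, current, expected) in enumerate(rating_samples, start=1):
    rows.append({
        "sample": number,
        "status": exact_status(baseline, current, expected),
        "baseline_error": abs(baseline - expected),
        "current_error": abs(current - expected),
    })

# Mean absolute error per run, as a distance-based complement to the status column.
baseline_mae = sum(row["baseline_error"] for row in rows) / len(rows)
current_mae = sum(row["current_error"] for row in rows) / len(rows)

for row in rows:
    print(row)
print(baseline_mae, current_mae)
```

Under these assumptions, samples 4 and 5 stay "unchanged" while their absolute error drops from 1.5 to 0.5, so a distance-based metric would credit the current run with progress that the exact-match status does not show.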