Comparing runs:
Baseline: baseline
Current: current
Status: PASSED
Improvements: 2
Unchanged: 3
Regressions: 0
| Sample | Input | Baseline | Current | Expected | Status |
|---|---|---|---|---|---|
| 1 | text='Great product, love it!' | positive | positive | positive | ⚪ |
| 2 | text='Terrible experience, would not recommend' | negative | negative | negative | ⚪ |
| 3 | text="It's okay, nothing special" | neutral | neutral | neutral | ⚪ |
| 4 | text='Amazing quality and service!' | neutral | positive | positive | 🟢 |
| 5 | text='Product broke after one use' | neutral | negative | negative | 🟢 |

Improved samples:

| Sample | Input | Baseline | Current | Expected |
|---|---|---|---|---|
| 4 | text='Amazing quality and service!' | neutral | positive | positive |
| 5 | text='Product broke after one use' | neutral | negative | negative |

Regressed samples:

| Sample | Input | Baseline | Current | Expected |
|---|---|---|---|---|
Improvements: 2
Unchanged: 3
Regressions: 0
| Sample | Input | Baseline | Current | Expected | Status |
|---|---|---|---|---|---|
| 1 | text='Great product, love it!' | 4.5 | 5.0 | 5.0 | 🟢 |
| 2 | text='Terrible experience, would not recommend' | 1.5 | 1.0 | 1.0 | 🟢 |
| 3 | text="It's okay, nothing special" | 3.0 | 3.0 | 3.0 | ⚪ |
| 4 | text='Amazing quality and service!' | 3.0 | 5.0 | 4.5 | ⚪ |
| 5 | text='Product broke after one use' | 3.0 | 1.0 | 1.5 | ⚪ |

Improved samples:

| Sample | Input | Baseline | Current | Expected |
|---|---|---|---|---|
| 1 | text='Great product, love it!' | 4.5 | 5.0 | 5.0 |
| 2 | text='Terrible experience, would not recommend' | 1.5 | 1.0 | 1.0 |

Regressed samples:

| Sample | Input | Baseline | Current | Expected |
|---|---|---|---|---|
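
The status markers in both reports are consistent with an exact-match rule: a sample counts as an improvement when the baseline output missed the expected value and the current output hits it, as a regression in the reverse case, and as unchanged otherwise (including when both runs miss, as in samples 4 and 5 of the numeric report). The sketch below reproduces the summary counts under that assumption. The rule is inferred from the tables, not taken from the comparison tool itself, and the names `classify` and `summarize` are hypothetical.

```python
from collections import Counter


def classify(baseline, current, expected):
    """Label one sample, assuming an exact-match pass/fail rule (an inference)."""
    baseline_ok = baseline == expected
    current_ok = current == expected
    if current_ok and not baseline_ok:
        return "improved"
    if baseline_ok and not current_ok:
        return "regressed"
    # Both runs pass or both runs fail: no change in status.
    return "unchanged"


def summarize(rows):
    """Return (improvements, unchanged, regressions) for (baseline, current, expected) rows."""
    counts = Counter(classify(*row) for row in rows)
    return counts["improved"], counts["unchanged"], counts["regressed"]


# Rows copied from the two reports above, in sample order.
label_rows = [
    ("positive", "positive", "positive"),
    ("negative", "negative", "negative"),
    ("neutral", "neutral", "neutral"),
    ("neutral", "positive", "positive"),
    ("neutral", "negative", "negative"),
]
score_rows = [
    (4.5, 5.0, 5.0),
    (1.5, 1.0, 1.0),
    (3.0, 3.0, 3.0),
    (3.0, 5.0, 4.5),
    (3.0, 1.0, 1.5),
]

print(summarize(label_rows))
print(summarize(score_rows))
```

Under this rule a numeric sample that moves closer to the expected value without matching it exactly (for example 3.0 to 5.0 against an expected 4.5) stays unchanged, which matches the ⚪ markers in the second report.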