Loading examples...
Loaded 12 examples

============================================================
Configuration: Single-pass (default)
============================================================

============================================================
Starting DSPy Pydantic optimization
============================================================
Model: Product
Optimizer: BOOTSTRAPFEWSHOT
Examples: 12
Fields to optimize: 3

Initial field descriptions (set during initialization):
  name: Product name
  price: Product price in USD
  category: Product category
Optimization threads: 4
============================================================

Training examples: 9
Validation examples: 3

Evaluating baseline configuration...
Baseline average score: 77.78%

Optimizing prompts and field descriptions...
  - 3 field descriptions
 11%| 1/9 [00:12<01:42, 12.80s/it]
Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.

Evaluating optimized configuration...

============================================================
Optimization complete
============================================================
Baseline score: 77.78%
Final score: 88.89%
Improvement: +11.11 pp (+14.3% relative)
API calls: 22
Total tokens: 11,214
============================================================
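The single-pass summary reports the gain in two ways: an absolute difference of 11.11 percentage points between the baseline and final scores, and a relative gain of 14.3% over the baseline. The sketch below shows the arithmetic behind both figures. The helper name `improvement` is hypothetical and is not part of the optimization tool. The sketch also assumes that the tool computes the relative figure as gain divided by baseline, which is consistent with the logged numbers.

```python
def improvement(baseline: float, final: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute_pp = (final - baseline) * 100           # 88.89% - 77.78%
    relative_pct = (final - baseline) / baseline * 100  # gain as a share of the baseline
    return absolute_pp, relative_pct


# Scores from the single-pass summary, expressed as fractions.
abs_pp, rel_pct = improvement(0.7778, 0.8889)
print(f"Improvement: +{abs_pp:.2f} pp (+{rel_pct:.1f}% relative)")
# prints: Improvement: +11.11 pp (+14.3% relative)
```

The other three configurations reach the same baseline and final scores, so the same two figures apply to them.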


============================================================
Configuration: Sequential
============================================================

============================================================
Starting DSPy Pydantic optimization
============================================================
Model: Product
Optimizer: BOOTSTRAPFEWSHOT
Examples: 12
Fields to optimize: 3

Initial field descriptions (set during initialization):
  name: Product name
  price: Product price in USD
  category: Product category
Optimization threads: 4
============================================================

Training examples: 9
Validation examples: 3
Baseline average score: 77.78%

Phase 1: Optimizing field descriptions (deepest-first)...
 44%| 4/9 [00:25<00:32,  6.48s/it]
Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
    name: 77.78% → 88.89% ✓
 44%| 4/9 [00:19<00:24,  4.97s/it]
Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
    price: 88.89% → 88.89% (no improvement)
 44%| 4/9 [00:22<00:28,  5.68s/it]
Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
    category: 88.89% → 88.89% (no improvement)

============================================================
Optimization complete
============================================================
Baseline score: 77.78%
Final score: 88.89%
Improvement: +11.11 pp (+14.3% relative)
API calls: 39
Total tokens: 28,247
============================================================
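The sequential log matches a greedy, keep-if-improved loop. The optimizer tries each field in turn and compares the result against the current best score. Under that reading, `price` and `category` show no improvement because `name` had already raised the score to 88.89%. The minimal sketch below illustrates the pattern. `propose` and `evaluate` are hypothetical callables, not the tool's API. The toy scorer exists only to mimic the logged scores, and the sketch does not model the "deepest-first" ordering.

```python
from typing import Callable

Descriptions = dict[str, str]


def optimize_sequential(
    descriptions: Descriptions,
    fields: list[str],
    propose: Callable[[str, Descriptions], str],   # hypothetical candidate generator
    evaluate: Callable[[Descriptions], float],     # hypothetical validation scorer
) -> tuple[Descriptions, float]:
    """Greedy keep-if-improved pass over fields, one at a time."""
    best = dict(descriptions)
    best_score = evaluate(best)
    for field in fields:
        candidate = dict(best)  # start from the running best, not the original
        candidate[field] = propose(field, best)
        score = evaluate(candidate)
        if score > best_score:  # accept strict improvements only
            best, best_score = candidate, score
    return best, best_score


# Toy stand-ins (hypothetical): the score rises only once `name` is made more specific.
def toy_evaluate(d: Descriptions) -> float:
    return 8 / 9 if "as listed" in d["name"] else 7 / 9


def toy_propose(field: str, d: Descriptions) -> str:
    return d[field] + ", as listed by the seller"


start = {
    "name": "Product name",
    "price": "Product price in USD",
    "category": "Product category",
}
best, score = optimize_sequential(start, ["name", "price", "category"], toy_propose, toy_evaluate)
print(f"{score:.2%}")  # 88.89%
print(best["price"])   # Product price in USD (candidate rejected)
```

Each field is compared against the running best. A change that would help on its own can therefore still be rejected once an earlier field has lifted the score. The log does not show which case applies to `price` and `category`.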


============================================================
Configuration: Sequential + Parallel
============================================================

============================================================
Starting DSPy Pydantic optimization
============================================================
Model: Product
Optimizer: BOOTSTRAPFEWSHOT
Examples: 12
Fields to optimize: 3

Initial field descriptions (set during initialization):
  name: Product name
  price: Product price in USD
  category: Product category
Optimization threads: 4
============================================================

Training examples: 9
Validation examples: 3
Baseline average score: 77.78%

Phase 1: Optimizing field descriptions (deepest-first)...
 44%| 4/9 [00:20<00:25,  5.13s/it]
Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
 44%| 4/9 [00:22<00:28,  5.66s/it]
Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
 44%| 4/9 [00:24<00:30,  6.13s/it]
Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
    category: 77.78% → 88.89% ✓
    price: 77.78% → 88.89% ✓
    name: 77.78% → 88.89% ✓

============================================================
Optimization complete
============================================================
Baseline score: 77.78%
Final score: 88.89%
Improvement: +11.11 pp (+14.3% relative)
API calls: 42
Total tokens: 30,223
============================================================
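In the parallel run, every field reports 77.78% → 88.89%. This suggests that each candidate was scored against the original baseline rather than against a running best. The minimal sketch below illustrates that pattern. It assumes a thread pool sized to match the "Optimization threads: 4" line. `propose` and `evaluate` are hypothetical callables, not the tool's API. The sketch deliberately stops before merging, because the log does not show how the tool combines the per-field winners into the final 88.89% configuration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Descriptions = dict[str, str]


def optimize_parallel(
    descriptions: Descriptions,
    fields: list[str],
    propose: Callable[[str, Descriptions], str],   # hypothetical candidate generator
    evaluate: Callable[[Descriptions], float],     # hypothetical validation scorer
    max_workers: int = 4,                          # assumed to mirror "Optimization threads: 4"
) -> tuple[float, dict[str, tuple[str, float]]]:
    """Try one candidate per field concurrently, each against the same baseline."""
    baseline = evaluate(descriptions)

    def try_field(field: str) -> tuple[str, str, float]:
        # Each candidate copies the baseline descriptions, so fields
        # never see one another's changes.
        candidate = dict(descriptions)
        candidate[field] = propose(field, descriptions)
        return field, candidate[field], evaluate(candidate)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(try_field, fields))

    # Report only fields whose candidate beat the shared baseline.
    improved = {f: (desc, s) for f, desc, s in results if s > baseline}
    return baseline, improved


# Toy stand-ins (hypothetical): any more specific description lifts the score.
def toy_evaluate(d: Descriptions) -> float:
    return 8 / 9 if any("as listed" in v for v in d.values()) else 7 / 9


def toy_propose(field: str, d: Descriptions) -> str:
    return d[field] + ", as listed by the seller"


start = {
    "name": "Product name",
    "price": "Product price in USD",
    "category": "Product category",
}
baseline, improved = optimize_parallel(start, list(start), toy_propose, toy_evaluate)
for field, (_, s) in improved.items():
    print(f"{field}: {baseline:.2%} → {s:.2%}")
```

Trying fields independently gives up the sequential run's ability to build on earlier accepted changes, and in return saves wall-clock time. In this ablation the parallel configuration finished in 57.27s versus 116.98s for the sequential one, at the cost of slightly more API calls (42 vs 39).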


============================================================
Configuration: Sequential + Max Val=5
============================================================

============================================================
Starting DSPy Pydantic optimization
============================================================
Model: Product
Optimizer: BOOTSTRAPFEWSHOT
Examples: 12
Fields to optimize: 3

Initial field descriptions (set during initialization):
  name: Product name
  price: Product price in USD
  category: Product category
Optimization threads: 4
============================================================

Training examples: 9
Validation examples: 3
Baseline average score: 77.78%

Phase 1: Optimizing field descriptions (deepest-first)...
 44%| 4/9 [00:22<00:27,  5.54s/it]
Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
    name: 77.78% → 88.89% ✓
 44%| 4/9 [00:21<00:26,  5.28s/it]
Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
    price: 88.89% → 88.89% (no improvement)
 44%| 4/9 [00:23<00:29,  5.83s/it]
Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
    category: 88.89% → 88.89% (no improvement)

============================================================
Optimization complete
============================================================
Baseline score: 77.78%
Final score: 88.89%
Improvement: +11.11 pp (+14.3% relative)
API calls: 39
Total tokens: 29,201
============================================================



====================================================================================================
ABLATION STUDY RESULTS
====================================================================================================

Config                         Time (s)     Baseline     Optimized    Improvement  API Calls    Status
----------------------------------------------------------------------------------------------------
Single-pass (default)          64.03        77.78%       88.89%       +11.11 pp    22           ✓
Sequential                     116.98       77.78%       88.89%       +11.11 pp    39           ✓
Sequential + Parallel          57.27        77.78%       88.89%       +11.11 pp    42           ✓
Sequential + Max Val=5         111.33       77.78%       88.89%       +11.11 pp    39           ✓


INSIGHTS:
  Fastest: Sequential + Parallel (57.27s)
  Best quality: tie, all configurations reached 88.89%
  Sequential: 0.55× single-pass speed (116.98s vs 64.03s)
  Sequential + Parallel: 1.12× single-pass speed (57.27s vs 64.03s)
  Sequential + Max Val=5: 0.58× single-pass speed (111.33s vs 64.03s)
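The speed ratios in INSIGHTS match the single-pass wall-clock time divided by each configuration's time. A value above 1 means faster than single-pass and a value below 1 means slower. This reading is inferred from the numbers, not taken from the tool's source. The sketch below reproduces the ratios from the table's Time (s) column. The helper name `speed_vs` is hypothetical.

```python
# Wall-clock times (seconds) from the ablation table.
times = {
    "Single-pass (default)": 64.03,
    "Sequential": 116.98,
    "Sequential + Parallel": 57.27,
    "Sequential + Max Val=5": 111.33,
}


def speed_vs(reference: str, times: dict[str, float]) -> dict[str, float]:
    """Speed of each config relative to `reference`: >1 is faster, <1 is slower."""
    ref = times[reference]
    return {name: ref / t for name, t in times.items() if name != reference}


for name, ratio in speed_vs("Single-pass (default)", times).items():
    print(f"{name}: {ratio:.2f}x single-pass speed")
# Sequential: 0.55x single-pass speed
# Sequential + Parallel: 1.12x single-pass speed
# Sequential + Max Val=5: 0.58x single-pass speed
```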