Retrieval + inference benchmark for LLM-powered codebase Q&A, powered by jCodeMunch + Groq
| # | Provider / Model | Judge Score | P@5 | Recall | Exact Match | Avg Time | Cost | Questions |
|---|---|---|---|---|---|---|---|---|
| 1 | anthropic / claude-sonnet-4-6 | 0.81 | 0.17 | 0.36 | 16.1% | 15.02s | $1.1180 | 31 |
| 2 | groq / llama-3.3-70b-versatile | 0.69 | 0.00 | 0.00 | 0.0% | 40.09s | $0.0479 | 9 |
| 3 | anthropic / claude-haiku-4-5-20251001 | 0.68 | 0.17 | 0.36 | 16.1% | 6.99s | $0.2891 | 31 |
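For readers unfamiliar with the retrieval columns, a minimal sketch of how Precision@5 and Recall are conventionally computed is below. This is an illustration of the standard definitions, not the benchmark's actual scoring code; the function names and the toy file lists are hypothetical.

```python
def precision_at_k(retrieved, relevant, k=5):
    """Fraction of the top-k retrieved items that are relevant (P@k)."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for item in top_k if item in relevant) / k

def recall(retrieved, relevant):
    """Fraction of all relevant items that appear anywhere in the retrieved list."""
    if not relevant:
        return 0.0
    return sum(1 for item in relevant if item in retrieved) / len(relevant)

# Toy example: 5 retrieved files, 2 ground-truth relevant files.
retrieved = ["a.py", "b.py", "c.py", "d.py", "e.py"]
relevant = {"b.py", "z.py"}
print(precision_at_k(retrieved, relevant))  # 0.2  (1 of top 5 is relevant)
print(recall(retrieved, relevant))          # 0.5  (1 of 2 relevant files found)
```

Per-question scores like these are then averaged over each model's answered questions to produce the P@5 and Recall columns above.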