munch-bench Leaderboard

A retrieval-plus-inference benchmark for LLM-powered codebase Q&A, powered by jCodeMunch + Groq.
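Each question is scored on a two-stage loop: retrieve relevant code, then have the model answer from that context. Below is a minimal sketch of such a loop, timed per question as in the Avg Time column; `retriever`, `llm`, and the chunk fields are hypothetical placeholders, not the actual jCodeMunch or Groq APIs.

```python
import time

# Hypothetical retrieve-then-answer loop; `retriever` and `llm` are
# placeholder callables, not the real jCodeMunch or Groq interfaces.
def answer_question(question: str, retriever, llm) -> dict:
    start = time.perf_counter()
    # Stage 1: retrieval -- pull candidate code chunks for the question.
    chunks = retriever(question, top_k=5)
    # Stage 2: inference -- ask the model to answer from those chunks.
    context = "\n\n".join(c["text"] for c in chunks)
    answer = llm(f"Context:\n{context}\n\nQuestion: {question}")
    return {
        "answer": answer,
        "retrieved_paths": [c["path"] for c in chunks],
        "elapsed_s": time.perf_counter() - start,  # basis for Avg Time
    }
```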

| Rank | Provider / Model | Judge Score | P@5 | Recall | Exact Match | Avg Time | Cost | Questions |
|------|------------------|-------------|-----|--------|-------------|----------|------|-----------|
| 1 | anthropic / claude-sonnet-4-6 | 0.81 | 0.17 | 0.36 | 16.1% | 15.02s | $1.1180 | 31 |
| 2 | groq / llama-3.3-70b-versatile | 0.69 | 0.00 | 0.00 | 0.0% | 40.09s | $0.0479 | 9 |
| 3 | anthropic / claude-haiku-4-5-20251001 | 0.68 | 0.17 | 0.36 | 16.1% | 6.99s | $0.2891 | 31 |
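P@5 and Recall score the retrieval stage. A minimal sketch of the conventional definitions, assuming each question carries a set of gold file paths and a ranked list of retrieved paths; the names and data shapes are illustrative, not jCodeMunch's actual evaluation code:

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """P@k: fraction of the top-k retrieved items that are relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for item in retrieved[:k] if item in relevant) / k

def recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the relevant items found anywhere in the retrieval."""
    if not relevant:
        return 0.0
    return sum(1 for item in relevant if item in retrieved) / len(relevant)

# Example: 1 of the top 5 hits is relevant, and 1 of 2 gold files is found.
hits = ["src/a.py", "src/b.py", "src/c.py", "src/d.py", "src/e.py"]
gold = {"src/a.py", "src/z.py"}
print(precision_at_k(hits, gold))  # 0.2
print(recall(hits, gold))          # 0.5
```

Scores are presumably per-question averages, so the groq row, covering 9 questions rather than 31, is not directly comparable to the other two.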

Charts: LLM Judge Score by Model; Cost vs Accuracy.