Dwarf-Bench
An LLM benchmark that grades models on obscure dwarf trivia from across fantasy media. A learning exercise in building an eval pipeline from scratch.
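As a rough illustration of what a from-scratch eval pipeline for trivia grading can look like, here is a minimal sketch. Everything in it is an assumption made for illustration and not taken from the project itself: the names `TriviaItem`, `normalize`, `grade`, `run_benchmark`, and `stub_model` are hypothetical, the two questions are stand-in examples and not items from the actual benchmark, and the lenient substring grading is just one simple choice among many. The model is represented as a plain callable so that no particular LLM API is implied.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class TriviaItem:
    """One benchmark question with its reference answer (hypothetical schema)."""
    question: str
    answer: str


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(kept.split())


def grade(prediction: str, reference: str) -> bool:
    """Lenient grading: the normalized reference must appear in the prediction."""
    return normalize(reference) in normalize(prediction)


def run_benchmark(model: Callable[[str], str], items: Iterable[TriviaItem]) -> float:
    """Ask the model every question and return the fraction graded correct."""
    results = [grade(model(item.question), item.answer) for item in items]
    return sum(results) / len(results) if results else 0.0


# Illustrative items only; not the benchmark's real dataset.
ITEMS = [
    TriviaItem("In The Hobbit, who leads the company of dwarves?", "Thorin"),
    TriviaItem("In The Lord of the Rings, who is Gimli's father?", "Gloin"),
]


def stub_model(question: str) -> str:
    """Stand-in for a real LLM call: knows one answer and gives up on the rest."""
    if "Hobbit" in question:
        return "Thorin Oakenshield leads the company."
    return "I am not sure."


if __name__ == "__main__":
    print(f"accuracy: {run_benchmark(stub_model, ITEMS):.2f}")
```

Swapping `stub_model` for a function that calls a real model is the only change needed to score an actual system. Substring matching is deliberately forgiving and will over-credit answers that merely mention the reference string, so a stricter grader (exact match, or a judge model) may be a better fit for obscure trivia.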