Projects

Dwarf-Bench

An LLM benchmark that grades models on obscure dwarf trivia from across fantasy media. It is a learning exercise in building an eval pipeline from scratch.
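
As a rough illustration of what the grading step in such a pipeline could look like, here is a minimal sketch. It is not Dwarf-Bench's actual code: the names `normalize`, `grade`, `ITEMS`, and `stub_model` are hypothetical, the two questions are made-up samples rather than items from the benchmark, and exact-match scoring is only one assumed grading strategy.

```python
import re
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def grade(model: Callable[[str], str], items: list[dict]) -> float:
    """Return the fraction of items whose normalized answer matches exactly."""
    if not items:
        return 0.0
    correct = sum(
        normalize(model(item["question"])) == normalize(item["answer"])
        for item in items
    )
    return correct / len(items)


# Hypothetical sample items, for illustration only.
ITEMS = [
    {
        "question": "In The Hobbit, who leads the company of dwarves to Erebor?",
        "answer": "Thorin Oakenshield",
    },
    {
        "question": "In The Lord of the Rings, who is Gimli's father?",
        "answer": "Glóin",
    },
]


def stub_model(question: str) -> str:
    """Stand-in for a real model call; answers only the first question."""
    if "Hobbit" in question:
        return "Thorin Oakenshield."
    return "I don't know"


if __name__ == "__main__":
    print(f"accuracy: {grade(stub_model, ITEMS):.2f}")
```

Exact match after normalization is the simplest scorer to reason about, but it penalizes correct answers phrased differently, so a real pipeline would likely need alias lists or a more lenient comparison.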