ManiBench is a benchmark designed to assess visual-logic drift and syntactic hallucinations in Manim code generation.
It addresses a gap left by traditional code benchmarks such as HumanEval and MBPP, which do not adequately evaluate code intended to produce dynamic educational visuals.
By focusing on visual-logic drift and syntactic hallucinations, ManiBench aims to make code generation tools more effective at producing pedagogically relevant visual content.