📄️ AutoEval
The AutoEval dataset decides whether a problem is passed by appending test code to the code the model outputs and executing the combined program. It supports multiple languages.
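As a rough illustration of the append-and-execute idea, the sketch below joins a model's output with a test snippet, runs the result in a subprocess, and treats a zero exit status as a pass. The function name `passes`, the `timeout` parameter, and the Python-only scope are assumptions made for this example; it is not AutoEval's actual harness, and it provides no sandboxing.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes(model_code: str, test_code: str, timeout: float = 10.0) -> bool:
    """Hypothetical helper: True if model_code followed by test_code exits with 0."""
    # Append the test code to the model output to form one program.
    program = model_code.rstrip() + "\n\n" + test_code + "\n"
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "solution.py"
        script.write_text(program, encoding="utf-8")
        try:
            # Run in a separate interpreter so a crash or failed assertion
            # in the generated code cannot take down the evaluator.
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # A program that does not finish in time counts as a failure.
            return False
    return result.returncode == 0
```

For example, `passes("def add(a, b):\n    return a + b", "assert add(1, 2) == 3")` runs the function definition followed by the assertion; a failed assertion raises an error in the subprocess, which yields a non-zero exit status and therefore a failed problem.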
📄️ CommonOJ
The CommonOJ dataset aims to unify the evaluation of competitive programming problems. These problems come in multiple formats: