Using Datasets

This section explains how to evaluate models on specific datasets.

For an explanation of what a dataset is in this context, refer to Dataset Details.

For each dataset, we provide an implementation that aims to be as faithful as possible to the original. However, we do not guarantee that its logic matches the original exactly.

Several more datasets are still being organized and tested. They will be released gradually:

Shadow Humaneval
CRUXEval
NaturalCodeBench
PAL-Math
verilog-eval