
MHPP

Source: MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation

MHPP does not provide test cases for its problems; submissions can only be checked by the dataset authors. To allow at least partial testing of MHPP here, we keep one of the released examples per problem as an illustration and use the remaining ones (usually two) as test cases. Users can modify the data to add more test cases of their own, as sketched below.
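Since the split between the illustration example and the test cases lives in the released data itself, a quick way to see what can be edited is to print one sample. Nothing about the schema is assumed here; the keys are read from the data:

```python
from datasets import load_dataset

# Print one sample to see which fields hold the illustration example
# and the held-out test cases before adding your own cases.
data = list(load_dataset("laylarsssss/FusedMHPP", "mhpp", split="test"))
print(data[0].keys())
print(data[0])
```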

Subset Selection

The initial release of MHPP contains 140 problems; the full 210-problem version was released later. Both versions are provided, and which one a given subset corresponds to can be checked by counting the problems, as in the sketch below.
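A minimal check, using the `mhpp` subset name from the Usage section below (the name of the other subset is not shown on this page, so it is left out):

```python
from datasets import load_dataset

# Count the problems to tell the releases apart:
# 140 for the initial release, 210 for the full version.
data = list(load_dataset("laylarsssss/FusedMHPP", "mhpp", split="test"))
print(len(data))
```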

Configuration

| Field | Value | Description |
| --- | --- | --- |
| pretrain_mode | False | Remove the leading instructions to prevent pretrain models from generating more problems |
| run_timeout | 20 | Execution timeout in seconds |
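To see concretely what pretrain_mode changes, the same prompt can be requested with the flag off and on and compared. This is a sketch; it assumes the sandbox server from the Usage section below is already running on localhost:8080:

```python
import requests
from datasets import load_dataset

data = list(load_dataset("laylarsssss/FusedMHPP", "mhpp", split="test"))

# Request the first prompt with pretrain_mode off and on; with the flag
# on, the leading instructions are stripped from the prompt.
for pm in (False, True):
    cfg = {'pretrain_mode': pm, 'run_timeout': 20,
           'dataset_type': "MHPPDataset", 'provided_data': data}
    prompts = requests.post('http://localhost:8080/get_prompts', json={
        'dataset': 'mhpp',
        'config': cfg
    }).json()
    print(f"pretrain_mode={pm}:\n{prompts[0]['prompt']}\n")
```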

Usage

```python
from datasets import load_dataset
import requests

config = {
    'pretrain_mode': False,
    'run_timeout': 20,
    'dataset_type': "MHPPDataset"
}

# Get dataset data in sandbox format
data = list(load_dataset("laylarsssss/FusedMHPP", "mhpp", split="test"))

config['provided_data'] = data
prompts = requests.post('http://localhost:8080/get_prompts', json={
    'dataset': 'mhpp',
    'config': config
}).json()

print('please perform model inference on these prompts:')
print('\n'.join([p['prompt'] for p in prompts[:3]]))
print('...')

# your model inference code here; empty strings serve as placeholders
completions = ['' for _ in prompts]

for completion, sample in zip(completions, data):
    # provided_data stays a list, matching the get_prompts call above
    config['provided_data'] = [sample]
    res = requests.post('http://localhost:8080/submit', json={
        'dataset': 'mhpp',
        'id': sample['id'],
        'completion': completion,
        'config': config
    })

    print(f'result: {res.json()}')
    break  # this demo only checks the first sample
```
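To score a whole run rather than just the first sample, drop the break and tally the results. The `accepted` field below is an assumption about the response schema, so check an actual `res.json()` payload first:

```python
# Tally pass@1 over all samples (a sketch; assumes each response carries
# an 'accepted' boolean -- verify against a real res.json() payload).
passed = 0
for completion, sample in zip(completions, data):
    config['provided_data'] = [sample]
    res = requests.post('http://localhost:8080/submit', json={
        'dataset': 'mhpp',
        'id': sample['id'],
        'completion': completion,
        'config': config
    })
    passed += bool(res.json().get('accepted'))

print(f'pass@1: {passed / len(data):.3f}')
```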

Note: always put the raw completion in the request; the sandbox will handle extracting the code according to the different modes.
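For instance, a completion that wraps its code in a markdown fence can be submitted unchanged. This sketch reuses `config` and `data` from the Usage section above; the solution body is a placeholder, not a real answer:

```python
# Build a raw completion that mixes prose with a fenced code block;
# the backticks are assembled so this snippet stays copy-pasteable.
fence = '`' * 3
completion = (
    'Here is my solution:\n'
    f'{fence}python\n'
    'def solve(x):\n'
    '    return x\n'
    f'{fence}\n'
)

config['provided_data'] = [data[0]]
res = requests.post('http://localhost:8080/submit', json={
    'dataset': 'mhpp',
    'id': data[0]['id'],
    'completion': completion,
    'config': config
})
print(res.json())  # the sandbox extracts the fenced code before running it
```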