MHPP
Source: MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation
MHPP does not provide test cases for its problems, and submissions can only be checked by its authors. Here, to enable partial testing of MHPP, we keep only one of the released samples as an illustration and use the remaining samples (usually two) as test cases. Users can modify the data to add more test cases on their own.
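For example, one way to extend the test cases is to load the dataset, inspect a sample's schema, and edit the field that stores the tests before passing it back as `provided_data`. This is only a sketch: the field name used below is an assumption, so check the actual keys first.

```python
# Sketch: extend a problem's test cases before evaluation.
# The field that stores the tests is NOT documented here; 'test' below is an
# assumption -- inspect sample.keys() to find the real field name.
from datasets import load_dataset

data = list(load_dataset("laylarsssss/FusedMHPP", "mhpp", split="test"))
sample = data[0]
print(sample.keys())  # locate the field holding the test cases
# sample['test'] += "\nassert candidate(...) == ..."  # hypothetical extra test
```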
The initial release of MHPP contains 140 problems; the full 210-problem version was released later. Both versions are provided.
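To check which Hugging Face configuration corresponds to each release, you can list the available configurations and count the problems; the configuration names are not documented here, so verify them before choosing one. A minimal sketch:

```python
# Sketch: discover which configuration maps to the 140- and 210-problem releases.
from datasets import get_dataset_config_names, load_dataset

print(get_dataset_config_names("laylarsssss/FusedMHPP"))  # available configs
data = list(load_dataset("laylarsssss/FusedMHPP", "mhpp", split="test"))
print(len(data))  # number of problems in the 'mhpp' configuration
```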
Subset Selection
Configuration
Field | Value | Description |
---|---|---|
pretrain_mode | | Remove the first instructions to prevent pretrain models from generating more problems |
run_timeout | | Execution timeout in seconds |
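As a quick illustration, a configuration for evaluating a base (pretrain) model might look like the following sketch; the values shown are illustrative choices, not documented defaults.

```python
# Illustrative config for a base (pretrain) model: pretrain_mode removes the
# leading instructions so the model does not keep generating further problems.
# The values shown are examples, not documented defaults.
config = {
    'dataset_type': "MHPPDataset",
    'pretrain_mode': True,   # strip leading instructions for pretrain models
    'run_timeout': 20,       # per-problem execution timeout, in seconds
}
```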
Usage
```python
from datasets import load_dataset
import requests

config = {
    'pretrain_mode': False,
    'run_timeout': 20,
    'dataset_type': "MHPPDataset"
}

# Get dataset data in sandbox format
data = list(load_dataset("laylarsssss/FusedMHPP", "mhpp", split="test"))
config['provided_data'] = data

prompts = requests.post('http://localhost:8080/get_prompts', json={
    'dataset': 'mhpp',
    'config': config
}).json()

print('please perform model inference on these prompts:')
print('\n'.join([p['prompt'] for p in prompts[:3]]))
print('...')

# your model inference code here
completions = ['' for _ in prompts]

for completion, sample in zip(completions, data):
    config['provided_data'] = sample
    res = requests.post('http://localhost:8080/submit', json={
        'dataset': 'mhpp',
        'id': '',
        'completion': completion,
        'config': config
    })
    print(f'result: {res.json()}')
    break
```
Note: always put the raw completion in the request; Sandbox will handle extracting the code according to the different modes.
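For instance, a completion that still contains the model's surrounding prose and a fenced code block can be submitted as-is. The sketch below reuses the `config` dict from the Usage example above; the completion text is purely illustrative.

```python
# Sketch: submit the raw model output (prose plus a fenced code block) without
# extracting the code yourself; the sandbox performs the extraction.
import requests

fence = "```"
raw_completion = (
    "Here is my solution:\n\n"
    f"{fence}python\n"
    "def add(a, b):\n"
    "    return a + b\n"
    f"{fence}\n"
)

res = requests.post('http://localhost:8080/submit', json={
    'dataset': 'mhpp',
    'id': '',
    'completion': raw_completion,  # raw text; code is extracted server-side
    'config': config,              # same config dict as in the Usage example
})
print(res.json())
```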