
MHPP

Source: MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation

MHPP does not provide test cases for its problems; submissions can only be checked by the dataset authors. To allow at least partial testing of MHPP here, we keep one of the released examples per problem as an illustration and use the remaining ones (usually two) as test cases. Users can modify the data to add more test cases of their own, as sketched below.
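Since the split between the illustration example and the test cases lives in the released data itself, a quick way to see what can be edited is to print one sample. Nothing about the schema is assumed here; the keys are read from the data:

```python
from datasets import load_dataset

# Print one sample to see which fields hold the illustration example
# and the held-out test cases before adding your own cases.
data = list(load_dataset("laylarsssss/FusedMHPP", "mhpp", split="test"))
print(data[0].keys())
print(data[0])
```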

Subset Selection

The initial release of MHPP contains 140 problems; the full 210-problem version was released later. Both versions are provided, and which one a given subset corresponds to can be checked by counting the problems, as in the sketch below.
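A minimal check, using the `mhpp` subset name from the Usage section below (the name of the other subset is not shown on this page, so it is left out):

```python
from datasets import load_dataset

# Count the problems to tell the releases apart:
# 140 for the initial release, 210 for the full version.
data = list(load_dataset("laylarsssss/FusedMHPP", "mhpp", split="test"))
print(len(data))
```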

Configuration

| Field | Value | Description |
| --- | --- | --- |
| pretrain_mode | False | Remove the leading instructions to prevent pretrain models from generating more problems |
| run_timeout | 20 | Execution timeout in seconds |
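To see concretely what pretrain_mode changes, the same prompt can be requested with the flag off and on and compared. This is a sketch; it assumes the sandbox server from the Usage section below is already running on localhost:8080:

```python
import requests
from datasets import load_dataset

data = list(load_dataset("laylarsssss/FusedMHPP", "mhpp", split="test"))

# Request the first prompt with pretrain_mode off and on; with the flag
# on, the leading instructions are stripped from the prompt.
for pm in (False, True):
    cfg = {'pretrain_mode': pm, 'run_timeout': 20,
           'dataset_type': "MHPPDataset", 'provided_data': data}
    prompts = requests.post('http://localhost:8080/get_prompts', json={
        'dataset': 'mhpp',
        'config': cfg
    }).json()
    print(f"pretrain_mode={pm}:\n{prompts[0]['prompt']}\n")
```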

Usage

```python
from datasets import load_dataset
import requests

config = {
    'pretrain_mode': False,
    'run_timeout': 20,
    'dataset_type': "MHPPDataset"
}

# Get dataset data in sandbox format
data = list(load_dataset("laylarsssss/FusedMHPP", "mhpp", split="test"))

config['provided_data'] = data
prompts = requests.post('http://localhost:8080/get_prompts', json={
    'dataset': 'mhpp',
    'config': config
}).json()

print('please perform model inference on these prompts:')
print('\n'.join([p['prompt'] for p in prompts[:3]]))
print('...')

# your model inference code here; empty strings serve as placeholders
completions = ['' for _ in prompts]

for completion, sample in zip(completions, data):
    # provided_data stays a list, matching the get_prompts call above
    config['provided_data'] = [sample]
    res = requests.post('http://localhost:8080/submit', json={
        'dataset': 'mhpp',
        'id': sample['id'],
        'completion': completion,
        'config': config
    })

    print(f'result: {res.json()}')
    break  # this demo only checks the first sample
```
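To score a whole run rather than just the first sample, drop the break and tally the results. The `accepted` field below is an assumption about the response schema, so check an actual `res.json()` payload first:

```python
# Tally pass@1 over all samples (a sketch; assumes each response carries
# an 'accepted' boolean -- verify against a real res.json() payload).
passed = 0
for completion, sample in zip(completions, data):
    config['provided_data'] = [sample]
    res = requests.post('http://localhost:8080/submit', json={
        'dataset': 'mhpp',
        'id': sample['id'],
        'completion': completion,
        'config': config
    })
    passed += bool(res.json().get('accepted'))

print(f'pass@1: {passed / len(data):.3f}')
```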

Note: always put the raw completion in the request; the sandbox will handle extracting the code according to the different modes.
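For instance, a completion that wraps its code in a markdown fence can be submitted unchanged. This sketch reuses `config` and `data` from the Usage section above; the solution body is a placeholder, not a real answer:

```python
# Build a raw completion that mixes prose with a fenced code block;
# the backticks are assembled so this snippet stays copy-pasteable.
fence = '`' * 3
completion = (
    'Here is my solution:\n'
    f'{fence}python\n'
    'def solve(x):\n'
    '    return x\n'
    f'{fence}\n'
)

config['provided_data'] = [data[0]]
res = requests.post('http://localhost:8080/submit', json={
    'dataset': 'mhpp',
    'id': data[0]['id'],
    'completion': completion,
    'config': config
})
print(res.json())  # the sandbox extracts the fenced code before running it
```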