Data
benchbench.data
- benchbench.data.load_cardinal_benchmark(dataset_name, do_rerank=True, **kwargs)[source]
Load a cardinal benchmark.
- Parameters:
dataset_name (str) – Name of the benchmark.
do_rerank (bool) – Whether to re-rank the data based on the average score.
**kwargs – Additional keyword arguments.
- Returns:
(data, cols) – the benchmark data as a pd.DataFrame and the list of score columns.
- Return type:
tuple
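A minimal sketch of how the returned (data, cols) tuple might be consumed, and what do_rerank=True implies per the description above (rows ordered by average score). The DataFrame below uses made-up scores standing in for a real loader call such as load_cardinal_benchmark("GLUE"); it is not benchbench's actual output.

```python
import pandas as pd

# Hypothetical usage (assumes benchbench is installed):
#   data, cols = benchbench.data.load_cardinal_benchmark("GLUE", do_rerank=True)

# Stand-in for the returned tuple: scores below are made up.
data = pd.DataFrame(
    {
        "model": ["A", "B", "C"],
        "task1": [80.0, 90.0, 70.0],
        "task2": [60.0, 95.0, 75.0],
    }
)
cols = ["task1", "task2"]

# Re-rank as described: order models by their mean score over the
# benchmark's score columns.
data["avg"] = data[cols].mean(axis=1)
data = data.sort_values("avg", ascending=False).reset_index(drop=True)
```

With these scores, model B (average 92.5) would rank first, followed by C (72.5) and A (70.0).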
- benchbench.data.load_ordinal_benchmark(dataset_name, do_rerank=True, **kwargs)[source]
Load an ordinal benchmark.
- Parameters:
dataset_name (str) – Name of the benchmark.
do_rerank (bool) – Whether to re-rank the data based on the winning rate.
**kwargs – Additional keyword arguments.
- Returns:
(data, cols) – the benchmark data as a pd.DataFrame and the list of ranking columns.
- Return type:
tuple
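A sketch of the winning-rate re-ranking described above, under the assumption that each column holds a model's rank on one scenario (1 = best) and that a model's winning rate is the fraction of (opponent, scenario) pairs it outranks. The ranks are made up and the computation is illustrative, not benchbench's actual implementation.

```python
import pandas as pd

# Hypothetical usage (assumes benchbench is installed):
#   data, cols = benchbench.data.load_ordinal_benchmark("HELM-accuracy")

# Stand-in for the returned tuple: ranks below are made up (1 = best).
data = pd.DataFrame(
    {
        "model": ["A", "B", "C"],
        "s1": [1, 2, 3],
        "s2": [2, 1, 3],
        "s3": [1, 3, 2],
    }
)
cols = ["s1", "s2", "s3"]

# Winning rate: fraction of (opponent, scenario) pairs a model outranks.
n = len(data)
wins = []
for i in range(n):
    w = sum(
        data.loc[i, c] < data.loc[j, c]
        for c in cols
        for j in range(n)
        if j != i
    )
    wins.append(w / (len(cols) * (n - 1)))

# Re-rank as described: order models by winning rate.
data["win_rate"] = wins
data = data.sort_values("win_rate", ascending=False).reset_index(drop=True)
```

Here model A wins 5 of its 6 pairwise comparisons (winning rate 5/6) and ranks first, followed by B (1/2) and C (1/6).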
- data.cardinal_benchmark_list = ['GLUE', 'SuperGLUE', 'OpenLLM', 'MMLU', 'BigBenchHard', 'MTEB', 'VTAB']
- data.ordinal_benchmark_list = ['BigCode', 'HELM-accuracy', 'HELM-bias', 'HELM-calibration', 'HELM-fairness', 'HELM-efficiency', 'HELM-robustness', 'HELM-summarization', 'HELM-toxicity', 'HEIM-alignment_auto', 'HEIM-nsfw', 'HEIM-quality_auto', 'HEIM-aesthetics_auto', 'HEIM-alignment_human', 'HEIM-nudity', 'HEIM-quality_human', 'HEIM-aesthetics_human', 'HEIM-black_out', 'HEIM-originality']
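The two module-level lists make it possible to dispatch a benchmark name to the appropriate loader. The list values below are copied from the constants above; the loader_kind helper is a hypothetical convenience, not part of benchbench.

```python
# Values copied from benchbench.data's module-level constants.
cardinal_benchmark_list = [
    'GLUE', 'SuperGLUE', 'OpenLLM', 'MMLU', 'BigBenchHard', 'MTEB', 'VTAB'
]
ordinal_benchmark_list = [
    'BigCode', 'HELM-accuracy', 'HELM-bias', 'HELM-calibration',
    'HELM-fairness', 'HELM-efficiency', 'HELM-robustness',
    'HELM-summarization', 'HELM-toxicity', 'HEIM-alignment_auto',
    'HEIM-nsfw', 'HEIM-quality_auto', 'HEIM-aesthetics_auto',
    'HEIM-alignment_human', 'HEIM-nudity', 'HEIM-quality_human',
    'HEIM-aesthetics_human', 'HEIM-black_out', 'HEIM-originality'
]


def loader_kind(name):
    """Report which loader handles a benchmark name (hypothetical helper)."""
    if name in cardinal_benchmark_list:
        return "cardinal"   # use load_cardinal_benchmark
    if name in ordinal_benchmark_list:
        return "ordinal"    # use load_ordinal_benchmark
    raise ValueError(f"Unknown benchmark: {name}")
```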