Measures¶
benchbench.measures.cardinal¶
- benchbench.measures.cardinal.appr_rank_diff(score, old_rank, use_weighted_loss=False)[source]¶
Approximate the rank difference between the old rank and the new rank.
- Parameters:
score (np.array) – Scores for all models across all tasks.
old_rank (np.array) – Original rank.
use_weighted_loss (bool) – Whether to use the weighted loss.
- Returns:
The loss.
- Return type:
torch.Tensor
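Example (a minimal call sketch; the score matrix and ranking below are illustrative, and the exact shape and rank conventions are assumptions based on the parameter descriptions above):

import numpy as np
from benchbench.measures.cardinal import appr_rank_diff

score = np.random.rand(5, 3)                            # 5 models scored on 3 tasks (illustrative data)
old_rank = np.argsort(np.argsort(-score.mean(axis=1)))  # rank of each model by mean score (assumed convention)
loss = appr_rank_diff(score, old_rank, use_weighted_loss=False)  # approximated rank-difference loss (torch.Tensor)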
- benchbench.measures.cardinal.get_diversity(data, cols)[source]¶
Calculate the diversity for a given benchmark.
- Parameters:
data (pd.DataFrame) – Each row represents a model, each column represents a task.
cols (list) – The column names of the tasks.
- Returns:
(W, max_MRC), where max_MRC is the maximum MRC over all pairs of tasks.
- Return type:
tuple
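Example (a minimal usage sketch; the DataFrame, model names, and task columns are illustrative and not part of the library):

import pandas as pd
from benchbench.measures.cardinal import get_diversity

# Rows are models, columns are tasks, as expected by the function.
data = pd.DataFrame(
    {"task_a": [0.81, 0.64, 0.72], "task_b": [0.55, 0.60, 0.58], "task_c": [0.91, 0.70, 0.83]},
    index=["model_1", "model_2", "model_3"],
)
W, max_mrc = get_diversity(data, cols=["task_a", "task_b", "task_c"])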
- benchbench.measures.cardinal.get_sensitivity(data, cols, min_value=0.01, lr=1.0, num_steps=1000, stop_threshold=1e-05, normalize_epsilon=True, use_weighted_loss=None, return_weight=False, verbose=False)[source]¶
Calculate the sensitivity for a given benchmark.
- Parameters:
data (pd.DataFrame) – Each row represents a model, each column represents a task.
cols (list) – The column names of the tasks.
min_value (float) – Minimum value for epsilon.
lr (float) – Learning rate for optimization.
num_steps (int) – Number of steps for optimization.
stop_threshold (float) – Stop if the loss change is smaller than this value.
normalize_epsilon (bool) – Whether to normalize epsilon by the standard deviation.
use_weighted_loss (bool) – Whether to use the weighted approximation loss; if None, try both and return the better result.
return_weight (bool) – Whether to return alpha.
verbose (bool) – Whether to output logs.
- Returns:
If return_weight is True, return ((tau, MRC), alpha); else return (tau, MRC).
- Return type:
tuple
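Example (a minimal usage sketch with an illustrative DataFrame; the non-default keyword values are shown only to illustrate the signature):

import pandas as pd
from benchbench.measures.cardinal import get_sensitivity

data = pd.DataFrame(
    {"task_a": [0.81, 0.64, 0.72], "task_b": [0.55, 0.60, 0.58], "task_c": [0.91, 0.70, 0.83]},
    index=["model_1", "model_2", "model_3"],
)
tau, mrc = get_sensitivity(data, cols=["task_a", "task_b", "task_c"])
# With return_weight=True, the weights alpha are returned alongside (tau, MRC).
(tau, mrc), alpha = get_sensitivity(
    data, cols=["task_a", "task_b", "task_c"], num_steps=500, return_weight=True
)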
benchbench.measures.ordinal¶
- benchbench.measures.ordinal.appr_rank_diff(new_win_rate, inv_indices, orig_rank)[source]¶
Approximate the rank difference between the original rank and the rank induced by the new win rate.
- Parameters:
new_win_rate (np.array) – Win rates for all models.
inv_indices (list) – Invariant indices.
orig_rank (np.array) – Original rank, restricted to the models in inv_indices.
- Returns:
The approximated loss.
- Return type:
torch.Tensor
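Example (a minimal call sketch; the win rates, indices, and ranks below are illustrative, and their exact interpretation is an assumption based on the parameter descriptions above):

import numpy as np
from benchbench.measures.ordinal import appr_rank_diff

new_win_rate = np.array([0.7, 0.4, 0.6, 0.3])  # per-model win rates (illustrative)
inv_indices = [0, 2]                           # invariant indices into new_win_rate (assumed)
orig_rank = np.array([0, 1])                   # original rank of the models in inv_indices (assumed)
loss = appr_rank_diff(new_win_rate, inv_indices, orig_rank)  # approximated loss (torch.Tensor)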
- benchbench.measures.ordinal.get_diversity(data, cols)[source]¶
Calculate the diversity for a given benchmark.
- Parameters:
data (pd.DataFrame) – Each row represents a model, each column represents a task.
cols (list) – The column names of the tasks.
- Returns:
(W, max_MRC), where max_MRC is the maximum MRC over all pairs of tasks.
- Return type:
tuple
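Example (a minimal usage sketch; the DataFrame layout mirrors the illustrative one used for the cardinal measures above):

import pandas as pd
from benchbench.measures.ordinal import get_diversity

data = pd.DataFrame(
    {"task_a": [0.81, 0.64, 0.72], "task_b": [0.55, 0.60, 0.58], "task_c": [0.91, 0.70, 0.83]},
    index=["model_1", "model_2", "model_3"],
)
W, max_mrc = get_diversity(data, cols=["task_a", "task_b", "task_c"])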
- benchbench.measures.ordinal.get_selected_win_rate(win_rate_matrix, w, inv_indices, do_sample=True)[source]¶
Get the win rate for the selected indices.
- Parameters:
win_rate_matrix (torch.Tensor) – The entry in the i-th row and j-th column is the win rate of the i-th model over the j-th model.
w (torch.Tensor) – Unnormalized probability for each model to be selected.
inv_indices (list) – Indices for L.
do_sample (bool) – Whether to select models by sampling.
- Returns:
(new_win_rate, new_indices), where new_win_rate is a torch.Tensor and new_indices is an np.array.
- Return type:
tuple
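Example (a minimal call sketch; the win-rate matrix is illustrative, and reading w as unnormalized selection probabilities follows the parameter description above):

import torch
from benchbench.measures.ordinal import get_selected_win_rate

win_rate_matrix = torch.tensor([[0.5, 0.7, 0.6],
                                [0.3, 0.5, 0.4],
                                [0.4, 0.6, 0.5]])  # entry (i, j): win rate of model i over model j
w = torch.zeros(3, requires_grad=True)             # unnormalized selection probabilities (illustrative)
new_win_rate, new_indices = get_selected_win_rate(win_rate_matrix, w, inv_indices=[0], do_sample=True)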
- benchbench.measures.ordinal.get_sensitivity(data, cols, inv_indices=None, lr=0.01, num_step=1000, return_indices=False)[source]¶
Calculate the sensitivity for a given benchmark.
- Parameters:
data (pd.DataFrame) – Each row represents a model, each column represents a task.
cols (list) – The column names of the tasks.
inv_indices (list) – Indices for L; the rest will be used as L^C.
lr (float) – Learning rate for optimization.
num_step (int) – Number of steps for optimization.
return_indices (bool) – Whether to return the indices of the selected irrelevant models.
- Returns:
If return_indices is True, return ((tau, MRC), indices); else return (tau, MRC).
- Return type:
tuple
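Example (a minimal usage sketch with an illustrative DataFrame; the data values are arbitrary):

import pandas as pd
from benchbench.measures.ordinal import get_sensitivity

data = pd.DataFrame(
    {"task_a": [0.81, 0.64, 0.72], "task_b": [0.55, 0.60, 0.58], "task_c": [0.91, 0.70, 0.83]},
    index=["model_1", "model_2", "model_3"],
)
tau, mrc = get_sensitivity(data, cols=["task_a", "task_b", "task_c"])
# With return_indices=True, the indices of the selected irrelevant models are also returned.
(tau, mrc), indices = get_sensitivity(data, cols=["task_a", "task_b", "task_c"], return_indices=True)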