Measures¶
benchbench.measures.cardinal¶
- benchbench.measures.cardinal.appr_rank_diff(score, old_rank, use_weighted_loss=False)[source]¶
Approximate the rank difference between the old rank and the new rank.
- Parameters:
score (np.array) – Scores for all models across all tasks.
old_rank (np.array) – Original rank.
use_weighted_loss (bool) – Whether to use the weighted loss.
- Returns:
The loss.
- Return type:
torch.Tensor
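Example (a minimal call sketch; the score matrix and ranking below are illustrative, and the exact shape and rank conventions are assumptions based on the parameter descriptions above):

import numpy as np
from benchbench.measures.cardinal import appr_rank_diff

score = np.random.rand(5, 3)                            # 5 models scored on 3 tasks (illustrative data)
old_rank = np.argsort(np.argsort(-score.mean(axis=1)))  # rank of each model by mean score (assumed convention)
loss = appr_rank_diff(score, old_rank, use_weighted_loss=False)  # approximated rank-difference loss (torch.Tensor)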
- benchbench.measures.cardinal.get_diversity(data, cols)[source]¶
Calculate the diversity for a given benchmark.
- Parameters:
data (pd.DataFrame) – Each row represents a model, each column represents a task.
cols (list) – The column names of the tasks.
- Returns:
(W, max_MRC), where max_MRC is the maximum MRC over all pairs of tasks.
- Return type:
tuple
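Example (a minimal usage sketch; the DataFrame, model names, and task columns are illustrative and not part of the library):

import pandas as pd
from benchbench.measures.cardinal import get_diversity

# Rows are models, columns are tasks, as expected by the function.
data = pd.DataFrame(
    {"task_a": [0.81, 0.64, 0.72], "task_b": [0.55, 0.60, 0.58], "task_c": [0.91, 0.70, 0.83]},
    index=["model_1", "model_2", "model_3"],
)
W, max_mrc = get_diversity(data, cols=["task_a", "task_b", "task_c"])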
- benchbench.measures.cardinal.get_sensitivity(data, cols, min_value=0.01, lr=1.0, num_steps=1000, stop_threshold=1e-05, normalize_epsilon=True, use_weighted_loss=None, return_weight=False, verbose=False)[source]¶
Calculate the sensitivity for a given benchmark.
- Parameters:
data (pd.DataFrame) – Each row represents a model, each column represents a task.
cols (list) – The column names of the tasks.
min_value (float) – Minimum value for epsilon.
lr (float) – Learning rate for optimization.
num_steps (int) – Number of steps for optimization.
stop_threshold (float) – Stop if the loss change is smaller than this value.
normalize_epsilon (bool) – Whether to normalize epsilon by the standard deviation.
use_weighted_loss (bool) – Whether to use the weighted approximation loss; if None, try both and return the better result.
return_weight (bool) – Whether to return alpha.
verbose (bool) – Whether to output logs.
- Returns:
If return_weight is True, return ((tau, MRC), alpha); else return (tau, MRC).
- Return type:
tuple
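Example (a minimal usage sketch with an illustrative DataFrame; the non-default keyword values are shown only to illustrate the signature):

import pandas as pd
from benchbench.measures.cardinal import get_sensitivity

data = pd.DataFrame(
    {"task_a": [0.81, 0.64, 0.72], "task_b": [0.55, 0.60, 0.58], "task_c": [0.91, 0.70, 0.83]},
    index=["model_1", "model_2", "model_3"],
)
tau, mrc = get_sensitivity(data, cols=["task_a", "task_b", "task_c"])
# With return_weight=True, the weights alpha are returned alongside (tau, MRC).
(tau, mrc), alpha = get_sensitivity(
    data, cols=["task_a", "task_b", "task_c"], num_steps=500, return_weight=True
)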
benchbench.measures.ordinal¶
- benchbench.measures.ordinal.appr_rank_diff(new_win_rate, inv_indices, orig_rank)[source]¶
Approximate the rank difference between the original rank and the rank induced by the new win rate.
- Parameters:
new_win_rate (np.array) – Win rates for all models.
inv_indices (list) – Invariant indices.
orig_rank (np.array) – Original rank, restricted to the models in inv_indices.
- Returns:
The approximated loss.
- Return type:
torch.Tensor
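Example (a minimal call sketch; the win rates, indices, and ranks below are illustrative, and their exact interpretation is an assumption based on the parameter descriptions above):

import numpy as np
from benchbench.measures.ordinal import appr_rank_diff

new_win_rate = np.array([0.7, 0.4, 0.6, 0.3])  # per-model win rates (illustrative)
inv_indices = [0, 2]                           # invariant indices into new_win_rate (assumed)
orig_rank = np.array([0, 1])                   # original rank of the models in inv_indices (assumed)
loss = appr_rank_diff(new_win_rate, inv_indices, orig_rank)  # approximated loss (torch.Tensor)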
- benchbench.measures.ordinal.get_diversity(data, cols)[source]¶
Calculate the diversity for a given benchmark.
- Parameters:
data (pd.DataFrame) – Each row represents a model, each column represents a task.
cols (list) – The column names of the tasks.
- Returns:
(W, max_MRC), where max_MRC is the maximum MRC over all pairs of tasks.
- Return type:
tuple
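Example (a minimal usage sketch; the DataFrame layout mirrors the illustrative one used for the cardinal measures above):

import pandas as pd
from benchbench.measures.ordinal import get_diversity

data = pd.DataFrame(
    {"task_a": [0.81, 0.64, 0.72], "task_b": [0.55, 0.60, 0.58], "task_c": [0.91, 0.70, 0.83]},
    index=["model_1", "model_2", "model_3"],
)
W, max_mrc = get_diversity(data, cols=["task_a", "task_b", "task_c"])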
- benchbench.measures.ordinal.get_selected_win_rate(win_rate_matrix, w, inv_indices, do_sample=True)[source]¶
Get the win rate for the selected indices.
- Parameters:
win_rate_matrix (torch.Tensor) – The entry in the i-th row and j-th column is the win rate of the i-th model over the j-th model.
w (torch.Tensor) – Unnormalized probability for each model to be selected.
inv_indices (list) – Indices for L.
do_sample (bool) – Whether to select models by sampling.
- Returns:
(new_win_rate, new_indices), where new_win_rate is a torch.Tensor and new_indices is an np.array.
- Return type:
tuple
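Example (a minimal call sketch; the win-rate matrix is illustrative, and reading w as unnormalized selection probabilities follows the parameter description above):

import torch
from benchbench.measures.ordinal import get_selected_win_rate

win_rate_matrix = torch.tensor([[0.5, 0.7, 0.6],
                                [0.3, 0.5, 0.4],
                                [0.4, 0.6, 0.5]])  # entry (i, j): win rate of model i over model j
w = torch.zeros(3, requires_grad=True)             # unnormalized selection probabilities (illustrative)
new_win_rate, new_indices = get_selected_win_rate(win_rate_matrix, w, inv_indices=[0], do_sample=True)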
- benchbench.measures.ordinal.get_sensitivity(data, cols, inv_indices=None, lr=0.01, num_step=1000, return_indices=False)[source]¶
Calculate the sensitivity for a given benchmark.
- Parameters:
data (pd.DataFrame) – Each row represents a model, each column represents a task.
cols (list) – The column names of the tasks.
inv_indices (list) – Indices for L; the rest will be used as L^C.
lr (float) – Learning rate for optimization.
num_step (int) – Number of steps for optimization.
return_indices (bool) – Whether to return the indices of the selected irrelevant models.
- Returns:
If return_indices is True, return ((tau, MRC), indices); else return (tau, MRC).
- Return type:
tuple
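Example (a minimal usage sketch with an illustrative DataFrame; the data values are arbitrary):

import pandas as pd
from benchbench.measures.ordinal import get_sensitivity

data = pd.DataFrame(
    {"task_a": [0.81, 0.64, 0.72], "task_b": [0.55, 0.60, 0.58], "task_c": [0.91, 0.70, 0.83]},
    index=["model_1", "model_2", "model_3"],
)
tau, mrc = get_sensitivity(data, cols=["task_a", "task_b", "task_c"])
# With return_indices=True, the indices of the selected irrelevant models are also returned.
(tau, mrc), indices = get_sensitivity(data, cols=["task_a", "task_b", "task_c"], return_indices=True)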