Measures

benchbench.measures.cardinal

benchbench.measures.cardinal.appr_rank_diff(score, old_rank, use_weighted_loss=False)[source]

Approximate the rank difference between the old rank and the new rank.

Parameters:
  • score (np.array) – Scores for all models across all tasks.

  • old_rank (np.array) – Original rank.

  • use_weighted_loss (bool) – Whether to use the weighted loss.

Returns:

The loss.

Return type:

torch.Tensor
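
A hypothetical usage sketch for this helper; the input shapes and the 0-based rank convention below are illustrative assumptions, not taken from the library:

    >>> import numpy as np
    >>> from benchbench.measures.cardinal import appr_rank_diff
    >>> score = np.random.rand(4, 3)       # 4 models x 3 tasks (assumed layout)
    >>> old_rank = np.array([0, 1, 2, 3])  # original ranking of the 4 models (assumed 0-based)
    >>> loss = appr_rank_diff(score, old_rank, use_weighted_loss=False)
    >>> loss  # a torch.Tensor (the approximated rank-difference loss)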

benchbench.measures.cardinal.get_diversity(data, cols)[source]

Calculate the diversity for a given benchmark.

Parameters:
  • data (pd.DataFrame) – Each row represents a model, each column represents a task.

  • cols (list) – The column names of the tasks.

Returns:

(W, max_MRC), where max_MRC is the maximum MRC over all pairs of tasks.

Return type:

tuple
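
A minimal usage sketch; the DataFrame below is synthetic and the column names are placeholders:

    >>> import numpy as np
    >>> import pandas as pd
    >>> from benchbench.measures.cardinal import get_diversity
    >>> cols = ["task_a", "task_b", "task_c"]
    >>> data = pd.DataFrame(np.random.rand(10, 3), columns=cols)  # 10 models x 3 tasks
    >>> W, max_mrc = get_diversity(data, cols)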

benchbench.measures.cardinal.get_sensitivity(data, cols, min_value=0.01, lr=1.0, num_steps=1000, stop_threshold=1e-05, normalize_epsilon=True, use_weighted_loss=None, return_weight=False, verbose=False)[source]

Calculate the sensitivity for a given benchmark.

Parameters:
  • data (pd.DataFrame) – Each row represents a model, each column represents a task.

  • cols (list) – The column names of the tasks.

  • min_value (float) – Minimum value for epsilon.

  • lr (float) – Learning rate for optimization.

  • num_steps (int) – Number of steps for optimization.

  • stop_threshold (float) – Stop if the loss change is smaller than this value.

  • normalize_epsilon (bool) – Whether to normalize epsilon by the standard deviation.

  • use_weighted_loss (bool) – Whether to use the weighted approximation loss; if None, use both and return the better one.

  • return_weight (bool) – Whether to also return the weights alpha.

  • verbose (bool) – Whether to output logs.

Returns:

If return_weight is True, return ((tau, MRC), alpha); else return (tau, MRC).

Return type:

tuple
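
A minimal usage sketch with the default optimization settings; the data is synthetic and the column names are placeholders:

    >>> import numpy as np
    >>> import pandas as pd
    >>> from benchbench.measures.cardinal import get_sensitivity
    >>> cols = ["task_a", "task_b", "task_c"]
    >>> data = pd.DataFrame(np.random.rand(10, 3), columns=cols)  # 10 models x 3 tasks
    >>> tau, mrc = get_sensitivity(data, cols)
    >>> (tau, mrc), alpha = get_sensitivity(data, cols, return_weight=True)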

benchbench.measures.ordinal

benchbench.measures.ordinal.appr_rank_diff(new_win_rate, inv_indices, orig_rank)[source]

Approximate the rank difference between the original win rate and the new win rate.

Parameters:
  • new_win_rate (np.array) – Win rate for all models.

  • inv_indices (list) – Invariant indices.

  • orig_rank (np.array) – Original win rate, restricted to the models in inv_indices.

Returns:

approximated loss

Return type:

torch.Tensor
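
A hypothetical usage sketch for this helper, following the parameter descriptions above; the sizes and values are illustrative assumptions:

    >>> import numpy as np
    >>> from benchbench.measures.ordinal import appr_rank_diff
    >>> new_win_rate = np.random.rand(6)       # win rate of each of 6 models (assumed 1-D)
    >>> inv_indices = [0, 2, 4]                # invariant indices
    >>> orig_rank = np.array([0.8, 0.6, 0.4])  # original win rates of the models in inv_indices
    >>> loss = appr_rank_diff(new_win_rate, inv_indices, orig_rank)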

benchbench.measures.ordinal.get_diversity(data, cols)[source]

Calculate the diversity for a given benchmark.

Parameters:
  • data (pd.DataFrame) – Each row represents a model, each column represents a task.

  • cols (list) – The column names of the tasks.

Returns:

(W, max_MRC), where max_MRC is the maximum MRC over all pairs of tasks.

Return type:

tuple
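
Usage mirrors the cardinal version; synthetic data and placeholder column names:

    >>> import numpy as np
    >>> import pandas as pd
    >>> from benchbench.measures.ordinal import get_diversity
    >>> cols = ["task_a", "task_b", "task_c"]
    >>> data = pd.DataFrame(np.random.rand(10, 3), columns=cols)  # 10 models x 3 tasks
    >>> W, max_mrc = get_diversity(data, cols)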

benchbench.measures.ordinal.get_selected_win_rate(win_rate_matrix, w, inv_indices, do_sample=True)[source]

Get the win rate for the selected indices.

Parameters:
  • win_rate_matrix (torch.Tensor) – The entry in the i-th row and j-th column is the win rate of the i-th model over the j-th model.

  • w (torch.Tensor) – Unnormalized probability of each model being selected.

  • inv_indices (list) – Indices for L (the remaining models form L^C).

  • do_sample (bool) – Whether to select models by sampling.

Returns:

(new_win_rate, new_indices), where new_win_rate is a torch.Tensor and new_indices is an np.array.

Return type:

tuple
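
A hypothetical call with a toy pairwise win-rate matrix; the construction below (forcing W[i, j] + W[j, i] = 1) is an illustrative assumption:

    >>> import torch
    >>> from benchbench.measures.ordinal import get_selected_win_rate
    >>> n = 6
    >>> raw = torch.rand(n, n)
    >>> win_rate_matrix = raw / (raw + raw.T)   # win rate of model i over model j
    >>> w = torch.zeros(n, requires_grad=True)  # unnormalized selection scores
    >>> inv_indices = [0, 1, 2]                 # models in L
    >>> new_win_rate, new_indices = get_selected_win_rate(win_rate_matrix, w, inv_indices, do_sample=True)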

benchbench.measures.ordinal.get_sensitivity(data, cols, inv_indices=None, lr=0.01, num_step=1000, return_indices=False)[source]

Calculate the sensitivity for a given benchmark.

Parameters:
  • data (pd.DataFrame) – Each row represents a model, each column represents a task.

  • cols (list) – The column names of the tasks.

  • inv_indices (list) – Indices for L; the rest will be used as L^C.

  • lr (float) – Learning rate for optimization.

  • num_step (int) – Number of steps for optimization.

  • return_indices (bool) – Whether to return the indices of the selected irrelevant models.

Returns:

If return_indices is True, return ((tau, MRC), indices); else return (tau, MRC).

Return type:

tuple
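
A minimal usage sketch; synthetic data, placeholder column names, default optimizer settings:

    >>> import numpy as np
    >>> import pandas as pd
    >>> from benchbench.measures.ordinal import get_sensitivity
    >>> cols = ["task_a", "task_b", "task_c"]
    >>> data = pd.DataFrame(np.random.rand(10, 3), columns=cols)  # 10 models x 3 tasks
    >>> tau, mrc = get_sensitivity(data, cols)
    >>> (tau, mrc), indices = get_sensitivity(data, cols, return_indices=True)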