Evaluation and metrics
This page covers performance and fairness evaluation helpers and their bootstrap variants.
Performance metrics
- error_parity.evaluation.evaluate_performance(y_true, y_pred)[source]
Evaluates the provided predictions on common performance metrics.
- Parameters:
y_true (np.ndarray) – The true class labels.
y_pred (np.ndarray) – The discretized predictions.
- Returns:
A dictionary with key-value pairs of (metric name, metric value).
- Return type:
dict
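A minimal usage sketch on synthetic data (the exact metric names in the returned dictionary depend on the library version, so the loop below just prints whatever keys are present):

```python
import numpy as np
from error_parity.evaluation import evaluate_performance

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=1_000)   # binary ground-truth labels
y_pred = rng.integers(0, 2, size=1_000)   # already-discretized predictions

perf_metrics = evaluate_performance(y_true, y_pred)
for name, value in perf_metrics.items():
    print(f"{name}: {value:.3f}")
```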
Fairness metrics
- error_parity.evaluation.evaluate_fairness(y_true, y_pred, sensitive_attribute, return_groupwise_metrics=False)[source]
Evaluates fairness as the ratios between group-wise performance metrics.
- Parameters:
y_true (np.ndarray) – The true class labels.
y_pred (np.ndarray) – The discretized predictions.
sensitive_attribute (np.ndarray) – The sensitive attribute (protected group membership).
return_groupwise_metrics (bool, optional) – Whether to also return the group-wise performance metrics (True) or only the ratios between them (False), by default False.
- Returns:
A dictionary with key-value pairs of (metric name, metric value).
- Return type:
dict
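A short sketch on synthetic data with a binary sensitive attribute; setting return_groupwise_metrics=True includes the per-group metrics alongside the ratios:

```python
import numpy as np
from error_parity.evaluation import evaluate_fairness

rng = np.random.default_rng(0)
n = 1_000
y_true = rng.integers(0, 2, size=n)
y_pred = rng.integers(0, 2, size=n)
group = rng.integers(0, 2, size=n)   # sensitive attribute (two protected groups)

# Ratios between group-wise metrics, plus the per-group metrics themselves.
fairness_metrics = evaluate_fairness(
    y_true, y_pred,
    sensitive_attribute=group,
    return_groupwise_metrics=True,
)
for name, value in fairness_metrics.items():
    print(f"{name}: {value:.3f}")
```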
End-to-end evaluation
- error_parity.evaluation.evaluate_predictions(y_true, y_pred_scores, sensitive_attribute=None, return_groupwise_metrics=False, **threshold_target)[source]
Evaluates the given predictions on both performance and fairness metrics.
Will only evaluate fairness if sensitive_attribute is provided.
Note
The value of log_loss may be inaccurate when using scikit-learn<1.2.
- Parameters:
y_true (np.ndarray) – The true labels.
y_pred_scores (np.ndarray) – The predicted scores.
sensitive_attribute (np.ndarray, optional) – The sensitive attribute, i.e., which protected group each sample belongs to. If not provided, fairness metrics will not be computed.
return_groupwise_metrics (bool) – Whether to return group-wise performance metrics (requires providing sensitive_attribute).
- Returns:
A dictionary with key-value pairs of (metric name, metric value).
- Return type:
dict
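A sketch of end-to-end evaluation on synthetic scores. Note that threshold=0.5 below is an assumed keyword for the **threshold_target arguments used to discretize the scores; check which keys your version accepts:

```python
import numpy as np
from error_parity.evaluation import evaluate_predictions

rng = np.random.default_rng(0)
n = 1_000
y_true = rng.integers(0, 2, size=n)
group = rng.integers(0, 2, size=n)
# Continuous scores in [0, 1], loosely correlated with the labels.
y_pred_scores = np.clip(y_true * 0.3 + rng.random(n) * 0.7, 0, 1)

# `threshold=0.5` is an assumption about the accepted **threshold_target keys.
results = evaluate_predictions(
    y_true, y_pred_scores,
    sensitive_attribute=group,
    return_groupwise_metrics=False,
    threshold=0.5,
)
print(results)
```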
Bootstrap estimates
- error_parity.evaluation.evaluate_predictions_bootstrap(y_true, y_pred_scores, sensitive_attribute, k=200, confidence_pct=95, seed=42, **threshold_target)[source]
Computes bootstrap estimates of several metrics for the given predictions.
- Parameters:
y_true (np.ndarray) – The true labels.
y_pred_scores (np.ndarray) – The score predictions.
sensitive_attribute (np.ndarray) – The sensitive attribute data.
k (int, optional) – How many bootstrap samples to draw, by default 200.
confidence_pct (float, optional) – The width of the confidence interval used when reporting lower and upper bounds, by default 95 (i.e., the 2.5th to 97.5th percentiles of the bootstrap results).
seed (int, optional) – The random seed, by default 42.
- Returns:
A dictionary of bootstrap results, with lower and upper confidence bounds for each metric.
- Return type:
dict
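A sketch of the bootstrap helper on the same kind of synthetic data, using the default number of resamples, confidence level, and seed. As above, threshold=0.5 is an assumed threshold_target keyword:

```python
import numpy as np
from error_parity.evaluation import evaluate_predictions_bootstrap

rng = np.random.default_rng(0)
n = 1_000
y_true = rng.integers(0, 2, size=n)
group = rng.integers(0, 2, size=n)
y_pred_scores = np.clip(y_true * 0.3 + rng.random(n) * 0.7, 0, 1)

# 200 bootstrap resamples and a 95% confidence interval (the defaults).
# `threshold=0.5` is an assumption about the accepted **threshold_target keys.
boot_results = evaluate_predictions_bootstrap(
    y_true, y_pred_scores,
    sensitive_attribute=group,
    k=200, confidence_pct=95, seed=42,
    threshold=0.5,
)
print(boot_results)
```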