error_parity package
error_parity.threshold_optimizer module
Solver for the relaxed equal odds problem.
- class error_parity.threshold_optimizer.RelaxedThresholdOptimizer(*, predictor, constraint='equalized_odds', tolerance=0.0, false_pos_cost=1.0, false_neg_cost=1.0, l_p_norm=inf, max_roc_ticks=1000, seed=42)
Bases: Classifier
Class to encapsulate all the logic needed to compute the optimal equal odds classifier (with possibly relaxed constraints).
Initializes the relaxed equal odds wrapper.
- Parameters:
predictor (callable[[np.ndarray], np.ndarray]) – A trained score predictor that takes in samples, X, in shape (num_samples, num_features), and outputs real-valued scores, R, in shape (num_samples,).
constraint (str) – The fairness constraint to use. By default “equalized_odds”.
tolerance (float, optional) – The absolute tolerance for the equal odds fairness constraint, by default 0.0. Allows group-wise ROC points to differ by up to this amount.
false_pos_cost (float, optional) – The cost of a FALSE POSITIVE error, by default 1.0.
false_neg_cost (float, optional) – The cost of a FALSE NEGATIVE error, by default 1.0.
l_p_norm (int, optional) – The l-p norm to use when computing distances between group ROC points. Used only for the “equalized odds” constraint (different l-p norms lead to different equalized-odds relaxations). By default np.inf, which corresponds to the l-inf norm.
max_roc_ticks (int, optional) – The maximum number of ticks (points) in each group’s ROC, when computing the optimal fair classifier, by default 1000.
seed (int, optional) – A random seed used for reproducibility when producing randomized classifiers, by default 42.
- constraint_violation(constraint_name=None, l_p_norm=None)
Theoretical constraint violation of the LP solution found.
- Parameters:
constraint_name (str, optional) – The name of a constraint to evaluate instead of this classifier’s self.constraint.
l_p_norm (int, optional) – Which l-p norm to use when computing distances between group ROC points. Used only for the “equalized odds” constraint.
- Returns:
The fairness constraint violation.
- Return type:
float
- cost(false_pos_cost=None, false_neg_cost=None)
Computes the theoretical cost of the solution found.
NOTE: use false_pos_cost==false_neg_cost==1 for the 0-1 loss (the standard error rate), which is equal to 1 - accuracy.
- Parameters:
false_pos_cost (float, optional) – The cost of a FALSE POSITIVE error, by default will take the value given in the object’s constructor.
false_neg_cost (float, optional) – The cost of a FALSE NEGATIVE error, by default will take the value given in the object’s constructor.
- Returns:
The cost of the solution found.
- Return type:
float
- demographic_parity_violation()
Computes the theoretical violation of the demographic parity constraint.
That is, the maximum distance between groups’ PPR (positive prediction rate).
- Returns:
The demographic parity constraint violation.
- Return type:
float
- equalized_odds_violation(l_p_norm=None)
Computes the theoretical violation of the equal odds constraint, i.e., the maximum distance between the ROC points of any pair of groups (the l-inf distance under the constructor's default l_p_norm=np.inf).
- Parameters:
l_p_norm (int, optional) – Which l-p norm to use when computing distances between group ROC points.
- Returns:
The equal-odds constraint violation.
- Return type:
float
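To make the pairwise definition concrete, here is a minimal pure-Python sketch that computes the largest l-p distance between the (FPR, TPR) points of any two groups. The helper name max_pairwise_roc_distance is hypothetical and the input format (a list of (FPR, TPR) pairs, one per group) is an assumption; this is not the library's implementation.

```python
import itertools
import math


def max_pairwise_roc_distance(groupwise_roc_points, l_p_norm=math.inf):
    """Largest l-p distance between the (FPR, TPR) points of any two groups."""
    worst = 0.0
    for point_a, point_b in itertools.combinations(groupwise_roc_points, 2):
        diffs = [abs(a - b) for a, b in zip(point_a, point_b)]
        if math.isinf(l_p_norm):
            dist = max(diffs)  # l-inf: the larger of |delta FPR| and |delta TPR|
        else:
            dist = sum(d ** l_p_norm for d in diffs) ** (1.0 / l_p_norm)
        worst = max(worst, dist)
    return worst


# Three groups, each given as an (FPR, TPR) point.
roc_points = [(0.10, 0.60), (0.20, 0.80), (0.15, 0.70)]
linf_violation = max_pairwise_roc_distance(roc_points)
l1_violation = max_pairwise_roc_distance(roc_points, l_p_norm=1)
```

The worst pair here is the first and second group, which differ by 0.1 in FPR and 0.2 in TPR.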
- error_rate_parity_constraint_violation(error_type)
Computes the theoretical violation of an error-rate parity constraint.
- Parameters:
error_type (str) – One of the following values: “fp”, for false positive errors (FPR or TNR parity); “fn”, for false negative errors (TPR or FNR parity).
- Returns:
The maximum constraint violation among all groups.
- Return type:
float
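Both demographic parity and error-rate parity reduce to comparing one scalar rate across groups, so the maximum pairwise distance equals the spread (max minus min) of that rate. The sketch below is hypothetical: it assumes group-wise (FPR, TPR) points and group prevalences are available, and derives each group's PPR from the identity PPR = prevalence * TPR + (1 - prevalence) * FPR.

```python
def positive_prediction_rate(fpr, tpr, prevalence):
    # P(Yhat=1) = P(Y=1) * TPR + P(Y=0) * FPR
    return prevalence * tpr + (1.0 - prevalence) * fpr


def demographic_parity_gap(groupwise_roc_points, groupwise_prevalence):
    pprs = [
        positive_prediction_rate(fpr, tpr, prev)
        for (fpr, tpr), prev in zip(groupwise_roc_points, groupwise_prevalence)
    ]
    return max(pprs) - min(pprs)


def error_rate_parity_gap(groupwise_roc_points, error_type):
    if error_type == "fp":
        rates = [fpr for fpr, _ in groupwise_roc_points]  # FPR (equivalently TNR)
    elif error_type == "fn":
        rates = [1.0 - tpr for _, tpr in groupwise_roc_points]  # FNR (equivalently TPR)
    else:
        raise ValueError(f"unknown error_type: {error_type!r}")
    return max(rates) - min(rates)


roc_points = [(0.10, 0.60), (0.20, 0.80)]
dp_gap = demographic_parity_gap(roc_points, [0.5, 0.5])
fp_gap = error_rate_parity_gap(roc_points, "fp")
fn_gap = error_rate_parity_gap(roc_points, "fn")
```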
- fit(X, y, *, group, y_scores=None)
Fit this predictor to achieve the (possibly relaxed) equal odds constraint on the provided data.
- Parameters:
X (np.ndarray) – The input features.
y (np.ndarray) – The input labels.
group (np.ndarray) – The group membership of each sample. Assumes groups are numbered [0, 1, …, num_groups-1].
y_scores (np.ndarray, optional) – The pre-computed model predictions on this data.
- Returns:
Returns self.
- Return type:
callable
- property global_prevalence: ndarray
Global prevalence, i.e., P(Y=1).
- property global_roc_point: ndarray
Global ROC point achieved by solution.
- property groupwise_prevalence: ndarray
Group-specific prevalence, i.e., P(Y=1|A=a)
- property groupwise_roc_data: dict
Group-specific ROC data containing (FPR, TPR, threshold) triplets.
- property groupwise_roc_hulls: dict
Group-specific ROC convex hulls achieved by underlying predictor.
- property groupwise_roc_points: ndarray
Group-specific ROC points achieved by solution.
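As a rough illustration of what global_prevalence and groupwise_prevalence hold, the following sketch computes both from the y and group arrays passed to fit. It assumes binary 0/1 labels and groups numbered 0, 1, ..., num_groups-1 as fit requires; compute_prevalences is a hypothetical helper, not part of error_parity.

```python
from collections import Counter


def compute_prevalences(y, group):
    """Return (P(Y=1), [P(Y=1 | A=a) for a in 0..num_groups-1])."""
    total = Counter(group)
    positives = Counter(g for label, g in zip(y, group) if label == 1)
    num_groups = max(total) + 1  # groups numbered 0, 1, ..., num_groups-1
    groupwise = [positives[g] / total[g] for g in range(num_groups)]
    return sum(y) / len(y), groupwise


y = [1, 0, 1, 1, 0, 0]
group = [0, 0, 0, 1, 1, 1]
global_prev, groupwise_prev = compute_prevalences(y, group)
```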
error_parity.pareto_curve module
Utils for computing the fairness-accuracy Pareto frontier of a classifier.
- error_parity.pareto_curve.compute_inner_and_outer_adjustment_ci(postproc_results_df, perf_metric, disp_metric, data_type='test', constant_clf_perf=None)
Computes the interior/inner and exterior/outer adjustment curves, corresponding to the confidence intervals (by default 95% c.i.).
- Returns:
A tuple containing (xticks, inner_yticks, outer_yticks).
- Return type:
tuple[np.array, np.array, np.array]
- error_parity.pareto_curve.compute_postprocessing_curve(model, fit_data, eval_data, fairness_constraint='equalized_odds', l_p_norm=inf, bootstrap=True, tolerance_ticks=array([0., 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]), tolerance_tick_step=None, predict_method='predict_proba', n_jobs=None, **kwargs)
Computes the fairness and performance of the given classifier after adjusting (postprocessing) for varying levels of fairness tolerance.
- Parameters:
model (object) – The model to use.
fit_data (tuple) – Data triplet to use to fit postprocessing intervention, (X, Y, S), respectively containing the features, labels, and sensitive attribute.
eval_data (tuple or dict[tuple]) – Data triplet to use to evaluate postprocessing intervention on (same format as fit_data), or a dictionary of <data_name>-><data_triplet> containing multiple datasets to evaluate on.
fairness_constraint (str, optional) – The fairness constraint to use, by default “equalized_odds”.
l_p_norm (int, optional) – The norm to use when computing the fairness constraint, by default np.inf. Note: only compatible with the “equalized odds” constraint.
bootstrap (bool, optional) – Whether to compute uncertainty estimates via bootstrapping, by default True.
tolerance_ticks (list, optional) – List of constraint tolerances to use when computing adjustment curve. By default will use higher granularity/precision for lower levels of disparity, and lower granularity for higher levels of disparity. Should correspond to a sorted list of values between 0 and 1. Will be ignored if tolerance_tick_step is provided.
tolerance_tick_step (float, optional) – Distance between constraint tolerances in the adjustment curve. Will override tolerance_ticks if provided!
predict_method (str, optional) – Which method to call to obtain predictions out of the given model. Use predict_method=“__call__” for a callable predictor, or the default predict_method=“predict_proba” for a predictor with a scikit-learn interface.
n_jobs (int, optional) – Number of parallel jobs to use, if omitted will use os.cpu_count()-1.
- Returns:
postproc_results_df – A DataFrame containing the results, one row per tolerance tick.
- Return type:
pd.DataFrame
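The default tolerance_ticks in the signature above can be rebuilt with plain Python, which makes the "fine steps at low disparity, coarse steps at high disparity" spacing explicit. The ticks_from_step helper is hypothetical and only illustrates one plausible way a tolerance_tick_step could be expanded into ticks (here: evenly spaced values in [0, 1)); the library may include or exclude endpoints differently.

```python
import math

# Step 0.01 from 0.0 up to 0.2 (fine-grained where small disparities matter most)...
fine_ticks = [round(0.01 * i, 2) for i in range(21)]
# ...then step 0.1 from 0.3 to 0.9.
coarse_ticks = [round(0.1 * i, 1) for i in range(3, 10)]
default_ticks = fine_ticks + coarse_ticks


def ticks_from_step(step):
    """Hypothetical helper: evenly spaced tolerances in [0, 1) with the given step."""
    num_ticks = math.ceil(1.0 / step - 1e-9)  # small epsilon guards against float round-off
    return [round(step * i, 10) for i in range(num_ticks)]
```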
- error_parity.pareto_curve.fit_and_evaluate_postprocessing(postproc_template, tolerance, fit_data, eval_data, seed=42, y_fit_pred_scores=None, bootstrap=True, **bootstrap_kwargs)
Fit and evaluate a postprocessing intervention on the given predictor.
- Parameters:
postproc_template (RelaxedThresholdOptimizer) – An object that serves as the template to copy when creating the postprocessing optimizer.
tolerance (float) – The tolerance (or slack) for fairness constraint fulfillment. This value will override the tolerance attribute of the postproc_template object.
fit_data (tuple) – The data used to fit postprocessing.
eval_data (tuple or dict[tuple]) – The data or sequence of data to evaluate postprocessing on. If a tuple is provided, will call it “eval” data in the returned results dictionary; if a dict is provided, will assume {<key_1>: <data_1>, …}.
seed (int, optional) – The random seed, by default 42
y_fit_pred_scores (np.ndarray, optional) – The pre-computed predicted scores for the fit_data; if provided, will avoid re-computing these predictions for each function call.
bootstrap (bool, optional) – Whether to use bootstrapping when computing metric results for postprocessing, by default True.
bootstrap_kwargs (dict, optional) – Any extra arguments to pass on to the bootstrapping function, by default None.
- Returns:
results – A dictionary of results, whose keys are the data type, and values the metric values obtained by postprocessing on that data type.
For example: {"validation": {"accuracy": 0.7, ...}, "test": {"accuracy": 0.65, ...}}
- Return type:
dict[str, dict]
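The following hypothetical sketch mirrors the eval_data handling described above: a single (X, Y, S) triplet is reported under the "eval" key, while a dict is evaluated key by key. The predict(x, group) callable, the per-sample loop and the accuracy-only metrics are simplifying assumptions; the real function fits a RelaxedThresholdOptimizer and reports many more metrics.

```python
def evaluate_on_all(predict, eval_data):
    """Hypothetical sketch: accuracy of `predict` on one or several (X, Y, S) triplets."""
    if isinstance(eval_data, tuple):
        eval_data = {"eval": eval_data}  # a single triplet is reported under "eval"
    results = {}
    for data_type, (X, y, s) in eval_data.items():
        y_pred = [predict(x, g) for x, g in zip(X, s)]
        accuracy = sum(int(p == t) for p, t in zip(y_pred, y)) / len(y)
        results[data_type] = {"accuracy": accuracy}
    return results


def predict(x, group):
    # Toy predictor (hypothetical): ignores the group, thresholds a single feature.
    return int(x > 0.5)


validation_data = ([0.2, 0.7, 0.9, 0.4], [0, 1, 0, 0], [0, 0, 1, 1])
test_data = ([0.6, 0.1], [1, 0], [0, 1])
single = evaluate_on_all(predict, validation_data)
multi = evaluate_on_all(predict, {"validation": validation_data, "test": test_data})
```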
- error_parity.pareto_curve.get_envelope_of_postprocessing_frontier(postproc_results_df, perf_col='accuracy_mean_test', disp_col='equalized_odds_diff_mean_test', constant_clf_perf=0.5, constant_clf_disp=0.0)
Computes the points on the envelope (convex hull) of the given postprocessing frontier results.
- Parameters:
postproc_results_df (pd.DataFrame) – The postprocessing frontier results DF.
perf_col (str, optional) – Name of the column containing performance results, by default “accuracy_mean_test”
disp_col (str, optional) – Name of column containing disparity results, by default “equalized_odds_diff_mean_test”
constant_clf_perf (float, optional) – The performance of a dummy constant classifier (in the same metric as perf_col), by default 0.5.
constant_clf_disp (float, optional) – The disparity of a dummy constant classifier (in the same metric as disp_col), by default 0.0; assumes a constant classifier fulfills fairness!
- Returns:
A 2-D array containing the points in the convex hull of the Pareto curve.
- Return type:
np.ndarray
error_parity.plotting module
Utils for plotting postprocessing frontier and postprocessing solution.
- error_parity.plotting.plot_postprocessing_frontier(postproc_results_df, *, perf_metric, disp_metric, show_data_type, constant_clf_perf, model_name=None, color='black')
Helper to plot the given post-processing frontier results.
Will use bootstrapped results if available, including plotting confidence intervals.
- Parameters:
postproc_results_df (pd.DataFrame) – The DataFrame containing postprocessing results. This should be the output of a call to compute_postprocessing_curve(.).
perf_metric (str) – Which performance metric to plot (horizontal axis).
disp_metric (str) – Which disparity metric to plot (vertical axis).
show_data_type (str) – The type of data to show results for; usually this will be “test”.
constant_clf_perf (float) – Performance achieved by the constant classifier; this is the point of lowest performance and lowest disparity achievable by postprocessing.
model_name (str, optional) – Name of the model being postprocessed, shown in the plot legend.
color (str, optional) – Which color to use for plotting the postprocessing curve, by default “black”.
- error_parity.plotting.plot_postprocessing_solution(*, postprocessed_clf, plot_roc_curves=False, plot_roc_hulls=True, plot_group_optima=True, plot_group_triangulation=True, plot_global_optimum=True, plot_diagonal=True, plot_relaxation=False, group_name_map=None, figure=None, **fig_kwargs)
Plots the group-specific solutions found for this predictor.
- Parameters:
postprocessed_clf (RelaxedThresholdOptimizer) – A postprocessed classifier already fitted on some data.
plot_roc_curves (bool, optional) – Whether to plot the global ROC curves, by default False.
plot_roc_hulls (bool, optional) – Whether to plot the global ROC convex hulls, by default True.
plot_group_optima (bool, optional) – Whether to plot the group-specific optima, by default True.
plot_group_triangulation (bool, optional) – Whether to plot the triangulation of a group-specific solution, when such triangulation is needed to achieve a target ROC point, by default True.
plot_global_optimum (bool, optional) – Whether to plot the global optimum ROC point, by default True.
plot_diagonal (bool, optional) – Whether to plot the ROC diagonal with FPR=TPR, by default True.
plot_relaxation (bool, optional) – Whether to plot the constraint relaxation bounding box, by default False.
group_name_map (dict, optional) – A dictionary mapping each group’s value to an appropriate name to show in the plot legend, by default None.
figure (matplotlib.figure.Figure, optional) – A matplotlib figure to use when plotting, by default will generate a new figure for plotting.
error_parity.evaluation module
A set of functions to evaluate predictions on common performance and fairness metrics, possibly at a specified FPR or FNR target.
Based on: https://github.com/AndreFCruz/hpt/blob/main/src/hpt/evaluation.py
- error_parity.evaluation.eval_accuracy_and_equalized_odds(y_true, y_pred_binary, sensitive_attr, l_p_norm=inf, display=False)
Evaluate accuracy and equalized odds of the given predictions.
- Parameters:
y_true (np.ndarray) – The true class labels.
y_pred_binary (np.ndarray) – The predicted class labels.
sensitive_attr (np.ndarray) – The sensitive attribute data.
l_p_norm (int, optional) – The norm to use for the constraint violation, by default np.inf.
display (bool, optional) – Whether to print results or not, by default False.
- Returns:
A tuple of (accuracy, equalized odds violation).
- Return type:
tuple[float, float]
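A minimal sketch of the two returned quantities, assuming the default l_p_norm=np.inf and plain Python lists as inputs. With the l-inf norm, the largest pairwise distance between group ROC points is simply the larger of the FPR range and the TPR range across groups. The function name is hypothetical, and every group is assumed to contain both positive and negative labels.

```python
def accuracy_and_equalized_odds(y_true, y_pred_binary, sensitive_attr):
    """Sketch: accuracy and l-inf equalized-odds violation from binary predictions."""
    accuracy = sum(int(t == p) for t, p in zip(y_true, y_pred_binary)) / len(y_true)

    roc_points = {}
    for g in set(sensitive_attr):
        pairs = [(t, p) for t, p, s in zip(y_true, y_pred_binary, sensitive_attr) if s == g]
        negatives = [p for t, p in pairs if t == 0]
        positives = [p for t, p in pairs if t == 1]
        roc_points[g] = (
            sum(negatives) / len(negatives),  # FPR
            sum(positives) / len(positives),  # TPR
        )

    fprs = [fpr for fpr, _ in roc_points.values()]
    tprs = [tpr for _, tpr in roc_points.values()]
    # l-inf over all pairs of groups == max of the per-coordinate ranges.
    violation = max(max(fprs) - min(fprs), max(tprs) - min(tprs))
    return accuracy, violation


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
acc, eo_violation = accuracy_and_equalized_odds(y_true, y_pred, groups)
```

In this toy data, group 0 sits at (FPR, TPR) = (0.0, 0.5) and group 1 at (0.5, 1.0).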
- error_parity.evaluation.evaluate_fairness(y_true, y_pred, sensitive_attribute, return_groupwise_metrics=False)
Evaluates fairness as the ratios between group-wise performance metrics.
- Parameters:
y_true (np.ndarray) – The true class labels.
y_pred (np.ndarray) – The discretized predictions.
sensitive_attribute (np.ndarray) – The sensitive attribute (protected group membership).
return_groupwise_metrics (Optional[bool], optional) – Whether to return group-wise performance metrics (bool: True) or only the ratios between these metrics (bool: False), by default False.
- Returns:
A dictionary with key-value pairs of (metric name, metric value).
- Return type:
dict
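The exact ratios reported by evaluate_fairness are not spelled out here; a common convention, assumed in the sketch below, is the ratio between the smallest and largest value of a metric across groups, so that 1.0 means parity. groupwise_ratio is a hypothetical helper.

```python
def groupwise_ratio(groupwise_metric):
    """Hypothetical helper: min/max ratio of a metric across groups (1.0 means parity)."""
    values = list(groupwise_metric.values())
    largest = max(values)
    # If every group scores 0, the groups are trivially at parity.
    return min(values) / largest if largest > 0 else 1.0


tpr_ratio = groupwise_ratio({0: 0.5, 1: 1.0})
ppr_ratio = groupwise_ratio({"a": 0.3, "b": 0.4, "c": 0.6})
```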
- error_parity.evaluation.evaluate_performance(y_true, y_pred)
Evaluates the provided predictions on common performance metrics.
- Parameters:
y_true (np.ndarray) – The true class labels.
y_pred (np.ndarray) – The discretized predictions.
- Returns:
A dictionary with key-value pairs of (metric name, metric value).
- Return type:
dict
- error_parity.evaluation.evaluate_predictions(y_true, y_pred_scores, sensitive_attribute=None, return_groupwise_metrics=False, **threshold_target)
Evaluates the given predictions on both performance and fairness metrics.
Will only evaluate fairness if sensitive_attribute is provided.
Note: the value of log_loss may be inaccurate when using scikit-learn<1.2.
- Parameters:
y_true (np.ndarray) – The true labels.
y_pred_scores (np.ndarray) – The predicted scores.
sensitive_attribute (np.ndarray, optional) – The sensitive attribute - which protected group each sample belongs to. If not provided, will not compute fairness metrics.
return_groupwise_metrics (bool, optional) – Whether to return group-wise performance metrics (requires providing sensitive_attribute), by default False.
- Returns:
A dictionary of (key, value) -> (metric_name, metric_value).
- Return type:
dict
- error_parity.evaluation.evaluate_predictions_bootstrap(y_true, y_pred_scores, sensitive_attribute, k=200, confidence_pct=95, seed=42, **threshold_target)
Computes bootstrap estimates of several metrics for the given predictions.
- Parameters:
y_true (np.ndarray) – The true labels.
y_pred_scores (np.ndarray) – The score predictions.
sensitive_attribute (np.ndarray) – The sensitive attribute data.
k (int, optional) – How many bootstrap samples to draw, by default 200.
confidence_pct (float, optional) – How large of a confidence interval to use when reporting lower and upper bounds, by default 95 (i.e., 2.5 to 97.5 percentile of results).
seed (int, optional) – The random seed, by default 42.
- Returns:
A dictionary of results
- Return type:
dict
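The sketch below illustrates the percentile bootstrap that the k and confidence_pct parameters suggest, using accuracy as the only metric. It is an assumption-laden simplification (hypothetical function name, nearest-rank percentiles, a single metric), not the library's code.

```python
import random


def bootstrap_accuracy(y_true, y_pred, k=200, confidence_pct=95, seed=42):
    """Sketch: bootstrap mean and percentile interval for accuracy."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]  # resample n indices with replacement
        stats.append(sum(int(y_true[i] == y_pred[i]) for i in idx) / n)
    stats.sort()
    tail = (100 - confidence_pct) / 2  # e.g. 2.5 for a 95% interval

    def percentile(q):
        # Nearest-rank percentile over the sorted bootstrap statistics.
        return stats[int(round(q / 100 * (k - 1)))]

    return sum(stats) / k, percentile(tail), percentile(100 - tail)


mean_acc, lower, upper = bootstrap_accuracy(
    [1, 0, 1, 1, 0, 1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 1, 1, 0, 1, 1],
)
```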
error_parity.binarize module
Module to binarize continuous-score predictions.
Based on: https://github.com/AndreFCruz/hpt/blob/main/src/hpt/binarize.py
- error_parity.binarize.compute_binary_predictions(y_true, y_pred_scores, threshold=None, tpr=None, fpr=None, ppr=None, random_seed=42)
Discretizes the given score predictions into binary labels.
If necessary, will randomly break ties between samples with equal scores.
- Parameters:
y_true (np.ndarray) – The true binary labels.
y_pred_scores (np.ndarray) – Predictions as a continuous score between 0 and 1.
threshold (Optional[float], optional) – A global threshold to apply to the scores, by default None.
tpr (Optional[float], optional) – A target TPR (true positive rate, or recall) to achieve, by default None.
fpr (Optional[float], optional) – A target FPR (false positive rate) to achieve, by default None.
ppr (Optional[float], optional) – A target PPR (positive prediction rate) to achieve, by default None.
random_seed (int, optional) – The random seed used when breaking ties between equal scores, by default 42.
- Returns:
The binarized predictions according to the specified target.
- Return type:
np.ndarray
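A minimal sketch of two of the targets, a global threshold and a target PPR, with random tie-breaking among equal scores. The >= comparison for thresholds and the rounding of ppr * n are assumptions; TPR and FPR targets could be handled analogously by ranking only the label-positive or label-negative samples, and are omitted here. binarize is a hypothetical name.

```python
import random


def binarize(y_pred_scores, threshold=None, ppr=None, random_seed=42):
    """Sketch: binarize scores with a global threshold or a target positive prediction rate."""
    if threshold is not None:
        return [int(s >= threshold) for s in y_pred_scores]  # assumes a ">=" convention
    if ppr is not None:
        rng = random.Random(random_seed)
        n = len(y_pred_scores)
        num_pos = int(round(ppr * n))
        # Highest scores first; the random secondary key breaks ties between equal scores.
        order = sorted(range(n), key=lambda i: (-y_pred_scores[i], rng.random()))
        positives = set(order[:num_pos])
        return [int(i in positives) for i in range(n)]
    raise ValueError("provide either threshold or ppr")


scores = [0.9, 0.8, 0.8, 0.1]
by_threshold = binarize(scores, threshold=0.8)
by_ppr = binarize(scores, ppr=0.5)  # one of the two tied 0.8 scores is chosen at random
```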
error_parity.classifiers module
Helper functions to construct and use randomized classifiers.
- class error_parity.classifiers.BinaryClassifier(score_predictor, threshold)
Bases: Classifier
Constructs a deterministic binary classifier by thresholding the given real-valued score predictor at the given threshold.
- class error_parity.classifiers.BinaryClassifierAtROCDiagonal(target_fpr=None, target_tpr=None, seed=42)
Bases: Classifier
A dummy classifier whose predictions are uncorrelated with the input features, but which achieves any target FPR or TPR on the ROC diagonal (where FPR = TPR).
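Because such a classifier ignores the features, its predictions are independent of the label, so in expectation FPR = TPR = the probability of predicting positive. The hypothetical RandomDiagonalClassifier below sketches this with a single positive_rate (standing in for target_fpr or target_tpr); it is not the library's class.

```python
import random


class RandomDiagonalClassifier:
    """Sketch: predicts 1 with a fixed probability, ignoring the input features."""

    def __init__(self, positive_rate, seed=42):
        self.positive_rate = positive_rate
        self.rng = random.Random(seed)

    def __call__(self, X):
        # One independent coin flip per sample; X is only used for its length.
        return [int(self.rng.random() < self.positive_rate) for _ in X]


clf = RandomDiagonalClassifier(0.3, seed=0)
labels = [i % 2 for i in range(100_000)]
preds = clf(labels)
tpr = sum(p for p, t in zip(preds, labels) if t == 1) / labels.count(1)
fpr = sum(p for p, t in zip(preds, labels) if t == 0) / labels.count(0)
```

On a large sample both empirical rates land close to 0.3, i.e., on the diagonal.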
- class error_parity.classifiers.EnsembleGroupwiseClassifiers(group_to_clf)
Bases: Classifier
Constructs a classifier from a set of group-specific classifiers.
Must be provided exactly one classifier per unique group value.
- Parameters:
group_to_clf (dict[int | str, callable]) – A mapping of group value to the classifier that should handle predictions for that specific group.
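A hypothetical sketch of group-wise dispatch, assuming each per-group classifier takes a single sample and returns a label (the library's classifiers may operate on whole arrays instead).

```python
class GroupwiseEnsemble:
    """Sketch: route each sample to the classifier registered for its group."""

    def __init__(self, group_to_clf):
        self.group_to_clf = dict(group_to_clf)

    def __call__(self, X, group):
        unknown = set(group) - set(self.group_to_clf)
        if unknown:
            raise ValueError(f"no classifier for groups: {unknown}")
        return [self.group_to_clf[g](x) for x, g in zip(X, group)]


def strict_clf(x):
    return int(x > 0.5)


def lenient_clf(x):
    return int(x > 0.2)


ensemble = GroupwiseEnsemble({0: strict_clf, 1: lenient_clf})
ensemble_preds = ensemble([0.3, 0.3, 0.6], [0, 1, 0])
```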
- class error_parity.classifiers.RandomizedClassifier(classifiers, probabilities, seed=42)
Bases: Classifier
Constructs a randomized classifier from the given classifiers and their probabilities.
This classifier will compute predictions for the whole input dataset at once, which will in general be faster for larger inputs (when compared to predicting each sample separately).
- Parameters:
classifiers (list[callable]) – A list of classifiers
probabilities (list[float]) – A list of probabilities for each given classifier, where probabilities[idx] is the probability of using the prediction from classifiers[idx].
seed (int, optional) – A random seed, by default 42.
- Returns:
The corresponding randomized classifier.
- Return type:
callable
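A hypothetical sketch of the batched strategy described above: every classifier predicts on the whole input once, then a per-sample draw (weighted by probabilities) selects whose prediction to keep. The class name and validation logic are assumptions, not the library's implementation.

```python
import random


class RandomizedMixture:
    """Sketch: per sample, keep the prediction of classifiers[i] with probability probabilities[i]."""

    def __init__(self, classifiers, probabilities, seed=42):
        if abs(sum(probabilities) - 1.0) > 1e-9:
            raise ValueError("probabilities must sum to 1")
        self.classifiers = classifiers
        self.probabilities = probabilities
        self.rng = random.Random(seed)

    def __call__(self, X):
        X = list(X)
        # Predict with every classifier on the whole input at once...
        all_preds = [clf(X) for clf in self.classifiers]
        # ...then draw, for each sample, which classifier's prediction to keep.
        choices = self.rng.choices(
            range(len(self.classifiers)), weights=self.probabilities, k=len(X)
        )
        return [all_preds[c][i] for i, c in enumerate(choices)]


def always_negative(X):
    return [0] * len(X)


def always_positive(X):
    return [1] * len(X)


mixture = RandomizedMixture([always_negative, always_positive], [0.25, 0.75])
mixed_preds = mixture(range(100_000))
```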
- static construct_at_target_ROC(predictor, roc_curve_data, target_roc_point, seed=42)
Constructs a randomized classifier in the interior of the convex hull of the classifier’s ROC curve, at a given target ROC point.
- Parameters:
predictor (callable) – A predictor that outputs real-valued scores in range [0, 1].
roc_curve_data (tuple[np.array...]) – The ROC curve of the given classifier, as a tuple of (FPR values; TPR values; threshold values).
target_roc_point (np.ndarray) – The target ROC point in (FPR, TPR).
seed (int, optional) – A random seed, by default 42.
- Returns:
rand_clf – A (randomized) binary classifier whose expected FPR and TPR correspond to the given target ROC point.
- Return type:
callable
- static find_points_for_target_ROC(roc_curve_data, target_roc_point)
Retrieves a set of realizable points (and respective weights) on the provided ROC curve that can be combined to realize any target ROC point in the interior of the ROC curve.
NOTE: this method partly overlaps in functionality with RandomizedClassifier.construct_at_target_ROC().
- static find_weights_given_two_points(point_A, point_B, target_point)
Given two ROC points corresponding to existing binary classifiers, find the weights that result in a classifier whose ROC point is target_point.
May need to interpolate the two given points with a third point corresponding to a random classifier (random uniform distribution with different thresholds).
- Returns:
Returns a tuple of numpy arrays (Ws, Ps), such that Ws @ Ps == target_point. The 1st array, Ws, corresponds to the weights of each point in the 2nd array, Ps.
- Return type:
tuple[np.ndarray, np.ndarray]
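For the simplest case, where target_point lies on the segment between point_A and point_B, the weights follow from solving w * A + (1 - w) * B = target along one coordinate. The sketch below covers only that case (the third-point interpolation with a random classifier is not implemented), and weights_on_segment is a hypothetical name.

```python
def weights_on_segment(point_a, point_b, target, tol=1e-9):
    """Sketch: find w with w*A + (1-w)*B == target, for a target on the segment AB."""
    # Solve along the coordinate where A and B differ the most (best conditioned).
    dim = max(range(len(point_a)), key=lambda d: abs(point_a[d] - point_b[d]))
    denom = point_a[dim] - point_b[dim]
    if abs(denom) < tol:
        raise ValueError("point_a and point_b coincide")
    w = (target[dim] - point_b[dim]) / denom
    mixed = [w * a + (1 - w) * b for a, b in zip(point_a, point_b)]
    if not (-tol <= w <= 1 + tol) or any(abs(m - t) > 1e-6 for m, t in zip(mixed, target)):
        raise ValueError("target is not on segment AB; a third point would be needed")
    # Same (Ws, Ps) shape as described above: weights, then the points they weight.
    return (w, 1 - w), (tuple(point_a), tuple(point_b))


(w_a, w_b), (p_a, p_b) = weights_on_segment((0.0, 0.5), (0.4, 0.9), (0.1, 0.6))
```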
error_parity.cvxpy_utils module
A set of helper functions for using cvxpy.
- error_parity.cvxpy_utils.compute_fair_optimum(*, fairness_constraint, tolerance, groupwise_roc_hulls, group_sizes_label_pos, group_sizes_label_neg, groupwise_prevalence, global_prevalence, false_positive_cost=1.0, false_negative_cost=1.0, l_p_norm=inf)
Computes the solution to finding the optimal fair (equal odds) classifier.
Can relax the equal odds constraint by some given tolerance.
- Parameters:
fairness_constraint (str) – The name of the fairness constraint under which the LP will be optimized. Possible inputs are:
- “equalized_odds”: match true positive and false positive rates across groups.
tolerance (float) – A value for the tolerance when enforcing the fairness constraint.
groupwise_roc_hulls (dict[int, np.ndarray]) – A dict mapping each group to the convex hull of the group’s ROC curve. The convex hull is an np.array of shape (n_points, 2), containing the points that form the convex hull of the ROC curve, sorted in COUNTER CLOCK-WISE order.
group_sizes_label_pos (np.ndarray) – The relative or absolute number of positive samples in each group.
group_sizes_label_neg (np.ndarray) – The relative or absolute number of negative samples in each group.
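groupwise_prevalence (np.ndarray) – The prevalence of positive samples in each group, i.e., P(Y=1|A=a).
global_prevalence (float) – The global prevalence of positive samples.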
false_positive_cost (float, optional) – The cost of a FALSE POSITIVE error, by default 1.
false_negative_cost (float, optional) – The cost of a FALSE NEGATIVE error, by default 1.
l_p_norm (int | str, optional) – The type of l-p norm to use when computing the distance between two ROC points. Used only for the “equalized_odds” constraint. By default uses np.inf (l-infinity distance): the maximum between groups’ TPR and FPR differences. Using l_p_norm=1 corresponds to twice the average_abs_odds_difference, since the l-1 norm sums the absolute FPR and TPR differences instead of averaging them. See the following link for more information on this parameter: https://www.cvxpy.org/api_reference/cvxpy.atoms.other_atoms.html#norm
- Returns:
(groupwise_roc_points, global_roc_point) – A tuple pair, (<1>, <2>), containing: 1: an array with the group-wise ROC points for the solution. 2: an array with the single global ROC point for the solution.
- Return type:
tuple[np.ndarray, np.ndarray]
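Under the assumptions that the cost model matches calc_cost_of_point, that the global ROC point is combined as in compute_global_roc_from_groupwise, and that the tolerance is enforced between every pair of groups, the optimization problem can be sketched as follows. Here r_a = (FPR_a, TPR_a) is group a's ROC point, H_a its ROC convex hull, w_a^- and w_a^+ its shares of label-negative and label-positive samples, π the global prevalence, c_FP and c_FN the error costs, and ε the tolerance. This is a reconstruction for illustration, not the exact program built by the library.

```latex
\begin{aligned}
\min_{\{r_a\}} \quad & c_{FP}\,(1-\pi)\,\overline{\mathrm{FPR}} \;+\; c_{FN}\,\pi\,\bigl(1-\overline{\mathrm{TPR}}\bigr) \\
\text{s.t.} \quad & \overline{\mathrm{FPR}} = \textstyle\sum_a w_a^{-}\,\mathrm{FPR}_a,
  \qquad \overline{\mathrm{TPR}} = \textstyle\sum_a w_a^{+}\,\mathrm{TPR}_a, \\
& r_a = (\mathrm{FPR}_a, \mathrm{TPR}_a) \in \mathcal{H}_a \quad \text{for every group } a, \\
& \lVert r_a - r_b \rVert_p \le \varepsilon \quad \text{for every pair of groups } a \neq b.
\end{aligned}
```

Each hull constraint is a set of linear halfspace inequalities, and the l-1 and l-inf norms can be expressed with linear constraints, so for those choices of p the problem is a linear program.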
- error_parity.cvxpy_utils.compute_halfspace_inequality(p1, p2)
Computes the halfspace inequality defined by the vector p1->p2, such that Ax + b <= 0, where A and b are extracted from the line that goes through p1->p2.
As such, the inequality enforces that points must lie on the LEFT of the line defined by the p1->p2 vector.
In other words, input points are assumed to be in COUNTER CLOCK-WISE order (right-hand rule).
- Parameters:
p1 (np.ndarray) – The first point on the line that bounds the halfspace.
p2 (np.ndarray) – The second point on that line; the vector p1->p2 sets the line's orientation.
- Returns:
Returns an array of size=(n_dims + 1), with format [A; b], representing the inequality Ax + b <= 0.
- Return type:
tuple[float, float, float]
- Raises:
RuntimeError – Raised in case of inconsistent internal state variables.
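The sign convention can be derived from the 2-D cross product: a point q lies on or to the left of the directed line p1->p2 exactly when cross(p2 - p1, q - p1) >= 0. A minimal sketch returning the coefficients (a1, a2, b) of the equivalent inequality a1*x + a2*y + b <= 0; the function names are hypothetical, and the library's coefficients may differ by a positive scale factor.

```python
def halfspace_inequality(p1, p2):
    """Sketch: (a1, a2, b) such that a1*x + a2*y + b <= 0 exactly for points
    on or to the LEFT of the directed line p1 -> p2."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    if dx == 0 and dy == 0:
        raise ValueError("p1 and p2 must differ")
    # Left side: cross(p2 - p1, q - p1) >= 0, i.e. -cross <= 0, which expands to:
    a1, a2 = dy, -dx
    b = dx * p1[1] - dy * p1[0]
    return a1, a2, b


def satisfies(inequality, q):
    a1, a2, b = inequality
    return a1 * q[0] + a2 * q[1] + b <= 0


# The ROC diagonal traversed from (0, 0) to (1, 1): its left side is the region above it.
diagonal = halfspace_inequality((0.0, 0.0), (1.0, 1.0))
above_ok = satisfies(diagonal, (0.2, 0.8))
below_ok = satisfies(diagonal, (0.8, 0.2))
```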
- error_parity.cvxpy_utils.compute_line(p1, p2)
Computes the slope and intercept of the line that passes through the two given points.
The intercept is the y-value at x=0 (or NaN for vertical lines).
For vertical lines, use the x-value of either point as the line's intercept with y=0.
- Parameters:
p1 (np.ndarray) – A 2-D point.
p2 (np.ndarray) – A 2-D point.
- Returns:
A tuple pair with (slope, intercept) of the line that goes from p1 to p2.
- Return type:
tuple[float, float]
- Raises:
ValueError – Raised when input is invalid, e.g., when p1 == p2.
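A minimal sketch of the behaviour described above. Returning math.inf as the slope of a vertical line is an assumption (the description only fixes the intercept as NaN), and line_through is a hypothetical name.

```python
import math


def line_through(p1, p2):
    """Sketch: (slope, intercept) of the line through p1 and p2; intercept is y at x = 0."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and y1 == y2:
        raise ValueError("p1 and p2 must be different points")
    if x1 == x2:
        # Vertical line: no y-intercept; x1 itself is where the line crosses y = 0.
        return math.inf, math.nan
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


slope, intercept = line_through((0, 1), (2, 5))
vertical_slope, vertical_intercept = line_through((1, 0), (1, 3))
```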
- error_parity.cvxpy_utils.make_cvxpy_halfspace_inequality(p1, p2, cvxpy_point)
Creates a single cvxpy inequality constraint that enforces the given point, cvxpy_point, to lie on the left of the vector p1->p2.
Points must be sorted in counter clock-wise order!
- Parameters:
p1 (np.ndarray) – A point p1.
p2 (np.ndarray) – Another point p2.
cvxpy_point (Variable) – The cvxpy variable over which the constraint will be applied.
- Returns:
A linear inequality constraint of type Ax + b <= 0.
- Return type:
Expression
- error_parity.cvxpy_utils.make_cvxpy_point_in_polygon_constraints(polygon_vertices, cvxpy_point)
Creates the set of cvxpy constraints that force the given cvxpy variable point to lie within the polygon defined by the given vertices.
- Parameters:
polygon_vertices (np.ndarray) – A sequence of points that make up a polygon. Points must be sorted in COUNTER CLOCK-WISE order! (right-hand rule)
cvxpy_point (cvxpy.Variable) – A cvxpy variable representing a point, over which the constraints will be applied.
- Returns:
A list of cvxpy constraints.
- Return type:
list[Expression]
error_parity.roc_utils module
Helper functions to solve the relaxed equal odds problem.
- error_parity.roc_utils.calc_cost_of_point(fpr, fnr, prevalence, false_pos_cost=1.0, false_neg_cost=1.0)
Calculates the cost of the given ROC point.
- Parameters:
fpr (float) – The false positive rate (FPR).
fnr (float) – The false negative rate (FNR).
prevalence (float) – The prevalence of positive samples in the dataset, i.e., np.sum(y_true) / len(y_true)
false_pos_cost (float, optional) – The cost of a false positive error, by default 1.
false_neg_cost (float, optional) – The cost of a false negative error, by default 1.
- Returns:
cost – The cost of the given ROC point (divided by the size of the dataset).
- Return type:
float
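The cost described above can be sketched as the standard expected-cost formula: label negatives make up a (1 - prevalence) share of the data and are misclassified at rate fpr, while label positives make up a prevalence share and are misclassified at rate fnr. This formula is an assumption consistent with the description (cost divided by dataset size), not copied from the source; with both costs equal to 1 it gives the error rate, i.e., 1 - accuracy.

```python
def cost_of_point(fpr, fnr, prevalence, false_pos_cost=1.0, false_neg_cost=1.0):
    """Sketch: expected per-sample cost at the ROC point (FPR, TPR) = (fpr, 1 - fnr)."""
    # Negatives: share (1 - prevalence), wrong at rate fpr.
    # Positives: share prevalence, wrong at rate fnr.
    return false_pos_cost * fpr * (1.0 - prevalence) + false_neg_cost * fnr * prevalence


error_rate = cost_of_point(0.1, 0.2, 0.3)  # 0-1 loss
weighted_cost = cost_of_point(0.1, 0.2, 0.3, false_neg_cost=5.0)
```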
- error_parity.roc_utils.compute_global_roc_from_groupwise(groupwise_roc_points, groupwise_label_pos_weight, groupwise_label_neg_weight)
Computes the global ROC point that corresponds to the provided group-wise ROC points.
The global ROC is a linear combination of the group-wise points, with different weights for computing FPR and TPR: the first related to label negatives (LNs), and the second to label positives (LPs).
- Parameters:
groupwise_roc_points (np.ndarray) – An array of shape (n_groups, n_roc_dims) containing one ROC point per group.
groupwise_label_pos_weight (np.ndarray) – The relative size of each group in terms of its label POSITIVE samples (out of all POSITIVE samples, how many are in each group).
groupwise_label_neg_weight (np.ndarray) – The relative size of each group in terms of its label NEGATIVE samples (out of all NEGATIVE samples, how many are in each group).
- Returns:
global_roc_point – A single point that corresponds to the global outcome of the given group-wise ROC points.
- Return type:
np.ndarray
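A minimal sketch of the weighted combination described above; it assumes both weight vectors sum to 1 and uses a hypothetical function name.

```python
def global_roc_from_groupwise(groupwise_roc_points, label_pos_weight, label_neg_weight):
    """Sketch: combine per-group (FPR, TPR) points into the global (FPR, TPR) point."""
    # FPR is averaged with each group's share of label NEGATIVES...
    global_fpr = sum(w * fpr for w, (fpr, _) in zip(label_neg_weight, groupwise_roc_points))
    # ...and TPR with each group's share of label POSITIVES.
    global_tpr = sum(w * tpr for w, (_, tpr) in zip(label_pos_weight, groupwise_roc_points))
    return global_fpr, global_tpr


global_point = global_roc_from_groupwise(
    [(0.1, 0.6), (0.3, 0.9)],
    label_pos_weight=[0.25, 0.75],
    label_neg_weight=[0.5, 0.5],
)
```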
- error_parity.roc_utils.compute_roc_point_from_predictions(y_true, y_pred_binary)
Computes the ROC point associated with the provided binary predictions.
- Parameters:
y_true (np.ndarray) – The true labels.
y_pred_binary (np.ndarray) – The binary predictions.
- Returns:
The resulting ROC point, i.e., a tuple (FPR, TPR).
- Return type:
tuple[float, float]
- error_parity.roc_utils.roc_convex_hull(roc_points)
Computes the convex hull of the provided ROC points.
- Parameters:
roc_points (np.ndarray) – An array of shape (n_points, n_dims) containing all points of a provided ROC curve.
- Returns:
hull_points – An array of shape (n_hull_points, n_dims) containing all points in the convex hull of the ROC curve.
- Return type:
np.ndarray
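A hypothetical sketch of an ROC convex hull using Andrew's monotone-chain algorithm, keeping only the upper boundary from (0, 0) to (1, 1), which is the part a classifier can reach by interpolating between thresholds. The library's function may return a different subset or ordering of the hull points.

```python
def roc_upper_hull(roc_points):
    """Sketch: vertices of the ROC convex hull's upper boundary, from (0, 0) to (1, 1)."""

    def cross(o, a, b):
        # > 0 for a counter-clockwise (left) turn o -> a -> b, < 0 for a clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    points = sorted(set(map(tuple, roc_points)) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in points:
        # Pop the last vertex while it does not make a clockwise (right) turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


hull = roc_upper_hull([(0.1, 0.5), (0.2, 0.55), (0.4, 0.9), (0.6, 0.8)])
```

If the ROC diagonal is taken as the hull's lower edge, listing (0, 0) followed by the remaining vertices in reverse order gives a counter-clockwise polygon, the order compute_fair_optimum expects for groupwise_roc_hulls.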