Using the RelaxedThresholdOptimizer
The error_parity.threshold_optimizer.RelaxedThresholdOptimizer wraps a score-based predictor and realizes a fairness-constrained classifier.
Constructor
- class error_parity.threshold_optimizer.RelaxedThresholdOptimizer(*, predictor, constraint='equalized_odds', tolerance=0.0, false_pos_cost=1.0, false_neg_cost=1.0, l_p_norm=inf, max_roc_ticks=1000, seed=42)
Bases: Classifier
Class to encapsulate all the logic needed to compute the optimal equal odds classifier (with possibly relaxed constraints).
Initializes the relaxed equal odds wrapper.
- Parameters:
predictor (callable[[np.ndarray], np.ndarray]) – A trained score predictor that takes samples, X, of shape (num_samples, num_features), and outputs real-valued scores, R, of shape (num_samples,).
constraint (str) – The fairness constraint to use. By default “equalized_odds”.
tolerance (float) – The absolute tolerance for the equalized-odds fairness constraint: group-wise ROC points are allowed to differ by up to tolerance.
false_pos_cost (float, optional) – The cost of a FALSE POSITIVE error, by default 1.0.
false_neg_cost (float, optional) – The cost of a FALSE NEGATIVE error, by default 1.0.
l_p_norm (int, optional) – The l-p norm to use when computing distances between group ROC points. Used only for the “equalized odds” constraint (different l-p norms lead to different equalized-odds relaxations). By default np.inf, which corresponds to the l-inf norm.
max_roc_ticks (int, optional) – The maximum number of ticks (points) in each group’s ROC curve used when computing the optimal fair classifier, by default 1000.
seed (int) – A random seed used for reproducibility when producing randomized classifiers.
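To make tolerance and l_p_norm concrete, the sketch below measures the distance between two groups' ROC points, given as (FPR, TPR), under different l-p norms and checks it against a tolerance. It assumes the pairwise-distance reading of the relaxation described above; roc_distance and within_tolerance are hypothetical helpers, not part of error_parity, and this is an illustration rather than the library's implementation.

```python
import math


def roc_distance(point_a, point_b, p=math.inf):
    """l-p distance between two ROC points given as (FPR, TPR) tuples."""
    diffs = [abs(a - b) for a, b in zip(point_a, point_b)]
    if math.isinf(p):
        # l-inf norm: the largest coordinate-wise gap
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def within_tolerance(point_a, point_b, tolerance, p=math.inf):
    """True if the two ROC points differ by at most `tolerance` under the l-p norm."""
    return roc_distance(point_a, point_b, p) <= tolerance


# Hypothetical (FPR, TPR) points for two groups
group_0 = (0.10, 0.70)
group_1 = (0.14, 0.67)

for p in (1, 2, math.inf):
    dist = roc_distance(group_0, group_1, p)
    ok = within_tolerance(group_0, group_1, tolerance=0.045, p=p)
    print(f"p={p}: distance={dist:.3f}, within tolerance: {ok}")
```

With these points the l1 distance is 0.07, the l2 distance 0.05 and the l-inf distance 0.04, so only the l-inf check passes a tolerance of 0.045: the same tolerance is strictest under l1 and loosest under l-inf.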
- constraint_violation(constraint_name=None, l_p_norm=None)
Theoretical constraint violation of the LP solution found.
- Returns:
The fairness constraint violation.
- cost(false_pos_cost=None, false_neg_cost=None)
Computes the theoretical cost of the solution found.
NOTE: use false_pos_cost==false_neg_cost==1 for the 0-1 loss (the standard error rate), which is equal to 1 - accuracy.
- Returns:
The cost of the solution found.
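The note about the 0-1 loss can be checked with a small worked example. The sketch below assumes the theoretical cost is the expected per-sample cost at a single ROC point, that is false_pos_cost · FPR · (1 - prevalence) + false_neg_cost · FNR · prevalence; theoretical_cost is a hypothetical helper, not the library's cost() method.

```python
def theoretical_cost(fpr, tpr, prevalence, false_pos_cost=1.0, false_neg_cost=1.0):
    """Expected per-sample cost of a classifier operating at ROC point (fpr, tpr).

    prevalence is P(y = 1). False positives can only occur on the negatives
    (a 1 - prevalence share of the data), false negatives on the positives.
    """
    fnr = 1.0 - tpr
    return false_pos_cost * fpr * (1.0 - prevalence) + false_neg_cost * fnr * prevalence


fpr, tpr, prevalence = 0.2, 0.8, 0.3

# Unit costs: the cost is the 0-1 error rate, i.e. 1 - accuracy.
error_rate = theoretical_cost(fpr, tpr, prevalence)
accuracy = (1 - fpr) * (1 - prevalence) + tpr * prevalence
print(error_rate, 1 - accuracy)

# Asymmetric costs: false negatives five times as expensive as false positives.
print(theoretical_cost(fpr, tpr, prevalence, false_neg_cost=5.0))
```

With unit costs both printed values are 0.2 (up to floating-point rounding); raising false_neg_cost to 5.0 raises the cost to 0.44, which is what pushes the optimum toward higher-TPR operating points.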
- demographic_parity_violation()
Computes the theoretical violation of the demographic parity constraint.
That is, the maximum distance between groups’ PPR (positive prediction rate).
- Returns:
The demographic parity constraint violation.
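A group's positive prediction rate (PPR) follows from its ROC point and base rate: PPR = FPR · (1 - prevalence) + TPR · prevalence. The sketch below computes the demographic parity gap as the largest between-group PPR difference; the helper names are hypothetical and this is not the library's code.

```python
def positive_prediction_rate(fpr, tpr, prevalence):
    """Share of samples predicted positive at ROC point (fpr, tpr)."""
    return fpr * (1.0 - prevalence) + tpr * prevalence


def demographic_parity_gap(group_points, group_prevalences):
    """Largest difference in positive prediction rate between any two groups.

    group_points: list of (fpr, tpr) per group, indexed 0..G-1.
    group_prevalences: list of P(y = 1 | group) per group.
    """
    pprs = [positive_prediction_rate(fpr, tpr, prev)
            for (fpr, tpr), prev in zip(group_points, group_prevalences)]
    # The maximum pairwise |PPR_a - PPR_b| is simply max - min.
    return max(pprs) - min(pprs)


# Both groups share the same ROC point but have different base rates.
points = [(0.1, 0.7), (0.1, 0.7)]
prevalences = [0.2, 0.5]
print(demographic_parity_gap(points, prevalences))
```

Here the PPRs are 0.22 and 0.40, so the gap is 0.18 even though the groups' ROC points coincide: equalized odds and demographic parity generally cannot both hold when base rates differ.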
- equalized_odds_violation(l_p_norm=None)
Computes the theoretical violation of the equal odds constraint, i.e., the maximum l-p distance between the ROC points of any pair of groups; with l_p_norm=np.inf this is the l-inf distance.
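For more than two groups, the violation is the maximum over all pairs of groups. A minimal sketch, assuming each group's ROC point is given as an (FPR, TPR) tuple; lp_distance and equalized_odds_gap are hypothetical names, not error_parity functions.

```python
import itertools
import math


def lp_distance(a, b, p):
    """l-p distance between two (FPR, TPR) points; p may be math.inf."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return max(diffs) if math.isinf(p) else sum(d ** p for d in diffs) ** (1.0 / p)


def equalized_odds_gap(group_points, p=math.inf):
    """Max l-p distance between the ROC points of any pair of groups."""
    return max(
        (lp_distance(a, b, p) for a, b in itertools.combinations(group_points, 2)),
        default=0.0,  # a single group cannot violate the constraint
    )


points = [(0.10, 0.70), (0.12, 0.75), (0.15, 0.72)]
print(equalized_odds_gap(points))       # l-inf
print(equalized_odds_gap(points, p=1))  # l-1
```

The l-inf gap is 0.05 and the l1 gap 0.07; in both cases the worst pair involves group 0.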
- error_rate_parity_constraint_violation(error_type)
Computes the theoretical violation of an error-rate parity constraint.
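An error-rate parity constraint compares a single rate across groups instead of the full ROC point. The sketch below assumes the violation is the largest between-group difference in that rate; the error_type strings "fpr", "tpr", "fnr" and "tnr" are hypothetical labels chosen for illustration, not necessarily the values the library accepts.

```python
def group_error_rates(fpr, tpr):
    """All four rates implied by a single ROC point."""
    return {
        "fpr": fpr,
        "tpr": tpr,
        "fnr": 1.0 - tpr,
        "tnr": 1.0 - fpr,
    }


def error_rate_parity_gap(group_points, error_type):
    """Largest between-group difference in one rate, e.g. error_type="fpr"."""
    rates = [group_error_rates(fpr, tpr)[error_type] for fpr, tpr in group_points]
    return max(rates) - min(rates)


points = [(0.10, 0.70), (0.12, 0.75), (0.15, 0.72)]
print(error_rate_parity_gap(points, "fpr"))
print(error_rate_parity_gap(points, "fnr"))
```

Because FNR = 1 - TPR, parity in FNR is the same constraint as parity in TPR (and FPR parity the same as TNR parity), so the two gaps in each pair always coincide.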
- fit(X, y, *, group, y_scores=None)
Fit this predictor to achieve the (possibly relaxed) equal odds constraint on the provided data.
- Parameters:
X (np.ndarray) – The input features.
y (np.ndarray) – The input labels.
group (np.ndarray) – The group membership of each sample. Assumes groups are numbered [0, 1, …, num_groups-1].
y_scores (np.ndarray, optional) – The pre-computed model predictions on this data.
- Returns:
Returns self.
- Return type:
callable
- property groupwise_roc_data: dict
Group-specific ROC data containing (FPR, TPR, threshold) triplets.
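To illustrate what group-specific (FPR, TPR, threshold) triplets are, the plain-Python sketch below sweeps thresholds over each group's scores separately, using the rule "predict 1 if score >= threshold". It is not how error_parity builds groupwise_roc_data; the helper names are hypothetical, and each group is assumed to contain at least one positive and one negative label.

```python
import math


def roc_triplets(y_true, y_score):
    """(FPR, TPR, threshold) triplets for the rule "predict 1 if score >= threshold".

    Assumes the group contains at least one positive and one negative label.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    triplets = []
    # Sweep thresholds from high to low; +inf gives the all-negative classifier.
    for threshold in [math.inf] + sorted(set(y_score), reverse=True):
        predicted = [s >= threshold for s in y_score]
        tp = sum(1 for y, p in zip(y_true, predicted) if p and y == 1)
        fp = sum(1 for y, p in zip(y_true, predicted) if p and y == 0)
        triplets.append((fp / negatives, tp / positives, threshold))
    return triplets


def groupwise_roc(y_true, y_score, group):
    """Map each group id to the ROC triplets computed on that group's samples only."""
    roc = {}
    for g in sorted(set(group)):
        idx = [i for i, gi in enumerate(group) if gi == g]
        roc[g] = roc_triplets([y_true[i] for i in idx], [y_score[i] for i in idx])
    return roc


y = [0, 1, 1, 0, 1, 0]
scores = [0.2, 0.9, 0.6, 0.4, 0.3, 0.8]
group = [0, 0, 0, 1, 1, 1]
for g, triplets in groupwise_roc(y, scores, group).items():
    print(g, triplets)
```

Computing one curve per group is what lets a group-specific threshold (or a randomized mix of two thresholds) be chosen for each group.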
Tips
- Ensure group values are integers 0..G-1.
- If your model returns a 2-D array of probabilities, the optimizer will use the last column ([:, -1]).
- Control the solution search resolution with max_roc_ticks if your ROC arrays are large.
- Use l_p_norm with constraint="equalized_odds" to pick the \(\ell_1\), \(\ell_2\), or \(\ell_\infty\) norm.
- Use false_pos_cost and false_neg_cost to reflect asymmetric error costs; the cost() method reports the theoretical cost at the global solution point.