Postprocessing frontier (Pareto curve)
Use error_parity.pareto_curve.compute_postprocessing_curve() to compute the fairness–performance frontier across a range of fairness tolerances.
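A minimal sketch of a call, assuming synthetic data and a hypothetical callable predictor named score (neither comes from the library). The keyword arguments follow the documented signature; the import is deferred into run_curve() so that the data preparation runs even where error_parity is not installed.

```python
import numpy as np

rng = np.random.default_rng(42)


def make_triplet(n):
    """Return a synthetic (X, Y, S) triplet: features, binary labels, binary group."""
    X = rng.normal(size=(n, 3))
    S = rng.integers(0, 2, size=n)
    # Labels depend on the first feature and, slightly, on group membership.
    Y = (X[:, 0] + 0.5 * S + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, Y, S


def score(X):
    """Hypothetical callable predictor: maps features to scores in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-X[:, 0]))


# One triplet to fit the postprocessing, several named triplets to evaluate it.
fit_data = make_triplet(1000)
eval_data = {"validation": make_triplet(500), "test": make_triplet(500)}


def run_curve():
    """Compute the curve; requires error_parity to be installed."""
    from error_parity.pareto_curve import compute_postprocessing_curve

    return compute_postprocessing_curve(
        model=score,
        fit_data=fit_data,
        eval_data=eval_data,
        fairness_constraint="equalized_odds",
        tolerance_tick_step=0.1,    # coarse grid to keep the run short
        predict_method="__call__",  # score is a plain callable
        bootstrap=False,            # skip uncertainty estimates in this sketch
    )
```

Calling run_curve() should return the results DataFrame described under Returns; the coarse tolerance_tick_step is chosen here only to keep the example fast.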
API
- error_parity.pareto_curve.compute_postprocessing_curve(model, fit_data, eval_data, fairness_constraint='equalized_odds', l_p_norm=inf, bootstrap=True, tolerance_ticks=array([0., 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]), tolerance_tick_step=None, predict_method='predict_proba', n_jobs=None, **kwargs)
Computes the fairness and performance of the given classifier after adjusting (postprocessing) for varying levels of fairness tolerance.
- Parameters:
model (object) – The model to use.
fit_data (tuple) – Data triplet to use to fit postprocessing intervention, (X, Y, S), respectively containing the features, labels, and sensitive attribute.
eval_data (tuple or dict[tuple]) – Data triplet to use to evaluate postprocessing intervention on (same format as fit_data), or a dictionary of <data_name>-><data_triplet> containing multiple datasets to evaluate on.
fairness_constraint (str, optional) – The fairness constraint to use, by default "equalized_odds".
l_p_norm (int, optional) – The norm to use when computing the fairness constraint, by default np.inf. Note: only compatible with the "equalized_odds" constraint.
bootstrap (bool, optional) – Whether to compute uncertainty estimates via bootstrapping, by default True (as shown in the signature above).
tolerance_ticks (list, optional) – List of constraint tolerances to use when computing adjustment curve. By default will use higher granularity/precision for lower levels of disparity, and lower granularity for higher levels of disparity. Should correspond to a sorted list of values between 0 and 1. Will be ignored if tolerance_tick_step is provided.
tolerance_tick_step (float, optional) – Distance between constraint tolerances in the adjustment curve. Will override tolerance_ticks if provided!
predict_method (str, optional) – Which method to call to obtain predictions from the given model. Use predict_method="__call__" for a callable predictor, or the default predict_method="predict_proba" for a predictor with an sklearn interface.
n_jobs (int, optional) – Number of parallel jobs to use; if omitted, uses os.cpu_count()-1.
- Returns:
postproc_results_df – A DataFrame containing the results, one row per tolerance tick.
- Return type:
pd.DataFrame
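The default tolerance_ticks grid in the signature is dense at low tolerances and sparse at high ones. The sketch below rebuilds an equivalent grid with NumPy and also builds an evenly spaced one. The helper uniform_ticks is hypothetical: it only illustrates what a fixed step over [0, 1) looks like and is not necessarily how the library expands tolerance_tick_step internally.

```python
import numpy as np

# Fine grid (step 0.01) for tolerances 0.00..0.19, coarse grid (step 0.1) for 0.2..0.9.
# Integer ranges are divided afterwards to avoid accumulating floating-point error.
fine = np.arange(0, 20) / 100
coarse = np.arange(2, 10) / 10
default_like_ticks = np.concatenate([fine, coarse])


def uniform_ticks(step):
    """Evenly spaced tolerances in [0, 1); an assumed stand-in for a fixed tick step."""
    return np.arange(0.0, 1.0, step)


print(len(default_like_ticks))  # number of tolerances, i.e. expected rows in the result
print(uniform_ticks(0.25))
```

Any custom grid passed as tolerance_ticks should stay sorted and within 0 and 1, as the parameter description requires.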
Interpretation
The returned pandas.DataFrame has one row per tolerance tick and columns for each metric and dataset split. Use error_parity.plotting.plot_postprocessing_frontier() to visualize the envelope of the frontier and optional bootstrap confidence intervals.
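To make the idea of a frontier envelope concrete, the sketch below filters a small table down to its non-dominated rows. The data and the column names (tolerance, unfairness, accuracy) are made up for illustration and are not the library's actual output schema; pareto_envelope is a hypothetical helper, not part of error_parity.

```python
import pandas as pd

# Hypothetical results: values and column names are illustrative only.
results = pd.DataFrame({
    "tolerance": [0.0, 0.05, 0.1, 0.2, 0.3],
    "unfairness": [0.01, 0.06, 0.09, 0.21, 0.28],
    "accuracy": [0.70, 0.74, 0.73, 0.78, 0.78],
})


def pareto_envelope(df, unfairness_col, performance_col):
    """Drop rows matched or beaten by a row with lower-or-equal unfairness."""
    ordered = df.sort_values(
        [unfairness_col, performance_col], ascending=[True, False]
    )
    # Best performance seen among rows that come earlier in the ordering.
    best_before = ordered[performance_col].cummax().shift(1, fill_value=float("-inf"))
    # Keep a row only if it strictly improves on everything less unfair than it.
    return ordered[ordered[performance_col] > best_before]


frontier = pareto_envelope(results, "unfairness", "accuracy")
print(frontier)
```

In this toy table the rows at tolerance 0.1 and 0.3 are dropped, because another row reaches at least the same accuracy with less unfairness.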