Postprocessing frontier (Pareto curve)

Use error_parity.pareto_curve.compute_postprocessing_curve() to compute the fairness–performance frontier across tolerances.

API

error_parity.pareto_curve.compute_postprocessing_curve(model, fit_data, eval_data, fairness_constraint='equalized_odds', l_p_norm=inf, bootstrap=True, tolerance_ticks=array([0., 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]), tolerance_tick_step=None, predict_method='predict_proba', n_jobs=None, **kwargs)

Computes the fairness and performance of the given classifier after adjusting (postprocessing) for varying levels of fairness tolerance.
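To make "fairness tolerance" concrete, the following is a minimal sketch of one common way to measure an equalized-odds violation: the largest l-p distance between any two groups' (FPR, TPR) points. This reading of the constraint and of l_p_norm is an assumption, and the helper names group_rates and equalized_odds_gap are hypothetical; they are not part of error_parity.

```python
import math


def group_rates(y_true, y_pred, groups):
    """Return {group: (fpr, tpr)} for binary labels and binary predictions."""
    rates = {}
    for g in set(groups):
        idx = [i for i, s in enumerate(groups) if s == g]
        pos = [i for i in idx if y_true[i] == 1]
        neg = [i for i in idx if y_true[i] == 0]
        tpr = sum(y_pred[i] for i in pos) / len(pos)
        fpr = sum(y_pred[i] for i in neg) / len(neg)
        rates[g] = (fpr, tpr)
    return rates


def equalized_odds_gap(rates, p=math.inf):
    """Largest l-p distance between any two groups' (FPR, TPR) points."""
    points = list(rates.values())
    worst = 0.0
    for i, a in enumerate(points):
        for b in points[i + 1:]:
            diffs = [abs(x - y) for x, y in zip(a, b)]
            if math.isinf(p):
                dist = max(diffs)
            else:
                dist = sum(d ** p for d in diffs) ** (1 / p)
            worst = max(worst, dist)
    return worst


# Toy data (illustrative only): two groups of four samples each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["a"] * 4 + ["b"] * 4

rates = group_rates(y_true, y_pred, groups)
gap_inf = equalized_odds_gap(rates)       # l-infinity distance
gap_l1 = equalized_odds_gap(rates, p=1)   # l-1 distance
```

Under this reading, a tolerance of 0.1 would accept a postprocessed classifier only if its gap is at most 0.1, so smaller tolerances impose stricter fairness.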

Parameters:
  • model (object) – The classifier whose predictions will be postprocessed.

  • fit_data (tuple) – Data triplet used to fit the postprocessing intervention: (X, Y, S), containing the features, labels, and sensitive attribute, respectively.

  • eval_data (tuple or dict[tuple]) – Data triplet on which to evaluate the postprocessing intervention (same format as fit_data), or a dictionary of <data_name>-><data_triplet> containing multiple datasets to evaluate on.

  • fairness_constraint (str, optional) – The fairness constraint to use, by default "equalized_odds".

  • l_p_norm (int, optional) – The norm to use when computing the fairness constraint, by default np.inf. Note: only compatible with the "equalized_odds" constraint.

  • bootstrap (bool, optional) – Whether to compute uncertainty estimates via bootstrapping, by default True.

  • tolerance_ticks (list, optional) – List of constraint tolerances at which to compute the adjustment curve. The default uses finer granularity at low levels of disparity and coarser granularity at high levels. Must be a sorted list of values between 0 and 1. Ignored if tolerance_tick_step is provided.

  • tolerance_tick_step (float, optional) – Distance between consecutive constraint tolerances in the adjustment curve. Overrides tolerance_ticks if provided.

  • predict_method (str, optional) – Which method to call to obtain predictions from the given model. Use predict_method="__call__" for a callable predictor, or the default predict_method="predict_proba" for a predictor with an sklearn interface.

  • n_jobs (int, optional) – Number of parallel jobs to use. If omitted, os.cpu_count()-1 is used.
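The two ways of specifying tolerances can be sketched in plain Python. The first list reproduces the default values shown in the signature (steps of 0.01 up to 0.2, then steps of 0.1 up to 0.9). The uniform_ticks helper is hypothetical: how the library spaces ticks when tolerance_tick_step is given is not stated here, so a uniform grid over [0, 1) starting at 0 is an assumption.

```python
import math

# Default-style grid: fine steps at low disparity, coarse steps at high disparity.
fine = [i / 100 for i in range(0, 21)]   # 0.0, 0.01, ..., 0.2
coarse = [i / 10 for i in range(3, 10)]  # 0.3, 0.4, ..., 0.9
ticks = fine + coarse


def uniform_ticks(step):
    """Hypothetical helper: evenly spaced tolerances in [0, 1).

    Assumes tolerance_tick_step produces a uniform grid starting at 0;
    this is an illustration, not the library's documented behavior.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    count = math.ceil(1.0 / step - 1e-9)
    # Build from integer multiples to avoid accumulating float error.
    return [round(i * step, 10) for i in range(count)]


quarter_ticks = uniform_ticks(0.25)
```

Either list could then be passed as tolerance_ticks. A custom list should stay sorted and within 0 and 1, as the parameter description requires.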

Returns:

postproc_results_df – A DataFrame containing the results, one row per tolerance tick.

Return type:

pd.DataFrame
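A call might look like the sketch below, which uses only the parameters listed above. The check_triplet and run_curve helpers are hypothetical wrappers, the dataset names "validation" and "test" are arbitrary, and the toy lists stand in for real array-like data. The import is placed inside run_curve so the snippet can be read and loaded without the package installed.

```python
def check_triplet(name, triplet):
    """Hypothetical sanity check: an (X, Y, S) triplet with matching lengths."""
    X, Y, S = triplet
    if not (len(X) == len(Y) == len(S)):
        raise ValueError(f"{name}: X, Y and S must have the same length")
    return triplet


def run_curve(model, fit_data, eval_data):
    """Hypothetical wrapper around compute_postprocessing_curve."""
    # Imported lazily; requires the error_parity package at call time.
    from error_parity.pareto_curve import compute_postprocessing_curve

    return compute_postprocessing_curve(
        model=model,
        fit_data=fit_data,
        eval_data=eval_data,            # a single triplet or a dict of triplets
        fairness_constraint="equalized_odds",
        bootstrap=False,                # skip uncertainty estimates in this sketch
        tolerance_tick_step=0.05,       # overrides tolerance_ticks
        predict_method="predict_proba",
        n_jobs=1,
    )


# Placeholder data: features X, labels Y, sensitive attribute S.
fit_data = check_triplet("fit", ([[0.1], [0.7], [0.4], [0.9]], [0, 1, 0, 1], [0, 0, 1, 1]))
eval_data = {
    "validation": check_triplet("validation", ([[0.2], [0.8]], [0, 1], [0, 1])),
    "test": check_triplet("test", ([[0.3], [0.6]], [0, 1], [1, 0])),
}

# With a trained classifier available (not constructed here):
# results_df = run_curve(trained_model, fit_data, eval_data)
```

Passing a dictionary for eval_data evaluates the same fitted adjustment on several named datasets in one call.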

Interpretation