fairpy.model

Certifying and Removing Disparate Impact

Reference:: https://dl.acm.org/doi/pdf/10.1145/2783258.2783311
Code adopted from:: https://github.com/algofairness/BlackBoxAuditing

# TODO: bugs on data types in numpy

Attributes

classes_ : ndarray of shape (n_classes,): A list of class labels known to the classifier.
s_classes_ : ndarray of shape (n_sensitive_group,): A list of sensitive classes known to the estimator during training.
n_features_in_ : int: Number of features seen during fit.
feature_names_in_ : ndarray of shape (n_features_in_,): Names of features seen during fit. Defined only when X has feature names that are all strings.
s_idx_ : list of int: The index(es) of the sensitive attribute(s) in the data matrix used to repair data.
repair_feat_idx_ : list of int: The index(es) of the feature(s) to be repaired in the data matrix.
num_feat_idx_ : list of int: The index(es) of the numerical feature(s) in the data matrix.

Examples

>>> from fairpy.dataset import Adult
>>> from fairpy.model import DIRemover
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = DIRemover(s_idx=dataset.feat_idx.sen_idx, num_feat_idx=0)
>>> new_X_train = model.fit_transform(split_data.X_train)

Fit DIRemover and return repaired data matrix.

Parameters

X : array-like of shape (n_samples, n_features): Training vector, where n_samples is the number of samples and n_features is the number of features.

Returns

X : array-like of shape (n_samples, n_features): Repaired data matrix.
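
The idea behind repairing a numerical feature can be illustrated with a small standard-library sketch. It is a simplified reading of the "full repair" described in the referenced paper, not DIRemover's actual code: each value is replaced by the median, across sensitive groups, of the value found at the same within-group quantile. The function name `repair_feature` and the nearest-rank quantile rule are assumptions made for this illustration.

```python
from statistics import median


def repair_feature(values, groups):
    """Repair one numeric feature so that every group shares one distribution."""
    # Collect and sort the feature values inside each sensitive group.
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    for g in by_group:
        by_group[g].sort()

    def quantile(sorted_vals, q):
        # Nearest-rank quantile for q in [0, 1] (an assumption of this sketch).
        return sorted_vals[round(q * (len(sorted_vals) - 1))]

    repaired = []
    for v, g in zip(values, groups):
        own = by_group[g]
        # Relative position of v inside its own group's sorted values.
        q = own.index(v) / (len(own) - 1) if len(own) > 1 else 0.5
        # Median, across groups, of the value sitting at that position.
        repaired.append(median(quantile(vals, q) for vals in by_group.values()))
    return repaired


values = [1, 10, 2, 20, 3, 30]
groups = ["a", "b", "a", "b", "a", "b"]
repaired = repair_feature(values, groups)
print(repaired)  # [5.5, 5.5, 11.0, 11.0, 16.5, 16.5]
```

After the repair both groups have identical feature distributions, so the feature no longer reveals group membership, while the ordering of samples within each group is preserved.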

class fairpy.model.EqOddsCalib

Equality of Opportunity in Supervised Learning

Reference:: https://arxiv.org/pdf/1610.02413.pdf
Code adopted from:: https://github.com/gpleiss/equalized_odds_and_calibration

Attributes

classes_ : ndarray of shape (n_classes,): A list of class labels known to the classifier.
s_classes_ : ndarray of shape (n_sensitive_group,): A list of sensitive classes known to the estimator during training.
n_features_in_ : int: Number of features seen during fit.
feature_names_in_ : ndarray of shape (n_features_in_,): Names of features seen during fit. Defined only when X has feature names that are all strings.

Examples

>>> from fairpy.dataset import Adult
>>> from fairpy.model import EqOddsCalib
>>> dataset = Adult()
>>> split_data = dataset.split()
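>>> # pred_train / pred_test: scores from an upstream classifier (not shown here)
>>> model = EqOddsCalib()
>>> model.fit(pred=pred_train, y=split_data.y_train, s=split_data.s_train)
>>> fair_pred = model.transform(pred=pred_test, s=split_data.s_test)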

static base_rate(label): Percentage of samples belonging to the positive class

Fit the model according to the given training data.

Parameters

pred : array-like of shape (n_samples,): Predictions to be calibrated, where n_samples is the number of samples.
y : array-like of shape (n_samples,): Target vector relative to pred.
s : array-like of shape (n_samples,): Sensitive attributes relative to pred.

Returns

self: Fitted estimator.

static fn_cost(pred, label): Generalized false negative cost

static fnr(pred, label): False negative rate

static fp_cost(pred, label): Generalized false positive cost

static fpr(pred, label): False positive rate

static tnr(pred, label): True negative rate

static tpr(pred, label): True positive rate
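
As a rough guide to what these static helpers measure, the sketch below uses textbook definitions written with the standard library only. It is not the library's code: the 0.5 threshold for turning scores into hard predictions is an assumption, and the generalized costs follow the usual definitions (mean score given to true negatives, mean of one minus the score given to true positives).

```python
def base_rate(label):
    # Fraction of samples whose label is the positive class (1).
    return sum(label) / len(label)


def _rate(pred, label, pred_value, label_value):
    # Among samples with the given true label, the share that received the
    # given hard prediction (scores are thresholded at 0.5 by assumption).
    chosen = [p for p, y in zip(pred, label) if y == label_value]
    return sum(1 for p in chosen if int(p >= 0.5) == pred_value) / len(chosen)


def tpr(pred, label):
    return _rate(pred, label, 1, 1)


def fnr(pred, label):
    return _rate(pred, label, 0, 1)


def fpr(pred, label):
    return _rate(pred, label, 1, 0)


def tnr(pred, label):
    return _rate(pred, label, 0, 0)


def fp_cost(pred, label):
    # Generalized false positive cost: mean score assigned to true negatives.
    neg = [p for p, y in zip(pred, label) if y == 0]
    return sum(neg) / len(neg)


def fn_cost(pred, label):
    # Generalized false negative cost: mean of (1 - score) over true positives.
    pos = [1 - p for p, y in zip(pred, label) if y == 1]
    return sum(pos) / len(pos)


pred = [0.9, 0.6, 0.2, 0.7, 0.1]
label = [1, 1, 1, 0, 0]
print(base_rate(label), tpr(pred, label), fpr(pred, label))
print(fp_cost(pred, label), fn_cost(pred, label))
```

With hard 0/1 predictions the generalized costs coincide with the false positive and false negative rates, which is why they are called generalized.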

Transform predictions to be fair.

Parameters

pred : array-like of shape (n_samples,): Predictions to be calibrated, where n_samples is the number of samples.
s : array-like of shape (n_samples,): Sensitive attributes relative to pred.

Returns

pred : ndarray of shape (n_samples,): Debiased predictions.

class fairpy.model.FairCstr(cstr: str = 'fair', sep_cstr: bool = False, gamma: float = 0.5, max_iter: int = 100000)

Fairness Constraints: Mechanisms for Fair Classification

The ‘sep_constraint’ functionality of the original implementation is not currently supported.

Reference:: http://proceedings.mlr.press/v54/zafar17a/zafar17a.pdf
Code adopted from:: https://github.com/mbilalzafar/fair-classification

Attributes

classes_ : ndarray of shape (n_classes,): A list of class labels known to the classifier.
s_classes_ : ndarray of shape (n_sensitive_group,): A list of sensitive classes known to the classifier during training.
n_features_in_ : int: Number of features seen during fit.
feature_names_in_ : ndarray of shape (n_features_in_,): Names of features seen during fit. Defined only when X has feature names that are all strings.
coef_ : ndarray of shape (1, n_features): Coefficient of the features in the decision function.

Examples

>>> from sklearn.preprocessing import StandardScaler
>>> from fairpy.dataset import Adult
>>> from fairpy.model import FairCstr
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = FairCstr()
>>> model.fit(split_data.X_train, split_data.y_train, split_data.s_train)
>>> model.predict(split_data.X_test)

Predict confidence scores for samples.

Parameters

X : array-like of shape (n_samples, n_features): The data matrix for which we want to get the confidence scores.

Returns

scores : ndarray of shape (n_samples,): Confidence score for each sample. In the binary case, the score refers to self.classes_[1], and a value > 0 means this class would be predicted.
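
The relation between a confidence score and the predicted label can be sketched as follows. This is a generic linear-classifier illustration under stated assumptions, not FairCstr's code: the helper names, the example coefficients, and the `classes` list are all hypothetical.

```python
def decision_function(X, coef, intercept=0.0):
    # Signed score of each sample: dot product with the coefficients plus intercept.
    return [sum(w * x for w, x in zip(coef, row)) + intercept for row in X]


def predict(X, coef, classes, intercept=0.0):
    # A score above zero selects classes[1]; otherwise classes[0] is returned.
    scores = decision_function(X, coef, intercept)
    return [classes[1] if s > 0 else classes[0] for s in scores]


coef = [2.0, -1.0]          # hypothetical fitted coefficients
classes = ["neg", "pos"]    # plays the role of classes_
X = [[1.0, 0.5], [0.0, 3.0]]

scores = decision_function(X, coef)
labels = predict(X, coef, classes)
print(scores)  # [1.5, -3.0]
print(labels)  # ['pos', 'neg']
```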

Fit the model according to the given training data.

Parameters

X : array-like of shape (n_samples, n_features): Training vector, where n_samples is the number of samples and n_features is the number of features.
y : array-like of shape (n_samples,): Target vector relative to X.
s : array-like of shape (n_samples,): Sensitive attributes relative to X.

Returns

self: Fitted estimator.

Predict class labels for samples in X.

Parameters

X : array-like of shape (n_samples, n_features): The data matrix for which we want to get the predictions.

Returns

y_pred : ndarray of shape (n_samples,): Vector containing the class labels for each sample.

class fairpy.model.FairGLM(solver: str = 'CG', fit_intercept: bool = True, max_iter: int = 100, lam: float = 0.001, tol: float = 0.0001)

“Fair Generalized Linear Models with a Convex Penalty”

Reference:: https://proceedings.mlr.press/v162/do22a/do22a.pdf
Code adopted from:: https://github.com/hyungrok-do/fair-glm-cvx

# TODO: add support for multi-class and regression

Attributes

classes_ : ndarray of shape (n_classes,): A list of class labels known to the classifier.
s_classes_ : ndarray of shape (n_sensitive_group,): A list of sensitive classes known to the classifier.
n_features_in_ : int: Number of features seen during fit.
feature_names_in_ : ndarray of shape (n_features_in_,): Names of features seen during fit. Defined only when X has feature names that are all strings.
coef_ : ndarray of shape (1, n_features): Coefficient of the features in the decision function.
intercept_ : ndarray of shape (1,): Intercept (a.k.a. bias) added to the decision function.

Examples

>>> from sklearn.preprocessing import StandardScaler
>>> from fairpy.dataset import Adult
>>> from fairpy.model import FairGLM
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = FairGLM()
>>> model.fit(split_data.X_train, split_data.y_train, split_data.s_train)
>>> model.predict(split_data.X_test)

Predict confidence scores for samples.

Parameters

X : array-like of shape (n_samples, n_features): The data matrix for which we want to get the confidence scores.

Returns

scores : ndarray of shape (n_samples,): Confidence score for each sample. In the binary case, the score refers to self.classes_[1], and a value > 0 means this class would be predicted.

Fit the model according to the given training data.

Parameters

X : array-like of shape (n_samples, n_features): Training vector, where n_samples is the number of samples and n_features is the number of features.
y : array-like of shape (n_samples,): Target vector relative to X.
s : array-like of shape (n_samples,): Sensitive attributes relative to X.

Returns

self: Fitted estimator.

Predict class labels for samples in X.

Parameters

X : array-like of shape (n_samples, n_features): The data matrix for which we want to get the predictions.

Returns

y_pred : ndarray of shape (n_samples,): Vector containing the class labels for each sample.

Probability estimates.

The returned estimates for all classes are ordered by the label of classes.

Parameters

X : array-like of shape (n_samples, n_features): Vector to be scored, where n_samples is the number of samples and n_features is the number of features.

Returns

T : array-like of shape (n_samples, n_classes): Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
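
To make the column ordering concrete, here is a minimal sketch that turns decision scores into a two-column probability matrix. It assumes a binary model with a logistic link, which may not match FairGLM's internals; the function below is a stand-in written for this illustration.

```python
import math


def predict_proba(scores):
    # Column 0 holds P(classes[0]), column 1 holds P(classes[1]).
    rows = []
    for s in scores:
        p1 = 1.0 / (1.0 + math.exp(-s))  # logistic link (assumed)
        rows.append([1.0 - p1, p1])
    return rows


classes = [0, 1]  # plays the role of classes_
proba = predict_proba([1.5, -3.0])

# The most probable column index maps back to a label through classes.
labels = [classes[row.index(max(row))] for row in proba]
print(labels)  # [1, 0]
```

Because the columns follow the order of classes_, the label for a row can always be recovered by indexing classes_ with the position of the largest probability.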

class fairpy.model.FairRank(K: int, P: float, alpha: float)

FA*IR: A Fair Top-k Ranking Algorithm

Reference:: https://dl.acm.org/doi/pdf/10.1145/3132847.3132938
Code adopted from:: https://github.com/fair-search/fairsearch-fair-python

Attributes

s_classes_ : ndarray of shape (n_sensitive_group,): A list of sensitive classes known to the estimator.

Examples

>>> from fairpy.model import FairRank
>>> model = FairRank(K=5, P=0.5, alpha=0.10)
>>> scores = [0.98, 0.97, 0.85, 0.84, 0.83, 0.55]
>>> s = ["male", "male", "male", "female", "female", "female"]
>>> fair_rank = model.transform(scores=scores, s=s)

Transform the ranking to be fair based on scores.

Parameters

scores : array-like of shape (n_samples,): Scores for ranking samples, where n_samples is the number of samples.
s : array-like of shape (n_samples,): Sensitive attributes relative to scores.

Returns

rank : array-like of shape (n_samples,): Fair rank with indexes corresponding to scores.
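
The roles of K, P and alpha can be illustrated with the ranked group fairness test from the referenced paper: every prefix of length k must contain at least the smallest number of protected candidates m whose binomial CDF F(m; k, P) exceeds alpha. The sketch below computes only that table of minimum counts. It leaves out the paper's correction of alpha for multiple testing and the re-ranking step itself, and the function names are hypothetical rather than part of FairRank.

```python
from math import comb


def binom_cdf(m, k, p):
    # P(at most m successes in k Bernoulli(p) trials).
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(m + 1))


def minimum_protected(K, P, alpha):
    # For each prefix length k = 1..K, the smallest m with F(m; k, P) > alpha.
    table = []
    for k in range(1, K + 1):
        m = 0
        while binom_cdf(m, k, P) <= alpha:
            m += 1
        table.append(m)
    return table


table = minimum_protected(K=5, P=0.5, alpha=0.10)
print(table)  # [0, 0, 0, 1, 1]
```

With these settings, a top-5 ranking needs at least one protected candidate within its first four positions to pass the test.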

class fairpy.model.IFair(s_idx: int | Sequence[int] | None = None, K: int = 2, max_iter: int = 200, restarts: int = 3, epsilon: float = 0.0001, w_recon: float = 1.0, w_fair: float = 1.0)

iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making

Reference:: https://arxiv.org/pdf/1806.01059.pdf
Code adopted from:: https://github.com/plahoti-lgtm/iFair

The time complexity is O(N^2) for every optimization iteration.
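
The quadratic cost comes from comparing every pair of records. As a rough sketch of the fairness term in the referenced paper, the loss below sums, over all pairs, the squared gap between distances in the learned representation and distances on the non-sensitive attributes. Plain Euclidean distance and the names `X_star` and `X_tilde` are assumptions of this illustration, not IFair's code.

```python
from math import dist


def fairness_loss(X_star, X_tilde):
    """Pairwise distance-preservation loss; visits all N * N pairs."""
    n = len(X_star)
    loss = 0.0
    comparisons = 0
    for i in range(n):
        for j in range(n):
            # Distance in the learned space vs. distance on non-sensitive attributes.
            gap = dist(X_tilde[i], X_tilde[j]) - dist(X_star[i], X_star[j])
            loss += gap**2
            comparisons += 1
    return loss, comparisons


X_star = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]    # records without sensitive columns
collapsed = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # a representation that loses all structure

print(fairness_loss(X_star, X_star))     # (0.0, 9)
print(fairness_loss(X_star, collapsed))  # (300.0, 9)
```

Doubling the number of records quadruples the number of comparisons, so subsampling the training data is a common way to keep each iteration affordable.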

Attributes

n_features_in_ : int: Number of features seen during fit.
feature_names_in_ : ndarray of shape (n_features_in_,): Names of features seen during fit. Defined only when X has feature names that are all strings.
s_idx_ : list of int: The index(es) of the sensitive attribute(s) in the data matrix.
opt_params_ : ndarray of shape (n_features * n_centroids + n_features,): Solved coefficients in probabilistic clustering.

Examples

>>> from sklearn.preprocessing import StandardScaler
>>> from fairpy.dataset import Adult
>>> from fairpy.model import IFair
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = IFair()
>>> model.fit(split_data.X_train)
>>> fair_data = model.transform(split_data.X_train)

Fit the model according to the given training data.

Parameters

X : array-like of shape (n_samples, n_features): Training vector, where n_samples is the number of samples and n_features is the number of features.

Returns

self: Fitted estimator.

Fit the estimator and transform data matrix to fair data.

Parameters

X : array-like of shape (n_samples, n_features): Training vector, where n_samples is the number of samples and n_features is the number of features.

Returns

X : ndarray of shape (n_samples, n_features): Debiased training vector.

Transform data matrix to fair data.

Parameters

X : array-like of shape (n_samples, n_features): Data matrix to transform, where n_samples is the number of samples and n_features is the number of features.

Returns

X : ndarray of shape (n_samples, n_features): Debiased data matrix.

class fairpy.model.LabelBias(metric: str = 'dp', estimator: Any | None = None, max_iter: int = 100, tol: float = 0.001, lr: float = 1.0)

“Identifying and Correcting Label Bias in Machine Learning”

Adaptively learns the weights for sensitive groups by fitting the sub-estimator multiple times.

Reference:: http://proceedings.mlr.press/v108/jiang20a/jiang20a.pdf
Code adopted from:: https://github.com/google-research/google-research/tree/master/label_bias
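
The adaptive reweighting can be pictured with the sketch below, loosely modeled on the referenced implementation for demographic parity. Each group carries a multiplier that is moved by the group's parity violation after every refit, and sample weights are derived from the multipliers and the labels. The function names are assumptions, and fairpy's exact update rule (including how `lr` is applied) may differ.

```python
import math


def debias_weights(labels, group_membership, multipliers):
    # group_membership[k][i] is 1 if sample i belongs to sensitive group k.
    weights = []
    for i, y in enumerate(labels):
        exponent = -sum(m * member[i] for m, member in zip(multipliers, group_membership))
        w = math.exp(exponent) / (math.exp(exponent) + math.exp(-exponent))
        # Positive labels get the complementary weight.
        weights.append(1 - w if y > 0 else w)
    return weights


def dp_violation(preds, member):
    # Overall positive rate minus the positive rate inside the group.
    in_group = [p for p, m in zip(preds, member) if m]
    return sum(preds) / len(preds) - sum(in_group) / len(in_group)


labels = [1, 0, 1, 0]
membership = [[1, 1, 0, 0]]   # one sensitive group containing samples 0 and 1
multipliers = [0.0]

initial = debias_weights(labels, membership, multipliers)   # uniform start
preds = [0, 0, 1, 1]          # stand-in for the sub-estimator's predictions
multipliers[0] += dp_violation(preds, membership[0])         # group is under-predicted
updated = debias_weights(labels, membership, multipliers)
print(initial, updated)
```

In a full loop the sub-estimator would be refitted with `updated` as sample weights, new predictions would produce a new violation, and the cycle would repeat until the violation falls below the tolerance or the iteration limit is reached.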

# TODO: add support for equal odds

Attributes

classes_ : ndarray of shape (n_classes,): A list of class labels known to the classifier.
s_classes_ : ndarray of shape (n_sensitive_group,): A list of sensitive classes known to LabelBias during training.
n_features_in_ : int: Number of features seen during fit.
feature_names_in_ : ndarray of shape (n_features_in_,): Names of features seen during fit. Defined only when X has feature names that are all strings.
weights_ : ndarray of shape (n_samples,): Weights for training samples solved by LabelBias.
fitted_estimator_ : an object of the classifier: Fitted estimator.

Examples

>>> from sklearn.preprocessing import StandardScaler
>>> from fairpy.dataset import Adult
>>> from fairpy.model import LabelBias
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = LabelBias()
>>> model.fit(split_data.X_train, split_data.y_train, split_data.s_train)
>>> model.predict(split_data.X_test)

Fit the model according to the given training data.

Parameters

X : array-like of shape (n_samples, n_features): Training vector, where n_samples is the number of samples and n_features is the number of features.
y : array-like of shape (n_samples,): Target vector relative to X.
s : array-like of shape (n_samples,): Sensitive attributes relative to X.

Returns

self: Fitted estimator.

Predict class labels for samples in X.

Parameters

X : array-like of shape (n_samples, n_features): The data matrix for which we want to get the predictions.

Returns

y_pred : ndarray of shape (n_samples,): Vector containing the class labels for each sample.

predict_proba(X: ndarray) → Any

Probability estimates. Only available if ‘predict_proba’ is implemented in estimator.

Parameters

X : array-like of shape (n_samples, n_features): Vector to be scored, where n_samples is the number of samples and n_features is the number of features.

Returns

T : array-like of shape (n_samples, n_classes): Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.

class fairpy.model.LinearFairERM(estimator: Any | None = None)

Empirical risk minimization under fairness constraints

Reference:: https://proceedings.neurips.cc/paper/2018/file/83cdcec08fbf90370fcf53bdd56604ff-Paper.pdf
Code adopted from:: https://github.com/jmikko/fair_ERM

Attributes

classes_ : ndarray of shape (n_classes,): A list of class labels known to the classifier.
s_classes_ : ndarray of shape (n_sensitive_group,): A list of sensitive classes known to the classifier during training.
fitted_estimator_ : an object of the classifier: Fitted estimator.
n_features_in_ : int: Number of features seen during fit.
feature_names_in_ : ndarray of shape (n_features_in_,): Names of features seen during fit. Defined only when X has feature names that are all strings.

Examples

>>> from sklearn.preprocessing import StandardScaler
>>> from fairpy.dataset import Adult
>>> from fairpy.model import LinearFairERM
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = LinearFairERM()
>>> model.fit(split_data.X_train, split_data.y_train, split_data.s_train)
>>> model.predict(split_data.X_test)

decision_function(X) → Any

Predict confidence scores for samples.

Parameters

X : array-like of shape (n_samples, n_features): The data matrix for which we want to get the confidence scores.

Returns

scores : ndarray of shape (n_samples,): Confidence score for each sample. In the binary case, the score refers to self.classes_[1], and a value > 0 means this class would be predicted.

Feature transformation for fairness.

Parameters

X : array-like of shape (n_samples, n_features): The data matrix for which we want to get the transformations.

Returns

trans_X : ndarray: Transformed data matrix, with one row per sample in X.
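
One way such a feature transformation can work in the linear case is sketched below, following the construction in the referenced paper as understood here: take the difference u between the mean positive-class feature vectors of the two sensitive groups, then remove the component along u by eliminating one pivot feature. The function name, the pivot choice (largest absolute entry of u), and the two-group restriction are assumptions, and the sketch fails if u is all zeros. LinearFairERM's code may differ.

```python
def fair_transform(X, y, s):
    n_feat = len(X[0])

    def positive_mean(group):
        # Mean feature vector of positive-class samples in one sensitive group.
        rows = [x for x, yi, si in zip(X, y, s) if yi == 1 and si == group]
        return [sum(col) / len(rows) for col in zip(*rows)]

    group_a, group_b = sorted(set(s))  # exactly two sensitive groups assumed
    u = [a - b for a, b in zip(positive_mean(group_a), positive_mean(group_b))]
    pivot = max(range(n_feat), key=lambda j: abs(u[j]))

    transformed = []
    for x in X:
        scale = x[pivot] / u[pivot]
        # Subtract the component along u, then drop the pivot column.
        transformed.append([x[j] - u[j] * scale for j in range(n_feat) if j != pivot])
    return transformed


X = [[1.0, 2.0], [3.0, 2.0], [2.0, 4.0], [4.0, 8.0], [0.0, 1.0]]
y = [1, 1, 1, 1, 0]
s = [0, 0, 1, 1, 0]
trans_X = fair_transform(X, y, s)
print(trans_X)  # [[0.5], [2.5], [1.0], [2.0], [-0.25]]
```

In the transformed space the positive-class means of both groups coincide (1.5 for each group here), so any linear model trained on it has equal average scores on the positives of the two groups.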

Fit the model according to the given training data.

Parameters

X : array-like of shape (n_samples, n_features): Training vector, where n_samples is the number of samples and n_features is the number of features.
y : array-like of shape (n_samples,): Target vector relative to X.
s : array-like of shape (n_samples,): Sensitive attributes relative to X.

Returns

self: Fitted estimator.

predict(X) → Any

Predict class labels for samples in X.

Parameters

X : array-like of shape (n_samples, n_features): The data matrix for which we want to get the predictions.

Returns

y_pred : ndarray of shape (n_samples,): Vector containing the class labels for each sample.

predict_log_proba(X) → Any

Predict logarithm of probability estimates.

The returned estimates for all classes are ordered by the label of classes.

Parameters

X : array-like of shape (n_samples, n_features): Vector to be scored, where n_samples is the number of samples and n_features is the number of features.

Returns

T : array-like of shape (n_samples, n_classes): Returns the logarithm of probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
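
The relation between the two probability outputs is simply an element-wise logarithm, as this small sketch shows. The helper is a stand-in written for illustration and takes an already computed probability matrix rather than X.

```python
import math


def log_proba_from_proba(proba):
    # Element-wise natural logarithm; column order is unchanged.
    return [[math.log(p) for p in row] for row in proba]


proba = [[0.25, 0.75], [0.9, 0.1]]   # hypothetical predict_proba output
log_proba = log_proba_from_proba(proba)

# Exponentiating recovers the original probabilities.
recovered = [[math.exp(v) for v in row] for row in log_proba]
print(log_proba)
```

Log probabilities are never positive, and working with them avoids underflow when many small probabilities are multiplied together.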

predict_proba(X) → Any

Probability estimates.

The returned estimates for all classes are ordered by the label of classes.

Parameters

X : array-like of shape (n_samples, n_features): Vector to be scored, where n_samples is the number of samples and n_features is the number of features.

Returns

T : array-like of shape (n_samples, n_classes): Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.